Re: [PATCH v5 2/2] powerpc: Use cpu_smallcore_sibling_mask at SMT level on bigcores

2018-08-06 Thread kbuild test robot
Hi Gautham,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.18-rc8 next-20180806]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Gautham-R-Shenoy/powerpc-Detect-the-presence-of-big-cores-via-ibm-thread-groups/20180807-075133
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-pasemi_defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=powerpc

All errors (new ones prefixed by >>):

>> arch/powerpc/kernel/smp.c:1178:30: error: 'smallcore_smt_mask' defined but not used [-Werror=unused-function]
    static const struct cpumask *smallcore_smt_mask(int cpu)
                                 ^~~~~~~~~~~~~~~~~~
   cc1: all warnings being treated as errors

vim +/smallcore_smt_mask +1178 arch/powerpc/kernel/smp.c

  1177  
> 1178  static const struct cpumask *smallcore_smt_mask(int cpu)
  1179  {
  1180  return cpu_smallcore_sibling_mask(cpu);
  1181  }
  1182  
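
If the function's only caller sits under CONFIG_SCHED_SMT (as the SMT-level
topology entry typically does), a defconfig without CONFIG_SCHED_SMT leaves
it unused. A likely fix, sketched here as an assumption rather than the
author's actual respin, is to guard the definition with the same #ifdef:

#ifdef CONFIG_SCHED_SMT
static const struct cpumask *smallcore_smt_mask(int cpu)
{
	return cpu_smallcore_sibling_mask(cpu);
}
#endif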

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH] powerpc/tm: Print 64-bits MSR

2018-08-06 Thread Michael Neuling
On Mon, 2018-08-06 at 21:32 -0300, Breno Leitao wrote:
> On a kernel TM Bad Thing program exception, the MSR is not being properly
> displayed, since it dumps a 32-bit value. The MSR is a 64-bit register on
> all platforms that have HTM enabled.
> 
> This patch dumps the MSR value as 64 bits instead of 32 bits.

(sorry I was distracted when you asked me about this before offline...)

I think you might need to clear up in the description why you are changing
reason -> msr.

Mikey

> Signed-off-by: Breno Leitao 
> ---
>  arch/powerpc/kernel/traps.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 0e17dcb48720..cd561fd89532 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1402,7 +1402,7 @@ void program_check_exception(struct pt_regs *regs)
>   goto bail;
>   } else {
>   printk(KERN_EMERG "Unexpected TM Bad Thing exception "
> -"at %lx (msr 0x%x)\n", regs->nip, reason);
> +"at %lx (msr 0x%lx)\n", regs->nip, regs->msr);
> 
>   die("Unrecoverable exception", regs, SIGABRT);
>   }
>   }


[PATCH] powerpc/tm: Print 64-bits MSR

2018-08-06 Thread Breno Leitao
On a kernel TM Bad Thing program exception, the MSR is not being properly
displayed, since it dumps a 32-bit value. The MSR is a 64-bit register on
all platforms that have HTM enabled.

This patch dumps the MSR value as 64 bits instead of 32 bits.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0e17dcb48720..cd561fd89532 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1402,7 +1402,7 @@ void program_check_exception(struct pt_regs *regs)
goto bail;
} else {
printk(KERN_EMERG "Unexpected TM Bad Thing exception "
-  "at %lx (msr 0x%x)\n", regs->nip, reason);
+  "at %lx (msr 0x%lx)\n", regs->nip, regs->msr);
die("Unrecoverable exception", regs, SIGABRT);
}
}
-- 
2.16.3
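
For reference, a standalone userspace sketch (not kernel code) of the
format-specifier issue: regs->msr is an unsigned long, which is 64 bits on
ppc64, so %x would only ever show the low 32 bits. The MSR value below is
made up.

#include <stdio.h>

int main(void)
{
	unsigned long msr = 0x8000000102803033UL;	/* made-up 64-bit MSR image */

	printf("msr 0x%x\n", (unsigned int)msr);	/* truncated: msr 0x2803033 */
	printf("msr 0x%lx\n", msr);			/* full: msr 0x8000000102803033 */
	return 0;
}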



Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Benjamin Herrenschmidt
On Tue, 2018-08-07 at 02:45 +0300, Michael S. Tsirkin wrote:
> > OK well, assuming Christoph can solve the direct case in a way that
> > also work for the virtio !iommu case, we still want some bit of logic
> > somewhere that will "switch" to swiotlb based ops if the DMA mask is
> > limited.
> > 
> > You mentioned an RFC for that ? Do you happen to have a link ?
> 
> No but Christoph did I think.

Ok I missed that, sorry, I'll dig it out. Thanks.

Cheers,
Ben.




Re: [PATCH v2 6/6] fsl_pmc: update device bindings

2018-08-06 Thread Scott Wood
On Mon, 2018-04-16 at 10:13 -0500, Rob Herring wrote:
> On Wed, Apr 11, 2018 at 02:35:51PM +0800, Ran Wang wrote:
> > From: Li Yang 
> 
> Needs a commit msg and the subject should give some indication of what 
> the update is. And also start with "dt-bindings: ..."

This patch should also come before the patches that use the new binding.

> > -  fsl,mpc8536-pmc: Sleep specifiers consist of three cells, the third of
> > -  which will be ORed into PMCDR upon suspend, and cleared from PMCDR
> > -  upon resume.  The first two cells are as described for fsl,mpc8578-pmc.
> > -  This sleep controller only supports disabling devices during system
> > -  sleep, or permanently.
> > -
> > -  fsl,mpc8548-pmc: Sleep specifiers consist of one or two cells, the
> > -  first of which will be ORed into DEVDISR (and the second into
> > -  DEVDISR2, if present -- this cell should be zero or absent if the
> > -  hardware does not have DEVDISR2) upon a request for permanent device
> > -  disabling.  This sleep controller does not support configuring devices
> > -  to disable during system sleep (unless supported by another compatible
> > -  match), or dynamically.
> 
> You seem to be breaking backwards compatibility with this change. I 
> doubt that is okay on these platforms.

I don't think the sleep specifier stuff ever got used.

-Scott



Re: powerpc/e200: Skip tlb1 entries used for kernel mapping

2018-08-06 Thread Scott Wood
On Tue, Jul 24, 2018 at 11:29:45AM +0530, Bharat Bhushan wrote:
> E200 has TLB1 only; it does not have TLB0, so TLB1 entries are used for
> mapping both the kernel and user space. The TLB miss handler for E200
> does not consider skipping TLB entries used for the kernel mapping. This
> patch ensures that we skip tlb1 entries used for kernel mapping
> (tlbcam_index).

How much more is needed to get e200 working?  What was this tested on?

> Signed-off-by: Bharat Bhushan 
> ---
>  arch/powerpc/kernel/head_fsl_booke.S | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
> b/arch/powerpc/kernel/head_fsl_booke.S
> index bf4c602..951fb96 100644
> --- a/arch/powerpc/kernel/head_fsl_booke.S
> +++ b/arch/powerpc/kernel/head_fsl_booke.S
> @@ -801,12 +801,28 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_BIG_PHYS)
>   /* Round robin TLB1 entries assignment */
>   mfspr   r12, SPRN_MAS0
>  
> + /* Get first free tlbcam entry */
> + lis r11, tlbcam_index@ha
> + lwz r11, tlbcam_index@l(r11)

The existing handler already loads tlbcam_index and uses that when
wrapping.  What specifically is causing that to not work (perhaps it's
just a matter of initializing NV when tlbcam_index changes?), and why
does this patch leave that code in place?

> +
> + /* Extract MAS0(NV) */
> + andi.   r13, r12, 0xfff
> + cmpw    0, r13, r11
> + blt 0, 5f
> + b   6f
> +5:

Why these two instructions instead of "bge 6f"?  If it's for branch
prediction, does e200 pay attention to static hints?  If it doesn't,
you could move the wrap code out-of-line.

> + /* When NV is less than first free tlbcam entry, use first free
> +  * tlbcam entry for ESEL and set NV */
> + rlwimi  r12, r11, 16, 4, 15
> + addi    r11, r11, 1
> + rlwimi  r12, r11, 0, 20, 31
> + b   7f

The 4-argument form of rlwimi is easier to read.

BTW, The TLB miss handler would be simpler/faster if you reserve the
upper entries rather than the lower entries.  Then you would just have
one value to check (instead of using TLB1CFG[NENTRY]) to see if you wrap
back to zero.
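
A C sketch of that suggestion (the real handler is asm; the function and
first_kernel_entry are illustrative names, not from the patch):

/*
 * Round-robin victim selection with the kernel CAM entries reserved at
 * the top of TLB1: one compare against a single bound replaces the
 * TLB1CFG[NENTRY] read.
 */
static unsigned int next_tlb1_victim(unsigned int nv,
				     unsigned int first_kernel_entry)
{
	unsigned int next = nv + 1;

	if (next >= first_kernel_entry)
		next = 0;
	return next;
}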

-Scott


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2018-08-07 at 00:46 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > > > As I said replying to Christoph, we are "leaking" into the interface
> > > > > something here that is really what the VM is doing to itself, which
> > > > > is to stash its memory away in an inaccessible place.
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > 
> > > > I think Christoph merely objects to the specific implementation.  If
> > > > instead you do something like tweak dev->bus_dma_mask for the virtio
> > > > device I think he won't object.
> > > 
> > > Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?
> > > 
> > > So, something like that would be a possibility, but the problem is that
> > > the current virtio (guest side) implementation doesn't honor this when
> > > not using dma ops and will not use dma ops if not using iommu, so back
> > > to square one.
> > 
> > Well we have the RFC for that - the switch to using DMA ops
> > unconditionally isn't problematic in itself IMHO. For now that RFC is
> > blocked by its performance overhead, but Christoph says he's trying to
> > remove that for direct mappings, so we should hopefully be able to get
> > there in X weeks.
> 
> That would be good yes.
> 
>  ../..
> 
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device
> > > *vdev)
> > >  * the DMA API if we're a Xen guest, which at least allows
> > >  * all of the sensible Xen configurations to work correctly.
> > >  */
> > > -   if (xen_domain())
> > > +   if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
> > > return true;
> > >  
> > > return false;
> > 
> > Right but can't we fix the retpoline overhead such that
> > vring_use_dma_api will not be called on data path any longer, making
> > this a setup time check?
> 
> Yes it needs to be a setup time check regardless actually !
> 
> The above is broken, sorry I was a bit quick here (too early in the
> morning... ugh). We don't want the arch to go override the dma ops
> every time that is called.
> 
> But yes, if we can fix the overhead, it becomes just a matter of
> setting up the "right" ops automatically.
> 
> > > (Passing the dev allows the arch to know this is a virtio device in
> > > "direct" mode or whatever we want to call the !iommu case, and
> > > construct appropriate DMA ops for it, which aren't the same as the DMA
> > > ops of the other PCI devices which *do* use the iommu).
> > 
> > I think that's where Christoph might have specific ideas about it.
> 
> OK well, assuming Christoph can solve the direct case in a way that
> also work for the virtio !iommu case, we still want some bit of logic
> somewhere that will "switch" to swiotlb based ops if the DMA mask is
> limited.
> 
> You mentioned an RFC for that ? Do you happen to have a link ?

No but Christoph did I think.

> It would be indeed ideal if all we had to do was setup some kind of
> bus_dma_mask on all PCI devices and have virtio automagically insert
> swiotlb when necessary.
> 
> Cheers,
> Ben.
> 


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Benjamin Herrenschmidt
On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 07, 2018 at 05:56:59AM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > Right, we'll need some quirk to disable balloons  in the guest I
> > > > suppose.
> > > > 
> > > > Passing something from libvirt is cumbersome because the end user may
> > > > not even need to know about secure VMs. There are use cases where the
> > > > security is a contract down to some special application running inside
> > > > the secure VM, the sysadmin knows nothing about.
> > > > 
> > > > Also there's repercussions all the way to admin tools, web UIs etc...
> > > > so it's fairly wide ranging.
> > > > 
> > > > So as long as we only need to quirk a couple of devices, it's much
> > > > better contained that way.
> > > 
> > > So just the balloon thing already means that yes management and all the
> > > way to the user tools must know this is going on. Otherwise
> > > user will try to inflate the balloon and wonder why this does not work.
> > 
> > There are *dozens* of management systems out there, not even all open
> > source; we won't ever be able to see the end of the tunnel if we need
> > to teach every single one of them, including end users, about platform
> > specific new VM flags like that.
> > 
> > .../...
> 
> In the end I suspect you will find you have to.

Maybe... we'll tackle this if/when we have to.

For balloon I suspect it's not such a big deal because once secure, all
the guest memory goes into the secure memory which isn't visible or
accounted by the hypervisor, so there's nothing to steal but the guest
is also using no HV memory (other than the few "non-secure" pages used
for swiotlb and a couple of other kernel things).

Future versions of our secure architecture might allow to turn
arbitrary pages of memory secure/non-secure rather than relying on a
separate physical pool, in which case, the balloon will be able to work
normally.

Cheers,
Ben.




Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Benjamin Herrenschmidt
On Tue, 2018-08-07 at 08:13 +1000, Benjamin Herrenschmidt wrote:
> 
> OK well, assuming Christoph can solve the direct case in a way that
> also work for the virtio !iommu case, we still want some bit of logic
> somewhere that will "switch" to swiotlb based ops if the DMA mask is
> limited.
> 
> You mentioned an RFC for that ? Do you happen to have a link ?
> 
> It would be indeed ideal if all we had to do was setup some kind of
> bus_dma_mask on all PCI devices and have virtio automagically insert
> swiotlb when necessary.

Actually... I can think of a simpler option (Anshuman, didn't you
prototype this earlier ?):

Since that limitation of requiring bounce buffering via swiotlb is true
of any device in a secure VM, whether it goes through the iommu or not,
the iommu remapping is essentially pointless.

Thus, we could ensure that the iommu maps 1:1 the swiotlb bounce buffer
(either that or we configure it as "disabled" which is equivalent in
this case).

That way, we can now use the basic swiotlb ops everywhere, the same
dma_ops (swiotlb) will work whether a device uses the iommu or not.

Which boils down now to only making virtio use dma ops; there is no
need to override the dma_ops.

Which means all we have to do is either make xen_domain() return true
(yuck) or replace that one test with arch_virtio_force_dma_api() which
resolves to xen_domain() on x86 and can do something else for us.
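
A minimal sketch of that one-line switch, assuming the current shape of
vring_use_dma_api() (arch_virtio_force_dma_api() is the hypothetical hook
named above, which would resolve to xen_domain() on x86):

static bool vring_use_dma_api(struct virtio_device *vdev)
{
	/* VIRTIO_F_IOMMU_PLATFORM set: always use the DMA API */
	if (!virtio_has_iommu_quirk(vdev))
		return true;

	/*
	 * Arch hook: xen_domain() on x86; an arch like powerpc could
	 * return true here for a secure VM and route dma_ops to swiotlb.
	 */
	if (arch_virtio_force_dma_api())
		return true;

	return false;
}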

As to using a virtio feature flag for that, which is what Christoph
proposes, I'm not too much of a fan of it because this means effectively
exposing this to the peer, ie the interface. I don't think it belongs
there. The interface, from the hypervisor perspective, whether it's qemu,
vmware, hyperv etc..., has no business knowing how the guest manages its
dma operations, and may not even be aware of the access limitations (in
our case they are somewhat guest self-imposed).

Now, if this flag really is what we have to do, then we'd probably need
a qemu hack which will go set that flag on all virtio devices when it
detects that the VM is going secure.

But I don't think that's where that information "need to use the dma
API even for direct mode" belongs.

Cheers,
Ben.






Re: Build regressions/improvements in v4.17-rc1

2018-08-06 Thread Andrew Morton
On Mon, 6 Aug 2018 12:39:21 +0200 Geert Uytterhoeven  
wrote:

> CC Dan, Michael, AKPM, powerpc
> 
> On Mon, Apr 16, 2018 at 3:10 PM Geert Uytterhoeven  
> wrote:
> > Below is the list of build error/warning regressions/improvements in
> > v4.17-rc1[1] compared to v4.16[2].
> 
> I'd like to point your attention to:
> 
> >   + warning: vmlinux.o(.text+0x376518): Section mismatch in reference from 
> > the function .devm_memremap_pages() to the function 
> > .meminit.text:.arch_add_memory():  => N/A
> >   + warning: vmlinux.o(.text+0x376d64): Section mismatch in reference from 
> > the function .devm_memremap_pages_release() to the function 
> > .meminit.text:.arch_remove_memory():  => N/A

hm.  Dan isn't around at present so we're on our own with this one.

x86 doesn't put arch_add_memory and arch_remove_memory into __meminit.
x86 does:

#ifdef CONFIG_MEMORY_HOTPLUG
int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
bool want_memblock)
{
...


So I guess powerpc should do that as well?
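
A sketch of what that would look like on the powerpc side (against the
4.17-era arch/powerpc/mm/mem.c from memory; an untested assumption, not a
posted patch), with arch_remove_memory() treated the same way:

-int __meminit arch_add_memory(int nid, u64 start, u64 size,
-		struct vmem_altmap *altmap, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct vmem_altmap *altmap, bool want_memblock)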


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Benjamin Herrenschmidt
On Tue, 2018-08-07 at 00:46 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > > As I said replying to Christoph, we are "leaking" into the interface
> > > > something here that is really what the VM is doing to itself, which
> > > > is to stash its memory away in an inaccessible place.
> > > > 
> > > > Cheers,
> > > > Ben.
> > > 
> > > I think Christoph merely objects to the specific implementation.  If
> > > instead you do something like tweak dev->bus_dma_mask for the virtio
> > > device I think he won't object.
> > 
> > Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?
> > 
> > So, something like that would be a possibility, but the problem is that
> > the current virtio (guest side) implementation doesn't honor this when
> > not using dma ops and will not use dma ops if not using iommu, so back
> > to square one.
> 
> Well we have the RFC for that - the switch to using DMA ops unconditionally
> isn't problematic in itself IMHO. For now that RFC is blocked by its
> performance overhead, but Christoph says he's trying to remove that for
> direct mappings, so we should hopefully be able to get there in X weeks.

That would be good yes.

 ../..

> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device
> > *vdev)
> >  * the DMA API if we're a Xen guest, which at least allows
> >  * all of the sensible Xen configurations to work correctly.
> >  */
> > -   if (xen_domain())
> > +   if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
> > return true;
> >  
> > return false;
> 
> Right but can't we fix the retpoline overhead such that
> vring_use_dma_api will not be called on data path any longer, making
> this a setup time check?

Yes it needs to be a setup time check regardless actually !

The above is broken, sorry I was a bit quick here (too early in the
morning... ugh). We don't want the arch to go override the dma ops
> every time that is called.

But yes, if we can fix the overhead, it becomes just a matter of
setting up the "right" ops automatically.

> > (Passing the dev allows the arch to know this is a virtio device in
> > "direct" mode or whatever we want to call the !iommu case, and
> > construct appropriate DMA ops for it, which aren't the same as the DMA
> > ops of the other PCI devices which *do* use the iommu).
> 
> I think that's where Christoph might have specific ideas about it.

OK well, assuming Christoph can solve the direct case in a way that
also work for the virtio !iommu case, we still want some bit of logic
somewhere that will "switch" to swiotlb based ops if the DMA mask is
limited.

You mentioned an RFC for that ? Do you happen to have a link ?

It would be indeed ideal if all we had to do was setup some kind of
bus_dma_mask on all PCI devices and have virtio automagically insert
swiotlb when necessary.

Cheers,
Ben.




Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > As I said replying to Christoph, we are "leaking" into the interface
> > > something here that is really what the VM is doing to itself, which
> > > is to stash its memory away in an inaccessible place.
> > > 
> > > Cheers,
> > > Ben.
> > 
> > I think Christoph merely objects to the specific implementation.  If
> > instead you do something like tweak dev->bus_dma_mask for the virtio
> > device I think he won't object.
> 
> Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?
> 
> So, something like that would be a possibility, but the problem is that
> the current virtio (guest side) implementation doesn't honor this when
> not using dma ops and will not use dma ops if not using iommu, so back
> to square one.

Well we have the RFC for that - the switch to using DMA ops unconditionally
isn't problematic in itself IMHO. For now that RFC is blocked by its
performance overhead, but Christoph says he's trying to remove that for
direct mappings, so we should hopefully be able to get there in X weeks.

> Christoph seems to be wanting to use a flag in the interface to make
> the guest use dma_ops which is what I don't understand.
> 
> What would be needed then would be something along the lines of virtio
> noticing that dma_mask isn't big enough to cover all of memory (which
> isn't something generic code can easily do here for various reasons I
> can elaborate if you want, but that specific test more/less has to be
> arch specific), and in that case, force itself to use DMA ops routed to
> swiotlb.
> 
> I'd rather have arch code do the bulk of that work, don't you think ?
> 
> Which brings me back to this option, which may be the simplest and
> avoids the overhead of the proposed series (I found the series to be a
> nice cleanup but retpoline does kick us in the nuts here).
> 
> So what about this ?
> 
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device
> *vdev)
>  * the DMA API if we're a Xen guest, which at least allows
>  * all of the sensible Xen configurations to work correctly.
>  */
> -   if (xen_domain())
> +   if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
> return true;
>  
> return false;

Right but can't we fix the retpoline overhead such that
vring_use_dma_api will not be called on data path any longer, making
this a setup time check?


> (Passing the dev allows the arch to know this is a virtio device in
> "direct" mode or whatever we want to call the !iommu case, and
> construct appropriate DMA ops for it, which aren't the same as the DMA
> ops of the other PCI devices which *do* use the iommu).

I think that's where Christoph might have specific ideas about it.

> Otherwise, the harder option would be for us to hack so that
> xen_domain() returns true in our setup (gross), and have the arch code,
> when it sets up PCI device DMA ops, have a gross hack to identify
> virtio PCI devices, check their F_IOMMU flag itself, and set up the
> different ops at that point.
> 
> As for those "special" ops, they are of course just normal swiotlb ops,
> there's nothing "special" other than that they aren't the ops that other PCI
> device on that bus use.
> 
> Cheers,
> Ben.

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Benjamin Herrenschmidt
On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > As I said replying to Christoph, we are "leaking" into the interface
> > something here that is really what the VM is doing to itself, which
> > is to stash its memory away in an inaccessible place.
> > 
> > Cheers,
> > Ben.
> 
> I think Christoph merely objects to the specific implementation.  If
> instead you do something like tweak dev->bus_dma_mask for the virtio
> device I think he won't object.

Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?

So, something like that would be a possibility, but the problem is that
the current virtio (guest side) implementation doesn't honor this when
not using dma ops and will not use dma ops if not using iommu, so back
to square one.

Christoph seems to be wanting to use a flag in the interface to make
the guest use dma_ops which is what I don't understand.

What would be needed then would be something along the lines of virtio
noticing that dma_mask isn't big enough to cover all of memory (which
isn't something generic code can easily do here for various reasons I
can elaborate if you want, but that specific test more/less has to be
arch specific), and in that case, force itself to use DMA ops routed to
swiotlb.

I'd rather have arch code do the bulk of that work, don't you think ?

Which brings me back to this option, which may be the simplest and
avoids the overhead of the proposed series (I found the series to be a
nice cleanup but retpoline does kick us in the nuts here).

So what about this ?

--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device
*vdev)
 * the DMA API if we're a Xen guest, which at least allows
 * all of the sensible Xen configurations to work correctly.
 */
-   if (xen_domain())
+   if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
return true;
 
return false;

(Passing the dev allows the arch to know this is a virtio device in
"direct" mode or whatever we want to call the !iommu case, and
construct appropriate DMA ops for it, which aren't the same as the DMA
ops of the other PCI devices which *do* use the iommu).

Otherwise, the harder option would be for us to hack so that
xen_domain() returns true in our setup (gross), and have the arch code,
when it sets up PCI device DMA ops, have a gross hack to identify
virtio PCI devices, check their F_IOMMU flag itself, and set up the
different ops at that point.

As for those "special" ops, they are of course just normal swiotlb ops,
there's nothing "special" other than that they aren't the ops that other PCI
device on that bus use.

Cheers,
Ben.




[PATCH v2 2/2] powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements

2018-08-06 Thread Hari Bathini
With dynamic memory allocation support for the crash memory ranges array,
there is no hard limit on the number of crash memory ranges the kernel can
export, but the program headers count could overflow in the /proc/vmcore
ELF file while exporting each memory range as a PT_LOAD segment. Reduce
the likelihood of such a scenario by folding adjacent crash memory ranges,
which minimizes the total number of PT_LOAD segments.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/kernel/fadump.c |   45 ++
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 2ec5704..cd0c555 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -908,22 +908,41 @@ static int allocate_crash_memory_ranges(void)
 static inline int fadump_add_crash_memory(unsigned long long base,
  unsigned long long end)
 {
+   u64  start, size;
+   bool is_adjacent = false;
+
if (base == end)
return 0;
 
-   if (crash_mem_ranges == max_crash_mem_ranges) {
-   int ret;
+   /*
+* Fold adjacent memory ranges to bring down the memory ranges/
+* PT_LOAD segments count.
+*/
+   if (crash_mem_ranges) {
+   start = crash_memory_ranges[crash_mem_ranges-1].base;
+   size = crash_memory_ranges[crash_mem_ranges-1].size;
 
-   ret = allocate_crash_memory_ranges();
-   if (ret)
-   return ret;
+   if ((start + size) == base)
+   is_adjacent = true;
+   }
+   if (!is_adjacent) {
+   /* resize the array on reaching the limit */
+   if (crash_mem_ranges == max_crash_mem_ranges) {
+   int ret;
+
+   ret = allocate_crash_memory_ranges();
+   if (ret)
+   return ret;
+   }
+
+   start = base;
+   crash_memory_ranges[crash_mem_ranges].base = start;
+   crash_mem_ranges++;
}
 
+   crash_memory_ranges[crash_mem_ranges-1].size = (end - start);
pr_debug("crash_memory_range[%d] [%#016llx-%#016llx], %#llx bytes\n",
-   crash_mem_ranges, base, end - 1, (end - base));
-   crash_memory_ranges[crash_mem_ranges].base = base;
-   crash_memory_ranges[crash_mem_ranges].size = end - base;
-   crash_mem_ranges++;
+   (crash_mem_ranges - 1), start, end - 1, (end - start));
return 0;
 }
 
@@ -999,6 +1018,14 @@ static int fadump_setup_crash_memory_ranges(void)
 
pr_debug("Setup crash memory ranges.\n");
crash_mem_ranges = 0;
+
+   /* allocate memory for crash memory ranges for the first time */
+   if (!max_crash_mem_ranges) {
+   ret = allocate_crash_memory_ranges();
+   if (ret)
+   return ret;
+   }
+
/*
 * add the first memory chunk (RMA_START through boot_memory_size) as
 * a separate memory chunk. The reason is, at the time crash firmware



[PATCH v2 1/2] powerpc/fadump: handle crash memory ranges array index overflow

2018-08-06 Thread Hari Bathini
Crash memory ranges is an array of memory ranges of the crashing kernel
to be exported as a dump via the /proc/vmcore file. The size of the array
is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
where the memblock memory regions count is less than the
INIT_MEMBLOCK_REGIONS value. But this count can grow beyond the
INIT_MEMBLOCK_REGIONS value since commit 142b45a72e22 ("memblock: Add
array resizing support").

On large memory systems with a few DLPAR operations, the memblock memory
regions count could be larger than the INIT_MEMBLOCK_REGIONS value. On
such systems, registering fadump results in a crash or other system
failures like the one below:

  task: c7f39a290010 ti: cb738000 task.ti: cb738000
  NIP: c0047df4 LR: c00f9e58 CTR: c010f180
  REGS: cb73b570 TRAP: 0300   Tainted: G  L   X  (4.4.140+)
  MSR: 80009033   CR: 22004484  XER: 2000
  CFAR: c0008500 DAR: 07a45000 DSISR: 4000 SOFTE: 0
  GPR00: c00f9e58 cb73b7f0 c0f09a00 001a
  GPR04: c7f3bf774c90 0004 c0eb9a00 0800
  GPR08: 0804 07a45000 c0fa9a00 c7ffb169ca20
  GPR12: 22004482 cfa12c00 c7f3a0ea97a8 
  GPR16: c7f3a0ea9a50 cb73bd60 0118 0001fe80
  GPR20: 0118  c0b8c980 00d0
  GPR24: 07ffb0b1 c7ffb169c980  c0b8c980
  GPR28: 0004 c7ffb169c980 001a c7ffb169c980
  NIP [c0047df4] smp_send_reschedule+0x24/0x80
  LR [c00f9e58] resched_curr+0x138/0x160
  Call Trace:
  [cb73b7f0] [c00f9e58] resched_curr+0x138/0x160 (unreliable)
  [cb73b820] [c00fb538] check_preempt_curr+0xc8/0xf0
  [cb73b850] [c00fb598] ttwu_do_wakeup+0x38/0x150
  [cb73b890] [c00fc9c4] try_to_wake_up+0x224/0x4d0
  [cb73b900] [c011ef34] __wake_up_common+0x94/0x100
  [cb73b960] [c034a78c] ep_poll_callback+0xac/0x1c0
  [cb73b9b0] [c011ef34] __wake_up_common+0x94/0x100
  [cb73ba10] [c011f810] __wake_up_sync_key+0x70/0xa0
  [cb73ba60] [c067c3e8] sock_def_readable+0x58/0xa0
  [cb73ba90] [c07848ac] unix_stream_sendmsg+0x2dc/0x4c0
  [cb73bb70] [c0675a38] sock_sendmsg+0x68/0xa0
  [cb73bba0] [c067673c] ___sys_sendmsg+0x2cc/0x2e0
  [cb73bd30] [c0677dbc] __sys_sendmsg+0x5c/0xc0
  [cb73bdd0] [c06789bc] SyS_socketcall+0x36c/0x3f0
  [cb73be30] [c0009488] system_call+0x3c/0x100
  Instruction dump:
  4e800020 6000 6042 3c4c00ec 38421c30 7c0802a6 f8010010 6000
  3d42000a e92ab420 2fa9 4dde0020  2fa9 419e0044 7c0802a6
  ---[ end trace a6d1dd4bab5f8253 ]---

as the array index overflow is not checked for while setting up crash memory
ranges, causing memory corruption. To resolve this issue, dynamically
allocate memory for crash memory ranges and resize it incrementally,
in units of pagesize, on hitting the array size limit.

Fixes: 2df173d9e85d ("fadump: Initialize elfcore header and add PT_LOAD program 
headers.")
Cc: sta...@vger.kernel.org
Cc: Mahesh Salgaonkar 
Signed-off-by: Hari Bathini 
---

Changes in v2:
* Allocating memory for crash ranges in pagesize unit.
* freeing memory allocated while cleaning up.
* Moved the changes to coalesce memory ranges into patch 2/2.


 arch/powerpc/include/asm/fadump.h |4 +-
 arch/powerpc/kernel/fadump.c  |   91 +++--
 2 files changed, 79 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index 5a23010..3abc738 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -195,8 +195,8 @@ struct fadump_crash_info_header {
struct cpumask  online_mask;
 };
 
-/* Crash memory ranges */
-#define INIT_CRASHMEM_RANGES   (INIT_MEMBLOCK_REGIONS + 2)
+/* Crash memory ranges size unit (pagesize) */
+#define CRASHMEM_RANGES_ALLOC_SIZE PAGE_SIZE
 
 struct fad_crash_memory_ranges {
unsigned long long  base;
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 07e8396..2ec5704 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -47,8 +47,10 @@ static struct fadump_mem_struct fdm;
 static const struct fadump_mem_struct *fdm_active;
 
 static DEFINE_MUTEX(fadump_mutex);
-struct fad_crash_memory_ranges crash_memory_ranges[INIT_CRASHMEM_RANGES];
+struct fad_crash_memory_ranges *crash_memory_ranges;
+int crash_memory_ranges_size;
 int crash_mem_ranges;
+int max_crash_mem_ranges;
 
 /* Scan the Firmware Assisted dump configuration details. */
 int __init early_init_dt_scan_fw_dump(unsigned long node,
@@ -868,22 +870,67 @@ static int __init 
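
The hunk above is cut off; a sketch of the allocator consistent with the
changelog, growing the array by CRASHMEM_RANGES_ALLOC_SIZE via krealloc()
(the exact error handling here is an assumption):

static int allocate_crash_memory_ranges(void)
{
	struct fad_crash_memory_ranges *new_array;
	u64 new_size;

	/* grow the array one pagesize unit at a time */
	new_size = crash_memory_ranges_size + CRASHMEM_RANGES_ALLOC_SIZE;
	new_array = krealloc(crash_memory_ranges, new_size, GFP_KERNEL);
	if (!new_array)
		return -ENOMEM;

	crash_memory_ranges = new_array;
	crash_memory_ranges_size = new_size;
	max_crash_mem_ranges = new_size / sizeof(*new_array);
	return 0;
}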

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Tue, Aug 07, 2018 at 05:56:59AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote:
> > 
> > > Right, we'll need some quirk to disable balloons  in the guest I
> > > suppose.
> > > 
> > > Passing something from libvirt is cumbersome because the end user may
> > > not even need to know about secure VMs. There are use cases where the
> > > security is a contract down to some special application running inside
> > > the secure VM, the sysadmin knows nothing about.
> > > 
> > > Also there's repercussions all the way to admin tools, web UIs etc...
> > > so it's fairly wide ranging.
> > > 
> > > So as long as we only need to quirk a couple of devices, it's much
> > > better contained that way.
> > 
> > So just the balloon thing already means that yes management and all the
> > way to the user tools must know this is going on. Otherwise
> > user will try to inflate the balloon and wonder why this does not work.
> 
> There are *dozens* of management systems out there, not even all open
> source; we won't ever be able to see the end of the tunnel if we need
> to teach every single one of them, including end users, about platform
> specific new VM flags like that.
> 
> .../...

In the end I suspect you will find you have to.

> > Here's another example: you can't migrate a secure vm to a hypervisor
> > which doesn't support this feature. Again management tools above libvirt
> > need to know otherwise they will try.
> 
> There will have to be a new machine type for that I suppose, yes,
> though it's not just the hypervisor that needs to know about the
> modified migration stream, it's also the need to have a compatible
> ultravisor with the right keys on the other side.
> 
> So migration is going to be special and require extra admin work in all
> cases yes. But not all secure VMs are meant to be migratable.
> 
> In any case, back to the problem at hand. What a qemu flag gives us is
> just a way to force iommu at VM creation time.

I don't think a qemu flag is strictly required for the problem at hand.

> This is rather sub-optimal, we don't really want the iommu in the way,
> so it's at best a "workaround", and it's not really solving the real
> problem.

This specific problem, I think I agree.

> As I said replying to Christoph, we are "leaking" into the interface
> something here that is really what the VM is doing to itself, which
> is to stash its memory away in an inaccessible place.
> 
> Cheers,
> Ben.

I think Christoph merely objects to the specific implementation.  If
instead you do something like tweak dev->bus_dma_mask for the virtio
device I think he won't object.

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Benjamin Herrenschmidt
On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote:
> 
> > Right, we'll need some quirk to disable balloons  in the guest I
> > suppose.
> > 
> > Passing something from libvirt is cumbersome because the end user may
> > not even need to know about secure VMs. There are use cases where the
> > security is a contract down to some special application running inside
> > the secure VM, the sysadmin knows nothing about.
> > 
> > Also there's repercussions all the way to admin tools, web UIs etc...
> > so it's fairly wide ranging.
> > 
> > So as long as we only need to quirk a couple of devices, it's much
> > better contained that way.
> 
> So just the balloon thing already means that yes management and all the
> way to the user tools must know this is going on. Otherwise
> user will try to inflate the balloon and wonder why this does not work.

There are *dozens* of management systems out there, not even all open
source; we won't ever be able to see the end of the tunnel if we need
to teach every single one of them, including end users, about platform
specific new VM flags like that.

.../...

> Here's another example: you can't migrate a secure vm to a hypervisor
> which doesn't support this feature. Again management tools above libvirt
> need to know otherwise they will try.

There will have to be a new machine type for that I suppose, yes,
though it's not just the hypervisor that needs to know about the
modified migration stream, it's also the need to have a compatible
ultravisor with the right keys on the other side.

So migration is going to be special and require extra admin work in all
cases yes. But not all secure VMs are meant to be migratable.

In any case, back to the problem at hand. What a qemu flag gives us is
just a way to force iommu at VM creation time.

This is rather sub-optimal, we don't really want the iommu in the way,
so it's at best a "workaround", and it's not really solving the real
problem.

As I said replying to Christoph, we are "leaking" into the interface
something here that is really what the VM is doing to itself, which
is to stash its memory away in an inaccessible place.

Cheers,
Ben.




Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Benjamin Herrenschmidt
On Mon, 2018-08-06 at 02:42 -0700, Christoph Hellwig wrote:
> On Mon, Aug 06, 2018 at 07:16:47AM +1000, Benjamin Herrenschmidt wrote:
> > Who would set this bit ? qemu ? Under what circumstances ?
> 
> I don't really care who sets what.  The implementation might not even
> involved qemu.
> 
> It is your job to write a coherent interface specification that does
> not depend on the used components.  The hypervisor might be PAPR,
> Linux + qemu, VMware, Hyperv or something so secret that you'd have
> to shoot me if you had to tell me.  The guest might be Linux, FreeBSD,
> AIX, OS400 or a Hipster project of the day in Rust.  As long as we
> properly specify the interface it simply does not matter.

That's the point Christoph. The interface is today's interface. It does
NOT change. That information is not part of the interface.

It's the VM itself that is stashing away its memory in a secret place,
and thus needs to do bounce buffering. There is no change to the virtio
interface per-se.

> > What would be the effect of this bit while VIRTIO_F_IOMMU is NOT set,
> > ie, what would qemu do and what would Linux do ? I'm not sure I fully
> > understand your idea.
> 
> In a perfect world we'd just reuse VIRTIO_F_IOMMU and clarify the
> description, which currently is rather vague but basically captures
> the use case.  Currently it is:
> 
> VIRTIO_F_IOMMU_PLATFORM(33)
> This feature indicates that the device is behind an IOMMU that
> translates bus addresses from the device into physical addresses in
> memory. If this feature bit is set to 0, then the device emits
> physical addresses which are not translated further, even though an
> IOMMU may be present.
> 
> And I'd change it to something like:
> 
> VIRTIO_F_PLATFORM_DMA(33)
> This feature indicates that the device emits platform specific
> bus addresses that might not be identical to physical address.
> The translation of physical to bus address is platform specific
> and defined by the platform specification for the bus that the virtio
> device is attached to.
> If this feature bit is set to 0, then the device emits
> physical addresses which are not translated further, even if
> the platform would normally require translations for the bus that
> the virtio device is attached to.
> 
> If we can't change the definition any more we should deprecate the
> old VIRTIO_F_IOMMU_PLATFORM bit, and require the VIRTIO_F_IOMMU_PLATFORM
> and VIRTIO_F_PLATFORM_DMA to be not set at the same time.

But this doesn't really change our problem does it ?

None of what happens in our case is part of the "interface". The
suggestion to force the iommu ON was simply a "workaround": by doing
so, we get to override the DMA ops, but that's just a trick.

Fundamentally, what we need to solve is pretty much entirely a guest
problem.

> > I'm trying to understand because the limitation is not a device side
> > limitation, it's not a qemu limitation, it's actually more of a VM
> > limitation. It has most of its memory pages made inaccessible for
> > security reasons. The platform from a qemu/KVM perspective is almost
> > entirely normal.
> 
> Well, find a way to describe this either in the qemu specification using
> new feature bits, or by using something like the above.

But again, why do you want to involve the interface, and thus the
hypervisor for something that is essentially what the guest is doing to
itself ?

It really is something we need to solve locally to the guest, it's not
part of the interface.

Cheers,
Ben.




Re: [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events

2018-08-06 Thread John Allen

On Wed, Aug 01, 2018 at 11:02:59PM +1000, Michael Ellerman wrote:

Hi John,

I'm still not sure about this one.

John Allen  writes:

On Mon, Jul 23, 2018 at 11:27:56PM +1000, Michael Ellerman wrote:

Hi John,

I'm a bit puzzled by this one.

John Allen  writes:

When a PRRN event is being handled and another PRRN event comes in, the
second event will block rtas polling waiting on the first to complete,
preventing any further rtas events from being handled. This can be
especially problematic in the case that PRRN events are continuously
being queued, in which case rtas polling gets blocked indefinitely.

This patch introduces a mutex that prevents any subsequent PRRN events from
running while there is a prrn event being handled, allowing rtas polling to
continue normally.

Signed-off-by: John Allen 
---
v2:
  -Unlock prrn_lock when PRRN operations are complete, not after handler is
   scheduled.
  -Remove call to flush_work, the previous broken method of serializing
   PRRN events.
---
 arch/powerpc/kernel/rtasd.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 44d66c33d59d..845fc5aec178 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
 */
pseries_devicetree_update(-prrn_update_scope);
numa_update_cpu_topology(false);
+   mutex_unlock(&prrn_lock);
 }

 static DECLARE_WORK(prrn_work, prrn_work_fn);

 static void prrn_schedule_update(u32 scope)
 {
-   flush_work(&prrn_work);


This seems like it's actually the core of the change. Previously we were
basically blocking on the flush before continuing.


The idea here is to replace the blocking flush_work with a non-blocking
mutex. So rather than waiting on the running PRRN event to complete, we
bail out since a PRRN event is already running.
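
A minimal sketch of that bail-out on the scheduling side, assuming
mutex_trylock() pairs with the mutex_unlock() shown in prrn_work_fn()
above (the v2 diff is only partially quoted here):

static void prrn_schedule_update(u32 scope)
{
	/* a PRRN event is already being handled: bail out */
	if (!mutex_trylock(&prrn_lock))
		return;

	prrn_update_scope = scope;
	schedule_work(&prrn_work);
}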


OK, but why is it OK to bail out?

The firmware sent you an error log asking you to do something, with a
scope value that has some meaning, and now you're just going to drop
that on the floor?

Maybe it is OK to just drop these events? Or maybe you're saying that
because the system is crashing under the load of too many events it's OK
to drop the events in this case.


I think I see your point. If a PRRN event comes in while another is 
currently running, the new one may contain a different list of LMBs/CPUs 
and the old list becomes outdated. With the mutex, the only event that 
gets handled is the oldest and we will lose any additional changes 
beyond the initial event. Therefore, as you mentioned in your previous 
message, the behavior of the global workqueue should work just fine once 
we remove the call to flush_work.  While a prrn event is running, only 
one will remain on the workqueue, then when the first one completes, the 
newly scheduled work function should grab the latest PRRN list.


I will send a new version of the patch with just the call to flush_work 
removed.


-John




The situation this is
meant to address is flooding the workqueue with PRRN events, which,
like the situation in patch 2/2, can be queued up faster than they
can actually be handled.


I'm not really sure why this is a problem though.

The current code synchronously processes the events, so there should
only ever be one in flight.

I guess the issue is that each one can queue multiple events on the
hotplug work queue?

But still, we have terabytes of RAM, we should be able to queue a lot
of events before it becomes a problem.

So what exactly is getting flooded, what's the symptom?

If the queuing of the hotplug events is the problem, then why don't we
stop doing that? We could just process them synchronously from the PRRN
update, that would naturally throttle them.

cheers





Re: [PATCH v2] selftests/powerpc: Avoid remaining process/threads

2018-08-06 Thread Breno Leitao
Hello Michael,

On 08/06/2018 08:06 AM, Michael Ellerman wrote:
> Breno Leitao  writes:
> 
>> diff --git a/tools/testing/selftests/powerpc/harness.c 
>> b/tools/testing/selftests/powerpc/harness.c
>> index 66d31de60b9a..06c51e8d8ccb 100644
>> --- a/tools/testing/selftests/powerpc/harness.c
>> +++ b/tools/testing/selftests/powerpc/harness.c
>> @@ -85,13 +85,16 @@ int run_test(int (test_function)(void), char *name)
>>  return status;
>>  }
>>  
>> -static void alarm_handler(int signum)
>> +static void sig_handler(int signum)
>>  {
>> -/* Jut wake us up from waitpid */
>> +if (signum == SIGINT)
>> +kill(-pid, SIGTERM);
> 
> I don't think we need to do that here, if we just return then we'll pop
> out of the waitpid() and go via the normal path.

Correct. If we press ^C while the parent process is waiting at waitpid(),
then the waitpid() syscall will be interrupted (EINTR) and never restarted
(unless we set sa_flags = SA_RESTART); thus, the code will resume at the
next instruction once the signal handler is done, as if we had skipped
waitpid().

From a theoretical point of view, the user can press ^C before the process
executes the waitpid() syscall. In this case the process will not 'skip'
the waitpid(), which will continue to wait. We can force this behavior by
putting a sleep(1) before waitpid() and pressing ^C in the very first
second: the signal will interrupt the nanosleep() syscall instead of
waitpid(), which will then block as usual, and the ^C will effectively be
ignored (thus not calling kill(-pid, SIGTERM)).
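
A userspace sketch of that race (the handler and the one-second window are
illustrative; sa_flags deliberately omits SA_RESTART, as in harness.c):

#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static void sig_handler(int signum)
{
	/* returning only interrupts a syscall that is currently blocked */
}

int main(void)
{
	struct sigaction sa = { .sa_handler = sig_handler };
	pid_t pid = fork();

	if (pid == 0) {
		pause();		/* child: block forever */
		return 0;
	}

	sigaction(SIGINT, &sa, NULL);	/* no SA_RESTART */
	sleep(1);			/* ^C here interrupts nanosleep()... */
	waitpid(pid, NULL, 0);		/* ...and waitpid() blocks as usual */
	return 0;
}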

From a practical point of view, I will prepare a v3 patch. :-)


[PATCH v6 11/11] hugetlb: Introduce generic version of huge_ptep_get

2018-08-06 Thread Alexandre Ghiti
ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use the
same version of huge_ptep_get, so move this generic implementation into
asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Tested-by: Helge Deller  # parisc
Acked-by: Catalin Marinas  # arm64
Acked-by: Paul Burton  # MIPS parts
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb-3level.h | 1 +
 arch/arm64/include/asm/hugetlb.h  | 1 +
 arch/ia64/include/asm/hugetlb.h   | 5 -
 arch/mips/include/asm/hugetlb.h   | 5 -
 arch/parisc/include/asm/hugetlb.h | 5 -
 arch/powerpc/include/asm/hugetlb.h| 5 -
 arch/sh/include/asm/hugetlb.h | 5 -
 arch/sparc/include/asm/hugetlb.h  | 5 -
 arch/x86/include/asm/hugetlb.h| 5 -
 include/asm-generic/hugetlb.h | 7 +++
 10 files changed, 9 insertions(+), 35 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h 
b/arch/arm/include/asm/hugetlb-3level.h
index 54e4b097b1f5..0d9f3918fa7e 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -29,6 +29,7 @@
  * ptes.
  * (The valid bit is automatically cleared by set_pte_at for PROT_NONE ptes).
  */
+#define __HAVE_ARCH_HUGE_PTEP_GET
 static inline pte_t huge_ptep_get(pte_t *ptep)
 {
pte_t retval = *ptep;
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 80887abcef7f..fb6609875455 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -20,6 +20,7 @@
 
 #include 
 
+#define __HAVE_ARCH_HUGE_PTEP_GET
 static inline pte_t huge_ptep_get(pte_t *ptep)
 {
return READ_ONCE(*ptep);
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index e9b42750fdf5..36cc0396b214 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -27,11 +27,6 @@ static inline void huge_ptep_clear_flush(struct 
vm_area_struct *vma,
 {
 }
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 120adc3b2ffd..425bb6fc3bda 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -82,11 +82,6 @@ static inline int huge_ptep_set_access_flags(struct 
vm_area_struct *vma,
return changed;
 }
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
diff --git a/arch/parisc/include/asm/hugetlb.h 
b/arch/parisc/include/asm/hugetlb.h
index 165b4e5a6f32..7cb595dcb7d7 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -48,11 +48,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty);
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 658bf7136a3c..33a2d9e3ea9e 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -142,11 +142,6 @@ extern int huge_ptep_set_access_flags(struct 
vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty);
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index c87195ae0cfa..6f025fe18146 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -32,11 +32,6 @@ static inline void huge_ptep_clear_flush(struct 
vm_area_struct *vma,
 {
 }
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
clear_bit(PG_dcache_clean, &page->flags);
diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index 028a1465fbe7..3963f80d1cb3 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -53,11 +53,6 @@ static inline int huge_ptep_set_access_flags(struct 
vm_area_struct *vma,
return changed;
 }
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index 574d42eb081e..7469d321f072 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -13,11 +13,6 @@ static inline int is_hugepage_only_range(struct mm_struct 
*mm,
return 0;
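
(The diff is truncated before the asm-generic hunk; given the diffstat,
seven added lines in include/asm-generic/hugetlb.h, it presumably follows
the series' usual override pattern:)

#ifndef __HAVE_ARCH_HUGE_PTEP_GET
static inline pte_t huge_ptep_get(pte_t *ptep)
{
	return *ptep;
}
#endif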
 

[PATCH v6 10/11] hugetlb: Introduce generic version of huge_ptep_set_access_flags

2018-08-06 Thread Alexandre Ghiti
arm, ia64, sh, x86 architectures use the same version
of huge_ptep_set_access_flags, so move this generic implementation
into asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Tested-by: Helge Deller  # parisc
Acked-by: Catalin Marinas  # arm64
Acked-by: Paul Burton  # MIPS parts
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb-3level.h | 7 ---
 arch/arm64/include/asm/hugetlb.h  | 1 +
 arch/ia64/include/asm/hugetlb.h   | 7 ---
 arch/mips/include/asm/hugetlb.h   | 1 +
 arch/parisc/include/asm/hugetlb.h | 1 +
 arch/powerpc/include/asm/hugetlb.h| 1 +
 arch/sh/include/asm/hugetlb.h | 7 ---
 arch/sparc/include/asm/hugetlb.h  | 1 +
 arch/x86/include/asm/hugetlb.h| 7 ---
 include/asm-generic/hugetlb.h | 9 +
 10 files changed, 14 insertions(+), 28 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h 
b/arch/arm/include/asm/hugetlb-3level.h
index 8247cd6a2ac6..54e4b097b1f5 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -37,11 +37,4 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return retval;
 }
 
-static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep,
-pte_t pte, int dirty)
-{
-   return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
-}
-
 #endif /* _ASM_ARM_HUGETLB_3LEVEL_H */
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index f4f69ae5466e..80887abcef7f 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -42,6 +42,7 @@ extern pte_t arch_make_huge_pte(pte_t entry, struct 
vm_area_struct *vma,
 #define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
 extern void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
+#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
 extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty);
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 49d1f7949f3a..e9b42750fdf5 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -27,13 +27,6 @@ static inline void huge_ptep_clear_flush(struct 
vm_area_struct *vma,
 {
 }
 
-static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep,
-pte_t pte, int dirty)
-{
-   return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
-}
-
 static inline pte_t huge_ptep_get(pte_t *ptep)
 {
return *ptep;
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 3dcf5debf8c4..120adc3b2ffd 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -63,6 +63,7 @@ static inline int huge_pte_none(pte_t pte)
return !val || (val == (unsigned long)invalid_pte_table);
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
 static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr,
 pte_t *ptep, pte_t pte,
diff --git a/arch/parisc/include/asm/hugetlb.h 
b/arch/parisc/include/asm/hugetlb.h
index 9c3950ca2974..165b4e5a6f32 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -43,6 +43,7 @@ static inline void huge_ptep_clear_flush(struct 
vm_area_struct *vma,
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep);
 
+#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
 int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty);
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 69c14ecac133..658bf7136a3c 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -137,6 +137,7 @@ static inline void huge_ptep_clear_flush(struct 
vm_area_struct *vma,
flush_hugetlb_page(vma, addr);
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
 extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty);
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index 8df4004977b9..c87195ae0cfa 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -32,13 +32,6 @@ static inline void huge_ptep_clear_flush(struct 
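
(This diff is also cut short; per the diffstat, nine added lines in
include/asm-generic/hugetlb.h, the generic version presumably reads:)

#ifndef __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
		unsigned long addr, pte_t *ptep,
		pte_t pte, int dirty)
{
	return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
}
#endif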

[PATCH v6 09/11] hugetlb: Introduce generic version of huge_ptep_set_wrprotect

2018-08-06 Thread Alexandre Ghiti
arm, ia64, mips, powerpc, sh, x86 architectures use the same version
of huge_ptep_set_wrprotect, so move this generic implementation into
asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Tested-by: Helge Deller  # parisc
Acked-by: Catalin Marinas  # arm64
Acked-by: Paul Burton  # MIPS parts
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb-3level.h| 6 --
 arch/arm64/include/asm/hugetlb.h | 1 +
 arch/ia64/include/asm/hugetlb.h  | 6 --
 arch/mips/include/asm/hugetlb.h  | 6 --
 arch/parisc/include/asm/hugetlb.h| 1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 6 --
 arch/powerpc/include/asm/book3s/64/pgtable.h | 1 +
 arch/powerpc/include/asm/nohash/32/pgtable.h | 6 --
 arch/powerpc/include/asm/nohash/64/pgtable.h | 1 +
 arch/sh/include/asm/hugetlb.h| 6 --
 arch/sparc/include/asm/hugetlb.h | 1 +
 arch/x86/include/asm/hugetlb.h   | 6 --
 include/asm-generic/hugetlb.h| 8 
 13 files changed, 13 insertions(+), 42 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index b897541520ef..8247cd6a2ac6 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return retval;
 }
 
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   ptep_set_wrprotect(mm, addr, ptep);
-}
-
 static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty)
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 3e7f6e69b28d..f4f69ae5466e 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -48,6 +48,7 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
 extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 unsigned long addr, pte_t *ptep);
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index cbe296271030..49d1f7949f3a 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -27,12 +27,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   ptep_set_wrprotect(mm, addr, ptep);
-}
-
 static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty)
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 6ff2531cfb1d..3dcf5debf8c4 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -63,12 +63,6 @@ static inline int huge_pte_none(pte_t pte)
return !val || (val == (unsigned long)invalid_pte_table);
 }
 
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   ptep_set_wrprotect(mm, addr, ptep);
-}
-
 static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr,
 pte_t *ptep, pte_t pte,
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index fb7e0fd858a3..9c3950ca2974 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -39,6 +39,7 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep);
 
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 02f5acd7ccc4..fc1511ce33a4 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -228,12 +228,6 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
 {
pte_update(ptep, (_PAGE_RW | _PAGE_HWWRITE), _PAGE_RO);
 }
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, 

[PATCH v6 08/11] hugetlb: Introduce generic version of prepare_hugepage_range

2018-08-06 Thread Alexandre Ghiti
arm, arm64, powerpc, sparc, x86 architectures use the same version of
prepare_hugepage_range, so move this generic implementation into
asm-generic/hugetlb.h.
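
For reference, a minimal sketch of the resulting generic fallback,
reconstructed from the identical per-arch bodies removed below (the
include/asm-generic/hugetlb.h hunk itself is cut off in this digest) and
guarded in the same __HAVE_ARCH_* style as the other helpers; an
illustration, not a literal quote of the patch:

	#ifndef __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
	static inline int prepare_hugepage_range(struct file *file,
			unsigned long addr, unsigned long len)
	{
		struct hstate *h = hstate_file(file);

		/* Both the length and the address must be hugepage aligned. */
		if (len & ~huge_page_mask(h))
			return -EINVAL;
		if (addr & ~huge_page_mask(h))
			return -EINVAL;
		return 0;
	}
	#endif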

Signed-off-by: Alexandre Ghiti 
Tested-by: Helge Deller  # parisc
Acked-by: Catalin Marinas  # arm64
Acked-by: Paul Burton  # MIPS parts
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb.h | 11 ---
 arch/arm64/include/asm/hugetlb.h   | 11 ---
 arch/ia64/include/asm/hugetlb.h|  1 +
 arch/mips/include/asm/hugetlb.h|  1 +
 arch/parisc/include/asm/hugetlb.h  |  1 +
 arch/powerpc/include/asm/hugetlb.h | 15 ---
 arch/sh/include/asm/hugetlb.h  |  1 +
 arch/sparc/include/asm/hugetlb.h   | 16 
 arch/x86/include/asm/hugetlb.h | 15 ---
 include/asm-generic/hugetlb.h  | 15 +++
 10 files changed, 19 insertions(+), 68 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
index 9ca14227eeb7..3fcef21ff2c2 100644
--- a/arch/arm/include/asm/hugetlb.h
+++ b/arch/arm/include/asm/hugetlb.h
@@ -33,17 +33,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
return 0;
 }
 
-static inline int prepare_hugepage_range(struct file *file,
-unsigned long addr, unsigned long len)
-{
-   struct hstate *h = hstate_file(file);
-   if (len & ~huge_page_mask(h))
-   return -EINVAL;
-   if (addr & ~huge_page_mask(h))
-   return -EINVAL;
-   return 0;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
	clear_bit(PG_dcache_clean, &page->flags);
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 1fd64ebf0cd7..3e7f6e69b28d 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -31,17 +31,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
return 0;
 }
 
-static inline int prepare_hugepage_range(struct file *file,
-unsigned long addr, unsigned long len)
-{
-   struct hstate *h = hstate_file(file);
-   if (len & ~huge_page_mask(h))
-   return -EINVAL;
-   if (addr & ~huge_page_mask(h))
-   return -EINVAL;
-   return 0;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
	clear_bit(PG_dcache_clean, &page->flags);
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 82fe3d7a38d9..cbe296271030 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -9,6 +9,7 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor,
unsigned long ceiling);
 
+#define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
 int prepare_hugepage_range(struct file *file,
unsigned long addr, unsigned long len);
 
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index b3d6bb53ee6e..6ff2531cfb1d 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -18,6 +18,7 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
return 0;
 }
 
+#define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
 static inline int prepare_hugepage_range(struct file *file,
 unsigned long addr,
 unsigned long len)
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 5a102d7251e4..fb7e0fd858a3 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -22,6 +22,7 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
  * If the arch doesn't supply something else, assume that hugepage
  * size aligned regions are ok without further preparation.
  */
+#define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
 static inline int prepare_hugepage_range(struct file *file,
unsigned long addr, unsigned long len)
 {
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 7123599089c6..69c14ecac133 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -117,21 +117,6 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor,
unsigned long ceiling);
 
-/*
- * If the arch doesn't supply something else, assume that hugepage
- * size aligned regions are ok without further preparation.
- */
-static inline int prepare_hugepage_range(struct file *file,
-   unsigned long addr, unsigned long len)
-{
-   struct hstate *h = hstate_file(file);
-   if (len & ~huge_page_mask(h))
-   return -EINVAL;
-   if (addr & ~huge_page_mask(h))
-   return 

[PATCH v6 07/11] hugetlb: Introduce generic version of huge_pte_wrprotect

2018-08-06 Thread Alexandre Ghiti
arm, arm64, ia64, mips, parisc, powerpc, sh, sparc, x86
architectures use the same version of huge_pte_wrprotect, so move
this generic implementation into asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Tested-by: Helge Deller  # parisc
Acked-by: Catalin Marinas  # arm64
Acked-by: Paul Burton  # MIPS parts
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb.h | 5 -
 arch/arm64/include/asm/hugetlb.h   | 5 -
 arch/ia64/include/asm/hugetlb.h| 5 -
 arch/mips/include/asm/hugetlb.h| 5 -
 arch/parisc/include/asm/hugetlb.h  | 5 -
 arch/powerpc/include/asm/hugetlb.h | 5 -
 arch/sh/include/asm/hugetlb.h  | 5 -
 arch/sparc/include/asm/hugetlb.h   | 5 -
 arch/x86/include/asm/hugetlb.h | 5 -
 include/asm-generic/hugetlb.h  | 7 +++
 10 files changed, 7 insertions(+), 45 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
index c821b550d6a4..9ca14227eeb7 100644
--- a/arch/arm/include/asm/hugetlb.h
+++ b/arch/arm/include/asm/hugetlb.h
@@ -44,11 +44,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline pte_t huge_pte_wrprotect(pte_t pte)
-{
-   return pte_wrprotect(pte);
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
	clear_bit(PG_dcache_clean, &page->flags);
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 49247c6f94db..1fd64ebf0cd7 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -42,11 +42,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline pte_t huge_pte_wrprotect(pte_t pte)
-{
-   return pte_wrprotect(pte);
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
	clear_bit(PG_dcache_clean, &page->flags);
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index bf573500b3c4..82fe3d7a38d9 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -26,11 +26,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline pte_t huge_pte_wrprotect(pte_t pte)
-{
-   return pte_wrprotect(pte);
-}
-
 static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 1c9c4531376c..b3d6bb53ee6e 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -62,11 +62,6 @@ static inline int huge_pte_none(pte_t pte)
return !val || (val == (unsigned long)invalid_pte_table);
 }
 
-static inline pte_t huge_pte_wrprotect(pte_t pte)
-{
-   return pte_wrprotect(pte);
-}
-
 static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index c09d8c74553c..5a102d7251e4 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -38,11 +38,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline pte_t huge_pte_wrprotect(pte_t pte)
-{
-   return pte_wrprotect(pte);
-}
-
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep);
 
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 3562d46585ba..7123599089c6 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -152,11 +152,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
flush_hugetlb_page(vma, addr);
 }
 
-static inline pte_t huge_pte_wrprotect(pte_t pte)
-{
-   return pte_wrprotect(pte);
-}
-
 extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty);
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index a9f8266f33cf..54f65094efe6 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -31,11 +31,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline pte_t huge_pte_wrprotect(pte_t pte)
-{
-   return pte_wrprotect(pte);
-}
-
 static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index 5bbd712e..f661362376e0 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -48,11 +48,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline pte_t 

[PATCH v6 06/11] hugetlb: Introduce generic version of huge_pte_none

2018-08-06 Thread Alexandre Ghiti
arm, arm64, ia64, parisc, powerpc, sh, sparc, x86 architectures
use the same version of huge_pte_none, so move this generic
implementation into asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Tested-by: Helge Deller  # parisc
Acked-by: Catalin Marinas  # arm64
Acked-by: Paul Burton  # MIPS parts
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb.h | 5 -
 arch/arm64/include/asm/hugetlb.h   | 5 -
 arch/ia64/include/asm/hugetlb.h| 5 -
 arch/mips/include/asm/hugetlb.h| 1 +
 arch/parisc/include/asm/hugetlb.h  | 5 -
 arch/powerpc/include/asm/hugetlb.h | 5 -
 arch/sh/include/asm/hugetlb.h  | 5 -
 arch/sparc/include/asm/hugetlb.h   | 5 -
 arch/x86/include/asm/hugetlb.h | 5 -
 include/asm-generic/hugetlb.h  | 7 +++
 10 files changed, 8 insertions(+), 40 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
index 537660891f9f..c821b550d6a4 100644
--- a/arch/arm/include/asm/hugetlb.h
+++ b/arch/arm/include/asm/hugetlb.h
@@ -44,11 +44,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline int huge_pte_none(pte_t pte)
-{
-   return pte_none(pte);
-}
-
 static inline pte_t huge_pte_wrprotect(pte_t pte)
 {
return pte_wrprotect(pte);
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 4c8dd488554d..49247c6f94db 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -42,11 +42,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline int huge_pte_none(pte_t pte)
-{
-   return pte_none(pte);
-}
-
 static inline pte_t huge_pte_wrprotect(pte_t pte)
 {
return pte_wrprotect(pte);
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 41b5f6adeee4..bf573500b3c4 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -26,11 +26,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline int huge_pte_none(pte_t pte)
-{
-   return pte_none(pte);
-}
-
 static inline pte_t huge_pte_wrprotect(pte_t pte)
 {
return pte_wrprotect(pte);
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 7df1f116a3cc..1c9c4531376c 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -55,6 +55,7 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
flush_tlb_page(vma, addr & huge_page_mask(hstate_vma(vma)));
 }
 
+#define __HAVE_ARCH_HUGE_PTE_NONE
 static inline int huge_pte_none(pte_t pte)
 {
unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 9afff26747a1..c09d8c74553c 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -38,11 +38,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline int huge_pte_none(pte_t pte)
-{
-   return pte_none(pte);
-}
-
 static inline pte_t huge_pte_wrprotect(pte_t pte)
 {
return pte_wrprotect(pte);
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 0b02856aa85b..3562d46585ba 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -152,11 +152,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
flush_hugetlb_page(vma, addr);
 }
 
-static inline int huge_pte_none(pte_t pte)
-{
-   return pte_none(pte);
-}
-
 static inline pte_t huge_pte_wrprotect(pte_t pte)
 {
return pte_wrprotect(pte);
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index 9abf9c86b769..a9f8266f33cf 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -31,11 +31,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline int huge_pte_none(pte_t pte)
-{
-   return pte_none(pte);
-}
-
 static inline pte_t huge_pte_wrprotect(pte_t pte)
 {
return pte_wrprotect(pte);
diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index 651a9593fcee..5bbd712e 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -48,11 +48,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline int huge_pte_none(pte_t pte)
-{
-   return pte_none(pte);
-}
-
 static inline pte_t huge_pte_wrprotect(pte_t pte)
 {
return pte_wrprotect(pte);
diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index fd59673e7a0a..42d872054791 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -28,11 +28,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline int 

[PATCH v6 05/11] hugetlb: Introduce generic version of huge_ptep_clear_flush

2018-08-06 Thread Alexandre Ghiti
arm, x86 architectures use the same version of
huge_ptep_clear_flush, so move this generic implementation into
asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Tested-by: Helge Deller  # parisc
Acked-by: Catalin Marinas  # arm64
Acked-by: Paul Burton  # MIPS parts
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb-3level.h | 6 --
 arch/arm64/include/asm/hugetlb.h  | 1 +
 arch/ia64/include/asm/hugetlb.h   | 1 +
 arch/mips/include/asm/hugetlb.h   | 1 +
 arch/parisc/include/asm/hugetlb.h | 1 +
 arch/powerpc/include/asm/hugetlb.h| 1 +
 arch/sh/include/asm/hugetlb.h | 1 +
 arch/sparc/include/asm/hugetlb.h  | 1 +
 arch/x86/include/asm/hugetlb.h| 6 --
 include/asm-generic/hugetlb.h | 8 
 10 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index ad36e84b819a..b897541520ef 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return retval;
 }
 
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
-{
-   ptep_clear_flush(vma, addr, ptep);
-}
-
 static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 6ae0bcafe162..4c8dd488554d 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -71,6 +71,7 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 unsigned long addr, pte_t *ptep);
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 6719c74da0de..41b5f6adeee4 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -20,6 +20,7 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 0959cc5a41fa..7df1f116a3cc 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -48,6 +48,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
return pte;
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 6e281e1bb336..9afff26747a1 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -32,6 +32,7 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 970101cf9c82..0b02856aa85b 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -143,6 +143,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 #endif
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index 08ee6c00b5e9..9abf9c86b769 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -25,6 +25,7 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index 944e3a4bfaff..651a9593fcee 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -42,6 +42,7 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 

[PATCH v6 04/11] hugetlb: Introduce generic version of huge_ptep_get_and_clear

2018-08-06 Thread Alexandre Ghiti
arm, ia64, sh, x86 architectures use the
same version of huge_ptep_get_and_clear, so move this generic
implementation into asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Tested-by: Helge Deller  # parisc
Acked-by: Catalin Marinas  # arm64
Acked-by: Paul Burton  # MIPS parts
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb-3level.h | 6 --
 arch/arm64/include/asm/hugetlb.h  | 1 +
 arch/ia64/include/asm/hugetlb.h   | 6 --
 arch/mips/include/asm/hugetlb.h   | 1 +
 arch/parisc/include/asm/hugetlb.h | 1 +
 arch/powerpc/include/asm/hugetlb.h| 1 +
 arch/sh/include/asm/hugetlb.h | 6 --
 arch/sparc/include/asm/hugetlb.h  | 1 +
 arch/x86/include/asm/hugetlb.h| 6 --
 include/asm-generic/hugetlb.h | 8 
 10 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index 398fb06e8207..ad36e84b819a 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -49,12 +49,6 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
ptep_set_wrprotect(mm, addr, ptep);
 }
 
-static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
-   unsigned long addr, pte_t *ptep)
-{
-   return ptep_get_and_clear(mm, addr, ptep);
-}
-
 static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty)
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 874661a1dff1..6ae0bcafe162 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -66,6 +66,7 @@ extern void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty);
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
 extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 unsigned long addr, pte_t *ptep);
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index a235d6f60fb3..6719c74da0de 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -20,12 +20,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
 }
 
-static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
-   unsigned long addr, pte_t *ptep)
-{
-   return ptep_get_and_clear(mm, addr, ptep);
-}
-
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 8ea439041d5d..0959cc5a41fa 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -36,6 +36,7 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
 static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 77c8adbac7c3..6e281e1bb336 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -8,6 +8,7 @@
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte);
 
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
  pte_t *ptep);
 
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 0794b53439d4..970101cf9c82 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -132,6 +132,7 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
 static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index bc552e37c1c9..08ee6c00b5e9 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -25,12 +25,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
-   unsigned long addr, pte_t *ptep)
-{
-   

[PATCH v6 03/11] hugetlb: Introduce generic version of set_huge_pte_at

2018-08-06 Thread Alexandre Ghiti
arm, ia64, mips, powerpc, sh, x86 architectures use the
same version of set_huge_pte_at, so move this generic
implementation into asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Tested-by: Helge Deller  # parisc
Acked-by: Catalin Marinas  # arm64
Acked-by: Paul Burton  # MIPS parts
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb-3level.h | 6 --
 arch/arm64/include/asm/hugetlb.h  | 1 +
 arch/ia64/include/asm/hugetlb.h   | 6 --
 arch/mips/include/asm/hugetlb.h   | 6 --
 arch/parisc/include/asm/hugetlb.h | 1 +
 arch/powerpc/include/asm/hugetlb.h| 6 --
 arch/sh/include/asm/hugetlb.h | 6 --
 arch/sparc/include/asm/hugetlb.h  | 1 +
 arch/x86/include/asm/hugetlb.h| 6 --
 include/asm-generic/hugetlb.h | 8 +++-
 10 files changed, 10 insertions(+), 37 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index d4014fbe5ea3..398fb06e8207 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return retval;
 }
 
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-  pte_t *ptep, pte_t pte)
-{
-   set_pte_at(mm, addr, ptep, pte);
-}
-
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 4af1a800a900..874661a1dff1 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -60,6 +60,7 @@ static inline void arch_clear_hugepage_flags(struct page *page)
 extern pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
struct page *page, int writable);
 #define arch_make_huge_pte arch_make_huge_pte
+#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
 extern void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
 extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index afe9fa4d969b..a235d6f60fb3 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -20,12 +20,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
 }
 
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-  pte_t *ptep, pte_t pte)
-{
-   set_pte_at(mm, addr, ptep, pte);
-}
-
 static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 53764050243e..8ea439041d5d 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -36,12 +36,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-  pte_t *ptep, pte_t pte)
-{
-   set_pte_at(mm, addr, ptep, pte);
-}
-
 static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 28c23b68d38d..77c8adbac7c3 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -4,6 +4,7 @@
 
 #include 
 
+#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte);
 
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index a7d5c739df9b..0794b53439d4 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -132,12 +132,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-  pte_t *ptep, pte_t pte)
-{
-   set_pte_at(mm, addr, ptep, pte);
-}
-
 static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index f6a51b609409..bc552e37c1c9 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -25,12 +25,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-  pte_t *ptep, 

[PATCH v6 02/11] hugetlb: Introduce generic version of hugetlb_free_pgd_range

2018-08-06 Thread Alexandre Ghiti
arm, arm64, mips, parisc, sh, x86 architectures use the
same version of hugetlb_free_pgd_range, so move this generic
implementation into asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Tested-by: Helge Deller  # parisc
Acked-by: Catalin Marinas  # arm64
Acked-by: Paul Burton  # MIPS parts
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb.h |  9 -
 arch/arm64/include/asm/hugetlb.h   | 10 --
 arch/ia64/include/asm/hugetlb.h|  5 +++--
 arch/mips/include/asm/hugetlb.h| 13 ++---
 arch/parisc/include/asm/hugetlb.h  | 12 ++--
 arch/powerpc/include/asm/hugetlb.h |  4 +++-
 arch/sh/include/asm/hugetlb.h  | 12 ++--
 arch/sparc/include/asm/hugetlb.h   |  4 +++-
 arch/x86/include/asm/hugetlb.h |  8 
 include/asm-generic/hugetlb.h  | 11 +++
 10 files changed, 26 insertions(+), 62 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
index 7d26f6c4f0f5..537660891f9f 100644
--- a/arch/arm/include/asm/hugetlb.h
+++ b/arch/arm/include/asm/hugetlb.h
@@ -27,15 +27,6 @@
 
 #include 
 
-static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
- unsigned long addr, unsigned long end,
- unsigned long floor,
- unsigned long ceiling)
-{
-   free_pgd_range(tlb, addr, end, floor, ceiling);
-}
-
-
 static inline int is_hugepage_only_range(struct mm_struct *mm,
 unsigned long addr, unsigned long len)
 {
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 3fcf14663dfa..4af1a800a900 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -25,16 +25,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return READ_ONCE(*ptep);
 }
 
-
-
-static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
- unsigned long addr, unsigned long end,
- unsigned long floor,
- unsigned long ceiling)
-{
-   free_pgd_range(tlb, addr, end, floor, ceiling);
-}
-
 static inline int is_hugepage_only_range(struct mm_struct *mm,
 unsigned long addr, unsigned long len)
 {
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 74d2a5540aaf..afe9fa4d969b 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -3,9 +3,8 @@
 #define _ASM_IA64_HUGETLB_H
 
 #include 
-#include 
-
 
+#define __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
 void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor,
unsigned long ceiling);
@@ -70,4 +69,6 @@ static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
 
+#include 
+
 #endif /* _ASM_IA64_HUGETLB_H */
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 982bc0685330..53764050243e 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -10,8 +10,6 @@
 #define __ASM_HUGETLB_H
 
 #include 
-#include 
-
 
 static inline int is_hugepage_only_range(struct mm_struct *mm,
 unsigned long addr,
@@ -38,15 +36,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
- unsigned long addr,
- unsigned long end,
- unsigned long floor,
- unsigned long ceiling)
-{
-   free_pgd_range(tlb, addr, end, floor, ceiling);
-}
-
 static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, pte_t pte)
 {
@@ -114,4 +103,6 @@ static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
 
+#include 
+
 #endif /* __ASM_HUGETLB_H */
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 58e0f4620426..28c23b68d38d 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -3,8 +3,6 @@
 #define _ASM_PARISC64_HUGETLB_H
 
 #include 
-#include 
-
 
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte);
@@ -32,14 +30,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
 }
 
-static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
- unsigned long addr, unsigned long end,
- unsigned long floor,
- unsigned long ceiling)
-{
-   

[PATCH v6 01/11] hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h

2018-08-06 Thread Alexandre Ghiti
asm-generic/hugetlb.h proposes generic implementations of hugetlb
related functions: use __HAVE_ARCH_HUGE* defines in order to make arch
specific implementations of hugetlb functions consistent with pgtable.h
scheme.

Signed-off-by: Alexandre Ghiti 
Acked-by: Catalin Marinas  # arm64
Reviewed-by: Luiz Capitulino 
Reviewed-by: Mike Kravetz 
---
 arch/arm64/include/asm/hugetlb.h | 2 +-
 include/asm-generic/hugetlb.h| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index e73f68569624..3fcf14663dfa 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -81,9 +81,9 @@ extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
 extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep);
+#define __HAVE_ARCH_HUGE_PTE_CLEAR
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, unsigned long sz);
-#define huge_pte_clear huge_pte_clear
 extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte, unsigned long sz);
 #define set_huge_swap_pte_at set_huge_swap_pte_at
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 9d0cde8ab716..3da7cff52360 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -32,7 +32,7 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
return pte_modify(pte, newprot);
 }
 
-#ifndef huge_pte_clear
+#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR
 static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long sz)
 {
-- 
2.16.2



[PATCH v6 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-08-06 Thread Alexandre Ghiti
[CC linux-mm for inclusion in -mm tree]

In order to reduce copy/paste of functions across architectures and then
make the riscv hugetlb port (and future ports) simpler and smaller, this
patchset intends to factorize the numerous hugetlb primitives that are
defined across all the architectures.

Except for prepare_hugepage_range, this patchset moves the versions that
are just pass-through to standard pte primitives into
asm-generic/hugetlb.h by using the same #ifdef semantic that can be
found in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.

The s390 architecture has not been tackled in this series since it does
not use asm-generic/hugetlb.h at all.

This patchset has been compiled on all addressed architectures with
success (except for parisc, but the problem does not come from this
series).
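
To make the __HAVE_ARCH_* scheme concrete, here is a minimal sketch
(an illustration distilled from patches 1 and 6, not a literal quote)
of how one pass-through helper ends up split between the generic header
and an architecture that keeps its own version:

	/* include/asm-generic/hugetlb.h: generic fallback */
	#ifndef __HAVE_ARCH_HUGE_PTE_NONE
	static inline int huge_pte_none(pte_t pte)
	{
		return pte_none(pte);
	}
	#endif

	/* arch/mips/include/asm/hugetlb.h: opts out of the fallback */
	#define __HAVE_ARCH_HUGE_PTE_NONE
	static inline int huge_pte_none(pte_t pte)
	{
		unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;

		return !val || (val == (unsigned long)invalid_pte_table);
	}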
 

 
v6:
  - Remove the nohash/32 and book3s/32 powerpc specific implementations
    in order to use the generic ones.
  - Add all the Reviewed-by, Acked-by and Tested-by in the commits,
    thanks to everyone.

v5:
  As suggested by Mike Kravetz, no need to move the #include
  for arm and x86 architectures, let it live at the top of the file.

v4:
  Fix powerpc build error due to misplacing of #include outside of
  #ifdef CONFIG_HUGETLB_PAGE, as pointed out by Christophe Leroy.

v1, v2, v3:
  Same version, just problems with email provider and misuse of the
  --batch-size option of git send-email.

Alexandre Ghiti (11):
  hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
  hugetlb: Introduce generic version of hugetlb_free_pgd_range
  hugetlb: Introduce generic version of set_huge_pte_at
  hugetlb: Introduce generic version of huge_ptep_get_and_clear
  hugetlb: Introduce generic version of huge_ptep_clear_flush
  hugetlb: Introduce generic version of huge_pte_none
  hugetlb: Introduce generic version of huge_pte_wrprotect
  hugetlb: Introduce generic version of prepare_hugepage_range
  hugetlb: Introduce generic version of huge_ptep_set_wrprotect
  hugetlb: Introduce generic version of huge_ptep_set_access_flags
  hugetlb: Introduce generic version of huge_ptep_get

 arch/arm/include/asm/hugetlb-3level.h| 32 +-
 arch/arm/include/asm/hugetlb.h   | 30 --
 arch/arm64/include/asm/hugetlb.h | 39 +++-
 arch/ia64/include/asm/hugetlb.h  | 47 ++-
 arch/mips/include/asm/hugetlb.h  | 40 +++--
 arch/parisc/include/asm/hugetlb.h| 33 +++
 arch/powerpc/include/asm/book3s/32/pgtable.h |  6 --
 arch/powerpc/include/asm/book3s/64/pgtable.h |  1 +
 arch/powerpc/include/asm/hugetlb.h   | 43 ++
 arch/powerpc/include/asm/nohash/32/pgtable.h |  6 --
 arch/powerpc/include/asm/nohash/64/pgtable.h |  1 +
 arch/sh/include/asm/hugetlb.h| 54 ++---
 arch/sparc/include/asm/hugetlb.h | 40 +++--
 arch/x86/include/asm/hugetlb.h   | 69 --
 include/asm-generic/hugetlb.h| 88 +++-
 15 files changed, 135 insertions(+), 394 deletions(-)

-- 
2.16.2



is there still any need PPC checking for "chosen@0"?

2018-08-06 Thread Robert P. J. Day


  given that there are no .dts files in the current kernel code base
that define the node name "/chosen@0" instead of the proper "/chosen",
is there any need for arch/powerpc/boot/oflib.c to still make this
test:

	chosen = of_finddevice("/chosen");
	if (chosen == (phandle) -1) {
		chosen = of_finddevice("/chosen@0");   <--- this
		if (chosen == (phandle) -1) {
			printf("no chosen\n");
			return 0;
		}
	}

are there still PPC machines that require the recognition of
"/chosen@0"?

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



[PATCH] powerpc/Makefiles: Convert ifeq to ifdef where possible

2018-08-06 Thread Rodrigo R. Galvao
In Makefiles, if we're testing a CONFIG_FOO symbol for equality with 'y',
we can instead just use ifdef. The latter reads more easily, so convert
to it where possible.

Signed-off-by: Rodrigo R. Galvao 
Reviewed-by: Mauro S. M. Rodrigues 
---
 arch/powerpc/Makefile   | 26 +-
 arch/powerpc/boot/Makefile  |  2 +-
 arch/powerpc/kernel/Makefile| 14 --
 arch/powerpc/mm/Makefile|  4 ++--
 arch/powerpc/net/Makefile   |  2 +-
 arch/powerpc/platforms/52xx/Makefile|  2 +-
 arch/powerpc/platforms/cell/Makefile|  2 +-
 arch/powerpc/platforms/pseries/Makefile |  2 +-
 arch/powerpc/sysdev/Makefile|  2 +-
 9 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index fb96206..9aee790 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -36,7 +36,7 @@ else
 KBUILD_DEFCONFIG := ppc64_defconfig
 endif
 
-ifeq ($(CONFIG_PPC64),y)
+ifdef CONFIG_PPC64
new_nm := $(shell if $(NM) --help 2>&1 | grep -- '--synthetic' > /dev/null; then echo y; else echo n; fi)
 
 ifeq ($(new_nm),y)
@@ -74,7 +74,7 @@ KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
 endif
 endif
 
-ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
+ifdef CONFIG_CPU_LITTLE_ENDIAN
 KBUILD_CFLAGS  += -mlittle-endian
 LDFLAGS+= -EL
 LDEMULATION:= lppc
@@ -117,7 +117,7 @@ LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
 LDFLAGS_vmlinux:= $(LDFLAGS_vmlinux-y)
 LDFLAGS_vmlinux += $(call ld-option,--orphan-handling=warn)
 
-ifeq ($(CONFIG_PPC64),y)
+ifdef CONFIG_PPC64
 ifeq ($(call cc-option-yn,-mcmodel=medium),y)
# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
@@ -134,7 +134,7 @@ endif
 endif
 
 CFLAGS-$(CONFIG_PPC64) := $(call cc-option,-mtraceback=no)
-ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
+ifdef CONFIG_CPU_LITTLE_ENDIAN
CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc))
 AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2)
 else
@@ -148,8 +148,8 @@ CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mno-pointers-to-nested-functions)
 CFLAGS-$(CONFIG_PPC32) := -ffixed-r2 $(MULTIPLEWORD)
 CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata)
 
-ifeq ($(CONFIG_PPC_BOOK3S_64),y)
-ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
+ifdef CONFIG_PPC_BOOK3S_64
+ifdef CONFIG_CPU_LITTLE_ENDIAN
 CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8
 CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power9,-mtune=power8)
 else
@@ -173,7 +173,7 @@ CFLAGS-$(CONFIG_POWER9_CPU) += $(call cc-option,-mcpu=power9)
 CFLAGS-$(CONFIG_PPC_8xx) += $(call cc-option,-mcpu=860)
 
 # Altivec option not allowed with e500mc64 in GCC.
-ifeq ($(CONFIG_ALTIVEC),y)
+ifdef CONFIG_ALTIVEC
 E5500_CPU := -mcpu=powerpc64
 else
 E5500_CPU := $(call cc-option,-mcpu=e500mc64,-mcpu=powerpc64)
@@ -181,8 +181,8 @@ endif
 CFLAGS-$(CONFIG_E5500_CPU) += $(E5500_CPU)
 CFLAGS-$(CONFIG_E6500_CPU) += $(call cc-option,-mcpu=e6500,$(E5500_CPU))
 
-ifeq ($(CONFIG_PPC32),y)
-ifeq ($(CONFIG_PPC_E500MC),y)
+ifdef CONFIG_PPC32
+ifdef CONFIG_PPC_E500MC
 CFLAGS-y += $(call cc-option,-mcpu=e500mc,-mcpu=powerpc)
 else
CFLAGS-$(CONFIG_E500) += $(call cc-option,-mcpu=8540 -msoft-float,-mcpu=powerpc)
@@ -204,7 +204,7 @@ else
 CHECKFLAGS += -D__LITTLE_ENDIAN__
 endif
 
-ifeq ($(CONFIG_476FPE_ERR46),y)
+ifdef CONFIG_476FPE_ERR46
KBUILD_LDFLAGS_MODULE += --ppc476-workaround \
-T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
 endif
@@ -231,12 +231,12 @@ KBUILD_CFLAGS += $(call cc-option,-fno-dwarf2-cfi-asm)
 # often slow when they are implemented at all
 KBUILD_CFLAGS  += $(call cc-option,-mno-string)
 
-ifeq ($(CONFIG_6xx),y)
+ifdef CONFIG_6xx
 KBUILD_CFLAGS  += -mcpu=powerpc
 endif
 
 # Work around a gcc code-gen bug with -fno-omit-frame-pointer.
-ifeq ($(CONFIG_FUNCTION_TRACER),y)
+ifdef CONFIG_FUNCTION_TRACER
 KBUILD_CFLAGS  += -mno-sched-epilog
 endif
 
@@ -381,7 +381,7 @@ install:
$(Q)$(MAKE) $(build)=$(boot) install
 
 vdso_install:
-ifeq ($(CONFIG_PPC64),y)
+ifdef CONFIG_PPC64
$(Q)$(MAKE) $(build)=arch/$(ARCH)/kernel/vdso64 $@
 endif
$(Q)$(MAKE) $(build)=arch/$(ARCH)/kernel/vdso32 $@
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index deea20c..0fb96c2 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -354,7 +354,7 @@ image-$(CONFIG_AMIGAONE)+= cuImage.amigaone
 
 # For 32-bit powermacs, build the COFF and miboot images
 # as well as the ELF images.
-ifeq ($(CONFIG_PPC32),y)
+ifdef CONFIG_PPC32
 image-$(CONFIG_PPC_PMAC)   += zImage.coff zImage.miboot
 endif
 
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2b4c40b2..7b09182 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ 

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Christoph Hellwig
On Mon, Aug 06, 2018 at 07:13:32PM +0300, Michael S. Tsirkin wrote:
> Oh that makes sense then. Could you post a pointer pls so
> this patchset is rebased on top (there are things to
> change about 4/4 but 1-3 could go in if they don't add
> overhead)?

The dma mapping direct calls will need major rework vs. what I posted.
I plan to start that work in about two weeks, once I return from my
vacation.


[PATCH v5 1/2] powerpc: Detect the presence of big-cores via "ibm, thread-groups"

2018-08-06 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

On IBM POWER9, the device tree exposes a property array identified by
"ibm,thread-groups" which will indicate which groups of threads share a
particular set of resources.

As of today we only have one form of grouping identifying the group of
threads in the core that share the L1 cache, translation cache and
instruction data flow.

This patch defines the helper function to parse the contents of
"ibm,thread-groups" and a new structure to contain the parsed output.

The patch also creates the sysfs file named "small_core_siblings" that
returns the physical ids of the threads in the core that share the L1
cache, translation cache and instruction data flow.
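
As an illustration, consider a hypothetical SMT8 big-core (the property
value below is assumed, chosen only to be consistent with the struct
thread_groups fields introduced by this patch):

	/*
	 * Hypothetical example: a big-core whose device tree carried
	 *
	 *	ibm,thread-groups = <1 2 4 0 2 4 6 1 3 5 7>
	 *
	 * would parse into:
	 */
	struct thread_groups tg = {
		.property		= 1,	/* grouping form (assumed) */
		.nr_groups		= 2,	/* two component SMT4 cores */
		.threads_per_group	= 4,
		.thread_list		= { 0, 2, 4, 6, 1, 3, 5, 7 },
	};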

Signed-off-by: Gautham R. Shenoy 
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |   8 ++
 arch/powerpc/include/asm/cputhreads.h  |  22 +++
 arch/powerpc/kernel/setup-common.c | 154 +
 arch/powerpc/kernel/sysfs.c|  35 +
 4 files changed, 219 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 9c5e7732..52c9b50 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -487,3 +487,11 @@ Description:   Information about CPU vulnerabilities
"Not affected"CPU is not affected by the vulnerability
"Vulnerable"  CPU is affected and no mitigation in effect
"Mitigation: $M"  CPU is affected and mitigation $M is in effect
+
+What:  /sys/devices/system/cpu/cpu[0-9]+/small_core_siblings
+Date:  06-Aug-2018
+KernelVersion: v4.19.0
+Contact:   Linux for PowerPC mailing list 
+Description:   List of Physical ids of CPUs which share the L1 cache,
+   translation cache and instruction data-flow with this CPU.
+Values:Comma separated list of decimal integers.
diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index d71a909..33226d7 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -23,11 +23,13 @@
 extern int threads_per_core;
 extern int threads_per_subcore;
 extern int threads_shift;
+extern bool has_big_cores;
 extern cpumask_t threads_core_mask;
 #else
 #define threads_per_core   1
 #define threads_per_subcore1
 #define threads_shift  0
+#define has_big_cores  0
 #define threads_core_mask  (*get_cpu_mask(0))
 #endif
 
@@ -69,12 +71,32 @@ static inline cpumask_t cpu_online_cores_map(void)
return cpu_thread_mask_to_cores(cpu_online_mask);
 }
 
+#define MAX_THREAD_LIST_SIZE   8
+struct thread_groups {
+   unsigned int property;
+   unsigned int nr_groups;
+   unsigned int threads_per_group;
+   unsigned int thread_list[MAX_THREAD_LIST_SIZE];
+};
+
 #ifdef CONFIG_SMP
 int cpu_core_index_of_thread(int cpu);
 int cpu_first_thread_of_core(int core);
+int parse_thread_groups(struct device_node *dn, struct thread_groups *tg);
+int get_cpu_thread_group_start(int cpu, struct thread_groups *tg);
 #else
 static inline int cpu_core_index_of_thread(int cpu) { return cpu; }
 static inline int cpu_first_thread_of_core(int core) { return core; }
+static inline int parse_thread_groups(struct device_node *dn,
+ struct thread_groups *tg)
+{
+   return -ENODATA;
+}
+
+static inline int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
+{
+   return -1;
+}
 #endif
 
 static inline int cpu_thread_in_core(int cpu)
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 40b44bb..989edc1 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -402,10 +402,12 @@ void __init check_for_initrd(void)
 #ifdef CONFIG_SMP
 
 int threads_per_core, threads_per_subcore, threads_shift;
+bool has_big_cores;
 cpumask_t threads_core_mask;
 EXPORT_SYMBOL_GPL(threads_per_core);
 EXPORT_SYMBOL_GPL(threads_per_subcore);
 EXPORT_SYMBOL_GPL(threads_shift);
+EXPORT_SYMBOL_GPL(has_big_cores);
 EXPORT_SYMBOL_GPL(threads_core_mask);
 
 static void __init cpu_init_thread_core_maps(int tpc)
@@ -433,6 +435,152 @@ static void __init cpu_init_thread_core_maps(int tpc)
 
 u32 *cpu_to_phys_id = NULL;
 
+/*
+ * parse_thread_groups: Parses the "ibm,thread-groups" device tree
+ *  property for the CPU device node @dn and stores
+ *  the parsed output in the thread_groups
+ *  structure @tg.
+ *
+ * @dn: The device node of the CPU device.
+ * @tg: Pointer to a thread group structure into which the parsed
+ * output of "ibm,thread-groups" is stored.
+ *
+ * ibm,thread-groups[0..N-1] array defines which group of threads in
+ * the CPU-device node can be grouped together based on the property.
+ *
+ * ibm,thread-groups[0] tells us the 

[PATCH v5 2/2] powerpc: Use cpu_smallcore_sibling_mask at SMT level on bigcores

2018-08-06 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Each of the SMT4 cores forming a big-core is a more or less independent
unit. Thus when multiple tasks are scheduled to run on the fused
core, we get the best performance when the tasks are spread across the
pair of SMT4 cores.

This patch achieves this by setting the SMT level mask to correspond
to the smallcore sibling mask on big-core systems.

With this patch, the SMT sched-domain with SMT=8,4,2 on big-core
systems are as follows:

1) ppc64_cpu --smt=8

 CPU0 attaching sched-domain(s):
  domain-0: span=0,2,4,6 level=SMT
   groups: 0:{ span=0 cap=294 }, 2:{ span=2 cap=294 },
   4:{ span=4 cap=294 }, 6:{ span=6 cap=294 }
 CPU1 attaching sched-domain(s):
  domain-0: span=1,3,5,7 level=SMT
   groups: 1:{ span=1 cap=294 }, 3:{ span=3 cap=294 },
   5:{ span=5 cap=294 }, 7:{ span=7 cap=294 }

2) ppc64_cpu --smt=4

 CPU0 attaching sched-domain(s):
  domain-0: span=0,2 level=SMT
   groups: 0:{ span=0 cap=589 }, 2:{ span=2 cap=589 }
 CPU1 attaching sched-domain(s):
  domain-0: span=1,3 level=SMT
   groups: 1:{ span=1 cap=589 }, 3:{ span=3 cap=589 }

3) ppc64_cpu --smt=2
   SMT domain ceases to exist as each domain consists of just one
   group.
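
The shape of the change is roughly the following sketch (the helper
name here is made up for illustration; the real hook is in the diff
below): the mask handed to the SMT sched-domain level simply becomes
the small-core sibling mask on big-core systems:

	/* Sketch only: choose the SMT-level mask based on the core type. */
	static const struct cpumask *pick_smt_mask(int cpu)
	{
		if (has_big_cores)
			return cpu_smallcore_sibling_mask(cpu);

		return cpu_sibling_mask(cpu);
	}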

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/smp.h |  6 +
 arch/powerpc/kernel/smp.c  | 55 +++---
 2 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 29ffaab..30798c7 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -99,6 +99,7 @@ static inline void set_hard_smp_processor_id(int cpu, int phys)
 #endif
 
 DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_map);
+DECLARE_PER_CPU(cpumask_var_t, cpu_smallcore_sibling_map);
 DECLARE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_map);
 
@@ -107,6 +108,11 @@ static inline struct cpumask *cpu_sibling_mask(int cpu)
return per_cpu(cpu_sibling_map, cpu);
 }
 
+static inline struct cpumask *cpu_smallcore_sibling_mask(int cpu)
+{
+   return per_cpu(cpu_smallcore_sibling_map, cpu);
+}
+
 static inline struct cpumask *cpu_core_mask(int cpu)
 {
return per_cpu(cpu_core_map, cpu);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 4794d6b..ea3b306 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -76,10 +76,12 @@
 struct thread_info *secondary_ti;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
+DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_core_map);
 
 EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
+EXPORT_PER_CPU_SYMBOL(cpu_smallcore_sibling_map);
 EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
 EXPORT_PER_CPU_SYMBOL(cpu_core_map);
 
@@ -689,6 +691,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
for_each_possible_cpu(cpu) {
	zalloc_cpumask_var_node(&per_cpu(cpu_sibling_map, cpu),
				GFP_KERNEL, cpu_to_node(cpu));
+		zalloc_cpumask_var_node(&per_cpu(cpu_smallcore_sibling_map,
+					 cpu),
+				GFP_KERNEL, cpu_to_node(cpu));
	zalloc_cpumask_var_node(&per_cpu(cpu_l2_cache_map, cpu),
				GFP_KERNEL, cpu_to_node(cpu));
	zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
@@ -707,6 +712,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
cpumask_set_cpu(boot_cpuid, cpu_sibling_mask(boot_cpuid));
cpumask_set_cpu(boot_cpuid, cpu_l2_cache_mask(boot_cpuid));
cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
+   if (has_big_cores) {
+   cpumask_set_cpu(boot_cpuid,
+   cpu_smallcore_sibling_mask(boot_cpuid));
+   }
 
if (smp_ops && smp_ops->probe)
smp_ops->probe();
@@ -991,6 +1000,10 @@ static void remove_cpu_from_masks(int cpu)
set_cpus_unrelated(cpu, i, cpu_core_mask);
set_cpus_unrelated(cpu, i, cpu_l2_cache_mask);
set_cpus_unrelated(cpu, i, cpu_sibling_mask);
+   if (has_big_cores) {
+   set_cpus_unrelated(cpu, i,
+  cpu_smallcore_sibling_mask);
+   }
}
 }
 #endif
@@ -999,7 +1012,17 @@ static void add_cpu_to_masks(int cpu)
 {
int first_thread = cpu_first_thread_sibling(cpu);
int chipid = cpu_to_chip_id(cpu);
-   int i;
+
+   struct thread_groups tg;
+   int i, cpu_group_start = -1;
+
+   if (has_big_cores) {
+   struct device_node *dn = of_get_cpu_node(cpu, NULL);
+
+		parse_thread_groups(dn, &tg);
+		cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
+   cpumask_set_cpu(cpu, cpu_smallcore_sibling_mask(cpu));
+ 

[PATCH v5 0/2] powerpc: Detection and scheduler optimization for POWER9 bigcore

2018-08-06 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Hi,

This is the fifth iteration of the patchset to add support for
big-core on POWER9.

The previous versions can be found here:

v4: https://lkml.org/lkml/2018/7/24/79
v3: https://lkml.org/lkml/2018/7/6/255
v2: https://lkml.org/lkml/2018/7/3/401
v1: https://lkml.org/lkml/2018/5/11/245

Changes :

v4 --> v5:
   - Patch 2 is entirely different: Instead of using CPU_FTR_ASYM_SMT
 feature, use the small core siblings at the SMT level
 sched-domain. This was suggested by Nicholas Piggin and Michael
 Ellerman.

   - A more detailed description follows below.

v3 --> v4:
   - Build fix for powerpc-g5 : Enable CPU_FTR_ASYM_SMT only on
 CONFIG_PPC_POWERNV and CONFIG_PPC_PSERIES.
   - Fixed a minor error in the ABI description.

v2 --> v3
- Set sane values in the tg->property, tg->nr_groups inside
parse_thread_groups before returning due to an error.
- Define a helper function to determine whether a CPU device node
  is a big-core or not.
- Updated the comments around the functions to describe the
  arguments passed to them.

v1 --> v2
- Added comments explaining the "ibm,thread-groups" device tree property.
- Uses cleaner device-tree parsing functions to parse the u32 arrays.
- Adds a sysfs file listing the small-core siblings for every CPU.
- Enables the scheduler optimization by setting the CPU_FTR_ASYM_SMT bit
  in the cur_cpu_spec->cpu_features on detecting the presence
  of interleaved big-core.
- Handles the corner case where there is only a single thread-group
  or when there is a single thread in a thread-group.

Description:

A pair of IBM POWER9 SMT4 cores can be fused together to form a
big-core with 8 SMT threads. This can be discovered via the
"ibm,thread-groups" CPU property in the device tree which will
indicate which groups of threads share the L1 cache, translation
cache and instruction data flow.  If there are multiple such groups of
threads, then the core is a big-core. Furthermore, on POWER9 the
thread-ids of such a big-core are obtained by interleaving the
thread-ids of the component SMT4 cores.

Eg: Threads in the pair of component SMT4 cores of an interleaved
big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.

           -------------------------
           |     |     |     |     |
           |  0  |  2  |  4  |  6  |   Small Core0
           |     |     |     |     |
Big Core   -------------------------
           |     |     |     |     |
           |  1  |  3  |  5  |  7  |   Small Core1
           |     |     |     |     |
           -------------------------
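To make the numbering concrete, a small stand-alone sketch
(illustrative user-space C, not part of this series; the modulo-2 rule
is simply read off the diagram above):

#include <stdio.h>

#define THREADS_PER_BIG_CORE	8
#define SMALL_CORES_PER_BIG	2

/* With interleaved thread-ids, a thread's small core is (tid % 2) and
 * its siblings are the other threads with the same parity. */
static void print_small_core_siblings(int tid)
{
	int t;

	printf("thread %d -> small core %d, siblings {", tid,
	       tid % SMALL_CORES_PER_BIG);
	for (t = tid % SMALL_CORES_PER_BIG; t < THREADS_PER_BIG_CORE;
	     t += SMALL_CORES_PER_BIG)
		printf(" %d", t);
	printf(" }\n");
}

int main(void)
{
	int tid;

	for (tid = 0; tid < THREADS_PER_BIG_CORE; tid++)
		print_small_core_siblings(tid);
	return 0;
}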

On such a big-core system, when multiple tasks are scheduled to run on
the big-core, we get the best performance when the tasks are spread
across the pair of SMT4 cores.

Eg: Suppose 4 tasks {p1, p2, p3, p4} are run on a big core; then

An Example of Optimal Task placement:
           -------------------------
           |     |     |     |     |
           |  0  |  2  |  4  |  6  |   Small Core0
           | (p1)| (p2)|     |     |
Big Core   -------------------------
           |     |     |     |     |
           |  1  |  3  |  5  |  7  |   Small Core1
           |     | (p3)|     | (p4)|
           -------------------------

An example of Suboptimal Task placement:
           -------------------------
           |     |     |     |     |
           |  0  |  2  |  4  |  6  |   Small Core0
           | (p1)| (p2)|     | (p4)|
Big Core   -------------------------
           |     |     |     |     |
           |  1  |  3  |  5  |  7  |   Small Core1
           |     | (p3)|     |     |
           -------------------------

In order to achieve optimal task placement on big-core systems, we
define the SMT level sched-domain to consist of the threads
belonging to the small cores. With this, the Linux Kernel
load-balancer will ensure that the tasks are spread across all the
component small cores in the system, thereby yielding optimum
performance.

Furthermore, this solution works correctly across all SMT modes
(8,4,2), as the interleaved thread-ids ensure that when we go to
lower SMT modes (4,2) the threads are offlined in descending order,
thereby leaving an equal number of threads from the component small
cores online, as illustrated below.

With Patches: (ppc64_cpu --smt=on) : SMT domain

 CPU0 attaching sched-domain(s):
  domain-0: span=0,2,4,6 level=SMT
   groups: 0:{ span=0 cap=294 }, 2:{ span=2 cap=294 },
   4:{ span=4 cap=294 }, 6:{ span=6 cap=294 }
 CPU1 attaching sched-domain(s):
  domain-0: span=1,3,5,7 level=SMT
   groups: 1:{ span=1 cap=294 }, 3:{ span=3 cap=294 },
   5:{ span=5 cap=294 }, 7:{ span=7 cap=294 }

Optimal Task placement (SMT 8)
   --
   | | | |  |
   |  0  |  2  |  4  |  6   |   Small 

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Mon, Aug 06, 2018 at 09:10:40AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 06, 2018 at 07:06:05PM +0300, Michael S. Tsirkin wrote:
> > > I've done something very similar in the thread I posted a few years
> > > ago.
> > 
> > Right so that was before spectre where a virtual call was cheaper :(
> 
> Sorry, I meant days, not years.  The whole point of the thread was the
> slowdowns due to retpolines, which are the software spectre mitigation.

Oh, that makes sense then. Could you post a pointer please, so this
patchset can be rebased on top? (There are things to change about 4/4,
but 1-3 could go in if they don't add overhead.)

-- 
MST


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Christoph Hellwig
On Mon, Aug 06, 2018 at 07:06:05PM +0300, Michael S. Tsirkin wrote:
> > I've done something very similar in the thread I posted a few years
> > ago.
> 
> Right so that was before spectre where a virtual call was cheaper :(

Sorry, I meant days, not years.  The whole point of the thread was the
slowdowns due to retpolines, which are the software spectre mitigation.


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Mon, Aug 06, 2018 at 08:24:06AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 06, 2018 at 04:36:43PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote:
> > > On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote:
> > > > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> > > >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> > > >> Please go through these patches and review whether this approach 
> > > >> broadly
> > > >> makes sense. I will appreciate suggestions, inputs, comments 
> > > >> regarding
> > > >> the patches or the approach in general. Thank you.
> > > >
> > > > Jason did some work on profiling this. Unfortunately he reports
> > > > about 4% extra overhead from this switch on x86 with no vIOMMU.
> > > 
> > >  The test is rather simple, just run pktgen 
> > >  (pktgen_sample01_simple.sh) in
> > >  guest and measure PPS on tap on host.
> > > 
> > >  Thanks
> > > >>>
> > > >>> Could you supply host configuration involved please?
> > > >>
> > > >> I wonder how much of that could be caused by Spectre mitigations
> > > >> blowing up indirect function calls...
> > > >>
> > > >> Cheers,
> > > >> Ben.
> > > > 
> > > > I won't be surprised. If yes I suggested a way to mitigate the overhead.
> > > 
> > > Did we get better results (lower regression due to indirect calls) with
> > > the suggested mitigation ? Just curious.
> > 
> > I'm referring to this:
> > I wonder whether we can support map_sg and friends being NULL, then use
> > that when mapping is an identity. A conditional branch there is likely
> > very cheap.
> > 
> > I don't think anyone tried implementing this yes.
> 
> I've done something very similar in the thread I posted a few years
> ago.

Right so that was before spectre where a virtual call was cheaper :(

> I plan to get a version of that upstream for 4.20, but it won't
> cover the virtio case, just the real direct mapping.

I guess this RFC will have to be reworked on top and performance retested.

Thanks,

-- 
MST


Re: vio.c:__vio_register_driver && LPAR Migration issue

2018-08-06 Thread Michael Bringmann
On 08/04/2018 06:01 PM, Tyrel Datwyler wrote:
> On 08/03/2018 02:23 PM, Tyrel Datwyler wrote:
>> On 08/02/2018 11:15 AM, Michael Bringmann wrote:
>>> Hello:
>>> I have been observing an anomaly during LPAR migrations between
>>> a couple of P8 systems.
>>>
>>> This is the problem.  After migrating an LPAR, the PPC mobility code
>>> receives RTAS requests to delete nodes with platform-/hardware-specific
>>> attributes when restarting the kernel after a migration.  My example is
>>> for migration between a P8 Alpine and a P8 Brazos.  Among the nodes
>>> that I see being deleted are 'ibm,random-v1', 'ibm,compression-v1',
>>> 'ibm,platform-facilities', and 'ibm,sym-encryption-v1'.  Of these
>>> nodes, the following are created during initial boot by calls to
>>> vio_register_driver:
>>>
>>> drivers/char/hw_random/pseries-rng.c
>>> ibm,random-v1
>>>
>>> drivers/crypto/nx/nx-842-pseries.c
>>> ibm,compression-v1
>>>
>>> drivers/crypto/nx/nx.c
>>> ibm,sym-encryption-v1
>>>
>>> After the migration, these nodes are deleted, but nothing recreates
>>> them.  If I boot the LPAR on the target system, the nodes are added
>>> again.
>>>
>>> My question is how do we recreate these nodes after migration?
>>
>> Hmm, I'd have to see the scenario in action, but these should be added back 
>> by ibm,update-nodes RTAS call. There is some debug code in 
>> driver/of/dynamic.c that can be enabled that will log node/property dynamic 
>> reconfiguration events.
>>
> 
> I took a quick look by turning of phandle caching and turning on the OF 
> reconfig debug on one of my lpars. I noticed that it looks like this could be 
> a firmware issue. Looks to me like we aren't being notified to re-add those 
> child nodes of ibm,platform-facilities.
> 
> [ 1671.638041] OF: notify DETACH_NODE /cpus/l2-cache@2022
> [ 1671.638094] OF: notify DETACH_NODE /cpus/l3-cache@3122
> [ 1671.638119] OF: notify DETACH_NODE /cpus/l2-cache@2023
> [ 1671.638136] OF: notify DETACH_NODE /cpus/l3-cache@3123
> [ 1671.638164] OF: notify DETACH_NODE 
> /ibm,platform-facilities/ibm,random-v1
> [ 1671.638182] OF: notify DETACH_NODE 
> /ibm,platform-facilities/ibm,compression-v1
> [ 1671.638198] OF: notify DETACH_NODE 
> /ibm,platform-facilities/ibm,sym-encryption-v1
> [ 1671.638219] OF: notify DETACH_NODE /ibm,platform-facilities
> [ 1671.776419] OF: notify UPDATE_PROPERTY /:ibm,model-class
> 
> ... snipped property updates
> 
> [ 1672.129941] OF: notify UPDATE_PROPERTY /cpus/PowerPC,POWER8@8:ibm,dfp
> [ 1672.147551] OF: notify ATTACH_NODE /cpus/l2-cache@202e
> [ 1672.166321] OF: notify ATTACH_NODE /cpus/l3-cache@312e
> [ 1672.183971] OF: notify ATTACH_NODE /cpus/l2-cache@202f
> [ 1672.202752] OF: notify ATTACH_NODE /cpus/l3-cache@312f
> [ 1672.230760] OF: notify ATTACH_NODE /ibm,platform-facilities
> 
> Need to verify this by tracing the RTAS calls to ibm,update-nodes. I'll try 
> and look at that tomorrow. The loop that processes the nodes to update in 
> pseries_devicetree_update() blindly ignores the return codes from 
> delete_dt_node(), update_dt_node(), and add_dt_node(). So, it is also 
> possible that we are notified, but are silently failing the add.

I am seeing the same set of detach/attach events for migration between
an Alpine and Brazos. I added logging in delete_dt_node, update_dt_node,
and add_dt_node, and did not see any notification of additional nodes.
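
(For reference, the same events can be hooked with an of_reconfig
notifier — a minimal sketch, assuming built-in init code;
of_reconfig_notifier_register() is the real API, the message format is
ours:)

#include <linux/of.h>
#include <linux/notifier.h>

static int dt_event(struct notifier_block *nb, unsigned long action,
		    void *arg)
{
	struct of_reconfig_data *rd = arg;

	/* action is OF_RECONFIG_ATTACH_NODE, OF_RECONFIG_DETACH_NODE, ... */
	pr_info("OF event %lu on %pOF\n", action, rd->dn);
	return NOTIFY_OK;
}

static struct notifier_block dt_nb = {
	.notifier_call = dt_event,
};

/* during init: of_reconfig_notifier_register(&dt_nb); */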

> 
> -Tyrel
> 

Michael

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Christoph Hellwig
On Mon, Aug 06, 2018 at 04:36:43PM +0300, Michael S. Tsirkin wrote:
> On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote:
> > On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote:
> > > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> > >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> > >> Please go through these patches and review whether this approach 
> > >> broadly
> > >> makes sense. I will appreciate suggestions, inputs, comments 
> > >> regarding
> > >> the patches or the approach in general. Thank you.
> > >
> > > Jason did some work on profiling this. Unfortunately he reports
> > > about 4% extra overhead from this switch on x86 with no vIOMMU.
> > 
> >  The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) 
> >  in
> >  guest and measure PPS on tap on host.
> > 
> >  Thanks
> > >>>
> > >>> Could you supply host configuration involved please?
> > >>
> > >> I wonder how much of that could be caused by Spectre mitigations
> > >> blowing up indirect function calls...
> > >>
> > >> Cheers,
> > >> Ben.
> > > 
> > > I won't be surprised. If yes I suggested a way to mitigate the overhead.
> > 
> > Did we get better results (lower regression due to indirect calls) with
> > the suggested mitigation ? Just curious.
> 
> I'm referring to this:
>   I wonder whether we can support map_sg and friends being NULL, then use
>   that when mapping is an identity. A conditional branch there is likely
>   very cheap.
> 
> I don't think anyone tried implementing this yes.

I've done something very similar in the thread I posted a few years
ago. I plan to get a version of that upstream for 4.20, but it won't
cover the virtio case, just the real direct mapping.


[PATCH] powerpc/cpm1: fix compilation error with CONFIG_PPC_EARLY_DEBUG_CPM

2018-08-06 Thread Christophe Leroy
commit e8cb7a55eb8dc ("powerpc: remove superflous inclusions of
asm/fixmap.h") removed inclusion of asm/fixmap.h from files not
including objects from that file.

However, asm/mmu-8xx.h includes a call to __fix_to_virt(). The proper
way would be to include asm/fixmap.h in asm/mmu-8xx.h, but that creates
an inclusion loop.

So we have to leave asm/fixmap.h in sysdev/cpm_common.c for
CONFIG_PPC_EARLY_DEBUG_CPM:

  CC  arch/powerpc/sysdev/cpm_common.o
In file included from ./arch/powerpc/include/asm/mmu.h:340:0,
 from ./arch/powerpc/include/asm/reg_8xx.h:8,
 from ./arch/powerpc/include/asm/reg.h:29,
 from ./arch/powerpc/include/asm/processor.h:13,
 from ./arch/powerpc/include/asm/thread_info.h:28,
 from ./include/linux/thread_info.h:38,
 from ./arch/powerpc/include/asm/ptrace.h:159,
 from ./arch/powerpc/include/asm/hw_irq.h:12,
 from ./arch/powerpc/include/asm/irqflags.h:12,
 from ./include/linux/irqflags.h:16,
 from ./include/asm-generic/cmpxchg-local.h:6,
 from ./arch/powerpc/include/asm/cmpxchg.h:537,
 from ./arch/powerpc/include/asm/atomic.h:11,
 from ./include/linux/atomic.h:5,
 from ./include/linux/mutex.h:18,
 from ./include/linux/kernfs.h:13,
 from ./include/linux/sysfs.h:16,
 from ./include/linux/kobject.h:20,
 from ./include/linux/device.h:16,
 from ./include/linux/node.h:18,
 from ./include/linux/cpu.h:17,
 from ./include/linux/of_device.h:5,
 from arch/powerpc/sysdev/cpm_common.c:21:
arch/powerpc/sysdev/cpm_common.c: In function ‘udbg_init_cpm’:
./arch/powerpc/include/asm/mmu-8xx.h:218:25: error: implicit declaration of 
function ‘__fix_to_virt’ [-Werror=implicit-function-declaration]
 #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
 ^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro 
‘VIRT_IMMR_BASE’
   VIRT_IMMR_BASE);
   ^
./arch/powerpc/include/asm/mmu-8xx.h:218:39: error: ‘FIX_IMMR_BASE’ undeclared 
(first use in this function)
 #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
   ^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro 
‘VIRT_IMMR_BASE’
   VIRT_IMMR_BASE);
   ^
./arch/powerpc/include/asm/mmu-8xx.h:218:39: note: each undeclared identifier 
is reported only once for each function it appears in
 #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
   ^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro 
‘VIRT_IMMR_BASE’
   VIRT_IMMR_BASE);
   ^
cc1: all warnings being treated as errors
make[1]: *** [arch/powerpc/sysdev/cpm_common.o] Error 1

Fixes: e8cb7a55eb8dc ("powerpc: remove superflous inclusions of asm/fixmap.h")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/fixmap.h | 1 +
 arch/powerpc/sysdev/cpm_common.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/fixmap.h 
b/arch/powerpc/include/asm/fixmap.h
index 40efdf1d2d6e..41cc15c14eee 100644
--- a/arch/powerpc/include/asm/fixmap.h
+++ b/arch/powerpc/include/asm/fixmap.h
@@ -16,6 +16,7 @@
 
 #ifndef __ASSEMBLY__
 #include <asm/page.h>
+#include 
 #ifdef CONFIG_HIGHMEM
 #include <linux/threads.h>
 #include <asm/kmap_types.h>
diff --git a/arch/powerpc/sysdev/cpm_common.c b/arch/powerpc/sysdev/cpm_common.c
index 010975c3422f..b74508175b67 100644
--- a/arch/powerpc/sysdev/cpm_common.c
+++ b/arch/powerpc/sysdev/cpm_common.c
@@ -28,6 +28,7 @@
 #include <asm/udbg.h>
 #include <asm/io.h>
 #include <asm/cpm.h>
+#include <asm/fixmap.h>
 #include <mm/mmu_decl.h>
 
 #include <soc/fsl/qe/qe.h>
-- 
2.13.3



[PATCH v02] powerpc/mobility: Fix node detach/rename problem

2018-08-06 Thread Michael Bringmann
The PPC mobility code receives RTAS requests to delete nodes with
platform-/hardware-specific attributes when restarting the kernel
after a migration.  My example is for migration between a P8 Alpine
and a P8 Brazos.   Nodes to be deleted include 'ibm,random-v1',
'ibm,platform-facilities', 'ibm,sym-encryption-v1', and,
'ibm,compression-v1'.

The mobility.c code calls 'of_detach_node' for the nodes and their
children.  This makes calls to detach the properties and to remove
the associated sysfs/kernfs files.

Then new copies of the same nodes are provided by the PHYP, local
copies are built, and a pointer to the 'struct device_node' is passed
to of_attach_node.  Before the call to of_attach_node, the phandle is
initialized to 0 when the data structure is allocated.  During the call
to of_attach_node, it calls __of_attach_node, which pulls the actual
name and phandle from the just-created sub-properties named something
like 'name' and 'ibm,phandle'.
This is all fine for the first migration.  The problem occurs with
the second and subsequent migrations when the PHYP on the new system
wants to replace the same set of nodes again, referenced with the
same names and phandle values.
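
Before walking through the sequence, here is a toy user-space model of
the lookup collision (illustrative only; the kernel's
of_find_node_by_phandle() and node lifetime handling are more
involved):

#include <stdio.h>

/* Toy model: a detached node is only flagged, mirroring OF_DETACHED,
 * so a first-match lookup can return the stale instance. */
struct toy_node {
	const char *origin;
	unsigned int phandle;
	int detached;
	struct toy_node *next;
};

static struct toy_node *find_by_phandle(struct toy_node *list,
					unsigned int phandle)
{
	struct toy_node *np;

	for (np = list; np; np = np->next)
		if (np->phandle == phandle)
			return np;	/* first match wins */
	return NULL;
}

int main(void)
{
	struct toy_node second = { "added after migration 1", 0x42, 0, NULL };
	struct toy_node first  = { "original boot", 0x42, 1, &second };

	/* Migration 2 asks to delete phandle 0x42 and finds the stale
	 * node from the original boot, not the live replacement. */
	printf("lookup(0x42) -> %s (detached=%d)\n",
	       find_by_phandle(&first, 0x42)->origin,
	       find_by_phandle(&first, 0x42)->detached);
	return 0;
}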

On the second and subsequent migrations, the PHYP tells the system
to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
nodes by its known set of phandle values -- the same handles used
by the PHYP on the source system are known on the target system.
The mobility.c code calls of_find_node_by_phandle() with these values
and ends up locating the first instance of each node that was added
during the original boot, instead of the second instance of each node
created after the first migration.  The detach during the second
migration fails with errors like,

[ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 
__of_detach_node+0x8/0xa0
[ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag 
unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto 
sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi 
scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: GW 
4.18.0-rc1-wi107836-v05-120+ #201
[ 4565.030737] NIP:  c07c1ea8 LR: c07c1fb4 CTR: 00655170
[ 4565.030741] REGS: c003f302b690 TRAP: 0700   Tainted: GW  
(4.18.0-rc1-wi107836-v05-120+)
[ 4565.030745] MSR:  80010282b033   
CR: 22288822  XER: 000a
[ 4565.030757] CFAR: c07c1fb0 IRQMASK: 1
[ 4565.030757] GPR00: c07c1fa4 c003f302b910 c114bf00 
c0038e68
[ 4565.030757] GPR04: 0001  80c008e0b4b8 

[ 4565.030757] GPR08:  0001 8003 
2843
[ 4565.030757] GPR12: 8800 c0001ec9ae00 4000 

[ 4565.030757] GPR16:  0008  
f6ff
[ 4565.030757] GPR20: 0007  c003e9f1f034 
0001
[ 4565.030757] GPR24:    

[ 4565.030757] GPR28: c1549d28 c1134828 c0038e68 
c003f302b930
[ 4565.030804] NIP [c07c1ea8] __of_detach_node+0x8/0xa0
[ 4565.030808] LR [c07c1fb4] of_detach_node+0x74/0xd0
[ 4565.030811] Call Trace:
[ 4565.030815] [c003f302b910] [c07c1fa4] of_detach_node+0x64/0xd0 
(unreliable)
[ 4565.030821] [c003f302b980] [c00c33c4] 
dlpar_detach_node+0xb4/0x150
[ 4565.030826] [c003f302ba10] [c00c3ffc] delete_dt_node+0x3c/0x80
[ 4565.030831] [c003f302ba40] [c00c4380] 
pseries_devicetree_update+0x150/0x4f0
[ 4565.030836] [c003f302bb70] [c00c479c] 
post_mobility_fixup+0x7c/0xf0
[ 4565.030841] [c003f302bbe0] [c00c4908] migration_store+0xf8/0x130
[ 4565.030847] [c003f302bc70] [c0998160] kobj_attr_store+0x30/0x60
[ 4565.030852] [c003f302bc90] [c0412f14] sysfs_kf_write+0x64/0xa0
[ 4565.030857] [c003f302bcb0] [c0411cac] 
kernfs_fop_write+0x16c/0x240
[ 4565.030862] [c003f302bd00] [c0355f20] __vfs_write+0x40/0x220
[ 4565.030867] [c003f302bd90] [c0356358] vfs_write+0xc8/0x240
[ 4565.030872] [c003f302bde0] [c03566cc] ksys_write+0x5c/0x100
[ 4565.030880] [c003f302be30] [c000b288] system_call+0x5c/0x70
[ 4565.030884] Instruction dump:
[ 4565.030887] 38210070 3860 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 
ebe1fff8
[ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b09> 2f89 4cde0020 
e9030040
[ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]---

The mobility.c code continues on during the second migration, accepts
the definitions of the new nodes from the PHYP and ends up renaming
the new properties 

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Will Deacon
Hi Michael,

On Sun, Aug 05, 2018 at 03:27:42AM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> > > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > > > However the question people raise is that DMA API is already full of
> > > > > arch-specific tricks the likes of which are outlined in your post 
> > > > > linked
> > > > > above. How is this one much worse?
> > > > 
> > > > None of these warts is visible to the driver, they are all handled in
> > > > the architecture (possibly on a per-bus basis).
> > > > 
> > > > So for virtio we really need to decide if it has one set of behavior
> > > > as specified in the virtio spec, or if it behaves exactly as if it
> > > > was on a PCI bus, or in fact probably both as you lined up.  But no
> > > > magic arch specific behavior inbetween.
> > > 
> > > The only arch specific behaviour is needed in the case where it doesn't
> > > behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> > > our secure VMs, we still need to make it use swiotlb in order to bounce
> > > through non-secure pages.
> > 
> > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > transport (so definitely not PCI) have historically been advertised by qemu
> > as not being cache coherent, but because the virtio core has bypassed DMA
> > ops then everything has happened to work. If we blindly enable the arch DMA
> > ops, we'll plumb in the non-coherent ops and start getting data corruption,
> > so we do need a way to quirk virtio as being "always coherent" if we want to
> > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > for all virtio devices).
> > 
> > Will
> 
> Right that's not very different from placing the device within the IOMMU
> domain but in fact bypassing the IOMMU

Hmm, I'm not sure I follow you here -- the IOMMU bypassing is handled
inside the IOMMU driver, so we'd still end up with non-coherent DMA ops
for the guest accesses. The presence of an IOMMU doesn't imply coherency for
us. Or am I missing your point here?

> I wonder whether anyone ever needs a non coherent virtio-mmio. If yes we
> can extend PLATFORM_IOMMU to cover that or add another bit.

I think that's probably the right way around: assume that legacy virtio-mmio
devices are coherent by default.

> What exactly do the non-coherent ops do that causes the corruption?

The non-coherent ops mean that the guest ends up allocating the vring queues
using non-cacheable mappings, whereas qemu/hypervisor uses a cacheable
mapping despite not advertising the devices as being cache-coherent.

This hits something in the architecture known as "mismatched aliases", which
means that coherency is lost between the guest and the hypervisor,
consequently resulting in data not being visible and ordering not being
guaranteed. The usual symptom is that the device appears to lock up iirc,
because the guest and the hypervisor are unable to communicate with each
other.
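
For context, the relevant fork in the guest is roughly the following
(a simplified paraphrase of virtio_ring.c's queue allocation, not the
verbatim source):

/* Sketch: with the DMA API, a device advertised as non-coherent gets
 * its vring from dma_alloc_coherent(), i.e. a non-cacheable guest
 * mapping; the legacy bypass path hands out ordinary cacheable pages.
 * If the host side maps the ring cacheable either way, only the
 * second path happens to match. */
static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
			       dma_addr_t *dma_handle, gfp_t flag)
{
	void *queue;

	if (vring_use_dma_api(vdev))
		return dma_alloc_coherent(vdev->dev.parent, size,
					  dma_handle, flag);

	/* legacy bypass: cacheable memory, raw guest physical address */
	queue = alloc_pages_exact(PAGE_ALIGN(size), flag);
	if (queue)
		*dma_handle = (dma_addr_t)virt_to_phys(queue);
	return queue;
}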

Does that help to clarify things?

Thanks,

Will


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Sun, Aug 05, 2018 at 02:52:54PM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2018-08-05 at 03:22 +0300, Michael S. Tsirkin wrote:
> > I see the allure of this, but I think down the road you will
> > discover passing a flag in libvirt XML saying
> > "please use a secure mode" or whatever is a good idea.
> > 
> > Even thought it is probably not required to address this
> > specific issue.
> > 
> > For example, I don't think ballooning works in secure mode,
> > you will be able to teach libvirt not to try to add a
> > balloon to the guest.
> 
> Right, we'll need some quirk to disable balloons  in the guest I
> suppose.
> 
> Passing something from libvirt is cumbersome because the end user may
> not even need to know about secure VMs. There are use cases where the
> security is a contract down to some special application running inside
> the secure VM, the sysadmin knows nothing about.
> 
> Also there's repercussions all the way to admin tools, web UIs etc...
> so it's fairly wide ranging.
> 
> So as long as we only need to quirk a couple of devices, it's much
> better contained that way.

So just the balloon thing already means that yes management and all the
way to the user tools must know this is going on. Otherwise
user will try to inflate the balloon and wonder why this does not work.

> > > Later on, (we may have even already run Linux at that point,
> > > unsecurely, as we can use Linux as a bootloader under some
> > > circumstances), we start a "secure image".
> > > 
> > > This is a kernel zImage that includes a "ticket" that has the
> > > appropriate signature etc... so that when that kernel starts, it can
> > > authenticate with the ultravisor, be verified (along with its ramdisk)
> > > etc... and copied (by the UV) into secure memory & run from there.
> > > 
> > > At that point, the hypervisor is informed that the VM has become
> > > secure.
> > > 
> > > So at that point, we could exit to qemu to inform it of the change,
> > 
> > That's probably a good idea too.
> 
> We probably will have to tell qemu eventually for migration, as we'll
> need some kind of key exchange phase etc... to deal with the crypto
> aspects (the actual page copy is sorted via encrypting the secure pages
> back to normal pages in qemu, but we'll need extra metadata).
> 
> > > and
> > > have it walk the qtree and "Switch" all the virtio devices to use the
> > > IOMMU I suppose, but it feels a lot grosser to me.
> > 
> > That part feels gross, yes.
> > 
> > > That's the only other option I can think of.
> > > 
> > > > However in this specific case, the flag does not need to come from the
> > > > hypervisor, it can be set by arch boot code I think.
> > > > Christoph do you see a problem with that?
> > > 
> > > The above could do that yes. Another approach would be to do it from a
> > > small virtio "quirk" that pokes a bit in the device to force it to
> > > iommu mode when it detects that we are running in a secure VM. That's a
> > > bit warty on the virito side but probably not as much as having a qemu
> > > one that walks of the virtio devices to change how they behave.
> > > 
> > > What do you reckon ?
> > 
> > I think you are right that for the dma limit the hypervisor doesn't seem
> > to need to know.
> 
> It's not just a limit mind you. It's a range, at least if we allocate
> just a single pool of insecure pages. swiotlb feels like a better
> option for us.
> 
> > > What we want to avoid is to expose any of this to the *end user* or
> > > libvirt or any other higher level of the management stack. We really
> > > want that stuff to remain contained between the VM itself, KVM and
> > > maybe qemu.
> > > 
> > > We will need some other qemu changes for migration so that's ok. But
> > > the minute you start touching libvirt and the higher levels it becomes
> > > a nightmare.
> > > 
> > > Cheers,
> > > Ben.
> > 
> > I don't believe you'll be able to avoid that entirely. The split between
> > libvirt and qemu is more about community than about code, random bits of
> > functionality tend to land on random sides of that fence.  Better add a
> > tag in domain XML early is my advice. Having said that, it's your
> > hypervisor. I'm just suggesting that when hypervisor does somehow need
> > to care then I suspect most people won't be receptive to the argument
> > that changing libvirt is a nightmare.
> 
> It only needs to care at runtime. The problem isn't changing libvirt
> per-se, I don't have a problem with that. The problem is that it means
> creating two categories of machines "secure" and "non-secure", which is
> end-user visible, and thus has to be escalated to all the various
> management stacks, UIs, etc... out there.
> 
> In addition, there are some cases where the individual creating the VMs
> may not have any idea that they are secure.
> 
> But yes, if we have to, we'll do it. However, so far, we don't think
> it's a great idea.
> 
> Cheers,
> Ben.

Here's another example: you can't migrate a secure vm to hypervisor

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Michael S. Tsirkin
On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote:
> On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote:
> > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> >> Please go through these patches and review whether this approach 
> >> broadly
> >> makes sense. I will appreciate suggestions, inputs, comments regarding
> >> the patches or the approach in general. Thank you.
> >
> > Jason did some work on profiling this. Unfortunately he reports
> > about 4% extra overhead from this switch on x86 with no vIOMMU.
> 
>  The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
>  guest and measure PPS on tap on host.
> 
>  Thanks
> >>>
> >>> Could you supply host configuration involved please?
> >>
> >> I wonder how much of that could be caused by Spectre mitigations
> >> blowing up indirect function calls...
> >>
> >> Cheers,
> >> Ben.
> > 
> > I won't be surprised. If yes I suggested a way to mitigate the overhead.
> 
> Did we get better results (lower regression due to indirect calls) with
> the suggested mitigation ? Just curious.

I'm referring to this:
I wonder whether we can support map_sg and friends being NULL, then use
that when mapping is an identity. A conditional branch there is likely
very cheap.

I don't think anyone tried implementing this yes.
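
For the record, a sketch of that idea (hypothetical helper, not an
existing kernel interface; get_dma_ops() and the 4.18 dma_map_ops
layout assumed):

/* Hypothetical fast path: a NULL ->map_page is taken to mean
 * "identity mapping", replacing the retpolined indirect call with a
 * cheap, well-predicted branch. */
static inline dma_addr_t dma_map_page_identity_fast(struct device *dev,
						    struct page *page,
						    unsigned long offset,
						    size_t size,
						    enum dma_data_direction dir)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	if (!ops->map_page)
		return page_to_phys(page) + offset;

	return ops->map_page(dev, page, offset, size, dir, 0);
}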

-- 
MST


[PATCH] misc: ibmvmc: Fix wrong assignment of return code

2018-08-06 Thread Bryant G. Ly
From: "Bryant G. Ly" 

Currently the assignment is flipped and rc is always 0.

Signed-off-by: Bryant G. Ly 
Reviewed-by: Bradley Warrum 
---
 drivers/misc/ibmvmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/ibmvmc.c b/drivers/misc/ibmvmc.c
index 8f82bb9..b8aaa68 100644
--- a/drivers/misc/ibmvmc.c
+++ b/drivers/misc/ibmvmc.c
@@ -2131,7 +2131,7 @@ static int ibmvmc_init_crq_queue(struct crq_server_adapter *adapter)
retrc = plpar_hcall_norets(H_REG_CRQ,
   vdev->unit_address,
   queue->msg_token, PAGE_SIZE);
-   retrc = rc;
+   rc = retrc;
 
if (rc == H_RESOURCE)
rc = ibmvmc_reset_crq_queue(adapter);
-- 
2.7.2



Re: [PATCH v5 0/8] powerpc/fsl: Speculation barrier for NXP PowerPC Book3E

2018-08-06 Thread Diana Madalina Craciun
Hi Michael,

Sorry for the late answer, I was out of the office last week.

It looks fine to me, I have tested the patches on NXP PowerPC Book 3E
platforms and it worked well.

Thanks,

Diana

On 7/28/2018 2:06 AM, Michael Ellerman wrote:
> Implement barrier_nospec for NXP PowerPC Book3E processors.
>
> Hi Diana,
>
> This series interacts with another series of mine, so I wanted to rework it
> slightly. Let me know if this looks OK to you.
>
> cheers
>
> Diana Craciun (6):
>   powerpc/64: Disable the speculation barrier from the command line
>   powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.
>   powerpc/64: Make meltdown reporting Book3S 64 specific
>   powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E
>   powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit
> platforms
>   Documentation: Add nospectre_v1 parameter
>
> Michael Ellerman (2):
>   powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC
>   powerpc/64: Call setup_barrier_nospec() from setup_arch()
>
>  Documentation/admin-guide/kernel-parameters.txt |  4 +++
>  arch/powerpc/Kconfig|  7 -
>  arch/powerpc/include/asm/barrier.h  | 12 ++---
>  arch/powerpc/include/asm/setup.h|  6 -
>  arch/powerpc/kernel/Makefile|  3 ++-
>  arch/powerpc/kernel/entry_32.S  | 10 +++
>  arch/powerpc/kernel/module.c|  4 ++-
>  arch/powerpc/kernel/security.c  | 17 +++-
>  arch/powerpc/kernel/setup-common.c  |  2 ++
>  arch/powerpc/kernel/vmlinux.lds.S   |  4 ++-
>  arch/powerpc/lib/feature-fixups.c   | 35 
> -
>  arch/powerpc/platforms/powernv/setup.c  |  1 -
>  arch/powerpc/platforms/pseries/setup.c  |  1 -
>  13 files changed, 94 insertions(+), 12 deletions(-)
>



Re: [PATCH v2] selftests/powerpc: Avoid remaining process/threads

2018-08-06 Thread Michael Ellerman
Breno Leitao  writes:

> diff --git a/tools/testing/selftests/powerpc/harness.c 
> b/tools/testing/selftests/powerpc/harness.c
> index 66d31de60b9a..06c51e8d8ccb 100644
> --- a/tools/testing/selftests/powerpc/harness.c
> +++ b/tools/testing/selftests/powerpc/harness.c
> @@ -85,13 +85,16 @@ int run_test(int (test_function)(void), char *name)
>   return status;
>  }
>  
> -static void alarm_handler(int signum)
> +static void sig_handler(int signum)
>  {
> - /* Jut wake us up from waitpid */
> + if (signum == SIGINT)
> + kill(-pid, SIGTERM);

I don't think we need to do that here; if we just return, then we'll
pop out of the waitpid() and go via the normal path.

Can you test with the existing signal handler, but wired up to SIGINT?
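
Something like the below is what I have in mind — a sketch only,
reusing the existing no-op handler shape from harness.c:

static void sig_handler(int signum)
{
	/* Just wake us up from waitpid */
}

static struct sigaction sig_action = {
	.sa_handler = sig_handler,
};

/* in run_test(), next to the existing SIGALRM setup: */
	if (sigaction(SIGALRM, &sig_action, NULL) ||
	    sigaction(SIGINT, &sig_action, NULL)) {
		perror("sigaction");
		return 1;
	}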

cheers


Re: Build regressions/improvements in v4.17-rc1

2018-08-06 Thread Geert Uytterhoeven
CC Dan, Michael, AKPM, powerpc

On Mon, Apr 16, 2018 at 3:10 PM Geert Uytterhoeven  wrote:
> Below is the list of build error/warning regressions/improvements in
> v4.17-rc1[1] compared to v4.16[2].

I'd like to point your attention to:

>   + warning: vmlinux.o(.text+0x376518): Section mismatch in reference from 
> the function .devm_memremap_pages() to the function 
> .meminit.text:.arch_add_memory():  => N/A
>   + warning: vmlinux.o(.text+0x376d64): Section mismatch in reference from 
> the function .devm_memremap_pages_release() to the function 
> .meminit.text:.arch_remove_memory():  => N/A
>   + warning: vmlinux.o(.text+0x37dfd8): Section mismatch in reference from 
> the function .devm_memremap_pages() to the function 
> .meminit.text:.arch_add_memory():  => N/A
>   + warning: vmlinux.o(.text+0x37e824): Section mismatch in reference from 
> the function .devm_memremap_pages_release() to the function 
> .meminit.text:.arch_remove_memory():  => N/A
>   + warning: vmlinux.o(.text+0x3944): Section mismatch in reference from the 
> variable start_here_multiplatform to the function .init.text:early_setup():  
> => N/A
>   + warning: vmlinux.o(.text+0x3978): Section mismatch in reference from the 
> variable start_here_common to the function .init.text:start_kernel():  => N/A
>   + warning: vmlinux.o(.text+0x3a66c): Section mismatch in reference from the 
> function mips_sc_init() to the function .init.text:mips_sc_probe_cm3():  => 
> N/A
>   + warning: vmlinux.o(.text+0x3e9908): Section mismatch in reference from 
> the function .devm_memremap_pages() to the function 
> .meminit.text:.arch_add_memory():  => N/A
>   + warning: vmlinux.o(.text+0x3ea154): Section mismatch in reference from 
> the function .devm_memremap_pages_release() to the function 
> .meminit.text:.arch_remove_memory():  => N/A
>   + warning: vmlinux.o(.text+0x498dbc): Section mismatch in reference from 
> the function hmm_devmem_release() to the function 
> .meminit.text:arch_remove_memory():  => N/A
>   + warning: vmlinux.o(.text+0x499130): Section mismatch in reference from 
> the function hmm_devmem_pages_create() to the function 
> .meminit.text:arch_add_memory():  => N/A
>   + warning: vmlinux.o(.text+0x4a59ec): Section mismatch in reference from 
> the function .hmm_devmem_release() to the function 
> .meminit.text:.arch_remove_memory():  => N/A
>   + warning: vmlinux.o(.text+0x4a5d08): Section mismatch in reference from 
> the function .hmm_devmem_pages_create() to the function 
> .meminit.text:.arch_add_memory():  => N/A
>   + warning: vmlinux.o(.text+0x4ad8ac): Section mismatch in reference from 
> the function .hmm_devmem_release() to the function 
> .meminit.text:.arch_remove_memory():  => N/A
>   + warning: vmlinux.o(.text+0x4adbc8): Section mismatch in reference from 
> the function .hmm_devmem_pages_create() to the function 
> .meminit.text:.arch_add_memory():  => N/A
>   + warning: vmlinux.o(.text+0x4ca7238): Section mismatch in reference from 
> the function .create_device_attrs() to the function 
> .init.text:.make_sensor_label():  => N/A
>   + warning: vmlinux.o(.text+0x51ffec): Section mismatch in reference from 
> the function .hmm_devmem_release() to the function 
> .meminit.text:.arch_remove_memory():  => N/A
>   + warning: vmlinux.o(.text+0x520308): Section mismatch in reference from 
> the function .hmm_devmem_pages_create() to the function 
> .meminit.text:.arch_add_memory():  => N/A

These are still seen on v4.18-rc8 on various powerpc all{mod,yes}config builds.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Christoph Hellwig
On Mon, Aug 06, 2018 at 07:16:47AM +1000, Benjamin Herrenschmidt wrote:
> Who would set this bit ? qemu ? Under what circumstances ?

I don't really care who sets what.  The implementation might not even
involve qemu.

It is your job to write a coherent interface specification that does
not depend on the components used.  The hypervisor might be PAPR,
Linux + qemu, VMware, Hyperv or something so secret that you'd have
to shoot me if you had to tell me.  The guest might be Linux, FreeBSD,
AIX, OS400 or a Hipster project of the day in Rust.  As long as we
properly specify the interface it simply does not matter.

> What would be the effect of this bit while VIRTIO_F_IOMMU is NOT set,
> ie, what would qemu do and what would Linux do ? I'm not sure I fully
> understand your idea.

In a perfect world we'd just reuse VIRTIO_F_IOMMU_PLATFORM and clarify
the description, which currently is rather vague but basically captures
the use case.  Currently it is:

VIRTIO_F_IOMMU_PLATFORM(33)
This feature indicates that the device is behind an IOMMU that
translates bus addresses from the device into physical addresses in
memory. If this feature bit is set to 0, then the device emits
physical addresses which are not translated further, even though an
IOMMU may be present.

And I'd change it to something like:

VIRTIO_F_PLATFORM_DMA(33)
This feature indicates that the device emits platform-specific
bus addresses that might not be identical to physical addresses.
The translation of physical to bus address is platform specific
and defined by the platform specification for the bus that the virtio
device is attached to.
If this feature bit is set to 0, then the device emits
physical addresses which are not translated further, even if
the platform would normally require translations for the bus that
the virtio device is attached to.

If we can't change the definition any more we should deprecate the
old VIRTIO_F_IOMMU_PLATFORM bit, and require that VIRTIO_F_IOMMU_PLATFORM
and VIRTIO_F_PLATFORM_DMA not be set at the same time.
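
On the guest side that would keep the check trivial — a sketch using
the hypothetical VIRTIO_F_PLATFORM_DMA name from above (only
VIRTIO_F_IOMMU_PLATFORM exists today):

/* Sketch: either feature bit forces all descriptor addresses through
 * the DMA API; with neither set, the device gets raw guest physical
 * addresses as before. */
static bool virtio_use_dma_api(struct virtio_device *vdev)
{
	return virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) ||
	       virtio_has_feature(vdev, VIRTIO_F_PLATFORM_DMA);
}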

> I'm trying to understand because the limitation is not a device side
> limitation, it's not a qemu limitation, it's actually more of a VM
> limitation. It has most of its memory pages made inaccessible for
> security reasons. The platform from a qemu/KVM perspective is almost
> entirely normal.

Well, find a way to describe this either in the qemu specification using
new feature bits, or by using something like the above.


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-06 Thread Anshuman Khandual
On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
>> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
>> Please go through these patches and review whether this approach broadly
>> makes sense. I will appreciate suggestions, inputs, comments regarding
>> the patches or the approach in general. Thank you.
>
> Jason did some work on profiling this. Unfortunately he reports
> about 4% extra overhead from this switch on x86 with no vIOMMU.

 The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
 guest and measure PPS on tap on host.

 Thanks
>>>
>>> Could you supply host configuration involved please?
>>
>> I wonder how much of that could be caused by Spectre mitigations
>> blowing up indirect function calls...
>>
>> Cheers,
>> Ben.
> 
> I won't be surprised. If yes I suggested a way to mitigate the overhead.

Did we get better results (lower regression due to indirect calls) with
the suggested mitigation ? Just curious.



Re: [PATCH] powerpc/fadump: handle crash memory ranges array overflow

2018-08-06 Thread Michael Ellerman
Hari Bathini  writes:
> On Monday 06 August 2018 09:52 AM, Mahesh Jagannath Salgaonkar wrote:
>> On 07/31/2018 07:26 PM, Hari Bathini wrote:
>>> Crash memory ranges is an array of memory ranges of the crashing kernel
>>> to be exported as a dump via /proc/vmcore file. The size of the array
>>> is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
>>> where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS
>>> value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since
>>> commit 142b45a72e22 ("memblock: Add array resizing support").
>>>
...
>>>
>>> Fixes: 2df173d9e85d ("fadump: Initialize elfcore header and add PT_LOAD 
>>> program headers.")
>>> Cc: sta...@vger.kernel.org
>>> Cc: Mahesh Salgaonkar 
>>> Signed-off-by: Hari Bathini 
>>> ---
>>>   arch/powerpc/include/asm/fadump.h |2 +
>>>   arch/powerpc/kernel/fadump.c  |   63 
>>> ++---
>>>   2 files changed, 59 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/fadump.h 
>>> b/arch/powerpc/include/asm/fadump.h
>>> index 5a23010..ff708b3 100644
>>> --- a/arch/powerpc/include/asm/fadump.h
>>> +++ b/arch/powerpc/include/asm/fadump.h
>>> @@ -196,7 +196,7 @@ struct fadump_crash_info_header {
>>>   };
>>>
>>>   /* Crash memory ranges */
>>> -#define INIT_CRASHMEM_RANGES   (INIT_MEMBLOCK_REGIONS + 2)
>>> +#define INIT_CRASHMEM_RANGES   INIT_MEMBLOCK_REGIONS
>>>
>>>   struct fad_crash_memory_ranges {
>>> unsigned long long  base;
>>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>>> index 07e8396..1c1df4f 100644
>>> --- a/arch/powerpc/kernel/fadump.c
>>> +++ b/arch/powerpc/kernel/fadump.c
...
>
>> Also alongwith this change, Should we also double the initial array size
>> (e.g. INIT_CRASHMEM_RANGES * 2) to reduce our chances to go for memory
>> allocation ?
>
> Agreed that doubling the static array size reduces the likelihood of
> the need for dynamic array resizing. Will do that.
>
> Nonetheless, if we get to the point where 2K memory allocation fails
> on a system with so many memory ranges, it is likely that the kernel
> has some basic problems to deal with first :)

Yes, this all seems a bit silly. 

Why not just allocate a 64K page and be done with it?

AFAICS we're not being called too early to do that, and if you can't
allocate a single page then the system is going to OOM anyway.
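
i.e. roughly (a sketch only, reusing the struct from the patch above;
SZ_64K comes from linux/sizes.h):

/* Sketch: size the ranges array from one 64K allocation up front
 * instead of growing it from INIT_CRASHMEM_RANGES. */
static struct fad_crash_memory_ranges *crash_memory_ranges;
static int max_crash_mem_ranges;

static int fadump_alloc_crash_memory_ranges(void)
{
	crash_memory_ranges = kmalloc(SZ_64K, GFP_KERNEL);
	if (!crash_memory_ranges)
		return -ENOMEM;

	max_crash_mem_ranges = SZ_64K /
			       sizeof(struct fad_crash_memory_ranges);
	return 0;
}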

cheers