Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Andrea Arcangeli
Hi!

On Wed, Jul 24, 2013 at 01:30:12PM +0300, Gleb Natapov wrote:
> On Wed, Jul 24, 2013 at 12:25:18PM +0200, Alexander Graf wrote:
> > 
> > On 24.07.2013, at 12:19, Gleb Natapov wrote:
> > 
> > > On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
> > >> 
> > >> On 24.07.2013, at 12:01, Gleb Natapov wrote:
> > >> 
> > >>> Copying Andrea for him to verify that I am not talking nonsense :)
> > >>> 
> > >>> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 1580dd4..5e8635b 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> > > 
> > > bool kvm_is_mmio_pfn(pfn_t pfn)
> > > {
> > > +#ifdef CONFIG_MEMORY_HOTPLUG
> >  
> >  I'd feel safer if we narrow this down to e500.
> >  
> > > +   /*
> > > +* Currently only in memory hot remove case we may still need 
> > > this.
> > > +*/
> > >  if (pfn_valid(pfn)) {
> >  
> >  We still have to check for pfn_valid, no? So the #ifdef should be down 
> >  here.
> >  
> > >  int reserved;
> > >  struct page *tail = pfn_to_page(pfn);
> > > @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> > >  }
> > >  return PageReserved(tail);
> > >  }
> > > +#endif
> > > 
> > >  return true;
> > > }
> > > 
> > > Before apply this change:
> > > 
> > > real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 
> > > 1m22.120s)/5= 1m21.376s
> > > user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 
> > > 0m23.520s)/5= 0m23.433s
> > > sys   (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
> > > 0m50.349s
> > > 
> > > After apply this change:
> > > 
> > > real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 
> > > 1m20.293s)/5= 1m20.667s
> > > user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 
> > > 0m22.467s)/5= 0m22.615s
> > > sys   (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
> > > 0m49.496s
> > > 
> > > So,
> > > 
> > > real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> > > user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> > > sys   (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
> >  
> >  Very nice, so there is a real world performance benefit to doing this. 
> >  Then yes, I think it would make sense to change the global helper 
> >  function to be fast on e500 and use that one from 
> >  e500_shadow_mas2_attrib() instead.
> >  
> >  Gleb, Paolo, any hard feelings?
> >  
> > >>> I do not see how can we break the function in such a way and get
> > >>> away with it. Not all valid pfns point to memory. Physical address can
> > >>> be sparse (due to PCI hole, framebuffer or just because).
> > >> 
> > >> But we don't check for sparseness today in here either. We merely check 
> > >> for incomplete huge pages.
> > >> 
> > > That's not how I read the code. The code checks for reserved flag set.
> > > It should be set on pfns that point to memory holes. As far as I
> > 
> > I couldn't find any traces of code that sets the reserved bits on e500 
> > chips though. I've only seen it getting set for memory hotplug and memory 
> > incoherent DMA code which doesn't get used on e500.
> > 
> > But I'd be more than happy to get proven wrong :).
> > 
> Can you write a module that scans all page structures? AFAIK all pages
> are marked as reserved and then those that become regular memory are
> marked as unreserved. Hope Andrea will chime in here :)

So the situation with regard to non-RAM and PageReserved/pfn_valid is
quite simple.

"struct page" exists for non-RAM too as "struct page" must exist up to
at least 2^MAX_ORDER pfn alignment or things breaks, like the first
pfn must be 2^MXA_ORDER aligned or again things break in the buddy. We
don't make an effort to save a few "struct page" to keep it simpler.

But those non-RAM pages (or tiny non-RAM page holes if any) are marked
PageReserved.

If "struct page" doesn't exist pfn_valid returns false.

So you cannot get away skipping pfn_valid and at least one
PageReserved.

However it gets more complex than just ram vs non-RAM, because there
are pages that are real RAM (not left marked PageReserved at boot
after checking e820 or equivalent bios data for non-x86 archs) but
that are taken over by drivers, than then could use it as mmio regions
snooping the writes and mapping them in userland too as hugepages
maybe. That is the motivation for the THP related code in
kvm_is_mmio_pfn.

Those vmas have VM_PFNMAP set so vm_normal_page is zero and the
refcounting is skipped like if it's non-RAM and they're mapped with
remap_pfn_range (different mechanism for VM_MIXEDMAP that does the
refco

Re: [PATCH 04/10] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-07-24 Thread Benjamin Herrenschmidt
On Wed, 2013-07-24 at 15:43 -0700, Andrew Morton wrote:
> For what?  The three lines of comment in page-flags.h?   ack :)
> 
> Manipulating page->_count directly is considered poor form.  Don't
> blame us if we break your code ;)
> 
> Actually, the manipulation in realmode_get_page() duplicates the
> existing get_page_unless_zero() and the one in realmode_put_page()
> could perhaps be placed in mm.h with a suitable name and some
> documentation.  That would improve your form and might protect the code
> from getting broken later on.

Yes, this stuff makes me really nervous :-) If it didn't provide an order
of magnitude performance improvement in KVM I would avoid it but heh...

Alexey, I like having that stuff in generic code.

However the meaning of the words "real mode" can be ambiguous accross
architectures, it might be best to then name it "mmu_off_put_page" to
make things a bit clearer, along with a comment explaining that this is
called in a context where none of the virtual mappings are accessible
(vmalloc, vmemmap, IOs, ...), and that in the case of sparsemem vmemmap
the caller must have taken care of getting the physical address of the
struct page and of ensuring it isn't split accross two vmemmap blocks.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/10] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-07-24 Thread Andrew Morton
On Tue, 23 Jul 2013 12:22:59 +1000 Alexey Kardashevskiy  wrote:

> Ping, anyone, please?

ew, you top-posted.

> Ben needs ack from any of MM people before proceeding with this patch. Thanks!

For what?  The three lines of comment in page-flags.h?   ack :)

Manipulating page->_count directly is considered poor form.  Don't
blame us if we break your code ;)

Actually, the manipulation in realmode_get_page() duplicates the
existing get_page_unless_zero() and the one in realmode_put_page()
could perhaps be placed in mm.h with a suitable name and some
documentation.  That would improve your form and might protect the code
from getting broken later on.

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Scott Wood

On 07/24/2013 04:39:59 AM, Alexander Graf wrote:


On 24.07.2013, at 11:35, Gleb Natapov wrote:

> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
>>> Are not we going to use page_is_ram() from   
e500_shadow_mas2_attrib() as Scott commented?

>>
>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
>>
>>
> Because it is much slower and, IIRC, actually used to build pfn map  
that allow

> us to check quickly for valid pfn.

Then why should we use page_is_ram()? :)

I really don't want the e500 code to diverge too much from what the  
rest of the kvm code is doing.


I don't understand "actually used to build pfn map...".  What code is  
this?  I don't see any calls to page_is_ram() in the KVM code, or in  
generic mm code.  Is this a statement about what x86 does?


On PPC page_is_ram() is only called (AFAICT) for determining what  
attributes to set on mmaps.  We want to be sure that KVM always makes  
the same decision.  While pfn_valid() seems like it should be  
equivalent, it's not obvious from the PPC code that it is.


If pfn_valid() is better, why is that not used for mmap?  Why are there  
two different names for the same thing?


-Scott
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
On Wed, Jul 24, 2013 at 12:25:18PM +0200, Alexander Graf wrote:
> 
> On 24.07.2013, at 12:19, Gleb Natapov wrote:
> 
> > On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
> >> 
> >> On 24.07.2013, at 12:01, Gleb Natapov wrote:
> >> 
> >>> Copying Andrea for him to verify that I am not talking nonsense :)
> >>> 
> >>> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 1580dd4..5e8635b 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> > 
> > bool kvm_is_mmio_pfn(pfn_t pfn)
> > {
> > +#ifdef CONFIG_MEMORY_HOTPLUG
>  
>  I'd feel safer if we narrow this down to e500.
>  
> > +   /*
> > +* Currently only in memory hot remove case we may still need 
> > this.
> > +*/
> >  if (pfn_valid(pfn)) {
>  
>  We still have to check for pfn_valid, no? So the #ifdef should be down 
>  here.
>  
> >  int reserved;
> >  struct page *tail = pfn_to_page(pfn);
> > @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> >  }
> >  return PageReserved(tail);
> >  }
> > +#endif
> > 
> >  return true;
> > }
> > 
> > Before apply this change:
> > 
> > real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
> > 1m21.376s
> > user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
> > 0m23.433s
> > sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
> > 0m50.349s
> > 
> > After apply this change:
> > 
> > real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
> > 1m20.667s
> > user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
> > 0m22.615s
> > sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
> > 0m49.496s
> > 
> > So,
> > 
> > real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> > user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> > sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
>  
>  Very nice, so there is a real world performance benefit to doing this. 
>  Then yes, I think it would make sense to change the global helper 
>  function to be fast on e500 and use that one from 
>  e500_shadow_mas2_attrib() instead.
>  
>  Gleb, Paolo, any hard feelings?
>  
> >>> I do not see how can we break the function in such a way and get
> >>> away with it. Not all valid pfns point to memory. Physical address can
> >>> be sparse (due to PCI hole, framebuffer or just because).
> >> 
> >> But we don't check for sparseness today in here either. We merely check 
> >> for incomplete huge pages.
> >> 
> > That's not how I read the code. The code checks for reserved flag set.
> > It should be set on pfns that point to memory holes. As far as I
> 
> I couldn't find any traces of code that sets the reserved bits on e500 chips 
> though. I've only seen it getting set for memory hotplug and memory 
> incoherent DMA code which doesn't get used on e500.
> 
> But I'd be more than happy to get proven wrong :).
> 
Can you write a module that scans all page structures? AFAIK all pages
are marked as reserved and then those that become regular memory are
marked as unreserved. Hope Andrea will chime in here :)

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 12:19, Gleb Natapov wrote:

> On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
>> 
>> On 24.07.2013, at 12:01, Gleb Natapov wrote:
>> 
>>> Copying Andrea for him to verify that I am not talking nonsense :)
>>> 
>>> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1580dd4..5e8635b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> 
> bool kvm_is_mmio_pfn(pfn_t pfn)
> {
> +#ifdef CONFIG_MEMORY_HOTPLUG
 
 I'd feel safer if we narrow this down to e500.
 
> +   /*
> +* Currently only in memory hot remove case we may still need 
> this.
> +*/
>  if (pfn_valid(pfn)) {
 
 We still have to check for pfn_valid, no? So the #ifdef should be down 
 here.
 
>  int reserved;
>  struct page *tail = pfn_to_page(pfn);
> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>  }
>  return PageReserved(tail);
>  }
> +#endif
> 
>  return true;
> }
> 
> Before apply this change:
> 
> real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
> 1m21.376s
> user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
> 0m23.433s
> sys   (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
> 0m50.349s
> 
> After apply this change:
> 
> real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
> 1m20.667s
> user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
> 0m22.615s
> sys   (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
> 0m49.496s
> 
> So,
> 
> real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> sys   (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
 
 Very nice, so there is a real world performance benefit to doing this. 
 Then yes, I think it would make sense to change the global helper function 
 to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
 
 Gleb, Paolo, any hard feelings?
 
>>> I do not see how can we break the function in such a way and get
>>> away with it. Not all valid pfns point to memory. Physical address can
>>> be sparse (due to PCI hole, framebuffer or just because).
>> 
>> But we don't check for sparseness today in here either. We merely check for 
>> incomplete huge pages.
>> 
> That's not how I read the code. The code checks for reserved flag set.
> It should be set on pfns that point to memory holes. As far as I

I couldn't find any traces of code that sets the reserved bits on e500 chips 
though. I've only seen it getting set for memory hotplug and memory incoherent 
DMA code which doesn't get used on e500.

But I'd be more than happy to get proven wrong :).


Alex

> understand huge page tricks they are there to guaranty that THP does not
> change flags under us, Andrea?
> 
> --
>   Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
> 
> On 24.07.2013, at 12:01, Gleb Natapov wrote:
> 
> > Copying Andrea for him to verify that I am not talking nonsense :)
> > 
> > On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>> index 1580dd4..5e8635b 100644
> >>> --- a/virt/kvm/kvm_main.c
> >>> +++ b/virt/kvm/kvm_main.c
> >>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> >>> 
> >>> bool kvm_is_mmio_pfn(pfn_t pfn)
> >>> {
> >>> +#ifdef CONFIG_MEMORY_HOTPLUG
> >> 
> >> I'd feel safer if we narrow this down to e500.
> >> 
> >>> +   /*
> >>> +* Currently only in memory hot remove case we may still need 
> >>> this.
> >>> +*/
> >>>   if (pfn_valid(pfn)) {
> >> 
> >> We still have to check for pfn_valid, no? So the #ifdef should be down 
> >> here.
> >> 
> >>>   int reserved;
> >>>   struct page *tail = pfn_to_page(pfn);
> >>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> >>>   }
> >>>   return PageReserved(tail);
> >>>   }
> >>> +#endif
> >>> 
> >>>   return true;
> >>> }
> >>> 
> >>> Before apply this change:
> >>> 
> >>> real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
> >>> 1m21.376s
> >>> user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
> >>> 0m23.433s
> >>> sys   (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
> >>> 0m50.349s
> >>> 
> >>> After apply this change:
> >>> 
> >>> real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
> >>> 1m20.667s
> >>> user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
> >>> 0m22.615s
> >>> sys   (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
> >>> 0m49.496s
> >>> 
> >>> So,
> >>> 
> >>> real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> >>> user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> >>> sys   (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
> >> 
> >> Very nice, so there is a real world performance benefit to doing this. 
> >> Then yes, I think it would make sense to change the global helper function 
> >> to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
> >> 
> >> Gleb, Paolo, any hard feelings?
> >> 
> > I do not see how can we break the function in such a way and get
> > away with it. Not all valid pfns point to memory. Physical address can
> > be sparse (due to PCI hole, framebuffer or just because).
> 
> But we don't check for sparseness today in here either. We merely check for 
> incomplete huge pages.
> 
That's not how I read the code. The code checks for reserved flag set.
It should be set on pfns that point to memory holes. As far as I
understand huge page tricks they are there to guaranty that THP does not
change flags under us, Andrea?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 12:01, Gleb Natapov wrote:

> Copying Andrea for him to verify that I am not talking nonsense :)
> 
> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 1580dd4..5e8635b 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
>>> 
>>> bool kvm_is_mmio_pfn(pfn_t pfn)
>>> {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> 
>> I'd feel safer if we narrow this down to e500.
>> 
>>> +   /*
>>> +* Currently only in memory hot remove case we may still need this.
>>> +*/
>>>   if (pfn_valid(pfn)) {
>> 
>> We still have to check for pfn_valid, no? So the #ifdef should be down here.
>> 
>>>   int reserved;
>>>   struct page *tail = pfn_to_page(pfn);
>>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>>>   }
>>>   return PageReserved(tail);
>>>   }
>>> +#endif
>>> 
>>>   return true;
>>> }
>>> 
>>> Before apply this change:
>>> 
>>> real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
>>> 1m21.376s
>>> user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
>>> 0m23.433s
>>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
>>> 
>>> After apply this change:
>>> 
>>> real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
>>> 1m20.667s
>>> user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
>>> 0m22.615s
>>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
>>> 
>>> So,
>>> 
>>> real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
>>> user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
>>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
>> 
>> Very nice, so there is a real world performance benefit to doing this. Then 
>> yes, I think it would make sense to change the global helper function to be 
>> fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
>> 
>> Gleb, Paolo, any hard feelings?
>> 
> I do not see how can we break the function in such a way and get
> away with it. Not all valid pfns point to memory. Physical address can
> be sparse (due to PCI hole, framebuffer or just because).

But we don't check for sparseness today in here either. We merely check for 
incomplete huge pages.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
Copying Andrea for him to verify that I am not talking nonsense :)

On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 1580dd4..5e8635b 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> > 
> > bool kvm_is_mmio_pfn(pfn_t pfn)
> > {
> > +#ifdef CONFIG_MEMORY_HOTPLUG
> 
> I'd feel safer if we narrow this down to e500.
> 
> > +   /*
> > +* Currently only in memory hot remove case we may still need this.
> > +*/
> >if (pfn_valid(pfn)) {
> 
> We still have to check for pfn_valid, no? So the #ifdef should be down here.
> 
> >int reserved;
> >struct page *tail = pfn_to_page(pfn);
> > @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> >}
> >return PageReserved(tail);
> >}
> > +#endif
> > 
> >return true;
> > }
> > 
> > Before apply this change:
> > 
> > real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
> > 1m21.376s
> > user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
> > 0m23.433s
> > sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> > 
> > After apply this change:
> > 
> > real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
> > 1m20.667s
> > user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
> > 0m22.615s
> > sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> > 
> > So,
> > 
> > real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> > user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> > sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
> 
> Very nice, so there is a real world performance benefit to doing this. Then 
> yes, I think it would make sense to change the global helper function to be 
> fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
> 
> Gleb, Paolo, any hard feelings?
> 
I do not see how can we break the function in such a way and get
away with it. Not all valid pfns point to memory. Physical address can
be sparse (due to PCI hole, framebuffer or just because).

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 11:35, Gleb Natapov wrote:

> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
>>> Are not we going to use page_is_ram() from  e500_shadow_mas2_attrib() as 
>>> Scott commented?
>> 
>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
>> 
>> 
> Because it is much slower and, IIRC, actually used to build pfn map that allow
> us to check quickly for valid pfn.

Then why should we use page_is_ram()? :)

I really don't want the e500 code to diverge too much from what the rest of the 
kvm code is doing.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> > Are not we going to use page_is_ram() from  e500_shadow_mas2_attrib() as 
> > Scott commented?
> 
> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
> 
> 
Because it is much slower and, IIRC, actually used to build pfn map that allow
us to check quickly for valid pfn.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 11:11, Bhushan Bharat-R65777 wrote:

> 
> 
>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Wednesday, July 24, 2013 1:55 PM
>> To: "“tiejun.chen”"
>> Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org 
>> list;
>> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>> managed pages
>> 
>> 
>> On 24.07.2013, at 04:26, “tiejun.chen” wrote:
>> 
>>> On 07/18/2013 06:27 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 12:19, “tiejun.chen” wrote:
 
> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>> 
>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
>> 
>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
 
> 
> 
>> -Original Message-
>> From: Bhushan Bharat-R65777
>> Sent: Thursday, July 18, 2013 1:53 PM
>> To: '" tiejun.chen "'
>> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
>> ag...@suse.de; Wood Scott-
>> B07421
>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
>> for kernel managed pages
>> 
>> 
>> 
>>> -Original Message-
>>> From: " tiejun.chen " [mailto:tiejun.c...@windriver.com]
>>> Sent: Thursday, July 18, 2013 1:52 PM
>>> To: Bhushan Bharat-R65777
>>> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
>>> ag...@suse.de; Wood
>>> Scott-
>>> B07421
>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>> only for kernel managed pages
>>> 
>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
 
 
> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org
> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of " tiejun.chen 
> "
> Sent: Thursday, July 18, 2013 1:01 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
> ag...@suse.de; Wood
> Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
> only for kernel managed pages
> 
> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>> 
>> 
>>> -Original Message-
>>> From: " tiejun.chen " [mailto:tiejun.c...@windriver.com]
>>> Sent: Thursday, July 18, 2013 11:56 AM
>>> To: Bhushan Bharat-R65777
>>> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
>>> ag...@suse.de; Wood
>>> Scott- B07421; Bhushan Bharat-R65777
>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>> only for kernel managed pages
>>> 
>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
 If there is a struct page for the requested mapping then
 it's normal DDR and the mapping sets "M" bit (coherent,
 cacheable) else this is treated as I/O and we set  "I +
 G"  (cache inhibited,
 guarded)
 
 This helps setting proper TLB mapping for direct assigned
 device
 
 Signed-off-by: Bharat Bhushan
 
 ---
   arch/powerpc/kvm/e500_mmu_host.c |   17 -
   1 files changed, 12 insertions(+), 5 deletions(-)
 
 diff --git a/arch/powerpc/kvm/e500_mmu_host.c
 b/arch/powerpc/kvm/e500_mmu_host.c
 index 1c6a9d7..089c227 100644
 --- a/arch/powerpc/kvm/e500_mmu_host.c
 +++ b/arch/powerpc/kvm/e500_mmu_host.c
 @@ -64,13 +64,20 @@ static inline u32
 e500_shadow_mas3_attrib(u32 mas3, int
>>> usermode)
return mas3;
   }
 
 -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
 usermode)
 +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
 +pfn_t pfn)
   {
 +  u32 mas2_attr;
 +
 +  mas2_attr = mas2 & MAS2_ATTRIB_MASK;
 +
 +  if (!pfn_valid(pfn)) {
>>> 
>>> Why not directly use kvm_is_mmio_pfn()?
>> 
>> What I understand from this function (someone can correct
>> me) is that it
> returns "false" when the page is managed by kernel and is
> not marked as RESERVED (for some reason). For us it does not
> ma

RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Wednesday, July 24, 2013 1:55 PM
> To: "“tiejun.chen”"
> Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org list;
> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
> 
> 
> On 24.07.2013, at 04:26, “tiejun.chen” wrote:
> 
> > On 07/18/2013 06:27 PM, Alexander Graf wrote:
> >>
> >> On 18.07.2013, at 12:19, “tiejun.chen” wrote:
> >>
> >>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
> 
>  On 18.07.2013, at 12:08, “tiejun.chen” wrote:
> 
> > On 07/18/2013 05:48 PM, Alexander Graf wrote:
> >>
> >> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
> >>
> >>>
> >>>
>  -Original Message-
>  From: Bhushan Bharat-R65777
>  Sent: Thursday, July 18, 2013 1:53 PM
>  To: '" tiejun.chen "'
>  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
>  ag...@suse.de; Wood Scott-
>  B07421
>  Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
>  for kernel managed pages
> 
> 
> 
> > -Original Message-
> > From: " tiejun.chen " [mailto:tiejun.c...@windriver.com]
> > Sent: Thursday, July 18, 2013 1:52 PM
> > To: Bhushan Bharat-R65777
> > Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
> > ag...@suse.de; Wood
> > Scott-
> > B07421
> > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
> > only for kernel managed pages
> >
> > On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
> >>
> >>
> >>> -Original Message-
> >>> From: kvm-ppc-ow...@vger.kernel.org
> >>> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of " tiejun.chen 
> >>> "
> >>> Sent: Thursday, July 18, 2013 1:01 PM
> >>> To: Bhushan Bharat-R65777
> >>> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
> >>> ag...@suse.de; Wood
> >>> Scott-
> >>> B07421
> >>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
> >>> only for kernel managed pages
> >>>
> >>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> 
> 
> > -Original Message-
> > From: " tiejun.chen " [mailto:tiejun.c...@windriver.com]
> > Sent: Thursday, July 18, 2013 11:56 AM
> > To: Bhushan Bharat-R65777
> > Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
> > ag...@suse.de; Wood
> > Scott- B07421; Bhushan Bharat-R65777
> > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
> > only for kernel managed pages
> >
> > On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> >> If there is a struct page for the requested mapping then
> >> it's normal DDR and the mapping sets "M" bit (coherent,
> >> cacheable) else this is treated as I/O and we set  "I +
> >> G"  (cache inhibited,
> >> guarded)
> >>
> >> This helps setting proper TLB mapping for direct assigned
> >> device
> >>
> >> Signed-off-by: Bharat Bhushan
> >> 
> >> ---
> >>arch/powerpc/kvm/e500_mmu_host.c |   17 -
> >>1 files changed, 12 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> >> b/arch/powerpc/kvm/e500_mmu_host.c
> >> index 1c6a9d7..089c227 100644
> >> --- a/arch/powerpc/kvm/e500_mmu_host.c
> >> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> >> @@ -64,13 +64,20 @@ static inline u32
> >> e500_shadow_mas3_attrib(u32 mas3, int
> > usermode)
> >>return mas3;
> >>}
> >>
> >> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
> >> usermode)
> >> +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
> >> +pfn_t pfn)
> >>{
> >> +  u32 mas2_attr;
> >> +
> >> +  mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> >> +
> >> +  if (!pfn_valid(pfn)) {
> >
> > Why not directly use kvm_is_mmio_pfn()?
> 
>  What I understand from this function (someone can correct
>  me) is that it
> >>> returns "false" when the page is managed by kernel and is
> >>> not marked as RESERVED (for some reason). For us it does not
> >>> matter whether the page is reserved or not, if it is kernel
> >>> visible page then it
> >

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 04:26, “tiejun.chen” wrote:

> On 07/18/2013 06:27 PM, Alexander Graf wrote:
>> 
>> On 18.07.2013, at 12:19, “tiejun.chen” wrote:
>> 
>>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 12:08, “tiejun.chen” wrote:
 
> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>> 
>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>> 
>>> 
>>> 
 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Thursday, July 18, 2013 1:53 PM
 To: '"�tiejun.chen�"'
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood 
 Scott-
 B07421
 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for 
 kernel
 managed pages
 
 
 
> -Original Message-
> From: "�tiejun.chen�" [mailto:tiejun.c...@windriver.com]
> Sent: Thursday, July 18, 2013 1:52 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood
> Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> kernel managed pages
> 
> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>> 
>> 
>>> -Original Message-
>>> From: kvm-ppc-ow...@vger.kernel.org
>>> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "�tiejun.chen�"
>>> Sent: Thursday, July 18, 2013 1:01 PM
>>> To: Bhushan Bharat-R65777
>>> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
>>> Wood
>>> Scott-
>>> B07421
>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>> kernel managed pages
>>> 
>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
 
 
> -Original Message-
> From: "�tiejun.chen�" [mailto:tiejun.c...@windriver.com]
> Sent: Thursday, July 18, 2013 11:56 AM
> To: Bhushan Bharat-R65777
> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
> Wood
> Scott- B07421; Bhushan Bharat-R65777
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
> for kernel managed pages
> 
> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>> If there is a struct page for the requested mapping then it's
>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>> else this is treated as I/O and we set  "I + G"  (cache
>> inhibited,
>> guarded)
>> 
>> This helps setting proper TLB mapping for direct assigned device
>> 
>> Signed-off-by: Bharat Bhushan 
>> ---
>>arch/powerpc/kvm/e500_mmu_host.c |   17 -
>>1 files changed, 12 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>> b/arch/powerpc/kvm/e500_mmu_host.c
>> index 1c6a9d7..089c227 100644
>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>> @@ -64,13 +64,20 @@ static inline u32
>> e500_shadow_mas3_attrib(u32 mas3, int
> usermode)
>>  return mas3;
>>}
>> 
>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>> usermode)
>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>{
>> +u32 mas2_attr;
>> +
>> +mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>> +
>> +if (!pfn_valid(pfn)) {
> 
> Why not directly use kvm_is_mmio_pfn()?
 
 What I understand from this function (someone can correct me) is
 that it
>>> returns "false" when the page is managed by kernel and is not
>>> marked as RESERVED (for some reason). For us it does not matter
>>> whether the page is reserved or not, if it is kernel visible page 
>>> then it
 is DDR.
 
>>> 
>>> I think you are setting I|G by addressing all mmio pages, right? If
>>> so,
>>> 
>>>  KVM: direct mmio pfn check
>>> 
>>>  Userspace may specify memory slots that are backed by mmio
>>> pages rather than
>>>  normal RAM.  In some cases it is not enough to identify these
>>> mmio
> pages
>>>  by pfn_valid().  This patch adds checking the PageReserved as 
>>> well.
>> 
>> Do you know what are those "some cases" and how checking
>> PageRes