On 24.07.2013, at 04:26, “tiejun.chen” wrote: > On 07/18/2013 06:27 PM, Alexander Graf wrote: >> >> On 18.07.2013, at 12:19, “tiejun.chen” wrote: >> >>> On 07/18/2013 06:12 PM, Alexander Graf wrote: >>>> >>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote: >>>> >>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote: >>>>>> >>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Bhushan Bharat-R65777 >>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM >>>>>>>> To: '"�tiejun.chen�"' >>>>>>>> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood >>>>>>>> Scott- >>>>>>>> B07421 >>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for >>>>>>>> kernel >>>>>>>> managed pages >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.c...@windriver.com] >>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM >>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood >>>>>>>>> Scott- >>>>>>>>> B07421 >>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for >>>>>>>>> kernel managed pages >>>>>>>>> >>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: kvm-ppc-ow...@vger.kernel.org >>>>>>>>>>> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "�tiejun.chen�" >>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM >>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; >>>>>>>>>>> Wood >>>>>>>>>>> Scott- >>>>>>>>>>> B07421 >>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for >>>>>>>>>>> kernel managed pages >>>>>>>>>>> >>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.c...@windriver.com] >>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM >>>>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; >>>>>>>>>>>>> Wood >>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777 >>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only >>>>>>>>>>>>> for kernel managed pages >>>>>>>>>>>>> >>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote: >>>>>>>>>>>>>> If there is a struct page for the requested mapping then it's >>>>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable) >>>>>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache >>>>>>>>>>>>>> inhibited, >>>>>>>>>>>>>> guarded) >>>>>>>>>>>>>> >>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device >>>>>>>>>>>>>> >>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com> >>>>>>>>>>>>>> --- >>>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++----- >>>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-) >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>> index 1c6a9d7..089c227 100644 >>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32 >>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int >>>>>>>>>>>>> usermode) >>>>>>>>>>>>>> return mas3; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int >>>>>>>>>>>>>> usermode) >>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + u32 mas2_attr; >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK; >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + if (!pfn_valid(pfn)) { >>>>>>>>>>>>> >>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()? >>>>>>>>>>>> >>>>>>>>>>>> What I understand from this function (someone can correct me) is >>>>>>>>>>>> that it >>>>>>>>>>> returns "false" when the page is managed by kernel and is not >>>>>>>>>>> marked as RESERVED (for some reason). For us it does not matter >>>>>>>>>>> whether the page is reserved or not, if it is kernel visible page >>>>>>>>>>> then it >>>>>>>> is DDR. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If >>>>>>>>>>> so, >>>>>>>>>>> >>>>>>>>>>> KVM: direct mmio pfn check >>>>>>>>>>> >>>>>>>>>>> Userspace may specify memory slots that are backed by mmio >>>>>>>>>>> pages rather than >>>>>>>>>>> normal RAM. In some cases it is not enough to identify these >>>>>>>>>>> mmio >>>>>>>>> pages >>>>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as >>>>>>>>>>> well. >>>>>>>>>> >>>>>>>>>> Do you know what are those "some cases" and how checking >>>>>>>>>> PageReserved helps in >>>>>>>>> those cases? >>>>>>>>> >>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this should >>>>>>>>> be chronically persistent as I understand ;-) >>>>>>>> >>>>>>>> Then I will wait till someone educate me :) >>>>>>> >>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do >>>>>>> not want to call this for all tlbwe operation unless it is necessary. >>>>>> >>>>>> It certainly does more than we need and potentially slows down the fast >>>>>> path (RAM mapping). The only thing it does on top of "if (pfn_valid())" >>>>>> is to check for pages that are declared reserved on the host. This >>>>>> happens in 2 cases: >>>>>> >>>>>> 1) Non cache coherent DMA >>>>>> 2) Memory hot remove >>>>>> >>>>>> The non coherent DMA case would be interesting, as with the mechanism as >>>>>> it is in place in Linux today, we could potentially break normal guest >>>>>> operation if we don't take it into account. However, it's Kconfig >>>>>> guarded by: >>>>>> >>>>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON >>>>>> default n if PPC_47x >>>>>> default y >>>>>> >>>>>> so we never hit it with any core we care about ;). >>>>>> >>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry >>>>>> about that one either. >>>>> >>>>> Thanks for this good information :) >>>>> >>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside >>>>> kvm_is_mmio_pfn() to make sure that check is only valid when that is >>>>> really needed? This can decrease those unnecessary performance loss. >>>>> >>>>> If I'm wrong please correct me :) >>>> >>>> You're perfectly right, but this is generic KVM code. So it gets run >>>> across all architectures. What if someone has the great idea to add a new >>>> case here for x86, but doesn't tell us? In that case we potentially break >>>> x86. >>>> >>>> I'd rather not like to break x86 :). >>>> >>>> However, it'd be very interesting to see a benchmark with this change. Do >>>> you think you could just rip out the whole reserved check and run a few >>>> benchmarks and show us the results? >>>> >>> >>> Often what case should be adopted to validate this scenario? >> >> Something which hammers the TLB emulation heavily. I usually just run >> /bin/echo a thousand times in "time" and see how long it takes ;) >> > > I tried to run five times with this combination, "time `for ((i=0; i<5000; > i++)); do /bin/echo; done`", to calculate the average value with this change: > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 1580dd4..5e8635b 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -102,6 +102,10 @@ static bool largepages_enabled = true; > > bool kvm_is_mmio_pfn(pfn_t pfn) > { > +#ifdef CONFIG_MEMORY_HOTPLUG
I'd feel safer if we narrow this down to e500. > + /* > + * Currently only in memory hot remove case we may still need this. > + */ > if (pfn_valid(pfn)) { We still have to check for pfn_valid, no? So the #ifdef should be down here. > int reserved; > struct page *tail = pfn_to_page(pfn); > @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn) > } > return PageReserved(tail); > } > +#endif > > return true; > } > > Before apply this change: > > real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= > 1m21.376s > user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= > 0m23.433s > sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s > > After apply this change: > > real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= > 1m20.667s > user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= > 0m22.615s > sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s > > So, > > real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6% > user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5% > sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7% Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead. Gleb, Paolo, any hard feelings? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html