On Mon, Jan 16, 2017 at 03:47:08PM +0800, Jason Wang wrote: > > > On 2017年01月16日 15:31, Peter Xu wrote: > >On Fri, Jan 13, 2017 at 05:26:06PM +0800, Jason Wang wrote: > >> > >>On 2017年01月13日 11:06, Peter Xu wrote: > >>>The default replay() don't work for VT-d since vt-d will have a huge > >>>default memory region which covers address range 0-(2^64-1). This will > >>>normally bring a dead loop when guest starts. > >>I think it just takes too much time instead of dead loop? > >Hmm, I can touch the commit message above to make it more precise. > > > >>>The solution is simple - we don't walk over all the regions. Instead, we > >>>jump over the regions when we found that the page directories are empty. > >>>It'll greatly reduce the time to walk the whole region. > >>Yes, the problem is memory_region_is_iommu_reply() not smart because: > >> > >>- It doesn't understand large page > >>- try go over all possible iova > >> > >>So I'm thinking to introduce something like iommu_ops->iova_iterate() which > >> > >>1) accept an start iova and return the next exist map > >>2) understand large page > >>3) skip unmapped iova > >Though I haven't tested with huge pages yet, but this patch should > >both solve above issue? I don't know whether you went over the page > >walk logic - it should both support huge page, and it will skip > >unmapped iova range (at least that's my goal to have this patch). In > >that case, looks like this patch is solving the same problem? :) > >(though without introducing iova_iterate() interface) > > > >Please correct me if I misunderstood it. > > Kind of :) I'm fine with this patch, but just want: > > - reuse most of the codes in the patch > - current memory_region_iommu_replay() logic > > So what I'm suggesting is a just slight change of API which can let caller > decide it need to do with each range of iova. So it could be reused for > other things except for replaying. > > But if you like to keep this patch as is, I don't object it.
I see. Then I can understand your mean here. I had the same thought before, that's why I exposed the vtd_page_walk with a hook. If you check the page_walk function comment: /** * vtd_page_walk - walk specific IOVA range, and call the hook * * @ce: context entry to walk upon * @start: IOVA address to start the walk * @end: IOVA range end address (start <= addr < end) * @hook_fn: the hook that to be called for each detected area * @private: private data for the hook function */ So I didn't implement the notification in page_walk at all - but in the hook_fn. If any caller that is interested in doing something else rather than the notification, we can just simply export the page walk interface and provide his/her own "hook_fn", then it'll be triggered for each valid page (no matter a huge/small one). If we can have a more general interface in the future - no matter whether we call it iova_iterate() or something else (I'll prefer the hooker way to do it, so maybe a common page walker with a hook function), we can do it simply (at least for Intel platform) based on this vtd_page_walk thing. Thanks, -- peterx