Re: [PATCH] iommu: Improve the performance for direct_mapping
On Thu, 2020-11-26 at 15:19 +, Robin Murphy wrote:
> On 2020-11-20 09:06, Yong Wu wrote:
> > Currently direct_mapping always use the smallest pgsize which is SZ_4K
> > normally to mapping. This is unnecessary. we could gather the size, and
> > call iommu_map then, iommu_map could decide how to map better with the
> > just right pgsize.
> > 
> > From the original comment, we should take care overlap, otherwise,
> > iommu_map may return -EEXIST. In this overlap case, we should map the
> > previous region before overlap firstly. then map the left part.
> > 
> > Each a iommu device will call this direct_mapping when its iommu
> > initialize, This patch is effective to improve the boot/initialization
> > time especially while it only needs level 1 mapping.
> > 
> > Signed-off-by: Anan Sun
> > Signed-off-by: Yong Wu
> > ---
> >  drivers/iommu/iommu.c | 20 ++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index df87c8e825f7..854a8fcb928d 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -737,6 +737,7 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> >  	/* We need to consider overlapping regions for different devices */
> >  	list_for_each_entry(entry, &mappings, list) {
> >  		dma_addr_t start, end, addr;
> > +		size_t unmapped_sz = 0;
> > 
> >  		if (domain->ops->apply_resv_region)
> >  			domain->ops->apply_resv_region(dev, domain, entry);
> > @@ -752,10 +753,25 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> >  			phys_addr_t phys_addr;
> > 
> >  			phys_addr = iommu_iova_to_phys(domain, addr);
> > -			if (phys_addr)
> > +			if (phys_addr == 0) {
> > +				unmapped_sz += pg_size; /* Gather the size. */
> >  				continue;
> > +			}
> 
> I guess the reason we need to validate every page is because they may
> already have been legitimately mapped if someone else's reserved region
> overlaps - is it worth explicitly validating that, i.e. bail out if
> something's gone wrong enough that phys_addr != addr?

I'm not sure about the history of why every page is validated. This
direct_mapping is called very early, normally right after
alloc_default_domain and __attach_device, so "phys_addr != addr" looks
impossible. If there is a normal flow that may cause "phys_addr != addr",
then something has gone wrong. Could we give a warning instead, e.g. add
a WARN_ON_ONCE(phys_addr != addr)? That should be a separate patch.

> Other than the naming issue (I agree that map_size is a far, far better
> choice), I don't have any strong opinions about the rest of the
> implementation - I've written enough variations of this pattern to know
> that there's just no "nice" way to do it in C; all you can do is shuffle
> the clunkiness around :)

:). I will send a v2. Thanks.

> Robin.
> 
> > -			ret = iommu_map(domain, addr, addr, pg_size, entry->prot);
> > +			if (unmapped_sz) {
> > +				/* Map the region before the overlap. */
> > +				ret = iommu_map(domain, start, start,
> > +						unmapped_sz, entry->prot);
> > +				if (ret)
> > +					goto out;
> > +				start += unmapped_sz;
> > +				unmapped_sz = 0;
> > +			}
> > +			start += pg_size;
> > +		}
> > +		if (unmapped_sz) {
> > +			ret = iommu_map(domain, start, start, unmapped_sz,
> > +					entry->prot);
> >  			if (ret)
> >  				goto out;
> >  		}

___
Linux-mediatek mailing list
linux-media...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
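The WARN_ON_ONCE() idea above can be modeled in plain user-space C. Note that warn_on_once_model() and check_identity() below are hypothetical stand-ins, not kernel code; the sketch only illustrates why warning once is preferable to silently skipping when an existing translation is not the expected identity mapping:

```c
#include <assert.h>
#include <stdio.h>

/* Model of WARN_ON_ONCE(): report the first time the condition is
 * true, but always return the condition so the caller can still act
 * on it. */
static int warn_on_once_model(int cond, const char *what)
{
	static int warned;

	if (cond && !warned) {
		warned = 1;
		fprintf(stderr, "warning: %s\n", what);
	}
	return cond;
}

/* In a direct (identity) mapping, an already-present translation
 * should satisfy phys_addr == addr; anything else means the page
 * table is in an unexpected state, so warn once rather than failing
 * the whole direct mapping. */
static int check_identity(unsigned long phys_addr, unsigned long addr)
{
	return warn_on_once_model(phys_addr != addr, "phys_addr != addr");
}
```

The one-shot flag keeps a corrupted table from flooding the log when many pages fail the check, while the return value still lets the caller decide whether to skip or bail.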
Re: [PATCH] iommu: Improve the performance for direct_mapping
On 2020-11-20 09:06, Yong Wu wrote:
> Currently direct_mapping always use the smallest pgsize which is SZ_4K
> normally to mapping. This is unnecessary. we could gather the size, and
> call iommu_map then, iommu_map could decide how to map better with the
> just right pgsize.
> 
> From the original comment, we should take care overlap, otherwise,
> iommu_map may return -EEXIST. In this overlap case, we should map the
> previous region before overlap firstly. then map the left part.
> 
> Each a iommu device will call this direct_mapping when its iommu
> initialize, This patch is effective to improve the boot/initialization
> time especially while it only needs level 1 mapping.
> 
> Signed-off-by: Anan Sun
> Signed-off-by: Yong Wu
> ---
>  drivers/iommu/iommu.c | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index df87c8e825f7..854a8fcb928d 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -737,6 +737,7 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>  	/* We need to consider overlapping regions for different devices */
>  	list_for_each_entry(entry, &mappings, list) {
>  		dma_addr_t start, end, addr;
> +		size_t unmapped_sz = 0;
> 
>  		if (domain->ops->apply_resv_region)
>  			domain->ops->apply_resv_region(dev, domain, entry);
> @@ -752,10 +753,25 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>  			phys_addr_t phys_addr;
> 
>  			phys_addr = iommu_iova_to_phys(domain, addr);
> -			if (phys_addr)
> +			if (phys_addr == 0) {
> +				unmapped_sz += pg_size; /* Gather the size. */
>  				continue;
> +			}

I guess the reason we need to validate every page is because they may
already have been legitimately mapped if someone else's reserved region
overlaps - is it worth explicitly validating that, i.e. bail out if
something's gone wrong enough that phys_addr != addr?

Other than the naming issue (I agree that map_size is a far, far better
choice), I don't have any strong opinions about the rest of the
implementation - I've written enough variations of this pattern to know
that there's just no "nice" way to do it in C; all you can do is shuffle
the clunkiness around :)

Robin.

> -			ret = iommu_map(domain, addr, addr, pg_size, entry->prot);
> +			if (unmapped_sz) {
> +				/* Map the region before the overlap. */
> +				ret = iommu_map(domain, start, start,
> +						unmapped_sz, entry->prot);
> +				if (ret)
> +					goto out;
> +				start += unmapped_sz;
> +				unmapped_sz = 0;
> +			}
> +			start += pg_size;
> +		}
> +		if (unmapped_sz) {
> +			ret = iommu_map(domain, start, start, unmapped_sz,
> +					entry->prot);
>  			if (ret)
>  				goto out;
>  		}
Re: [PATCH] iommu: Improve the performance for direct_mapping
On Wed, Nov 25, 2020 at 07:03:34PM +0800, Yong Wu wrote:
> On Tue, 2020-11-24 at 11:05 +, Will Deacon wrote:
> > On Tue, Nov 24, 2020 at 05:24:44PM +0800, Yong Wu wrote:
> > > On Mon, 2020-11-23 at 12:32 +, Will Deacon wrote:
> > That said, maybe we could simplify this further by changing the loop bounds
> > to be:
> > 
> > 	for (addr = start; addr <= end; addr += pg_size)
> > 
> > and checking:
> > 
> > 	if (!phys_addr && addr != end) {
> > 		map_size += pg_size;
> > 		continue;
> > 	}
> > 
> > does that work?
> 
> It works but I think we can not check iommu_iova_to_phys(domain, end).
> We should add a "if", like:
> 
> 	for (addr = start; addr <= end; addr += pg_size) {
> 		...
> 		if (addr < end) {
> 			phys_addr = iommu_iova_to_phys(domain, addr);
> 			if (!phys_addr) {
> 				map_size += pg_size;
> 				continue;
> 			}
> 		}
> 		...

Oh yes, you're right.

> If you don't like this "if (addr < end)", then we have to add a "goto",
> like this:
> 
> 	for (addr = start; addr <= end; addr += pg_size) {
> 		phys_addr_t phys_addr;
> 
> 		if (addr == end)
> 			goto map_last;
> 
> 		phys_addr = iommu_iova_to_phys(domain, addr);
> 		if (!phys_addr) {
> 			map_size += pg_size;
> 			continue;
> 		}
> 
> map_last:
> 		if (!map_size)
> 			continue;
> 		ret = iommu_map(domain, addr - map_size,
> 				addr - map_size, map_size, entry->prot);

I think it's clearer to invert this as you had before:

	if (map_size)
		ret = iommu_map(...);

> Which one is better?

The second one is easier to read. I'll stop making suggestions now,
thanks.

Will
Re: [PATCH] iommu: Improve the performance for direct_mapping
On Tue, 2020-11-24 at 11:05 +, Will Deacon wrote:
> On Tue, Nov 24, 2020 at 05:24:44PM +0800, Yong Wu wrote:
> > On Mon, 2020-11-23 at 12:32 +, Will Deacon wrote:
> > > On Fri, Nov 20, 2020 at 05:06:28PM +0800, Yong Wu wrote:
> > > > +				unmapped_sz = 0;
> > > > +			}
> > > > +			start += pg_size;
> > > > +		}
> > > > +		if (unmapped_sz) {
> > > > +			ret = iommu_map(domain, start, start, unmapped_sz,
> > > > +					entry->prot);
> > > 
> > > Can you avoid this hunk by changing your loop check to something like:
> > > 
> > > 	if (!phys_addr) {
> > > 		map_size += pg_size;
> > > 		if (addr + pg_size < end)
> > > 			continue;
> > > 	}
> > 
> > Thanks for your quick review. I have fixed and tested it. the patch is
> > simple. I copy it here. Is this readable for you now?
> > 
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -737,6 +737,7 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> >  	/* We need to consider overlapping regions for different devices */
> >  	list_for_each_entry(entry, &mappings, list) {
> >  		dma_addr_t start, end, addr;
> > +		size_t map_size = 0;
> > 
> >  		if (domain->ops->apply_resv_region)
> >  			domain->ops->apply_resv_region(dev, domain, entry);
> > @@ -752,12 +753,21 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> >  			phys_addr_t phys_addr;
> > 
> >  			phys_addr = iommu_iova_to_phys(domain, addr);
> > -			if (phys_addr)
> > -				continue;
> > +			if (!phys_addr) {
> > +				map_size += pg_size;
> > +				if (addr + pg_size < end)
> > +					continue;
> > +				else
> 
> You don't need the 'else' here ^^^
> 
> > +					addr += pg_size; /*Point to End */
> 
> addr = end ?
> 
> That said, maybe we could simplify this further by changing the loop bounds
> to be:
> 
> 	for (addr = start; addr <= end; addr += pg_size)
> 
> and checking:
> 
> 	if (!phys_addr && addr != end) {
> 		map_size += pg_size;
> 		continue;
> 	}
> 
> does that work?

It works, but I think we cannot check iommu_iova_to_phys(domain, end).
We should add an "if", like:

	for (addr = start; addr <= end; addr += pg_size) {
		...
		if (addr < end) {
			phys_addr = iommu_iova_to_phys(domain, addr);
			if (!phys_addr) {
				map_size += pg_size;
				continue;
			}
		}
		...

If you don't like this "if (addr < end)", then we have to add a "goto",
like this:

	for (addr = start; addr <= end; addr += pg_size) {
		phys_addr_t phys_addr;

		if (addr == end)
			goto map_last;

		phys_addr = iommu_iova_to_phys(domain, addr);
		if (!phys_addr) {
			map_size += pg_size;
			continue;
		}

map_last:
		if (!map_size)
			continue;
		ret = iommu_map(domain, addr - map_size,
				addr - map_size, map_size, entry->prot);
		if (ret)
			goto out;
	}

Which one is better?

> Will
Re: [PATCH] iommu: Improve the performance for direct_mapping
On Tue, Nov 24, 2020 at 05:24:44PM +0800, Yong Wu wrote:
> On Mon, 2020-11-23 at 12:32 +, Will Deacon wrote:
> > On Fri, Nov 20, 2020 at 05:06:28PM +0800, Yong Wu wrote:
> > > +				unmapped_sz = 0;
> > > +			}
> > > +			start += pg_size;
> > > +		}
> > > +		if (unmapped_sz) {
> > > +			ret = iommu_map(domain, start, start, unmapped_sz,
> > > +					entry->prot);
> > 
> > Can you avoid this hunk by changing your loop check to something like:
> > 
> > 	if (!phys_addr) {
> > 		map_size += pg_size;
> > 		if (addr + pg_size < end)
> > 			continue;
> > 	}
> 
> Thanks for your quick review. I have fixed and tested it. the patch is
> simple. I copy it here. Is this readable for you now?
> 
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -737,6 +737,7 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>  	/* We need to consider overlapping regions for different devices */
>  	list_for_each_entry(entry, &mappings, list) {
>  		dma_addr_t start, end, addr;
> +		size_t map_size = 0;
> 
>  		if (domain->ops->apply_resv_region)
>  			domain->ops->apply_resv_region(dev, domain, entry);
> @@ -752,12 +753,21 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>  			phys_addr_t phys_addr;
> 
>  			phys_addr = iommu_iova_to_phys(domain, addr);
> -			if (phys_addr)
> -				continue;
> +			if (!phys_addr) {
> +				map_size += pg_size;
> +				if (addr + pg_size < end)
> +					continue;
> +				else

You don't need the 'else' here ^^^

> +					addr += pg_size; /*Point to End */

addr = end ?

That said, maybe we could simplify this further by changing the loop bounds
to be:

	for (addr = start; addr <= end; addr += pg_size)

and checking:

	if (!phys_addr && addr != end) {
		map_size += pg_size;
		continue;
	}

does that work?

Will
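The loop-bounds trick suggested above can be tried out in a small user-space model. Here is_mapped() and do_map() are hypothetical stand-ins for iommu_iova_to_phys() and iommu_map(), and the iteration deliberately runs one step past the last page (addr <= end, with end exclusive) so the final gathered run is flushed inside the loop rather than by duplicated code after it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PG_SIZE 4096UL

static int map_calls;

/* Hypothetical stand-in for iommu_iova_to_phys(): pages 2 and 3 are
 * already mapped, modeling an overlapping reserved region. */
static bool is_mapped(unsigned long addr)
{
	unsigned long pfn = addr / PG_SIZE;

	return pfn == 2 || pfn == 3;
}

/* Hypothetical stand-in for iommu_map(): just count the calls. */
static int do_map(unsigned long iova, size_t size)
{
	(void)iova;
	(void)size;
	map_calls++;
	return 0;
}

/* Identity-map [start, end) while skipping already-mapped pages,
 * gathering contiguous unmapped runs into single do_map() calls.
 * The sentinel iteration at addr == end flushes the last run. */
static int direct_map(unsigned long start, unsigned long end)
{
	size_t map_size = 0;
	unsigned long addr;

	for (addr = start; addr <= end; addr += PG_SIZE) {
		if (addr != end && !is_mapped(addr)) {
			map_size += PG_SIZE;
			continue;
		}
		if (map_size) {
			int ret = do_map(addr - map_size, map_size);

			if (ret)
				return ret;
			map_size = 0;
		}
	}
	return 0;
}
```

With six pages and pages 2-3 pre-mapped, the two unmapped runs (pages 0-1 and 4-5) each become one map call instead of four single-page calls.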
Re: [PATCH] iommu: Improve the performance for direct_mapping
On Mon, 2020-11-23 at 12:32 +, Will Deacon wrote:
> On Fri, Nov 20, 2020 at 05:06:28PM +0800, Yong Wu wrote:
> > Currently direct_mapping always use the smallest pgsize which is SZ_4K
> > normally to mapping. This is unnecessary. we could gather the size, and
> > call iommu_map then, iommu_map could decide how to map better with the
> > just right pgsize.
> > 
> > From the original comment, we should take care overlap, otherwise,
> > iommu_map may return -EEXIST. In this overlap case, we should map the
> > previous region before overlap firstly. then map the left part.
> > 
> > Each a iommu device will call this direct_mapping when its iommu
> > initialize, This patch is effective to improve the boot/initialization
> > time especially while it only needs level 1 mapping.
> > 
> > Signed-off-by: Anan Sun
> > Signed-off-by: Yong Wu
> > ---
> >  drivers/iommu/iommu.c | 20 ++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index df87c8e825f7..854a8fcb928d 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -737,6 +737,7 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> >  	/* We need to consider overlapping regions for different devices */
> >  	list_for_each_entry(entry, &mappings, list) {
> >  		dma_addr_t start, end, addr;
> > +		size_t unmapped_sz = 0;
> 
> I think "unmapped" is the wrong word here, as this variable actually
> represents the amount we want to map! I suggest "map_size" instead.
> 
> >  		if (domain->ops->apply_resv_region)
> >  			domain->ops->apply_resv_region(dev, domain, entry);
> > @@ -752,10 +753,25 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> >  			phys_addr_t phys_addr;
> > 
> >  			phys_addr = iommu_iova_to_phys(domain, addr);
> > -			if (phys_addr)
> > +			if (phys_addr == 0) {
> > +				unmapped_sz += pg_size; /* Gather the size. */
> >  				continue;
> > +			}
> > 
> > -			ret = iommu_map(domain, addr, addr, pg_size, entry->prot);
> > +			if (unmapped_sz) {
> > +				/* Map the region before the overlap. */
> > +				ret = iommu_map(domain, start, start,
> > +						unmapped_sz, entry->prot);
> > +				if (ret)
> > +					goto out;
> > +				start += unmapped_sz;
> 
> I think it's a bit confusing to update start like this. Can we call
> iommu_map(domain, addr - map_size, addr - map_size, map_size, entry->prot)
> instead?
> 
> > +				unmapped_sz = 0;
> > +			}
> > +			start += pg_size;
> > +		}
> > +		if (unmapped_sz) {
> > +			ret = iommu_map(domain, start, start, unmapped_sz,
> > +					entry->prot);
> 
> Can you avoid this hunk by changing your loop check to something like:
> 
> 	if (!phys_addr) {
> 		map_size += pg_size;
> 		if (addr + pg_size < end)
> 			continue;
> 	}

Thanks for your quick review. I have fixed and tested it. The patch is
simple; I copy it here. Is it readable for you now?

--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -737,6 +737,7 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
 	/* We need to consider overlapping regions for different devices */
 	list_for_each_entry(entry, &mappings, list) {
 		dma_addr_t start, end, addr;
+		size_t map_size = 0;

 		if (domain->ops->apply_resv_region)
 			domain->ops->apply_resv_region(dev, domain, entry);
@@ -752,12 +753,21 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
 			phys_addr_t phys_addr;

 			phys_addr = iommu_iova_to_phys(domain, addr);
-			if (phys_addr)
-				continue;
+			if (!phys_addr) {
+				map_size += pg_size;
+				if (addr + pg_size < end)
+					continue;
+				else
+					addr += pg_size; /*Point to End */
+			}

-			ret = iommu_map(domain, addr, addr, pg_size, entry->prot);
-			if (ret)
-				goto out;
+			if (map_size) {
+				ret = iommu_map(domain, addr - map_size, addr - map_size,
+						map_size, entry->prot);
Re: [PATCH] iommu: Improve the performance for direct_mapping
On Fri, Nov 20, 2020 at 05:06:28PM +0800, Yong Wu wrote:
> Currently direct_mapping always use the smallest pgsize which is SZ_4K
> normally to mapping. This is unnecessary. we could gather the size, and
> call iommu_map then, iommu_map could decide how to map better with the
> just right pgsize.
> 
> From the original comment, we should take care overlap, otherwise,
> iommu_map may return -EEXIST. In this overlap case, we should map the
> previous region before overlap firstly. then map the left part.
> 
> Each a iommu device will call this direct_mapping when its iommu
> initialize, This patch is effective to improve the boot/initialization
> time especially while it only needs level 1 mapping.
> 
> Signed-off-by: Anan Sun
> Signed-off-by: Yong Wu
> ---
>  drivers/iommu/iommu.c | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index df87c8e825f7..854a8fcb928d 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -737,6 +737,7 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>  	/* We need to consider overlapping regions for different devices */
>  	list_for_each_entry(entry, &mappings, list) {
>  		dma_addr_t start, end, addr;
> +		size_t unmapped_sz = 0;

I think "unmapped" is the wrong word here, as this variable actually
represents the amount we want to map! I suggest "map_size" instead.

>  		if (domain->ops->apply_resv_region)
>  			domain->ops->apply_resv_region(dev, domain, entry);
> @@ -752,10 +753,25 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>  			phys_addr_t phys_addr;
> 
>  			phys_addr = iommu_iova_to_phys(domain, addr);
> -			if (phys_addr)
> +			if (phys_addr == 0) {
> +				unmapped_sz += pg_size; /* Gather the size. */
>  				continue;
> +			}
> 
> -			ret = iommu_map(domain, addr, addr, pg_size, entry->prot);
> +			if (unmapped_sz) {
> +				/* Map the region before the overlap. */
> +				ret = iommu_map(domain, start, start,
> +						unmapped_sz, entry->prot);
> +				if (ret)
> +					goto out;
> +				start += unmapped_sz;

I think it's a bit confusing to update start like this. Can we call
iommu_map(domain, addr - map_size, addr - map_size, map_size, entry->prot)
instead?

> +				unmapped_sz = 0;
> +			}
> +			start += pg_size;
> +		}
> +		if (unmapped_sz) {
> +			ret = iommu_map(domain, start, start, unmapped_sz,
> +					entry->prot);

Can you avoid this hunk by changing your loop check to something like:

	if (!phys_addr) {
		map_size += pg_size;
		if (addr + pg_size < end)
			continue;
	}

Will
[PATCH] iommu: Improve the performance for direct_mapping
Currently direct_mapping always uses the smallest pgsize, which is
normally SZ_4K, for mapping. This is unnecessary: we could gather the
size and then call iommu_map, which can decide how to map better with
the just-right pgsize.

From the original comment, we should take care of overlap; otherwise
iommu_map may return -EEXIST. In this overlap case, we should first map
the region before the overlap, then map the remaining part.

Each iommu device calls this direct_mapping when its iommu initializes,
so this patch improves the boot/initialization time, especially when
only level-1 mapping is needed.

Signed-off-by: Anan Sun
Signed-off-by: Yong Wu
---
 drivers/iommu/iommu.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index df87c8e825f7..854a8fcb928d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -737,6 +737,7 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
 	/* We need to consider overlapping regions for different devices */
 	list_for_each_entry(entry, &mappings, list) {
 		dma_addr_t start, end, addr;
+		size_t unmapped_sz = 0;

 		if (domain->ops->apply_resv_region)
 			domain->ops->apply_resv_region(dev, domain, entry);
@@ -752,10 +753,25 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
 			phys_addr_t phys_addr;

 			phys_addr = iommu_iova_to_phys(domain, addr);
-			if (phys_addr)
+			if (phys_addr == 0) {
+				unmapped_sz += pg_size; /* Gather the size. */
 				continue;
+			}

-			ret = iommu_map(domain, addr, addr, pg_size, entry->prot);
+			if (unmapped_sz) {
+				/* Map the region before the overlap. */
+				ret = iommu_map(domain, start, start,
+						unmapped_sz, entry->prot);
+				if (ret)
+					goto out;
+				start += unmapped_sz;
+				unmapped_sz = 0;
+			}
+			start += pg_size;
+		}
+		if (unmapped_sz) {
+			ret = iommu_map(domain, start, start, unmapped_sz,
+					entry->prot);
 			if (ret)
 				goto out;
 		}
-- 
2.18.0
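For intuition about why gathering the size helps, here is a rough user-space sketch of how a larger contiguous span lets the mapping layer pick bigger page sizes. The bitmap and helper names are assumptions for illustration (loosely modeled on the core code's per-page-size splitting, not copied from it), and both iova and size are assumed to be multiples of 4K:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical supported-size bitmap: 4K, 2M and 1G pages. */
#define PGSIZE_BITMAP ((1UL << 12) | (1UL << 21) | (1UL << 30))

/* Pick the largest supported page size that fits both the remaining
 * length and the alignment of iova. */
static size_t pick_pgsize(unsigned long iova, size_t size)
{
	size_t best = 0;
	int bit;

	for (bit = 0; bit < 64; bit++) {
		size_t pgsize = 1UL << bit;

		if (!(PGSIZE_BITMAP & pgsize))
			continue;
		if (pgsize > size)
			break;		/* too big for what is left */
		if (iova & (pgsize - 1))
			break;		/* iova not aligned to this size */
		best = pgsize;
	}
	return best;
}

/* Count how many map operations a span of 'size' bytes at 'iova'
 * takes when each step uses the best available page size. Mapping
 * the gathered span in one call keeps this count small; mapping
 * page by page would force one operation per 4K page. */
static int count_map_ops(unsigned long iova, size_t size)
{
	int ops = 0;

	while (size) {
		size_t pgsize = pick_pgsize(iova, size);

		iova += pgsize;
		size -= pgsize;
		ops++;
	}
	return ops;
}
```

With this model, a 4M aligned span costs two 2M operations instead of 1024 4K ones, which is the boot-time saving the commit message describes.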