On 05.12.2016 20:37, Thierry Reding wrote: > On Thu, Nov 10, 2016 at 08:23:38PM +0200, Mikko Perttunen wrote: >> Add a new IO virtual memory allocation API to allow clients to >> allocate non-GEM memory in the Tegra DRM IOMMU domain. This is >> required e.g. for loading client firmware when clients are attached >> to the IOMMU domain. >> >> The allocator allocates contiguous physical pages that are then >> mapped contiguously to the IOMMU domain using the iova_domain >> library provided by the kernel. Contiguous physical pages are >> used so that the same allocator works also when IOMMU support >> is disabled and therefore devices access physical memory directly. >> >> Signed-off-by: Mikko Perttunen <mperttunen at nvidia.com> >> --- >> drivers/gpu/drm/tegra/drm.c | 98 >> ++++++++++++++++++++++++++++++++++++++++++--- >> drivers/gpu/drm/tegra/drm.h | 7 ++++ >> 2 files changed, 100 insertions(+), 5 deletions(-) > > This looks mostly good, just a few comments below... > >> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c >> index b8be3ee..ecfe7ba 100644 >> --- a/drivers/gpu/drm/tegra/drm.c >> +++ b/drivers/gpu/drm/tegra/drm.c >> @@ -1,12 +1,13 @@ >> /* >> * Copyright (C) 2012 Avionic Design GmbH >> - * Copyright (C) 2012-2013 NVIDIA CORPORATION. All rights reserved. >> + * Copyright (C) 2012-2015 NVIDIA CORPORATION. All rights reserved. > > It's almost 2017 now, I think this is out of date. =) > >> * >> * This program is free software; you can redistribute it and/or modify >> * it under the terms of the GNU General Public License version 2 as >> * published by the Free Software Foundation. >> */ >> >> +#include <linux/bitops.h> >> #include <linux/host1x.h> >> #include <linux/iommu.h> >> >> @@ -23,6 +24,8 @@ >> #define DRIVER_MINOR 0 >> #define DRIVER_PATCHLEVEL 0 >> >> +#define IOVA_AREA_SZ (1024 * 1024 * 64) /* 64 MiB */ > > SZ_64M?
Fixed. > >> + >> struct tegra_drm_file { >> struct list_head contexts; >> }; >> @@ -127,7 +130,8 @@ static int tegra_drm_load(struct drm_device *drm, >> unsigned long flags) >> >> if (iommu_present(&platform_bus_type)) { >> struct iommu_domain_geometry *geometry; >> - u64 start, end; >> + unsigned long order; >> + u64 iova_start, start, end; > > Can we be a little more explicit and keep an additional iova_end to > simplify the code a little? Good idea. I replaced this with gem_{start,end} and carveout_{start,end} and it looks nicer. > >> tegra->domain = iommu_domain_alloc(&platform_bus_type); >> if (!tegra->domain) { >> @@ -138,10 +142,17 @@ static int tegra_drm_load(struct drm_device *drm, >> unsigned long flags) >> geometry = &tegra->domain->geometry; >> start = geometry->aperture_start; >> end = geometry->aperture_end; >> + iova_start = end - IOVA_AREA_SZ + 1; >> + >> + order = __ffs(tegra->domain->pgsize_bitmap); >> + init_iova_domain(&tegra->iova, 1UL << order, >> + iova_start >> order, >> + end >> order); > > I hadn't seen this IOVA domain thing and it looks rather handy. Yes, definitely nicer than rolling by hand. Could be a bit easier still though :) Too many "<< order" everywhere. > >> >> - DRM_DEBUG_DRIVER("IOMMU aperture initialized (%#llx-%#llx)\n", >> - start, end); >> - drm_mm_init(&tegra->mm, start, end - start + 1); >> + drm_mm_init(&tegra->mm, start, iova_start - start); >> + >> + DRM_DEBUG("IOMMU apertures initialized (GEM: %#llx-%#llx, IOVA: >> %#llx-%#llx)\n", >> + start, iova_start-1, iova_start, end); > > Might be worth splitting this into two, or perhaps even three lines: > > IOMMU apertures: > GEM: %#llx-%#llx > IOVA: %#llx-%#llx > > Maybe also find a better name for IOVA, since GEM will be using I/O > virtual addresses as well. Perhaps just name it "carveout" or something > like that? Done and renamed to carveout. > >> } >> >> mutex_init(&tegra->clients_lock); >> @@ -208,6 +219,7 @@ static int tegra_drm_load(struct drm_device *drm, >> unsigned long flags) >> if (tegra->domain) { >> iommu_domain_free(tegra->domain); >> drm_mm_takedown(&tegra->mm); >> + put_iova_domain(&tegra->iova); >> } >> free: >> kfree(tegra); >> @@ -232,6 +244,7 @@ static int tegra_drm_unload(struct drm_device *drm) >> if (tegra->domain) { >> iommu_domain_free(tegra->domain); >> drm_mm_takedown(&tegra->mm); >> + put_iova_domain(&tegra->iova); >> } >> >> kfree(tegra); >> @@ -975,6 +988,81 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra, >> return 0; >> } >> >> +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size, >> + dma_addr_t *iova) > > Maybe name the iova parameter phys? iova implies that it's always > translated, whereas according to the code it can be physical, too. How about something like "dma"? I find the use of "phys" for device addresses slightly confusing since sometimes they are and sometimes not and it's hard to say at a glance if the code in question actually assumes that the addresses are physical or not. "dma" should convey the meaning of "address used by device" a bit better. > >> +{ >> + struct iova *alloc; >> + unsigned long shift; >> + void *virt; >> + gfp_t gfp; >> + int err; >> + >> + if (tegra->domain) >> + size = iova_align(&tegra->iova, size); >> + else >> + size = PAGE_ALIGN(size); >> + >> + gfp = GFP_KERNEL | __GFP_ZERO; >> + if (!tegra->domain) { >> + /* >> + * Tegra210 has 64-bit physical addresses but units only support >> + * 32-bit addresses, so if we don't have an IOMMU to do >> + * translation, force allocations to be in the 32-bit range. >> + */ >> + gfp |= GFP_DMA; > > Technically we do support Tegra132 and that already has 64-bit physical > addresses, so perhaps include that in the comment, or make it more > generic: > > /* > * Many units only support 32-bit addresses, even on 64-bit SoCs. > * If there is no IOMMU to translate into a 32-bit IO virtual address > * address space, force allocations to be in the lower 32-bit range. > */ > Will fix. >> + } >> + >> + virt = (void *)__get_free_pages(gfp, get_order(size)); >> + if (!virt) >> + return NULL; > > Perhaps we want to return an ERR_PTR()-encoded error code here so we can > differentiate between out-of-memory and out-of-virtual-address-space > conditions in the caller? Or perhaps other conditions as well. Sure. > > Also, is __get_free_pages() the right API here? I thought that it always > returned contiguous memory, which, in the presence of an IOMMU, will be > completely unnecessary. That is true. However, this API is intended only for a few small allocations is done by the kernel, so it shouldn't be that bad. And sources on the internet say that vmalloc should only be used when absolutely necessary :) > >> + >> + if (!tegra->domain) { >> + /* >> + * If IOMMU is disabled, devices address physical memory >> + * directly. >> + */ >> + *iova = virt_to_phys(virt); >> + return virt; >> + } >> + >> + shift = iova_shift(&tegra->iova); >> + alloc = alloc_iova(&tegra->iova, size >> shift, >> + tegra->domain->geometry.aperture_end >> shift, true); > > This is fairly cumbersome to use. Maybe a drm_mm would be easier in this > case? If not, could we at least store the upper limit of the carveout so > that we don't have to use the really long expression above? Stored shift and aperture_end >> shift. > > Thierry > Thanks, Mikko.