On 05.12.2016 20:37, Thierry Reding wrote:
> On Thu, Nov 10, 2016 at 08:23:38PM +0200, Mikko Perttunen wrote:
>> Add a new IO virtual memory allocation API to allow clients to
>> allocate non-GEM memory in the Tegra DRM IOMMU domain. This is
>> required e.g. for loading client firmware when clients are attached
>> to the IOMMU domain.
>>
>> The allocator allocates contiguous physical pages that are then
>> mapped contiguously to the IOMMU domain using the iova_domain
>> library provided by the kernel. Contiguous physical pages are
>> used so that the same allocator works also when IOMMU support
>> is disabled and therefore devices access physical memory directly.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen at nvidia.com>
>> ---
>>  drivers/gpu/drm/tegra/drm.c | 98 
>> ++++++++++++++++++++++++++++++++++++++++++---
>>  drivers/gpu/drm/tegra/drm.h |  7 ++++
>>  2 files changed, 100 insertions(+), 5 deletions(-)
>
> This looks mostly good, just a few comments below...
>
>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>> index b8be3ee..ecfe7ba 100644
>> --- a/drivers/gpu/drm/tegra/drm.c
>> +++ b/drivers/gpu/drm/tegra/drm.c
>> @@ -1,12 +1,13 @@
>>  /*
>>   * Copyright (C) 2012 Avionic Design GmbH
>> - * Copyright (C) 2012-2013 NVIDIA CORPORATION.  All rights reserved.
>> + * Copyright (C) 2012-2015 NVIDIA CORPORATION.  All rights reserved.
>
> It's almost 2017 now, I think this is out of date. =)
>
>>   *
>>   * This program is free software; you can redistribute it and/or modify
>>   * it under the terms of the GNU General Public License version 2 as
>>   * published by the Free Software Foundation.
>>   */
>>
>> +#include <linux/bitops.h>
>>  #include <linux/host1x.h>
>>  #include <linux/iommu.h>
>>
>> @@ -23,6 +24,8 @@
>>  #define DRIVER_MINOR 0
>>  #define DRIVER_PATCHLEVEL 0
>>
>> +#define IOVA_AREA_SZ (1024 * 1024 * 64) /* 64 MiB */
>
> SZ_64M?

Fixed.

>
>> +
>>  struct tegra_drm_file {
>>      struct list_head contexts;
>>  };
>> @@ -127,7 +130,8 @@ static int tegra_drm_load(struct drm_device *drm, 
>> unsigned long flags)
>>
>>      if (iommu_present(&platform_bus_type)) {
>>              struct iommu_domain_geometry *geometry;
>> -            u64 start, end;
>> +            unsigned long order;
>> +            u64 iova_start, start, end;
>
> Can we be a little more explicit and keep an additional iova_end to
> simplify the code a little?

Good idea. I replaced this with gem_{start,end} and carveout_{start,end} 
and it looks nicer.

>
>>              tegra->domain = iommu_domain_alloc(&platform_bus_type);
>>              if (!tegra->domain) {
>> @@ -138,10 +142,17 @@ static int tegra_drm_load(struct drm_device *drm, 
>> unsigned long flags)
>>              geometry = &tegra->domain->geometry;
>>              start = geometry->aperture_start;
>>              end = geometry->aperture_end;
>> +            iova_start = end - IOVA_AREA_SZ + 1;
>> +
>> +            order = __ffs(tegra->domain->pgsize_bitmap);
>> +            init_iova_domain(&tegra->iova, 1UL << order,
>> +                             iova_start >> order,
>> +                             end >> order);
>
> I hadn't seen this IOVA domain thing and it looks rather handy.

Yes, definitely nicer than rolling by hand. Could be a bit easier still 
though :) Too many "<< order" everywhere.

>
>>
>> -            DRM_DEBUG_DRIVER("IOMMU aperture initialized (%#llx-%#llx)\n",
>> -                             start, end);
>> -            drm_mm_init(&tegra->mm, start, end - start + 1);
>> +            drm_mm_init(&tegra->mm, start, iova_start - start);
>> +
>> +            DRM_DEBUG("IOMMU apertures initialized (GEM: %#llx-%#llx, IOVA: 
>> %#llx-%#llx)\n",
>> +                      start, iova_start-1, iova_start, end);
>
> Might be worth splitting this into two, or perhaps even three lines:
>
>       IOMMU apertures:
>         GEM: %#llx-%#llx
>         IOVA: %#llx-%#llx
>
> Maybe also find a better name for IOVA, since GEM will be using I/O
> virtual addresses as well. Perhaps just name it "carveout" or something
> like that?

Done and renamed to carveout.

>
>>      }
>>
>>      mutex_init(&tegra->clients_lock);
>> @@ -208,6 +219,7 @@ static int tegra_drm_load(struct drm_device *drm, 
>> unsigned long flags)
>>      if (tegra->domain) {
>>              iommu_domain_free(tegra->domain);
>>              drm_mm_takedown(&tegra->mm);
>> +            put_iova_domain(&tegra->iova);
>>      }
>>  free:
>>      kfree(tegra);
>> @@ -232,6 +244,7 @@ static int tegra_drm_unload(struct drm_device *drm)
>>      if (tegra->domain) {
>>              iommu_domain_free(tegra->domain);
>>              drm_mm_takedown(&tegra->mm);
>> +            put_iova_domain(&tegra->iova);
>>      }
>>
>>      kfree(tegra);
>> @@ -975,6 +988,81 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>>      return 0;
>>  }
>>
>> +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size,
>> +                          dma_addr_t *iova)
>
> Maybe name the iova parameter phys? iova implies that it's always
> translated, whereas according to the code it can be physical, too.

How about something like "dma"? I find the use of "phys" for device 
addresses slightly confusing since sometimes they are and sometimes not 
and it's hard to say at a glance if the code in question actually 
assumes that the addresses are physical or not. "dma" should convey the 
meaning of "address used by device" a bit better.

>
>> +{
>> +    struct iova *alloc;
>> +    unsigned long shift;
>> +    void *virt;
>> +    gfp_t gfp;
>> +    int err;
>> +
>> +    if (tegra->domain)
>> +            size = iova_align(&tegra->iova, size);
>> +    else
>> +            size = PAGE_ALIGN(size);
>> +
>> +    gfp = GFP_KERNEL | __GFP_ZERO;
>> +    if (!tegra->domain) {
>> +            /*
>> +             * Tegra210 has 64-bit physical addresses but units only support
>> +             * 32-bit addresses, so if we don't have an IOMMU to do
>> +             * translation, force allocations to be in the 32-bit range.
>> +             */
>> +            gfp |= GFP_DMA;
>
> Technically we do support Tegra132 and that already has 64-bit physical
> addresses, so perhaps include that in the comment, or make it more
> generic:
>
>       /*
>        * Many units only support 32-bit addresses, even on 64-bit SoCs.
>        * If there is no IOMMU to translate into a 32-bit IO virtual address
>        * address space, force allocations to be in the lower 32-bit range.
>        */
>

Will fix.

>> +    }
>> +
>> +    virt = (void *)__get_free_pages(gfp, get_order(size));
>> +    if (!virt)
>> +            return NULL;
>
> Perhaps we want to return an ERR_PTR()-encoded error code here so we can
> differentiate between out-of-memory and out-of-virtual-address-space
> conditions in the caller? Or perhaps other conditions as well.

Sure.

>
> Also, is __get_free_pages() the right API here? I thought that it always
> returned contiguous memory, which, in the presence of an IOMMU, will be
> completely unnecessary.

That is true. However, this API is intended only for a few small 
allocations is done by the kernel, so it shouldn't be that bad. And 
sources on the internet say that vmalloc should only be used when 
absolutely necessary :)

>
>> +
>> +    if (!tegra->domain) {
>> +            /*
>> +             * If IOMMU is disabled, devices address physical memory
>> +             * directly.
>> +             */
>> +            *iova = virt_to_phys(virt);
>> +            return virt;
>> +    }
>> +
>> +    shift = iova_shift(&tegra->iova);
>> +    alloc = alloc_iova(&tegra->iova, size >> shift,
>> +                       tegra->domain->geometry.aperture_end >> shift, true);
>
> This is fairly cumbersome to use. Maybe a drm_mm would be easier in this
> case? If not, could we at least store the upper limit of the carveout so
> that we don't have to use the really long expression above?

Stored shift and aperture_end >> shift.

>
> Thierry
>

Thanks,
Mikko.

Reply via email to