On 10/02/2014 12:41 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 30, 2014 at 09:49:54PM -0400, Peter Hurley wrote:
>> On 09/30/2014 07:45 PM, Thomas Gleixner wrote:
>>> Whether the proposed patchset is the correct solution to support it is
>>> a completely different question.
>>
>> This patchset has been in mainline since 3.16 and has already caused
>> regressions, so the question of whether this is the correct solution has
>> already been answered.
>>
>>> So either you stop this right now and help Akinobu to find the proper
>>> solution 
>>
>> If this is only a test platform for ARM parts then I don't think it
>> unreasonable to suggest forking x86 swiotlb support into a iommu=cma
> 
> Not sure what you mean by 'forking x86 swiotlb' ? As in have SWIOTLB
> work under ARM?

No, that's not what I meant.

>> selector that gets DMA mapping working for this test platform and doesn't
>> cause a bunch of breakage.
> 
> I think you might want to take a look at the IOMMU_DETECT macros
> and enable CMA there only if the certain devices are available.
> 
> That way the normal flow of detecting which IOMMU to use is still present
> and will turn of CMA if there is no device that would use it.
> 
>>
>> Which is different than if the plan is to ship production units for x86;
>> then a general purpose solution will be required.
>>
>> As to the good design of a general purpose solution for allocating and
>> mapping huge order pages, you are certainly more qualified to help Akinobu
>> than I am.

What Akinobu's patches intend to support is:

        phys_addr = dma_alloc_coherent(dev, 64 * 1024 * 1024, &bus_addr, 
GFP_KERNEL);

which raises three issues:

1. Where do coherent blocks of this size come from?
2. How to prevent fragmentation of these reserved blocks over time by
   existing DMA users?
3. Is this support generically required across all iommu implementations on x86?

Questions 1 and 2 are non-trivial, in the general case, otherwise the page
allocator would already do this. Simply dropping in the contiguous memory
allocator doesn't work because CMA does not have the same policy and performance
as the page allocator, and is already causing performance regressions even
in the absence of huge page allocations.

So that's why I raised question 3; is making the necessary compromises to 
support
64MB coherent DMA allocations across all x86 iommu implementations actually
required?

Prior to Akinobu's patches, the use of CMA by x86 iommu configurations was
designed to be limited to testing configurations, as the introductory
commit states:

commit 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6
Author: Marek Szyprowski <m.szyprow...@samsung.com>
Date:   Thu Dec 29 13:09:51 2011 +0100

    X86: integrate CMA with DMA-mapping subsystem
    
    This patch adds support for CMA to dma-mapping subsystem for x86
    architecture that uses common pci-dma/pci-nommu implementation. This
    allows to test CMA on KVM/QEMU and a lot of common x86 boxes.
    
    Signed-off-by: Marek Szyprowski <m.szyprow...@samsung.com>
    Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
    CC: Michal Nazarewicz <min...@mina86.com>
    Acked-by: Arnd Bergmann <a...@arndb.de>


Which brings me to my suggestion: if support for huge coherent DMA is
required only for a special test platform, then could not this support
be specific to a new iommu configuration, namely iommu=cma, which would
get initialized much the same way that iommu=calgary is now.

The code for such a iommu configuration would mostly duplicate
arch/x86/kernel/pci-swiotlb.c and the CMA support would get removed from
the other x86 iommu implementations.

Regards,
Peter Hurley
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to