On 27/02/17 22:00, Michael Ellerman wrote:
> Alexey Kardashevskiy <a...@ozlabs.ru> writes:
>
>> The IODA2 specification says that a 64-bit DMA address cannot use the top 4 bits
>> (3 are reserved and one is a "TVE select"); the bottom page_shift bits
>> cannot be used for multilevel table addressing either.
>>
>> The existing IODA2 table allocation code aligns the minimum TCE table
>> size to PAGE_SIZE, so in the case of 64K system pages and 4K IOMMU pages,
>> we have 64-4-12=48 bits. Since a 64K page stores 8192 TCEs, i.e. needs
>> 13 bits, the maximum number of levels is 48/13 = 3 (rounded down), so we
>> physically cannot address more and EEH happens on DMA accesses.
>>
>> This adds a check that too many levels were requested.
>>
>> It is still possible to have 5 levels in the case of 4K system page size.
>>
>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>> ---
>>
>> The alternative would be allocating TCE tables as big as PAGE_SIZE but
>> only using parts of them, but this would complicate the bits of code
>> responsible for the overall amount of memory used for the TCE table.
>>
>> Or kmem_cache_create() could be used to allocate TCE table levels exactly
>> as big as we really need, but that API does not seem to support NUMA nodes.
>
> kmem_cache_alloc_node() ?
Yeah, I discovered this later. Still, if a single level is used, the table is 4MB, and kmem_cache_alloc_node() does not seem to be the right tool for that (although I cannot find any enforced upper limit). So, to keep things simple, I decided to stick to alloc_pages_node() and avoid mixing memory allocation APIs.

--
Alexey