On 06/03/17 10:03, Benjamin Herrenschmidt wrote:
> On Mon, 2017-02-27 at 22:00 +1100, Michael Ellerman wrote:
>>> The alternative would be allocating TCE tables as big as PAGE_SIZE but
>>> only using parts of it but this would complicate a bit bits of code
>>> responsible for overall amount of memory used for TCE table.
>>>
>>> Or kmem_cache_create() could be used to allocate as big TCE table levels
>>> as we really need but that API does not seem to support NUMA nodes.
>>
>> kmem_cache_alloc_node() ?
>
> Is that 55 bits of address space (ie, 3 indirect levels + 64k pages) ?
> Or only 39 (2 indirect level + 64k pages) ?

39, yes.

> In the former case, I'm happy to limit the levels to 3 for 64K pages,
> 55 bits of TCE space is more than enough. 39 isn't however.

8192*8192*8192*65536>>40 = 32768TB of addressable memory (but there is
no good reason not to use huge pages);

8192*8192*8192*4096>>40 = 2048TB of addressable memory (even with 2
indirect levels, but we can have all 5 levels with 4K IOMMU pages).

Looks enough to me...

And in this particular patch I am not limiting anything, I just replace
an already existing EEH condition with -EINVAL. If it is this important
to have all 5 levels, then we can switch from alloc_pages_node() to
kmem_cache_alloc_node(), in a separate patch.

--
Alexey
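
P.S. For anyone who wants to re-check the arithmetic above, here is a
minimal userspace sketch (not kernel code). The helper names are
hypothetical and only illustrate the calculation; it assumes every table
level holds the same power-of-two number of entries (8192 in the numbers
above) and that the IOMMU page size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power-of-two value (hypothetical helper, illustration only) */
static unsigned int log2_pow2(uint64_t v)
{
	unsigned int n = 0;

	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/*
 * Number of address bits covered by `levels` table levels with
 * `entries_per_level` entries each, mapping pages of `iommu_page_size`
 * bytes.
 */
static unsigned int tce_addr_bits(unsigned int levels,
				  uint64_t entries_per_level,
				  uint64_t iommu_page_size)
{
	return levels * log2_pow2(entries_per_level) +
	       log2_pow2(iommu_page_size);
}

/* Same quantity expressed in TB, i.e. the ">>40" in the numbers above */
static uint64_t tce_addr_tb(unsigned int levels,
			    uint64_t entries_per_level,
			    uint64_t iommu_page_size)
{
	unsigned int bits = tce_addr_bits(levels, entries_per_level,
					  iommu_page_size);

	assert(bits >= 40 && bits < 64);
	return 1ULL << (bits - 40);
}
```

Calling tce_addr_tb(3, 8192, 65536) and tce_addr_tb(3, 8192, 4096)
reproduces the two figures quoted above.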