Hi Andrew!

On 2023-02-16T23:06:44+0100, I wrote:
> On 2023-02-16T16:17:32+0000, "Stubbs, Andrew via Gcc-patches" 
> <gcc-patches@gcc.gnu.org> wrote:
>> The mmap implementation was not optimized for a lot of small allocations, 
>> and I can't see that issue changing here
>
> That's correct, 'mmap' remains.  Under the hood, 'cuMemHostRegister' must
> surely also be doing some 'mlock'-like thing, so I figured it's best to
> feed page-boundary memory regions to it, which 'mmap' gets us.
>
>> so I don't know if this can be used for mlockall replacement.
>>
>> I had assumed that using the Cuda allocator would fix that limitation.
>
> From what I've read (but no first-hand experiments), there's non-trivial
> overhead with 'cuMemHostRegister' (just like with 'mlock'), so routing
> all small allocations individually through it probably isn't a good idea
> either.  Therefore, I suppose, we'll indeed want to use some local
> allocator if we want this "optimized for a lot of small allocations".
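
For reference, a minimal sketch (not the actual libgomp code; the
function name is made up, and a CUDA context is assumed to be current
on the calling thread) of that 'mmap' plus 'cuMemHostRegister' scheme:

    #include <cuda.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void *
    pin_via_mmap (size_t size)
    {
      /* 'mmap' returns page-aligned memory; round the size up to a
         multiple of the page size, too.  */
      size_t pagesize = sysconf (_SC_PAGESIZE);
      size = (size + pagesize - 1) & ~(pagesize - 1);
      void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED)
        return NULL;
      /* Page-lock the region; like 'mlock', this has per-call
         overhead, so routing every small allocation through here
         individually is costly.  */
      if (cuMemHostRegister (p, size, 0) != CUDA_SUCCESS)
        {
          munmap (p, size);
          return NULL;
        }
      return p;
    }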

Eh, I suppose your point indirectly was that instead of 'mmap' plus
'cuMemHostRegister' we ought to use 'cuMemAllocHost'/'cuMemHostAlloc', on
the assumption that those already implement such a local allocator.  Let
me quickly change that, then -- we currently have no need to use
'cuMemHostRegister' instead of 'cuMemAllocHost'/'cuMemHostAlloc'.
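
Again purely for illustration (hypothetical function name, current CUDA
context assumed), letting the CUDA Driver API hand out the page-locked
memory itself then simply looks like:

    static void *
    pin_via_cuda (size_t size)
    {
      void *p;
      /* 'cuMemAllocHost' returns page-locked host memory; presumably
         its internal allocator copes with many small requests.  */
      if (cuMemAllocHost (&p, size) != CUDA_SUCCESS)
        return NULL;
      /* Variant with flags: cuMemHostAlloc (&p, size,
         CU_MEMHOSTALLOC_PORTABLE); release with cuMemFreeHost (p).  */
      return p;
    }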

> And, getting rid of 'mlockall' is yet another topic.

Here, the need to use 'cuMemHostRegister' may come up again, as I began
to discuss with my "different idea" regarding "-foffload-memory=pinned",
<https://inbox.sourceware.org/gcc-patches/87sff9zl3u....@euler.schwinge.homeip.net>.
(Let's continue that discussion there.)


Regards
 Thomas