mmap implementation advice needed.
Hi tech-kern,

I'm trying to solve PR#28379 and ran into a problem and I don't really
understand how it is supposed to work: if a process tries to mmap, for
example, a file with a length of just over 1GB, it will always succeed
as I understand the code, but that may not be true depending on the
underlying hardware, and I cannot find any way to control this from the
MD code...?

On vax, for example, large mmap's cannot be done due to hardware
constraints.  In the above example the mmap() will succeed, but when
touching the pages the process will hang forever since there will never
be pte's available.

So, any advice on how the max size of allowed mmap'able memory can be
controlled?

Notes about vax memory management, if someone is wondering:
- 2 areas (P0 and P1) of size 1G each; P0 grows from the bottom, P1
  grows from the top (intended for the stack).
- The PTEs for KVM must be in contiguous physical memory, hence the
  allocation for one process with all of P0 and P1 mapped takes 128k.
- Vax uses VM_MAP_TOPDOWN so that not too much KVM space is needed
  for mmap.

-- Ragge
Re: mmap implementation advice needed.
In article <1ce8eac5-3639-aec1-0e0c-fe857f49b...@ludd.ltu.se>,
Anders Magnusson wrote:
>Hi tech-kern,
>
>I'm trying to solve PR#28379 and ran into a problem and I don't really
>understand how it is supposed to work:
>If a process tries to mmap for example a file with a length of just over
>1GB it will always succeed as I understand the code, but that may not be
>true depending on the underlying hardware, and I cannot find any way to
>control this from the MD code...?
>
>On vax, for example, large mmap's cannot be done due to hardware
>constraints.
>In the above example it will cause the mmap() to succeed, but when
>touching the pages it will hang forever since there will never be
>available pte's.
>
>So, any advice how a max size of allowed mmap'able memory be controlled?
>
>Notes about vax memory management if someone is wondering:
>- 2 areas (P0 and P1) of size 1G each, P0 grows from bottom, P1 grows
>from top (intended for stack).
>- The PTEs for KVM must be in contiguous physical memory, hence the
>allocation for one process with all of P0 and P1 mapped takes 128k.
>- Vax uses VM_MAP_TOPDOWN so that not too much of KVM space is needed
>for mmap.

Perhaps we should add a resource limit for contiguous memory
allocations.  RLIMIT_MEMCONT?  The actual value can be MD.

christos
Re: mmap implementation advice needed.
Den 2018-03-30 kl. 16:46, skrev Christos Zoulas:
> In article <1ce8eac5-3639-aec1-0e0c-fe857f49b...@ludd.ltu.se>,
> Anders Magnusson wrote:
>> Hi tech-kern,
>>
>> I'm trying to solve PR#28379 and ran into a problem and I don't really
>> understand how it is supposed to work:
>> If a process tries to mmap for example a file with a length of just over
>> 1GB it will always succeed as I understand the code, but that may not be
>> true depending on the underlying hardware, and I cannot find any way to
>> control this from the MD code...?
>>
>> On vax, for example, large mmap's cannot be done due to hardware
>> constraints.
>> In the above example it will cause the mmap() to succeed, but when
>> touching the pages it will hang forever since there will never be
>> available pte's.
>>
>> So, any advice how a max size of allowed mmap'able memory be controlled?
>>
>> Notes about vax memory management if someone is wondering:
>> - 2 areas (P0 and P1) of size 1G each, P0 grows from bottom, P1 grows
>> from top (intended for stack).
>> - The PTEs for KVM must be in contiguous physical memory, hence the
>> allocation for one process with all of P0 and P1 mapped takes 128k.
>> - Vax uses VM_MAP_TOPDOWN so that not too much of KVM space is needed
>> for mmap.
>
> Perhaps we should add a resource limit for contiguous memory
> allocations.  RLIMIT_MEMCONT?  The actual value can be MD.

That will not solve the problem; just do two mmap's and we are at the
same spot again.  The problem is that too much virtual memory can be
allocated.  A resource limit for mmap in total would solve the problem,
though.

-- Ragge
Re: mmap implementation advice needed.
On Fri, Mar 30, 2018 at 11:33:48AM +0200, Anders Magnusson wrote:
> Notes about vax memory management if someone is wondering:
> - 2 areas (P0 and P1) of size 1G each, P0 grows from bottom, P1 grows
> from top (intended for stack).

AFAICT, VAX uses a max userland address of 2G, so what exactly is the
problem?  That you can't allocate enough contiguous memory for the
PTEs?

Joerg
Re: mmap implementation advice needed.
>> Notes about vax memory management if someone is wondering:
>> - 2 areas (P0 and P1) of size 1G each, P0 grows from bottom, P1
>> grows from top (intended for stack).

(Plus, though only slightly relevant to userland mmap, system area (1G)
and reserved area (1G).  P0 space is 0x00000000-0x3fffffff, P1 is
0x40000000-0x7fffffff, system is 0x80000000-0xbfffffff, and reserved is
0xc0000000-0xffffffff.)

> AFAICT, VAX uses a max userland address of 2G, so what exactly is the
> problem?  That you can't allocate enough contiguous memory for the
> PTEs?

Yes.

It takes 4 bytes of PTE to map 512 bytes of VA.  (The VAX uses the
small, by today's standards, page size of 512 bytes.)  So 2G of
userland space requires 16M of PTEs.  Those PTEs must be in system
virtual space.  And that 16M of system virtual space requires 128K of
PTEs to map, and _those_ PTEs require contiguous physical space.

There is a VAX variant, the rtVAX, where P0 and P1 space PTEs are not
in system virtual space, but there the situation is even worse - they
are in physical space, so 2G of userland VM requires 16M, not 128K, of
contiguous physical space.  (As far as I can recall, NetBSD doesn't
support rtVAXen, but I don't recall looking in detail.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: mmap implementation advice needed.
On Fri, Mar 30, 2018 at 01:10:37PM -0400, Mouse wrote:
> It takes 4 bytes of PTE to map 512 bytes of VA.  (The VAX uses the
> small, by today's standards, page size of 512 bytes.)  So 2G of
> userland space requires 16M of PTEs.  Those PTEs must be in system
> virtual space.  And that 16M of system virtual space requires 128K of
> PTEs to map, and _those_ PTEs require contiguous physical space.

Let me try to rephrase that: the first level page table of VAX needs up
to 128K (for each of the two ranges?) as contiguous physical space.
The second level page table needs 16M in some block size, but they
don't all need to be contiguous?

Joerg
re: mmap implementation advice needed.
> A resource limit for mmap in total would solve the problem though.

RLIMIT_AS?

you'll have to add support to set it in MD code, but eg. these
lines should help.

465:uvm_init_limits(struct proc *p)
[..]
479:	p->p_rlimit[RLIMIT_AS].rlim_cur = RLIM_INFINITY;
480:	p->p_rlimit[RLIMIT_AS].rlim_max = RLIM_INFINITY;

.mrg.
Re: mmap implementation advice needed.
>> It takes 4 bytes of PTE to map 512 bytes of VA.  (The VAX uses the
>> small, by today's standards, page size of 512 bytes.)  So 2G of
>> userland space requires 16M of PTEs.  Those PTEs must be in system
>> virtual space.  And that 16M of system virtual space requires 128K
>> of PTEs to map, and _those_ PTEs require contiguous physical space.

> Let me try to rephrase that:

> The first level page table of VAX needs up to 128K (for each of the
> two ranges?) as contiguous physical space.

Not quite.  "[F]irst level" is correct from one perspective (two PTE
lookups are potentially necessary to resolve a userland address to a
physical address) but incorrect from another (the "second" level is
entirely implicit, implicit in P0/P1 PTEs living in system space).

And I (and ragge, I think it was) misspoke.  It doesn't quite require
128K of contiguous physical space.  It needs two 64K blocks of
physically contiguous space, both within the block that maps system
space.  (Nothing says that P0 PTEs have to be anywhere near P1 PTEs in
system virtual space, but they do have to be within system space.)

> The second level page table needs 16M in some block size, but they
> don't need to be all contigously?

Not quite.  Userland conceptually has just a single level of PTEs;
it's just that those PTEs live not in physical space but in system
("kernel") virtual space.  The resulting system->physical lookups are
not based directly on the userland address, the way they would be with
a typical two-level page table system (as I understand them), but
rather they are ordinary system->physical lookups based on the PTE's
(system) virtual address.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: mmap implementation advice needed.
On Fri, Mar 30, 2018 at 04:22:29PM -0400, Mouse wrote:
> And I (and ragge, I think it was) misspoke.  It doesn't quite require
> 128K of contiguous physical space.  It needs two 64K blocks of
> physically contiguous space, both within the block that maps system
> space.  (Nothing says that P0 PTEs have to be anywhere near P1 PTEs in
> system virtual space, but they do have to be within system space.)

...and the problem to be solved here is that the memory has become
fragmented enough that you can't find 64KB of contiguous pages?  If so,
what about having a fixed set of emergency reservations and copying the
non-contiguous pmap content into that during context switch?

Joerg
Re: mmap implementation advice needed.
Den 2018-03-30 kl. 20:43, skrev matthew green:
>> A resource limit for mmap in total would solve the problem though.
>
> RLIMIT_AS?
>
> you'll have to add support to set it in MD code, but eg. these
> lines should help.
>
> 465:uvm_init_limits(struct proc *p)
> [..]
> 479:	p->p_rlimit[RLIMIT_AS].rlim_cur = RLIM_INFINITY;
> 480:	p->p_rlimit[RLIMIT_AS].rlim_max = RLIM_INFINITY;

Thanks!  This was exactly what I wanted!  Problem solved!

Hm, why didn't I see it myself when looking? :-)

-- R
Re: mmap implementation advice needed.
Den 2018-03-30 kl. 22:31, skrev Joerg Sonnenberger:
> On Fri, Mar 30, 2018 at 04:22:29PM -0400, Mouse wrote:
>> And I (and ragge, I think it was) misspoke.  It doesn't quite require
>> 128K of contiguous physical space.  It needs two 64K blocks of
>> physically contiguous space, both within the block that maps system
>> space.  (Nothing says that P0 PTEs have to be anywhere near P1 PTEs in
>> system virtual space, but they do have to be within system space.)
>
> ...and the problem to be solved here is that the memory has become
> fragmented enough that you can't find 64KB of contiguous pages?  If so,
> what about having a fixed set of emergency reservations and copying the
> non-contiguous pmap content into that during context switch?

It's not only contiguous memory that is the problem; the memory must
also be in the system page table, whose place and size are determined
at boot.  The usrptmap should (in an ideal world) be sized depending on
available user memory and maxusers.  Until then, we'll live with these
limits.  That is not a problem on vax; I only want to avoid unexpected
hangs and crashes.

-- Ragge
Re: mmap implementation advice needed.
On 2018-03-30 22:31, Joerg Sonnenberger wrote:
> On Fri, Mar 30, 2018 at 04:22:29PM -0400, Mouse wrote:
>> And I (and ragge, I think it was) misspoke.  It doesn't quite require
>> 128K of contiguous physical space.  It needs two 64K blocks of
>> physically contiguous space, both within the block that maps system
>> space.  (Nothing says that P0 PTEs have to be anywhere near P1 PTEs in
>> system virtual space, but they do have to be within system space.)
>
> ...and the problem to be solved here is that the memory has become
> fragmented enough that you can't find 64KB of contiguous pages?  If so,
> what about having a fixed set of emergency reservations and copying the
> non-contiguous pmap content into that during context switch?

I don't think that was Ragge's problem.  The problem was/is (if I
understood it right) that someone can mmap more than 1G, and that will
never be possible to map on the VAX.  The P0 space is only 1G, and the
same is true for the P1 space.  But P0 and P1 are disjoint, so don't
try to think of them as contiguous space.  So anything trying to grab
more than 1G will never be possible.  But it would appear that the MI
part doesn't give any hooks to stop a process from doing just that.

Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: b...@softjar.se            ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
Re: mmap implementation advice needed.
(Should this be moved to somewhere VAX-specific, maybe?)

> The problem was/is (if I understood it right), that someone can mmap
> more than 1G, and that will never be possible to map on the VAX.

Well...close.

> The P0 space is only 1G, and the same is true for the P1 space.  But
> P0 and P1 is disjunct, so don't try to think of them as contiguous
> space.

But they are contiguous: the top of P0 space abuts the bottom of P1
space.  The problem is, P0 grows up and P1 grows down, so taking
advantage of that contiguity means both spaces must be fully existent
(in the sense that the P0 and P1 length registers must indicate that
all possible pages exist).  That means two 8M chunks of system virtual
space to contain all those PTEs.  And _that_ means two 64K chunks out
of the (physically contiguous) block holding system PTEs - which means
either moving the system page tables live or allocating over 128K for
system PTEs at boot, each of which has its own problems.  Well over
128K, if you want to quickly switch among multiple such processes.

And, even then, you'd still be limited to 2G.  In principle, this could
be raised to a little under 3G; nothing says system pages can't be
accessible to user mode.  But it would greatly complicate - and slow
down - task switching, because there is no hardware assist for
switching the system-space portion.  ("A little under" 3G because you'd
need 16M of system virtual space for P0/P1 page tables, plus at least
one page for trap handler entry points.  Unless you want to get really
fancy, you'll also need space for kernel text/data/bss.)

If you just want more than 1G mapped, but mapped in small chunks, then
you don't need to use the point where P0 abuts P1.  But you will still
need (approximately) 1/16384th as much physical space as your total
user space, for system PTEs to map user PTEs.

> So, anything trying to grab more than 1G will never be possible.

Probably.  Theoretically possible, but unlikely to ever be implemented.

> But it would appear that the MI part don't give any hooks to stop a
> process from going just that.

Isn't the VAX pmap code in a position to reject attempts to allocate
virtual space it doesn't support?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B