Re: remap_file_pages() use
Jeff Smith wrote:
> I've got no real issues at this point, but could you perhaps elaborate
> a bit on the rough order of magnitude of "long" time

We have a never-break-ABI policy in the kernel. In practice it means we don't remove an interface if somebody could notice that it has disappeared. Most likely it will go through an intermediate step with a separate kernel config option, like uselib(2) recently.

Don't worry. Just convert your code to avoid remap_file_pages() where possible.

> and what cases would be slower?

With emulation, each remap_file_pages() call will usually create one or two additional kernel structures (VMAs), each representing part of the virtual address space of the process. The kernel often needs to look up such a structure by virtual address, and this operation can be slower with a high number of structures.

I've tried to test how bad it would be in a near-worst case: 4G mapped in reverse page order. It creates 1 million structures. On my machine, faulting in all this memory is ~1.9 times slower with emulation compared to the original remap_file_pages(2).

In practice, nobody will notice, I believe.

--
Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
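The reverse-page-order layout described in this message can be approximated from userspace with plain mmap() calls, which is roughly what the emulation has to do on the caller's behalf. The following is a minimal sketch, not the kernel's implementation: map_reversed() and check_reversed() are hypothetical helper names, and the temporary file from tmpfile() merely stands in for whatever file or shared memory object a real program would use. Because neighbouring pages are not contiguous in the file, each page should end up in its own mapping, which is where the large number of structures comes from.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the first npages pages of fd so that virtual page i shows file
 * page (npages - 1 - i). Returns the base address or NULL on failure.
 * map_reversed() is a hypothetical helper, not a kernel or libc API.
 */
static unsigned char *map_reversed(int fd, size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;

    /* One linear mapping first, so the whole range belongs to us. */
    unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return NULL;

    for (size_t i = 0; i < npages; i++) {
        off_t off = (off_t)((npages - 1 - i) * page);
        /* MAP_FIXED replaces whatever was mapped at this address. */
        if (mmap(base + i * page, page, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd, off) == MAP_FAILED) {
            munmap(base, len);
            return NULL;
        }
    }
    return base;
}

/* Returns 0 if every virtual page shows the expected file page. */
static int check_reversed(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    FILE *tmp = tmpfile();              /* unlinked temporary file */
    if (!tmp)
        return -1;
    int fd = fileno(tmp);
    if (ftruncate(fd, (off_t)(npages * page)) != 0) {
        fclose(tmp);
        return -1;
    }
    /* Tag the first byte of file page p with the value p. */
    for (size_t p = 0; p < npages; p++) {
        unsigned char tag = (unsigned char)p;
        if (pwrite(fd, &tag, 1, (off_t)(p * page)) != 1) {
            fclose(tmp);
            return -1;
        }
    }
    unsigned char *base = map_reversed(fd, npages);
    int rc = base ? 0 : -1;
    for (size_t i = 0; base && i < npages; i++)
        if (base[i * page] != (unsigned char)(npages - 1 - i))
            rc = -1;
    if (base)
        munmap(base, npages * page);
    fclose(tmp);
    return rc;
}
```

Calling check_reversed(8) builds an 8-page reversed view and verifies it; scaling npages up towards the 4G case from the message would also require a sufficiently high vm.max_map_count.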
Re: remap_file_pages() use
I've got no real issues at this point, but could you perhaps elaborate a bit on the rough order of magnitude of "long" time and what cases would be slower? I'm pretty sure that some places I've worked that still have some remap_file_pages() logic in place aren't too religious about checking dmesg.

--Jeff

On Mon, May 26, 2014 at 8:47 AM, Kirill A. Shutemov wrote:
> Jeff Smith wrote:
>> OK, I misinterpreted "the overlapped part of the mapping(s) will be
>> discarded" as discarding the -new- mappings. My objections about
>> needing a replacement for remap_file_pages() are gone, but my concerns
>> about existing code still remain.
>
> As I said, emulation will be there for a long time, with a warning in dmesg.
> The emulation is interface-compatible, but slower in some cases (not
> yours).
>
> --
> Kirill A. Shutemov
Re: remap_file_pages() use
Jeff Smith wrote:
> OK, I misinterpreted "the overlapped part of the mapping(s) will be
> discarded" as discarding the -new- mappings. My objections about
> needing a replacement for remap_file_pages() are gone, but my concerns
> about existing code still remain.

As I said, emulation will be there for a long time, with a warning in dmesg. The emulation is interface-compatible, but slower in some cases (not yours).

--
Kirill A. Shutemov
Re: remap_file_pages() use
OK, I misinterpreted "the overlapped part of the mapping(s) will be discarded" as discarding the -new- mappings. My objections about needing a replacement for remap_file_pages() are gone, but my concerns about existing code still remain.

--Jeff

On Mon, May 26, 2014 at 8:35 AM, Paolo Bonzini wrote:
> On 26/05/2014 15:24, Jeff Smith wrote:
>
>> Your addr2 mmap() call is a bit incorrect semantically and
>> syntactically (you skipped the length arg). The addr2 request will
>> fail because mmap() does not implicitly munmap() occupied virtual
>> address space.
>
> With MAP_FIXED it does. It is in the man page.
>
> Paolo
>
>> Even if you did that, the following still has a race
>> condition between the addr2 request and another thread grabbing the
>> same virtual space, which nothing short of a lock on all threads'
>> mmap()-ing logic can protect:
Re: remap_file_pages() use
Jeff Smith wrote:
> >> Mirrored mapping is absolutely required by several
> >> independent proprietary platforms I'm aware of, and remap_file_pages()
> >> has historically been the only sane way to accomplish this. (i.e.,
> >> shm_open(), mmap(NULL, 2^(n+1) pages), remap_file_pages() on 2nd
> >> half).
> >
> > Em.. What's wrong with shm_open() + two mmap()s to cover both halves?
> >
> > fd = shm_open();
> > addr1 = mmap(NULL, 2*SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> > addr2 = mmap(addr1 + SIZE, PROT_READ|PROT_WRITE, MAP_SHARED | MAP_FIXED,
> >              fd, 0);
> >
> > Is there a reason why it doesn't work?
>
> Your addr2 mmap() call is a bit incorrect semantically and
> syntactically (you skipped the length arg).

My bad.

> The addr2 request will fail because mmap() does not implicitly munmap()
> occupied virtual address space.

Please consider reading the man page for mmap(2), MAP_FIXED in particular.

> Even if you did that, the following still has a race
> condition between the addr2 request and another thread grabbing the
> same virtual space, which nothing short of a lock on all threads'
> mmap()-ing logic can protect:
>
> addr1 = mmap(NULL, 2*SIZE, PROT_READ, MAP_SHARED, fd, 0);
> munmap(addr1 + SIZE, SIZE);
> /* race on virtual address space here, but n/a for remap_file_pages() ... */
> addr2 = mmap(addr1, SIZE, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0);

No. MAP_FIXED will do the job: it does munmap() + mmap() atomically from the userspace POV.

> >> but failing that, a reservation API would need
> >> to be created (possibly a MAP_RESERVE flag) that would set aside a
> >> region that could only be subsequently mapped via explicit
> >> address-requesting mmap() calls.
> >
> > I don't get this part.
>
> I'm proposing that a call along the lines of mmap(NULL, len, prot,
> MAP_RESERVED | ..., fd, offset) could return a virtual address block
> that is -not- actually mapped but -is- protected from other mmap()
> calls not explicitly requesting the space via their addr parameters.
> Unfortunately, you'd also need to define separate semantics to
> un-reserving not-mapped space, etc.

You're reinventing the wheel. All you need has been there for ages, and in a portable way.

--
Kirill A. Shutemov
Re: remap_file_pages() use
On 26/05/2014 15:24, Jeff Smith wrote:
> Your addr2 mmap() call is a bit incorrect semantically and
> syntactically (you skipped the length arg). The addr2 request will
> fail because mmap() does not implicitly munmap() occupied virtual
> address space.

With MAP_FIXED it does. It is in the man page.

Paolo

> Even if you did that, the following still has a race
> condition between the addr2 request and another thread grabbing the
> same virtual space, which nothing short of a lock on all threads'
> mmap()-ing logic can protect:
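The MAP_FIXED behaviour referred to here is what makes the two-mmap() mirrored buffer workable without an munmap() step in between. The following is a minimal sketch under stated assumptions: mirror_map() and mirror_selftest() are hypothetical helper names, the length argument missing from the snippet quoted in the thread is filled in, and the descriptor comes from tmpfile() only to keep the example self-contained (the thread suggests shm_open() or a file under /dev/shm).

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the first size bytes of fd twice, back to back, so that buf[i]
 * and buf[i + size] are the same byte. size must be a multiple of the
 * page size. Returns NULL on failure.
 * mirror_map() is a hypothetical helper name.
 */
static unsigned char *mirror_map(int fd, size_t size)
{
    /* Step 1: let the kernel pick 2*size of free address space. */
    unsigned char *base = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return NULL;

    /*
     * Step 2: overlay the second half with another view of offset 0.
     * The range is still occupied by the first mapping when this call
     * is made, so there is no window in which it is unmapped.
     */
    if (mmap(base + size, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
        munmap(base, 2 * size);
        return NULL;
    }
    return base;
}

/* Returns 0 if writes through one half are visible through the other. */
static int mirror_selftest(void)
{
    size_t size = (size_t)sysconf(_SC_PAGESIZE);
    FILE *tmp = tmpfile();
    if (!tmp)
        return -1;
    int fd = fileno(tmp);
    if (ftruncate(fd, (off_t)size) != 0) {
        fclose(tmp);
        return -1;
    }
    unsigned char *buf = mirror_map(fd, size);
    fclose(tmp);        /* mappings stay valid after the fd is closed */
    if (!buf)
        return -1;

    buf[0] = 'a';           /* write via the first view  */
    buf[size + 5] = 'z';    /* write via the second view */
    int ok = buf[size] == 'a' && buf[5] == 'z';
    munmap(buf, 2 * size);
    return ok ? 0 : -1;
}
```

The first mmap() covers twice the file length; that is harmless here because the second half is replaced before anything touches it.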
Re: remap_file_pages() use
>> Mirrored mapping is absolutely required by several
>> independent proprietary platforms I'm aware of, and remap_file_pages()
>> has historically been the only sane way to accomplish this. (i.e.,
>> shm_open(), mmap(NULL, 2^(n+1) pages), remap_file_pages() on 2nd
>> half).
>
> Em.. What's wrong with shm_open() + two mmap()s to cover both halves?
>
> fd = shm_open();
> addr1 = mmap(NULL, 2*SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> addr2 = mmap(addr1 + SIZE, PROT_READ|PROT_WRITE, MAP_SHARED | MAP_FIXED, fd,
>              0);
>
> Is there a reason why it doesn't work?

Your addr2 mmap() call is a bit incorrect semantically and syntactically (you skipped the length arg). The addr2 request will fail because mmap() does not implicitly munmap() occupied virtual address space. Even if you did that, the following still has a race condition between the addr2 request and another thread grabbing the same virtual space, which nothing short of a lock on all threads' mmap()-ing logic can protect:

addr1 = mmap(NULL, 2*SIZE, PROT_READ, MAP_SHARED, fd, 0);
munmap(addr1 + SIZE, SIZE);
/* race on virtual address space here, but n/a for remap_file_pages() ... */
addr2 = mmap(addr1, SIZE, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0);

remap_file_pages() is not subject to this problem and allows the creation of considerably cleaner code. Protecting the address space corner cases with locks or arbitrarily bounded munmap()-and-retry loops is a substantial burden over the historically provided approach.

>> but failing that, a reservation API would need
>> to be created (possibly a MAP_RESERVE flag) that would set aside a
>> region that could only be subsequently mapped via explicit
>> address-requesting mmap() calls.
>
> I don't get this part.

I'm proposing that a call along the lines of mmap(NULL, len, prot, MAP_RESERVED | ..., fd, offset) could return a virtual address block that is -not- actually mapped but -is- protected from other mmap() calls not explicitly requesting the space via their addr parameters. Unfortunately, you'd also need to define separate semantics to un-reserving not-mapped space, etc. The important issue is that users need to be able to trivially protect themselves from transient virtual address space congestion problems and only fail early on long-term exhaustion situations.
Re: remap_file_pages() use
Jeff Smith wrote:
> On Mon, May 19, 2014 at 9:38 AM, Christoph Hellwig wrote:
> > On Mon, May 19, 2014 at 05:35:40PM +0300, Kirill A. Shutemov wrote:
> >> From functional POV, emulation *should* be identical to original
> >> remap_file_pages(), but slower. It would be nice, if you test it early.
> >>
> >> It's not clear yet how long emulation will be there.
> >
> > Stop right there. We found out about two real life users of
> > remap_file_pages() already, without even committing the patches to warn
> > about using it to any tree.
> >
> > I think at this point the whole idea of removing the API should be dead
> > on the floor, as we do not needlessly break userspace programs.
> >
> > If we can get rid of the ugly guts and provide a good enough emulation
> > that the user won't cry I'd love to get rid of this cruft, but even
> > that doesn't look certain yet.
>
> Sorry for being late to the party, but I just noticed this proposal
> via the LWN summary byline.
>
> I wanted to comment that Kenny's use case is (I believe) quite
> widespread. I've used the technique since ~2008, and I've come across
> other people in subsequent jobs who independently developed the same
> technique. Mirrored mapping is absolutely required by several
> independent proprietary platforms I'm aware of, and remap_file_pages()
> has historically been the only sane way to accomplish this. (i.e.,
> shm_open(), mmap(NULL, 2^(n+1) pages), remap_file_pages() on 2nd
> half).

Em.. What's wrong with shm_open() + two mmap()s to cover both halves?

fd = shm_open();
addr1 = mmap(NULL, 2*SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
addr2 = mmap(addr1 + SIZE, PROT_READ|PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);

Is there a reason why it doesn't work?

> It may not be individuals who are involved in the kernel development
> scene to any great extent, but I am sure that remap_file_pages() being
> deprecated would seriously piss off a lot of individuals. The pattern
> has even had a section in the Wikipedia article for quite some time:
> http://en.wikipedia.org/wiki/Circular_buffer#Mirroring

I believe remap_file_pages() is abused here. But it seems we will have to keep emulation in place for a long time.

> It would be most preferable from a user standpoint to keep the
> existing system intact, but failing that, a reservation API would need
> to be created (possibly a MAP_RESERVE flag) that would set aside a
> region that could only be subsequently mapped via explicit
> address-requesting mmap() calls.

I don't get this part.

--
Kirill A. Shutemov
Re: remap_file_pages() use
On Mon, May 19, 2014 at 9:38 AM, Christoph Hellwig wrote:
> On Mon, May 19, 2014 at 05:35:40PM +0300, Kirill A. Shutemov wrote:
>> From functional POV, emulation *should* be identical to original
>> remap_file_pages(), but slower. It would be nice, if you test it early.
>>
>> It's not clear yet how long emulation will be there.
>
> Stop right there. We found out about two real life users of
> remap_file_pages() already, without even committing the patches to warn
> about using it to any tree.
>
> I think at this point the whole idea of removing the API should be dead
> on the floor, as we do not needlessly break userspace programs.
>
> If we can get rid of the ugly guts and provide a good enough emulation
> that the user won't cry I'd love to get rid of this cruft, but even
> that doesn't look certain yet.

Sorry for being late to the party, but I just noticed this proposal via the LWN summary byline.

I wanted to comment that Kenny's use case is (I believe) quite widespread. I've used the technique since ~2008, and I've come across other people in subsequent jobs who independently developed the same technique. Mirrored mapping is absolutely required by several independent proprietary platforms I'm aware of, and remap_file_pages() has historically been the only sane way to accomplish this. (i.e., shm_open(), mmap(NULL, 2^(n+1) pages), remap_file_pages() on 2nd half).

It may not be individuals who are involved in the kernel development scene to any great extent, but I am sure that remap_file_pages() being deprecated would seriously piss off a lot of individuals. The pattern has even had a section in the Wikipedia article for quite some time:
http://en.wikipedia.org/wiki/Circular_buffer#Mirroring

It would be most preferable from a user standpoint to keep the existing system intact, but failing that, a reservation API would need to be created (possibly a MAP_RESERVE flag) that would set aside a region that could only be subsequently mapped via explicit address-requesting mmap() calls.

Thanks for any consideration of these concerns.

--Jeff
Re: remap_file_pages() use
On Tue, 20 May 2014, Kenny Simpson wrote:
> I might need a gentle nudge with a clue stick...
> checking against latest git tree it looks as though most common
> filesystem types do support remap_file_pages.
>
> I just wrote a simple test case and it worked on my 3.13-based ubuntu
> 14.04 system on an ext4 filesystem.

It is all very confusing, yes.

When Kirill said disk-backed files don't support remap_file_pages since commit 3ee6dafc677a, he meant that they do not support it with a special nonlinear vma; but the remap_file_pages syscall emulates the layout for them with separate linear vmas instead.

Confusingly, these filesystems opt in to this emulation by pointing their remap_pages method to generic_file_remap_pages - code which is then never used for them! tmpfs is the only filesystem (having no page_mkwrite) which actually passes through that code.

You can understand why there's some enthusiasm for cleaning this up :)

Hugh
Re: remap_file_pages() use
I might need a gentle nudge with a clue stick... checking against latest git tree it looks as though most common filesystem types do support remap_file_pages.

I just wrote a simple test case and it worked on my 3.13-based ubuntu 14.04 system on an ext4 filesystem.

thanks,
-Kenny

Here was my simple test case: (it doesn't have error handling, but the case passed, and running under strace shows all system calls as passing as well)

#define _GNU_SOURCE   /* needed for the remap_file_pages() declaration */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// make a 16-page file, map page 17 over the first page, write to the
// aliasing page, assert that it is seen on the first page
int main()
{
    unlink("foo");
    int fd = open("foo", O_CREAT | O_RDWR, 0755);
    ftruncate(fd, 16*4096);
    void* ptr = mmap(0, 17*4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char* cptr = (char*) ptr;
    // remap the last page over the first
    remap_file_pages(cptr + 16*4096/*addr*/, 4096/*size*/, 0/*prot*/, 0/*pgoff*/, 0/*flags*/);
    cptr[16*4096] = 'a';
    return cptr[0] != 'a'; // if this aliases, this will be 'a'
}

On Tue, May 20, 2014 at 9:53 PM, Kenny Simpson wrote:
> ouch... hope they don't try to run that code on anything newer then :(
> Will let them know.
>
> -Kenny
>
>
> On Mon, May 19, 2014 at 5:24 PM, Kirill A. Shutemov wrote:
>> On Mon, May 19, 2014 at 01:34:05PM -0400, Kenny Simpson wrote:
>>> For the other cases I had used the remapping to have more of a sliding
>>> window over a disk-backed file where I also was using aliasing to
>>> eliminate the corner cases of hitting the end of a window and needing
>>> to split records due to crossing boundaries, etc..
>>
>> Disk backed files are not supported by remap_file_pages() since 2007.
>> See commit 3ee6dafc677a.
>>
>> --
>> Kirill A. Shutemov
Re: remap_file_pages() use
ouch... hope they don't try to run that code on anything newer then :(
Will let them know.

-Kenny

On Mon, May 19, 2014 at 5:24 PM, Kirill A. Shutemov wrote:
> On Mon, May 19, 2014 at 01:34:05PM -0400, Kenny Simpson wrote:
>> For the other cases I had used the remapping to have more of a sliding
>> window over a disk-backed file where I also was using aliasing to
>> eliminate the corner cases of hitting the end of a window and needing
>> to split records due to crossing boundaries, etc..
>
> Disk backed files are not supported by remap_file_pages() since 2007.
> See commit 3ee6dafc677a.
>
> --
> Kirill A. Shutemov
Re: remap_file_pages() use
Hi Dave,

On 19 May 2014 19:50, Dave Hansen wrote:
> We keep the current count as mm->map_count in the kernel, and the limit
> is available because it's a sysctl. It wouldn't be hard to dump
> mm->map_count out in a /proc file somewhere if it would be useful to
> you. Would that work, or is there some other interface that would be
> more convenient?

We can keep an in-process estimate of this value anyway. The sysctl to get vm.max_map_count is all we need.

That is, unless the limit is moved to be per-user instead of per-process, as I think was discussed here --- which would complicate further our job :-/

A bientôt,

Armin.
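Reading the limit mentioned here needs nothing beyond the sysctl's file under /proc/sys. The sketch below is an assumption-laden illustration rather than an official interface for mm->map_count: read_max_map_count() and count_self_maps() are hypothetical helper names, and counting lines in /proc/self/maps only approximates the kernel's own count (it may differ slightly, for example because of special entries).

```c
#include <stdio.h>

/* Read the vm.max_map_count sysctl; returns -1 on error. */
static long read_max_map_count(void)
{
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    long value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/*
 * Approximate the number of mappings of the calling process by
 * counting the lines of /proc/self/maps; returns -1 on error.
 */
static long count_self_maps(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long lines = 0;
    int c;

    if (!f)
        return -1;
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}
```

A process could compare the two values before a burst of remappings and switch to a larger remapping unit once the estimate approaches the limit.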
Re: remap_file_pages() use
On Mon, May 19, 2014 at 01:34:05PM -0400, Kenny Simpson wrote:
> For the other cases I had used the remapping to have more of a sliding
> window over a disk-backed file where I also was using aliasing to
> eliminate the corner cases of hitting the end of a window and needing
> to split records due to crossing boundaries, etc..

Disk backed files are not supported by remap_file_pages() since 2007. See commit 3ee6dafc677a.

--
Kirill A. Shutemov
Re: remap_file_pages() use
On Mon, May 19, 2014 at 5:17 PM, Kirill A. Shutemov wrote:
> Christoph Hellwig wrote:
>> On Mon, May 19, 2014 at 06:02:38PM +0300, Kirill A. Shutemov wrote:
>> > > Stop right there. We found out about two real life users of
>> > > remap_file_pages() already, without even committing the patches to warn
>> > > about using it to any tree.
>> >
>> > Who is the second here? Oracle?
>>
>> PyPy.
>>
>> Oracle would be the third, although use of the feature is optional and
>> fairly hard to opt in to, so I didn't count it.
>
> IIUC PyPy uses the syscall in some early prototype and looks like guys are
> okay to rework it to mmap() if default sysctl_max_map_count will be high
> enough.

A quick search on github exposed three more users:
https://github.com/LotharSchwab/R-Framework
https://github.com/const86/glgrab
https://github.com/minimoog/virtualringbuffer

--
Thanks,
//richard
Re: remap_file_pages() use
On 05/19/2014 09:42 AM, Armin Rigo wrote:
> If there is an official way to know in advance how many remappings our
> process is allowed to perform, then we could adapt as the size
> increases. Or maybe catching ENOMEM and doubling the remapping size
> (in some process-wide synchronization point). All in all, thanks for
> the note: it looks like there are solutions (even if less elegant than
> remap_file_pages from the user's perspective).

We keep the current count as mm->map_count in the kernel, and the limit is available because it's a sysctl. It wouldn't be hard to dump mm->map_count out in a /proc file somewhere if it would be useful to you. Would that work, or is there some other interface that would be more convenient?
Re: remap_file_pages() use
For the circular buffer case, yes I could make some temp file under /dev/shm, unlink it, mmap() it multiple times, etc... it just makes it a little more hairy.
https://lkml.org/lkml/2011/3/19/94

For the other cases I had used the remapping to have more of a sliding window over a disk-backed file where I also was using aliasing to eliminate the corner cases of hitting the end of a window and needing to split records due to crossing boundaries, etc.. These were done for other projects in the past, but I am no longer on those projects (or with some of the companies), so I can't as easily double check the code or make changes.

Being able to replace mappings in-place was logically simpler than doing mmap() over the old mappings. I was also under the impression that it should put less pressure on the mmap_sem - I'm pretty sure I did profile it many years ago (>5), but many things have changed since then so maybe this is not a big deal any more. This use case was more performance-critical than the one-time setup of the anonymous circular buffer as it was more like a transaction log/journal.

All that said, if remap_file_pages were to go away as of a mainline kernel this year, the projects I had worked on would probably not be impacted for many years and would probably have been retired/rewritten before they would ever see the change (unless some Linux vendor in N.C. backported its removal).

Since valgrind doesn't support it, I tend to write fallback/debug variants anyway.

-Kenny

On Mon, May 19, 2014 at 10:35 AM, Kirill A. Shutemov wrote:
> Michal Hocko wrote:
>> [CCing Kirill and other people involved]
>>
>> On Sun 18-05-14 00:03:28, Kenny Simpson wrote:
>> > I saw that remap_file_pages() was possibly going away to be replaced
>> > by some emulation. I've used this call in several projects over the
>> > years mostly as a way of mapping multiple virtual memory pages to
>> > alias the same private or shared memory region (to do things like
>> > circular buffers). mmap()
>> > in the case of anonymous memory doesn't work as well since there is
>> > not a file descriptor to reference.
>> >
>> > Would this sort of thing be supported in the emulation, or should I be
>> > planning on reimplementing/rewriting some things?
>
> From functional POV, emulation *should* be identical to original
> remap_file_pages(), but slower. It would be nice, if you test it early.
>
> It's not clear yet how long emulation will be there.
> Is there a reason why you can't use fd from shared memory (shm_open() or
> direct open() on /dev/shm/xxx)?
>
> --
> Kirill A. Shutemov
Re: remap_file_pages() use
On Mon, May 19, 2014 at 06:17:58PM +0300, Kirill A. Shutemov wrote:
> IIUC PyPy uses the syscall in some early prototype and looks like guys are
> okay to rework it to mmap() if default sysctl_max_map_count will be high
> enough.

My point is that we already found a few users just by discussing the issue on lkml, and thus establishing that a) there are users outside of Oracle, and b) there probably will be lots more that we don't even know about.
Re: remap_file_pages() use
Hi Kirill,

On 19 May 2014 17:53, Kirill A. Shutemov wrote:
> Is it necessary to remap in 4k chunks for you?
> What about 64k chunks? Or something bigger?

Good point. We remap chunks of 4k, which is not much, but is already much larger than the typical object size. Suppose we do such a remapping for a single object: then all other neighbouring objects that happen to live in the same page are also copied. Then, if some other thread modifies these other objects, we need extra copies to keep the objects in sync across all of their versions. That's the reason for keeping the size of remappings as small as possible.

But we need to measure the actual impact. We can easily argue that if the process is using many GB of memory, then the risk of unrelated copies starts to decrease. It might be fine to increase the remapping unit in this case.

If there is an official way to know in advance how many remappings our process is allowed to perform, then we could adapt as the size increases. Or maybe catching ENOMEM and doubling the remapping size (in some process-wide synchronization point). All in all, thanks for the note: it looks like there are solutions (even if less elegant than remap_file_pages from the user's perspective).

A bientôt,

Armin.
Re: remap_file_pages() use
Armin Rigo wrote:
> Hi Kirill,
>
> On 19 May 2014 17:17, Kirill A. Shutemov wrote:
> > IIUC PyPy uses the syscall in some early prototype and looks like guys are
> > okay to rework it to mmap() if default sysctl_max_map_count will be high
> > enough.
>
> Yes, we can switch easily if needed. The syscall is not in any
> "production" version yet.
>
> Please note that "high enough" in this context means higher than
> 2**20. We need it high enough to handle regularly 10-20% of all the
> RAM used by each program. If I count correctly, at 20%, 2**20 fails
> above 20GB. In general I would suggest to use a default limit that
> depends on the amount of RAM (+swap) available.

Is it necessary to remap in 4k chunks for you? What about 64k chunks? Or something bigger? This way you can scale down the required number of VMAs to something more reasonable.

--
Kirill A. Shutemov
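The arithmetic behind the "2**20 fails above 20GB" estimate and the effect of a larger chunk size can be worked through in a few lines. This is a minimal sketch assuming the worst case of one separate mapping per remapped chunk; mappings_needed() is a hypothetical helper name, and the 20 GiB and 20% figures are taken from the quoted message.

```c
#include <stdint.h>

/*
 * Worst-case number of separate mappings if `percent` percent of
 * `total_bytes` is remapped in units of `chunk` bytes, with every
 * chunk ending up in its own mapping.
 * Assumes total_bytes * percent does not overflow 64 bits.
 */
static uint64_t mappings_needed(uint64_t total_bytes, unsigned percent,
                                uint64_t chunk)
{
    uint64_t remapped = total_bytes * percent / 100;

    /* Round up: a partially used chunk still needs a mapping. */
    return (remapped + chunk - 1) / chunk;
}
```

With 20 GiB in use and 20% of it remapped, 4 KiB chunks need 2**20 mappings (4 GiB / 4 KiB), while 64 KiB chunks need 2**16, a factor of 16 fewer.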
Re: remap_file_pages() use
Hi Kirill,

On 19 May 2014 17:17, Kirill A. Shutemov wrote:
> IIUC PyPy uses the syscall in some early prototype and looks like guys are
> okay to rework it to mmap() if default sysctl_max_map_count will be high
> enough.

Yes, we can switch easily if needed. The syscall is not in any "production" version yet.

Please note that "high enough" in this context means higher than 2**20. We need it high enough to handle regularly 10-20% of all the RAM used by each program. If I count correctly, at 20%, 2**20 fails above 20GB. In general I would suggest to use a default limit that depends on the amount of RAM (+swap) available.

A bientôt,

Armin.
Re: remap_file_pages() use
Christoph Hellwig wrote:
> On Mon, May 19, 2014 at 06:02:38PM +0300, Kirill A. Shutemov wrote:
> > > Stop right there. We found out about two real life users of
> > > remap_file_pages() already, without even committing the patches to warn
> > > about using it to any tree.
> >
> > Who is the second here? Oracle?
>
> PyPy.
>
> Oracle would be the third, although use of the feature is optional and
> fairly hard to opt in to, so I didn't count it.

IIUC PyPy uses the syscall in some early prototype and looks like guys are okay to rework it to mmap() if default sysctl_max_map_count will be high enough.

--
Kirill A. Shutemov
Re: remap_file_pages() use
On Mon, May 19, 2014 at 06:02:38PM +0300, Kirill A. Shutemov wrote:
> > Stop right there. We found out about two real life users of
> > remap_file_pages() already, without even committing the patches to warn
> > about using it to any tree.
>
> Who is the second here? Oracle?

PyPy.

Oracle would be the third, although use of the feature is optional and fairly hard to opt in to, so I didn't count it.
Re: remap_file_pages() use
Christoph Hellwig wrote:
> On Mon, May 19, 2014 at 05:35:40PM +0300, Kirill A. Shutemov wrote:
> > From functional POV, emulation *should* be identical to original
> > remap_file_pages(), but slower. It would be nice, if you test it early.
> >
> > It's not clear yet how long emulation will be there.
>
> Stop right there. We found out about two real life users of
> remap_file_pages() already, without even committing the patches to warn
> about using it to any tree.

Who is the second here? Oracle?

> I think at this point the whole idea of removing the API should be dead
> on the floor, as we do not needlessly break userspace programs.

I'm fine if emulation stays there for a long time, but it's a good idea to convert users to a more standard API earlier where possible.

> If we can get rid of the ugly guts and provide a good enough emulation
> that the user won't cry I'd love to get rid of this cruft, but even
> that doesn't look certain yet.

And that's why I ask to test it.

--
Kirill A. Shutemov
Re: remap_file_pages() use
On Mon, May 19, 2014 at 05:35:40PM +0300, Kirill A. Shutemov wrote:
> From functional POV, emulation *should* be identical to original
> remap_file_pages(), but slower. It would be nice, if you test it early.
>
> It's not clear yet how long emulation will be there.

Stop right there. We found out about two real life users of remap_file_pages() already, without even committing the patches to warn about using it to any tree.

I think at this point the whole idea of removing the API should be dead on the floor, as we do not needlessly break userspace programs.

If we can get rid of the ugly guts and provide a good enough emulation that the user won't cry I'd love to get rid of this cruft, but even that doesn't look certain yet.
Re: remap_file_pages() use
Michal Hocko wrote:
> [CCing Kirill and other people involved]
>
> On Sun 18-05-14 00:03:28, Kenny Simpson wrote:
> > I saw that remap_file_pages() was possibly going away to be replaced
> > by some emulation. I've used this call in several projects over the
> > years mostly as a way of mapping multiple virtual memory pages to
> > alias the same private or shared memory region (to do things like
> > circular buffers). mmap()
> > in the case of anonymous memory doesn't work as well since there is
> > not a file descriptor to reference.
> >
> > Would this sort of thing be supported in the emulation, or should I be
> > planning on reimplementing/rewriting some things?

From a functional POV, the emulation *should* be identical to the original remap_file_pages(), but slower. It would be nice if you could test it early.

It's not clear yet how long the emulation will be there.

Is there a reason why you can't use an fd from shared memory (shm_open() or a direct open() on /dev/shm/xxx)?

--
Kirill A. Shutemov
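The shared-memory suggestion above can be sketched roughly as follows for the circular-buffer case: back the buffer with a file descriptor from tmpfs, reserve twice the buffer size of address space, and map the same file offset into both halves with plain mmap(). This is only an illustration under stated assumptions, not code from the thread: the helper name map_mirrored and the path template are hypothetical, a tmpfs mount at /dev/shm is assumed, and mkstemp() stands in for shm_open() simply because shm_open() needs -lrt on some older glibc versions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of a temporary file twice, back to
 * back, so that base[i] and base[i + size] refer to the same byte.
 * `path_template` is a writable mkstemp() template such as
 * "/dev/shm/ring-XXXXXX"; `size` must be a non-zero multiple of the page
 * size. Returns the base address, or NULL on failure. */
void *map_mirrored(char *path_template, size_t size)
{
    assert(path_template != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0 || size % (size_t)page != 0)
        return NULL;

    int fd = mkstemp(path_template);
    if (fd < 0)
        return NULL;
    unlink(path_template);          /* the open fd keeps the file alive */

    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }

    /* Reserve 2 * size of contiguous address space with no access. */
    char *base = mmap(NULL, 2 * size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return NULL;
    }

    /* Map file offset 0 over both halves of the reservation. */
    void *lo = mmap(base, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    void *hi = mmap(base + size, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);                      /* the mappings stay valid after close */

    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        munmap(base, 2 * size);
        return NULL;
    }
    return base;
}
```

With a buffer obtained this way, a read or write that runs past the end of the first half lands at the start of the buffer, which is the aliasing effect the circular-buffer use case relies on. It costs two mappings per buffer rather than one nonlinear mapping.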
Re: remap_file_pages() use
[CCing Kirill and other people involved]

On Sun 18-05-14 00:03:28, Kenny Simpson wrote:
> I saw that remap_file_pages() was possibly going away to be replaced
> by some emulation. I've used this call in several projects over the
> years mostly as a way of mapping multiple virtual memory pages to
> alias the same private or shared memory region (to do things like
> circular buffers). mmap()
> in the case of anonymous memory doesn't work as well since there is
> not a file descriptor to reference.
>
> Would this sort of thing be supported in the emulation, or should I be
> planning on reimplementing/rewriting some things?
>
> thanks,
> -Kenny

--
Michal Hocko
SUSE Labs