Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Thu, Jul 13, 2017 at 11:11:37AM -0700, Mike Kravetz wrote: > Here is my understanding of how things work for old_len == 0 of anon > mappings: > - shared mappings > - New vma is created at new virtual address > - vma refers to the same underlying object/pages as old vma > - after mremap, no page tables exist for new vma, they are > created as pages are accessed/faulted > - page at new_address is same as page at old_address Yes, and this isn't backed by anon memory, it's backed by shmem. "Shared anon mapping" is really synonymous of shmem, the fact it's not a mmap of a tmpfs file is purely an API detail. > - private mappings > - New vma is created at new virtual address > - vma does not refer to same pages as old vma. It is a 'new' > private anon mapping. > - after mremap, no page tables exist for new vma. access to > the range of the new vma will result in faults that allocate > a new page. > - page at new_address is different than page at old_address > the new vma will result in new Yes, for a anon private mapping (so backed by real anonymous memory) no payload in the old vma could possibly go in the new vma. > So, the result of mremap(old_len == 0) on a private mapping is that it > simply creates a new private mapping. IMO, this is contrary to the purpose > of mremap. mremap should return a mapping that is somehow related to > the original mapping. I agree there's no point to ever use the mremap(old_len == 0) undocumented trick, to create a new anon private mmap, when you could use mmap instead and the result would be the same. So it's plausible nobody could use it for it. > Perhaps you are thinking about mremap of a private file mapping? I was > not considering that case. I believe you are right. In this case a > private COW mapping based on the original mapping would be created. So, > this seems more in line with the intent of mremap. The new mapping is > still related to the old mapping. Yes my earlier example was all about filebacked private mappings, to point out those also have a deterministic behavior with the old_len == 0 trick and it could be still used because the IPC_RMID was executed early on. The point is that you could always use a plain new mmap instead of the old_len == 0 trick, but that applies to shared mappings as well. My argument is that if you keep it and document it for shared anon mappings, I don't see something fundamentally wrong as keeping it for private filebacked mappings too as the shmat ID may have been deleted for those too. > With this in mind, what about returning EINVAL only for the anon private > mapping case? The only case where there's no excuse to use mremap(old_len == 0) as replacement for a new mmap is the private anon mappings case, so while it may still break something (as opposed to a deprecation warning), I guess the likely hood somebody is using it, is very low. > However, if you have a fd (for a file mapping) then I can not see why > someone would be using the old_len == 0 trick. It would be more straight > forward to simply use mmap to create the additional mapping. That applies to MAP_SHARED too and that's why deprecating the whole undocumented old_len ==0 sounded and still sound attractive to me, but doing it right away without a deprecation warning cycle, sounds too risky. > > So an alternative would be to start by adding a WARN_ON_ONCE deprecation > > warning instead of -EINVAL right away. > > > > The vma->vm_flags VM_ACCOUNT being wiped on the original vma as side > > effect of using the old_len == 0 trick looks like a bug, I guess it > > should get fixed if we intend to keep old_len and document it for the > > long term. > > Others seem to think we should keep old_len == 0 and document. The only case where it makes sense is after IPC_RMID, but with memfd_create there's no point anymore to use IPC_RMID. tmpfs/hugetlbfs/realfs files can be unlinked while the fd is still open so again no need of the mremap(old_len == 0) trick. Which is why I'd find it attractive to deprecate it if we could, but I assume we can't drop it even if undocumented, which is why I felt a deprecation warning would be suitable in this case (similar to deprecation warning of sysfs and then dropped via config option). I am assuming here that nobody is using it because it's undocumented and it has a bug in the VM_ACCOUNT code too. Without a deprecation warning it'd be hard to tell if the assumption is correct. > I assume you are concerned about the do_munmap call in move_vma? That Yes exactly. > does indeed look to be of concern. This happens AFTER setting up the > new mapping. So, I'm thinking we should tear down the new mapping in > the case do_munmap of the old mapping fails? That 'should' simply > be a matter of: > - moving page tables back to original mapping > - remove/delete new vma Yes. > - I don't think we need to 'unmap' the new vma as there
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Thu, Jul 13, 2017 at 11:11:37AM -0700, Mike Kravetz wrote: > Here is my understanding of how things work for old_len == 0 of anon > mappings: > - shared mappings > - New vma is created at new virtual address > - vma refers to the same underlying object/pages as old vma > - after mremap, no page tables exist for new vma, they are > created as pages are accessed/faulted > - page at new_address is same as page at old_address Yes, and this isn't backed by anon memory, it's backed by shmem. "Shared anon mapping" is really synonymous of shmem, the fact it's not a mmap of a tmpfs file is purely an API detail. > - private mappings > - New vma is created at new virtual address > - vma does not refer to same pages as old vma. It is a 'new' > private anon mapping. > - after mremap, no page tables exist for new vma. access to > the range of the new vma will result in faults that allocate > a new page. > - page at new_address is different than page at old_address > the new vma will result in new Yes, for a anon private mapping (so backed by real anonymous memory) no payload in the old vma could possibly go in the new vma. > So, the result of mremap(old_len == 0) on a private mapping is that it > simply creates a new private mapping. IMO, this is contrary to the purpose > of mremap. mremap should return a mapping that is somehow related to > the original mapping. I agree there's no point to ever use the mremap(old_len == 0) undocumented trick, to create a new anon private mmap, when you could use mmap instead and the result would be the same. So it's plausible nobody could use it for it. > Perhaps you are thinking about mremap of a private file mapping? I was > not considering that case. I believe you are right. In this case a > private COW mapping based on the original mapping would be created. So, > this seems more in line with the intent of mremap. The new mapping is > still related to the old mapping. Yes my earlier example was all about filebacked private mappings, to point out those also have a deterministic behavior with the old_len == 0 trick and it could be still used because the IPC_RMID was executed early on. The point is that you could always use a plain new mmap instead of the old_len == 0 trick, but that applies to shared mappings as well. My argument is that if you keep it and document it for shared anon mappings, I don't see something fundamentally wrong as keeping it for private filebacked mappings too as the shmat ID may have been deleted for those too. > With this in mind, what about returning EINVAL only for the anon private > mapping case? The only case where there's no excuse to use mremap(old_len == 0) as replacement for a new mmap is the private anon mappings case, so while it may still break something (as opposed to a deprecation warning), I guess the likely hood somebody is using it, is very low. > However, if you have a fd (for a file mapping) then I can not see why > someone would be using the old_len == 0 trick. It would be more straight > forward to simply use mmap to create the additional mapping. That applies to MAP_SHARED too and that's why deprecating the whole undocumented old_len ==0 sounded and still sound attractive to me, but doing it right away without a deprecation warning cycle, sounds too risky. > > So an alternative would be to start by adding a WARN_ON_ONCE deprecation > > warning instead of -EINVAL right away. > > > > The vma->vm_flags VM_ACCOUNT being wiped on the original vma as side > > effect of using the old_len == 0 trick looks like a bug, I guess it > > should get fixed if we intend to keep old_len and document it for the > > long term. > > Others seem to think we should keep old_len == 0 and document. The only case where it makes sense is after IPC_RMID, but with memfd_create there's no point anymore to use IPC_RMID. tmpfs/hugetlbfs/realfs files can be unlinked while the fd is still open so again no need of the mremap(old_len == 0) trick. Which is why I'd find it attractive to deprecate it if we could, but I assume we can't drop it even if undocumented, which is why I felt a deprecation warning would be suitable in this case (similar to deprecation warning of sysfs and then dropped via config option). I am assuming here that nobody is using it because it's undocumented and it has a bug in the VM_ACCOUNT code too. Without a deprecation warning it'd be hard to tell if the assumption is correct. > I assume you are concerned about the do_munmap call in move_vma? That Yes exactly. > does indeed look to be of concern. This happens AFTER setting up the > new mapping. So, I'm thinking we should tear down the new mapping in > the case do_munmap of the old mapping fails? That 'should' simply > be a matter of: > - moving page tables back to original mapping > - remove/delete new vma Yes. > - I don't think we need to 'unmap' the new vma as there
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/13/2017 09:30 AM, Andrea Arcangeli wrote: > On Thu, Jul 13, 2017 at 09:01:54AM -0700, Mike Kravetz wrote: >> Sent a patch (in separate e-mail thread) to return EINVAL for private >> mappings. > > The way old_len == 0 behaves for MAP_PRIVATE seems more sane to me > than the alternative of copying pagetables for anon pages (as behaving > the way that way avoids to break anon pages invariants), despite it's > not creating an exact mirror of what was in the original vma as it > excludes any modification done to cowed anon pages. > > By nullifying move_page_tables old_len == 0 is simply duping the vma > which is equivalent to a new mmap on the file for the MAP_PRIVATE > case, it has a deterministic result. The real question is if it > anybody is using it. As previously discussed, copying pagetables (via move_page_tables) does not happen if old_len == 0. This is true for both for private and shared mappings. Here is my understanding of how things work for old_len == 0 of anon mappings: - shared mappings - New vma is created at new virtual address - vma refers to the same underlying object/pages as old vma - after mremap, no page tables exist for new vma, they are created as pages are accessed/faulted - page at new_address is same as page at old_address - private mappings - New vma is created at new virtual address - vma does not refer to same pages as old vma. It is a 'new' private anon mapping. - after mremap, no page tables exist for new vma. access to the range of the new vma will result in faults that allocate a new page. - page at new_address is different than page at old_address the new vma will result in new So, the result of mremap(old_len == 0) on a private mapping is that it simply creates a new private mapping. IMO, this is contrary to the purpose of mremap. mremap should return a mapping that is somehow related to the original mapping. Perhaps you are thinking about mremap of a private file mapping? I was not considering that case. I believe you are right. In this case a private COW mapping based on the original mapping would be created. So, this seems more in line with the intent of mremap. The new mapping is still related to the old mapping. With this in mind, what about returning EINVAL only for the anon private mapping case? However, if you have a fd (for a file mapping) then I can not see why someone would be using the old_len == 0 trick. It would be more straight forward to simply use mmap to create the additional mapping. > So an alternative would be to start by adding a WARN_ON_ONCE deprecation > warning instead of -EINVAL right away. > > The vma->vm_flags VM_ACCOUNT being wiped on the original vma as side > effect of using the old_len == 0 trick looks like a bug, I guess it > should get fixed if we intend to keep old_len and document it for the > long term. Others seem to think we should keep old_len == 0 and document. > Overall I'm more concerned about the fact an allocation failure in > do_munmap is unreported to userland and it will leave the old vma > intact like old_len == 0 would do (unless I'm misreading something > there). The VM_ACCOUNT wipe as side effect of old_len == 0 is not > major short term concern. I assume you are concerned about the do_munmap call in move_vma? That does indeed look to be of concern. This happens AFTER setting up the new mapping. So, I'm thinking we should tear down the new mapping in the case do_munmap of the old mapping fails? That 'should' simply be a matter of: - moving page tables back to original mapping - remove/delete new vma - I don't think we need to 'unmap' the new vma as there should be no associated pages. I'll look into doing this as well. Just curious, do those userfaultfd callouts still work as desired in the case of map duplication (old_len == 0)? -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/13/2017 09:30 AM, Andrea Arcangeli wrote: > On Thu, Jul 13, 2017 at 09:01:54AM -0700, Mike Kravetz wrote: >> Sent a patch (in separate e-mail thread) to return EINVAL for private >> mappings. > > The way old_len == 0 behaves for MAP_PRIVATE seems more sane to me > than the alternative of copying pagetables for anon pages (as behaving > the way that way avoids to break anon pages invariants), despite it's > not creating an exact mirror of what was in the original vma as it > excludes any modification done to cowed anon pages. > > By nullifying move_page_tables old_len == 0 is simply duping the vma > which is equivalent to a new mmap on the file for the MAP_PRIVATE > case, it has a deterministic result. The real question is if it > anybody is using it. As previously discussed, copying pagetables (via move_page_tables) does not happen if old_len == 0. This is true for both for private and shared mappings. Here is my understanding of how things work for old_len == 0 of anon mappings: - shared mappings - New vma is created at new virtual address - vma refers to the same underlying object/pages as old vma - after mremap, no page tables exist for new vma, they are created as pages are accessed/faulted - page at new_address is same as page at old_address - private mappings - New vma is created at new virtual address - vma does not refer to same pages as old vma. It is a 'new' private anon mapping. - after mremap, no page tables exist for new vma. access to the range of the new vma will result in faults that allocate a new page. - page at new_address is different than page at old_address the new vma will result in new So, the result of mremap(old_len == 0) on a private mapping is that it simply creates a new private mapping. IMO, this is contrary to the purpose of mremap. mremap should return a mapping that is somehow related to the original mapping. Perhaps you are thinking about mremap of a private file mapping? I was not considering that case. I believe you are right. In this case a private COW mapping based on the original mapping would be created. So, this seems more in line with the intent of mremap. The new mapping is still related to the old mapping. With this in mind, what about returning EINVAL only for the anon private mapping case? However, if you have a fd (for a file mapping) then I can not see why someone would be using the old_len == 0 trick. It would be more straight forward to simply use mmap to create the additional mapping. > So an alternative would be to start by adding a WARN_ON_ONCE deprecation > warning instead of -EINVAL right away. > > The vma->vm_flags VM_ACCOUNT being wiped on the original vma as side > effect of using the old_len == 0 trick looks like a bug, I guess it > should get fixed if we intend to keep old_len and document it for the > long term. Others seem to think we should keep old_len == 0 and document. > Overall I'm more concerned about the fact an allocation failure in > do_munmap is unreported to userland and it will leave the old vma > intact like old_len == 0 would do (unless I'm misreading something > there). The VM_ACCOUNT wipe as side effect of old_len == 0 is not > major short term concern. I assume you are concerned about the do_munmap call in move_vma? That does indeed look to be of concern. This happens AFTER setting up the new mapping. So, I'm thinking we should tear down the new mapping in the case do_munmap of the old mapping fails? That 'should' simply be a matter of: - moving page tables back to original mapping - remove/delete new vma - I don't think we need to 'unmap' the new vma as there should be no associated pages. I'll look into doing this as well. Just curious, do those userfaultfd callouts still work as desired in the case of map duplication (old_len == 0)? -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Thu, Jul 13, 2017 at 09:01:54AM -0700, Mike Kravetz wrote: > Sent a patch (in separate e-mail thread) to return EINVAL for private > mappings. The way old_len == 0 behaves for MAP_PRIVATE seems more sane to me than the alternative of copying pagetables for anon pages (as behaving the way that way avoids to break anon pages invariants), despite it's not creating an exact mirror of what was in the original vma as it excludes any modification done to cowed anon pages. By nullifying move_page_tables old_len == 0 is simply duping the vma which is equivalent to a new mmap on the file for the MAP_PRIVATE case, it has a deterministic result. The real question is if it anybody is using it. So an alternative would be to start by adding a WARN_ON_ONCE deprecation warning instead of -EINVAL right away. The vma->vm_flags VM_ACCOUNT being wiped on the original vma as side effect of using the old_len == 0 trick looks like a bug, I guess it should get fixed if we intend to keep old_len and document it for the long term. Overall I'm more concerned about the fact an allocation failure in do_munmap is unreported to userland and it will leave the old vma intact like old_len == 0 would do (unless I'm misreading something there). The VM_ACCOUNT wipe as side effect of old_len == 0 is not major short term concern. Thanks, Andrea
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Thu, Jul 13, 2017 at 09:01:54AM -0700, Mike Kravetz wrote: > Sent a patch (in separate e-mail thread) to return EINVAL for private > mappings. The way old_len == 0 behaves for MAP_PRIVATE seems more sane to me than the alternative of copying pagetables for anon pages (as behaving the way that way avoids to break anon pages invariants), despite it's not creating an exact mirror of what was in the original vma as it excludes any modification done to cowed anon pages. By nullifying move_page_tables old_len == 0 is simply duping the vma which is equivalent to a new mmap on the file for the MAP_PRIVATE case, it has a deterministic result. The real question is if it anybody is using it. So an alternative would be to start by adding a WARN_ON_ONCE deprecation warning instead of -EINVAL right away. The vma->vm_flags VM_ACCOUNT being wiped on the original vma as side effect of using the old_len == 0 trick looks like a bug, I guess it should get fixed if we intend to keep old_len and document it for the long term. Overall I'm more concerned about the fact an allocation failure in do_munmap is unreported to userland and it will leave the old vma intact like old_len == 0 would do (unless I'm misreading something there). The VM_ACCOUNT wipe as side effect of old_len == 0 is not major short term concern. Thanks, Andrea
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/12/2017 11:16 PM, Michal Hocko wrote: > On Wed 12-07-17 09:55:48, Mike Kravetz wrote: >> On 07/12/2017 04:46 AM, Michal Hocko wrote: >>> On Tue 11-07-17 11:23:19, Mike Kravetz wrote: On 07/11/2017 05:36 AM, Michal Hocko wrote: >>> [...] > Anyway the patch should fail with -EINVAL on private mappings as Kirill > already pointed out Yes. I think this should be a separate patch. As mentioned earlier, mremap today creates a new/additional private mapping if called in this way with old_size == 0. To me, this is a bug. >>> >>> Not only that. It clears existing ptes in the old mapping so the content >>> is lost. That is quite unexpected behavior. Now it is hard to assume >>> whether somebody relies on the behavior (I can easily imagine somebody >>> doing backup in atomic way) so failing with EINVAL might break >>> userspace so I am not longer sure. Anyway this really needs to be >>> documented. >> >> I am pretty sure it does not clear ptes in the old mapping, or modify it >> in any way. Are you thinking they are cleared as part of the call to >> move_page_tables? Since old_size == 0 (len as passed to move_page_tables), >> the for loop in move_page_tables is not run and it doesn't do much of >> anything in this case. > > Dang. I have completely missed that we give old_len as the len > parameter. Then it is clear that this old_len == 0 trick never really > worked for MAP_PRIVATE because it simply fails the main invariant that > the content at the new location matches the old one. Care to send a > patch to clarify that and sent EINVAL or should I do it? Sent a patch (in separate e-mail thread) to return EINVAL for private mappings. >> If adding hugetlbfs support to memfd_create works out, I would like to >> see mremap(old_size == 0) support dropped. Nobody here (kernel mm >> development) seems to like it. However, as you note there may be somebody >> depending on this behavior. What would be the process for removing >> such support? AFAIK, it is not documented anywhere. If we do document >> the behavior, then we will certainly be stuck with it for a long time. > > I would rather document it than remove it. From the past we know that > there are users and my experience tells me that once something is used > it lives its life for ever basically. And moreover it is not like this > costs us any maintenance burden to support the hack. Just make it more > obvious so that we do not have to rediscover it each time. I will put together a patch to add a description of (old_size == 0) behavior to the man page. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/12/2017 11:16 PM, Michal Hocko wrote: > On Wed 12-07-17 09:55:48, Mike Kravetz wrote: >> On 07/12/2017 04:46 AM, Michal Hocko wrote: >>> On Tue 11-07-17 11:23:19, Mike Kravetz wrote: On 07/11/2017 05:36 AM, Michal Hocko wrote: >>> [...] > Anyway the patch should fail with -EINVAL on private mappings as Kirill > already pointed out Yes. I think this should be a separate patch. As mentioned earlier, mremap today creates a new/additional private mapping if called in this way with old_size == 0. To me, this is a bug. >>> >>> Not only that. It clears existing ptes in the old mapping so the content >>> is lost. That is quite unexpected behavior. Now it is hard to assume >>> whether somebody relies on the behavior (I can easily imagine somebody >>> doing backup in atomic way) so failing with EINVAL might break >>> userspace so I am not longer sure. Anyway this really needs to be >>> documented. >> >> I am pretty sure it does not clear ptes in the old mapping, or modify it >> in any way. Are you thinking they are cleared as part of the call to >> move_page_tables? Since old_size == 0 (len as passed to move_page_tables), >> the for loop in move_page_tables is not run and it doesn't do much of >> anything in this case. > > Dang. I have completely missed that we give old_len as the len > parameter. Then it is clear that this old_len == 0 trick never really > worked for MAP_PRIVATE because it simply fails the main invariant that > the content at the new location matches the old one. Care to send a > patch to clarify that and sent EINVAL or should I do it? Sent a patch (in separate e-mail thread) to return EINVAL for private mappings. >> If adding hugetlbfs support to memfd_create works out, I would like to >> see mremap(old_size == 0) support dropped. Nobody here (kernel mm >> development) seems to like it. However, as you note there may be somebody >> depending on this behavior. What would be the process for removing >> such support? AFAIK, it is not documented anywhere. If we do document >> the behavior, then we will certainly be stuck with it for a long time. > > I would rather document it than remove it. From the past we know that > there are users and my experience tells me that once something is used > it lives its life for ever basically. And moreover it is not like this > costs us any maintenance burden to support the hack. Just make it more > obvious so that we do not have to rediscover it each time. I will put together a patch to add a description of (old_size == 0) behavior to the man page. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Wed 12-07-17 09:55:48, Mike Kravetz wrote: > On 07/12/2017 04:46 AM, Michal Hocko wrote: > > On Tue 11-07-17 11:23:19, Mike Kravetz wrote: > >> On 07/11/2017 05:36 AM, Michal Hocko wrote: > > [...] > >>> Anyway the patch should fail with -EINVAL on private mappings as Kirill > >>> already pointed out > >> > >> Yes. I think this should be a separate patch. As mentioned earlier, > >> mremap today creates a new/additional private mapping if called in this > >> way with old_size == 0. To me, this is a bug. > > > > Not only that. It clears existing ptes in the old mapping so the content > > is lost. That is quite unexpected behavior. Now it is hard to assume > > whether somebody relies on the behavior (I can easily imagine somebody > > doing backup in atomic way) so failing with EINVAL might break > > userspace so I am not longer sure. Anyway this really needs to be > > documented. > > I am pretty sure it does not clear ptes in the old mapping, or modify it > in any way. Are you thinking they are cleared as part of the call to > move_page_tables? Since old_size == 0 (len as passed to move_page_tables), > the for loop in move_page_tables is not run and it doesn't do much of > anything in this case. Dang. I have completely missed that we give old_len as the len parameter. Then it is clear that this old_len == 0 trick never really worked for MAP_PRIVATE because it simply fails the main invariant that the content at the new location matches the old one. Care to send a patch to clarify that and sent EINVAL or should I do it? > My plan is to look into adding hugetlbfs support to memfd_create, as this > would meet the user's needs. And, this is a much more sane API than this > mremap(old_size == 0) behavior. agreed > If adding hugetlbfs support to memfd_create works out, I would like to > see mremap(old_size == 0) support dropped. Nobody here (kernel mm > development) seems to like it. However, as you note there may be somebody > depending on this behavior. What would be the process for removing > such support? AFAIK, it is not documented anywhere. If we do document > the behavior, then we will certainly be stuck with it for a long time. I would rather document it than remove it. From the past we know that there are users and my experience tells me that once something is used it lives its life for ever basically. And moreover it is not like this costs us any maintenance burden to support the hack. Just make it more obvious so that we do not have to rediscover it each time. -- Michal Hocko SUSE Labs
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Wed 12-07-17 09:55:48, Mike Kravetz wrote: > On 07/12/2017 04:46 AM, Michal Hocko wrote: > > On Tue 11-07-17 11:23:19, Mike Kravetz wrote: > >> On 07/11/2017 05:36 AM, Michal Hocko wrote: > > [...] > >>> Anyway the patch should fail with -EINVAL on private mappings as Kirill > >>> already pointed out > >> > >> Yes. I think this should be a separate patch. As mentioned earlier, > >> mremap today creates a new/additional private mapping if called in this > >> way with old_size == 0. To me, this is a bug. > > > > Not only that. It clears existing ptes in the old mapping so the content > > is lost. That is quite unexpected behavior. Now it is hard to assume > > whether somebody relies on the behavior (I can easily imagine somebody > > doing backup in atomic way) so failing with EINVAL might break > > userspace so I am not longer sure. Anyway this really needs to be > > documented. > > I am pretty sure it does not clear ptes in the old mapping, or modify it > in any way. Are you thinking they are cleared as part of the call to > move_page_tables? Since old_size == 0 (len as passed to move_page_tables), > the for loop in move_page_tables is not run and it doesn't do much of > anything in this case. Dang. I have completely missed that we give old_len as the len parameter. Then it is clear that this old_len == 0 trick never really worked for MAP_PRIVATE because it simply fails the main invariant that the content at the new location matches the old one. Care to send a patch to clarify that and sent EINVAL or should I do it? > My plan is to look into adding hugetlbfs support to memfd_create, as this > would meet the user's needs. And, this is a much more sane API than this > mremap(old_size == 0) behavior. agreed > If adding hugetlbfs support to memfd_create works out, I would like to > see mremap(old_size == 0) support dropped. Nobody here (kernel mm > development) seems to like it. However, as you note there may be somebody > depending on this behavior. What would be the process for removing > such support? AFAIK, it is not documented anywhere. If we do document > the behavior, then we will certainly be stuck with it for a long time. I would rather document it than remove it. From the past we know that there are users and my experience tells me that once something is used it lives its life for ever basically. And moreover it is not like this costs us any maintenance burden to support the hack. Just make it more obvious so that we do not have to rediscover it each time. -- Michal Hocko SUSE Labs
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/12/2017 04:46 AM, Michal Hocko wrote: > On Tue 11-07-17 11:23:19, Mike Kravetz wrote: >> On 07/11/2017 05:36 AM, Michal Hocko wrote: > [...] >>> Anyway the patch should fail with -EINVAL on private mappings as Kirill >>> already pointed out >> >> Yes. I think this should be a separate patch. As mentioned earlier, >> mremap today creates a new/additional private mapping if called in this >> way with old_size == 0. To me, this is a bug. > > Not only that. It clears existing ptes in the old mapping so the content > is lost. That is quite unexpected behavior. Now it is hard to assume > whether somebody relies on the behavior (I can easily imagine somebody > doing backup in atomic way) so failing with EINVAL might break > userspace so I am not longer sure. Anyway this really needs to be > documented. I am pretty sure it does not clear ptes in the old mapping, or modify it in any way. Are you thinking they are cleared as part of the call to move_page_tables? Since old_size == 0 (len as passed to move_page_tables), the for loop in move_page_tables is not run and it doesn't do much of anything in this case. My plan is to look into adding hugetlbfs support to memfd_create, as this would meet the user's needs. And, this is a much more sane API than this mremap(old_size == 0) behavior. If adding hugetlbfs support to memfd_create works out, I would like to see mremap(old_size == 0) support dropped. Nobody here (kernel mm development) seems to like it. However, as you note there may be somebody depending on this behavior. What would be the process for removing such support? AFAIK, it is not documented anywhere. If we do document the behavior, then we will certainly be stuck with it for a long time. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/12/2017 04:46 AM, Michal Hocko wrote: > On Tue 11-07-17 11:23:19, Mike Kravetz wrote: >> On 07/11/2017 05:36 AM, Michal Hocko wrote: > [...] >>> Anyway the patch should fail with -EINVAL on private mappings as Kirill >>> already pointed out >> >> Yes. I think this should be a separate patch. As mentioned earlier, >> mremap today creates a new/additional private mapping if called in this >> way with old_size == 0. To me, this is a bug. > > Not only that. It clears existing ptes in the old mapping so the content > is lost. That is quite unexpected behavior. Now it is hard to assume > whether somebody relies on the behavior (I can easily imagine somebody > doing backup in atomic way) so failing with EINVAL might break > userspace so I am not longer sure. Anyway this really needs to be > documented. I am pretty sure it does not clear ptes in the old mapping, or modify it in any way. Are you thinking they are cleared as part of the call to move_page_tables? Since old_size == 0 (len as passed to move_page_tables), the for loop in move_page_tables is not run and it doesn't do much of anything in this case. My plan is to look into adding hugetlbfs support to memfd_create, as this would meet the user's needs. And, this is a much more sane API than this mremap(old_size == 0) behavior. If adding hugetlbfs support to memfd_create works out, I would like to see mremap(old_size == 0) support dropped. Nobody here (kernel mm development) seems to like it. However, as you note there may be somebody depending on this behavior. What would be the process for removing such support? AFAIK, it is not documented anywhere. If we do document the behavior, then we will certainly be stuck with it for a long time. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Tue 11-07-17 11:23:19, Mike Kravetz wrote: > On 07/11/2017 05:36 AM, Michal Hocko wrote: [...] > > Anyway the patch should fail with -EINVAL on private mappings as Kirill > > already pointed out > > Yes. I think this should be a separate patch. As mentioned earlier, > mremap today creates a new/additional private mapping if called in this > way with old_size == 0. To me, this is a bug. Not only that. It clears existing ptes in the old mapping so the content is lost. That is quite unexpected behavior. Now it is hard to assume whether somebody relies on the behavior (I can easily imagine somebody doing backup in atomic way) so failing with EINVAL might break userspace so I am not longer sure. Anyway this really needs to be documented. -- Michal Hocko SUSE Labs
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Tue 11-07-17 11:23:19, Mike Kravetz wrote: > On 07/11/2017 05:36 AM, Michal Hocko wrote: [...] > > Anyway the patch should fail with -EINVAL on private mappings as Kirill > > already pointed out > > Yes. I think this should be a separate patch. As mentioned earlier, > mremap today creates a new/additional private mapping if called in this > way with old_size == 0. To me, this is a bug. Not only that. It clears existing ptes in the old mapping so the content is lost. That is quite unexpected behavior. Now it is hard to assume whether somebody relies on the behavior (I can easily imagine somebody doing backup in atomic way) so failing with EINVAL might break userspace so I am not longer sure. Anyway this really needs to be documented. -- Michal Hocko SUSE Labs
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Tue, Jul 11, 2017 at 02:57:38PM -0700, Mike Kravetz wrote: > Well, the JVM has had a config option for the use of hugetlbfs for quite > some time. I assume they have already had to deal with these issues. Yes, the config tweak exists well before THP existed but in production I know nobody who used it because as you start more processes you risk running out of hugetlbfs reservation and in addition the reservation "wastes memory" at times. > What prompted this discussion is that they want the mremap mirroring/ > duplication functionality extended to support hugetlbfs. This is pretty > straight forward. But, I wanted to have a discussion about whether the > mremap(old_size == 0) functionality should be formally documented first. Agreed. > Do note that if you actually create/mount a hugetlbfs filesystem and > use a fd in that filesystem you can get the desired functionality. However, > they want to avoid this extra step if possible and use mmap(anon, hugetlb). I see, I thought they needed to use the mremap on pure SHM because of the there was no MAP_HUGETLB in the mmap flags of the use case. > I'm guessing that if memfd_create supported hugetlbfs, that would also > meet their needs. Any thoughts about extending memfd_create support to > hugetlbfs? I can't think of any big issues. In fact, 'under the covers' > there actually is a hugetlbfs file created for anon mappings. However, > that is not exposed to the user. Yes, that should fit fine as MFD_HUGETLB or similar. > Yes, that is why I think it is a bug. Not that kernel is unstable, but > rather the unintentional/unexpected result. The most unexpected is the old mapping isn't wiped, at least it doesn't seem to cause trouble to anon as move_page_tables is nullified (old_end == old_addr so the loop never runs). > > memfd_create doesn't have such issue, the new mmap MAP_PRIVATE will > > get the file pages correctly after a new mmap (even if there were cows > > in the old MAP_PRIVATE mmap). > > > >> One reason for the RFC was to determine if people thought we should: > >> 1) Just document the existing old_size == 0 functionality > >> 2) Create a more explicit interface such as a new mremap flag for this > >>functionality > >> > >> I am waiting to see what direction people prefer before making any > >> man page updates. > > > > I guess old_size == 0 would better be dropped if possible, if > > memfd_create fits perfectly your needs as I supposed above. If it's > > not dropped then it's not very far from allowing mmap of /proc/self/mm > > again (removed around so far as 2.3.x?). > > Yes, in my google'ing it appears the first users of mremap(old_size == 0) > previously used mmap of /proc/self/mm. > > If memfd_create can be extended to support hugetlbfs, then I might suggest > dropping the memfd_create(old_size == 0) support. Just a thought. memfd_create interface sounds more robust than this mremap trick, they would have to deal with one more fd that's all. old_len == 0 by nullifying move_page_tables will cause not harm to anon pages however the place where we would drop the vma is do_munmap here: if (vm_flags & VM_ACCOUNT) { vma->vm_flags &= ~VM_ACCOUNT; excess = vma->vm_end - vma->vm_start - old_len; [..] if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) { /* OOM: unable to split vma, just get accounts right */ vm_unacct_memory(excess >> PAGE_SHIFT); excess = 0; } It looks like a split_vma allocation failure can leave the old vma around in a equal way to old_len == 0 (but in such case all anon payload will have been moved to the new vma). That also seems safe as far as the kernel is concerned but it could cause userland failure if you depend on SIGSEGV to trigger later on the original vma you thought was implicitly munmapped (and in MAP_SHARED case it could even lead to unexpected file corruption instead of an expected SIGSEGV). If nobody ever depends on whatever is left on the old vma it's ok, but it could still leave file handle pinned unexpectedly if it's not anon. The other issue of the old_len = 0 trick is that will unexpectedly wipe the VM_ACCOUNT from the original vma as side effect of the above, but it'd only be noticeable if you care about strict accounting. So there is at least such one glitch in it. Thanks, Andrea
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Tue, Jul 11, 2017 at 02:57:38PM -0700, Mike Kravetz wrote: > Well, the JVM has had a config option for the use of hugetlbfs for quite > some time. I assume they have already had to deal with these issues. Yes, the config tweak exists well before THP existed but in production I know nobody who used it because as you start more processes you risk running out of hugetlbfs reservation and in addition the reservation "wastes memory" at times. > What prompted this discussion is that they want the mremap mirroring/ > duplication functionality extended to support hugetlbfs. This is pretty > straight forward. But, I wanted to have a discussion about whether the > mremap(old_size == 0) functionality should be formally documented first. Agreed. > Do note that if you actually create/mount a hugetlbfs filesystem and > use a fd in that filesystem you can get the desired functionality. However, > they want to avoid this extra step if possible and use mmap(anon, hugetlb). I see, I thought they needed to use the mremap on pure SHM because of the there was no MAP_HUGETLB in the mmap flags of the use case. > I'm guessing that if memfd_create supported hugetlbfs, that would also > meet their needs. Any thoughts about extending memfd_create support to > hugetlbfs? I can't think of any big issues. In fact, 'under the covers' > there actually is a hugetlbfs file created for anon mappings. However, > that is not exposed to the user. Yes, that should fit fine as MFD_HUGETLB or similar. > Yes, that is why I think it is a bug. Not that kernel is unstable, but > rather the unintentional/unexpected result. The most unexpected is the old mapping isn't wiped, at least it doesn't seem to cause trouble to anon as move_page_tables is nullified (old_end == old_addr so the loop never runs). > > memfd_create doesn't have such issue, the new mmap MAP_PRIVATE will > > get the file pages correctly after a new mmap (even if there were cows > > in the old MAP_PRIVATE mmap). > > > >> One reason for the RFC was to determine if people thought we should: > >> 1) Just document the existing old_size == 0 functionality > >> 2) Create a more explicit interface such as a new mremap flag for this > >>functionality > >> > >> I am waiting to see what direction people prefer before making any > >> man page updates. > > > > I guess old_size == 0 would better be dropped if possible, if > > memfd_create fits perfectly your needs as I supposed above. If it's > > not dropped then it's not very far from allowing mmap of /proc/self/mm > > again (removed around so far as 2.3.x?). > > Yes, in my google'ing it appears the first users of mremap(old_size == 0) > previously used mmap of /proc/self/mm. > > If memfd_create can be extended to support hugetlbfs, then I might suggest > dropping the memfd_create(old_size == 0) support. Just a thought. memfd_create interface sounds more robust than this mremap trick, they would have to deal with one more fd that's all. old_len == 0 by nullifying move_page_tables will cause not harm to anon pages however the place where we would drop the vma is do_munmap here: if (vm_flags & VM_ACCOUNT) { vma->vm_flags &= ~VM_ACCOUNT; excess = vma->vm_end - vma->vm_start - old_len; [..] if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) { /* OOM: unable to split vma, just get accounts right */ vm_unacct_memory(excess >> PAGE_SHIFT); excess = 0; } It looks like a split_vma allocation failure can leave the old vma around in a equal way to old_len == 0 (but in such case all anon payload will have been moved to the new vma). That also seems safe as far as the kernel is concerned but it could cause userland failure if you depend on SIGSEGV to trigger later on the original vma you thought was implicitly munmapped (and in MAP_SHARED case it could even lead to unexpected file corruption instead of an expected SIGSEGV). If nobody ever depends on whatever is left on the old vma it's ok, but it could still leave file handle pinned unexpectedly if it's not anon. The other issue of the old_len = 0 trick is that will unexpectedly wipe the VM_ACCOUNT from the original vma as side effect of the above, but it'd only be noticeable if you care about strict accounting. So there is at least such one glitch in it. Thanks, Andrea
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/11/2017 02:02 PM, Andrea Arcangeli wrote: > On Tue, Jul 11, 2017 at 11:23:19AM -0700, Mike Kravetz wrote: >> I was surprised as well when a JVM developer pointed this out. >> >> From the old e-mail thread, here is original use case: >> shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732 >> shmat(11337732, 0, 0) = 0x40299000 >> shmctl(11337732, IPC_RMID, 0) = 0 >> mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0) = 0 >> mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0x10) = >> 0x10 >> >> The JVM team wants to do something similar. They are using >> mmap(MAP_ANONYMOUS|MAP_SHARED) to create the initial mapping instead >> of shmget/shmat. As Vlastimil mentioned previously, one would not >> expect a shared mapping for parts of the JVM heap. I am working >> to get clarification from the JVM team. > > Why don't they use memfd_create instead? That's made so that the fd is > born anon unlinked so when the last reference is dropped all memory > associated with it is automatically freed. No need of IC_RMID and then > they can use mmap instead of mremap(len=0) to get a double map of it. Wow! I did not even know about memfd_create until you mentioned it. That would certainly work for 'normal' pages. > If they use mmap(MAP_ANONYMOUS|MAP_SHARED) it's not hugetlbfs, that > would have been the only issue. > > Using hugetlbfs for JVM wouldn't be really flexible, better they try > to leverage THP on SHM or the hugetlbfs reservation gets in the way of > efficient use of the unused memory for memory allocations that don't > have a definitive size (i.e. JVM forks or more JVM are run in > parallel). Well, the JVM has had a config option for the use of hugetlbfs for quite some time. I assume they have already had to deal with these issues. What prompted this discussion is that they want the mremap mirroring/ duplication functionality extended to support hugetlbfs. This is pretty straight forward. But, I wanted to have a discussion about whether the mremap(old_size == 0) functionality should be formally documented first. Do note that if you actually create/mount a hugetlbfs filesystem and use a fd in that filesystem you can get the desired functionality. However, they want to avoid this extra step if possible and use mmap(anon, hugetlb). I'm guessing that if memfd_create supported hugetlbfs, that would also meet their needs. Any thoughts about extending memfd_create support to hugetlbfs? I can't think of any big issues. In fact, 'under the covers' there actually is a hugetlbfs file created for anon mappings. However, that is not exposed to the user. >> Yes. I think this should be a separate patch. As mentioned earlier, >> mremap today creates a new/additional private mapping if called in this >> way with old_size == 0. To me, this is a bug. > > Kernel by sheer luck should stay stable, but the result is weird and > it's unlikely intentional. Yes, that is why I think it is a bug. Not that kernel is unstable, but rather the unintentional/unexpected result. > memfd_create doesn't have such issue, the new mmap MAP_PRIVATE will > get the file pages correctly after a new mmap (even if there were cows > in the old MAP_PRIVATE mmap). > >> One reason for the RFC was to determine if people thought we should: >> 1) Just document the existing old_size == 0 functionality >> 2) Create a more explicit interface such as a new mremap flag for this >>functionality >> >> I am waiting to see what direction people prefer before making any >> man page updates. > > I guess old_size == 0 would better be dropped if possible, if > memfd_create fits perfectly your needs as I supposed above. If it's > not dropped then it's not very far from allowing mmap of /proc/self/mm > again (removed around so far as 2.3.x?). Yes, in my google'ing it appears the first users of mremap(old_size == 0) previously used mmap of /proc/self/mm. If memfd_create can be extended to support hugetlbfs, then I might suggest dropping the memfd_create(old_size == 0) support. Just a thought. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/11/2017 02:02 PM, Andrea Arcangeli wrote: > On Tue, Jul 11, 2017 at 11:23:19AM -0700, Mike Kravetz wrote: >> I was surprised as well when a JVM developer pointed this out. >> >> From the old e-mail thread, here is original use case: >> shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732 >> shmat(11337732, 0, 0) = 0x40299000 >> shmctl(11337732, IPC_RMID, 0) = 0 >> mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0) = 0 >> mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0x10) = >> 0x10 >> >> The JVM team wants to do something similar. They are using >> mmap(MAP_ANONYMOUS|MAP_SHARED) to create the initial mapping instead >> of shmget/shmat. As Vlastimil mentioned previously, one would not >> expect a shared mapping for parts of the JVM heap. I am working >> to get clarification from the JVM team. > > Why don't they use memfd_create instead? That's made so that the fd is > born anon unlinked so when the last reference is dropped all memory > associated with it is automatically freed. No need of IC_RMID and then > they can use mmap instead of mremap(len=0) to get a double map of it. Wow! I did not even know about memfd_create until you mentioned it. That would certainly work for 'normal' pages. > If they use mmap(MAP_ANONYMOUS|MAP_SHARED) it's not hugetlbfs, that > would have been the only issue. > > Using hugetlbfs for JVM wouldn't be really flexible, better they try > to leverage THP on SHM or the hugetlbfs reservation gets in the way of > efficient use of the unused memory for memory allocations that don't > have a definitive size (i.e. JVM forks or more JVM are run in > parallel). Well, the JVM has had a config option for the use of hugetlbfs for quite some time. I assume they have already had to deal with these issues. What prompted this discussion is that they want the mremap mirroring/ duplication functionality extended to support hugetlbfs. This is pretty straight forward. But, I wanted to have a discussion about whether the mremap(old_size == 0) functionality should be formally documented first. Do note that if you actually create/mount a hugetlbfs filesystem and use a fd in that filesystem you can get the desired functionality. However, they want to avoid this extra step if possible and use mmap(anon, hugetlb). I'm guessing that if memfd_create supported hugetlbfs, that would also meet their needs. Any thoughts about extending memfd_create support to hugetlbfs? I can't think of any big issues. In fact, 'under the covers' there actually is a hugetlbfs file created for anon mappings. However, that is not exposed to the user. >> Yes. I think this should be a separate patch. As mentioned earlier, >> mremap today creates a new/additional private mapping if called in this >> way with old_size == 0. To me, this is a bug. > > Kernel by sheer luck should stay stable, but the result is weird and > it's unlikely intentional. Yes, that is why I think it is a bug. Not that kernel is unstable, but rather the unintentional/unexpected result. > memfd_create doesn't have such issue, the new mmap MAP_PRIVATE will > get the file pages correctly after a new mmap (even if there were cows > in the old MAP_PRIVATE mmap). > >> One reason for the RFC was to determine if people thought we should: >> 1) Just document the existing old_size == 0 functionality >> 2) Create a more explicit interface such as a new mremap flag for this >>functionality >> >> I am waiting to see what direction people prefer before making any >> man page updates. > > I guess old_size == 0 would better be dropped if possible, if > memfd_create fits perfectly your needs as I supposed above. If it's > not dropped then it's not very far from allowing mmap of /proc/self/mm > again (removed around so far as 2.3.x?). Yes, in my google'ing it appears the first users of mremap(old_size == 0) previously used mmap of /proc/self/mm. If memfd_create can be extended to support hugetlbfs, then I might suggest dropping the memfd_create(old_size == 0) support. Just a thought. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Tue, Jul 11, 2017 at 11:23:19AM -0700, Mike Kravetz wrote: > I was surprised as well when a JVM developer pointed this out. > > From the old e-mail thread, here is original use case: > shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732 > shmat(11337732, 0, 0) = 0x40299000 > shmctl(11337732, IPC_RMID, 0) = 0 > mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0) = 0 > mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0x10) = 0x10 > > The JVM team wants to do something similar. They are using > mmap(MAP_ANONYMOUS|MAP_SHARED) to create the initial mapping instead > of shmget/shmat. As Vlastimil mentioned previously, one would not > expect a shared mapping for parts of the JVM heap. I am working > to get clarification from the JVM team. Why don't they use memfd_create instead? That's made so that the fd is born anon unlinked so when the last reference is dropped all memory associated with it is automatically freed. No need of IC_RMID and then they can use mmap instead of mremap(len=0) to get a double map of it. If they use mmap(MAP_ANONYMOUS|MAP_SHARED) it's not hugetlbfs, that would have been the only issue. Using hugetlbfs for JVM wouldn't be really flexible, better they try to leverage THP on SHM or the hugetlbfs reservation gets in the way of efficient use of the unused memory for memory allocations that don't have a definitive size (i.e. JVM forks or more JVM are run in parallel). > Yes. I think this should be a separate patch. As mentioned earlier, > mremap today creates a new/additional private mapping if called in this > way with old_size == 0. To me, this is a bug. Kernel by sheer luck should stay stable, but the result is weird and it's unlikely intentional. memfd_create doesn't have such issue, the new mmap MAP_PRIVATE will get the file pages correctly after a new mmap (even if there were cows in the old MAP_PRIVATE mmap). > One reason for the RFC was to determine if people thought we should: > 1) Just document the existing old_size == 0 functionality > 2) Create a more explicit interface such as a new mremap flag for this >functionality > > I am waiting to see what direction people prefer before making any > man page updates. I guess old_size == 0 would better be dropped if possible, if memfd_create fits perfectly your needs as I supposed above. If it's not dropped then it's not very far from allowing mmap of /proc/self/mm again (removed around so far as 2.3.x?). Thanks, Andrea
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Tue, Jul 11, 2017 at 11:23:19AM -0700, Mike Kravetz wrote: > I was surprised as well when a JVM developer pointed this out. > > From the old e-mail thread, here is original use case: > shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732 > shmat(11337732, 0, 0) = 0x40299000 > shmctl(11337732, IPC_RMID, 0) = 0 > mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0) = 0 > mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0x10) = 0x10 > > The JVM team wants to do something similar. They are using > mmap(MAP_ANONYMOUS|MAP_SHARED) to create the initial mapping instead > of shmget/shmat. As Vlastimil mentioned previously, one would not > expect a shared mapping for parts of the JVM heap. I am working > to get clarification from the JVM team. Why don't they use memfd_create instead? That's made so that the fd is born anon unlinked so when the last reference is dropped all memory associated with it is automatically freed. No need of IC_RMID and then they can use mmap instead of mremap(len=0) to get a double map of it. If they use mmap(MAP_ANONYMOUS|MAP_SHARED) it's not hugetlbfs, that would have been the only issue. Using hugetlbfs for JVM wouldn't be really flexible, better they try to leverage THP on SHM or the hugetlbfs reservation gets in the way of efficient use of the unused memory for memory allocations that don't have a definitive size (i.e. JVM forks or more JVM are run in parallel). > Yes. I think this should be a separate patch. As mentioned earlier, > mremap today creates a new/additional private mapping if called in this > way with old_size == 0. To me, this is a bug. Kernel by sheer luck should stay stable, but the result is weird and it's unlikely intentional. memfd_create doesn't have such issue, the new mmap MAP_PRIVATE will get the file pages correctly after a new mmap (even if there were cows in the old MAP_PRIVATE mmap). > One reason for the RFC was to determine if people thought we should: > 1) Just document the existing old_size == 0 functionality > 2) Create a more explicit interface such as a new mremap flag for this >functionality > > I am waiting to see what direction people prefer before making any > man page updates. I guess old_size == 0 would better be dropped if possible, if memfd_create fits perfectly your needs as I supposed above. If it's not dropped then it's not very far from allowing mmap of /proc/self/mm again (removed around so far as 2.3.x?). Thanks, Andrea
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/11/2017 05:36 AM, Michal Hocko wrote: > On Thu 06-07-17 09:17:26, Mike Kravetz wrote: >> The mremap system call has the ability to 'mirror' parts of an existing >> mapping. To do so, it creates a new mapping that maps the same pages as >> the original mapping, just at a different virtual address. This >> functionality has existed since at least the 2.6 kernel. >> >> This patch simply adds a new flag to mremap which will make this >> functionality part of the API. It maintains backward compatibility with >> the existing way of requesting mirroring (old_size == 0). >> >> If this new MREMAP_MIRROR flag is specified, then new_size must equal >> old_size. In addition, the MREMAP_MAYMOVE flag must be specified. > > I have to admit that this came as a suprise to me. There is no mention > about this special case in the man page and the mremap code is so > convoluted that I simply didn't see it there. I guess the only > reasonable usecase is when you do not have a fd for the shared memory. I was surprised as well when a JVM developer pointed this out. >From the old e-mail thread, here is original use case: shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732 shmat(11337732, 0, 0) = 0x40299000 shmctl(11337732, IPC_RMID, 0) = 0 mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0) = 0 mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0x10) = 0x10 The JVM team wants to do something similar. They are using mmap(MAP_ANONYMOUS|MAP_SHARED) to create the initial mapping instead of shmget/shmat. As Vlastimil mentioned previously, one would not expect a shared mapping for parts of the JVM heap. I am working to get clarification from the JVM team. > Anyway the patch should fail with -EINVAL on private mappings as Kirill > already pointed out Yes. I think this should be a separate patch. As mentioned earlier, mremap today creates a new/additional private mapping if called in this way with old_size == 0. To me, this is a bug. > and this should go along with an update to the > man page which describes also the historical behavior. Yes, man page updates are a must. One reason for the RFC was to determine if people thought we should: 1) Just document the existing old_size == 0 functionality 2) Create a more explicit interface such as a new mremap flag for this functionality I am waiting to see what direction people prefer before making any man page updates. >Make sure you > document that this is not really a mirroring (e.g. faulting page in one > address will automatically map it to the other mapping(s)) but merely a > copy of the range. Maybe MREMAP_COPY would be more appropriate name. Good point. mirror is the first word that came to mind, but it does not exactly apply. -- Mike Kravetz > >> Signed-off-by: Mike Kravetz>> --- >> include/uapi/linux/mman.h | 5 +++-- >> mm/mremap.c | 23 --- >> tools/include/uapi/linux/mman.h | 5 +++-- >> 3 files changed, 22 insertions(+), 11 deletions(-) >> >> diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h >> index ade4acd..6b3e0df 100644 >> --- a/include/uapi/linux/mman.h >> +++ b/include/uapi/linux/mman.h >> @@ -3,8 +3,9 @@ >> >> #include >> >> -#define MREMAP_MAYMOVE 1 >> -#define MREMAP_FIXED2 >> +#define MREMAP_MAYMOVE 0x01 >> +#define MREMAP_FIXED0x02 >> +#define MREMAP_MIRROR 0x04 >> >> #define OVERCOMMIT_GUESS0 >> #define OVERCOMMIT_ALWAYS 1 >> diff --git a/mm/mremap.c b/mm/mremap.c >> index cd8a1b1..f18ab36 100644 >> --- a/mm/mremap.c >> +++ b/mm/mremap.c >> @@ -516,10 +516,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned >> long, old_len, >> struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; >> LIST_HEAD(uf_unmap); >> >> -if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE)) >> +if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_MIRROR)) >> return ret; >> >> -if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE)) >> +if ((flags & MREMAP_FIXED || flags & MREMAP_MIRROR) && >> +!(flags & MREMAP_MAYMOVE)) >> return ret; >> >> if (offset_in_page(addr)) >> @@ -528,14 +529,22 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned >> long, old_len, >> old_len = PAGE_ALIGN(old_len); >> new_len = PAGE_ALIGN(new_len); >> >> -/* >> - * We allow a zero old-len as a special case >> - * for DOS-emu "duplicate shm area" thing. But >> - * a zero new-len is nonsensical. >> - */ >> +/* A zero new-len is nonsensical. */ >> if (!new_len) >> return ret; >> >> +/* >> + * For backward compatibility, we allow a zero old-len to imply >> + * mirroring. This was originally a special case for DOS-emu. >> + */ >> +if (!old_len) >> +
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/11/2017 05:36 AM, Michal Hocko wrote: > On Thu 06-07-17 09:17:26, Mike Kravetz wrote: >> The mremap system call has the ability to 'mirror' parts of an existing >> mapping. To do so, it creates a new mapping that maps the same pages as >> the original mapping, just at a different virtual address. This >> functionality has existed since at least the 2.6 kernel. >> >> This patch simply adds a new flag to mremap which will make this >> functionality part of the API. It maintains backward compatibility with >> the existing way of requesting mirroring (old_size == 0). >> >> If this new MREMAP_MIRROR flag is specified, then new_size must equal >> old_size. In addition, the MREMAP_MAYMOVE flag must be specified. > > I have to admit that this came as a suprise to me. There is no mention > about this special case in the man page and the mremap code is so > convoluted that I simply didn't see it there. I guess the only > reasonable usecase is when you do not have a fd for the shared memory. I was surprised as well when a JVM developer pointed this out. >From the old e-mail thread, here is original use case: shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732 shmat(11337732, 0, 0) = 0x40299000 shmctl(11337732, IPC_RMID, 0) = 0 mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0) = 0 mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0x10) = 0x10 The JVM team wants to do something similar. They are using mmap(MAP_ANONYMOUS|MAP_SHARED) to create the initial mapping instead of shmget/shmat. As Vlastimil mentioned previously, one would not expect a shared mapping for parts of the JVM heap. I am working to get clarification from the JVM team. > Anyway the patch should fail with -EINVAL on private mappings as Kirill > already pointed out Yes. I think this should be a separate patch. As mentioned earlier, mremap today creates a new/additional private mapping if called in this way with old_size == 0. To me, this is a bug. > and this should go along with an update to the > man page which describes also the historical behavior. Yes, man page updates are a must. One reason for the RFC was to determine if people thought we should: 1) Just document the existing old_size == 0 functionality 2) Create a more explicit interface such as a new mremap flag for this functionality I am waiting to see what direction people prefer before making any man page updates. >Make sure you > document that this is not really a mirroring (e.g. faulting page in one > address will automatically map it to the other mapping(s)) but merely a > copy of the range. Maybe MREMAP_COPY would be more appropriate name. Good point. mirror is the first word that came to mind, but it does not exactly apply. -- Mike Kravetz > >> Signed-off-by: Mike Kravetz >> --- >> include/uapi/linux/mman.h | 5 +++-- >> mm/mremap.c | 23 --- >> tools/include/uapi/linux/mman.h | 5 +++-- >> 3 files changed, 22 insertions(+), 11 deletions(-) >> >> diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h >> index ade4acd..6b3e0df 100644 >> --- a/include/uapi/linux/mman.h >> +++ b/include/uapi/linux/mman.h >> @@ -3,8 +3,9 @@ >> >> #include >> >> -#define MREMAP_MAYMOVE 1 >> -#define MREMAP_FIXED2 >> +#define MREMAP_MAYMOVE 0x01 >> +#define MREMAP_FIXED0x02 >> +#define MREMAP_MIRROR 0x04 >> >> #define OVERCOMMIT_GUESS0 >> #define OVERCOMMIT_ALWAYS 1 >> diff --git a/mm/mremap.c b/mm/mremap.c >> index cd8a1b1..f18ab36 100644 >> --- a/mm/mremap.c >> +++ b/mm/mremap.c >> @@ -516,10 +516,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned >> long, old_len, >> struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; >> LIST_HEAD(uf_unmap); >> >> -if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE)) >> +if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_MIRROR)) >> return ret; >> >> -if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE)) >> +if ((flags & MREMAP_FIXED || flags & MREMAP_MIRROR) && >> +!(flags & MREMAP_MAYMOVE)) >> return ret; >> >> if (offset_in_page(addr)) >> @@ -528,14 +529,22 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned >> long, old_len, >> old_len = PAGE_ALIGN(old_len); >> new_len = PAGE_ALIGN(new_len); >> >> -/* >> - * We allow a zero old-len as a special case >> - * for DOS-emu "duplicate shm area" thing. But >> - * a zero new-len is nonsensical. >> - */ >> +/* A zero new-len is nonsensical. */ >> if (!new_len) >> return ret; >> >> +/* >> + * For backward compatibility, we allow a zero old-len to imply >> + * mirroring. This was originally a special case for DOS-emu. >> + */ >> +if (!old_len) >> +flags |= MREMAP_MIRROR; >> +
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Thu 06-07-17 09:17:26, Mike Kravetz wrote: > The mremap system call has the ability to 'mirror' parts of an existing > mapping. To do so, it creates a new mapping that maps the same pages as > the original mapping, just at a different virtual address. This > functionality has existed since at least the 2.6 kernel. > > This patch simply adds a new flag to mremap which will make this > functionality part of the API. It maintains backward compatibility with > the existing way of requesting mirroring (old_size == 0). > > If this new MREMAP_MIRROR flag is specified, then new_size must equal > old_size. In addition, the MREMAP_MAYMOVE flag must be specified. I have to admit that this came as a suprise to me. There is no mention about this special case in the man page and the mremap code is so convoluted that I simply didn't see it there. I guess the only reasonable usecase is when you do not have a fd for the shared memory. Anyway the patch should fail with -EINVAL on private mappings as Kirill already pointed out and this should go along with an update to the man page which describes also the historical behavior. Make sure you document that this is not really a mirroring (e.g. faulting page in one address will automatically map it to the other mapping(s)) but merely a copy of the range. Maybe MREMAP_COPY would be more appropriate name. > Signed-off-by: Mike Kravetz> --- > include/uapi/linux/mman.h | 5 +++-- > mm/mremap.c | 23 --- > tools/include/uapi/linux/mman.h | 5 +++-- > 3 files changed, 22 insertions(+), 11 deletions(-) > > diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h > index ade4acd..6b3e0df 100644 > --- a/include/uapi/linux/mman.h > +++ b/include/uapi/linux/mman.h > @@ -3,8 +3,9 @@ > > #include > > -#define MREMAP_MAYMOVE 1 > -#define MREMAP_FIXED 2 > +#define MREMAP_MAYMOVE 0x01 > +#define MREMAP_FIXED 0x02 > +#define MREMAP_MIRROR0x04 > > #define OVERCOMMIT_GUESS 0 > #define OVERCOMMIT_ALWAYS1 > diff --git a/mm/mremap.c b/mm/mremap.c > index cd8a1b1..f18ab36 100644 > --- a/mm/mremap.c > +++ b/mm/mremap.c > @@ -516,10 +516,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned > long, old_len, > struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; > LIST_HEAD(uf_unmap); > > - if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE)) > + if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_MIRROR)) > return ret; > > - if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE)) > + if ((flags & MREMAP_FIXED || flags & MREMAP_MIRROR) && > + !(flags & MREMAP_MAYMOVE)) > return ret; > > if (offset_in_page(addr)) > @@ -528,14 +529,22 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned > long, old_len, > old_len = PAGE_ALIGN(old_len); > new_len = PAGE_ALIGN(new_len); > > - /* > - * We allow a zero old-len as a special case > - * for DOS-emu "duplicate shm area" thing. But > - * a zero new-len is nonsensical. > - */ > + /* A zero new-len is nonsensical. */ > if (!new_len) > return ret; > > + /* > + * For backward compatibility, we allow a zero old-len to imply > + * mirroring. This was originally a special case for DOS-emu. > + */ > + if (!old_len) > + flags |= MREMAP_MIRROR; > + else if (flags & MREMAP_MIRROR) { > + if (old_len != new_len) > + return ret; > + old_len = 0; > + } > + > if (down_write_killable(>mm->mmap_sem)) > return -EINTR; > > diff --git a/tools/include/uapi/linux/mman.h b/tools/include/uapi/linux/mman.h > index 81d8edf..069f7a5 100644 > --- a/tools/include/uapi/linux/mman.h > +++ b/tools/include/uapi/linux/mman.h > @@ -3,8 +3,9 @@ > > #include > > -#define MREMAP_MAYMOVE 1 > -#define MREMAP_FIXED 2 > +#define MREMAP_MAYMOVE 0x01 > +#define MREMAP_FIXED 0x02 > +#define MREMAP_MIRROR0x04 > > #define OVERCOMMIT_GUESS 0 > #define OVERCOMMIT_ALWAYS1 > -- > 2.7.5 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majord...@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: mailto:"d...@kvack.org;> em...@kvack.org -- Michal Hocko SUSE Labs
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Thu 06-07-17 09:17:26, Mike Kravetz wrote: > The mremap system call has the ability to 'mirror' parts of an existing > mapping. To do so, it creates a new mapping that maps the same pages as > the original mapping, just at a different virtual address. This > functionality has existed since at least the 2.6 kernel. > > This patch simply adds a new flag to mremap which will make this > functionality part of the API. It maintains backward compatibility with > the existing way of requesting mirroring (old_size == 0). > > If this new MREMAP_MIRROR flag is specified, then new_size must equal > old_size. In addition, the MREMAP_MAYMOVE flag must be specified. I have to admit that this came as a suprise to me. There is no mention about this special case in the man page and the mremap code is so convoluted that I simply didn't see it there. I guess the only reasonable usecase is when you do not have a fd for the shared memory. Anyway the patch should fail with -EINVAL on private mappings as Kirill already pointed out and this should go along with an update to the man page which describes also the historical behavior. Make sure you document that this is not really a mirroring (e.g. faulting page in one address will automatically map it to the other mapping(s)) but merely a copy of the range. Maybe MREMAP_COPY would be more appropriate name. > Signed-off-by: Mike Kravetz > --- > include/uapi/linux/mman.h | 5 +++-- > mm/mremap.c | 23 --- > tools/include/uapi/linux/mman.h | 5 +++-- > 3 files changed, 22 insertions(+), 11 deletions(-) > > diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h > index ade4acd..6b3e0df 100644 > --- a/include/uapi/linux/mman.h > +++ b/include/uapi/linux/mman.h > @@ -3,8 +3,9 @@ > > #include > > -#define MREMAP_MAYMOVE 1 > -#define MREMAP_FIXED 2 > +#define MREMAP_MAYMOVE 0x01 > +#define MREMAP_FIXED 0x02 > +#define MREMAP_MIRROR0x04 > > #define OVERCOMMIT_GUESS 0 > #define OVERCOMMIT_ALWAYS1 > diff --git a/mm/mremap.c b/mm/mremap.c > index cd8a1b1..f18ab36 100644 > --- a/mm/mremap.c > +++ b/mm/mremap.c > @@ -516,10 +516,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned > long, old_len, > struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; > LIST_HEAD(uf_unmap); > > - if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE)) > + if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_MIRROR)) > return ret; > > - if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE)) > + if ((flags & MREMAP_FIXED || flags & MREMAP_MIRROR) && > + !(flags & MREMAP_MAYMOVE)) > return ret; > > if (offset_in_page(addr)) > @@ -528,14 +529,22 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned > long, old_len, > old_len = PAGE_ALIGN(old_len); > new_len = PAGE_ALIGN(new_len); > > - /* > - * We allow a zero old-len as a special case > - * for DOS-emu "duplicate shm area" thing. But > - * a zero new-len is nonsensical. > - */ > + /* A zero new-len is nonsensical. */ > if (!new_len) > return ret; > > + /* > + * For backward compatibility, we allow a zero old-len to imply > + * mirroring. This was originally a special case for DOS-emu. > + */ > + if (!old_len) > + flags |= MREMAP_MIRROR; > + else if (flags & MREMAP_MIRROR) { > + if (old_len != new_len) > + return ret; > + old_len = 0; > + } > + > if (down_write_killable(>mm->mmap_sem)) > return -EINTR; > > diff --git a/tools/include/uapi/linux/mman.h b/tools/include/uapi/linux/mman.h > index 81d8edf..069f7a5 100644 > --- a/tools/include/uapi/linux/mman.h > +++ b/tools/include/uapi/linux/mman.h > @@ -3,8 +3,9 @@ > > #include > > -#define MREMAP_MAYMOVE 1 > -#define MREMAP_FIXED 2 > +#define MREMAP_MAYMOVE 0x01 > +#define MREMAP_FIXED 0x02 > +#define MREMAP_MIRROR0x04 > > #define OVERCOMMIT_GUESS 0 > #define OVERCOMMIT_ALWAYS1 > -- > 2.7.5 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majord...@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: mailto:"d...@kvack.org;> em...@kvack.org -- Michal Hocko SUSE Labs
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/10/2017 09:22 AM, Vlastimil Babka wrote: > On 07/09/2017 09:32 AM, Anshuman Khandual wrote: >> On 07/07/2017 11:39 PM, Mike Kravetz wrote: >>> On 07/07/2017 10:45 AM, Kirill A. Shutemov wrote: On Fri, Jul 07, 2017 at 10:29:52AM -0700, Mike Kravetz wrote: > On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: >> What is going to happen to mirrored after CoW for instance? >> >> In my opinion, it shouldn't be allowed for anon/private mappings at >> least. >> And with this limitation, I don't see much sense in the new interface -- >> just create mirror by mmap()ing the file again. > > The code today works for anon shared mappings. See simple program below. > > You are correct in that it makes little or no sense for private mappings. > When looking closer at existing code, mremap() creates a new private > mapping in this case. This is most likely a bug. IIRC, existing code doesn't create mirrors of private pages as it requires old_len to be zero. There's no way to get private pages mapped twice this way. >>> >>> Correct. >>> As mentioned above, mremap does 'something' for private anon pages when >>> old_len == 0. However, this may be considered a bug. In this case, mremap >>> creates a new private anon mapping of length new_size. Since old_len == 0, >>> it does not unmap any of the old mapping. So, in this case mremap basically >>> creates a new private mapping (unrealted to the original) and does not >>> modify the old mapping. >>> >> >> Yeah, in my experiment, after the mremap() exists we have two different VMAs >> which can contain two different set of data. No page sharing is happening. > > So how does this actually work for the JVM garbage collector use case? > Aren't the garbage collected objects private anon? Good point. The sample program the JVM team gave me uses a shared anon mapping. As you mention one would expect these mappings to be private. I have asked them for more details on their use case. > Anyway this should be documented. Yes, their prototype work seems to take advantage of this existing undocumented behavior. It seems we have been carrying this functionality for at least 13 years. It may be time to document. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/10/2017 09:22 AM, Vlastimil Babka wrote: > On 07/09/2017 09:32 AM, Anshuman Khandual wrote: >> On 07/07/2017 11:39 PM, Mike Kravetz wrote: >>> On 07/07/2017 10:45 AM, Kirill A. Shutemov wrote: On Fri, Jul 07, 2017 at 10:29:52AM -0700, Mike Kravetz wrote: > On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: >> What is going to happen to mirrored after CoW for instance? >> >> In my opinion, it shouldn't be allowed for anon/private mappings at >> least. >> And with this limitation, I don't see much sense in the new interface -- >> just create mirror by mmap()ing the file again. > > The code today works for anon shared mappings. See simple program below. > > You are correct in that it makes little or no sense for private mappings. > When looking closer at existing code, mremap() creates a new private > mapping in this case. This is most likely a bug. IIRC, existing code doesn't create mirrors of private pages as it requires old_len to be zero. There's no way to get private pages mapped twice this way. >>> >>> Correct. >>> As mentioned above, mremap does 'something' for private anon pages when >>> old_len == 0. However, this may be considered a bug. In this case, mremap >>> creates a new private anon mapping of length new_size. Since old_len == 0, >>> it does not unmap any of the old mapping. So, in this case mremap basically >>> creates a new private mapping (unrealted to the original) and does not >>> modify the old mapping. >>> >> >> Yeah, in my experiment, after the mremap() exists we have two different VMAs >> which can contain two different set of data. No page sharing is happening. > > So how does this actually work for the JVM garbage collector use case? > Aren't the garbage collected objects private anon? Good point. The sample program the JVM team gave me uses a shared anon mapping. As you mention one would expect these mappings to be private. I have asked them for more details on their use case. > Anyway this should be documented. Yes, their prototype work seems to take advantage of this existing undocumented behavior. It seems we have been carrying this functionality for at least 13 years. It may be time to document. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/09/2017 09:32 AM, Anshuman Khandual wrote: > On 07/07/2017 11:39 PM, Mike Kravetz wrote: >> On 07/07/2017 10:45 AM, Kirill A. Shutemov wrote: >>> On Fri, Jul 07, 2017 at 10:29:52AM -0700, Mike Kravetz wrote: On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: > What is going to happen to mirrored after CoW for instance? > > In my opinion, it shouldn't be allowed for anon/private mappings at least. > And with this limitation, I don't see much sense in the new interface -- > just create mirror by mmap()ing the file again. The code today works for anon shared mappings. See simple program below. You are correct in that it makes little or no sense for private mappings. When looking closer at existing code, mremap() creates a new private mapping in this case. This is most likely a bug. >>> >>> IIRC, existing code doesn't create mirrors of private pages as it requires >>> old_len to be zero. There's no way to get private pages mapped twice this >>> way. >> >> Correct. >> As mentioned above, mremap does 'something' for private anon pages when >> old_len == 0. However, this may be considered a bug. In this case, mremap >> creates a new private anon mapping of length new_size. Since old_len == 0, >> it does not unmap any of the old mapping. So, in this case mremap basically >> creates a new private mapping (unrealted to the original) and does not >> modify the old mapping. >> > > Yeah, in my experiment, after the mremap() exists we have two different VMAs > which can contain two different set of data. No page sharing is happening. So how does this actually work for the JVM garbage collector use case? Aren't the garbage collected objects private anon? Anyway this should be documented.
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/09/2017 09:32 AM, Anshuman Khandual wrote: > On 07/07/2017 11:39 PM, Mike Kravetz wrote: >> On 07/07/2017 10:45 AM, Kirill A. Shutemov wrote: >>> On Fri, Jul 07, 2017 at 10:29:52AM -0700, Mike Kravetz wrote: On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: > What is going to happen to mirrored after CoW for instance? > > In my opinion, it shouldn't be allowed for anon/private mappings at least. > And with this limitation, I don't see much sense in the new interface -- > just create mirror by mmap()ing the file again. The code today works for anon shared mappings. See simple program below. You are correct in that it makes little or no sense for private mappings. When looking closer at existing code, mremap() creates a new private mapping in this case. This is most likely a bug. >>> >>> IIRC, existing code doesn't create mirrors of private pages as it requires >>> old_len to be zero. There's no way to get private pages mapped twice this >>> way. >> >> Correct. >> As mentioned above, mremap does 'something' for private anon pages when >> old_len == 0. However, this may be considered a bug. In this case, mremap >> creates a new private anon mapping of length new_size. Since old_len == 0, >> it does not unmap any of the old mapping. So, in this case mremap basically >> creates a new private mapping (unrealted to the original) and does not >> modify the old mapping. >> > > Yeah, in my experiment, after the mremap() exists we have two different VMAs > which can contain two different set of data. No page sharing is happening. So how does this actually work for the JVM garbage collector use case? Aren't the garbage collected objects private anon? Anyway this should be documented.
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/07/2017 11:39 PM, Mike Kravetz wrote: > On 07/07/2017 10:45 AM, Kirill A. Shutemov wrote: >> On Fri, Jul 07, 2017 at 10:29:52AM -0700, Mike Kravetz wrote: >>> On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: On Thu, Jul 06, 2017 at 09:17:26AM -0700, Mike Kravetz wrote: > The mremap system call has the ability to 'mirror' parts of an existing > mapping. To do so, it creates a new mapping that maps the same pages as > the original mapping, just at a different virtual address. This > functionality has existed since at least the 2.6 kernel. > > This patch simply adds a new flag to mremap which will make this > functionality part of the API. It maintains backward compatibility with > the existing way of requesting mirroring (old_size == 0). > > If this new MREMAP_MIRROR flag is specified, then new_size must equal > old_size. In addition, the MREMAP_MAYMOVE flag must be specified. The patch breaks important invariant that anon page can be mapped into a process only once. >>> >>> Actually, the patch does not add any new functionality. It only provides >>> a new interface to existing functionality. >>> >>> Is it not possible to have an anon page mapped twice into the same process >>> via system V shared memory? shmget(anon), shmat(), shmat. >>> Of course, those are shared rather than private anon pages. >> >> By anon pages I mean, private anon or file pages. These are subject to CoW. >> What is going to happen to mirrored after CoW for instance? In my opinion, it shouldn't be allowed for anon/private mappings at least. And with this limitation, I don't see much sense in the new interface -- just create mirror by mmap()ing the file again. >>> >>> The code today works for anon shared mappings. See simple program below. >>> >>> You are correct in that it makes little or no sense for private mappings. >>> When looking closer at existing code, mremap() creates a new private >>> mapping in this case. This is most likely a bug. >> >> IIRC, existing code doesn't create mirrors of private pages as it requires >> old_len to be zero. There's no way to get private pages mapped twice this >> way. > > Correct. > As mentioned above, mremap does 'something' for private anon pages when > old_len == 0. However, this may be considered a bug. In this case, mremap > creates a new private anon mapping of length new_size. Since old_len == 0, > it does not unmap any of the old mapping. So, in this case mremap basically > creates a new private mapping (unrealted to the original) and does not > modify the old mapping. > Yeah, in my experiment, after the mremap() exists we have two different VMAs which can contain two different set of data. No page sharing is happening.
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/07/2017 11:39 PM, Mike Kravetz wrote: > On 07/07/2017 10:45 AM, Kirill A. Shutemov wrote: >> On Fri, Jul 07, 2017 at 10:29:52AM -0700, Mike Kravetz wrote: >>> On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: On Thu, Jul 06, 2017 at 09:17:26AM -0700, Mike Kravetz wrote: > The mremap system call has the ability to 'mirror' parts of an existing > mapping. To do so, it creates a new mapping that maps the same pages as > the original mapping, just at a different virtual address. This > functionality has existed since at least the 2.6 kernel. > > This patch simply adds a new flag to mremap which will make this > functionality part of the API. It maintains backward compatibility with > the existing way of requesting mirroring (old_size == 0). > > If this new MREMAP_MIRROR flag is specified, then new_size must equal > old_size. In addition, the MREMAP_MAYMOVE flag must be specified. The patch breaks important invariant that anon page can be mapped into a process only once. >>> >>> Actually, the patch does not add any new functionality. It only provides >>> a new interface to existing functionality. >>> >>> Is it not possible to have an anon page mapped twice into the same process >>> via system V shared memory? shmget(anon), shmat(), shmat. >>> Of course, those are shared rather than private anon pages. >> >> By anon pages I mean, private anon or file pages. These are subject to CoW. >> What is going to happen to mirrored after CoW for instance? In my opinion, it shouldn't be allowed for anon/private mappings at least. And with this limitation, I don't see much sense in the new interface -- just create mirror by mmap()ing the file again. >>> >>> The code today works for anon shared mappings. See simple program below. >>> >>> You are correct in that it makes little or no sense for private mappings. >>> When looking closer at existing code, mremap() creates a new private >>> mapping in this case. This is most likely a bug. >> >> IIRC, existing code doesn't create mirrors of private pages as it requires >> old_len to be zero. There's no way to get private pages mapped twice this >> way. > > Correct. > As mentioned above, mremap does 'something' for private anon pages when > old_len == 0. However, this may be considered a bug. In this case, mremap > creates a new private anon mapping of length new_size. Since old_len == 0, > it does not unmap any of the old mapping. So, in this case mremap basically > creates a new private mapping (unrealted to the original) and does not > modify the old mapping. > Yeah, in my experiment, after the mremap() exists we have two different VMAs which can contain two different set of data. No page sharing is happening.
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/07/2017 10:44 PM, Mike Kravetz wrote: > On 07/07/2017 01:45 AM, Anshuman Khandual wrote: >> On 07/06/2017 09:47 PM, Mike Kravetz wrote: >>> The mremap system call has the ability to 'mirror' parts of an existing >>> mapping. To do so, it creates a new mapping that maps the same pages as >>> the original mapping, just at a different virtual address. This >>> functionality has existed since at least the 2.6 kernel. >>> >>> This patch simply adds a new flag to mremap which will make this >>> functionality part of the API. It maintains backward compatibility with >>> the existing way of requesting mirroring (old_size == 0). >>> >>> If this new MREMAP_MIRROR flag is specified, then new_size must equal >>> old_size. In addition, the MREMAP_MAYMOVE flag must be specified. >> >> Yeah it all looks good. But why is this requirement that if >> MREMAP_MAYMOVE is specified then old_size and new_size must >> be equal. > > No real reason. I just wanted to clearly separate the new interface from > the old. On second thought, it would be better to require old_size == 0 > as in the legacy interface. That would be redundant. Mirroring will just happen because old_size is 0 whether we mention the MREMAP_MIRROR flag or not. IMHO it should just mirror if the flag is specified irrespective of the old_size value.
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/07/2017 10:44 PM, Mike Kravetz wrote: > On 07/07/2017 01:45 AM, Anshuman Khandual wrote: >> On 07/06/2017 09:47 PM, Mike Kravetz wrote: >>> The mremap system call has the ability to 'mirror' parts of an existing >>> mapping. To do so, it creates a new mapping that maps the same pages as >>> the original mapping, just at a different virtual address. This >>> functionality has existed since at least the 2.6 kernel. >>> >>> This patch simply adds a new flag to mremap which will make this >>> functionality part of the API. It maintains backward compatibility with >>> the existing way of requesting mirroring (old_size == 0). >>> >>> If this new MREMAP_MIRROR flag is specified, then new_size must equal >>> old_size. In addition, the MREMAP_MAYMOVE flag must be specified. >> >> Yeah it all looks good. But why is this requirement that if >> MREMAP_MAYMOVE is specified then old_size and new_size must >> be equal. > > No real reason. I just wanted to clearly separate the new interface from > the old. On second thought, it would be better to require old_size == 0 > as in the legacy interface. That would be redundant. Mirroring will just happen because old_size is 0 whether we mention the MREMAP_MIRROR flag or not. IMHO it should just mirror if the flag is specified irrespective of the old_size value.
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/07/2017 10:45 AM, Kirill A. Shutemov wrote: > On Fri, Jul 07, 2017 at 10:29:52AM -0700, Mike Kravetz wrote: >> On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: >>> On Thu, Jul 06, 2017 at 09:17:26AM -0700, Mike Kravetz wrote: The mremap system call has the ability to 'mirror' parts of an existing mapping. To do so, it creates a new mapping that maps the same pages as the original mapping, just at a different virtual address. This functionality has existed since at least the 2.6 kernel. This patch simply adds a new flag to mremap which will make this functionality part of the API. It maintains backward compatibility with the existing way of requesting mirroring (old_size == 0). If this new MREMAP_MIRROR flag is specified, then new_size must equal old_size. In addition, the MREMAP_MAYMOVE flag must be specified. >>> >>> The patch breaks important invariant that anon page can be mapped into a >>> process only once. >> >> Actually, the patch does not add any new functionality. It only provides >> a new interface to existing functionality. >> >> Is it not possible to have an anon page mapped twice into the same process >> via system V shared memory? shmget(anon), shmat(), shmat. >> Of course, those are shared rather than private anon pages. > > By anon pages I mean, private anon or file pages. These are subject to CoW. > >>> What is going to happen to mirrored after CoW for instance? >>> >>> In my opinion, it shouldn't be allowed for anon/private mappings at least. >>> And with this limitation, I don't see much sense in the new interface -- >>> just create mirror by mmap()ing the file again. >> >> The code today works for anon shared mappings. See simple program below. >> >> You are correct in that it makes little or no sense for private mappings. >> When looking closer at existing code, mremap() creates a new private >> mapping in this case. This is most likely a bug. > > IIRC, existing code doesn't create mirrors of private pages as it requires > old_len to be zero. There's no way to get private pages mapped twice this > way. Correct. As mentioned above, mremap does 'something' for private anon pages when old_len == 0. However, this may be considered a bug. In this case, mremap creates a new private anon mapping of length new_size. Since old_len == 0, it does not unmap any of the old mapping. So, in this case mremap basically creates a new private mapping (unrealted to the original) and does not modify the old mapping. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/07/2017 10:45 AM, Kirill A. Shutemov wrote: > On Fri, Jul 07, 2017 at 10:29:52AM -0700, Mike Kravetz wrote: >> On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: >>> On Thu, Jul 06, 2017 at 09:17:26AM -0700, Mike Kravetz wrote: The mremap system call has the ability to 'mirror' parts of an existing mapping. To do so, it creates a new mapping that maps the same pages as the original mapping, just at a different virtual address. This functionality has existed since at least the 2.6 kernel. This patch simply adds a new flag to mremap which will make this functionality part of the API. It maintains backward compatibility with the existing way of requesting mirroring (old_size == 0). If this new MREMAP_MIRROR flag is specified, then new_size must equal old_size. In addition, the MREMAP_MAYMOVE flag must be specified. >>> >>> The patch breaks important invariant that anon page can be mapped into a >>> process only once. >> >> Actually, the patch does not add any new functionality. It only provides >> a new interface to existing functionality. >> >> Is it not possible to have an anon page mapped twice into the same process >> via system V shared memory? shmget(anon), shmat(), shmat. >> Of course, those are shared rather than private anon pages. > > By anon pages I mean, private anon or file pages. These are subject to CoW. > >>> What is going to happen to mirrored after CoW for instance? >>> >>> In my opinion, it shouldn't be allowed for anon/private mappings at least. >>> And with this limitation, I don't see much sense in the new interface -- >>> just create mirror by mmap()ing the file again. >> >> The code today works for anon shared mappings. See simple program below. >> >> You are correct in that it makes little or no sense for private mappings. >> When looking closer at existing code, mremap() creates a new private >> mapping in this case. This is most likely a bug. > > IIRC, existing code doesn't create mirrors of private pages as it requires > old_len to be zero. There's no way to get private pages mapped twice this > way. Correct. As mentioned above, mremap does 'something' for private anon pages when old_len == 0. However, this may be considered a bug. In this case, mremap creates a new private anon mapping of length new_size. Since old_len == 0, it does not unmap any of the old mapping. So, in this case mremap basically creates a new private mapping (unrealted to the original) and does not modify the old mapping. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Fri, Jul 07, 2017 at 10:29:52AM -0700, Mike Kravetz wrote: > On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: > > On Thu, Jul 06, 2017 at 09:17:26AM -0700, Mike Kravetz wrote: > >> The mremap system call has the ability to 'mirror' parts of an existing > >> mapping. To do so, it creates a new mapping that maps the same pages as > >> the original mapping, just at a different virtual address. This > >> functionality has existed since at least the 2.6 kernel. > >> > >> This patch simply adds a new flag to mremap which will make this > >> functionality part of the API. It maintains backward compatibility with > >> the existing way of requesting mirroring (old_size == 0). > >> > >> If this new MREMAP_MIRROR flag is specified, then new_size must equal > >> old_size. In addition, the MREMAP_MAYMOVE flag must be specified. > > > > The patch breaks important invariant that anon page can be mapped into a > > process only once. > > Actually, the patch does not add any new functionality. It only provides > a new interface to existing functionality. > > Is it not possible to have an anon page mapped twice into the same process > via system V shared memory? shmget(anon), shmat(), shmat. > Of course, those are shared rather than private anon pages. By anon pages I mean, private anon or file pages. These are subject to CoW. > > What is going to happen to mirrored after CoW for instance? > > > > In my opinion, it shouldn't be allowed for anon/private mappings at least. > > And with this limitation, I don't see much sense in the new interface -- > > just create mirror by mmap()ing the file again. > > The code today works for anon shared mappings. See simple program below. > > You are correct in that it makes little or no sense for private mappings. > When looking closer at existing code, mremap() creates a new private > mapping in this case. This is most likely a bug. IIRC, existing code doesn't create mirrors of private pages as it requires old_len to be zero. There's no way to get private pages mapped twice this way. -- Kirill A. Shutemov
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Fri, Jul 07, 2017 at 10:29:52AM -0700, Mike Kravetz wrote: > On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: > > On Thu, Jul 06, 2017 at 09:17:26AM -0700, Mike Kravetz wrote: > >> The mremap system call has the ability to 'mirror' parts of an existing > >> mapping. To do so, it creates a new mapping that maps the same pages as > >> the original mapping, just at a different virtual address. This > >> functionality has existed since at least the 2.6 kernel. > >> > >> This patch simply adds a new flag to mremap which will make this > >> functionality part of the API. It maintains backward compatibility with > >> the existing way of requesting mirroring (old_size == 0). > >> > >> If this new MREMAP_MIRROR flag is specified, then new_size must equal > >> old_size. In addition, the MREMAP_MAYMOVE flag must be specified. > > > > The patch breaks important invariant that anon page can be mapped into a > > process only once. > > Actually, the patch does not add any new functionality. It only provides > a new interface to existing functionality. > > Is it not possible to have an anon page mapped twice into the same process > via system V shared memory? shmget(anon), shmat(), shmat. > Of course, those are shared rather than private anon pages. By anon pages I mean, private anon or file pages. These are subject to CoW. > > What is going to happen to mirrored after CoW for instance? > > > > In my opinion, it shouldn't be allowed for anon/private mappings at least. > > And with this limitation, I don't see much sense in the new interface -- > > just create mirror by mmap()ing the file again. > > The code today works for anon shared mappings. See simple program below. > > You are correct in that it makes little or no sense for private mappings. > When looking closer at existing code, mremap() creates a new private > mapping in this case. This is most likely a bug. IIRC, existing code doesn't create mirrors of private pages as it requires old_len to be zero. There's no way to get private pages mapped twice this way. -- Kirill A. Shutemov
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: > On Thu, Jul 06, 2017 at 09:17:26AM -0700, Mike Kravetz wrote: >> The mremap system call has the ability to 'mirror' parts of an existing >> mapping. To do so, it creates a new mapping that maps the same pages as >> the original mapping, just at a different virtual address. This >> functionality has existed since at least the 2.6 kernel. >> >> This patch simply adds a new flag to mremap which will make this >> functionality part of the API. It maintains backward compatibility with >> the existing way of requesting mirroring (old_size == 0). >> >> If this new MREMAP_MIRROR flag is specified, then new_size must equal >> old_size. In addition, the MREMAP_MAYMOVE flag must be specified. > > The patch breaks important invariant that anon page can be mapped into a > process only once. Actually, the patch does not add any new functionality. It only provides a new interface to existing functionality. Is it not possible to have an anon page mapped twice into the same process via system V shared memory? shmget(anon), shmat(), shmat. Of course, those are shared rather than private anon pages. > > What is going to happen to mirrored after CoW for instance? > > In my opinion, it shouldn't be allowed for anon/private mappings at least. > And with this limitation, I don't see much sense in the new interface -- > just create mirror by mmap()ing the file again. The code today works for anon shared mappings. See simple program below. You are correct in that it makes little or no sense for private mappings. When looking closer at existing code, mremap() creates a new private mapping in this case. This is most likely a bug. Again, my intention is not to create new functionality but rather document existing functionality as part of a programming interface. -- Mike Kravetz #include #include #include #include #define __USE_GNU #include #include #include #include #include #define H_PAGESIZE (2 * 1024 * 1024) int hugetlb = 0; #define PROTECTION (PROT_READ | PROT_WRITE) #define ADDR (void *)(0x0UL) /* #define FLAGS (MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB) */ #define FLAGS (MAP_SHARED|MAP_ANONYMOUS) int main(int argc, char ** argv) { int fd, ret; int i; long long hpages, tpage; void *addr; void *addr2; char foo; if (argc == 2) { if (!strcmp(argv[1], "hugetlb")) hugetlb = 1; } hpages = 5; printf("Reserving an address ...\n"); addr = mmap(ADDR, H_PAGESIZE * hpages * 2, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE| (hugetlb ? MAP_HUGETLB : 0), -1, 0); if (addr == MAP_FAILED) { perror("mmap"); exit (1); } printf("\tgot address %p to %p\n", (void *)addr, (void *)(addr + H_PAGESIZE * hpages * 2)); printf("mmapping %d 2MB huge pages\n", hpages); addr = mmap(addr, H_PAGESIZE * hpages, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS| (hugetlb ? MAP_HUGETLB : 0), -1, 0); if (addr == MAP_FAILED) { perror("mmap"); exit (1); } /* initialize data */ for (i = 0; i < hpages; i++) *((char *)addr + (i * H_PAGESIZE)) = 'a'; printf("pages allocated and initialized at %p\n", (void *)addr); addr2 = mremap(addr, 0, H_PAGESIZE * hpages, MREMAP_MAYMOVE | MREMAP_FIXED, addr + (H_PAGESIZE * hpages)); if (addr2 == MAP_FAILED) { perror("mremap"); exit (1); } printf("mapping relocated to %p\n", (void *)addr2); /* verify data */ printf("Verifying data at address %p\n", (void *)addr); for (i = 0; i < hpages; i++) { if (*((char *)addr + (i * H_PAGESIZE)) != 'a') { printf("data at address %p not as expected\n", (void *)((char *)addr + (i * H_PAGESIZE))); } } if (i >= hpages) printf("\t success!\n"); /* verify data */ printf("Verifying data at address %p\n", (void *)addr2); for (i = 0; i < hpages; i++) { if (*((char *)addr2 + (i * H_PAGESIZE)) != 'a') { printf("data at address %p not as expected\n", (void *)((char *)addr2 + (i * H_PAGESIZE))); } } if (i >= hpages) printf("\t success!\n"); return ret; }
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/07/2017 03:23 AM, Kirill A. Shutemov wrote: > On Thu, Jul 06, 2017 at 09:17:26AM -0700, Mike Kravetz wrote: >> The mremap system call has the ability to 'mirror' parts of an existing >> mapping. To do so, it creates a new mapping that maps the same pages as >> the original mapping, just at a different virtual address. This >> functionality has existed since at least the 2.6 kernel. >> >> This patch simply adds a new flag to mremap which will make this >> functionality part of the API. It maintains backward compatibility with >> the existing way of requesting mirroring (old_size == 0). >> >> If this new MREMAP_MIRROR flag is specified, then new_size must equal >> old_size. In addition, the MREMAP_MAYMOVE flag must be specified. > > The patch breaks important invariant that anon page can be mapped into a > process only once. Actually, the patch does not add any new functionality. It only provides a new interface to existing functionality. Is it not possible to have an anon page mapped twice into the same process via system V shared memory? shmget(anon), shmat(), shmat. Of course, those are shared rather than private anon pages. > > What is going to happen to mirrored after CoW for instance? > > In my opinion, it shouldn't be allowed for anon/private mappings at least. > And with this limitation, I don't see much sense in the new interface -- > just create mirror by mmap()ing the file again. The code today works for anon shared mappings. See simple program below. You are correct in that it makes little or no sense for private mappings. When looking closer at existing code, mremap() creates a new private mapping in this case. This is most likely a bug. Again, my intention is not to create new functionality but rather document existing functionality as part of a programming interface. -- Mike Kravetz #include #include #include #include #define __USE_GNU #include #include #include #include #include #define H_PAGESIZE (2 * 1024 * 1024) int hugetlb = 0; #define PROTECTION (PROT_READ | PROT_WRITE) #define ADDR (void *)(0x0UL) /* #define FLAGS (MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB) */ #define FLAGS (MAP_SHARED|MAP_ANONYMOUS) int main(int argc, char ** argv) { int fd, ret; int i; long long hpages, tpage; void *addr; void *addr2; char foo; if (argc == 2) { if (!strcmp(argv[1], "hugetlb")) hugetlb = 1; } hpages = 5; printf("Reserving an address ...\n"); addr = mmap(ADDR, H_PAGESIZE * hpages * 2, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE| (hugetlb ? MAP_HUGETLB : 0), -1, 0); if (addr == MAP_FAILED) { perror("mmap"); exit (1); } printf("\tgot address %p to %p\n", (void *)addr, (void *)(addr + H_PAGESIZE * hpages * 2)); printf("mmapping %d 2MB huge pages\n", hpages); addr = mmap(addr, H_PAGESIZE * hpages, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS| (hugetlb ? MAP_HUGETLB : 0), -1, 0); if (addr == MAP_FAILED) { perror("mmap"); exit (1); } /* initialize data */ for (i = 0; i < hpages; i++) *((char *)addr + (i * H_PAGESIZE)) = 'a'; printf("pages allocated and initialized at %p\n", (void *)addr); addr2 = mremap(addr, 0, H_PAGESIZE * hpages, MREMAP_MAYMOVE | MREMAP_FIXED, addr + (H_PAGESIZE * hpages)); if (addr2 == MAP_FAILED) { perror("mremap"); exit (1); } printf("mapping relocated to %p\n", (void *)addr2); /* verify data */ printf("Verifying data at address %p\n", (void *)addr); for (i = 0; i < hpages; i++) { if (*((char *)addr + (i * H_PAGESIZE)) != 'a') { printf("data at address %p not as expected\n", (void *)((char *)addr + (i * H_PAGESIZE))); } } if (i >= hpages) printf("\t success!\n"); /* verify data */ printf("Verifying data at address %p\n", (void *)addr2); for (i = 0; i < hpages; i++) { if (*((char *)addr2 + (i * H_PAGESIZE)) != 'a') { printf("data at address %p not as expected\n", (void *)((char *)addr2 + (i * H_PAGESIZE))); } } if (i >= hpages) printf("\t success!\n"); return ret; }
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/07/2017 01:45 AM, Anshuman Khandual wrote: > On 07/06/2017 09:47 PM, Mike Kravetz wrote: >> The mremap system call has the ability to 'mirror' parts of an existing >> mapping. To do so, it creates a new mapping that maps the same pages as >> the original mapping, just at a different virtual address. This >> functionality has existed since at least the 2.6 kernel. >> >> This patch simply adds a new flag to mremap which will make this >> functionality part of the API. It maintains backward compatibility with >> the existing way of requesting mirroring (old_size == 0). >> >> If this new MREMAP_MIRROR flag is specified, then new_size must equal >> old_size. In addition, the MREMAP_MAYMOVE flag must be specified. > > Yeah it all looks good. But why is this requirement that if > MREMAP_MAYMOVE is specified then old_size and new_size must > be equal. No real reason. I just wanted to clearly separate the new interface from the old. On second thought, it would be better to require old_size == 0 as in the legacy interface. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/07/2017 01:45 AM, Anshuman Khandual wrote: > On 07/06/2017 09:47 PM, Mike Kravetz wrote: >> The mremap system call has the ability to 'mirror' parts of an existing >> mapping. To do so, it creates a new mapping that maps the same pages as >> the original mapping, just at a different virtual address. This >> functionality has existed since at least the 2.6 kernel. >> >> This patch simply adds a new flag to mremap which will make this >> functionality part of the API. It maintains backward compatibility with >> the existing way of requesting mirroring (old_size == 0). >> >> If this new MREMAP_MIRROR flag is specified, then new_size must equal >> old_size. In addition, the MREMAP_MAYMOVE flag must be specified. > > Yeah it all looks good. But why is this requirement that if > MREMAP_MAYMOVE is specified then old_size and new_size must > be equal. No real reason. I just wanted to clearly separate the new interface from the old. On second thought, it would be better to require old_size == 0 as in the legacy interface. -- Mike Kravetz
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Thu, Jul 06, 2017 at 09:17:26AM -0700, Mike Kravetz wrote: > The mremap system call has the ability to 'mirror' parts of an existing > mapping. To do so, it creates a new mapping that maps the same pages as > the original mapping, just at a different virtual address. This > functionality has existed since at least the 2.6 kernel. > > This patch simply adds a new flag to mremap which will make this > functionality part of the API. It maintains backward compatibility with > the existing way of requesting mirroring (old_size == 0). > > If this new MREMAP_MIRROR flag is specified, then new_size must equal > old_size. In addition, the MREMAP_MAYMOVE flag must be specified. The patch breaks important invariant that anon page can be mapped into a process only once. What is going to happen to mirrored after CoW for instance? In my opinion, it shouldn't be allowed for anon/private mappings at least. And with this limitation, I don't see much sense in the new interface -- just create mirror by mmap()ing the file again. -- Kirill A. Shutemov
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On Thu, Jul 06, 2017 at 09:17:26AM -0700, Mike Kravetz wrote: > The mremap system call has the ability to 'mirror' parts of an existing > mapping. To do so, it creates a new mapping that maps the same pages as > the original mapping, just at a different virtual address. This > functionality has existed since at least the 2.6 kernel. > > This patch simply adds a new flag to mremap which will make this > functionality part of the API. It maintains backward compatibility with > the existing way of requesting mirroring (old_size == 0). > > If this new MREMAP_MIRROR flag is specified, then new_size must equal > old_size. In addition, the MREMAP_MAYMOVE flag must be specified. The patch breaks important invariant that anon page can be mapped into a process only once. What is going to happen to mirrored after CoW for instance? In my opinion, it shouldn't be allowed for anon/private mappings at least. And with this limitation, I don't see much sense in the new interface -- just create mirror by mmap()ing the file again. -- Kirill A. Shutemov
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/06/2017 09:47 PM, Mike Kravetz wrote: > The mremap system call has the ability to 'mirror' parts of an existing > mapping. To do so, it creates a new mapping that maps the same pages as > the original mapping, just at a different virtual address. This > functionality has existed since at least the 2.6 kernel. > > This patch simply adds a new flag to mremap which will make this > functionality part of the API. It maintains backward compatibility with > the existing way of requesting mirroring (old_size == 0). > > If this new MREMAP_MIRROR flag is specified, then new_size must equal > old_size. In addition, the MREMAP_MAYMOVE flag must be specified. Yeah it all looks good. But why is this requirement that if MREMAP_MAYMOVE is specified then old_size and new_size must be equal.
Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
On 07/06/2017 09:47 PM, Mike Kravetz wrote: > The mremap system call has the ability to 'mirror' parts of an existing > mapping. To do so, it creates a new mapping that maps the same pages as > the original mapping, just at a different virtual address. This > functionality has existed since at least the 2.6 kernel. > > This patch simply adds a new flag to mremap which will make this > functionality part of the API. It maintains backward compatibility with > the existing way of requesting mirroring (old_size == 0). > > If this new MREMAP_MIRROR flag is specified, then new_size must equal > old_size. In addition, the MREMAP_MAYMOVE flag must be specified. Yeah it all looks good. But why is this requirement that if MREMAP_MAYMOVE is specified then old_size and new_size must be equal.