"Aneesh Kumar K.V" <aneesh.ku...@linux.ibm.com> writes: > On 6/1/20 5:37 PM, Michal Suchánek wrote: >> On Mon, Jun 01, 2020 at 05:31:50PM +0530, Aneesh Kumar K.V wrote: >>> On 6/1/20 3:39 PM, Jan Kara wrote: >>>> On Fri 29-05-20 16:25:35, Aneesh Kumar K.V wrote: >>>>> On 5/29/20 3:22 PM, Jan Kara wrote: >>>>>> On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote: >>>>>>> Thanks Michal. I also missed Jeff in this email thread. >>>>>> >>>>>> And I think you'll also need some of the sched maintainers for the prctl >>>>>> bits... >>>>>> >>>>>>> On 5/29/20 3:03 PM, Michal Suchánek wrote: >>>>>>>> Adding Jan >>>>>>>> >>>>>>>> On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote: >>>>>>>>> With POWER10, architecture is adding new pmem flush and sync >>>>>>>>> instructions. >>>>>>>>> The kernel should prevent the usage of MAP_SYNC if applications are >>>>>>>>> not using >>>>>>>>> the new instructions on newer hardware. >>>>>>>>> >>>>>>>>> This patch adds a prctl option MAP_SYNC_ENABLE that can be used to >>>>>>>>> enable >>>>>>>>> the usage of MAP_SYNC. The kernel config option is added to allow the >>>>>>>>> user >>>>>>>>> to control whether MAP_SYNC should be enabled by default or not. >>>>>>>>> >>>>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com> >>>>>> ... >>>>>>>>> diff --git a/kernel/fork.c b/kernel/fork.c >>>>>>>>> index 8c700f881d92..d5a9a363e81e 100644 >>>>>>>>> --- a/kernel/fork.c >>>>>>>>> +++ b/kernel/fork.c >>>>>>>>> @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp >>>>>>>>> DEFINE_SPINLOCK(mmlist_lock); >>>>>>>>> static unsigned long default_dump_filter = >>>>>>>>> MMF_DUMP_FILTER_DEFAULT; >>>>>>>>> +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE >>>>>>>>> +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK; >>>>>>>>> +#else >>>>>>>>> +unsigned long default_map_sync_mask = 0; >>>>>>>>> +#endif >>>>>>>>> + >>>>>> >>>>>> I'm not sure CONFIG is really the right approach here. 
>>>>>> For a distro that would basically mean to disable MAP_SYNC for all
>>>>>> PPC kernels unless application explicitly uses the right prctl.
>>>>>> Shouldn't we rather initialize default_map_sync_mask on boot based on
>>>>>> whether the CPU we run on requires new flush instructions or not?
>>>>>> Otherwise the patch looks sensible.
>>>>>>
>>>>>
>>>>> yes that is correct. We ideally want to deny MAP_SYNC only w.r.t POWER10.
>>>>> But on a virtualized platform there is no easy way to detect that. We
>>>>> could ideally hook this into the nvdimm driver where we look at the new
>>>>> compat string ibm,persistent-memory-v2 and then disable MAP_SYNC
>>>>> if we find a device with the specific value.
>>>>
>>>> Hum, couldn't we set some flag for nvdimm devices with
>>>> "ibm,persistent-memory-v2" property and then check it during mmap(2) time
>>>> and when the device has this property and the mmap(2) caller doesn't have
>>>> the prctl set, we'd disallow MAP_SYNC? That should make things mostly
>>>> seamless, shouldn't it? Only apps that want to use MAP_SYNC on these
>>>> devices would need to use prctl(MMF_DISABLE_MAP_SYNC, 0) but then these
>>>> applications need to be aware of new instructions so this isn't that much
>>>> additional burden...
>>>
>>> I am not sure application would want to add that much details/knowledge
>>> about a platform in their code. I was expecting application to do
>>>
>>> #ifdef __ppc64__
>>> prctl(MAP_SYNC_ENABLE, 1, 0, 0, 0);
>>> #endif
>>> a = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
>>>          MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
>>>
>>>
>>> For that code all the complexity that we add w.r.t ibm,persistent-memory-v2
>>> is not useful. Do you see a value in making all these device specific rather
>>> than a conditional on __ppc64__?
>
>> If the vpmem devices continue to work with the old instruction on
>> POWER10 then it makes sense to make this per-device.
>
> vPMEM doesn't have write_cache and hence it is synchronous even without
> using any specific flush instruction. The question is do we want to have
> different programming steps when running on vPMEM vs a persistent PMEM
> device on ppc64.
>
> I will work on the device specific ENABLE flag and then we can compare
> the kernel complexity against the added benefit.

I have posted an RFC v2 [1] that implements a device-specific MAP_SYNC
enable/disable feature. The posted changes also add a dax flag suggested
by Dan. With the device-specific MAP_SYNC enable/disable in place, that
only required exporting the same flag through a sysfs file.

[1] https://lore.kernel.org/linuxppc-dev/20200602074909.36738-1-aneesh.ku...@linux.ibm.com/

-aneesh
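
For reference, whichever enable mechanism is chosen, an application can
already cope with MAP_SYNC being refused on a given device by checking the
mmap() result. The following is only a minimal sketch of that pattern, not
code from the RFC: the helper name map_pmem_file() is hypothetical, the
fallback flag values are assumptions for older userspace headers, and the
proposed MAP_SYNC_ENABLE prctl is deliberately left out because it exists
only in the patch under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed fallback values for older libc headers (asm-generic values). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: map the first len bytes of path read/write.
 * MAP_SYNC is tried first; if the kernel or the backing device refuses
 * it, the helper falls back to a plain MAP_SHARED mapping.
 * Returns the mapping or MAP_FAILED; *is_sync is 1 only when the
 * MAP_SYNC mapping was granted.
 */
static void *map_pmem_file(const char *path, size_t len, int *is_sync)
{
    assert(is_sync != NULL);
    *is_sync = 0;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *is_sync = 1;
    } else if (errno == EOPNOTSUPP || errno == EINVAL) {
        /* MAP_SYNC not available here: the caller must msync()/fsync(). */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }

    close(fd); /* the mapping remains valid after the descriptor is closed */
    return p;
}
```

A caller would use the returned is_sync value to decide whether CPU cache
flushes alone are enough for durability or whether it has to fall back to
msync() on the mapped range.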