On 5/30/20 12:52 AM, Dan Williams wrote:
On Fri, May 29, 2020 at 3:55 AM Aneesh Kumar K.V
<aneesh.ku...@linux.ibm.com> wrote:
On 5/29/20 3:22 PM, Jan Kara wrote:
Hi!
On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
Thanks Michal. I also missed Jeff in this email thread.
And I think you'll also need some of the sched maintainers for the prctl
bits...
On 5/29/20 3:03 PM, Michal Suchánek wrote:
Adding Jan
On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
With POWER10, the architecture adds new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware.
This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
the usage of MAP_SYNC. A kernel config option, CONFIG_ARCH_MAP_SYNC_DISABLE,
is added to let the user control whether MAP_SYNC is enabled by default.
Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com>
...
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..d5a9a363e81e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
+#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
+unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
+#else
+unsigned long default_map_sync_mask = 0;
+#endif
+
I'm not sure CONFIG is really the right approach here. For a distro that would
basically mean disabling MAP_SYNC for all PPC kernels unless the application
explicitly uses the right prctl. Shouldn't we rather initialize
default_map_sync_mask on boot based on whether the CPU we run on requires
the new flush instructions or not? Otherwise the patch looks sensible.
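The boot-time alternative suggested above could be sketched roughly as
follows. This is a userspace-compilable model, not kernel code: the mask
value, init_default_map_sync_mask(), map_sync_disabled_by_default() and the
cpu_needs_new_flush input are all hypothetical names, since the thread does
not show how the flag is laid out or how the CPU would be detected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholder: the thread does not show the real bit layout. */
#define MMF_DISABLE_MAP_SYNC_MASK (1UL << 0)

static_assert(MMF_DISABLE_MAP_SYNC_MASK != 0, "mask must select at least one bit");

/* Set once at boot in this sketch, instead of being fixed by a CONFIG option. */
unsigned long default_map_sync_mask;

/*
 * cpu_needs_new_flush is a hypothetical input standing in for whatever
 * CPU-feature or device-tree check the platform can actually provide.
 */
void init_default_map_sync_mask(bool cpu_needs_new_flush)
{
	default_map_sync_mask = cpu_needs_new_flush ? MMF_DISABLE_MAP_SYNC_MASK : 0;
}

/* True when a newly created mm would start with MAP_SYNC denied. */
bool map_sync_disabled_by_default(void)
{
	return (default_map_sync_mask & MMF_DISABLE_MAP_SYNC_MASK) != 0;
}
```

With this shape a single distro kernel would deny MAP_SYNC by default only
where the detection reports that the new instructions are required.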
Yes, that is correct. Ideally we want to deny MAP_SYNC only on POWER10.
But on a virtualized platform there is no easy way to detect that. We
could hook this into the nvdimm driver, where we look at the new compat
string ibm,persistent-memory-v2, and disable MAP_SYNC if we find a device
with that specific value.
BTW, with the recent changes I posted for the nvdimm driver, older kernels
won't initialize the persistent memory device on newer hardware. Newer
hardware will present the device to the OS with a different device tree
compat string.
My expectation w.r.t. this patch was that distros would want to set
CONFIG_ARCH_MAP_SYNC_DISABLE=n based on the different application
certifications. Otherwise applications will end up having to call
prctl(MMF_DISABLE_MAP_SYNC, 0) anyway. If that is the case, should this
be dependent on P10?
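For reference, this is roughly what an application that asks for MAP_SYNC
looks like today, and it is this mmap() call that the proposed default
would gate. A minimal sketch: map_pmem_file() is a hypothetical helper, the
fallback flag values are assumed from the generic Linux UAPI headers, and
the errno the patch would return when MAP_SYNC is denied is not shown in
this thread, so only EOPNOTSUPP and EINVAL are handled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; values assumed from the generic Linux UAPI. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Try a MAP_SYNC mapping first; if it is refused as unsupported, fall back
 * to a plain shared mapping. *is_sync reports which kind was obtained.
 * Returns MAP_FAILED if no mapping could be created.
 */
void *map_pmem_file(int fd, size_t len, bool *is_sync)
{
	assert(is_sync != NULL);

	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (addr != MAP_FAILED) {
		*is_sync = true;
		return addr;
	}

	/*
	 * EOPNOTSUPP: MAP_SYNC is not supported for this file.
	 * EINVAL: the kernel predates MAP_SHARED_VALIDATE.
	 * Any other error is reported to the caller unchanged.
	 */
	*is_sync = false;
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return MAP_FAILED;

	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

When *is_sync comes back false the caller cannot rely on CPU flush
instructions alone and has to use msync() or fsync() for durability.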
With that, I am wondering whether we should even have this patch. Can we
expect userspace to get updated to use the new instructions?
With ppc64 we never had a real persistent memory device available for
end users to try. The available persistent memory stack used vPMEM,
which was presented as a volatile memory region for which there is no
need to use any of the flush instructions. Can we safely assume that,
as applications get certified/verified for working with pmem devices
on ppc64, they will all be using the new instructions?
I think prctl is the wrong interface for this. I was thinking a sysfs
interface along the same lines as /sys/block/pmemX/dax/write_cache.
That attribute is toggling DAXDEV_WRITE_CACHE for the determination of
whether the platform or the kernel needs to handle cache flushing
relative to power loss. A similar attribute can be established for
DAXDEV_SYNC; it would simply default to off based on a configuration
time policy, but be dynamically changeable at runtime via sysfs.
These flags are device properties that affect the kernel and
userspace's handling of persistence.
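As an illustration of how userspace would consume such a per-device
attribute, here is a minimal sketch that reads a 0/1 sysfs file like the
existing write_cache one. read_sysfs_flag() is a hypothetical helper, and
an equivalent attribute for DAXDEV_SYNC is only a proposal at this point in
the thread, so no path for it is assumed.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a 0/1 sysfs attribute such as /sys/block/pmemX/dax/write_cache.
 * Returns 0 and stores the value in *value on success, -1 on any error
 * (missing file, unparsable content, or a value other than 0 or 1).
 */
int read_sysfs_flag(const char *path, int *value)
{
	assert(path != NULL && value != NULL);

	FILE *f = fopen(path, "r");
	if (!f)
		return -1;

	int v;
	int matched = fscanf(f, "%d", &v);
	fclose(f);

	if (matched != 1 || (v != 0 && v != 1))
		return -1;

	*value = v;
	return 0;
}
```

A caller would pass the attribute path for a concrete device (pmem0, for
example) and treat a -1 return as "attribute not available".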
That will not handle the scenario with multiple applications using the
same fsdax mount point where one is updated to use the new instruction
and the other is not.
-aneesh