RE: [ndctl PATCH v2 2/2] ndctl, monitor: refactor read_config_file

2019-01-09 Thread qi.f...@fujitsu.com


> -Original Message-
> From: Verma, Vishal L [mailto:vishal.l.ve...@intel.com]
> Sent: Tuesday, January 8, 2019 7:02 AM
> To: linux-nvdimm@lists.01.org; Qi, Fuli/斉 福利 
> Subject: Re: [ndctl PATCH v2 2/2] ndctl, monitor: refactor read_config_file
> 
> 
> On Mon, 2019-01-07 at 17:38 +0900, QI Fuli wrote:
> > This patch is used for refactoring read_config_file by replacing the
> > open coded implementation with ccan/ciniparser library.
> >
> > Signed-off-by: QI Fuli 
> > ---
> >  ndctl/Makefile.am  |   1 +
> >  ndctl/monitor.c| 115 -
> >  ndctl/monitor.conf |   2 +
> >  3 files changed, 32 insertions(+), 86 deletions(-)
> >
> > diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am index
> > ff01e06..2f00b65 100644
> > --- a/ndctl/Makefile.am
> > +++ b/ndctl/Makefile.am
> > @@ -28,6 +28,7 @@ ndctl_LDADD =\
> > lib/libndctl.la \
> > ../daxctl/lib/libdaxctl.la \
> > ../libutil.a \
> > +   ../libccan.a \
> > $(UUID_LIBS) \
> > $(KMOD_LIBS) \
> > $(JSON_LIBS)
> > diff --git a/ndctl/monitor.c b/ndctl/monitor.c index 233f2bb..8499fd4
> > 100644
> > --- a/ndctl/monitor.c
> > +++ b/ndctl/monitor.c
> 
> [..]
> 
> > +   dictionary *dic;
> > +
> > +   dic = ciniparser_load(config_file);
> > +   if (!dic)
> > +   return -errno;
> > +
> > +   parse_config(¶m.bus,
> > +   ciniparser_getstring(dic, "monitor:bus", NULL));
> > +   parse_config(¶m.dimm,
> > +   ciniparser_getstring(dic, "monitor:dimm", NULL));
> > +   parse_config(¶m.region,
> > +   ciniparser_getstring(dic, "monitor:region", NULL));
> > +   parse_config(¶m.namespace,
> > +   ciniparser_getstring(dic, "monitor:namespace", NULL));
> > +   parse_config(&monitor.dimm_event,
> > +   ciniparser_getstring(dic, "monitor:dimm-event", NULL));
> 
> Since the default return is always NULL in the above calls, we can use the 
> more concise
> form, ciniparser_getstr() -
> 
>   #define ciniparser_getstr(d, k)  ciniparser_getstring(d, k, NULL)
> 
> 
> > +   if (monitor.log)
> > goto out;
> 
> [..]
> 
> > diff --git a/ndctl/monitor.conf b/ndctl/monitor.conf index
> > 934e2c0..edcf8e2 100644
> > --- a/ndctl/monitor.conf
> > +++ b/ndctl/monitor.conf
> > @@ -9,6 +9,8 @@
> >  # Multiple space-separated values are allowed, but except the
> > following  # characters: : ? / \ % " ' $ & ! * { } [ ] ( ) = < > @
> 
> Is this character restriction still true?
> 
> >
> > +[monitor]
> > +
> >  # The objects to monitor are filtered via dimm's name by setting key 
> > "dimm".
> >  # If this value is different from the value of [--dimm=]
> > option,  # both of the values will work.
> 
> With Dan's recent reworks changing the build-time define to reflect an 
> 'ndctl-wide'
> config, and with the introduction of ini style sections, I think this change 
> merits
> a little more thought.
> 
> The addition of the [monitor] section breaks any existing configs - which is 
> fine,
> I don't expect we will break thousands, or even tens, of existing deployments 
> out
> in the wild :)
> 
> That being said, if we are breaking old config files, we can take this chance 
> to
> rethink and restructure configuration a bit. Here is my proposal.
> 
> - Rename monitor.conf to ndctl.conf, it will be a global ndctl-wide config 
> file that
> all commands can refer to.
> 
> - Define sections only for command-specific options, but have a [core] 
> section for
> options that apply to several different commands. For example core:region and
> core:bus will specify the region and bus to always operate on by default. In 
> contrast,
> monitor:dimm-event and monitor:log will be in the [monitor] section, and will 
> always
> be monitor specific.
> 
> - You can still have multiple monitor processes started each using a distinct 
> config.
> The way this can work is that the 'global' ndctl config file will be read 
> first,
> and will always apply (default /etc/ndctl/ndctl.conf, configurable at compile 
> time).
> Any 'local'
> config supplied using the config-file option to monitor will override any 
> field that
> it changes over the global config. This way the local configs for monitor 
> don't need
> to duplicate any truly common core information (say bus, for example), but 
> still
> have the option to override them if needed. This follows the user experience 
> git
> provides with ~/.gitconfig vs. repo local config.
> 
> - Existing commands can then be taught to respect the core config options, 
> and even
> grow their own specific sections, but all that can be part of a longer term 
> effort.
> Putting in the foundations for that now can allow us to convert individual 
> commands
> as needed. For now monitor will be the first and only user of this override 
> hierarchy.
> I suspect there will be some refactoring of option parsing in different 
> commands,
> but we probably shouldn't attempt that until a second user of the config 
> fram

RE: [ndctl PATCH v2 1/2] ndctl: add the ciniparser tool from ccan

2019-01-09 Thread qi.f...@fujitsu.com
> > > > >
> > > > > Hi Qi,
> > > > >
> > > > > Thanks for these patches, and also for rebasing to the latest!
> > > > >
> > > > > ciniparser.c adds a new warning (see below). Could you fix that
> > > > > up in a new patch on top of the initial import (i.e. we retain
> > > > > the as-is import, and make a record of what we changed).
> > > > >
> > > > > Looks like the -Wformat-truncation= warnings were first
> > > > > introduced in gcc-7.1, but only became really zealous after
> > > > > gcc-8.1. The right solution here might just be to suppress it by
> > > > > adding a -Wno-format- truncation to CFLAGS, but I'm open to
> > > > > considering other alternatives.
> > > >
> > > > That warning looks pretty handy.
> > > >
> > > > >
> > > > >$ gcc --version
> > > > >gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5)
> > > > >
> > > > >ccan/ciniparser/ciniparser.c: In function ‘ciniparser_load’:
> > > > >ccan/ciniparser/ciniparser.c:442:39: warning: ‘%s’ directive
> > > > > output may be truncated writing up to 1024 bytes into a region
> > > > > of size between 0 and 1024 [-Wformat-truncation=]
> > > > >snprintf(tmp, ASCIILINESZ + 1, "%s:%s", section, key);
> > > > >   ^~~~~
> > > > >In file included from /usr/include/stdio.h:873,
> > > > > from ./ccan/ciniparser/ciniparser.h:39,
> > > > > from ccan/ciniparser/ciniparser.c:36:
> > > > >/usr/include/bits/stdio2.h:67:10: note:
> > > > > ‘__builtin___snprintf_chk’ output between 2 and 2050 bytes into
> > > > > a destination of size 1025
> > > > >   return __builtin___snprintf_chk (__s, __n,
> > > > > __USE_FORTIFY_LEVEL - 1,
> > > > >
> > > > > ^~
> > > > > ~~
> > > > >__bos (__s), __fmt, __va_arg_pack ());
> > > > >~
> > > >
> > > > Since it's an snprintf without the error return being checked,
> > > > perhaps it would be happier as an sprintf with a precision
> > > > specified to limit the output?
> > >
> > > That would work, but 'section' and 'key' are both ASCIILINESZ+1, so
> > > how would we specify precision for both the %s specifiers? i.e.
> > > would we have to make up an artificial split?
> >
> > Good point. No way to do it without variable precision. However,
> > looking closer I think the warning only triggers when the return value
> > is not checked, so perhaps better to just add error handling to that
> > case.
> 
> Yep that seems reasonable to me.

Hi,

Thank you for your comments.
I updated the gcc version in my local environment and got the warning messages.
I will make a new version to fix them.

Thank you.
Qi
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Dave Chinner
On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote:
>  This patch series has implementation for "virtio pmem". 
>  "virtio pmem" is fake persistent memory(nvdimm) in guest 
>  which allows to bypass the guest page cache. This also
>  implements a VIRTIO based asynchronous flush mechanism.  

H. Sharing the host page cache direct into the guest VM. Sounds
like a good idea, but.

This means the guest VM can now run timing attacks to observe host
side page cache residency, and depending on the implementation I'm
guessing that the guest will be able to control host side page
cache eviction, too (e.g. via discard or hole punch operations).

Which means this functionality looks to me like a new vector for
information leakage into and out of the guest VM via guest
controlled host page cache manipulation.

https://arxiv.org/pdf/1901.01161

I might be wrong, but if I'm not we're going to have to be very
careful about how guest VMs can access and manipulate host side
resources like the page cache.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: PMEM error-handling forces SIGKILL causes kernel panic

2019-01-09 Thread Dan Williams
[ switch to text mail, add lkml and Naoya ]

On Wed, Jan 9, 2019 at 12:19 PM Jane Chu  wrote:
>
> Hi, Dan,
>
> Sorry for the late report.
> We recently saw panics from PMEM error handling, here are the log messages
> and stack trace.  "<--" are added by me.
>
> > [ 4488.098830] mce: Uncorrected hardware memory error in user-access at 
> > a6ec46f8  <--
> > [ 4488.131625] Memory failure: 0xa6ec46f: forcibly killing 
> > ora_pmon_tpcc1:84778
> > [ 4488.144326] Memory failure: 0xa6ec46f: forcibly killing 
> > ora_clmn_tpcc1:84780  <-- SIGKILL
> > [ 4488.156965] Memory failure: 0xa6ec46f: forcibly killing 
> > ora_psp0_tpcc1:84782
> > [ 4488.169608] Memory failure: 0xa6ec46f: forcibly killing 
> > ora_ipc0_tpcc1:84785
> > [ 4488.182210] Memory failure: 0xa6ec46f: forcibly killing 
> > ora_vktm_tpcc1:84793
> > [ 4488.194924] Memory failure: 0xa6ec46f: forcibly killing 
> > ora_gen0_tpcc1:84797
> > [ 4488.207212] Memory failure: 0xa6ec46f: forcibly killing 
> > ora_mman_tpcc1:84799
> > [ 4488.220168] Memory failure: 0xa6ec46f: forcibly killing 
> > ora_scmn_tpcc1:84803
> > [ 4488.232974] Memory failure: 0xa6ec46f: forcibly killing 
> > ora_diag_tpcc1:84806
> > [ 4488.245660] Memory failure: 0xa6ec46f: forcibly killing 
> > ora_scmn_tpcc1:84808
> ..
> > [ 4488.595847] BUG: unable to handle kernel NULL pointer dereference at 
> >   <--
> > [ 4488.604834] IP: _raw_spin_lock_irqsave+0x27/0x48 <-- 
> > task->sighand->siglock is NULL
> > [ 4488.610079] PGD 51ef45067 P4D 51ef45067 PUD 51ef44067 PMD 0
> > [ 4488.616514] Oops: 0002 [#1] SMP NOPTI<-- 2'010: 
> > no-page/write/kernel-mode
> > [ 4488.620674] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE 
> > nf_n
> > [ 4488.712640]  lpc_ich shpchp wmi pcc_cpufreq dm_multipath binfmt_misc sg 
> > fuse
> > [ 4488.749400] CPU: 2 PID: 296408 Comm: oracle_296408_t Tainted: G   M
> > [ 4488.764383] Hardware name: Oracle Corporation ORACLE SERVER 
> > X8-2/ASM,MB,X8-2,
> > [ 4488.778955] task: 997ccab28000 task.stack: b9b832944000
> > [ 4488.789802] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x48
> > [ 4488.799874] RSP: 0018:b9b832947d28 EFLAGS: 00010046
> > [ 4488.810071] RAX:  RBX: 0808 RCX: 
> > 
> > [ 4488.822959] RDX: 0001 RSI: 0001 RDI: 
> > 0808
> > [ 4488.835375] RBP: b9b832947d38 R08:  R09: 
> > 52bb
> > [ 4488.847854] R10: 0001 R11: 00aa R12: 
> > 0246
> > [ 4488.860446] R13: 0004 R14: 0001 R15: 
> > 0009
> > [ 4488.872994] FS:  7fa0f17533c0() GS:98ba0068() 
> > knlGS:0
> > [ 4488.886705] CS:  0010 DS:  ES:  CR0: 80050033
> > [ 4488.897877] CR2: 0808 CR3: 00051146a004 CR4: 
> > 007606e0
> > [ 4488.910774] DR0:  DR1:  DR2: 
> > 
> > [ 4488.923665] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [ 4488.936517] PKRU: 5554
> > [ 4488.944467] Call Trace:
> > [ 4488.952152]  force_sig_info+0x2e/0xde
> > [ 4488.961266]  force_sig+0x16/0x18
> > [ 4488.970568]  kill_procs+0x15b/0x1a0<-- forcing  SIGKILL to the 
> > user process
> <-- due to tk->addr_valid=0 
> which means rmap()
> <-- can't find a 'vma' 
> hosting the offending
> <-- 'pfn', because the 
> process has exited by now.

Ok, that race can happen for the DRAM case as well.

>
> > [ 4488.979601]  memory_failure+0x1dd/0x235
> > [ 4488.989071]  do_machine_check+0x738/0xc89
> > [ 4488.998739]  ? devm_free_irq+0x22/0x71
> > [ 4489.008129]  ? machine_check+0x115/0x124
> > [ 4489.017708]  do_mce+0x15/0x17
> > [ 4489.026260]  machine_check+0x11f/0x124
> > [ 4489.035725] RIP: 0033:0x1253b9e6
> > [ 4489.044695] RSP: 002b:7ffc205c1600 EFLAGS: 00010246
> > [ 4489.055964] RAX: 384d RBX: 0003 RCX: 
> > 002f
> > [ 4489.069509] RDX: 0007 RSI: 3f12 RDI: 
> > 3786
> > [ 4489.083065] RBP: 7ffc205c16a0 R08: 009fa526c0e4 R09: 
> > 009fa526c064
> > [ 4489.096672] R10:  R11:  R12: 
> > 009fa526c086
> > [ 4489.110342] R13: 7fa0eb399c08 R14: 0015 R15: 
> > 009fa526f8b1
> > [ 4489.124026] Code: 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 
> > fb 9c
> > [ 4489.156598] RIP: _raw_spin_lock_irqsave+0x27/0x48 RSP: b9b832947d28
>
>
> After reboot, the offending 'pfn' is recorded in the 'badblock' record in 
> namespace0.0.
>
> > "dev":"namespace0.0",
> > "mode":"fsdax",
> > "map":"dev",
> > "size":799063146496,
> > "uuid":"bf8dc912-e8c2-4099-8571-b0cc32cc9f68",
> > "blockdev":"pmem0",
> > "badblock_count":1,
> > "badblocks":[
> >   {

Re: [PATCH v7 07/12] ndctl: setup modprobe rules

2019-01-09 Thread Dan Williams
On Wed, Jan 9, 2019 at 9:55 AM Dave Jiang  wrote:
>
> Adding reference config file for modprobe.d in order to trigger the
> reference script that will inject keys associated with the nvdimms into
> the kernel user ring for unlock.
>
> Signed-off-by: Dave Jiang 
> ---
>  Makefile.am  |   10 ++
>  contrib/ndctl-loadkeys.sh|   25 +
>  contrib/nvdimm_modprobe.conf |1 +
>  3 files changed, 36 insertions(+)
>  create mode 100755 contrib/ndctl-loadkeys.sh
>  create mode 100644 contrib/nvdimm_modprobe.conf

This file is installed to /etc/modprobe.d, so no need to duplicate
"modprobe" in the name. I'd prefer "nvidimm-security.conf" to make it
explicit.

>
> diff --git a/Makefile.am b/Makefile.am
> index e0c463a3..5a3f03aa 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -42,6 +42,16 @@ bashcompletiondir = $(BASH_COMPLETION_DIR)
>  dist_bashcompletion_DATA = contrib/ndctl
>  endif
>
> +load_key_file = contrib/ndctl-loadkeys.sh
> +load_keydir = $(sysconfdir)/ndctl/
> +load_key_DATA = $(load_key_file)
> +EXTRA_DIST += $(load_key_file)

No need for EXTRA_DIST,  _DATA will do this.

> +
> +modprobe_file = contrib/nvdimm_modprobe.conf
> +modprobedir = $(sysconfdir)/modprobe.d/
> +modprobe_DATA = $(modprobe_file)
> +EXTRA_DIST += $(modprobe_file)

ditto.

> +
>  noinst_LIBRARIES = libccan.a
>  libccan_a_SOURCES = \
> ccan/str/str.h \
> diff --git a/contrib/ndctl-loadkeys.sh b/contrib/ndctl-loadkeys.sh
> new file mode 100755
> index ..bc2c94df
> --- /dev/null
> +++ b/contrib/ndctl-loadkeys.sh
> @@ -0,0 +1,25 @@
> +#!/bin/bash -Ex
> +
> +# This script assumes a single master key for all DIMMs
> +
> +key_path=/etc/ndctl/keys

Hard coded path, this should be constructed from the variables from
the Makefile.

Where is this shell script installed? I think this should become an
actual ndctl command rather than an on-the side shell script. It
otherwise seems odd to name it under "contrib" since this seems
generic enough to be the "official" solution. I think contrib should
be reserved for things that are not fundamental to the operation of
the utility. This seems integral to the security implementation.

The git command harness had support for optionally calling built-in C
routines or shell scripts, would just need to resurrect the support to
route "ndctl load-keys" to this script, likely installed to
/usr/libexec/ndctl.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 03/12] ndctl: add disable security support

2019-01-09 Thread Verma, Vishal L


On Wed, 2019-01-09 at 10:54 -0700, Dave Jiang wrote:
> Add support for disable security to libndctl and also command line option
> of "disable-passphrase" for ndctl. This provides a way to disable security
> on the nvdimm.
> 
> Signed-off-by: Dave Jiang 
> ---
>  Documentation/ndctl/Makefile.am  |3 +-
>  Documentation/ndctl/ndctl-disable-passphrase.txt |   32 +++
>  ndctl/builtin.h  |1 +
>  ndctl/dimm.c |   38 
> --
>  ndctl/lib/dimm.c |9 +
>  ndctl/lib/keys.c |   29 +
>  ndctl/lib/libndctl.sym   |2 +
>  ndctl/libndctl.h |8 +
>  ndctl/ndctl.c|1 +
>  9 files changed, 119 insertions(+), 4 deletions(-)
>  create mode 100644 Documentation/ndctl/ndctl-disable-passphrase.txt
> 
> diff --git a/Documentation/ndctl/Makefile.am b/Documentation/ndctl/Makefile.am
> index 7adb..31570a77 100644
> --- a/Documentation/ndctl/Makefile.am
> +++ b/Documentation/ndctl/Makefile.am
> @@ -49,7 +49,8 @@ man1_MANS = \
>   ndctl-list.1 \
>   ndctl-monitor.1 \
>   ndctl-enable-passphrase.1 \
> - ndctl-update-passphrase.1
> + ndctl-update-passphrase.1 \
> + ndctl-disable-passphrase.1
>  
>  CLEANFILES = $(man1_MANS)
>  
> diff --git a/Documentation/ndctl/ndctl-disable-passphrase.txt 
> b/Documentation/ndctl/ndctl-disable-passphrase.txt
> new file mode 100644
> index ..3c8bfe47
> --- /dev/null
> +++ b/Documentation/ndctl/ndctl-disable-passphrase.txt
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +ndctl-disable-passphrase(1)
> +===
> +
> +NAME
> +
> +ndctl-disable-passphrase - disabling passphrase for an NVDIMM
> +
> +SYNOPSIS
> +
> +[verse]
> +'ndctl disable-passphrase'  []
> +
> +DESCRIPTION
> +---
> +Provide a generic interface for disabling passphrase for NVDIMM.
> +Ndctl will search the user key ring for the associated DIMM. If no key is
> +found, it will attempt to load the key blob from the expected location. Ndctl
> +will remove the key and the key blob once passphrase is disabled.

This applies to the Description section of a few of the other patches
as well. the 'Ndctl' capitalized drew my attention to it, but thinking
about it, we should never say something like 'ndctl will do xyz'
the ndctl is implied, just rephrase as 'do xyz' in each of these cases.

So -

Provide a command for disabling the passphrase on an NVDIMM.

Search the user key ring for the associated NVDIMM. If not found,
attempt to load the key blob from the default location. After disabling
the passphrase, remove the key and the key blob.

You can adapt this to the other patches that reference ndctl in the
description section.

> +
> +OPTIONS
> +---
> +::
> +include::xable-dimm-options.txt[]
> +
> +-p::
> +--key-path=::
> + Path to where key related files resides. This parameter is optional
> + and the default is set to /etc/ndctl/keys.
> +
> +include::../copyright.txt[]
> diff --git a/ndctl/builtin.h b/ndctl/builtin.h
> index 231fda25..821ea690 100644
> --- a/ndctl/builtin.h
> +++ b/ndctl/builtin.h
> @@ -34,4 +34,5 @@ int cmd_update_firmware(int argc, const char **argv, struct 
> ndctl_ctx *ctx);
>  int cmd_inject_smart(int argc, const char **argv, struct ndctl_ctx *ctx);
>  int cmd_passphrase_setup(int argc, const char **argv, struct ndctl_ctx *ctx);
>  int cmd_passphrase_update(int argc, const char **argv, struct ndctl_ctx 
> *ctx);
> +int cmd_disable_passphrase(int argc, const char **argv, struct ndctl_ctx 
> *ctx);
>  #endif /* _NDCTL_BUILTIN_H_ */
> diff --git a/ndctl/dimm.c b/ndctl/dimm.c
> index 1ab6b29f..4f0466a1 100644
> --- a/ndctl/dimm.c
> +++ b/ndctl/dimm.c
> @@ -864,6 +864,18 @@ static int action_key_update(struct ndctl_dimm *dimm,
>   param.key_path);
>  }
>  
> +static int action_passphrase_disable(struct ndctl_dimm *dimm,
> + struct action_context *actx)
> +{
> + if (!ndctl_dimm_security_supported(dimm)) {
> + error("%s: security operation not supported\n",
> + ndctl_dimm_get_devname(dimm));
> + return -EOPNOTSUPP;
> + }
> +
> + return ndctl_dimm_disable_key(dimm, param.key_path);
> +}
> +
>  static int __action_init(struct ndctl_dimm *dimm,
>   enum ndctl_namespace_version version, int chk_only)
>  {
> @@ -953,12 +965,14 @@ OPT_BOOLEAN('f', "force", ¶m.force, \
>  OPT_STRING('V', "label-version", ¶m.labelversion, "version-number", \
>   "namespace label specification version (default: 1.1)")
>  
> -#define KEY_OPTIONS() \
> -OPT_STRING('m', "master-key", ¶m.master_key, ":", \
> - "master key for security"), \
> +#define KEY_BASE_OPTIONS() \
>  OPT_FILENAME('p', "key-path", ¶m.key_path, "key-path

Re: [PATCH v7 07/12] ndctl: setup modprobe rules

2019-01-09 Thread Verma, Vishal L


On Wed, 2019-01-09 at 10:54 -0700, Dave Jiang wrote:
> Adding reference config file for modprobe.d in order to trigger the
> reference script that will inject keys associated with the nvdimms
> into
> the kernel user ring for unlock.
> 
> Signed-off-by: Dave Jiang 
> ---
>  Makefile.am  |   10 ++
>  contrib/ndctl-loadkeys.sh|   25 +
>  contrib/nvdimm_modprobe.conf |1 +
>  3 files changed, 36 insertions(+)
>  create mode 100755 contrib/ndctl-loadkeys.sh
>  create mode 100644 contrib/nvdimm_modprobe.conf
> 
> diff --git a/Makefile.am b/Makefile.am
> index e0c463a3..5a3f03aa 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -42,6 +42,16 @@ bashcompletiondir = $(BASH_COMPLETION_DIR)
>  dist_bashcompletion_DATA = contrib/ndctl
>  endif
>  
> +load_key_file = contrib/ndctl-loadkeys.sh
> +load_keydir = $(sysconfdir)/ndctl/
> +load_key_DATA = $(load_key_file)
> +EXTRA_DIST += $(load_key_file)
> +
> +modprobe_file = contrib/nvdimm_modprobe.conf
> +modprobedir = $(sysconfdir)/modprobe.d/
> +modprobe_DATA = $(modprobe_file)
> +EXTRA_DIST += $(modprobe_file)
> +

We're installing these files via the Makefile, but I think the spec is
missing them? Presumably the spec should also install them in the same
way?

>  noinst_LIBRARIES = libccan.a
>  libccan_a_SOURCES = \
>   ccan/str/str.h \
> diff --git a/contrib/ndctl-loadkeys.sh b/contrib/ndctl-loadkeys.sh
> new file mode 100755
> index ..bc2c94df
> --- /dev/null
> +++ b/contrib/ndctl-loadkeys.sh
> @@ -0,0 +1,25 @@
> +#!/bin/bash -Ex

I didn't catch this before, but this script doesn't need -E since we're
not setting up a trap.
If anything use -e, the regular version.
I'm also not sure -x is needed - it is just an example script right? I 
don't feel strongly about it either way, if having the extra debug here
might be helpful we can keep it.

> +
> +# This script assumes a single master key for all DIMMs
> +
> +key_path=/etc/ndctl/keys
> +tpmh_path="$key_path"/tpm.handle
> +key_type=""
> +tpm_handle=""
> +id=""
> +
> +if [ -f $tpmh_path ]; then
> + key_type=trusted
> + tpm_handle="keyhandle=$(cat $tpmh_path)"
> +else
> + key_type=user
> +fi
> +
> +if ! keyctl search @u "$key_type" nvdimm-master; then
> + keyctl add "$key_type" nvdimm-master "load $(cat
> $key_path/nvdimm-master.blob) $tpm_handle" @u > /dev/null
> +fi
> +
> +for file in "$key_path"/nvdimm_*; do
> + id="$(cut -d'_' -f2 <<< "${file##*/}")"
> + keyctl add encrypted nvdimm:"$id" "load $(cat "$file")" @u
> +done
> diff --git a/contrib/nvdimm_modprobe.conf
> b/contrib/nvdimm_modprobe.conf
> new file mode 100644
> index ..b113d8d7
> --- /dev/null
> +++ b/contrib/nvdimm_modprobe.conf
> @@ -0,0 +1 @@
> +install libnvdimm /usr/sbin/ndctl-loadkeys.sh ; /sbin/modprobe --
> ignore-install libnvdimm $CMDLINE_OPTS

I'm not familiar with how modprobe.conf works, but is it looking for
ndctl-loadkeys.sh in /usr/sbin? If so are we installing it there? The
lines above in the Makefile seem to have it going into /etc/ndctl ?

> 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 06/12] ndctl: add unit test for security ops (minus overwrite)

2019-01-09 Thread Verma, Vishal L


On Wed, 2019-01-09 at 10:54 -0700, Dave Jiang wrote:
> Add unit test for security enable, disable, update, erase, unlock, and
> freeze.
> 
> Signed-off-by: Dave Jiang 
> ---
>  test/Makefile.am |4 +
>  test/security.sh |  203 
> ++
>  2 files changed, 207 insertions(+)
>  create mode 100755 test/security.sh
> 
> diff --git a/test/Makefile.am b/test/Makefile.am
> index ebdd23f6..42009c31 100644
> --- a/test/Makefile.am
> +++ b/test/Makefile.am
> @@ -27,6 +27,10 @@ TESTS =\
>   max_available_extent_ns.sh \
>   pfn-meta-errors.sh
>  
> +if ENABLE_KEYUTILS
> +TESTS += security.sh
> +endif
> +
>  check_PROGRAMS =\
>   libndctl \
>   dsm-fail \
> diff --git a/test/security.sh b/test/security.sh
> new file mode 100755
> index ..1dbe04d2
> --- /dev/null
> +++ b/test/security.sh
> @@ -0,0 +1,203 @@
> +#!/bin/bash -Ex
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright(c) 2018 Intel Corporation. All rights reserved.
> +
> +rc=77
> +dev=""
> +id=""
> +dev_no=""
> +keypath="/etc/ndctl/keys"
> +masterkey="nvdimm-master-test"
> +masterpath="$keypath/$masterkey"
> +keyctl="/usr/bin/keyctl"

Hm, no need to hard-code the path for keyctl, it should be in the $PATH

> +
> +. ./common
> +
> +lockpath="/sys/devices/platform/${NFIT_TEST_BUS0}/nfit_test_dimm/test_dimm"
> +
> +trap 'err $LINENO' ERR
> +
> +check_prereq()
> +{
> + if [ ! -f "$keyctl" ]; then
> + echo "$keyctl does not exist."
> + exit 1
> + fi

And then instead of checking for the path here, test/common already has
a function for checking the presence of system utilities, just use
that. In fact that function is called check_prereq, so let us not
duplicate the name.

The keypath check below is still valid.

> +
> + if [ ! -d "$keypath" ]; then
> + echo "$keypath directory does not exist."
> + exit 1
> + fi
> +}
> +
> +setup()
> +{
> + $NDCTL disable-region -b "$NFIT_TEST_BUS0" all
> +}
> +
> +detect()
> +{
> + dev=$($NDCTL list -b "$NFIT_TEST_BUS0" -D | jq .[0].dev | tr -d '"')
> + [ -n "$dev" ] || err "$LINENO"
> + id=$($NDCTL list -b "$NFIT_TEST_BUS0" -D | jq .[0].id | tr -d '"')

Quote these command substitutions as well "$(...)"

Also, this applies to several existing tests, I'm sure, and that
cleanup can happen later..
But every jq query doesn't need to be appended with | tr -d '"' to
remove the quotes. jq has a '-r' option to do exactly that.

Maybe the 3 spots in this file where jq | tr is used can be corrected
as a start :)

> + [ -n "$id" ] || err "$LINENO"
> +}
> +
> +setup_keys()
> +{
> + if [ ! -f "$masterpath" ]; then
> + keyctl add user $masterkey "$(dd if=/dev/urandom bs=1 count=32 
> 2>/dev/null)" @u
> + keyctl pipe "$(keyctl search @u user $masterkey)" > $masterpath

Quotes around $masterkey and $masterpath

We defined $keyctl with a fixed path, but are just calling it directly
from the system PATH :)
Just just check_prereq from test/common to check for keyctl's presence,
and then use keyctl directly (like you do here).

> + else
> + echo "Unclean setup. Please cleanup $masterpath file."

Can't the test just perform this cleanup before it starts?

> + exit 1
> + fi
> +}
> +
> +test_cleanup()
> +{
> + keyctl unlink "$(keyctl search @u encrypted nvdimm:"$id")"
> + keyctl unlink "$(keyctl search @u user $masterkey)"
> + rm -f "$keypath"/nvdimm_"$id"\_"$(hostname)".blob

The underscore shouldn't need to be escaped?
You can also quote the whole thing at once instead of each variable,
looks much cleaner:

rm -f "${keypath}/nvdimm_${id}_$(hostname).blob"


> + rm -f "$masterpath"
> +}
> +
> +lock_dimm()
> +{
> + $NDCTL disable-dimm "$dev"
> + dev_no="$(echo "$dev" | cut -b 5-)"

I was going to say 'useless use of echo' -- you can do:

dev_no="$(cut -b 5- <<< "$dev")"

but there is really no need to call spawn a subshell to call cut
either, bash can do this directly by parameter expansion:

if dev is set to "dimm0", then:

dev_no="${dev#dimm}"

will drop the 'dimm' from $dev and yield '0'

> + echo 1 > "${lockpath}${dev_no}/lock_dimm"
> + sstate="$(get_security_state)"
> + if [ "$sstate" != "locked" ]; then
> + echo "Incorrect security state: $sstate expected: disabled"
> + exit 1
> + fi
> +}
> +
> +get_security_state()
> +{
> + $NDCTL list -i -b "$NFIT_TEST_BUS0" -d "$dev" | jq 
> .[].dimms[0].security | tr -d '"'
> +}
> +
> +enable_passphrase()
> +{
> + $NDCTL enable-passphrase -m user:"$masterkey" "$dev"
> + sstate=$(get_security_state)

Quote command substitution

> + if [ "$sstate" != "unlocked" ]; then
> + echo "Incorrect security state: $sstate expected: unlocked"
> + exit 1
> + fi
> +}
> +
> +disable_passphrase()
> +{
> + $NDCTL disable-passphrase "$dev"
> + sstate=$(get_security_state)

Quote co

PMEM error-handling forces SIGKILL causes kernel panic

2019-01-09 Thread Jane Chu

Hi, Dan,

Sorry for the late report.
We recently saw panics from PMEM error handling, here are the log messages
and stack trace.  "<--" are added by me.


[ 4488.098830] mce: Uncorrected hardware memory error in user-access at a6ec46f8   
   <--
[ 4488.131625] Memory failure: 0xa6ec46f: forcibly killing ora_pmon_tpcc1:84778
[ 4488.144326] Memory failure: 0xa6ec46f: forcibly killing ora_clmn_tpcc1:84780  
<-- SIGKILL
[ 4488.156965] Memory failure: 0xa6ec46f: forcibly killing ora_psp0_tpcc1:84782
[ 4488.169608] Memory failure: 0xa6ec46f: forcibly killing ora_ipc0_tpcc1:84785
[ 4488.182210] Memory failure: 0xa6ec46f: forcibly killing ora_vktm_tpcc1:84793
[ 4488.194924] Memory failure: 0xa6ec46f: forcibly killing ora_gen0_tpcc1:84797
[ 4488.207212] Memory failure: 0xa6ec46f: forcibly killing ora_mman_tpcc1:84799
[ 4488.220168] Memory failure: 0xa6ec46f: forcibly killing ora_scmn_tpcc1:84803
[ 4488.232974] Memory failure: 0xa6ec46f: forcibly killing ora_diag_tpcc1:84806
[ 4488.245660] Memory failure: 0xa6ec46f: forcibly killing ora_scmn_tpcc1:84808

..

[ 4488.595847] BUG: unable to handle kernel NULL pointer dereference at    
   <--
[ 4488.604834] IP: _raw_spin_lock_irqsave+0x27/0x48 <-- 
task->sighand->siglock is NULL
[ 4488.610079] PGD 51ef45067 P4D 51ef45067 PUD 51ef44067 PMD 0   
[ 4488.616514] Oops: 0002 [#1] SMP NOPTI<-- 2'010: no-page/write/kernel-mode

[ 4488.620674] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_n
[ 4488.712640]  lpc_ich shpchp wmi pcc_cpufreq dm_multipath binfmt_misc sg fuse
[ 4488.749400] CPU: 2 PID: 296408 Comm: oracle_296408_t Tainted: G   M 
[ 4488.764383] Hardware name: Oracle Corporation ORACLE SERVER X8-2/ASM,MB,X8-2,

[ 4488.778955] task: 997ccab28000 task.stack: b9b832944000
[ 4488.789802] RIP: 0010:_raw_spin_lock_irqsave+0x27/0x48
[ 4488.799874] RSP: 0018:b9b832947d28 EFLAGS: 00010046
[ 4488.810071] RAX:  RBX: 0808 RCX: 
[ 4488.822959] RDX: 0001 RSI: 0001 RDI: 0808
[ 4488.835375] RBP: b9b832947d38 R08:  R09: 52bb
[ 4488.847854] R10: 0001 R11: 00aa R12: 0246
[ 4488.860446] R13: 0004 R14: 0001 R15: 0009
[ 4488.872994] FS:  7fa0f17533c0() GS:98ba0068() knlGS:0
[ 4488.886705] CS:  0010 DS:  ES:  CR0: 80050033
[ 4488.897877] CR2: 0808 CR3: 00051146a004 CR4: 007606e0
[ 4488.910774] DR0:  DR1:  DR2: 
[ 4488.923665] DR3:  DR6: fffe0ff0 DR7: 0400
[ 4488.936517] PKRU: 5554
[ 4488.944467] Call Trace:
[ 4488.952152]  force_sig_info+0x2e/0xde
[ 4488.961266]  force_sig+0x16/0x18
[ 4488.970568]  kill_procs+0x15b/0x1a0<-- forcing  SIGKILL to the user 
process

<-- due to tk->addr_valid=0 
which means rmap()
<-- can't find a 'vma' hosting 
the offending
<-- 'pfn', because the process 
has exited by now.


[ 4488.979601]  memory_failure+0x1dd/0x235
[ 4488.989071]  do_machine_check+0x738/0xc89
[ 4488.998739]  ? devm_free_irq+0x22/0x71
[ 4489.008129]  ? machine_check+0x115/0x124
[ 4489.017708]  do_mce+0x15/0x17
[ 4489.026260]  machine_check+0x11f/0x124
[ 4489.035725] RIP: 0033:0x1253b9e6
[ 4489.044695] RSP: 002b:7ffc205c1600 EFLAGS: 00010246
[ 4489.055964] RAX: 384d RBX: 0003 RCX: 002f
[ 4489.069509] RDX: 0007 RSI: 3f12 RDI: 3786
[ 4489.083065] RBP: 7ffc205c16a0 R08: 009fa526c0e4 R09: 009fa526c064
[ 4489.096672] R10:  R11:  R12: 009fa526c086
[ 4489.110342] R13: 7fa0eb399c08 R14: 0015 R15: 009fa526f8b1
[ 4489.124026] Code: 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 9c
[ 4489.156598] RIP: _raw_spin_lock_irqsave+0x27/0x48 RSP: b9b832947d28



After reboot, the offending 'pfn' is recorded in the 'badblock' record in 
namespace0.0.


"dev":"namespace0.0",
"mode":"fsdax",
"map":"dev",
"size":799063146496,
"uuid":"bf8dc912-e8c2-4099-8571-b0cc32cc9f68",
"blockdev":"pmem0",
"badblock_count":1,
"badblocks":[
  {
"offset":969675644,   <-- XXX
"length":1,
"dimms":[
  "nmem4"
]


Per /proc/iomem -
0306000 - 0ed5fff   pmem0
pmem0 is region0 with size = 811748818944,
namespace0.0 was created over the entire region0, usable size = 799063146496
so the metadata took (811748818944 - 799063146496) = 0x2F420 bytes
badblock address: (969675644 * 512) + 0x2F420 + 0x306000 = 0xa6ec46f800
offending 'pfn': 0xa6ec46f  at starting address 0xa6ec46f000.

Some additional information worth to be noted here:
1. The work

Re: [PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag

2019-01-09 Thread Pankaj Gupta


> >
> > This patch adds 'DAXDEV_BUFFERED' flag which is set
> > for virtio pmem corresponding nd_region. This later
> > is used to disable MAP_SYNC functionality for ext4
> > & xfs filesystem.
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/dax/super.c  | 17 +
> >  drivers/nvdimm/pmem.c|  3 +++
> >  drivers/nvdimm/region_devs.c |  7 +++
> >  drivers/virtio/pmem.c|  1 +
> >  include/linux/dax.h  |  9 +
> >  include/linux/libnvdimm.h|  6 ++
> >  6 files changed, 43 insertions(+)
> >
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index 6e928f3..9128740 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -167,6 +167,8 @@ enum dax_device_flags {
> > DAXDEV_ALIVE,
> > /* gate whether dax_flush() calls the low level flush routine */
> > DAXDEV_WRITE_CACHE,
> > +   /* flag to disable MAP_SYNC for virtio based host page cache flush
> > */
> > +   DAXDEV_BUFFERED,
> >  };
> >
> >  /**
> > @@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device
> > *dax_dev)
> >  }
> >  EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
> >
> > +void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
> > +{
> > +   if (wc)
> > +   set_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> > +   else
> > +   clear_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);
> 
> The "write_cache" property was structured this way because it can
> conceivably change at runtime. The MAP_SYNC capability should be
> static and never changed after init.

o.k. Will change.

> 
> > +bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
> > +{
> > +   return test_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);
> 
> Echoing Darrick and Jan this is should be a generic property of a
> dax_device and not specific to virtio. I don't like the "buffered"
> designation as that's not accurate. There may be hardware reasons why
> a dax_device is not synchronous, like a requirement to flush a
> write-pending queue or otherwise notify the device of new writes.

Agree.

> 
> I would just have a dax_synchronous() helper and a DAXDEV_SYNC flag. I
> would also modify alloc_dax() to take a flags argument so that the
> capability can be instantiated when the dax_device is allocated.

o.k. Will make the change.

Thanks,
Pankaj
> 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 5/5] xfs: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta


> > Virtio pmem provides asynchronous host page cache flush
> > mechanism. we don't support 'MAP_SYNC' with virtio pmem
> > and xfs.
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  fs/xfs/xfs_file.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index e474250..eae4aa4 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1190,6 +1190,14 @@ xfs_file_mmap(
> > if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > return -EOPNOTSUPP;
> >  
> > +   /* We don't support synchronous mappings with guest direct access
> > +* and virtio based host page cache mechanism.
> > +*/
> > +   if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled(
> 
> Echoing what Jan said, this ought to be some sort of generic function
> that tells us whether or not memory mapped from the dax device will
> always still be accessible even after a crash (i.e. supports MAP_SYNC).

o.k
> 
> What if the underlying file on the host is itself on pmem and can be
> MAP_SYNC'd?  Shouldn't the guest be able to use MAP_SYNC as well?

Guest MAP_SYNC on actual host pmem will sync guest metadata, as guest 
writes are persistent on actual pmem. Host side Qemu MAP_SYNC enabling
work for pmem is in progress. It will make sure metadata at host side
is in consistent state after a crash or any other metadata corruption 
operation.

For virtio-pmem, we are emulating pmem device over regular storage on
host side. Guest need to call fsync followed by write to make sure
guest metadata is in consistent state(or journalled). Host backing file
data & metadata will also be persistent after guest to host fsync.

Thanks,
Pankaj

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v7 10/12] ndctl: master phassphrase management support

2019-01-09 Thread Dave Jiang
Adding master passphrase enabling and update to ndctl. This is a new
feature from Intel DSM v1.8.

Signed-off-by: Dave Jiang 
---
 Documentation/ndctl/ndctl-enable-passphrase.txt |7 +
 Documentation/ndctl/ndctl-update-passphrase.txt |7 +
 ndctl/dimm.c|   13 +-
 ndctl/lib/dimm.c|9 ++
 ndctl/lib/keys.c|  127 ---
 ndctl/lib/libndctl.sym  |1 
 ndctl/libndctl.h|   14 ++-
 7 files changed, 130 insertions(+), 48 deletions(-)

diff --git a/Documentation/ndctl/ndctl-enable-passphrase.txt 
b/Documentation/ndctl/ndctl-enable-passphrase.txt
index c14a206c..c025b1c3 100644
--- a/Documentation/ndctl/ndctl-enable-passphrase.txt
+++ b/Documentation/ndctl/ndctl-enable-passphrase.txt
@@ -29,7 +29,7 @@ OPTIONS
 include::xable-dimm-options.txt[]
 
 -m::
---master=::
+--master-key=::
Key name for the master key used to seal the NVDIMM security keys.
The format would be :
i.e.: trusted:master-nvdimm
@@ -39,4 +39,9 @@ include::xable-dimm-options.txt[]
Path to where key related files resides. This parameter is optional
and the default is set to /etc/ndctl/keys.
 
+-M::
+--master-passphrase::
+   Indicates that we are managing the master passphrase instead of the
+   user passphrase.
+
 include::../copyright.txt[]
diff --git a/Documentation/ndctl/ndctl-update-passphrase.txt 
b/Documentation/ndctl/ndctl-update-passphrase.txt
index dd6e4e4e..8b5dfe01 100644
--- a/Documentation/ndctl/ndctl-update-passphrase.txt
+++ b/Documentation/ndctl/ndctl-update-passphrase.txt
@@ -26,7 +26,7 @@ OPTIONS
 include::xable-dimm-options.txt[]
 
 -m::
---master::
+--master-key=::
New key name for the master key to seal the new nvdimm key, or the
existing master key name. i.e trusted:master-key.
 
@@ -35,4 +35,9 @@ include::xable-dimm-options.txt[]
Path to where key related files resides. This parameter is optional
and the default is set to /etc/ndctl/keys.
 
+-M::
+--master-passphrase::
+   Parameter to indicate that we are managing the master passphrase
+   instead of the user passphrase.
+
 include::../copyright.txt[]
diff --git a/ndctl/dimm.c b/ndctl/dimm.c
index ce477981..4875e414 100644
--- a/ndctl/dimm.c
+++ b/ndctl/dimm.c
@@ -49,6 +49,7 @@ static struct parameters {
const char *master_key;
bool crypto_erase;
bool overwrite;
+   bool master_pass;
bool force;
bool json;
bool verbose;
@@ -849,8 +850,8 @@ static int action_key_enable(struct ndctl_dimm *dimm,
return -EOPNOTSUPP;
}
 
-   return ndctl_dimm_enable_key(dimm, param.master_key,
-   param.key_path);
+   return ndctl_dimm_enable_key(dimm, param.master_key, param.key_path,
+   param.master_pass ? ND_MASTER_KEY : ND_USER_KEY);
 }
 
 static int action_key_update(struct ndctl_dimm *dimm,
@@ -862,8 +863,8 @@ static int action_key_update(struct ndctl_dimm *dimm,
return -EOPNOTSUPP;
}
 
-   return ndctl_dimm_update_key(dimm, param.master_key,
-   param.key_path);
+   return ndctl_dimm_update_key(dimm, param.master_key, param.key_path,
+   param.master_pass ? ND_MASTER_KEY : ND_USER_KEY);
 }
 
 static int action_passphrase_disable(struct ndctl_dimm *dimm,
@@ -1044,7 +1045,9 @@ OPT_FILENAME('p', "key-path", ¶m.key_path, 
"key-path", \
 
 #define KEY_OPTIONS() \
 OPT_STRING('m', "master-key", ¶m.master_key, ":", \
-   "master key for security")
+   "master key for security"), \
+OPT_BOOLEAN('M', "master-passphrase", ¶m.master_pass, \
+   "use master passphrase")
 
 #define SANITIZE_OPTIONS() \
 OPT_BOOLEAN('c', "crypto-erase", ¶m.crypto_erase, \
diff --git a/ndctl/lib/dimm.c b/ndctl/lib/dimm.c
index eb950331..dc945296 100644
--- a/ndctl/lib/dimm.c
+++ b/ndctl/lib/dimm.c
@@ -771,3 +771,12 @@ NDCTL_EXPORT int ndctl_dimm_wait_overwrite(struct 
ndctl_dimm *dimm)
close(fd);
return rc;
 }
+
+NDCTL_EXPORT int ndctl_dimm_update_master_passphrase(struct ndctl_dimm *dimm,
+   long ckey, long nkey)
+{
+   char buf[SYSFS_ATTR_SIZE];
+
+   sprintf(buf, "master_update %ld %ld\n", ckey, nkey);
+   return write_security(dimm, buf);
+}
diff --git a/ndctl/lib/keys.c b/ndctl/lib/keys.c
index 2eb23d38..fc71cc2b 100644
--- a/ndctl/lib/keys.c
+++ b/ndctl/lib/keys.c
@@ -21,7 +21,7 @@
 #define KEY_CMD_SIZE   128
 
 static int get_key_path(struct ndctl_dimm *dimm, char *path,
-   enum ndctl_key_type key_type, const char *keypath)
+   const char *keypath, enum ndctl_key_type key_type)
 {
struct ndctl_ctx *ctx = ndctl_dimm_get_ctx(dimm);
char hostname[HOST_NAME_MAX];
@@ -33,16 +33,29 @@ static int get_key_path(struct ndctl_dimm *dimm, char *path,
 

[PATCH v7 12/12] ndctl: documentation for security and key management

2019-01-09 Thread Dave Jiang
Add a "Theory of Operation" section describing the Intel DSM operations to
the relevant man pages.

Signed-off-by: Dave Jiang 
---
 Documentation/ndctl/intel-nvdimm-security.txt|  140 ++
 Documentation/ndctl/ndctl-disable-passphrase.txt |2 
 Documentation/ndctl/ndctl-enable-passphrase.txt  |2 
 Documentation/ndctl/ndctl-freeze-security.txt|2 
 Documentation/ndctl/ndctl-sanitize-dimm.txt  |2 
 Documentation/ndctl/ndctl-update-passphrase.txt  |2 
 6 files changed, 150 insertions(+)
 create mode 100644 Documentation/ndctl/intel-nvdimm-security.txt

diff --git a/Documentation/ndctl/intel-nvdimm-security.txt 
b/Documentation/ndctl/intel-nvdimm-security.txt
new file mode 100644
index ..b8443e12
--- /dev/null
+++ b/Documentation/ndctl/intel-nvdimm-security.txt
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0
+
+THEORY OF OPERATOIN
+---
+With the introduction of Intel Device Specific Methods (DSM) specification
+v1.7 and v1.8 [1], security DSMs were introduced. The operations supported by 
+ndctl are enable passhprase, update passphrase, disable security,
+freeze security, secure (crypto) erase, overwrite, master passphrase enable,
+master passphrase update, and master passphrase secure (crypto) erase.
+The 'unlock' DSM is not supported by ndctl, that is left for the kernel to
+manage with some assistance from userspace.
+
+The security management for nvdimm is composed of two parts. The front end
+utilizes the Linux key management framework (trusted and encrypted keys [2]).
+It uses the keyutils on the user side and Linux key management APIs in
+the kernel. The backend takes the decrypted payload of the key and passes the
+plaintext payload to the nvdimm for processing.
+
+Unlike typical DSMs, the security DSMs are managed through the 'security'
+sysfs attribute under the dimm devices rather than an ioctl call by libndctl.
+The relevant key id is written to the 'security' attribute and the kernel will
+pull that key from the kernel's user key ring for processing.
+
+The entire security process starts with a master key that is used to seal the
+encrypted keys that are used to protect the passphrase for each nvdimm. We
+recommend using the *system* master key from the Trusted Platform
+Module (TPM), but a master key generated by the TPM can also
+be used. For testing purposes a user key with randomized payload can
+also be served as a master key. See [2] for details. To perform any security
+operations, it is expected that at the minimum the master key is already
+in the kernel's user keyring as shown in example below:
+
+> keyctl show
+Session Keyring
+ 736023423 --alswrv  0 0  keyring: _ses
+ 675104189 --alswrv  0 65534   \_ keyring: _uid.0
+ 680187394 --alswrv  0 0   \_ trusted: nvdimm-master
+
+Except for 'overwrite', all operations expect the relevant regions associated
+with the nvdimm are disabled before proceeding. For 'overwrite', in addition
+to the regions, the dimm itself is expected to be disabled.
+
+The following sections describe specifics of each security features.
+
+UNLOCK
+--
+Unlock is performed by the kernel, however a preparation step must happen
+before the unlock DSM can be issued by the kernel. The expectation is that
+during initramfs, a setup script is called before the libnvdimm module is
+loaded by modprobe. This script script will inject the master key and the
+related encrypted keys into the kernel's user key ring. A reference modprobe
+config file and a setup script have been provided by ndctl. During the 'probe'
+of the nvdimm driver, it will:
+1. First, check the security state of the device and see if the DIMM is locked
+2. Request the associated encrypted key from the kernel's user key ring.
+3. Finally, create the unlock DSM, copy the decrypted payload into the DSM
+   passphrase field, and issue the DSM to unlock the DIMM.
+
+If the DIMM is already unlocked, the kernel will attempt to revalidate the key.
+This can be overriden with a kernel module parameter. If we fail to revalidate
+the key, the kernel will freeze the security and disallow any further security
+configuration changes.
+
+ENABLE USER PASSPHRASE
+--
+To enable the user passphrase for a DIMM, it is expected that the master key
+is already in the kernel's user key ring and the master key name is passed to
+ndctl so it can do key management. An encrypted key with a 32 bytes payload
+and encrypted key format 'enc32' is created and sealed by the master key. Be
+aware that the passphrase is never provided by or visible to the user.
+The decrypted payload for the encrypted key will be randomly generated by the
+kernel and userspace does not have access to this decrypted payload. When the
+encrypted key is created, a binary blob of the encrypted key is written to the
+designated key blob storage directory (/etc/ndctl/keys as default). The user is
+responsible for backing up the dimm key blobs

[PATCH v7 11/12] ndctl: add master secure erase support

2019-01-09 Thread Dave Jiang
Intel DSM v1.8 introduced the concept of master passphrase and allowing
nvdimm to be secure erased via the master passphrase in addition to the
user passphrase. Add ndctl support to provide master passphrase secure
erase.

Signed-off-by: Dave Jiang 
---
 Documentation/ndctl/ndctl-sanitize-dimm.txt |6 ++
 ndctl/dimm.c|   14 --
 ndctl/lib/dimm.c|9 +
 ndctl/lib/keys.c|   28 +++
 ndctl/lib/libndctl.sym  |1 +
 ndctl/libndctl.h|5 +++--
 6 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/Documentation/ndctl/ndctl-sanitize-dimm.txt 
b/Documentation/ndctl/ndctl-sanitize-dimm.txt
index d37c2a4b..f8ffb42c 100644
--- a/Documentation/ndctl/ndctl-sanitize-dimm.txt
+++ b/Documentation/ndctl/ndctl-sanitize-dimm.txt
@@ -38,4 +38,10 @@ include::xable-dimm-options.txt[]
 --ovewrite::
Wipe the entire DIMM, including label data. Can take significant time.
 
+-M::
+--master_passphrase::
+   Parameter to indicate that we are managing the master passphrase
+   instead of the user passphrase. This only is applicable to the
+   crypto-erase option.
+
 include::../copyright.txt[]
diff --git a/ndctl/dimm.c b/ndctl/dimm.c
index 4875e414..7f2d4873 100644
--- a/ndctl/dimm.c
+++ b/ndctl/dimm.c
@@ -908,6 +908,12 @@ static int action_sanitize_dimm(struct ndctl_dimm *dimm,
return -EOPNOTSUPP;
}
 
+   if (param.overwrite && param.master_pass) {
+   error("%s: overwrite does not support master passphrase\n",
+   ndctl_dimm_get_devname(dimm));
+   return -EINVAL;
+   }
+
/*
 * Setting crypto erase to be default. The other method will be
 * overwrite.
@@ -918,7 +924,9 @@ static int action_sanitize_dimm(struct ndctl_dimm *dimm,
}
 
if (param.crypto_erase) {
-   rc = ndctl_dimm_secure_erase_key(dimm, param.key_path);
+   rc = ndctl_dimm_secure_erase_key(dimm, param.key_path,
+   param.master_pass ?
+   ND_MASTER_KEY : ND_USER_KEY);
if (rc < 0)
return rc;
}
@@ -1053,7 +1061,9 @@ OPT_BOOLEAN('M', "master-passphrase", ¶m.master_pass, 
\
 OPT_BOOLEAN('c', "crypto-erase", ¶m.crypto_erase, \
"crypto erase a dimm"), \
 OPT_BOOLEAN('o', "overwrite", ¶m.overwrite, \
-   "overwrite a dimm")
+   "overwrite a dimm"), \
+OPT_BOOLEAN('M', "master-passphrase", ¶m.master_pass, \
+   "use master passphrase")
 
 static const struct option read_options[] = {
BASE_OPTIONS(),
diff --git a/ndctl/lib/dimm.c b/ndctl/lib/dimm.c
index dc945296..b9bf9cc2 100644
--- a/ndctl/lib/dimm.c
+++ b/ndctl/lib/dimm.c
@@ -780,3 +780,12 @@ NDCTL_EXPORT int 
ndctl_dimm_update_master_passphrase(struct ndctl_dimm *dimm,
sprintf(buf, "master_update %ld %ld\n", ckey, nkey);
return write_security(dimm, buf);
 }
+
+NDCTL_EXPORT int ndctl_dimm_master_secure_erase(struct ndctl_dimm *dimm,
+   long key)
+{
+   char buf[SYSFS_ATTR_SIZE];
+
+   sprintf(buf, "master_erase %ld\n", key);
+   return write_security(dimm, buf);
+}
diff --git a/ndctl/lib/keys.c b/ndctl/lib/keys.c
index fc71cc2b..dba53c69 100644
--- a/ndctl/lib/keys.c
+++ b/ndctl/lib/keys.c
@@ -447,13 +447,13 @@ NDCTL_EXPORT int ndctl_dimm_update_key(struct ndctl_dimm 
*dimm,
 
 static int check_key_run_and_discard(struct ndctl_dimm *dimm,
int (*run_op)(struct ndctl_dimm *, long), const char *name,
-   const char *keypath)
+   const char *keypath, enum ndctl_key_type key_type)
 {
struct ndctl_ctx *ctx = ndctl_dimm_get_ctx(dimm);
key_serial_t key;
int rc;
 
-   key = dimm_check_key(dimm, ND_USER_KEY);
+   key = dimm_check_key(dimm, key_type);
if (key < 0) {
key = dimm_load_key(dimm, keypath, ND_USER_KEY);
if (key < 0 && run_op != ndctl_dimm_overwrite) {
@@ -470,8 +470,12 @@ static int check_key_run_and_discard(struct ndctl_dimm 
*dimm,
return rc;
}
 
+   /* we do not delete the key if master secure erase */
+   if (key_type == ND_MASTER_KEY)
+   return 0;
+
if (key) {
-   rc = dimm_remove_key(dimm, keypath, ND_USER_KEY);
+   rc = dimm_remove_key(dimm, keypath, key_type);
if (rc < 0)
err(ctx, "Unable to cleanup key.\n");
}
@@ -482,19 +486,27 @@ NDCTL_EXPORT int ndctl_dimm_disable_key(struct ndctl_dimm 
*dimm,
const char *keypath)
 {
return check_key_run_and_discard(dimm, ndctl_dimm_disable_passphrase,
-   "disable passphrase", keypath);
+   "disable passphrase", keypath, 

[PATCH v7 09/12] ndctl: add wait-overwrite support

2019-01-09 Thread Dave Jiang
Add a blocking 'wait-overwrite' command to ndctl to let a user wait for an
overwrite operation on a dimm to complete.

Signed-off-by: Dave Jiang 
---
 Documentation/ndctl/Makefile.am  |3 +
 Documentation/ndctl/ndctl-wait-overwrite.txt |   31 ++
 ndctl/builtin.h  |1 
 ndctl/dimm.c |   27 +
 ndctl/lib/dimm.c |   78 ++
 ndctl/lib/libndctl.sym   |1 
 ndctl/libndctl.h |1 
 ndctl/ndctl.c|1 
 8 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ndctl/ndctl-wait-overwrite.txt

diff --git a/Documentation/ndctl/Makefile.am b/Documentation/ndctl/Makefile.am
index bbea9674..a60a67e5 100644
--- a/Documentation/ndctl/Makefile.am
+++ b/Documentation/ndctl/Makefile.am
@@ -52,7 +52,8 @@ man1_MANS = \
ndctl-update-passphrase.1 \
ndctl-disable-passphrase.1 \
ndctl-freeze-security.1 \
-   ndctl-sanitize-dimm.1
+   ndctl-sanitize-dimm.1 \
+   ndctl-wait-overwrite.1
 
 CLEANFILES = $(man1_MANS)
 
diff --git a/Documentation/ndctl/ndctl-wait-overwrite.txt 
b/Documentation/ndctl/ndctl-wait-overwrite.txt
new file mode 100644
index ..5d4c72ef
--- /dev/null
+++ b/Documentation/ndctl/ndctl-wait-overwrite.txt
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+
+ndctl-wait-overwrite(1)
+===
+
+NAME
+
+ndctl-wait-overwrite - wait for nvdimm overwrite operation to complete
+
+SYNOPSIS
+
+[verse]
+'ndctl wait-overwrite'  []
+
+DESCRIPTION
+---
+The kernel provides a POLL(2) capable sysfs file ('security') to indicate
+the state of overwrite. The 'ndctl wait-overwrite' operation waits for
+a change in the state of the 'security' file across all specified dimms.
+
+OPTIONS
+---
+-v::
+--verbose::
+   Emit debug messages for the overwrite wait process
+
+include::../copyright.txt[]
+
+SEE ALSO
+
+linkndctl:ndctl-sanitize-dimm[1]
diff --git a/ndctl/builtin.h b/ndctl/builtin.h
index 55bee47c..a8472e87 100644
--- a/ndctl/builtin.h
+++ b/ndctl/builtin.h
@@ -37,4 +37,5 @@ int cmd_passphrase_update(int argc, const char **argv, struct 
ndctl_ctx *ctx);
 int cmd_disable_passphrase(int argc, const char **argv, struct ndctl_ctx *ctx);
 int cmd_freeze_security(int argc, const char **argv, struct ndctl_ctx *ctx);
 int cmd_sanitize_dimm(int argc, const char **argv, struct ndctl_ctx *ctx);
+int cmd_wait_overwrite(int argc, const char **argv, struct ndctl_ctx *ctx);
 #endif /* _NDCTL_BUILTIN_H_ */
diff --git a/ndctl/dimm.c b/ndctl/dimm.c
index 799823d6..ce477981 100644
--- a/ndctl/dimm.c
+++ b/ndctl/dimm.c
@@ -931,6 +931,24 @@ static int action_sanitize_dimm(struct ndctl_dimm *dimm,
return 0;
 }
 
+static int action_wait_overwrite(struct ndctl_dimm *dimm,
+   struct action_context *actx)
+{
+   int rc;
+
+   if (!ndctl_dimm_security_supported(dimm)) {
+   error("%s: security operation not supported\n",
+   ndctl_dimm_get_devname(dimm));
+   return -EOPNOTSUPP;
+   }
+
+   rc = ndctl_dimm_wait_overwrite(dimm);
+   if (rc == 1)
+   printf("%s: overwrite completed.\n",
+   ndctl_dimm_get_devname(dimm));
+   return rc;
+}
+
 static int __action_init(struct ndctl_dimm *dimm,
enum ndctl_namespace_version version, int chk_only)
 {
@@ -1378,3 +1396,12 @@ int cmd_sanitize_dimm(int argc, const char **argv, void 
*ctx)
count >= 0 ? count : 0, count > 1 ? "s" : "");
return count >= 0 ? 0 : EXIT_FAILURE;
 }
+
+int cmd_wait_overwrite(int argc, const char **argv, void *ctx)
+{
+   int count = dimm_action(argc, argv, ctx, action_wait_overwrite,
+   base_options,
+   "ndctl wait-overwrite  [..] 
[]");
+
+   return count >= 0 ? 0 : EXIT_FAILURE;
+}
diff --git a/ndctl/lib/dimm.c b/ndctl/lib/dimm.c
index f27b966c..eb950331 100644
--- a/ndctl/lib/dimm.c
+++ b/ndctl/lib/dimm.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "private.h"
 
 static const char NSINDEX_SIGNATURE[] = "NAMESPACE_INDEX\0";
@@ -693,3 +694,80 @@ NDCTL_EXPORT int ndctl_dimm_overwrite(struct ndctl_dimm 
*dimm, long key)
sprintf(buf, "overwrite %ld\n", key);
return write_security(dimm, buf);
 }
+
+NDCTL_EXPORT int ndctl_dimm_wait_overwrite(struct ndctl_dimm *dimm)
+{
+   struct ndctl_ctx *ctx = ndctl_dimm_get_ctx(dimm);
+   struct pollfd fds;
+   char buf[SYSFS_ATTR_SIZE];
+   int fd = 0, rc;
+   char *path = dimm->dimm_buf;
+   int len = dimm->buf_len;
+
+   if (snprintf(path, len, "%s/security", dimm->dimm_path) >= len) {
+   err(ctx, "%s: buffer too small!\n",
+   ndctl_dimm_get_devname(dimm));
+

[PATCH v7 08/12] ndctl: add overwrite operation support

2019-01-09 Thread Dave Jiang
Add support for overwrite to libndctl. The operation will be triggered
by the sanitize-dimm command with -o switch. This will initiate the request
to wipe the entire nvdimm. Success return of the command only indicate
overwrite has started and does not indicate completion of overwrite.

Signed-off-by: Dave Jiang 
---
 Documentation/ndctl/ndctl-sanitize-dimm.txt |4 
 ndctl/dimm.c|   21 +
 ndctl/lib/dimm.c|8 
 ndctl/lib/keys.c|   25 +
 ndctl/lib/libndctl.sym  |2 ++
 ndctl/libndctl.h|8 
 6 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/Documentation/ndctl/ndctl-sanitize-dimm.txt 
b/Documentation/ndctl/ndctl-sanitize-dimm.txt
index a1eab8bc..d37c2a4b 100644
--- a/Documentation/ndctl/ndctl-sanitize-dimm.txt
+++ b/Documentation/ndctl/ndctl-sanitize-dimm.txt
@@ -34,4 +34,8 @@ include::xable-dimm-options.txt[]
Replaces encryption keys and securely erases the data. This does not
change label data. This is the default sanitize method.
 
+-o::
+--ovewrite::
+   Wipe the entire DIMM, including label data. Can take significant time.
+
 include::../copyright.txt[]
diff --git a/ndctl/dimm.c b/ndctl/dimm.c
index a91b40d5..799823d6 100644
--- a/ndctl/dimm.c
+++ b/ndctl/dimm.c
@@ -48,6 +48,7 @@ static struct parameters {
const char *key_path;
const char *master_key;
bool crypto_erase;
+   bool overwrite;
bool force;
bool json;
bool verbose;
@@ -910,7 +911,7 @@ static int action_sanitize_dimm(struct ndctl_dimm *dimm,
 * Setting crypto erase to be default. The other method will be
 * overwrite.
 */
-   if (!param.crypto_erase) {
+   if (!param.crypto_erase && !param.overwrite) {
param.crypto_erase = true;
printf("No santize method passed in, default to 
crypto-erase\n");
}
@@ -921,6 +922,12 @@ static int action_sanitize_dimm(struct ndctl_dimm *dimm,
return rc;
}
 
+   if (param.overwrite) {
+   rc = ndctl_dimm_overwrite_key(dimm, param.key_path);
+   if (rc < 0)
+   return rc;
+   }
+
return 0;
 }
 
@@ -1023,7 +1030,9 @@ OPT_STRING('m', "master-key", ¶m.master_key, 
":", \
 
 #define SANITIZE_OPTIONS() \
 OPT_BOOLEAN('c', "crypto-erase", ¶m.crypto_erase, \
-   "crypto erase a dimm")
+   "crypto erase a dimm"), \
+OPT_BOOLEAN('o', "overwrite", ¶m.overwrite, \
+   "overwrite a dimm")
 
 static const struct option read_options[] = {
BASE_OPTIONS(),
@@ -1361,7 +1370,11 @@ int cmd_sanitize_dimm(int argc, const char **argv, void 
*ctx)
sanitize_options,
"ndctl sanitize-dimm  [..] 
[]");
 
-   fprintf(stderr, "sanitized %d nmem%s.\n", count >= 0 ? count : 0,
-   count > 1 ? "s" : "");
+   if (param.overwrite)
+   fprintf(stderr, "overwrite issued for %d nmem%s.\n",
+   count >= 0 ? count : 0, count > 1 ? "s" : "");
+   else
+   fprintf(stderr, "sanitized %d nmem%s.\n",
+   count >= 0 ? count : 0, count > 1 ? "s" : "");
return count >= 0 ? 0 : EXIT_FAILURE;
 }
diff --git a/ndctl/lib/dimm.c b/ndctl/lib/dimm.c
index 285e09ce..f27b966c 100644
--- a/ndctl/lib/dimm.c
+++ b/ndctl/lib/dimm.c
@@ -685,3 +685,11 @@ NDCTL_EXPORT int ndctl_dimm_secure_erase(struct ndctl_dimm 
*dimm, long key)
sprintf(buf, "erase %ld\n", key);
return write_security(dimm, buf);
 }
+
+NDCTL_EXPORT int ndctl_dimm_overwrite(struct ndctl_dimm *dimm, long key)
+{
+   char buf[SYSFS_ATTR_SIZE];
+
+   sprintf(buf, "overwrite %ld\n", key);
+   return write_security(dimm, buf);
+}
diff --git a/ndctl/lib/keys.c b/ndctl/lib/keys.c
index 95abd61a..2eb23d38 100644
--- a/ndctl/lib/keys.c
+++ b/ndctl/lib/keys.c
@@ -85,10 +85,9 @@ static char *load_key_blob(struct ndctl_ctx *ctx, const char 
*path, int *size)
char prefix[] = "load ";
 
rc = stat(path, &st);
-   if (rc < 0) {
-   err(ctx, "stat: %s\n", strerror(errno));
+   if (rc < 0)
return NULL;
-   }
+
if ((st.st_mode & S_IFMT) != S_IFREG) {
err(ctx, "%s not a regular file\n", path);
return NULL;
@@ -404,10 +403,11 @@ static int check_key_run_and_discard(struct ndctl_dimm 
*dimm,
key = dimm_check_key(dimm, false);
if (key < 0) {
key = dimm_load_key(dimm, false, keypath);
-   if (key < 0) {
+   if (key < 0 && run_op != ndctl_dimm_overwrite) {
err(ctx, "Unable to load key\n");
return -ENOKEY;
-   }
+   } else
+

[PATCH v7 07/12] ndctl: setup modprobe rules

2019-01-09 Thread Dave Jiang
Adding reference config file for modprobe.d in order to trigger the
reference script that will inject keys associated with the nvdimms into
the kernel user ring for unlock.

Signed-off-by: Dave Jiang 
---
 Makefile.am  |   10 ++
 contrib/ndctl-loadkeys.sh|   25 +
 contrib/nvdimm_modprobe.conf |1 +
 3 files changed, 36 insertions(+)
 create mode 100755 contrib/ndctl-loadkeys.sh
 create mode 100644 contrib/nvdimm_modprobe.conf

diff --git a/Makefile.am b/Makefile.am
index e0c463a3..5a3f03aa 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -42,6 +42,16 @@ bashcompletiondir = $(BASH_COMPLETION_DIR)
 dist_bashcompletion_DATA = contrib/ndctl
 endif
 
+load_key_file = contrib/ndctl-loadkeys.sh
+load_keydir = $(sysconfdir)/ndctl/
+load_key_DATA = $(load_key_file)
+EXTRA_DIST += $(load_key_file)
+
+modprobe_file = contrib/nvdimm_modprobe.conf
+modprobedir = $(sysconfdir)/modprobe.d/
+modprobe_DATA = $(modprobe_file)
+EXTRA_DIST += $(modprobe_file)
+
 noinst_LIBRARIES = libccan.a
 libccan_a_SOURCES = \
ccan/str/str.h \
diff --git a/contrib/ndctl-loadkeys.sh b/contrib/ndctl-loadkeys.sh
new file mode 100755
index ..bc2c94df
--- /dev/null
+++ b/contrib/ndctl-loadkeys.sh
@@ -0,0 +1,25 @@
+#!/bin/bash -Ex
+
+# This script assumes a single master key for all DIMMs
+
+key_path=/etc/ndctl/keys
+tpmh_path="$key_path"/tpm.handle
+key_type=""
+tpm_handle=""
+id=""
+
+if [ -f $tpmh_path ]; then
+   key_type=trusted
+   tpm_handle="keyhandle=$(cat $tpmh_path)"
+else
+   key_type=user
+fi
+
+if ! keyctl search @u "$key_type" nvdimm-master; then
+   keyctl add "$key_type" nvdimm-master "load $(cat 
$key_path/nvdimm-master.blob) $tpm_handle" @u > /dev/null
+fi
+
+for file in "$key_path"/nvdimm_*; do
+   id="$(cut -d'_' -f2 <<< "${file##*/}")"
+   keyctl add encrypted nvdimm:"$id" "load $(cat "$file")" @u
+done
diff --git a/contrib/nvdimm_modprobe.conf b/contrib/nvdimm_modprobe.conf
new file mode 100644
index ..b113d8d7
--- /dev/null
+++ b/contrib/nvdimm_modprobe.conf
@@ -0,0 +1 @@
+install libnvdimm /usr/sbin/ndctl-loadkeys.sh ; /sbin/modprobe 
--ignore-install libnvdimm $CMDLINE_OPTS

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v7 06/12] ndctl: add unit test for security ops (minus overwrite)

2019-01-09 Thread Dave Jiang
Add unit test for security enable, disable, update, erase, unlock, and
freeze.

Signed-off-by: Dave Jiang 
---
 test/Makefile.am |4 +
 test/security.sh |  203 ++
 2 files changed, 207 insertions(+)
 create mode 100755 test/security.sh

diff --git a/test/Makefile.am b/test/Makefile.am
index ebdd23f6..42009c31 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -27,6 +27,10 @@ TESTS =\
max_available_extent_ns.sh \
pfn-meta-errors.sh
 
+if ENABLE_KEYUTILS
+TESTS += security.sh
+endif
+
 check_PROGRAMS =\
libndctl \
dsm-fail \
diff --git a/test/security.sh b/test/security.sh
new file mode 100755
index ..1dbe04d2
--- /dev/null
+++ b/test/security.sh
@@ -0,0 +1,203 @@
+#!/bin/bash -Ex
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+rc=77
+dev=""
+id=""
+dev_no=""
+keypath="/etc/ndctl/keys"
+masterkey="nvdimm-master-test"
+masterpath="$keypath/$masterkey"
+keyctl="/usr/bin/keyctl"
+
+. ./common
+
+lockpath="/sys/devices/platform/${NFIT_TEST_BUS0}/nfit_test_dimm/test_dimm"
+
+trap 'err $LINENO' ERR
+
+check_prereq()
+{
+   if [ ! -f "$keyctl" ]; then
+   echo "$keyctl does not exist."
+   exit 1
+   fi
+
+   if [ ! -d "$keypath" ]; then
+   echo "$keypath directory does not exist."
+   exit 1
+   fi
+}
+
+setup()
+{
+   $NDCTL disable-region -b "$NFIT_TEST_BUS0" all
+}
+
+detect()
+{
+   dev=$($NDCTL list -b "$NFIT_TEST_BUS0" -D | jq .[0].dev | tr -d '"')
+   [ -n "$dev" ] || err "$LINENO"
+   id=$($NDCTL list -b "$NFIT_TEST_BUS0" -D | jq .[0].id | tr -d '"')
+   [ -n "$id" ] || err "$LINENO"
+}
+
+setup_keys()
+{
+   if [ ! -f "$masterpath" ]; then
+   keyctl add user $masterkey "$(dd if=/dev/urandom bs=1 count=32 
2>/dev/null)" @u
+   keyctl pipe "$(keyctl search @u user $masterkey)" > $masterpath
+   else
+   echo "Unclean setup. Please cleanup $masterpath file."
+   exit 1
+   fi
+}
+
+test_cleanup()
+{
+   keyctl unlink "$(keyctl search @u encrypted nvdimm:"$id")"
+   keyctl unlink "$(keyctl search @u user $masterkey)"
+   rm -f "$keypath"/nvdimm_"$id"\_"$(hostname)".blob
+   rm -f "$masterpath"
+}
+
+lock_dimm()
+{
+   $NDCTL disable-dimm "$dev"
+   dev_no="$(echo "$dev" | cut -b 5-)"
+   echo 1 > "${lockpath}${dev_no}/lock_dimm"
+   sstate="$(get_security_state)"
+   if [ "$sstate" != "locked" ]; then
+   echo "Incorrect security state: $sstate expected: disabled"
+   exit 1
+   fi
+}
+
+get_security_state()
+{
+   $NDCTL list -i -b "$NFIT_TEST_BUS0" -d "$dev" | jq 
.[].dimms[0].security | tr -d '"'
+}
+
+enable_passphrase()
+{
+   $NDCTL enable-passphrase -m user:"$masterkey" "$dev"
+   sstate=$(get_security_state)
+   if [ "$sstate" != "unlocked" ]; then
+   echo "Incorrect security state: $sstate expected: unlocked"
+   exit 1
+   fi
+}
+
+disable_passphrase()
+{
+   $NDCTL disable-passphrase "$dev"
+   sstate=$(get_security_state)
+   if [ "$sstate" != "disabled" ]; then
+   echo "Incorrect security state: $sstate expected: disabled"
+   exit 1
+   fi
+}
+
+erase_security()
+{
+   $NDCTL sanitize-dimm -c "$dev"
+   sstate=$(get_security_state)
+   if [ "$sstate" != "disabled" ]; then
+   echo "Incorrect security state: $sstate expected: disabled"
+   exit 1
+   fi
+}
+
+update_security()
+{
+   $NDCTL update-passphrase -m user:"$masterkey" "$dev"
+   sstate=$(get_security_state)
+   if [ "$sstate" != "unlocked" ]; then
+   echo "Incorrect security state: $sstate expected: unlocked"
+   exit 1
+   fi
+}
+
+freeze_security()
+{
+   $NDCTL freeze-security "$dev"
+}
+
+test_1_security_enable_and_disable()
+{
+   enable_passphrase
+   disable_passphrase
+}
+
+test_2_security_enable_and_update()
+{
+   enable_passphrase
+   update_security
+   disable_passphrase
+}
+
+test_3_security_enable_and_erase()
+{
+   enable_passphrase
+   erase_security
+}
+
+test_4_security_unlocking()
+{
+   enable_passphrase
+   lock_dimm
+   $NDCTL enable-dimm "$dev"
+   sstate=$(get_security_state)
+   if [ "$sstate" != "unlocked" ]; then
+   echo "Incorrect security state: $sstate expected: unlocked"
+   exit 1
+   fi
+   $NDCTL disable-region -b "$NFIT_TEST_BUS0" all
+   disable_passphrase
+}
+
+# this should always be the last test. with security frozen, nfit_test must
+# be removed and is no longer usable
+test_5_security_freeze()
+{
+   enable_passphrase
+   freeze_security
+   sstate=$(get_security_state)
+   if [ "$sstate" != "frozen" ]; then
+   echo "Incorrect security state: $sstate expected

[PATCH v7 04/12] ndctl: add support for freeze security

2019-01-09 Thread Dave Jiang
Add support for freeze security to libndctl and also command line option
of "freeze-security" for ndctl. This will lock the ability to make changes
to the NVDIMM security.

Signed-off-by: Dave Jiang 
---
 Documentation/ndctl/Makefile.am   |3 ++-
 Documentation/ndctl/ndctl-freeze-security.txt |   20 ++
 ndctl/builtin.h   |1 +
 ndctl/dimm.c  |   28 +
 ndctl/lib/dimm.c  |5 
 ndctl/lib/libndctl.sym|1 +
 ndctl/libndctl.h  |1 +
 ndctl/ndctl.c |1 +
 8 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ndctl/ndctl-freeze-security.txt

diff --git a/Documentation/ndctl/Makefile.am b/Documentation/ndctl/Makefile.am
index 31570a77..a97f193d 100644
--- a/Documentation/ndctl/Makefile.am
+++ b/Documentation/ndctl/Makefile.am
@@ -50,7 +50,8 @@ man1_MANS = \
ndctl-monitor.1 \
ndctl-enable-passphrase.1 \
ndctl-update-passphrase.1 \
-   ndctl-disable-passphrase.1
+   ndctl-disable-passphrase.1 \
+   ndctl-freeze-security.1
 
 CLEANFILES = $(man1_MANS)
 
diff --git a/Documentation/ndctl/ndctl-freeze-security.txt 
b/Documentation/ndctl/ndctl-freeze-security.txt
new file mode 100644
index ..4e9d2d61
--- /dev/null
+++ b/Documentation/ndctl/ndctl-freeze-security.txt
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0
+
+ndctl-freeze-security(1)
+
+
+NAME
+
+ndctl-freeze-security - enabling or freeze the security for an NVDIMM
+
+SYNOPSIS
+
+[verse]
+'ndctl freeze-security' 
+
+DESCRIPTION
+---
+Provide a generic interface to freeze the security for NVDIMM. Once security
+is frozen, no other security operations can succeed until reboot happens.
+
+include::../copyright.txt[]
diff --git a/ndctl/builtin.h b/ndctl/builtin.h
index 821ea690..f7469598 100644
--- a/ndctl/builtin.h
+++ b/ndctl/builtin.h
@@ -35,4 +35,5 @@ int cmd_inject_smart(int argc, const char **argv, struct 
ndctl_ctx *ctx);
 int cmd_passphrase_setup(int argc, const char **argv, struct ndctl_ctx *ctx);
 int cmd_passphrase_update(int argc, const char **argv, struct ndctl_ctx *ctx);
 int cmd_disable_passphrase(int argc, const char **argv, struct ndctl_ctx *ctx);
+int cmd_freeze_security(int argc, const char **argv, struct ndctl_ctx *ctx);
 #endif /* _NDCTL_BUILTIN_H_ */
diff --git a/ndctl/dimm.c b/ndctl/dimm.c
index 4f0466a1..19301791 100644
--- a/ndctl/dimm.c
+++ b/ndctl/dimm.c
@@ -876,6 +876,24 @@ static int action_passphrase_disable(struct ndctl_dimm 
*dimm,
return ndctl_dimm_disable_key(dimm, param.key_path);
 }
 
+static int action_security_freeze(struct ndctl_dimm *dimm,
+   struct action_context *actx)
+{
+   int rc;
+
+   if (!ndctl_dimm_security_supported(dimm)) {
+   error("%s: security operation not supported\n",
+   ndctl_dimm_get_devname(dimm));
+   return -EOPNOTSUPP;
+   }
+
+   rc = ndctl_dimm_freeze_security(dimm);
+   if (rc < 0)
+   error("Failed to freeze security for %s\n",
+   ndctl_dimm_get_devname(dimm));
+   return rc;
+}
+
 static int __action_init(struct ndctl_dimm *dimm,
enum ndctl_namespace_version version, int chk_only)
 {
@@ -1285,3 +1303,13 @@ int cmd_disable_passphrase(int argc, const char **argv, 
void *ctx)
count > 1 ? "s" : "");
return count >= 0 ? 0 : EXIT_FAILURE;
 }
+
+int cmd_freeze_security(int argc, const char **argv, void *ctx)
+{
+   int count = dimm_action(argc, argv, ctx, action_security_freeze, 
base_options,
+   "ndctl freeze-security  [..] 
[]");
+
+   fprintf(stderr, "security freezed %d nmem%s.\n", count >= 0 ? count : 0,
+   count > 1 ? "s" : "");
+   return count >= 0 ? 0 : EXIT_FAILURE;
+}
diff --git a/ndctl/lib/dimm.c b/ndctl/lib/dimm.c
index 076ccbf6..8f0f0486 100644
--- a/ndctl/lib/dimm.c
+++ b/ndctl/lib/dimm.c
@@ -672,3 +672,8 @@ NDCTL_EXPORT int ndctl_dimm_disable_passphrase(struct 
ndctl_dimm *dimm,
sprintf(buf, "disable %ld\n", key);
return write_security(dimm, buf);
 }
+
+NDCTL_EXPORT int ndctl_dimm_freeze_security(struct ndctl_dimm *dimm)
+{
+   return write_security(dimm, "freeze");
+}
diff --git a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym
index 90038e75..a1c56060 100644
--- a/ndctl/lib/libndctl.sym
+++ b/ndctl/lib/libndctl.sym
@@ -395,4 +395,5 @@ global:
ndctl_dimm_update_passphrase;
ndctl_dimm_disable_passphrase;
ndctl_dimm_disable_key;
+   ndctl_dimm_freeze_security;
 } LIBNDCTL_18;
diff --git a/ndctl/libndctl.h b/ndctl/libndctl.h
index cb58f89c..48c26476 100644
--- a/ndctl/libndctl.h
+++ b/ndctl/libndctl.h
@@ -702,6 +702,7 @@ bool ndctl_dimm_security_supp

[PATCH v7 01/12] ndctl: add support for display security state

2019-01-09 Thread Dave Jiang
Adding libndctl API call for retrieving security state for a DIMM and also
adding support to ndctl list for displaying security state.

Signed-off-by: Dave Jiang 
---
 Documentation/ndctl/ndctl-list.txt |8 
 ndctl/lib/dimm.c   |   37 
 ndctl/lib/libndctl.sym |5 +
 ndctl/libndctl.h   |   13 +
 util/json.c|   31 ++
 5 files changed, 94 insertions(+)

diff --git a/Documentation/ndctl/ndctl-list.txt 
b/Documentation/ndctl/ndctl-list.txt
index e24c8f40..bdd69add 100644
--- a/Documentation/ndctl/ndctl-list.txt
+++ b/Documentation/ndctl/ndctl-list.txt
@@ -98,6 +98,14 @@ include::xable-region-options.txt[]
 -D::
 --dimms::
Include dimm info in the listing
+[verse]
+{
+  "dev":"nmem0",
+  "id":"cdab-0a-07e0-",
+  "handle":0,
+  "phys_id":0,
+  "security:":"disabled"
+}
 
 -H::
 --health::
diff --git a/ndctl/lib/dimm.c b/ndctl/lib/dimm.c
index 79e2ca0a..e03135d9 100644
--- a/ndctl/lib/dimm.c
+++ b/ndctl/lib/dimm.c
@@ -587,3 +587,40 @@ NDCTL_EXPORT unsigned long ndctl_dimm_get_available_labels(
 
return strtoul(buf, NULL, 0);
 }
+
+NDCTL_EXPORT int ndctl_dimm_get_security(struct ndctl_dimm *dimm,
+   enum nd_security_state *state)
+{
+   struct ndctl_ctx *ctx = ndctl_dimm_get_ctx(dimm);
+   char *path = dimm->dimm_buf;
+   int len = dimm->buf_len;
+   char buf[64];
+   int rc;
+
+   if (snprintf(path, len, "%s/security", dimm->dimm_path) >= len) {
+   err(ctx, "%s: buffer too small!\n",
+   ndctl_dimm_get_devname(dimm));
+   return -ERANGE;
+   }
+
+   rc = sysfs_read_attr(ctx, path, buf);
+   if (rc < 0)
+   return rc;
+
+   if (strcmp(buf, "unsupported") == 0)
+   *state = ND_SECURITY_UNSUPPORTED;
+   else if (strcmp(buf, "disabled") == 0)
+   *state = ND_SECURITY_DISABLED;
+   else if (strcmp(buf, "unlocked") == 0)
+   *state = ND_SECURITY_UNLOCKED;
+   else if (strcmp(buf, "locked") == 0)
+   *state = ND_SECURITY_LOCKED;
+   else if (strcmp(buf, "frozen") == 0)
+   *state = ND_SECURITY_FROZEN;
+   else if (strcmp(buf, "overwrite") == 0)
+   *state = ND_SECURITY_OVERWRITE;
+   else
+   *state = ND_SECURITY_INVALID;
+
+   return 0;
+}
diff --git a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym
index 6c4c8b4d..1bd63fa1 100644
--- a/ndctl/lib/libndctl.sym
+++ b/ndctl/lib/libndctl.sym
@@ -385,3 +385,8 @@ global:
ndctl_namespace_get_next_badblock;
ndctl_dimm_get_dirty_shutdown;
 } LIBNDCTL_17;
+
+LIBNDCTL_19 {
+global:
+   ndctl_dimm_get_security;
+} LIBNDCTL_18;
diff --git a/ndctl/libndctl.h b/ndctl/libndctl.h
index c81cc032..4255252c 100644
--- a/ndctl/libndctl.h
+++ b/ndctl/libndctl.h
@@ -681,6 +681,19 @@ enum ND_FW_STATUS ndctl_cmd_fw_xlat_firmware_status(struct 
ndctl_cmd *cmd);
 struct ndctl_cmd *ndctl_dimm_cmd_new_ack_shutdown_count(struct ndctl_dimm 
*dimm);
 int ndctl_dimm_fw_update_supported(struct ndctl_dimm *dimm);
 
+enum nd_security_state {
+   ND_SECURITY_INVALID = -1,
+   ND_SECURITY_UNSUPPORTED = 0,
+   ND_SECURITY_DISABLED,
+   ND_SECURITY_UNLOCKED,
+   ND_SECURITY_LOCKED,
+   ND_SECURITY_FROZEN,
+   ND_SECURITY_OVERWRITE,
+};
+
+int ndctl_dimm_get_security(struct ndctl_dimm *dimm,
+   enum nd_security_state *sstate);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/util/json.c b/util/json.c
index 5c3424e2..e3b9e72e 100644
--- a/util/json.c
+++ b/util/json.c
@@ -164,6 +164,7 @@ struct json_object *util_dimm_to_json(struct ndctl_dimm 
*dimm,
unsigned int handle = ndctl_dimm_get_handle(dimm);
unsigned short phys_id = ndctl_dimm_get_phys_id(dimm);
struct json_object *jobj;
+   enum nd_security_state sstate;
 
if (!jdimm)
return NULL;
@@ -243,6 +244,36 @@ struct json_object *util_dimm_to_json(struct ndctl_dimm 
*dimm,
json_object_object_add(jdimm, "flag_smart_event", jobj);
}
 
+   if (ndctl_dimm_get_security(dimm, &sstate) == 0) {
+   switch (sstate) {
+   case ND_SECURITY_UNSUPPORTED:
+   jobj = json_object_new_string("unsupported");
+   break;
+   case ND_SECURITY_DISABLED:
+   jobj = json_object_new_string("disabled");
+   break;
+   case ND_SECURITY_UNLOCKED:
+   jobj = json_object_new_string("unlocked");
+   break;
+   case ND_SECURITY_LOCKED:
+   jobj = json_object_new_string("locked");
+   break;
+   case ND_SECURITY_FROZEN:
+   jobj = json_object_new_string("frozen");
+

[PATCH v7 05/12] ndctl: add support for sanitize dimm

2019-01-09 Thread Dave Jiang
Add support to secure erase to libndctl and also command line option
of "sanitize-dimm" for ndctl. This will initiate the request to crypto
erase a DIMM.

Signed-off-by: Dave Jiang 
---
 Documentation/ndctl/Makefile.am |3 +-
 Documentation/ndctl/ndctl-sanitize-dimm.txt |   37 +++
 ndctl/builtin.h |1 +
 ndctl/dimm.c|   52 +++
 ndctl/lib/dimm.c|8 
 ndctl/lib/keys.c|   21 +--
 ndctl/lib/libndctl.sym  |2 +
 ndctl/libndctl.h|9 +
 ndctl/ndctl.c   |1 +
 9 files changed, 130 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/ndctl/ndctl-sanitize-dimm.txt

diff --git a/Documentation/ndctl/Makefile.am b/Documentation/ndctl/Makefile.am
index a97f193d..bbea9674 100644
--- a/Documentation/ndctl/Makefile.am
+++ b/Documentation/ndctl/Makefile.am
@@ -51,7 +51,8 @@ man1_MANS = \
ndctl-enable-passphrase.1 \
ndctl-update-passphrase.1 \
ndctl-disable-passphrase.1 \
-   ndctl-freeze-security.1
+   ndctl-freeze-security.1 \
+   ndctl-sanitize-dimm.1
 
 CLEANFILES = $(man1_MANS)
 
diff --git a/Documentation/ndctl/ndctl-sanitize-dimm.txt 
b/Documentation/ndctl/ndctl-sanitize-dimm.txt
new file mode 100644
index ..a1eab8bc
--- /dev/null
+++ b/Documentation/ndctl/ndctl-sanitize-dimm.txt
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+
+ndctl-sanitize-dimm(1)
+==
+
+NAME
+
+ndctl-sanitize-dimm - sanitize the data on the NVDIMM
+
+SYNOPSIS
+
+[verse]
+'ndctl sanitize'  []
+
+DESCRIPTION
+---
+Provide a generic interface to crypto erase a NVDIMM.
+Ndctl will search the user key ring for the associated DIMM. If no key is
+found, it will attempt to load the key blob from the expected location. Ndctl
+will remove the key and the key blob once security is disabled.
+
+OPTIONS
+---
+::
+include::xable-dimm-options.txt[]
+
+-p::
+--key-path=::
+   Path to where key related files resides. This parameter is optional
+   and the default is set to /etc/ndctl/keys.
+
+-c::
+--crypto-erase::
+   Replaces encryption keys and securely erases the data. This does not
+   change label data. This is the default sanitize method.
+
+include::../copyright.txt[]
diff --git a/ndctl/builtin.h b/ndctl/builtin.h
index f7469598..55bee47c 100644
--- a/ndctl/builtin.h
+++ b/ndctl/builtin.h
@@ -36,4 +36,5 @@ int cmd_passphrase_setup(int argc, const char **argv, struct 
ndctl_ctx *ctx);
 int cmd_passphrase_update(int argc, const char **argv, struct ndctl_ctx *ctx);
 int cmd_disable_passphrase(int argc, const char **argv, struct ndctl_ctx *ctx);
 int cmd_freeze_security(int argc, const char **argv, struct ndctl_ctx *ctx);
+int cmd_sanitize_dimm(int argc, const char **argv, struct ndctl_ctx *ctx);
 #endif /* _NDCTL_BUILTIN_H_ */
diff --git a/ndctl/dimm.c b/ndctl/dimm.c
index 19301791..a91b40d5 100644
--- a/ndctl/dimm.c
+++ b/ndctl/dimm.c
@@ -47,6 +47,7 @@ static struct parameters {
const char *labelversion;
const char *key_path;
const char *master_key;
+   bool crypto_erase;
bool force;
bool json;
bool verbose;
@@ -894,6 +895,35 @@ static int action_security_freeze(struct ndctl_dimm *dimm,
return rc;
 }
 
+static int action_sanitize_dimm(struct ndctl_dimm *dimm,
+   struct action_context *actx)
+{
+   int rc;
+
+   if (!ndctl_dimm_security_supported(dimm)) {
+   error("%s: security operation not supported\n",
+   ndctl_dimm_get_devname(dimm));
+   return -EOPNOTSUPP;
+   }
+
+   /*
+* Setting crypto erase to be default. The other method will be
+* overwrite.
+*/
+   if (!param.crypto_erase) {
+   param.crypto_erase = true;
+   printf("No santize method passed in, default to 
crypto-erase\n");
+   }
+
+   if (param.crypto_erase) {
+   rc = ndctl_dimm_secure_erase_key(dimm, param.key_path);
+   if (rc < 0)
+   return rc;
+   }
+
+   return 0;
+}
+
 static int __action_init(struct ndctl_dimm *dimm,
enum ndctl_namespace_version version, int chk_only)
 {
@@ -991,6 +1021,10 @@ OPT_FILENAME('p', "key-path", ¶m.key_path, 
"key-path", \
 OPT_STRING('m', "master-key", ¶m.master_key, ":", \
"master key for security")
 
+#define SANITIZE_OPTIONS() \
+OPT_BOOLEAN('c', "crypto-erase", ¶m.crypto_erase, \
+   "crypto erase a dimm")
+
 static const struct option read_options[] = {
BASE_OPTIONS(),
READ_OPTIONS(),
@@ -1033,6 +1067,13 @@ static const struct option key_options[] = {
OPT_END(),
 };
 
+static const struct option sanitize_options[] = {
+   BASE_OPTIONS(),
+

[PATCH v7 03/12] ndctl: add disable security support

2019-01-09 Thread Dave Jiang
Add support for disable security to libndctl and also command line option
of "disable-passphrase" for ndctl. This provides a way to disable security
on the nvdimm.

Signed-off-by: Dave Jiang 
---
 Documentation/ndctl/Makefile.am  |3 +-
 Documentation/ndctl/ndctl-disable-passphrase.txt |   32 +++
 ndctl/builtin.h  |1 +
 ndctl/dimm.c |   38 --
 ndctl/lib/dimm.c |9 +
 ndctl/lib/keys.c |   29 +
 ndctl/lib/libndctl.sym   |2 +
 ndctl/libndctl.h |8 +
 ndctl/ndctl.c|1 +
 9 files changed, 119 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/ndctl/ndctl-disable-passphrase.txt

diff --git a/Documentation/ndctl/Makefile.am b/Documentation/ndctl/Makefile.am
index 7adb..31570a77 100644
--- a/Documentation/ndctl/Makefile.am
+++ b/Documentation/ndctl/Makefile.am
@@ -49,7 +49,8 @@ man1_MANS = \
ndctl-list.1 \
ndctl-monitor.1 \
ndctl-enable-passphrase.1 \
-   ndctl-update-passphrase.1
+   ndctl-update-passphrase.1 \
+   ndctl-disable-passphrase.1
 
 CLEANFILES = $(man1_MANS)
 
diff --git a/Documentation/ndctl/ndctl-disable-passphrase.txt 
b/Documentation/ndctl/ndctl-disable-passphrase.txt
new file mode 100644
index ..3c8bfe47
--- /dev/null
+++ b/Documentation/ndctl/ndctl-disable-passphrase.txt
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+
+ndctl-disable-passphrase(1)
+===
+
+NAME
+
+ndctl-disable-passphrase - disabling passphrase for an NVDIMM
+
+SYNOPSIS
+
+[verse]
+'ndctl disable-passphrase'  []
+
+DESCRIPTION
+---
+Provide a generic interface for disabling passphrase for NVDIMM.
+Ndctl will search the user key ring for the associated DIMM. If no key is
+found, it will attempt to load the key blob from the expected location. Ndctl
+will remove the key and the key blob once passphrase is disabled.
+
+OPTIONS
+---
+::
+include::xable-dimm-options.txt[]
+
+-p::
+--key-path=::
+   Path to where key related files resides. This parameter is optional
+   and the default is set to /etc/ndctl/keys.
+
+include::../copyright.txt[]
diff --git a/ndctl/builtin.h b/ndctl/builtin.h
index 231fda25..821ea690 100644
--- a/ndctl/builtin.h
+++ b/ndctl/builtin.h
@@ -34,4 +34,5 @@ int cmd_update_firmware(int argc, const char **argv, struct 
ndctl_ctx *ctx);
 int cmd_inject_smart(int argc, const char **argv, struct ndctl_ctx *ctx);
 int cmd_passphrase_setup(int argc, const char **argv, struct ndctl_ctx *ctx);
 int cmd_passphrase_update(int argc, const char **argv, struct ndctl_ctx *ctx);
+int cmd_disable_passphrase(int argc, const char **argv, struct ndctl_ctx *ctx);
 #endif /* _NDCTL_BUILTIN_H_ */
diff --git a/ndctl/dimm.c b/ndctl/dimm.c
index 1ab6b29f..4f0466a1 100644
--- a/ndctl/dimm.c
+++ b/ndctl/dimm.c
@@ -864,6 +864,18 @@ static int action_key_update(struct ndctl_dimm *dimm,
param.key_path);
 }
 
+static int action_passphrase_disable(struct ndctl_dimm *dimm,
+   struct action_context *actx)
+{
+   if (!ndctl_dimm_security_supported(dimm)) {
+   error("%s: security operation not supported\n",
+   ndctl_dimm_get_devname(dimm));
+   return -EOPNOTSUPP;
+   }
+
+   return ndctl_dimm_disable_key(dimm, param.key_path);
+}
+
 static int __action_init(struct ndctl_dimm *dimm,
enum ndctl_namespace_version version, int chk_only)
 {
@@ -953,12 +965,14 @@ OPT_BOOLEAN('f', "force", ¶m.force, \
 OPT_STRING('V', "label-version", ¶m.labelversion, "version-number", \
"namespace label specification version (default: 1.1)")
 
-#define KEY_OPTIONS() \
-OPT_STRING('m', "master-key", ¶m.master_key, ":", \
-   "master key for security"), \
+#define KEY_BASE_OPTIONS() \
 OPT_FILENAME('p', "key-path", ¶m.key_path, "key-path", \
"override the default key path")
 
+#define KEY_OPTIONS() \
+OPT_STRING('m', "master-key", ¶m.master_key, ":", \
+   "master key for security")
+
 static const struct option read_options[] = {
BASE_OPTIONS(),
READ_OPTIONS(),
@@ -988,8 +1002,15 @@ static const struct option init_options[] = {
OPT_END(),
 };
 
+static const struct option key_base_options[] = {
+   BASE_OPTIONS(),
+   KEY_BASE_OPTIONS(),
+   OPT_END(),
+};
+
 static const struct option key_options[] = {
BASE_OPTIONS(),
+   KEY_BASE_OPTIONS(),
KEY_OPTIONS(),
OPT_END(),
 };
@@ -1253,3 +1274,14 @@ int cmd_passphrase_setup(int argc, const char **argv, 
struct ndctl_ctx *ctx)
count > 1 ? "s" : "");
return count >= 0 ? 0 : EXIT_FAILURE;
 }
+
+int cmd_disable_passphrase(int argc, cons

[PATCH v7 02/12] ndctl: add passphrase update to ndctl

2019-01-09 Thread Dave Jiang
Add API call for triggering sysfs knob to update the security for a DIMM
in libndctl. Also add the ndctl "update-passphrase" to trigger the
operation.

Signed-off-by: Dave Jiang 
---
 Documentation/ndctl/Makefile.am |4 
 Documentation/ndctl/ndctl-enable-passphrase.txt |   42 ++
 Documentation/ndctl/ndctl-update-passphrase.txt |   38 ++
 configure.ac|   19 +
 ndctl.spec.in   |2 
 ndctl/Makefile.am   |3 
 ndctl/builtin.h |2 
 ndctl/dimm.c|   94 +
 ndctl/lib/Makefile.am   |8 
 ndctl/lib/dimm.c|   39 ++
 ndctl/lib/keys.c|  394 +++
 ndctl/lib/libndctl.sym  |4 
 ndctl/libndctl.h|   32 ++
 ndctl/ndctl.c   |2 
 14 files changed, 670 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/ndctl/ndctl-enable-passphrase.txt
 create mode 100644 Documentation/ndctl/ndctl-update-passphrase.txt
 create mode 100644 ndctl/lib/keys.c

diff --git a/Documentation/ndctl/Makefile.am b/Documentation/ndctl/Makefile.am
index a30b139b..7adb 100644
--- a/Documentation/ndctl/Makefile.am
+++ b/Documentation/ndctl/Makefile.am
@@ -47,7 +47,9 @@ man1_MANS = \
ndctl-inject-smart.1 \
ndctl-update-firmware.1 \
ndctl-list.1 \
-   ndctl-monitor.1
+   ndctl-monitor.1 \
+   ndctl-enable-passphrase.1 \
+   ndctl-update-passphrase.1
 
 CLEANFILES = $(man1_MANS)
 
diff --git a/Documentation/ndctl/ndctl-enable-passphrase.txt 
b/Documentation/ndctl/ndctl-enable-passphrase.txt
new file mode 100644
index ..c14a206c
--- /dev/null
+++ b/Documentation/ndctl/ndctl-enable-passphrase.txt
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+
+ndctl-enable-passphrase(1)
+==
+
+NAME
+
+ndctl-enable-passphrase - enable the security passphrase for a NVDIMM
+
+SYNOPSIS
+
+[verse]
+'ndctl enable-passphrase'  []
+
+DESCRIPTION
+---
+Provide a command to enable the security passphrase for the NVDIMM.
+It is expected that the master key has already been loaded into the user
+key ring. The encrypted key blobs will be created in /etc/nvdimm directory
+with the file name of "nvdimm--.blob".
+
+The command will fail if the nvdimm key is already in the user key ring and/or
+the key blob already resides in /etc/nvdimm. Do not touch the /etc/nvdimm
+directory and let ndctl manage the keys, unless you know what you are doing.
+
+OPTIONS
+---
+::
+include::xable-dimm-options.txt[]
+
+-m::
+--master=::
+   Key name for the master key used to seal the NVDIMM security keys.
+   The format would be :
+   i.e.: trusted:master-nvdimm
+
+-p::
+--key-path=::
+   Path to where key related files resides. This parameter is optional
+   and the default is set to /etc/ndctl/keys.
+
+include::../copyright.txt[]
diff --git a/Documentation/ndctl/ndctl-update-passphrase.txt 
b/Documentation/ndctl/ndctl-update-passphrase.txt
new file mode 100644
index ..dd6e4e4e
--- /dev/null
+++ b/Documentation/ndctl/ndctl-update-passphrase.txt
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+
+ndctl-update-passphrase(1)
+==
+
+NAME
+
+ndctl-update-passphrase - update the security passphrase for a NVDIMM
+
+SYNOPSIS
+
+[verse]
+'ndctl update-passphrase'  []
+
+DESCRIPTION
+---
+Provide a command to update the security key for NVDIMM.
+It is expected that the current and the new (if new master key is desired)
+master key has already been loaded into the user key ring. The new encrypted
+key blobs will be created in /etc/nvdimm directory
+with the file name of "nvdimm--.blob".
+
+OPTIONS
+---
+::
+include::xable-dimm-options.txt[]
+
+-m::
+--master::
+   New key name for the master key to seal the new nvdimm key, or the
+   existing master key name. i.e trusted:master-key.
+
+-p::
+--key-path=::
+   Path to where key related files resides. This parameter is optional
+   and the default is set to /etc/ndctl/keys.
+
+include::../copyright.txt[]
diff --git a/configure.ac b/configure.ac
index aa07ec7b..22efc871 100644
--- a/configure.ac
+++ b/configure.ac
@@ -154,6 +154,7 @@ fi
 AC_SUBST([systemd_unitdir])
 AM_CONDITIONAL([ENABLE_SYSTEMD_UNITS], [test "x$with_systemd" = "xyes"])
 
+
 ndctl_monitorconfdir=${sysconfdir}/ndctl
 ndctl_monitorconf=monitor.conf
 AC_SUBST([ndctl_monitorconfdir])
@@ -161,6 +162,24 @@ AC_SUBST([ndctl_monitorconf])
 AC_DEFINE_UNQUOTED(NDCTL_CONF_FILE, 
["$ndctl_monitorconfdir/$ndctl_monitorconf"],
[default ndctl monitor conf path])
 
+AC_ARG_WITH([keyutils],
+   AS_HELP_STRING([--with-keyutils],
+   [Enable keyutils functionality (security).  
@<:@def

[PATCH v7 00/12] ndctl: add security support

2019-01-09 Thread Dave Jiang
The following series implements mechanisms that utilize the sysfs knobs
provided by the kernel in order to support the Intel DSM v1.8 spec
that provides security to NVDIMM. The following abilities are added:
1. display security state
2. enable/update passphrase
3. disable passphrase
4. freeze security
5. secure erase
6. overwrite
7. master passphrase enable/update

v7:
- Added option to provide path to key directory. (Vishal)
- Cleaned up shell scripts. (Vishal)
- Cleaned up documentation. (Vishal)
- Addressed various comments from Vishal.

v6:
- Fix spelling and grammar errors for documentation. (Jing)
- Change bool for indicate master passphrase and old passphrase to enum.
- Fix key load script master key name.
- Update to match v15 of kernel patch series.

v5:
- Updated to match latest kernel interface (encrypted keys)
- Added overwrite support
- Added support for DSM v1.8 master passphrase operations
- Removed upcall related code
- Moved security state to enum (Dan)
- Change security output "security_state" to just "security". (Dan)
- Break out enable and update passphrase operation. (Dan)
- Security build can be compiled out when keyutils does not exist. (Dan)
- Move all keyutils related operations to libndctl. (Dan)

v4:
- Updated to match latest kernel interface.
- Added unit test for all security calls

v3:
- Added support to inject keys in order to update nvdimm security.

v2:
- Fixup the upcall util to match recent kernel updates for nvdimm security.

---

Dave Jiang (12):
  ndctl: add support for display security state
  ndctl: add passphrase update to ndctl
  ndctl: add disable security support
  ndctl: add support for freeze security
  ndctl: add support for sanitize dimm
  ndctl: add unit test for security ops (minus overwrite)
  ndctl: setup modprobe rules
  ndctl: add overwrite operation support
  ndctl: add wait-overwrite support
  ndctl: master phassphrase management support
  ndctl: add master secure erase support
  ndctl: documentation for security and key management


 Documentation/ndctl/Makefile.am  |8 
 Documentation/ndctl/intel-nvdimm-security.txt|  140 ++
 Documentation/ndctl/ndctl-disable-passphrase.txt |   34 +
 Documentation/ndctl/ndctl-enable-passphrase.txt  |   49 ++
 Documentation/ndctl/ndctl-freeze-security.txt|   22 +
 Documentation/ndctl/ndctl-list.txt   |8 
 Documentation/ndctl/ndctl-sanitize-dimm.txt  |   49 ++
 Documentation/ndctl/ndctl-update-passphrase.txt  |   45 ++
 Documentation/ndctl/ndctl-wait-overwrite.txt |   31 +
 Makefile.am  |   10 
 configure.ac |   19 +
 contrib/ndctl-loadkeys.sh|   25 +
 contrib/nvdimm_modprobe.conf |1 
 ndctl.spec.in|2 
 ndctl/Makefile.am|3 
 ndctl/builtin.h  |6 
 ndctl/dimm.c |  259 +++
 ndctl/lib/Makefile.am|8 
 ndctl/lib/dimm.c |  202 +
 ndctl/lib/keys.c |  512 ++
 ndctl/lib/libndctl.sym   |   19 +
 ndctl/libndctl.h |   79 +++
 ndctl/ndctl.c|6 
 test/Makefile.am |4 
 test/security.sh |  203 +
 util/json.c  |   31 +
 26 files changed, 1762 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/ndctl/intel-nvdimm-security.txt
 create mode 100644 Documentation/ndctl/ndctl-disable-passphrase.txt
 create mode 100644 Documentation/ndctl/ndctl-enable-passphrase.txt
 create mode 100644 Documentation/ndctl/ndctl-freeze-security.txt
 create mode 100644 Documentation/ndctl/ndctl-sanitize-dimm.txt
 create mode 100644 Documentation/ndctl/ndctl-update-passphrase.txt
 create mode 100644 Documentation/ndctl/ndctl-wait-overwrite.txt
 create mode 100755 contrib/ndctl-loadkeys.sh
 create mode 100644 contrib/nvdimm_modprobe.conf
 create mode 100644 ndctl/lib/keys.c
 create mode 100755 test/security.sh

--
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag

2019-01-09 Thread Dan Williams
On Wed, Jan 9, 2019 at 5:53 AM Pankaj Gupta  wrote:
>
> This patch adds 'DAXDEV_BUFFERED' flag which is set
> for virtio pmem corresponding nd_region. This later
> is used to disable MAP_SYNC functionality for ext4
> & xfs filesystem.
>
> Signed-off-by: Pankaj Gupta 
> ---
>  drivers/dax/super.c  | 17 +
>  drivers/nvdimm/pmem.c|  3 +++
>  drivers/nvdimm/region_devs.c |  7 +++
>  drivers/virtio/pmem.c|  1 +
>  include/linux/dax.h  |  9 +
>  include/linux/libnvdimm.h|  6 ++
>  6 files changed, 43 insertions(+)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 6e928f3..9128740 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -167,6 +167,8 @@ enum dax_device_flags {
> DAXDEV_ALIVE,
> /* gate whether dax_flush() calls the low level flush routine */
> DAXDEV_WRITE_CACHE,
> +   /* flag to disable MAP_SYNC for virtio based host page cache flush */
> +   DAXDEV_BUFFERED,
>  };
>
>  /**
> @@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
>  }
>  EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
>
> +void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
> +{
> +   if (wc)
> +   set_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> +   else
> +   clear_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> +}
> +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);

The "write_cache" property was structured this way because it can
conceivably change at runtime. The MAP_SYNC capability should be
static and never changed after init.

> +bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
> +{
> +   return test_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> +}
> +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);

Echoing Darrick and Jan this is should be a generic property of a
dax_device and not specific to virtio. I don't like the "buffered"
designation as that's not accurate. There may be hardware reasons why
a dax_device is not synchronous, like a requirement to flush a
write-pending queue or otherwise notify the device of new writes.

I would just have a dax_synchronous() helper and a DAXDEV_SYNC flag. I
would also modify alloc_dax() to take a flags argument so that the
capability can be instantiated when the dax_device is allocated.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 5/5] xfs: disable map_sync for virtio pmem

2019-01-09 Thread Darrick J. Wong
On Wed, Jan 09, 2019 at 08:17:36PM +0530, Pankaj Gupta wrote:
> Virtio pmem provides asynchronous host page cache flush
> mechanism. we don't support 'MAP_SYNC' with virtio pmem 
> and xfs.
> 
> Signed-off-by: Pankaj Gupta 
> ---
>  fs/xfs/xfs_file.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e474250..eae4aa4 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1190,6 +1190,14 @@ xfs_file_mmap(
>   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
>   return -EOPNOTSUPP;
>  
> + /* We don't support synchronous mappings with guest direct access
> +  * and virtio based host page cache mechanism.
> +  */
> + if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled(

Echoing what Jan said, this ought to be some sort of generic function
that tells us whether or not memory mapped from the dax device will
always still be accessible even after a crash (i.e. supports MAP_SYNC).

What if the underlying file on the host is itself on pmem and can be
MAP_SYNC'd?  Shouldn't the guest be able to use MAP_SYNC as well?

--D

> + xfs_find_daxdev_for_inode(file_inode(filp))) &&
> + (vma->vm_flags & VM_SYNC))
> + return -EOPNOTSUPP;
> +
>   file_accessed(filp);
>   vma->vm_ops = &xfs_file_vm_ops;
>   if (IS_DAX(file_inode(filp)))
> -- 
> 2.9.3
> 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH] libnvdimm/dimm: Fix security capability detection for non-Intel NVDIMMs

2019-01-09 Thread Dave Jiang



On 1/8/19 6:24 PM, Dan Williams wrote:
> Kees reports a crash with the following signature...
> 
>  RIP: 0010:nvdimm_visible+0x79/0x80
>  [..]
>  Call Trace:
>   internal_create_group+0xf4/0x380
>   sysfs_create_groups+0x46/0xb0
>   device_add+0x331/0x680
>   nd_async_device_register+0x15/0x60
>   async_run_entry_fn+0x38/0x100
> 
> ...when starting a QEMU environment with "label-less" DIMM. Without
> labels QEMU does not publish any DSM methods. Without defined methods
> the NVDIMM_FAMILY type is not established and the nfit driver will skip
> registering security operations.
> 
> In that case the security state should be initialized to a negative
> value in __nvdimm_create() and nvdimm_visible() should skip
> interrogating the specific ops. However, since 'enum
> nvdimm_security_state' was only defined to contain positive values the
> "if (nvdimm->sec.state < 0)" check always fails.
> 
> Define a negative error state to allow negative state values to be
> handled as expected.
> 
> Fixes: f2989396553a ("acpi/nfit, libnvdimm: Introduce nvdimm_security_ops")
> Cc: Dave Jiang 
> Reported-by: Kees Cook 
> Tested-by: Kees Cook 
> Signed-off-by: Dan Williams 

Reviewed-by: Dave Jiang 

> ---
>  include/linux/libnvdimm.h |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> index 5440f11b0907..7315977b64da 100644
> --- a/include/linux/libnvdimm.h
> +++ b/include/linux/libnvdimm.h
> @@ -160,6 +160,7 @@ static inline struct nd_blk_region_desc 
> *to_blk_region_desc(
>  }
>  
>  enum nvdimm_security_state {
> + NVDIMM_SECURITY_ERROR = -1,
>   NVDIMM_SECURITY_DISABLED,
>   NVDIMM_SECURITY_UNLOCKED,
>   NVDIMM_SECURITY_LOCKED,
> 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 4/5] ext4: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta


> 
> On Wed 09-01-19 19:26:05, Pankaj Gupta wrote:
> > Virtio pmem provides asynchronous host page cache flush
> > mechanism. We don't support 'MAP_SYNC' with virtio pmem
> > and ext4.
> > 
> > Signed-off-by: Pankaj Gupta 
> ...
> > @@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct
> > vm_area_struct *vma)
> > if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
> > return -EOPNOTSUPP;
> >  
> > +   /* We don't support synchronous mappings with guest direct access
> > +* and virtio based host page cache flush mechanism.
> > +*/
> > +   if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
> > +   && (vma->vm_flags & VM_SYNC))
> > +   return -EOPNOTSUPP;
> > +
> 
> Shouldn't there rather be some generic way of doing this? Having
> virtio_pmem_host_cache_enabled() check in filesystem code just looks like
> filesystem sniffing into details is should not care about... Maybe just
> naming this (or having a wrapper) dax_dev_map_sync_supported()?

Thanks for the feedback.

Just wanted to avoid 'dax' in function name just to differentiate this with 
real dax.
But yes can add a wrapper: dax_dev_map_sync_supported()

Thanks,
Pankaj
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Pankaj Gupta
 This patch series has implementation for "virtio pmem". 
 "virtio pmem" is fake persistent memory(nvdimm) in guest 
 which allows to bypass the guest page cache. This also
 implements a VIRTIO based asynchronous flush mechanism.  
 
 Sharing guest kernel driver in this patchset with the 
 changes suggested in v2. Tested with Qemu side device 
 emulation for virtio-pmem [6]. 
 
 Details of project idea for 'virtio pmem' flushing interface 
 is shared [3] & [4].

 Implementation is divided into two parts:
 New virtio pmem guest driver and qemu code changes for new 
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
-
   - Reads persistent memory range from paravirt device and 
 registers with 'nvdimm_bus'.  
   - 'nvdimm/pmem' driver uses this information to allocate 
 persistent memory region and setup filesystem operations 
 to the allocated memory. 
   - virtio pmem driver implements asynchronous flushing 
 interface to flush from guest to host.

2. Qemu virtio-pmem device
-
   - Creates virtio pmem device and exposes a memory range to 
 KVM guest. 
   - At host side this is file backed memory which acts as 
 persistent memory. 
   - Qemu side flush uses aio thread pool API's and virtio 
 for asynchronous guest multi request handling. 

   David Hildenbrand CCed also posted a modified version[6] of 
   qemu virtio-pmem code based on updated Qemu memory device API. 

 Virtio-pmem errors handling:
 
  Checked behaviour of virtio-pmem for below types of errors
  Need suggestions on expected behaviour for handling these errors?

  - Hardware Errors: Uncorrectable recoverable Errors: 
  a] virtio-pmem: 
- As per current logic if error page belongs to Qemu process, 
  host MCE handler isolates(hwpoison) that page and send SIGBUS. 
  Qemu SIGBUS handler injects exception to KVM guest. 
- KVM guest then isolates the page and send SIGBUS to guest 
  userspace process which has mapped the page. 
  
  b] Existing implementation for ACPI pmem driver: 
- Handles such errors with MCE notifier and creates a list 
  of bad blocks. Read/direct access DAX operation return EIO 
  if accessed memory page fall in bad block list.
- It also starts backgound scrubbing.  
- Similar functionality can be reused in virtio-pmem with MCE 
  notifier but without scrubbing(no ACPI/ARS)? Need inputs to 
  confirm if this behaviour is ok or needs any change?

Changes from PATCH v2: [1]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: [2]
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush- Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req- Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (5):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[2] https://lkml.org/lkml/2018/8/31/407
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2

 drivers/acpi/nfit

[PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag

2019-01-09 Thread Pankaj Gupta
This patch adds 'DAXDEV_BUFFERED' flag which is set
for virtio pmem corresponding nd_region. This later
is used to disable MAP_SYNC functionality for ext4
& xfs filesystem.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/super.c  | 17 +
 drivers/nvdimm/pmem.c|  3 +++
 drivers/nvdimm/region_devs.c |  7 +++
 drivers/virtio/pmem.c|  1 +
 include/linux/dax.h  |  9 +
 include/linux/libnvdimm.h|  6 ++
 6 files changed, 43 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 6e928f3..9128740 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -167,6 +167,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to disable MAP_SYNC for virtio based host page cache flush */
+   DAXDEV_BUFFERED,
 };
 
 /**
@@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+   if (wc)
+   set_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+   else
+   clear_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);
+
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(&dax_srcu);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fe1217b..8d190a3 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -472,6 +472,9 @@ static int pmem_attach_disk(struct device *dev,
return -ENOMEM;
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
+
+   /* Set buffered bit in 'dax_dev' for virtio pmem */
+   virtio_pmem_host_cache(dax_dev, nvdimm_is_buffered(nd_region));
pmem->dax_dev = dax_dev;
 
gendev = disk_to_dev(disk);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index f8218b4..1f8b2be 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1264,6 +1264,13 @@ int nd_region_conflict(struct nd_region *nd_region, 
resource_size_t start,
return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict);
 }
 
+int nvdimm_is_buffered(struct nd_region *nd_region)
+{
+   return is_nd_pmem(&nd_region->dev) &&
+   test_bit(ND_REGION_BUFFERED, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(nvdimm_is_buffered);
+
 void __exit nd_region_devs_exit(void)
 {
ida_destroy(®ion_ida);
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
index 51f5349..901767b 100644
--- a/drivers/virtio/pmem.c
+++ b/drivers/virtio/pmem.c
@@ -81,6 +81,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
ndr_desc.numa_node = nid;
ndr_desc.flush = virtio_pmem_flush;
set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+   set_bit(ND_REGION_BUFFERED, &ndr_desc.flags);
nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
nd_region->provider_data =  dev_to_virtio
(nd_region->dev.parent->parent);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a..d16e03e 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -37,6 +37,8 @@ void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc);
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -64,6 +66,13 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+}
+static inline bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return false;
+}
 #endif
 
 struct writeback_control;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index ca8bc07..94616f1 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -64,6 +64,11 @@ enum {
 */
ND_REGION_PERSIST_MEMCTRL = 2,
 
+   /* provides virtio based asynchronous flush mechanism for buffered
+* host page cache.
+*/
+   ND_REGION_BUFFERED = 3,
+
/* mark newly adjusted resources as requiring a label update */
DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -265,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm)

[PATCH v3 2/5] virtio-pmem: Add virtio pmem driver

2019-01-09 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
---
 drivers/nvdimm/virtio_pmem.c |  84 ++
 drivers/virtio/Kconfig   |  10 
 drivers/virtio/Makefile  |   1 +
 drivers/virtio/pmem.c| 124 +++
 include/linux/virtio_pmem.h  |  60 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 
 7 files changed, 290 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000..2a1b1ba
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include 
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   unsigned int len;
+   unsigned long flags;
+   struct virtio_pmem_request *req, *req_buf;
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+   req->done = true;
+   wake_up(&req->host_acked);
+
+   if (!list_empty(&vpmem->req_list)) {
+   req_buf = list_first_entry(&vpmem->req_list,
+   struct virtio_pmem_request, list);
+   list_del(&vpmem->req_list);
+   req_buf->wq_buf_avail = true;
+   wake_up(&req_buf->wq_buf);
+   }
+   }
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   int err;
+   unsigned long flags;
+   struct scatterlist *sgs[2], sg, ret;
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem = vdev->priv;
+   struct virtio_pmem_request *req;
+
+   might_sleep();
+   req = kmalloc(sizeof(*req), GFP_KERNEL);
+   if (!req)
+   return -ENOMEM;
+
+   req->done = req->wq_buf_avail = false;
+   strcpy(req->name, "FLUSH");
+   init_waitqueue_head(&req->host_acked);
+   init_waitqueue_head(&req->wq_buf);
+   sg_init_one(&sg, req->name, strlen(req->name));
+   sgs[0] = &sg;
+   sg_init_one(&ret, &req->ret, sizeof(req->ret));
+   sgs[1] = &ret;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+   if (err) {
+   dev_err(&vdev->dev, "failed to send command to virtio pmem 
device\n");
+
+   list_add_tail(&vpmem->req_list, &req->list);
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->wq_buf, req->wq_buf_avail);
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   }
+   virtqueue_kick(vpmem->req_vq);
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->host_acked, req->done);
+   err = req->ret;
+   kfree(req);
+
+   return err;
+};
+EXPORT_SYMBOL_GPL(virtio_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 3589764..9f634a2 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
  If unsure, say Y.
 
+config VIRTIO_PMEM
+   tristate "Support for virtio pmem driver"
+   depends on VIRTIO
+   depends on LIBNVDIMM
+   help
+   This driver provides support for virtio based flushing interface
+   for persistent memory range.
+
+   If unsure, say M.
+
 config VIRTIO_BALLOON
tristate "Virtio balloon driver"
depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5..143ce91 100644
--- a/drivers/virtio/Makefile
+++ b/driv

[PATCH v3 5/5] xfs: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta
Virtio pmem provides asynchronous host page cache flush
mechanism. we don't support 'MAP_SYNC' with virtio pmem 
and xfs.

Signed-off-by: Pankaj Gupta 
---
 fs/xfs/xfs_file.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e474250..eae4aa4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1190,6 +1190,14 @@ xfs_file_mmap(
if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache mechanism.
+*/
+   if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled(
+   xfs_find_daxdev_for_inode(file_inode(filp))) &&
+   (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(filp);
vma->vm_ops = &xfs_file_vm_ops;
if (IS_DAX(file_inode(filp)))
-- 
2.9.3

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v3 4/5] ext4: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta
Virtio pmem provides asynchronous host page cache flush
mechanism. We don't support 'MAP_SYNC' with virtio pmem 
and ext4. 

Signed-off-by: Pankaj Gupta 
---
 fs/ext4/file.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d4..e54f48b 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,8 +360,10 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
@@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct 
vm_area_struct *vma)
if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache flush mechanism.
+*/
+   if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
+   && (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = &ext4_dax_vm_ops;
-- 
2.9.3

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v3 1/5] libnvdimm: nd_region flush callback support

2019-01-09 Thread Pankaj Gupta
This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

This also handles asynchronous flush requests from the block layer
by creating a child bio and chaining it with parent bio.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 12 
 drivers/nvdimm/region_devs.c | 38 --
 include/linux/libnvdimm.h|  5 -
 6 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index b072cfc..f154852 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2234,7 +2234,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2283,7 +2283,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf..a1dfa06 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL, false);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 98317e7..d53a2d1 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -160,6 +160,7 @@ struct nd_region {
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
struct nd_mapping mapping[0];
+   int (*flush)(struct nd_region *nd_region);
 };
 
 struct nd_blk_region {
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 6071e29..5d6a4a1 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
 
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -528,14 +532,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index fa37afc..5508727 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -290,7 +290,9 @@ static ssize_t deep_flush_store(struct device *dev, struct 
device_attribute *att
return rc;

Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Pankaj Gupta


Please ignore this series as my network went down while 
sending this. I will send this series again.

Thanks,
Pankaj  

> 
> This patch series has implementation for "virtio pmem".
>  "virtio pmem" is fake persistent memory(nvdimm) in guest
>  which allows to bypass the guest page cache. This also
>  implements a VIRTIO based asynchronous flush mechanism.
>  
>  Sharing guest kernel driver in this patchset with the
>  changes suggested in v2. Tested with Qemu side device
>  emulation for virtio-pmem [6].
>  
>  Details of project idea for 'virtio pmem' flushing interface
>  is shared [3] & [4].
> 
>  Implementation is divided into two parts:
>  New virtio pmem guest driver and qemu code changes for new
>  virtio pmem paravirtualized device.
> 
> 1. Guest virtio-pmem kernel driver
> -
>- Reads persistent memory range from paravirt device and
>  registers with 'nvdimm_bus'.
>- 'nvdimm/pmem' driver uses this information to allocate
>  persistent memory region and setup filesystem operations
>  to the allocated memory.
>- virtio pmem driver implements asynchronous flushing
>  interface to flush from guest to host.
> 
> 2. Qemu virtio-pmem device
> -
>- Creates virtio pmem device and exposes a memory range to
>  KVM guest.
>- At host side this is file backed memory which acts as
>  persistent memory.
>- Qemu side flush uses aio thread pool API's and virtio
>  for asynchronous guest multi request handling.
> 
>David Hildenbrand CCed also posted a modified version[6] of
>qemu virtio-pmem code based on updated Qemu memory device API.
> 
>  Virtio-pmem errors handling:
>  
>   Checked behaviour of virtio-pmem for below types of errors
>   Need suggestions on expected behaviour for handling these errors?
> 
>   - Hardware Errors: Uncorrectable recoverable Errors:
>   a] virtio-pmem:
> - As per current logic if error page belongs to Qemu process,
>   host MCE handler isolates(hwpoison) that page and send SIGBUS.
>   Qemu SIGBUS handler injects exception to KVM guest.
> - KVM guest then isolates the page and send SIGBUS to guest
>   userspace process which has mapped the page.
>   
>   b] Existing implementation for ACPI pmem driver:
> - Handles such errors with MCE notifier and creates a list
>   of bad blocks. Read/direct access DAX operation return EIO
>   if accessed memory page fall in bad block list.
> - It also starts backgound scrubbing.
> - Similar functionality can be reused in virtio-pmem with MCE
>   notifier but without scrubbing(no ACPI/ARS)? Need inputs to
>   confirm if this behaviour is ok or needs any change?
> 
> Changes from PATCH v2: [1]
> - Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
> - Use name 'virtio pmem' in place of 'fake dax'
> 
> Changes from PATCH v1: [2]
> - 0-day build test for build dependency on libnvdimm
> 
>  Changes suggested by - [Dan Williams]
> - Split the driver into two parts virtio & pmem
> - Move queuing of async block request to block layer
> - Add "sync" parameter in nvdimm_flush function
> - Use indirect call for nvdimm_flush
> - Don’t move declarations to common global header e.g nd.h
> - nvdimm_flush() return 0 or -EIO if it fails
> - Teach nsio_rw_bytes() that the flush can fail
> - Rename nvdimm_flush() to generic_nvdimm_flush()
> - Use 'nd_region->provider_data' for long dereferencing
> - Remove virtio_pmem_freeze/restore functions
> - Remove BSD license text with SPDX license text
> 
> - Add might_sleep() in virtio_pmem_flush - [Luiz]
> - Make spin_lock_irqsave() narrow
> 
> Changes from RFC v3
> - Rebase to latest upstream - Luiz
> - Call ndregion->flush in place of nvdimm_flush- Luiz
> - kmalloc return check - Luiz
> - virtqueue full handling - Stefan
> - Don't map entire virtio_pmem_req to device - Stefan
> - request leak, correct sizeof req- Stefan
> - Move declaration to virtio_pmem.c
> 
> Changes from RFC v2:
> - Add flush function in the nd_region in place of switching
>   on a flag - Dan & Stefan
> - Add flush completion function with proper locking and wait
>   for host side flush completion - Stefan & Dan
> - Keep userspace API in uapi header file - Stefan, MST
> - Use LE fields & New device id - MST
> - Indentation & spacing suggestions - MST & Eric
> - Remove extra header files & add licensing - Stefan
> 
> Changes from RFC v1:
> - Reuse existing 'pmem' code for registering persistent
>   memory and other operations instead of creating an entirely
>   new block driver.
> - Use VIRTIO driver to register memory information with
>   nvdimm_bus and create region_type accordingly.
> - Call VIRTIO flush from existing pmem driver.
> 
> Pankaj Gupta (5):
>libnvdimm: nd_region flush callback support
>virtio-pmem: Add virtio-pmem guest driver
>libnvdimm: add nd_region buffered dax_dev flag
>ext4: disable map_sync for virtio pmem

Re: [PATCH v3 4/5] ext4: disable map_sync for virtio pmem

2019-01-09 Thread Jan Kara
On Wed 09-01-19 19:26:05, Pankaj Gupta wrote:
> Virtio pmem provides asynchronous host page cache flush
> mechanism. We don't support 'MAP_SYNC' with virtio pmem 
> and ext4. 
> 
> Signed-off-by: Pankaj Gupta 
...
> @@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct 
> vm_area_struct *vma)
>   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
>   return -EOPNOTSUPP;
>  
> + /* We don't support synchronous mappings with guest direct access
> +  * and virtio based host page cache flush mechanism.
> +  */
> + if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
> + && (vma->vm_flags & VM_SYNC))
> + return -EOPNOTSUPP;
> +

Shouldn't there rather be some generic way of doing this? Having
virtio_pmem_host_cache_enabled() check in filesystem code just looks like
filesystem sniffing into details is should not care about... Maybe just
naming this (or having a wrapper) dax_dev_map_sync_supported()?

Honza
-- 
Jan Kara 
SUSE Labs, CR
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v3 5/5] xfs: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta
Virtio pmem provides asynchronous host page cache flush
mechanism. we don't support 'MAP_SYNC' with virtio pmem 
and xfs.

Signed-off-by: Pankaj Gupta 
---
 fs/xfs/xfs_file.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e474250..eae4aa4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1190,6 +1190,14 @@ xfs_file_mmap(
if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache mechanism.
+*/
+   if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled(
+   xfs_find_daxdev_for_inode(file_inode(filp))) &&
+   (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(filp);
vma->vm_ops = &xfs_file_vm_ops;
if (IS_DAX(file_inode(filp)))
-- 
2.9.3

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v3 4/5] ext4: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta
Virtio pmem provides asynchronous host page cache flush
mechanism. We don't support 'MAP_SYNC' with virtio pmem 
and ext4. 

Signed-off-by: Pankaj Gupta 
---
 fs/ext4/file.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d4..e54f48b 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,8 +360,10 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
@@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct 
vm_area_struct *vma)
if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache flush mechanism.
+*/
+   if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
+   && (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = &ext4_dax_vm_ops;
-- 
2.9.3

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag

2019-01-09 Thread Pankaj Gupta
This patch adds 'DAXDEV_BUFFERED' flag which is set
for virtio pmem corresponding nd_region. This later
is used to disable MAP_SYNC functionality for ext4
& xfs filesystem.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/super.c  | 17 +
 drivers/nvdimm/pmem.c|  3 +++
 drivers/nvdimm/region_devs.c |  7 +++
 drivers/virtio/pmem.c|  1 +
 include/linux/dax.h  |  9 +
 include/linux/libnvdimm.h|  6 ++
 6 files changed, 43 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 6e928f3..9128740 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -167,6 +167,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to disable MAP_SYNC for virtio based host page cache flush */
+   DAXDEV_BUFFERED,
 };
 
 /**
@@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+   if (wc)
+   set_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+   else
+   clear_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);
+
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(&dax_srcu);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fe1217b..8d190a3 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -472,6 +472,9 @@ static int pmem_attach_disk(struct device *dev,
return -ENOMEM;
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
+
+   /* Set buffered bit in 'dax_dev' for virtio pmem */
+   virtio_pmem_host_cache(dax_dev, nvdimm_is_buffered(nd_region));
pmem->dax_dev = dax_dev;
 
gendev = disk_to_dev(disk);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index f8218b4..1f8b2be 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1264,6 +1264,13 @@ int nd_region_conflict(struct nd_region *nd_region, 
resource_size_t start,
return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict);
 }
 
+int nvdimm_is_buffered(struct nd_region *nd_region)
+{
+   return is_nd_pmem(&nd_region->dev) &&
+   test_bit(ND_REGION_BUFFERED, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(nvdimm_is_buffered);
+
 void __exit nd_region_devs_exit(void)
 {
ida_destroy(®ion_ida);
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
index 51f5349..901767b 100644
--- a/drivers/virtio/pmem.c
+++ b/drivers/virtio/pmem.c
@@ -81,6 +81,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
ndr_desc.numa_node = nid;
ndr_desc.flush = virtio_pmem_flush;
set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+   set_bit(ND_REGION_BUFFERED, &ndr_desc.flags);
nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
nd_region->provider_data =  dev_to_virtio
(nd_region->dev.parent->parent);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a..d16e03e 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -37,6 +37,8 @@ void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc);
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -64,6 +66,13 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+}
+static inline bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return false;
+}
 #endif
 
 struct writeback_control;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index ca8bc07..94616f1 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -64,6 +64,11 @@ enum {
 */
ND_REGION_PERSIST_MEMCTRL = 2,
 
+   /* provides virtio based asynchronous flush mechanism for buffered
+* host page cache.
+*/
+   ND_REGION_BUFFERED = 3,
+
/* mark newly adjusted resources as requiring a label update */
DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -265,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm)

[PATCH v3 2/5] virtio-pmem: Add virtio pmem driver

2019-01-09 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
---
 drivers/nvdimm/virtio_pmem.c |  84 ++
 drivers/virtio/Kconfig   |  10 
 drivers/virtio/Makefile  |   1 +
 drivers/virtio/pmem.c| 124 +++
 include/linux/virtio_pmem.h  |  60 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 
 7 files changed, 290 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000..2a1b1ba
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include 
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   unsigned int len;
+   unsigned long flags;
+   struct virtio_pmem_request *req, *req_buf;
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+   req->done = true;
+   wake_up(&req->host_acked);
+
+   if (!list_empty(&vpmem->req_list)) {
+   req_buf = list_first_entry(&vpmem->req_list,
+   struct virtio_pmem_request, list);
+   list_del(&vpmem->req_list);
+   req_buf->wq_buf_avail = true;
+   wake_up(&req_buf->wq_buf);
+   }
+   }
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   int err;
+   unsigned long flags;
+   struct scatterlist *sgs[2], sg, ret;
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem = vdev->priv;
+   struct virtio_pmem_request *req;
+
+   might_sleep();
+   req = kmalloc(sizeof(*req), GFP_KERNEL);
+   if (!req)
+   return -ENOMEM;
+
+   req->done = req->wq_buf_avail = false;
+   strcpy(req->name, "FLUSH");
+   init_waitqueue_head(&req->host_acked);
+   init_waitqueue_head(&req->wq_buf);
+   sg_init_one(&sg, req->name, strlen(req->name));
+   sgs[0] = &sg;
+   sg_init_one(&ret, &req->ret, sizeof(req->ret));
+   sgs[1] = &ret;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+   if (err) {
+   dev_err(&vdev->dev, "failed to send command to virtio pmem 
device\n");
+
+   list_add_tail(&vpmem->req_list, &req->list);
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->wq_buf, req->wq_buf_avail);
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   }
+   virtqueue_kick(vpmem->req_vq);
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->host_acked, req->done);
+   err = req->ret;
+   kfree(req);
+
+   return err;
+};
+EXPORT_SYMBOL_GPL(virtio_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 3589764..9f634a2 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
  If unsure, say Y.
 
+config VIRTIO_PMEM
+   tristate "Support for virtio pmem driver"
+   depends on VIRTIO
+   depends on LIBNVDIMM
+   help
+   This driver provides support for virtio based flushing interface
+   for persistent memory range.
+
+   If unsure, say M.
+
 config VIRTIO_BALLOON
tristate "Virtio balloon driver"
depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5..143ce91 100644
--- a/drivers/virtio/Makefile
+++ b/driv

[PATCH v3 1/5] libnvdimm: nd_region flush callback support

2019-01-09 Thread Pankaj Gupta
This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

This also handles asynchronous flush requests from the block layer
by creating a child bio and chaining it with parent bio.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 12 
 drivers/nvdimm/region_devs.c | 38 --
 include/linux/libnvdimm.h|  5 -
 6 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index b072cfc..f154852 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2234,7 +2234,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2283,7 +2283,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf..a1dfa06 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL, false);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 98317e7..d53a2d1 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -160,6 +160,7 @@ struct nd_region {
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
struct nd_mapping mapping[0];
+   int (*flush)(struct nd_region *nd_region);
 };
 
 struct nd_blk_region {
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 6071e29..5d6a4a1 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
 
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -528,14 +532,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index fa37afc..5508727 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -290,7 +290,9 @@ static ssize_t deep_flush_store(struct device *dev, struct 
device_attribute *att
return rc;

[PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Pankaj Gupta
 This patch series has implementation for "virtio pmem". 
 "virtio pmem" is fake persistent memory(nvdimm) in guest 
 which allows to bypass the guest page cache. This also
 implements a VIRTIO based asynchronous flush mechanism.  
 
 Sharing guest kernel driver in this patchset with the 
 changes suggested in v2. Tested with Qemu side device 
 emulation for virtio-pmem [6]. 
 
 Details of project idea for 'virtio pmem' flushing interface 
 is shared [3] & [4].

 Implementation is divided into two parts:
 New virtio pmem guest driver and qemu code changes for new 
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
-
   - Reads persistent memory range from paravirt device and 
 registers with 'nvdimm_bus'.  
   - 'nvdimm/pmem' driver uses this information to allocate 
 persistent memory region and setup filesystem operations 
 to the allocated memory. 
   - virtio pmem driver implements asynchronous flushing 
 interface to flush from guest to host.

2. Qemu virtio-pmem device
-
   - Creates virtio pmem device and exposes a memory range to 
 KVM guest. 
   - At host side this is file backed memory which acts as 
 persistent memory. 
   - Qemu side flush uses aio thread pool API's and virtio 
 for asynchronous guest multi request handling. 

   David Hildenbrand CCed also posted a modified version[6] of 
   qemu virtio-pmem code based on updated Qemu memory device API. 

 Virtio-pmem errors handling:
 
  Checked behaviour of virtio-pmem for below types of errors
  Need suggestions on expected behaviour for handling these errors?

  - Hardware Errors: Uncorrectable recoverable Errors: 
  a] virtio-pmem: 
- As per current logic if error page belongs to Qemu process, 
  host MCE handler isolates(hwpoison) that page and send SIGBUS. 
  Qemu SIGBUS handler injects exception to KVM guest. 
- KVM guest then isolates the page and send SIGBUS to guest 
  userspace process which has mapped the page. 
  
  b] Existing implementation for ACPI pmem driver: 
- Handles such errors with MCE notifier and creates a list 
  of bad blocks. Read/direct access DAX operation return EIO 
  if accessed memory page fall in bad block list.
- It also starts backgound scrubbing.  
- Similar functionality can be reused in virtio-pmem with MCE 
  notifier but without scrubbing(no ACPI/ARS)? Need inputs to 
  confirm if this behaviour is ok or needs any change?

Changes from PATCH v2: [1]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: [2]
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush- Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req- Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (5):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[2] https://lkml.org/lkml/2018/8/31/407
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2

 drivers/acpi/nfit