Re: [RFC PATCH 1/3] hw/block/nvme: set NVME_DNR in a single place

2021-02-10 Thread Minwoo Im
On 21-02-10 21:19:43, Klaus Jensen wrote:
> On Feb 11 04:52, Minwoo Im wrote:
> > @@ -945,6 +945,11 @@ static void nvme_post_cqes(void *opaque)
> >  static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
> >  {
> >  assert(cq->cqid == req->sq->cqid);
> > +
> > +if (req->status != NVME_SUCCESS) {
> > +req->status |= NVME_DNR;
> > +}
> 
> There are status codes where we do not set the DNR bit (e.g. Data
> Transfer Error, and that might be the only one actually).

Ouch, I think I need to prepare some of switch-helper to figure out
which one needs to be retried or not.

> Maybe a switch such that we do not explicitly set DNR for Data Transfer
> Error (and any other errors we identify), but only if we set it earlier
> in the stack.

Agreed.



Re: [RFC PATCH 3/3] hw/block/nvme: add nvme_inject_state HMP command

2021-02-10 Thread Minwoo Im
On 21-02-11 12:00:11, Keith Busch wrote:
> On Thu, Feb 11, 2021 at 04:52:52AM +0900, Minwoo Im wrote:
> > nvme_inject_state command is to give a controller state to be.
> > Human Monitor Interface(HMP) supports users to make controller to a
> > specified state of:
> > 
> > normal: Normal state (no injection)
> > cmd-interrupted:Commands will be interrupted internally
> > 
> > This patch is just a start to give dynamic command from the HMP to the
> > QEMU NVMe device model.  If "cmd-interrupted" state is given, then the
> > controller will return all the CQ entries with Command Interrupts status
> > code.
> > 
> > Usage:
> > -device nvme,id=nvme0,
> > 
> > (qemu) nvme_inject_state nvme0 cmd-interrupted
> > 
> > 
> > 
> > (qemu) nvme_inject_state nvme0 normal
> > 
> > This feature is required to test Linux kernel NVMe driver for the
> > command retry feature.
> 
> Once the user sets the injected state, all commands return that status
> until the user injects the normal state, so the CRD time is meaningless
> here. If we're really going this route, the state needs to return to
> normal on it's own.

That would also be fine to me.

> But I would prefer to see advanced retry tied to real errors that can be
> retried, like if we got an EBUSY or EAGAIN errno or something like that.

I have seen a thread [1] about ACRE.  Forgive me If I misunderstood this
thread or missed something after this thread.  It looks like CRD field in
the CQE can be set for any NVMe error state which means it *may* depend on
the device status.  And this patch just introduced a internal temporarily
error state of the controller by returning Command Intrrupted status.

I think, in this stage, we can go with some errors in the middle of the
AIO (nvme_aio_err()) for advanced retry.  Shouldn't AIO errors are
retry-able and supposed to be retried ?

> The interface you found to implement this is very interesting though.

Thanks, I just wanted to suggest a scheme to inject something to a
running NVMe device model for various testing.

[1] https://www.spinics.net/lists/dm-devel/msg42165.html



Re: [RFC PATCH 3/3] hw/block/nvme: add nvme_inject_state HMP command

2021-02-10 Thread Minwoo Im
On 21-02-10 21:33:50, Klaus Jensen wrote:
> On Feb 11 04:52, Minwoo Im wrote:
> > nvme_inject_state command is to give a controller state to be.
> > Human Monitor Interface(HMP) supports users to make controller to a
> > specified state of:
> > 
> > normal: Normal state (no injection)
> > cmd-interrupted:Commands will be interrupted internally
> > 
> > This patch is just a start to give dynamic command from the HMP to the
> > QEMU NVMe device model.  If "cmd-interrupted" state is given, then the
> > controller will return all the CQ entries with Command Interrupts status
> > code.
> > 
> > Usage:
> > -device nvme,id=nvme0,
> > 
> > (qemu) nvme_inject_state nvme0 cmd-interrupted
> > 
> > 
> > 
> > (qemu) nvme_inject_state nvme0 normal
> > 
> > This feature is required to test Linux kernel NVMe driver for the
> > command retry feature.
> > 
> 
> This is super cool and commands like this feel much nicer than the
> qom-set approach that the SMART critical warning feature took.

This interface is super easy to inject some errors to the running
device with a function call-back.

> But... looking at the existing commands I don't think we can "bloat" it
> up with a device specific command like this, but I don't know the policy
> around this.

Me neither, but I've seen the PCI AER error injection feature from
the existing commands, so I suggested this command to control the
NVMe device itself like error injection.

> If an HMP command is out, then we should be able to make do with the
> qom-set approach just fine though.

Hope so.



Re: [PATCH 2/2] hw/nvme: move device-scoped functions

2021-02-10 Thread Minwoo Im
On 21-02-09 12:08:26, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Move a bunch of functions that are internal to a device out of the
> shared header.

May I ask why?  I think some kind of these helpers can stick onto the
header.



Re: [PATCH 1/2] hw/nvme: move nvme emulation out of hw/block

2021-02-10 Thread Minwoo Im
On 21-02-09 12:08:25, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> With the introduction of the nvme-subsystem device we are really
> cluttering up the hw/block directory.
> 
> As suggested by Philippe previously, move the nvme emulation to
> hw/nvme.
> 
> Suggested-by: Philippe Mathieu-Daudé 
> Signed-off-by: Klaus Jensen 

Please add description about consolidation of nvme-ns.h and
nvme-subsys.h to the nvme.h for a unified header file when you apply
this patch! :)

Acked-by: Minwoo Im 



Re: [PATCH] hw/block/nvme: drain namespaces on sq deletion

2021-02-10 Thread Minwoo Im
On 21-01-27 14:15:05, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> For most commands, when issuing an AIO, the BlockAIOCB is stored in the
> NvmeRequest aiocb pointer when the AIO is issued. The purpose of storing
> this is to allow the AIO to be cancelled when deleting submission
> queues (it is currently not used for Abort).
> 
> Since the addition of the Dataset Management command and Zoned
> Namespaces, NvmeRequests may involve more than one AIO and the AIOs are
> issued without saving a reference to the BlockAIOCB. This is a problem
> since nvme_del_sq will attempt to cancel outstanding AIOs, potentially
> with an invalid BlockAIOCB.
> 
> Fix this by instead of explicitly cancelling the requests, just allow
> the AIOs to complete by draining the namespace blockdevs.
> 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c | 18 +-
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 316858fd8adf..91f6fb6da1e2 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -403,6 +403,7 @@ static void nvme_req_clear(NvmeRequest *req)
>  {
>  req->ns = NULL;
>  req->opaque = NULL;
> +req->aiocb = NULL;
>  memset(>cqe, 0x0, sizeof(req->cqe));
>  req->status = NVME_SUCCESS;
>  }
> @@ -2396,6 +2397,7 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest 
> *req)
>  NvmeSQueue *sq;
>  NvmeCQueue *cq;
>  uint16_t qid = le16_to_cpu(c->qid);
> +int i;
>  
>  if (unlikely(!qid || nvme_check_sqid(n, qid))) {
>  trace_pci_nvme_err_invalid_del_sq(qid);
> @@ -2404,12 +2406,18 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest 
> *req)
>  
>  trace_pci_nvme_del_sq(qid);
>  
> -sq = n->sq[qid];
> -while (!QTAILQ_EMPTY(>out_req_list)) {
> -r = QTAILQ_FIRST(>out_req_list);
> -assert(r->aiocb);
> -blk_aio_cancel(r->aiocb);
> +for (i = 1; i <= n->num_namespaces; i++) {
> +NvmeNamespace *ns = nvme_ns(n, i);
> +if (!ns) {
> +continue;
> +}
> +
> +nvme_ns_drain(ns);

If we just drain the entire namespaces here, commands which has nothing
to do with the target sq to be deleted will be drained.  And this might
be a burden for a single SQ deletion.

By the way, agree with the multiple AIOs references problem for newly added
commands.  But, shouldn't we manage the inflight AIO request references for
the newlly added commands with some other way and kill them all
explicitly as it was?  Maybe some of list for AIOCBs?



[RFC PATCH 1/3] hw/block/nvme: set NVME_DNR in a single place

2021-02-10 Thread Minwoo Im
Set NVME_DNR in the CQ entry status field right before writing the CQ
entry: in nvme_post_cqes().  We have put NVME_DNR for all CQ entry
status for all error cases.  This patch is a former patch to support
command retry feature.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c | 192 
 1 file changed, 97 insertions(+), 95 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 93345bf3c1fc..816e0e8e5205 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -270,12 +270,12 @@ static int nvme_aor_check(NvmeNamespace *ns, uint32_t 
act, uint32_t opn)
 if (ns->params.max_active_zones != 0 &&
 ns->nr_active_zones + act > ns->params.max_active_zones) {
 trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones);
-return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR;
+return NVME_ZONE_TOO_MANY_ACTIVE;
 }
 if (ns->params.max_open_zones != 0 &&
 ns->nr_open_zones + opn > ns->params.max_open_zones) {
 trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones);
-return NVME_ZONE_TOO_MANY_OPEN | NVME_DNR;
+return NVME_ZONE_TOO_MANY_OPEN;
 }
 
 return NVME_SUCCESS;
@@ -492,7 +492,7 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, 
QEMUIOVector *iov,
 
 if (cmb || pmr) {
 if (qsg && qsg->sg) {
-return NVME_INVALID_USE_OF_CMB | NVME_DNR;
+return NVME_INVALID_USE_OF_CMB;
 }
 
 assert(iov);
@@ -509,7 +509,7 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, 
QEMUIOVector *iov,
 }
 
 if (iov && iov->iov) {
-return NVME_INVALID_USE_OF_CMB | NVME_DNR;
+return NVME_INVALID_USE_OF_CMB;
 }
 
 assert(qsg);
@@ -568,7 +568,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 if (i == n->max_prp_ents - 1 && len > n->page_size) {
 if (unlikely(prp_ent & (n->page_size - 1))) {
 trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
-return NVME_INVALID_PRP_OFFSET | NVME_DNR;
+return NVME_INVALID_PRP_OFFSET;
 }
 
 i = 0;
@@ -585,7 +585,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 
 if (unlikely(prp_ent & (n->page_size - 1))) {
 trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
-return NVME_INVALID_PRP_OFFSET | NVME_DNR;
+return NVME_INVALID_PRP_OFFSET;
 }
 
 trans_len = MIN(len, n->page_size);
@@ -600,7 +600,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 } else {
 if (unlikely(prp2 & (n->page_size - 1))) {
 trace_pci_nvme_err_invalid_prp2_align(prp2);
-return NVME_INVALID_PRP_OFFSET | NVME_DNR;
+return NVME_INVALID_PRP_OFFSET;
 }
 status = nvme_map_addr(n, qsg, iov, prp2, len);
 if (status) {
@@ -637,9 +637,9 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList 
*qsg,
 break;
 case NVME_SGL_DESCR_TYPE_SEGMENT:
 case NVME_SGL_DESCR_TYPE_LAST_SEGMENT:
-return NVME_INVALID_NUM_SGL_DESCRS | NVME_DNR;
+return NVME_INVALID_NUM_SGL_DESCRS;
 default:
-return NVME_SGL_DESCR_TYPE_INVALID | NVME_DNR;
+return NVME_SGL_DESCR_TYPE_INVALID;
 }
 
 dlen = le32_to_cpu(segment[i].len);
@@ -660,7 +660,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList 
*qsg,
 }
 
 trace_pci_nvme_err_invalid_sgl_excess_length(nvme_cid(req));
-return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
+return NVME_DATA_SGL_LEN_INVALID;
 }
 
 trans_len = MIN(*len, dlen);
@@ -672,7 +672,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList 
*qsg,
 addr = le64_to_cpu(segment[i].addr);
 
 if (UINT64_MAX - addr < dlen) {
-return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
+return NVME_DATA_SGL_LEN_INVALID;
 }
 
 status = nvme_map_addr(n, qsg, iov, addr, trans_len);
@@ -731,7 +731,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, 
QEMUIOVector *iov,
 case NVME_SGL_DESCR_TYPE_LAST_SEGMENT:
 break;
 default:
-return NVME_INVALID_SGL_SEG_DESCR | NVME_DNR;
+return NVME_INVALID_SGL_SEG_DESCR;
 }
 
 seg_len = le32_to_cpu(sgld->len);
@@ -739,11 +739,11 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList 
*qsg, QEMUIOVector *iov,
 /* check the length of the (Last) Segment descriptor */
 if ((!seg_len || seg_len & 0xf) &&
 (NVME_SGL_TYPE(s

[RFC PATCH 3/3] hw/block/nvme: add nvme_inject_state HMP command

2021-02-10 Thread Minwoo Im
nvme_inject_state command is to give a controller state to be.
Human Monitor Interface(HMP) supports users to make controller to a
specified state of:

normal: Normal state (no injection)
cmd-interrupted:Commands will be interrupted internally

This patch is just a start to give dynamic command from the HMP to the
QEMU NVMe device model.  If "cmd-interrupted" state is given, then the
controller will return all the CQ entries with Command Interrupts status
code.

Usage:
-device nvme,id=nvme0,

(qemu) nvme_inject_state nvme0 cmd-interrupted



(qemu) nvme_inject_state nvme0 normal

This feature is required to test Linux kernel NVMe driver for the
command retry feature.

Signed-off-by: Minwoo Im 
---
 hmp-commands.hx   | 13 
 hw/block/nvme.c   | 49 +++
 hw/block/nvme.h   |  8 +++
 include/monitor/hmp.h |  1 +
 4 files changed, 71 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index d4001f9c5dc6..ef288c567b46 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1307,6 +1307,19 @@ SRST
   Inject PCIe AER error
 ERST
 
+{
+.name   = "nvme_inject_state",
+.args_type  = "id:s,state:s",
+.params = "id [normal|cmd-interrupted]",
+.help   = "inject controller/namespace state",
+.cmd= hmp_nvme_inject_state,
+},
+
+SRST
+``nvme_inject_state``
+  Inject NVMe controller/namespace state
+ERST
+
 {
 .name   = "netdev_add",
 .args_type  = "netdev:O",
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6d3c554a0e99..42cf5bd113e6 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -123,6 +123,7 @@
 #include "sysemu/sysemu.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
+#include "qapi/qmp/qdict.h"
 #include "sysemu/hostmem.h"
 #include "sysemu/block-backend.h"
 #include "exec/memory.h"
@@ -132,6 +133,7 @@
 #include "trace.h"
 #include "nvme.h"
 #include "nvme-ns.h"
+#include "monitor/monitor.h"
 
 #define NVME_MAX_IOQPAIRS 0x
 #define NVME_DB_SIZE  4
@@ -966,6 +968,14 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, 
NvmeRequest *req)
 {
 assert(cq->cqid == req->sq->cqid);
 
+/*
+ * Override request status field if controller state has been injected by
+ * the QMP.
+ */
+if (cq->ctrl->state == NVME_STATE_CMD_INTERRUPTED) {
+req->status = NVME_COMMAND_INTERRUPTED;
+}
+
 if (req->status != NVME_SUCCESS) {
 if (cq->ctrl->features.acre && nvme_should_retry(req)) {
 if (cq->ctrl->params.cmd_retry_delay > 0) {
@@ -5025,4 +5035,43 @@ static void nvme_register_types(void)
 type_register_static(_bus_info);
 }
 
+static void nvme_inject_state(NvmeCtrl *n, NvmeState state)
+{
+n->state = state;
+}
+
+static const char *nvme_states[] = {
+[NVME_STATE_NORMAL] = "normal",
+[NVME_STATE_CMD_INTERRUPTED]= "cmd-interrupted",
+};
+
+void hmp_nvme_inject_state(Monitor *mon, const QDict *qdict)
+{
+const char *id = qdict_get_str(qdict, "id");
+const char *state = qdict_get_str(qdict, "state");
+PCIDevice *dev;
+NvmeCtrl *n;
+int ret, i;
+
+ret = pci_qdev_find_device(id, );
+if (ret < 0) {
+monitor_printf(mon, "invalid device id %s\n", id);
+return;
+}
+
+n = NVME(dev);
+
+for (i = 0; i < ARRAY_SIZE(nvme_states); i++) {
+if (!strcmp(nvme_states[i], state)) {
+nvme_inject_state(n, i);
+monitor_printf(mon,
+   "-device nvme,id=%s: state %s injected\n",
+   id, state);
+return;
+}
+}
+
+monitor_printf(mon, "invalid state %s\n", state);
+}
+
 type_init(nvme_register_types)
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 37940b3ac2d2..1af1e0380d9b 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -128,6 +128,11 @@ typedef struct NvmeFeatureVal {
 uint8_t acre;
 } NvmeFeatureVal;
 
+typedef enum NvmeState {
+NVME_STATE_NORMAL,
+NVME_STATE_CMD_INTERRUPTED,
+} NvmeState;
+
 typedef struct NvmeCtrl {
 PCIDeviceparent_obj;
 MemoryRegion bar0;
@@ -185,6 +190,8 @@ typedef struct NvmeCtrl {
 NvmeCQueue  admin_cq;
 NvmeIdCtrl  id_ctrl;
 NvmeFeatureVal  features;
+
+NvmeState   state;
 } NvmeCtrl;
 
 static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
@@ -212,4 +219,5 @@ static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
 
 int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 
+void hmp_nvme_inject_state(Monitor *mon, const QDict *qdict);

[RFC PATCH 2/3] hw/block/nvme: support command retry delay

2021-02-10 Thread Minwoo Im
Set CRDT1(Command Retry Delay Time 1) in the Identify controller data
structure to milliseconds units of 100ms by the given value of
'cmd-retry-delay' parameter which is newly added.  If
cmd-retry-delay=1000, it will be set CRDT1 to 10.  This patch only
considers the CRDT1 without CRDT2 and 3 for the simplicity.

This patch also introduced set/get feature command handler for Host
Behavior feature (16h).  In this feature, ACRE(Advanced Command Retry
Enable) will be set by the host based on the Identify controller data
structure, especially by CRDTs.

If 'cmd-retry-delay' is not given, the default value will be -1 which is
CRDT will not be configured at all and ACRE will not be supported.  In
this case, we just set NVME_DNR to the error CQ entry just like we used
to.  If it's given to positive value, then ACRE will be supported by the
device.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c  | 65 ++--
 hw/block/nvme.h  |  2 ++
 include/block/nvme.h | 13 -
 3 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 816e0e8e5205..6d3c554a0e99 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -23,7 +23,7 @@
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
  *  mdts=,zoned.append_size_limit=, \
- *  subsys= \
+ *  subsys=,cmd-retry-delay= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
  *  subsys=
@@ -71,6 +71,14 @@
  *   data size being in effect. By setting this property to 0, users can make
  *   ZASL to be equal to MDTS. This property only affects zoned namespaces.
  *
+ * - `cmd-retry-delay`
+ *   Command Retry Delay value in unit of millisecond.  This value will be
+ *   reported to the CRDT1(Command Retry Delay Time 1) in Identify Controller
+ *   data structure in 100 milliseconds unit.  If this is not given, DNR(Do Not
+ *   Retry) bit field in the Status field of CQ entry.  If it's given to 0,
+ *   CRD(Command Retry Delay) will be set to 0 which is for retry without
+ *   delay.  Otherwise, it will set to 1 to delay for CRDT1 value.
+ *
  * nvme namespace device parameters
  * 
  * - `subsys`
@@ -154,6 +162,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
 [NVME_WRITE_ATOMICITY]  = true,
 [NVME_ASYNCHRONOUS_EVENT_CONF]  = true,
 [NVME_TIMESTAMP]= true,
+[NVME_HOST_BEHAVIOR_SUPPORT]= true,
 };
 
 static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
@@ -163,6 +172,7 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
 [NVME_NUMBER_OF_QUEUES] = NVME_FEAT_CAP_CHANGE,
 [NVME_ASYNCHRONOUS_EVENT_CONF]  = NVME_FEAT_CAP_CHANGE,
 [NVME_TIMESTAMP]= NVME_FEAT_CAP_CHANGE,
+[NVME_HOST_BEHAVIOR_SUPPORT]= NVME_FEAT_CAP_CHANGE,
 };
 
 static const uint32_t nvme_cse_acs[256] = {
@@ -904,6 +914,16 @@ static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, 
uint32_t len,
 return status;
 }
 
+static inline bool nvme_should_retry(NvmeRequest *req)
+{
+switch (req->status) {
+case NVME_COMMAND_INTERRUPTED:
+return true;
+default:
+return false;
+}
+}
+
 static void nvme_post_cqes(void *opaque)
 {
 NvmeCQueue *cq = opaque;
@@ -947,7 +967,13 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, 
NvmeRequest *req)
 assert(cq->cqid == req->sq->cqid);
 
 if (req->status != NVME_SUCCESS) {
-req->status |= NVME_DNR;
+if (cq->ctrl->features.acre && nvme_should_retry(req)) {
+if (cq->ctrl->params.cmd_retry_delay > 0) {
+req->status |= NVME_CRD_CRDT1;
+}
+} else {
+req->status |= NVME_DNR;
+}
 }
 
 trace_pci_nvme_enqueue_req_completion(nvme_cid(req), cq->cqid,
@@ -3401,6 +3427,16 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, 
NvmeRequest *req)
 DMA_DIRECTION_FROM_DEVICE, req);
 }
 
+static uint16_t nvme_get_feature_host_behavior(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeFeatureHostBehavior data = {};
+
+data.acre = n->features.acre;
+
+return nvme_dma(n, (uint8_t *), sizeof(data),
+DMA_DIRECTION_FROM_DEVICE, req);
+}
+
 static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeCmd *cmd = >cmd;
@@ -3506,6 +3542,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest 
*req)
 goto out;
 case NVME_TIMESTAMP:
 return nvme_get_feature_timestamp(n, req);
+case NVME_HOST_BEHAVIOR_SUPPORT:
+return nvme_get_feature_host_behavior(n, req);
 default:
 break;
 }
@@ -3569,6 +3607,22 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_SUCCESS;
 }
 
+static uint16_t nvme_set_feature_host_behavior(NvmeCtrl *n, NvmeRequest *req)
+{
+  

[RFC PATCH 0/3] support command retry

2021-02-10 Thread Minwoo Im
Hello,

This series is RFC about supporting command retry feature in NVMe device
model.  The background to propose this feature is that in kernel
development and testing, retry scheme has not been able to be covered in
QEMU NVMe model device.  If we are able to control the retry scheme
fromt he device side, it would be nice for kernel developers to test.

We have been putting NVME_DNR in the CQ entry status field for
all error cases.  This series added a control for the command retry
based on the 'cmd-retry-delay' parameter which is newly added.  If it's
given to positive value, Command Retry Delay Time1(CRDT1) in the
Identify Controller data structure will be set in 100msec units.
Accordingly, it will cause host to Set Feature with Host Behavior(0x16)
to enable the Advanced Command Retry Enable(ACRE) feature to support
command retry with defined delay.  If 'cmd-retry-delay' param is given
to 0, then command failures will be retried directly without delay.

This series just considered Command Interrupted status code first which
is mainly about the ACRE feature addition.  nvme_should_retry() helper
will decide command should be retried or not.

But, we don't have any use-cases specified for the Command Interrupted
status code in the device model.  So, I proposed [3/3] patch by adding
'nvme_inject_state' HMP command to make users to give pre-defined state
to the controller device by injecting it via QEMU monitor.

Usage:

  # Configure the nvme0 device to be retried every 1sec(1000msec)
  -device nvme,id=nvme0,cmd-retry-delay=1000,...

  (qemu) nvme_inject_state nvme0 cmd-interrupted
  -device nvme,id=nvme0: state cmd-interrupted injected
  (qemu)

  # Then from now on, controller will interrupt all the commands
  # to be processed with Command Interrupted status code.  Then host
  # will retry based on the delay.

Thanks,

Minwoo Im (3):
  hw/block/nvme: set NVME_DNR in a single place
  hw/block/nvme: support command retry delay
  hw/block/nvme: add nvme_inject_state HMP command

 hmp-commands.hx   |  13 ++
 hw/block/nvme.c   | 304 +-
 hw/block/nvme.h   |  10 ++
 include/block/nvme.h  |  13 +-
 include/monitor/hmp.h |   1 +
 5 files changed, 244 insertions(+), 97 deletions(-)

-- 
2.17.1




[PATCH V2 6/7] hw/block/nvme: support namespace attachment command

2021-02-10 Thread Minwoo Im
This patch supports Namespace Attachment command for the pre-defined
nvme-ns device nodes.  Of course, attach/detach namespace should only be
supported in case 'subsys' is given.  This is because if we detach a
namespace from a controller, somebody needs to manage the detached, but
allocated namespace in the NVMe subsystem.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.h | 10 +++
 hw/block/nvme.c| 59 ++
 hw/block/nvme.h|  5 
 hw/block/trace-events  |  2 ++
 include/block/nvme.h   |  5 
 5 files changed, 81 insertions(+)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 14627f9ccb41..ef4bec928eae 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -30,6 +30,16 @@ typedef struct NvmeSubsystem {
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
 int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
+static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
+uint32_t cntlid)
+{
+if (!subsys) {
+return NULL;
+}
+
+return subsys->ctrls[cntlid];
+}
+
 /*
  * Return allocated namespace of the specified nsid in the subsystem.
  */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 697368a6ae0c..71bcd66f1956 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -183,6 +183,7 @@ static const uint32_t nvme_cse_acs[256] = {
 [NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP,
 };
 
 static const uint32_t nvme_cse_iocs_none[256];
@@ -3766,6 +3767,62 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
+static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
+static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeNamespace *ns;
+NvmeCtrl *ctrl;
+uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {};
+uint32_t nsid = le32_to_cpu(req->cmd.nsid);
+uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
+bool attach = !(dw10 & 0xf);
+uint16_t *nr_ids = [0];
+uint16_t *ids = [1];
+uint16_t ret;
+int i;
+
+trace_pci_nvme_ns_attachment(nvme_cid(req), dw10 & 0xf);
+
+ns = nvme_subsys_ns(n->subsys, nsid);
+if (!ns) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+ret = nvme_dma(n, (uint8_t *)list, 4096,
+   DMA_DIRECTION_TO_DEVICE, req);
+if (ret) {
+return ret;
+}
+
+if (!*nr_ids) {
+return NVME_NS_CTRL_LIST_INVALID | NVME_DNR;
+}
+
+for (i = 0; i < *nr_ids; i++) {
+ctrl = nvme_subsys_ctrl(n->subsys, ids[i]);
+if (!ctrl) {
+return NVME_NS_CTRL_LIST_INVALID | NVME_DNR;
+}
+
+if (attach) {
+if (nvme_ns_is_attached(ctrl, ns)) {
+return NVME_NS_ALREADY_ATTACHED | NVME_DNR;
+}
+
+nvme_ns_attach(ctrl, ns);
+__nvme_select_ns_iocs(ctrl, ns);
+} else {
+if (!nvme_ns_is_attached(ctrl, ns)) {
+return NVME_NS_NOT_ATTACHED | NVME_DNR;
+}
+
+nvme_ns_detach(ctrl, ns);
+}
+}
+
+return NVME_SUCCESS;
+}
+
 static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
 trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode,
@@ -3797,6 +3854,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_get_feature(n, req);
 case NVME_ADM_CMD_ASYNC_EV_REQ:
 return nvme_aer(n, req);
+case NVME_ADM_CMD_NS_ATTACHMENT:
+return nvme_ns_attachment(n, req);
 default:
 assert(false);
 }
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 1c7796b20996..5a1ab857d166 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -222,6 +222,11 @@ static inline void nvme_ns_attach(NvmeCtrl *n, 
NvmeNamespace *ns)
 n->namespaces[nvme_nsid(ns) - 1] = ns;
 }
 
+static inline void nvme_ns_detach(NvmeCtrl *n, NvmeNamespace *ns)
+{
+n->namespaces[nvme_nsid(ns) - 1] = NULL;
+}
+
 static inline NvmeCQueue *nvme_cq(NvmeRequest *req)
 {
 NvmeSQueue *sq = req->sq;
diff --git a/hw/block/trace-events b/hw/block/trace-events
index b6e972d733a6..bf67fe7873d2 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -80,6 +80,8 @@ pci_nvme_aer(uint16_t cid) "cid %"PRIu16""
 pci_nvme_aer_aerl_exceeded(void) "aerl exceeded"
 pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 
0x%"PRIx8""
 pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 
0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
+pci_nvme_ns_attachment(uint16_t cid, uint8_t sel) "cid %"PRIu16", 
sel=0x%"PRIx8""
+pci_nvme_ns_attachm

[PATCH V2 3/7] hw/block/nvme: fix allocated namespace list to 256

2021-02-10 Thread Minwoo Im
Expand allocated namespace list (subsys->namespaces) to have 256 entries
which is a value lager than at least NVME_MAX_NAMESPACES which is for
attached namespace list in a controller.

Allocated namespace list should at least larger than attached namespace
list.

n->num_namespaces = NVME_MAX_NAMESPACES;

The above line will set the NN field by id->nn so that the subsystem
should also prepare at least this number of namespace list entries.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.h | 2 +-
 hw/block/nvme.h| 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 574774390c4c..8a0732b22316 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,7 +14,7 @@
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 #define NVME_SUBSYS_MAX_CTRLS   32
-#define NVME_SUBSYS_MAX_NAMESPACES  32
+#define NVME_SUBSYS_MAX_NAMESPACES  256
 
 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index bde0ed7c2679..1c7796b20996 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -10,6 +10,12 @@
 #define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
 
+/*
+ * Subsystem namespace list for allocated namespaces should be larger than
+ * attached namespace list in a controller.
+ */
+QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_SUBSYS_MAX_NAMESPACES);
+
 typedef struct NvmeParams {
 char *serial;
 uint32_t num_queues; /* deprecated since 5.1 */
-- 
2.17.1




[PATCH V2 5/7] hw/block/nvme: refactor nvme_select_ns_iocs

2021-02-10 Thread Minwoo Im
This patch has no functional changes.  This patch just refactored
nvme_select_ns_iocs() to iterate the attached namespaces of the
controlller and make it invoke __nvme_select_ns_iocs().

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d1761a82731f..697368a6ae0c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3896,6 +3896,25 @@ static void nvme_ctrl_shutdown(NvmeCtrl *n)
 }
 }
 
+static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns)
+{
+ns->iocs = nvme_cse_iocs_none;
+switch (ns->csi) {
+case NVME_CSI_NVM:
+if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
+ns->iocs = nvme_cse_iocs_nvm;
+}
+break;
+case NVME_CSI_ZONED:
+if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
+ns->iocs = nvme_cse_iocs_zoned;
+} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
+ns->iocs = nvme_cse_iocs_nvm;
+}
+break;
+}
+}
+
 static void nvme_select_ns_iocs(NvmeCtrl *n)
 {
 NvmeNamespace *ns;
@@ -3906,21 +3925,8 @@ static void nvme_select_ns_iocs(NvmeCtrl *n)
 if (!ns) {
 continue;
 }
-ns->iocs = nvme_cse_iocs_none;
-switch (ns->csi) {
-case NVME_CSI_NVM:
-if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
-ns->iocs = nvme_cse_iocs_nvm;
-}
-break;
-case NVME_CSI_ZONED:
-if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
-ns->iocs = nvme_cse_iocs_zoned;
-} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
-ns->iocs = nvme_cse_iocs_nvm;
-}
-break;
-}
+
+__nvme_select_ns_iocs(n, ns);
 }
 }
 
-- 
2.17.1




[PATCH V2 2/7] hw/block/nvme: fix namespaces array to 1-based

2021-02-10 Thread Minwoo Im
subsys->namespaces array used to be sized to NVME_SUBSYS_MAX_NAMESPACES.
But subsys->namespaces are being accessed with 1-based namespace id
which means the very first array entry will always be empty(NULL).

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 890d118117dc..574774390c4c 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -24,7 +24,7 @@ typedef struct NvmeSubsystem {
 
 NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 /* Allocated namespaces for this subsystem */
-NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES];
+NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES + 1];
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
-- 
2.17.1




[PATCH V2 7/7] hw/block/nvme: support Identify NS Attached Controller List

2021-02-10 Thread Minwoo Im
Support Identify command for Namespace attached controller list.  This
command handler will traverse the controller instances in the given
subsystem to figure out whether the specified nsid is attached to the
controllers or not.

The 4096bytes Identify data will return with the first entry (16bits)
indicating the number of the controller id entries.  So, the data can
hold up to 2047 entries for the controller ids.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c   | 42 ++
 hw/block/trace-events |  1 +
 include/block/nvme.h  |  1 +
 3 files changed, 44 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 71bcd66f1956..da60335def9f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3157,6 +3157,46 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
NvmeRequest *req, bool active)
 return NVME_INVALID_CMD_SET | NVME_DNR;
 }
 
+static uint16_t nvme_identify_ns_attached_list(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeIdentify *c = (NvmeIdentify *)>cmd;
+uint16_t min_id = le16_to_cpu(c->ctrlid);
+uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {};
+uint16_t *ids = [1];
+NvmeNamespace *ns;
+NvmeCtrl *ctrl;
+int cntlid, nr_ids = 0;
+
+trace_pci_nvme_identify_ns_attached_list(min_id);
+
+if (c->nsid == NVME_NSID_BROADCAST) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+ns = nvme_subsys_ns(n->subsys, c->nsid);
+if (!ns) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+for (cntlid = min_id; cntlid < ARRAY_SIZE(n->subsys->ctrls); cntlid++) {
+ctrl = nvme_subsys_ctrl(n->subsys, cntlid);
+if (!ctrl) {
+continue;
+}
+
+if (!nvme_ns_is_attached(ctrl, ns)) {
+continue;
+}
+
+ids[nr_ids++] = cntlid;
+}
+
+list[0] = nr_ids;
+
+return nvme_dma(n, (uint8_t *)list, sizeof(list),
+DMA_DIRECTION_FROM_DEVICE, req);
+}
+
 static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
 bool active)
 {
@@ -3356,6 +3396,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_identify_ns(n, req, true);
 case NVME_ID_CNS_NS_PRESENT:
 return nvme_identify_ns(n, req, false);
+case NVME_ID_CNS_NS_ATTACHED_CTRL_LIST:
+return nvme_identify_ns_attached_list(n, req);
 case NVME_ID_CNS_CS_NS:
 return nvme_identify_ns_csi(n, req, true);
 case NVME_ID_CNS_CS_NS_PRESENT:
diff --git a/hw/block/trace-events b/hw/block/trace-events
index bf67fe7873d2..2d88d96c2165 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -62,6 +62,7 @@ pci_nvme_del_cq(uint16_t cqid) "deleted completion queue, 
cqid=%"PRIu16""
 pci_nvme_identify_ctrl(void) "identify controller"
 pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8""
 pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32""
+pci_nvme_identify_ns_attached_list(uint16_t cntid) "cntid=%"PRIu16""
 pci_nvme_identify_ns_csi(uint32_t ns, uint8_t csi) "nsid=%"PRIu32", 
csi=0x%"PRIx8""
 pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_nslist_csi(uint16_t ns, uint8_t csi) "nsid=%"PRIu16", 
csi=0x%"PRIx8""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 4b016f954fee..fb82d8682e9f 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -968,6 +968,7 @@ enum NvmeIdCns {
 NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07,
 NVME_ID_CNS_NS_PRESENT_LIST   = 0x10,
 NVME_ID_CNS_NS_PRESENT= 0x11,
+NVME_ID_CNS_NS_ATTACHED_CTRL_LIST = 0x12,
 NVME_ID_CNS_CS_NS_PRESENT_LIST= 0x1a,
 NVME_ID_CNS_CS_NS_PRESENT = 0x1b,
 NVME_ID_CNS_IO_COMMAND_SET= 0x1c,
-- 
2.17.1




[PATCH V2 4/7] hw/block/nvme: support allocated namespace type

2021-02-10 Thread Minwoo Im
>From NVMe spec 1.4b "6.1.5. NSID and Namespace Relationships" defines
valid namespace types:

- Unallocated: Not exists in the NVMe subsystem
- Allocated: Exists in the NVMe subsystem
- Inactive: Not attached to the controller
- Active: Attached to the controller

This patch added support for allocated, but not attached namespace type:

!nvme_ns(n, nsid) && nvme_subsys_ns(n->subsys, nsid)

nvme_ns() returns attached namespace instance of the given controller
and nvme_subsys_ns() returns allocated namespace instance in the
subsystem.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.h | 13 +
 hw/block/nvme.c| 63 +++---
 2 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 8a0732b22316..14627f9ccb41 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -30,4 +30,17 @@ typedef struct NvmeSubsystem {
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
 int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
+/*
+ * Return allocated namespace of the specified nsid in the subsystem.
+ */
+static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem *subsys,
+uint32_t nsid)
+{
+if (!subsys) {
+return NULL;
+}
+
+return subsys->namespaces[nsid];
+}
+
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a1e930f7c8e4..d1761a82731f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3124,7 +3124,7 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)>cmd;
@@ -3138,7 +3138,14 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
NvmeRequest *req)
 
 ns = nvme_ns(n, nsid);
 if (unlikely(!ns)) {
-return nvme_rpt_empty_id_struct(n, req);
+if (!active) {
+ns = nvme_subsys_ns(n->subsys, nsid);
+if (!ns) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+} else {
+return nvme_rpt_empty_id_struct(n, req);
+}
 }
 
 if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
@@ -3149,7 +3156,8 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_INVALID_CMD_SET | NVME_DNR;
 }
 
-static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
+bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)>cmd;
@@ -3163,7 +3171,14 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, 
NvmeRequest *req)
 
 ns = nvme_ns(n, nsid);
 if (unlikely(!ns)) {
-return nvme_rpt_empty_id_struct(n, req);
+if (!active) {
+ns = nvme_subsys_ns(n->subsys, nsid);
+if (!ns) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+} else {
+return nvme_rpt_empty_id_struct(n, req);
+}
 }
 
 if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
@@ -3176,7 +3191,8 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req,
+bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)>cmd;
@@ -3201,7 +3217,14 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 for (i = 1; i <= n->num_namespaces; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
-continue;
+if (!active) {
+ns = nvme_subsys_ns(n->subsys, i);
+if (!ns) {
+continue;
+}
+} else {
+continue;
+}
 }
 if (ns->params.nsid <= min_nsid) {
 continue;
@@ -3215,7 +3238,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req,
+bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)>cmd;
@@ -3241,7 +3265,14 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, 
NvmeRequest *req)
 for (i = 1; i <= n->num_namespaces; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
-continue;
+if (!active) {
+ns =

[PATCH V2 1/7] hw/block/nvme: support namespace detach

2021-02-10 Thread Minwoo Im
Given that now we have nvme-subsys device supported, we can manage
namespace allocated, but not attached: detached.  This patch introduced
a parameter for nvme-ns device named 'detached'.  This parameter
indicates whether the given namespace device is detached from
a entire NVMe subsystem('subsys' given case, shared namespace) or a
controller('bus' given case, private namespace).

- Allocated namespace

  1) Shared ns in the subsystem 'subsys0':

 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,subsys=subsys0,detached=true

  2) Private ns for the controller 'nvme0' of the subsystem 'subsys0':

 -device nvme-subsys,id=subsys0
 -device nvme,serial=foo,id=nvme0,subsys=subsys0
 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true

  3) (Invalid case) Controller 'nvme0' has no subsystem to manage ns:

 -device nvme,serial=foo,id=nvme0
 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-ns.c |  1 +
 hw/block/nvme-ns.h |  1 +
 hw/block/nvme-subsys.h |  1 +
 hw/block/nvme.c| 41 +++--
 hw/block/nvme.h| 22 ++
 5 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index c3b513b0fc78..cdcb81319fb5 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -393,6 +393,7 @@ static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
  NvmeSubsystem *),
+DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 7af6884862b5..b0c00e115d81 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -26,6 +26,7 @@ typedef struct NvmeZone {
 } NvmeZone;
 
 typedef struct NvmeNamespaceParams {
+bool detached;
 uint32_t nsid;
 QemuUUID uuid;
 
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index ccf6a71398d3..890d118117dc 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -23,6 +23,7 @@ typedef struct NvmeSubsystem {
 uint8_t subnqn[256];
 
 NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
+/* Allocated namespaces for this subsystem */
 NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES];
 } NvmeSubsystem;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b84e34843f5..a1e930f7c8e4 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -23,7 +23,7 @@
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
  *  mdts=,zoned.append_size_limit=, \
- *  subsys= \
+ *  subsys=,detached=
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
  *  subsys=
@@ -78,6 +78,13 @@
  *   controllers in the subsystem. Otherwise, `bus` must be given to attach
  *   this namespace to a specified single controller as a non-shared namespace.
  *
+ * - `detached`
+ *   Not to attach the namespace device to controllers in the NVMe subsystem
+ *   during boot-up. If not given, namespaces are all attahced to all
+ *   controllers in the subsystem by default.
+ *   It's mutual exclusive with 'bus' parameter. It's only valid in case
+ *   `subsys` is provided.
+ *
  * Setting `zoned` to true selects Zoned Command Set at the namespace.
  * In this case, the following namespace properties are available to configure
  * zoned operation:
@@ -4521,6 +4528,20 @@ static void nvme_init_state(NvmeCtrl *n)
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
 }
 
+static int nvme_attach_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+{
+if (nvme_ns_is_attached(n, ns)) {
+error_setg(errp,
+   "namespace %d is already attached to controller %d",
+   nvme_nsid(ns), n->cntlid);
+return -1;
+}
+
+nvme_ns_attach(n, ns);
+
+return 0;
+}
+
 int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
 {
 uint32_t nsid = nvme_nsid(ns);
@@ -4552,7 +4573,23 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace 
*ns, Error **errp)
 
 trace_pci_nvme_register_namespace(nsid);
 
-n->namespaces[nsid - 1] = ns;
+/*
+ * If subsys is not given, namespae is always attached to the controller
+ * because there's no subsystem to manage namespace allocation.
+ */
+if (!n->subsys) {
+if (ns->params.detached) {
+error_setg(errp,
+   "detached needs nvme-subsys specified nvme or nvme-ns");
+return -1;
+}
+
+return nvme_attach_namesp

[PATCH V2 0/6] hw/block/nvme: support namespace attachment

2021-02-10 Thread Minwoo Im
  FW Rev  
  -  
 - -- 
 
  /dev/nvme0n1  foo  QEMU NVMe Ctrl 
  1 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme0n2  foo  QEMU NVMe Ctrl 
  2 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme1n2  bar  QEMU NVMe Ctrl 
  2 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme1n3  bar  QEMU NVMe Ctrl 
  3 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  root@vm:~/work# nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=1
  attach-ns: Success, nsid:1
  root@vm:~/work# echo 1 > /sys/class/nvme/nvme1/rescan_controller 
  root@vm:~/work# nvme list
  Node  SN   Model  
  Namespace Usage  Format   FW Rev  
  -  
 - -- 
 
  /dev/nvme0n1  foo  QEMU NVMe Ctrl 
  1 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme0n2  foo  QEMU NVMe Ctrl 
  2 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme1n1  bar  QEMU NVMe Ctrl 
  1 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme1n2  bar  QEMU NVMe Ctrl 
  2 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme1n3  bar  QEMU NVMe Ctrl 
  3 268.44  MB / 268.44  MB512   B +  0 B   1.0 

Minwoo Im (7):
  hw/block/nvme: support namespace detach
  hw/block/nvme: fix namespaces array to 1-based
  hw/block/nvme: fix allocated namespace list to 256
  hw/block/nvme: support allocated namespace type
  hw/block/nvme: refactor nvme_select_ns_iocs
  hw/block/nvme: support namespace attachment command
  hw/block/nvme: support Identify NS Attached Controller List

 hw/block/nvme-ns.c |   1 +
 hw/block/nvme-ns.h |   1 +
 hw/block/nvme-subsys.h |  28 -
 hw/block/nvme.c| 241 +++--
 hw/block/nvme.h|  33 ++
 hw/block/trace-events  |   3 +
 include/block/nvme.h   |   6 +
 7 files changed, 278 insertions(+), 35 deletions(-)

-- 
2.17.1




Re: [PATCH 2/2] hw/block/nvme: add write uncorrectable command

2021-02-10 Thread Minwoo Im
> > It might be nitpick, 'nlb' would easily represent the value which is
> > defined itself in the spec which is zero-based.  Can we have this like:
> > 
> > uint32_t nlb = le16_to_cpu(rw->nlb);
> > 
> > bitmap_clear(ns->uncorrectable, slba, nlb + 1);
> > 
> 
> 
> I do not disagree, but the `uint32_t nlb = le16_to_cpu(rw->nlb) + 1;`
> pattern is already used in several places.

Oh yes, Now I just saw some places.  Then, please take my review tag for
this patch.

Thanks!



Re: [PATCH 2/2] hw/block/nvme: add write uncorrectable command

2021-02-10 Thread Minwoo Im
On 21-02-10 08:06:46, Klaus Jensen wrote:
> From: Gollu Appalanaidu 
> 
> Add support for marking blocks invalid with the Write Uncorrectable
> command. Block status is tracked in a (non-persistent) bitmap that is
> checked on all reads and written to on all writes. This is potentially
> expensive, so keep Write Uncorrectable disabled by default.
> 
> Signed-off-by: Gollu Appalanaidu 
> Signed-off-by: Klaus Jensen 
> ---
>  docs/specs/nvme.txt   |  3 ++
>  hw/block/nvme-ns.h|  2 ++
>  hw/block/nvme.h   |  1 +
>  hw/block/nvme-ns.c|  2 ++
>  hw/block/nvme.c   | 65 +--
>  hw/block/trace-events |  1 +
>  6 files changed, 66 insertions(+), 8 deletions(-)
> 
> diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
> index 56d393884e7a..88f9cc278d4c 100644
> --- a/docs/specs/nvme.txt
> +++ b/docs/specs/nvme.txt
> @@ -19,5 +19,8 @@ Known issues
>  
>  * The accounting numbers in the SMART/Health are reset across power cycles
>  
> +* Marking blocks invalid with the Write Uncorrectable is not persisted across
> +  power cycles.
> +
>  * Interrupt Coalescing is not supported and is disabled by default in 
> volation
>of the specification.
> diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> index 7af6884862b5..15fa422ded03 100644
> --- a/hw/block/nvme-ns.h
> +++ b/hw/block/nvme-ns.h
> @@ -72,6 +72,8 @@ typedef struct NvmeNamespace {
>  struct {
>  uint32_t err_rec;
>  } features;
> +
> +unsigned long *uncorrectable;
>  } NvmeNamespace;
>  
>  static inline uint32_t nvme_nsid(NvmeNamespace *ns)
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 98082b2dfba3..9b8f85b9cf16 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -68,6 +68,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
>  case NVME_CMD_FLUSH:return "NVME_NVM_CMD_FLUSH";
>  case NVME_CMD_WRITE:return "NVME_NVM_CMD_WRITE";
>  case NVME_CMD_READ: return "NVME_NVM_CMD_READ";
> +case NVME_CMD_WRITE_UNCOR:  return "NVME_CMD_WRITE_UNCOR";
>  case NVME_CMD_COMPARE:  return "NVME_NVM_CMD_COMPARE";
>  case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
>  case NVME_CMD_DSM:  return "NVME_NVM_CMD_DSM";
> diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> index ade46e2f3739..742bbc4b4b62 100644
> --- a/hw/block/nvme-ns.c
> +++ b/hw/block/nvme-ns.c
> @@ -72,6 +72,8 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
>  id_ns->mcl = cpu_to_le32(ns->params.mcl);
>  id_ns->msrc = ns->params.msrc;
>  
> +ns->uncorrectable = bitmap_new(id_ns->nsze);
> +
>  return 0;
>  }
>  
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index e5f725d7..56048046c193 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1112,6 +1112,20 @@ static uint16_t nvme_check_dulbe(NvmeNamespace *ns, 
> uint64_t slba,
>  return NVME_SUCCESS;
>  }
>  
> +static inline uint16_t nvme_check_uncor(NvmeNamespace *ns, uint64_t slba,
> +uint32_t nlb)
> +{
> +uint64_t elba = nlb + slba;
> +
> +if (ns->uncorrectable) {
> +if (find_next_bit(ns->uncorrectable, elba, slba) < elba) {
> +return NVME_UNRECOVERED_READ | NVME_DNR;
> +}
> +}
> +
> +return NVME_SUCCESS;
> +}
> +
>  static void nvme_aio_err(NvmeRequest *req, int ret)
>  {
>  uint16_t status = NVME_SUCCESS;
> @@ -1423,14 +1437,24 @@ static void nvme_rw_cb(void *opaque, int ret)
>  BlockAcctCookie *acct = >acct;
>  BlockAcctStats *stats = blk_get_stats(blk);
>  
> +bool is_write = nvme_is_write(req);
> +
>  trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk));
>  
> -if (ns->params.zoned && nvme_is_write(req)) {
> +if (ns->params.zoned && is_write) {
>  nvme_finalize_zoned_write(ns, req);
>  }
>  
>  if (!ret) {
>  block_acct_done(stats, acct);
> +
> +if (is_write) {
> +NvmeRwCmd *rw = (NvmeRwCmd *)>cmd;
> +    uint64_t slba = le64_to_cpu(rw->slba);
> +uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
> +
> +bitmap_clear(ns->uncorrectable, slba, nlb);

It might be nitpick, 'nlb' would easily represent the value which is
defined itself in the spec which is zero-based.  Can we have this like:

uint32_t nlb = le16_to_cpu(rw->nlb);

bitmap_clear(ns->uncorrectable, slba, nlb + 1);

Otherwise, it looks good to me.

Reviewed-by: Minwoo Im 



Re: [PATCH 1/2] hw/block/nvme: add oncs device parameter

2021-02-10 Thread Minwoo Im
On 21-02-10 08:06:45, Klaus Jensen wrote:
> From: Gollu Appalanaidu 
> 
> Add the 'oncs' nvme device parameter to allow optional features to be
> enabled/disabled explicitly. Since most of these are optional commands,
> make the CSE log pages dynamic to account for the value of ONCS.
> 
> Signed-off-by: Gollu Appalanaidu 
> Signed-off-by: Klaus Jensen 

Reviewed-by: Minwoo Im 



Re: [PATCH v2] hw/block: nvme: Fix a build error in nvme_get_feature()

2021-02-10 Thread Minwoo Im
On 21-02-10 18:23:17, Bin Meng wrote:
> From: Bin Meng 
> 
> Current QEMU HEAD nvme.c does not compile:
> 
>   hw/block/nvme.c:3242:9: error: ‘result’ may be used uninitialized in this 
> function [-Werror=maybe-uninitialized]
>  trace_pci_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
>  ^
>   hw/block/nvme.c:3150:14: note: ‘result’ was declared here
>  uint32_t result;
>   ^
> 
> Explicitly initialize the result to fix it.
> 
> Fixes: aa5e55e3b07e ("hw/block/nvme: open code for volatile write cache")
> Signed-off-by: Bin Meng 

Bin,

Thanks for the fix!



Re: [PATCH RFC v2 3/8] hw/block/nvme: fix strerror printing

2021-02-08 Thread Minwoo Im
On 21-02-07 22:49:35, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Fix missing sign inversion.
> 
> Signed-off-by: Klaus Jensen 

Reviewed-by: Minwoo Im 



Re: [PATCH RFC v2 2/8] hw/block/nvme: remove block accounting for write zeroes

2021-02-08 Thread Minwoo Im
On 21-02-07 22:49:34, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> A Write Zeroes commands should not be counted in either the 'Data Units
> Written' or in 'Host Write Commands' SMART/Health Information Log page.
> 
> Signed-off-by: Klaus Jensen 

Reviewed-by: Minwoo Im 



Re: [PATCH RFC v2 1/8] hw/block/nvme: remove redundant len member in compare context

2021-02-08 Thread Minwoo Im
On 21-02-07 22:49:33, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> The 'len' member of the nvme_compare_ctx struct is redundant since the
> same information is available in the 'iov' member.
> 
> Signed-off-by: Klaus Jensen 

Reviewed-by: Minwoo Im 



[PATCH 6/6] hw/block/nvme: support namespace attachment command

2021-02-05 Thread Minwoo Im
This patch supports Namespace Attachment command for the pre-defined
nvme-ns device nodes.  Of course, attach/detach namespace should only be
supported in case 'subsys' is given.  This is because if we detach a
namespace from a controller, somebody needs to manage the detached, but
allocated namespace in the NVMe subsystem.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.h | 10 +++
 hw/block/nvme.c| 59 ++
 hw/block/nvme.h|  5 
 hw/block/trace-events  |  2 ++
 include/block/nvme.h   |  5 
 5 files changed, 81 insertions(+)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 14627f9ccb41..ef4bec928eae 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -30,6 +30,16 @@ typedef struct NvmeSubsystem {
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
 int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
+static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
+uint32_t cntlid)
+{
+if (!subsys) {
+return NULL;
+}
+
+return subsys->ctrls[cntlid];
+}
+
 /*
  * Return allocated namespace of the specified nsid in the subsystem.
  */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 697368a6ae0c..769436722c7e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -183,6 +183,7 @@ static const uint32_t nvme_cse_acs[256] = {
 [NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP,
 };
 
 static const uint32_t nvme_cse_iocs_none[256];
@@ -3766,6 +3767,62 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
+static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
+static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeNamespace *ns;
+NvmeCtrl *ctrl;
+uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {};
+uint32_t nsid = le32_to_cpu(req->cmd.nsid);
+uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
+bool attach = !(dw10 & 0xf);
+uint16_t *nr_ids = [0];
+uint16_t *ids = [1];
+uint16_t ret;
+int i;
+
+trace_pci_nvme_ns_attachment(nvme_cid(req), dw10 & 0xf);
+
+ns = nvme_subsys_ns(n->subsys, nsid);
+if (!ns) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+ret = nvme_dma(n, (uint8_t *)list, 4096,
+   DMA_DIRECTION_TO_DEVICE, req);
+if (ret) {
+return ret;
+}
+
+if (!*nr_ids) {
+return NVME_NS_CTRL_LIST_INVALID | NVME_DNR;
+}
+
+for (i = 0; i < *nr_ids; i++) {
+ctrl = nvme_subsys_ctrl(n->subsys, ids[i]);
+if (!ctrl) {
+return NVME_NS_CTRL_LIST_INVALID | NVME_DNR;
+}
+
+if (attach) {
+if (nvme_ns_is_attached(ctrl, ns)) {
+return NVME_NS_ALREADY_ATTACHED | NVME_DNR;
+}
+
+nvme_ns_attach(n, ns);
+__nvme_select_ns_iocs(n, ns);
+} else {
+if (!nvme_ns_is_attached(ctrl, ns)) {
+return NVME_NS_NOT_ATTACHED | NVME_DNR;
+}
+
+nvme_ns_detach(n, ns);
+}
+}
+
+return NVME_SUCCESS;
+}
+
 static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
 trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode,
@@ -3797,6 +3854,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_get_feature(n, req);
 case NVME_ADM_CMD_ASYNC_EV_REQ:
 return nvme_aer(n, req);
+case NVME_ADM_CMD_NS_ATTACHMENT:
+return nvme_ns_attachment(n, req);
 default:
 assert(false);
 }
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 1c7796b20996..5a1ab857d166 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -222,6 +222,11 @@ static inline void nvme_ns_attach(NvmeCtrl *n, 
NvmeNamespace *ns)
 n->namespaces[nvme_nsid(ns) - 1] = ns;
 }
 
+static inline void nvme_ns_detach(NvmeCtrl *n, NvmeNamespace *ns)
+{
+n->namespaces[nvme_nsid(ns) - 1] = NULL;
+}
+
 static inline NvmeCQueue *nvme_cq(NvmeRequest *req)
 {
 NvmeSQueue *sq = req->sq;
diff --git a/hw/block/trace-events b/hw/block/trace-events
index b6e972d733a6..bf67fe7873d2 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -80,6 +80,8 @@ pci_nvme_aer(uint16_t cid) "cid %"PRIu16""
 pci_nvme_aer_aerl_exceeded(void) "aerl exceeded"
 pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 
0x%"PRIx8""
 pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 
0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
+pci_nvme_ns_attachment(uint16_t cid, uint8_t sel) "cid %"PRIu16", 
sel=0x%"PRIx8""
+pci_nvme_ns_attachment_attac

[PATCH 5/6] hw/block/nvme: refactor nvme_select_ns_iocs

2021-02-05 Thread Minwoo Im
This patch has no functional changes.  This patch just refactored
nvme_select_ns_iocs() to iterate the attached namespaces of the
controlller and make it invoke __nvme_select_ns_iocs().

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d1761a82731f..697368a6ae0c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3896,6 +3896,25 @@ static void nvme_ctrl_shutdown(NvmeCtrl *n)
 }
 }
 
+static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns)
+{
+ns->iocs = nvme_cse_iocs_none;
+switch (ns->csi) {
+case NVME_CSI_NVM:
+if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
+ns->iocs = nvme_cse_iocs_nvm;
+}
+break;
+case NVME_CSI_ZONED:
+if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
+ns->iocs = nvme_cse_iocs_zoned;
+} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
+ns->iocs = nvme_cse_iocs_nvm;
+}
+break;
+}
+}
+
 static void nvme_select_ns_iocs(NvmeCtrl *n)
 {
 NvmeNamespace *ns;
@@ -3906,21 +3925,8 @@ static void nvme_select_ns_iocs(NvmeCtrl *n)
 if (!ns) {
 continue;
 }
-ns->iocs = nvme_cse_iocs_none;
-switch (ns->csi) {
-case NVME_CSI_NVM:
-if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
-ns->iocs = nvme_cse_iocs_nvm;
-}
-break;
-case NVME_CSI_ZONED:
-if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
-ns->iocs = nvme_cse_iocs_zoned;
-} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
-ns->iocs = nvme_cse_iocs_nvm;
-}
-break;
-}
+
+__nvme_select_ns_iocs(n, ns);
 }
 }
 
-- 
2.17.1




[PATCH 4/6] hw/block/nvme: support allocated namespace type

2021-02-05 Thread Minwoo Im
>From NVMe spec 1.4b "6.1.5. NSID and Namespace Relationships" defines
valid namespace types:

- Unallocated: Not exists in the NVMe subsystem
- Allocated: Exists in the NVMe subsystem
- Inactive: Not attached to the controller
- Active: Attached to the controller

This patch added support for allocated, but not attached namespace type:

!nvme_ns(n, nsid) && nvme_subsys_ns(n->subsys, nsid)

nvme_ns() returns attached namespace instance of the given controller
and nvme_subsys_ns() returns allocated namespace instance in the
subsystem.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.h | 13 +
 hw/block/nvme.c| 63 +++---
 2 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 8a0732b22316..14627f9ccb41 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -30,4 +30,17 @@ typedef struct NvmeSubsystem {
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
 int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
+/*
+ * Return allocated namespace of the specified nsid in the subsystem.
+ */
+static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem *subsys,
+uint32_t nsid)
+{
+if (!subsys) {
+return NULL;
+}
+
+return subsys->namespaces[nsid];
+}
+
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a1e930f7c8e4..d1761a82731f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3124,7 +3124,7 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)>cmd;
@@ -3138,7 +3138,14 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
NvmeRequest *req)
 
 ns = nvme_ns(n, nsid);
 if (unlikely(!ns)) {
-return nvme_rpt_empty_id_struct(n, req);
+if (!active) {
+ns = nvme_subsys_ns(n->subsys, nsid);
+if (!ns) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+} else {
+return nvme_rpt_empty_id_struct(n, req);
+}
 }
 
 if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
@@ -3149,7 +3156,8 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_INVALID_CMD_SET | NVME_DNR;
 }
 
-static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
+bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)>cmd;
@@ -3163,7 +3171,14 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, 
NvmeRequest *req)
 
 ns = nvme_ns(n, nsid);
 if (unlikely(!ns)) {
-return nvme_rpt_empty_id_struct(n, req);
+if (!active) {
+ns = nvme_subsys_ns(n->subsys, nsid);
+if (!ns) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+} else {
+return nvme_rpt_empty_id_struct(n, req);
+}
 }
 
 if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
@@ -3176,7 +3191,8 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req,
+bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)>cmd;
@@ -3201,7 +3217,14 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 for (i = 1; i <= n->num_namespaces; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
-continue;
+if (!active) {
+ns = nvme_subsys_ns(n->subsys, i);
+if (!ns) {
+continue;
+}
+} else {
+continue;
+}
 }
 if (ns->params.nsid <= min_nsid) {
 continue;
@@ -3215,7 +3238,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req,
+bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)>cmd;
@@ -3241,7 +3265,14 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, 
NvmeRequest *req)
 for (i = 1; i <= n->num_namespaces; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
-continue;
+if (!active) {
+ns =

[PATCH 3/6] hw/block/nvme: fix allocated namespace list to 256

2021-02-05 Thread Minwoo Im
Expand allocated namespace list (subsys->namespaces) to have 256 entries
which is a value lager than at least NVME_MAX_NAMESPACES which is for
attached namespace list in a controller.

Allocated namespace list should at least larger than attached namespace
list.

n->num_namespaces = NVME_MAX_NAMESPACES;

The above line will set the NN field by id->nn so that the subsystem
should also prepare at least this number of namespace list entries.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.h | 2 +-
 hw/block/nvme.h| 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 574774390c4c..8a0732b22316 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,7 +14,7 @@
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 #define NVME_SUBSYS_MAX_CTRLS   32
-#define NVME_SUBSYS_MAX_NAMESPACES  32
+#define NVME_SUBSYS_MAX_NAMESPACES  256
 
 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index bde0ed7c2679..1c7796b20996 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -10,6 +10,12 @@
 #define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
 
+/*
+ * Subsystem namespace list for allocated namespaces should be larger than
+ * attached namespace list in a controller.
+ */
+QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_SUBSYS_MAX_NAMESPACES);
+
 typedef struct NvmeParams {
 char *serial;
 uint32_t num_queues; /* deprecated since 5.1 */
-- 
2.17.1




[PATCH 1/6] hw/block/nvme: support namespace detach

2021-02-05 Thread Minwoo Im
Given that now we have nvme-subsys device supported, we can manage
namespace allocated, but not attached: detached.  This patch introduced
a parameter for nvme-ns device named 'detached'.  This parameter
indicates whether the given namespace device is detached from
a entire NVMe subsystem('subsys' given case, shared namespace) or a
controller('bus' given case, private namespace).

- Allocated namespace

  1) Shared ns in the subsystem 'subsys0':

 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,subsys=subsys0,detached=true

  2) Private ns for the controller 'nvme0' of the subsystem 'subsys0':

 -device nvme-subsys,id=subsys0
 -device nvme,serial=foo,id=nvme0,subsys=subsys0
 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true

  3) (Invalid case) Controller 'nvme0' has no subsystem to manage ns:

 -device nvme,serial=foo,id=nvme0
 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-ns.c |  1 +
 hw/block/nvme-ns.h |  1 +
 hw/block/nvme-subsys.h |  1 +
 hw/block/nvme.c| 41 +++--
 hw/block/nvme.h| 22 ++
 5 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index c3b513b0fc78..cdcb81319fb5 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -393,6 +393,7 @@ static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
  NvmeSubsystem *),
+DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 7af6884862b5..b0c00e115d81 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -26,6 +26,7 @@ typedef struct NvmeZone {
 } NvmeZone;
 
 typedef struct NvmeNamespaceParams {
+bool detached;
 uint32_t nsid;
 QemuUUID uuid;
 
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index ccf6a71398d3..890d118117dc 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -23,6 +23,7 @@ typedef struct NvmeSubsystem {
 uint8_t subnqn[256];
 
 NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
+/* Allocated namespaces for this subsystem */
 NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES];
 } NvmeSubsystem;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b84e34843f5..a1e930f7c8e4 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -23,7 +23,7 @@
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
  *  mdts=,zoned.append_size_limit=, \
- *  subsys= \
+ *  subsys=,detached=
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
  *  subsys=
@@ -78,6 +78,13 @@
  *   controllers in the subsystem. Otherwise, `bus` must be given to attach
  *   this namespace to a specified single controller as a non-shared namespace.
  *
+ * - `detached`
+ *   Not to attach the namespace device to controllers in the NVMe subsystem
+ *   during boot-up. If not given, namespaces are all attahced to all
+ *   controllers in the subsystem by default.
+ *   It's mutual exclusive with 'bus' parameter. It's only valid in case
+ *   `subsys` is provided.
+ *
  * Setting `zoned` to true selects Zoned Command Set at the namespace.
  * In this case, the following namespace properties are available to configure
  * zoned operation:
@@ -4521,6 +4528,20 @@ static void nvme_init_state(NvmeCtrl *n)
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
 }
 
+static int nvme_attach_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+{
+if (nvme_ns_is_attached(n, ns)) {
+error_setg(errp,
+   "namespace %d is already attached to controller %d",
+   nvme_nsid(ns), n->cntlid);
+return -1;
+}
+
+nvme_ns_attach(n, ns);
+
+return 0;
+}
+
 int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
 {
 uint32_t nsid = nvme_nsid(ns);
@@ -4552,7 +4573,23 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace 
*ns, Error **errp)
 
 trace_pci_nvme_register_namespace(nsid);
 
-n->namespaces[nsid - 1] = ns;
+/*
+ * If subsys is not given, namespae is always attached to the controller
+ * because there's no subsystem to manage namespace allocation.
+ */
+if (!n->subsys) {
+if (ns->params.detached) {
+error_setg(errp,
+   "detached needs nvme-subsys specified nvme or nvme-ns");
+return -1;
+}
+
+return nvme_attach_namesp

[PATCH 2/6] hw/block/nvme: fix namespaces array to 1-based

2021-02-05 Thread Minwoo Im
subsys->namespaces array used to be sized to NVME_SUBSYS_MAX_NAMESPACES.
But subsys->namespaces are being accessed with 1-based namespace id
which means the very first array entry will always be empty(NULL).

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 890d118117dc..574774390c4c 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -24,7 +24,7 @@ typedef struct NvmeSubsystem {
 
 NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 /* Allocated namespaces for this subsystem */
-NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES];
+NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES + 1];
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
-- 
2.17.1




[PATCH 0/6] hw/block/nvme: support namespace attachment

2021-02-05 Thread Minwoo Im
Hello,

This series supports namespace attachment: attach and detach.  It means
that this series also introduced a scheme for allocated namespace which
is detached, but allocated in a NVMe subsystem.  Given that now we have
nvme-subsys device to specify a NVMe subsystem, it can manage detached
namespaces from controllers in the subsystem itself.

Tested:

  -device nvme-subsys,id=subsys0 \  
  
  -device nvme,serial=foo,id=nvme0,subsys=subsys0 \ 
  
  -device nvme-ns,id=ns1,drive=drv0,nsid=1,subsys=subsys0,zoned=false \ 
 
  -device nvme-ns,id=ns2,drive=drv1,nsid=2,subsys=subsys0,zoned=true \  
 
  -device 
nvme-ns,id=ns3,drive=drv2,nsid=3,subsys=subsys0,detached=true,zoned=false \
  -device 
nvme-ns,id=ns4,drive=drv3,nsid=4,subsys=subsys0,detached=true,zoned=true \ 

  root@vm:~# nvme list
  Node  SN   Model  
  Namespace Usage  Format   FW Rev
  -  
 - -- 
 
  /dev/nvme0n1  foo  QEMU NVMe Ctrl 
  1 268.44  MB / 268.44  MB512   B +  0 B   1.0
  /dev/nvme0n2  foo  QEMU NVMe Ctrl 
  2 268.44  MB / 268.44  MB512   B +  0 B   1.0

  root@vm:~# nvme attach-ns /dev/nvme0 --namespace-id=3 --controllers=0
  attach-ns: Success, nsid:3
  root@vm:~# nvme attach-ns /dev/nvme0 --namespace-id=4 --controllers=0
  attach-ns: Success, nsid:4
  root@vm:~# echo 1 > /sys/class/nvme/nvme0/rescan_controller

  root@vm:~# nvme list
  Node  SN   Model  
  Namespace Usage  Format   FW Rev  
  -  
 - -- 
 
  /dev/nvme0n1  foo  QEMU NVMe Ctrl 
  1 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme0n2  foo  QEMU NVMe Ctrl 
  2 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme0n3  foo  QEMU NVMe Ctrl 
  3 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme0n4  foo  QEMU NVMe Ctrl 
  4 268.44  MB / 268.44  MB512   B +  0 B   1.0 

  root@vm:~# nvme detach-ns /dev/nvme0 --namespace-id=3 --controllers=0
  detach-ns: Success, nsid:3
  root@vm:~# nvme detach-ns /dev/nvme0 --namespace-id=4 --controllers=0
  detach-ns: Success, nsid:4
  root@vm:~# echo 1 > /sys/class/nvme/nvme0/rescan_controller

  root@vm:~# nvme list
  Node  SN   Model  
  Namespace Usage  Format   FW Rev  
  -  
 - -- 
 
  /dev/nvme0n1  foo  QEMU NVMe Ctrl 
  1 268.44  MB / 268.44  MB512   B +  0 B   1.0 
  /dev/nvme0n2  foo  QEMU NVMe Ctrl 
  2 268.44  MB / 268.44  MB512   B +  0 B   1.0 

Thanks,

Minwoo Im (6):
  hw/block/nvme: support namespace detach
  hw/block/nvme: fix namespaces array to 1-based
  hw/block/nvme: fix allocated namespace list to 256
  hw/block/nvme: support allocated namespace type
  hw/block/nvme: refactor nvme_select_ns_iocs
  hw/block/nvme: support namespace attachment command

 hw/block/nvme-ns.c |   1 +
 hw/block/nvme-ns.h |   1 +
 hw/block/nvme-subsys.h |  28 +-
 hw/block/nvme.c| 199 ++---
 hw/block/nvme.h|  33 +++
 hw/block/trace-events  |   2 +
 include/block/nvme.h   |   5 ++
 7 files changed, 234 insertions(+), 35 deletions(-)

-- 
2.17.1




Re: [PATCH V6 0/6] hw/block/nvme: support multi-path for ctrl/ns

2021-01-26 Thread Minwoo Im
On 21-01-27 05:39:29, Keith Busch wrote:
> This came out looking cleaner than I had initially expected. Thanks for
> seeing this feature through!
> 
> Reviewed-by: Keith Busch 

Thanks Keith for the review!



Re: [PATCH V6 4/6] hw/block/nvme: support for multi-controller in subsystem

2021-01-26 Thread Minwoo Im
On 21-01-26 09:57:23, Keith Busch wrote:
> On Tue, Jan 26, 2021 at 09:52:48AM +0900, Minwoo Im wrote:
> > On 21-01-25 10:11:43, Keith Busch wrote:
> > > On Mon, Jan 25, 2021 at 07:03:32PM +0100, Klaus Jensen wrote:
> > > > On Jan 24 11:54, Minwoo Im wrote:
> > > > > We have nvme-subsys and nvme devices mapped together.  To support
> > > > > multi-controller scheme to this setup, controller identifier(id) has 
> > > > > to
> > > > > be managed.  Earlier, cntlid(controller id) used to be always 0 
> > > > > because
> > > > > we didn't have any subsystem scheme that controller id matters.
> > > > > 
> > > > > This patch introduced 'cntlid' attribute to the nvme controller
> > > > > instance(NvmeCtrl) and make it allocated by the nvme-subsys device
> > > > > mapped to the controller.  If nvme-subsys is not given to the
> > > > > controller, then it will always be 0 as it was.
> > > > > 
> > > > > Added 'ctrls' array in the nvme-subsys instance to manage attached
> > > > > controllers to the subsystem with a limit(32).  This patch didn't take
> > > > > list for the controllers to make it seamless with nvme-ns device.
> > > > > 
> > > > > Signed-off-by: Minwoo Im 
> > > > > ---
> > > > >  hw/block/nvme-subsys.c | 21 +
> > > > >  hw/block/nvme-subsys.h |  4 
> > > > >  hw/block/nvme.c| 29 +
> > > > >  hw/block/nvme.h|  1 +
> > > > >  4 files changed, 55 insertions(+)
> > > > > 
> > > > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > > > index b525fca14103..7138389be4bd 100644
> > > > > --- a/hw/block/nvme.c
> > > > > +++ b/hw/block/nvme.c
> > > > > @@ -4481,6 +4484,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, 
> > > > > PCIDevice *pci_dev)
> > > > >  id->psd[0].enlat = cpu_to_le32(0x10);
> > > > >  id->psd[0].exlat = cpu_to_le32(0x4);
> > > > >  
> > > > > +if (n->subsys) {
> > > > > +id->cmic |= NVME_CMIC_MULTI_CTRL;
> > > > > +}
> > > > 
> > > > Since multiple controllers show up with a PCIe port of their own, do we
> > > > need to set bit 0 (NVME_CMIC_MULTI_PORT?) as well? Or am I
> > > > misunderstanding that bit?
> > > 
> > > AIUI, if you report this MULTI_PORT bit, then each PCI device in the
> > > subsystem needs to report a different "Port Number" in their PCIe Link
> > > Capabilities register. I don't think we can manipulate that value from
> > > the nvme "device", but I also don't know what a host should do with this
> > > information even if we could. So, I think it's safe to leave it at 0.
> > 
> > AFAIK, If we leave it to 0, kernel will not allocate disk for multi-path
> > case (e.g., nvmeXcYnZ).
> 
> Kernel only checks for MULTI_CTRL. It doesn't do anything with MULTI_PORT.

Please forgive me that I took this discussion as MULTI_CTRL rather than
MULTI_PORT.  Please ignore this noise ;)

Thanks!



Re: [PATCH V6 0/6] hw/block/nvme: support multi-path for ctrl/ns

2021-01-25 Thread Minwoo Im
On 21-01-25 21:29:58, Klaus Jensen wrote:
> On Jan 24 11:54, Minwoo Im wrote:
> > Hello,
> > 
> > This is sixth patch series for the support of NVMe subsystem scheme with
> > multi-controller and namespace sharing in a subsystem.
> > 
> > This version has a fix in nvme_init_ctrl() when 'cntlid' is set to the
> > Identify Controller data structure by making it by cpu_to_le16() as
> > Keith reviewed.
> > 
> > Here's test result with a simple 'nvme list -v' command from this model:
> > 
> >   -device nvme-subsys,id=subsys0 \
> >   -device nvme,serial=foo,id=nvme0,subsys=subsys0 \
> >   -device nvme,serial=bar,id=nvme1,subsys=subsys0 \
> >   -device nvme,serial=baz,id=nvme2,subsys=subsys0 \
> >   -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \
> >   -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \
> >   \
> >   -device nvme,serial=qux,id=nvme3 \
> >   -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 \
> >   \
> >   -device nvme-subsys,id=subsys1 \
> >   -device nvme,serial=quux,id=nvme4,subsys=subsys1 \
> >   -device nvme-ns,id=ns4,drive=drv13,nsid=1,subsys=subsys1,zoned=true \
> > 
> >   root@vm:~/work# nvme list -v
> >   NVM Express Subsystems
> > 
> >   SubsystemSubsystem-NQN
> > Controllers
> >    
> > 
> >  
> >   nvme-subsys1 nqn.2019-08.org.qemu:subsys0 
> > nvme0, nvme1, nvme2
> >   nvme-subsys3 nqn.2019-08.org.qemu:qux 
> > nvme3
> >   nvme-subsys4 nqn.2019-08.org.qemu:subsys1 
> > nvme4
> > 
> >   NVM Express Controllers
> > 
> >   Device   SN   MN   FR 
> >   TxPort AddressSubsystemNamespaces
> >      
> >  -- --  
> >   nvme0foo  QEMU NVMe Ctrl   
> > 1.0  pcie   :00:06.0   nvme-subsys1 nvme1c0n1
> >   nvme1bar  QEMU NVMe Ctrl   
> > 1.0  pcie   :00:07.0   nvme-subsys1 nvme1c1n1
> >   nvme2baz  QEMU NVMe Ctrl   
> > 1.0  pcie   :00:08.0   nvme-subsys1 nvme1c2n1, nvme1c2n2
> >   nvme3qux  QEMU NVMe Ctrl   
> > 1.0  pcie   :00:09.0   nvme-subsys3 nvme3n1
> >   nvme4quux QEMU NVMe Ctrl   
> > 1.0  pcie   :00:0a.0   nvme-subsys4 nvme4c4n1
> > 
> >   NVM Express Namespaces
> > 
> >   Device   NSID Usage  Format   
> > Controllers
> >     --  
> > 
> >   nvme1n1  1134.22  MB / 134.22  MB512   B +  0 B   nvme0, 
> > nvme1, nvme2
> >   nvme1n2  2268.44  MB / 268.44  MB512   B +  0 B   nvme2
> >   nvme3n1  3268.44  MB / 268.44  MB512   B +  0 B   nvme3
> >   nvme4n1  1268.44  MB / 268.44  MB512   B +  0 B   nvme4
> > 
> > Thanks,
> > 
> > Since V5:
> >   - Fix endianness for 'cntlid' in Identify Controller data structure.
> > (Keith)
> > 
> > Since V4:
> >   - Code clean-up to snprintf rather than duplicating it and copy.
> > (Keith)
> >   - Documentation for 'subsys' clean-up.  (Keith)
> >   - Remove 'cntlid' param from nvme_init_ctrl().  (Keith)
> >   - Put error_propagate() in nvme_realize().  (Keith)
> > 
> > Since RFC V3:
> >   - Exclude 'deatched' scheme from this series.  This will be covered in
> > the next series by covering all the ns-related admin commands
> > including ZNS and ns-mgmt. (Niklas)
> >   - Rebased on nvme-next.
> >   - Remove RFC tag from this V4.
> > 
> > Since RFC V2:
> >   - Rebased on nvme-next branch with trivial patches from the previous
> > version(V2) applied. (Klaus)
> >   - Fix enumeration type name convention with NvmeIdNs prefix. (Klaus)
> >   - Put 'cntlid' to NvmeCtrl instance in nvme_init_ctrl() which was
> > missed i

Re: [PATCH V6 4/6] hw/block/nvme: support for multi-controller in subsystem

2021-01-25 Thread Minwoo Im
On 21-01-25 10:11:43, Keith Busch wrote:
> On Mon, Jan 25, 2021 at 07:03:32PM +0100, Klaus Jensen wrote:
> > On Jan 24 11:54, Minwoo Im wrote:
> > > We have nvme-subsys and nvme devices mapped together.  To support
> > > multi-controller scheme to this setup, controller identifier(id) has to
> > > be managed.  Earlier, cntlid(controller id) used to be always 0 because
> > > we didn't have any subsystem scheme that controller id matters.
> > > 
> > > This patch introduced 'cntlid' attribute to the nvme controller
> > > instance(NvmeCtrl) and make it allocated by the nvme-subsys device
> > > mapped to the controller.  If nvme-subsys is not given to the
> > > controller, then it will always be 0 as it was.
> > > 
> > > Added 'ctrls' array in the nvme-subsys instance to manage attached
> > > controllers to the subsystem with a limit(32).  This patch didn't take
> > > list for the controllers to make it seamless with nvme-ns device.
> > > 
> > > Signed-off-by: Minwoo Im 
> > > ---
> > >  hw/block/nvme-subsys.c | 21 +
> > >  hw/block/nvme-subsys.h |  4 
> > >  hw/block/nvme.c| 29 +
> > >  hw/block/nvme.h|  1 +
> > >  4 files changed, 55 insertions(+)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index b525fca14103..7138389be4bd 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -4481,6 +4484,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
> > > *pci_dev)
> > >  id->psd[0].enlat = cpu_to_le32(0x10);
> > >  id->psd[0].exlat = cpu_to_le32(0x4);
> > >  
> > > +if (n->subsys) {
> > > +id->cmic |= NVME_CMIC_MULTI_CTRL;
> > > +}
> > 
> > Since multiple controllers show up with a PCIe port of their own, do we
> > need to set bit 0 (NVME_CMIC_MULTI_PORT?) as well? Or am I
> > misunderstanding that bit?
> 
> AIUI, if you report this MULTI_PORT bit, then each PCI device in the
> subsystem needs to report a different "Port Number" in their PCIe Link
> Capabilities register. I don't think we can manipulate that value from
> the nvme "device", but I also don't know what a host should do with this
> information even if we could. So, I think it's safe to leave it at 0.

AFAIK, If we leave it to 0, kernel will not allocate disk for multi-path
case (e.g., nvmeXcYnZ).



[PATCH] hw/block/nvme: fix wrong parameter name 'cross_read'

2021-01-25 Thread Minwoo Im
The actual parameter name is 'cross_read' rather than 'cross_zone_read'.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 21aec90637fa..bf9134f73d81 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -81,7 +81,7 @@
  * The default value means there is no limit to the number of
  * concurrently open zones.
  *
- * zoned.cross_zone_read=
+ * zoned.cross_read=
  * Setting this property to true enables Read Across Zone Boundaries.
  */
 
-- 
2.17.1




[PATCH V6 6/6] hw/block/nvme: support for shared namespace in subsystem

2021-01-23 Thread Minwoo Im
nvme-ns device is registered to a nvme controller device during the
initialization in nvme_register_namespace() in case that 'bus' property
is given which means it's mapped to a single controller.

This patch introduced a new property 'subsys' just like the controller
device instance did to map a namespace to a NVMe subsystem.

If 'subsys' property is given to the nvme-ns device, it will belong to
the specified subsystem and will be attached to all controllers in that
subsystem by enabling shared namespace capability in NMIC(Namespace
Multi-path I/O and Namespace Capabilities) in Identify Namespace.

Usage:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0
  -device nvme,serial=bar,id=nvme1,subsys=subsys0
  -device nvme,serial=baz,id=nvme2,subsys=subsys0
  -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0  # Shared
  -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2   # Non-shared

  In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in
  the same subsystem.  On the other hand, 'ns2' will be attached to the
  'nvme2' only as a private namespace in that subsystem.

All the namespace with 'subsys' parameter will attach all controllers in
the subsystem to the namespace by default.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-ns.c | 23 ++-
 hw/block/nvme-ns.h |  7 +++
 hw/block/nvme-subsys.c | 25 +
 hw/block/nvme-subsys.h |  3 +++
 hw/block/nvme.c| 10 +-
 5 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 62b25cf69bfa..9b493f2ead03 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
 id_ns->npda = id_ns->npdg = npdg - 1;
 
+if (nvme_ns_shared(ns)) {
+id_ns->nmic |= NVME_NMIC_NS_SHARED;
+}
+
 return 0;
 }
 
@@ -365,16 +369,25 @@ static void nvme_ns_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-if (nvme_register_namespace(n, ns, errp)) {
-error_propagate_prepend(errp, local_err,
-"could not register namespace: ");
-return;
+if (ns->subsys) {
+if (nvme_subsys_register_ns(ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not setup namespace to subsys: ");
+return;
+}
+} else {
+if (nvme_register_namespace(n, ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not register namespace: ");
+return;
+}
 }
-
 }
 
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
+DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 293ac990e3f6..929e78861903 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -47,6 +47,8 @@ typedef struct NvmeNamespace {
 const uint32_t *iocs;
 uint8_t  csi;
 
+NvmeSubsystem   *subsys;
+
 NvmeIdNsZoned   *id_ns_zoned;
 NvmeZone*zone_array;
 QTAILQ_HEAD(, NvmeZone) exp_open_zones;
@@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return -1;
 }
 
+static inline bool nvme_ns_shared(NvmeNamespace *ns)
+{
+return !!ns->subsys;
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = >id_ns;
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index e9d61c993c90..641de33e99fc 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -43,6 +43,31 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 return cntlid;
 }
 
+int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp)
+{
+NvmeSubsystem *subsys = ns->subsys;
+NvmeCtrl *n;
+int i;
+
+if (subsys->namespaces[nvme_nsid(ns)]) {
+error_setg(errp, "namespace %d already registerd to subsy %s",
+   nvme_nsid(ns), subsys->parent_obj.id);
+return -1;
+}
+
+subsys->namespaces[nvme_nsid(ns)] = ns;
+
+for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+n = subsys->ctrls[i];
+
+if (n && nvme_register_namespace(n, ns, errp)) {
+return -1;
+}
+}
+
+return 0;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn),
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 4eba50d96a1d..ccf6a71398d3 100644
--- a/hw/block/nvme-subsys.h
+++ 

[PATCH V6 5/6] hw/block/nvme: add NMIC enum value for Identify Namespace

2021-01-23 Thread Minwoo Im
Added Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC)
field to support shared namespace from controller(s).

This field is in Identify Namespace data structure in [30].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index d6415a869c1c..ad68cdc2b92d 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1203,6 +1203,10 @@ enum NvmeNsIdentifierType {
 NVME_NIDT_CSI   = 0x04,
 };
 
+enum NvmeIdNsNmic {
+NVME_NMIC_NS_SHARED = 1 << 0,
+};
+
 enum NvmeCsi {
 NVME_CSI_NVM= 0x00,
 NVME_CSI_ZONED  = 0x02,
-- 
2.17.1




[PATCH V6 4/6] hw/block/nvme: support for multi-controller in subsystem

2021-01-23 Thread Minwoo Im
We have nvme-subsys and nvme devices mapped together.  To support
multi-controller scheme to this setup, controller identifier(id) has to
be managed.  Earlier, cntlid(controller id) used to be always 0 because
we didn't have any subsystem scheme that controller id matters.

This patch introduced 'cntlid' attribute to the nvme controller
instance(NvmeCtrl) and make it allocated by the nvme-subsys device
mapped to the controller.  If nvme-subsys is not given to the
controller, then it will always be 0 as it was.

Added 'ctrls' array in the nvme-subsys instance to manage attached
controllers to the subsystem with a limit(32).  This patch didn't take
list for the controllers to make it seamless with nvme-ns device.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.c | 21 +
 hw/block/nvme-subsys.h |  4 
 hw/block/nvme.c| 29 +
 hw/block/nvme.h|  1 +
 4 files changed, 55 insertions(+)

diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index aa82911b951c..e9d61c993c90 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -22,6 +22,27 @@
 #include "nvme.h"
 #include "nvme-subsys.h"
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
+{
+NvmeSubsystem *subsys = n->subsys;
+int cntlid;
+
+for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
+if (!subsys->ctrls[cntlid]) {
+break;
+}
+}
+
+if (cntlid == ARRAY_SIZE(subsys->ctrls)) {
+error_setg(errp, "no more free controller id");
+return -1;
+}
+
+subsys->ctrls[cntlid] = n;
+
+return cntlid;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn),
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 40f06a4c7db0..4eba50d96a1d 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace;
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
 uint8_t subnqn[256];
+
+NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 } NvmeSubsystem;
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b525fca14103..7138389be4bd 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4435,6 +4435,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
 strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
 strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
+
+id->cntlid = cpu_to_le16(n->cntlid);
+
 id->rab = 6;
 id->ieee[0] = 0x00;
 id->ieee[1] = 0x02;
@@ -4481,6 +4484,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->psd[0].enlat = cpu_to_le32(0x10);
 id->psd[0].exlat = cpu_to_le32(0x4);
 
+if (n->subsys) {
+id->cmic |= NVME_CMIC_MULTI_CTRL;
+}
+
 NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
 NVME_CAP_SET_CQR(n->bar.cap, 1);
 NVME_CAP_SET_TO(n->bar.cap, 0xf);
@@ -4495,6 +4502,24 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 n->bar.intmc = n->bar.intms = 0;
 }
 
+static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
+{
+int cntlid;
+
+if (!n->subsys) {
+return 0;
+}
+
+cntlid = nvme_subsys_register_ctrl(n, errp);
+if (cntlid < 0) {
+return -1;
+}
+
+n->cntlid = cntlid;
+
+return 0;
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
@@ -4515,6 +4540,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
+if (nvme_init_subsys(n, errp)) {
+error_propagate(errp, local_err);
+return;
+}
 nvme_init_ctrl(n, pci_dev);
 
 /* setup a namespace if the controller drive property was given */
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 04d4684601fd..b8f5f2d6ffb8 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -134,6 +134,7 @@ typedef struct NvmeCtrl {
 NvmeBus  bus;
 BlockConfconf;
 
+uint16_tcntlid;
 boolqs_created;
 uint32_tpage_size;
 uint16_tpage_bits;
-- 
2.17.1




[PATCH V6 3/6] hw/block/nvme: add CMIC enum value for Identify Controller

2021-01-23 Thread Minwoo Im
Added Controller Multi-path I/O and Namespace Sharing Capabilities
(CMIC) field to support multi-controller in the following patches.

This field is in Identify Controller data structure in [76].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index e4b918064df9..d6415a869c1c 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1034,6 +1034,10 @@ enum NvmeIdCtrlLpa {
 NVME_LPA_EXTENDED = 1 << 2,
 };
 
+enum NvmeIdCtrlCmic {
+NVME_CMIC_MULTI_CTRL= 1 << 1,
+};
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
-- 
2.17.1




[PATCH V6 1/6] hw/block/nvme: introduce nvme-subsys device

2021-01-23 Thread Minwoo Im
To support multi-path in QEMU NVMe device model, We need to have NVMe
subsystem hierarchy to map controllers and namespaces to a NVMe
subsystem.

This patch introduced a simple nvme-subsys device model.  The subsystem
will be prepared with subsystem NQN with  provided in
nvme-subsys device:

  ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0

Signed-off-by: Minwoo Im 
---
 hw/block/meson.build   |  2 +-
 hw/block/nvme-subsys.c | 60 ++
 hw/block/nvme-subsys.h | 25 ++
 hw/block/nvme.c|  3 +++
 4 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 hw/block/nvme-subsys.c
 create mode 100644 hw/block/nvme-subsys.h

diff --git a/hw/block/meson.build b/hw/block/meson.build
index 602ca6c8541d..83ea2d37978d 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -13,7 +13,7 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: 
files('m25p80.c'))
 softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
 softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c'))
-softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c'))
+softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 
'nvme-subsys.c'))
 
 specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
 specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c'))
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
new file mode 100644
index ..aa82911b951c
--- /dev/null
+++ b/hw/block/nvme-subsys.c
@@ -0,0 +1,60 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#include "qemu/units.h"
+#include "qemu/osdep.h"
+#include "qemu/uuid.h"
+#include "qemu/iov.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+#include "hw/block/block.h"
+#include "block/aio.h"
+#include "block/accounting.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+#include "nvme.h"
+#include "nvme-subsys.h"
+
+static void nvme_subsys_setup(NvmeSubsystem *subsys)
+{
+snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn),
+ "nqn.2019-08.org.qemu:%s", subsys->parent_obj.id);
+}
+
+static void nvme_subsys_realize(DeviceState *dev, Error **errp)
+{
+NvmeSubsystem *subsys = NVME_SUBSYS(dev);
+
+nvme_subsys_setup(subsys);
+}
+
+static void nvme_subsys_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+dc->realize = nvme_subsys_realize;
+dc->desc = "Virtual NVMe subsystem";
+}
+
+static const TypeInfo nvme_subsys_info = {
+.name = TYPE_NVME_SUBSYS,
+.parent = TYPE_DEVICE,
+.class_init = nvme_subsys_class_init,
+.instance_size = sizeof(NvmeSubsystem),
+};
+
+static void nvme_subsys_register_types(void)
+{
+type_register_static(_subsys_info);
+}
+
+type_init(nvme_subsys_register_types)
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
new file mode 100644
index ..40f06a4c7db0
--- /dev/null
+++ b/hw/block/nvme-subsys.h
@@ -0,0 +1,25 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#ifndef NVME_SUBSYS_H
+#define NVME_SUBSYS_H
+
+#define TYPE_NVME_SUBSYS "nvme-subsys"
+#define NVME_SUBSYS(obj) \
+OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
+
+#define NVME_SUBSYS_MAX_CTRLS   32
+
+typedef struct NvmeCtrl NvmeCtrl;
+typedef struct NvmeNamespace NvmeNamespace;
+typedef struct NvmeSubsystem {
+DeviceState parent_obj;
+uint8_t subnqn[256];
+} NvmeSubsystem;
+
+#endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 21aec90637fa..aabccdf36f4b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -25,6 +25,7 @@
  *  mdts=,zoned.append_size_limit= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
+ *  -device nvme-subsys,id=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. By default, the
@@ -38,6 +39,8 @@
  *
  * The PMR will use BAR 4/5 exclusively.
  *
+ * To place controller(s) and namespace(s) to a subsystem, then provide
+ * nvme-subsys device as above.
  *
  * nvme device parameters
  * ~~
-- 
2.17.1




[PATCH V6 0/6] hw/block/nvme: support multi-path for ctrl/ns

2021-01-23 Thread Minwoo Im
Hello,

This is sixth patch series for the support of NVMe subsystem scheme with
multi-controller and namespace sharing in a subsystem.

This version has a fix in nvme_init_ctrl() when 'cntlid' is set to the
Identify Controller data structure by making it by cpu_to_le16() as
Keith reviewed.

Here's test result with a simple 'nvme list -v' command from this model:

  -device nvme-subsys,id=subsys0 \
  -device nvme,serial=foo,id=nvme0,subsys=subsys0 \
  -device nvme,serial=bar,id=nvme1,subsys=subsys0 \
  -device nvme,serial=baz,id=nvme2,subsys=subsys0 \
  -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \
  -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \
  \
  -device nvme,serial=qux,id=nvme3 \
  -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 \
  \
  -device nvme-subsys,id=subsys1 \
  -device nvme,serial=quux,id=nvme4,subsys=subsys1 \
  -device nvme-ns,id=ns4,drive=drv13,nsid=1,subsys=subsys1,zoned=true \

  root@vm:~/work# nvme list -v
  NVM Express Subsystems

  SubsystemSubsystem-NQN
Controllers
   

 
  nvme-subsys1 nqn.2019-08.org.qemu:subsys0 
nvme0, nvme1, nvme2
  nvme-subsys3 nqn.2019-08.org.qemu:qux 
nvme3
  nvme-subsys4 nqn.2019-08.org.qemu:subsys1 
nvme4

  NVM Express Controllers

  Device   SN   MN   FR 
  TxPort AddressSubsystemNamespaces
     
 -- --  
  nvme0foo  QEMU NVMe Ctrl   1.0
  pcie   :00:06.0   nvme-subsys1 nvme1c0n1
  nvme1bar  QEMU NVMe Ctrl   1.0
  pcie   :00:07.0   nvme-subsys1 nvme1c1n1
  nvme2baz  QEMU NVMe Ctrl   1.0
  pcie   :00:08.0   nvme-subsys1 nvme1c2n1, nvme1c2n2
  nvme3qux  QEMU NVMe Ctrl   1.0
  pcie   :00:09.0   nvme-subsys3 nvme3n1
  nvme4quux QEMU NVMe Ctrl   1.0
  pcie   :00:0a.0   nvme-subsys4 nvme4c4n1

  NVM Express Namespaces

  Device   NSID Usage  Format   Controllers
    --  

  nvme1n1  1134.22  MB / 134.22  MB512   B +  0 B   nvme0, 
nvme1, nvme2
  nvme1n2  2268.44  MB / 268.44  MB512   B +  0 B   nvme2
  nvme3n1  3268.44  MB / 268.44  MB512   B +  0 B   nvme3
  nvme4n1  1268.44  MB / 268.44  MB512   B +  0 B   nvme4

Thanks,

Since V5:
  - Fix endianness for 'cntlid' in Identify Controller data structure.
(Keith)

Since V4:
  - Code clean-up to snprintf rather than duplicating it and copy.
(Keith)
  - Documentation for 'subsys' clean-up.  (Keith)
  - Remove 'cntlid' param from nvme_init_ctrl().  (Keith)
  - Put error_propagate() in nvme_realize().  (Keith)

Since RFC V3:
  - Exclude 'deatched' scheme from this series.  This will be covered in
the next series by covering all the ns-related admin commands
including ZNS and ns-mgmt. (Niklas)
  - Rebased on nvme-next.
  - Remove RFC tag from this V4.

Since RFC V2:
  - Rebased on nvme-next branch with trivial patches from the previous
version(V2) applied. (Klaus)
  - Fix enumeration type name convention with NvmeIdNs prefix. (Klaus)
  - Put 'cntlid' to NvmeCtrl instance in nvme_init_ctrl() which was
missed in V2.
  - Added 'detached' parameter to nvme-ns device to decide whether to
attach or not to controller(s) in the subsystem. (Klaus)
  - Implemented Identify Active Namespace ID List aprt from Identify
Allocated Namespace ID List by removing fall-thru statement.

Since RFC V1:
  - Updated namespace sharing scheme to be based on nvme-subsys
hierarchy.

Minwoo Im (6):
  hw/block/nvme: introduce nvme-subsys device
  hw/block/nvme: support to map controller to a subsystem
  hw/block/nvme: add CMIC enum value for Identify Controller
  hw/block/nvme: support for multi-controller in subsystem
  hw/block/nvme: add NMIC enum value for Identify Namespace
  hw/block/nvme: support for shared namespace in subsystem

 hw/block/meson.build   |   2 +-
 hw/block/nvme-ns.c |  23 +++--
 hw/block/nvme-ns.h |   7 +++
 hw/block/nvme-subsys.c | 106 +
 hw/block/nvme-subsys.h |  32 +
 hw/block/nvme.c|  72 +---
 hw/block/nvme.h

[PATCH V6 2/6] hw/block/nvme: support to map controller to a subsystem

2021-01-23 Thread Minwoo Im
nvme controller(nvme) can be mapped to a NVMe subsystem(nvme-subsys).
This patch maps a controller to a subsystem by adding a parameter
'subsys' to the nvme device.

To map a controller to a subsystem, we need to put nvme-subsys first and
then maps the subsystem to the controller:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0

If 'subsys' property is not given to the nvme controller, then subsystem
NQN will be created with serial (e.g., 'foo' in above example),
Otherwise, it will be based on subsys id (e.g., 'subsys0' in above
example).

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c | 30 +-
 hw/block/nvme.h |  3 +++
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index aabccdf36f4b..b525fca14103 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -22,7 +22,8 @@
  *  [pmrdev=,] \
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
- *  mdts=,zoned.append_size_limit= \
+ *  mdts=,zoned.append_size_limit=, \
+ *  subsys= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
  *  -device nvme-subsys,id=
@@ -44,6 +45,13 @@
  *
  * nvme device parameters
  * ~~
+ * - `subsys`
+ *   NVM Subsystem device. If given, a subsystem NQN will be initialized with
+ *given. Otherwise,  will be taken for subsystem NQN.
+ *   Also, it will enable multi controller capability represented in Identify
+ *   Controller data structure in CMIC (Controller Multi-path I/O and Namesapce
+ *   Sharing Capabilities), if given.
+ *
  * - `aerl`
  *   The Asynchronous Event Request Limit (AERL). Indicates the maximum number
  *   of concurrently outstanding Asynchronous Event Request commands support
@@ -4404,11 +4412,23 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 return 0;
 }
 
+static void nvme_init_subnqn(NvmeCtrl *n)
+{
+NvmeSubsystem *subsys = n->subsys;
+NvmeIdCtrl *id = >id_ctrl;
+
+if (!subsys) {
+snprintf((char *)id->subnqn, sizeof(id->subnqn),
+ "nqn.2019-08.org.qemu:%s", n->params.serial);
+} else {
+pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn);
+}
+}
+
 static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 {
 NvmeIdCtrl *id = >id_ctrl;
 uint8_t *pci_conf = pci_dev->config;
-char *subnqn;
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -4455,9 +4475,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN |
NVME_CTRL_SGLS_BITBUCKET);
 
-subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
-strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
-g_free(subnqn);
+nvme_init_subnqn(n);
 
 id->psd[0].mp = cpu_to_le16(0x9c4);
 id->psd[0].enlat = cpu_to_le32(0x10);
@@ -4545,6 +4563,8 @@ static Property nvme_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf),
 DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
+DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
 DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
 DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index dee6092bd45f..04d4684601fd 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -2,6 +2,7 @@
 #define HW_NVME_H
 
 #include "block/nvme.h"
+#include "nvme-subsys.h"
 #include "nvme-ns.h"
 
 #define NVME_MAX_NAMESPACES 256
@@ -170,6 +171,8 @@ typedef struct NvmeCtrl {
 
 uint8_t zasl;
 
+NvmeSubsystem   *subsys;
+
 NvmeNamespace   namespace;
 NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
 NvmeSQueue  **sq;
-- 
2.17.1




Re: [PATCH V5 4/6] hw/block/nvme: support for multi-controller in subsystem

2021-01-22 Thread Minwoo Im
On 21-01-22 10:42:36, Keith Busch wrote:
> On Fri, Jan 22, 2021 at 09:07:34PM +0900, Minwoo Im wrote:
> > index b525fca14103..3dedefb8ebba 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -4435,6 +4435,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
> > *pci_dev)
> >  strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
> >  strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
> >  strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
> > +
> > +id->cntlid = n->cntlid;
> 
> cpu_to_le16()? It might be okay to not do that since the only
> requirement is that this is a unique value, but it would be confusing
> for decoding commands that have a controller id field.

Agreed.

Yes, cntlids are allocated in unique values so that functionality has no
problem here.  But, even if so, we should make it have proper value in
Identify data structure with the policy it has to avoid confusing.

Thanks Keith! will fix it :)



[PATCH V5 5/6] hw/block/nvme: add NMIC enum value for Identify Namespace

2021-01-22 Thread Minwoo Im
Added Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC)
field to support shared namespace from controller(s).

This field is in Identify Namespace data structure in [30].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index d6415a869c1c..ad68cdc2b92d 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1203,6 +1203,10 @@ enum NvmeNsIdentifierType {
 NVME_NIDT_CSI   = 0x04,
 };
 
+enum NvmeIdNsNmic {
+NVME_NMIC_NS_SHARED = 1 << 0,
+};
+
 enum NvmeCsi {
 NVME_CSI_NVM= 0x00,
 NVME_CSI_ZONED  = 0x02,
-- 
2.17.1




[PATCH V5 2/6] hw/block/nvme: support to map controller to a subsystem

2021-01-22 Thread Minwoo Im
nvme controller(nvme) can be mapped to a NVMe subsystem(nvme-subsys).
This patch maps a controller to a subsystem by adding a parameter
'subsys' to the nvme device.

To map a controller to a subsystem, we need to put nvme-subsys first and
then maps the subsystem to the controller:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0

If 'subsys' property is not given to the nvme controller, then subsystem
NQN will be created with serial (e.g., 'foo' in above example),
Otherwise, it will be based on subsys id (e.g., 'subsys0' in above
example).

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c | 30 +-
 hw/block/nvme.h |  3 +++
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index aabccdf36f4b..b525fca14103 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -22,7 +22,8 @@
  *  [pmrdev=,] \
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
- *  mdts=,zoned.append_size_limit= \
+ *  mdts=,zoned.append_size_limit=, \
+ *  subsys= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
  *  -device nvme-subsys,id=
@@ -44,6 +45,13 @@
  *
  * nvme device parameters
  * ~~
+ * - `subsys`
+ *   NVM Subsystem device. If given, a subsystem NQN will be initialized with
+ *given. Otherwise,  will be taken for subsystem NQN.
+ *   Also, it will enable multi controller capability represented in Identify
+ *   Controller data structure in CMIC (Controller Multi-path I/O and Namesapce
+ *   Sharing Capabilities), if given.
+ *
  * - `aerl`
  *   The Asynchronous Event Request Limit (AERL). Indicates the maximum number
  *   of concurrently outstanding Asynchronous Event Request commands support
@@ -4404,11 +4412,23 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 return 0;
 }
 
+static void nvme_init_subnqn(NvmeCtrl *n)
+{
+NvmeSubsystem *subsys = n->subsys;
+NvmeIdCtrl *id = >id_ctrl;
+
+if (!subsys) {
+snprintf((char *)id->subnqn, sizeof(id->subnqn),
+ "nqn.2019-08.org.qemu:%s", n->params.serial);
+} else {
+pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn);
+}
+}
+
 static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 {
 NvmeIdCtrl *id = >id_ctrl;
 uint8_t *pci_conf = pci_dev->config;
-char *subnqn;
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -4455,9 +4475,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN |
NVME_CTRL_SGLS_BITBUCKET);
 
-subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
-strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
-g_free(subnqn);
+nvme_init_subnqn(n);
 
 id->psd[0].mp = cpu_to_le16(0x9c4);
 id->psd[0].enlat = cpu_to_le32(0x10);
@@ -4545,6 +4563,8 @@ static Property nvme_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf),
 DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
+DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
 DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
 DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index dee6092bd45f..04d4684601fd 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -2,6 +2,7 @@
 #define HW_NVME_H
 
 #include "block/nvme.h"
+#include "nvme-subsys.h"
 #include "nvme-ns.h"
 
 #define NVME_MAX_NAMESPACES 256
@@ -170,6 +171,8 @@ typedef struct NvmeCtrl {
 
 uint8_t zasl;
 
+NvmeSubsystem   *subsys;
+
 NvmeNamespace   namespace;
 NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
 NvmeSQueue  **sq;
-- 
2.17.1




[PATCH V5 6/6] hw/block/nvme: support for shared namespace in subsystem

2021-01-22 Thread Minwoo Im
nvme-ns device is registered to a nvme controller device during the
initialization in nvme_register_namespace() in case that 'bus' property
is given which means it's mapped to a single controller.

This patch introduced a new property 'subsys' just like the controller
device instance did to map a namespace to a NVMe subsystem.

If 'subsys' property is given to the nvme-ns device, it will belong to
the specified subsystem and will be attached to all controllers in that
subsystem by enabling shared namespace capability in NMIC(Namespace
Multi-path I/O and Namespace Capabilities) in Identify Namespace.

Usage:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0
  -device nvme,serial=bar,id=nvme1,subsys=subsys0
  -device nvme,serial=baz,id=nvme2,subsys=subsys0
  -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0  # Shared
  -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2   # Non-shared

  In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in
  the same subsystem.  On the other hand, 'ns2' will be attached to the
  'nvme2' only as a private namespace in that subsystem.

All the namespace with 'subsys' parameter will attach all controllers in
the subsystem to the namespace by default.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-ns.c | 23 ++-
 hw/block/nvme-ns.h |  7 +++
 hw/block/nvme-subsys.c | 25 +
 hw/block/nvme-subsys.h |  3 +++
 hw/block/nvme.c| 10 +-
 5 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 62b25cf69bfa..9b493f2ead03 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
 id_ns->npda = id_ns->npdg = npdg - 1;
 
+if (nvme_ns_shared(ns)) {
+id_ns->nmic |= NVME_NMIC_NS_SHARED;
+}
+
 return 0;
 }
 
@@ -365,16 +369,25 @@ static void nvme_ns_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-if (nvme_register_namespace(n, ns, errp)) {
-error_propagate_prepend(errp, local_err,
-"could not register namespace: ");
-return;
+if (ns->subsys) {
+if (nvme_subsys_register_ns(ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not setup namespace to subsys: ");
+return;
+}
+} else {
+if (nvme_register_namespace(n, ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not register namespace: ");
+return;
+}
 }
-
 }
 
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
+DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 293ac990e3f6..929e78861903 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -47,6 +47,8 @@ typedef struct NvmeNamespace {
 const uint32_t *iocs;
 uint8_t  csi;
 
+NvmeSubsystem   *subsys;
+
 NvmeIdNsZoned   *id_ns_zoned;
 NvmeZone*zone_array;
 QTAILQ_HEAD(, NvmeZone) exp_open_zones;
@@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return -1;
 }
 
+static inline bool nvme_ns_shared(NvmeNamespace *ns)
+{
+return !!ns->subsys;
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = >id_ns;
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index e9d61c993c90..641de33e99fc 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -43,6 +43,31 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 return cntlid;
 }
 
+int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp)
+{
+NvmeSubsystem *subsys = ns->subsys;
+NvmeCtrl *n;
+int i;
+
+if (subsys->namespaces[nvme_nsid(ns)]) {
+error_setg(errp, "namespace %d already registerd to subsy %s",
+   nvme_nsid(ns), subsys->parent_obj.id);
+return -1;
+}
+
+subsys->namespaces[nvme_nsid(ns)] = ns;
+
+for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+n = subsys->ctrls[i];
+
+if (n && nvme_register_namespace(n, ns, errp)) {
+return -1;
+}
+}
+
+return 0;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn),
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 4eba50d96a1d..ccf6a71398d3 100644
--- a/hw/block/nvme-subsys.h
+++ 

[PATCH V5 4/6] hw/block/nvme: support for multi-controller in subsystem

2021-01-22 Thread Minwoo Im
We have nvme-subsys and nvme devices mapped together.  To support
multi-controller scheme to this setup, controller identifier(id) has to
be managed.  Earlier, cntlid(controller id) used to be always 0 because
we didn't have any subsystem scheme that controller id matters.

This patch introduced 'cntlid' attribute to the nvme controller
instance(NvmeCtrl) and make it allocated by the nvme-subsys device
mapped to the controller.  If nvme-subsys is not given to the
controller, then it will always be 0 as it was.

Added 'ctrls' array in the nvme-subsys instance to manage attached
controllers to the subsystem with a limit(32).  This patch didn't take
list for the controllers to make it seamless with nvme-ns device.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.c | 21 +
 hw/block/nvme-subsys.h |  4 
 hw/block/nvme.c| 29 +
 hw/block/nvme.h|  1 +
 4 files changed, 55 insertions(+)

diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index aa82911b951c..e9d61c993c90 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -22,6 +22,27 @@
 #include "nvme.h"
 #include "nvme-subsys.h"
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
+{
+NvmeSubsystem *subsys = n->subsys;
+int cntlid;
+
+for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
+if (!subsys->ctrls[cntlid]) {
+break;
+}
+}
+
+if (cntlid == ARRAY_SIZE(subsys->ctrls)) {
+error_setg(errp, "no more free controller id");
+return -1;
+}
+
+subsys->ctrls[cntlid] = n;
+
+return cntlid;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn),
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 40f06a4c7db0..4eba50d96a1d 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace;
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
 uint8_t subnqn[256];
+
+NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 } NvmeSubsystem;
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b525fca14103..3dedefb8ebba 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4435,6 +4435,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
 strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
 strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
+
+id->cntlid = n->cntlid;
+
 id->rab = 6;
 id->ieee[0] = 0x00;
 id->ieee[1] = 0x02;
@@ -4481,6 +4484,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->psd[0].enlat = cpu_to_le32(0x10);
 id->psd[0].exlat = cpu_to_le32(0x4);
 
+if (n->subsys) {
+id->cmic |= NVME_CMIC_MULTI_CTRL;
+}
+
 NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
 NVME_CAP_SET_CQR(n->bar.cap, 1);
 NVME_CAP_SET_TO(n->bar.cap, 0xf);
@@ -4495,6 +4502,24 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 n->bar.intmc = n->bar.intms = 0;
 }
 
+static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
+{
+int cntlid;
+
+if (!n->subsys) {
+return 0;
+}
+
+cntlid = nvme_subsys_register_ctrl(n, errp);
+if (cntlid < 0) {
+return -1;
+}
+
+n->cntlid = cntlid;
+
+return 0;
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
@@ -4515,6 +4540,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
+if (nvme_init_subsys(n, errp)) {
+error_propagate(errp, local_err);
+return;
+}
 nvme_init_ctrl(n, pci_dev);
 
 /* setup a namespace if the controller drive property was given */
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 04d4684601fd..b8f5f2d6ffb8 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -134,6 +134,7 @@ typedef struct NvmeCtrl {
 NvmeBus  bus;
 BlockConfconf;
 
+uint16_tcntlid;
 boolqs_created;
 uint32_tpage_size;
 uint16_tpage_bits;
-- 
2.17.1




[PATCH V5 3/6] hw/block/nvme: add CMIC enum value for Identify Controller

2021-01-22 Thread Minwoo Im
Added Controller Multi-path I/O and Namespace Sharing Capabilities
(CMIC) field to support multi-controller in the following patches.

This field is in Identify Controller data structure in [76].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index e4b918064df9..d6415a869c1c 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1034,6 +1034,10 @@ enum NvmeIdCtrlLpa {
 NVME_LPA_EXTENDED = 1 << 2,
 };
 
+enum NvmeIdCtrlCmic {
+NVME_CMIC_MULTI_CTRL= 1 << 1,
+};
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
-- 
2.17.1




[PATCH V5 1/6] hw/block/nvme: introduce nvme-subsys device

2021-01-22 Thread Minwoo Im
To support multi-path in QEMU NVMe device model, We need to have NVMe
subsystem hierarchy to map controllers and namespaces to a NVMe
subsystem.

This patch introduced a simple nvme-subsys device model.  The subsystem
will be prepared with subsystem NQN with  provided in
nvme-subsys device:

  ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0

Signed-off-by: Minwoo Im 
---
 hw/block/meson.build   |  2 +-
 hw/block/nvme-subsys.c | 60 ++
 hw/block/nvme-subsys.h | 25 ++
 hw/block/nvme.c|  3 +++
 4 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 hw/block/nvme-subsys.c
 create mode 100644 hw/block/nvme-subsys.h

diff --git a/hw/block/meson.build b/hw/block/meson.build
index 602ca6c8541d..83ea2d37978d 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -13,7 +13,7 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: 
files('m25p80.c'))
 softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
 softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c'))
-softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c'))
+softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 
'nvme-subsys.c'))
 
 specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
 specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c'))
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
new file mode 100644
index ..aa82911b951c
--- /dev/null
+++ b/hw/block/nvme-subsys.c
@@ -0,0 +1,60 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#include "qemu/units.h"
+#include "qemu/osdep.h"
+#include "qemu/uuid.h"
+#include "qemu/iov.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+#include "hw/block/block.h"
+#include "block/aio.h"
+#include "block/accounting.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+#include "nvme.h"
+#include "nvme-subsys.h"
+
+static void nvme_subsys_setup(NvmeSubsystem *subsys)
+{
+snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn),
+ "nqn.2019-08.org.qemu:%s", subsys->parent_obj.id);
+}
+
+static void nvme_subsys_realize(DeviceState *dev, Error **errp)
+{
+NvmeSubsystem *subsys = NVME_SUBSYS(dev);
+
+nvme_subsys_setup(subsys);
+}
+
+static void nvme_subsys_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+dc->realize = nvme_subsys_realize;
+dc->desc = "Virtual NVMe subsystem";
+}
+
+static const TypeInfo nvme_subsys_info = {
+.name = TYPE_NVME_SUBSYS,
+.parent = TYPE_DEVICE,
+.class_init = nvme_subsys_class_init,
+.instance_size = sizeof(NvmeSubsystem),
+};
+
+static void nvme_subsys_register_types(void)
+{
+type_register_static(_subsys_info);
+}
+
+type_init(nvme_subsys_register_types)
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
new file mode 100644
index ..40f06a4c7db0
--- /dev/null
+++ b/hw/block/nvme-subsys.h
@@ -0,0 +1,25 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#ifndef NVME_SUBSYS_H
+#define NVME_SUBSYS_H
+
+#define TYPE_NVME_SUBSYS "nvme-subsys"
+#define NVME_SUBSYS(obj) \
+OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
+
+#define NVME_SUBSYS_MAX_CTRLS   32
+
+typedef struct NvmeCtrl NvmeCtrl;
+typedef struct NvmeNamespace NvmeNamespace;
+typedef struct NvmeSubsystem {
+DeviceState parent_obj;
+uint8_t subnqn[256];
+} NvmeSubsystem;
+
+#endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 21aec90637fa..aabccdf36f4b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -25,6 +25,7 @@
  *  mdts=,zoned.append_size_limit= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
+ *  -device nvme-subsys,id=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. By default, the
@@ -38,6 +39,8 @@
  *
  * The PMR will use BAR 4/5 exclusively.
  *
+ * To place controller(s) and namespace(s) to a subsystem, then provide
+ * nvme-subsys device as above.
  *
  * nvme device parameters
  * ~~
-- 
2.17.1




[PATCH V5 0/6] hw/block/nvme: support multi-path for ctrl/ns

2021-01-22 Thread Minwoo Im
Hello,

Here's fifth patch series for the support of NVMe subsystem scheme with
multi-controller and namespace sharing in a subsystem.

This series has applied review comments from the previous series,
mostly from Keith's review.  Thanks Keith!

Here's test result with a simple 'nvme list -v' command from this model
with adding a ZNS example with subsys.

  -device nvme-subsys,id=subsys0 \
  -device nvme,serial=foo,id=nvme0,subsys=subsys0 \
  -device nvme,serial=bar,id=nvme1,subsys=subsys0 \
  -device nvme,serial=baz,id=nvme2,subsys=subsys0 \
  -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \
  -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \
  \
  -device nvme,serial=qux,id=nvme3 \
  -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 \
  \
  -device nvme-subsys,id=subsys1 \
  -device nvme,serial=quux,id=nvme4,subsys=subsys1 \
  -device nvme-ns,id=ns4,drive=drv13,nsid=1,subsys=subsys1,zoned=true \

  root@vm:~/work# nvme list -v
  NVM Express Subsystems

  SubsystemSubsystem-NQN
Controllers
   

 
  nvme-subsys1 nqn.2019-08.org.qemu:subsys0 
nvme0, nvme1, nvme2
  nvme-subsys3 nqn.2019-08.org.qemu:qux 
nvme3
  nvme-subsys4 nqn.2019-08.org.qemu:subsys1 
nvme4

  NVM Express Controllers

  Device   SN   MN   FR 
  TxPort AddressSubsystemNamespaces
     
 -- --  
  nvme0foo  QEMU NVMe Ctrl   1.0
  pcie   :00:06.0   nvme-subsys1 nvme1c0n1
  nvme1bar  QEMU NVMe Ctrl   1.0
  pcie   :00:07.0   nvme-subsys1 nvme1c1n1
  nvme2baz  QEMU NVMe Ctrl   1.0
  pcie   :00:08.0   nvme-subsys1 nvme1c2n1, nvme1c2n2
  nvme3qux  QEMU NVMe Ctrl   1.0
  pcie   :00:09.0   nvme-subsys3 nvme3n1
  nvme4quux QEMU NVMe Ctrl   1.0
  pcie   :00:0a.0   nvme-subsys4 nvme4c4n1

  NVM Express Namespaces

  Device   NSID Usage  Format   Controllers
    --  

  nvme1n1  1134.22  MB / 134.22  MB512   B +  0 B   nvme0, 
nvme1, nvme2
  nvme1n2  2268.44  MB / 268.44  MB512   B +  0 B   nvme2
  nvme3n1  3268.44  MB / 268.44  MB512   B +  0 B   nvme3
  nvme4n1  1268.44  MB / 268.44  MB512   B +  0 B   nvme4

Thanks,

Since V4:
  - Code clean-up to snprintf rather than duplicating it and copy.
(Keith)
  - Documentation for 'subsys' clean-up.  (Keith)
  - Remove 'cntlid' param from nvme_init_ctrl().  (Keith)
  - Put error_propagate() in nvme_realize().  (Keith)

Since RFC V3:
  - Exclude 'deatched' scheme from this series.  This will be covered in
the next series by covering all the ns-related admin commands
including ZNS and ns-mgmt. (Niklas)
  - Rebased on nvme-next.
  - Remove RFC tag from this V4.

Since RFC V2:
  - Rebased on nvme-next branch with trivial patches from the previous
version(V2) applied. (Klaus)
  - Fix enumeration type name convention with NvmeIdNs prefix. (Klaus)
  - Put 'cntlid' to NvmeCtrl instance in nvme_init_ctrl() which was
missed in V2.
  - Added 'detached' parameter to nvme-ns device to decide whether to
attach or not to controller(s) in the subsystem. (Klaus)
  - Implemented Identify Active Namespace ID List aprt from Identify
Allocated Namespace ID List by removing fall-thru statement.

Since RFC V1:
  - Updated namespace sharing scheme to be based on nvme-subsys
hierarchy.

Minwoo Im (6):
  hw/block/nvme: introduce nvme-subsys device
  hw/block/nvme: support to map controller to a subsystem
  hw/block/nvme: add CMIC enum value for Identify Controller
  hw/block/nvme: support for multi-controller in subsystem
  hw/block/nvme: add NMIC enum value for Identify Namespace
  hw/block/nvme: support for shared namespace in subsystem

 hw/block/meson.build   |   2 +-
 hw/block/nvme-ns.c |  23 +++--
 hw/block/nvme-ns.h |   7 +++
 hw/block/nvme-subsys.c | 106 +
 hw/block/nvme-subsys.h |  32 +
 hw/block/nvme.c|  72 +---
 hw/block/nvme.h|   4 ++
 include/block/nvme.h   |   8 
 8 files changed, 242 insertions(+), 12 deletions(-)
 create

Re: [PATCH V4 4/6] hw/block/nvme: support for multi-controller in subsystem

2021-01-21 Thread Minwoo Im
On 21-01-21 15:17:16, Keith Busch wrote:
> On Fri, Jan 22, 2021 at 07:09:06AM +0900, Minwoo Im wrote:
> > -static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
> > +static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t 
> > cntlid)
> >  {
> >  NvmeIdCtrl *id = >id_ctrl;
> >  uint8_t *pci_conf = pci_dev->config;
> >  
> > +n->cntlid = cntlid;
> 
> I don't think 'cntlid' is important enough to be a function parameter.
> You can just set it within the 'NvmeCtrl' struct before calling this
> function like all the other properties.

Okay.  Rather than adding a parameter to this function,
nvme_init_subsys() may take this job to assign cntlid to the controller
instance first.  Let me fix one!

> > @@ -4517,7 +4543,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
> > **errp)
> >  return;
> >  }
> >  
> > -nvme_init_ctrl(n, pci_dev);
> > +cntlid = nvme_init_subsys(n, errp);
> > +if (cntlid < 0) {
> 
> error_propogate();

Thanks for catching this.



Re: [PATCH V4 2/6] hw/block/nvme: support to map controller to a subsystem

2021-01-21 Thread Minwoo Im
On 21-01-21 15:03:38, Keith Busch wrote:
> On Fri, Jan 22, 2021 at 07:09:04AM +0900, Minwoo Im wrote:
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -23,6 +23,7 @@
> >   *  max_ioqpairs=, \
> >   *  aerl=, aer_max_queued=, \
> >   *  mdts=,zoned.append_size_limit= \
> > + *  ,subsys= \
> 
> For consistency, the ',' goes in the preceeding line.

I have no idea what happened here :(.  Will fix it. Thanks!



Re: [PATCH V4 1/6] hw/block/nvme: introduce nvme-subsys device

2021-01-21 Thread Minwoo Im
On 21-01-21 14:52:02, Keith Busch wrote:
> On Fri, Jan 22, 2021 at 07:09:03AM +0900, Minwoo Im wrote:
> > +static void nvme_subsys_setup(NvmeSubsystem *subsys)
> > +{
> > +char *subnqn;
> > +
> > +subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", 
> > subsys->parent_obj.id);
> > +strpadcpy((char *)subsys->subnqn, sizeof(subsys->subnqn), subnqn, 
> > '\0');
> > +g_free(subnqn);
> 
> Instead of the duplication and copy, you could format the string
> directly into the destination:
> 
> snprintf(subsys->subnqn, sizeof(subsys->subnqn), 
> "nqn.2019-08.org.qemu:%s",
>  subsys->parent_obj.id);

Oh, Thanks. Will fix it!



[PATCH V4 5/6] hw/block/nvme: add NMIC enum value for Identify Namespace

2021-01-21 Thread Minwoo Im
Added Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC)
field to support shared namespace from controller(s).

This field is in Identify Namespace data structure in [30].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index d6415a869c1c..ad68cdc2b92d 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1203,6 +1203,10 @@ enum NvmeNsIdentifierType {
 NVME_NIDT_CSI   = 0x04,
 };
 
+enum NvmeIdNsNmic {
+NVME_NMIC_NS_SHARED = 1 << 0,
+};
+
 enum NvmeCsi {
 NVME_CSI_NVM= 0x00,
 NVME_CSI_ZONED  = 0x02,
-- 
2.17.1




[PATCH V4 4/6] hw/block/nvme: support for multi-controller in subsystem

2021-01-21 Thread Minwoo Im
We have nvme-subsys and nvme devices mapped together.  To support
multi-controller scheme to this setup, controller identifier(id) has to
be managed.  Earlier, cntlid(controller id) used to be always 0 because
we didn't have any subsystem scheme that controller id matters.

This patch introduced 'cntlid' attribute to the nvme controller
instance(NvmeCtrl) and make it allocated by the nvme-subsys device
mapped to the controller.  If nvme-subsys is not given to the
controller, then it will always be 0 as it was.

Added 'ctrls' array in the nvme-subsys instance to manage attached
controllers to the subsystem with a limit(32).  This patch didn't take
list for the controllers to make it seamless with nvme-ns device.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.c | 21 +
 hw/block/nvme-subsys.h |  4 
 hw/block/nvme.c| 34 --
 hw/block/nvme.h|  1 +
 4 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index f1dc71d588d9..a01003136b12 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -22,6 +22,27 @@
 #include "nvme.h"
 #include "nvme-subsys.h"
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
+{
+NvmeSubsystem *subsys = n->subsys;
+int cntlid;
+
+for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
+if (!subsys->ctrls[cntlid]) {
+break;
+}
+}
+
+if (cntlid == ARRAY_SIZE(subsys->ctrls)) {
+error_setg(errp, "no more free controller id");
+return -1;
+}
+
+subsys->ctrls[cntlid] = n;
+
+return cntlid;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 char *subnqn;
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 40f06a4c7db0..4eba50d96a1d 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace;
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
 uint8_t subnqn[256];
+
+NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 } NvmeSubsystem;
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ab0531492ddd..225f0d3f3a27 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4427,16 +4427,21 @@ static void nvme_init_subnqn(NvmeCtrl *n)
 }
 }
 
-static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
+static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t cntlid)
 {
 NvmeIdCtrl *id = >id_ctrl;
 uint8_t *pci_conf = pci_dev->config;
 
+n->cntlid = cntlid;
+
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
 strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
 strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
 strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
+
+id->cntlid = cntlid;
+
 id->rab = 6;
 id->ieee[0] = 0x00;
 id->ieee[1] = 0x02;
@@ -4483,6 +4488,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->psd[0].enlat = cpu_to_le32(0x10);
 id->psd[0].exlat = cpu_to_le32(0x4);
 
+if (n->subsys) {
+id->cmic |= NVME_CMIC_MULTI_CTRL;
+}
+
 NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
 NVME_CAP_SET_CQR(n->bar.cap, 1);
 NVME_CAP_SET_TO(n->bar.cap, 0xf);
@@ -4497,11 +4506,28 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 n->bar.intmc = n->bar.intms = 0;
 }
 
+static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
+{
+int cntlid;
+
+if (!n->subsys) {
+return 0;
+}
+
+cntlid = nvme_subsys_register_ctrl(n, errp);
+if (cntlid < 0) {
+return -1;
+}
+
+return cntlid;
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
 NvmeNamespace *ns;
 Error *local_err = NULL;
+int cntlid;
 
 nvme_check_constraints(n, _err);
 if (local_err) {
@@ -4517,7 +4543,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
-nvme_init_ctrl(n, pci_dev);
+cntlid = nvme_init_subsys(n, errp);
+if (cntlid < 0) {
+return;
+}
+nvme_init_ctrl(n, pci_dev, cntlid);
 
 /* setup a namespace if the controller drive property was given */
 if (n->namespace.blkconf.blk) {
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 04d4684601fd..b8f5f2d6ffb8 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -134,6 +134,7 @@ typedef struct NvmeCtrl {
 NvmeBus  bus;
 BlockConfconf;
 
+uint16_tcntlid;
 boolqs_created;
 uint32_tpage_size;
 uint16_tpage_bits;
-- 
2.17.1




[PATCH V4 3/6] hw/block/nvme: add CMIC enum value for Identify Controller

2021-01-21 Thread Minwoo Im
Added Controller Multi-path I/O and Namespace Sharing Capabilities
(CMIC) field to support multi-controller in the following patches.

This field is in Identify Controller data structure in [76].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index e4b918064df9..d6415a869c1c 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1034,6 +1034,10 @@ enum NvmeIdCtrlLpa {
 NVME_LPA_EXTENDED = 1 << 2,
 };
 
+enum NvmeIdCtrlCmic {
+NVME_CMIC_MULTI_CTRL= 1 << 1,
+};
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
-- 
2.17.1




[PATCH V4 6/6] hw/block/nvme: support for shared namespace in subsystem

2021-01-21 Thread Minwoo Im
nvme-ns device is registered to a nvme controller device during the
initialization in nvme_register_namespace() in case that 'bus' property
is given which means it's mapped to a single controller.

This patch introduced a new property 'subsys' just like the controller
device instance did to map a namespace to a NVMe subsystem.

If 'subsys' property is given to the nvme-ns device, it will belong to
the specified subsystem and will be attached to all controllers in that
subsystem by enabling shared namespace capability in NMIC(Namespace
Multi-path I/O and Namespace Capabilities) in Identify Namespace.

Usage:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0
  -device nvme,serial=bar,id=nvme1,subsys=subsys0
  -device nvme,serial=baz,id=nvme2,subsys=subsys0
  -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0  # Shared
  -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2   # Non-shared

  In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in
  the same subsystem.  On the other hand, 'ns2' will be attached to the
  'nvme2' only as a private namespace in that subsystem.

All the namespace with 'subsys' parameter will attach all controllers in
the subsystem to the namespace by default.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-ns.c | 23 ++-
 hw/block/nvme-ns.h |  7 +++
 hw/block/nvme-subsys.c | 25 +
 hw/block/nvme-subsys.h |  3 +++
 hw/block/nvme.c| 10 +-
 5 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 62b25cf69bfa..9b493f2ead03 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
 id_ns->npda = id_ns->npdg = npdg - 1;
 
+if (nvme_ns_shared(ns)) {
+id_ns->nmic |= NVME_NMIC_NS_SHARED;
+}
+
 return 0;
 }
 
@@ -365,16 +369,25 @@ static void nvme_ns_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-if (nvme_register_namespace(n, ns, errp)) {
-error_propagate_prepend(errp, local_err,
-"could not register namespace: ");
-return;
+if (ns->subsys) {
+if (nvme_subsys_register_ns(ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not setup namespace to subsys: ");
+return;
+}
+} else {
+if (nvme_register_namespace(n, ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not register namespace: ");
+return;
+}
 }
-
 }
 
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
+DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 293ac990e3f6..929e78861903 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -47,6 +47,8 @@ typedef struct NvmeNamespace {
 const uint32_t *iocs;
 uint8_t  csi;
 
+NvmeSubsystem   *subsys;
+
 NvmeIdNsZoned   *id_ns_zoned;
 NvmeZone*zone_array;
 QTAILQ_HEAD(, NvmeZone) exp_open_zones;
@@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return -1;
 }
 
+static inline bool nvme_ns_shared(NvmeNamespace *ns)
+{
+return !!ns->subsys;
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = >id_ns;
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index a01003136b12..e7efdcae7d0d 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -43,6 +43,31 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 return cntlid;
 }
 
+int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp)
+{
+NvmeSubsystem *subsys = ns->subsys;
+NvmeCtrl *n;
+int i;
+
+if (subsys->namespaces[nvme_nsid(ns)]) {
+error_setg(errp, "namespace %d already registerd to subsy %s",
+   nvme_nsid(ns), subsys->parent_obj.id);
+return -1;
+}
+
+subsys->namespaces[nvme_nsid(ns)] = ns;
+
+for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+n = subsys->ctrls[i];
+
+if (n && nvme_register_namespace(n, ns, errp)) {
+return -1;
+}
+}
+
+return 0;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 char *subnqn;
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 4eba50d96a1d..ccf6a71398d3 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,6 +14,7 @@
 OBJ

[PATCH V4 0/6] hw/block/nvme: support multi-path for ctrl/ns

2021-01-21 Thread Minwoo Im
Hello,

This series is fourth series for the support of NVMe subsystem scheme
with multi-controller and namespace sharing in a subsystem.

This time, I've removed 'detached' scheme introduced in the previous
series out of this series to focus on subsystem scheme in ths series.
The attach/detach scheme will be introduced in the next series with
ns-mgmt functionality.

Here's an example of how-to:

  # Specify a subsystem
  -device nvme-subsys,id=subsys0 \
  -device nvme,serial=foo,id=nvme0,subsys=subsys0 \
  -device nvme,serial=bar,id=nvme1,subsys=subsys0 \
  -device nvme,serial=baz,id=nvme2,subsys=subsys0 \
  -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \
  -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \
  \
  # Not specify a subsystem
  -device nvme,serial=qux,id=nvme3 \
  -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 \

# nvme list -v
  NVM Express Subsystems

  SubsystemSubsystem-NQN
Controllers
   

 
  nvme-subsys1 nqn.2019-08.org.qemu:subsys0 
nvme0, nvme1, nvme2
  nvme-subsys3 nqn.2019-08.org.qemu:qux 
nvme3

  NVM Express Controllers

  Device   SN   MN   FR 
  TxPort AddressSubsystemNamespaces  
     
 -- --  
  nvme0foo  QEMU NVMe Ctrl   1.0
  pcie   :00:06.0   nvme-subsys1 nvme1c0n1
  nvme1bar  QEMU NVMe Ctrl   1.0
  pcie   :00:07.0   nvme-subsys1 nvme1c1n1
  nvme2baz  QEMU NVMe Ctrl   1.0
  pcie   :00:08.0   nvme-subsys1 nvme1c2n1, nvme1c2n2
  nvme3qux  QEMU NVMe Ctrl   1.0
  pcie   :00:09.0   nvme-subsys3 nvme3c3n1

  NVM Express Namespaces

  Device   NSID Usage  Format   Controllers 

    --  

  nvme1n1  1134.22  MB / 134.22  MB512   B +  0 B   nvme0, 
nvme1, nvme2
  nvme1n2  2268.44  MB / 268.44  MB512   B +  0 B   nvme2
  nvme3n1  3268.44  MB / 268.44  MB512   B +  0 B   nvme3

Thanks,

Since RFC V3:
  - Exclude 'deatched' scheme from this series.  This will be covered in
the next series by covering all the ns-related admin commands
including ZNS and ns-mgmt. (Niklas)
  - Rebased on nvme-next.
  - Remove RFC tag from this V4.

Since RFC V2:
  - Rebased on nvme-next branch with trivial patches from the previous
version(V2) applied. (Klaus)
  - Fix enumeration type name convention with NvmeIdNs prefix. (Klaus)
  - Put 'cntlid' to NvmeCtrl instance in nvme_init_ctrl() which was
missed in V2.
  - Added 'detached' parameter to nvme-ns device to decide whether to
attach or not to controller(s) in the subsystem. (Klaus)
  - Implemented Identify Active Namespace ID List aprt from Identify
Allocated Namespace ID List by removing fall-thru statement.

Since RFC V1:
  - Updated namespace sharing scheme to be based on nvme-subsys
hierarchy.

Minwoo Im (6):
  hw/block/nvme: introduce nvme-subsys device
  hw/block/nvme: support to map controller to a subsystem
  hw/block/nvme: add CMIC enum value for Identify Controller
  hw/block/nvme: support for multi-controller in subsystem
  hw/block/nvme: add NMIC enum value for Identify Namespace
  hw/block/nvme: support for shared namespace in subsystem

 hw/block/meson.build   |   2 +-
 hw/block/nvme-ns.c |  23 +++--
 hw/block/nvme-ns.h |   7 +++
 hw/block/nvme-subsys.c | 109 +
 hw/block/nvme-subsys.h |  32 
 hw/block/nvme.c|  77 ++---
 hw/block/nvme.h|   4 ++
 include/block/nvme.h   |   8 +++
 8 files changed, 249 insertions(+), 13 deletions(-)
 create mode 100644 hw/block/nvme-subsys.c
 create mode 100644 hw/block/nvme-subsys.h

-- 
2.17.1




[PATCH V4 1/6] hw/block/nvme: introduce nvme-subsys device

2021-01-21 Thread Minwoo Im
To support multi-path in QEMU NVMe device model, We need to have NVMe
subsystem hierarchy to map controllers and namespaces to a NVMe
subsystem.

This patch introduced a simple nvme-subsys device model.  The subsystem
will be prepared with subsystem NQN with  provided in
nvme-subsys device:

  ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0

Signed-off-by: Minwoo Im 
---
 hw/block/meson.build   |  2 +-
 hw/block/nvme-subsys.c | 63 ++
 hw/block/nvme-subsys.h | 25 +
 hw/block/nvme.c|  3 ++
 4 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 hw/block/nvme-subsys.c
 create mode 100644 hw/block/nvme-subsys.h

diff --git a/hw/block/meson.build b/hw/block/meson.build
index 602ca6c8541d..83ea2d37978d 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -13,7 +13,7 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: 
files('m25p80.c'))
 softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
 softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c'))
-softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c'))
+softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 
'nvme-subsys.c'))
 
 specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
 specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c'))
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
new file mode 100644
index ..f1dc71d588d9
--- /dev/null
+++ b/hw/block/nvme-subsys.c
@@ -0,0 +1,63 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#include "qemu/units.h"
+#include "qemu/osdep.h"
+#include "qemu/uuid.h"
+#include "qemu/iov.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+#include "hw/block/block.h"
+#include "block/aio.h"
+#include "block/accounting.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+#include "nvme.h"
+#include "nvme-subsys.h"
+
+static void nvme_subsys_setup(NvmeSubsystem *subsys)
+{
+char *subnqn;
+
+subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", subsys->parent_obj.id);
+strpadcpy((char *)subsys->subnqn, sizeof(subsys->subnqn), subnqn, '\0');
+g_free(subnqn);
+}
+
+static void nvme_subsys_realize(DeviceState *dev, Error **errp)
+{
+NvmeSubsystem *subsys = NVME_SUBSYS(dev);
+
+nvme_subsys_setup(subsys);
+}
+
+static void nvme_subsys_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+dc->realize = nvme_subsys_realize;
+dc->desc = "Virtual NVMe subsystem";
+}
+
+static const TypeInfo nvme_subsys_info = {
+.name = TYPE_NVME_SUBSYS,
+.parent = TYPE_DEVICE,
+.class_init = nvme_subsys_class_init,
+.instance_size = sizeof(NvmeSubsystem),
+};
+
+static void nvme_subsys_register_types(void)
+{
+type_register_static(_subsys_info);
+}
+
+type_init(nvme_subsys_register_types)
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
new file mode 100644
index ..40f06a4c7db0
--- /dev/null
+++ b/hw/block/nvme-subsys.h
@@ -0,0 +1,25 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#ifndef NVME_SUBSYS_H
+#define NVME_SUBSYS_H
+
+#define TYPE_NVME_SUBSYS "nvme-subsys"
+#define NVME_SUBSYS(obj) \
+OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
+
+#define NVME_SUBSYS_MAX_CTRLS   32
+
+typedef struct NvmeCtrl NvmeCtrl;
+typedef struct NvmeNamespace NvmeNamespace;
+typedef struct NvmeSubsystem {
+DeviceState parent_obj;
+uint8_t subnqn[256];
+} NvmeSubsystem;
+
+#endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 21aec90637fa..aabccdf36f4b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -25,6 +25,7 @@
  *  mdts=,zoned.append_size_limit= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
+ *  -device nvme-subsys,id=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. By default, the
@@ -38,6 +39,8 @@
  *
  * The PMR will use BAR 4/5 exclusively.
  *
+ * To place controller(s) and namespace(s) to a subsystem, then provide
+ * nvme-subsys device as above.
  *
  * nvme device parameters
  * ~~
-- 
2.17.1




[PATCH V4 2/6] hw/block/nvme: support to map controller to a subsystem

2021-01-21 Thread Minwoo Im
nvme controller(nvme) can be mapped to a NVMe subsystem(nvme-subsys).
This patch maps a controller to a subsystem by adding a parameter
'subsys' to the nvme device.

To map a controller to a subsystem, we need to put nvme-subsys first and
then maps the subsystem to the controller:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0

If 'subsys' property is not given to the nvme controller, then subsystem
NQN will be created with serial (e.g., 'foo' in above example),
Otherwise, it will be based on subsys id (e.g., 'subsys0' in above
example).

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c | 30 ++
 hw/block/nvme.h |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index aabccdf36f4b..ab0531492ddd 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -23,6 +23,7 @@
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
  *  mdts=,zoned.append_size_limit= \
+ *  ,subsys= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
  *  -device nvme-subsys,id=
@@ -44,6 +45,13 @@
  *
  * nvme device parameters
  * ~~
+ * - `subsys`
+ *   NVM Subsystem device. If given, a subsystem NQN will be initialized with
+ *given. Otherwise,  will be taken for subsystem NQN.
+ *   Also, it will enable multi controller capability represented in Identify
+ *   Controller data structure in CMIC (Controller Multi-path I/O and Namesapce
+ *   Sharing Capabilities), if given.
+ *
  * - `aerl`
  *   The Asynchronous Event Request Limit (AERL). Indicates the maximum number
  *   of concurrently outstanding Asynchronous Event Request commands support
@@ -4404,11 +4412,25 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 return 0;
 }
 
+static void nvme_init_subnqn(NvmeCtrl *n)
+{
+NvmeSubsystem *subsys = n->subsys;
+NvmeIdCtrl *id = >id_ctrl;
+char *subnqn;
+
+if (!subsys) {
+subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
+strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
+g_free(subnqn);
+} else {
+pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn);
+}
+}
+
 static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 {
 NvmeIdCtrl *id = >id_ctrl;
 uint8_t *pci_conf = pci_dev->config;
-char *subnqn;
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -4455,9 +4477,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN |
NVME_CTRL_SGLS_BITBUCKET);
 
-subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
-strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
-g_free(subnqn);
+nvme_init_subnqn(n);
 
 id->psd[0].mp = cpu_to_le16(0x9c4);
 id->psd[0].enlat = cpu_to_le32(0x10);
@@ -4545,6 +4565,8 @@ static Property nvme_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf),
 DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
+DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
 DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
 DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index dee6092bd45f..04d4684601fd 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -2,6 +2,7 @@
 #define HW_NVME_H
 
 #include "block/nvme.h"
+#include "nvme-subsys.h"
 #include "nvme-ns.h"
 
 #define NVME_MAX_NAMESPACES 256
@@ -170,6 +171,8 @@ typedef struct NvmeCtrl {
 
 uint8_t zasl;
 
+NvmeSubsystem   *subsys;
+
 NvmeNamespace   namespace;
 NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
 NvmeSQueue  **sq;
-- 
2.17.1




Re: [PATCH v3 12/12] hw/block/nvme: lift cmb restrictions

2021-01-20 Thread Minwoo Im
Nice for codes much more clean.

Reviewed-by: Minwoo Im 



Re: [PATCH v3 07/12] hw/block/nvme: remove redundant zeroing of PMR registers

2021-01-20 Thread Minwoo Im
Reviewed-by: Minwoo Im 



Re: [PATCH v3 10/12] hw/block/nvme: move cmb logic to v1.4

2021-01-20 Thread Minwoo Im
Reviewed-by: Minwoo Im 



Re: [RFC PATCH V3 8/8] hw/block/nvme: Add Identify Active Namespace ID List

2021-01-20 Thread Minwoo Im
> Hello Minwoo,
> 
> By introducing a detached parameter,
> you are also implicitly making the following
> NVMe commands no longer be spec compliant:
> 
> NVME_ID_CNS_NS, NVME_ID_CNS_CS_NS,
> NVME_ID_CNS_NS_ACTIVE_LIST, NVME_ID_CNS_CS_NS_ACTIVE_LIST
> 
> When these commands are called on a detached namespace,
> they should usually return a zero filled data struct.

Agreed.

> Dmitry and I had a patch on V8 on the ZNS series
> that tried to fix some the existing NVMe commands
> to be spec compliant, by handling detached namespaces
> properly. In the end, in order to make it easier to
> get the ZNS series accepted, we decided to drop the
> detached related stuff from the series.
> 
> Feel free to look at that patch for inspiration:
> https://github.com/dmitry-fomichev/qemu/commit/251c0ffee5149c739b1347811fa7e32a1c36bf7c

I've seen this patch and as Klaus said, only thing patches need go with
is to put some of nvme_ns_is_attached() helper among the Identify
handlers.

> I'm not sure if you want to modify all the functions that
> our patch modifies, but I think that you should at least
> modify the following nvme functions:
> 
> nvme_identify_ns()
> nvme_identify_ns_csi()
> nvme_identify_nslist()
> nvme_identify_nslist_csi()

Yes, pretty makes sense to update them.  But now it seems like
'attach/detach' scheme should have been separated out of this series
which just introduced the multi-path for controllers and namespace
sharing.  I will drop this 'detach' scheme out of this series and make
another series to support all of the Identify you mentioned above
cleanly.

> So they handle detached namespaces correctly for both:
> NVME_ID_CNS_NS, NVME_ID_CNS_CS_NS,
> NVME_ID_CNS_NS_ACTIVE_LIST, NVME_ID_CNS_CS_NS_ACTIVE_LIST
> as well as for:
> NVME_ID_CNS_NS_PRESENT, NVME_ID_CNS_CS_NS_PRESENT,
> NVME_ID_CNS_NS_PRESENT_LIST, NVME_ID_CNS_CS_NS_PRESENT_LIST
> 
> 
> Kind regards,
> Niklas



Re: [RFC PATCH V3 0/8] hw/block/nvme: support multi-path for ctrl/ns

2021-01-20 Thread Minwoo Im
On 21-01-20 08:52:14, Klaus Jensen wrote:
> On Jan 20 09:44, Minwoo Im wrote:
> > On 21-01-19 19:18:16, Klaus Jensen wrote:
> > > On Jan 20 02:01, Minwoo Im wrote:
> > > > Hello,
> > > > 
> > > > This patch series is third one to support multi-controller and namespace
> > > > sharing in multi-path.  This series introduced subsystem scheme to
> > > > manage controller(s) and namespace(s) in the subsystem.
> > > > 
> > > > This series has new patches from the V2:  'detached' parameter has been
> > > > added to the nvme-ns device.  This will decide whether to attach the
> > > > namespace to controller(s) in the current subsystem or not.  If it's
> > > > given with true, then it will be just allocated in the subsystem, but
> > > > not attaching to any controllers in the subsystem.  Otherwise, it will
> > > > automatically attach to all the controllers in the subsystem.  The other
> > > > t hing is that the last patch implemented Identify Active Namespace ID
> > > > List command handler apart from the Allocated Namespace ID List.
> > > > 
> > > > Run with:
> > > >   -device nvme,serial=qux,id=nvme3
> > > >   -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3
> > > > 
> > > >   -device nvme-subsys,id=subsys0
> > > >   -device nvme,serial=foo,id=nvme0,subsys=subsys0
> > > >   -device nvme,serial=bar,id=nvme1,subsys=subsys0
> > > >   -device nvme,serial=baz,id=nvme2,subsys=subsys0
> > > >   -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0,detached=true
> > > >   -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2
> > > > 
> > > > nvme-cli:
> > > >   root@vm:~/work# nvme list -v  
> > > > 
> > > >   NVM Express Subsystems
> > > >  
> > > > 
> > > >  
> > > >   SubsystemSubsystem-NQN
> > > > Controllers
> > > >    
> > > > 
> > > >  
> > > >   nvme-subsys0 nqn.2019-08.org.qemu:qux 
> > > > nvme0
> > > >   nvme-subsys1 nqn.2019-08.org.qemu:subsys0 
> > > > nvme1, nvme2, nvme3
> > > > 
> > > >
> > > >   NVM Express Controllers   
> > > > 
> > > > 
> > > >   
> > > >   Device   SN   MN  
> > > >  FR   TxPort AddressSubsystemNamespaces
> > > >     
> > > >   -- -- 
> > > >  
> > > >   nvme0qux  QEMU NVMe Ctrl  
> > > >  1.0  pcie   :00:06.0   nvme-subsys0
> > > 
> > > Shouldn't nvme0n1 be listed under Namespaces for nvme0?
> > 
> > Oh, I missed that one from the output.  As Keith mentioned, I ran the
> > list command again based on the latest nvme-cli.git:
> > 
> > Please refer the following result.  I think it's okay not to send the
> > cover letter again :)
> > 
> > # nvme --version
> > nvme version 1.13.48.g33c6
> > 
> > # nvme list -v
> > NVM Express Subsystems
> > 
> > SubsystemSubsystem-NQN  
> >   Controllers
> >  
> > 
> >  
> > nvme-subsys0 n

Re: [RFC PATCH V3 7/8] hw/block/nvme: add 'detached' param not to attach namespace

2021-01-19 Thread Minwoo Im
> Isn't the HBitmap slightly overkill? Can qemu/bitmap.h suffice?

Definitely, yes, I think.  Current patch series supoprt up to 32
controllers so I think qemu/bitmap.h is enough for us.

Will update the bitmap operations in the next series.



Re: [RFC PATCH V3 0/8] hw/block/nvme: support multi-path for ctrl/ns

2021-01-19 Thread Minwoo Im
> Minwoo, try pulling the most current nvme-cli. There was a sysfs
> scanning bug for non-mpath drives that should be fixed now.

Thank you, Keith!  I've posted list result based on the latest one :)



Re: [RFC PATCH V3 0/8] hw/block/nvme: support multi-path for ctrl/ns

2021-01-19 Thread Minwoo Im
On 21-01-19 19:18:16, Klaus Jensen wrote:
> On Jan 20 02:01, Minwoo Im wrote:
> > Hello,
> > 
> > This patch series is third one to support multi-controller and namespace
> > sharing in multi-path.  This series introduced subsystem scheme to
> > manage controller(s) and namespace(s) in the subsystem.
> > 
> > This series has new patches from the V2:  'detached' parameter has been
> > added to the nvme-ns device.  This will decide whether to attach the
> > namespace to controller(s) in the current subsystem or not.  If it's
> > given with true, then it will be just allocated in the subsystem, but
> > not attaching to any controllers in the subsystem.  Otherwise, it will
> > automatically attach to all the controllers in the subsystem.  The other
> > t hing is that the last patch implemented Identify Active Namespace ID
> > List command handler apart from the Allocated Namespace ID List.
> > 
> > Run with:
> >   -device nvme,serial=qux,id=nvme3
> >   -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3
> > 
> >   -device nvme-subsys,id=subsys0
> >   -device nvme,serial=foo,id=nvme0,subsys=subsys0
> >   -device nvme,serial=bar,id=nvme1,subsys=subsys0
> >   -device nvme,serial=baz,id=nvme2,subsys=subsys0
> >   -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0,detached=true
> >   -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2
> > 
> > nvme-cli:
> >   root@vm:~/work# nvme list -v  
> > 
> >   NVM Express Subsystems
> >  
> > 
> >  
> >   SubsystemSubsystem-NQN
> > Controllers
> >    
> > 
> >  
> >   nvme-subsys0 nqn.2019-08.org.qemu:qux 
> > nvme0
> >   nvme-subsys1 nqn.2019-08.org.qemu:subsys0 
> > nvme1, nvme2, nvme3
> > 
> >
> >   NVM Express Controllers   
> > 
> > 
> >   
> >   Device   SN   MN   FR 
> >   TxPort AddressSubsystemNamespaces
> >      
> >  -- --  
> >   nvme0qux  QEMU NVMe Ctrl   
> > 1.0  pcie   :00:06.0   nvme-subsys0
> 
> Shouldn't nvme0n1 be listed under Namespaces for nvme0?

Oh, I missed that one from the output.  As Keith mentioned, I ran the
list command again based on the latest nvme-cli.git:

Please refer the following result.  I think it's okay not to send the
cover letter again :)

# nvme --version
nvme version 1.13.48.g33c6

# nvme list -v
NVM Express Subsystems

SubsystemSubsystem-NQN  
  Controllers
 

 
nvme-subsys0 nqn.2019-08.org.qemu:qux   
  nvme0
nvme-subsys1 nqn.2019-08.org.qemu:subsys0   
  nvme1, nvme2, nvme3

NVM Express Controllers

Device   SN   MN   FR   
TxPort AddressSubsystemNamespaces  
    
-- --  
nvme0qux  QEMU NVMe Ctrl   1.0  
pcie   :00:06.0   nvme-subsys0 nvme0n1
nvme1foo  QEMU NVMe Ctrl   1.0  
pcie   :00:07.0   nvme-subsys1 
nvme2bar

[RFC PATCH V3 7/8] hw/block/nvme: add 'detached' param not to attach namespace

2021-01-19 Thread Minwoo Im
Introduced 'detached' parameter to nvme-ns device.  If given, the
namespace will not be attached to controller(s) in the subsystem.  If
'subsys' is not given with this option, it should be provided with 'bus'
which is for private namespace.

This patch also introduced 'ctrls_bitmap' in NvmeNamespace instance to
represent which controler id(cntlid) is attached to this namespace
device.  A namespace can be attached to multiple controllers in a
subsystem so that this bitmap maps those two relationships.

The ctrls_bitmap bitmap should not be accessed directly, but through the
helpers introduced in this patch: nvme_ns_is_attached(),
nvme_ns_attach(), nvme_ns_detach().

Note that this patch made identify namespace list data not hold
non-attached namespace ID in nvme_identify_nslist.  Currently, this
command handler is for CNS 0x2(Active) and 0x10(Allocated) both.  The
next patch will introduce a handler for later on.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-ns.c |  9 +
 hw/block/nvme-ns.h |  6 ++
 hw/block/nvme-subsys.c |  2 ++
 hw/block/nvme.c| 31 ++-
 hw/block/nvme.h| 15 +++
 5 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 073f65e49cac..70d42c24065c 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -17,6 +17,7 @@
 #include "qemu/cutils.h"
 #include "qemu/log.h"
 #include "qemu/error-report.h"
+#include "qemu/hbitmap.h"
 #include "hw/block/block.h"
 #include "hw/pci/pci.h"
 #include "sysemu/sysemu.h"
@@ -304,6 +305,11 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, 
Error **errp)
 return 0;
 }
 
+static void nvme_ns_init_state(NvmeNamespace *ns)
+{
+ns->ctrls_bitmap = hbitmap_alloc(NVME_SUBSYS_MAX_CTRLS, 0);
+}
+
 int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 {
 if (nvme_ns_check_constraints(ns, errp)) {
@@ -314,6 +320,8 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 return -1;
 }
 
+nvme_ns_init_state(ns);
+
 if (nvme_ns_init(ns, errp)) {
 return -1;
 }
@@ -381,6 +389,7 @@ static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
  NvmeSubsystem *),
+DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 929e78861903..ad2f55931d1b 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -26,6 +26,7 @@ typedef struct NvmeZone {
 } NvmeZone;
 
 typedef struct NvmeNamespaceParams {
+bool detached;
 uint32_t nsid;
 QemuUUID uuid;
 
@@ -48,6 +49,11 @@ typedef struct NvmeNamespace {
 uint8_t  csi;
 
 NvmeSubsystem   *subsys;
+/*
+ * Whether this namespace is attached to a controller or not.  This bitmap
+ * is based on controller id.  This is valid only in case 'subsys' != NULL.
+ */
+HBitmap *ctrls_bitmap;
 
 NvmeIdNsZoned   *id_ns_zoned;
 NvmeZone*zone_array;
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index e7efdcae7d0d..32ad8ef2825a 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -11,6 +11,7 @@
 #include "qemu/uuid.h"
 #include "qemu/iov.h"
 #include "qemu/cutils.h"
+#include "qemu/hbitmap.h"
 #include "qapi/error.h"
 #include "hw/qdev-properties.h"
 #include "hw/qdev-core.h"
@@ -20,6 +21,7 @@
 #include "sysemu/sysemu.h"
 #include "hw/pci/pci.h"
 #include "nvme.h"
+#include "nvme-ns.h"
 #include "nvme-subsys.h"
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 06bccf1b9e9e..2b2c07b36c2b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -26,7 +26,7 @@
  *  ,subsys= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
- *  subsys=
+ *  subsys=,detached=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
@@ -77,6 +77,13 @@
  *   controllers in the subsystem. Otherwise, `bus` must be given to attach
  *   this namespace to a specified single controller as a non-shared namespace.
  *
+ * - `detached`
+ *   Not to attach the namespace device to controllers in the NVMe subsystem
+ *   during boot-up.  If not given, namespaces are all attached to all
+ *   controllers in the subsystem by default.
+ *   It's mutual exclusive with 'bus' paraemter.  It's only valid in case
+ *   'subsys' is provided.
+ *
  * Setting `zoned` 

[RFC PATCH V3 3/8] hw/block/nvme: add CMIC enum value for Identify Controller

2021-01-19 Thread Minwoo Im
Added Controller Multi-path I/O and Namespace Sharing Capabilities
(CMIC) field to support multi-controller in the following patches.

This field is in Identify Controller data structure in [76].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 45b2678db1f0..733fb35fedde 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -941,6 +941,10 @@ enum NvmeIdCtrlLpa {
 NVME_LPA_EXTENDED = 1 << 2,
 };
 
+enum NvmeIdCtrlCmic {
+NVME_CMIC_MULTI_CTRL= 1 << 1,
+};
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
-- 
2.17.1




[RFC PATCH V3 1/8] hw/block/nvme: introduce nvme-subsys device

2021-01-19 Thread Minwoo Im
To support multi-path in QEMU NVMe device model, We need to have NVMe
subsystem hierarchy to map controllers and namespaces to a NVMe
subsystem.

This patch introduced a simple nvme-subsys device model.  The subsystem
will be prepared with subsystem NQN with  provided in
nvme-subsys device:

  ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0

Signed-off-by: Minwoo Im 
---
 hw/block/meson.build   |  2 +-
 hw/block/nvme-subsys.c | 63 ++
 hw/block/nvme-subsys.h | 25 +
 hw/block/nvme.c|  3 ++
 4 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 hw/block/nvme-subsys.c
 create mode 100644 hw/block/nvme-subsys.h

diff --git a/hw/block/meson.build b/hw/block/meson.build
index 602ca6c8541d..83ea2d37978d 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -13,7 +13,7 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: 
files('m25p80.c'))
 softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
 softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c'))
-softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c'))
+softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 
'nvme-subsys.c'))
 
 specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
 specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c'))
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
new file mode 100644
index ..f1dc71d588d9
--- /dev/null
+++ b/hw/block/nvme-subsys.c
@@ -0,0 +1,63 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#include "qemu/units.h"
+#include "qemu/osdep.h"
+#include "qemu/uuid.h"
+#include "qemu/iov.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+#include "hw/block/block.h"
+#include "block/aio.h"
+#include "block/accounting.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+#include "nvme.h"
+#include "nvme-subsys.h"
+
+static void nvme_subsys_setup(NvmeSubsystem *subsys)
+{
+char *subnqn;
+
+subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", subsys->parent_obj.id);
+strpadcpy((char *)subsys->subnqn, sizeof(subsys->subnqn), subnqn, '\0');
+g_free(subnqn);
+}
+
+static void nvme_subsys_realize(DeviceState *dev, Error **errp)
+{
+NvmeSubsystem *subsys = NVME_SUBSYS(dev);
+
+nvme_subsys_setup(subsys);
+}
+
+static void nvme_subsys_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+dc->realize = nvme_subsys_realize;
+dc->desc = "Virtual NVMe subsystem";
+}
+
+static const TypeInfo nvme_subsys_info = {
+.name = TYPE_NVME_SUBSYS,
+.parent = TYPE_DEVICE,
+.class_init = nvme_subsys_class_init,
+.instance_size = sizeof(NvmeSubsystem),
+};
+
+static void nvme_subsys_register_types(void)
+{
+type_register_static(_subsys_info);
+}
+
+type_init(nvme_subsys_register_types)
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
new file mode 100644
index ..40f06a4c7db0
--- /dev/null
+++ b/hw/block/nvme-subsys.h
@@ -0,0 +1,25 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#ifndef NVME_SUBSYS_H
+#define NVME_SUBSYS_H
+
+#define TYPE_NVME_SUBSYS "nvme-subsys"
+#define NVME_SUBSYS(obj) \
+OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
+
+#define NVME_SUBSYS_MAX_CTRLS   32
+
+typedef struct NvmeCtrl NvmeCtrl;
+typedef struct NvmeNamespace NvmeNamespace;
+typedef struct NvmeSubsystem {
+DeviceState parent_obj;
+uint8_t subnqn[256];
+} NvmeSubsystem;
+
+#endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 309c26db8ff7..4644937a5c50 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -25,6 +25,7 @@
  *  mdts=,zoned.append_size_limit= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
+ *  -device nvme-subsys,id=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
@@ -37,6 +38,8 @@
  * -object memory-backend-file,id=,share=on,mem-path=, \
  *  size=  -device nvme,...,pmrdev=
  *
+ * To place controller(s) and namespace(s) to a subsystem, then provide
+ * nvme-subsys device as above.
  *
  * nvme device parameters
  * ~~
-- 
2.17.1




[RFC PATCH V3 8/8] hw/block/nvme: Add Identify Active Namespace ID List

2021-01-19 Thread Minwoo Im
Spec v1.4b 6.1.4 "Active and Inactive NSID Types" says:

"Active NSIDs for a controller refer to namespaces that are attached to
that controller. Allocated NSIDs that are inactive for a controller refer
to namespaces that are not attached to that controller."

This patch introduced for Identify Active Namespace ID List (CNS 02h).

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c | 39 ++-
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 2b2c07b36c2b..7247167b0ee6 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2883,6 +2883,39 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
+static uint16_t nvme_identify_nslist_active(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeNamespace *ns;
+NvmeIdentify *c = (NvmeIdentify *)>cmd;
+uint32_t min_nsid = le32_to_cpu(c->nsid);
+uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {};
+static const int data_len = sizeof(list);
+uint32_t *list_ptr = (uint32_t *)list;
+int i, j = 0;
+
+if (min_nsid >= NVME_NSID_BROADCAST - 1) {
+return NVME_INVALID_NSID | NVME_DNR;
+}
+
+for (i = 1; i <= n->num_namespaces; i++) {
+ns = nvme_ns(n, i);
+if (!ns || ns->params.nsid <= min_nsid) {
+continue;
+}
+
+if (!nvme_ns_is_attached(n, ns)) {
+continue;
+}
+
+list_ptr[j++] = cpu_to_le32(ns->params.nsid);
+if (j == data_len / sizeof(uint32_t)) {
+break;
+}
+}
+
+return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req);
+}
+
 static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeNamespace *ns;
@@ -2914,10 +2947,6 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 continue;
 }
 
-if (!nvme_ns_is_attached(n, ns)) {
-continue;
-}
-
 list_ptr[j++] = cpu_to_le32(ns->params.nsid);
 if (j == data_len / sizeof(uint32_t)) {
 break;
@@ -3045,7 +3074,7 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest 
*req)
 case NVME_ID_CNS_CS_CTRL:
 return nvme_identify_ctrl_csi(n, req);
 case NVME_ID_CNS_NS_ACTIVE_LIST:
- /* fall through */
+ return nvme_identify_nslist_active(n, req);
 case NVME_ID_CNS_NS_PRESENT_LIST:
 return nvme_identify_nslist(n, req);
 case NVME_ID_CNS_CS_NS_ACTIVE_LIST:
-- 
2.17.1




[RFC PATCH V3 0/8] hw/block/nvme: support multi-path for ctrl/ns

2021-01-19 Thread Minwoo Im
Hello,

This patch series is third one to support multi-controller and namespace
sharing in multi-path.  This series introduced subsystem scheme to
manage controller(s) and namespace(s) in the subsystem.

This series has new patches from the V2:  'detached' parameter has been
added to the nvme-ns device.  This will decide whether to attach the
namespace to controller(s) in the current subsystem or not.  If it's
given with true, then it will be just allocated in the subsystem, but
not attaching to any controllers in the subsystem.  Otherwise, it will
automatically attach to all the controllers in the subsystem.  The other
t hing is that the last patch implemented Identify Active Namespace ID
List command handler apart from the Allocated Namespace ID List.

Run with:
  -device nvme,serial=qux,id=nvme3
  -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0
  -device nvme,serial=bar,id=nvme1,subsys=subsys0
  -device nvme,serial=baz,id=nvme2,subsys=subsys0
  -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0,detached=true
  -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2

nvme-cli:
  root@vm:~/work# nvme list -v  

  NVM Express Subsystems
 

 
  SubsystemSubsystem-NQN
Controllers
   

 
  nvme-subsys0 nqn.2019-08.org.qemu:qux 
nvme0
  nvme-subsys1 nqn.2019-08.org.qemu:subsys0 
nvme1, nvme2, nvme3

   
  NVM Express Controllers   


  
  Device   SN   MN   FR 
  TxPort AddressSubsystemNamespaces
     
 -- --  
  nvme0qux  QEMU NVMe Ctrl   1.0
  pcie   :00:06.0   nvme-subsys0
  nvme1foo  QEMU NVMe Ctrl   1.0
  pcie   :00:07.0   nvme-subsys1
  nvme2bar  QEMU NVMe Ctrl   1.0
  pcie   :00:08.0   nvme-subsys1
  nvme3baz  QEMU NVMe Ctrl   1.0
  pcie   :00:09.0   nvme-subsys1 nvme1n1

  
  NVM Express Namespaces 
   
  Device   NSID Usage  Format   Controllers
    --  

  nvme0n1  3268.44  MB / 268.44  MB512   B +  0 B   nvme0
  nvme1n1  2268.44  MB / 268.44  MB512   B +  0 B   nvme3

Now we have [allocated|not-allocated]/[attached/detached] scheme for
namespaces and controllers.  The next series is going to be to support
namespace management and attachment with few Identify command handlers.

Please review.

Thanks!

Since RFC V2:
  - Rebased on nvme-next branch with trivial patches from the previous
version(V2) applied. (Klaus)
  - Fix enumeration type name convention with NvmeIdNs prefix. (Klaus)
  - Put 'cntlid' to NvmeCtrl instance in nvme_init_ctrl() which was
missed in V2.
  - Added 'detached' parameter to nvme-ns device to decide whether to
attach or not to controller(s) in the subsystem. (Klaus)
  - Implemented Identify Active Namespace ID List aprt from Identify
Allocated Namespace ID List by removing fall-thru statement.

Since RFC V1:
  - Updated namespace sharing scheme to be based on nvme-subsys
hierarchy.

Minwoo Im (8):
  hw/block/nvme: introduce nvme-subsys device
  hw/block/nvme: support to map controller to a subsystem
  hw/block/nvme: add CMIC enum value for Identify Controller
  hw/block/nvme: support for multi-controller in subsystem
  hw/block/nvme: add NMIC enum value for Identify Namespace
  hw/block/nvme: support for shared namespace in subsystem
  hw

[RFC PATCH V3 6/8] hw/block/nvme: support for shared namespace in subsystem

2021-01-19 Thread Minwoo Im
nvme-ns device is registered to a nvme controller device during the
initialization in nvme_register_namespace() in case that 'bus' property
is given which means it's mapped to a single controller.

This patch introduced a new property 'subsys' just like the controller
device instance did to map a namespace to a NVMe subsystem.

If 'subsys' property is given to the nvme-ns device, it will belong to
the specified subsystem and will be attached to all controllers in that
subsystem by enabling shared namespace capability in NMIC(Namespace
Multi-path I/O and Namespace Capabilities) in Identify Namespace.

Usage:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0
  -device nvme,serial=bar,id=nvme1,subsys=subsys0
  -device nvme,serial=baz,id=nvme2,subsys=subsys0
  -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0  # Shared
  -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2   # Non-shared

  In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in
  the same subsystem.  On the other hand, 'ns2' will be attached to the
  'nvme2' only as a private namespace in that subsystem.

All the namespace with 'subsys' parameter will attach all controllers in
the subsystem to the namespace by default.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-ns.c | 23 ++-
 hw/block/nvme-ns.h |  7 +++
 hw/block/nvme-subsys.c | 25 +
 hw/block/nvme-subsys.h |  3 +++
 hw/block/nvme.c| 10 +-
 5 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index c8b75fa02138..073f65e49cac 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
 id_ns->npda = id_ns->npdg = npdg - 1;
 
+if (nvme_ns_shared(ns)) {
+id_ns->nmic |= NVME_NMIC_NS_SHARED;
+}
+
 return 0;
 }
 
@@ -358,16 +362,25 @@ static void nvme_ns_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-if (nvme_register_namespace(n, ns, errp)) {
-error_propagate_prepend(errp, local_err,
-"could not register namespace: ");
-return;
+if (ns->subsys) {
+if (nvme_subsys_register_ns(ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not setup namespace to subsys: ");
+return;
+}
+} else {
+if (nvme_register_namespace(n, ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not register namespace: ");
+return;
+}
 }
-
 }
 
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
+DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 293ac990e3f6..929e78861903 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -47,6 +47,8 @@ typedef struct NvmeNamespace {
 const uint32_t *iocs;
 uint8_t  csi;
 
+NvmeSubsystem   *subsys;
+
 NvmeIdNsZoned   *id_ns_zoned;
 NvmeZone*zone_array;
 QTAILQ_HEAD(, NvmeZone) exp_open_zones;
@@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return -1;
 }
 
+static inline bool nvme_ns_shared(NvmeNamespace *ns)
+{
+return !!ns->subsys;
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = >id_ns;
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index a01003136b12..e7efdcae7d0d 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -43,6 +43,31 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 return cntlid;
 }
 
+int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp)
+{
+NvmeSubsystem *subsys = ns->subsys;
+NvmeCtrl *n;
+int i;
+
+if (subsys->namespaces[nvme_nsid(ns)]) {
+error_setg(errp, "namespace %d already registerd to subsy %s",
+   nvme_nsid(ns), subsys->parent_obj.id);
+return -1;
+}
+
+subsys->namespaces[nvme_nsid(ns)] = ns;
+
+for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+n = subsys->ctrls[i];
+
+if (n && nvme_register_namespace(n, ns, errp)) {
+return -1;
+}
+}
+
+return 0;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 char *subnqn;
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 4eba50d96a1d..ccf6a71398d3 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,6 +14,7 @@
 OBJ

[RFC PATCH V3 4/8] hw/block/nvme: support for multi-controller in subsystem

2021-01-19 Thread Minwoo Im
We have nvme-subsys and nvme devices mapped together.  To support
multi-controller scheme to this setup, controller identifier(id) has to
be managed.  Earlier, cntlid(controller id) used to be always 0 because
we didn't have any subsystem scheme that controller id matters.

This patch introduced 'cntlid' attribute to the nvme controller
instance(NvmeCtrl) and make it allocated by the nvme-subsys device
mapped to the controller.  If nvme-subsys is not given to the
controller, then it will always be 0 as it was.

Added 'ctrls' array in the nvme-subsys instance to manage attached
controllers to the subsystem with a limit(32).  This patch didn't take
list for the controllers to make it seamless with nvme-ns device.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.c | 21 +
 hw/block/nvme-subsys.h |  4 
 hw/block/nvme.c| 34 --
 hw/block/nvme.h|  1 +
 4 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index f1dc71d588d9..a01003136b12 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -22,6 +22,27 @@
 #include "nvme.h"
 #include "nvme-subsys.h"
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
+{
+NvmeSubsystem *subsys = n->subsys;
+int cntlid;
+
+for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
+if (!subsys->ctrls[cntlid]) {
+break;
+}
+}
+
+if (cntlid == ARRAY_SIZE(subsys->ctrls)) {
+error_setg(errp, "no more free controller id");
+return -1;
+}
+
+subsys->ctrls[cntlid] = n;
+
+return cntlid;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 char *subnqn;
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 40f06a4c7db0..4eba50d96a1d 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace;
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
 uint8_t subnqn[256];
+
+NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 } NvmeSubsystem;
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 3e3b5451ea3d..9f8a739fcd8f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4304,16 +4304,21 @@ static void nvme_init_subnqn(NvmeCtrl *n)
 }
 }
 
-static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
+static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t cntlid)
 {
 NvmeIdCtrl *id = >id_ctrl;
 uint8_t *pci_conf = pci_dev->config;
 
+n->cntlid = cntlid;
+
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
 strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
 strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
 strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
+
+id->cntlid = cntlid;
+
 id->rab = 6;
 id->ieee[0] = 0x00;
 id->ieee[1] = 0x02;
@@ -4359,6 +4364,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->psd[0].enlat = cpu_to_le32(0x10);
 id->psd[0].exlat = cpu_to_le32(0x4);
 
+if (n->subsys) {
+id->cmic |= NVME_CMIC_MULTI_CTRL;
+}
+
 NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
 NVME_CAP_SET_CQR(n->bar.cap, 1);
 NVME_CAP_SET_TO(n->bar.cap, 0xf);
@@ -4371,11 +4380,28 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 n->bar.intmc = n->bar.intms = 0;
 }
 
+static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
+{
+int cntlid;
+
+if (!n->subsys) {
+return 0;
+}
+
+cntlid = nvme_subsys_register_ctrl(n, errp);
+if (cntlid < 0) {
+return -1;
+}
+
+return cntlid;
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
 NvmeNamespace *ns;
 Error *local_err = NULL;
+int cntlid;
 
 nvme_check_constraints(n, _err);
 if (local_err) {
@@ -4391,7 +4417,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
-nvme_init_ctrl(n, pci_dev);
+cntlid = nvme_init_subsys(n, errp);
+if (cntlid < 0) {
+return;
+}
+nvme_init_ctrl(n, pci_dev, cntlid);
 
 /* setup a namespace if the controller drive property was given */
 if (n->namespace.blkconf.blk) {
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 3fa0e0a15539..c158cc873b59 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -133,6 +133,7 @@ typedef struct NvmeCtrl {
 NvmeBus  bus;
 BlockConfconf;
 
+uint16_tcntlid;
 boolqs_created;
 uint32_tpage_size;
 uint16_tpage_bits;
-- 
2.17.1




[RFC PATCH V3 2/8] hw/block/nvme: support to map controller to a subsystem

2021-01-19 Thread Minwoo Im
nvme controller(nvme) can be mapped to a NVMe subsystem(nvme-subsys).
This patch maps a controller to a subsystem by adding a parameter
'subsys' to the nvme device.

To map a controller to a subsystem, we need to put nvme-subsys first and
then maps the subsystem to the controller:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0

If 'subsys' property is not given to the nvme controller, then subsystem
NQN will be created with serial (e.g., 'foo' in above example),
Otherwise, it will be based on subsys id (e.g., 'subsys0' in above
example).

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c | 30 ++
 hw/block/nvme.h |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 4644937a5c50..3e3b5451ea3d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -23,6 +23,7 @@
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
  *  mdts=,zoned.append_size_limit= \
+ *  ,subsys= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
  *  -device nvme-subsys,id=
@@ -43,6 +44,13 @@
  *
  * nvme device parameters
  * ~~
+ * - `subsys`
+ *   NVM Subsystem device. If given, a subsystem NQN will be initialized with
+ *given. Otherwise,  will be taken for subsystem NQN.
+ *   Also, it will enable multi controller capability represented in Identify
+ *   Controller data structure in CMIC (Controller Multi-path I/O and Namesapce
+ *   Sharing Capabilities), if given.
+ *
  * - `aerl`
  *   The Asynchronous Event Request Limit (AERL). Indicates the maximum number
  *   of concurrently outstanding Asynchronous Event Request commands support
@@ -4281,11 +4289,25 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 return 0;
 }
 
+static void nvme_init_subnqn(NvmeCtrl *n)
+{
+NvmeSubsystem *subsys = n->subsys;
+NvmeIdCtrl *id = >id_ctrl;
+char *subnqn;
+
+if (!subsys) {
+subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
+strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
+g_free(subnqn);
+} else {
+pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn);
+}
+}
+
 static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 {
 NvmeIdCtrl *id = >id_ctrl;
 uint8_t *pci_conf = pci_dev->config;
-char *subnqn;
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -4331,9 +4353,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN |
NVME_CTRL_SGLS_BITBUCKET);
 
-subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
-strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
-g_free(subnqn);
+nvme_init_subnqn(n);
 
 id->psd[0].mp = cpu_to_le16(0x9c4);
 id->psd[0].enlat = cpu_to_le32(0x10);
@@ -4417,6 +4437,8 @@ static void nvme_exit(PCIDevice *pci_dev)
 
 static Property nvme_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf),
+DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmrdev, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
 DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 347c149e7905..3fa0e0a15539 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -2,6 +2,7 @@
 #define HW_NVME_H
 
 #include "block/nvme.h"
+#include "nvme-subsys.h"
 #include "nvme-ns.h"
 
 #define NVME_MAX_NAMESPACES 256
@@ -158,6 +159,8 @@ typedef struct NvmeCtrl {
 
 uint8_t zasl;
 
+NvmeSubsystem   *subsys;
+
 NvmeNamespace   namespace;
 NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
 NvmeSQueue  **sq;
-- 
2.17.1




[RFC PATCH V3 5/8] hw/block/nvme: add NMIC enum value for Identify Namespace

2021-01-19 Thread Minwoo Im
Added Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC)
field to support shared namespace from controller(s).

This field is in Identify Namespace data structure in [30].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 733fb35fedde..28404a62b9d2 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1110,6 +1110,10 @@ enum NvmeNsIdentifierType {
 NVME_NIDT_CSI   = 0x04,
 };
 
+enum NvmeIdNsNmic {
+NVME_NMIC_NS_SHARED = 1 << 0,
+};
+
 enum NvmeCsi {
 NVME_CSI_NVM= 0x00,
 NVME_CSI_ZONED  = 0x02,
-- 
2.17.1




Re: [PATCH v3 10/12] hw/block/nvme: move cmb logic to v1.4

2021-01-19 Thread Minwoo Im
On 21-01-19 11:15:02, Klaus Jensen wrote:
> From: Padmakar Kalghatgi 
> 
> Implement v1.4 logic for configuring the Controller Memory Buffer. This
> is not backward compatible with v1.3, so drivers that only support v1.3
> will not be able to use the CMB anymore.

Reviewed the legacy-cmb paramete, but we should update the commit
description up there as we can support v1.3 behavior also, or you can
update it when you are picking up :)



Re: [RFC PATCH V2 02/11] hw/block/nvme: open code for volatile write cache

2021-01-19 Thread Minwoo Im
On 21-01-19 08:32:21, Klaus Jensen wrote:
> On Jan 17 23:53, Minwoo Im wrote:
> > Volatile Write Cache(VWC) feature is set in nvme_ns_setup() in the
> > initial time.  This feature is related to block device backed,  but this
> > feature is controlled in controller level via Set/Get Features command.
> > 
> > This patch removed dependency between nvme and nvme-ns to manage the VWC
> > flag value.  Also, it open coded the Get Features for VWC to check all
> > namespaces attached to the controller, and if false detected, return
> > directly false.
> > 
> > Signed-off-by: Minwoo Im 
> > ---
> >  hw/block/nvme-ns.c |  4 
> >  hw/block/nvme.c| 15 ---
> >  hw/block/nvme.h|  1 -
> >  3 files changed, 12 insertions(+), 8 deletions(-)
> > 
> > diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> > index 32662230130b..c403cd36b6bd 100644
> > --- a/hw/block/nvme-ns.c
> > +++ b/hw/block/nvme-ns.c
> > @@ -89,10 +89,6 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace 
> > *ns, Error **errp)
> >  return -1;
> >  }
> >  
> > -if (blk_enable_write_cache(ns->blkconf.blk)) {
> > -n->features.vwc = 0x1;
> > -}
> > -
> >  return 0;
> >  }
> >  
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index cf0fe28fe6eb..b2a9c48a9d81 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -3033,6 +3033,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, 
> > NvmeRequest *req)
> >  NvmeGetFeatureSelect sel = NVME_GETFEAT_SELECT(dw10);
> >  uint16_t iv;
> >  NvmeNamespace *ns;
> > +int i;
> >  
> >  static const uint32_t nvme_feature_default[NVME_FID_MAX] = {
> >  [NVME_ARBITRATION] = NVME_ARB_AB_NOLIMIT,
> > @@ -3108,7 +3109,17 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, 
> > NvmeRequest *req)
> >  result = ns->features.err_rec;
> >  goto out;
> >  case NVME_VOLATILE_WRITE_CACHE:
> > -result = n->features.vwc;
> > +for (i = 1; i <= n->num_namespaces; i++) {
> > +ns = nvme_ns(n, i);
> > +if (!ns) {
> > +continue;
> > +}
> > +
> > +result = blk_enable_write_cache(ns->blkconf.blk);
> > +if (!result) {
> > +break;
> > +}
> > +}
> 
> I did a small tweak here and changed `if (!result)` to `if (result)`. We
> want to report that a volatile write cache is present if ANY of the
> namespace backing devices have a write cache.

Oh Thanks for the fix!



Re: [RFC PATCH V2 00/11] hw/block/nvme: support multi-path for ctrl/ns

2021-01-18 Thread Minwoo Im
On 21-01-19 07:04:04, Klaus Jensen wrote:
> On Jan 19 12:21, Minwoo Im wrote:
> > On 21-01-18 22:14:45, Klaus Jensen wrote:
> > > On Jan 17 23:53, Minwoo Im wrote:
> > > > Hello,
> > > > 
> > > > This patch series introduces NVMe subsystem device to support multi-path
> > > > I/O in NVMe device model.  Two use-cases are supported along with this
> > > > patch: Multi-controller, Namespace Sharing.
> > > > 
> > > > V1 RFC has been discussed with Klaus and Keith, I really appreciate them
> > > > for this patch series to have proper direction [1].
> > > > 
> > > > This patch series contains few start-up refactoring pathces from the
> > > > first to fifth patches to make nvme-ns device not to rely on the nvme
> > > > controller always.  Because nvme-ns shall be able to be mapped to the
> > > > subsystem level, not a single controller level so that it should provide
> > > > generic initialization code: nvme_ns_setup() with NvmeCtrl.  To do that,
> > > > the first five patches are to remove the NvmeCtrl * instance argument
> > > > from the nvme_ns_setup().  I'd be happy if they are picked!
> > > > 
> > > > For controller and namespace devices, 'subsys' property has been
> > > > introduced to map them to a subsystem.  If multi-controller needed, we
> > > > can specify 'subsys' to controllers the same.
> > > > 
> > > > For namespace deivice, if 'subsys' is not given just like it was, it
> > > > will have to be provided with 'bus' parameter to specify a nvme
> > > > controller device to attach, it means, they are mutual-exlusive.  To
> > > > share a namespace between or among controllers, then nvme-ns should have
> > > > 'subsys' property to a single nvme subsystem instance.  To make a
> > > > namespace private one, then we need to specify 'bus' property rather
> > > > than the 'subsys'.
> > > > 
> > > > Of course, this series does not require any updates for the run command
> > > > for the previos users.
> > > > 
> > > > Plase refer the following example with nvme-cli output:
> > > > 
> > > > QEMU Run:
> > > >   -device nvme-subsys,id=subsys0 \
> > > >   -device nvme,serial=foo,id=nvme0,subsys=subsys0 \
> > > >   -device nvme,serial=bar,id=nvme1,subsys=subsys0 \
> > > >   -device nvme,serial=baz,id=nvme2,subsys=subsys0 \
> > > >   -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \
> > > >   -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \
> > > >   \
> > > >   -device nvme,serial=qux,id=nvme3 \
> > > >   -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3
> > > > 
> > > > nvme-cli:
> > > >   root@vm:~/work# nvme list -v
> > > >   NVM Express Subsystems
> > > > 
> > > >   SubsystemSubsystem-NQN
> > > > Controllers
> > > >    
> > > > 
> > > >  
> > > >   nvme-subsys1 nqn.2019-08.org.qemu:subsys0 
> > > > nvme0, nvme1, nvme2
> > > >   nvme-subsys3 nqn.2019-08.org.qemu:qux 
> > > > nvme3
> > > > 
> > > >   NVM Express Controllers
> > > > 
> > > >   Device   SN   MN  
> > > >  FR   TxPort AddressSubsystemNamespaces
> > > >     
> > > >   -- -- 
> > > >  
> > > >   nvme0foo  QEMU NVMe Ctrl  
> > > >  1.0  pcie   :00:06.0   nvme-subsys1 nvme1n1
> > > >   nvme1bar  QEMU NVMe Ctrl  
> > > >  1.0  pcie   :00:07.0   nvme-subsys1 nvme1n1
> > > >   nvme2baz  QEMU NVMe Ctrl  
> > > >  1.0  pcie   :00:08.0   nvme-subsys1 nvme1n1, nvme1n2
> > > >   nvme3qux  QEMU NVMe Ctrl  
> >

Re: [RFC PATCH V2 00/11] hw/block/nvme: support multi-path for ctrl/ns

2021-01-18 Thread Minwoo Im
On 21-01-18 22:14:45, Klaus Jensen wrote:
> On Jan 17 23:53, Minwoo Im wrote:
> > Hello,
> > 
> > This patch series introduces NVMe subsystem device to support multi-path
> > I/O in NVMe device model.  Two use-cases are supported along with this
> > patch: Multi-controller, Namespace Sharing.
> > 
> > V1 RFC has been discussed with Klaus and Keith, I really appreciate them
> > for this patch series to have proper direction [1].
> > 
> > This patch series contains few start-up refactoring pathces from the
> > first to fifth patches to make nvme-ns device not to rely on the nvme
> > controller always.  Because nvme-ns shall be able to be mapped to the
> > subsystem level, not a single controller level so that it should provide
> > generic initialization code: nvme_ns_setup() with NvmeCtrl.  To do that,
> > the first five patches are to remove the NvmeCtrl * instance argument
> > from the nvme_ns_setup().  I'd be happy if they are picked!
> > 
> > For controller and namespace devices, 'subsys' property has been
> > introduced to map them to a subsystem.  If multi-controller needed, we
> > can specify 'subsys' to controllers the same.
> > 
> > For namespace deivice, if 'subsys' is not given just like it was, it
> > will have to be provided with 'bus' parameter to specify a nvme
> > controller device to attach, it means, they are mutual-exlusive.  To
> > share a namespace between or among controllers, then nvme-ns should have
> > 'subsys' property to a single nvme subsystem instance.  To make a
> > namespace private one, then we need to specify 'bus' property rather
> > than the 'subsys'.
> > 
> > Of course, this series does not require any updates for the run command
> > for the previos users.
> > 
> > Plase refer the following example with nvme-cli output:
> > 
> > QEMU Run:
> >   -device nvme-subsys,id=subsys0 \
> >   -device nvme,serial=foo,id=nvme0,subsys=subsys0 \
> >   -device nvme,serial=bar,id=nvme1,subsys=subsys0 \
> >   -device nvme,serial=baz,id=nvme2,subsys=subsys0 \
> >   -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \
> >   -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \
> >   \
> >   -device nvme,serial=qux,id=nvme3 \
> >   -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3
> > 
> > nvme-cli:
> >   root@vm:~/work# nvme list -v
> >   NVM Express Subsystems
> > 
> >   SubsystemSubsystem-NQN
> > Controllers
> >    
> > 
> >  
> >   nvme-subsys1 nqn.2019-08.org.qemu:subsys0 
> > nvme0, nvme1, nvme2
> >   nvme-subsys3 nqn.2019-08.org.qemu:qux 
> > nvme3
> > 
> >   NVM Express Controllers
> > 
> >   Device   SN   MN   FR 
> >   TxPort AddressSubsystemNamespaces
> >      
> >  -- --  
> >   nvme0foo  QEMU NVMe Ctrl   
> > 1.0  pcie   :00:06.0   nvme-subsys1 nvme1n1
> >   nvme1bar  QEMU NVMe Ctrl   
> > 1.0  pcie   :00:07.0   nvme-subsys1 nvme1n1
> >   nvme2baz  QEMU NVMe Ctrl   
> > 1.0  pcie   :00:08.0   nvme-subsys1 nvme1n1, nvme1n2
> >   nvme3qux  QEMU NVMe Ctrl   
> > 1.0  pcie   :00:09.0   nvme-subsys3
> > 
> >   NVM Express Namespaces
> > 
> >   Device   NSID Usage  Format   
> > Controllers
> >     --  
> > 
> >   nvme1n1  1134.22  MB / 134.22  MB512   B +  0 B   nvme0, 
> > nvme1, nvme2
> >   nvme1n2  2268.44  MB / 268.44  MB512   B +  0 B   nvme2
> >   nvme3n1  3268.44  MB / 268.44  MB512   B +  0 B   nvme3
> > 
> > Summary:
> >   - Refactored nvme-ns device not to rely on controller during the
> > setup.  [1/11 - 5/11]
> >   - Introduced a nvme-subsys device model. [6/11]
> >   - Create subsystem NQN based on subsystem. [7/11]

Re: [RFC PATCH V2 10/11] hw/block/nvme: add NMIC enum value for Identify Namespace

2021-01-18 Thread Minwoo Im
> Let's keep convention (or should be convention...) of using NvmeIdNs
> prefix for identify data structure fields on the enum.

Okay!



Re: [PATCH v2 10/12] hw/block/nvme: move cmb logic to v1.4

2021-01-18 Thread Minwoo Im
On 21-01-18 20:23:30, Klaus Jensen wrote:
> On Jan 18 14:22, Klaus Jensen wrote:
> > On Jan 18 22:12, Minwoo Im wrote:
> > > On 21-01-18 14:10:50, Klaus Jensen wrote:
> > > > On Jan 18 22:09, Minwoo Im wrote:
> > > > > > > Yes, CMB in v1.4 is not backward-compatible, but is it okay to 
> > > > > > > move onto
> > > > > > > the CMB v1.4 from v1.3 without supporting the v1.3 usage on this 
> > > > > > > device
> > > > > > > model?
> > > > > > 
> > > > > > Next patch moves to v1.4. I wanted to split it because the "bump" 
> > > > > > patch
> > > > > > also adds a couple of other v1.4 requires fields. But I understand 
> > > > > > that
> > > > > > this is slightly wrong.
> > > > > 
> > > > > Sorry, I meant I'd like to have CMB for v1.3 support along with the 
> > > > > v1.4
> > > > > support in this device model separately.  Maybe I missed the 
> > > > > linux-nvme
> > > > > mailing list for CMB v1.4, but is there no plan to keep supportin CMB
> > > > > v1.3 in NVMe driver?
> > > > 
> > > > I posted a patch on linux-nvme for v1.4 support in the kernel. It will
> > > > support both the v1.3 and v1.4 behavior :)
> > > 
> > > Then, can we maintain CMB v1.3 also in QEMU also along with v1.4 ? :)
> > 
> > My first version of this patch actually included compatibility support
> > for v1.3 ("legacy cmb"), but Keith suggested we just drop that and keep
> > this device compliant.
> > 
> > What we could do is allow the spec version to be chosen with a
> > parameter?
> 
> Uhm. Maybe not. I gave this some more thought.
> 
> Adding a device parameter to choose the specification version requires
> us to maintain QEMU "compat" properties across different QEMU version.
> I'm not sure we want that for something like this.

Agreed.  The default should head for latest one.

> 
> Maybe the best course of action actually *is* an 'x-legacy-cmb'
> parameter.

This looks fine.

As kernel driver does not remove the v1.3 CMB support, then it would be
great if QEMU suports that also.



Re: [PATCH v2 02/12] hw/block/nvme: fix 64 bit register hi/lo split writes

2021-01-18 Thread Minwoo Im
On 21-01-18 20:53:24, Klaus Jensen wrote:
> On Jan 18 21:59, Minwoo Im wrote:
> > > The spec requires aligned 32 bit accesses (or the size of the register),
> > > so maybe it's about time we actually ignore instead of falling through.
> > 
> > Agreed.
> > 
> 
> On the other hand.
> 
> The spec just allows undefined behavior if the alignment or size
> requirement is violated. So falling through is not wrong.

If so, maybe we just can make this MMIO region support under 4bytes
access with error messaging.  I don't think we should not support them
on purpose because spec says it just results in undefined behaviors
which is just undefined how to handle them.



Re: [PATCH v2 10/12] hw/block/nvme: move cmb logic to v1.4

2021-01-18 Thread Minwoo Im
On 21-01-18 14:10:50, Klaus Jensen wrote:
> On Jan 18 22:09, Minwoo Im wrote:
> > > > Yes, CMB in v1.4 is not backward-compatible, but is it okay to move onto
> > > > the CMB v1.4 from v1.3 without supporting the v1.3 usage on this device
> > > > model?
> > > 
> > > Next patch moves to v1.4. I wanted to split it because the "bump" patch
> > > also adds a couple of other v1.4 requires fields. But I understand that
> > > this is slightly wrong.
> > 
> > Sorry, I meant I'd like to have CMB for v1.3 support along with the v1.4
> > support in this device model separately.  Maybe I missed the linux-nvme
> > mailing list for CMB v1.4, but is there no plan to keep supportin CMB
> > v1.3 in NVMe driver?
> 
> I posted a patch on linux-nvme for v1.4 support in the kernel. It will
> support both the v1.3 and v1.4 behavior :)

Then, can we maintain CMB v1.3 also in QEMU also along with v1.4 ? :)



Re: [PATCH v2 10/12] hw/block/nvme: move cmb logic to v1.4

2021-01-18 Thread Minwoo Im
> > Yes, CMB in v1.4 is not backward-compatible, but is it okay to move onto
> > the CMB v1.4 from v1.3 without supporting the v1.3 usage on this device
> > model?
> 
> Next patch moves to v1.4. I wanted to split it because the "bump" patch
> also adds a couple of other v1.4 requires fields. But I understand that
> this is slightly wrong.

Sorry, I meant I'd like to have CMB for v1.3 support along with the v1.4
support in this device model separately.  Maybe I missed the linux-nvme
mailing list for CMB v1.4, but is there no plan to keep supportin CMB
v1.3 in NVMe driver?



Re: [PATCH v2 10/12] hw/block/nvme: move cmb logic to v1.4

2021-01-18 Thread Minwoo Im
Hello,

On 21-01-18 10:47:03, Klaus Jensen wrote:
> From: Padmakar Kalghatgi 
> 
> Implement v1.4 logic for configuring the Controller Memory Buffer. This
> is not backward compatible with v1.3, so drivers that only support v1.3
> will not be able to use the CMB anymore.
> 
> Signed-off-by: Padmakar Kalghatgi 
> Signed-off-by: Klaus Jensen 

Yes, CMB in v1.4 is not backward-compatible, but is it okay to move onto
the CMB v1.4 from v1.3 without supporting the v1.3 usage on this device
model?



Re: [PATCH v2 02/12] hw/block/nvme: fix 64 bit register hi/lo split writes

2021-01-18 Thread Minwoo Im
> The spec requires aligned 32 bit accesses (or the size of the register),
> so maybe it's about time we actually ignore instead of falling through.

Agreed.



Re: [PATCH v2 07/12] hw/block/nvme: remove redundant zeroing of PMR registers

2021-01-18 Thread Minwoo Im
On 21-01-18 10:47:00, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> The controller registers are initially zero. Remove the redundant
> zeroing.
> 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c | 35 ---
>  1 file changed, 35 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index f3bea582b3c0..9ee9570bb65c 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -4179,43 +4179,8 @@ static void nvme_init_cmb(NvmeCtrl *n, PCIDevice 
> *pci_dev)
>  
>  static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev)
>  {
> -/* PMR Capabities register */
> -n->bar.pmrcap = 0;
> -NVME_PMRCAP_SET_RDS(n->bar.pmrcap, 0);
> -NVME_PMRCAP_SET_WDS(n->bar.pmrcap, 0);
>  NVME_PMRCAP_SET_BIR(n->bar.pmrcap, NVME_PMR_BIR);
> -NVME_PMRCAP_SET_PMRTU(n->bar.pmrcap, 0);
> -/* Turn on bit 1 support */

This comment says that PMRWBM [1]th bit is set to PMRCAP below :).

>  NVME_PMRCAP_SET_PMRWBM(n->bar.pmrcap, 0x02);
> -NVME_PMRCAP_SET_PMRTO(n->bar.pmrcap, 0);
> -NVME_PMRCAP_SET_CMSS(n->bar.pmrcap, 0);
> -
> -/* PMR Control register */
> -n->bar.pmrctl = 0;
> -NVME_PMRCTL_SET_EN(n->bar.pmrctl, 0);
> -
> -/* PMR Status register */
> -n->bar.pmrsts = 0;
> -NVME_PMRSTS_SET_ERR(n->bar.pmrsts, 0);
> -NVME_PMRSTS_SET_NRDY(n->bar.pmrsts, 0);
> -NVME_PMRSTS_SET_HSTS(n->bar.pmrsts, 0);
> -NVME_PMRSTS_SET_CBAI(n->bar.pmrsts, 0);
> -
> -/* PMR Elasticity Buffer Size register */
> -n->bar.pmrebs = 0;
> -NVME_PMREBS_SET_PMRSZU(n->bar.pmrebs, 0);
> -NVME_PMREBS_SET_RBB(n->bar.pmrebs, 0);
> -NVME_PMREBS_SET_PMRWBZ(n->bar.pmrebs, 0);
> -
> -/* PMR Sustained Write Throughput register */
> -n->bar.pmrswtp = 0;
> -NVME_PMRSWTP_SET_PMRSWTU(n->bar.pmrswtp, 0);
> -NVME_PMRSWTP_SET_PMRSWTV(n->bar.pmrswtp, 0);
> -
> -/* PMR Memory Space Control register */
> -n->bar.pmrmsc = 0;
> -NVME_PMRMSC_SET_CMSE(n->bar.pmrmsc, 0);
> -NVME_PMRMSC_SET_CBA(n->bar.pmrmsc, 0);
>  
>  pci_register_bar(pci_dev, NVME_PMRCAP_BIR(n->bar.pmrcap),
>   PCI_BASE_ADDRESS_SPACE_MEMORY |
> -- 
> 2.30.0
> 
> 



Re: [PATCH v2 06/12] hw/block/nvme: rename PMR/CMB shift/mask fields

2021-01-18 Thread Minwoo Im
On 21-01-18 10:46:59, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Use the correct field names.
> 
> Signed-off-by: Klaus Jensen 
> ---
>  include/block/nvme.h | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 86d7fc2f905c..f3cbe17d0971 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -35,8 +35,8 @@ enum NvmeCapShift {
>  CAP_CSS_SHIFT  = 37,
>  CAP_MPSMIN_SHIFT   = 48,
>  CAP_MPSMAX_SHIFT   = 52,
> -CAP_PMR_SHIFT  = 56,
> -CAP_CMB_SHIFT  = 57,
> +CAP_PMRS_SHIFT = 56,
> +CAP_CMBS_SHIFT = 57,
>  };
>  
>  enum NvmeCapMask {
> @@ -49,8 +49,8 @@ enum NvmeCapMask {
>  CAP_CSS_MASK   = 0xff,
>  CAP_MPSMIN_MASK= 0xf,
>  CAP_MPSMAX_MASK= 0xf,
> -CAP_PMR_MASK   = 0x1,
> -CAP_CMB_MASK   = 0x1,
> +CAP_PMRS_MASK  = 0x1,
> +CAP_CMBS_MASK  = 0x1,
>  };
>  
>  #define NVME_CAP_MQES(cap)  (((cap) >> CAP_MQES_SHIFT)   & CAP_MQES_MASK)
> @@ -81,10 +81,10 @@ enum NvmeCapMask {
> << 
> CAP_MPSMIN_SHIFT)
>  #define NVME_CAP_SET_MPSMAX(cap, val) (cap |= (uint64_t)(val & 
> CAP_MPSMAX_MASK)\
> << 
> CAP_MPSMAX_SHIFT)
> -#define NVME_CAP_SET_PMRS(cap, val)   (cap |= (uint64_t)(val & CAP_PMR_MASK) 
>   \
> -   << CAP_PMR_SHIFT)
> -#define NVME_CAP_SET_CMBS(cap, val)   (cap |= (uint64_t)(val & CAP_CMB_MASK) 
>   \
> -   << CAP_CMB_SHIFT)
> +#define NVME_CAP_SET_PMRS(cap, val)   (cap |= (uint64_t)(val & 
> CAP_PMRS_MASK)  \
> +   << CAP_PMRS_SHIFT)
> +#define NVME_CAP_SET_CMBS(cap, val)   (cap |= (uint64_t)(val & 
> CAP_CMBS_MASK)  \
> +   << CAP_CMBS_SHIFT)

Oh, it would have been better folded into [3/12] patch, though.

Changes are looking good to me to represent "Supported".

Reviewed-by: Minwoo Im 

>  
>  enum NvmeCapCss {
>  NVME_CAP_CSS_NVM= 1 << 0,
> -- 
> 2.30.0
> 
> 



Re: [PATCH v2 05/12] hw/block/nvme: allow cmb and pmr to coexist

2021-01-18 Thread Minwoo Im
Reviewed-by: Minwoo Im 



Re: [PATCH v2 04/12] hw/block/nvme: move msix table and pba to BAR 0

2021-01-18 Thread Minwoo Im


On 21-01-18 10:46:57, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> In the interest of supporting both CMB and PMR to be enabled on the same
> device, move the MSI-X table and pending bit array out of BAR 4 and into
> BAR 0.

Nice!

Reviewed-by: Minwoo Im 
Tested-by: Minwoo Im 

Thanks,



Re: [PATCH v2 03/12] hw/block/nvme: indicate CMB support through controller capabilities register

2021-01-18 Thread Minwoo Im
Reviewed-by: Minwoo Im 



Re: [PATCH v2 02/12] hw/block/nvme: fix 64 bit register hi/lo split writes

2021-01-18 Thread Minwoo Im
On 21-01-18 10:46:55, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> 64 bit registers like ASQ and ACQ should be writable by both a hi/lo 32
> bit write combination as well as a plain 64 bit write. The spec does not
> define ordering on the hi/lo split, but the code currently assumes that
> the low order bits are written first. Additionally, the code does not
> consider that another address might already have been written into the
> register, causing the OR'ing to result in a bad address.
> 
> Fix this by explicitly overwriting only the low or high order bits for
> 32 bit writes.
> 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index bd7e258c3c26..40b9f8ccfc0e 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -3781,19 +3781,21 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr 
> offset, uint64_t data,
>  trace_pci_nvme_mmio_aqattr(data & 0x);
>  break;
>  case 0x28:  /* ASQ */
> -n->bar.asq = data;
> +n->bar.asq = size == 8 ? data :
> +(n->bar.asq & ~0x) | (data & 0x);
 ^^^
If this one is to unmask lower 32bits other than higher 32bits, then
it should be:

(n->bar.asq & ~0xULL)

Also, maybe we should take care of lower than 4bytes as:

.min_access_size = 2,
.max_access_size = 8,

Even we have some error messages up there with:

if (unlikely(size < sizeof(uint32_t))) {
NVME_GUEST_ERR(pci_nvme_ub_mmiowr_toosmall,
   "MMIO write smaller than 32-bits,"
   " offset=0x%"PRIx64", size=%u",
   offset, size);
/* should be ignored, fall through for now */
}

It's a fall-thru error, and that's it.  I would prefer not to have this
error and support for 2bytes access also, OR do not support for 2bytes
access for this MMIO area.

Thanks!



Re: [PATCH v2 01/12] hw/block/nvme: add size to mmio read/write trace events

2021-01-18 Thread Minwoo Im
Reviewed-by: Minwoo Im 



[RFC PATCH V2 11/11] hw/block/nvme: support for shared namespace in subsystem

2021-01-17 Thread Minwoo Im
nvme-ns device is registered to a nvme controller device during the
initialization in nvme_register_namespace() in case that 'bus' property
is given which means it's mapped to a single controller.

This patch introduced a new property 'subsys' just like the controller
device instance did to map a namespace to a NVMe subsystem.

If 'subsys' property is given to the nvme-ns device, it will belong to
the specified subsystem and will be attached to all controllers in that
subsystem by enabling shared namespace capability in NMIC(Namespace
Multi-path I/O and Namespace Capabilities) in Identify Namespace.

Usage:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0
  -device nvme,serial=bar,id=nvme1,subsys=subsys0
  -device nvme,serial=baz,id=nvme2,subsys=subsys0
  -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0  # Shared
  -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2   # Non-shared

  In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in
  the same subsystem.  On the other hand, 'ns2' will be attached to the
  'nvme2' only as a private namespace in that subsystem.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-ns.c | 23 ++-
 hw/block/nvme-ns.h |  7 +++
 hw/block/nvme-subsys.c | 17 +
 hw/block/nvme-subsys.h |  1 +
 hw/block/nvme.c| 10 +-
 5 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index c8b75fa02138..073f65e49cac 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
 id_ns->npda = id_ns->npdg = npdg - 1;
 
+if (nvme_ns_shared(ns)) {
+id_ns->nmic |= NVME_NMIC_NS_SHARED;
+}
+
 return 0;
 }
 
@@ -358,16 +362,25 @@ static void nvme_ns_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-if (nvme_register_namespace(n, ns, errp)) {
-error_propagate_prepend(errp, local_err,
-"could not register namespace: ");
-return;
+if (ns->subsys) {
+if (nvme_subsys_register_ns(ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not setup namespace to subsys: ");
+return;
+}
+} else {
+if (nvme_register_namespace(n, ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not register namespace: ");
+return;
+}
 }
-
 }
 
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
+DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index d87c025b0ab6..3fc7262a7915 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -47,6 +47,8 @@ typedef struct NvmeNamespace {
 const uint32_t *iocs;
 uint8_t  csi;
 
+NvmeSubsystem   *subsys;
+
 NvmeIdNsZoned   *id_ns_zoned;
 NvmeZone*zone_array;
 QTAILQ_HEAD(, NvmeZone) exp_open_zones;
@@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return -1;
 }
 
+static inline bool nvme_ns_shared(NvmeNamespace *ns)
+{
+return !!ns->subsys;
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = >id_ns;
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index a01003136b12..d67160a0ed6f 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -43,6 +43,23 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 return cntlid;
 }
 
+int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp)
+{
+NvmeSubsystem *subsys = ns->subsys;
+NvmeCtrl *n;
+int i;
+
+for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+n = subsys->ctrls[i];
+
+if (n && nvme_register_namespace(n, ns, errp)) {
+return -1;
+}
+}
+
+return 0;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 char *subnqn;
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 4eba50d96a1d..cae0fbd464e2 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -25,5 +25,6 @@ typedef struct NvmeSubsystem {
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 2aeb164dd508..7869bd95b099 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -25,7 +25,8 @@
  *  mdts=,zoned.append_s

[RFC PATCH V2 10/11] hw/block/nvme: add NMIC enum value for Identify Namespace

2021-01-17 Thread Minwoo Im
Added Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC)
field to support shared namespace from controller(s).

This field is in Identify Namespace data structure in [30].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index e83ec1e124c6..dd7b5ba9ef4c 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1110,6 +1110,10 @@ enum NvmeNsIdentifierType {
 NVME_NIDT_CSI   = 0x04,
 };
 
+enum NvmeNsNmic {
+NVME_NMIC_NS_SHARED = 1 << 0,
+};
+
 enum NvmeCsi {
 NVME_CSI_NVM= 0x00,
 NVME_CSI_ZONED  = 0x02,
-- 
2.17.1




[RFC PATCH V2 09/11] hw/block/nvme: support for multi-controller in subsystem

2021-01-17 Thread Minwoo Im
We have nvme-subsys and nvme devices mapped together.  To support
multi-controller scheme to this setup, controller identifier(id) has to
be managed.  Earlier, cntlid(controller id) used to be always 0 because
we didn't have any subsystem scheme that controller id matters.

This patch introduced 'cntlid' attribute to the nvme controller
instance(NvmeCtrl) and make it allocated by the nvme-subsys device
mapped to the controller.  If nvme-subsys is not given to the
controller, then it will always be 0 as it was.

Added 'ctrls' array in the nvme-subsys instance to manage attached
controllers to the subsystem with a limit(32).  This patch didn't take
list for the controllers to make it seamless with nvme-ns device.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.c | 21 +
 hw/block/nvme-subsys.h |  4 
 hw/block/nvme.c| 32 ++--
 hw/block/nvme.h|  1 +
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index f1dc71d588d9..a01003136b12 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -22,6 +22,27 @@
 #include "nvme.h"
 #include "nvme-subsys.h"
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
+{
+NvmeSubsystem *subsys = n->subsys;
+int cntlid;
+
+for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
+if (!subsys->ctrls[cntlid]) {
+break;
+}
+}
+
+if (cntlid == ARRAY_SIZE(subsys->ctrls)) {
+error_setg(errp, "no more free controller id");
+return -1;
+}
+
+subsys->ctrls[cntlid] = n;
+
+return cntlid;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 char *subnqn;
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 40f06a4c7db0..4eba50d96a1d 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace;
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
 uint8_t subnqn[256];
+
+NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 } NvmeSubsystem;
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index af5b2162e2b5..2aeb164dd508 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4242,7 +4242,7 @@ static void nvme_init_subnqn(NvmeCtrl *n)
 }
 }
 
-static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
+static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t cntlid)
 {
 NvmeIdCtrl *id = >id_ctrl;
 uint8_t *pci_conf = pci_dev->config;
@@ -4252,6 +4252,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
 strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
 strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
+
+id->cntlid = cntlid;
+
 id->rab = 6;
 id->ieee[0] = 0x00;
 id->ieee[1] = 0x02;
@@ -4297,6 +4300,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->psd[0].enlat = cpu_to_le32(0x10);
 id->psd[0].exlat = cpu_to_le32(0x4);
 
+if (n->subsys) {
+id->cmic |= NVME_CMIC_MULTI_CTRL;
+}
+
 NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
 NVME_CAP_SET_CQR(n->bar.cap, 1);
 NVME_CAP_SET_TO(n->bar.cap, 0xf);
@@ -4309,11 +4316,28 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 n->bar.intmc = n->bar.intms = 0;
 }
 
+static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
+{
+int cntlid;
+
+if (!n->subsys) {
+return 0;
+}
+
+cntlid = nvme_subsys_register_ctrl(n, errp);
+if (cntlid < 0) {
+return -1;
+}
+
+return cntlid;
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
 NvmeNamespace *ns;
 Error *local_err = NULL;
+int cntlid;
 
 nvme_check_constraints(n, _err);
 if (local_err) {
@@ -4329,7 +4353,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
-nvme_init_ctrl(n, pci_dev);
+cntlid = nvme_init_subsys(n, errp);
+if (cntlid < 0) {
+return;
+}
+nvme_init_ctrl(n, pci_dev, cntlid);
 
 /* setup a namespace if the controller drive property was given */
 if (n->namespace.blkconf.blk) {
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 2061e53e2cee..329902a9fc04 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -129,6 +129,7 @@ typedef struct NvmeCtrl {
 NvmeBus  bus;
 BlockConfconf;
 
+uint16_tcntlid;
 boolqs_created;
 uint32_tpage_size;
 uint16_tpage_bits;
-- 
2.17.1




<    1   2   3   >