Re: [RFC PATCH 1/3] hw/block/nvme: set NVME_DNR in a single place
On 21-02-10 21:19:43, Klaus Jensen wrote:
> On Feb 11 04:52, Minwoo Im wrote:
> > @@ -945,6 +945,11 @@ static void nvme_post_cqes(void *opaque)
> >  static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
> >  {
> >      assert(cq->cqid == req->sq->cqid);
> > +
> > +    if (req->status != NVME_SUCCESS) {
> > +        req->status |= NVME_DNR;
> > +    }
>
> There are status codes where we do not set the DNR bit (e.g. Data
> Transfer Error, and that might be the only one actually).

Ouch. I think I need to prepare a switch helper to figure out which
status codes should be retried and which should not.

> Maybe a switch such that we do not explicitly set DNR for Data Transfer
> Error (and any other errors we identify), but only if we set it earlier
> in the stack.

Agreed.
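A minimal sketch of the switch helper discussed above. The helper name is hypothetical; the status-code values are taken from the NVMe spec (as mirrored in include/block/nvme.h), and which codes count as retryable is exactly the open question in this thread — Data Transfer Error and Command Interrupted are used here as assumed examples.

```c
#include <assert.h>
#include <stdint.h>

/* Status values per the NVMe spec; names loosely follow QEMU's. */
#define NVME_SUCCESS              0x0000
#define NVME_DATA_TRANSFER_ERROR  0x0004
#define NVME_COMMAND_INTERRUPTED  0x0021
#define NVME_DNR                  0x4000

/*
 * Hypothetical helper: decide in one place whether a completion status
 * should carry the DNR (Do Not Retry) bit.  Retryable codes are left
 * without DNR so the host may retry the command.
 */
static inline uint16_t nvme_apply_dnr(uint16_t status)
{
    if (status == NVME_SUCCESS) {
        return status;
    }

    switch (status) {
    case NVME_DATA_TRANSFER_ERROR:
    case NVME_COMMAND_INTERRUPTED:
        return status;              /* retryable: do not set DNR */
    default:
        return status | NVME_DNR;   /* everything else: do not retry */
    }
}
```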
Re: [RFC PATCH 3/3] hw/block/nvme: add nvme_inject_state HMP command
On 21-02-11 12:00:11, Keith Busch wrote:
> On Thu, Feb 11, 2021 at 04:52:52AM +0900, Minwoo Im wrote:
> > nvme_inject_state command is to give a controller state to be.
> > Human Monitor Interface (HMP) supports users to make controller to a
> > specified state of:
> >
> >   normal:          Normal state (no injection)
> >   cmd-interrupted: Commands will be interrupted internally
> >
> > This patch is just a start to give dynamic command from the HMP to the
> > QEMU NVMe device model. If "cmd-interrupted" state is given, then the
> > controller will return all the CQ entries with Command Interrupted
> > status code.
> >
> > Usage:
> >   -device nvme,id=nvme0,
> >
> >   (qemu) nvme_inject_state nvme0 cmd-interrupted
> >
> >   (qemu) nvme_inject_state nvme0 normal
> >
> > This feature is required to test Linux kernel NVMe driver for the
> > command retry feature.
>
> Once the user sets the injected state, all commands return that status
> until the user injects the normal state, so the CRD time is meaningless
> here. If we're really going this route, the state needs to return to
> normal on its own.

That would also be fine to me.

> But I would prefer to see advanced retry tied to real errors that can be
> retried, like if we got an EBUSY or EAGAIN errno or something like that.

I have seen a thread [1] about ACRE. Forgive me if I misunderstood that
thread or missed something after it.

It looks like the CRD field in the CQ entry can be set for any NVMe
error status, which means it *may* depend on the device state. And this
patch just introduced an internal, temporary error state of the
controller by returning the Command Interrupted status.

I think, at this stage, we can go with some errors in the middle of the
AIO (nvme_aio_err()) for advanced retry. Shouldn't AIO errors be
retryable and supposed to be retried?

> The interface you found to implement this is very interesting though.

Thanks, I just wanted to suggest a scheme to inject something into a
running NVMe device model for various testing.

[1] https://www.spinics.net/lists/dm-devel/msg42165.html
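To make the CRD/ACRE mechanics discussed above concrete, here is a host-side sketch of the retry-delay computation. The bit positions (DNR at bit 14, CRD at bits 12:11 of the status as QEMU keeps it in req->status) and the helper name are assumptions; CRDT1..3 come from the Identify Controller data structure and are specified in units of 100 milliseconds.

```c
#include <assert.h>
#include <stdint.h>

#define DNR_BIT   0x4000   /* Do Not Retry */
#define CRD_MASK  0x1800   /* Command Retry Delay selector (2 bits) */
#define CRD_SHIFT 11

/*
 * Given a CQE status and the controller's CRDT1..3 values (in 100 ms
 * units), return the delay in milliseconds the host should wait before
 * retrying; 0 means retry immediately (or, with DNR set, not at all).
 */
static unsigned retry_delay_ms(uint16_t status, const uint16_t crdt[3])
{
    unsigned crd = (status & CRD_MASK) >> CRD_SHIFT;

    if ((status & DNR_BIT) || crd == 0) {
        return 0;
    }
    return crdt[crd - 1] * 100;   /* CRD n selects CRDT n */
}
```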
Re: [RFC PATCH 3/3] hw/block/nvme: add nvme_inject_state HMP command
On 21-02-10 21:33:50, Klaus Jensen wrote:
> On Feb 11 04:52, Minwoo Im wrote:
> > nvme_inject_state command is to give a controller state to be.
> > Human Monitor Interface (HMP) supports users to make controller to a
> > specified state of:
> >
> >   normal:          Normal state (no injection)
> >   cmd-interrupted: Commands will be interrupted internally
> >
> > This patch is just a start to give dynamic command from the HMP to the
> > QEMU NVMe device model. If "cmd-interrupted" state is given, then the
> > controller will return all the CQ entries with Command Interrupted
> > status code.
> >
> > Usage:
> >   -device nvme,id=nvme0,
> >
> >   (qemu) nvme_inject_state nvme0 cmd-interrupted
> >
> >   (qemu) nvme_inject_state nvme0 normal
> >
> > This feature is required to test Linux kernel NVMe driver for the
> > command retry feature.
>
> This is super cool and commands like this feel much nicer than the
> qom-set approach that the SMART critical warning feature took.

This interface makes it super easy to inject errors into the running
device with a function callback.

> But... looking at the existing commands I don't think we can "bloat" it
> up with a device specific command like this, but I don't know the policy
> around this.

Me neither, but I've seen the PCI AER error injection feature among the
existing commands, so I suggested this command to control the NVMe
device itself, similar to error injection.

> If an HMP command is out, then we should be able to make do with the
> qom-set approach just fine though.

Hope so.
Re: [PATCH 2/2] hw/nvme: move device-scoped functions
On 21-02-09 12:08:26, Klaus Jensen wrote:
> From: Klaus Jensen
>
> Move a bunch of functions that are internal to a device out of the
> shared header.

May I ask why? I think some of these helpers could stay in the header.
Re: [PATCH 1/2] hw/nvme: move nvme emulation out of hw/block
On 21-02-09 12:08:25, Klaus Jensen wrote:
> From: Klaus Jensen
>
> With the introduction of the nvme-subsystem device we are really
> cluttering up the hw/block directory.
>
> As suggested by Philippe previously, move the nvme emulation to
> hw/nvme.
>
> Suggested-by: Philippe Mathieu-Daudé
> Signed-off-by: Klaus Jensen

Please add a description about the consolidation of nvme-ns.h and
nvme-subsys.h into a unified nvme.h header when you apply this patch! :)

Acked-by: Minwoo Im
Re: [PATCH] hw/block/nvme: drain namespaces on sq deletion
On 21-01-27 14:15:05, Klaus Jensen wrote: > From: Klaus Jensen > > For most commands, when issuing an AIO, the BlockAIOCB is stored in the > NvmeRequest aiocb pointer when the AIO is issued. The purpose of storing > this is to allow the AIO to be cancelled when deleting submission > queues (it is currently not used for Abort). > > Since the addition of the Dataset Management command and Zoned > Namespaces, NvmeRequests may involve more than one AIO and the AIOs are > issued without saving a reference to the BlockAIOCB. This is a problem > since nvme_del_sq will attempt to cancel outstanding AIOs, potentially > with an invalid BlockAIOCB. > > Fix this by instead of explicitly cancelling the requests, just allow > the AIOs to complete by draining the namespace blockdevs. > > Signed-off-by: Klaus Jensen > --- > hw/block/nvme.c | 18 +- > 1 file changed, 13 insertions(+), 5 deletions(-) > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c > index 316858fd8adf..91f6fb6da1e2 100644 > --- a/hw/block/nvme.c > +++ b/hw/block/nvme.c > @@ -403,6 +403,7 @@ static void nvme_req_clear(NvmeRequest *req) > { > req->ns = NULL; > req->opaque = NULL; > +req->aiocb = NULL; > memset(>cqe, 0x0, sizeof(req->cqe)); > req->status = NVME_SUCCESS; > } > @@ -2396,6 +2397,7 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest > *req) > NvmeSQueue *sq; > NvmeCQueue *cq; > uint16_t qid = le16_to_cpu(c->qid); > +int i; > > if (unlikely(!qid || nvme_check_sqid(n, qid))) { > trace_pci_nvme_err_invalid_del_sq(qid); > @@ -2404,12 +2406,18 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest > *req) > > trace_pci_nvme_del_sq(qid); > > -sq = n->sq[qid]; > -while (!QTAILQ_EMPTY(>out_req_list)) { > -r = QTAILQ_FIRST(>out_req_list); > -assert(r->aiocb); > -blk_aio_cancel(r->aiocb); > +for (i = 1; i <= n->num_namespaces; i++) { > +NvmeNamespace *ns = nvme_ns(n, i); > +if (!ns) { > +continue; > +} > + > +nvme_ns_drain(ns); If we just drain the entire namespaces here, commands which has nothing to do with 
the target SQ being deleted will be drained as well. And this might be a
burden for a single SQ deletion.

By the way, I agree with the multiple-AIO reference problem for the
newly added commands. But shouldn't we manage the in-flight AIO request
references for the newly added commands in some other way and kill them
all explicitly, as it was done before? Maybe some list of AIOCBs?
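A rough sketch of the "list of AIOCBs" alternative suggested above — track every in-flight AIOCB on the request so SQ deletion can cancel them explicitly instead of draining whole namespaces. All names here are illustrative, not QEMU's actual API; the real BlockAIOCB type and blk_aio_cancel() call are stood in by a void pointer and a comment.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry per in-flight AIO; would hold a BlockAIOCB* in QEMU. */
typedef struct AiocbEntry {
    void *aiocb;
    struct AiocbEntry *next;
} AiocbEntry;

typedef struct Req {
    AiocbEntry *aiocbs;   /* head of the in-flight AIO list */
} Req;

/* Record an AIOCB when the AIO is issued (DSM, zone ops, ...). */
static void req_track_aiocb(Req *req, void *aiocb)
{
    AiocbEntry *e = malloc(sizeof(*e));
    e->aiocb = aiocb;
    e->next = req->aiocbs;
    req->aiocbs = e;
}

/* On SQ deletion: cancel and free every tracked entry; returns the
 * number of AIOs that were cancelled. */
static int req_cancel_all(Req *req)
{
    int n = 0;
    while (req->aiocbs) {
        AiocbEntry *e = req->aiocbs;
        req->aiocbs = e->next;
        /* blk_aio_cancel(e->aiocb) would go here */
        free(e);
        n++;
    }
    return n;
}
```

This keeps cancellation scoped to the requests of the SQ being deleted, at the cost of per-AIO bookkeeping on the hot path.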
[RFC PATCH 1/3] hw/block/nvme: set NVME_DNR in a single place
Set NVME_DNR in the CQ entry status field right before writing the CQ entry: in nvme_post_cqes(). We have put NVME_DNR for all CQ entry status for all error cases. This patch is a former patch to support command retry feature. Signed-off-by: Minwoo Im --- hw/block/nvme.c | 192 1 file changed, 97 insertions(+), 95 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 93345bf3c1fc..816e0e8e5205 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -270,12 +270,12 @@ static int nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn) if (ns->params.max_active_zones != 0 && ns->nr_active_zones + act > ns->params.max_active_zones) { trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones); -return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR; +return NVME_ZONE_TOO_MANY_ACTIVE; } if (ns->params.max_open_zones != 0 && ns->nr_open_zones + opn > ns->params.max_open_zones) { trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones); -return NVME_ZONE_TOO_MANY_OPEN | NVME_DNR; +return NVME_ZONE_TOO_MANY_OPEN; } return NVME_SUCCESS; @@ -492,7 +492,7 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov, if (cmb || pmr) { if (qsg && qsg->sg) { -return NVME_INVALID_USE_OF_CMB | NVME_DNR; +return NVME_INVALID_USE_OF_CMB; } assert(iov); @@ -509,7 +509,7 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov, } if (iov && iov->iov) { -return NVME_INVALID_USE_OF_CMB | NVME_DNR; +return NVME_INVALID_USE_OF_CMB; } assert(qsg); @@ -568,7 +568,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2, if (i == n->max_prp_ents - 1 && len > n->page_size) { if (unlikely(prp_ent & (n->page_size - 1))) { trace_pci_nvme_err_invalid_prplist_ent(prp_ent); -return NVME_INVALID_PRP_OFFSET | NVME_DNR; +return NVME_INVALID_PRP_OFFSET; } i = 0; @@ -585,7 +585,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2, if (unlikely(prp_ent & (n->page_size - 1))) { 
trace_pci_nvme_err_invalid_prplist_ent(prp_ent); -return NVME_INVALID_PRP_OFFSET | NVME_DNR; +return NVME_INVALID_PRP_OFFSET; } trans_len = MIN(len, n->page_size); @@ -600,7 +600,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2, } else { if (unlikely(prp2 & (n->page_size - 1))) { trace_pci_nvme_err_invalid_prp2_align(prp2); -return NVME_INVALID_PRP_OFFSET | NVME_DNR; +return NVME_INVALID_PRP_OFFSET; } status = nvme_map_addr(n, qsg, iov, prp2, len); if (status) { @@ -637,9 +637,9 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg, break; case NVME_SGL_DESCR_TYPE_SEGMENT: case NVME_SGL_DESCR_TYPE_LAST_SEGMENT: -return NVME_INVALID_NUM_SGL_DESCRS | NVME_DNR; +return NVME_INVALID_NUM_SGL_DESCRS; default: -return NVME_SGL_DESCR_TYPE_INVALID | NVME_DNR; +return NVME_SGL_DESCR_TYPE_INVALID; } dlen = le32_to_cpu(segment[i].len); @@ -660,7 +660,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg, } trace_pci_nvme_err_invalid_sgl_excess_length(nvme_cid(req)); -return NVME_DATA_SGL_LEN_INVALID | NVME_DNR; +return NVME_DATA_SGL_LEN_INVALID; } trans_len = MIN(*len, dlen); @@ -672,7 +672,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg, addr = le64_to_cpu(segment[i].addr); if (UINT64_MAX - addr < dlen) { -return NVME_DATA_SGL_LEN_INVALID | NVME_DNR; +return NVME_DATA_SGL_LEN_INVALID; } status = nvme_map_addr(n, qsg, iov, addr, trans_len); @@ -731,7 +731,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov, case NVME_SGL_DESCR_TYPE_LAST_SEGMENT: break; default: -return NVME_INVALID_SGL_SEG_DESCR | NVME_DNR; +return NVME_INVALID_SGL_SEG_DESCR; } seg_len = le32_to_cpu(sgld->len); @@ -739,11 +739,11 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov, /* check the length of the (Last) Segment descriptor */ if ((!seg_len || seg_len & 0xf) && (NVME_SGL_TYPE(s
[RFC PATCH 3/3] hw/block/nvme: add nvme_inject_state HMP command
nvme_inject_state command is to give a controller state to be. Human Monitor Interface(HMP) supports users to make controller to a specified state of: normal: Normal state (no injection) cmd-interrupted:Commands will be interrupted internally This patch is just a start to give dynamic command from the HMP to the QEMU NVMe device model. If "cmd-interrupted" state is given, then the controller will return all the CQ entries with Command Interrupts status code. Usage: -device nvme,id=nvme0, (qemu) nvme_inject_state nvme0 cmd-interrupted (qemu) nvme_inject_state nvme0 normal This feature is required to test Linux kernel NVMe driver for the command retry feature. Signed-off-by: Minwoo Im --- hmp-commands.hx | 13 hw/block/nvme.c | 49 +++ hw/block/nvme.h | 8 +++ include/monitor/hmp.h | 1 + 4 files changed, 71 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index d4001f9c5dc6..ef288c567b46 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1307,6 +1307,19 @@ SRST Inject PCIe AER error ERST +{ +.name = "nvme_inject_state", +.args_type = "id:s,state:s", +.params = "id [normal|cmd-interrupted]", +.help = "inject controller/namespace state", +.cmd= hmp_nvme_inject_state, +}, + +SRST +``nvme_inject_state`` + Inject NVMe controller/namespace state +ERST + { .name = "netdev_add", .args_type = "netdev:O", diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 6d3c554a0e99..42cf5bd113e6 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -123,6 +123,7 @@ #include "sysemu/sysemu.h" #include "qapi/error.h" #include "qapi/visitor.h" +#include "qapi/qmp/qdict.h" #include "sysemu/hostmem.h" #include "sysemu/block-backend.h" #include "exec/memory.h" @@ -132,6 +133,7 @@ #include "trace.h" #include "nvme.h" #include "nvme-ns.h" +#include "monitor/monitor.h" #define NVME_MAX_IOQPAIRS 0x #define NVME_DB_SIZE 4 @@ -966,6 +968,14 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req) { assert(cq->cqid == req->sq->cqid); +/* + * Override request status 
field if controller state has been injected by + * the QMP. + */ +if (cq->ctrl->state == NVME_STATE_CMD_INTERRUPTED) { +req->status = NVME_COMMAND_INTERRUPTED; +} + if (req->status != NVME_SUCCESS) { if (cq->ctrl->features.acre && nvme_should_retry(req)) { if (cq->ctrl->params.cmd_retry_delay > 0) { @@ -5025,4 +5035,43 @@ static void nvme_register_types(void) type_register_static(_bus_info); } +static void nvme_inject_state(NvmeCtrl *n, NvmeState state) +{ +n->state = state; +} + +static const char *nvme_states[] = { +[NVME_STATE_NORMAL] = "normal", +[NVME_STATE_CMD_INTERRUPTED]= "cmd-interrupted", +}; + +void hmp_nvme_inject_state(Monitor *mon, const QDict *qdict) +{ +const char *id = qdict_get_str(qdict, "id"); +const char *state = qdict_get_str(qdict, "state"); +PCIDevice *dev; +NvmeCtrl *n; +int ret, i; + +ret = pci_qdev_find_device(id, ); +if (ret < 0) { +monitor_printf(mon, "invalid device id %s\n", id); +return; +} + +n = NVME(dev); + +for (i = 0; i < ARRAY_SIZE(nvme_states); i++) { +if (!strcmp(nvme_states[i], state)) { +nvme_inject_state(n, i); +monitor_printf(mon, + "-device nvme,id=%s: state %s injected\n", + id, state); +return; +} +} + +monitor_printf(mon, "invalid state %s\n", state); +} + type_init(nvme_register_types) diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 37940b3ac2d2..1af1e0380d9b 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -128,6 +128,11 @@ typedef struct NvmeFeatureVal { uint8_t acre; } NvmeFeatureVal; +typedef enum NvmeState { +NVME_STATE_NORMAL, +NVME_STATE_CMD_INTERRUPTED, +} NvmeState; + typedef struct NvmeCtrl { PCIDeviceparent_obj; MemoryRegion bar0; @@ -185,6 +190,8 @@ typedef struct NvmeCtrl { NvmeCQueue admin_cq; NvmeIdCtrl id_ctrl; NvmeFeatureVal features; + +NvmeState state; } NvmeCtrl; static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid) @@ -212,4 +219,5 @@ static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req) int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp); +void 
hmp_nvme_inject_state(Monitor *mon, const QDict *qdict);
[RFC PATCH 2/3] hw/block/nvme: support command retry delay
Set CRDT1(Command Retry Delay Time 1) in the Identify controller data structure to milliseconds units of 100ms by the given value of 'cmd-retry-delay' parameter which is newly added. If cmd-retry-delay=1000, it will be set CRDT1 to 10. This patch only considers the CRDT1 without CRDT2 and 3 for the simplicity. This patch also introduced set/get feature command handler for Host Behavior feature (16h). In this feature, ACRE(Advanced Command Retry Enable) will be set by the host based on the Identify controller data structure, especially by CRDTs. If 'cmd-retry-delay' is not given, the default value will be -1 which is CRDT will not be configured at all and ACRE will not be supported. In this case, we just set NVME_DNR to the error CQ entry just like we used to. If it's given to positive value, then ACRE will be supported by the device. Signed-off-by: Minwoo Im --- hw/block/nvme.c | 65 ++-- hw/block/nvme.h | 2 ++ include/block/nvme.h | 13 - 3 files changed, 77 insertions(+), 3 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 816e0e8e5205..6d3c554a0e99 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -23,7 +23,7 @@ * max_ioqpairs=, \ * aerl=, aer_max_queued=, \ * mdts=,zoned.append_size_limit=, \ - * subsys= \ + * subsys=,cmd-retry-delay= \ * -device nvme-ns,drive=,bus=,nsid=,\ * zoned=, \ * subsys= @@ -71,6 +71,14 @@ * data size being in effect. By setting this property to 0, users can make * ZASL to be equal to MDTS. This property only affects zoned namespaces. * + * - `cmd-retry-delay` + * Command Retry Delay value in unit of millisecond. This value will be + * reported to the CRDT1(Command Retry Delay Time 1) in Identify Controller + * data structure in 100 milliseconds unit. If this is not given, DNR(Do Not + * Retry) bit field in the Status field of CQ entry. If it's given to 0, + * CRD(Command Retry Delay) will be set to 0 which is for retry without + * delay. Otherwise, it will set to 1 to delay for CRDT1 value. 
+ * * nvme namespace device parameters * * - `subsys` @@ -154,6 +162,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = { [NVME_WRITE_ATOMICITY] = true, [NVME_ASYNCHRONOUS_EVENT_CONF] = true, [NVME_TIMESTAMP]= true, +[NVME_HOST_BEHAVIOR_SUPPORT]= true, }; static const uint32_t nvme_feature_cap[NVME_FID_MAX] = { @@ -163,6 +172,7 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = { [NVME_NUMBER_OF_QUEUES] = NVME_FEAT_CAP_CHANGE, [NVME_ASYNCHRONOUS_EVENT_CONF] = NVME_FEAT_CAP_CHANGE, [NVME_TIMESTAMP]= NVME_FEAT_CAP_CHANGE, +[NVME_HOST_BEHAVIOR_SUPPORT]= NVME_FEAT_CAP_CHANGE, }; static const uint32_t nvme_cse_acs[256] = { @@ -904,6 +914,16 @@ static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len, return status; } +static inline bool nvme_should_retry(NvmeRequest *req) +{ +switch (req->status) { +case NVME_COMMAND_INTERRUPTED: +return true; +default: +return false; +} +} + static void nvme_post_cqes(void *opaque) { NvmeCQueue *cq = opaque; @@ -947,7 +967,13 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req) assert(cq->cqid == req->sq->cqid); if (req->status != NVME_SUCCESS) { -req->status |= NVME_DNR; +if (cq->ctrl->features.acre && nvme_should_retry(req)) { +if (cq->ctrl->params.cmd_retry_delay > 0) { +req->status |= NVME_CRD_CRDT1; +} +} else { +req->status |= NVME_DNR; +} } trace_pci_nvme_enqueue_req_completion(nvme_cid(req), cq->cqid, @@ -3401,6 +3427,16 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeRequest *req) DMA_DIRECTION_FROM_DEVICE, req); } +static uint16_t nvme_get_feature_host_behavior(NvmeCtrl *n, NvmeRequest *req) +{ +NvmeFeatureHostBehavior data = {}; + +data.acre = n->features.acre; + +return nvme_dma(n, (uint8_t *), sizeof(data), +DMA_DIRECTION_FROM_DEVICE, req); +} + static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req) { NvmeCmd *cmd = >cmd; @@ -3506,6 +3542,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req) goto out; case NVME_TIMESTAMP: return 
nvme_get_feature_timestamp(n, req); +case NVME_HOST_BEHAVIOR_SUPPORT: +return nvme_get_feature_host_behavior(n, req); default: break; } @@ -3569,6 +3607,22 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeRequest *req) return NVME_SUCCESS; } +static uint16_t nvme_set_feature_host_behavior(NvmeCtrl *n, NvmeRequest *req) +{ +
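The parameter-to-CRDT1 conversion this patch's commit message describes can be sketched as follows. The helper name and the round-up behavior are assumptions; the commit message only states that cmd-retry-delay=1000 yields CRDT1=10, that a negative value disables the feature, and that 0 means retry without delay.

```c
#include <assert.h>
#include <stdint.h>

/*
 * cmd-retry-delay is given in milliseconds; CRDT1 in the Identify
 * Controller data structure is reported in 100 ms units.  A value
 * <= 0 yields CRDT1 = 0 (negative: ACRE not supported at all;
 * zero: retry without delay).
 */
static uint16_t crdt1_from_ms(int cmd_retry_delay_ms)
{
    if (cmd_retry_delay_ms <= 0) {
        return 0;
    }
    return (uint16_t)((cmd_retry_delay_ms + 99) / 100);  /* round up */
}
```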
[RFC PATCH 0/3] support command retry
Hello,

This series is an RFC about supporting the command retry feature in the
NVMe device model. The motivation is that, in kernel development and
testing, the retry scheme has not been coverable with the QEMU NVMe
device model. If we are able to control the retry scheme from the
device side, it would be nice for kernel developers to test against.

We have been putting NVME_DNR in the CQ entry status field for all
error cases. This series adds control over command retry based on the
newly added 'cmd-retry-delay' parameter. If it is set to a positive
value, Command Retry Delay Time 1 (CRDT1) in the Identify Controller
data structure will be set in 100 ms units. Accordingly, the host will
issue Set Features with Host Behavior Support (0x16) to enable the
Advanced Command Retry Enable (ACRE) feature, which supports command
retry with the defined delay. If the 'cmd-retry-delay' parameter is set
to 0, command failures will be retried directly without delay.

This series only considers the Command Interrupted status code first,
which is mainly what the ACRE feature addition is about. The
nvme_should_retry() helper decides whether a command should be retried
or not. But we don't have any use cases specified for the Command
Interrupted status code in the device model. So I proposed patch [3/3],
adding an 'nvme_inject_state' HMP command that lets users put the
controller device into a pre-defined state by injecting it via the QEMU
monitor.

Usage:
  # Configure the nvme0 device to be retried every 1 sec (1000 ms)
  -device nvme,id=nvme0,cmd-retry-delay=1000,...

  (qemu) nvme_inject_state nvme0 cmd-interrupted
  -device nvme,id=nvme0: state cmd-interrupted injected
  (qemu)

  # From now on, the controller will interrupt all commands to be
  # processed with the Command Interrupted status code. The host will
  # then retry based on the delay.
Thanks,

Minwoo Im (3):
  hw/block/nvme: set NVME_DNR in a single place
  hw/block/nvme: support command retry delay
  hw/block/nvme: add nvme_inject_state HMP command

 hmp-commands.hx       |  13 ++
 hw/block/nvme.c       | 304 +-
 hw/block/nvme.h       |  10 ++
 include/block/nvme.h  |  13 +-
 include/monitor/hmp.h |   1 +
 5 files changed, 244 insertions(+), 97 deletions(-)

--
2.17.1
[PATCH V2 6/7] hw/block/nvme: support namespace attachment command
This patch supports Namespace Attachment command for the pre-defined nvme-ns device nodes. Of course, attach/detach namespace should only be supported in case 'subsys' is given. This is because if we detach a namespace from a controller, somebody needs to manage the detached, but allocated namespace in the NVMe subsystem. Signed-off-by: Minwoo Im --- hw/block/nvme-subsys.h | 10 +++ hw/block/nvme.c| 59 ++ hw/block/nvme.h| 5 hw/block/trace-events | 2 ++ include/block/nvme.h | 5 5 files changed, 81 insertions(+) diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 14627f9ccb41..ef4bec928eae 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -30,6 +30,16 @@ typedef struct NvmeSubsystem { int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp); int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp); +static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys, +uint32_t cntlid) +{ +if (!subsys) { +return NULL; +} + +return subsys->ctrls[cntlid]; +} + /* * Return allocated namespace of the specified nsid in the subsystem. 
*/ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 697368a6ae0c..71bcd66f1956 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -183,6 +183,7 @@ static const uint32_t nvme_cse_acs[256] = { [NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP, [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP, [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP, }; static const uint32_t nvme_cse_iocs_none[256]; @@ -3766,6 +3767,62 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; } +static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns); +static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req) +{ +NvmeNamespace *ns; +NvmeCtrl *ctrl; +uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {}; +uint32_t nsid = le32_to_cpu(req->cmd.nsid); +uint32_t dw10 = le32_to_cpu(req->cmd.cdw10); +bool attach = !(dw10 & 0xf); +uint16_t *nr_ids = [0]; +uint16_t *ids = [1]; +uint16_t ret; +int i; + +trace_pci_nvme_ns_attachment(nvme_cid(req), dw10 & 0xf); + +ns = nvme_subsys_ns(n->subsys, nsid); +if (!ns) { +return NVME_INVALID_FIELD | NVME_DNR; +} + +ret = nvme_dma(n, (uint8_t *)list, 4096, + DMA_DIRECTION_TO_DEVICE, req); +if (ret) { +return ret; +} + +if (!*nr_ids) { +return NVME_NS_CTRL_LIST_INVALID | NVME_DNR; +} + +for (i = 0; i < *nr_ids; i++) { +ctrl = nvme_subsys_ctrl(n->subsys, ids[i]); +if (!ctrl) { +return NVME_NS_CTRL_LIST_INVALID | NVME_DNR; +} + +if (attach) { +if (nvme_ns_is_attached(ctrl, ns)) { +return NVME_NS_ALREADY_ATTACHED | NVME_DNR; +} + +nvme_ns_attach(ctrl, ns); +__nvme_select_ns_iocs(ctrl, ns); +} else { +if (!nvme_ns_is_attached(ctrl, ns)) { +return NVME_NS_NOT_ATTACHED | NVME_DNR; +} + +nvme_ns_detach(ctrl, ns); +} +} + +return NVME_SUCCESS; +} + static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req) { trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode, @@ -3797,6 +3854,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req) return 
nvme_get_feature(n, req); case NVME_ADM_CMD_ASYNC_EV_REQ: return nvme_aer(n, req); +case NVME_ADM_CMD_NS_ATTACHMENT: +return nvme_ns_attachment(n, req); default: assert(false); } diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 1c7796b20996..5a1ab857d166 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -222,6 +222,11 @@ static inline void nvme_ns_attach(NvmeCtrl *n, NvmeNamespace *ns) n->namespaces[nvme_nsid(ns) - 1] = ns; } +static inline void nvme_ns_detach(NvmeCtrl *n, NvmeNamespace *ns) +{ +n->namespaces[nvme_nsid(ns) - 1] = NULL; +} + static inline NvmeCQueue *nvme_cq(NvmeRequest *req) { NvmeSQueue *sq = req->sq; diff --git a/hw/block/trace-events b/hw/block/trace-events index b6e972d733a6..bf67fe7873d2 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -80,6 +80,8 @@ pci_nvme_aer(uint16_t cid) "cid %"PRIu16"" pci_nvme_aer_aerl_exceeded(void) "aerl exceeded" pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8"" pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8"" +pci_nvme_ns_attachment(uint16_t cid, uint8_t sel) "cid %"PRIu16", sel=0x%"PRIx8"" +pci_nvme_ns_attachm
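The controller-list buffer that nvme_ns_attachment() reads via nvme_dma() above can be sketched as follows. The layout (a 4096-byte buffer of 16-bit words whose first word holds the number of controller IDs that follow) matches the patch's `nr_ids`/`ids` split; endianness handling is omitted in this host-order sketch and the helper name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* 4096 bytes of 16-bit words: word 0 = count, words 1.. = controller
 * IDs, so at most 2047 IDs fit in one list. */
#define NVME_CONTROLLER_LIST_SIZE 2048

static uint16_t ctrl_list_count(const uint16_t list[NVME_CONTROLLER_LIST_SIZE],
                                const uint16_t **ids)
{
    *ids = &list[1];   /* IDs start right after the count */
    return list[0];
}
```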
[PATCH V2 3/7] hw/block/nvme: fix allocated namespace list to 256
Expand the allocated namespace list (subsys->namespaces) to 256 entries,
a value no smaller than NVME_MAX_NAMESPACES, which sizes the attached
namespace list in a controller. The allocated namespace list must be at
least as large as the attached namespace list.

    n->num_namespaces = NVME_MAX_NAMESPACES;

The above line sets the NN field via id->nn, so the subsystem should
also prepare at least this number of namespace list entries.

Signed-off-by: Minwoo Im
---
 hw/block/nvme-subsys.h | 2 +-
 hw/block/nvme.h        | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 574774390c4c..8a0732b22316 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,7 +14,7 @@
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)

 #define NVME_SUBSYS_MAX_CTRLS 32
-#define NVME_SUBSYS_MAX_NAMESPACES 32
+#define NVME_SUBSYS_MAX_NAMESPACES 256

 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index bde0ed7c2679..1c7796b20996 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -10,6 +10,12 @@
 #define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)

+/*
+ * Subsystem namespace list for allocated namespaces should be larger than
+ * attached namespace list in a controller.
+ */
+QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_SUBSYS_MAX_NAMESPACES);
+
 typedef struct NvmeParams {
     char *serial;
     uint32_t num_queues; /* deprecated since 5.1 */
--
2.17.1
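The QEMU_BUILD_BUG_ON() in this patch expresses a compile-time size relationship; outside QEMU it is equivalent to a C11 _Static_assert, as in this small sketch (macro values copied from the patch):

```c
#include <assert.h>

#define NVME_MAX_NAMESPACES        256   /* attached list, per controller */
#define NVME_SUBSYS_MAX_NAMESPACES 256   /* allocated list, per subsystem */

/* Fails the build if a controller could ever attach more namespaces
 * than the subsystem can allocate. */
_Static_assert(NVME_MAX_NAMESPACES <= NVME_SUBSYS_MAX_NAMESPACES,
               "allocated namespace list smaller than attached list");
```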
[PATCH V2 5/7] hw/block/nvme: refactor nvme_select_ns_iocs
This patch has no functional changes. It just refactors
nvme_select_ns_iocs() to iterate over the attached namespaces of the
controller and invoke __nvme_select_ns_iocs() for each.

Signed-off-by: Minwoo Im
---
 hw/block/nvme.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d1761a82731f..697368a6ae0c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3896,6 +3896,25 @@ static void nvme_ctrl_shutdown(NvmeCtrl *n)
     }
 }

+static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns)
+{
+    ns->iocs = nvme_cse_iocs_none;
+    switch (ns->csi) {
+    case NVME_CSI_NVM:
+        if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
+            ns->iocs = nvme_cse_iocs_nvm;
+        }
+        break;
+    case NVME_CSI_ZONED:
+        if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
+            ns->iocs = nvme_cse_iocs_zoned;
+        } else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
+            ns->iocs = nvme_cse_iocs_nvm;
+        }
+        break;
+    }
+}
+
 static void nvme_select_ns_iocs(NvmeCtrl *n)
 {
     NvmeNamespace *ns;
@@ -3906,21 +3925,8 @@ static void nvme_select_ns_iocs(NvmeCtrl *n)
         if (!ns) {
             continue;
         }
-        ns->iocs = nvme_cse_iocs_none;
-        switch (ns->csi) {
-        case NVME_CSI_NVM:
-            if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
-                ns->iocs = nvme_cse_iocs_nvm;
-            }
-            break;
-        case NVME_CSI_ZONED:
-            if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
-                ns->iocs = nvme_cse_iocs_zoned;
-            } else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
-                ns->iocs = nvme_cse_iocs_nvm;
-            }
-            break;
-        }
+
+        __nvme_select_ns_iocs(n, ns);
     }
 }
--
2.17.1
[PATCH V2 2/7] hw/block/nvme: fix namespaces array to 1-based
The subsys->namespaces array used to be sized to
NVME_SUBSYS_MAX_NAMESPACES. But subsys->namespaces is accessed with a
1-based namespace ID, which means the very first array entry will
always be empty (NULL).

Signed-off-by: Minwoo Im
---
 hw/block/nvme-subsys.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 890d118117dc..574774390c4c 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -24,7 +24,7 @@ typedef struct NvmeSubsystem {
     NvmeCtrl    *ctrls[NVME_SUBSYS_MAX_CTRLS];

     /* Allocated namespaces for this subsystem */
-    NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES];
+    NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES + 1];
 } NvmeSubsystem;

 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
--
2.17.1
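The off-by-one this patch fixes can be illustrated in isolation: with 1-based namespace IDs, slot 0 is never used, so an array of MAX entries cannot hold nsid == MAX. A minimal sketch (names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NAMESPACES 256

/* MAX + 1 slots so that nsid values 1..MAX all index in bounds;
 * slot 0 stays NULL by construction. */
static void *namespaces[MAX_NAMESPACES + 1];

static int nsid_valid(unsigned nsid)
{
    return nsid >= 1 && nsid <= MAX_NAMESPACES;
}
```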
[PATCH V2 7/7] hw/block/nvme: support Identify NS Attached Controller List
Support the Identify command for the Namespace Attached Controller List. This command handler traverses the controller instances in the given subsystem to figure out whether the specified nsid is attached to each controller or not. The 4096-byte Identify data is returned with the first entry (16 bits) indicating the number of controller ID entries, so the data can hold up to 2047 controller ID entries.

Signed-off-by: Minwoo Im
---
 hw/block/nvme.c       | 42 ++
 hw/block/trace-events |  1 +
 include/block/nvme.h  |  1 +
 3 files changed, 44 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 71bcd66f1956..da60335def9f 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -3157,6 +3157,46 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, bool active) return NVME_INVALID_CMD_SET | NVME_DNR; } +static uint16_t nvme_identify_ns_attached_list(NvmeCtrl *n, NvmeRequest *req) +{ +NvmeIdentify *c = (NvmeIdentify *)&req->cmd; +uint16_t min_id = le16_to_cpu(c->ctrlid); +uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {}; +uint16_t *ids = &list[1]; +NvmeNamespace *ns; +NvmeCtrl *ctrl; +int cntlid, nr_ids = 0; + +trace_pci_nvme_identify_ns_attached_list(min_id); + +if (c->nsid == NVME_NSID_BROADCAST) { +return NVME_INVALID_FIELD | NVME_DNR; +} + +ns = nvme_subsys_ns(n->subsys, c->nsid); +if (!ns) { +return NVME_INVALID_FIELD | NVME_DNR; +} + +for (cntlid = min_id; cntlid < ARRAY_SIZE(n->subsys->ctrls); cntlid++) { +ctrl = nvme_subsys_ctrl(n->subsys, cntlid); +if (!ctrl) { +continue; +} + +if (!nvme_ns_is_attached(ctrl, ns)) { +continue; +} + +ids[nr_ids++] = cntlid; +} + +list[0] = nr_ids; + +return nvme_dma(n, (uint8_t *)list, sizeof(list), +DMA_DIRECTION_FROM_DEVICE, req); +} + static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, bool active) { @@ -3356,6 +3396,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) return nvme_identify_ns(n, req, true); case NVME_ID_CNS_NS_PRESENT: return nvme_identify_ns(n, req, false); +case NVME_ID_CNS_NS_ATTACHED_CTRL_LIST: +return nvme_identify_ns_attached_list(n, req); case NVME_ID_CNS_CS_NS: return nvme_identify_ns_csi(n, req, true); case NVME_ID_CNS_CS_NS_PRESENT: diff --git a/hw/block/trace-events b/hw/block/trace-events index bf67fe7873d2..2d88d96c2165 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -62,6 +62,7 @@ pci_nvme_del_cq(uint16_t cqid) "deleted completion queue, cqid=%"PRIu16"" pci_nvme_identify_ctrl(void) "identify controller" pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8"" pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32"" +pci_nvme_identify_ns_attached_list(uint16_t cntid) "cntid=%"PRIu16"" pci_nvme_identify_ns_csi(uint32_t ns, uint8_t csi) "nsid=%"PRIu32", csi=0x%"PRIx8"" pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32"" pci_nvme_identify_nslist_csi(uint16_t ns, uint8_t csi) "nsid=%"PRIu16", csi=0x%"PRIx8"" diff --git a/include/block/nvme.h b/include/block/nvme.h index 4b016f954fee..fb82d8682e9f 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -968,6 +968,7 @@ enum NvmeIdCns { NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, NVME_ID_CNS_NS_PRESENT_LIST = 0x10, NVME_ID_CNS_NS_PRESENT= 0x11, +NVME_ID_CNS_NS_ATTACHED_CTRL_LIST = 0x12, NVME_ID_CNS_CS_NS_PRESENT_LIST= 0x1a, NVME_ID_CNS_CS_NS_PRESENT = 0x1b, NVME_ID_CNS_IO_COMMAND_SET= 0x1c, -- 2.17.1
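The buffer layout the commit message describes — a 16-bit entry count followed by up to 2047 16-bit controller IDs in a 4096-byte buffer — can be sketched as follows. This is an illustrative toy model, not the QEMU code; the names are assumptions and little-endian conversion (cpu_to_le16) is deliberately omitted.

```c
#include <stdint.h>
#include <string.h>

/* A 4096-byte controller list: entry 0 holds the number of controller
 * IDs that follow, leaving room for 2047 IDs. */
#define CTRL_LIST_BYTES   4096
#define CTRL_LIST_ENTRIES (CTRL_LIST_BYTES / sizeof(uint16_t)) /* 2048 */
#define CTRL_LIST_MAX_IDS (CTRL_LIST_ENTRIES - 1)              /* 2047 */

/* Fill `list` (CTRL_LIST_ENTRIES elements) from `ids`; returns how many
 * IDs were stored after truncating to what the buffer can hold. */
static int ctrl_list_build(uint16_t *list, const uint16_t *ids, int n)
{
    if (n > (int)CTRL_LIST_MAX_IDS) {
        n = (int)CTRL_LIST_MAX_IDS;
    }
    list[0] = (uint16_t)n;                  /* entry 0: count */
    memcpy(&list[1], ids, (size_t)n * sizeof(uint16_t));
    return n;
}
```

This mirrors the `list[0] = nr_ids;` / `ids = &list[1]` split in the patch above.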
[PATCH V2 4/7] hw/block/nvme: support allocated namespace type
NVMe spec 1.4b, "6.1.5. NSID and Namespace Relationships", defines the valid namespace types:

- Unallocated: Does not exist in the NVMe subsystem
- Allocated: Exists in the NVMe subsystem
- Inactive: Not attached to the controller
- Active: Attached to the controller

This patch adds support for the allocated, but not attached, namespace type:

  !nvme_ns(n, nsid) && nvme_subsys_ns(n->subsys, nsid)

nvme_ns() returns the attached namespace instance of the given controller and nvme_subsys_ns() returns the allocated namespace instance in the subsystem.

Signed-off-by: Minwoo Im
---
 hw/block/nvme-subsys.h | 13 +
 hw/block/nvme.c        | 63 +++---
 2 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 8a0732b22316..14627f9ccb41 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -30,4 +30,17 @@ typedef struct NvmeSubsystem { int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp); int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp); +/* + * Return allocated namespace of the specified nsid in the subsystem.
+ */ +static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem *subsys, +uint32_t nsid) +{ +if (!subsys) { +return NULL; +} + +return subsys->namespaces[nsid]; +} + #endif /* NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index a1e930f7c8e4..d1761a82731f 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -3124,7 +3124,7 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } -static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, bool active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)>cmd; @@ -3138,7 +3138,14 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) ns = nvme_ns(n, nsid); if (unlikely(!ns)) { -return nvme_rpt_empty_id_struct(n, req); +if (!active) { +ns = nvme_subsys_ns(n->subsys, nsid); +if (!ns) { +return nvme_rpt_empty_id_struct(n, req); +} +} else { +return nvme_rpt_empty_id_struct(n, req); +} } if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) { @@ -3149,7 +3156,8 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_CMD_SET | NVME_DNR; } -static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, +bool active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)>cmd; @@ -3163,7 +3171,14 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) ns = nvme_ns(n, nsid); if (unlikely(!ns)) { -return nvme_rpt_empty_id_struct(n, req); +if (!active) { +ns = nvme_subsys_ns(n->subsys, nsid); +if (!ns) { +return nvme_rpt_empty_id_struct(n, req); +} +} else { +return nvme_rpt_empty_id_struct(n, req); +} } if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) { @@ -3176,7 +3191,8 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } -static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req) +static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req, +bool active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)>cmd; @@ -3201,7 +3217,14 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) for (i = 1; i <= n->num_namespaces; i++) { ns = nvme_ns(n, i); if (!ns) { -continue; +if (!active) { +ns = nvme_subsys_ns(n->subsys, i); +if (!ns) { +continue; +} +} else { +continue; +} } if (ns->params.nsid <= min_nsid) { continue; @@ -3215,7 +3238,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req, +bool active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)>cmd; @@ -3241,7 +3265,14 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req) for (i = 1; i <= n->num_namespaces; i++) { ns = nvme_ns(n, i); if (!ns) { -continue; +if (!active) { +ns =
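The lookup order described in the commit message — check the controller's attached namespaces first and, for an "allocated" (non-active) query, fall back to the subsystem — can be modeled with a small sketch. The types and names here are assumptions for illustration, not the QEMU structs.

```c
#include <stddef.h>

#define MAX_NS 8

typedef struct { int nsid; } Ns;

typedef struct {
    Ns *attached[MAX_NS + 1];  /* per-controller view, 1-based nsid */
    Ns *allocated[MAX_NS + 1]; /* subsystem view, 1-based nsid */
} Ctrl;

/* active != 0: only attached namespaces are visible; active == 0: fall
 * back to the subsystem, i.e. the
 * !nvme_ns(n, nsid) && nvme_subsys_ns(n->subsys, nsid) case. */
static Ns *ns_lookup(Ctrl *n, int nsid, int active)
{
    Ns *ns = n->attached[nsid];
    if (!ns && !active) {
        ns = n->allocated[nsid]; /* allocated but not attached */
    }
    return ns;
}

/* Returns nonzero if a namespace that is allocated in the subsystem but
 * never attached to the controller is visible to the given lookup. */
static int lookup_sees_unattached(int active)
{
    static Ns a = { 1 };
    Ctrl c = { { NULL }, { NULL } };
    c.allocated[1] = &a; /* allocated, but not attached */
    return ns_lookup(&c, 1, active) != NULL;
}
```

An active (attached) query misses such a namespace, while an allocated query finds it — which is exactly why the patch threads the `active` flag through the Identify handlers.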
[PATCH V2 1/7] hw/block/nvme: support namespace detach
Now that we have the nvme-subsys device supported, we can manage namespaces that are allocated but not attached: detached. This patch introduces a parameter for the nvme-ns device named 'detached'. This parameter indicates whether the given namespace device is detached from an entire NVMe subsystem ('subsys' given case, shared namespace) or from a controller ('bus' given case, private namespace).

- Allocated namespace

  1) Shared ns in the subsystem 'subsys0':
     -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,subsys=subsys0,detached=true

  2) Private ns for the controller 'nvme0' of the subsystem 'subsys0':
     -device nvme-subsys,id=subsys0
     -device nvme,serial=foo,id=nvme0,subsys=subsys0
     -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true

  3) (Invalid case) Controller 'nvme0' has no subsystem to manage ns:
     -device nvme,serial=foo,id=nvme0
     -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true

Signed-off-by: Minwoo Im
---
 hw/block/nvme-ns.c     |  1 +
 hw/block/nvme-ns.h     |  1 +
 hw/block/nvme-subsys.h |  1 +
 hw/block/nvme.c        | 41 +++--
 hw/block/nvme.h        | 22 ++
 5 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index c3b513b0fc78..cdcb81319fb5 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -393,6 +393,7 @@ static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS, NvmeSubsystem *), +DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128), diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 7af6884862b5..b0c00e115d81 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -26,6 +26,7 @@ typedef struct NvmeZone { } NvmeZone; typedef struct NvmeNamespaceParams { +bool detached; uint32_t nsid; QemuUUID uuid; diff --git
a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index ccf6a71398d3..890d118117dc 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -23,6 +23,7 @@ typedef struct NvmeSubsystem { uint8_t subnqn[256]; NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS]; +/* Allocated namespaces for this subsystem */ NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES]; } NvmeSubsystem; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 6b84e34843f5..a1e930f7c8e4 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -23,7 +23,7 @@ * max_ioqpairs=, \ * aerl=, aer_max_queued=, \ * mdts=,zoned.append_size_limit=, \ - * subsys= \ + * subsys=,detached= * -device nvme-ns,drive=,bus=,nsid=,\ * zoned=, \ * subsys= @@ -78,6 +78,13 @@ * controllers in the subsystem. Otherwise, `bus` must be given to attach * this namespace to a specified single controller as a non-shared namespace. * + * - `detached` + * Do not attach the namespace device to controllers in the NVMe subsystem + * during boot-up. If not given, namespaces are all attached to all + * controllers in the subsystem by default. + * It's mutually exclusive with the 'bus' parameter. It's only valid if + * `subsys` is provided. + * * Setting `zoned` to true selects Zoned Command Set at the namespace.
* In this case, the following namespace properties are available to configure zoned operation: @@ -4521,6 +4528,20 @@ static void nvme_init_state(NvmeCtrl *n) n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1); } +static int nvme_attach_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) +{ +if (nvme_ns_is_attached(n, ns)) { +error_setg(errp, + "namespace %d is already attached to controller %d", + nvme_nsid(ns), n->cntlid); +return -1; +} + +nvme_ns_attach(n, ns); + +return 0; +} + int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) { uint32_t nsid = nvme_nsid(ns); @@ -4552,7 +4573,23 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) trace_pci_nvme_register_namespace(nsid); -n->namespaces[nsid - 1] = ns; +/* + * If subsys is not given, the namespace is always attached to the controller + * because there's no subsystem to manage namespace allocation. + */ +if (!n->subsys) { +if (ns->params.detached) { +error_setg(errp, + "detached needs nvme-subsys specified nvme or nvme-ns"); +return -1; +} + +return nvme_attach_namesp
[PATCH V2 0/6] hw/block/nvme: support namespace attachment
Node          SN   Model            Namespace  Usage                  Format       FW Rev
------------  ---  ---------------  ---------  ---------------------  -----------  ------
/dev/nvme0n1  foo  QEMU NVMe Ctrl   1          268.44 MB / 268.44 MB  512 B + 0 B  1.0
/dev/nvme0n2  foo  QEMU NVMe Ctrl   2          268.44 MB / 268.44 MB  512 B + 0 B  1.0
/dev/nvme1n2  bar  QEMU NVMe Ctrl   2          268.44 MB / 268.44 MB  512 B + 0 B  1.0
/dev/nvme1n3  bar  QEMU NVMe Ctrl   3          268.44 MB / 268.44 MB  512 B + 0 B  1.0

root@vm:~/work# nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=1
attach-ns: Success, nsid:1
root@vm:~/work# echo 1 > /sys/class/nvme/nvme1/rescan_controller
root@vm:~/work# nvme list
Node          SN   Model            Namespace  Usage                  Format       FW Rev
------------  ---  ---------------  ---------  ---------------------  -----------  ------
/dev/nvme0n1  foo  QEMU NVMe Ctrl   1          268.44 MB / 268.44 MB  512 B + 0 B  1.0
/dev/nvme0n2  foo  QEMU NVMe Ctrl   2          268.44 MB / 268.44 MB  512 B + 0 B  1.0
/dev/nvme1n1  bar  QEMU NVMe Ctrl   1          268.44 MB / 268.44 MB  512 B + 0 B  1.0
/dev/nvme1n2  bar  QEMU NVMe Ctrl   2          268.44 MB / 268.44 MB  512 B + 0 B  1.0
/dev/nvme1n3  bar  QEMU NVMe Ctrl   3          268.44 MB / 268.44 MB  512 B + 0 B  1.0

Minwoo Im (7):
  hw/block/nvme: support namespace detach
  hw/block/nvme: fix namespaces array to 1-based
  hw/block/nvme: fix allocated namespace list to 256
  hw/block/nvme: support allocated namespace type
  hw/block/nvme: refactor nvme_select_ns_iocs
  hw/block/nvme: support namespace attachment command
  hw/block/nvme: support Identify NS Attached Controller List

 hw/block/nvme-ns.c     |   1 +
 hw/block/nvme-ns.h     |   1 +
 hw/block/nvme-subsys.h |  28 -
 hw/block/nvme.c        | 241 +++--
 hw/block/nvme.h        |  33 ++
 hw/block/trace-events  |   3 +
 include/block/nvme.h   |   6 +
 7 files changed, 278 insertions(+), 35 deletions(-)
-- 
2.17.1
Re: [PATCH 2/2] hw/block/nvme: add write uncorrectable command
> > It might be a nitpick, but 'nlb' would better represent the zero-based
> > value as it is defined in the spec. Can we have this like:
> >
> > uint32_t nlb = le16_to_cpu(rw->nlb);
> >
> > bitmap_clear(ns->uncorrectable, slba, nlb + 1);
>
> I do not disagree, but the `uint32_t nlb = le16_to_cpu(rw->nlb) + 1;`
> pattern is already used in several places.

Oh yes, now I see some of those places. Then, please take my review tag for this patch. Thanks!
Re: [PATCH 2/2] hw/block/nvme: add write uncorrectable command
On 21-02-10 08:06:46, Klaus Jensen wrote: > From: Gollu Appalanaidu > > Add support for marking blocks invalid with the Write Uncorrectable > command. Block status is tracked in a (non-persistent) bitmap that is > checked on all reads and written to on all writes. This is potentially > expensive, so keep Write Uncorrectable disabled by default. > > Signed-off-by: Gollu Appalanaidu > Signed-off-by: Klaus Jensen > --- > docs/specs/nvme.txt | 3 ++ > hw/block/nvme-ns.h| 2 ++ > hw/block/nvme.h | 1 + > hw/block/nvme-ns.c| 2 ++ > hw/block/nvme.c | 65 +-- > hw/block/trace-events | 1 + > 6 files changed, 66 insertions(+), 8 deletions(-) > > diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt > index 56d393884e7a..88f9cc278d4c 100644 > --- a/docs/specs/nvme.txt > +++ b/docs/specs/nvme.txt > @@ -19,5 +19,8 @@ Known issues > > * The accounting numbers in the SMART/Health are reset across power cycles > > +* Marking blocks invalid with the Write Uncorrectable is not persisted across > + power cycles. > + > * Interrupt Coalescing is not supported and is disabled by default in > violation > of the specification.
> diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h > index 7af6884862b5..15fa422ded03 100644 > --- a/hw/block/nvme-ns.h > +++ b/hw/block/nvme-ns.h > @@ -72,6 +72,8 @@ typedef struct NvmeNamespace { > struct { > uint32_t err_rec; > } features; > + > +unsigned long *uncorrectable; > } NvmeNamespace; > > static inline uint32_t nvme_nsid(NvmeNamespace *ns) > diff --git a/hw/block/nvme.h b/hw/block/nvme.h > index 98082b2dfba3..9b8f85b9cf16 100644 > --- a/hw/block/nvme.h > +++ b/hw/block/nvme.h > @@ -68,6 +68,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc) > case NVME_CMD_FLUSH:return "NVME_NVM_CMD_FLUSH"; > case NVME_CMD_WRITE:return "NVME_NVM_CMD_WRITE"; > case NVME_CMD_READ: return "NVME_NVM_CMD_READ"; > +case NVME_CMD_WRITE_UNCOR: return "NVME_CMD_WRITE_UNCOR"; > case NVME_CMD_COMPARE: return "NVME_NVM_CMD_COMPARE"; > case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES"; > case NVME_CMD_DSM: return "NVME_NVM_CMD_DSM"; > diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c > index ade46e2f3739..742bbc4b4b62 100644 > --- a/hw/block/nvme-ns.c > +++ b/hw/block/nvme-ns.c > @@ -72,6 +72,8 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp) > id_ns->mcl = cpu_to_le32(ns->params.mcl); > id_ns->msrc = ns->params.msrc; > > +ns->uncorrectable = bitmap_new(id_ns->nsze); > + > return 0; > } > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c > index e5f725d7..56048046c193 100644 > --- a/hw/block/nvme.c > +++ b/hw/block/nvme.c > @@ -1112,6 +1112,20 @@ static uint16_t nvme_check_dulbe(NvmeNamespace *ns, > uint64_t slba, > return NVME_SUCCESS; > } > > +static inline uint16_t nvme_check_uncor(NvmeNamespace *ns, uint64_t slba, > +uint32_t nlb) > +{ > +uint64_t elba = nlb + slba; > + > +if (ns->uncorrectable) { > +if (find_next_bit(ns->uncorrectable, elba, slba) < elba) { > +return NVME_UNRECOVERED_READ | NVME_DNR; > +} > +} > + > +return NVME_SUCCESS; > +} > + > static void nvme_aio_err(NvmeRequest *req, int ret) > { > uint16_t status = NVME_SUCCESS; > 
@@ -1423,14 +1437,24 @@ static void nvme_rw_cb(void *opaque, int ret) > BlockAcctCookie *acct = &req->acct; > BlockAcctStats *stats = blk_get_stats(blk); > > +bool is_write = nvme_is_write(req); > + > trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk)); > > -if (ns->params.zoned && nvme_is_write(req)) { > +if (ns->params.zoned && is_write) { > nvme_finalize_zoned_write(ns, req); > } > > if (!ret) { > block_acct_done(stats, acct); > + > +if (is_write) { > +NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; > + uint64_t slba = le64_to_cpu(rw->slba); > +uint32_t nlb = le16_to_cpu(rw->nlb) + 1; > + > +bitmap_clear(ns->uncorrectable, slba, nlb); It might be a nitpick, but 'nlb' would better represent the zero-based value as it is defined in the spec. Can we have this like: uint32_t nlb = le16_to_cpu(rw->nlb); bitmap_clear(ns->uncorrectable, slba, nlb + 1); Otherwise, it looks good to me. Reviewed-by: Minwoo Im
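The bitmap semantics under review — Write Uncorrectable marks a block range, a completed write clears it, and a read fails if any block in its range is marked — can be modeled with a toy single-word bitmap. The helper names here are assumptions for illustration; the QEMU patch uses its bitmap API (bitmap_set/bitmap_clear/find_next_bit) instead.

```c
#include <stdint.h>
#include <stdbool.h>

/* One 64-bit word models the per-block "uncorrectable" state of a
 * tiny 64-block namespace. */
static uint64_t uncorrectable;

/* Write Uncorrectable: mark [slba, slba + nlb) as invalid. */
static void wuc_mark(unsigned slba, unsigned nlb)
{
    for (unsigned i = 0; i < nlb; i++) {
        uncorrectable |= 1ULL << (slba + i);
    }
}

/* A successful write clears the marks for the written range. */
static void wuc_clear(unsigned slba, unsigned nlb)
{
    for (unsigned i = 0; i < nlb; i++) {
        uncorrectable &= ~(1ULL << (slba + i));
    }
}

/* true => a read of [slba, slba + nlb) would fail with an
 * Unrecovered Read Error, as in the patch's nvme_check_uncor(). */
static bool wuc_read_fails(unsigned slba, unsigned nlb)
{
    for (unsigned i = 0; i < nlb; i++) {
        if (uncorrectable & (1ULL << (slba + i))) {
            return true;
        }
    }
    return false;
}
```

A read overlapping a marked range fails even partially, which is why the check scans the whole range rather than just the first block.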
Re: [PATCH 1/2] hw/block/nvme: add oncs device parameter
On 21-02-10 08:06:45, Klaus Jensen wrote: > From: Gollu Appalanaidu > > Add the 'oncs' nvme device parameter to allow optional features to be > enabled/disabled explicitly. Since most of these are optional commands, > make the CSE log pages dynamic to account for the value of ONCS. > > Signed-off-by: Gollu Appalanaidu > Signed-off-by: Klaus Jensen Reviewed-by: Minwoo Im
Re: [PATCH v2] hw/block: nvme: Fix a build error in nvme_get_feature()
On 21-02-10 18:23:17, Bin Meng wrote: > From: Bin Meng > > Current QEMU HEAD nvme.c does not compile: > > hw/block/nvme.c:3242:9: error: ‘result’ may be used uninitialized in this > function [-Werror=maybe-uninitialized] > trace_pci_nvme_getfeat_vwcache(result ? "enabled" : "disabled"); > ^ > hw/block/nvme.c:3150:14: note: ‘result’ was declared here > uint32_t result; > ^ > > Explicitly initialize the result to fix it. > > Fixes: aa5e55e3b07e ("hw/block/nvme: open code for volatile write cache") > Signed-off-by: Bin Meng Bin, Thanks for the fix!
Re: [PATCH RFC v2 3/8] hw/block/nvme: fix strerror printing
On 21-02-07 22:49:35, Klaus Jensen wrote: > From: Klaus Jensen > > Fix missing sign inversion. > > Signed-off-by: Klaus Jensen Reviewed-by: Minwoo Im
Re: [PATCH RFC v2 2/8] hw/block/nvme: remove block accounting for write zeroes
On 21-02-07 22:49:34, Klaus Jensen wrote: > From: Klaus Jensen > > A Write Zeroes commands should not be counted in either the 'Data Units > Written' or in 'Host Write Commands' SMART/Health Information Log page. > > Signed-off-by: Klaus Jensen Reviewed-by: Minwoo Im
Re: [PATCH RFC v2 1/8] hw/block/nvme: remove redundant len member in compare context
On 21-02-07 22:49:33, Klaus Jensen wrote: > From: Klaus Jensen > > The 'len' member of the nvme_compare_ctx struct is redundant since the > same information is available in the 'iov' member. > > Signed-off-by: Klaus Jensen Reviewed-by: Minwoo Im
[PATCH 6/6] hw/block/nvme: support namespace attachment command
This patch supports the Namespace Attachment command for the pre-defined nvme-ns device nodes. Attaching/detaching a namespace is only supported when 'subsys' is given: if we detach a namespace from a controller, something needs to keep managing the detached, but still allocated, namespace in the NVMe subsystem.

Signed-off-by: Minwoo Im
---
 hw/block/nvme-subsys.h | 10 +++
 hw/block/nvme.c        | 59 ++
 hw/block/nvme.h        |  5
 hw/block/trace-events  |  2 ++
 include/block/nvme.h   |  5
 5 files changed, 81 insertions(+)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 14627f9ccb41..ef4bec928eae 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -30,6 +30,16 @@ typedef struct NvmeSubsystem { int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp); int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp); +static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys, +uint32_t cntlid) +{ +if (!subsys) { +return NULL; +} + +return subsys->ctrls[cntlid]; +} + /* * Return allocated namespace of the specified nsid in the subsystem. */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 697368a6ae0c..769436722c7e 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -183,6 +183,7 @@ static const uint32_t nvme_cse_acs[256] = { [NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP, [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP, [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP, }; static const uint32_t nvme_cse_iocs_none[256]; @@ -3766,6 +3767,62 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; } +static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns); +static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req) +{ +NvmeNamespace *ns; +NvmeCtrl *ctrl; +uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {}; +uint32_t nsid = le32_to_cpu(req->cmd.nsid); +uint32_t dw10 = le32_to_cpu(req->cmd.cdw10); +bool attach = !(dw10 & 0xf); +uint16_t *nr_ids = &list[0]; +uint16_t *ids = &list[1]; +uint16_t ret; +int i; + +trace_pci_nvme_ns_attachment(nvme_cid(req), dw10 & 0xf); + +ns = nvme_subsys_ns(n->subsys, nsid); +if (!ns) { +return NVME_INVALID_FIELD | NVME_DNR; +} + +ret = nvme_dma(n, (uint8_t *)list, 4096, + DMA_DIRECTION_TO_DEVICE, req); +if (ret) { +return ret; +} + +if (!*nr_ids) { +return NVME_NS_CTRL_LIST_INVALID | NVME_DNR; +} + +for (i = 0; i < *nr_ids; i++) { +ctrl = nvme_subsys_ctrl(n->subsys, ids[i]); +if (!ctrl) { +return NVME_NS_CTRL_LIST_INVALID | NVME_DNR; +} + +if (attach) { +if (nvme_ns_is_attached(ctrl, ns)) { +return NVME_NS_ALREADY_ATTACHED | NVME_DNR; +} + +nvme_ns_attach(n, ns); +__nvme_select_ns_iocs(n, ns); +} else { +if (!nvme_ns_is_attached(ctrl, ns)) { +return NVME_NS_NOT_ATTACHED | NVME_DNR; +} + +nvme_ns_detach(n, ns); +} +} + +return NVME_SUCCESS; +} + static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req) { trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode, @@ -3797,6 +3854,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req) return
nvme_get_feature(n, req); case NVME_ADM_CMD_ASYNC_EV_REQ: return nvme_aer(n, req); +case NVME_ADM_CMD_NS_ATTACHMENT: +return nvme_ns_attachment(n, req); default: assert(false); } diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 1c7796b20996..5a1ab857d166 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -222,6 +222,11 @@ static inline void nvme_ns_attach(NvmeCtrl *n, NvmeNamespace *ns) n->namespaces[nvme_nsid(ns) - 1] = ns; } +static inline void nvme_ns_detach(NvmeCtrl *n, NvmeNamespace *ns) +{ +n->namespaces[nvme_nsid(ns) - 1] = NULL; +} + static inline NvmeCQueue *nvme_cq(NvmeRequest *req) { NvmeSQueue *sq = req->sq; diff --git a/hw/block/trace-events b/hw/block/trace-events index b6e972d733a6..bf67fe7873d2 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -80,6 +80,8 @@ pci_nvme_aer(uint16_t cid) "cid %"PRIu16"" pci_nvme_aer_aerl_exceeded(void) "aerl exceeded" pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8"" pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8"" +pci_nvme_ns_attachment(uint16_t cid, uint8_t sel) "cid %"PRIu16", sel=0x%"PRIx8"" +pci_nvme_ns_attachment_attac
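The command inputs the handler above decodes — SEL in CDW10 bits 3:0 selecting attach (0h) or detach (1h), and a data buffer whose first 16-bit entry counts the controller IDs that follow — can be sketched in isolation. Names here are illustrative, not the QEMU code.

```c
#include <stdint.h>
#include <stdbool.h>

/* SEL = CDW10[3:0]: 0h attaches, 1h detaches. */
static bool sel_is_attach(uint32_t dw10)
{
    return (dw10 & 0xf) == 0;
}

/* Parse a controller list: returns the number of controller IDs and
 * points *ids at them, or -1 for an empty list (the patch maps that to
 * the Controller List Invalid status). */
static int ctrl_list_parse(const uint16_t *list, const uint16_t **ids)
{
    int nr = list[0];
    if (nr == 0) {
        return -1;
    }
    *ids = &list[1];
    return nr;
}
```

The handler then walks the parsed IDs, resolving each to a controller and rejecting the whole command on the first unknown ID or already-attached/not-attached conflict.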
[PATCH 5/6] hw/block/nvme: refactor nvme_select_ns_iocs
This patch has no functional changes. It just refactors nvme_select_ns_iocs() to iterate over the attached namespaces of the controller and invoke __nvme_select_ns_iocs() for each of them.

Signed-off-by: Minwoo Im
---
 hw/block/nvme.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c index d1761a82731f..697368a6ae0c 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -3896,6 +3896,25 @@ static void nvme_ctrl_shutdown(NvmeCtrl *n) } } +static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns) +{ +ns->iocs = nvme_cse_iocs_none; +switch (ns->csi) { +case NVME_CSI_NVM: +if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) { +ns->iocs = nvme_cse_iocs_nvm; +} +break; +case NVME_CSI_ZONED: +if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) { +ns->iocs = nvme_cse_iocs_zoned; +} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) { +ns->iocs = nvme_cse_iocs_nvm; +} +break; +} +} + static void nvme_select_ns_iocs(NvmeCtrl *n) { NvmeNamespace *ns; @@ -3906,21 +3925,8 @@ static void nvme_select_ns_iocs(NvmeCtrl *n) if (!ns) { continue; } -ns->iocs = nvme_cse_iocs_none; -switch (ns->csi) { -case NVME_CSI_NVM: -if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) { -ns->iocs = nvme_cse_iocs_nvm; -} -break; -case NVME_CSI_ZONED: -if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) { -ns->iocs = nvme_cse_iocs_zoned; -} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) { -ns->iocs = nvme_cse_iocs_nvm; -} -break; -} + +__nvme_select_ns_iocs(n, ns); } } -- 2.17.1
[PATCH 3/6] hw/block/nvme: fix allocated namespace list to 256
Expand the allocated namespace list (subsys->namespaces) to 256 entries, a value larger than NVME_MAX_NAMESPACES, which sizes the attached namespace list in a controller. The allocated namespace list should be at least as large as the attached namespace list.

  n->num_namespaces = NVME_MAX_NAMESPACES;

The above line sets the NN field via id->nn, so the subsystem should also prepare at least this number of namespace list entries.

Signed-off-by: Minwoo Im
---
 hw/block/nvme-subsys.h | 2 +-
 hw/block/nvme.h        | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 574774390c4c..8a0732b22316 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -14,7 +14,7 @@ OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS) #define NVME_SUBSYS_MAX_CTRLS 32 -#define NVME_SUBSYS_MAX_NAMESPACES 32 +#define NVME_SUBSYS_MAX_NAMESPACES 256 typedef struct NvmeCtrl NvmeCtrl; typedef struct NvmeNamespace NvmeNamespace; diff --git a/hw/block/nvme.h b/hw/block/nvme.h index bde0ed7c2679..1c7796b20996 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -10,6 +10,12 @@ #define NVME_DEFAULT_ZONE_SIZE (128 * MiB) #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB) +/* + * Subsystem namespace list for allocated namespaces should be larger than + * attached namespace list in a controller. + */ +QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_SUBSYS_MAX_NAMESPACES); + typedef struct NvmeParams { char *serial; uint32_t num_queues; /* deprecated since 5.1 */ -- 2.17.1
[PATCH 1/6] hw/block/nvme: support namespace detach
Given that now we have nvme-subsys device supported, we can manage namespace allocated, but not attached: detached. This patch introduced a parameter for nvme-ns device named 'detached'. This parameter indicates whether the given namespace device is detached from a entire NVMe subsystem('subsys' given case, shared namespace) or a controller('bus' given case, private namespace). - Allocated namespace 1) Shared ns in the subsystem 'subsys0': -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,subsys=subsys0,detached=true 2) Private ns for the controller 'nvme0' of the subsystem 'subsys0': -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true 3) (Invalid case) Controller 'nvme0' has no subsystem to manage ns: -device nvme,serial=foo,id=nvme0 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true Signed-off-by: Minwoo Im --- hw/block/nvme-ns.c | 1 + hw/block/nvme-ns.h | 1 + hw/block/nvme-subsys.h | 1 + hw/block/nvme.c| 41 +++-- hw/block/nvme.h| 22 ++ 5 files changed, 64 insertions(+), 2 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index c3b513b0fc78..cdcb81319fb5 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -393,6 +393,7 @@ static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS, NvmeSubsystem *), +DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128), diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 7af6884862b5..b0c00e115d81 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -26,6 +26,7 @@ typedef struct NvmeZone { } NvmeZone; typedef struct NvmeNamespaceParams { +bool detached; uint32_t nsid; QemuUUID uuid; diff --git 
a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index ccf6a71398d3..890d118117dc 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -23,6 +23,7 @@ typedef struct NvmeSubsystem { uint8_t subnqn[256]; NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS]; +/* Allocated namespaces for this subsystem */ NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES]; } NvmeSubsystem; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 6b84e34843f5..a1e930f7c8e4 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -23,7 +23,7 @@ * max_ioqpairs=, \ * aerl=, aer_max_queued=, \ * mdts=,zoned.append_size_limit=, \ - * subsys= \ + * subsys=,detached= * -device nvme-ns,drive=,bus=,nsid=,\ * zoned=, \ * subsys= @@ -78,6 +78,13 @@ * controllers in the subsystem. Otherwise, `bus` must be given to attach * this namespace to a specified single controller as a non-shared namespace. * + * - `detached` + * Not to attach the namespace device to controllers in the NVMe subsystem + * during boot-up. If not given, namespaces are all attahced to all + * controllers in the subsystem by default. + * It's mutual exclusive with 'bus' parameter. It's only valid in case + * `subsys` is provided. + * * Setting `zoned` to true selects Zoned Command Set at the namespace. 
* In this case, the following namespace properties are available to configure zoned operation: @@ -4521,6 +4528,20 @@ static void nvme_init_state(NvmeCtrl *n) n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1); } +static int nvme_attach_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) +{ +if (nvme_ns_is_attached(n, ns)) { +error_setg(errp, + "namespace %d is already attached to controller %d", + nvme_nsid(ns), n->cntlid); +return -1; +} + +nvme_ns_attach(n, ns); + +return 0; +} + int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) { uint32_t nsid = nvme_nsid(ns); @@ -4552,7 +4573,23 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) trace_pci_nvme_register_namespace(nsid); -n->namespaces[nsid - 1] = ns; +/* + * If subsys is not given, the namespace is always attached to the controller + * because there's no subsystem to manage namespace allocation. + */ +if (!n->subsys) { +if (ns->params.detached) { +error_setg(errp, + "detached needs nvme-subsys specified nvme or nvme-ns"); +return -1; +} + +return nvme_attach_namesp
[PATCH 2/6] hw/block/nvme: fix namespaces array to 1-based
subsys->namespaces array used to be sized to NVME_SUBSYS_MAX_NAMESPACES. But subsys->namespaces is accessed with a 1-based namespace ID, which means the very first array entry will always be empty (NULL). Signed-off-by: Minwoo Im --- hw/block/nvme-subsys.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 890d118117dc..574774390c4c 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -24,7 +24,7 @@ typedef struct NvmeSubsystem { NvmeCtrl *ctrls[NVME_SUBSYS_MAX_CTRLS]; /* Allocated namespaces for this subsystem */ -NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES]; +NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES + 1]; } NvmeSubsystem; int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp); -- 2.17.1
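The off-by-one shape of this fix can be sketched standalone (the array size and helper name below are illustrative, not the actual QEMU code):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative limit; the real QEMU macro is NVME_SUBSYS_MAX_NAMESPACES. */
#define MAX_NAMESPACES 32

/* Sized to MAX + 1 so a 1-based nsid indexes the array directly;
 * entry 0 is intentionally left unused. */
static void *namespaces[MAX_NAMESPACES + 1];

/* Look up a namespace by its 1-based nsid; NULL for invalid/unallocated. */
static void *subsys_ns(unsigned int nsid)
{
    if (nsid == 0 || nsid > MAX_NAMESPACES) {
        return NULL;
    }
    return namespaces[nsid];
}
```

With the old MAX-sized array and 1-based indexing, nsid == MAX would read one element past the end; padding the array by one keeps the 1-based access in bounds without sprinkling "nsid - 1" everywhere.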
[PATCH 0/6] hw/block/nvme: support namespace attachment
Hello, This series supports namespace attachment: attach and detach. This means the series also introduces a scheme for allocated namespaces: namespaces that are detached from any controller, but still allocated in an NVMe subsystem. Given that we now have the nvme-subsys device to specify an NVMe subsystem, the subsystem itself can manage namespaces detached from the controllers in it. Tested: -device nvme-subsys,id=subsys0 \ -device nvme,serial=foo,id=nvme0,subsys=subsys0 \ -device nvme-ns,id=ns1,drive=drv0,nsid=1,subsys=subsys0,zoned=false \ -device nvme-ns,id=ns2,drive=drv1,nsid=2,subsys=subsys0,zoned=true \ -device nvme-ns,id=ns3,drive=drv2,nsid=3,subsys=subsys0,detached=true,zoned=false \ -device nvme-ns,id=ns4,drive=drv3,nsid=4,subsys=subsys0,detached=true,zoned=true \ root@vm:~# nvme list Node SN Model Namespace Usage Format FW Rev - - -- /dev/nvme0n1 foo QEMU NVMe Ctrl 1 268.44 MB / 268.44 MB 512 B + 0 B 1.0 /dev/nvme0n2 foo QEMU NVMe Ctrl 2 268.44 MB / 268.44 MB 512 B + 0 B 1.0 root@vm:~# nvme attach-ns /dev/nvme0 --namespace-id=3 --controllers=0 attach-ns: Success, nsid:3 root@vm:~# nvme attach-ns /dev/nvme0 --namespace-id=4 --controllers=0 attach-ns: Success, nsid:4 root@vm:~# echo 1 > /sys/class/nvme/nvme0/rescan_controller root@vm:~# nvme list Node SN Model Namespace Usage Format FW Rev - - -- /dev/nvme0n1 foo QEMU NVMe Ctrl 1 268.44 MB / 268.44 MB 512 B + 0 B 1.0 /dev/nvme0n2 foo QEMU NVMe Ctrl 2 268.44 MB / 268.44 MB 512 B + 0 B 1.0 /dev/nvme0n3 foo QEMU NVMe Ctrl 3 268.44 MB / 268.44 MB 512 B + 0 B 1.0 /dev/nvme0n4 foo QEMU NVMe Ctrl 4 268.44 MB / 268.44 MB 512 B + 0 B 1.0 root@vm:~# nvme detach-ns /dev/nvme0 --namespace-id=3 --controllers=0 detach-ns: Success, nsid:3 root@vm:~# nvme detach-ns /dev/nvme0 --namespace-id=4 --controllers=0 detach-ns: Success, nsid:4 root@vm:~# echo 1 > /sys/class/nvme/nvme0/rescan_controller root@vm:~# nvme list Node SN Model Namespace Usage Format FW Rev - - -- /dev/nvme0n1 foo QEMU NVMe Ctrl 1 268.44 MB / 268.44 MB 512 B + 0 B 1.0 /dev/nvme0n2 foo QEMU NVMe Ctrl 2
268.44 MB / 268.44 MB 512 B + 0 B 1.0 Thanks, Minwoo Im (6): hw/block/nvme: support namespace detach hw/block/nvme: fix namespaces array to 1-based hw/block/nvme: fix allocated namespace list to 256 hw/block/nvme: support allocated namespace type hw/block/nvme: refactor nvme_select_ns_iocs hw/block/nvme: support namespace attachment command hw/block/nvme-ns.c | 1 + hw/block/nvme-ns.h | 1 + hw/block/nvme-subsys.h | 28 +- hw/block/nvme.c | 199 ++--- hw/block/nvme.h | 33 +++ hw/block/trace-events | 2 + include/block/nvme.h | 5 ++ 7 files changed, 234 insertions(+), 35 deletions(-) -- 2.17.1
Re: [PATCH V6 0/6] hw/block/nvme: support multi-path for ctrl/ns
On 21-01-27 05:39:29, Keith Busch wrote: > This came out looking cleaner than I had initially expected. Thanks for > seeing this feature through! > > Reviewed-by: Keith Busch Thanks Keith for the review!
Re: [PATCH V6 4/6] hw/block/nvme: support for multi-controller in subsystem
On 21-01-26 09:57:23, Keith Busch wrote: > On Tue, Jan 26, 2021 at 09:52:48AM +0900, Minwoo Im wrote: > > On 21-01-25 10:11:43, Keith Busch wrote: > > > On Mon, Jan 25, 2021 at 07:03:32PM +0100, Klaus Jensen wrote: > > > > On Jan 24 11:54, Minwoo Im wrote: > > > > > We have nvme-subsys and nvme devices mapped together. To support > > > > > multi-controller scheme to this setup, controller identifier(id) has > > > > > to > > > > > be managed. Earlier, cntlid(controller id) used to be always 0 > > > > > because > > > > > we didn't have any subsystem scheme that controller id matters. > > > > > > > > > > This patch introduced 'cntlid' attribute to the nvme controller > > > > > instance(NvmeCtrl) and make it allocated by the nvme-subsys device > > > > > mapped to the controller. If nvme-subsys is not given to the > > > > > controller, then it will always be 0 as it was. > > > > > > > > > > Added 'ctrls' array in the nvme-subsys instance to manage attached > > > > > controllers to the subsystem with a limit(32). This patch didn't take > > > > > list for the controllers to make it seamless with nvme-ns device. > > > > > > > > > > Signed-off-by: Minwoo Im > > > > > --- > > > > > hw/block/nvme-subsys.c | 21 + > > > > > hw/block/nvme-subsys.h | 4 > > > > > hw/block/nvme.c| 29 + > > > > > hw/block/nvme.h| 1 + > > > > > 4 files changed, 55 insertions(+) > > > > > > > > > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c > > > > > index b525fca14103..7138389be4bd 100644 > > > > > --- a/hw/block/nvme.c > > > > > +++ b/hw/block/nvme.c > > > > > @@ -4481,6 +4484,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, > > > > > PCIDevice *pci_dev) > > > > > id->psd[0].enlat = cpu_to_le32(0x10); > > > > > id->psd[0].exlat = cpu_to_le32(0x4); > > > > > > > > > > +if (n->subsys) { > > > > > +id->cmic |= NVME_CMIC_MULTI_CTRL; > > > > > +} > > > > > > > > Since multiple controllers show up with a PCIe port of their own, do we > > > > need to set bit 0 (NVME_CMIC_MULTI_PORT?) as well? 
Or am I > > > > misunderstanding that bit? > > > > > > AIUI, if you report this MULTI_PORT bit, then each PCI device in the > > > subsystem needs to report a different "Port Number" in their PCIe Link > > > Capabilities register. I don't think we can manipulate that value from > > > the nvme "device", but I also don't know what a host should do with this > > > information even if we could. So, I think it's safe to leave it at 0. > > > > AFAIK, If we leave it to 0, kernel will not allocate disk for multi-path > > case (e.g., nvmeXcYnZ). > > Kernel only checks for MULTI_CTRL. It doesn't do anything with MULTI_PORT. Please forgive me that I took this discussion as MULTI_CTRL rather than MULTI_PORT. Please ignore this noise ;) Thanks!
Re: [PATCH V6 0/6] hw/block/nvme: support multi-path for ctrl/ns
On 21-01-25 21:29:58, Klaus Jensen wrote: > On Jan 24 11:54, Minwoo Im wrote: > > Hello, > > > > This is sixth patch series for the support of NVMe subsystem scheme with > > multi-controller and namespace sharing in a subsystem. > > > > This version has a fix in nvme_init_ctrl() when 'cntlid' is set to the > > Identify Controller data structure by making it by cpu_to_le16() as > > Keith reviewed. > > > > Here's test result with a simple 'nvme list -v' command from this model: > > > > -device nvme-subsys,id=subsys0 \ > > -device nvme,serial=foo,id=nvme0,subsys=subsys0 \ > > -device nvme,serial=bar,id=nvme1,subsys=subsys0 \ > > -device nvme,serial=baz,id=nvme2,subsys=subsys0 \ > > -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \ > > -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \ > > \ > > -device nvme,serial=qux,id=nvme3 \ > > -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 \ > > \ > > -device nvme-subsys,id=subsys1 \ > > -device nvme,serial=quux,id=nvme4,subsys=subsys1 \ > > -device nvme-ns,id=ns4,drive=drv13,nsid=1,subsys=subsys1,zoned=true \ > > > > root@vm:~/work# nvme list -v > > NVM Express Subsystems > > > > SubsystemSubsystem-NQN > > Controllers > > > > > > > > nvme-subsys1 nqn.2019-08.org.qemu:subsys0 > > nvme0, nvme1, nvme2 > > nvme-subsys3 nqn.2019-08.org.qemu:qux > > nvme3 > > nvme-subsys4 nqn.2019-08.org.qemu:subsys1 > > nvme4 > > > > NVM Express Controllers > > > > Device SN MN FR > > TxPort AddressSubsystemNamespaces > > > > -- -- > > nvme0foo QEMU NVMe Ctrl > > 1.0 pcie :00:06.0 nvme-subsys1 nvme1c0n1 > > nvme1bar QEMU NVMe Ctrl > > 1.0 pcie :00:07.0 nvme-subsys1 nvme1c1n1 > > nvme2baz QEMU NVMe Ctrl > > 1.0 pcie :00:08.0 nvme-subsys1 nvme1c2n1, nvme1c2n2 > > nvme3qux QEMU NVMe Ctrl > > 1.0 pcie :00:09.0 nvme-subsys3 nvme3n1 > > nvme4quux QEMU NVMe Ctrl > > 1.0 pcie :00:0a.0 nvme-subsys4 nvme4c4n1 > > > > NVM Express Namespaces > > > > Device NSID Usage Format > > Controllers > > -- > > > > nvme1n1 1134.22 MB / 134.22 MB512 B + 0 
B nvme0, > > nvme1, nvme2 > > nvme1n2 2268.44 MB / 268.44 MB512 B + 0 B nvme2 > > nvme3n1 3268.44 MB / 268.44 MB512 B + 0 B nvme3 > > nvme4n1 1268.44 MB / 268.44 MB512 B + 0 B nvme4 > > > > Thanks, > > > > Since V5: > > - Fix endianness for 'cntlid' in Identify Controller data structure. > > (Keith) > > > > Since V4: > > - Code clean-up to snprintf rather than duplicating it and copy. > > (Keith) > > - Documentation for 'subsys' clean-up. (Keith) > > - Remove 'cntlid' param from nvme_init_ctrl(). (Keith) > > - Put error_propagate() in nvme_realize(). (Keith) > > > > Since RFC V3: > > - Exclude 'deatched' scheme from this series. This will be covered in > > the next series by covering all the ns-related admin commands > > including ZNS and ns-mgmt. (Niklas) > > - Rebased on nvme-next. > > - Remove RFC tag from this V4. > > > > Since RFC V2: > > - Rebased on nvme-next branch with trivial patches from the previous > > version(V2) applied. (Klaus) > > - Fix enumeration type name convention with NvmeIdNs prefix. (Klaus) > > - Put 'cntlid' to NvmeCtrl instance in nvme_init_ctrl() which was > > missed i
Re: [PATCH V6 4/6] hw/block/nvme: support for multi-controller in subsystem
On 21-01-25 10:11:43, Keith Busch wrote: > On Mon, Jan 25, 2021 at 07:03:32PM +0100, Klaus Jensen wrote: > > On Jan 24 11:54, Minwoo Im wrote: > > > We have nvme-subsys and nvme devices mapped together. To support > > > multi-controller scheme to this setup, controller identifier(id) has to > > > be managed. Earlier, cntlid(controller id) used to be always 0 because > > > we didn't have any subsystem scheme that controller id matters. > > > > > > This patch introduced 'cntlid' attribute to the nvme controller > > > instance(NvmeCtrl) and make it allocated by the nvme-subsys device > > > mapped to the controller. If nvme-subsys is not given to the > > > controller, then it will always be 0 as it was. > > > > > > Added 'ctrls' array in the nvme-subsys instance to manage attached > > > controllers to the subsystem with a limit(32). This patch didn't take > > > list for the controllers to make it seamless with nvme-ns device. > > > > > > Signed-off-by: Minwoo Im > > > --- > > > hw/block/nvme-subsys.c | 21 + > > > hw/block/nvme-subsys.h | 4 > > > hw/block/nvme.c| 29 + > > > hw/block/nvme.h| 1 + > > > 4 files changed, 55 insertions(+) > > > > > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c > > > index b525fca14103..7138389be4bd 100644 > > > --- a/hw/block/nvme.c > > > +++ b/hw/block/nvme.c > > > @@ -4481,6 +4484,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice > > > *pci_dev) > > > id->psd[0].enlat = cpu_to_le32(0x10); > > > id->psd[0].exlat = cpu_to_le32(0x4); > > > > > > +if (n->subsys) { > > > +id->cmic |= NVME_CMIC_MULTI_CTRL; > > > +} > > > > Since multiple controllers show up with a PCIe port of their own, do we > > need to set bit 0 (NVME_CMIC_MULTI_PORT?) as well? Or am I > > misunderstanding that bit? > > AIUI, if you report this MULTI_PORT bit, then each PCI device in the > subsystem needs to report a different "Port Number" in their PCIe Link > Capabilities register. 
I don't think we can manipulate that value from > the nvme "device", but I also don't know what a host should do with this > information even if we could. So, I think it's safe to leave it at 0. AFAIK, if we leave it at 0, the kernel will not allocate a disk for the multi-path case (e.g., nvmeXcYnZ).
[PATCH] hw/block/nvme: fix wrong parameter name 'cross_read'
The actual parameter name is 'cross_read' rather than 'cross_zone_read'. Signed-off-by: Minwoo Im --- hw/block/nvme.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 21aec90637fa..bf9134f73d81 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -81,7 +81,7 @@ * The default value means there is no limit to the number of * concurrently open zones. * - * zoned.cross_zone_read= + * zoned.cross_read= * Setting this property to true enables Read Across Zone Boundaries. */ -- 2.17.1
[PATCH V6 6/6] hw/block/nvme: support for shared namespace in subsystem
The nvme-ns device is registered to an nvme controller device during initialization in nvme_register_namespace() when the 'bus' property is given, which means it is mapped to a single controller. This patch introduces a new property 'subsys', just like the one the controller device instance has, to map a namespace to an NVMe subsystem. If the 'subsys' property is given to the nvme-ns device, it will belong to the specified subsystem and will be attached to all controllers in that subsystem by enabling the shared namespace capability in NMIC (Namespace Multi-path I/O and Namespace Sharing Capabilities) in Identify Namespace. Usage: -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 -device nvme,serial=bar,id=nvme1,subsys=subsys0 -device nvme,serial=baz,id=nvme2,subsys=subsys0 -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0 # Shared -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2 # Non-shared In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in the same subsystem. On the other hand, 'ns2' will be attached to 'nvme2' only, as a private namespace in that subsystem. By default, every namespace given the 'subsys' parameter is attached to all controllers in the subsystem.
Signed-off-by: Minwoo Im --- hw/block/nvme-ns.c | 23 ++- hw/block/nvme-ns.h | 7 +++ hw/block/nvme-subsys.c | 25 + hw/block/nvme-subsys.h | 3 +++ hw/block/nvme.c| 10 +- 5 files changed, 62 insertions(+), 6 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 62b25cf69bfa..9b493f2ead03 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp) id_ns->npda = id_ns->npdg = npdg - 1; +if (nvme_ns_shared(ns)) { +id_ns->nmic |= NVME_NMIC_NS_SHARED; +} + return 0; } @@ -365,16 +369,25 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) return; } -if (nvme_register_namespace(n, ns, errp)) { -error_propagate_prepend(errp, local_err, -"could not register namespace: "); -return; +if (ns->subsys) { +if (nvme_subsys_register_ns(ns, errp)) { +error_propagate_prepend(errp, local_err, +"could not setup namespace to subsys: "); +return; +} +} else { +if (nvme_register_namespace(n, ns, errp)) { +error_propagate_prepend(errp, local_err, +"could not register namespace: "); +return; +} } - } static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), +DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS, + NvmeSubsystem *), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false), diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 293ac990e3f6..929e78861903 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -47,6 +47,8 @@ typedef struct NvmeNamespace { const uint32_t *iocs; uint8_t csi; +NvmeSubsystem *subsys; + NvmeIdNsZoned *id_ns_zoned; NvmeZone*zone_array; QTAILQ_HEAD(, NvmeZone) exp_open_zones; @@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns) return -1; } +static inline bool nvme_ns_shared(NvmeNamespace *ns) +{ +return !!ns->subsys; +} + static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace 
*ns) { NvmeIdNs *id_ns = &ns->id_ns; diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c index e9d61c993c90..641de33e99fc 100644 --- a/hw/block/nvme-subsys.c +++ b/hw/block/nvme-subsys.c @@ -43,6 +43,31 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) return cntlid; } +int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp) +{ +NvmeSubsystem *subsys = ns->subsys; +NvmeCtrl *n; +int i; + +if (subsys->namespaces[nvme_nsid(ns)]) { +error_setg(errp, "namespace %d already registered to subsys %s", + nvme_nsid(ns), subsys->parent_obj.id); +return -1; +} + +subsys->namespaces[nvme_nsid(ns)] = ns; + +for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) { +n = subsys->ctrls[i]; + +if (n && nvme_register_namespace(n, ns, errp)) { +return -1; +} +} + +return 0; +} + static void nvme_subsys_setup(NvmeSubsystem *subsys) { snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn), diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 4eba50d96a1d..ccf6a71398d3 100644 --- a/hw/block/nvme-subsys.h +++
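The registration flow in nvme_subsys_register_ns() above — record the namespace in the subsystem's 1-based slot table, then attach it to every controller already registered — can be sketched standalone (the sizes and the attach step are simplified, illustrative stand-ins for the real QEMU structures):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CTRLS 32   /* illustrative, stands in for NVME_SUBSYS_MAX_CTRLS */
#define MAX_NS    32   /* illustrative, stands in for NVME_SUBSYS_MAX_NAMESPACES */

struct subsys {
    void *ctrls[MAX_CTRLS];
    void *namespaces[MAX_NS + 1];   /* indexed by 1-based nsid */
};

/* Mirrors the shape of nvme_subsys_register_ns(): fail if the nsid slot is
 * taken, otherwise claim it and "attach" to each existing controller
 * (counted here instead of calling nvme_register_namespace()). */
static int subsys_register_ns(struct subsys *s, unsigned int nsid, void *ns,
                              int *attach_count)
{
    if (nsid == 0 || nsid > MAX_NS || s->namespaces[nsid]) {
        return -1;
    }
    s->namespaces[nsid] = ns;

    for (int i = 0; i < MAX_CTRLS; i++) {
        if (s->ctrls[i]) {
            (*attach_count)++;
        }
    }
    return 0;
}
```

One consequence of this design: a shared namespace is attached only to controllers that already exist at namespace-realize time, which is why device ordering on the command line matters.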
[PATCH V6 5/6] hw/block/nvme: add NMIC enum value for Identify Namespace
Added Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC) field to support shared namespace from controller(s). This field is in Identify Namespace data structure in [30]. Signed-off-by: Minwoo Im --- include/block/nvme.h | 4 1 file changed, 4 insertions(+) diff --git a/include/block/nvme.h b/include/block/nvme.h index d6415a869c1c..ad68cdc2b92d 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -1203,6 +1203,10 @@ enum NvmeNsIdentifierType { NVME_NIDT_CSI = 0x04, }; +enum NvmeIdNsNmic { +NVME_NMIC_NS_SHARED = 1 << 0, +}; + enum NvmeCsi { NVME_CSI_NVM= 0x00, NVME_CSI_ZONED = 0x02, -- 2.17.1
[PATCH V6 4/6] hw/block/nvme: support for multi-controller in subsystem
We have nvme-subsys and nvme devices mapped together. To support a multi-controller scheme in this setup, the controller identifier (id) has to be managed. Earlier, cntlid (controller id) used to be always 0 because we didn't have any subsystem scheme where the controller id matters. This patch introduces a 'cntlid' attribute to the nvme controller instance (NvmeCtrl) and makes it allocated by the nvme-subsys device mapped to the controller. If nvme-subsys is not given to the controller, then it will always be 0 as it was. Added a 'ctrls' array to the nvme-subsys instance to manage controllers attached to the subsystem with a limit (32). This patch didn't take a list for the controllers to make it seamless with the nvme-ns device. Signed-off-by: Minwoo Im --- hw/block/nvme-subsys.c | 21 + hw/block/nvme-subsys.h | 4 hw/block/nvme.c | 29 + hw/block/nvme.h | 1 + 4 files changed, 55 insertions(+) diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c index aa82911b951c..e9d61c993c90 100644 --- a/hw/block/nvme-subsys.c +++ b/hw/block/nvme-subsys.c @@ -22,6 +22,27 @@ #include "nvme.h" #include "nvme-subsys.h" +int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) +{ +NvmeSubsystem *subsys = n->subsys; +int cntlid; + +for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) { +if (!subsys->ctrls[cntlid]) { +break; +} +} + +if (cntlid == ARRAY_SIZE(subsys->ctrls)) { +error_setg(errp, "no more free controller id"); +return -1; +} + +subsys->ctrls[cntlid] = n; + +return cntlid; +} + static void nvme_subsys_setup(NvmeSubsystem *subsys) { snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn), diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 40f06a4c7db0..4eba50d96a1d 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace; typedef struct NvmeSubsystem { DeviceState parent_obj; uint8_t subnqn[256]; + +NvmeCtrl *ctrls[NVME_SUBSYS_MAX_CTRLS]; } NvmeSubsystem; +int nvme_subsys_register_ctrl(NvmeCtrl *n,
Error **errp); + #endif /* NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index b525fca14103..7138389be4bd 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -4435,6 +4435,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' '); strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' '); strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' '); + +id->cntlid = cpu_to_le16(n->cntlid); + id->rab = 6; id->ieee[0] = 0x00; id->ieee[1] = 0x02; @@ -4481,6 +4484,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->psd[0].enlat = cpu_to_le32(0x10); id->psd[0].exlat = cpu_to_le32(0x4); +if (n->subsys) { +id->cmic |= NVME_CMIC_MULTI_CTRL; +} + NVME_CAP_SET_MQES(n->bar.cap, 0x7ff); NVME_CAP_SET_CQR(n->bar.cap, 1); NVME_CAP_SET_TO(n->bar.cap, 0xf); @@ -4495,6 +4502,24 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) n->bar.intmc = n->bar.intms = 0; } +static int nvme_init_subsys(NvmeCtrl *n, Error **errp) +{ +int cntlid; + +if (!n->subsys) { +return 0; +} + +cntlid = nvme_subsys_register_ctrl(n, errp); +if (cntlid < 0) { +return -1; +} + +n->cntlid = cntlid; + +return 0; +} + static void nvme_realize(PCIDevice *pci_dev, Error **errp) { NvmeCtrl *n = NVME(pci_dev); @@ -4515,6 +4540,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) return; } +if (nvme_init_subsys(n, errp)) { +error_propagate(errp, local_err); +return; +} nvme_init_ctrl(n, pci_dev); /* setup a namespace if the controller drive property was given */ diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 04d4684601fd..b8f5f2d6ffb8 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -134,6 +134,7 @@ typedef struct NvmeCtrl { NvmeBus bus; BlockConf conf; +uint16_t cntlid; bool qs_created; uint32_t page_size; uint16_t page_bits; -- 2.17.1
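nvme_subsys_register_ctrl() above hands out cntlid values by a linear scan for the first free slot in the ctrls[] array; a minimal standalone sketch of that allocation (the function name and 32-slot limit mirror the patch, the rest is illustrative):

```c
#include <assert.h>
#include <stddef.h>

#define NVME_SUBSYS_MAX_CTRLS 32

/* First-fit cntlid allocator: return the index of the first empty slot and
 * claim it, or -1 when all controller slots are in use
 * ("no more free controller id" in the patch). */
static int register_ctrl(void *ctrls[NVME_SUBSYS_MAX_CTRLS], void *ctrl)
{
    for (int cntlid = 0; cntlid < NVME_SUBSYS_MAX_CTRLS; cntlid++) {
        if (!ctrls[cntlid]) {
            ctrls[cntlid] = ctrl;
            return cntlid;
        }
    }
    return -1;
}
```

First-fit means cntlid values follow device-definition order on the command line, and a freed slot would be reused before higher ids are handed out.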
[PATCH V6 3/6] hw/block/nvme: add CMIC enum value for Identify Controller
Added Controller Multi-path I/O and Namespace Sharing Capabilities (CMIC) field to support multi-controller in the following patches. This field is in Identify Controller data structure in [76]. Signed-off-by: Minwoo Im --- include/block/nvme.h | 4 1 file changed, 4 insertions(+) diff --git a/include/block/nvme.h b/include/block/nvme.h index e4b918064df9..d6415a869c1c 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -1034,6 +1034,10 @@ enum NvmeIdCtrlLpa { NVME_LPA_EXTENDED = 1 << 2, }; +enum NvmeIdCtrlCmic { +NVME_CMIC_MULTI_CTRL= 1 << 1, +}; + #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf) #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf) #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf) -- 2.17.1
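CMIC is a plain bitfield in the Identify Controller data, so the enum value composes with OR and tests with AND; a trivial sketch of how the bit is meant to be used (the helper function is illustrative, not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

enum NvmeIdCtrlCmic {
    NVME_CMIC_MULTI_CTRL = 1 << 1,
};

/* Set the multi-controller bit only when the controller sits in a
 * subsystem, mirroring how the later patch fills id->cmic. */
static uint8_t build_cmic(int in_subsys)
{
    uint8_t cmic = 0;
    if (in_subsys) {
        cmic |= NVME_CMIC_MULTI_CTRL;
    }
    return cmic;
}
```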
[PATCH V6 1/6] hw/block/nvme: introduce nvme-subsys device
To support multi-path in the QEMU NVMe device model, we need an NVMe subsystem hierarchy to map controllers and namespaces to an NVMe subsystem. This patch introduces a simple nvme-subsys device model. The subsystem will be prepared with a subsystem NQN built from the id provided in the nvme-subsys device: ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0 Signed-off-by: Minwoo Im --- hw/block/meson.build | 2 +- hw/block/nvme-subsys.c | 60 ++ hw/block/nvme-subsys.h | 25 ++ hw/block/nvme.c | 3 +++ 4 files changed, 89 insertions(+), 1 deletion(-) create mode 100644 hw/block/nvme-subsys.c create mode 100644 hw/block/nvme-subsys.h diff --git a/hw/block/meson.build b/hw/block/meson.build index 602ca6c8541d..83ea2d37978d 100644 --- a/hw/block/meson.build +++ b/hw/block/meson.build @@ -13,7 +13,7 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: files('m25p80.c')) softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c')) softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c')) -softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c')) +softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 'nvme-subsys.c')) specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c')) specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c')) diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c new file mode 100644 index ..aa82911b951c --- /dev/null +++ b/hw/block/nvme-subsys.c @@ -0,0 +1,60 @@ +/* + * QEMU NVM Express Subsystem: nvme-subsys + * + * Copyright (c) 2021 Minwoo Im + * + * This code is licensed under the GNU GPL v2. Refer COPYING.
+ */ + +#include "qemu/units.h" +#include "qemu/osdep.h" +#include "qemu/uuid.h" +#include "qemu/iov.h" +#include "qemu/cutils.h" +#include "qapi/error.h" +#include "hw/qdev-properties.h" +#include "hw/qdev-core.h" +#include "hw/block/block.h" +#include "block/aio.h" +#include "block/accounting.h" +#include "sysemu/sysemu.h" +#include "hw/pci/pci.h" +#include "nvme.h" +#include "nvme-subsys.h" + +static void nvme_subsys_setup(NvmeSubsystem *subsys) +{ +snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn), + "nqn.2019-08.org.qemu:%s", subsys->parent_obj.id); +} + +static void nvme_subsys_realize(DeviceState *dev, Error **errp) +{ +NvmeSubsystem *subsys = NVME_SUBSYS(dev); + +nvme_subsys_setup(subsys); +} + +static void nvme_subsys_class_init(ObjectClass *oc, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(oc); + +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); + +dc->realize = nvme_subsys_realize; +dc->desc = "Virtual NVMe subsystem"; +} + +static const TypeInfo nvme_subsys_info = { +.name = TYPE_NVME_SUBSYS, +.parent = TYPE_DEVICE, +.class_init = nvme_subsys_class_init, +.instance_size = sizeof(NvmeSubsystem), +}; + +static void nvme_subsys_register_types(void) +{ +type_register_static(&nvme_subsys_info); +} + +type_init(nvme_subsys_register_types) diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h new file mode 100644 index ..40f06a4c7db0 --- /dev/null +++ b/hw/block/nvme-subsys.h @@ -0,0 +1,25 @@ +/* + * QEMU NVM Express Subsystem: nvme-subsys + * + * Copyright (c) 2021 Minwoo Im + * + * This code is licensed under the GNU GPL v2. Refer COPYING.
+ */ + +#ifndef NVME_SUBSYS_H +#define NVME_SUBSYS_H + +#define TYPE_NVME_SUBSYS "nvme-subsys" +#define NVME_SUBSYS(obj) \ +OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS) + +#define NVME_SUBSYS_MAX_CTRLS 32 + +typedef struct NvmeCtrl NvmeCtrl; +typedef struct NvmeNamespace NvmeNamespace; +typedef struct NvmeSubsystem { +DeviceState parent_obj; +uint8_t subnqn[256]; +} NvmeSubsystem; + +#endif /* NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 21aec90637fa..aabccdf36f4b 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -25,6 +25,7 @@ * mdts=,zoned.append_size_limit= \ * -device nvme-ns,drive=,bus=,nsid=,\ * zoned= + * -device nvme-subsys,id= * * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. By default, the @@ -38,6 +39,8 @@ * * The PMR will use BAR 4/5 exclusively. * + * To place controller(s) and namespace(s) to a subsystem, then provide + * nvme-subsys device as above. * * nvme device parameters * ~~ -- 2.17.1
[PATCH V6 0/6] hw/block/nvme: support multi-path for ctrl/ns
Hello, This is sixth patch series for the support of NVMe subsystem scheme with multi-controller and namespace sharing in a subsystem. This version has a fix in nvme_init_ctrl() when 'cntlid' is set to the Identify Controller data structure by making it by cpu_to_le16() as Keith reviewed. Here's test result with a simple 'nvme list -v' command from this model: -device nvme-subsys,id=subsys0 \ -device nvme,serial=foo,id=nvme0,subsys=subsys0 \ -device nvme,serial=bar,id=nvme1,subsys=subsys0 \ -device nvme,serial=baz,id=nvme2,subsys=subsys0 \ -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \ -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \ \ -device nvme,serial=qux,id=nvme3 \ -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 \ \ -device nvme-subsys,id=subsys1 \ -device nvme,serial=quux,id=nvme4,subsys=subsys1 \ -device nvme-ns,id=ns4,drive=drv13,nsid=1,subsys=subsys1,zoned=true \ root@vm:~/work# nvme list -v NVM Express Subsystems SubsystemSubsystem-NQN Controllers nvme-subsys1 nqn.2019-08.org.qemu:subsys0 nvme0, nvme1, nvme2 nvme-subsys3 nqn.2019-08.org.qemu:qux nvme3 nvme-subsys4 nqn.2019-08.org.qemu:subsys1 nvme4 NVM Express Controllers Device SN MN FR TxPort AddressSubsystemNamespaces -- -- nvme0foo QEMU NVMe Ctrl 1.0 pcie :00:06.0 nvme-subsys1 nvme1c0n1 nvme1bar QEMU NVMe Ctrl 1.0 pcie :00:07.0 nvme-subsys1 nvme1c1n1 nvme2baz QEMU NVMe Ctrl 1.0 pcie :00:08.0 nvme-subsys1 nvme1c2n1, nvme1c2n2 nvme3qux QEMU NVMe Ctrl 1.0 pcie :00:09.0 nvme-subsys3 nvme3n1 nvme4quux QEMU NVMe Ctrl 1.0 pcie :00:0a.0 nvme-subsys4 nvme4c4n1 NVM Express Namespaces Device NSID Usage Format Controllers -- nvme1n1 1134.22 MB / 134.22 MB512 B + 0 B nvme0, nvme1, nvme2 nvme1n2 2268.44 MB / 268.44 MB512 B + 0 B nvme2 nvme3n1 3268.44 MB / 268.44 MB512 B + 0 B nvme3 nvme4n1 1268.44 MB / 268.44 MB512 B + 0 B nvme4 Thanks, Since V5: - Fix endianness for 'cntlid' in Identify Controller data structure. 
(Keith) Since V4: - Code clean-up to snprintf rather than duplicating it and copy. (Keith) - Documentation for 'subsys' clean-up. (Keith) - Remove 'cntlid' param from nvme_init_ctrl(). (Keith) - Put error_propagate() in nvme_realize(). (Keith) Since RFC V3: - Exclude 'detached' scheme from this series. This will be covered in the next series by covering all the ns-related admin commands including ZNS and ns-mgmt. (Niklas) - Rebased on nvme-next. - Remove RFC tag from this V4. Since RFC V2: - Rebased on nvme-next branch with trivial patches from the previous version (V2) applied. (Klaus) - Fix enumeration type name convention with NvmeIdNs prefix. (Klaus) - Put 'cntlid' to NvmeCtrl instance in nvme_init_ctrl() which was missed in V2. - Added 'detached' parameter to nvme-ns device to decide whether to attach or not to controller(s) in the subsystem. (Klaus) - Implemented Identify Active Namespace ID List apart from Identify Allocated Namespace ID List by removing the fall-thru statement. Since RFC V1: - Updated namespace sharing scheme to be based on nvme-subsys hierarchy. Minwoo Im (6): hw/block/nvme: introduce nvme-subsys device hw/block/nvme: support to map controller to a subsystem hw/block/nvme: add CMIC enum value for Identify Controller hw/block/nvme: support for multi-controller in subsystem hw/block/nvme: add NMIC enum value for Identify Namespace hw/block/nvme: support for shared namespace in subsystem hw/block/meson.build | 2 +- hw/block/nvme-ns.c | 23 +++-- hw/block/nvme-ns.h | 7 +++ hw/block/nvme-subsys.c | 106 + hw/block/nvme-subsys.h | 32 + hw/block/nvme.c | 72 +--- hw/block/nvme.h
[PATCH V6 2/6] hw/block/nvme: support to map controller to a subsystem
The nvme controller (nvme) can be mapped to an NVMe subsystem (nvme-subsys). This patch maps a controller to a subsystem by adding a parameter 'subsys' to the nvme device. To map a controller to a subsystem, we need to put nvme-subsys first and then map the subsystem to the controller: -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 If the 'subsys' property is not given to the nvme controller, the subsystem NQN will be created from the serial (e.g., 'foo' in the above example); otherwise, it will be based on the subsys id (e.g., 'subsys0' in the above example). Signed-off-by: Minwoo Im --- hw/block/nvme.c | 30 +- hw/block/nvme.h | 3 +++ 2 files changed, 28 insertions(+), 5 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index aabccdf36f4b..b525fca14103 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -22,7 +22,8 @@ * [pmrdev=,] \ * max_ioqpairs=, \ * aerl=, aer_max_queued=, \ - * mdts=,zoned.append_size_limit= \ + * mdts=,zoned.append_size_limit=, \ + * subsys= \ * -device nvme-ns,drive=,bus=,nsid=,\ * zoned= * -device nvme-subsys,id= @@ -44,6 +45,13 @@ * * nvme device parameters * ~~ + * - `subsys` + * NVM Subsystem device. If given, a subsystem NQN will be initialized with + * <subsys id> given. Otherwise, <serial> will be taken for subsystem NQN. + * Also, it will enable multi controller capability represented in Identify + * Controller data structure in CMIC (Controller Multi-path I/O and Namespace + * Sharing Capabilities), if given. + * * - `aerl` * The Asynchronous Event Request Limit (AERL).
Indicates the maximum number * of concurrently outstanding Asynchronous Event Request commands support @@ -4404,11 +4412,23 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) return 0; } +static void nvme_init_subnqn(NvmeCtrl *n) +{ +NvmeSubsystem *subsys = n->subsys; +NvmeIdCtrl *id = &n->id_ctrl; + +if (!subsys) { +snprintf((char *)id->subnqn, sizeof(id->subnqn), + "nqn.2019-08.org.qemu:%s", n->params.serial); +} else { +pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn); +} +} + static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) { NvmeIdCtrl *id = &n->id_ctrl; uint8_t *pci_conf = pci_dev->config; -char *subnqn; id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID)); id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID)); @@ -4455,9 +4475,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN | NVME_CTRL_SGLS_BITBUCKET); -subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial); -strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0'); -g_free(subnqn); +nvme_init_subnqn(n); id->psd[0].mp = cpu_to_le16(0x9c4); id->psd[0].enlat = cpu_to_le32(0x10); @@ -4545,6 +4563,8 @@ static Property nvme_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf), DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND, HostMemoryBackend *), +DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS, + NvmeSubsystem *), DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial), DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0), DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0), diff --git a/hw/block/nvme.h b/hw/block/nvme.h index dee6092bd45f..04d4684601fd 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -2,6 +2,7 @@ #define HW_NVME_H #include "block/nvme.h" +#include "nvme-subsys.h" #include "nvme-ns.h" #define NVME_MAX_NAMESPACES 256 @@ -170,6 +171,8 @@ typedef struct
NvmeCtrl { uint8_t zasl; +NvmeSubsystem *subsys; + NvmeNamespace namespace; NvmeNamespace *namespaces[NVME_MAX_NAMESPACES]; NvmeSQueue **sq; -- 2.17.1
Re: [PATCH V5 4/6] hw/block/nvme: support for multi-controller in subsystem
On 21-01-22 10:42:36, Keith Busch wrote: > On Fri, Jan 22, 2021 at 09:07:34PM +0900, Minwoo Im wrote: > > index b525fca14103..3dedefb8ebba 100644 > > --- a/hw/block/nvme.c > > +++ b/hw/block/nvme.c > > @@ -4435,6 +4435,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice > > *pci_dev) > > strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' '); > > strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' '); > > strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' '); > > + > > +id->cntlid = n->cntlid; > > cpu_to_le16()? It might be okay to not do that since the only > requirement is that this is a unique value, but it would be confusing > for decoding commands that have a controller id field. Agreed. Yes, cntlids are allocated as unique values, so functionality is not a problem here. But even so, we should give the field its proper little-endian value in the Identify data structure to avoid confusion. Thanks Keith! Will fix it :)
[PATCH V5 5/6] hw/block/nvme: add NMIC enum value for Identify Namespace
Added Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC) field to support shared namespace from controller(s). This field is in Identify Namespace data structure in [30]. Signed-off-by: Minwoo Im --- include/block/nvme.h | 4 1 file changed, 4 insertions(+) diff --git a/include/block/nvme.h b/include/block/nvme.h index d6415a869c1c..ad68cdc2b92d 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -1203,6 +1203,10 @@ enum NvmeNsIdentifierType { NVME_NIDT_CSI = 0x04, }; +enum NvmeIdNsNmic { +NVME_NMIC_NS_SHARED = 1 << 0, +}; + enum NvmeCsi { NVME_CSI_NVM= 0x00, NVME_CSI_ZONED = 0x02, -- 2.17.1
[PATCH V5 2/6] hw/block/nvme: support to map controller to a subsystem
An nvme controller (nvme) can be mapped to an NVMe subsystem (nvme-subsys). This patch maps a controller to a subsystem by adding a parameter 'subsys' to the nvme device. To map a controller to a subsystem, we need to put nvme-subsys first and then map the subsystem to the controller: -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 If the 'subsys' property is not given to the nvme controller, then the subsystem NQN will be created from the serial (e.g., 'foo' in the above example). Otherwise, it will be based on the subsys id (e.g., 'subsys0' in the above example). Signed-off-by: Minwoo Im --- hw/block/nvme.c | 30 +- hw/block/nvme.h | 3 +++ 2 files changed, 28 insertions(+), 5 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index aabccdf36f4b..b525fca14103 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -22,7 +22,8 @@ * [pmrdev=,] \ * max_ioqpairs=, \ * aerl=, aer_max_queued=, \ - * mdts=,zoned.append_size_limit= \ + * mdts=,zoned.append_size_limit=, \ + * subsys= \ * -device nvme-ns,drive=,bus=,nsid=,\ * zoned= * -device nvme-subsys,id= @@ -44,6 +45,13 @@ * * nvme device parameters * ~~ + * - `subsys` + * NVM Subsystem device. If given, a subsystem NQN will be initialized with + * <subsys id> given. Otherwise, <serial> will be taken for subsystem NQN. + * Also, it will enable multi controller capability represented in Identify + * Controller data structure in CMIC (Controller Multi-path I/O and Namespace + * Sharing Capabilities), if given. + * * - `aerl` * The Asynchronous Event Request Limit (AERL).
Indicates the maximum number * of concurrently outstanding Asynchronous Event Request commands support @@ -4404,11 +4412,23 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) return 0; } +static void nvme_init_subnqn(NvmeCtrl *n) +{ +NvmeSubsystem *subsys = n->subsys; +NvmeIdCtrl *id = &n->id_ctrl; + +if (!subsys) { +snprintf((char *)id->subnqn, sizeof(id->subnqn), + "nqn.2019-08.org.qemu:%s", n->params.serial); +} else { +pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn); +} +} + static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) { NvmeIdCtrl *id = &n->id_ctrl; uint8_t *pci_conf = pci_dev->config; -char *subnqn; id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID)); id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID)); @@ -4455,9 +4475,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN | NVME_CTRL_SGLS_BITBUCKET); -subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial); -strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0'); -g_free(subnqn); +nvme_init_subnqn(n); id->psd[0].mp = cpu_to_le16(0x9c4); id->psd[0].enlat = cpu_to_le32(0x10); @@ -4545,6 +4563,8 @@ static Property nvme_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf), DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND, HostMemoryBackend *), +DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS, + NvmeSubsystem *), DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial), DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0), DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0), diff --git a/hw/block/nvme.h b/hw/block/nvme.h index dee6092bd45f..04d4684601fd 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -2,6 +2,7 @@ #define HW_NVME_H #include "block/nvme.h" +#include "nvme-subsys.h" #include "nvme-ns.h" #define NVME_MAX_NAMESPACES 256 @@ -170,6 +171,8 @@ typedef struct
NvmeCtrl { uint8_t zasl; +NvmeSubsystem *subsys; + NvmeNamespace namespace; NvmeNamespace *namespaces[NVME_MAX_NAMESPACES]; NvmeSQueue **sq; -- 2.17.1
[PATCH V5 6/6] hw/block/nvme: support for shared namespace in subsystem
The nvme-ns device is registered to an nvme controller device during initialization in nvme_register_namespace() in case the 'bus' property is given, which means it is mapped to a single controller. This patch introduces a new property 'subsys', just like the controller device instance has, to map a namespace to an NVMe subsystem. If the 'subsys' property is given to the nvme-ns device, it will belong to the specified subsystem and will be attached to all controllers in that subsystem by enabling the shared namespace capability in NMIC (Namespace Multi-path I/O and Namespace Sharing Capabilities) in Identify Namespace. Usage: -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 -device nvme,serial=bar,id=nvme1,subsys=subsys0 -device nvme,serial=baz,id=nvme2,subsys=subsys0 -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0 # Shared -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2 # Non-shared In the above example, 'ns1' will be shared to 'nvme0', 'nvme1', and 'nvme2' in the same subsystem. On the other hand, 'ns2' will be attached to 'nvme2' only, as a private namespace in that subsystem. All namespaces with the 'subsys' parameter will be attached to all controllers in the subsystem by default.
Signed-off-by: Minwoo Im --- hw/block/nvme-ns.c | 23 ++- hw/block/nvme-ns.h | 7 +++ hw/block/nvme-subsys.c | 25 + hw/block/nvme-subsys.h | 3 +++ hw/block/nvme.c| 10 +- 5 files changed, 62 insertions(+), 6 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 62b25cf69bfa..9b493f2ead03 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp) id_ns->npda = id_ns->npdg = npdg - 1; +if (nvme_ns_shared(ns)) { +id_ns->nmic |= NVME_NMIC_NS_SHARED; +} + return 0; } @@ -365,16 +369,25 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) return; } -if (nvme_register_namespace(n, ns, errp)) { -error_propagate_prepend(errp, local_err, -"could not register namespace: "); -return; +if (ns->subsys) { +if (nvme_subsys_register_ns(ns, errp)) { +error_propagate_prepend(errp, local_err, +"could not setup namespace to subsys: "); +return; +} +} else { +if (nvme_register_namespace(n, ns, errp)) { +error_propagate_prepend(errp, local_err, +"could not register namespace: "); +return; +} } - } static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), +DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS, + NvmeSubsystem *), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false), diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 293ac990e3f6..929e78861903 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -47,6 +47,8 @@ typedef struct NvmeNamespace { const uint32_t *iocs; uint8_t csi; +NvmeSubsystem *subsys; + NvmeIdNsZoned *id_ns_zoned; NvmeZone*zone_array; QTAILQ_HEAD(, NvmeZone) exp_open_zones; @@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns) return -1; } +static inline bool nvme_ns_shared(NvmeNamespace *ns) +{ +return !!ns->subsys; +} + static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace 
*ns) { NvmeIdNs *id_ns = &ns->id_ns; diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c index e9d61c993c90..641de33e99fc 100644 --- a/hw/block/nvme-subsys.c +++ b/hw/block/nvme-subsys.c @@ -43,6 +43,31 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) return cntlid; } +int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp) +{ +NvmeSubsystem *subsys = ns->subsys; +NvmeCtrl *n; +int i; + +if (subsys->namespaces[nvme_nsid(ns)]) { +error_setg(errp, "namespace %d already registered to subsys %s", + nvme_nsid(ns), subsys->parent_obj.id); +return -1; +} + +subsys->namespaces[nvme_nsid(ns)] = ns; + +for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) { +n = subsys->ctrls[i]; + +if (n && nvme_register_namespace(n, ns, errp)) { +return -1; +} +} + +return 0; +} + static void nvme_subsys_setup(NvmeSubsystem *subsys) { snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn), diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 4eba50d96a1d..ccf6a71398d3 100644 --- a/hw/block/nvme-subsys.h +++
[PATCH V5 4/6] hw/block/nvme: support for multi-controller in subsystem
We have nvme-subsys and nvme devices mapped together. To support a multi-controller scheme on top of this setup, the controller identifier (id) has to be managed. Earlier, cntlid (controller id) used to always be 0 because we didn't have any subsystem scheme in which the controller id matters. This patch introduces a 'cntlid' attribute in the nvme controller instance (NvmeCtrl) and makes it allocated by the nvme-subsys device mapped to the controller. If nvme-subsys is not given to the controller, it will stay 0 as before. Added a 'ctrls' array in the nvme-subsys instance to manage controllers attached to the subsystem, with a limit (32). This patch doesn't use a list for the controllers, to keep it consistent with the nvme-ns device. Signed-off-by: Minwoo Im --- hw/block/nvme-subsys.c | 21 + hw/block/nvme-subsys.h | 4 hw/block/nvme.c| 29 + hw/block/nvme.h| 1 + 4 files changed, 55 insertions(+) diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c index aa82911b951c..e9d61c993c90 100644 --- a/hw/block/nvme-subsys.c +++ b/hw/block/nvme-subsys.c @@ -22,6 +22,27 @@ #include "nvme.h" #include "nvme-subsys.h" +int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) +{ +NvmeSubsystem *subsys = n->subsys; +int cntlid; + +for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) { +if (!subsys->ctrls[cntlid]) { +break; +} +} + +if (cntlid == ARRAY_SIZE(subsys->ctrls)) { +error_setg(errp, "no more free controller id"); +return -1; +} + +subsys->ctrls[cntlid] = n; + +return cntlid; +} + static void nvme_subsys_setup(NvmeSubsystem *subsys) { snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn), diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 40f06a4c7db0..4eba50d96a1d 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace; typedef struct NvmeSubsystem { DeviceState parent_obj; uint8_t subnqn[256]; + +NvmeCtrl *ctrls[NVME_SUBSYS_MAX_CTRLS]; } NvmeSubsystem; +int nvme_subsys_register_ctrl(NvmeCtrl *n,
Error **errp); + #endif /* NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index b525fca14103..3dedefb8ebba 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -4435,6 +4435,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' '); strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' '); strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' '); + +id->cntlid = n->cntlid; + id->rab = 6; id->ieee[0] = 0x00; id->ieee[1] = 0x02; @@ -4481,6 +4484,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->psd[0].enlat = cpu_to_le32(0x10); id->psd[0].exlat = cpu_to_le32(0x4); +if (n->subsys) { +id->cmic |= NVME_CMIC_MULTI_CTRL; +} + NVME_CAP_SET_MQES(n->bar.cap, 0x7ff); NVME_CAP_SET_CQR(n->bar.cap, 1); NVME_CAP_SET_TO(n->bar.cap, 0xf); @@ -4495,6 +4502,24 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) n->bar.intmc = n->bar.intms = 0; } +static int nvme_init_subsys(NvmeCtrl *n, Error **errp) +{ +int cntlid; + +if (!n->subsys) { +return 0; +} + +cntlid = nvme_subsys_register_ctrl(n, errp); +if (cntlid < 0) { +return -1; +} + +n->cntlid = cntlid; + +return 0; +} + static void nvme_realize(PCIDevice *pci_dev, Error **errp) { NvmeCtrl *n = NVME(pci_dev); @@ -4515,6 +4540,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) return; } +if (nvme_init_subsys(n, errp)) { +error_propagate(errp, local_err); +return; +} nvme_init_ctrl(n, pci_dev); /* setup a namespace if the controller drive property was given */ diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 04d4684601fd..b8f5f2d6ffb8 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -134,6 +134,7 @@ typedef struct NvmeCtrl { NvmeBus bus; BlockConf conf; +uint16_t cntlid; bool qs_created; uint32_t page_size; uint16_t page_bits; -- 2.17.1
[PATCH V5 3/6] hw/block/nvme: add CMIC enum value for Identify Controller
Added Controller Multi-path I/O and Namespace Sharing Capabilities (CMIC) field to support multi-controller in the following patches. This field is in Identify Controller data structure in [76]. Signed-off-by: Minwoo Im --- include/block/nvme.h | 4 1 file changed, 4 insertions(+) diff --git a/include/block/nvme.h b/include/block/nvme.h index e4b918064df9..d6415a869c1c 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -1034,6 +1034,10 @@ enum NvmeIdCtrlLpa { NVME_LPA_EXTENDED = 1 << 2, }; +enum NvmeIdCtrlCmic { +NVME_CMIC_MULTI_CTRL= 1 << 1, +}; + #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf) #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf) #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf) -- 2.17.1
[PATCH V5 1/6] hw/block/nvme: introduce nvme-subsys device
To support multi-path in the QEMU NVMe device model, we need an NVMe subsystem hierarchy to map controllers and namespaces to an NVMe subsystem. This patch introduces a simple nvme-subsys device model. The subsystem will be prepared with a subsystem NQN built from the <id> provided in the nvme-subsys device: ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0 Signed-off-by: Minwoo Im --- hw/block/meson.build | 2 +- hw/block/nvme-subsys.c | 60 ++ hw/block/nvme-subsys.h | 25 ++ hw/block/nvme.c| 3 +++ 4 files changed, 89 insertions(+), 1 deletion(-) create mode 100644 hw/block/nvme-subsys.c create mode 100644 hw/block/nvme-subsys.h diff --git a/hw/block/meson.build b/hw/block/meson.build index 602ca6c8541d..83ea2d37978d 100644 --- a/hw/block/meson.build +++ b/hw/block/meson.build @@ -13,7 +13,7 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: files('m25p80.c')) softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c')) softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c')) -softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c')) +softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 'nvme-subsys.c')) specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c')) specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c')) diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c new file mode 100644 index ..aa82911b951c --- /dev/null +++ b/hw/block/nvme-subsys.c @@ -0,0 +1,60 @@ +/* + * QEMU NVM Express Subsystem: nvme-subsys + * + * Copyright (c) 2021 Minwoo Im + * + * This code is licensed under the GNU GPL v2. Refer COPYING.
+ */ + +#include "qemu/units.h" +#include "qemu/osdep.h" +#include "qemu/uuid.h" +#include "qemu/iov.h" +#include "qemu/cutils.h" +#include "qapi/error.h" +#include "hw/qdev-properties.h" +#include "hw/qdev-core.h" +#include "hw/block/block.h" +#include "block/aio.h" +#include "block/accounting.h" +#include "sysemu/sysemu.h" +#include "hw/pci/pci.h" +#include "nvme.h" +#include "nvme-subsys.h" + +static void nvme_subsys_setup(NvmeSubsystem *subsys) +{ +snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn), + "nqn.2019-08.org.qemu:%s", subsys->parent_obj.id); +} + +static void nvme_subsys_realize(DeviceState *dev, Error **errp) +{ +NvmeSubsystem *subsys = NVME_SUBSYS(dev); + +nvme_subsys_setup(subsys); +} + +static void nvme_subsys_class_init(ObjectClass *oc, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(oc); + +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); + +dc->realize = nvme_subsys_realize; +dc->desc = "Virtual NVMe subsystem"; +} + +static const TypeInfo nvme_subsys_info = { +.name = TYPE_NVME_SUBSYS, +.parent = TYPE_DEVICE, +.class_init = nvme_subsys_class_init, +.instance_size = sizeof(NvmeSubsystem), +}; + +static void nvme_subsys_register_types(void) +{ +type_register_static(&nvme_subsys_info); +} + +type_init(nvme_subsys_register_types) diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h new file mode 100644 index ..40f06a4c7db0 --- /dev/null +++ b/hw/block/nvme-subsys.h @@ -0,0 +1,25 @@ +/* + * QEMU NVM Express Subsystem: nvme-subsys + * + * Copyright (c) 2021 Minwoo Im + * + * This code is licensed under the GNU GPL v2. Refer COPYING.
+ */ + +#ifndef NVME_SUBSYS_H +#define NVME_SUBSYS_H + +#define TYPE_NVME_SUBSYS "nvme-subsys" +#define NVME_SUBSYS(obj) \ +OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS) + +#define NVME_SUBSYS_MAX_CTRLS 32 + +typedef struct NvmeCtrl NvmeCtrl; +typedef struct NvmeNamespace NvmeNamespace; +typedef struct NvmeSubsystem { +DeviceState parent_obj; +uint8_t subnqn[256]; +} NvmeSubsystem; + +#endif /* NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 21aec90637fa..aabccdf36f4b 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -25,6 +25,7 @@ * mdts=,zoned.append_size_limit= \ * -device nvme-ns,drive=,bus=,nsid=,\ * zoned= + * -device nvme-subsys,id= * * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. By default, the @@ -38,6 +39,8 @@ * * The PMR will use BAR 4/5 exclusively. * + * To place controller(s) and namespace(s) to a subsystem, then provide + * nvme-subsys device as above. * * nvme device parameters * ~~ -- 2.17.1
[PATCH V5 0/6] hw/block/nvme: support multi-path for ctrl/ns
Hello,

Here's the fifth patch series for the support of the NVMe subsystem scheme with multi-controller and namespace sharing in a subsystem. This series has applied review comments from the previous series, mostly from Keith's review. Thanks Keith!

Here's a test result with a simple 'nvme list -v' command from this model, with a ZNS example added under a subsys.

  -device nvme-subsys,id=subsys0 \
  -device nvme,serial=foo,id=nvme0,subsys=subsys0 \
  -device nvme,serial=bar,id=nvme1,subsys=subsys0 \
  -device nvme,serial=baz,id=nvme2,subsys=subsys0 \
  -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \
  -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \
  \
  -device nvme,serial=qux,id=nvme3 \
  -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 \
  \
  -device nvme-subsys,id=subsys1 \
  -device nvme,serial=quux,id=nvme4,subsys=subsys1 \
  -device nvme-ns,id=ns4,drive=drv13,nsid=1,subsys=subsys1,zoned=true \

root@vm:~/work# nvme list -v
NVM Express Subsystems

Subsystem     Subsystem-NQN                 Controllers
------------- ----------------------------- --------------------
nvme-subsys1  nqn.2019-08.org.qemu:subsys0  nvme0, nvme1, nvme2
nvme-subsys3  nqn.2019-08.org.qemu:qux      nvme3
nvme-subsys4  nqn.2019-08.org.qemu:subsys1  nvme4

NVM Express Controllers

Device  SN    MN              FR   TxPort Address       Subsystem     Namespaces
------- ----- --------------- ---- ------ ------------- ------------- --------------------
nvme0   foo   QEMU NVMe Ctrl  1.0  pcie   0000:00:06.0  nvme-subsys1  nvme1c0n1
nvme1   bar   QEMU NVMe Ctrl  1.0  pcie   0000:00:07.0  nvme-subsys1  nvme1c1n1
nvme2   baz   QEMU NVMe Ctrl  1.0  pcie   0000:00:08.0  nvme-subsys1  nvme1c2n1, nvme1c2n2
nvme3   qux   QEMU NVMe Ctrl  1.0  pcie   0000:00:09.0  nvme-subsys3  nvme3n1
nvme4   quux  QEMU NVMe Ctrl  1.0  pcie   0000:00:0a.0  nvme-subsys4  nvme4c4n1

NVM Express Namespaces

Device   NSID  Usage                  Format       Controllers
-------- ----- ---------------------- ------------ --------------------
nvme1n1  1     134.22 MB / 134.22 MB  512 B + 0 B  nvme0, nvme1, nvme2
nvme1n2  2     268.44 MB / 268.44 MB  512 B + 0 B  nvme2
nvme3n1  3     268.44 MB / 268.44 MB  512 B + 0 B  nvme3
nvme4n1  1     268.44 MB / 268.44 MB  512 B + 0 B  nvme4

Thanks,

Since V4:
- Code clean-up to snprintf rather than duplicating it and copy. (Keith)
- Documentation for 'subsys' clean-up.
(Keith)
- Remove 'cntlid' param from nvme_init_ctrl(). (Keith)
- Put error_propagate() in nvme_realize(). (Keith)

Since RFC V3:
- Exclude the 'detached' scheme from this series. This will be covered in the next series by covering all the ns-related admin commands including ZNS and ns-mgmt. (Niklas)
- Rebased on nvme-next.
- Remove RFC tag from this V4.

Since RFC V2:
- Rebased on nvme-next branch with trivial patches from the previous version (V2) applied. (Klaus)
- Fix enumeration type name convention with NvmeIdNs prefix. (Klaus)
- Put 'cntlid' to NvmeCtrl instance in nvme_init_ctrl() which was missed in V2.
- Added 'detached' parameter to nvme-ns device to decide whether to attach or not to controller(s) in the subsystem. (Klaus)
- Implemented Identify Active Namespace ID List apart from Identify Allocated Namespace ID List by removing the fall-through statement.

Since RFC V1:
- Updated namespace sharing scheme to be based on nvme-subsys hierarchy.

Minwoo Im (6):
  hw/block/nvme: introduce nvme-subsys device
  hw/block/nvme: support to map controller to a subsystem
  hw/block/nvme: add CMIC enum value for Identify Controller
  hw/block/nvme: support for multi-controller in subsystem
  hw/block/nvme: add NMIC enum value for Identify Namespace
  hw/block/nvme: support for shared namespace in subsystem

 hw/block/meson.build   |   2 +-
 hw/block/nvme-ns.c     |  23 +++--
 hw/block/nvme-ns.h     |   7 +++
 hw/block/nvme-subsys.c | 106 +
 hw/block/nvme-subsys.h |  32 +
 hw/block/nvme.c        |  72 +---
 hw/block/nvme.h        |   4 ++
 include/block/nvme.h   |   8
 8 files changed, 242 insertions(+), 12 deletions(-) create
Re: [PATCH V4 4/6] hw/block/nvme: support for multi-controller in subsystem
On 21-01-21 15:17:16, Keith Busch wrote: > On Fri, Jan 22, 2021 at 07:09:06AM +0900, Minwoo Im wrote: > > -static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) > > +static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t > > cntlid) > > { > > NvmeIdCtrl *id = &n->id_ctrl; > > uint8_t *pci_conf = pci_dev->config; > > > > +n->cntlid = cntlid; > > I don't think 'cntlid' is important enough to be a function parameter. > You can just set it within the 'NvmeCtrl' struct before calling this > function like all the other properties. Okay. Rather than adding a parameter to this function, nvme_init_subsys() may take this job to assign cntlid to the controller instance first. Let me fix one! > > @@ -4517,7 +4543,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error > > **errp) > > return; > > } > > > > -nvme_init_ctrl(n, pci_dev); > > +cntlid = nvme_init_subsys(n, errp); > > +if (cntlid < 0) { > > error_propagate(); Thanks for catching this.
Re: [PATCH V4 2/6] hw/block/nvme: support to map controller to a subsystem
On 21-01-21 15:03:38, Keith Busch wrote: > On Fri, Jan 22, 2021 at 07:09:04AM +0900, Minwoo Im wrote: > > --- a/hw/block/nvme.c > > +++ b/hw/block/nvme.c > > @@ -23,6 +23,7 @@ > > * max_ioqpairs=, \ > > * aerl=, aer_max_queued=, \ > > * mdts=,zoned.append_size_limit= \ > > + * ,subsys= \ > > For consistency, the ',' goes in the preceding line. I have no idea what happened here :(. Will fix it. Thanks!
Re: [PATCH V4 1/6] hw/block/nvme: introduce nvme-subsys device
On 21-01-21 14:52:02, Keith Busch wrote: > On Fri, Jan 22, 2021 at 07:09:03AM +0900, Minwoo Im wrote: > > +static void nvme_subsys_setup(NvmeSubsystem *subsys) > > +{ > > +char *subnqn; > > + > > +subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", > > subsys->parent_obj.id); > > +strpadcpy((char *)subsys->subnqn, sizeof(subsys->subnqn), subnqn, > > '\0'); > > +g_free(subnqn); > > Instead of the duplication and copy, you could format the string > directly into the destination: > > snprintf(subsys->subnqn, sizeof(subsys->subnqn), > "nqn.2019-08.org.qemu:%s", > subsys->parent_obj.id); Oh, Thanks. Will fix it!
[PATCH V4 5/6] hw/block/nvme: add NMIC enum value for Identify Namespace
Added Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC) field to support shared namespace from controller(s). This field is in Identify Namespace data structure in [30]. Signed-off-by: Minwoo Im --- include/block/nvme.h | 4 1 file changed, 4 insertions(+) diff --git a/include/block/nvme.h b/include/block/nvme.h index d6415a869c1c..ad68cdc2b92d 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -1203,6 +1203,10 @@ enum NvmeNsIdentifierType { NVME_NIDT_CSI = 0x04, }; +enum NvmeIdNsNmic { +NVME_NMIC_NS_SHARED = 1 << 0, +}; + enum NvmeCsi { NVME_CSI_NVM= 0x00, NVME_CSI_ZONED = 0x02, -- 2.17.1
[PATCH V4 4/6] hw/block/nvme: support for multi-controller in subsystem
We have nvme-subsys and nvme devices mapped together. To support multi-controller scheme to this setup, controller identifier(id) has to be managed. Earlier, cntlid(controller id) used to be always 0 because we didn't have any subsystem scheme that controller id matters. This patch introduced 'cntlid' attribute to the nvme controller instance(NvmeCtrl) and make it allocated by the nvme-subsys device mapped to the controller. If nvme-subsys is not given to the controller, then it will always be 0 as it was. Added 'ctrls' array in the nvme-subsys instance to manage attached controllers to the subsystem with a limit(32). This patch didn't take list for the controllers to make it seamless with nvme-ns device. Signed-off-by: Minwoo Im --- hw/block/nvme-subsys.c | 21 + hw/block/nvme-subsys.h | 4 hw/block/nvme.c| 34 -- hw/block/nvme.h| 1 + 4 files changed, 58 insertions(+), 2 deletions(-) diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c index f1dc71d588d9..a01003136b12 100644 --- a/hw/block/nvme-subsys.c +++ b/hw/block/nvme-subsys.c @@ -22,6 +22,27 @@ #include "nvme.h" #include "nvme-subsys.h" +int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) +{ +NvmeSubsystem *subsys = n->subsys; +int cntlid; + +for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) { +if (!subsys->ctrls[cntlid]) { +break; +} +} + +if (cntlid == ARRAY_SIZE(subsys->ctrls)) { +error_setg(errp, "no more free controller id"); +return -1; +} + +subsys->ctrls[cntlid] = n; + +return cntlid; +} + static void nvme_subsys_setup(NvmeSubsystem *subsys) { char *subnqn; diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 40f06a4c7db0..4eba50d96a1d 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace; typedef struct NvmeSubsystem { DeviceState parent_obj; uint8_t subnqn[256]; + +NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS]; } NvmeSubsystem; +int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp); + #endif /* 
NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index ab0531492ddd..225f0d3f3a27 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -4427,16 +4427,21 @@ static void nvme_init_subnqn(NvmeCtrl *n) } } -static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) +static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t cntlid) { NvmeIdCtrl *id = &n->id_ctrl; uint8_t *pci_conf = pci_dev->config; +n->cntlid = cntlid; + id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID)); id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID)); strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' '); strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' '); strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' '); + +id->cntlid = cntlid; + id->rab = 6; id->ieee[0] = 0x00; id->ieee[1] = 0x02; @@ -4483,6 +4488,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->psd[0].enlat = cpu_to_le32(0x10); id->psd[0].exlat = cpu_to_le32(0x4); +if (n->subsys) { +id->cmic |= NVME_CMIC_MULTI_CTRL; +} + NVME_CAP_SET_MQES(n->bar.cap, 0x7ff); NVME_CAP_SET_CQR(n->bar.cap, 1); NVME_CAP_SET_TO(n->bar.cap, 0xf); @@ -4497,11 +4506,28 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) n->bar.intmc = n->bar.intms = 0; } +static int nvme_init_subsys(NvmeCtrl *n, Error **errp) +{ +int cntlid; + +if (!n->subsys) { +return 0; +} + +cntlid = nvme_subsys_register_ctrl(n, errp); +if (cntlid < 0) { +return -1; +} + +return cntlid; +} + static void nvme_realize(PCIDevice *pci_dev, Error **errp) { NvmeCtrl *n = NVME(pci_dev); NvmeNamespace *ns; Error *local_err = NULL; +int cntlid; nvme_check_constraints(n, &local_err); if (local_err) { @@ -4517,7 +4543,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) return; } -nvme_init_ctrl(n, pci_dev); +cntlid = nvme_init_subsys(n, errp); +if (cntlid < 0) { +return; +} +nvme_init_ctrl(n, pci_dev, cntlid); /* setup a namespace if the controller drive property was given */ if
(n->namespace.blkconf.blk) { diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 04d4684601fd..b8f5f2d6ffb8 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -134,6 +134,7 @@ typedef struct NvmeCtrl { NvmeBus bus; BlockConf conf; +uint16_t cntlid; bool qs_created; uint32_t page_size; uint16_t page_bits; -- 2.17.1
[PATCH V4 3/6] hw/block/nvme: add CMIC enum value for Identify Controller
Added the Controller Multi-path I/O and Namespace Sharing Capabilities
(CMIC) field to support multi-controller in the following patches.  This
field is in the Identify Controller data structure at byte [76].

Signed-off-by: Minwoo Im
---
 include/block/nvme.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index e4b918064df9..d6415a869c1c 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1034,6 +1034,10 @@ enum NvmeIdCtrlLpa {
     NVME_LPA_EXTENDED = 1 << 2,
 };
 
+enum NvmeIdCtrlCmic {
+    NVME_CMIC_MULTI_CTRL    = 1 << 1,
+};
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
-- 
2.17.1
[PATCH V4 6/6] hw/block/nvme: support for shared namespace in subsystem
The nvme-ns device is registered to an nvme controller device during
initialization in nvme_register_namespace() when the 'bus' property is
given, which means it is mapped to a single controller.

This patch introduces a new property 'subsys', just like the one the
controller device instance has, to map a namespace to a NVMe subsystem.

If the 'subsys' property is given to the nvme-ns device, it will belong
to the specified subsystem and will be attached to all controllers in
that subsystem by enabling the shared namespace capability bit in NMIC
(Namespace Multi-path I/O and Namespace Sharing Capabilities) in
Identify Namespace.

Usage:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0
  -device nvme,serial=bar,id=nvme1,subsys=subsys0
  -device nvme,serial=baz,id=nvme2,subsys=subsys0
  -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0  # Shared
  -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2       # Non-shared

In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in the
same subsystem.  On the other hand, 'ns2' will be attached to 'nvme2'
only, as a private namespace in that subsystem.

By default, every namespace given the 'subsys' parameter will be
attached to all controllers in the subsystem.
Signed-off-by: Minwoo Im --- hw/block/nvme-ns.c | 23 ++- hw/block/nvme-ns.h | 7 +++ hw/block/nvme-subsys.c | 25 + hw/block/nvme-subsys.h | 3 +++ hw/block/nvme.c| 10 +- 5 files changed, 62 insertions(+), 6 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 62b25cf69bfa..9b493f2ead03 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp) id_ns->npda = id_ns->npdg = npdg - 1; +if (nvme_ns_shared(ns)) { +id_ns->nmic |= NVME_NMIC_NS_SHARED; +} + return 0; } @@ -365,16 +369,25 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) return; } -if (nvme_register_namespace(n, ns, errp)) { -error_propagate_prepend(errp, local_err, -"could not register namespace: "); -return; +if (ns->subsys) { +if (nvme_subsys_register_ns(ns, errp)) { +error_propagate_prepend(errp, local_err, +"could not setup namespace to subsys: "); +return; +} +} else { +if (nvme_register_namespace(n, ns, errp)) { +error_propagate_prepend(errp, local_err, +"could not register namespace: "); +return; +} } - } static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), +DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS, + NvmeSubsystem *), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false), diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 293ac990e3f6..929e78861903 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -47,6 +47,8 @@ typedef struct NvmeNamespace { const uint32_t *iocs; uint8_t csi; +NvmeSubsystem *subsys; + NvmeIdNsZoned *id_ns_zoned; NvmeZone*zone_array; QTAILQ_HEAD(, NvmeZone) exp_open_zones; @@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns) return -1; } +static inline bool nvme_ns_shared(NvmeNamespace *ns) +{ +return !!ns->subsys; +} + static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace 
 *ns)
 {
     NvmeIdNs *id_ns = &ns->id_ns;
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index a01003136b12..e7efdcae7d0d 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -43,6 +43,31 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
     return cntlid;
 }
 
+int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp)
+{
+    NvmeSubsystem *subsys = ns->subsys;
+    NvmeCtrl *n;
+    int i;
+
+    if (subsys->namespaces[nvme_nsid(ns)]) {
+        error_setg(errp, "namespace %d already registered to subsys %s",
+                   nvme_nsid(ns), subsys->parent_obj.id);
+        return -1;
+    }
+
+    subsys->namespaces[nvme_nsid(ns)] = ns;
+
+    for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+        n = subsys->ctrls[i];
+
+        if (n && nvme_register_namespace(n, ns, errp)) {
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
     char *subnqn;
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 4eba50d96a1d..ccf6a71398d3 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,6 +14,7 @@ OBJ
[PATCH V4 0/6] hw/block/nvme: support multi-path for ctrl/ns
Hello,

This series is the fourth series for the support of the NVMe subsystem
scheme with multi-controller and namespace sharing in a subsystem.

This time, I've removed the 'detached' scheme introduced in the previous
series out of this series, to focus on the subsystem scheme in this
series.  The attach/detach scheme will be introduced in the next series
with ns-mgmt functionality.

Here's an example of how-to:

  # Specify a subsystem
  -device nvme-subsys,id=subsys0 \
  -device nvme,serial=foo,id=nvme0,subsys=subsys0 \
  -device nvme,serial=bar,id=nvme1,subsys=subsys0 \
  -device nvme,serial=baz,id=nvme2,subsys=subsys0 \
  -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \
  -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \
  \
  # Do not specify a subsystem
  -device nvme,serial=qux,id=nvme3 \
  -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 \

  # nvme list -v
  NVM Express Subsystems

  Subsystem     Subsystem-NQN                 Controllers
  nvme-subsys1  nqn.2019-08.org.qemu:subsys0  nvme0, nvme1, nvme2
  nvme-subsys3  nqn.2019-08.org.qemu:qux      nvme3

  NVM Express Controllers

  Device  SN   MN              FR   TxPort  Address   Subsystem     Namespaces
  nvme0   foo  QEMU NVMe Ctrl  1.0  pcie    :00:06.0  nvme-subsys1  nvme1c0n1
  nvme1   bar  QEMU NVMe Ctrl  1.0  pcie    :00:07.0  nvme-subsys1  nvme1c1n1
  nvme2   baz  QEMU NVMe Ctrl  1.0  pcie    :00:08.0  nvme-subsys1  nvme1c2n1, nvme1c2n2
  nvme3   qux  QEMU NVMe Ctrl  1.0  pcie    :00:09.0  nvme-subsys3  nvme3c3n1

  NVM Express Namespaces

  Device   NSID  Usage                  Format       Controllers
  nvme1n1  1     134.22 MB / 134.22 MB  512 B + 0 B  nvme0, nvme1, nvme2
  nvme1n2  2     268.44 MB / 268.44 MB  512 B + 0 B  nvme2
  nvme3n1  3     268.44 MB / 268.44 MB  512 B + 0 B  nvme3

Thanks,

Since RFC V3:
  - Exclude the 'detached' scheme from this series.  This will be covered
    in the next series by covering all the ns-related admin commands
    including ZNS and ns-mgmt. (Niklas)
  - Rebased on nvme-next.
  - Remove the RFC tag from this V4.

Since RFC V2:
  - Rebased on the nvme-next branch with trivial patches from the
    previous version (V2) applied. (Klaus)
  - Fix enumeration type name convention with the NvmeIdNs prefix.
    (Klaus)
  - Put 'cntlid' into the NvmeCtrl instance in nvme_init_ctrl(), which
    was missed in V2.
  - Added a 'detached' parameter to the nvme-ns device to decide whether
    or not to attach to controller(s) in the subsystem. (Klaus)
  - Implemented Identify Active Namespace ID List apart from Identify
    Allocated Namespace ID List by removing the fall-through statement.

Since RFC V1:
  - Updated the namespace sharing scheme to be based on the nvme-subsys
    hierarchy.

Minwoo Im (6):
  hw/block/nvme: introduce nvme-subsys device
  hw/block/nvme: support to map controller to a subsystem
  hw/block/nvme: add CMIC enum value for Identify Controller
  hw/block/nvme: support for multi-controller in subsystem
  hw/block/nvme: add NMIC enum value for Identify Namespace
  hw/block/nvme: support for shared namespace in subsystem

 hw/block/meson.build   |   2 +-
 hw/block/nvme-ns.c     |  23 +++--
 hw/block/nvme-ns.h     |   7 +++
 hw/block/nvme-subsys.c | 109 +
 hw/block/nvme-subsys.h |  32 
 hw/block/nvme.c        |  77 ++---
 hw/block/nvme.h        |   4 ++
 include/block/nvme.h   |   8 +++
 8 files changed, 249 insertions(+), 13 deletions(-)
 create mode 100644 hw/block/nvme-subsys.c
 create mode 100644 hw/block/nvme-subsys.h

-- 
2.17.1
[PATCH V4 1/6] hw/block/nvme: introduce nvme-subsys device
To support multi-path in QEMU NVMe device model, We need to have NVMe subsystem hierarchy to map controllers and namespaces to a NVMe subsystem. This patch introduced a simple nvme-subsys device model. The subsystem will be prepared with subsystem NQN with provided in nvme-subsys device: ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0 Signed-off-by: Minwoo Im --- hw/block/meson.build | 2 +- hw/block/nvme-subsys.c | 63 ++ hw/block/nvme-subsys.h | 25 + hw/block/nvme.c| 3 ++ 4 files changed, 92 insertions(+), 1 deletion(-) create mode 100644 hw/block/nvme-subsys.c create mode 100644 hw/block/nvme-subsys.h diff --git a/hw/block/meson.build b/hw/block/meson.build index 602ca6c8541d..83ea2d37978d 100644 --- a/hw/block/meson.build +++ b/hw/block/meson.build @@ -13,7 +13,7 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: files('m25p80.c')) softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c')) softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c')) -softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c')) +softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 'nvme-subsys.c')) specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c')) specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c')) diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c new file mode 100644 index ..f1dc71d588d9 --- /dev/null +++ b/hw/block/nvme-subsys.c @@ -0,0 +1,63 @@ +/* + * QEMU NVM Express Subsystem: nvme-subsys + * + * Copyright (c) 2021 Minwoo Im + * + * This code is licensed under the GNU GPL v2. Refer COPYING. 
+ */
+
+#include "qemu/units.h"
+#include "qemu/osdep.h"
+#include "qemu/uuid.h"
+#include "qemu/iov.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+#include "hw/block/block.h"
+#include "block/aio.h"
+#include "block/accounting.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+#include "nvme.h"
+#include "nvme-subsys.h"
+
+static void nvme_subsys_setup(NvmeSubsystem *subsys)
+{
+    char *subnqn;
+
+    subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", subsys->parent_obj.id);
+    strpadcpy((char *)subsys->subnqn, sizeof(subsys->subnqn), subnqn, '\0');
+    g_free(subnqn);
+}
+
+static void nvme_subsys_realize(DeviceState *dev, Error **errp)
+{
+    NvmeSubsystem *subsys = NVME_SUBSYS(dev);
+
+    nvme_subsys_setup(subsys);
+}
+
+static void nvme_subsys_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+    dc->realize = nvme_subsys_realize;
+    dc->desc = "Virtual NVMe subsystem";
+}
+
+static const TypeInfo nvme_subsys_info = {
+    .name = TYPE_NVME_SUBSYS,
+    .parent = TYPE_DEVICE,
+    .class_init = nvme_subsys_class_init,
+    .instance_size = sizeof(NvmeSubsystem),
+};
+
+static void nvme_subsys_register_types(void)
+{
+    type_register_static(&nvme_subsys_info);
+}
+
+type_init(nvme_subsys_register_types)
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
new file mode 100644
index ..40f06a4c7db0
--- /dev/null
+++ b/hw/block/nvme-subsys.h
@@ -0,0 +1,25 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im
+ *
+ * This code is licensed under the GNU GPL v2. Refer COPYING.
+ */ + +#ifndef NVME_SUBSYS_H +#define NVME_SUBSYS_H + +#define TYPE_NVME_SUBSYS "nvme-subsys" +#define NVME_SUBSYS(obj) \ +OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS) + +#define NVME_SUBSYS_MAX_CTRLS 32 + +typedef struct NvmeCtrl NvmeCtrl; +typedef struct NvmeNamespace NvmeNamespace; +typedef struct NvmeSubsystem { +DeviceState parent_obj; +uint8_t subnqn[256]; +} NvmeSubsystem; + +#endif /* NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 21aec90637fa..aabccdf36f4b 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -25,6 +25,7 @@ * mdts=,zoned.append_size_limit= \ * -device nvme-ns,drive=,bus=,nsid=,\ * zoned= + * -device nvme-subsys,id= * * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. By default, the @@ -38,6 +39,8 @@ * * The PMR will use BAR 4/5 exclusively. * + * To place controller(s) and namespace(s) to a subsystem, then provide + * nvme-subsys device as above. * * nvme device parameters * ~~ -- 2.17.1
[PATCH V4 2/6] hw/block/nvme: support to map controller to a subsystem
The nvme controller (nvme) can be mapped to a NVMe subsystem
(nvme-subsys).  This patch maps a controller to a subsystem by adding a
parameter 'subsys' to the nvme device.

To map a controller to a subsystem, we need to put nvme-subsys first and
then map the subsystem to the controller:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0

If the 'subsys' property is not given to the nvme controller, then the
subsystem NQN will be created with the serial (e.g., 'foo' in the above
example).  Otherwise, it will be based on the subsys id (e.g., 'subsys0'
in the above example).

Signed-off-by: Minwoo Im
---
 hw/block/nvme.c | 30 ++
 hw/block/nvme.h |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index aabccdf36f4b..ab0531492ddd 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -23,6 +23,7 @@
  *              max_ioqpairs=, \
  *              aerl=, aer_max_queued=, \
  *              mdts=,zoned.append_size_limit= \
+ *              ,subsys= \
  *      -device nvme-ns,drive=,bus=,nsid=,\
  *              zoned=
  *      -device nvme-subsys,id=
@@ -44,6 +45,13 @@
  *
  * nvme device parameters
  * ~~
+ * - `subsys`
+ *   NVM Subsystem device.  If given, a subsystem NQN will be initialized
+ *   with given.  Otherwise, will be taken for the subsystem NQN.
+ *   Also, it will enable the multi-controller capability represented in
+ *   the Identify Controller data structure in CMIC (Controller Multi-path
+ *   I/O and Namespace Sharing Capabilities), if given.
+ *
  * - `aerl`
  *   The Asynchronous Event Request Limit (AERL).  Indicates the maximum number
  *   of concurrently outstanding Asynchronous Event Request commands support
@@ -4404,11 +4412,25 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
     return 0;
 }
 
+static void nvme_init_subnqn(NvmeCtrl *n)
+{
+    NvmeSubsystem *subsys = n->subsys;
+    NvmeIdCtrl *id = &n->id_ctrl;
+    char *subnqn;
+
+    if (!subsys) {
+        subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
+        strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
+        g_free(subnqn);
+    } else {
+        pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char *)subsys->subnqn);
+    }
+}
+
 static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 {
     NvmeIdCtrl *id = &n->id_ctrl;
     uint8_t *pci_conf = pci_dev->config;
-    char *subnqn;
 
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
     id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -4455,9 +4477,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN |
                            NVME_CTRL_SGLS_BITBUCKET);
 
-    subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
-    strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
-    g_free(subnqn);
+    nvme_init_subnqn(n);
 
     id->psd[0].mp = cpu_to_le16(0x9c4);
     id->psd[0].enlat = cpu_to_le32(0x10);
@@ -4545,6 +4565,8 @@ static Property nvme_props[] = {
     DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf),
     DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND,
                      HostMemoryBackend *),
+    DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
+                     NvmeSubsystem *),
     DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
     DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
     DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index dee6092bd45f..04d4684601fd 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -2,6 +2,7 @@
 #define HW_NVME_H
 
 #include "block/nvme.h"
+#include "nvme-subsys.h"
 #include "nvme-ns.h"
 
 #define NVME_MAX_NAMESPACES 256
@@ -170,6 +171,8 @@ typedef struct NvmeCtrl {
 
     uint8_t     zasl;
 
+    NvmeSubsystem *subsys;
+
     NvmeNamespace namespace;
     NvmeNamespace *namespaces[NVME_MAX_NAMESPACES];
     NvmeSQueue **sq;
-- 
2.17.1
Re: [PATCH v3 12/12] hw/block/nvme: lift cmb restrictions
Nice, the code is much cleaner now.

Reviewed-by: Minwoo Im
Re: [PATCH v3 07/12] hw/block/nvme: remove redundant zeroing of PMR registers
Reviewed-by: Minwoo Im
Re: [PATCH v3 10/12] hw/block/nvme: move cmb logic to v1.4
Reviewed-by: Minwoo Im
Re: [RFC PATCH V3 8/8] hw/block/nvme: Add Identify Active Namespace ID List
> Hello Minwoo,
>
> By introducing a detached parameter, you are also implicitly making the
> following NVMe commands no longer be spec compliant:
>
> NVME_ID_CNS_NS, NVME_ID_CNS_CS_NS,
> NVME_ID_CNS_NS_ACTIVE_LIST, NVME_ID_CNS_CS_NS_ACTIVE_LIST
>
> When these commands are called on a detached namespace,
> they should usually return a zero filled data struct.

Agreed.

> Dmitry and I had a patch in V8 of the ZNS series that tried to fix some
> of the existing NVMe commands to be spec compliant, by handling detached
> namespaces properly.  In the end, in order to make it easier to get the
> ZNS series accepted, we decided to drop the detached related stuff from
> the series.
>
> Feel free to look at that patch for inspiration:
> https://github.com/dmitry-fomichev/qemu/commit/251c0ffee5149c739b1347811fa7e32a1c36bf7c

I've seen this patch, and as Klaus said, the only thing these patches
need is an nvme_ns_is_attached() helper applied in the Identify command
handlers.

> I'm not sure if you want to modify all the functions that our patch
> modifies, but I think that you should at least modify the following nvme
> functions:
>
> nvme_identify_ns()
> nvme_identify_ns_csi()
> nvme_identify_nslist()
> nvme_identify_nslist_csi()

Yes, it makes sense to update them.  But now it seems like the
attach/detach scheme should have been separated out of this series, which
just introduces multi-path for controllers and namespace sharing.  I will
drop the 'detach' scheme from this series and make another series to
support all of the Identify cases you mentioned above cleanly.

> So they handle detached namespaces correctly for both:
> NVME_ID_CNS_NS, NVME_ID_CNS_CS_NS,
> NVME_ID_CNS_NS_ACTIVE_LIST, NVME_ID_CNS_CS_NS_ACTIVE_LIST
> as well as for:
> NVME_ID_CNS_NS_PRESENT, NVME_ID_CNS_CS_NS_PRESENT,
> NVME_ID_CNS_NS_PRESENT_LIST, NVME_ID_CNS_CS_NS_PRESENT_LIST
>
> Kind regards,
> Niklas
Re: [RFC PATCH V3 0/8] hw/block/nvme: support multi-path for ctrl/ns
On 21-01-20 08:52:14, Klaus Jensen wrote: > On Jan 20 09:44, Minwoo Im wrote: > > On 21-01-19 19:18:16, Klaus Jensen wrote: > > > On Jan 20 02:01, Minwoo Im wrote: > > > > Hello, > > > > > > > > This patch series is third one to support multi-controller and namespace > > > > sharing in multi-path. This series introduced subsystem scheme to > > > > manage controller(s) and namespace(s) in the subsystem. > > > > > > > > This series has new patches from the V2: 'detached' parameter has been > > > > added to the nvme-ns device. This will decide whether to attach the > > > > namespace to controller(s) in the current subsystem or not. If it's > > > > given with true, then it will be just allocated in the subsystem, but > > > > not attaching to any controllers in the subsystem. Otherwise, it will > > > > automatically attach to all the controllers in the subsystem. The other > > > > t hing is that the last patch implemented Identify Active Namespace ID > > > > List command handler apart from the Allocated Namespace ID List. 
> > > > > > > > Run with: > > > > -device nvme,serial=qux,id=nvme3 > > > > -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 > > > > > > > > -device nvme-subsys,id=subsys0 > > > > -device nvme,serial=foo,id=nvme0,subsys=subsys0 > > > > -device nvme,serial=bar,id=nvme1,subsys=subsys0 > > > > -device nvme,serial=baz,id=nvme2,subsys=subsys0 > > > > -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0,detached=true > > > > -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 > > > > > > > > nvme-cli: > > > > root@vm:~/work# nvme list -v > > > > > > > > NVM Express Subsystems > > > > > > > > > > > > > > > > SubsystemSubsystem-NQN > > > > Controllers > > > > > > > > > > > > > > > > nvme-subsys0 nqn.2019-08.org.qemu:qux > > > > nvme0 > > > > nvme-subsys1 nqn.2019-08.org.qemu:subsys0 > > > > nvme1, nvme2, nvme3 > > > > > > > > > > > > NVM Express Controllers > > > > > > > > > > > > > > > > Device SN MN > > > > FR TxPort AddressSubsystemNamespaces > > > > > > > > -- -- > > > > > > > > nvme0qux QEMU NVMe Ctrl > > > > 1.0 pcie :00:06.0 nvme-subsys0 > > > > > > Shouldn't nvme0n1 be listed under Namespaces for nvme0? > > > > Oh, I missed that one from the output. As Keith mentioned, I ran the > > list command again based on the latest nvme-cli.git: > > > > Please refer the following result. I think it's okay not to send the > > cover letter again :) > > > > # nvme --version > > nvme version 1.13.48.g33c6 > > > > # nvme list -v > > NVM Express Subsystems > > > > SubsystemSubsystem-NQN > > Controllers > > > > > > > > nvme-subsys0 n
Re: [RFC PATCH V3 7/8] hw/block/nvme: add 'detached' param not to attach namespace
> Isn't the HBitmap slightly overkill?  Can qemu/bitmap.h suffice?

Definitely, yes, I think.  The current patch series supports up to 32
controllers, so I think qemu/bitmap.h is enough for us.  I will update
the bitmap operations in the next series.
Re: [RFC PATCH V3 0/8] hw/block/nvme: support multi-path for ctrl/ns
> Minwoo, try pulling the most current nvme-cli.  There was a sysfs
> scanning bug for non-mpath drives that should be fixed now.

Thank you, Keith!  I've posted the list result based on the latest one :)
Re: [RFC PATCH V3 0/8] hw/block/nvme: support multi-path for ctrl/ns
On 21-01-19 19:18:16, Klaus Jensen wrote: > On Jan 20 02:01, Minwoo Im wrote: > > Hello, > > > > This patch series is third one to support multi-controller and namespace > > sharing in multi-path. This series introduced subsystem scheme to > > manage controller(s) and namespace(s) in the subsystem. > > > > This series has new patches from the V2: 'detached' parameter has been > > added to the nvme-ns device. This will decide whether to attach the > > namespace to controller(s) in the current subsystem or not. If it's > > given with true, then it will be just allocated in the subsystem, but > > not attaching to any controllers in the subsystem. Otherwise, it will > > automatically attach to all the controllers in the subsystem. The other > > t hing is that the last patch implemented Identify Active Namespace ID > > List command handler apart from the Allocated Namespace ID List. > > > > Run with: > > -device nvme,serial=qux,id=nvme3 > > -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 > > > > -device nvme-subsys,id=subsys0 > > -device nvme,serial=foo,id=nvme0,subsys=subsys0 > > -device nvme,serial=bar,id=nvme1,subsys=subsys0 > > -device nvme,serial=baz,id=nvme2,subsys=subsys0 > > -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0,detached=true > > -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 > > > > nvme-cli: > > root@vm:~/work# nvme list -v > > > > NVM Express Subsystems > > > > > > > > SubsystemSubsystem-NQN > > Controllers > > > > > > > > nvme-subsys0 nqn.2019-08.org.qemu:qux > > nvme0 > > nvme-subsys1 nqn.2019-08.org.qemu:subsys0 > > nvme1, nvme2, nvme3 > > > > > > NVM Express Controllers > > > > > > > > Device SN MN FR > > TxPort AddressSubsystemNamespaces > > > > -- -- > > nvme0qux QEMU NVMe Ctrl > > 1.0 pcie :00:06.0 nvme-subsys0 > > Shouldn't nvme0n1 be listed under Namespaces for nvme0? Oh, I missed that one from the output. As Keith mentioned, I ran the list command again based on the latest nvme-cli.git: Please refer the following result. 
I think it's okay not to send the cover letter again :) # nvme --version nvme version 1.13.48.g33c6 # nvme list -v NVM Express Subsystems SubsystemSubsystem-NQN Controllers nvme-subsys0 nqn.2019-08.org.qemu:qux nvme0 nvme-subsys1 nqn.2019-08.org.qemu:subsys0 nvme1, nvme2, nvme3 NVM Express Controllers Device SN MN FR TxPort AddressSubsystemNamespaces -- -- nvme0qux QEMU NVMe Ctrl 1.0 pcie :00:06.0 nvme-subsys0 nvme0n1 nvme1foo QEMU NVMe Ctrl 1.0 pcie :00:07.0 nvme-subsys1 nvme2bar
[RFC PATCH V3 7/8] hw/block/nvme: add 'detached' param not to attach namespace
Introduced 'detached' parameter to nvme-ns device. If given, the namespace will not be attached to controller(s) in the subsystem. If 'subsys' is not given with this option, it should be provided with 'bus' which is for private namespace. This patch also introduced 'ctrls_bitmap' in NvmeNamespace instance to represent which controler id(cntlid) is attached to this namespace device. A namespace can be attached to multiple controllers in a subsystem so that this bitmap maps those two relationships. The ctrls_bitmap bitmap should not be accessed directly, but through the helpers introduced in this patch: nvme_ns_is_attached(), nvme_ns_attach(), nvme_ns_detach(). Note that this patch made identify namespace list data not hold non-attached namespace ID in nvme_identify_nslist. Currently, this command handler is for CNS 0x2(Active) and 0x10(Allocated) both. The next patch will introduce a handler for later on. Signed-off-by: Minwoo Im --- hw/block/nvme-ns.c | 9 + hw/block/nvme-ns.h | 6 ++ hw/block/nvme-subsys.c | 2 ++ hw/block/nvme.c| 31 ++- hw/block/nvme.h| 15 +++ 5 files changed, 62 insertions(+), 1 deletion(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 073f65e49cac..70d42c24065c 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -17,6 +17,7 @@ #include "qemu/cutils.h" #include "qemu/log.h" #include "qemu/error-report.h" +#include "qemu/hbitmap.h" #include "hw/block/block.h" #include "hw/pci/pci.h" #include "sysemu/sysemu.h" @@ -304,6 +305,11 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) return 0; } +static void nvme_ns_init_state(NvmeNamespace *ns) +{ +ns->ctrls_bitmap = hbitmap_alloc(NVME_SUBSYS_MAX_CTRLS, 0); +} + int nvme_ns_setup(NvmeNamespace *ns, Error **errp) { if (nvme_ns_check_constraints(ns, errp)) { @@ -314,6 +320,8 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp) return -1; } +nvme_ns_init_state(ns); + if (nvme_ns_init(ns, errp)) { return -1; } @@ -381,6 +389,7 @@ static Property nvme_ns_props[] = { 
DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS, NvmeSubsystem *), +DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false), diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 929e78861903..ad2f55931d1b 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -26,6 +26,7 @@ typedef struct NvmeZone { } NvmeZone; typedef struct NvmeNamespaceParams { +bool detached; uint32_t nsid; QemuUUID uuid; @@ -48,6 +49,11 @@ typedef struct NvmeNamespace { uint8_t csi; NvmeSubsystem *subsys; +/* + * Whether this namespace is attached to a controller or not. This bitmap + * is based on controller id. This is valid only in case 'subsys' != NULL. + */ +HBitmap *ctrls_bitmap; NvmeIdNsZoned *id_ns_zoned; NvmeZone*zone_array; diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c index e7efdcae7d0d..32ad8ef2825a 100644 --- a/hw/block/nvme-subsys.c +++ b/hw/block/nvme-subsys.c @@ -11,6 +11,7 @@ #include "qemu/uuid.h" #include "qemu/iov.h" #include "qemu/cutils.h" +#include "qemu/hbitmap.h" #include "qapi/error.h" #include "hw/qdev-properties.h" #include "hw/qdev-core.h" @@ -20,6 +21,7 @@ #include "sysemu/sysemu.h" #include "hw/pci/pci.h" #include "nvme.h" +#include "nvme-ns.h" #include "nvme-subsys.h" int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 06bccf1b9e9e..2b2c07b36c2b 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -26,7 +26,7 @@ * ,subsys= \ * -device nvme-ns,drive=,bus=,nsid=,\ * zoned=, \ - * subsys= + * subsys=,detached= * * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at @@ -77,6 +77,13 @@ * controllers in the subsystem. 
Otherwise, `bus` must be given to attach
 *   this namespace to a specified single controller as a non-shared
 *   namespace.
 *
+ * - `detached`
+ *   Do not attach the namespace device to controllers in the NVMe
+ *   subsystem during boot-up.  If not given, namespaces are all attached
+ *   to all controllers in the subsystem by default.
+ *   It is mutually exclusive with the 'bus' parameter.  It is only valid
+ *   when 'subsys' is provided.
+ *
  * Setting `zoned`
[RFC PATCH V3 3/8] hw/block/nvme: add CMIC enum value for Identify Controller
Added the Controller Multi-path I/O and Namespace Sharing Capabilities
(CMIC) field to support multi-controller in the following patches.  This
field is in the Identify Controller data structure at byte [76].

Signed-off-by: Minwoo Im
---
 include/block/nvme.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 45b2678db1f0..733fb35fedde 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -941,6 +941,10 @@ enum NvmeIdCtrlLpa {
     NVME_LPA_EXTENDED = 1 << 2,
 };
 
+enum NvmeIdCtrlCmic {
+    NVME_CMIC_MULTI_CTRL    = 1 << 1,
+};
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
-- 
2.17.1
[RFC PATCH V3 1/8] hw/block/nvme: introduce nvme-subsys device
To support multi-path in QEMU NVMe device model, We need to have NVMe subsystem hierarchy to map controllers and namespaces to a NVMe subsystem. This patch introduced a simple nvme-subsys device model. The subsystem will be prepared with subsystem NQN with provided in nvme-subsys device: ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0 Signed-off-by: Minwoo Im --- hw/block/meson.build | 2 +- hw/block/nvme-subsys.c | 63 ++ hw/block/nvme-subsys.h | 25 + hw/block/nvme.c| 3 ++ 4 files changed, 92 insertions(+), 1 deletion(-) create mode 100644 hw/block/nvme-subsys.c create mode 100644 hw/block/nvme-subsys.h diff --git a/hw/block/meson.build b/hw/block/meson.build index 602ca6c8541d..83ea2d37978d 100644 --- a/hw/block/meson.build +++ b/hw/block/meson.build @@ -13,7 +13,7 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: files('m25p80.c')) softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c')) softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c')) -softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c')) +softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 'nvme-subsys.c')) specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c')) specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c')) diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c new file mode 100644 index ..f1dc71d588d9 --- /dev/null +++ b/hw/block/nvme-subsys.c @@ -0,0 +1,63 @@ +/* + * QEMU NVM Express Subsystem: nvme-subsys + * + * Copyright (c) 2021 Minwoo Im + * + * This code is licensed under the GNU GPL v2. Refer COPYING. 
+ */
+
+#include "qemu/units.h"
+#include "qemu/osdep.h"
+#include "qemu/uuid.h"
+#include "qemu/iov.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+#include "hw/block/block.h"
+#include "block/aio.h"
+#include "block/accounting.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+#include "nvme.h"
+#include "nvme-subsys.h"
+
+static void nvme_subsys_setup(NvmeSubsystem *subsys)
+{
+    char *subnqn;
+
+    subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", subsys->parent_obj.id);
+    strpadcpy((char *)subsys->subnqn, sizeof(subsys->subnqn), subnqn, '\0');
+    g_free(subnqn);
+}
+
+static void nvme_subsys_realize(DeviceState *dev, Error **errp)
+{
+    NvmeSubsystem *subsys = NVME_SUBSYS(dev);
+
+    nvme_subsys_setup(subsys);
+}
+
+static void nvme_subsys_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+    dc->realize = nvme_subsys_realize;
+    dc->desc = "Virtual NVMe subsystem";
+}
+
+static const TypeInfo nvme_subsys_info = {
+    .name = TYPE_NVME_SUBSYS,
+    .parent = TYPE_DEVICE,
+    .class_init = nvme_subsys_class_init,
+    .instance_size = sizeof(NvmeSubsystem),
+};
+
+static void nvme_subsys_register_types(void)
+{
+    type_register_static(&nvme_subsys_info);
+}
+
+type_init(nvme_subsys_register_types)
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
new file mode 100644
index ..40f06a4c7db0
--- /dev/null
+++ b/hw/block/nvme-subsys.h
@@ -0,0 +1,25 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im
+ *
+ * This code is licensed under the GNU GPL v2. Refer COPYING.
+ */ + +#ifndef NVME_SUBSYS_H +#define NVME_SUBSYS_H + +#define TYPE_NVME_SUBSYS "nvme-subsys" +#define NVME_SUBSYS(obj) \ +OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS) + +#define NVME_SUBSYS_MAX_CTRLS 32 + +typedef struct NvmeCtrl NvmeCtrl; +typedef struct NvmeNamespace NvmeNamespace; +typedef struct NvmeSubsystem { +DeviceState parent_obj; +uint8_t subnqn[256]; +} NvmeSubsystem; + +#endif /* NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 309c26db8ff7..4644937a5c50 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -25,6 +25,7 @@ * mdts=,zoned.append_size_limit= \ * -device nvme-ns,drive=,bus=,nsid=,\ * zoned= + * -device nvme-subsys,id= * * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. @@ -37,6 +38,8 @@ * -object memory-backend-file,id=,share=on,mem-path=, \ * size= -device nvme,...,pmrdev= * + * To place controller(s) and namespace(s) to a subsystem, then provide + * nvme-subsys device as above. * * nvme device parameters * ~~ -- 2.17.1
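For reference, the NQN construction done in nvme_subsys_setup() above can be sketched in isolation in plain C (snprintf standing in for g_strdup_printf()/strpadcpy(); the buffer and helper names are illustrative only, not part of the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the 256-byte subnqn field in NvmeSubsystem. */
#define SUBNQN_LEN 256

/* Build "nqn.2019-08.org.qemu:<id>" into a zero-padded fixed buffer,
 * mirroring what nvme_subsys_setup() does with g_strdup_printf() and
 * strpadcpy(). */
static void subsys_build_nqn(char subnqn[SUBNQN_LEN], const char *id)
{
    memset(subnqn, 0, SUBNQN_LEN);  /* pad with '\0', like strpadcpy */
    snprintf(subnqn, SUBNQN_LEN, "nqn.2019-08.org.qemu:%s", id);
}
```

So `-device nvme-subsys,id=subsys0` yields the subsystem NQN `nqn.2019-08.org.qemu:subsys0`, as in the commit message example.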
[RFC PATCH V3 8/8] hw/block/nvme: Add Identify Active Namespace ID List
Spec v1.4b 6.1.4 "Active and Inactive NSID Types" says: "Active NSIDs for a controller refer to namespaces that are attached to that controller. Allocated NSIDs that are inactive for a controller refer to namespaces that are not attached to that controller." This patch introduces support for the Identify Active Namespace ID List (CNS 02h). Signed-off-by: Minwoo Im --- hw/block/nvme.c | 39 ++- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 2b2c07b36c2b..7247167b0ee6 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -2883,6 +2883,39 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } +static uint16_t nvme_identify_nslist_active(NvmeCtrl *n, NvmeRequest *req) +{ +NvmeNamespace *ns; +NvmeIdentify *c = (NvmeIdentify *)&req->cmd; +uint32_t min_nsid = le32_to_cpu(c->nsid); +uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {}; +static const int data_len = sizeof(list); +uint32_t *list_ptr = (uint32_t *)list; +int i, j = 0; + +if (min_nsid >= NVME_NSID_BROADCAST - 1) { +return NVME_INVALID_NSID | NVME_DNR; +} + +for (i = 1; i <= n->num_namespaces; i++) { +ns = nvme_ns(n, i); +if (!ns || ns->params.nsid <= min_nsid) { +continue; +} + +if (!nvme_ns_is_attached(n, ns)) { +continue; +} + +list_ptr[j++] = cpu_to_le32(ns->params.nsid); +if (j == data_len / sizeof(uint32_t)) { +break; +} +} + +return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); +} + static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) { NvmeNamespace *ns; @@ -2914,10 +2947,6 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) continue; } -if (!nvme_ns_is_attached(n, ns)) { -continue; -} - list_ptr[j++] = cpu_to_le32(ns->params.nsid); if (j == data_len / sizeof(uint32_t)) { break; @@ -3045,7 +3074,7 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) case NVME_ID_CNS_CS_CTRL: return nvme_identify_ctrl_csi(n, req); case NVME_ID_CNS_NS_ACTIVE_LIST: - /* fall through 
*/ + return nvme_identify_nslist_active(n, req); case NVME_ID_CNS_NS_PRESENT_LIST: return nvme_identify_nslist(n, req); case NVME_ID_CNS_CS_NS_ACTIVE_LIST: -- 2.17.1
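The filtering in nvme_identify_nslist_active() can be sketched outside QEMU as a minimal model, with plain arrays standing in for nvme_ns() and nvme_ns_is_attached() (all names here are illustrative, not the patch's API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the CNS 02h loop: exists[i]/attached[i] describe
 * namespace ID i+1.  Report NSIDs strictly greater than the NSID in the
 * command (min_nsid), but only for namespaces attached to this
 * controller, up to the identify buffer capacity. */
static size_t active_nsid_list(const bool *exists, const bool *attached,
                               uint32_t num_ns, uint32_t min_nsid,
                               uint32_t *out, size_t max_entries)
{
    size_t j = 0;
    for (uint32_t i = 1; i <= num_ns && j < max_entries; i++) {
        if (!exists[i - 1] || i <= min_nsid || !attached[i - 1]) {
            continue;
        }
        out[j++] = i;   /* the real code stores cpu_to_le32(nsid) */
    }
    return j;
}
```

The only difference from nvme_identify_nslist() (CNS 12h, Allocated) is the attachment check, which is exactly why the fall-through had to be removed.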
[RFC PATCH V3 0/8] hw/block/nvme: support multi-path for ctrl/ns
Hello, This patch series is the third one to support multi-controller and namespace sharing in multi-path. This series introduces a subsystem scheme to manage controller(s) and namespace(s) in the subsystem. This series has new patches from the V2: a 'detached' parameter has been added to the nvme-ns device. This will decide whether to attach the namespace to controller(s) in the current subsystem or not. If it is set to true, then the namespace will just be allocated in the subsystem, but not attached to any controllers in the subsystem. Otherwise, it will automatically attach to all the controllers in the subsystem. The other thing is that the last patch implemented an Identify Active Namespace ID List command handler apart from the Allocated Namespace ID List. Run with: -device nvme,serial=qux,id=nvme3 -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 -device nvme,serial=bar,id=nvme1,subsys=subsys0 -device nvme,serial=baz,id=nvme2,subsys=subsys0 -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0,detached=true -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 nvme-cli: root@vm:~/work# nvme list -v NVM Express Subsystems SubsystemSubsystem-NQN Controllers nvme-subsys0 nqn.2019-08.org.qemu:qux nvme0 nvme-subsys1 nqn.2019-08.org.qemu:subsys0 nvme1, nvme2, nvme3 NVM Express Controllers Device SN MN FR TxPort AddressSubsystemNamespaces -- -- nvme0qux QEMU NVMe Ctrl 1.0 pcie :00:06.0 nvme-subsys0 nvme1foo QEMU NVMe Ctrl 1.0 pcie :00:07.0 nvme-subsys1 nvme2bar QEMU NVMe Ctrl 1.0 pcie :00:08.0 nvme-subsys1 nvme3baz QEMU NVMe Ctrl 1.0 pcie :00:09.0 nvme-subsys1 nvme1n1 NVM Express Namespaces Device NSID Usage Format Controllers -- nvme0n1 3268.44 MB / 268.44 MB512 B + 0 B nvme0 nvme1n1 2268.44 MB / 268.44 MB512 B + 0 B nvme3 Now we have an [allocated|not-allocated]/[attached/detached] scheme for namespaces and controllers. 
The next series is going to support namespace management and attachment with a few Identify command handlers. Please review. Thanks! Since RFC V2: - Rebased on nvme-next branch with trivial patches from the previous version (V2) applied. (Klaus) - Fix enumeration type name convention with NvmeIdNs prefix. (Klaus) - Put 'cntlid' to NvmeCtrl instance in nvme_init_ctrl() which was missed in V2. - Added 'detached' parameter to nvme-ns device to decide whether to attach or not to controller(s) in the subsystem. (Klaus) - Implemented Identify Active Namespace ID List apart from Identify Allocated Namespace ID List by removing the fall-through statement. Since RFC V1: - Updated namespace sharing scheme to be based on nvme-subsys hierarchy. Minwoo Im (8): hw/block/nvme: introduce nvme-subsys device hw/block/nvme: support to map controller to a subsystem hw/block/nvme: add CMIC enum value for Identify Controller hw/block/nvme: support for multi-controller in subsystem hw/block/nvme: add NMIC enum value for Identify Namespace hw/block/nvme: support for shared namespace in subsystem hw
[RFC PATCH V3 6/8] hw/block/nvme: support for shared namespace in subsystem
The nvme-ns device is registered to an nvme controller device during initialization in nvme_register_namespace() in case the 'bus' property is given, which means it is mapped to a single controller. This patch introduces a new property 'subsys', just like the one on the controller device, to map a namespace to an NVMe subsystem. If the 'subsys' property is given to the nvme-ns device, it will belong to the specified subsystem and will be attached to all controllers in that subsystem by enabling the shared namespace capability in NMIC (Namespace Multi-path I/O and Namespace Capabilities) in Identify Namespace. Usage: -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 -device nvme,serial=bar,id=nvme1,subsys=subsys0 -device nvme,serial=baz,id=nvme2,subsys=subsys0 -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0 # Shared -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2 # Non-shared In the above example, 'ns1' will be shared by 'nvme0' and 'nvme1' in the same subsystem. On the other hand, 'ns2' will be attached to 'nvme2' only, as a private namespace in that subsystem. Any namespace with the 'subsys' parameter will be attached to all controllers in the subsystem by default. 
Signed-off-by: Minwoo Im --- hw/block/nvme-ns.c | 23 ++- hw/block/nvme-ns.h | 7 +++ hw/block/nvme-subsys.c | 25 + hw/block/nvme-subsys.h | 3 +++ hw/block/nvme.c| 10 +- 5 files changed, 62 insertions(+), 6 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index c8b75fa02138..073f65e49cac 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp) id_ns->npda = id_ns->npdg = npdg - 1; +if (nvme_ns_shared(ns)) { +id_ns->nmic |= NVME_NMIC_NS_SHARED; +} + return 0; } @@ -358,16 +362,25 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) return; } -if (nvme_register_namespace(n, ns, errp)) { -error_propagate_prepend(errp, local_err, -"could not register namespace: "); -return; +if (ns->subsys) { +if (nvme_subsys_register_ns(ns, errp)) { +error_propagate_prepend(errp, local_err, +"could not setup namespace to subsys: "); +return; +} +} else { +if (nvme_register_namespace(n, ns, errp)) { +error_propagate_prepend(errp, local_err, +"could not register namespace: "); +return; +} } - } static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), +DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS, + NvmeSubsystem *), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false), diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 293ac990e3f6..929e78861903 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -47,6 +47,8 @@ typedef struct NvmeNamespace { const uint32_t *iocs; uint8_t csi; +NvmeSubsystem *subsys; + NvmeIdNsZoned *id_ns_zoned; NvmeZone*zone_array; QTAILQ_HEAD(, NvmeZone) exp_open_zones; @@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns) return -1; } +static inline bool nvme_ns_shared(NvmeNamespace *ns) +{ +return !!ns->subsys; +} + static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace 
*ns) { NvmeIdNs *id_ns = &ns->id_ns; diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c index a01003136b12..e7efdcae7d0d 100644 --- a/hw/block/nvme-subsys.c +++ b/hw/block/nvme-subsys.c @@ -43,6 +43,31 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) return cntlid; } +int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp) +{ +NvmeSubsystem *subsys = ns->subsys; +NvmeCtrl *n; +int i; + +if (subsys->namespaces[nvme_nsid(ns)]) { +error_setg(errp, "namespace %d already registered to subsystem %s", + nvme_nsid(ns), subsys->parent_obj.id); +return -1; +} + +subsys->namespaces[nvme_nsid(ns)] = ns; + +for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) { +n = subsys->ctrls[i]; + +if (n && nvme_register_namespace(n, ns, errp)) { +return -1; +} +} + +return 0; +} + static void nvme_subsys_setup(NvmeSubsystem *subsys) { char *subnqn; diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 4eba50d96a1d..ccf6a71398d3 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -14,6 +14,7 @@ OBJ
[RFC PATCH V3 4/8] hw/block/nvme: support for multi-controller in subsystem
We have nvme-subsys and nvme devices mapped together. To support a multi-controller scheme in this setup, the controller identifier (id) has to be managed. Earlier, cntlid (controller id) was always 0 because there was no subsystem scheme in which the controller id mattered. This patch introduces a 'cntlid' attribute in the nvme controller instance (NvmeCtrl) and makes it allocated by the nvme-subsys device mapped to the controller. If nvme-subsys is not given to the controller, then it will always be 0 as it was. Added a 'ctrls' array in the nvme-subsys instance to manage the controllers attached to the subsystem, with a limit (32). This patch did not use a list for the controllers, to keep it consistent with the nvme-ns device. Signed-off-by: Minwoo Im --- hw/block/nvme-subsys.c | 21 + hw/block/nvme-subsys.h | 4 hw/block/nvme.c| 34 -- hw/block/nvme.h| 1 + 4 files changed, 58 insertions(+), 2 deletions(-) diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c index f1dc71d588d9..a01003136b12 100644 --- a/hw/block/nvme-subsys.c +++ b/hw/block/nvme-subsys.c @@ -22,6 +22,27 @@ #include "nvme.h" #include "nvme-subsys.h" +int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) +{ +NvmeSubsystem *subsys = n->subsys; +int cntlid; + +for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) { +if (!subsys->ctrls[cntlid]) { +break; +} +} + +if (cntlid == ARRAY_SIZE(subsys->ctrls)) { +error_setg(errp, "no more free controller id"); +return -1; +} + +subsys->ctrls[cntlid] = n; + +return cntlid; +} + static void nvme_subsys_setup(NvmeSubsystem *subsys) { char *subnqn; diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 40f06a4c7db0..4eba50d96a1d 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace; typedef struct NvmeSubsystem { DeviceState parent_obj; uint8_t subnqn[256]; + +NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS]; } NvmeSubsystem; +int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp); + #endif /* 
NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 3e3b5451ea3d..9f8a739fcd8f 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -4304,16 +4304,21 @@ static void nvme_init_subnqn(NvmeCtrl *n) } } -static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) +static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t cntlid) { NvmeIdCtrl *id = &n->id_ctrl; uint8_t *pci_conf = pci_dev->config; +n->cntlid = cntlid; + id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID)); id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID)); strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' '); strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' '); strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' '); + +id->cntlid = cntlid; + id->rab = 6; id->ieee[0] = 0x00; id->ieee[1] = 0x02; @@ -4359,6 +4364,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->psd[0].enlat = cpu_to_le32(0x10); id->psd[0].exlat = cpu_to_le32(0x4); +if (n->subsys) { +id->cmic |= NVME_CMIC_MULTI_CTRL; +} + NVME_CAP_SET_MQES(n->bar.cap, 0x7ff); NVME_CAP_SET_CQR(n->bar.cap, 1); NVME_CAP_SET_TO(n->bar.cap, 0xf); @@ -4371,11 +4380,28 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) n->bar.intmc = n->bar.intms = 0; } +static int nvme_init_subsys(NvmeCtrl *n, Error **errp) +{ +int cntlid; + +if (!n->subsys) { +return 0; +} + +cntlid = nvme_subsys_register_ctrl(n, errp); +if (cntlid < 0) { +return -1; +} + +return cntlid; +} + static void nvme_realize(PCIDevice *pci_dev, Error **errp) { NvmeCtrl *n = NVME(pci_dev); NvmeNamespace *ns; Error *local_err = NULL; +int cntlid; nvme_check_constraints(n, &local_err); if (local_err) { @@ -4391,7 +4417,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) return; } -nvme_init_ctrl(n, pci_dev); +cntlid = nvme_init_subsys(n, errp); +if (cntlid < 0) { +return; +} +nvme_init_ctrl(n, pci_dev, cntlid); /* setup a namespace if the controller drive property was given */ if 
(n->namespace.blkconf.blk) { diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 3fa0e0a15539..c158cc873b59 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -133,6 +133,7 @@ typedef struct NvmeCtrl { NvmeBus bus; BlockConfconf; +uint16_tcntlid; boolqs_created; uint32_tpage_size; uint16_tpage_bits; -- 2.17.1
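The controller-ID allocation in nvme_subsys_register_ctrl() above boils down to a first-free-slot scan where the array index doubles as the cntlid. A standalone sketch (void pointers standing in for NvmeCtrl; names are illustrative):

```c
#include <assert.h>

#define MAX_CTRLS 32  /* mirrors NVME_SUBSYS_MAX_CTRLS */

/* First-free-slot allocation as in nvme_subsys_register_ctrl(): the
 * index into the ctrls[] array doubles as the controller ID.  Returns
 * -1 when the subsystem is full ("no more free controller id"). */
static int alloc_cntlid(void *ctrls[MAX_CTRLS], void *ctrl)
{
    for (int cntlid = 0; cntlid < MAX_CTRLS; cntlid++) {
        if (!ctrls[cntlid]) {
            ctrls[cntlid] = ctrl;
            return cntlid;
        }
    }
    return -1;
}
```

This is why a plain array (rather than a list) is convenient here: the slot index is the identifier, so no separate ID counter has to be kept in sync.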
[RFC PATCH V3 2/8] hw/block/nvme: support to map controller to a subsystem
An nvme controller (nvme) can be mapped to an NVMe subsystem (nvme-subsys). This patch maps a controller to a subsystem by adding a parameter 'subsys' to the nvme device. To map a controller to a subsystem, we need to put nvme-subsys first and then map the subsystem to the controller: -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 If the 'subsys' property is not given to the nvme controller, then the subsystem NQN will be created from the serial (e.g., 'foo' in the above example); otherwise, it will be based on the subsys id (e.g., 'subsys0' in the above example). Signed-off-by: Minwoo Im --- hw/block/nvme.c | 30 ++ hw/block/nvme.h | 3 +++ 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 4644937a5c50..3e3b5451ea3d 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -23,6 +23,7 @@ * max_ioqpairs=, \ * aerl=, aer_max_queued=, \ * mdts=,zoned.append_size_limit= \ + * ,subsys= \ * -device nvme-ns,drive=,bus=,nsid=,\ * zoned= * -device nvme-subsys,id= @@ -43,6 +44,13 @@ * * nvme device parameters * ~~ + * - `subsys` + * NVM Subsystem device. If given, a subsystem NQN will be initialized with + * the subsys id given. Otherwise, the serial will be taken for the subsystem NQN. + * Also, it will enable the multi controller capability represented in the Identify + * Controller data structure in CMIC (Controller Multi-path I/O and Namespace + * Sharing Capabilities), if given. + * * - `aerl` * The Asynchronous Event Request Limit (AERL). 
Indicates the maximum number * of concurrently outstanding Asynchronous Event Request commands support @@ -4281,11 +4289,25 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) return 0; } +static void nvme_init_subnqn(NvmeCtrl *n) +{ +NvmeSubsystem *subsys = n->subsys; +NvmeIdCtrl *id = &n->id_ctrl; +char *subnqn; + +if (!subsys) { +subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial); +strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0'); +g_free(subnqn); +} else { +pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn); +} +} + static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) { NvmeIdCtrl *id = &n->id_ctrl; uint8_t *pci_conf = pci_dev->config; -char *subnqn; id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID)); id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID)); @@ -4331,9 +4353,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN | NVME_CTRL_SGLS_BITBUCKET); -subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial); -strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0'); -g_free(subnqn); +nvme_init_subnqn(n); id->psd[0].mp = cpu_to_le16(0x9c4); id->psd[0].enlat = cpu_to_le32(0x10); @@ -4417,6 +4437,8 @@ static void nvme_exit(PCIDevice *pci_dev) static Property nvme_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf), +DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS, + NvmeSubsystem *), DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmrdev, TYPE_MEMORY_BACKEND, HostMemoryBackend *), DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial), diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 347c149e7905..3fa0e0a15539 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -2,6 +2,7 @@ #define HW_NVME_H #include "block/nvme.h" +#include "nvme-subsys.h" +#include "nvme-ns.h" #define NVME_MAX_NAMESPACES 256 @@ -158,6 +159,8 @@ typedef struct NvmeCtrl { uint8_t 
zasl; +NvmeSubsystem *subsys; + NvmeNamespace namespace; NvmeNamespace *namespaces[NVME_MAX_NAMESPACES]; NvmeSQueue **sq; -- 2.17.1
[RFC PATCH V3 5/8] hw/block/nvme: add NMIC enum value for Identify Namespace
Add the Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC) field to support namespaces shared among controller(s). This field is at byte [30] of the Identify Namespace data structure. Signed-off-by: Minwoo Im --- include/block/nvme.h | 4 1 file changed, 4 insertions(+) diff --git a/include/block/nvme.h b/include/block/nvme.h index 733fb35fedde..28404a62b9d2 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -1110,6 +1110,10 @@ enum NvmeNsIdentifierType { NVME_NIDT_CSI = 0x04, }; +enum NvmeIdNsNmic { +NVME_NMIC_NS_SHARED = 1 << 0, +}; + enum NvmeCsi { NVME_CSI_NVM= 0x00, NVME_CSI_ZONED = 0x02, -- 2.17.1
Re: [PATCH v3 10/12] hw/block/nvme: move cmb logic to v1.4
On 21-01-19 11:15:02, Klaus Jensen wrote: > From: Padmakar Kalghatgi > > Implement v1.4 logic for configuring the Controller Memory Buffer. This > is not backward compatible with v1.3, so drivers that only support v1.3 > will not be able to use the CMB anymore. Reviewed the legacy-cmb parameter, but we should update the commit description above since we can also support the v1.3 behavior, or you can update it when you are picking it up :)
Re: [RFC PATCH V2 02/11] hw/block/nvme: open code for volatile write cache
On 21-01-19 08:32:21, Klaus Jensen wrote: > On Jan 17 23:53, Minwoo Im wrote: > > Volatile Write Cache(VWC) feature is set in nvme_ns_setup() in the > > initial time. This feature is related to block device backed, but this > > feature is controlled in controller level via Set/Get Features command. > > > > This patch removed dependency between nvme and nvme-ns to manage the VWC > > flag value. Also, it open coded the Get Features for VWC to check all > > namespaces attached to the controller, and if false detected, return > > directly false. > > > > Signed-off-by: Minwoo Im > > --- > > hw/block/nvme-ns.c | 4 > > hw/block/nvme.c| 15 --- > > hw/block/nvme.h| 1 - > > 3 files changed, 12 insertions(+), 8 deletions(-) > > > > diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c > > index 32662230130b..c403cd36b6bd 100644 > > --- a/hw/block/nvme-ns.c > > +++ b/hw/block/nvme-ns.c > > @@ -89,10 +89,6 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace > > *ns, Error **errp) > > return -1; > > } > > > > -if (blk_enable_write_cache(ns->blkconf.blk)) { > > -n->features.vwc = 0x1; > > -} > > - > > return 0; > > } > > > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c > > index cf0fe28fe6eb..b2a9c48a9d81 100644 > > --- a/hw/block/nvme.c > > +++ b/hw/block/nvme.c > > @@ -3033,6 +3033,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, > > NvmeRequest *req) > > NvmeGetFeatureSelect sel = NVME_GETFEAT_SELECT(dw10); > > uint16_t iv; > > NvmeNamespace *ns; > > +int i; > > > > static const uint32_t nvme_feature_default[NVME_FID_MAX] = { > > [NVME_ARBITRATION] = NVME_ARB_AB_NOLIMIT, > > @@ -3108,7 +3109,17 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, > > NvmeRequest *req) > > result = ns->features.err_rec; > > goto out; > > case NVME_VOLATILE_WRITE_CACHE: > > -result = n->features.vwc; > > +for (i = 1; i <= n->num_namespaces; i++) { > > +ns = nvme_ns(n, i); > > +if (!ns) { > > +continue; > > +} > > + > > +result = blk_enable_write_cache(ns->blkconf.blk); > > +if (!result) { > 
> +break; > > +} > > +} > > I did a small tweak here and changed `if (!result)` to `if (result)`. We > want to report that a volatile write cache is present if ANY of the > namespace backing devices have a write cache. Oh Thanks for the fix!
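With the `if (result)` tweak applied, the Get Features (Volatile Write Cache) handler reports a write cache whenever any attached namespace's backing device enables one. That logic can be sketched standalone, with plain arrays standing in for nvme_ns() and blk_enable_write_cache() (names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Get Features (VWC) with the tweak applied: report that a volatile
 * write cache is present if ANY attached namespace's backing device has
 * one.  present[i] models nvme_ns(n, i) being non-NULL; has_wcache[i]
 * models blk_enable_write_cache() on that namespace. */
static uint32_t vwc_get_feature(const bool *present, const bool *has_wcache,
                                uint32_t num_ns)
{
    uint32_t result = 0;
    for (uint32_t i = 0; i < num_ns; i++) {
        if (!present[i]) {
            continue;   /* unallocated NSID slot */
        }
        result = has_wcache[i];
        if (result) {
            break;      /* one cache-enabled namespace is enough */
        }
    }
    return result;
}
```

Breaking on `!result` instead (the original code) would have reported "no cache" as soon as one cache-less namespace was found, which is the bug the tweak fixes.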
Re: [RFC PATCH V2 00/11] hw/block/nvme: support multi-path for ctrl/ns
On 21-01-19 07:04:04, Klaus Jensen wrote: > On Jan 19 12:21, Minwoo Im wrote: > > On 21-01-18 22:14:45, Klaus Jensen wrote: > > > On Jan 17 23:53, Minwoo Im wrote: > > > > Hello, > > > > > > > > This patch series introduces NVMe subsystem device to support multi-path > > > > I/O in NVMe device model. Two use-cases are supported along with this > > > > patch: Multi-controller, Namespace Sharing. > > > > > > > > V1 RFC has been discussed with Klaus and Keith, I really appreciate them > > > > for this patch series to have proper direction [1]. > > > > > > > > This patch series contains few start-up refactoring pathces from the > > > > first to fifth patches to make nvme-ns device not to rely on the nvme > > > > controller always. Because nvme-ns shall be able to be mapped to the > > > > subsystem level, not a single controller level so that it should provide > > > > generic initialization code: nvme_ns_setup() with NvmeCtrl. To do that, > > > > the first five patches are to remove the NvmeCtrl * instance argument > > > > from the nvme_ns_setup(). I'd be happy if they are picked! > > > > > > > > For controller and namespace devices, 'subsys' property has been > > > > introduced to map them to a subsystem. If multi-controller needed, we > > > > can specify 'subsys' to controllers the same. > > > > > > > > For namespace deivice, if 'subsys' is not given just like it was, it > > > > will have to be provided with 'bus' parameter to specify a nvme > > > > controller device to attach, it means, they are mutual-exlusive. To > > > > share a namespace between or among controllers, then nvme-ns should have > > > > 'subsys' property to a single nvme subsystem instance. To make a > > > > namespace private one, then we need to specify 'bus' property rather > > > > than the 'subsys'. > > > > > > > > Of course, this series does not require any updates for the run command > > > > for the previos users. 
> > > > > > > > Plase refer the following example with nvme-cli output: > > > > > > > > QEMU Run: > > > > -device nvme-subsys,id=subsys0 \ > > > > -device nvme,serial=foo,id=nvme0,subsys=subsys0 \ > > > > -device nvme,serial=bar,id=nvme1,subsys=subsys0 \ > > > > -device nvme,serial=baz,id=nvme2,subsys=subsys0 \ > > > > -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \ > > > > -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \ > > > > \ > > > > -device nvme,serial=qux,id=nvme3 \ > > > > -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 > > > > > > > > nvme-cli: > > > > root@vm:~/work# nvme list -v > > > > NVM Express Subsystems > > > > > > > > SubsystemSubsystem-NQN > > > > Controllers > > > > > > > > > > > > > > > > nvme-subsys1 nqn.2019-08.org.qemu:subsys0 > > > > nvme0, nvme1, nvme2 > > > > nvme-subsys3 nqn.2019-08.org.qemu:qux > > > > nvme3 > > > > > > > > NVM Express Controllers > > > > > > > > Device SN MN > > > > FR TxPort AddressSubsystemNamespaces > > > > > > > > -- -- > > > > > > > > nvme0foo QEMU NVMe Ctrl > > > > 1.0 pcie :00:06.0 nvme-subsys1 nvme1n1 > > > > nvme1bar QEMU NVMe Ctrl > > > > 1.0 pcie :00:07.0 nvme-subsys1 nvme1n1 > > > > nvme2baz QEMU NVMe Ctrl > > > > 1.0 pcie :00:08.0 nvme-subsys1 nvme1n1, nvme1n2 > > > > nvme3qux QEMU NVMe Ctrl > >
Re: [RFC PATCH V2 00/11] hw/block/nvme: support multi-path for ctrl/ns
On 21-01-18 22:14:45, Klaus Jensen wrote: > On Jan 17 23:53, Minwoo Im wrote: > > Hello, > > > > This patch series introduces NVMe subsystem device to support multi-path > > I/O in NVMe device model. Two use-cases are supported along with this > > patch: Multi-controller, Namespace Sharing. > > > > V1 RFC has been discussed with Klaus and Keith, I really appreciate them > > for this patch series to have proper direction [1]. > > > > This patch series contains few start-up refactoring pathces from the > > first to fifth patches to make nvme-ns device not to rely on the nvme > > controller always. Because nvme-ns shall be able to be mapped to the > > subsystem level, not a single controller level so that it should provide > > generic initialization code: nvme_ns_setup() with NvmeCtrl. To do that, > > the first five patches are to remove the NvmeCtrl * instance argument > > from the nvme_ns_setup(). I'd be happy if they are picked! > > > > For controller and namespace devices, 'subsys' property has been > > introduced to map them to a subsystem. If multi-controller needed, we > > can specify 'subsys' to controllers the same. > > > > For namespace deivice, if 'subsys' is not given just like it was, it > > will have to be provided with 'bus' parameter to specify a nvme > > controller device to attach, it means, they are mutual-exlusive. To > > share a namespace between or among controllers, then nvme-ns should have > > 'subsys' property to a single nvme subsystem instance. To make a > > namespace private one, then we need to specify 'bus' property rather > > than the 'subsys'. > > > > Of course, this series does not require any updates for the run command > > for the previos users. 
> > > > Plase refer the following example with nvme-cli output: > > > > QEMU Run: > > -device nvme-subsys,id=subsys0 \ > > -device nvme,serial=foo,id=nvme0,subsys=subsys0 \ > > -device nvme,serial=bar,id=nvme1,subsys=subsys0 \ > > -device nvme,serial=baz,id=nvme2,subsys=subsys0 \ > > -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \ > > -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \ > > \ > > -device nvme,serial=qux,id=nvme3 \ > > -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 > > > > nvme-cli: > > root@vm:~/work# nvme list -v > > NVM Express Subsystems > > > > SubsystemSubsystem-NQN > > Controllers > > > > > > > > nvme-subsys1 nqn.2019-08.org.qemu:subsys0 > > nvme0, nvme1, nvme2 > > nvme-subsys3 nqn.2019-08.org.qemu:qux > > nvme3 > > > > NVM Express Controllers > > > > Device SN MN FR > > TxPort AddressSubsystemNamespaces > > > > -- -- > > nvme0foo QEMU NVMe Ctrl > > 1.0 pcie :00:06.0 nvme-subsys1 nvme1n1 > > nvme1bar QEMU NVMe Ctrl > > 1.0 pcie :00:07.0 nvme-subsys1 nvme1n1 > > nvme2baz QEMU NVMe Ctrl > > 1.0 pcie :00:08.0 nvme-subsys1 nvme1n1, nvme1n2 > > nvme3qux QEMU NVMe Ctrl > > 1.0 pcie :00:09.0 nvme-subsys3 > > > > NVM Express Namespaces > > > > Device NSID Usage Format > > Controllers > > -- > > > > nvme1n1 1134.22 MB / 134.22 MB512 B + 0 B nvme0, > > nvme1, nvme2 > > nvme1n2 2268.44 MB / 268.44 MB512 B + 0 B nvme2 > > nvme3n1 3268.44 MB / 268.44 MB512 B + 0 B nvme3 > > > > Summary: > > - Refactored nvme-ns device not to rely on controller during the > > setup. [1/11 - 5/11] > > - Introduced a nvme-subsys device model. [6/11] > > - Create subsystem NQN based on subsystem. [7/11]
Re: [RFC PATCH V2 10/11] hw/block/nvme: add NMIC enum value for Identify Namespace
> Let's keep convention (or should be convention...) of using NvmeIdNs > prefix for identify data structure fields on the enum. Okay!
Re: [PATCH v2 10/12] hw/block/nvme: move cmb logic to v1.4
On 21-01-18 20:23:30, Klaus Jensen wrote: > On Jan 18 14:22, Klaus Jensen wrote: > > On Jan 18 22:12, Minwoo Im wrote: > > > On 21-01-18 14:10:50, Klaus Jensen wrote: > > > > On Jan 18 22:09, Minwoo Im wrote: > > > > > > > Yes, CMB in v1.4 is not backward-compatible, but is it okay to > > > > > > > move onto > > > > > > > the CMB v1.4 from v1.3 without supporting the v1.3 usage on this > > > > > > > device > > > > > > > model? > > > > > > > > > > > > Next patch moves to v1.4. I wanted to split it because the "bump" > > > > > > patch > > > > > > also adds a couple of other v1.4 requires fields. But I understand > > > > > > that > > > > > > this is slightly wrong. > > > > > > > > > > Sorry, I meant I'd like to have CMB for v1.3 support along with the > > > > > v1.4 > > > > > support in this device model separately. Maybe I missed the > > > > > linux-nvme > > > > > mailing list for CMB v1.4, but is there no plan to keep supportin CMB > > > > > v1.3 in NVMe driver? > > > > > > > > I posted a patch on linux-nvme for v1.4 support in the kernel. It will > > > > support both the v1.3 and v1.4 behavior :) > > > > > > Then, can we maintain CMB v1.3 also in QEMU also along with v1.4 ? :) > > > > My first version of this patch actually included compatibility support > > for v1.3 ("legacy cmb"), but Keith suggested we just drop that and keep > > this device compliant. > > > > What we could do is allow the spec version to be chosen with a > > parameter? > > Uhm. Maybe not. I gave this some more thought. > > Adding a device parameter to choose the specification version requires > us to maintain QEMU "compat" properties across different QEMU version. > I'm not sure we want that for something like this. Agreed. The default should head for the latest one. > > Maybe the best course of action actually *is* an 'x-legacy-cmb' > parameter. This looks fine. As the kernel driver does not remove the v1.3 CMB support, it would be great if QEMU supports that also.
Re: [PATCH v2 02/12] hw/block/nvme: fix 64 bit register hi/lo split writes
On 21-01-18 20:53:24, Klaus Jensen wrote: > On Jan 18 21:59, Minwoo Im wrote: > > > The spec requires aligned 32 bit accesses (or the size of the register), > > > so maybe it's about time we actually ignore instead of falling through. > > > > Agreed. > > > > On the other hand. > > The spec just allows undefined behavior if the alignment or size > requirement is violated. So falling through is not wrong. If so, maybe we can just make this MMIO region support accesses smaller than 4 bytes with error messaging. I don't think we should deliberately refuse to support them just because the spec leaves the behavior undefined; it is only undefined how to handle them.
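A minimal sketch of the ignore-instead-of-fall-through idea discussed here (illustrative only, not the actual nvme_mmio_write() check): the register region would accept only naturally aligned accesses of at least 32 bits and log-and-ignore the rest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The spec requires aligned 32-bit (or register-sized) accesses to the
 * controller registers.  One option is to validate the access up front
 * and ignore (with an error message) anything smaller or misaligned,
 * instead of falling through into undefined behavior. */
static bool mmio_access_ok(uint64_t addr, unsigned size)
{
    if (size < 4) {
        return false;           /* sub-dword access: log and ignore */
    }
    if (addr & (size - 1)) {    /* size is a power of two (4 or 8) */
        return false;           /* misaligned access: log and ignore */
    }
    return true;
}
```

The caller would then simply return early for a rejected access, leaving the register contents untouched.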
Re: [PATCH v2 10/12] hw/block/nvme: move cmb logic to v1.4
On 21-01-18 14:10:50, Klaus Jensen wrote: > On Jan 18 22:09, Minwoo Im wrote: > > > > Yes, CMB in v1.4 is not backward-compatible, but is it okay to move onto > > > > the CMB v1.4 from v1.3 without supporting the v1.3 usage on this device > > > > model? > > > > > > Next patch moves to v1.4. I wanted to split it because the "bump" patch > > > also adds a couple of other v1.4 required fields. But I understand that > > > this is slightly wrong. > > > > Sorry, I meant I'd like to have CMB for v1.3 support along with the v1.4 > > support in this device model separately. Maybe I missed the linux-nvme > > mailing list for CMB v1.4, but is there no plan to keep supporting CMB > > v1.3 in NVMe driver? > > I posted a patch on linux-nvme for v1.4 support in the kernel. It will > support both the v1.3 and v1.4 behavior :) Then, can we maintain CMB v1.3 in QEMU as well, along with v1.4? :)
Re: [PATCH v2 10/12] hw/block/nvme: move cmb logic to v1.4
> > Yes, CMB in v1.4 is not backward-compatible, but is it okay to move onto > > the CMB v1.4 from v1.3 without supporting the v1.3 usage on this device > > model? > > Next patch moves to v1.4. I wanted to split it because the "bump" patch > also adds a couple of other v1.4 required fields. But I understand that > this is slightly wrong. Sorry, I meant I'd like to have CMB v1.3 support along with the v1.4 support in this device model, separately. Maybe I missed it on the linux-nvme mailing list for CMB v1.4, but is there no plan to keep supporting CMB v1.3 in the NVMe driver?
Re: [PATCH v2 10/12] hw/block/nvme: move cmb logic to v1.4
Hello, On 21-01-18 10:47:03, Klaus Jensen wrote: > From: Padmakar Kalghatgi > > Implement v1.4 logic for configuring the Controller Memory Buffer. This > is not backward compatible with v1.3, so drivers that only support v1.3 > will not be able to use the CMB anymore. > > Signed-off-by: Padmakar Kalghatgi > Signed-off-by: Klaus Jensen Yes, CMB in v1.4 is not backward-compatible, but is it okay to move onto the CMB v1.4 from v1.3 without supporting the v1.3 usage on this device model?
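For context on why the v1.4 CMB is not backward compatible: in v1.4 the host must set CMBMSC.CRE before the CMBLOC/CMBSZ registers become valid, so a v1.3-only driver that never writes CMBMSC sees no CMB at all. A simplified standalone model of that gating (not the QEMU implementation; the register layout is reduced to two fields, and the bit names follow the NVMe 1.4 spec):

```c
#include <assert.h>
#include <stdint.h>

#define CMBMSC_CRE  (1ULL << 0)   /* Capabilities Registers Enabled */
#define CMBMSC_CMSE (1ULL << 1)   /* Controller Memory Space Enable */

struct cmb_model {
    uint64_t cmbmsc;   /* host-written in v1.4; a v1.3 driver never touches it */
    uint64_t cmbsz;    /* size register, valid only once CRE is set */
};

static uint64_t cmb_read_cmbsz(const struct cmb_model *cmb)
{
    /* v1.4 behavior: CMBSZ reads as 0 until the host sets CMBMSC.CRE,
     * which is exactly what breaks a v1.3-only driver. */
    if (!(cmb->cmbmsc & CMBMSC_CRE)) {
        return 0;
    }
    return cmb->cmbsz;
}
```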
Re: [PATCH v2 02/12] hw/block/nvme: fix 64 bit register hi/lo split writes
> The spec requires aligned 32 bit accesses (or the size of the register), > so maybe it's about time we actually ignore instead of falling through. Agreed.
Re: [PATCH v2 07/12] hw/block/nvme: remove redundant zeroing of PMR registers
On 21-01-18 10:47:00, Klaus Jensen wrote: > From: Klaus Jensen > > The controller registers are initially zero. Remove the redundant > zeroing. > > Signed-off-by: Klaus Jensen > --- > hw/block/nvme.c | 35 --- > 1 file changed, 35 deletions(-) > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c > index f3bea582b3c0..9ee9570bb65c 100644 > --- a/hw/block/nvme.c > +++ b/hw/block/nvme.c > @@ -4179,43 +4179,8 @@ static void nvme_init_cmb(NvmeCtrl *n, PCIDevice > *pci_dev) > > static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev) > { > -/* PMR Capabities register */ > -n->bar.pmrcap = 0; > -NVME_PMRCAP_SET_RDS(n->bar.pmrcap, 0); > -NVME_PMRCAP_SET_WDS(n->bar.pmrcap, 0); > NVME_PMRCAP_SET_BIR(n->bar.pmrcap, NVME_PMR_BIR); > -NVME_PMRCAP_SET_PMRTU(n->bar.pmrcap, 0); > -/* Turn on bit 1 support */ This comment says that bit 1 of PMRWBM is set in PMRCAP below :). > NVME_PMRCAP_SET_PMRWBM(n->bar.pmrcap, 0x02); > -NVME_PMRCAP_SET_PMRTO(n->bar.pmrcap, 0); > -NVME_PMRCAP_SET_CMSS(n->bar.pmrcap, 0); > - > -/* PMR Control register */ > -n->bar.pmrctl = 0; > -NVME_PMRCTL_SET_EN(n->bar.pmrctl, 0); > - > -/* PMR Status register */ > -n->bar.pmrsts = 0; > -NVME_PMRSTS_SET_ERR(n->bar.pmrsts, 0); > -NVME_PMRSTS_SET_NRDY(n->bar.pmrsts, 0); > -NVME_PMRSTS_SET_HSTS(n->bar.pmrsts, 0); > -NVME_PMRSTS_SET_CBAI(n->bar.pmrsts, 0); > - > -/* PMR Elasticity Buffer Size register */ > -n->bar.pmrebs = 0; > -NVME_PMREBS_SET_PMRSZU(n->bar.pmrebs, 0); > -NVME_PMREBS_SET_RBB(n->bar.pmrebs, 0); > -NVME_PMREBS_SET_PMRWBZ(n->bar.pmrebs, 0); > - > -/* PMR Sustained Write Throughput register */ > -n->bar.pmrswtp = 0; > -NVME_PMRSWTP_SET_PMRSWTU(n->bar.pmrswtp, 0); > -NVME_PMRSWTP_SET_PMRSWTV(n->bar.pmrswtp, 0); > - > -/* PMR Memory Space Control register */ > -n->bar.pmrmsc = 0; > -NVME_PMRMSC_SET_CMSE(n->bar.pmrmsc, 0); > -NVME_PMRMSC_SET_CBA(n->bar.pmrmsc, 0); > > pci_register_bar(pci_dev, NVME_PMRCAP_BIR(n->bar.pmrcap), > PCI_BASE_ADDRESS_SPACE_MEMORY | > -- > 2.30.0 > >
Re: [PATCH v2 06/12] hw/block/nvme: rename PMR/CMB shift/mask fields
On 21-01-18 10:46:59, Klaus Jensen wrote: > From: Klaus Jensen > > Use the correct field names. > > Signed-off-by: Klaus Jensen > --- > include/block/nvme.h | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/include/block/nvme.h b/include/block/nvme.h > index 86d7fc2f905c..f3cbe17d0971 100644 > --- a/include/block/nvme.h > +++ b/include/block/nvme.h > @@ -35,8 +35,8 @@ enum NvmeCapShift { > CAP_CSS_SHIFT = 37, > CAP_MPSMIN_SHIFT = 48, > CAP_MPSMAX_SHIFT = 52, > -CAP_PMR_SHIFT = 56, > -CAP_CMB_SHIFT = 57, > +CAP_PMRS_SHIFT = 56, > +CAP_CMBS_SHIFT = 57, > }; > > enum NvmeCapMask { > @@ -49,8 +49,8 @@ enum NvmeCapMask { > CAP_CSS_MASK = 0xff, > CAP_MPSMIN_MASK= 0xf, > CAP_MPSMAX_MASK= 0xf, > -CAP_PMR_MASK = 0x1, > -CAP_CMB_MASK = 0x1, > +CAP_PMRS_MASK = 0x1, > +CAP_CMBS_MASK = 0x1, > }; > > #define NVME_CAP_MQES(cap) (((cap) >> CAP_MQES_SHIFT) & CAP_MQES_MASK) > @@ -81,10 +81,10 @@ enum NvmeCapMask { > << > CAP_MPSMIN_SHIFT) > #define NVME_CAP_SET_MPSMAX(cap, val) (cap |= (uint64_t)(val & > CAP_MPSMAX_MASK)\ > << > CAP_MPSMAX_SHIFT) > -#define NVME_CAP_SET_PMRS(cap, val) (cap |= (uint64_t)(val & CAP_PMR_MASK) > \ > - << CAP_PMR_SHIFT) > -#define NVME_CAP_SET_CMBS(cap, val) (cap |= (uint64_t)(val & CAP_CMB_MASK) > \ > - << CAP_CMB_SHIFT) > +#define NVME_CAP_SET_PMRS(cap, val) (cap |= (uint64_t)(val & > CAP_PMRS_MASK) \ > + << CAP_PMRS_SHIFT) > +#define NVME_CAP_SET_CMBS(cap, val) (cap |= (uint64_t)(val & > CAP_CMBS_MASK) \ > + << CAP_CMBS_SHIFT) Oh, it would have been better folded into [3/12] patch, though. Changes are looking good to me to represent "Supported". Reviewed-by: Minwoo Im > > enum NvmeCapCss { > NVME_CAP_CSS_NVM= 1 << 0, > -- > 2.30.0 > >
Re: [PATCH v2 05/12] hw/block/nvme: allow cmb and pmr to coexist
Reviewed-by: Minwoo Im
Re: [PATCH v2 04/12] hw/block/nvme: move msix table and pba to BAR 0
On 21-01-18 10:46:57, Klaus Jensen wrote: > From: Klaus Jensen > > In the interest of supporting both CMB and PMR to be enabled on the same > device, move the MSI-X table and pending bit array out of BAR 4 and into > BAR 0. Nice! Reviewed-by: Minwoo Im Tested-by: Minwoo Im Thanks,
Re: [PATCH v2 03/12] hw/block/nvme: indicate CMB support through controller capabilities register
Reviewed-by: Minwoo Im
Re: [PATCH v2 02/12] hw/block/nvme: fix 64 bit register hi/lo split writes
On 21-01-18 10:46:55, Klaus Jensen wrote: > From: Klaus Jensen > > 64 bit registers like ASQ and ACQ should be writable by both a hi/lo 32 > bit write combination as well as a plain 64 bit write. The spec does not > define ordering on the hi/lo split, but the code currently assumes that > the low order bits are written first. Additionally, the code does not > consider that another address might already have been written into the > register, causing the OR'ing to result in a bad address. > > Fix this by explicitly overwriting only the low or high order bits for > 32 bit writes. > > Signed-off-by: Klaus Jensen > --- > hw/block/nvme.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c > index bd7e258c3c26..40b9f8ccfc0e 100644 > --- a/hw/block/nvme.c > +++ b/hw/block/nvme.c > @@ -3781,19 +3781,21 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr > offset, uint64_t data, > trace_pci_nvme_mmio_aqattr(data & 0xffffffff); > break; > case 0x28: /* ASQ */ > -n->bar.asq = data; > +n->bar.asq = size == 8 ? data : > +(n->bar.asq & ~0xffffffff) | (data & 0xffffffff); ^^^ If this is meant to clear the lower 32 bits while keeping the higher 32 bits, then it should be: (n->bar.asq & ~0xffffffffULL) Also, maybe we should take care of accesses smaller than 4 bytes as: .min_access_size = 2, .max_access_size = 8, Even though we have some error messages up there with: if (unlikely(size < sizeof(uint32_t))) { NVME_GUEST_ERR(pci_nvme_ub_mmiowr_toosmall, "MMIO write smaller than 32-bits," " offset=0x%"PRIx64", size=%u", offset, size); /* should be ignored, fall through for now */ } It's a fall-through error, and that's it. I would prefer either not to have this error and also support 2-byte accesses, OR to not support 2-byte accesses for this MMIO area at all. Thanks!
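The fix under discussion can be modeled standalone like this (the helper name and shape are made up for illustration; `hi` selects the high 32-bit half for a 32-bit write). Each half-write overwrites only its own half, so the hi/lo pair works in either order and a stale address in the register cannot leak into the OR:

```c
#include <assert.h>
#include <stdint.h>

/* Write a 64-bit register (e.g. ASQ/ACQ) from either a plain 64-bit
 * write or one half of a hi/lo 32-bit write pair. Note the ULL-typed
 * masks: a plain ~0xffffffff would be truncated before promotion. */
static void write_reg64(uint64_t *reg, uint64_t data, unsigned size, int hi)
{
    if (size == 8) {
        *reg = data;                                      /* full write */
    } else if (hi) {
        *reg = (*reg & 0xffffffffULL) | (data << 32);     /* high half */
    } else {
        *reg = (*reg & ~0xffffffffULL) | (data & 0xffffffffULL); /* low half */
    }
}
```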
Re: [PATCH v2 01/12] hw/block/nvme: add size to mmio read/write trace events
Reviewed-by: Minwoo Im
[RFC PATCH V2 11/11] hw/block/nvme: support for shared namespace in subsystem
The nvme-ns device is registered to an nvme controller device during initialization in nvme_register_namespace() in case the 'bus' property is given, which means it's mapped to a single controller. This patch introduces a new property 'subsys', just like the controller device instance has, to map a namespace to an NVMe subsystem. If the 'subsys' property is given to the nvme-ns device, it will belong to the specified subsystem and will be attached to all controllers in that subsystem by enabling the shared namespace capability in NMIC (Namespace Multi-path I/O and Namespace Sharing Capabilities) in Identify Namespace. Usage: -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 -device nvme,serial=bar,id=nvme1,subsys=subsys0 -device nvme,serial=baz,id=nvme2,subsys=subsys0 -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0 # Shared -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2 # Non-shared In the above example, 'ns1' will be shared with 'nvme0' and 'nvme1' in the same subsystem. On the other hand, 'ns2' will be attached to 'nvme2' only, as a private namespace in that subsystem. 
Signed-off-by: Minwoo Im --- hw/block/nvme-ns.c | 23 ++- hw/block/nvme-ns.h | 7 +++ hw/block/nvme-subsys.c | 17 + hw/block/nvme-subsys.h | 1 + hw/block/nvme.c| 10 +- 5 files changed, 52 insertions(+), 6 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index c8b75fa02138..073f65e49cac 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp) id_ns->npda = id_ns->npdg = npdg - 1; +if (nvme_ns_shared(ns)) { +id_ns->nmic |= NVME_NMIC_NS_SHARED; +} + return 0; } @@ -358,16 +362,25 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) return; } -if (nvme_register_namespace(n, ns, errp)) { -error_propagate_prepend(errp, local_err, -"could not register namespace: "); -return; +if (ns->subsys) { +if (nvme_subsys_register_ns(ns, errp)) { +error_propagate_prepend(errp, local_err, +"could not setup namespace to subsys: "); +return; +} +} else { +if (nvme_register_namespace(n, ns, errp)) { +error_propagate_prepend(errp, local_err, +"could not register namespace: "); +return; +} } - } static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), +DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS, + NvmeSubsystem *), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false), diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index d87c025b0ab6..3fc7262a7915 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -47,6 +47,8 @@ typedef struct NvmeNamespace { const uint32_t *iocs; uint8_t csi; +NvmeSubsystem *subsys; + NvmeIdNsZoned *id_ns_zoned; NvmeZone*zone_array; QTAILQ_HEAD(, NvmeZone) exp_open_zones; @@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns) return -1; } +static inline bool nvme_ns_shared(NvmeNamespace *ns) +{ +return !!ns->subsys; +} + static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace 
*ns) { NvmeIdNs *id_ns = >id_ns; diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c index a01003136b12..d67160a0ed6f 100644 --- a/hw/block/nvme-subsys.c +++ b/hw/block/nvme-subsys.c @@ -43,6 +43,23 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) return cntlid; } +int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp) +{ +NvmeSubsystem *subsys = ns->subsys; +NvmeCtrl *n; +int i; + +for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) { +n = subsys->ctrls[i]; + +if (n && nvme_register_namespace(n, ns, errp)) { +return -1; +} +} + +return 0; +} + static void nvme_subsys_setup(NvmeSubsystem *subsys) { char *subnqn; diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 4eba50d96a1d..cae0fbd464e2 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -25,5 +25,6 @@ typedef struct NvmeSubsystem { } NvmeSubsystem; int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp); +int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp); #endif /* NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 2aeb164dd508..7869bd95b099 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -25,7 +25,8 @@ * mdts=,zoned.append_s
[RFC PATCH V2 10/11] hw/block/nvme: add NMIC enum value for Identify Namespace
Added the Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC) field to support namespaces shared by controller(s). This field is at byte [30] of the Identify Namespace data structure. Signed-off-by: Minwoo Im --- include/block/nvme.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/block/nvme.h b/include/block/nvme.h index e83ec1e124c6..dd7b5ba9ef4c 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -1110,6 +1110,10 @@ enum NvmeNsIdentifierType { NVME_NIDT_CSI = 0x04, }; +enum NvmeNsNmic { +NVME_NMIC_NS_SHARED = 1 << 0, +}; + enum NvmeCsi { NVME_CSI_NVM = 0x00, NVME_CSI_ZONED = 0x02, -- 2.17.1
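A minimal sketch of what the new enum value encodes: bit 0 of the NMIC byte in Identify Namespace indicates the namespace may be attached to two or more controllers in the subsystem (the helper function is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

enum NvmeNsNmic {
    NVME_NMIC_NS_SHARED = 1 << 0,
};

/* True when the controller reports this namespace as shareable. */
static int nvme_nmic_is_shared(uint8_t nmic)
{
    return !!(nmic & NVME_NMIC_NS_SHARED);
}
```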
[RFC PATCH V2 09/11] hw/block/nvme: support for multi-controller in subsystem
We have nvme-subsys and nvme devices mapped together. To support a multi-controller scheme in this setup, the controller identifier (id) has to be managed. Earlier, cntlid (controller id) used to always be 0 because we didn't have any subsystem scheme in which the controller id matters. This patch introduces a 'cntlid' attribute to the nvme controller instance (NvmeCtrl) and makes it allocated by the nvme-subsys device mapped to the controller. If nvme-subsys is not given to the controller, then it will always be 0 as before. Added a 'ctrls' array in the nvme-subsys instance to manage the controllers attached to the subsystem, with a limit of 32. This patch does not use a list for the controllers, to keep it consistent with the nvme-ns device. Signed-off-by: Minwoo Im --- hw/block/nvme-subsys.c | 21 + hw/block/nvme-subsys.h | 4 hw/block/nvme.c | 32 ++-- hw/block/nvme.h | 1 + 4 files changed, 56 insertions(+), 2 deletions(-) diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c index f1dc71d588d9..a01003136b12 100644 --- a/hw/block/nvme-subsys.c +++ b/hw/block/nvme-subsys.c @@ -22,6 +22,27 @@ #include "nvme.h" #include "nvme-subsys.h" +int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) +{ +NvmeSubsystem *subsys = n->subsys; +int cntlid; + +for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) { +if (!subsys->ctrls[cntlid]) { +break; +} +} + +if (cntlid == ARRAY_SIZE(subsys->ctrls)) { +error_setg(errp, "no more free controller id"); +return -1; +} + +subsys->ctrls[cntlid] = n; + +return cntlid; +} + static void nvme_subsys_setup(NvmeSubsystem *subsys) { char *subnqn; diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h index 40f06a4c7db0..4eba50d96a1d 100644 --- a/hw/block/nvme-subsys.h +++ b/hw/block/nvme-subsys.h @@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace; typedef struct NvmeSubsystem { DeviceState parent_obj; uint8_t subnqn[256]; + +NvmeCtrl *ctrls[NVME_SUBSYS_MAX_CTRLS]; } NvmeSubsystem; +int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp); + #endif 
/* NVME_SUBSYS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index af5b2162e2b5..2aeb164dd508 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -4242,7 +4242,7 @@ static void nvme_init_subnqn(NvmeCtrl *n) } } -static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) +static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t cntlid) { NvmeIdCtrl *id = >id_ctrl; uint8_t *pci_conf = pci_dev->config; @@ -4252,6 +4252,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' '); strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' '); strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' '); + +id->cntlid = cntlid; + id->rab = 6; id->ieee[0] = 0x00; id->ieee[1] = 0x02; @@ -4297,6 +4300,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->psd[0].enlat = cpu_to_le32(0x10); id->psd[0].exlat = cpu_to_le32(0x4); +if (n->subsys) { +id->cmic |= NVME_CMIC_MULTI_CTRL; +} + NVME_CAP_SET_MQES(n->bar.cap, 0x7ff); NVME_CAP_SET_CQR(n->bar.cap, 1); NVME_CAP_SET_TO(n->bar.cap, 0xf); @@ -4309,11 +4316,28 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) n->bar.intmc = n->bar.intms = 0; } +static int nvme_init_subsys(NvmeCtrl *n, Error **errp) +{ +int cntlid; + +if (!n->subsys) { +return 0; +} + +cntlid = nvme_subsys_register_ctrl(n, errp); +if (cntlid < 0) { +return -1; +} + +return cntlid; +} + static void nvme_realize(PCIDevice *pci_dev, Error **errp) { NvmeCtrl *n = NVME(pci_dev); NvmeNamespace *ns; Error *local_err = NULL; +int cntlid; nvme_check_constraints(n, _err); if (local_err) { @@ -4329,7 +4353,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) return; } -nvme_init_ctrl(n, pci_dev); +cntlid = nvme_init_subsys(n, errp); +if (cntlid < 0) { +return; +} +nvme_init_ctrl(n, pci_dev, cntlid); /* setup a namespace if the controller drive property was given */ if (n->namespace.blkconf.blk) { diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 
2061e53e2cee..329902a9fc04 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -129,6 +129,7 @@ typedef struct NvmeCtrl { NvmeBus bus; BlockConf conf; +uint16_t cntlid; bool qs_created; uint32_t page_size; uint16_t page_bits; -- 2.17.1
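The cntlid allocation in nvme_subsys_register_ctrl() from patch 09/11 boils down to a first-free-slot scan over a fixed-size array. A standalone model of that logic (names simplified, `void *` standing in for NvmeCtrl; the 32-slot limit mirrors NVME_SUBSYS_MAX_CTRLS in the patch):

```c
#include <assert.h>
#include <stddef.h>

#define SUBSYS_MAX_CTRLS 32

/* Claim the first free slot in the subsystem's controller array and
 * return its index as the controller id, or -1 when full. */
static int subsys_alloc_cntlid(void *ctrls[SUBSYS_MAX_CTRLS], void *ctrl)
{
    int cntlid;

    for (cntlid = 0; cntlid < SUBSYS_MAX_CTRLS; cntlid++) {
        if (!ctrls[cntlid]) {
            ctrls[cntlid] = ctrl;
            return cntlid;
        }
    }
    return -1;  /* "no more free controller id" */
}
```

Because ids are just array indices, a controller registered without a subsystem can simply keep cntlid 0, matching the pre-patch behavior described in the commit message.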