Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-12 Thread Matthew R. Ochs
> On Aug 11, 2015, at 10:54 PM, Michael Neuling  wrote:
> 
> 
 +  pr_debug("%s: Wait for user context to quiesce...\n", __func__);
 +  wake_up_all(&cfg->limbo_waitq);
 +  ssleep(1);
>>> 
>>> Why 1 sec and why in a loop?  Can’t you poll/wait for completion somewhere?
>> 
>> This routine is called when the device is being removed and we need to stall 
>> until
>> the user contexts are made aware of removal [by marking them in error] and 
>> have
>> completely gone away [there are no longer any contexts in our table or list].
>> 
>> The 1 second sleep is simply to give the users time to cleanup once they see
>> the notification. We can make it a larger/smaller value or remove it 
>> entirely, but
>> I felt that since this is not a hot path there was no reason to perform a 
>> busy-wait
>> style of loop here and yield to someone else.
>> 
>> As for the completion/poll, I did consider that but couldn’t come up with a 
>> clean
>> way of implementing given how we designed the notification/cleanup mechanism
>> (really just a failed recovery). We can certainly look at doing that as an
>> enhancement in the future.
> 
> Does the API allow this flexibility in the future?

Yes, I believe it is flexible enough where it could allow for this.

>> 
 +  rhte_checkin(ctxi, rhte);
 +  cxlflash_lun_detach(gli);
 +
 +out:
 +  if (unlock_ctx)
 +  mutex_unlock(&ctxi->mutex);
>>> 
>>> Where is the matching lock for this?
>> 
>> One of the main design points of our context serialization strategy is that
>> any context returned from get_context() has been fully validated, will not
>> be removed from under you, and _is holding the context mutex_. Thus
>> for each of these mutex_unlock(ctxi) you see at the bottom of external
>> entry points, the mutex was obtained in get_context().
> 
> Should we have something like put_context() that does this?  So there is
> matching calls to get/put_context

Yes, this is a really good idea (I wish I’d thought of it ;)) that will make for
better code. Look for it in v5!

> 
>> 
 +  char *tmp = NULL;
 +  size_t size;
 +  struct afu *afu = cfg->afu;
 +  struct ctx_info *ctxi = NULL;
 +  struct sisl_rht_entry *rhte;
 +
 +  size = (MAX_RHT_PER_CONTEXT * sizeof(*ctxi->rht_lun));
 +  size += sizeof(*ctxi);
 +
 +  tmp = kzalloc(size, GFP_KERNEL);
>>> 
>>> Just do two allocs. One for ctxi and one for rht_lun.  This is overly
>>> complicated.
>> 
>> I disagree that it’s overly complicated. The intention here was to get this
>> stuff together to avoid multiple function calls and possibly help out 
>> perf-wise
>> via locality. That said, I’m not opposed to performing multiple allocations.
> 
> I'm not sure I buy it's a performance issue on create_context().  How
> often are you calling that?  Doesn't that do a lot of things other than
> just this?
> 
> Anyway if you want to do that, that's ok I guess, but you need to
> document why and what you’re doing.

We’ve already made the change to perform multiple allocations and
are ‘okay’ with that.

 +  hdr = (struct dk_cxlflash_hdr *)&buf;
 +  if (hdr->version != 0) {
 +  pr_err("%s: Version %u not supported for %s\n",
 + __func__, hdr->version, decode_ioctl(cmd));
 +  rc = -EINVAL;
 +  goto cxlflash_ioctl_exit;
 +  }
>>> 
>>> Do you advertise this version anywhere?  Users just have to call it and 
>>> fail?
>> 
>> We don’t. We can add a version define to the exported ioctl header.
> 
> It needs to be exported dynamically by the kernel.  Not the headers.
> Think new software on old kernel and visa versa.

Okay, will look into this.

 +#define DK_CXLFLASH_ATTACHCXL_IOWR(0x80, 
 dk_cxlflash_attach)
 +#define DK_CXLFLASH_USER_DIRECT   CXL_IOWR(0x81, 
 dk_cxlflash_udirect)
 +#define DK_CXLFLASH_RELEASE   CXL_IOWR(0x84, 
 dk_cxlflash_release)
 +#define DK_CXLFLASH_DETACHCXL_IOWR(0x85, 
 dk_cxlflash_detach)
 +#define DK_CXLFLASH_VERIFYCXL_IOWR(0x86, 
 dk_cxlflash_verify)
 +#define DK_CXLFLASH_RECOVER_AFU   CXL_IOWR(0x88, 
 dk_cxlflash_recover_afu)
 +#define DK_CXLFLASH_MANAGE_LUNCXL_IOWR(0x89, 
 dk_cxlflash_manage_lun)
>>> 
>>> I'm not sure I'd leave these sparse.  What happens if the vlun patches don't
>>> get in?
>> 
>> I do agree with you. I had wanted to keep them like this as their ordering 
>> matched
>> closer with how they are expected to be used. That said, I’m okay with moving
>> them around to avoid the sparseness.
> 
> Yeah, plus it breaks your table in the ioctl

Right. We had originally thought about adding reserved placeholders to
avoid the breakage in the ioctl table but have gone ahead and just reordered.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More ma

Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-12 Thread Matthew R. Ochs
Hi Wendy,

Thanks for reviewing. Comments inline below.


-matt

> On Aug 11, 2015, at 11:18 PM, wenxi...@linux.vnet.ibm.com wrote:
> Quoting "Matthew R. Ochs" :
>> +struct ctx_info *get_context(struct cxlflash_cfg *cfg, u64 rctxid,
>> + void *arg, enum ctx_ctrl ctx_ctrl)
>> +{
>> +struct ctx_info *ctxi = NULL;
>> +struct lun_access *lun_access = NULL;
>> +struct file *file = NULL;
>> +struct llun_info *lli = arg;
>> +u64 ctxid = DECODE_CTXID(rctxid);
>> +int rc;
>> +pid_t pid = current->tgid, ctxpid = 0;
>> +
>> +if (ctx_ctrl & CTX_CTRL_FILE) {
>> +lli = NULL;
>> +file = (struct file *)arg;
>> +}
>> +
>> +if (ctx_ctrl & CTX_CTRL_CLONE)
>> +pid = current->parent->tgid;
>> +
>> +if (likely(ctxid < MAX_CONTEXT)) {
>> +retry:
>> +rc = mutex_lock_interruptible(&cfg->ctx_tbl_list_mutex);
>> +if (rc)
>> +goto out;
>> +
> 
> if (mutex_lock_interruptible(&cfg->ctx_tbl_list_mutex))
>   goto out;
> or  return ctxi;

I’d like to keep the goto out so we can capture what is being returned
under a common trace that exists at ‘out’.

> 
>> +ctxi = cfg->ctx_tbl[ctxid];
>> +if (ctxi)
>> +if ((file && (ctxi->file != file)) ||
>> +(!file && (ctxi->ctxid != rctxid)))
>> +ctxi = NULL;
>> +
> 
> Should you combine two “if" to one "if"?

The logic in this routine is fairly complex. I intentionally structured
the conditional clauses in an effort to maximize the ease of quickly
comprehending what is taking place without polluting the code with
a lot of comments.

If you feel very strongly about this being a necessary fix and one that
would improve the quick comprehension of what is taking place than
I would be willing to consider changing this, but I feel it’s most clear
as is currently written.

> 
>> +if ((ctx_ctrl & CTX_CTRL_ERR) ||
>> +(!ctxi && (ctx_ctrl & CTX_CTRL_ERR_FALLBACK)))
>> +ctxi = find_error_context(cfg, rctxid, file);
>> +if (!ctxi) {
>> +mutex_unlock(&cfg->ctx_tbl_list_mutex);
>> +goto out;
>> +}
>> +
>> +/*
>> + * Need to acquire ownership of the context while still under
>> + * the table/list lock to serialize with a remove thread. Use
>> + * the 'try' to avoid stalling the table/list lock for a single
>> + * context.
>> + */
>> +rc = mutex_trylock(&ctxi->mutex);
>> +mutex_unlock(&cfg->ctx_tbl_list_mutex);
>> +if (!rc)
>> +goto retry;
>> +
>> +if (ctxi->unavail)
>> +goto denied;
>> +
>> +ctxpid = ctxi->pid;
>> +if (likely(!(ctx_ctrl & CTX_CTRL_NOPID)))
>> +if (pid != ctxpid)
>> +goto denied;
> 
> Should you combine above two “if" to one "if"?

Same response as above.

>> +spin_lock(&gli->slock);
>> +if (gli->mode == MODE_NONE)
>> +gli->mode = mode;
>> +else if (gli->mode != mode) {
>> +pr_err("%s: LUN operating in mode %d, requested mode %d\n",
>> +   __func__, gli->mode, mode);
>> +rc = -EINVAL;
>> +goto out;
>> +}
>> +
>> +gli->users++;
>> +WARN_ON(gli->users <= 0);
> 
> Does “gli->users" have upper limit?

No.

>> +void cxlflash_lun_detach(struct glun_info *gli)
>> +{
>> +spin_lock(&gli->slock);
>> +WARN_ON(gli->mode == MODE_NONE);
>> +if (--gli->users == 0)
>> +gli->mode = MODE_NONE;
>> +pr_debug("%s: gli->users=%u\n", __func__, gli->users);
>> +WARN_ON(gli->users < 0);
> 
> do you like to add a pr_debug(….) here?

I don’t quite follow what you mean. Are you suggesting to use a
pr_debug instead of a WARN?

These are intentionally made to be WARNs as they are indicative of
a driver bug and the data provided by the warn will help in tracking
down the root cause of the bug.

>> +
>> +out:
>> +if (unlock_ctx)
>> +mutex_unlock(&ctxi->mutex);
> 
> Should “mutex_lock(&ctxi->mutex);" in the same function?

This is something that Mikey Neuling also commented on. The mutex
is obtained in get_context(). Mikey made a suggestion that I’m going
to implement that will replace these with a ‘put_context’ routine.

>> +size = (MAX_RHT_PER_CONTEXT * sizeof(*ctxi->rht_lun));
>> +size += sizeof(*ctxi);
>> +
> 
> Combine above two lines code into one line code?

These allocations are now broken up, so this doesn’t exist anymore
as you’ll see in v5.

> 
>> +tmp = kzalloc(size, GFP_KERNEL);
>> +if (unlikely(!tmp)) {
>> +pr_err("%s: Unable to allocate context! (%ld)\n",
>> +   __func__, size);
>> +goto out;
>> +}
>> +
>> +rhte = (struct sisl_rht_entry *)g

Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-11 Thread wenxiong


Quoting "Matthew R. Ochs" :


Add superpipe supporting infrastructure to device driver for the IBM CXL
Flash adapter. This patch allows userspace applications to take advantage
of the accelerated I/O features that this adapter provides and bypass the
traditional filesystem stack.

Signed-off-by: Matthew R. Ochs 
Signed-off-by: Manoj N. Kumar 
---
Documentation/ioctl/ioctl-number.txt |1 +
Documentation/powerpc/cxlflash.txt   |  297 +
drivers/scsi/cxlflash/Makefile   |2 +-
drivers/scsi/cxlflash/common.h   |   19 +
drivers/scsi/cxlflash/main.c |   21 +-
drivers/scsi/cxlflash/superpipe.c| 2206  
++

drivers/scsi/cxlflash/superpipe.h|  127 ++
include/uapi/scsi/Kbuild |1 +
include/uapi/scsi/cxlflash_ioctl.h   |  139 +++
9 files changed, 2810 insertions(+), 3 deletions(-)
create mode 100644 Documentation/powerpc/cxlflash.txt
create mode 100644 drivers/scsi/cxlflash/superpipe.c
create mode 100644 drivers/scsi/cxlflash/superpipe.h
create mode 100644 include/uapi/scsi/cxlflash_ioctl.h







diff --git a/drivers/scsi/cxlflash/superpipe.c  
b/drivers/scsi/cxlflash/superpipe.c

new file mode 100644
index 000..802f1f5
--- /dev/null
+++ b/drivers/scsi/cxlflash/superpipe.c



+struct ctx_info *get_context(struct cxlflash_cfg *cfg, u64 rctxid,
+void *arg, enum ctx_ctrl ctx_ctrl)
+{
+   struct ctx_info *ctxi = NULL;
+   struct lun_access *lun_access = NULL;
+   struct file *file = NULL;
+   struct llun_info *lli = arg;
+   u64 ctxid = DECODE_CTXID(rctxid);
+   int rc;
+   pid_t pid = current->tgid, ctxpid = 0;
+
+   if (ctx_ctrl & CTX_CTRL_FILE) {
+   lli = NULL;
+   file = (struct file *)arg;
+   }
+
+   if (ctx_ctrl & CTX_CTRL_CLONE)
+   pid = current->parent->tgid;
+
+   if (likely(ctxid < MAX_CONTEXT)) {
+retry:
+   rc = mutex_lock_interruptible(&cfg->ctx_tbl_list_mutex);
+   if (rc)
+   goto out;
+


if (mutex_lock_interruptible(&cfg->ctx_tbl_list_mutex))
   goto out;
or  return ctxi;


+   ctxi = cfg->ctx_tbl[ctxid];
+   if (ctxi)
+   if ((file && (ctxi->file != file)) ||
+   (!file && (ctxi->ctxid != rctxid)))
+   ctxi = NULL;
+


Should you combine two "if" to one "if"?


+   if ((ctx_ctrl & CTX_CTRL_ERR) ||
+   (!ctxi && (ctx_ctrl & CTX_CTRL_ERR_FALLBACK)))
+   ctxi = find_error_context(cfg, rctxid, file);
+   if (!ctxi) {
+   mutex_unlock(&cfg->ctx_tbl_list_mutex);
+   goto out;
+   }
+
+   /*
+* Need to acquire ownership of the context while still under
+* the table/list lock to serialize with a remove thread. Use
+* the 'try' to avoid stalling the table/list lock for a single
+* context.
+*/
+   rc = mutex_trylock(&ctxi->mutex);
+   mutex_unlock(&cfg->ctx_tbl_list_mutex);
+   if (!rc)
+   goto retry;
+
+   if (ctxi->unavail)
+   goto denied;
+
+   ctxpid = ctxi->pid;
+   if (likely(!(ctx_ctrl & CTX_CTRL_NOPID)))
+   if (pid != ctxpid)
+   goto denied;


Should you combine above two "if" to one "if"?


+
+   if (lli) {
+   list_for_each_entry(lun_access, &ctxi->luns, list)
+   if (lun_access->lli == lli)
+   goto out;
+   goto denied;
+   }
+   }
+
+out:
+   pr_debug("%s: rctxid=%016llX ctxinfo=%p ctxpid=%u pid=%u ctx_ctrl=%u\n",
+__func__, rctxid, ctxi, ctxpid, pid, ctx_ctrl);
+
+   return ctxi;
+
+denied:
+   mutex_unlock(&ctxi->mutex);
+   ctxi = NULL;
+   goto out;
+}



+/**
+ * cxlflash_lun_attach() - attaches a user to a LUN and manages the  
LUN's mode

+ * @gli:   LUN to attach.
+ * @mode:  Desired mode of the LUN.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int cxlflash_lun_attach(struct glun_info *gli, enum lun_mode mode)
+{
+   int rc = 0;
+
+   spin_lock(&gli->slock);
+   if (gli->mode == MODE_NONE)
+   gli->mode = mode;
+   else if (gli->mode != mode) {
+   pr_err("%s: LUN operating in mode %d, requested mode %d\n",
+  __func__, gli->mode, mode);
+   rc = -EINVAL;
+   goto out;
+   }
+
+   gli->users++;
+   WARN_ON(gli->users <= 0);


Does "gli->users" have upper limit?


+out:
+   pr_debug("%s: Returning rc=%d gli->mode=%u gli->users=%u\n",
+__func__, rc, gli->mode, gli->users);
+   spin_unlock(&gli->sloc

Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-11 Thread Michael Neuling

> >> +  pr_debug("%s: Wait for user context to quiesce...\n", __func__);
> >> +  wake_up_all(&cfg->limbo_waitq);
> >> +  ssleep(1);
> > 
> > Why 1 sec and why in a loop?  Can’t you poll/wait for completion somewhere?
> 
> This routine is called when the device is being removed and we need to stall 
> until
> the user contexts are made aware of removal [by marking them in error] and 
> have
> completely gone away [there are no longer any contexts in our table or list].
> 
> The 1 second sleep is simply to give the users time to cleanup once they see
> the notification. We can make it a larger/smaller value or remove it 
> entirely, but
> I felt that since this is not a hot path there was no reason to perform a 
> busy-wait
> style of loop here and yield to someone else.
> 
> As for the completion/poll, I did consider that but couldn’t come up with a 
> clean
> way of implementing given how we designed the notification/cleanup mechanism
> (really just a failed recovery). We can certainly look at doing that as an
> enhancement in the future.

Does the API allow this flexibility in the future?

> 
> >> +  if (likely(ctxid < MAX_CONTEXT)) {
> >> +retry:
> >> +  rc = mutex_lock_interruptible(&cfg->ctx_tbl_list_mutex);
> >> +  if (rc)
> >> +  goto out;
> >> +
> >> +  ctxi = cfg->ctx_tbl[ctxid];
> >> +  if (ctxi)
> >> +  if ((file && (ctxi->file != file)) ||
> >> +  (!file && (ctxi->ctxid != rctxid)))
> >> +  ctxi = NULL;
> >> +
> >> +  if ((ctx_ctrl & CTX_CTRL_ERR) ||
> >> +  (!ctxi && (ctx_ctrl & CTX_CTRL_ERR_FALLBACK)))
> >> +  ctxi = find_error_context(cfg, rctxid, file);
> >> +  if (!ctxi) {
> >> +  mutex_unlock(&cfg->ctx_tbl_list_mutex);
> >> +  goto out;
> >> +  }
> >> +
> >> +  /*
> >> +   * Need to acquire ownership of the context while still under
> >> +   * the table/list lock to serialize with a remove thread. Use
> >> +   * the 'try' to avoid stalling the table/list lock for a single
> >> +   * context.
> >> +   */
> >> +  rc = mutex_trylock(&ctxi->mutex);
> >> +  mutex_unlock(&cfg->ctx_tbl_list_mutex);
> >> +  if (!rc)
> >> +  goto retry;
> > 
> > Please just create a loop rather than this goto retry.
> 
> Okay.
> 
> >> +  rhte_checkin(ctxi, rhte);
> >> +  cxlflash_lun_detach(gli);
> >> +
> >> +out:
> >> +  if (unlock_ctx)
> >> +  mutex_unlock(&ctxi->mutex);
> > 
> > Where is the matching lock for this?
> 
> One of the main design points of our context serialization strategy is that
> any context returned from get_context() has been fully validated, will not
> be removed from under you, and _is holding the context mutex_. Thus
> for each of these mutex_unlock(ctxi) you see at the bottom of external
> entry points, the mutex was obtained in get_context().

Should we have something like put_context() that does this?  So there is
matching calls to get/put_context

> 
> >> +  char *tmp = NULL;
> >> +  size_t size;
> >> +  struct afu *afu = cfg->afu;
> >> +  struct ctx_info *ctxi = NULL;
> >> +  struct sisl_rht_entry *rhte;
> >> +
> >> +  size = (MAX_RHT_PER_CONTEXT * sizeof(*ctxi->rht_lun));
> >> +  size += sizeof(*ctxi);
> >> +
> >> +  tmp = kzalloc(size, GFP_KERNEL);
> > 
> > Just do two allocs. One for ctxi and one for rht_lun.  This is overly
> > complicated.
> 
> I disagree that it’s overly complicated. The intention here was to get this
> stuff together to avoid multiple function calls and possibly help out 
> perf-wise
> via locality. That said, I’m not opposed to performing multiple allocations.

I'm not sure I buy it's a performance issue on create_context().  How
often are you calling that?  Doesn't that do a lot of things other than
just this?

Anyway if you want to do that, that's ok I guess, but you need to
document why and what you're doing. 

> Look for this in v5.
> 
> > 
> >> +  if (unlikely(!tmp)) {
> >> +  pr_err("%s: Unable to allocate context! (%ld)\n",
> >> + __func__, size);
> >> +  goto out;
> >> +  }
> >> +
> >> +  rhte = (struct sisl_rht_entry *)get_zeroed_page(GFP_KERNEL);
> >> +  if (unlikely(!rhte)) {
> >> +  pr_err("%s: Unable to allocate RHT!\n", __func__);
> >> +  goto err;
> >> +  }
> >> +
> >> +  ctxi = (struct ctx_info *)tmp;
> >> +  tmp += sizeof(*ctxi);
> >> +  ctxi->rht_lun = (struct llun_info **)tmp;
> > 
> > Yuck… just do two allocs rather than this throbbing.
> 
> You got it.
> 
> >> +out:
> >> +  if (unlock_ctx)
> >> +  mutex_unlock(&ctxi->mutex);
> > 
> > Where is the matching lock for this?
> 
> See earlier comment about get_context().
> 
> >> +  if (likely(ctxi))
> >> +  mutex_unlock(&ctxi->mutex);
> > 
> > Where is the matching lock for this?
> 
> Ditto.
> 
> >> +  file = cxl_get_fd(ctx, &

Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-11 Thread wenxiong


Quoting "Matthew R. Ochs" :


Add superpipe supporting infrastructure to device driver for the IBM CXL
Flash adapter. This patch allows userspace applications to take advantage
of the accelerated I/O features that this adapter provides and bypass the
traditional filesystem stack.

Signed-off-by: Matthew R. Ochs 
Signed-off-by: Manoj N. Kumar 
---
 Documentation/ioctl/ioctl-number.txt |1 +
 Documentation/powerpc/cxlflash.txt   |  297 +
 drivers/scsi/cxlflash/Makefile   |2 +-
 drivers/scsi/cxlflash/common.h   |   19 +
 drivers/scsi/cxlflash/main.c |   21 +-
 drivers/scsi/cxlflash/superpipe.c| 2206  
++

 drivers/scsi/cxlflash/superpipe.h|  127 ++
 include/uapi/scsi/Kbuild |1 +
 include/uapi/scsi/cxlflash_ioctl.h   |  139 +++
 9 files changed, 2810 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/powerpc/cxlflash.txt
 create mode 100644 drivers/scsi/cxlflash/superpipe.c
 create mode 100644 drivers/scsi/cxlflash/superpipe.h
 create mode 100644 include/uapi/scsi/cxlflash_ioctl.h

diff --git a/Documentation/ioctl/ioctl-number.txt  
b/Documentation/ioctl/ioctl-number.txt

index fdd35bf..67273e1 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -315,6 +315,7 @@ Code  Seq#(hex) Include FileComments
 0xC0   00-0F   linux/usb/iowarrior.h
 0xC9   00-0F   uapi/cxl-memcpy.h   Reserved for non-upstream prototype
 0xCA   00-0F   uapi/misc/cxl.h
+0xCA   80-8F   uapi/scsi/cxlflash_ioctl.h
 0xCB   00-1F   CBM serial IEC bus  in development:


 0xCD   01  linux/reiserfs_fs.h
diff --git a/Documentation/powerpc/cxlflash.txt  
b/Documentation/powerpc/cxlflash.txt

new file mode 100644
index 000..4a59f1a
--- /dev/null
+++ b/Documentation/powerpc/cxlflash.txt
@@ -0,0 +1,297 @@
+Introduction
+
+
+The IBM Power architecture provides support for CAPI (Coherent
+Accelerator Power Interface), which is available to certain PCIe slots
+on Power 8 systems. CAPI can be thought of as a special tunneling
+protocol through PCIe that allow PCIe adapters to look like special
+purpose co-processors which can read or write an application's
+memory and generate page faults. As a result, the host interface to
+an adapter running in CAPI mode does not require the data buffers to
+be mapped to the device's memory (IOMMU bypass) nor does it require
+memory to be pinned.
+
+On Linux, Coherent Accelerator (CXL) kernel services present CAPI
+devices as a PCI device by implementing a virtual PCI host bridge.
+This abstraction simplifies the infrastructure and programming
+model, allowing for drivers to look similar to other native PCI
+device drivers.
+
+CXL provides a mechanism by which user space applications can
+directly talk to a device (network or storage) bypassing the typical
+kernel/device driver stack. The CXL Flash Adapter Driver enables a
+user space application direct access to Flash storage.
+
+The CXL Flash Adapter Driver is a kernel module that sits in the
+SCSI stack as a low level device driver (below the SCSI disk and
+protocol drivers) for the IBM CXL Flash Adapter. This driver is
+responsible for the initialization of the adapter, setting up the
+special path for user space access, and performing error recovery. It
+communicates directly the Flash Accelerator Functional Unit (AFU)
+as described in Documentation/powerpc/cxl.txt.
+
+The cxlflash driver supports two, mutually exclusive, modes of
+operation at the device (LUN) level:
+
+- Any flash device (LUN) can be configured to be accessed as a
+  regular disk device (i.e.: /dev/sdc). This is the default mode.
+
+- Any flash device (LUN) can be configured to be accessed from
+  user space with a special block library. This mode further
+  specifies the means of accessing the device and provides for
+  either raw access to the entire LUN (referred to as direct
+  or physical LUN access) or access to a kernel/AFU-mediated
+  partition of the LUN (referred to as virtual LUN access). The
+  segmentation of a disk device into virtual LUNs is assisted
+  by special translation services provided by the Flash AFU.
+
+Overview
+
+
+The Coherent Accelerator Interface Architecture (CAIA) introduces a
+concept of a master context. A master typically has special privileges
+granted to it by the kernel or hypervisor allowing it to perform AFU
+wide management and control. The master may or may not be involved
+directly in each user I/O, but at the minimum is involved in the
+initial setup before the user application is allowed to send requests
+directly to the AFU.
+
+The CXL Flash Adapter Driver establish

Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-11 Thread Manoj Kumar

Ben:

Comments inline below.


On 8/11/2015 12:29 AM, Benjamin Herrenschmidt wrote:

  So in a similar vein to the previous review, I am missing a lot of
context here but a few things did spring to me eyes:


Thanks for your review.


+   list_for_each_entry_safe(lli, temp, &cfg->lluns, list)
+   if (!memcmp(lli->wwid, wwid, DK_CXLFLASH_MANAGE_LUN_WWID_LEN)) {
+   lli->newly_created = false;


This is weird ... a lookup effectively changes the state of the object
looked up... what for ? There is something oddball here.

It might be legit but in that case, you should really add a comment
explaining the logic around that 'newly_created' field.


As suggested later, will rename these functions in v5.


Also you drop the lock right below but you have no refcounting, are
these objects ever disposed of ?


These objects are long lived. The local lun info structure lives as
long as the card is available, and the global lun info is kept around
as long as the module is loaded.


In general, can you provide a clearer explanation of what are "global"
vs "local" LUNs ?


This is a good idea. Will clarify in v5.


Same ...


Same as above. Will address by renaming.


Make the function name more explicit: find_or_create_lun() for example.
I very much dislike a function called "lookup" that has side effects.


Good point. Will rename in v5.



+   lli = create_local(sdev, wwid);
+   if (unlikely(!lli))
+   goto out;


Similar question to earlier, you have no refcounting, should I assume
these things never get removed ?


Right, these are long lived.



+   lli->parent = gli;
+   spin_lock_irqsave(&cfg->slock, lock_flags);
+   list_add(&lli->list, &cfg->lluns);
+   spin_unlock_irqrestore(&cfg->slock, lock_flags);
+
+   spin_lock_irqsave(&global.slock, lock_flags);
+   list_add(&gli->list, &global.gluns);
+   spin_unlock_irqrestore(&global.slock, lock_flags);


Your locks are extremely fine grained... too much ? Any reason why you
don't have a simple/single lock handling all these ? IE, do you expect
frequent accesses ?

Also, a function called "lookup_something" that has the side effect of
adding that something to two lists doesn't look great to me. You may
want to review the function naming a bit.

Finally, what happens if two processes call this trying to effectively
create the same global LUN simultaneously ?

IE, can't you have a case where both lookup fail, then they both hit the
create_global() case for the same WWID ? Should you have a single lock
or a mutex wrapping the whole thing ? That would make the code a lot
simpler to review as well...


Good catch. Will look into simplifying to a mutex in v5, wrapping the
whole lookup/create sequence.



+void cxlflash_list_init(void)
+{
+   INIT_LIST_HEAD(&global.gluns);
+   spin_lock_init(&global.slock);
+   global.err_page = NULL;
+}


Wouldn't it make the code nicer to have all that LUN management in a
separate file ?


Good suggestion. Will look at moving these LUN management to a separate
file.


+   rc = mutex_lock_interruptible(&cfg->ctx_tbl_list_mutex);
+   if (rc)
+   goto out;


This can be interrupted by any signal, I assume your userspace deals
with it ?


That is correct. We do want to be interrupted by any signal (SIGTERM, 
SIGKILL, SIGINT etc.) at this point. We fail the context validation, and 
ultimately the ioctl and leave it to user-space to deal with it.




+   rc = mutex_trylock(&ctxi->mutex);
+   mutex_unlock(&cfg->ctx_tbl_list_mutex);
+   if (!rc)
+   goto retry;


Ouch.. that's a nasty one. Are you using the above construct to avoid
an A->B/B->A deadlock scenario where somebody else might be taking
the list mutex while holding the context one ?


No, this is not addressing a lock ordering issue. Will clarify with a
comment why the list mutex is being released and reacquired in the
retry loop.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-11 Thread Matthew R. Ochs
> On Aug 11, 2015, at 12:23 AM, Michael Neuling  wrote:
> 
> Some comments inline

Thanks for reviewing, responses below.

> On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote:
>> Add superpipe supporting infrastructure to device driver for the IBM CXL
>> Flash adapter. This patch allows userspace applications to take advantage
>> of the accelerated I/O features that this adapter provides and bypass the
>> traditional filesystem stack.
>> 
>> Signed-off-by: Matthew R. Ochs 
>> Signed-off-by: Manoj N. Kumar 
>> ---
>> Documentation/ioctl/ioctl-number.txt |1 +
>> Documentation/powerpc/cxlflash.txt   |  297 +
>> drivers/scsi/cxlflash/Makefile   |2 +-
>> drivers/scsi/cxlflash/common.h   |   19 +
>> drivers/scsi/cxlflash/main.c |   21 +-
>> drivers/scsi/cxlflash/superpipe.c| 2206 
>> ++
>> drivers/scsi/cxlflash/superpipe.h|  127 ++
>> include/uapi/scsi/Kbuild |1 +
>> include/uapi/scsi/cxlflash_ioctl.h   |  139 +++
>> 9 files changed, 2810 insertions(+), 3 deletions(-)
>> create mode 100644 Documentation/powerpc/cxlflash.txt
>> create mode 100644 drivers/scsi/cxlflash/superpipe.c
>> create mode 100644 drivers/scsi/cxlflash/superpipe.h
>> create mode 100644 include/uapi/scsi/cxlflash_ioctl.h
>> 
>> diff --git a/Documentation/ioctl/ioctl-number.txt 
>> b/Documentation/ioctl/ioctl-number.txt
>> index fdd35bf..67273e1 100644
>> --- a/Documentation/ioctl/ioctl-number.txt
>> +++ b/Documentation/ioctl/ioctl-number.txt
>> @@ -315,6 +315,7 @@ Code  Seq#(hex)  Include FileComments
>> 0xC0 00-0F   linux/usb/iowarrior.h
>> 0xC9 00-0F   uapi/cxl-memcpy.h   Reserved for non-upstream prototype
> 
> This above doesn't exist upstream.  Make sure your patch applies to a
> clean tree.

My mistake. Will fix in v5.

>> +
>> +DK_CXLFLASH_USER_VIRTUAL
>> +
>> +This ioctl is responsible for transitioning the LUN to virtual mode
>> +of access and configuring the AFU for virtual access from user space
>> +on a per-context basis. Additionally, the block size and last logical
>> +block address (LBA) are returned to the user.
>> +
>> +As mentioned previously, when operating in user space access mode,
>> +LUNs may be accessed in whole or in part. Only one mode is allowed
>> +at a time and if one mode is active (outstanding references exist),
>> +requests to use the LUN in a different mode are denied.
>> +
>> +The AFU is configured for virtual access from user space by adding
>> +an entry to the AFU's resource handle table. The index of the entry
>> +is treated as a resource handle that is returned to the user. The
>> +user is then able to use the handle to reference the LUN during I/O.
>> +
>> +By default, the virtual LUN is created with a size of 0. The user
>> +would need to use the DK_CXLFLASH_VLUN_RESIZE ioctl to adjust the grow
>> +the virtual LUN to a desired size. To avoid having to perform this
>> +resize for the initial creation of the virtual LUN, the user has the
>> +option of specifying a size as part of the DK_CXLFLASH_USER_VIRTUAL
>> +ioctl, such that when success is returned to the user, the
>> +resource handle that is provided is already referencing provisioned
>> +storage. This is reflected by the last LBA being a non-zero value.
> 
> 
> This should be in the vlun patch.

Okay.

>> +DK_CXLFLASH_VLUN_RESIZE
>> +---
>> +This ioctl is responsible for resizing a previously created virtual
>> +LUN and will fail if invoked upon a LUN that is not in virtual
>> +mode. Upon success, an updated last LBA is returned to the user
>> +indicating the new size of the virtual LUN associated with the
>> +resource handle.
>> +
>> +The partitioning of virtual LUNs is jointly mediated by the cxlflash
>> +driver and the AFU. An allocation table is kept for each LUN that is
>> +operating in the virtual mode and used to program a LUN translation
>> +table that the AFU references when provided with a resource handle.
> 
> All this vlun discussion would be in the next patch not this superpipe patch.

Ditto, will do.

>> +DK_CXLFLASH_RELEASE
>> +---
>> +This ioctl is responsible for releasing a previously obtained
>> +reference to either a physical or virtual LUN. This can be
>> +thought of as the inverse of the DK_CXLFLASH_USER_DIRECT or
>> +DK_CXLFLASH_USER_VIRTUAL ioctls. Upon success, the resource handle
>> +is no longer valid and the entry in the resource handle table is
>> +made available to be used again.
>> +
>> +As part of the release process for virtual LUNs, the virtual LUN
>> +is first resized to 0 to clear out and free the translation tables
>> +associated with the virtual LUN reference.
> 
> Looks like file_ops release calls these functions anyway.  So why do we need
> this?

This is required because a user may hold multiple 

Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-10 Thread Benjamin Herrenschmidt
On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote:
> Add superpipe supporting infrastructure to device driver for the IBM CXL
> Flash adapter. This patch allows userspace applications to take advantage
> of the accelerated I/O features that this adapter provides and bypass the
> traditional filesystem stack.

 So in a similar vein to the previous review, I am missing a lot of
context here but a few things did spring to me eyes:

> +/**
> + * lookup_local() - find a local LUN information structure by WWID
> + * @cfg: Internal structure associated with the host.
> + * @wwid:WWID associated with LUN.
> + *
> + * Return: Found local lun_info structure on success, NULL on failure
> + */
> +static struct llun_info *lookup_local(struct cxlflash_cfg *cfg, u8 *wwid)
> +{
> + struct llun_info *lli, *temp;
> + ulong lock_flags;
> +
> + spin_lock_irqsave(&cfg->slock, lock_flags);
> +
> + list_for_each_entry_safe(lli, temp, &cfg->lluns, list)
> + if (!memcmp(lli->wwid, wwid, DK_CXLFLASH_MANAGE_LUN_WWID_LEN)) {
> + lli->newly_created = false;

This is weird ... a lookup effectively changes the state of the object
looked up... what for ? There is something oddball here.

It might be legit but in that case, you should really add a comment
explaining the logic around that 'newly_created' field.

Also you drop the lock right below but you have no refcounting, are
these objects ever disposed of ?

In general, can you provide a clearer explanation of what are "global"
vs "local" LUNs ?

> + spin_unlock_irqrestore(&cfg->slock, lock_flags);
> + return lli;
> + }
> +
> + spin_unlock_irqrestore(&cfg->slock, lock_flags);
> + return NULL;
> +}
> +
> +/**
> + * lookup_global() - find a global LUN information structure by WWID
> + * @wwid:WWID associated with LUN.
> + *
> + * Return: Found global lun_info structure on success, NULL on failure
> + */
> +static struct glun_info *lookup_global(u8 *wwid)
> +{
> + struct glun_info *gli, *temp;
> + ulong lock_flags;
> +
> + spin_lock_irqsave(&global.slock, lock_flags);
> +
> + list_for_each_entry_safe(gli, temp, &global.gluns, list)
> + if (!memcmp(gli->wwid, wwid, DK_CXLFLASH_MANAGE_LUN_WWID_LEN)) {
> + spin_unlock_irqrestore(&global.slock, lock_flags);
> + return gli;
> + }
> +
> + spin_unlock_irqrestore(&global.slock, lock_flags);
> + return NULL;
> +}

Same ...

> +/**
> + * lookup_lun() - find or create a local LUN information structure
> + * @sdev:SCSI device associated with LUN.
> + * @wwid:WWID associated with LUN.
> + *
> + * When a local LUN is not found and a global LUN is also not found, both
> + * a global LUN and local LUN are created. The global LUN is added to the
> + * global list and the local LUN is returned.

Make the function name more explicit: find_or_create_lun() for example.
I very much dislike a function called "lookup" that has side effects.

> + * Return: Found/Allocated local lun_info structure on success, NULL on 
> failure
> + */
> +static struct llun_info *lookup_lun(struct scsi_device *sdev, u8 *wwid)
> +{
> + struct llun_info *lli = NULL;
> + struct glun_info *gli = NULL;
> + struct Scsi_Host *shost = sdev->host;
> + struct cxlflash_cfg *cfg = shost_priv(shost);
> + ulong lock_flags;
> +
> + if (unlikely(!wwid))
> + goto out;
> +
> + lli = lookup_local(cfg, wwid);
> + if (lli)
> + goto out;
> +
> + lli = create_local(sdev, wwid);
> + if (unlikely(!lli))
> + goto out;

Similar question to earlier, you have no refcounting, should I assume
these things never get removed ?

> + gli = lookup_global(wwid);
> + if (gli) {
> + lli->parent = gli;
> + spin_lock_irqsave(&cfg->slock, lock_flags);
> + list_add(&lli->list, &cfg->lluns);
> + spin_unlock_irqrestore(&cfg->slock, lock_flags);
> + goto out;
> + }
> +
> + gli = create_global(sdev, wwid);
> + if (unlikely(!gli)) {
> + kfree(lli);
> + lli = NULL;
> + goto out;
> + }
> +
> + lli->parent = gli;
> + spin_lock_irqsave(&cfg->slock, lock_flags);
> + list_add(&lli->list, &cfg->lluns);
> + spin_unlock_irqrestore(&cfg->slock, lock_flags);
> +
> + spin_lock_irqsave(&global.slock, lock_flags);
> + list_add(&gli->list, &global.gluns);
> + spin_unlock_irqrestore(&global.slock, lock_flags);

Your locks are extremely fine grained... too much ? Any reason why you
don't have a simple/single lock handling all these ? IE, do you expect
frequent accesses ?

Also, a function called "lookup_something" that has the side effect of
adding that something to two lists doesn't look great to me. You may
want to review the function naming a bit.

Finally, what happens if two processes call this trying 

Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-10 Thread Michael Neuling
Some comments inline

On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote:
> Add superpipe supporting infrastructure to device driver for the IBM CXL
> Flash adapter. This patch allows userspace applications to take advantage
> of the accelerated I/O features that this adapter provides and bypass the
> traditional filesystem stack.
> 
> Signed-off-by: Matthew R. Ochs 
> Signed-off-by: Manoj N. Kumar 
> ---
>  Documentation/ioctl/ioctl-number.txt |1 +
>  Documentation/powerpc/cxlflash.txt   |  297 +
>  drivers/scsi/cxlflash/Makefile   |2 +-
>  drivers/scsi/cxlflash/common.h   |   19 +
>  drivers/scsi/cxlflash/main.c |   21 +-
>  drivers/scsi/cxlflash/superpipe.c| 2206 
> ++
>  drivers/scsi/cxlflash/superpipe.h|  127 ++
>  include/uapi/scsi/Kbuild |1 +
>  include/uapi/scsi/cxlflash_ioctl.h   |  139 +++
>  9 files changed, 2810 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/powerpc/cxlflash.txt
>  create mode 100644 drivers/scsi/cxlflash/superpipe.c
>  create mode 100644 drivers/scsi/cxlflash/superpipe.h
>  create mode 100644 include/uapi/scsi/cxlflash_ioctl.h
> 
> diff --git a/Documentation/ioctl/ioctl-number.txt 
> b/Documentation/ioctl/ioctl-number.txt
> index fdd35bf..67273e1 100644
> --- a/Documentation/ioctl/ioctl-number.txt
> +++ b/Documentation/ioctl/ioctl-number.txt
> @@ -315,6 +315,7 @@ Code  Seq#(hex)   Include FileComments
>  0xC0 00-0F   linux/usb/iowarrior.h
>  0xC9 00-0F   uapi/cxl-memcpy.h   Reserved for non-upstream prototype

This above doesn't exist upstream.  Make sure your patch applies to a
clean tree.


>  0xCA 00-0F   uapi/misc/cxl.h
> +0xCA 80-8F   uapi/scsi/cxlflash_ioctl.h
>  0xCB 00-1F   CBM serial IEC bus  in development:
>   
> 
>  0xCD 01  linux/reiserfs_fs.h
> diff --git a/Documentation/powerpc/cxlflash.txt 
> b/Documentation/powerpc/cxlflash.txt
> new file mode 100644
> index 000..4a59f1a
> --- /dev/null
> +++ b/Documentation/powerpc/cxlflash.txt
> @@ -0,0 +1,297 @@
> +Introduction
> +
> +
> +The IBM Power architecture provides support for CAPI (Coherent
> +Accelerator Power Interface), which is available to certain PCIe slots
> +on Power 8 systems. CAPI can be thought of as a special tunneling
> +protocol through PCIe that allow PCIe adapters to look like special
> +purpose co-processors which can read or write an application's
> +memory and generate page faults. As a result, the host interface to
> +an adapter running in CAPI mode does not require the data buffers to
> +be mapped to the device's memory (IOMMU bypass) nor does it require
> +memory to be pinned.
> +
> +On Linux, Coherent Accelerator (CXL) kernel services present CAPI
> +devices as a PCI device by implementing a virtual PCI host bridge.
> +This abstraction simplifies the infrastructure and programming
> +model, allowing for drivers to look similar to other native PCI
> +device drivers.
> +
> +CXL provides a mechanism by which user space applications can
> +directly talk to a device (network or storage) bypassing the typical
> +kernel/device driver stack. The CXL Flash Adapter Driver enables a
> +user space application direct access to Flash storage.
> +
> +The CXL Flash Adapter Driver is a kernel module that sits in the
> +SCSI stack as a low level device driver (below the SCSI disk and
> +protocol drivers) for the IBM CXL Flash Adapter. This driver is
> +responsible for the initialization of the adapter, setting up the
> +special path for user space access, and performing error recovery. It
> +communicates directly the Flash Accelerator Functional Unit (AFU)
> +as described in Documentation/powerpc/cxl.txt.
> +
> +The cxlflash driver supports two, mutually exclusive, modes of
> +operation at the device (LUN) level:
> +
> +- Any flash device (LUN) can be configured to be accessed as a
> +  regular disk device (i.e.: /dev/sdc). This is the default mode.
> +
> +- Any flash device (LUN) can be configured to be accessed from
> +  user space with a special block library. This mode further
> +  specifies the means of accessing the device and provides for
> +  either raw access to the entire LUN (referred to as direct
> +  or physical LUN access) or access to a kernel/AFU-mediated
> +  partition of the LUN (referred to as virtual LUN access). The
> +  segmentation of a disk device into virtual LUNs is assisted
> +  by special translation services provided by the Flash AFU.
> +
> +Overview
> +
> +
> +The Coherent Accelerator Interface Architecture (CAIA) introduces a
> +concept of a master context. A master typically has special privileges
> +granted to it by the kernel or hypervisor allowing