Re: [PATCH 2/2] block: sed-opal: fix sparse warning: convert __be64 data

2019-10-03 Thread Scott Bauer
On Wed, Oct 02, 2019 at 07:23:15PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap 
> 
> sparse warns about incorrect type when using __be64 data.
> It is not being converted to CPU-endian but it should be.
> 
> Fixes these sparse warnings:
> 
> ../block/sed-opal.c:375:20: warning: incorrect type in assignment (different 
> base types)
> ../block/sed-opal.c:375:20:expected unsigned long long [usertype] align
> ../block/sed-opal.c:375:20:got restricted __be64 const [usertype] 
> alignment_granularity
> ../block/sed-opal.c:376:25: warning: incorrect type in assignment (different 
> base types)
> ../block/sed-opal.c:376:25:expected unsigned long long [usertype] 
> lowest_lba
> ../block/sed-opal.c:376:25:got restricted __be64 const [usertype] 
> lowest_aligned_lba
> 
> Fixes: 455a7b238cd6 ("block: Add Sed-opal library")
> Signed-off-by: Randy Dunlap 
> Cc: Scott Bauer 
> Cc: Rafael Antognolli 
> Cc: Jens Axboe 
> Cc: linux-bl...@vger.kernel.org

+ Jon and Revanth,


These look fine. They're currently unused, but may be useful in the future for 
sysfs or what ever else we add in.


Re: [PATCH 1/2] block: sed-opal: fix sparse warning: obsolete array init.

2019-10-03 Thread Scott Bauer
On Wed, Oct 02, 2019 at 07:23:05PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap 
> 
> Fix sparse warning: (missing '=')
> ../block/sed-opal.c:133:17: warning: obsolete array initializer, use C99 
> syntax
> 
> Fixes: ff91064ea37c ("block: sed-opal: check size of shadow mbr")
> Signed-off-by: Randy Dunlap 
> Cc: Jens Axboe 
> Cc: linux-bl...@vger.kernel.org
> Cc: Jonas Rabenstein 
> Cc: David Kozub 
> ---

Un cc'd David and Jonas, +CC'd Jon and Revanth.

This looks fine to me too.

Reviewed-by: Scott Bauer 


Re: [PATCH 0/3] block: sed-opal: add support for shadow MBR done flag and write

2019-05-05 Thread Scott Bauer
On Fri, May 03, 2019 at 10:32:19PM +0200, David Kozub wrote:
> On Wed, 1 May 2019, Christoph Hellwig wrote:
> 
> > > I successfully tested toggling the MBR done flag and writing the shadow 
> > > MBR
> > > using some tools I hacked together[4] with a Samsung SSD 850 EVO drive.
> > 
> > Can you submit the tool to util-linux so that we get it into distros?
> 
> There is already Scott's sed-opal-temp[1] and a fork by Jonas that adds
> support for older version of these new IOCTLs[2]. There was already some
> discussion of getting that to util-linux.[3]
> 
> While I like my hack, sed-opal-temp can do much more (my tool supports just
> the few things I actually use). But there are two things which sed-opal-temp
> currently lacks which my hack has:
> 
> * It can use a PBKDF2 hash (salted by disk serial number) of the password
>   rather than the password directly. This makes it compatible with sedutil
>   and I think it's also better practice (as firmware can contain many
>   surprises).
> 
> * It contains a 'PBA' (pre-boot authorization) tool. A tool intended to be
>   run from shadow mbr that asks for a password and uses it to unlock all
>   disks and set shadow mbr done flag, so after restart the computer boots
>   into the real OS.
> 
> @Scott: What are your plans with sed-opal-temp? If you want I can update
> Jonas' patches to the adapted IOCTLs. What are your thoughts on PW hashing
> and a PBA tool?

I will accept any and all patches to sed opal tooling, I am not picky. I will
also give up maintainership of it is someone else feels they can (rightfully
so) do a better job.

Jon sent me a patch for the tool that will deal with writing to the shadow MBR,
so once we know these patches are going in i'll pull that patch into the tool.

Then I guess that leaves PBKDF2 which I don't think will be too hard to pull in.

With regard to your PBA tool, is that actually being run post-uefi/pre-linux?
IE are we writing your tool into the SMBR and that's what is being run on 
bootup?

Jon, if you think it's a good idea can you ask David if Revanth or you wants
to take over the tooling? Or if anyone else here wants to own it then let me 
know.



Re: [PATCH v4 10/16] block: sed-opal: add ioctl for done-mark of shadow mbr

2019-02-10 Thread Scott Bauer
On Fri, Feb 08, 2019 at 12:44:14AM +, Derrick, Jonathan wrote:
> On Thu, 2019-02-07 at 23:56 +0100, David Kozub wrote:
> > On Mon, 4 Feb 2019, Christoph Hellwig wrote:
> > > >  static int opal_enable_disable_shadow_mbr(struct opal_dev *dev,
> > > >   struct opal_mbr_data 
> > > > *opal_mbr)
> > > >  {
> > > > +   u8 token = opal_mbr->enable_disable == OPAL_MBR_ENABLE
> > > > +   ? OPAL_TRUE : OPAL_FALSE;
> > > > const struct opal_step mbr_steps[] = {
> > > > { opal_discovery0, },
> > > > { start_admin1LSP_opal_session, _mbr->key },
> > > > -   { set_mbr_done, _mbr->enable_disable },
> > > > +   { set_mbr_done,  },

> > Am I missing something here? This seems wrong to me. And I think this 
> > patch actually changes it by introducing:
> > 
> > +u8 token = opal_mbr->enable_disable == OPAL_MBR_ENABLE
> > +? OPAL_TRUE : OPAL_FALSE;
> > 
> > which is essentially a negation (map 0 to 1 and 1 to 0).

Agreed the original code did the opposite of what the user wanted, looks like
when I authored it I messed up that enum which set everything off.



> > With regard to the new IOC_OPAL_MBR_STATUS: I find the usage of 
> > OPAL_MBR_ENABLE/DISABLE for this confusing: what should passing 
> > OPAL_MBR_ENABLE do? Should it enable the shadow MBR? Or should it 
> > enable the MBR-done flag? I think the implementation in this patch 
> > interprets OPAL_MBR_ENABLE as 'set the "done" flag to true', thus hiding 
> > the shadow MBR. But this is not obvious looking at the IOCTL name.

For the new ioctl I think we should just add a new enum with the correct
nomenclature.  So OPAL_MBR_DONE, OPAL_MBR_NOT_DONE.


> In order to keep the userspace interface consistent, I'll ACK your
> change in this patch, unless Scott can fill me in on why this looks
> wrong but is actually right.

I think it is just wrong. 


> 
> We have 7 bytes in the opal_mbr_data struct we could use for DONE/NOT
> DONE. I'm not sure how to go about keeping it consistent with old uapi,
> although arguably opal_enable_disable_shadow_mbr is already doing the
> wrong thing with DONE and ENABLE so it's low impact.

Can we keep the old mbr struct the same and just add a new struct with new enums
for the new done ioctl? I think this will keep the new ioctl cleaner instead
of trying to apply older, some what incorrectly named, enums.

Lastly someone will need to backport his

> > > > +   u8 token = opal_mbr->enable_disable == OPAL_MBR_ENABLE
> > > > +   ? OPAL_TRUE : OPAL_FALSE;

to stable so we can fix up my broken coding in older kernels.


I can do that or, if David wants to do that that's fine... just want to 
coordinate.









Re: [PATCH v4 10/16] block: sed-opal: add ioctl for done-mark of shadow mbr

2019-02-07 Thread Scott Bauer
On Fri, Feb 08, 2019 at 12:44:14AM +, Derrick, Jonathan wrote:
> On Thu, 2019-02-07 at 23:56 +0100, David Kozub wrote:
> > On Mon, 4 Feb 2019, Christoph Hellwig wrote:
> > 
> > > On Fri, Feb 01, 2019 at 09:50:17PM +0100, David Kozub wrote:
> > > > From: Jonas Rabenstein 
> > > > 
> > > > Enable users to mark the shadow mbr as done without completely
> > > > deactivating the shadow mbr feature. This may be useful on reboots,
> > > > when the power to the disk is not disconnected in between and the shadow
> > > > mbr stores the required boot files. Of course, this saves also the
> > > > (few) commands required to enable the feature if it is already enabled
> > > > and one only wants to mark the shadow mbr as done.
> > > > 
> > > > Signed-off-by: Jonas Rabenstein 
> > > > 
> > > > Reviewed-by: Scott Bauer 
> > > > ---
> > > >  block/sed-opal.c  | 33 +++--
> > > >  include/linux/sed-opal.h  |  1 +
> > > >  include/uapi/linux/sed-opal.h |  1 +
> > > >  3 files changed, 33 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/block/sed-opal.c b/block/sed-opal.c
> > > > index 4b0a63b9d7c9..e03838cfd31b 100644
> > > > --- a/block/sed-opal.c
> > > > +++ b/block/sed-opal.c
> > > > @@ -1996,13 +1996,39 @@ static int opal_erase_locking_range(struct 
> > > > opal_dev *dev,
> > > >  static int opal_enable_disable_shadow_mbr(struct opal_dev *dev,
> > > >   struct opal_mbr_data 
> > > > *opal_mbr)
> > > >  {
> > > > +   u8 token = opal_mbr->enable_disable == OPAL_MBR_ENABLE
> > > > +   ? OPAL_TRUE : OPAL_FALSE;
> > > > const struct opal_step mbr_steps[] = {
> > > > { opal_discovery0, },
> > > > { start_admin1LSP_opal_session, _mbr->key },
> > > > -   { set_mbr_done, _mbr->enable_disable },
> > > > +   { set_mbr_done,  },
> > > > { end_opal_session, },
> > > > { start_admin1LSP_opal_session, _mbr->key },
> > > > -   { set_mbr_enable_disable, _mbr->enable_disable },
> > > > +   { set_mbr_enable_disable,  },
> > > > +   { end_opal_session, },
> > > > +   { NULL, }
> > > 
> > > This seems to be a change of what we pass to set_mbr_done /
> > > set_mbr_enable_disable and not really related to the new functionality
> > > here, so it should be split into a separate patch.
> > > 
> > > That being said if we really care about this translation between
> > > the two sets of constants, why not do it inside
> > > set_mbr_done and set_mbr_enable_disable?
> > 
> > Hi Christoph,
> > 
> > I agree, this should be split. Furthermore I think I found an issue here: 
> > OPAL_MBR_ENABLE and OPAL_MBR_DISABLE are defined as follows:
> > 
> > enum opal_mbr {
> > OPAL_MBR_ENABLE = 0x0,
> > OPAL_MBR_DISABLE = 0x01,
> > };
> > 
> > ... while OPAL_TRUE and OPAL_FALSE tokens are:
> > 
> > OPAL_TRUE = 0x01,
> > OPAL_FALSE = 0x00,
> > 
> > so in the current code in kernel, when the IOCTL input is directly passed 
> > in place of the TRUE/FALSE tokens (in opal_enable_disable_shadow_mbr), 
> > passing OPAL_MBR_ENABLE (0) to IOC_OPAL_ENABLE_DISABLE_MBR ends up being 
> > interpreted as OPAL_FALSE (0) and passing OPAL_MBR_DISABLE (1) ended up 
> > being interpreted as OPAL_TRUE (1). So the behavior is:
> > 
> > OPAL_MBR_ENABLE: set MBR enable to OPAL_FALSE and done to OPAL_FALSE
> > OPAL_MBR_DISABLE: set MBR enable to OPAL_TRUE and done to OPAL_TRUE
> > 
> > Am I missing something here? This seems wrong to me. And I think this 
> > patch actually changes it by introducing:
> > 
> > +u8 token = opal_mbr->enable_disable == OPAL_MBR_ENABLE
> > +? OPAL_TRUE : OPAL_FALSE;
> > 
> > which is essentially a negation (map 0 to 1 and 1 to 0).
> > 
> > I had a strange feeling of IOC_OPAL_ENABLE_DISABLE_MBR behaving 
> > incorrectly when I first tried it. But when I checked later I was not able 
> > to reproduce it - probably originally I tested without this patch.
> > 
> > With regard to the new IOC_OPAL_MBR_STATUS: I find the usage of 
> > OPA

Re: [PATCH v4 00/16] block: sed-opal: support shadow MBR done flag and write

2019-02-04 Thread Scott Bauer
On Mon, Feb 04, 2019 at 07:04:15AM -0800, Christoph Hellwig wrote:
> On Fri, Feb 01, 2019 at 09:50:07PM +0100, David Kozub wrote:
> > This patch series extends SED OPAL support: it adds IOCTL for setting the 
> > shadow
> > MBR done flag which can be useful for unlocking an OPAL disk on boot and it 
> > adds
> > IOCTL for writing to the shadow MBR. Also included are some minor fixes and
> > improvements.
> 
> Actually most of it is really useful cleanups and small fixes for the
> existing OPAL code.  Any chance you could resend just those bits as
> a first series ASAP?  I'll try to spend some time to to review the
> actual feature additions in the meantime.
> 
> > There is a fork of sed-opal-temp that can use these new IOCTLs.[2] I tested
> > these on Samsung 840 EVO and 850 EVO drives, on x86-64 and arm64 systems.
> 
> Which brings up another question:  how do we get a properly maintained
> version of the sed-opal tool up ASAP?  It's been a bit bitrotting
> unfortunately, and the documentation and error handling hasn't been all
> that great to start with.  Are we going to find a good home for it?
> Util-linux for example?

I would also like to get it off my github and into a proper location as well.

Christoph, do you know off hand how we would get it in util-linux?



Re: [PATCH v3 00/16] block: sed-opal: support shadow MBR done flag and write

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:31PM +0100, David Kozub wrote:
> This patch series extends OPAL support: it adds IOCTL for setting the shadow
> MBR done flag which can be useful for unlocking an OPAL disk on boot and it 
> adds
> IOCTL for writing to the shadow MBR. Also included are some minor fixes and
> improvements.
> 
> This series is based on the original work done by Jonas Rabenstein which was
> submitted in March 2018.[1] I tried to apply suggestions made in review on the
> list and do some further improvements.
> 
> The most contentious issue in the original series was the IOCTL for shadow MBR
> write but I think no better approach was found[2] so this was not changed. I'm
> open to suggestions.
> 
> There is a fork of sed-opal-temp that can use these new IOCTLs.[3] I tested
> these on Samsung 840 EVO and 850 EVO drives, on x86-64 and arm64 systems.
> 
> The series applies on v5.0-rc3.
> 
> Changes from v2 to v3:
> * review suggestions from Scott Bauer
> 
> As Scott suggested I tried to do a more thorough testing, esp. with things 
> like
> wrong passwords/invalid values. I did not observe any crash or unexpected
> behavior.
> 
> David Kozub (8):
>   block: sed-opal: fix typos and formatting
>   block: sed-opal: close parameter list in cmd_finalize
>   block: sed-opal: unify cmd start
>   block: sed-opal: unify error handling of responses
>   block: sed-opal: reuse response_get_token to decrease code duplication
>   block: sed-opal: pass steps via argument rather than via opal_dev
>   block: sed-opal: don't repeat opal_discovery0 in each steps array
>   block: sed-opal: rename next to execute_steps
> 

You forgot to send 16/16 again in v3. I pulled and tested off your github page.
If you send a v4 or just resend that patch you can add my reviewed-by, as I 
reviewed it
on my tree/from your github.



Re: [PATCH v3 15/16] block: sed-opal: don't repeat opal_discovery0 in each steps array

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:46PM +0100, David Kozub wrote:
> Originally each of the opal functions that call next include
> opal_discovery0 in the array of steps. This is superfluous and
> can be done always inside next.
> 
> Signed-off-by: David Kozub 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 14/16] block: sed-opal: pass steps via argument rather than via opal_dev

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:45PM +0100, David Kozub wrote:
> The steps argument is only read by the next function, so it can
> be passed directly as an argument rather than via opal_dev.
> 
> Normally, the steps is an array on the stack, so the pointer stops
> being valid then the function that set opal_dev.steps returns.
> If opal_dev.steps was not set to NULL before return it would become
> a dangling pointer. When the steps are passed as argument this
> becomes easier to see and more difficult to misuse.
> 
> Signed-off-by: David Kozub 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 13/16] block: sed-opal: check size of shadow mbr

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:44PM +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Check whether the shadow mbr does fit in the provided space on the
> target. Also a proper firmware should handle this case and return an
> error we may prevent problems or even damage with crappy firmwares.
> 
> Signed-off-by: Jonas Rabenstein 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 12/16] block: sed-opal: unify retrieval of table columns

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:43PM +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Instead of having multiple places defining the same argument list to get
> a specific column of a sed-opal table, provide a generic version and
> call it from those functions.
> 
> Signed-off-by: Jonas Rabenstein 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 11/16] block: sed-opal: ioctl for writing to shadow mbr

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:42PM +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Allow modification of the shadow mbr. If the shadow mbr is not marked as
> done, this data will be presented read only as the device content. Only
> after marking the shadow mbr as done and unlocking a locking range the
> actual content is accessible.
> 
> Co-authored-by: David Kozub 
> Signed-off-by: Jonas Rabenstein 
> Signed-off-by: David Kozub 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 10/16] block: sed-opal: add ioctl for done-mark of shadow mbr

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:41PM +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Enable users to mark the shadow mbr as done without completely
> deactivating the shadow mbr feature. This may be useful on reboots,
> when the power to the disk is not disconnected in between and the shadow
> mbr stores the required boot files. Of course, this saves also the
> (few) commands required to enable the feature if it is already enabled
> and one only wants to mark the shadow mbr as done.
> 
> Signed-off-by: Jonas Rabenstein 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 07/16] block: sed-opal: reuse response_get_token to decrease code duplication

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:38PM +0100, David Kozub wrote:
> response_get_token had already been in place, its functionality had
> been duplicated within response_get_{u64,bytestring} with the same error
> handling. Unify the handling by reusing response_get_token within the
> other functions.
> 
> Co-authored-by: Jonas Rabenstein 
> Signed-off-by: David Kozub 
> Signed-off-by: Jonas Rabenstein 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 09/16] block: sed-opal: split generation of bytestring header and content

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:40PM +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Split the header generation from the (normal) memcpy part if a
> bytestring is copied into the command buffer. This allows in-place
> generation of the bytestring content. For example, copy_from_user may be
> used without an intermediate buffer.
> 
> Signed-off-by: Jonas Rabenstein 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 08/16] block: sed-opal: print failed function address

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:39PM +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Add function address (and if available its symbol) to the message if a
> step function fails.
> 
> Signed-off-by: Jonas Rabenstein 
Reviewed-by: Scott Bauer 



Re: [PATCH v3 02/16] block: sed-opal: use correct macro for method length

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:33PM +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Also the values of OPAL_UID_LENGTH and OPAL_METHOD_LENGTH are the same,
> it is weird to use OPAL_UID_LENGTH for the definition of the methods.
> 
> Signed-off-by: Jonas Rabenstein 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 03/16] block: sed-opal: unify space check in add_token_*

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:34PM +0100, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> All add_token_* functions have a common set of conditions that have to
> be checked. Use a common function for those checks in order to avoid
> different behaviour as well as code duplication.
> 
> Co-authored-by: David Kozub 
> Signed-off-by: Jonas Rabenstein 
> Signed-off-by: David Kozub 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 05/16] block: sed-opal: unify cmd start

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:36PM +0100, David Kozub wrote:
> Every step starts with resetting the cmd buffer as well as the comid and
> constructs the appropriate OPAL_CALL command. Consequently, those
> actions may be combined into one generic function. On should take care
> that the opening and closing tokens for the argument list are already
> emitted by cmd_start and cmd_finalize respectively and thus must not be
> additionally added.
> 
> Co-authored-by: Jonas Rabenstein 
> Signed-off-by: David Kozub 
> Signed-off-by: Jonas Rabenstein 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 06/16] block: sed-opal: unify error handling of responses

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:37PM +0100, David Kozub wrote:
> response_get_{string,u64} include error handling for argument resp being
> NULL but response_get_token does not handle this.
> 
> Make all three of response_get_{string,u64,token} handle NULL resp in
> the same way.
> 
> Co-authored-by: Jonas Rabenstein 
> Signed-off-by: David Kozub 
> Signed-off-by: Jonas Rabenstein 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 04/16] block: sed-opal: close parameter list in cmd_finalize

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:35PM +0100, David Kozub wrote:
> Every step ends by calling cmd_finalize (via finalize_and_send)
> yet every step adds the token OPAL_ENDLIST on its own. Moving
> this into cmd_finalize decreases code duplication.
> 
> Co-authored-by: Jonas Rabenstein 
> Signed-off-by: David Kozub 
> Signed-off-by: Jonas Rabenstein 
Reviewed-by: Scott Bauer 


Re: [PATCH v3 01/16] block: sed-opal: fix typos and formatting

2019-01-27 Thread Scott Bauer
On Tue, Jan 22, 2019 at 11:31:32PM +0100, David Kozub wrote:
> This should make no change in functionality.
> The formatting changes were triggered by checkpatch.pl.
> 
> Signed-off-by: David Kozub 
Reviewed-by: Scott Bauer 


Re: [PATCH v2 11/16] block: sed-opal: ioctl for writing to shadow mbr

2019-01-20 Thread Scott Bauer
On Sun, Jan 20, 2019 at 11:27:30AM +0100, David Kozub wrote:
> On Sat, 19 Jan 2019, Scott Bauer wrote:
> 
> > On Thu, Jan 17, 2019 at 09:31:51PM +, David Kozub wrote:
> > 
> > > +static int write_shadow_mbr(struct opal_dev *dev, void *data)
> > > +{
> > > + struct opal_shadow_mbr *shadow = data;
> > > + const u8 __user *src;
> > > + u8 *dst;
> > > + size_t off = 0;
> > > + u64 len;
> > > + int err = 0;
> > > +
> > > + /* do the actual transmission(s) */
> > > + src = (u8 *) shadow->data;
> > > + while (off < shadow->size) {
> > > + err = cmd_start(dev, opaluid[OPAL_MBR], opalmethod[OPAL_SET]);
> > > + add_token_u8(, dev, OPAL_STARTNAME);
> > > + add_token_u8(, dev, OPAL_WHERE);
> > > + add_token_u64(, dev, shadow->offset + off);
> > > + add_token_u8(, dev, OPAL_ENDNAME);
> > > +
> > > + add_token_u8(, dev, OPAL_STARTNAME);
> > > + add_token_u8(, dev, OPAL_VALUES);
> > > +
> > > + /*
> > > +  * The bytestring header is either 1 or 2 bytes, so assume 2.
> > > +  * There also needs to be enough space to accommodate the
> > > +  * trailing OPAL_ENDNAME (1 byte) and tokens added by
> > > +  * cmd_finalize.
> > > +  */
> > > + len = min(remaining_size(dev) - (2+1+CMD_FINALIZE_BYTES_NEEDED),
> > > +   (size_t)(shadow->size - off));
> > 
> > What if remaining_size(dev) <  2 + 1 + CMD_FINALIZE_BYTES_NEEDED? If that's 
> > possible we
> > get min(UINT_MAX(ish) , some size larger than our remaining buffer) and 
> > that's not good.
> 
> This is only possible for uselessly small values of IO_BUFFER_LENGTH, which
> is a compile-time value. Originally I thought it's OK as nobody would set
> the value so low. But on a second thought, after reading your comment, I
> think that even if IO_BUFFER_LENGTH was set to such a value, the code should
> fail gracefully.

Naw, just leave it the way it is. I didnt follow cmd_start down deep enough to 
realize we reset
the position on cmd_start, so there is no way for my scenario to happen. We'll 
never change IO_BUFFER_LENGTH
to something small. If someone wants to write a bug in their custom kernel, 
it's not our problem.


Re: [PATCH v2 15/16] block: sed-opal: don't repeat opal_discovery0 in each steps array

2019-01-19 Thread Scott Bauer
On Thu, Jan 17, 2019 at 09:31:55PM +, David Kozub wrote:

> - for (state = 0; !error && state < n_steps; state++) {
> - step = [state];
> -
> - error = step->fn(dev, step->data);
> - if (error) {
> - pr_debug("Step %zu (%pS) failed with error %d: %s\n",
> -  state, step->fn, error,
> -  opal_error_to_human(error));
> -
> - /* For each OPAL command we do a discovery0 then we
> -  * start some sort of session.
> -  * If we haven't passed state 1 then there was an error
> -  * on discovery0 or during the attempt to start a
> -  * session. Therefore we shouldn't attempt to terminate
> -  * a session, as one has not yet been created.
> -  */
> - if (state > 1) {
> - end_opal_session_error(dev);
> - return error;
> - }




> + /* first do a discovery0 */
> + error = opal_discovery0_step(dev);
>  
> - }
> - }
> + for (state = 0; !error && state < n_steps; state++)
> + error = execute_step(dev, [state], state);
> +
> + /* For each OPAL command the first step in steps starts some sort
> +  * of session. If an error occurred in the initial discovery0 or if
> +  * an error stopped the loop in state 0 then there was an error
> +  * before or during the attempt to start a session. Therefore we
> +  * shouldn't attempt to terminate a session, as one has not yet
> +  * been created.
> +  */
> + if (error && state > 0)
> + end_opal_session_error(dev);


This is subtly wrong. There was some state that was encoded by having the
loop the way we did.

the tl;dr is the check needs to be if (error && state > 1) still.

The why is that in the previous implementation we wanted to 
end_opal_session_error
only if a successful discovery0 AND a successful start_*_session. In the 
previous loop,
discovery0 would complete, we'd do state++, then we'd go and attempt to start 
our
session. If we failed on that session starting we'd still be in state 1, and 
wouldn't
attempt to close the session.

In the current form, discovery0 is gone, so start session is on state 0. If we 
fail
on the start session, we set error = true, state gets ++'d, then we look at 
!error
and we don't loop again.

We go down to the check and attempt to end a session that was never started.




> -- 
> 2.20.1
> 
> 


Re: [PATCH v2 13/16] block: sed-opal: check size of shadow mbr

2019-01-19 Thread Scott Bauer
On Thu, Jan 17, 2019 at 09:31:53PM +, David Kozub wrote:
> From: Jonas Rabenstein 
> 
> Check whether the shadow mbr does fit in the provided space on the
> target. Also a proper firmware should handle this case and return an
> error we may prevent problems or even damage with crappy firmwares.
> + len = response_get_u64(>parsed, 4);
> + if (shadow->offset + shadow->size > len) {
> + pr_debug("MBR: does not fit in shadow (%llu vs. %llu)\n",
> +  shadow->offset + shadow->size, len);
> + return -ENOSPC;
> + }

Can we please change this check to the following:

if (shadow->size > len || shadow->offset > len - shadow->size)

Thanks

> -- 
> 2.20.1
> 
> 


Re: [PATCH v2 11/16] block: sed-opal: ioctl for writing to shadow mbr

2019-01-19 Thread Scott Bauer
On Thu, Jan 17, 2019 at 09:31:51PM +, David Kozub wrote:
 
> +static int write_shadow_mbr(struct opal_dev *dev, void *data)
> +{
> + struct opal_shadow_mbr *shadow = data;
> + const u8 __user *src;
> + u8 *dst;
> + size_t off = 0;
> + u64 len;
> + int err = 0;
> +
> + /* do the actual transmission(s) */
> + src = (u8 *) shadow->data;
> + while (off < shadow->size) {
> + err = cmd_start(dev, opaluid[OPAL_MBR], opalmethod[OPAL_SET]);
> + add_token_u8(, dev, OPAL_STARTNAME);
> + add_token_u8(, dev, OPAL_WHERE);
> + add_token_u64(, dev, shadow->offset + off);
> + add_token_u8(, dev, OPAL_ENDNAME);
> +
> + add_token_u8(, dev, OPAL_STARTNAME);
> + add_token_u8(, dev, OPAL_VALUES);
> +
> + /*
> +  * The bytestring header is either 1 or 2 bytes, so assume 2.
> +  * There also needs to be enough space to accommodate the
> +  * trailing OPAL_ENDNAME (1 byte) and tokens added by
> +  * cmd_finalize.
> +  */
> + len = min(remaining_size(dev) - (2+1+CMD_FINALIZE_BYTES_NEEDED),
> +   (size_t)(shadow->size - off));

What if remaining_size(dev) <  2 + 1 + CMD_FINALIZE_BYTES_NEEDED? If that's 
possible we
get min(UINT_MAX(ish) , some size larger than our remaining buffer) and that's 
not good.


> + pr_debug("MBR: write bytes %zu+%llu/%llu\n",
> +  off, len, shadow->size);
> +
> + dst = add_bytestring_header(, dev, len);
> + if (!dst)
> + break;
> + if (copy_from_user(dst, src + off, len))
> + err = -EFAULT;
> -- 
> 2.20.1
> 
> 


Re: [PATCH v2 09/16] block: sed-opal: split generation of bytestring header and content

2019-01-19 Thread Scott Bauer
On Thu, Jan 17, 2019 at 09:31:49PM +, David Kozub wrote:
> -
> - memcpy(>cmd[cmd->pos], bytestring, len);
> + start = >cmd[cmd->pos];
>   cmd->pos += len;

This is somewhat pendatic, but it helps me review patches if we keep things 
together.
Since we're no longer doing the memcpy in this function, can we please move the 
cmd->pos += len
to the location where we actually do the memcpy.

I'm willing to be told to get over it if other reviewers don't like that 
approach, but if no one cares
please move it.


> + return start;
> +}
>  
> +static void add_token_bytestring(int *err, struct opal_dev *cmd,
> +  const u8 *bytestring, size_t len)
> +{
> + u8 *start;
> +
> + start = add_bytestring_header(err, cmd, len);
> + if (!start)
> + return;
> + memcpy(start, bytestring, len);

Do the above here instead.

> 2.20.1
> 
> 


Re: [PATCH 00/16] block: sed-opal: support shadow MBR done flag and write

2019-01-17 Thread Scott Bauer
On Wed, Jan 16, 2019 at 09:56:51PM +, David Kozub wrote:
>   block: sed-opal: rename next to execute_steps

I don't see this change anywhere on the lists. I have 0->15, but am missing 
16/16.

Please resend v2 with the changes Christoph requested, so it's easier to review.

Lastly, please CC me on the changes, and I will review them:
sba...@plzdonthack.me

I don't work at Intel anymore -- maintainers never got updated.


> Jonas Rabenstein (8):
>   block: sed-opal: use correct macro for method length
>   block: sed-opal: unify space check in add_token_*
>   block: sed-opal: print failed function address
>   block: sed-opal: split generation of bytestring header and content
>   block: sed-opal: add ioctl for done-mark of shadow mbr
>   block: sed-opal: ioctl for writing to shadow mbr
>   block: sed-opal: unify retrieval of table columns
>   block: sed-opal: check size of shadow mbr
> 
>  block/opal_proto.h|  18 +
>  block/sed-opal.c  | 843 +-
>  include/linux/sed-opal.h  |   2 +
>  include/uapi/linux/sed-opal.h |   9 +
>  4 files changed, 449 insertions(+), 423 deletions(-)
> 
> -- 
> 2.20.1


Re: [PATCH v2 08/11] block: sed-opal: ioctl for writing to shadow mbr

2018-04-05 Thread Scott Bauer
On Thu, Mar 29, 2018 at 08:27:30PM +0200, catch...@ghostav.ddnss.de wrote:
> On Thu, Mar 29, 2018 at 11:16:42AM -0600, Scott Bauer wrote:
> > Yeah, having to autheticate to write the MBR is a real bummer. Theoretically
> > you could dd a the pw struct + the shador MBR into sysfs. But that's
> > a pretty disgusting hack just to use sysfs. The other method I thought of
> > was to authenticate via ioctl then write via sysfs. We already save the PW
> > in-kernel for unlocks, so perhaps we can re-use the save-for-unlock to
> > do shadow MBR writes via sysfs?
> > 
> > Re-using an already exposed ioctl for another purpose seems somewhat 
> > dangerous?
> > In the sense that what if the user wants to write the smbr but doesn't want 
> > to
> > unlock on suspends, or does not want their PW hanging around in the kernel.
> Well. If we would force the user to a two-step interaction, why not stay
> completely in sysfs? So instead of using the save-for-unlock ioctl, we
> could export each security provider( (AdminSP, UserSPX, ...) as a sysfs

The Problem with this is Single user mode, where you can assign users to 
locking ranges.
There would have to be a lot of dynamic changes of sysfs as users get 
added/removed,
or added to LRs etc. It seems like we're trying mold something that already
works fine into something that doesnt really work as we dig into the details. 



> directory with appropriate files (e.g. mbr for AdminSP) as well as a
> 'unlock' file to store a users password for the specific locking space
> and a 'lock' file to remove the stored password on write to it.
> Of course, while this will prevent from reuse of the ioctl and
> stays within the same configuration method, the PW will still hang
> around in the kernel between 'lock' and 'unlock'.
> 
> Another idea I just came across while writing this down:
> Instead of storing/releasing the password permanently with the 'unlock' and
> 'lock' files, those may be used to start/stop an authenticated session.
> To make it more clear what I mean: Each ioctl that requires
> authentication has a similar pattern:
> discovery0, start_session, , end_session
> Instead of having the combination determined by the ioctl, the 'unlock'
> would do discovery0 and start_session while the 'lock' would do the
> end_session. The user is free to issue further commands with the
> appropriate write/reads to other files of the sysfs-directory.
> While this removes the requirement to store the key within kernel space,
> the open session handle may be used from everybody with permissions for
> read/write access to the sysfs-directory files. So this is not optimal
> as not only the user who provided the password will finally be able to use
> it.

I generally like the idea of being able to run your abritrary opal commands, 
but:

that's probably not going to work for the final reason you outlined.
Even though it's root only access(to sysfs) we're breaking the authentication
lower down by essentially allowing any opal command to be ran if you've somehow
become root.

The other issue with this is the session time out in opal. When we dispatch the 
commands
in-kernel we're hammering them out 1-by-1. If the user needs to do an 
activatelsp,
setuplr, etc. They do that with a new session.

If someone starts the session and it times out it may be hard to figure out how 
to not
get an SP_BUSY back from the controller. I've in the past just had to wipe my 
damn
fw to get out of SP_BUSYs, but that could be due to the early implementations I 
was
dealing with.



> I already did some basic work to split of the session-information from
> the opal_dev struct (initially to reduce the memory-footprint of devices with
> currently no active opal-interaction). So I think, I could get a
> proof-of-concept of this approach within the next one or two weeks if
> there are no objections to the base idea.

Sorry to ocme back a week later, but if you do have anything it would be at 
least
interesting to see. I would still prefer the ioctl route, but will review and 
test
any implementation people deem acceptable. 


Re: [PATCH v2 08/11] block: sed-opal: ioctl for writing to shadow mbr

2018-04-05 Thread Scott Bauer
On Thu, Mar 29, 2018 at 08:27:30PM +0200, catch...@ghostav.ddnss.de wrote:
> On Thu, Mar 29, 2018 at 11:16:42AM -0600, Scott Bauer wrote:
> > Yeah, having to autheticate to write the MBR is a real bummer. Theoretically
> > you could dd a the pw struct + the shador MBR into sysfs. But that's
> > a pretty disgusting hack just to use sysfs. The other method I thought of
> > was to authenticate via ioctl then write via sysfs. We already save the PW
> > in-kernel for unlocks, so perhaps we can re-use the save-for-unlock to
> > do shadow MBR writes via sysfs?
> > 
> > Re-using an already exposed ioctl for another purpose seems somewhat 
> > dangerous?
> > In the sense that what if the user wants to write the smbr but doesn't want 
> > to
> > unlock on suspends, or does not want their PW hanging around in the kernel.
> Well. If we would force the user to a two-step interaction, why not stay
> completely in sysfs? So instead of using the save-for-unlock ioctl, we
> could export each security provider( (AdminSP, UserSPX, ...) as a sysfs

The Problem with this is Single user mode, where you can assign users to 
locking ranges.
There would have to be a lot of dynamic changes of sysfs as users get 
added/removed,
or added to LRs etc. It seems like we're trying mold something that already
works fine into something that doesnt really work as we dig into the details. 



> directory with appropriate files (e.g. mbr for AdminSP) as well as a
> 'unlock' file to store a users password for the specific locking space
> and a 'lock' file to remove the stored password on write to it.
> Of course, while this will prevent from reuse of the ioctl and
> stays within the same configuration method, the PW will still hang
> around in the kernel between 'lock' and 'unlock'.
> 
> Another idea I just came across while writing this down:
> Instead of storing/releasing the password permanently with the 'unlock' and
> 'lock' files, those may be used to start/stop an authenticated session.
> To make it more clear what I mean: Each ioctl that requires
> authentication has a similar pattern:
> discovery0, start_session, , end_session
> Instead of having the combination determined by the ioctl, the 'unlock'
> would do discovery0 and start_session while the 'lock' would do the
> end_session. The user is free to issue further commands with the
> appropriate write/reads to other files of the sysfs-directory.
> While this removes the requirement to store the key within kernel space,
> the open session handle may be used from everybody with permissions for
> read/write access to the sysfs-directory files. So this is not optimal
> as not only the user who provided the password will finally be able to use
> it.

I generally like the idea of being able to run your abritrary opal commands, 
but:

that's probably not going to work for the final reason you outlined.
Even though it's root only access(to sysfs) we're breaking the authentication
lower down by essentially allowing any opal command to be ran if you've somehow
become root.

The other issue with this is the session time out in opal. When we dispatch the 
commands
in-kernel we're hammering them out 1-by-1. If the user needs to do an 
activatelsp,
setuplr, etc. They do that with a new session.

If someone starts the session and it times out it may be hard to figure out how 
to not
get an SP_BUSY back from the controller. I've in the past just had to wipe my 
damn
fw to get out of SP_BUSYs, but that could be due to the early implementations I 
was
dealing with.



> I already did some basic work to split of the session-information from
> the opal_dev struct (initially to reduce the memory-footprint of devices with
> currently no active opal-interaction). So I think, I could get a
> proof-of-concept of this approach within the next one or two weeks if
> there are no objections to the base idea.

Sorry to ocme back a week later, but if you do have anything it would be at 
least
interesting to see. I would still prefer the ioctl route, but will review and 
test
any implementation people deem acceptable. 


Re: [PATCH v2 08/11] block: sed-opal: ioctl for writing to shadow mbr

2018-03-29 Thread Scott Bauer
On Thu, Mar 29, 2018 at 07:30:02PM +0200, Jonas Rabenstein wrote:
> Hi,
> On Wed, Mar 21, 2018 at 02:43:21AM +0100, Jonas Rabenstein wrote:
> > On Tue, Mar 20, 2018 at 04:09:08PM -0600, Scott Bauer wrote:
> > > On Tue, Mar 20, 2018 at 10:36:04AM +0100, Jonas Rabenstein wrote:
> > > > On Mon, Mar 19, 2018 at 08:52:24PM +0100, Christoph Hellwig wrote:
> > > > > On Mon, Mar 19, 2018 at 07:36:50PM +0100, Jonas Rabenstein wrote:
> > > > > I hate doing this as an ioctls.  Can we make this a sysfs binary file
> > > > > so that people can use dd or cat to write the shadow mbr?
> > > > I already thought about providing a sysfs interface for all that instead
> > > > of using ioctls. But as I am pretty new to kernel programming I do not
> > > > have all the required insight. Especially, as writing the mbr requires
> > > > the sed-opal password I am unsure how a clean sysfs interface to provide
> > > > the password together with a simple dd would look like.
> Just wanted to ask, how to proceed with those patches/what I should do.
> Using sysfs instead of an ioctl is probably easier to use from userspace
> _if_ there is a good way to provide the password - which I do not know
> of :(
> If nobody else could think of a solution, shall writes to the shadow mbr
> remain unsupported?
> 
> I'ld really appreciate feedback and possible solutions,
>  Jonas

Yeah, having to autheticate to write the MBR is a real bummer. Theoretically
you could dd a the pw struct + the shador MBR into sysfs. But that's
a pretty disgusting hack just to use sysfs. The other method I thought of
was to authenticate via ioctl then write via sysfs. We already save the PW
in-kernel for unlocks, so perhaps we can re-use the save-for-unlock to
do shadow MBR writes via sysfs?

Re-using an already exposed ioctl for another purpose seems somewhat dangerous?
In the sense that what if the user wants to write the smbr but doesn't want to
unlock on suspends, or does not want their PW hanging around in the kernel.

Overall I think the ioctl is still the best path forward due to the 
authentication
problem. But am still willing to hear others opinions if they do have an idea.

I can say yes to the Ioctl, but we really need Jens and Christophs Okay on it.

I have some free time tomorrow to work on this, so let me goof with it tomorrow 
and
over the weekend and I'll see if there is a sane way to get sysfs to work.



Re: [PATCH v2 08/11] block: sed-opal: ioctl for writing to shadow mbr

2018-03-29 Thread Scott Bauer
On Thu, Mar 29, 2018 at 07:30:02PM +0200, Jonas Rabenstein wrote:
> Hi,
> On Wed, Mar 21, 2018 at 02:43:21AM +0100, Jonas Rabenstein wrote:
> > On Tue, Mar 20, 2018 at 04:09:08PM -0600, Scott Bauer wrote:
> > > On Tue, Mar 20, 2018 at 10:36:04AM +0100, Jonas Rabenstein wrote:
> > > > On Mon, Mar 19, 2018 at 08:52:24PM +0100, Christoph Hellwig wrote:
> > > > > On Mon, Mar 19, 2018 at 07:36:50PM +0100, Jonas Rabenstein wrote:
> > > > > I hate doing this as an ioctls.  Can we make this a sysfs binary file
> > > > > so that people can use dd or cat to write the shadow mbr?
> > > > I already thought about providing a sysfs interface for all that instead
> > > > of using ioctls. But as I am pretty new to kernel programming I do not
> > > > have all the required insight. Especially, as writing the mbr requires
> > > > the sed-opal password I am unsure how a clean sysfs interface to provide
> > > > the password together with a simple dd would look like.
> Just wanted to ask, how to proceed with those patches/what I should do.
> Using sysfs instead of an ioctl is probably easier to use from userspace
> _if_ there is a good way to provide the password - which I do not know
> of :(
> If nobody else could think of a solution, shall writes to the shadow mbr
> remain unsupported?
> 
> I'ld really appreciate feedback and possible solutions,
>  Jonas

Yeah, having to autheticate to write the MBR is a real bummer. Theoretically
you could dd a the pw struct + the shador MBR into sysfs. But that's
a pretty disgusting hack just to use sysfs. The other method I thought of
was to authenticate via ioctl then write via sysfs. We already save the PW
in-kernel for unlocks, so perhaps we can re-use the save-for-unlock to
do shadow MBR writes via sysfs?

Re-using an already exposed ioctl for another purpose seems somewhat dangerous?
In the sense that what if the user wants to write the smbr but doesn't want to
unlock on suspends, or does not want their PW hanging around in the kernel.

Overall I think the ioctl is still the best path forward due to the 
authentication
problem. But am still willing to hear others opinions if they do have an idea.

I can say yes to the Ioctl, but we really need Jens and Christophs Okay on it.

I have some free time tomorrow to work on this, so let me goof with it tomorrow 
and
over the weekend and I'll see if there is a sane way to get sysfs to work.



Re: [PATCH v2 08/11] block: sed-opal: ioctl for writing to shadow mbr

2018-03-20 Thread Scott Bauer
On Tue, Mar 20, 2018 at 10:36:04AM +0100, Jonas Rabenstein wrote:
> On Mon, Mar 19, 2018 at 08:52:24PM +0100, Christoph Hellwig wrote:
> > On Mon, Mar 19, 2018 at 07:36:50PM +0100, Jonas Rabenstein wrote:
> > > Allow modification of the shadow mbr. If the shadow mbr is not marked as
> > > done, this data will be presented read only as the device content. Only
> > > after marking the shadow mbr as done and unlocking a locking range the
> > > actual content is accessible.
> > 
> > I hate doing this as an ioctls.  Can we make this a sysfs binary file
> > so that people can use dd or cat to write the shadow mbr?
> I already thought about providing a sysfs interface for all that instead
> of using ioctls. But as I am pretty new to kernel programming I do not
> have all the required insight. Especially, as writing the mbr requires
> the sed-opal password I am unsure how a clean sysfs interface to provide
> the password together with a simple dd would look like.
> Moreover I already have a patch that changes the 'void *data' argument
> to setup_opal_dev to a kobject pointer. As far as I know, this is the
> first step to get into the sysfs hierarchy. But as I do not have access
> to an NVMe drive and have no idea about its implementation, this change
> works only for the scsi side.

Post what you have as an RFC (review for comment) and I will test for the NVMe
side, and or start a port for NVMe. It doesn't have to be perfect since you're
sending it out as RFC. It's just a base for us to test/look at to see if we
still like the sysfs way.





Re: [PATCH v2 08/11] block: sed-opal: ioctl for writing to shadow mbr

2018-03-20 Thread Scott Bauer
On Tue, Mar 20, 2018 at 10:36:04AM +0100, Jonas Rabenstein wrote:
> On Mon, Mar 19, 2018 at 08:52:24PM +0100, Christoph Hellwig wrote:
> > On Mon, Mar 19, 2018 at 07:36:50PM +0100, Jonas Rabenstein wrote:
> > > Allow modification of the shadow mbr. If the shadow mbr is not marked as
> > > done, this data will be presented read only as the device content. Only
> > > after marking the shadow mbr as done and unlocking a locking range the
> > > actual content is accessible.
> > 
> > I hate doing this as an ioctls.  Can we make this a sysfs binary file
> > so that people can use dd or cat to write the shadow mbr?
> I already thought about providing a sysfs interface for all that instead
> of using ioctls. But as I am pretty new to kernel programming I do not
> have all the required insight. Especially, as writing the mbr requires
> the sed-opal password I am unsure how a clean sysfs interface to provide
> the password together with a simple dd would look like.
> Moreover I already have a patch that changes the 'void *data' argument
> to setup_opal_dev to a kobject pointer. As far as I know, this is the
> first step to get into the sysfs hierarchy. But as I do not have access
> to an NVMe drive and have no idea about its implementation, this change
> works only for the scsi side.

Post what you have as an RFC (review for comment) and I will test for the NVMe
side, and or start a port for NVMe. It doesn't have to be perfect since you're
sending it out as RFC. It's just a base for us to test/look at to see if we
still like the sysfs way.





Re: [PATCH v2 06/11] block: sed-opal: split generation of bytestring header and content

2018-03-19 Thread Scott Bauer
On Mon, Mar 19, 2018 at 08:59:45PM +0100, Christoph Hellwig wrote:
> > +static u8 *add_bytestring_header(int *err, struct opal_dev *cmd, size_t 
> > len)
> >  {
> > size_t header_len = 1;
> > bool is_short_atom = true;
> > -
> > -   if (*err)
> > -   return;
> > +   char *start;
> 
> Shouldn't this also return early if we have a pending error?

It will short circuit and bail out via can_add failing. So even though
you have to go dig to see if the following functions handle the erorr
I think it's slightly cleaner to have a single if (*err) in the deeper 
functions.
This lest the error back out the call chain instead of having multiple if (*err)
checks earlier in the call chains.


Re: [PATCH v2 06/11] block: sed-opal: split generation of bytestring header and content

2018-03-19 Thread Scott Bauer
On Mon, Mar 19, 2018 at 08:59:45PM +0100, Christoph Hellwig wrote:
> > +static u8 *add_bytestring_header(int *err, struct opal_dev *cmd, size_t 
> > len)
> >  {
> > size_t header_len = 1;
> > bool is_short_atom = true;
> > -
> > -   if (*err)
> > -   return;
> > +   char *start;
> 
> Shouldn't this also return early if we have a pending error?

It will short circuit and bail out via can_add failing. So even though
you have to go dig to see if the following functions handle the erorr
I think it's slightly cleaner to have a single if (*err) in the deeper 
functions.
This lest the error back out the call chain instead of having multiple if (*err)
checks earlier in the call chains.


Re: [PATCH v2 00/11] block: sed-opal support write to shadow mbr

2018-03-19 Thread Scott Bauer
On Mon, Mar 19, 2018 at 08:53:35PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 19, 2018 at 07:36:42PM +0100, Jonas Rabenstein wrote:
> > Hi,
> > I was advised to resend the patchset as a v2 where all the patches are
> > in a flat hierarchy. So here is a complete set which hopefully pleases
> > all requirements.
> > As the previous fixes have by now all landed into linux-next, no
> > additional patches are required for testing.
> 
> Btw, what userspace do you use for this?  The only one I know so far
> is Scotts sed-opal-temp, which really should grow a more permanent
> name / location.

He has a forked verison here:
https://github.com/ghostav/sed-opal-temp

He sent out a v1, obviously, but I had him clean up/resubmit and CC you. So you 
could
chime in on the new ioctl.

v1:
https://marc.info/?l=linux-block=152094656909515=4


As for the moving of sed-opal-temp I am more than willing to place this else 
where. I just
don't have a location other than github to place it. If people have suggestions 
on where to store it
that *isnt* on my personal github I will do that. And give access to the 
necessary maintainers.



Re: [PATCH v2 00/11] block: sed-opal support write to shadow mbr

2018-03-19 Thread Scott Bauer
On Mon, Mar 19, 2018 at 08:53:35PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 19, 2018 at 07:36:42PM +0100, Jonas Rabenstein wrote:
> > Hi,
> > I was advised to resend the patchset as a v2 where all the patches are
> > in a flat hierarchy. So here is a complete set which hopefully pleases
> > all requirements.
> > As the previous fixes have by now all landed into linux-next, no
> > additional patches are required for testing.
> 
> Btw, what userspace do you use for this?  The only one I know so far
> is Scotts sed-opal-temp, which really should grow a more permanent
> name / location.

He has a forked verison here:
https://github.com/ghostav/sed-opal-temp

He sent out a v1, obviously, but I had him clean up/resubmit and CC you. So you 
could
chime in on the new ioctl.

v1:
https://marc.info/?l=linux-block=152094656909515=4


As for the moving of sed-opal-temp I am more than willing to place this else 
where. I just
don't have a location other than github to place it. If people have suggestions 
on where to store it
that *isnt* on my personal github I will do that. And give access to the 
necessary maintainers.



Re: [PATCH v2] block: sed-opal: fix u64 short atom length

2018-03-16 Thread Scott Bauer
On Wed, Mar 07, 2018 at 05:55:56PM +0100, Jonas Rabenstein wrote:
> The length must be given as bytes and not as 4 bit tuples.
> 
> Signed-off-by: Jonas Rabenstein <jonas.rabenst...@studium.uni-erlangen.de>
> ---
> v2:
>   - use fls64
>   - shorten loop body
> ---
>  block/sed-opal.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)

Reviewed-by: Scott Bauer <scott.ba...@intel.com>
Tested-by: Scott Bauer <scott.ba...@intel.com>

Hi Jens,

When you get time can you apply this if you have no objections?

Thanks


Re: [PATCH v2] block: sed-opal: fix u64 short atom length

2018-03-16 Thread Scott Bauer
On Wed, Mar 07, 2018 at 05:55:56PM +0100, Jonas Rabenstein wrote:
> The length must be given as bytes and not as 4 bit tuples.
> 
> Signed-off-by: Jonas Rabenstein 
> ---
> v2:
>   - use fls64
>   - shorten loop body
> ---
>  block/sed-opal.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)

Reviewed-by: Scott Bauer 
Tested-by: Scott Bauer 

Hi Jens,

When you get time can you apply this if you have no objections?

Thanks


Re: [PATCH 0/8] block: sed-opal: support write to shadow mbr

2018-03-13 Thread Scott Bauer
On Tue, Mar 13, 2018 at 02:08:53PM +0100, Jonas Rabenstein wrote:
> Hi,
> this patchset adds support to write data into the shadow mbr of sed-opal
> enabled devices. They apply cleanly on today next-tree (next-20180313)
> and requires the u64 short atom length fix from [0] as that is still
> missing in that tree. As I only can test on my only sed-opal enabled
> Samsung 850 Evo on amd64, tests on other hardware would be appreciated.

Generally the series looks fine, there are a few nits I've sent out.
I will test and get back to you in the next day or two. With any further
comments.


Re: [PATCH 0/8] block: sed-opal: support write to shadow mbr

2018-03-13 Thread Scott Bauer
On Tue, Mar 13, 2018 at 02:08:53PM +0100, Jonas Rabenstein wrote:
> Hi,
> this patchset adds support to write data into the shadow mbr of sed-opal
> enabled devices. They apply cleanly on today next-tree (next-20180313)
> and requires the u64 short atom length fix from [0] as that is still
> missing in that tree. As I only can test on my only sed-opal enabled
> Samsung 850 Evo on amd64, tests on other hardware would be appreciated.

Generally the series looks fine, there are a few nits I've sent out.
I will test and get back to you in the next day or two. With any further
comments.


Re: [PATCH 8/8] block: sed-opal: ioctl for writing to shadow mbr

2018-03-13 Thread Scott Bauer
On Tue, Mar 13, 2018 at 02:09:01PM +0100, Jonas Rabenstein wrote:
> Allow modification of the shadow mbr. If the shadow mbr is not marked as
> done, this data will be presented read only as the device content. Only
> after marking the shadow mbr as done and unlocking a locking range the
> actual content is accessible.
> 


> Signed-off-by: Jonas Rabenstein 
> +static int opal_write_shadow_mbr(struct opal_dev *dev,
> +  struct opal_shadow_mbr *info)
> +{
> + const struct opal_step mbr_steps[] = {
> + { opal_discovery0, },
> + { start_admin1LSP_opal_session, >key },
> + { write_shadow_mbr, info },
> + { end_opal_session, },
> + { NULL, }
> + };
> + int ret;
> +
> + if (info->size == 0)
> + return 0;

We need to bound this to some maximum. I assume we'll at some point come across 
a controller
with crappy firmware that wont check this against the MBR Table size and the 
user will either
brick their drive or overwrite their data.

We can get the size of the MBR Table it seems but I'm not sure how hard it is 
to pull that table yet.

TCG SAS 5.7.3.6:
The size of the MBR Table is retrievable from the "Table" table of the SP that 
incorporates the Locking Template.

As always the TCG spec is super helpful /s.

I will see how todo this and if it's worth it.


> diff --git a/include/uapi/linux/sed-opal.h b/include/uapi/linux/sed-opal.h
> index 0cb9890cdc04..c2669ebff681 100644
> --- a/include/uapi/linux/sed-opal.h
> +++ b/include/uapi/linux/sed-opal.h
> @@ -104,6 +104,13 @@ struct opal_mbr_data {
>   __u8 __align[7];
>  };
>  
> +struct opal_shadow_mbr {
> + struct opal_key key;
> + const __u8 *data;

 Please use a u64 here for the data and cast it to a pointer
 in the driver. We do this so we do not have to maintain a compat
 layer for 32 bit userland.



Re: [PATCH 8/8] block: sed-opal: ioctl for writing to shadow mbr

2018-03-13 Thread Scott Bauer
On Tue, Mar 13, 2018 at 02:09:01PM +0100, Jonas Rabenstein wrote:
> Allow modification of the shadow mbr. If the shadow mbr is not marked as
> done, this data will be presented read only as the device content. Only
> after marking the shadow mbr as done and unlocking a locking range the
> actual content is accessible.
> 


> Signed-off-by: Jonas Rabenstein 
> +static int opal_write_shadow_mbr(struct opal_dev *dev,
> +  struct opal_shadow_mbr *info)
> +{
> + const struct opal_step mbr_steps[] = {
> + { opal_discovery0, },
> + { start_admin1LSP_opal_session, >key },
> + { write_shadow_mbr, info },
> + { end_opal_session, },
> + { NULL, }
> + };
> + int ret;
> +
> + if (info->size == 0)
> + return 0;

We need to bound this to some maximum. I assume we'll at some point come across 
a controller
with crappy firmware that wont check this against the MBR Table size and the 
user will either
brick their drive or overwrite their data.

We can get the size of the MBR Table it seems but I'm not sure how hard it is 
to pull that table yet.

TCG SAS 5.7.3.6:
The size of the MBR Table is retrievable from the "Table" table of the SP that 
incorporates the Locking Template.

As always the TCG spec is super helpful /s.

I will see how todo this and if it's worth it.


> diff --git a/include/uapi/linux/sed-opal.h b/include/uapi/linux/sed-opal.h
> index 0cb9890cdc04..c2669ebff681 100644
> --- a/include/uapi/linux/sed-opal.h
> +++ b/include/uapi/linux/sed-opal.h
> @@ -104,6 +104,13 @@ struct opal_mbr_data {
>   __u8 __align[7];
>  };
>  
> +struct opal_shadow_mbr {
> + struct opal_key key;
> + const __u8 *data;

 Please use a u64 here for the data and cast it to a pointer
 in the driver. We do this so we do not have to maintain a compat
 layer for 32 bit userland.



Re: [PATCH 3/8] block: sed-opal: unify cmd start and finalize

2018-03-13 Thread Scott Bauer
On Tue, Mar 13, 2018 at 02:08:56PM +0100, Jonas Rabenstein wrote:
> Every step starts with resetting the cmd buffer as well as the comid and
> constructs the appropriate OPAL_CALL command. Consequently, those
> actions may be combined into one generic function.
> 
> Signed-off-by: Jonas Rabenstein 
> ---
>  block/sed-opal.c | 243 
> +++
>  1 file changed, 63 insertions(+), 180 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index a228a13f0a08..22dbea7cf4d1 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -659,6 +659,7 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, 
> u32 tsn)
>   struct opal_header *hdr;
>   int err = 0;
>  
> + add_token_u8(, cmd, OPAL_ENDLIST);
>   add_token_u8(, cmd, OPAL_ENDOFDATA);
>   add_token_u8(, cmd, OPAL_STARTLIST);
>   add_token_u8(, cmd, 0);
> @@ -1001,6 +1002,21 @@ static void clear_opal_cmd(struct opal_dev *dev)
>   memset(dev->cmd, 0, IO_BUFFER_LENGTH);
>  }
>  
> +static int start_opal_cmd(struct opal_dev *dev, const u8 *uid, const u8 
> *method)
> +{
> + int err = 0;
> +
> + clear_opal_cmd(dev);
> + set_comid(dev, dev->comid);
> +
> + add_token_u8(, dev, OPAL_CALL);
> + add_token_bytestring(, dev, uid, OPAL_UID_LENGTH);
> + add_token_bytestring(, dev, method, OPAL_METHOD_LENGTH);
> + add_token_u8(, dev, OPAL_STARTLIST);

Can you resend this patch (in a bit I'm revewing the rest) with a comment here 
and
in cmd_finalize explaining that the start_list here gets terminiated by the new
opal_endlist in cmd_finalize.

I'm not a huge fan of having the start/list and end/list in separate functions
since those are used to specify a parameter list for a opal method. I can get 
over it
since it cleans the code up, as long as there are comments on both sides 
reminding me
what they're opening/closing off.


Re: [PATCH 3/8] block: sed-opal: unify cmd start and finalize

2018-03-13 Thread Scott Bauer
On Tue, Mar 13, 2018 at 02:08:56PM +0100, Jonas Rabenstein wrote:
> Every step starts with resetting the cmd buffer as well as the comid and
> constructs the appropriate OPAL_CALL command. Consequently, those
> actions may be combined into one generic function.
> 
> Signed-off-by: Jonas Rabenstein 
> ---
>  block/sed-opal.c | 243 
> +++
>  1 file changed, 63 insertions(+), 180 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index a228a13f0a08..22dbea7cf4d1 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -659,6 +659,7 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, 
> u32 tsn)
>   struct opal_header *hdr;
>   int err = 0;
>  
> + add_token_u8(, cmd, OPAL_ENDLIST);
>   add_token_u8(, cmd, OPAL_ENDOFDATA);
>   add_token_u8(, cmd, OPAL_STARTLIST);
>   add_token_u8(, cmd, 0);
> @@ -1001,6 +1002,21 @@ static void clear_opal_cmd(struct opal_dev *dev)
>   memset(dev->cmd, 0, IO_BUFFER_LENGTH);
>  }
>  
> +static int start_opal_cmd(struct opal_dev *dev, const u8 *uid, const u8 
> *method)
> +{
> + int err = 0;
> +
> + clear_opal_cmd(dev);
> + set_comid(dev, dev->comid);
> +
> + add_token_u8(, dev, OPAL_CALL);
> + add_token_bytestring(, dev, uid, OPAL_UID_LENGTH);
> + add_token_bytestring(, dev, method, OPAL_METHOD_LENGTH);
> + add_token_u8(, dev, OPAL_STARTLIST);

Can you resend this patch (in a bit I'm revewing the rest) with a comment here 
and
in cmd_finalize explaining that the start_list here gets terminiated by the new
opal_endlist in cmd_finalize.

I'm not a huge fan of having the start/list and end/list in separate functions
since those are used to specify a parameter list for a opal method. I can get 
over it
since it cleans the code up, as long as there are comments on both sides 
reminding me
what they're opening/closing off.


Re: [PATCH v2] block: sed-opal: fix u64 short atom length

2018-03-07 Thread Scott Bauer
On Wed, Mar 07, 2018 at 05:55:56PM +0100, Jonas Rabenstein wrote:
> The length must be given as bytes and not as 4 bit tuples.
> 
> Signed-off-by: Jonas Rabenstein <jonas.rabenst...@studium.uni-erlangen.de>
> ---
> v2:
>   - use fls64
>   - shorten loop body
> ---
>  block/sed-opal.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 

Reviewed-by: Scott Bauer <scott.ba...@intel.com>

Your two patches should be sent to stable for 4.14. I can queue those up and do 
it,
or if you want to you can do it as well. Let me know what you prefer!



Re: [PATCH v2] block: sed-opal: fix u64 short atom length

2018-03-07 Thread Scott Bauer
On Wed, Mar 07, 2018 at 05:55:56PM +0100, Jonas Rabenstein wrote:
> The length must be given as bytes and not as 4 bit tuples.
> 
> Signed-off-by: Jonas Rabenstein 
> ---
> v2:
>   - use fls64
>   - shorten loop body
> ---
>  block/sed-opal.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 

Reviewed-by: Scott Bauer 

Your two patches should be sent to stable for 4.14. I can queue those up and do 
it,
or if you want to you can do it as well. Let me know what you prefer!



Re: [PATCH][RESEND] block: sed-opal: fix response string extraction

2018-03-07 Thread Scott Bauer
On Tue, Mar 06, 2018 at 04:23:24PM -0800, Derrick, Jonathan wrote:
> This looks correct.
> 
> Adding my Ack unless Scott has objections
> 
> Acked-by: Jonathan Derrick <jonathan.derr...@intel.com>


Reviewed-by: Scott Bauer <scott.ba...@intel.com>

Nice catch Jonas! Sorry you had to resend, my message filtering was a little
too agressive and this slipped through the cracks, that has been remediated.


> 
> On Thu, 2018-03-01 at 14:26 +0100, Jonas Rabenstein wrote:
> > Tokens are prefixed by a variable length of bytes. If a bytestring is
> > not stored in an tiny or short atom, we have to skip more than one
> > byte
> > in order to have the actual bytes not prefixed by the bytes
> > describing
> > the actual length of the string.
> > 
> > Signed-off-by: Jonas Rabenstein <jonas.rabenst...@studium.uni-erlange
> > n.de>
> > ---
> >  block/sed-opal.c | 26 +++---
> >  1 file changed, 23 insertions(+), 3 deletions(-)


Re: [PATCH][RESEND] block: sed-opal: fix response string extraction

2018-03-07 Thread Scott Bauer
On Tue, Mar 06, 2018 at 04:23:24PM -0800, Derrick, Jonathan wrote:
> This looks correct.
> 
> Adding my Ack unless Scott has objections
> 
> Acked-by: Jonathan Derrick 


Reviewed-by: Scott Bauer 

Nice catch Jonas! Sorry you had to resend, my message filtering was a little
too agressive and this slipped through the cracks, that has been remediated.


> 
> On Thu, 2018-03-01 at 14:26 +0100, Jonas Rabenstein wrote:
> > Tokens are prefixed by a variable length of bytes. If a bytestring is
> > not stored in an tiny or short atom, we have to skip more than one
> > byte
> > in order to have the actual bytes not prefixed by the bytes
> > describing
> > the actual length of the string.
> > 
> > Signed-off-by: Jonas Rabenstein  > n.de>
> > ---
> >  block/sed-opal.c | 26 +++---
> >  1 file changed, 23 insertions(+), 3 deletions(-)


Re: [PATCH v4 1/2] dm-unstripe: unstripe RAID 0/dm-striped device

2017-12-18 Thread Scott Bauer

> > +config DM_UN_STRIPE
> > +   tristate "Transpose IO to individual drives on a raid device"
> > +   depends on BLK_DEV_DM
> > +   ---help---
> > + Enable this feature if you with to unstripe I/O on a RAID 0
> > + device to the respective drive. If your hardware has physical
> > + RAID 0 this module can unstripe the I/O to respective sides.
> 
> What does "sides" mean above?

Should say, "members". There's also an alternative patch set out by Heinz
that I am currently testing/writing comments for. It looks like we'll use
his version after I submit my comments (IE take the stuff I put in my v4
to fix your comments).


Re: [PATCH v4 1/2] dm-unstripe: unstripe RAID 0/dm-striped device

2017-12-18 Thread Scott Bauer

> > +config DM_UN_STRIPE
> > +   tristate "Transpose IO to individual drives on a raid device"
> > +   depends on BLK_DEV_DM
> > +   ---help---
> > + Enable this feature if you with to unstripe I/O on a RAID 0
> > + device to the respective drive. If your hardware has physical
> > + RAID 0 this module can unstripe the I/O to respective sides.
> 
> What does "sides" mean above?

Should say, "members". There's also an alternative patch set out by Heinz
that I am currently testing/writing comments for. It looks like we'll use
his version after I submit my comments (IE take the stuff I put in my v4
to fix your comments).


[PATCH v4 1/2] dm-unstripe: unstripe RAID 0/dm-striped device

2017-12-18 Thread Scott Bauer
This device mapper module remaps and unstripes IO so it lands
solely on a single drive in a RAID 0/dm-stripe target.
In a 4 drive RAID 0 the mapper exposes 1/4th of the LBA range
as a virtual drive. Each IO to that virtual drive will land on
only one of the 4 drives, selected by the user.

Signed-off-by: Scott Bauer <scott.ba...@intel.com>
Acked-by: Keith Busch <keith.bu...@intel.com>
---
 drivers/md/Kconfig   |  10 +++
 drivers/md/Makefile  |   1 +
 drivers/md/dm-unstripe.c | 204 +++
 3 files changed, 215 insertions(+)
 create mode 100644 drivers/md/dm-unstripe.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 83b9362be09c..e1c48a7f98f1 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -269,6 +269,16 @@ config DM_BIO_PRISON
 
 source "drivers/md/persistent-data/Kconfig"
 
+config DM_UN_STRIPE
+   tristate "Transpose IO to individual drives on a raid device"
+   depends on BLK_DEV_DM
+   ---help---
+ Enable this feature if you with to unstripe I/O on a RAID 0
+ device to the respective drive. If your hardware has physical
+ RAID 0 this module can unstripe the I/O to respective sides.
+
+ If unsure say N.
+
 config DM_CRYPT
tristate "Crypt target support"
depends on BLK_DEV_DM
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f701bb211783..2cc380b71319 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_BCACHE)  += bcache/
 obj-$(CONFIG_BLK_DEV_MD)   += md-mod.o
 obj-$(CONFIG_BLK_DEV_DM)   += dm-mod.o
 obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
+obj-$(CONFIG_DM_UN_STRIPE)   += dm-unstripe.o
 obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
 obj-$(CONFIG_DM_BIO_PRISON)+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
diff --git a/drivers/md/dm-unstripe.c b/drivers/md/dm-unstripe.c
new file mode 100644
index ..086689fb11e3
--- /dev/null
+++ b/drivers/md/dm-unstripe.c
@@ -0,0 +1,204 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Authors:
+ *Scott  Bauer  <scott.ba...@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include "dm.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct unstripe {
+   struct dm_dev *ddisk;
+   sector_t chunk_sectors;
+   sector_t stripe_sectors;
+   u8 chunk_shift;
+   u8 cur_drive;
+};
+
+
+#define DM_MSG_PREFIX "dm-unstripe"
+static const char *parse_err = "Please provide the necessary information:"
+   "  "
+   " ";
+
+/*
+ * Argument layout:
+ *  
+ *  
+ */
+static int set_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+   struct block_device *bbdev;
+   struct unstripe *target;
+   unsigned int chunk_size;
+   u64 tot_sec, mod;
+   u8 cur_drive, tot_drives;
+   char dummy;
+   int ret;
+
+   if (argc != 4) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (sscanf(argv[1], "%hhu%c", _drive, ) != 1 ||
+   sscanf(argv[2], "%hhu%c", _drives, ) != 1 ||
+   sscanf(argv[3], "%u%c", _size, ) != 1) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (tot_drives == 0 || (cur_drive >= tot_drives && tot_drives > 1)) {
+   DMERR("Please provide a drive between [0,%hhu)", tot_drives);
+   return -EINVAL;
+   }
+
+   target = kzalloc(sizeof(*target), GFP_KERNEL);
+
+   if (!target) {
+   DMERR("Failed to allocate space for DM unstripe!");
+   return -ENOMEM;
+   }
+
+   ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
+   >ddisk);
+   if (ret) {
+   kfree(target);
+   DMERR("dm-unstripe dev lookup failure! for drive %s", argv[0]);
+   return ret;
+   }
+
+   bbdev = target->ddisk->bdev;
+
+   target->cur_drive = cur_drive;
+   if (chunk_size)
+   target->chunk_sectors = chunk_size;
+   else
+   target->chunk_sectors =
+   queue_max_hw_sectors(bdev_get_queue(bbdev));
+
+   target->stripe_sectors = (tot_drives - 1) * target->chunk_sectors;
+   target->chunk_shift = fls(targ

[PATCH v4 2/2] dm-unstripe: Add documentation for unstripe target

2017-12-18 Thread Scott Bauer
Signed-off-by: Scott Bauer <scott.ba...@intel.com>
---
 Documentation/device-mapper/dm-unstripe.txt | 130 
 1 file changed, 130 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-unstripe.txt

diff --git a/Documentation/device-mapper/dm-unstripe.txt 
b/Documentation/device-mapper/dm-unstripe.txt
new file mode 100644
index ..9c13e9316d52
--- /dev/null
+++ b/Documentation/device-mapper/dm-unstripe.txt
@@ -0,0 +1,130 @@
+Device-Mapper Unstripe
+=
+
+The device-mapper unstripe (dm-unstripe) target provides a transparent
+mechanism to unstripe a device-mapper "striped" target to access the
+underlying disks without having to touch the true backing block-device.
+It can also be used to unstripe a hardware RAID-0 to access backing disks
+as well.
+
+
+Parameters:
+  <# of drives> 
+
+
+
+   The block device you wish to unstripe.
+
+
+The physical drive you wish to expose via this "virtual" device
+   mapper target. This must be 0 indexed.
+
+<# of drives>
+The number of drives in the RAID 0.
+
+
+   The amount of 512B sectors in the chunk striping, or zero, if you
+   wish you use max_hw_sector_size.
+
+
+Why use this module?
+=
+
+An example of undoing an existing dm-stripe:
+
+This small bash script will setup 4 loop devices and use the existing
+dm-stripe target to combine the 4 devices into one. It then will use
+the unstripe target on the new combined stripe device to access the
+individual backing loop devices. We write data to the newly exposed
+unstriped devices and verify the data written matches the correct
+underlying device on the striped array.
+
+#!/bin/bash
+
+MEMBER_SIZE=$((128 * 1024 * 1024))
+NUM=4
+SEQ_END=$((${NUM}-1))
+CHUNK=256
+BS=4096
+
+RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
+DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
+COUNT=$((${MEMBER_SIZE} / ${BS}))
+
+for i in $(seq 0 ${SEQ_END}); do
+  dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
+  losetup /dev/loop${i} member-${i}
+  DM_PARMS+=" /dev/loop${i} 0"
+done
+
+echo $DM_PARMS | dmsetup create raid0
+for i in $(seq 0 ${SEQ_END}); do
+  echo "0 1 unstripe /dev/mapper/raid0 ${i} ${NUM} ${CHUNK}" | dmsetup 
create set-${i}
+done;
+
+for i in $(seq 0 ${SEQ_END}); do
+  dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} 
oflag=direct
+  diff /dev/mapper/set-${i} member-${i}
+done;
+
+for i in $(seq 0 ${SEQ_END}); do
+  dmsetup remove set-${i}
+done
+
+dmsetup remove raid0
+
+for i in $(seq 0 ${SEQ_END}); do
+  losetup -d /dev/loop${i}
+  rm -f member-${i}
+done
+
+==
+
+
+Another example:
+
+Intel NVMe drives contain two cores on the physical device.
+Each core of the drive has segregated access to its LBA range.
+The current LBA model has a RAID 0 128k chunk on each core, resulting
+in a 256k stripe across the two cores:
+
+   Core 0:Core 1:
+  ____
+  | LBA 512|| LBA 768|
+  | LBA 0  || LBA 256|
+  --   --
+
+The purpose of this unstriping is to provide better QoS in noisy
+neighbor environments. When two partitions are created on the
+aggregate drive without this unstriping, reads on one partition
+can affect writes on another partition. This is because the partitions
+are striped across the two cores. When we unstripe this hardware RAID 0
+and make partitions on each new exposed device the two partitions are now
+physically separated.
+
+With the module we were able to segregate a fio script that has read and
+write jobs that are independent of each other. Compared to when we run
+the test on a combined drive with partitions, we were able to get a 92%
+reduction read latency using this device mapper target.
+
+
+
+Example scripts:
+
+
+dmsetup create nvmset1 --table '0 1 unstripe /dev/nvme0n1 1 2 0'
+dmsetup create nvmset0 --table '0 1 unstripe /dev/nvme0n1 0 2 0'
+
+There will now be two mappers:
+/dev/mapper/nvmset1
+/dev/mapper/nvmset0
+
+that will expose core 0 and core 1.
+
+
+# In a dm-stripe with 4 drives of chunk size 128K:
+dmsetup create raid_disk0 --table '0 1 unstripe /dev/mapper/striped 0 4 256'
+dmsetup create raid_disk1 --table '0 1 unstripe /dev/mapper/striped 1 4 256'
+dmsetup create raid_disk2 --table '0 1 unstripe /dev/mapper/striped 2 4 256'
+dmsetup create raid_disk3 --table '0 1 unstripe /dev/mapper/striped 3 4 256'
+
-- 
2.11.0



[PATCH v4 1/2] dm-unstripe: unstripe RAID 0/dm-striped device

2017-12-18 Thread Scott Bauer
This device mapper module remaps and unstripes IO so it lands
solely on a single drive in a RAID 0/dm-stripe target.
In a 4 drive RAID 0 the mapper exposes 1/4th of the LBA range
as a virtual drive. Each IO to that virtual drive will land on
only one of the 4 drives, selected by the user.

Signed-off-by: Scott Bauer 
Acked-by: Keith Busch 
---
 drivers/md/Kconfig   |  10 +++
 drivers/md/Makefile  |   1 +
 drivers/md/dm-unstripe.c | 204 +++
 3 files changed, 215 insertions(+)
 create mode 100644 drivers/md/dm-unstripe.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 83b9362be09c..e1c48a7f98f1 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -269,6 +269,16 @@ config DM_BIO_PRISON
 
 source "drivers/md/persistent-data/Kconfig"
 
+config DM_UN_STRIPE
+   tristate "Transpose IO to individual drives on a raid device"
+   depends on BLK_DEV_DM
+   ---help---
+ Enable this feature if you with to unstripe I/O on a RAID 0
+ device to the respective drive. If your hardware has physical
+ RAID 0 this module can unstripe the I/O to respective sides.
+
+ If unsure say N.
+
 config DM_CRYPT
tristate "Crypt target support"
depends on BLK_DEV_DM
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f701bb211783..2cc380b71319 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_BCACHE)  += bcache/
 obj-$(CONFIG_BLK_DEV_MD)   += md-mod.o
 obj-$(CONFIG_BLK_DEV_DM)   += dm-mod.o
 obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
+obj-$(CONFIG_DM_UN_STRIPE)   += dm-unstripe.o
 obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
 obj-$(CONFIG_DM_BIO_PRISON)+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
diff --git a/drivers/md/dm-unstripe.c b/drivers/md/dm-unstripe.c
new file mode 100644
index ..086689fb11e3
--- /dev/null
+++ b/drivers/md/dm-unstripe.c
@@ -0,0 +1,204 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Authors:
+ *Scott  Bauer  
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include "dm.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct unstripe {
+   struct dm_dev *ddisk;
+   sector_t chunk_sectors;
+   sector_t stripe_sectors;
+   u8 chunk_shift;
+   u8 cur_drive;
+};
+
+
+#define DM_MSG_PREFIX "dm-unstripe"
+static const char *parse_err = "Please provide the necessary information:"
+   "  "
+   " ";
+
+/*
+ * Argument layout:
+ *  
+ *  
+ */
+static int set_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+   struct block_device *bbdev;
+   struct unstripe *target;
+   unsigned int chunk_size;
+   u64 tot_sec, mod;
+   u8 cur_drive, tot_drives;
+   char dummy;
+   int ret;
+
+   if (argc != 4) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (sscanf(argv[1], "%hhu%c", _drive, ) != 1 ||
+   sscanf(argv[2], "%hhu%c", _drives, ) != 1 ||
+   sscanf(argv[3], "%u%c", _size, ) != 1) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (tot_drives == 0 || (cur_drive >= tot_drives && tot_drives > 1)) {
+   DMERR("Please provide a drive between [0,%hhu)", tot_drives);
+   return -EINVAL;
+   }
+
+   target = kzalloc(sizeof(*target), GFP_KERNEL);
+
+   if (!target) {
+   DMERR("Failed to allocate space for DM unstripe!");
+   return -ENOMEM;
+   }
+
+   ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
+   >ddisk);
+   if (ret) {
+   kfree(target);
+   DMERR("dm-unstripe dev lookup failure! for drive %s", argv[0]);
+   return ret;
+   }
+
+   bbdev = target->ddisk->bdev;
+
+   target->cur_drive = cur_drive;
+   if (chunk_size)
+   target->chunk_sectors = chunk_size;
+   else
+   target->chunk_sectors =
+   queue_max_hw_sectors(bdev_get_queue(bbdev));
+
+   target->stripe_sectors = (tot_drives - 1) * target->chunk_sectors;
+   target->chunk_shift = fls(target->chunk_sectors) - 1;
+
+   ret = dm_set_target_max_io_len

[PATCH v4 2/2] dm-unstripe: Add documentation for unstripe target

2017-12-18 Thread Scott Bauer
Signed-off-by: Scott Bauer 
---
 Documentation/device-mapper/dm-unstripe.txt | 130 
 1 file changed, 130 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-unstripe.txt

diff --git a/Documentation/device-mapper/dm-unstripe.txt 
b/Documentation/device-mapper/dm-unstripe.txt
new file mode 100644
index ..9c13e9316d52
--- /dev/null
+++ b/Documentation/device-mapper/dm-unstripe.txt
@@ -0,0 +1,130 @@
+Device-Mapper Unstripe
+=
+
+The device-mapper unstripe (dm-unstripe) target provides a transparent
+mechanism to unstripe a device-mapper "striped" target to access the
+underlying disks without having to touch the true backing block-device.
+It can also be used to unstripe a hardware RAID-0 to access backing disks
+as well.
+
+
+Parameters:
+  <# of drives> 
+
+
+
+   The block device you wish to unstripe.
+
+
+The physical drive you wish to expose via this "virtual" device
+   mapper target. This must be 0 indexed.
+
+<# of drives>
+The number of drives in the RAID 0.
+
+
+   The amount of 512B sectors in the chunk striping, or zero, if you
+   wish you use max_hw_sector_size.
+
+
+Why use this module?
+=
+
+An example of undoing an existing dm-stripe:
+
+This small bash script will setup 4 loop devices and use the existing
+dm-stripe target to combine the 4 devices into one. It then will use
+the unstripe target on the new combined stripe device to access the
+individual backing loop devices. We write data to the newly exposed
+unstriped devices and verify the data written matches the correct
+underlying device on the striped array.
+
+#!/bin/bash
+
+MEMBER_SIZE=$((128 * 1024 * 1024))
+NUM=4
+SEQ_END=$((${NUM}-1))
+CHUNK=256
+BS=4096
+
+RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
+DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
+COUNT=$((${MEMBER_SIZE} / ${BS}))
+
+for i in $(seq 0 ${SEQ_END}); do
+  dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
+  losetup /dev/loop${i} member-${i}
+  DM_PARMS+=" /dev/loop${i} 0"
+done
+
+echo $DM_PARMS | dmsetup create raid0
+for i in $(seq 0 ${SEQ_END}); do
+  echo "0 1 unstripe /dev/mapper/raid0 ${i} ${NUM} ${CHUNK}" | dmsetup 
create set-${i}
+done;
+
+for i in $(seq 0 ${SEQ_END}); do
+  dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} 
oflag=direct
+  diff /dev/mapper/set-${i} member-${i}
+done;
+
+for i in $(seq 0 ${SEQ_END}); do
+  dmsetup remove set-${i}
+done
+
+dmsetup remove raid0
+
+for i in $(seq 0 ${SEQ_END}); do
+  losetup -d /dev/loop${i}
+  rm -f member-${i}
+done
+
+==
+
+
+Another example:
+
+Intel NVMe drives contain two cores on the physical device.
+Each core of the drive has segregated access to its LBA range.
+The current LBA model has a RAID 0 128k chunk on each core, resulting
+in a 256k stripe across the two cores:
+
+   Core 0:Core 1:
+  ____
+  | LBA 512|| LBA 768|
+  | LBA 0  || LBA 256|
+  --   --
+
+The purpose of this unstriping is to provide better QoS in noisy
+neighbor environments. When two partitions are created on the
+aggregate drive without this unstriping, reads on one partition
+can affect writes on another partition. This is because the partitions
+are striped across the two cores. When we unstripe this hardware RAID 0
+and make partitions on each new exposed device the two partitions are now
+physically separated.
+
+With the module we were able to segregate a fio script that has read and
+write jobs that are independent of each other. Compared to when we run
+the test on a combined drive with partitions, we were able to get a 92%
+reduction read latency using this device mapper target.
+
+
+
+Example scripts:
+
+
+dmsetup create nvmset1 --table '0 1 unstripe /dev/nvme0n1 1 2 0'
+dmsetup create nvmset0 --table '0 1 unstripe /dev/nvme0n1 0 2 0'
+
+There will now be two mappers:
+/dev/mapper/nvmset1
+/dev/mapper/nvmset0
+
+that will expose core 0 and core 1.
+
+
+# In a dm-stripe with 4 drives of chunk size 128K:
+dmsetup create raid_disk0 --table '0 1 unstripe /dev/mapper/striped 0 4 256'
+dmsetup create raid_disk1 --table '0 1 unstripe /dev/mapper/striped 1 4 256'
+dmsetup create raid_disk2 --table '0 1 unstripe /dev/mapper/striped 2 4 256'
+dmsetup create raid_disk3 --table '0 1 unstripe /dev/mapper/striped 3 4 256'
+
-- 
2.11.0



[PATCH v4 0/2] dm-unstripe

2017-12-18 Thread Scott Bauer

Change log:

v3->v4:
 Addressed comments from Randy. Modified documentation to be clearer,
 and fixed small off by one in bounds checking in constructor.

v2->v3:

1) Renamed variables in-code to reflect correct terminology with respect
   to the dm-stripe target.

2) Fixed a  __must_check missing check in the constructor.

3) Used correct types for working with sector remaping.

3) Fixed documentation to reflect the correct termonology, like the code.
   Added the test script Keith sent out in an email that makes a striped
   device and uses the un-stripe target to access the underlying loop
   devices.


[PATCH v4 0/2] dm-unstripe

2017-12-18 Thread Scott Bauer

Change log:

v3->v4:
 Addressed comments from Randy. Modified documentation to be clearer,
 and fixed small off by one in bounds checking in constructor.

v2->v3:

1) Renamed variables in-code to reflect correct terminology with respect
   to the dm-stripe target.

2) Fixed a  __must_check missing check in the constructor.

3) Used correct types for working with sector remaping.

3) Fixed documentation to reflect the correct termonology, like the code.
   Added the test script Keith sent out in an email that makes a striped
   device and uses the un-stripe target to access the underlying loop
   devices.


Re: [PATCH v3 1/2] dm-unstripe: unstripe RAID 0/dm-striped device

2017-12-15 Thread Scott Bauer
[snip]
On Wed, Dec 13, 2017 at 04:11:44PM -0800, Randy Dunlap wrote:
> 
> >=
>

Thanks, good catch.


> > +   tot_sec = i_size_read(bbdev->bd_inode) >> SECTOR_SHIFT;
> > +   mod = tot_sec % target->chunk_sectors;
> 
> Did you build this on 32-bit also?  Is that '%' OK on 32-bit?

I've looked at this a bit and still can't figure out why this
modulo operation would operate differently on a 32 versus a 64
bit platform? I know sector_t is config dependent but the
sector_t should be promoted to 64 bit width during the modulo
operation.

Are you wondering whether sector_t is the right type for any of
the math in this file? Perhaps we should be safe and only use
u64s?



Re: [PATCH v3 1/2] dm-unstripe: unstripe RAID 0/dm-striped device

2017-12-15 Thread Scott Bauer
[snip]
On Wed, Dec 13, 2017 at 04:11:44PM -0800, Randy Dunlap wrote:
> 
> >=
>

Thanks, good catch.


> > +   tot_sec = i_size_read(bbdev->bd_inode) >> SECTOR_SHIFT;
> > +   mod = tot_sec % target->chunk_sectors;
> 
> Did you build this on 32-bit also?  Is that '%' OK on 32-bit?

I've looked at this a bit and still can't figure out why this
modulo operation would operate differently on a 32 versus a 64
bit platform? I know sector_t is config dependent but the
sector_t should be promoted to 64 bit width during the modulo
operation.

Are you wondering whether sector_t is the right type for any of
the math in this file? Perhaps we should be safe and only use
u64s?



Re: [PATCH v3 2/2] dm unstripe: Add documentation for unstripe target

2017-12-13 Thread Scott Bauer
On Wed, Dec 13, 2017 at 04:18:06PM -0800, Randy Dunlap wrote:
> Use ASCII characters ___ or ---,  not whatever those bottom block characters 
> are.

Argh, sorry, I removed those for an internal review but forgot to remove them 
here.

> 
> > +
> > +The purpose of this unstriping is to provide better QoS in noisy
> > +neighbor environments. When two partitions are created on the
> > +aggregate drive without this unstriping, reads on one partition
> > +can affect writes on another partition. This is because the partitions
> > +are striped across the two cores. When we unstripe this hardware RAID 0
> > +and make partitions on each new exposed device the two partitions are 
> > now
> > +physically separated.
> > +
> > +With the module we were able to segregate a fio script that has read 
> > and
> > +write jobs that are independent of each other. Compared to when we run
> > +the test on a combined drive with partitions, we were able to get a 92%
> > +reduction in five-9ths read latency using this device mapper target.
> 
>   5/9ths
> although I can't quite parse that sentence.

I'll reword it, thanks.



Re: [PATCH v3 2/2] dm unstripe: Add documentation for unstripe target

2017-12-13 Thread Scott Bauer
On Wed, Dec 13, 2017 at 04:18:06PM -0800, Randy Dunlap wrote:
> Use ASCII characters ___ or ---,  not whatever those bottom block characters 
> are.

Argh, sorry, I removed those for an internal review but forgot to remove them 
here.

> 
> > +
> > +The purpose of this unstriping is to provide better QoS in noisy
> > +neighbor environments. When two partitions are created on the
> > +aggregate drive without this unstriping, reads on one partition
> > +can affect writes on another partition. This is because the partitions
> > +are striped across the two cores. When we unstripe this hardware RAID 0
> > +and make partitions on each new exposed device the two partitions are 
> > now
> > +physically separated.
> > +
> > +With the module we were able to segregate a fio script that has read 
> > and
> > +write jobs that are independent of each other. Compared to when we run
> > +the test on a combined drive with partitions, we were able to get a 92%
> > +reduction in five-9ths read latency using this device mapper target.
> 
>   5/9ths
> although I can't quite parse that sentence.

I'll reword it, thanks.



[PATCH v3 1/2] dm-unstripe: unstripe RAID 0/dm-striped device

2017-12-13 Thread Scott Bauer
This device mapper module remaps and unstripes IO so it lands
solely on a single drive in a RAID 0/dm-stripe target.
In a 4 drive RAID 0 the mapper exposes 1/4th of the LBA range
as a virtual drive. Each IO to that virtual drive will land on
only one of the 4 drives, selected by the user.

Signed-off-by: Scott Bauer <scott.ba...@intel.com>
Acked-by: Keith Busch <keith.bu...@intel.com>
---
 drivers/md/Kconfig   |  10 +++
 drivers/md/Makefile  |   1 +
 drivers/md/dm-unstripe.c | 204 +++
 3 files changed, 215 insertions(+)
 create mode 100644 drivers/md/dm-unstripe.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 83b9362be09c..948874fcc67c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -269,6 +269,16 @@ config DM_BIO_PRISON
 
 source "drivers/md/persistent-data/Kconfig"
 
+config DM_UN_STRIPE
+   tristate "Transpose IO to individual drives on a raid device"
+   depends on BLK_DEV_DM
+   ---help---
+ Enable this feature if you with to unstripe I/O on a RAID 0
+device to the respective drive. If your hardware has physical
+RAID 0 this module can unstripe the I/O to respective sides.
+
+If unsure say N.
+
 config DM_CRYPT
tristate "Crypt target support"
depends on BLK_DEV_DM
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f701bb211783..2cc380b71319 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_BCACHE)  += bcache/
 obj-$(CONFIG_BLK_DEV_MD)   += md-mod.o
 obj-$(CONFIG_BLK_DEV_DM)   += dm-mod.o
 obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
+obj-$(CONFIG_DM_UN_STRIPE)   += dm-unstripe.o
 obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
 obj-$(CONFIG_DM_BIO_PRISON)+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
diff --git a/drivers/md/dm-unstripe.c b/drivers/md/dm-unstripe.c
new file mode 100644
index ..d1105e92bb3f
--- /dev/null
+++ b/drivers/md/dm-unstripe.c
@@ -0,0 +1,204 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Authors:
+ *Scott  Bauer  <scott.ba...@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include "dm.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct unstripe {
+   struct dm_dev *ddisk;
+   sector_t chunk_sectors;
+   sector_t stripe_sectors;
+   u8 chunk_shift;
+   u8 cur_drive;
+};
+
+
+#define DM_MSG_PREFIX "dm-unstripe"
+static const char *parse_err = "Please provide the necessary information:"
+   "  "
+   " ";
+
+/*
+ * Argument layout:
+ *  
+ *  
+ */
+static int set_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+   struct block_device *bbdev;
+   struct unstripe *target;
+   unsigned int chunk_size;
+   u64 tot_sec, mod;
+   u8 cur_drive, tot_drives;
+   char dummy;
+   int ret;
+
+   if (argc != 4) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (sscanf(argv[1], "%hhu%c", _drive, ) != 1 ||
+   sscanf(argv[2], "%hhu%c", _drives, ) != 1 ||
+   sscanf(argv[3], "%u%c", _size, ) != 1) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (tot_drives == 0 || (cur_drive > tot_drives && tot_drives > 1)) {
+   DMERR("Please provide a drive between [0,%hhu)", tot_drives);
+   return -EINVAL;
+   }
+
+   target = kzalloc(sizeof(*target), GFP_KERNEL);
+
+   if (!target) {
+   DMERR("Failed to allocate space for DM unstripe!");
+   return -ENOMEM;
+   }
+
+   ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
+   >ddisk);
+   if (ret) {
+   kfree(target);
+   DMERR("dm-unstripe dev lookup failure! for drive %s", argv[0]);
+   return ret;
+   }
+
+   bbdev = target->ddisk->bdev;
+
+   target->cur_drive = cur_drive;
+   if (chunk_size)
+   target->chunk_sectors = chunk_size;
+   else
+   target->chunk_sectors =
+   queue_max_hw_sectors(bdev_get_queue(bbdev));
+
+   target->stripe_sectors = (tot_drives - 1) * target->chunk_sectors;
+   target->chunk_shift = fls(targ

[PATCH v3 0/2] dm-unstripe

2017-12-13 Thread Scott Bauer


Changes from v2->v3:

1) Renamed variables in-code to reflect correct terminology with respect
   to the dm-stripe target.

2) Fixed a  __must_check missing check in the constructor.

3) Used correct types for working with sector remaping.

3) Fixed documentation to reflect the correct termonology, like the code.
   Added the test script Keith sent out in an email that makes a striped
   device and uses the un-stripe target to access the underlying loop
   devices.


[PATCH v3 1/2] dm-unstripe: unstripe RAID 0/dm-striped device

2017-12-13 Thread Scott Bauer
This device mapper module remaps and unstripes IO so it lands
solely on a single drive in a RAID 0/dm-stripe target.
In a 4 drive RAID 0 the mapper exposes 1/4th of the LBA range
as a virtual drive. Each IO to that virtual drive will land on
only one of the 4 drives, selected by the user.

Signed-off-by: Scott Bauer 
Acked-by: Keith Busch 
---
 drivers/md/Kconfig   |  10 +++
 drivers/md/Makefile  |   1 +
 drivers/md/dm-unstripe.c | 204 +++
 3 files changed, 215 insertions(+)
 create mode 100644 drivers/md/dm-unstripe.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 83b9362be09c..948874fcc67c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -269,6 +269,16 @@ config DM_BIO_PRISON
 
 source "drivers/md/persistent-data/Kconfig"
 
+config DM_UN_STRIPE
+   tristate "Transpose IO to individual drives on a raid device"
+   depends on BLK_DEV_DM
+   ---help---
+ Enable this feature if you with to unstripe I/O on a RAID 0
+device to the respective drive. If your hardware has physical
+RAID 0 this module can unstripe the I/O to respective sides.
+
+If unsure say N.
+
 config DM_CRYPT
tristate "Crypt target support"
depends on BLK_DEV_DM
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f701bb211783..2cc380b71319 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_BCACHE)  += bcache/
 obj-$(CONFIG_BLK_DEV_MD)   += md-mod.o
 obj-$(CONFIG_BLK_DEV_DM)   += dm-mod.o
 obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
+obj-$(CONFIG_DM_UN_STRIPE)   += dm-unstripe.o
 obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
 obj-$(CONFIG_DM_BIO_PRISON)+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
diff --git a/drivers/md/dm-unstripe.c b/drivers/md/dm-unstripe.c
new file mode 100644
index ..d1105e92bb3f
--- /dev/null
+++ b/drivers/md/dm-unstripe.c
@@ -0,0 +1,204 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Authors:
+ *Scott  Bauer  
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include "dm.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct unstripe {
+   struct dm_dev *ddisk;
+   sector_t chunk_sectors;
+   sector_t stripe_sectors;
+   u8 chunk_shift;
+   u8 cur_drive;
+};
+
+
+#define DM_MSG_PREFIX "dm-unstripe"
+static const char *parse_err = "Please provide the necessary information:"
+   "  "
+   " ";
+
+/*
+ * Argument layout:
+ *  
+ *  
+ */
+static int set_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+   struct block_device *bbdev;
+   struct unstripe *target;
+   unsigned int chunk_size;
+   u64 tot_sec, mod;
+   u8 cur_drive, tot_drives;
+   char dummy;
+   int ret;
+
+   if (argc != 4) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (sscanf(argv[1], "%hhu%c", _drive, ) != 1 ||
+   sscanf(argv[2], "%hhu%c", _drives, ) != 1 ||
+   sscanf(argv[3], "%u%c", _size, ) != 1) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (tot_drives == 0 || (cur_drive > tot_drives && tot_drives > 1)) {
+   DMERR("Please provide a drive between [0,%hhu)", tot_drives);
+   return -EINVAL;
+   }
+
+   target = kzalloc(sizeof(*target), GFP_KERNEL);
+
+   if (!target) {
+   DMERR("Failed to allocate space for DM unstripe!");
+   return -ENOMEM;
+   }
+
+   ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
+   >ddisk);
+   if (ret) {
+   kfree(target);
+   DMERR("dm-unstripe dev lookup failure! for drive %s", argv[0]);
+   return ret;
+   }
+
+   bbdev = target->ddisk->bdev;
+
+   target->cur_drive = cur_drive;
+   if (chunk_size)
+   target->chunk_sectors = chunk_size;
+   else
+   target->chunk_sectors =
+   queue_max_hw_sectors(bdev_get_queue(bbdev));
+
+   target->stripe_sectors = (tot_drives - 1) * target->chunk_sectors;
+   target->chunk_shift = fls(target->chunk_sectors) - 1;
+
+   ret = dm_set_target_max_io_len(ti, target->chunk_

[PATCH v3 0/2] dm-unstripe

2017-12-13 Thread Scott Bauer


Changes from v2->v3:

1) Renamed variables in-code to reflect correct terminology with respect
   to the dm-stripe target.

2) Fixed a  __must_check missing check in the constructor.

3) Used correct types for working with sector remaping.

3) Fixed documentation to reflect the correct termonology, like the code.
   Added the test script Keith sent out in an email that makes a striped
   device and uses the un-stripe target to access the underlying loop
   devices.


[PATCH v3 2/2] dm unstripe: Add documentation for unstripe target

2017-12-13 Thread Scott Bauer
Signed-off-by: Scott Bauer <scott.ba...@intel.com>
---
 Documentation/device-mapper/dm-unstripe.txt | 130 
 1 file changed, 130 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-unstripe.txt

diff --git a/Documentation/device-mapper/dm-unstripe.txt 
b/Documentation/device-mapper/dm-unstripe.txt
new file mode 100644
index ..01d7194b9075
--- /dev/null
+++ b/Documentation/device-mapper/dm-unstripe.txt
@@ -0,0 +1,130 @@
+Device-Mapper Unstripe
+=
+
+The device-mapper unstripe (dm-unstripe) target provides a transparent
+mechanism to unstripe a device-mapper "striped" target to access the
+underlying disks without having to touch the true backing block-device.
+It can also be used to unstripe a hardware RAID-0 to access backing disks
+as well.
+
+
+Parameters:
+  <# of drives> 
+
+
+
+   The block device you wish to unstripe.
+
+
+The physical drive you wish to expose via this "virtual" device
+   mapper target. This must be 0 indexed.
+
+<# of drives>
+The number of drives in the RAID 0.
+
+
+   The amount of 512B sectors in the chunk striping, or zero, if you
+   wish you use max_hw_sector_size.
+
+
+Why use this module?
+=
+
+An example of undoing an existing dm-stripe:
+
+This small bash script will setup 4 loop devices and use the existing
+dm-stripe target to combine the 4 devices into one. It then will use
+the unstripe target on the new combined stripe device to access the
+individual backing loop devices. We write data to the newly exposed
+unstriped devices and verify the data written matches the correct
+underlying device on the striped array.
+
+#!/bin/bash
+
+MEMBER_SIZE=$((128 * 1024 * 1024))
+NUM=4
+SEQ_END=$((${NUM}-1))
+CHUNK=256
+BS=4096
+
+RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
+DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
+COUNT=$((${MEMBER_SIZE} / ${BS}))
+
+for i in $(seq 0 ${SEQ_END}); do
+  dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
+  losetup /dev/loop${i} member-${i}
+  DM_PARMS+=" /dev/loop${i} 0"
+done
+
+echo $DM_PARMS | dmsetup create raid0
+for i in $(seq 0 ${SEQ_END}); do
+  echo "0 1 unstripe /dev/mapper/raid0 ${i} ${NUM} ${CHUNK}" | dmsetup 
create set-${i}
+done;
+
+for i in $(seq 0 ${SEQ_END}); do
+  dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} 
oflag=direct
+  diff /dev/mapper/set-${i} member-${i}
+done;
+
+for i in $(seq 0 ${SEQ_END}); do
+  dmsetup remove set-${i}
+done
+
+dmsetup remove raid0
+
+for i in $(seq 0 ${SEQ_END}); do
+  losetup -d /dev/loop${i}
+  rm -f member-${i}
+done
+
+==
+
+
+Another example:
+
+Intel NVMe drives contain two cores on the physical device.
+Each core of the drive has segregated access to its LBA range.
+The current LBA model has a RAID 0 128k chunk on each core, resulting
+in a 256k stripe across the two cores:
+
+   Core 0:Core 1:
+  ____
+  | LBA 512|| LBA 768|
+  | LBA 0  || LBA 256|
+  ⎻⎻⎻⎻
+
+The purpose of this unstriping is to provide better QoS in noisy
+neighbor environments. When two partitions are created on the
+aggregate drive without this unstriping, reads on one partition
+can affect writes on another partition. This is because the partitions
+are striped across the two cores. When we unstripe this hardware RAID 0
+and make partitions on each new exposed device the two partitions are now
+physically separated.
+
+With the module we were able to segregate a fio script that has read and
+write jobs that are independent of each other. Compared to when we run
+the test on a combined drive with partitions, we were able to get a 92%
+reduction in five-9ths read latency using this device mapper target.
+
+
+
+Example scripts:
+
+
+dmsetup create nvmset1 --table '0 1 unstripe /dev/nvme0n1 1 2 0'
+dmsetup create nvmset0 --table '0 1 unstripe /dev/nvme0n1 0 2 0'
+
+There will now be two mappers:
+/dev/mapper/nvmset1
+/dev/mapper/nvmset0
+
+that will expose core 0 and core 1.
+
+
+# In a dm-stripe with 4 drives of chunk size 128K:
+dmsetup create raid_disk0 --table '0 1 unstripe /dev/mapper/striped 0 4 256'
+dmsetup create raid_disk1 --table '0 1 unstripe /dev/mapper/striped 1 4 256'
+dmsetup create raid_disk2 --table '0 1 unstripe /dev/mapper/striped 2 4 256'
+dmsetup create raid_disk3 --table '0 1 unstripe /dev/mapper/striped 3 4 256'
+
-- 
2.11.0



[PATCH v3 2/2] dm unstripe: Add documentation for unstripe target

2017-12-13 Thread Scott Bauer
Signed-off-by: Scott Bauer 
---
 Documentation/device-mapper/dm-unstripe.txt | 130 
 1 file changed, 130 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-unstripe.txt

diff --git a/Documentation/device-mapper/dm-unstripe.txt 
b/Documentation/device-mapper/dm-unstripe.txt
new file mode 100644
index ..01d7194b9075
--- /dev/null
+++ b/Documentation/device-mapper/dm-unstripe.txt
@@ -0,0 +1,130 @@
+Device-Mapper Unstripe
+=
+
+The device-mapper unstripe (dm-unstripe) target provides a transparent
+mechanism to unstripe a device-mapper "striped" target to access the
+underlying disks without having to touch the true backing block-device.
+It can also be used to unstripe a hardware RAID-0 to access backing disks
+as well.
+
+
+Parameters:
+  <# of drives> 
+
+
+
+   The block device you wish to unstripe.
+
+
+The physical drive you wish to expose via this "virtual" device
+   mapper target. This must be 0 indexed.
+
+<# of drives>
+The number of drives in the RAID 0.
+
+
+   The amount of 512B sectors in the chunk striping, or zero, if you
+   wish you use max_hw_sector_size.
+
+
+Why use this module?
+=
+
+An example of undoing an existing dm-stripe:
+
+This small bash script will setup 4 loop devices and use the existing
+dm-stripe target to combine the 4 devices into one. It then will use
+the unstripe target on the new combined stripe device to access the
+individual backing loop devices. We write data to the newly exposed
+unstriped devices and verify the data written matches the correct
+underlying device on the striped array.
+
+#!/bin/bash
+
+MEMBER_SIZE=$((128 * 1024 * 1024))
+NUM=4
+SEQ_END=$((${NUM}-1))
+CHUNK=256
+BS=4096
+
+RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
+DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
+COUNT=$((${MEMBER_SIZE} / ${BS}))
+
+for i in $(seq 0 ${SEQ_END}); do
+  dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
+  losetup /dev/loop${i} member-${i}
+  DM_PARMS+=" /dev/loop${i} 0"
+done
+
+echo $DM_PARMS | dmsetup create raid0
+for i in $(seq 0 ${SEQ_END}); do
+  echo "0 1 unstripe /dev/mapper/raid0 ${i} ${NUM} ${CHUNK}" | dmsetup 
create set-${i}
+done;
+
+for i in $(seq 0 ${SEQ_END}); do
+  dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} 
oflag=direct
+  diff /dev/mapper/set-${i} member-${i}
+done;
+
+for i in $(seq 0 ${SEQ_END}); do
+  dmsetup remove set-${i}
+done
+
+dmsetup remove raid0
+
+for i in $(seq 0 ${SEQ_END}); do
+  losetup -d /dev/loop${i}
+  rm -f member-${i}
+done
+
+==
+
+
+Another example:
+
+Intel NVMe drives contain two cores on the physical device.
+Each core of the drive has segregated access to its LBA range.
+The current LBA model has a RAID 0 128k chunk on each core, resulting
+in a 256k stripe across the two cores:
+
+   Core 0:Core 1:
+  ____
+  | LBA 512|| LBA 768|
+  | LBA 0  || LBA 256|
+  ⎻⎻⎻⎻
+
+The purpose of this unstriping is to provide better QoS in noisy
+neighbor environments. When two partitions are created on the
+aggregate drive without this unstriping, reads on one partition
+can affect writes on another partition. This is because the partitions
+are striped across the two cores. When we unstripe this hardware RAID 0
+and make partitions on each new exposed device the two partitions are now
+physically separated.
+
+With the module we were able to segregate a fio script that has read and
+write jobs that are independent of each other. Compared to when we run
+the test on a combined drive with partitions, we were able to get a 92%
+reduction in five-9ths read latency using this device mapper target.
+
+
+
+Example scripts:
+
+
+dmsetup create nvmset1 --table '0 1 unstripe /dev/nvme0n1 1 2 0'
+dmsetup create nvmset0 --table '0 1 unstripe /dev/nvme0n1 0 2 0'
+
+There will now be two mappers:
+/dev/mapper/nvmset1
+/dev/mapper/nvmset0
+
+that will expose core 0 and core 1.
+
+
+# In a dm-stripe with 4 drives of chunk size 128K:
+dmsetup create raid_disk0 --table '0 1 unstripe /dev/mapper/striped 0 4 256'
+dmsetup create raid_disk1 --table '0 1 unstripe /dev/mapper/striped 1 4 256'
+dmsetup create raid_disk2 --table '0 1 unstripe /dev/mapper/striped 2 4 256'
+dmsetup create raid_disk3 --table '0 1 unstripe /dev/mapper/striped 3 4 256'
+
-- 
2.11.0



Re: [PATCH v2 2/2] dm unstripe: Add documentation for unstripe target

2017-12-12 Thread Scott Bauer
On Tue, Dec 12, 2017 at 01:10:13PM -0500, Mike Snitzer wrote:
> On Mon, Dec 11 2017 at 11:00am -0500,
> Scott Bauer <scott.ba...@intel.com> wrote:
> 
> OK, but I'm left wondering: why doesn't the user avoid striping across
> the cores?
> 
> Do the Intel NVMe drives not provide the ability to present 1 device per
> NVMe core?
> 
> This DM target seems like a pretty nasty workaround for what should be
> fixed in the NVMe drive's firmware.
> 
> Mainly because there is no opportunity to use both striped and unstriped
> access to the same NVMe drive.  So why impose striped on the user in the
> first place?
> 
> Mike

Unfortunately, the NVMe drives do not currently support exposing each core
as seperate drives or namespaces. While it would be preferable if the
controllers did expose such features, firmware development informed us there
are sufficient reasons why it isn't possible for the existing generation of
drives.

The NVMe working group just finalized a standard that presents isolated storage
sets for users who wish to use them. As the standard was just recently finalized
the use case was created well after the targeted drives were created. The Intel
drives just so happen to have independent back-ends that align with the isolated
storage sets use case. We would like to provide a way to exploit the independent
back-ends to expose isolated storage in a way that is generic across all the 
Intel
NVMe drive familes. The implementation is generic enough that it can be applied
to  any storage that has physical seperation at known block boundaries.


Re: [PATCH v2 2/2] dm unstripe: Add documentation for unstripe target

2017-12-12 Thread Scott Bauer
On Tue, Dec 12, 2017 at 01:10:13PM -0500, Mike Snitzer wrote:
> On Mon, Dec 11 2017 at 11:00am -0500,
> Scott Bauer  wrote:
> 
> OK, but I'm left wondering: why doesn't the user avoid striping across
> the cores?
> 
> Do the Intel NVMe drives not provide the ability to present 1 device per
> NVMe core?
> 
> This DM target seems like a pretty nasty workaround for what should be
> fixed in the NVMe drive's firmware.
> 
> Mainly because there is no opportunity to use both striped and unstriped
> access to the same NVMe drive.  So why impose striped on the user in the
> first place?
> 
> Mike

Unfortunately, the NVMe drives do not currently support exposing each core
as seperate drives or namespaces. While it would be preferable if the
controllers did expose such features, firmware development informed us there
are sufficient reasons why it isn't possible for the existing generation of
drives.

The NVMe working group just finalized a standard that presents isolated storage
sets for users who wish to use them. As the standard was just recently finalized
the use case was created well after the targeted drives were created. The Intel
drives just so happen to have independent back-ends that align with the isolated
storage sets use case. We would like to provide a way to exploit the independent
back-ends to expose isolated storage in a way that is generic across all the 
Intel
NVMe drive familes. The implementation is generic enough that it can be applied
to  any storage that has physical seperation at known block boundaries.


[PATCH v2 0/2] dm-unstripe

2017-12-11 Thread Scott Bauer



V1->v2 Changes:
1) Fixed up some spelling errors in documentation.
2) Cleaned up some variable names to something more appropriate.




[PATCH v2 0/2] dm-unstripe

2017-12-11 Thread Scott Bauer



V1->v2 Changes:
1) Fixed up some spelling errors in documentation.
2) Cleaned up some variable names to something more appropriate.




[PATCH v2 2/2] dm unstripe: Add documentation for unstripe target

2017-12-11 Thread Scott Bauer
Signed-off-by: Scott Bauer <scott.ba...@intel.com>
---
 Documentation/device-mapper/dm-unstripe.txt | 82 +
 1 file changed, 82 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-unstripe.txt

diff --git a/Documentation/device-mapper/dm-unstripe.txt 
b/Documentation/device-mapper/dm-unstripe.txt
new file mode 100644
index ..4e1a0a39a689
--- /dev/null
+++ b/Documentation/device-mapper/dm-unstripe.txt
@@ -0,0 +1,82 @@
+Device-Mapper Unstripe
+=
+
+The device-mapper Unstripe (dm-unstripe) target provides a transparent
+mechanism to unstripe a RAID 0 striping to access segregated disks.
+
+This module should be used by users who understand what the underlying
+disks look like behind the software/hardware RAID.
+
+Parameters:
+  <# of drives> 
+
+
+
+   The block device you wish to unstripe.
+
+
+The physical drive you wish to expose via this "virtual" device
+   mapper target. This must be 0 indexed.
+
+<# of drives>
+The number of drives in the RAID 0.
+
+
+   The amount of 512B sectors in the raid striping, or zero, if you
+   wish you use max_hw_sector_size.
+
+
+Why use this module?
+=
+
+As a use case:
+
+
+As an example:
+
+Intel NVMe drives contain two cores on the physical device.
+Each core of the drive has segregated access to its LBA range.
+The current LBA model has a RAID 0 128k stripe across the two cores:
+
+   Core 0:Core 1:
+  ____
+  | LBA 511|| LBA 768|
+  | LBA 0  || LBA 256|
+  ⎻⎻⎻⎻
+
+The purpose of this unstriping is to provide better QoS in noisy
+neighbor environments. When two partitions are created on the
+aggregate drive without this unstriping, reads on one partition
+can affect writes on another partition. With the striping concurrent
+reads and writes and I/O on opposite cores have lower completion times,
+and better tail latencies.
+
+With the module we were able to segregate a fio script that has read and
+write jobs that are independent of each other. Compared to when we run
+the test on a combined drive with partitions, we were able to get a 92%
+reduction in five-9ths read latency using this device mapper target.
+
+
+One could use the module to Logical de-pop a HDD if you have sufficient
+geometry information regarding the drive.
+
+
+Example scripts:
+
+
+dmsetup create nvmset1 --table '0 1 dm-unstripe /dev/nvme0n1 1 2 0'
+dmsetup create nvmset0 --table '0 1 dm-unstripe /dev/nvme0n1 0 2 0'
+
+There will now be two mappers:
+/dev/mapper/nvmset1
+/dev/mapper/nvmset0
+
+that will expose core 0 and core 1.
+
+
+In a Raid 0 with 4 drives of stripe size 128K:
+dmsetup create raid_disk0 --table '0 1 dm-unstripe /dev/nvme0n1 0 4 256'
+dmsetup create raid_disk1 --table '0 1 dm-unstripe /dev/nvme0n1 1 4 256'
+dmsetup create raid_disk2 --table '0 1 dm-unstripe /dev/nvme0n1 2 4 256'
+dmsetup create raid_disk3 --table '0 1 dm-unstripe /dev/nvme0n1 3 4 256'
+
-- 
2.11.0



[PATCH v2 2/2] dm unstripe: Add documentation for unstripe target

2017-12-11 Thread Scott Bauer
Signed-off-by: Scott Bauer 
---
 Documentation/device-mapper/dm-unstripe.txt | 82 +
 1 file changed, 82 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-unstripe.txt

diff --git a/Documentation/device-mapper/dm-unstripe.txt 
b/Documentation/device-mapper/dm-unstripe.txt
new file mode 100644
index ..4e1a0a39a689
--- /dev/null
+++ b/Documentation/device-mapper/dm-unstripe.txt
@@ -0,0 +1,82 @@
+Device-Mapper Unstripe
+=
+
+The device-mapper Unstripe (dm-unstripe) target provides a transparent
+mechanism to unstripe a RAID 0 striping to access segregated disks.
+
+This module should be used by users who understand what the underlying
+disks look like behind the software/hardware RAID.
+
+Parameters:
+  <# of drives> 
+
+
+
+   The block device you wish to unstripe.
+
+
+The physical drive you wish to expose via this "virtual" device
+   mapper target. This must be 0 indexed.
+
+<# of drives>
+The number of drives in the RAID 0.
+
+
+   The amount of 512B sectors in the raid striping, or zero, if you
+   wish you use max_hw_sector_size.
+
+
+Why use this module?
+=
+
+As a use case:
+
+
+As an example:
+
+Intel NVMe drives contain two cores on the physical device.
+Each core of the drive has segregated access to its LBA range.
+The current LBA model has a RAID 0 128k stripe across the two cores:
+
+   Core 0:Core 1:
+  ____
+  | LBA 511|| LBA 768|
+  | LBA 0  || LBA 256|
+  ⎻⎻⎻⎻
+
+The purpose of this unstriping is to provide better QoS in noisy
+neighbor environments. When two partitions are created on the
+aggregate drive without this unstriping, reads on one partition
+can affect writes on another partition. With the striping concurrent
+reads and writes and I/O on opposite cores have lower completion times,
+and better tail latencies.
+
+With the module we were able to segregate a fio script that has read and
+write jobs that are independent of each other. Compared to when we run
+the test on a combined drive with partitions, we were able to get a 92%
+reduction in five-9ths read latency using this device mapper target.
+
+
+One could use the module to Logical de-pop a HDD if you have sufficient
+geometry information regarding the drive.
+
+
+Example scripts:
+
+
+dmsetup create nvmset1 --table '0 1 dm-unstripe /dev/nvme0n1 1 2 0'
+dmsetup create nvmset0 --table '0 1 dm-unstripe /dev/nvme0n1 0 2 0'
+
+There will now be two mappers:
+/dev/mapper/nvmset1
+/dev/mapper/nvmset0
+
+that will expose core 0 and core 1.
+
+
+In a Raid 0 with 4 drives of stripe size 128K:
+dmsetup create raid_disk0 --table '0 1 dm-unstripe /dev/nvme0n1 0 4 256'
+dmsetup create raid_disk1 --table '0 1 dm-unstripe /dev/nvme0n1 1 4 256'
+dmsetup create raid_disk2 --table '0 1 dm-unstripe /dev/nvme0n1 2 4 256'
+dmsetup create raid_disk3 --table '0 1 dm-unstripe /dev/nvme0n1 3 4 256'
+
-- 
2.11.0



[PATCH v2 1/2] dm-unstripe: unstripe of IO across RAID 0

2017-12-11 Thread Scott Bauer
This device mapper module remaps and unstripes IO so it lands
solely on a single drive in a RAID 0. In a 4 drive RAID 0 the
mapper exposes 1/4th of the LBA range as a virtual drive.
Each IO to that virtual drive will land on only one of the 4
drives, selected by the user.

As an example:

Intel NVMe drives contain two cores on the physical device.
Each core of the drive has segregated access to its LBA range.
The current LBA model has a RAID 0 128k stripe across the two cores:

   Core 0:Core 1:
  ____
  | LBA 511|| LBA 768|
  | LBA 0  || LBA 256|

The purpose of this unstriping is to provide better QoS in noisy
neighbor environments. When two partitions are created on the
aggregate drive without this unstriping, reads on one partition
can affect writes on another partition. With the striping concurrent
reads and writes and I/O on opposite cores have lower completion times,
and better tail latencies.

Signed-off-by: Scott Bauer <scott.ba...@intel.com>
---
 drivers/md/Kconfig   |  10 +++
 drivers/md/Makefile  |   1 +
 drivers/md/dm-unstripe.c | 197 +++
 3 files changed, 208 insertions(+)
 create mode 100644 drivers/md/dm-unstripe.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 83b9362be09c..948874fcc67c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -269,6 +269,16 @@ config DM_BIO_PRISON
 
 source "drivers/md/persistent-data/Kconfig"
 
+config DM_UN_STRIPE
+   tristate "Transpose IO to individual drives on a raid device"
+   depends on BLK_DEV_DM
+   ---help---
+ Enable this feature if you with to unstripe I/O on a RAID 0
+device to the respective drive. If your hardware has physical
+RAID 0 this module can unstripe the I/O to respective sides.
+
+If unsure say N.
+
 config DM_CRYPT
tristate "Crypt target support"
depends on BLK_DEV_DM
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f701bb211783..2cc380b71319 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_BCACHE)  += bcache/
 obj-$(CONFIG_BLK_DEV_MD)   += md-mod.o
 obj-$(CONFIG_BLK_DEV_DM)   += dm-mod.o
 obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
+obj-$(CONFIG_DM_UN_STRIPE)   += dm-unstripe.o
 obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
 obj-$(CONFIG_DM_BIO_PRISON)+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
diff --git a/drivers/md/dm-unstripe.c b/drivers/md/dm-unstripe.c
new file mode 100644
index ..cca91108688f
--- /dev/null
+++ b/drivers/md/dm-unstripe.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Authors:
+ *Scott  Bauer  <scott.ba...@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include "dm.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct unstripe {
+   struct dm_dev *ddisk;
+   unsigned int max_hw_sectors;
+   unsigned int chunk_sector;
+   u64 stripe_shift;
+   u8 cur_stripe;
+};
+
+
+#define DM_MSG_PREFIX "dm-unstripe"
+static const char *parse_err = "Please provide the necessary information:"
+   "  "
+   " ";
+
+/*
+ * Argument layout:
+ *
+ */
+static int set_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+   struct block_device *bbdev;
+   struct unstripe *target;
+   unsigned int stripe_size;
+   u64 tot_sec, mod;
+   u8 set, num_sets;
+   char dummy;
+   int ret;
+
+   if (argc != 4) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (sscanf(argv[1], "%hhu%c", , ) != 1 ||
+   sscanf(argv[2], "%hhu%c", _sets, ) != 1 ||
+   sscanf(argv[3], "%u%c", _size, ) != 1) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (num_sets == 0 || (set > num_sets && num_sets > 1)) {
+   DMERR("Please provide a set between [0,%hhu)", num_sets);
+   return -EINVAL;
+   }
+
+   target = kzalloc(sizeof(*target), GFP_KERNEL);
+
+   if (!target) {
+   DMERR("Failed to allocate space for DM unstripe!");
+   return -ENOMEM;
+   }
+
+   ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
+ 

[PATCH v2 1/2] dm-unstripe: unstripe of IO across RAID 0

2017-12-11 Thread Scott Bauer
This device mapper module remaps and unstripes IO so it lands
solely on a single drive in a RAID 0. In a 4 drive RAID 0 the
mapper exposes 1/4th of the LBA range as a virtual drive.
Each IO to that virtual drive will land on only one of the 4
drives, selected by the user.

As an example:

Intel NVMe drives contain two cores on the physical device.
Each core of the drive has segregated access to its LBA range.
The current LBA model has a RAID 0 128k stripe across the two cores:

   Core 0:Core 1:
  ____
  | LBA 511|| LBA 768|
  | LBA 0  || LBA 256|

The purpose of this unstriping is to provide better QoS in noisy
neighbor environments. When two partitions are created on the
aggregate drive without this unstriping, reads on one partition
can affect writes on another partition. With the striping concurrent
reads and writes and I/O on opposite cores have lower completion times,
and better tail latencies.

Signed-off-by: Scott Bauer 
---
 drivers/md/Kconfig   |  10 +++
 drivers/md/Makefile  |   1 +
 drivers/md/dm-unstripe.c | 197 +++
 3 files changed, 208 insertions(+)
 create mode 100644 drivers/md/dm-unstripe.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 83b9362be09c..948874fcc67c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -269,6 +269,16 @@ config DM_BIO_PRISON
 
 source "drivers/md/persistent-data/Kconfig"
 
+config DM_UN_STRIPE
+   tristate "Transpose IO to individual drives on a raid device"
+   depends on BLK_DEV_DM
+   ---help---
+ Enable this feature if you with to unstripe I/O on a RAID 0
+device to the respective drive. If your hardware has physical
+RAID 0 this module can unstripe the I/O to respective sides.
+
+If unsure say N.
+
 config DM_CRYPT
tristate "Crypt target support"
depends on BLK_DEV_DM
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f701bb211783..2cc380b71319 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_BCACHE)  += bcache/
 obj-$(CONFIG_BLK_DEV_MD)   += md-mod.o
 obj-$(CONFIG_BLK_DEV_DM)   += dm-mod.o
 obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
+obj-$(CONFIG_DM_UN_STRIPE)   += dm-unstripe.o
 obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
 obj-$(CONFIG_DM_BIO_PRISON)+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
diff --git a/drivers/md/dm-unstripe.c b/drivers/md/dm-unstripe.c
new file mode 100644
index ..cca91108688f
--- /dev/null
+++ b/drivers/md/dm-unstripe.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Authors:
+ *Scott  Bauer  
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include "dm.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct unstripe {
+   struct dm_dev *ddisk;
+   unsigned int max_hw_sectors;
+   unsigned int chunk_sector;
+   u64 stripe_shift;
+   u8 cur_stripe;
+};
+
+
+#define DM_MSG_PREFIX "dm-unstripe"
+static const char *parse_err = "Please provide the necessary information:"
+   "  "
+   " ";
+
+/*
+ * Argument layout:
+ *
+ */
+static int set_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+   struct block_device *bbdev;
+   struct unstripe *target;
+   unsigned int stripe_size;
+   u64 tot_sec, mod;
+   u8 set, num_sets;
+   char dummy;
+   int ret;
+
+   if (argc != 4) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (sscanf(argv[1], "%hhu%c", , ) != 1 ||
+   sscanf(argv[2], "%hhu%c", _sets, ) != 1 ||
+   sscanf(argv[3], "%u%c", _size, ) != 1) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (num_sets == 0 || (set > num_sets && num_sets > 1)) {
+   DMERR("Please provide a set between [0,%hhu)", num_sets);
+   return -EINVAL;
+   }
+
+   target = kzalloc(sizeof(*target), GFP_KERNEL);
+
+   if (!target) {
+   DMERR("Failed to allocate space for DM unstripe!");
+   return -ENOMEM;
+   }
+
+   ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
+   >ddisk);
+   if (ret) {
+   

[PATCH 1/2] dm-unstripe: unstripe of IO across RAID 0

2017-12-04 Thread Scott Bauer
This device mapper module remaps and unstripes IO so it lands
solely on a single drive in a RAID 0. In a 4 drive RAID 0 the
mapper exposes 1/4th of the LBA range as a virtual drive.
Each IO to that virtual drive will land on only one of the 4
drives, selected by the user.

As an example:

Intel NVMe drives contain two cores on the physical device.
Each core of the drive has segregated access to its LBA range.
The current LBA model has a RAID 0 128k stripe across the two cores:

   Core 0:Core 1:
  ____
  | LBA 511|| LBA 768|
  | LBA 0  || LBA 256|

The purpose of this unstriping is to provide better QoS in noisy
neighbor environments. When two partitions are created on the
aggregate drive without this unstriping, reads on one partition
can affect writes on another partition. With the striping concurrent
reads and writes and I/O on opposite cores have lower completion times,
and better tail latencies.

Signed-off-by: Scott Bauer <scott.ba...@intel.com>
---
 drivers/md/Kconfig   |  10 +++
 drivers/md/Makefile  |   1 +
 drivers/md/dm-unstripe.c | 197 +++
 3 files changed, 208 insertions(+)
 create mode 100644 drivers/md/dm-unstripe.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 83b9362be09c..948874fcc67c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -269,6 +269,16 @@ config DM_BIO_PRISON
 
 source "drivers/md/persistent-data/Kconfig"
 
+config DM_UN_STRIPE
+   tristate "Transpose IO to individual drives on a raid device"
+   depends on BLK_DEV_DM
+   ---help---
+ Enable this feature if you with to unstripe I/O on a RAID 0
+device to the respective drive. If your hardware has physical
+RAID 0 this module can unstripe the I/O to respective sides.
+
+If unsure say N.
+
 config DM_CRYPT
tristate "Crypt target support"
depends on BLK_DEV_DM
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f701bb211783..2cc380b71319 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_BCACHE)  += bcache/
 obj-$(CONFIG_BLK_DEV_MD)   += md-mod.o
 obj-$(CONFIG_BLK_DEV_DM)   += dm-mod.o
 obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
+obj-$(CONFIG_DM_UN_STRIPE)   += dm-unstripe.o
 obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
 obj-$(CONFIG_DM_BIO_PRISON)+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
diff --git a/drivers/md/dm-unstripe.c b/drivers/md/dm-unstripe.c
new file mode 100644
index ..ab8f394c
--- /dev/null
+++ b/drivers/md/dm-unstripe.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Authors:
+ *Scott  Bauer  <scott.ba...@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include "dm.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct unstripe {
+   struct dm_dev *ddisk;
+   unsigned int max_hw_sectors;
+   unsigned int chunk_sector;
+   u64 stripe_shift;
+   u8 cur_stripe;
+};
+
+
+#define DM_MSG_PREFIX "dm-unstripe"
+static const char *parse_err = "Please provide the necessary information:"
+   "  "
+   " ";
+
+/*
+ * Argument layout:
+ *
+ */
+static int set_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+   struct block_device *nvme_bdev;
+   struct unstripe *target;
+   unsigned int stripe_size;
+   u64 tot_sec, mod;
+   u8 set, num_sets;
+   char dummy;
+   int ret;
+
+   if (argc != 4) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (sscanf(argv[1], "%hhu%c", , ) != 1 ||
+   sscanf(argv[2], "%hhu%c", _sets, ) != 1 ||
+   sscanf(argv[3], "%u%c", _size, ) != 1) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (num_sets == 0 || (set > num_sets && num_sets > 1)) {
+   DMERR("Please provide a set between [0,%hhu)", num_sets);
+   return -EINVAL;
+   }
+
+   target = kzalloc(sizeof(*target), GFP_KERNEL);
+
+   if (!target) {
+   DMERR("Failed to allocate space for DM unstripe!");
+   return -ENOMEM;
+   }
+
+   ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
+ 

[PATCH 2/2] dm unstripe: Add documentation for unstripe target

2017-12-04 Thread Scott Bauer
Signed-off-by: Scott Bauer <scott.ba...@intel.com>
---
 Documentation/device-mapper/dm-unstripe.txt | 82 +
 1 file changed, 82 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-unstripe.txt

diff --git a/Documentation/device-mapper/dm-unstripe.txt 
b/Documentation/device-mapper/dm-unstripe.txt
new file mode 100644
index ..7060f1b53db3
--- /dev/null
+++ b/Documentation/device-mapper/dm-unstripe.txt
@@ -0,0 +1,82 @@
+Device-Mapper Unstripe
+=
+
+The device-mapper Unstripe (dm-unstripe) target provides a transparent
+mechanism to unstripe a RAID 0 striping to access segregated disks.
+
+This module should be used by users who understand what the underlying
+disks look like behind the Software/Hardware RAID.
+
+Parameters:
+  <# of drives> 
+
+
+
+   The block device you wish to unstripe.
+
+
+The physical drive you wish to expose via this "virtual" device
+   mapper target. This must be 0 indexed.
+
+<# of drives>
+The number of drives in the RAID 0.
+
+
+   The amount of 512B sectors in the raid striping, or zero, if you
+   wish you use max_hw_sector_size.
+
+
+Why use this module?
+=
+
+As a use case:
+
+
+As an example:
+
+Intel NVMe drives contain two cores on the physical device.
+Each core of the drive has segregated access to its LBA range.
+The current LBA model has a RAID 0 128k stripe across the two cores:
+
+   Core 0:Core 1:
+  ____
+  | LBA 511|| LBA 768|
+  | LBA 0  || LBA 256|
+  ⎻⎻⎻⎻
+
+The purpose of this unstriping is to provide better QoS in noisy
+neighbor environments. When two partitions are created on the
+aggregate drive without this unstriping, reads on one partition
+can affect writes on another partition. With the striping concurrent
+reads and writes and I/O on opposite cores have lower completion times,
+and better tail latencies.
+
+With the module we were able to segregate a fio script that has read and
+write jobs that are independent of each other. Compared to when we run
+the test on a combined drive with partitions, we were able to get a 92%
+reduction in five-9ths read latency using this device mapper target.
+
+
+One could use the module to Logical de-pop a HDD if you have sufficient
+geomoetry information regarding the drive.
+
+
+Example scripts:
+
+
+dmsetup create nvmset1 --table '0 1 dm-unstripe /dev/nvme0n1 1 2 0'
+dmsetup create nvmset0 --table '0 1 dm-unstripe /dev/nvme0n1 0 2 0'
+
+There will now be two mappers:
+/dev/mapper/nvmset1
+/dev/mapper/nvmset0
+
+that will expose core 1 and core 2.
+
+
+In a Raid 0 with 4 drives of stripe size 128K:
+dmsetup create raid_disk0 --table '0 1 dm-unstripe /dev/nvme0n1 0 4 256'
+dmsetup create raid_disk1 --table '0 1 dm-unstripe /dev/nvme0n1 1 4 256'
+dmsetup create raid_disk2 --table '0 1 dm-unstripe /dev/nvme0n1 2 4 256'
+dmsetup create raid_disk3 --table '0 1 dm-unstripe /dev/nvme0n1 3 4 256'
+
-- 
2.11.0



[PATCH 2/2] dm unstripe: Add documentation for unstripe target

2017-12-04 Thread Scott Bauer
Signed-off-by: Scott Bauer 
---
 Documentation/device-mapper/dm-unstripe.txt | 82 +
 1 file changed, 82 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-unstripe.txt

diff --git a/Documentation/device-mapper/dm-unstripe.txt 
b/Documentation/device-mapper/dm-unstripe.txt
new file mode 100644
index ..7060f1b53db3
--- /dev/null
+++ b/Documentation/device-mapper/dm-unstripe.txt
@@ -0,0 +1,82 @@
+Device-Mapper Unstripe
+=
+
+The device-mapper Unstripe (dm-unstripe) target provides a transparent
+mechanism to unstripe a RAID 0 striping to access segregated disks.
+
+This module should be used by users who understand what the underlying
+disks look like behind the Software/Hardware RAID.
+
+Parameters:
+  <# of drives> 
+
+
+
+   The block device you wish to unstripe.
+
+
+The physical drive you wish to expose via this "virtual" device
+   mapper target. This must be 0 indexed.
+
+<# of drives>
+The number of drives in the RAID 0.
+
+
+   The amount of 512B sectors in the raid striping, or zero, if you
+   wish you use max_hw_sector_size.
+
+
+Why use this module?
+=
+
+As a use case:
+
+
+As an example:
+
+Intel NVMe drives contain two cores on the physical device.
+Each core of the drive has segregated access to its LBA range.
+The current LBA model has a RAID 0 128k stripe across the two cores:
+
+   Core 0:Core 1:
+  ____
+  | LBA 511|| LBA 768|
+  | LBA 0  || LBA 256|
+  ⎻⎻⎻⎻
+
+The purpose of this unstriping is to provide better QoS in noisy
+neighbor environments. When two partitions are created on the
+aggregate drive without this unstriping, reads on one partition
+can affect writes on another partition. With the striping concurrent
+reads and writes and I/O on opposite cores have lower completion times,
+and better tail latencies.
+
+With the module we were able to segregate a fio script that has read and
+write jobs that are independent of each other. Compared to when we run
+the test on a combined drive with partitions, we were able to get a 92%
+reduction in five-9ths read latency using this device mapper target.
+
+
+One could use the module to Logical de-pop a HDD if you have sufficient
+geomoetry information regarding the drive.
+
+
+Example scripts:
+
+
+dmsetup create nvmset1 --table '0 1 dm-unstripe /dev/nvme0n1 1 2 0'
+dmsetup create nvmset0 --table '0 1 dm-unstripe /dev/nvme0n1 0 2 0'
+
+There will now be two mappers:
+/dev/mapper/nvmset1
+/dev/mapper/nvmset0
+
+that will expose core 1 and core 2.
+
+
+In a Raid 0 with 4 drives of stripe size 128K:
+dmsetup create raid_disk0 --table '0 1 dm-unstripe /dev/nvme0n1 0 4 256'
+dmsetup create raid_disk1 --table '0 1 dm-unstripe /dev/nvme0n1 1 4 256'
+dmsetup create raid_disk2 --table '0 1 dm-unstripe /dev/nvme0n1 2 4 256'
+dmsetup create raid_disk3 --table '0 1 dm-unstripe /dev/nvme0n1 3 4 256'
+
-- 
2.11.0



[PATCH 1/2] dm-unstripe: unstripe of IO across RAID 0

2017-12-04 Thread Scott Bauer
This device mapper module remaps and unstripes IO so it lands
solely on a single drive in a RAID 0. In a 4 drive RAID 0 the
mapper exposes 1/4th of the LBA range as a virtual drive.
Each IO to that virtual drive will land on only one of the 4
drives, selected by the user.

As an example:

Intel NVMe drives contain two cores on the physical device.
Each core of the drive has segregated access to its LBA range.
The current LBA model has a RAID 0 128k stripe across the two cores:

   Core 0:Core 1:
  ____
  | LBA 511|| LBA 768|
  | LBA 0  || LBA 256|

The purpose of this unstriping is to provide better QoS in noisy
neighbor environments. When two partitions are created on the
aggregate drive without this unstriping, reads on one partition
can affect writes on another partition. With the striping concurrent
reads and writes and I/O on opposite cores have lower completion times,
and better tail latencies.

Signed-off-by: Scott Bauer 
---
 drivers/md/Kconfig   |  10 +++
 drivers/md/Makefile  |   1 +
 drivers/md/dm-unstripe.c | 197 +++
 3 files changed, 208 insertions(+)
 create mode 100644 drivers/md/dm-unstripe.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 83b9362be09c..948874fcc67c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -269,6 +269,16 @@ config DM_BIO_PRISON
 
 source "drivers/md/persistent-data/Kconfig"
 
+config DM_UN_STRIPE
+   tristate "Transpose IO to individual drives on a raid device"
+   depends on BLK_DEV_DM
+   ---help---
+ Enable this feature if you with to unstripe I/O on a RAID 0
+device to the respective drive. If your hardware has physical
+RAID 0 this module can unstripe the I/O to respective sides.
+
+If unsure say N.
+
 config DM_CRYPT
tristate "Crypt target support"
depends on BLK_DEV_DM
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f701bb211783..2cc380b71319 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_BCACHE)  += bcache/
 obj-$(CONFIG_BLK_DEV_MD)   += md-mod.o
 obj-$(CONFIG_BLK_DEV_DM)   += dm-mod.o
 obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
+obj-$(CONFIG_DM_UN_STRIPE)   += dm-unstripe.o
 obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
 obj-$(CONFIG_DM_BIO_PRISON)+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
diff --git a/drivers/md/dm-unstripe.c b/drivers/md/dm-unstripe.c
new file mode 100644
index ..ab8f394c
--- /dev/null
+++ b/drivers/md/dm-unstripe.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Authors:
+ *Scott  Bauer  
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include "dm.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct unstripe {
+   struct dm_dev *ddisk;
+   unsigned int max_hw_sectors;
+   unsigned int chunk_sector;
+   u64 stripe_shift;
+   u8 cur_stripe;
+};
+
+
+#define DM_MSG_PREFIX "dm-unstripe"
+static const char *parse_err = "Please provide the necessary information:"
+   "  "
+   " ";
+
+/*
+ * Argument layout:
+ *
+ */
+static int set_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+   struct block_device *nvme_bdev;
+   struct unstripe *target;
+   unsigned int stripe_size;
+   u64 tot_sec, mod;
+   u8 set, num_sets;
+   char dummy;
+   int ret;
+
+   if (argc != 4) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (sscanf(argv[1], "%hhu%c", , ) != 1 ||
+   sscanf(argv[2], "%hhu%c", _sets, ) != 1 ||
+   sscanf(argv[3], "%u%c", _size, ) != 1) {
+   DMERR("%s", parse_err);
+   return -EINVAL;
+   }
+
+   if (num_sets == 0 || (set > num_sets && num_sets > 1)) {
+   DMERR("Please provide a set between [0,%hhu)", num_sets);
+   return -EINVAL;
+   }
+
+   target = kzalloc(sizeof(*target), GFP_KERNEL);
+
+   if (!target) {
+   DMERR("Failed to allocate space for DM unstripe!");
+   return -ENOMEM;
+   }
+
+   ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
+   >ddisk);
+   if (ret) {
+   kfre

[PATCH] MAINTAINERS: Remove Rafael from Opal maintainers.

2017-10-31 Thread Scott Bauer
He is no longer working on storage.

Signed-off-by: Scott Bauer <scott.ba...@intel.com>
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index af0cb69f6a3e..5c0864d7d7ad 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12052,7 +12052,6 @@ F:  drivers/mmc/host/sdhci-spear.c
 SECURE ENCRYPTING DEVICE (SED) OPAL DRIVER
 M: Scott Bauer <scott.ba...@intel.com>
 M: Jonathan Derrick <jonathan.derr...@intel.com>
-M: Rafael Antognolli <rafael.antogno...@intel.com>
 L: linux-bl...@vger.kernel.org
 S: Supported
 F: block/sed*
-- 
2.11.0



[PATCH] MAINTAINERS: Remove Rafael from Opal maintainers.

2017-10-31 Thread Scott Bauer
He is no longer working on storage.

Signed-off-by: Scott Bauer 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index af0cb69f6a3e..5c0864d7d7ad 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12052,7 +12052,6 @@ F:  drivers/mmc/host/sdhci-spear.c
 SECURE ENCRYPTING DEVICE (SED) OPAL DRIVER
 M: Scott Bauer 
 M: Jonathan Derrick 
-M: Rafael Antognolli 
 L: linux-bl...@vger.kernel.org
 S: Supported
 F: block/sed*
-- 
2.11.0



Re: [PATCHv3 2/4] block/sed: Add helper to qualify response tokens

2017-02-15 Thread Scott Bauer
On Wed, Feb 15, 2017 at 12:42:07PM -0700, Jon Derrick wrote:
> Add helper which verifies the response token is valid and matches the
> expected value. Merges token_type and response_get_token.
> 
> Signed-off-by: Jon Derrick <jonathan.derr...@intel.com>
> ---
>  block/sed-opal.c | 61 
> +++-
>  1 file changed, 25 insertions(+), 36 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 77623ad..d6dd604 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -591,48 +591,25 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, 
> u32 tsn)
>   return 0;
>  }
>  
> -static enum opal_response_token token_type(const struct parsed_resp *resp,
> -int n)
> +static const struct opal_resp_tok *response_get_token(
> + const struct parsed_resp *resp,
> + int n)
>  {
>   const struct opal_resp_tok *tok;
>  
>   if (n >= resp->num) {
>   pr_err("Token number doesn't exist: %d, resp: %d\n",
>  n, resp->num);
> - return OPAL_DTA_TOKENID_INVALID;
> + return ERR_PTR(-EINVAL);
>   }
>  
>   tok = >toks[n];
>   if (tok->len == 0) {
>   pr_err("Token length must be non-zero\n");
> - return OPAL_DTA_TOKENID_INVALID;
> + return ERR_PTR(-EINVAL);
>   }
>  
> - return tok->type;
> -}
> -
> -/*
> - * This function returns 0 in case of invalid token. One should call
> - * token_type() first to find out if the token is valid or not.
> - */
> -static enum opal_token response_get_token(const struct parsed_resp *resp,
> -   int n)
> -{
> - const struct opal_resp_tok *tok;
> -
> - if (n >= resp->num) {
> - pr_err("Token number doesn't exist: %d, resp: %d\n",
> -n, resp->num);
> - return 0;
> - }
> -
> - tok = >toks[n];
> - if (tok->len == 0) {
> - pr_err("Token length must be non-zero\n");
> - return 0;
> - }
> -
> - return tok->pos[0];
> + return tok;
>  }
>  
>  static ssize_t response_parse_tiny(struct opal_resp_tok *tok,
> @@ -851,20 +828,32 @@ static u64 response_get_u64(const struct parsed_resp 
> *resp, int n)
>   return resp->toks[n].stored.u;
>  }
>  
> +static bool response_token_matches(const struct opal_resp_tok *token, u8 
> match)
> +{
> + if (IS_ERR_OR_NULL(token) ||
> + token->type != OPAL_DTA_TOKENID_TOKEN ||
> +     token->pos[0] != match)
> + return false;
> + return true;
> +}
> +

This is sorta pedantic but from my reading of the code token can never be null. 
It will
either be a valid pointer or ERR_PTR(-EINVAL), but never null.
So maybe change this to IS_ERR(token) etc. Other than that little nit:

Reviewed-by: Scott Bauer <scott.ba...@intel.com>


Re: [PATCHv3 2/4] block/sed: Add helper to qualify response tokens

2017-02-15 Thread Scott Bauer
On Wed, Feb 15, 2017 at 12:42:07PM -0700, Jon Derrick wrote:
> Add helper which verifies the response token is valid and matches the
> expected value. Merges token_type and response_get_token.
> 
> Signed-off-by: Jon Derrick 
> ---
>  block/sed-opal.c | 61 
> +++-
>  1 file changed, 25 insertions(+), 36 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index 77623ad..d6dd604 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -591,48 +591,25 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, 
> u32 tsn)
>   return 0;
>  }
>  
> -static enum opal_response_token token_type(const struct parsed_resp *resp,
> -int n)
> +static const struct opal_resp_tok *response_get_token(
> + const struct parsed_resp *resp,
> + int n)
>  {
>   const struct opal_resp_tok *tok;
>  
>   if (n >= resp->num) {
>   pr_err("Token number doesn't exist: %d, resp: %d\n",
>  n, resp->num);
> - return OPAL_DTA_TOKENID_INVALID;
> + return ERR_PTR(-EINVAL);
>   }
>  
>   tok = >toks[n];
>   if (tok->len == 0) {
>   pr_err("Token length must be non-zero\n");
> - return OPAL_DTA_TOKENID_INVALID;
> + return ERR_PTR(-EINVAL);
>   }
>  
> - return tok->type;
> -}
> -
> -/*
> - * This function returns 0 in case of invalid token. One should call
> - * token_type() first to find out if the token is valid or not.
> - */
> -static enum opal_token response_get_token(const struct parsed_resp *resp,
> -   int n)
> -{
> - const struct opal_resp_tok *tok;
> -
> - if (n >= resp->num) {
> - pr_err("Token number doesn't exist: %d, resp: %d\n",
> -n, resp->num);
> - return 0;
> - }
> -
> - tok = >toks[n];
> - if (tok->len == 0) {
> - pr_err("Token length must be non-zero\n");
> - return 0;
> - }
> -
> - return tok->pos[0];
> + return tok;
>  }
>  
>  static ssize_t response_parse_tiny(struct opal_resp_tok *tok,
> @@ -851,20 +828,32 @@ static u64 response_get_u64(const struct parsed_resp 
> *resp, int n)
>   return resp->toks[n].stored.u;
>  }
>  
> +static bool response_token_matches(const struct opal_resp_tok *token, u8 
> match)
> +{
> + if (IS_ERR_OR_NULL(token) ||
> + token->type != OPAL_DTA_TOKENID_TOKEN ||
> + token->pos[0] != match)
> + return false;
> + return true;
> +}
> +

This is sorta pedantic but from my reading of the code token can never be null. 
It will
either be a valid pointer or ERR_PTR(-EINVAL), but never null.
So maybe change this to IS_ERR(token) etc. Other than that little nit:

Reviewed-by: Scott Bauer 


Re: [PATCHv3 3/4] block/sed: Check received header lengths

2017-02-15 Thread Scott Bauer
On Wed, Feb 15, 2017 at 12:42:08PM -0700, Jon Derrick wrote:
> Add a buffer size check against discovery and response header lengths
> before we loop over their buffers.
> 
> Signed-off-by: Jon Derrick <jonathan.derr...@intel.com>
Reviewed-by: Scott Bauer <scott.ba...@intel.com>


Re: [PATCHv3 3/4] block/sed: Check received header lengths

2017-02-15 Thread Scott Bauer
On Wed, Feb 15, 2017 at 12:42:08PM -0700, Jon Derrick wrote:
> Add a buffer size check against discovery and response header lengths
> before we loop over their buffers.
> 
> Signed-off-by: Jon Derrick 
Reviewed-by: Scott Bauer 


Re: [PATCHv3 1/4] block/sed: Use ssize_t on atom parsers to return errors

2017-02-15 Thread Scott Bauer
On Wed, Feb 15, 2017 at 12:42:06PM -0700, Jon Derrick wrote:
> The short atom parser can return an errno from decoding but does not
> currently return the error as a signed value. Convert all of the parsers
> to ssize_t.
> 
> Signed-off-by: Jon Derrick <jonathan.derr...@intel.com>
Reviewed-by: Scott Bauer <scott.ba...@intel.com>


Re: [PATCHv3 1/4] block/sed: Use ssize_t on atom parsers to return errors

2017-02-15 Thread Scott Bauer
On Wed, Feb 15, 2017 at 12:42:06PM -0700, Jon Derrick wrote:
> The short atom parser can return an errno from decoding but does not
> currently return the error as a signed value. Convert all of the parsers
> to ssize_t.
> 
> Signed-off-by: Jon Derrick 
Reviewed-by: Scott Bauer 


Re: [PATCH 3/4] block/sed: Check received header lengths

2017-02-15 Thread Scott Bauer
On Wed, Feb 15, 2017 at 12:38:53PM -0700, Jon Derrick wrote:
> Add a buffer size check against discovery and response header lengths
> before we loop over their buffers.
> 
> Signed-off-by: Jon Derrick 
> ---
>  block/sed-opal.c | 35 +--
>  1 file changed, 21 insertions(+), 14 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index d6dd604..addf89e 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -340,10 +340,17 @@ static int opal_discovery0_end(struct opal_dev *dev)
>   const struct d0_header *hdr = (struct d0_header *)dev->resp;
>   const u8 *epos = dev->resp, *cpos = dev->resp;
>   u16 comid = 0;
> + u32 hlen = be32_to_cpu(hdr->length);
>  
> - print_buffer(dev->resp, be32_to_cpu(hdr->length));
> + print_buffer(dev->resp, hlen);
>  
> - epos += be32_to_cpu(hdr->length); /* end of buffer */
> + if (hlen > IO_BUFFER_LENGTH - sizeof(*hdr)) {
> + pr_warn("Discovery length overflows buffer (%zu+%u)/%u\n",
> + sizeof(*hdr), hlen, IO_BUFFER_LENGTH);
> + return -EFAULT;
> + }
> +
> + epos += hlen; /* end of buffer */
>   cpos += sizeof(*hdr); /* current position on buffer */
>  
>   while (cpos < epos && supported) {
> @@ -713,6 +720,7 @@ static int response_parse(const u8 *buf, size_t length,
>   int total;
>   ssize_t token_length;
>   const u8 *pos;
> + u32 clen, plen, slen;
>  
>   if (!buf)
>   return -EFAULT;
> @@ -724,17 +732,16 @@ static int response_parse(const u8 *buf, size_t length,
>   pos = buf;
>   pos += sizeof(*hdr);
>  
> - pr_debug("Response size: cp: %d, pkt: %d, subpkt: %d\n",
> -  be32_to_cpu(hdr->cp.length),
> -  be32_to_cpu(hdr->pkt.length),
> -  be32_to_cpu(hdr->subpkt.length));
> -
> - if (hdr->cp.length == 0 || hdr->pkt.length == 0 ||
> - hdr->subpkt.length == 0) {
> - pr_err("Bad header length. cp: %d, pkt: %d, subpkt: %d\n",
> -be32_to_cpu(hdr->cp.length),
> -be32_to_cpu(hdr->pkt.length),
> -be32_to_cpu(hdr->subpkt.length));
> + clen = be32_to_cpu(hdr->cp.length);
> + plen = be32_to_cpu(hdr->pkt.length);
> + slen = be32_to_cpu(hdr->subpkt.length);
> + pr_debug("Response size: cp: %u, pkt: %u, subpkt: %u\n",
> +  clen, plen, slen);
> +
> + if (clen == 0 || plen == 0 || slen == 0 ||
> + (u64)clen + plen + slen > IO_BUFFER_LENGTH) {

I think we're over calculating here. From my understanding of the spec:
TCG_Storage_Architecture_Core_Spec_v2.01_r1.00.pdf
3.2.3
  ComPackets, Packets & Subpackets

The Comp packet size should the size of the packet, and the subpackets.
The "packet" should be the size of all the subpackets.
And lastly the subpacket should be the size its payload.

if ((u64) slen + sizeof(*hdr) > IO_BUFFER_LENGTH)

Since we never need the com packet length or pkt length (we never use them
anywhere) I think the above check is okay. Let me know if i'm wrong though,
or you have a different understanding of the lengths.










Re: [PATCH 3/4] block/sed: Check received header lengths

2017-02-15 Thread Scott Bauer
On Wed, Feb 15, 2017 at 12:38:53PM -0700, Jon Derrick wrote:
> Add a buffer size check against discovery and response header lengths
> before we loop over their buffers.
> 
> Signed-off-by: Jon Derrick 
> ---
>  block/sed-opal.c | 35 +--
>  1 file changed, 21 insertions(+), 14 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index d6dd604..addf89e 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -340,10 +340,17 @@ static int opal_discovery0_end(struct opal_dev *dev)
>   const struct d0_header *hdr = (struct d0_header *)dev->resp;
>   const u8 *epos = dev->resp, *cpos = dev->resp;
>   u16 comid = 0;
> + u32 hlen = be32_to_cpu(hdr->length);
>  
> - print_buffer(dev->resp, be32_to_cpu(hdr->length));
> + print_buffer(dev->resp, hlen);
>  
> - epos += be32_to_cpu(hdr->length); /* end of buffer */
> + if (hlen > IO_BUFFER_LENGTH - sizeof(*hdr)) {
> + pr_warn("Discovery length overflows buffer (%zu+%u)/%u\n",
> + sizeof(*hdr), hlen, IO_BUFFER_LENGTH);
> + return -EFAULT;
> + }
> +
> + epos += hlen; /* end of buffer */
>   cpos += sizeof(*hdr); /* current position on buffer */
>  
>   while (cpos < epos && supported) {
> @@ -713,6 +720,7 @@ static int response_parse(const u8 *buf, size_t length,
>   int total;
>   ssize_t token_length;
>   const u8 *pos;
> + u32 clen, plen, slen;
>  
>   if (!buf)
>   return -EFAULT;
> @@ -724,17 +732,16 @@ static int response_parse(const u8 *buf, size_t length,
>   pos = buf;
>   pos += sizeof(*hdr);
>  
> - pr_debug("Response size: cp: %d, pkt: %d, subpkt: %d\n",
> -  be32_to_cpu(hdr->cp.length),
> -  be32_to_cpu(hdr->pkt.length),
> -  be32_to_cpu(hdr->subpkt.length));
> -
> - if (hdr->cp.length == 0 || hdr->pkt.length == 0 ||
> - hdr->subpkt.length == 0) {
> - pr_err("Bad header length. cp: %d, pkt: %d, subpkt: %d\n",
> -be32_to_cpu(hdr->cp.length),
> -be32_to_cpu(hdr->pkt.length),
> -be32_to_cpu(hdr->subpkt.length));
> + clen = be32_to_cpu(hdr->cp.length);
> + plen = be32_to_cpu(hdr->pkt.length);
> + slen = be32_to_cpu(hdr->subpkt.length);
> + pr_debug("Response size: cp: %u, pkt: %u, subpkt: %u\n",
> +  clen, plen, slen);
> +
> + if (clen == 0 || plen == 0 || slen == 0 ||
> + (u64)clen + plen + slen > IO_BUFFER_LENGTH) {

I think we're over calculating here. From my understanding of the spec:
TCG_Storage_Architecture_Core_Spec_v2.01_r1.00.pdf
3.2.3
  ComPackets, Packets & Subpackets

The Comp packet size should the size of the packet, and the subpackets.
The "packet" should be the size of all the subpackets.
And lastly the subpacket should be the size its payload.

if ((u64) slen + sizeof(*hdr) > IO_BUFFER_LENGTH)

Since we never need the com packet length or pkt length (we never use them
anywhere) I think the above check is okay. Let me know if i'm wrong though,
or you have a different understanding of the lengths.










[PATCH V6 1/3] uapi: sed-opal fix IOW for activate lsp to use correct struct

2017-02-14 Thread Scott Bauer
The IOC_OPAL_ACTIVATE_LSP took the wrong strcure which would
give us the wrong size when using _IOC_SIZE, switch it to the
right structure.

Fixes: 058f8a2 ("Include: Uapi: Add user ABI for Sed/Opal")

Signed-off-by: Scott Bauer <scott.ba...@intel.com>
---
 include/uapi/linux/sed-opal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/sed-opal.h b/include/uapi/linux/sed-opal.h
index fc06e3a..c72e073 100644
--- a/include/uapi/linux/sed-opal.h
+++ b/include/uapi/linux/sed-opal.h
@@ -106,7 +106,7 @@ struct opal_mbr_data {
 #define IOC_OPAL_SAVE  _IOW('p', 220, struct opal_lock_unlock)
 #define IOC_OPAL_LOCK_UNLOCK   _IOW('p', 221, struct opal_lock_unlock)
 #define IOC_OPAL_TAKE_OWNERSHIP_IOW('p', 222, struct opal_key)
-#define IOC_OPAL_ACTIVATE_LSP   _IOW('p', 223, struct opal_key)
+#define IOC_OPAL_ACTIVATE_LSP   _IOW('p', 223, struct opal_lr_act)
 #define IOC_OPAL_SET_PW _IOW('p', 224, struct opal_new_pw)
 #define IOC_OPAL_ACTIVATE_USR   _IOW('p', 225, struct opal_session_info)
 #define IOC_OPAL_REVERT_TPR _IOW('p', 226, struct opal_key)
-- 
2.7.4



[PATCH V6 1/3] uapi: sed-opal fix IOW for activate lsp to use correct struct

2017-02-14 Thread Scott Bauer
The IOC_OPAL_ACTIVATE_LSP took the wrong strcure which would
give us the wrong size when using _IOC_SIZE, switch it to the
right structure.

Fixes: 058f8a2 ("Include: Uapi: Add user ABI for Sed/Opal")

Signed-off-by: Scott Bauer 
---
 include/uapi/linux/sed-opal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/sed-opal.h b/include/uapi/linux/sed-opal.h
index fc06e3a..c72e073 100644
--- a/include/uapi/linux/sed-opal.h
+++ b/include/uapi/linux/sed-opal.h
@@ -106,7 +106,7 @@ struct opal_mbr_data {
 #define IOC_OPAL_SAVE  _IOW('p', 220, struct opal_lock_unlock)
 #define IOC_OPAL_LOCK_UNLOCK   _IOW('p', 221, struct opal_lock_unlock)
 #define IOC_OPAL_TAKE_OWNERSHIP_IOW('p', 222, struct opal_key)
-#define IOC_OPAL_ACTIVATE_LSP   _IOW('p', 223, struct opal_key)
+#define IOC_OPAL_ACTIVATE_LSP   _IOW('p', 223, struct opal_lr_act)
 #define IOC_OPAL_SET_PW _IOW('p', 224, struct opal_new_pw)
 #define IOC_OPAL_ACTIVATE_USR   _IOW('p', 225, struct opal_session_info)
 #define IOC_OPAL_REVERT_TPR _IOW('p', 226, struct opal_key)
-- 
2.7.4



[PATCH V6 2/3] Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN

2017-02-14 Thread Scott Bauer
When CONFIG_KASAN is enabled, compilation fails:

block/sed-opal.c: In function 'sed_ioctl':
block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 
2048 bytes [-Werror=frame-larger-than=]

Moved all the ioctl structures off the stack and dynamically allocate
using _IOC_SIZE()

Fixes: 455a7b238cd6 ("block: Add Sed-opal library")

Reported-by: Arnd Bergmann <a...@arndb.de>
Signed-off-by: Scott Bauer <scott.ba...@intel.com>
---
 block/sed-opal.c | 133 ---
 drivers/nvme/host/core.c |   3 +-
 include/linux/sed-opal.h |   4 +-
 3 files changed, 50 insertions(+), 90 deletions(-)

diff --git a/block/sed-opal.c b/block/sed-opal.c
index bf1406e..e95b8a5 100644
--- a/block/sed-opal.c
+++ b/block/sed-opal.c
@@ -2344,9 +2344,10 @@ bool opal_unlock_from_suspend(struct opal_dev *dev)
 }
 EXPORT_SYMBOL(opal_unlock_from_suspend);
 
-int sed_ioctl(struct opal_dev *dev, unsigned int cmd, unsigned long ptr)
+int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
 {
-   void __user *arg = (void __user *)ptr;
+   void *p;
+   int ret = -ENOTTY;
 
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
@@ -2355,94 +2356,52 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, 
unsigned long ptr)
return -ENOTSUPP;
}
 
-   switch (cmd) {
-   case IOC_OPAL_SAVE: {
-   struct opal_lock_unlock lk_unlk;
-
-   if (copy_from_user(_unlk, arg, sizeof(lk_unlk)))
-   return -EFAULT;
-   return opal_save(dev, _unlk);
-   }
-   case IOC_OPAL_LOCK_UNLOCK: {
-   struct opal_lock_unlock lk_unlk;
-
-   if (copy_from_user(_unlk, arg, sizeof(lk_unlk)))
-   return -EFAULT;
-   return opal_lock_unlock(dev, _unlk);
-   }
-   case IOC_OPAL_TAKE_OWNERSHIP: {
-   struct opal_key opal_key;
-
-   if (copy_from_user(_key, arg, sizeof(opal_key)))
-   return -EFAULT;
-   return opal_take_ownership(dev, _key);
-   }
-   case IOC_OPAL_ACTIVATE_LSP: {
-   struct opal_lr_act opal_lr_act;
-
-   if (copy_from_user(_lr_act, arg, sizeof(opal_lr_act)))
-   return -EFAULT;
-   return opal_activate_lsp(dev, _lr_act);
-   }
-   case IOC_OPAL_SET_PW: {
-   struct opal_new_pw opal_pw;
-
-   if (copy_from_user(_pw, arg, sizeof(opal_pw)))
-   return -EFAULT;
-   return opal_set_new_pw(dev, _pw);
-   }
-   case IOC_OPAL_ACTIVATE_USR: {
-   struct opal_session_info session;
-
-   if (copy_from_user(, arg, sizeof(session)))
-   return -EFAULT;
-   return opal_activate_user(dev, );
-   }
-   case IOC_OPAL_REVERT_TPR: {
-   struct opal_key opal_key;
-
-   if (copy_from_user(_key, arg, sizeof(opal_key)))
-   return -EFAULT;
-   return opal_reverttper(dev, _key);
-   }
-   case IOC_OPAL_LR_SETUP: {
-   struct opal_user_lr_setup lrs;
+   p = memdup_user(arg,  _IOC_SIZE(cmd));
+   if (IS_ERR(p))
+   return PTR_ERR(p);
 
-   if (copy_from_user(, arg, sizeof(lrs)))
-   return -EFAULT;
-   return opal_setup_locking_range(dev, );
-   }
-   case IOC_OPAL_ADD_USR_TO_LR: {
-   struct opal_lock_unlock lk_unlk;
-
-   if (copy_from_user(_unlk, arg, sizeof(lk_unlk)))
-   return -EFAULT;
-   return opal_add_user_to_lr(dev, _unlk);
-   }
-   case IOC_OPAL_ENABLE_DISABLE_MBR: {
-   struct opal_mbr_data mbr;
-
-   if (copy_from_user(, arg, sizeof(mbr)))
-   return -EFAULT;
-   return opal_enable_disable_shadow_mbr(dev, );
-   }
-   case IOC_OPAL_ERASE_LR: {
-   struct opal_session_info session;
-
-   if (copy_from_user(, arg, sizeof(session)))
-   return -EFAULT;
-   return opal_erase_locking_range(dev, );
-   }
-   case IOC_OPAL_SECURE_ERASE_LR: {
-   struct opal_session_info session;
-
-   if (copy_from_user(, arg, sizeof(session)))
-   return -EFAULT;
-   return opal_secure_erase_locking_range(dev, );
-   }
+   switch (cmd) {
+   case IOC_OPAL_SAVE:
+   ret = opal_save(dev, p);
+   break;
+   case IOC_OPAL_LOCK_UNLOCK:
+   ret = opal_lock_unlock(dev, p);
+   break;
+   case IOC_OPAL_TAKE_OWNERSHIP:
+   ret = opal_take_ownership(dev, p);
+   break;
+   case IOC_OPAL_ACTIVATE_LSP:
+   ret = opal_activate_lsp(dev, p);
+   break;
+ 

[PATCH V6 3/3] Maintainers: Modify SED list from nvme to block

2017-02-14 Thread Scott Bauer
Signed-off-by: Scott Bauer <scott.ba...@intel.com>
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index e325373..b983b25 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11094,7 +11094,7 @@ SECURE ENCRYPTING DEVICE (SED) OPAL DRIVER
 M: Scott Bauer <scott.ba...@intel.com>
 M: Jonathan Derrick <jonathan.derr...@intel.com>
 M: Rafael Antognolli <rafael.antogno...@intel.com>
-L: linux-n...@lists.infradead.org
+L: linux-bl...@vger.kernel.org
 S: Supported
 F: block/sed*
 F: block/opal_proto.h
-- 
2.7.4



[PATCH V6 2/3] Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN

2017-02-14 Thread Scott Bauer
When CONFIG_KASAN is enabled, compilation fails:

block/sed-opal.c: In function 'sed_ioctl':
block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 
2048 bytes [-Werror=frame-larger-than=]

Moved all the ioctl structures off the stack and dynamically allocate
using _IOC_SIZE()

Fixes: 455a7b238cd6 ("block: Add Sed-opal library")

Reported-by: Arnd Bergmann 
Signed-off-by: Scott Bauer 
---
 block/sed-opal.c | 133 ---
 drivers/nvme/host/core.c |   3 +-
 include/linux/sed-opal.h |   4 +-
 3 files changed, 50 insertions(+), 90 deletions(-)

diff --git a/block/sed-opal.c b/block/sed-opal.c
index bf1406e..e95b8a5 100644
--- a/block/sed-opal.c
+++ b/block/sed-opal.c
@@ -2344,9 +2344,10 @@ bool opal_unlock_from_suspend(struct opal_dev *dev)
 }
 EXPORT_SYMBOL(opal_unlock_from_suspend);
 
-int sed_ioctl(struct opal_dev *dev, unsigned int cmd, unsigned long ptr)
+int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
 {
-   void __user *arg = (void __user *)ptr;
+   void *p;
+   int ret = -ENOTTY;
 
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
@@ -2355,94 +2356,52 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, 
unsigned long ptr)
return -ENOTSUPP;
}
 
-   switch (cmd) {
-   case IOC_OPAL_SAVE: {
-   struct opal_lock_unlock lk_unlk;
-
-   if (copy_from_user(_unlk, arg, sizeof(lk_unlk)))
-   return -EFAULT;
-   return opal_save(dev, _unlk);
-   }
-   case IOC_OPAL_LOCK_UNLOCK: {
-   struct opal_lock_unlock lk_unlk;
-
-   if (copy_from_user(_unlk, arg, sizeof(lk_unlk)))
-   return -EFAULT;
-   return opal_lock_unlock(dev, _unlk);
-   }
-   case IOC_OPAL_TAKE_OWNERSHIP: {
-   struct opal_key opal_key;
-
-   if (copy_from_user(_key, arg, sizeof(opal_key)))
-   return -EFAULT;
-   return opal_take_ownership(dev, _key);
-   }
-   case IOC_OPAL_ACTIVATE_LSP: {
-   struct opal_lr_act opal_lr_act;
-
-   if (copy_from_user(_lr_act, arg, sizeof(opal_lr_act)))
-   return -EFAULT;
-   return opal_activate_lsp(dev, _lr_act);
-   }
-   case IOC_OPAL_SET_PW: {
-   struct opal_new_pw opal_pw;
-
-   if (copy_from_user(_pw, arg, sizeof(opal_pw)))
-   return -EFAULT;
-   return opal_set_new_pw(dev, _pw);
-   }
-   case IOC_OPAL_ACTIVATE_USR: {
-   struct opal_session_info session;
-
-   if (copy_from_user(, arg, sizeof(session)))
-   return -EFAULT;
-   return opal_activate_user(dev, );
-   }
-   case IOC_OPAL_REVERT_TPR: {
-   struct opal_key opal_key;
-
-   if (copy_from_user(_key, arg, sizeof(opal_key)))
-   return -EFAULT;
-   return opal_reverttper(dev, _key);
-   }
-   case IOC_OPAL_LR_SETUP: {
-   struct opal_user_lr_setup lrs;
+   p = memdup_user(arg,  _IOC_SIZE(cmd));
+   if (IS_ERR(p))
+   return PTR_ERR(p);
 
-   if (copy_from_user(, arg, sizeof(lrs)))
-   return -EFAULT;
-   return opal_setup_locking_range(dev, );
-   }
-   case IOC_OPAL_ADD_USR_TO_LR: {
-   struct opal_lock_unlock lk_unlk;
-
-   if (copy_from_user(_unlk, arg, sizeof(lk_unlk)))
-   return -EFAULT;
-   return opal_add_user_to_lr(dev, _unlk);
-   }
-   case IOC_OPAL_ENABLE_DISABLE_MBR: {
-   struct opal_mbr_data mbr;
-
-   if (copy_from_user(, arg, sizeof(mbr)))
-   return -EFAULT;
-   return opal_enable_disable_shadow_mbr(dev, );
-   }
-   case IOC_OPAL_ERASE_LR: {
-   struct opal_session_info session;
-
-   if (copy_from_user(, arg, sizeof(session)))
-   return -EFAULT;
-   return opal_erase_locking_range(dev, );
-   }
-   case IOC_OPAL_SECURE_ERASE_LR: {
-   struct opal_session_info session;
-
-   if (copy_from_user(, arg, sizeof(session)))
-   return -EFAULT;
-   return opal_secure_erase_locking_range(dev, );
-   }
+   switch (cmd) {
+   case IOC_OPAL_SAVE:
+   ret = opal_save(dev, p);
+   break;
+   case IOC_OPAL_LOCK_UNLOCK:
+   ret = opal_lock_unlock(dev, p);
+   break;
+   case IOC_OPAL_TAKE_OWNERSHIP:
+   ret = opal_take_ownership(dev, p);
+   break;
+   case IOC_OPAL_ACTIVATE_LSP:
+   ret = opal_activate_lsp(dev, p);
+   break;
+   case IOC_OPAL_SET_PW:
+   ret = opal_set_new

[PATCH V6 3/3] Maintainers: Modify SED list from nvme to block

2017-02-14 Thread Scott Bauer
Signed-off-by: Scott Bauer 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index e325373..b983b25 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11094,7 +11094,7 @@ SECURE ENCRYPTING DEVICE (SED) OPAL DRIVER
 M: Scott Bauer 
 M: Jonathan Derrick 
 M: Rafael Antognolli 
-L: linux-n...@lists.infradead.org
+L: linux-bl...@vger.kernel.org
 S: Supported
 F: block/sed*
 F: block/opal_proto.h
-- 
2.7.4



Re: [PATCH V5 3/4] Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN

2017-02-13 Thread Scott Bauer
On Mon, Feb 13, 2017 at 04:30:36PM +, David Laight wrote:
> From: Scott Bauer Sent: 13 February 2017 16:11
> > When CONFIG_KASAN is enabled, compilation fails:
> > 
> > block/sed-opal.c: In function 'sed_ioctl':
> > block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 
> > 2048 bytes [-Werror=frame-
> > larger-than=]
> > 
> > Moved all the ioctl structures off the stack and dynamically activate
> > using _IOC_SIZE()
> 
> Think I'd not that this simplifies the code considerably.
> AFAICT CONFIG_KASAN is a just brainf*ck.
> It at least needs annotation that copy_from_user() has properties
> similar to memset().
> So if the size matches that of the type then no guard space (etc)
> is required.
>

I don't really follow what you're saying. Do you want me to just include that
the patch cleans up the ioctl path a bit. I need to include the KASAN part since
there was build breakage and the series does fix it even if it simplifies the 
path
as well. As for the memset part, we never copy back to userland so there's no 
chance
of data leakage which is what it seems you're hinting at.

> ...
> > +   ioctl_ptr = memdup_user(arg,  _IOC_SIZE(cmd));
> > +   if (IS_ERR_OR_NULL(ioctl_ptr)) {
> > +   ret = PTR_ERR(ioctl_ptr);
> > +   goto out;
> ...
> > + out:
> > +   kfree(ioctl_ptr);
> > +   return ret;
> >  }


> 
> That error path doesn't look quite right to me.
> 
>   David
> 

good god, this is a mess this morning. Thanks for the catch, I'll review these
more aggressively before sending out. 


Re: [PATCH V5 3/4] Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN

2017-02-13 Thread Scott Bauer
On Mon, Feb 13, 2017 at 04:30:36PM +, David Laight wrote:
> From: Scott Bauer Sent: 13 February 2017 16:11
> > When CONFIG_KASAN is enabled, compilation fails:
> > 
> > block/sed-opal.c: In function 'sed_ioctl':
> > block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 
> > 2048 bytes [-Werror=frame-
> > larger-than=]
> > 
> > Moved all the ioctl structures off the stack and dynamically activate
> > using _IOC_SIZE()
> 
> Think I'd not that this simplifies the code considerably.
> AFAICT CONFIG_KASAN is a just brainf*ck.
> It at least needs annotation that copy_from_user() has properties
> similar to memset().
> So if the size matches that of the type then no guard space (etc)
> is required.
>

I don't really follow what you're saying. Do you want me to just include that
the patch cleans up the ioctl path a bit. I need to include the KASAN part since
there was build breakage and the series does fix it even if it simplifies the 
path
as well. As for the memset part, we never copy back to userland so there's no 
chance
of data leakage which is what it seems you're hinting at.

> ...
> > +   ioctl_ptr = memdup_user(arg,  _IOC_SIZE(cmd));
> > +   if (IS_ERR_OR_NULL(ioctl_ptr)) {
> > +   ret = PTR_ERR(ioctl_ptr);
> > +   goto out;
> ...
> > + out:
> > +   kfree(ioctl_ptr);
> > +   return ret;
> >  }


> 
> That error path doesn't look quite right to me.
> 
>   David
> 

good god, this is a mess this morning. Thanks for the catch, I'll review these
more aggressively before sending out. 


Re: [PATCH V5 1/4] block: sed-opal: change ioctl to take user pointer instead of unsinged long

2017-02-13 Thread Scott Bauer
esOn Mon, Feb 13, 2017 at 09:11:09AM -0700, Scott Bauer wrote:
> Signed-off-by: Scott Bauer <scott.ba...@intel.com>
> ---
>  block/sed-opal.c | 6 --
>  drivers/nvme/host/core.c | 3 ++-
>  include/linux/sed-opal.h | 4 ++--
>  3 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index bf1406e..2448d4a 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -2344,9 +2344,11 @@ bool opal_unlock_from_suspend(struct opal_dev *dev)
>  }
>  EXPORT_SYMBOL(opal_unlock_from_suspend);
>  
> -int sed_ioctl(struct opal_dev *dev, unsigned int cmd, unsigned long ptr)
> +int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
>  {
> - void __user *arg = (void __user *)ptr;
> + void *ioctl_ptr;
> + int ret = -ENOTTY;
> + unsigned int cmd_size = _IOC_SIZE(cmd);

ugh, I apparently messed up my rebase these should be in patch 2 or maybe I 
should
sqash p1 and p2  together.


Re: [PATCH V5 1/4] block: sed-opal: change ioctl to take user pointer instead of unsinged long

2017-02-13 Thread Scott Bauer
esOn Mon, Feb 13, 2017 at 09:11:09AM -0700, Scott Bauer wrote:
> Signed-off-by: Scott Bauer 
> ---
>  block/sed-opal.c | 6 --
>  drivers/nvme/host/core.c | 3 ++-
>  include/linux/sed-opal.h | 4 ++--
>  3 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/block/sed-opal.c b/block/sed-opal.c
> index bf1406e..2448d4a 100644
> --- a/block/sed-opal.c
> +++ b/block/sed-opal.c
> @@ -2344,9 +2344,11 @@ bool opal_unlock_from_suspend(struct opal_dev *dev)
>  }
>  EXPORT_SYMBOL(opal_unlock_from_suspend);
>  
> -int sed_ioctl(struct opal_dev *dev, unsigned int cmd, unsigned long ptr)
> +int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
>  {
> - void __user *arg = (void __user *)ptr;
> + void *ioctl_ptr;
> + int ret = -ENOTTY;
> + unsigned int cmd_size = _IOC_SIZE(cmd);

ugh, I apparently messed up my rebase these should be in patch 2 or maybe I 
should
sqash p1 and p2  together.


  1   2   >