Re: Outstanding MQ questions from MMC

2017-05-18 Thread Christoph Hellwig
On Wed, Mar 29, 2017 at 05:09:37AM +0200, Linus Walleij wrote:
> 2. When MMC cards are ejected a serious error condition occurs. So for this
> reason we spool out the queue with
> 
> req->rq_flags |= RQF_QUIET;
> blk_end_request_all(req, -EIO);
> 
> This will shut up a huge amount of console errors for example.
> I have no clue on how to manage this with MQ. I am currently using
> 
> blk_mq_complete_request(mq_rq->req, -EIO);
> 
> and nothing else, and it will hit all requests for the ejected card coming
> in from this point. Is this the right solution? Or is there some medium
> eject handling I'm not aware of inside the MQ layer? It seems like something
> that can happen on pluggable harddrives and CDROMS and what not.

Hot unplug handling currently is a mess in the block layer (not just the
mq case).  Especially if you ever requeue an I/O there is tons of
boilerplate code.  I wish we could move a little more of this into the
core, but right now I don't have a good idea on how to.

> 3. Sometimes a read or write gets partially completed. Say we read 12 out
> of 15 sectors or something like that. I have no idea how often this occurs in
> practice. With the old block layer we did this:
> 
> blk_end_request(req, 0, bytes_xfered);

You can still do this with mq, but you have to open code it.  Take a look
at scsi_end_request, which has open-coded versions of the mq and non-mq
end_request routines.  Currently it's the only example of blk-mq partial
completions.
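A hedged sketch of what that open coding could look like, modeled on scsi_end_request(); the foo_* name and the requeue-on-partial policy are illustrative assumptions, not code from any existing driver:

```c
/*
 * Sketch of blk-mq partial completion, modeled on scsi_end_request().
 * blk_update_request() completes `bytes` of the request and returns true
 * if bytes are still outstanding; here we requeue the remainder instead
 * of failing the whole request. Only the block-layer calls are real.
 */
static void foo_complete_rq(struct request *req, int error,
			    unsigned int bytes)
{
	if (blk_update_request(req, error, bytes)) {
		/* Partial completion: bytes remain, retry the rest. */
		blk_mq_requeue_request(req, true);
	} else {
		/* All bytes accounted for: finish the request. */
		__blk_mq_end_request(req, error);
	}
}
```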


Re: Outstanding MQ questions from MMC

2017-05-18 Thread Linus Walleij
On Tue, Apr 18, 2017 at 5:31 PM, Alex Lemberg  wrote:

> There is additional functionality which requires the host lock
> to be held for several write commands - the FFU.
> In the case of FFU, the FW can be downloaded/written in several iterations
> of the Write command (CMD25). This sequence should not be interrupted by regular
> Write requests.
> In current driver, both FFU and RPMB can be sent by using 
> mmc_blk_ioctl_multi_cmd().

Both single and multi ioctl()s are funneled into the block layer
using the driver-specific request ops in the latest iteration of
my patch set.

It turns out this was a simpler change than I thought.

If you check the patch set, you will see that it also fixes a userspace
starvation issue when issuing ioctl()s such as for RPMB
during heavy block I/O.

This use case:

> dd if=/dev/mmcblk3 of=/dev/null bs=1M &
> mmc extcsd read /dev/mmcblk3

Previously, the ioctl() command would hang, just waiting for the host
lock, until the dd command was complete.
I guess RPMB has the same problem...

It is now fixed.

If you can verify the v2 patch set (just posted) and provide Tested-by's
(or bug reports...) it's appreciated.

Yours,
Linus Walleij


Re: Outstanding MQ questions from MMC

2017-04-18 Thread Alex Lemberg
There is additional functionality which requires the host lock
to be held for several write commands - the FFU.
In the case of FFU, the FW can be downloaded/written in several iterations
of the Write command (CMD25). This sequence should not be interrupted by regular
Write requests.
In current driver, both FFU and RPMB can be sent by using 
mmc_blk_ioctl_multi_cmd().

Thanks,
Alex


> On Apr 15, 2017, at 10:24 PM, Avri Altman  wrote:
> 
> You can see how it's done in mmc_blk_ioctl_rpmb_cmd().
> 
> The RPMB protocol defines 6 types of accesses:
> Accesses that perform a read operation (read counter, read data, and read
> configuration) require sending 2 requests.
> Accesses that perform a write operation (program key, write data, write
> configuration) require sending 3 requests.
> But you must read the counter beforehand (except for program key), hence the
> 5 different requests.
> 
> The standard does not define a "special" request that does it all at once,
> but expects a pre-defined series of CMD18 & CMD25 for each access type,
> in which the payloads are 512-byte frames in a pre-defined structure.
> 
> Cheers,
> Avri
> 
>> -Original Message-
>> From: Linus Walleij [mailto:linus.wall...@linaro.org]
>> Sent: Saturday, April 15, 2017 9:35 PM
>> To: Avri Altman 
>> Cc: Arnd Bergmann ; Ulf Hansson
>> ; linux-...@vger.kernel.org; linux-
>> bl...@vger.kernel.org; Jens Axboe ; Christoph Hellwig
>> ; Adrian Hunter ; Paolo Valente
>> 
>> Subject: Re: Outstanding MQ questions from MMC
>> 
>> On Fri, Apr 14, 2017 at 8:41 PM, Avri Altman 
>> wrote:
>>> [Me]
>>>> 2. Turn RPMB and other ioctl() MMC operations into mmc_queue_req
>>>>   things and funnel them into the block scheduler
>>>>   using REQ_OP_DRV_IN/OUT requests.
>>>> 
>>> 
>>> Accessing the RPMB is done via a strange protocol, in which each access is
>> comprised of several requests.
>>> For example, writing to the RPMB will require sending 5 different requests:
>>> 2 requests to read the write counter, and then 3 more requests for the
>> write operation itself.
>>> 
>>> Once the sequence has started, it should not be interfered with by other
>> requests, or the operation will fail.
>> 
>> So I guess currently something takes a host lock and then performs the
>> 5 requests.
>> 
>> Thus we need to send a single custom request containing a list of 5 things to
>> do, and return after that.
>> 
>> Or do you mean that we return to userspace inbetween these different
>> requests and the sequencing is done in userspace?
>> 
>> I hope not because that sounds fragile, like userspace could crash and leave
>> the host lock dangling :/
>> 
>> Yours,
>> Linus Walleij



RE: Outstanding MQ questions from MMC

2017-04-15 Thread Avri Altman
You can see how it's done in mmc_blk_ioctl_rpmb_cmd().

The RPMB protocol defines 6 types of accesses:
Accesses that perform a read operation (read counter, read data, and read
configuration) require sending 2 requests.
Accesses that perform a write operation (program key, write data, write
configuration) require sending 3 requests.
But you must read the counter beforehand (except for program key), hence the 5
different requests.

The standard does not define a "special" request that does it all at once,
but expects a pre-defined series of CMD18 & CMD25 for each access type,
in which the payloads are 512-byte frames in a pre-defined structure.
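For context, a hedged userspace sketch of how one such access (the 2-request read-counter case) maps onto mmc_blk_ioctl_multi_cmd() via the MMC_IOC_MULTI_CMD ioctl. The frame contents and response flags are omitted, so this is schematic rather than a working RPMB client:

```c
/* Schematic sketch: issue one RPMB access (read write-counter, the
 * 2-request case) as a single MMC_IOC_MULTI_CMD ioctl, so the kernel
 * keeps the host claimed across both frames. RPMB frame contents and
 * response flags are deliberately omitted here. */
#include <linux/mmc/ioctl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define RPMB_FRAME_SIZE 512

static int rpmb_read_counter(int fd, uint8_t req_frame[RPMB_FRAME_SIZE],
			     uint8_t resp_frame[RPMB_FRAME_SIZE])
{
	struct mmc_ioc_multi_cmd *mioc;
	int ret;

	mioc = calloc(1, sizeof(*mioc) + 2 * sizeof(struct mmc_ioc_cmd));
	if (!mioc)
		return -1;
	mioc->num_of_cmds = 2;

	/* Request frame: CMD25 (WRITE_MULTIPLE_BLOCK), one 512-byte frame. */
	mioc->cmds[0].opcode = 25;
	mioc->cmds[0].write_flag = 1;
	mioc->cmds[0].blksz = RPMB_FRAME_SIZE;
	mioc->cmds[0].blocks = 1;
	mmc_ioc_cmd_set_data(mioc->cmds[0], req_frame);

	/* Response frame: CMD18 (READ_MULTIPLE_BLOCK) of the result. */
	mioc->cmds[1].opcode = 18;
	mioc->cmds[1].write_flag = 0;
	mioc->cmds[1].blksz = RPMB_FRAME_SIZE;
	mioc->cmds[1].blocks = 1;
	mmc_ioc_cmd_set_data(mioc->cmds[1], resp_frame);

	ret = ioctl(fd, MMC_IOC_MULTI_CMD, mioc);
	free(mioc);
	return ret;
}
```

The point of the multi-command form is exactly the atomicity discussed above: the whole sequence goes down as one unit, so regular block I/O cannot interleave with it.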
  
Cheers,
Avri

> -Original Message-
> From: Linus Walleij [mailto:linus.wall...@linaro.org]
> Sent: Saturday, April 15, 2017 9:35 PM
> To: Avri Altman 
> Cc: Arnd Bergmann ; Ulf Hansson
> ; linux-...@vger.kernel.org; linux-
> bl...@vger.kernel.org; Jens Axboe ; Christoph Hellwig
> ; Adrian Hunter ; Paolo Valente
> 
> Subject: Re: Outstanding MQ questions from MMC
> 
> On Fri, Apr 14, 2017 at 8:41 PM, Avri Altman 
> wrote:
> > [Me]
> >> 2. Turn RPMB and other ioctl() MMC operations into mmc_queue_req
> >>things and funnel them into the block scheduler
> >>using REQ_OP_DRV_IN/OUT requests.
> >>
> >
> > Accessing the RPMB is done via a strange protocol, in which each access is
> comprised of several requests.
> > For example, writing to the RPMB will require sending 5 different requests:
> > 2 requests to read the write counter, and then 3 more requests for the
> write operation itself.
> >
> > Once the sequence has started, it should not be interfered with by other
> requests, or the operation will fail.
> 
> So I guess currently something takes a host lock and then performs the
> 5 requests.
> 
> Thus we need to send a single custom request containing a list of 5 things to
> do, and return after that.
> 
> Or do you mean that we return to userspace inbetween these different
> requests and the sequencing is done in userspace?
> 
> I hope not because that sounds fragile, like userspace could crash and leave
> the host lock dangling :/
> 
> Yours,
> Linus Walleij


Re: Outstanding MQ questions from MMC

2017-04-15 Thread Linus Walleij
On Fri, Apr 14, 2017 at 8:41 PM, Avri Altman  wrote:
> [Me]
>> 2. Turn RPMB and other ioctl() MMC operations into mmc_queue_req
>>things and funnel them into the block scheduler
>>using REQ_OP_DRV_IN/OUT requests.
>>
>
> Accessing the RPMB is done via a strange protocol, in which each access is 
> comprised of several requests.
> For example, writing to the RPMB will require sending 5 different requests:
> 2 requests to read the write counter, and then 3 more requests for the write 
> operation itself.
>
> Once the sequence has started, it should not be interfered with by other
> requests, or the operation will fail.

So I guess currently something takes a host lock and then performs the
5 requests.

Thus we need to send a single custom request containing a list of 5
things to do, and return after that.

Or do you mean that we return to userspace inbetween these different
requests and the sequencing is done in userspace?

I hope not because that sounds fragile, like userspace could crash and
leave the host lock dangling :/

Yours,
Linus Walleij


Re: Outstanding MQ questions from MMC

2017-04-15 Thread Ulf Hansson
[...]

>> Alternatively, I had this idea that we could translate blk requests into
>> mmc commands and then have a (short fixed length) set of outstanding
>> mmc commands in the device that always get done in order. The card
>> detect and the user space I/O would then directly put mmc commands
>> onto the command queue, as would the blk-mq scheduler. You
>> still need a lock to access that command queue, but the mmc host
>> would just always pick the next command off the list when one
>> command completes.
>
> I looked into this.
>
> The block layer queue can wrap and handle custom device commands
> using REQ_OP_DRV_IN/OUT, and that seems to be the best way
> to play with the block layer IMO.
>
> The card detect work is a special case because it is also used by
> SDIO which does not use the block layer. But that could maybe be
> solved by a separate host lock just for the SDIO case, letting
> devices accessed as block devices use the method of inserting
> custom commands.

The problem with trying to manage the SDIO case as a special case is
that it is the same work (mmc_rescan()) that runs to detect any kind
of removable card.

Moreover, it's not until the card has been fully detected and
initialized that we can tell what kind of card it is.

Perhaps we can re-factor the whole mmc_rescan() thing so there is one
part that runs only to detect new cards being inserted, in a lockless
fashion, while another part deals with the polling/removal - which
could then perhaps differ depending on the card type.

Not sure if this helps...

>
> I looked at how e.g. IDE and SCSI does this, drivers/ide/ide-ioctls.c
> looks like this nowadays:
>
> static int generic_drive_reset(ide_drive_t *drive)
> {
>         struct request *rq;
>         int ret = 0;
>
>         rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
>         scsi_req_init(rq);
>         ide_req(rq)->type = ATA_PRIV_MISC;
>         scsi_req(rq)->cmd_len = 1;
>         scsi_req(rq)->cmd[0] = REQ_DRIVE_RESET;
>         if (blk_execute_rq(drive->queue, NULL, rq, 1))
>                 ret = rq->errors;
>         blk_put_request(rq);
>         return ret;
> }
>
> So it creates a custom REQ_OP_DRV_IN request, then scsi_req_init()
> sets up the special command, in this case
> ATA_PRIV_MISC/REQ_DRIVE_RESET and toss this into the block
> queue like everything else.
>
> We could do the same, especially the RPMB operations should
> probably have been done like this from the beginning. But due to
> historical factors they were not.
>
> It is a bit hairy and the whole thing is in a bit of flux because Christoph
> is heavily refactoring this and cleaning up the old block devices as
> we speak (I bet) so it is a bit hard to do the right thing.
>
> I easily get confused here ... for example there is custom
> per-request data access by this simple:
>
> scsi_req_init(rq)
>
> which does
>
> struct scsi_request *req = scsi_req(rq);
>
> which does
>
> static inline struct scsi_request *scsi_req(struct request *rq)
> {
> return blk_mq_rq_to_pdu(rq);
> }
>
> Oops, blk_mq_* namespace? You would assume this means that
> you have to use blk-mq? Nah, I think not, because all it does is:
>
> static inline void *blk_mq_rq_to_pdu(struct request *rq)
> {
> return rq + 1;
> }
>
> So while we have to #include <linux/blk-mq.h> this is one of these
> mixed semantics that just give you a pointer to something behind
> the request, a method that is simple and natural in blk-mq but which
> is (I guess) set up by some other mechanism in the !mq case,
> albeit accessed via this inline.
>
> And I have to do this with the old block layer to get to a point
> where we can start using blk-mq, sigh.
>
> The border between blk and blk-mq is a bit blurry right now.
>
> With blk-mq I do this:
>
> mq->tag_set.cmd_size = sizeof(foo_cmd);
> blk_mq_alloc_tag_set(...)
>
> To do this with the old blk layer I may need some help to figure
> out how to set up per-request additional data in a way that works
> with the old layer.
>
> scsi_lib.c scsi_alloc_queue() does this:
>
> q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
> if (!q)
>   return NULL;
> q->cmd_size = sizeof(foo_cmd);
>
> And this means there will be sizeof(foo_cmd) after the request
> that can be dereferenced by blk_mq_rq_to_pdu(rq);...
>
> Yeah I'll try it.
>
> Just trying to give a picture of why it's a bit in flux here.
> Or documenting it for myself :D
>
>> This also lets you integrate packed commands: if the next outstanding
>> command is the same type as the request coming in from blk-mq,
>> you can merge it into a single mmc command to be submitted
>> together, otherwise it gets deferred.
>
> Essentially the heavy lifting that needs to happen is:
>
> 1. Start allocating per-request PDU (extra data) in the MMC case
>this will then be struct mmc_queue_req request items.
>
> 2. Turn RPMB and other ioctl() MMC operations into mmc_queue_req
>things and funnel them into the block scheduler
>using REQ_OP_DRV_IN/OUT requests.

RE: Outstanding MQ questions from MMC

2017-04-14 Thread Avri Altman


> 
> 2. Turn RPMB and other ioctl() MMC operations into mmc_queue_req
>things and funnel them into the block scheduler
>using REQ_OP_DRV_IN/OUT requests.
> 

Accessing the RPMB is done via a strange protocol, in which each access is 
comprised of several requests.
For example, writing to the RPMB will require sending 5 different requests: 
2 requests to read the write counter, and then 3 more requests for the write 
operation itself.

Once the sequence has started, it should not get interfered by other requests, 
or the operation will fail.

So, if you are looking to eliminate the host lock, and rely merely on the blk
queue to regulate access to the device,
you'll need some barrier mechanism to ensure that a sequence of
requests takes place as an atomic unit.

But then again, doesn't that contradict the very idea of a multi-queue?

Cheers,
Avri


Re: Outstanding MQ questions from MMC

2017-04-13 Thread Linus Walleij
On Fri, Mar 31, 2017 at 10:43 AM, Arnd Bergmann  wrote:
> On Thu, Mar 30, 2017 at 6:39 PM, Ulf Hansson  wrote:
>> On 30 March 2017 at 14:42, Arnd Bergmann  wrote:
>>> On Wed, Mar 29, 2017 at 5:09 AM, Linus Walleij  
>>> wrote:
>
 In MQ, I have simply locked the host on the first request and then I never
 release it. Clearly this does not work. I am uncertain on how to handle 
 this
 and whether MQ has a way to tell us that the queue is empty so we may 
 release
 the host. I toyed with the idea to just set up a timer, but a "queue
 empty" callback
 from the block layer is what would be ideal.
>>>
>>> Would it be possible to change the userspace code to go through
>>> the block layer instead and queue a request there, to avoid having
>>> to lock the card at all?
>>
>> That would be good from an I/O scheduling point of view, as it would
>> avoid one side being able to starve the other.
>>
>> However, we would still need a lock, as we also have card detect work
>> queue, which also needs to claim the host when it polls for removable
>> cards.
>
> Hmm, in theory the card-detect work queue should not be active
> at the same time as any I/O, but I see your point. Could the card-detect
> wq perhaps also use the blk queue for sending a special request that
> triggers the detection logic?
>
> Alternatively, I had this idea that we could translate blk requests into
> mmc commands and then have a (short fixed length) set of outstanding
> mmc commands in the device that always get done in order. The card
> detect and the user space I/O would then directly put mmc commands
> onto the command queue, as would the blk-mq scheduler. You
> still need a lock to access that command queue, but the mmc host
> would just always pick the next command off the list when one
> command completes.

I looked into this.

The block layer queue can wrap and handle custom device commands
using REQ_OP_DRV_IN/OUT, and that seems to be the best way
to play with the block layer IMO.

The card detect work is a special case because it is also used by
SDIO which does not use the block layer. But that could maybe be
solved by a separate host lock just for the SDIO case, letting
devices accessed as block devices use the method of inserting
custom commands.

I looked at how e.g. IDE and SCSI does this, drivers/ide/ide-ioctls.c
looks like this nowadays:

static int generic_drive_reset(ide_drive_t *drive)
{
        struct request *rq;
        int ret = 0;

        rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
        scsi_req_init(rq);
        ide_req(rq)->type = ATA_PRIV_MISC;
        scsi_req(rq)->cmd_len = 1;
        scsi_req(rq)->cmd[0] = REQ_DRIVE_RESET;
        if (blk_execute_rq(drive->queue, NULL, rq, 1))
                ret = rq->errors;
        blk_put_request(rq);
        return ret;
}

So it creates a custom REQ_OP_DRV_IN request, then scsi_req_init()
sets up the special command, in this case
ATA_PRIV_MISC/REQ_DRIVE_RESET and toss this into the block
queue like everything else.

We could do the same, especially the RPMB operations should
probably have been done like this from the beginning. But due to
historical factors they were not.
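Just to make the idea concrete, a hypothetical sketch of the MMC equivalent; every mmc_* name below is invented for illustration, only the block-layer calls are real:

```c
/*
 * Hypothetical sketch: funnel an MMC ioctl operation through the block
 * queue as a driver-private request, the same way the IDE example does.
 * mmc_queue_req_from_rq(), drv_op_data and drv_op_result are made-up
 * names for per-request PDU plumbing the driver would have to provide.
 */
static int mmc_blk_queue_drv_op(struct request_queue *q, void *op_data)
{
	struct request *rq;
	int ret;

	rq = blk_get_request(q, REQ_OP_DRV_OUT, __GFP_RECLAIM);
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	/* Stash the operation in the per-request PDU. */
	mmc_queue_req_from_rq(rq)->drv_op_data = op_data;

	/*
	 * The request is inserted into the queue like any other, so it is
	 * naturally serialized against block I/O - no host lock juggling.
	 */
	blk_execute_rq(q, NULL, rq, 0);
	ret = mmc_queue_req_from_rq(rq)->drv_op_result;
	blk_put_request(rq);
	return ret;
}
```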

It is a bit hairy and the whole thing is in a bit of flux because Christoph
is heavily refactoring this and cleaning up the old block devices as
we speak (I bet) so it is a bit hard to do the right thing.

I easily get confused here ... for example there is custom
per-request data accessed by this simple call:

scsi_req_init(rq)

which does

struct scsi_request *req = scsi_req(rq);

which does

static inline struct scsi_request *scsi_req(struct request *rq)
{
return blk_mq_rq_to_pdu(rq);
}

Oops, blk_mq_* namespace? You would assume this means that
you have to use blk-mq? Nah, I think not, because all it does is:

static inline void *blk_mq_rq_to_pdu(struct request *rq)
{
return rq + 1;
}

So while we have to #include <linux/blk-mq.h> this is one of these
mixed semantics that just give you a pointer to something behind
the request, a method that is simple and natural in blk-mq but which
is (I guess) set up by some other mechanism in the !mq case,
albeit accessed via this inline.

And I have to do this with the old block layer to get to a point
where we can start using blk-mq, sigh.

The border between blk and blk-mq is a bit blurry right now.

With blk-mq I do this:

mq->tag_set.cmd_size = sizeof(foo_cmd);
blk_mq_alloc_tag_set(...)

To do this with the old blk layer I may need some help to figure
out how to set up per-request additional data in a way that works
with the old layer.

scsi_lib.c scsi_alloc_queue() does this:

q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
if (!q)
        return NULL;
q->cmd_size = sizeof(foo_cmd);

And this means there will be sizeof(foo_cmd) after the request
that can be dereferenced by blk_mq_rq_to_pdu(rq);...

Yeah I'll try it.

Just trying to give a picture of why it's a bit in flux here.
Or documenting it for myself :D

> This also lets you integrate packed commands: if the next outstanding
> command is the same type as the request coming in from blk-mq,
> you can merge it into a single mmc command to be submitted
> together, otherwise it gets deferred.

Re: Outstanding MQ questions from MMC

2017-03-31 Thread Arnd Bergmann
On Thu, Mar 30, 2017 at 6:39 PM, Ulf Hansson  wrote:
> On 30 March 2017 at 14:42, Arnd Bergmann  wrote:
>> On Wed, Mar 29, 2017 at 5:09 AM, Linus Walleij  
>> wrote:

>>> In MQ, I have simply locked the host on the first request and then I never
>>> release it. Clearly this does not work. I am uncertain on how to handle this
>>> and whether MQ has a way to tell us that the queue is empty so we may 
>>> release
>>> the host. I toyed with the idea to just set up a timer, but a "queue
>>> empty" callback
>>> from the block layer is what would be ideal.
>>
>> Would it be possible to change the userspace code to go through
>> the block layer instead and queue a request there, to avoid having
>> to lock the card at all?
>
> That would be good from an I/O scheduling point of view, as it would
> avoid one side being able to starve the other.
>
> However, we would still need a lock, as we also have card detect work
> queue, which also needs to claim the host when it polls for removable
> cards.

Hmm, in theory the card-detect work queue should not be active
at the same time as any I/O, but I see your point. Could the card-detect
wq perhaps also use the blk queue for sending a special request that
triggers the detection logic?

Alternatively, I had this idea that we could translate blk requests into
mmc commands and then have a (short fixed length) set of outstanding
mmc commands in the device that always get done in order. The card
detect and the user space I/O would then directly put mmc commands
onto the command queue, as would the blk-mq scheduler. You
still need a lock to access that command queue, but the mmc host
would just always pick the next command off the list when one
command completes.
This also lets you integrate packed commands: if the next outstanding
command is the same type as the request coming in from blk-mq,
you can merge it into a single mmc command to be submitted
together, otherwise it gets deferred.

  Arnd


Re: Outstanding MQ questions from MMC

2017-03-30 Thread Ulf Hansson
On 30 March 2017 at 14:42, Arnd Bergmann  wrote:
> On Wed, Mar 29, 2017 at 5:09 AM, Linus Walleij  
> wrote:
>> Hi folks,
>>
>> I earlier left some unanswered questions in my MMC to MQ conversion series
>> but I figured it is better if I collect them and ask the blk-mq
>> maintainers directly
>> how to deal with the following situations that occur in the MMC block layer:
>>
>>
>> 1. The current MMC code locks the host when the first request comes in
>> from blk_fetch_request() and unlocks it when blk_fetch_request() returns
>> NULL twice in a row. Then the polling thread terminates and is not restarted
>> until we get called by the mmc_request_fn.
>>
>> Host locking means that we will not send other commands to the MMC
>> card from e.g. userspace, which sometimes can send spurious stuff orthogonal
>> to the block layer. If the block layer has locked the host, userspace
>> has to wait
>> and vice versa. It is not a common contention point but it still happens.
>>
>> In MQ, I have simply locked the host on the first request and then I never
>> release it. Clearly this does not work. I am uncertain on how to handle this
>> and whether MQ has a way to tell us that the queue is empty so we may release
>> the host. I toyed with the idea to just set up a timer, but a "queue
>> empty" callback
>> from the block layer is what would be ideal.
>
> Would it be possible to change the userspace code to go through
> the block layer instead and queue a request there, to avoid having
> to lock the card at all?

That would be good from an I/O scheduling point of view, as it would
avoid one side being able to starve the other.

However, we would still need a lock, as we also have card detect work
queue, which also needs to claim the host when it polls for removable
cards.

Kind regards
Uffe


Re: Outstanding MQ questions from MMC

2017-03-30 Thread Arnd Bergmann
On Wed, Mar 29, 2017 at 5:09 AM, Linus Walleij  wrote:
> Hi folks,
>
> I earlier left some unanswered questions in my MMC to MQ conversion series
> but I figured it is better if I collect them and ask the blk-mq
> maintainers directly
> how to deal with the following situations that occur in the MMC block layer:
>
>
> 1. The current MMC code locks the host when the first request comes in
> from blk_fetch_request() and unlocks it when blk_fetch_request() returns
> NULL twice in a row. Then the polling thread terminates and is not restarted
> until we get called by the mmc_request_fn.
>
> Host locking means that we will not send other commands to the MMC
> card from e.g. userspace, which sometimes can send spurious stuff orthogonal
> to the block layer. If the block layer has locked the host, userspace
> has to wait
> and vice versa. It is not a common contention point but it still happens.
>
> In MQ, I have simply locked the host on the first request and then I never
> release it. Clearly this does not work. I am uncertain on how to handle this
> and whether MQ has a way to tell us that the queue is empty so we may release
> the host. I toyed with the idea to just set up a timer, but a "queue
> empty" callback
> from the block layer is what would be ideal.

Would it be possible to change the userspace code to go through
the block layer instead and queue a request there, to avoid having
to lock the card at all?

   Arnd