[PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()

2019-10-23 Thread James Smart

During the following test scenario:
- Offline a cpu
- load the lpfc driver, which auto-discovers NVMe devices. For a new
  nvme device, the lpfc/nvme_fc transport can request up to
  num_online_cpus() worth of nr_hw_queues. The target in
  this case allowed at least that many nvme queues.
The system encountered the following crash:

 BUG: unable to handle kernel paging request at 3659d33953a8
 ...
 Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
 RIP: 0010:blk_mq_get_request+0x21d/0x3c0
 ...
 blk_mq_alloc_request_hctx+0xef/0x140
 nvme_alloc_request+0x32/0x80 [nvme_core]
 __nvme_submit_sync_cmd+0x4a/0x1c0 [nvme_core]
 nvmf_connect_io_queue+0x130/0x1a0 [nvme_fabrics]
 nvme_fc_connect_io_queues+0x285/0x2b0 [nvme_fc]
 nvme_fc_create_association+0x8ea/0x9c0 [nvme_fc]
 nvme_fc_connect_ctrl_work+0x19/0x50 [nvme_fc]
 ...

A prior commit that simplified queue mapping replaced the use of
cpumask_first() with cpumask_first_and(). The issue: if
cpumask_first_and() does not find any _intersecting_ cpus, it returns
nr_cpu_ids, which is not a valid index for the per_cpu_ptr lookup
done in __blk_mq_get_ctx().

Rather than reverting to cpumask_first(), follow the logic in
blk_mq_first_mapped_cpu(): check for nr_cpu_ids and fall back to
cpumask_first() when no online cpu is found.
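
For reference, the helper being followed looks roughly like this (a
sketch of blk_mq_first_mapped_cpu() as I recall it, not quoted from
the tree):

static int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
{
	int cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);

	if (cpu >= nr_cpu_ids)
		cpu = cpumask_first(hctx->cpumask);
	return cpu;
}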

Fixes: 20e4d8139319 ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
Signed-off-by: Shagun Agrawal 
Signed-off-by: James Smart 
CC: Christoph Hellwig 
---
 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8538dc415499..0b06b4ea57f1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -461,6 +461,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 		return ERR_PTR(-EXDEV);
 	}
 	cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
+	if (cpu >= nr_cpu_ids)
+		cpu = cpumask_first(alloc_data.hctx->cpumask);
 	alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
 
 	rq = blk_mq_get_request(q, NULL, &alloc_data);
-- 
2.13.7



Re: [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues

2019-03-26 Thread James Smart

On 3/26/2019 8:33 AM, Hannes Reinecke wrote:

On 3/26/19 4:12 PM, Bart Van Assche wrote:

On Tue, 2019-03-26 at 15:08 +0100, Hannes Reinecke wrote:

On 3/26/19 2:43 PM, Bart Van Assche wrote:

On 3/26/19 5:07 AM, Hannes Reinecke wrote:

When a queue is dying or dead there is no point in calling
blk_mq_run_hw_queues() in blk_mq_unquiesce_queue(); in fact, doing
so might crash the machine as the queue structures are in the
process of being deleted.

Signed-off-by: Hannes Reinecke 
---
   block/blk-mq.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a9c181603cbd..b1eeba38bc79 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -258,7 +258,8 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
 
 	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
 	/* dispatch requests which are inserted during quiescing */
-	blk_mq_run_hw_queues(q, true);
+	if (!blk_queue_dying(q) && !blk_queue_dead(q))
+		blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);


Hi Hannes,

Please provide more context information. In the "dead" state the queue
must be run to make sure that all requests that were queued before the
"dead" state get processed. The blk_cleanup_queue() function is
responsible for stopping all code that can run the queue after all
requests have finished and before destruction of the data structures
needed for request processing starts.



I have a crash with two processes competing for the same controller:

#0  0x983d3bcb in sbitmap_any_bit_set (sb=0x8a1b874ba0d8)
    at ../lib/sbitmap.c:181
#1  0x98366c05 in blk_mq_hctx_has_pending (hctx=0x8a1b874ba000)
    at ../block/blk-mq.c:66
#2  0x98366c85 in blk_mq_run_hw_queues (q=0x8a1b874ba0d8,
    async=true) at ../block/blk-mq.c:1292
#3  0x98366d3a in blk_mq_unquiesce_queue (q=)
    at ../block/blk-mq.c:265
#4  0xc01f3e0e in nvme_start_queues (ctrl=)
    at ../drivers/nvme/host/core.c:3658
#5  0xc01e843c in nvme_fc_delete_association (ctrl=0x8a1f9be5a000)
    at ../drivers/nvme/host/fc.c:2843
#6  0xc01e8544 in nvme_fc_delete_association (ctrl=)
    at ../drivers/nvme/host/fc.c:2918
#7  0xc01e8544 in __nvme_fc_terminate_io (ctrl=0x8a1f9be5a000)
    at ../drivers/nvme/host/fc.c:2911
#8  0xc01e8f09 in nvme_fc_reset_ctrl_work (work=0x8a1f9be5a6d0)
    at ../drivers/nvme/host/fc.c:2927
#9  0x980a224a in process_one_work (worker=0x8a1b73934f00,
    work=0x8a1f9be5a6d0) at ../kernel/workqueue.c:2092
#10 0x980a249b in worker_thread (__worker=0x8a1b73934f00)
    at ../kernel/workqueue.c:2226

#7  0x986d2e9a in wait_for_completion (x=0xa48eca88bc40)
    at ../kernel/sched/completion.c:125
#8  0x980f25ae in __synchronize_srcu (sp=0x9914fc20,
    do_norm=) at ../kernel/rcu/srcutree.c:851
#9  0x982d18b1 in debugfs_remove_recursive (dentry=)
    at ../fs/debugfs/inode.c:741
#10 0x98398ac5 in blk_mq_debugfs_unregister_hctx
    (hctx=0x8a1b7000) at ../block/blk-mq-debugfs.c:897
#11 0x983661cf in blk_mq_exit_hctx (q=0x8a1f825e4040,
    set=0x8a1f9be5a0c0, hctx=0x8a1b7000, hctx_idx=2) at
    ../block/blk-mq.c:1987
#12 0x9836946a in blk_mq_exit_hw_queues (nr_queue=, set=, q=)
    at ../block/blk-mq.c:2017
#13 0x9836946a in blk_mq_free_queue (q=0x8a1f825e4040)
    at ../block/blk-mq.c:2506
#14 0x9835aac5 in blk_cleanup_queue (q=0x8a1f825e4040)
    at ../block/blk-core.c:691
#15 0xc01f5bc8 in nvme_ns_remove (ns=0x8a1f819e8f80)
    at ../drivers/nvme/host/core.c:3138
#16 0xc01f6fea in nvme_validate_ns (ctrl=0x8a1f9be5a308, nsid=5)
    at ../drivers/nvme/host/core.c:3164
#17 0xc01f9053 in nvme_scan_ns_list (nn=,
    ctrl=) at ../drivers/nvme/host/core.c:3202
#18 0xc01f9053 in nvme_scan_work (work=)
    at ../drivers/nvme/host/core.c:3280
#19 0x980a224a in process_one_work (worker=0x8a1b7349f6c0,
    work=0x8a1f9be5aba0) at ../kernel/workqueue.c:2092

The point is that the queue is already dead by the time nvme_start_queues()
tries to flush existing requests (of which there are none, of course).
I had been looking into synchronizing scan_work and reset_work, but then
I wasn't sure if that wouldn't deadlock somewhere.


James, do you agree that nvme_fc_reset_ctrl_work should be canceled before
nvme_ns_remove() is allowed to call blk_cleanup_queue()?

Hmm. I would've thought exactly the other way round, namely
cancelling/flushing the scan_work element when calling reset; after
all, reset definitely takes precedence over scanning the controller
(which won't work anyway as the controller is about to be reset...)


Cheers,

Hannes


Cancelling/flushing scan_work before starting reset won't work. Hannes
is correct: reset must take precedence.


The scenario is a controller that was recently con

Re: [PATCH 1/2] blk-mq: introduce blk_mq_complete_request_sync()

2019-03-18 Thread James Smart




On 3/18/2019 6:31 PM, Ming Lei wrote:

On Mon, Mar 18, 2019 at 10:37:08AM -0700, James Smart wrote:


On 3/17/2019 8:29 PM, Ming Lei wrote:

In NVMe's error handler, the following are the typical steps for tearing
down hardware:

1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
mark the request as abort
blk_mq_complete_request(req);
4) destroy real hw queues

However, there may be a race between #3 and #4, because blk_mq_complete_request()
actually completes the request asynchronously.

This patch introduces blk_mq_complete_request_sync() for fixing the
above race.
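
(A minimal sketch of such a helper - assumed shape, not the posted
patch: run the driver's ->complete() in the caller's context rather
than bouncing through the softirq/IPI path, so that when the busy-tag
iterator returns, all completions have run.

void blk_mq_complete_request_sync(struct request *rq)
{
	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
	rq->q->mq_ops->complete(rq);
}
)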


This won't help FC at all. Inherently, the "completion" has to be
asynchronous as line traffic may be required.

e.g. FC doesn't use nvme_complete_request() in the iterator routine.


Looks like FC has done the sync already, see nvme_fc_delete_association():

...
 /* wait for all io that had to be aborted */
 spin_lock_irq(&ctrl->lock);
 wait_event_lock_irq(ctrl->ioabort_wait, ctrl->iocnt == 0, ctrl->lock);
 ctrl->flags &= ~FCCTRL_TERMIO;
 spin_unlock_irq(&ctrl->lock);


yes - but the iterator starts termination of a lot of the back-end io in
parallel, so waiting on many completing in parallel is better than
waiting one at a time.  Even so, I've always disliked this wait and would
have preferred to exit the thread and have something monitor the
completions, re-queuing a work thread to finish.


-- james



Re: [PATCH 1/2] blk-mq: introduce blk_mq_complete_request_sync()

2019-03-18 Thread James Smart




On 3/18/2019 6:06 PM, Ming Lei wrote:

On Mon, Mar 18, 2019 at 10:37:08AM -0700, James Smart wrote:


On 3/17/2019 8:29 PM, Ming Lei wrote:

In NVMe's error handler, the following are the typical steps for tearing
down hardware:

1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
mark the request as abort
blk_mq_complete_request(req);
4) destroy real hw queues

However, there may be a race between #3 and #4, because blk_mq_complete_request()
actually completes the request asynchronously.

This patch introduces blk_mq_complete_request_sync() for fixing the
above race.


This won't help FC at all. Inherently, the "completion" has to be
asynchronous as line traffic may be required.

e.g. FC doesn't use nvme_complete_request() in the iterator routine.

Yeah, I saw the FC code; it is supposed to address the asynchronous
completion of blk_mq_complete_request() in the error handler.

Also I think it is always the correct thing to abort requests
synchronously in the error handler, isn't it?



not sure I fully follow you, but if you're asking shouldn't it always be
synchronous - why would that be the case?  I really don't want a
blocking thread that could block for several seconds waiting on a single
io to complete.  The controller has changed state and the queues are
frozen, which should have been sufficient - but bottom-end io can still
complete at any time.


-- james



Re: [PATCH 1/2] blk-mq: introduce blk_mq_complete_request_sync()

2019-03-18 Thread James Smart




On 3/17/2019 8:29 PM, Ming Lei wrote:

In NVMe's error handler, the following are the typical steps for tearing
down hardware:

1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
mark the request as abort
blk_mq_complete_request(req);
4) destroy real hw queues

However, there may be a race between #3 and #4, because blk_mq_complete_request()
actually completes the request asynchronously.

This patch introduces blk_mq_complete_request_sync() for fixing the
above race.



This won't help FC at all. Inherently, the "completion" has to be 
asynchronous as line traffic may be required.


e.g. FC doesn't use nvme_complete_request() in the iterator routine.

-- james



Re: [PATCH 1/2] blk-mq: introduce blk_mq_complete_request_sync()

2019-03-18 Thread James Smart



On 3/17/2019 9:09 PM, Bart Van Assche wrote:

On 3/17/19 8:29 PM, Ming Lei wrote:

In NVMe's error handler, the following are the typical steps for tearing
down hardware:

1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
mark the request as abort
blk_mq_complete_request(req);
4) destroy real hw queues

However, there may be a race between #3 and #4, because blk_mq_complete_request()
actually completes the request asynchronously.

This patch introduces blk_mq_complete_request_sync() for fixing the
above race.


Other block drivers wait until outstanding requests have completed by 
calling blk_cleanup_queue() before hardware queues are destroyed. Why 
can't the NVMe driver follow that approach?




speaking for the fabrics, not necessarily pci:

The intent of this looping, which happens immediately following an error
being detected, is to cause the termination of the outstanding requests.
Otherwise, the only recourse is to wait for the ios to finish, which
they may never do, or to have their upper-level timeout expire to cause
their termination - thus a very long delay.  And one of the commands on
the admin queue - a different tag set, but handled the same - the Async
Event Reporting command, doesn't have a timeout, so it wouldn't
necessarily clear without this looping.


-- james



Re: [PATCH 10/17] nvmet-fc: fix issues with targetport assoc_list list walking

2019-03-13 Thread James Smart

On 3/13/2019 10:55 AM, Christoph Hellwig wrote:

From: James Smart 

There are two changes:

1) The logic in the __nvmet_fc_free_assocs() routine is bad. It uses
"safe" list routines, assuming the saved pointers will remain valid.
However, the next structure in the list can be removed while the lock
is dropped, leaving the saved "safe" pointer bad and resulting in NULL
ptrs being hit.

Correct by scheduling a work element to perform the association delete,
which can be done while under lock.

2) Prior patch that added the work element scheduling left a possible
reference on the object if the work element couldn't be scheduled.

Correct by doing the put on a failing schedule_work() call.

Signed-off-by: Nigel Kirkland 
Signed-off-by: James Smart 
Reviewed-by: Ewan D. Milne 
Signed-off-by: Christoph Hellwig 
---
  drivers/nvme/target/fc.c | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 1e9654f04c60..8d34aa573d5b 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1143,10 +1143,8 @@ __nvmet_fc_free_assocs(struct nvmet_fc_tgtport *tgtport)
 			&tgtport->assoc_list, a_list) {
 		if (!nvmet_fc_tgt_a_get(assoc))
 			continue;
-		spin_unlock_irqrestore(&tgtport->lock, flags);
-		nvmet_fc_delete_target_assoc(assoc);
-		nvmet_fc_tgt_a_put(assoc);
-		spin_lock_irqsave(&tgtport->lock, flags);
+		if (!schedule_work(&assoc->del_work))
+			nvmet_fc_tgt_a_put(assoc);
 	}
 	spin_unlock_irqrestore(&tgtport->lock, flags);
 }
@@ -1185,7 +1183,8 @@ nvmet_fc_delete_ctrl(struct nvmet_ctrl *ctrl)
 		nvmet_fc_tgtport_put(tgtport);
 
 		if (found_ctrl) {
-			schedule_work(&assoc->del_work);
+			if (schedule_work(&assoc->del_work))
+				nvmet_fc_tgt_a_put(assoc);
 			return;
 		}
 
V1 was checked in
(http://lists.infradead.org/pipermail/linux-nvme/2019-February/022193.html)
V2 was skipped
(http://lists.infradead.org/pipermail/linux-nvme/2019-February/022540.html)


V2 changes the last
+            if (schedule_work(&assoc->del_work))
to
+            if (!schedule_work(&assoc->del_work))


Jens - can you do this manual fixup?

-- james




Re: [PATCH v2 0/5] implement nvmf read/write queue maps

2018-12-12 Thread James Smart




On 12/12/2018 9:58 AM, Sagi Grimberg wrote:



Sagi,

What other wins are there for this split?

I'm considering whether it's worthwhile for fc as well, but the
head-of-line (hol) issue doesn't exist with fc. What else is being
resolved?


I've pondered it myself, which is why I didn't add it to fc as well
(would have been easy enough, I think). I guess that with this the
user can limit writes compared to reads by assigning fewer queues, as
Jens suggested in his patches...


not enough to interest me.  I also would prefer not to require more
queues - it only increases the oversubscription requirements on the
target, which is already a major issue on real shared subsystems.


Thanks

-- james



Re: [PATCH v2 0/5] implement nvmf read/write queue maps

2018-12-12 Thread James Smart




On 12/11/2018 3:35 PM, Sagi Grimberg wrote:

This set implements read/write queue maps for nvmf (implemented in tcp
and rdma). We allow the users to pass in an nr_write_queues argument
that maps a separate set of queues to host write I/O (or more correctly
non-read I/O) and a set of queues to hold read I/O (which is now
controlled by the known nr_io_queues).

A patchset that restores nvme-rdma polling is in the pipe.
The polling is less trivial because:
1. we can find non-I/O completions in the cq (e.g. memreg)
2. we need to start with non-polling for a sane connect and
then switch to polling which is not trivial behind the
cq API we use.

Note that read/write separation, for rdma but especially for tcp, can be
a very clear win, as we minimize the risk of head-of-queue blocking for
mixed workloads over a single tcp byte stream.
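
For reference, the per-type split being described looks roughly like
this on the tcp side (a sketch, details assumed):

static int nvme_tcp_map_queues(struct blk_mq_tag_set *set)
{
	struct nvme_tcp_ctrl *ctrl = set->driver_data;
	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;

	/* non-read (write) I/O goes to the default map ... */
	set->map[HCTX_TYPE_DEFAULT].nr_queues = opts->nr_write_queues;
	set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
	/* ... and read I/O to a separate set of queues after it */
	set->map[HCTX_TYPE_READ].nr_queues = opts->nr_io_queues;
	set->map[HCTX_TYPE_READ].queue_offset = opts->nr_write_queues;
	blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
	blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
	return 0;
}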


Sagi,

What other wins are there for this split?

I'm considering whether it's worthwhile for fc as well, but the
head-of-line (hol) issue doesn't exist with fc. What else is being
resolved?


-- james



Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-12-04 Thread James Smart




On 12/4/2018 1:21 PM, Keith Busch wrote:

On Tue, Dec 04, 2018 at 11:33:33AM -0800, James Smart wrote:

I disagree.  The cases I've run into are on the admin queue - where we are
sending io to initialize the controller when another error/reset occurs, and
the checks are required to identify/reject the "old" initialization
commands, with another state check allowing them to proceed on the "new"
initialization commands.  And there are also cases for ioctls and other
things that occur during the middle of those initialization steps that need
to be weeded out.   The Admin queue has to be kept live to allow the
initialization commands on the new controller.

state checks are also needed for those namespace validation cases


Once quiesced, the proposed iterator can handle the final termination
of the request, perform failover, or some other lld specific action
depending on your situation.

I don't believe they can remain frozen, definitely not for the admin queue.
-- james

Quiesced and frozen carry different semantics.

My understanding of the nvme-fc implementation is that it returns
BLK_STS_RESOURCE in the scenario you've described where the admin
command can't be executed at the moment. That just has the block layer
requeue it for resubmission 3 milliseconds later, which will
continue to return the same status code until you're really ready for
it.
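
(To make the quiesce/freeze distinction concrete - both are core
blk-mq APIs; illustration only:

	/* quiesce: ->queue_rq() is no longer invoked; requests submitted
	 * while quiesced just sit in the queues until unquiesce */
	blk_mq_quiesce_queue(q);
	blk_mq_unquiesce_queue(q);

	/* freeze: new submitters block in blk_queue_enter(); the freeze
	 * side waits for q->q_usage_counter to drain to zero */
	blk_mq_freeze_queue(q);
	blk_mq_unfreeze_queue(q);
)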


BLK_STS_RESOURCE is correct - but for "normal" io, which comes from the
filesystem, etc., and is mostly on the io queues.


But if the io originated from other sources, like the core layer via
nvme_alloc_request() - used by lots of service routines for the
transport to initialize the controller, core routines to talk to the
controller, and ioctls from user space - then it is failed with a
PATHING ERROR status.  The status doesn't mean much to these other
places, which mainly care whether they succeed or not; if not, they fail
and unwind.  It's pretty critical for these paths to get that error
status as many of those threads do synchronous io.  And this is not just
for nvme-fc: any transport initializing the controller and getting half
way through it when an error occurs that kills the association will
depend on this behavior.  PCI is a large exception, as interaction with a
pci function is very different from sending packets over a network and
detecting network errors.
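
The gating described here is the nvmf_check_ready() /
nvmf_fail_nonready_command() pair from drivers/nvme/host/fabrics.h; a
sketch of how a fabrics queue_rq uses it (transport-specific names
hypothetical):

static blk_status_t xxx_queue_rq(struct blk_mq_hw_ctx *hctx,
				 const struct blk_mq_queue_data *bd)
{
	struct nvme_ns *ns = hctx->queue->queuedata;
	struct xxx_queue *queue = hctx->driver_data;	/* hypothetical */

	if (!nvmf_check_ready(ns->ctrl, bd->rq, queue->live))
		/* requeues normal io (BLK_STS_RESOURCE); fails internal
		 * and multipath io with a path error */
		return nvmf_fail_nonready_command(ns->ctrl, bd->rq);

	/* ... normal submission ... */
	return BLK_STS_OK;
}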


Io requests on the io queues that are flagged as multipath are also
failed this way rather than requeued.  We would need some iterator here
to classify the type of io (one valid to go down another path) and move
it to another path (but the transport doesn't want to know about other
paths).





What I'm proposing is that instead of using that return code, you may
have nvme-fc control when to dispatch those queued requests by utilizing
the blk-mq quiesce on/off states. Is there a reason that wouldn't work?


and quiesce on/off isn't sufficient to do this.

-- james




Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-12-04 Thread James Smart




On 12/4/2018 9:48 AM, Keith Busch wrote:

On Tue, Dec 04, 2018 at 09:38:29AM -0800, Sagi Grimberg wrote:

Yes, I'm very much in favour of this, too.
We always have this IMO slightly weird notion of stopping the queue, set
some error flags in the driver, then _restarting_ the queue, just so
that the driver then sees the error flag and terminates the requests.
Which I always found quite counter-intuitive.

What about requests that come in after the iteration runs? how are those
terminated?

If we've reached a dead state, I think you'd want to start a queue freeze
before running the terminating iterator.

It's not necessarily dead; in fabrics we need to handle disconnections
that last for a while before we are able to reconnect (for a variety of
reasons), and we need a way to fail I/O for failover (or requeue, or
block - it's up to the upper layer). It's less of a "last resort" action
like in the pci case.

Does this guarantee that after freeze+iter we won't get queued with any
other request? If not then we still need to unfreeze and fail at
queue_rq.

It sounds like there are different scenarios to consider.

For the dead controller, we call blk_cleanup_queue() at the end which
ends callers who blocked on entering.

If you're doing a failover, you'd replace the freeze with a current path
update in order to prevent new requests from entering.

And if you're not multipath?  I assume you want the io queues to be
frozen so they queue there - which can block threads such as ns
verification. It's good to have them live, as today's checks bounce the
io, letting the thread terminate since it's in a reset/reconnect state,
which allows those threads to exit out or finish before a new reconnect
kicks them off again. We've already been fighting deadlocks with the
reset/delete/rescan paths and these io paths; suspending the queues
completely over the reconnect will likely create more issues in this
area.




In either case, you don't need checks in queue_rq. The queue_rq check
is redundant with the quiesce state that blk-mq already provides.


I disagree.  The cases I've run into are on the admin queue - where we are
sending io to initialize the controller when another error/reset occurs, and
the checks are required to identify/reject the "old" initialization
commands while another state check allows the "new" initialization commands
to proceed. And there are also cases for ioctls and other things that occur
in the middle of those initialization steps that need to be weeded out.
The admin queue has to be kept live to allow the initialization commands
on the new controller.


state checks are also needed for those namespace validation cases



Once quiesced, the proposed iterator can handle the final termination
of the request, perform failover, or some other lld specific action
depending on your situation.


I don't believe they can remain frozen, definitely not for the admin queue.
-- james



Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-12-04 Thread James Smart




On 12/4/2018 9:23 AM, Sagi Grimberg wrote:



Yes, I'm very much in favour of this, too.
We always have this IMO slightly weird notion of stopping the queue, set
some error flags in the driver, then _restarting_ the queue, just so
that the driver then sees the error flag and terminates the requests.
Which I always found quite counter-intuitive.

What about requests that come in after the iteration runs? how are those
terminated?

If we've reached a dead state, I think you'd want to start a queue freeze
before running the terminating iterator.


For the requests that come in after the iterator, the 
nvmf_check_ready() routine, which validates controller state, will 
catch and bounce them.


The point of this patch was to omit the check in queue_rq like Keith
did in patch #2.


well - I'm not sure that's possible. The fabrics will have different 
time constraints vs pci.


-- james



Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-12-04 Thread James Smart

On 12/4/2018 7:46 AM, Keith Busch wrote:

On Mon, Dec 03, 2018 at 05:33:06PM -0800, Sagi Grimberg wrote:

Yes, I'm very much in favour of this, too.
We always have this IMO slightly weird notion of stopping the queue, set
some error flags in the driver, then _restarting_ the queue, just so
that the driver then sees the error flag and terminates the requests.
Which I always found quite counter-intuitive.

What about requests that come in after the iteration runs? how are those
terminated?

If we've reached a dead state, I think you'd want to start a queue freeze
before running the terminating iterator.


For the requests that come in after the iterator, the nvmf_check_ready() 
routine, which validates controller state, will catch and bounce them.


Keith states why we froze it in the past.  Whatever the iterator is,
I'd prefer we not get abort calls on io that has yet to be successfully
started.


-- james



Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-12-03 Thread James Smart




On 12/1/2018 10:32 AM, Bart Van Assche wrote:

On 12/1/18 9:11 AM, Hannes Reinecke wrote:


Yes, I'm very much in favour of this, too.
We always have this IMO slightly weird notion of stopping the queue,
set some error flags in the driver, then _restarting_ the queue, just
so that the driver then sees the error flag and terminates the requests.
Which I always found quite counter-intuitive.
So having a common helper for terminating requests for queue errors
would be very welcomed here.


But when we have that we really should audit all drivers to ensure
they do the right thing (tm).


Would calling blk_abort_request() for all outstanding requests be 
sufficient to avoid that the queue has to be stopped and restarted in 
the nvme-fc driver?


what nvme-fc does is the same as what is done in all the other
transports - for the same reasons.  If we're eliminating those
synchronization reasons, and now that we've plugged the request_queue
path into the transports to check state appropriately, I don't think
there are reasons to block the queue.  In some respects, it is nice to
stop new io while the work to terminate everything else happens, but I
don't know that it's required.  I would hope that the bounced work due
to the controller state (returned BLK_STS_RESOURCE) is actually pausing
for a short while. I've seen some circumstances where it didn't and was
infinitely polling, which would be a change in behavior vs the queue stops.


-- james



Re: [PATCH 3/7] nvme-fc: remove unused poll implementation

2018-11-19 Thread James Smart




On 11/19/2018 7:19 AM, Jens Axboe wrote:

On 11/19/18 12:59 AM, Christoph Hellwig wrote:

On Sat, Nov 17, 2018 at 02:43:50PM -0700, Jens Axboe wrote:

This relies on the fc target ops setting ->poll_queue, which
nobody does. Otherwise it just checks if something has
completed, which isn't very useful.

Please also remove the unused ->poll_queue method.

Otherwise looks fine:

Reviewed-by: Christoph Hellwig 

Sure, will do.



Looks fine on my end

Reviewed-by:  James Smart 



Re: [PATCH 0/2] blktests: test ANA base support

2018-07-26 Thread James Smart

Make sure this fix has been picked up in your kernel:
http://git.infradead.org/nvme.git/commit/6cdefc6e2ad52170f89a8d0e8b1a1339f91834dc

Deletes were broken otherwise and produce the exact trace below.

-- james


On 7/25/2018 7:01 PM, Chaitanya Kulkarni wrote:

Thanks for the report and correction. I was able to run the test and
reproduce the problem; I also confirm that it is not that consistent.

As I suspected, the problem is in the nvme disconnect, in the following
call chain:


nvme_sysfs_delete
  nvme_delete_ctrl_sync
    nvme_delete_ctrl_work
      nvme_remove_namespaces
        nvme_ns_remove
          blk_cleanup_queue
            blk_freeze_queue
              blk_freeze_queue_start
                wait_event(q->mq_freeze_wq,
                           percpu_ref_is_zero(&q->q_usage_counter)); <- Stuck here.

blk_cleanup_queue() is where it is blocking, and then the target ctrl
times out on keep-alive.


It will take some time for me to debug and find a fix; let's merge this
test once we fix this issue.

How should we go about the other tests? Should we merge them and keep
this one out in the nvmeof branch?
Regards,
Chaitanya



From: Omar Sandoval 
Sent: Wednesday, July 25, 2018 2:00 PM
To: Chaitanya Kulkarni
Cc: Hannes Reinecke; Omar Sandoval; Christoph Hellwig; Sagi Grimberg; Keith 
Busch; linux-n...@lists.infradead.org; linux-block@vger.kernel.org
Subject: Re: [PATCH 0/2] blktests: test ANA base support
   
  
On Wed, Jul 25, 2018 at 07:27:35PM +, Chaitanya Kulkarni wrote:

Thanks, Omar.

Tests nvme/014 and nvme/015 had a pretty bad typo that I didn't notice
last time:

dd=/dev/urandom of="/dev/${nvmedev}n1" count=128000 bs=4k

That should be

dd if=/dev/urandom of="/dev/${nvmedev}n1" count=128000 bs=4k status=none

When I fix that (and change the nvme flush call as mentioned before), I
sometimes get a hung task:

[  273.80] run blktests nvme/015 at 2018-07-25 13:44:11
[  273.861950] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  273.875014] nvmet: creating controller 1 for subsystem blktests-subsystem-1 
for NQN nqn.2014-08.org.nvmexpress:uuid:c5e36fdf-8e4d-4c27-be56-da373db583b2.
[  273.877457] nvme nvme1: creating 4 I/O queues.
[  273.879141] nvme nvme1: new ctrl: "blktests-subsystem-1"
[  276.247708] nvme nvme1: using deprecated NVME_IOCTL_IO_CMD ioctl on the char 
device!
[  276.262835] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[  289.755361] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[  289.760579] nvmet: ctrl 1 fatal error occurred!
[  491.095890] INFO: task kworker/u8:0:7 blocked for more than 120 seconds.
[  491.104407]   Not tainted 4.18.0-rc6 #18
[  491.108330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  491.116521] kworker/u8:0    D    0 7  2 0x8000
[  491.121754] Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
[  491.129604] Call Trace:
[  491.131611]  ? __schedule+0x2a1/0x890
[  491.135112]  ? _raw_spin_unlock_irqrestore+0x20/0x40
[  491.140542]  schedule+0x32/0x90
[  491.142030]  blk_mq_freeze_queue_wait+0x41/0xa0
[  491.144186]  ? wait_woken+0x80/0x80
[  491.145726]  blk_cleanup_queue+0x75/0x160
[  491.150235]  nvme_ns_remove+0xf9/0x130 [nvme_core]
[  491.151910]  nvme_remove_namespaces+0x86/0xc0 [nvme_core]
[  491.153127]  nvme_delete_ctrl_work+0x4b/0x80 [nvme_core]
[  491.154727]  process_one_work+0x18c/0x360
[  491.155428]  worker_thread+0x1c6/0x380
[  491.156160]  ? process_one_work+0x360/0x360
[  491.157493]  kthread+0x112/0x130
[  491.159119]  ? kthread_flush_work_fn+0x10/0x10
[  491.160008]  ret_from_fork+0x35/0x40
[  491.160729] INFO: task nvme:1139 blocked for more than 120 seconds.
[  491.162416]   Not tainted 4.18.0-rc6 #18
[  491.164348] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  491.166012] nvme    D    0  1139   1072 0x
[  491.167946] Call Trace:
[  491.168459]  ? __schedule+0x2a1/0x890
[  491.169312]  schedule+0x32/0x90
[  491.170180]  schedule_timeout+0x311/0x4a0
[  491.171921]  ? kernfs_fop_release+0xa0/0xa0
[  491.172884]  wait_for_common+0x1a0/0x1d0
[  491.173813]  ? wake_up_q+0x70/0x70
[  491.174410]  flush_work+0x10e/0x1c0
[  491.174991]  ? flush_workqueue_prep_pwqs+0x130/0x130
[  491.176113]  nvme_delete_ctrl_sync+0x41/0x50 [nvme_core]
[  491.177969]  nvme_sysfs_delete+0x28/0x30 [nvme_core]
[  491.178632]  kernfs_fop_write+0x116/0x190
[  491.179254]  __vfs_write+0x36/0x190
[  491.179812]  vfs_write+0xa9/0x190
[  491.180344]  ksys_write+0x4f/0xb0
[  491.181056]  do_syscall_64+0x5b/0x170
[  491.181583]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  491.182311] RIP: 0033:0x7fc04176b9d4
[  491.182863] Code: Bad RIP value.
[  491.183650] RSP: 002b:7ffc33bd15a8 EFLAGS: 0246 ORIG_RAX: 
0001
[  491.184622] RAX: ffda RBX: 0004 RCX: 7fc04176b9d4
[  491.185606] RDX: 0001 RSI: 55884bd0810a RDI: 0004
[  491.186719] RBP: 00

Re: [PATCH V4 6/7] nvme: pci: prepare for supporting error recovery from resetting context

2018-05-07 Thread James Smart



On 5/5/2018 6:59 AM, Ming Lei wrote:

--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2365,14 +2365,14 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
 	nvme_put_ctrl(&dev->ctrl);
 }
 
-static void nvme_reset_work(struct work_struct *work)
+static void nvme_reset_dev(struct nvme_dev *dev)
 {
-	struct nvme_dev *dev =
-		container_of(work, struct nvme_dev, ctrl.reset_work);
 	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
 	int result = -ENODEV;
 	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
 
+	mutex_lock(&dev->ctrl.reset_lock);
+
 	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
 		goto out;
  


I believe the reset_lock is unnecessary (patch 5) as it should be 
covered by the transition of the state to RESETTING which is done under 
lock.


Thus the fix is: instead of:
 if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
     goto out;

it should be:
 if (dev->ctrl.state != NVME_CTRL_RESETTING)
     return;
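
For reference, the state test is sufficient because the entry point
takes the RESETTING transition under the ctrl lock - roughly (a sketch
of nvme_reset_ctrl() in core.c of that era):

int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
{
	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
		return -EBUSY;
	if (!queue_work(nvme_wq, &ctrl->reset_work))
		return -EBUSY;
	return 0;
}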


-- james



Re: [PATCH v4 3/5] nvmet/fc: Use sgl_alloc() and sgl_free()

2018-01-05 Thread James Smart

On 1/5/2018 8:26 AM, Bart Van Assche wrote:

Use the sgl_alloc() and sgl_free() functions instead of open coding
these functions.

Signed-off-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Cc: Keith Busch 
Cc: Christoph Hellwig 
Cc: James Smart 
Cc: Sagi Grimberg 
---
  drivers/nvme/target/Kconfig |  1 +
  drivers/nvme/target/fc.c| 36 ++--
  2 files changed, 3 insertions(+), 34 deletions(-)


Reviewed-by:  James Smart 

-- james




Re: [PATCH 10/10] Reserved tag for active ns command

2017-09-12 Thread James Smart
Please commonize the # of reserved tags.  There's more than 1 transport 
and they all do the same things.
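
e.g. something like this (hypothetical name and location, just to
illustrate - say in drivers/nvme/host/nvme.h):

#define NVME_ADMIN_RESERVED_TAGS	3  /* connect + keep-alive + ns-active */

	/* each transport then does: */
	ctrl->admin_tag_set.reserved_tags = NVME_ADMIN_RESERVED_TAGS;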


-- james


On 9/11/2017 9:33 PM, Anish M Jhaveri wrote:

Using a reserved tag for setting a standby namespace to active using an
nvme command.

Signed-off-by: Anish M Jhaveri 
---
  drivers/nvme/host/rdma.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index cb6a5f8..d094793 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1550,7 +1550,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
ctrl->admin_tag_set.ops = &nvme_rdma_admin_mq_ops;
ctrl->admin_tag_set.queue_depth = NVME_RDMA_AQ_BLKMQ_DEPTH;
-   ctrl->admin_tag_set.reserved_tags = 2; /* connect + keep-alive */
+   ctrl->admin_tag_set.reserved_tags = 3; /* connect + keep-alive */
ctrl->admin_tag_set.numa_node = NUMA_NO_NODE;
ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_rdma_request) +
SG_CHUNK_SIZE * sizeof(struct scatterlist);




Re: [PATCH 02/10] RDMA timeout triggers failover.

2017-09-12 Thread James Smart
I don't know that this is a good idea - just because there's a controller
reset, do we need to fail over a path? Also, putting failover smarts in the
transport doesn't seem like a great idea. And there's more than just
the rdma transport.


-- james



On 9/11/2017 9:22 PM, Anish M Jhaveri wrote:

The trigger-failover functionality will be called on any RDMA timeout. This
timeout can occur due to a failure of an IO to be returned from the target.
This could be caused by an interface going down, which leads to the failover
functionality being triggered.

Signed-off-by: Anish M Jhaveri 
---
  drivers/nvme/host/rdma.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index a03299d..cb6a5f8 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -174,6 +174,7 @@ static void nvme_rdma_free_qe(struct ib_device *ibdev, struct nvme_rdma_qe *qe,
  {
ib_dma_unmap_single(ibdev, qe->dma, capsule_size, dir);
kfree(qe->data);
+   qe->data = NULL;
  }
  
  static int nvme_rdma_alloc_qe(struct ib_device *ibdev, struct nvme_rdma_qe *qe,

@@ -766,6 +767,8 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
  
  	nvme_stop_ctrl(&ctrl->ctrl);
  
+	nvme_trigger_failover(&ctrl->ctrl);

+
for (i = 0; i < ctrl->ctrl.queue_count; i++)
clear_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[i].flags);
  




Re: [PATCH rfc 08/30] nvme-rdma: cleanup error path in controller reset

2017-07-10 Thread James Smart

On 6/18/2017 8:21 AM, Sagi Grimberg wrote:

No need to queue an extra work to indirect controller
uninit and put the final reference.


  static void nvme_rdma_del_ctrl_work(struct work_struct *work)
  {
struct nvme_rdma_ctrl *ctrl = container_of(work,
struct nvme_rdma_ctrl, delete_work);
  
-	__nvme_rdma_remove_ctrl(ctrl, true);

+   nvme_uninit_ctrl(&ctrl->ctrl);
+   nvme_rdma_shutdown_ctrl(ctrl, true);
+   nvme_put_ctrl(&ctrl->ctrl);
  }
  
  static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)

@@ -1791,14 +1784,6 @@ static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
return ret;
  }
  

...


@@ -1832,10 +1814,13 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
  
  	return;
  
-del_dead_ctrl:

-   /* Deleting this dead controller... */
+out_destroy_io:
+   nvme_rdma_destroy_io_queues(ctrl, true);
+out_destroy_admin:
+   nvme_rdma_destroy_admin_queue(ctrl, true);
dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
-   WARN_ON(!queue_work(nvme_wq, &ctrl->delete_work));
+   nvme_uninit_ctrl(&ctrl->ctrl);
+   nvme_put_ctrl(&ctrl->ctrl);
  }
  
  static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {


Recommend calls to nvme_stop_keep_alive() prior to nvme_uninit_ctrl().
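
i.e. (a sketch of the suggested ordering):

	nvme_stop_keep_alive(&ctrl->ctrl);
	nvme_uninit_ctrl(&ctrl->ctrl);
	nvme_rdma_shutdown_ctrl(ctrl, true);
	nvme_put_ctrl(&ctrl->ctrl);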

-- james



Re: [PATCH rfc 02/30] nvme-rdma: Don't alloc/free the tagset on reset

2017-07-10 Thread James Smart

On 6/19/2017 12:18 AM, Christoph Hellwig wrote:

+static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remove)
 {
+   nvme_rdma_stop_queue(&ctrl->queues[0]);
+   if (remove) {
+   blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
+   blk_cleanup_queue(ctrl->ctrl.admin_q);
+   blk_mq_free_tag_set(&ctrl->admin_tag_set);
+   nvme_rdma_dev_put(ctrl->device);
+   }
+
nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
sizeof(struct nvme_command), DMA_TO_DEVICE);
+   nvme_rdma_free_queue(&ctrl->queues[0]);

I don't like the calling convention.  We only have two callers
anyway.  So I'd much rather only keep the code inside the if above
in the new nvme_rdma_destroy_admin_queue that is only called at shutdown
time, and opencode the calls to nvme_rdma_stop_queue, nvme_rdma_free_qe
and nvme_rdma_free_queue in the callers.


Any chance you can make the organization like what I did with FC and
avoid all the "new" and "remove" flags?


e.g. code blocks for (see the sketch after the list):
- allocation/initialization for the controller and the tag sets. 
Basically initial allocation/creation of everything that would be the 
os-facing side of the controller.
- an association (or call it a session) create. Basically everything 
that makes the link-side ties to the subsystem and creates the 
controller and its connections. Does admin queue creation, controller 
init, and io queue creation, and enablement of the blk-mq queues as it 
does so.
- an association teardown. Basically everything that stops the blk-mq 
queues and tears down the link-side ties to the controller.
- a final controller teardown, which removes it from the system. 
Everything that terminates the os-facing side of the controller.
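
Roughly (skeleton only, names invented for illustration):

	ctrl = xxx_create_ctrl(dev, opts);  /* os-facing alloc: ctrl, tag sets */
	xxx_create_association(ctrl);       /* admin q, ctrl init, io queues,
					     * blk-mq queues enabled */
	...
	xxx_delete_association(ctrl);       /* blk-mq queues stopped; link-side
					     * queues/connections torn down */
	xxx_delete_ctrl(ctrl);              /* final os-facing teardown */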


-- james






Re: [PATCH] lpfc: Fix panic on BFS configuration.

2017-04-24 Thread James Smart

Please disregard this patch; there is an issue in the fix.

-- james


On 4/21/2017 5:24 PM, jsmart2...@gmail.com wrote:

From: James Smart 

To select the appropriate shost template, the driver is issuing
a mailbox command to retrieve the wwn. Turns out the sending of
the command precedes the reset of the function.  On SLI-4 adapters,
this is inconsequential as the mailbox command location is specified
by dma via the BMBX register. However, on SLI-3 adapters, the
location of the mailbox command submission area changes. When the
function is first powered on or reset, the cmd is submitted via PCI
bar memory. Later the driver changes the function config to use
host memory and DMA. The request to start a mailbox command is the
same, a simple doorbell write, regardless of submission area.
So.. if there has not been a boot driver run against the adapter,
the mailbox command works as defaults are ok. But, if the boot
driver has configured the card, and if no platform pci
function/slot reset occurs as the os starts, the mailbox command
will fail. The SLI-3 device will use the stale boot driver dma
location. This can cause PCI eeh errors.

Fix is to reset the sli-3 function before sending the
mailbox command, thus synchronizing the function/driver on mailbox
location.

This issue was introduced by this patch:
http://www.spinics.net/lists/linux-scsi/msg105908.html
which is in the stable pools with commit id:
96418b5e2c8867da3279d877f5d1ffabfe460c3d

This patch needs to be applied to the stable trees where ever the
introducing patch exists.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Cc: sta...@vger.kernel.org




Re: [PATCH] Fix panic on BFS configuration.

2017-04-21 Thread James Smart

Disregard - fixing the title.

-- james



Re: [PATCH-v0 2/2] Fix memory corruption of the lpfc_ncmd->list pointers

2017-04-21 Thread James Smart
Please disregard - I am recutting, breaking it from the series, 
reposting as an individual patch.


-- james


On 4/21/2017 11:42 AM, Dick Kennedy wrote:

When starting the nvme Initiator and nvmet Target cli apps,
the nvmeI would cause an Oops in the nvme buffer list_head.

The nvmeI was using the private data area of the template and
setting it to 0 size.  This caused a use conflict with the
nvme transport also using the memory area and causing the
memory corruption.

The NVMEI driver now claims a size for fcpreq that is provided in
the template and zeros out the area when done.  This
change has fixed the memory corruption.  I also altered some
list_del's to list_del_init because the lpfc_ncmd's are
migrating between the abts and put lists.
There are some new debug log statements added to help with
debugging.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
  drivers/scsi/lpfc/lpfc_nvme.c | 17 +++--
  drivers/scsi/lpfc/lpfc_nvme.h |  4 
  2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index f98cbc2..8008c82 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -761,6 +761,7 @@ lpfc_nvme_io_cmd_wqe_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn,
struct nvme_fc_cmd_iu *cp;
struct lpfc_nvme_rport *rport;
struct lpfc_nodelist *ndlp;
+   struct lpfc_nvme_fcpreq_priv *freqpriv;
unsigned long flags;
uint32_t code;
uint16_t cid, sqhd, data;
@@ -918,6 +919,8 @@ out_err:
phba->cpucheck_cmpl_io[lpfc_ncmd->cpu]++;
}
  #endif
+   freqpriv = nCmd->private;
+   freqpriv->nvme_buf = NULL;
nCmd->done(nCmd);
  
  	spin_lock_irqsave(&phba->hbalock, flags);

@@ -1214,6 +1217,7 @@ lpfc_nvme_fcp_io_submit(struct nvme_fc_local_port *pnvme_lport,
struct lpfc_nvme_buf *lpfc_ncmd;
struct lpfc_nvme_rport *rport;
struct lpfc_nvme_qhandle *lpfc_queue_info;
+   struct lpfc_nvme_fcpreq_priv *freqpriv = pnvme_fcreq->private;
  #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
uint64_t start = 0;
  #endif
@@ -1292,7 +1296,7 @@ lpfc_nvme_fcp_io_submit(struct nvme_fc_local_port *pnvme_lport,
 * Do not let the IO hang out forever.  There is no midlayer issuing
 * an abort so inform the FW of the maximum IO pending time.
 */
-   pnvme_fcreq->private = (void *)lpfc_ncmd;
+   freqpriv->nvme_buf = lpfc_ncmd;
lpfc_ncmd->nvmeCmd = pnvme_fcreq;
lpfc_ncmd->nrport = rport;
lpfc_ncmd->ndlp = ndlp;
@@ -1422,6 +1426,7 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port *pnvme_lport,
struct lpfc_nvme_buf *lpfc_nbuf;
struct lpfc_iocbq *abts_buf;
struct lpfc_iocbq *nvmereq_wqe;
+   struct lpfc_nvme_fcpreq_priv *freqpriv = pnvme_fcreq->private;
union lpfc_wqe *abts_wqe;
unsigned long flags;
int ret_val;
@@ -1484,7 +1489,7 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port *pnvme_lport,
return;
}
  
-	lpfc_nbuf = (struct lpfc_nvme_buf *)pnvme_fcreq->private;

+   lpfc_nbuf = freqpriv->nvme_buf;
if (!lpfc_nbuf) {
spin_unlock_irqrestore(&phba->hbalock, flags);
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_ABTS,
@@ -1637,7 +1642,7 @@ static struct nvme_fc_port_template lpfc_nvme_template = {
.local_priv_sz = sizeof(struct lpfc_nvme_lport),
.remote_priv_sz = sizeof(struct lpfc_nvme_rport),
.lsrqst_priv_sz = 0,
-   .fcprqst_priv_sz = 0,
+   .fcprqst_priv_sz = sizeof(struct lpfc_nvme_fcpreq_priv),
  };
  
  /**

@@ -2068,7 +2073,7 @@ lpfc_get_nvme_buf(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp)
if (lpfc_test_rrq_active(phba, ndlp,
 lpfc_ncmd->cur_iocbq.sli4_lxritag))
continue;
-   list_del(&lpfc_ncmd->list);
+   list_del_init(&lpfc_ncmd->list);
found = 1;
break;
}
@@ -2083,7 +2088,7 @@ lpfc_get_nvme_buf(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp)
if (lpfc_test_rrq_active(
phba, ndlp, lpfc_ncmd->cur_iocbq.sli4_lxritag))
continue;
-   list_del(&lpfc_ncmd->list);
+   list_del_init(&lpfc_ncmd->list);
found = 1;
break;
}
@@ -2542,7 +2547,7 @@ lpfc_sli4_nvme_xri_aborted(struct lpfc_hba *phba,
 &phba->sli4_hba.lpfc_abts_nvme_buf_list,
 list) {
if (lpfc_ncmd->cur_iocbq.sli4_xritag == xri) {
-   list_del(&lpfc_ncmd->list);
+   list_d

Re: [PATCH-v0 1/2] Fix panic on BFS configuration.

2017-04-21 Thread James Smart
Please disregard - I am recutting, breaking it from the series, 
reposting as an individual patch, and will be copying the stable tree 
that needs it too.


-- james

On 4/21/2017 11:42 AM, Dick Kennedy wrote:

The previous revision of FW had already started when the driver
started to init. The driver is trying to get the WWPN through the
mailbox slim register, which is only for bootstrapping the fw.

The fix is to reset the fw, get the wwpn, and then start the fw.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
  drivers/scsi/lpfc/lpfc_crtn.h | 1 +
  drivers/scsi/lpfc/lpfc_init.c | 7 +++
  drivers/scsi/lpfc/lpfc_sli.c  | 2 +-
  3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index 944b32c..62016d6 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -294,6 +294,7 @@ int lpfc_selective_reset(struct lpfc_hba *);
  void lpfc_reset_barrier(struct lpfc_hba *);
  int lpfc_sli_brdready(struct lpfc_hba *, uint32_t);
  int lpfc_sli_brdkill(struct lpfc_hba *);
+int lpfc_sli_chipset_init(struct lpfc_hba *);
  int lpfc_sli_brdreset(struct lpfc_hba *);
  int lpfc_sli_brdrestart(struct lpfc_hba *);
  int lpfc_sli_hba_setup(struct lpfc_hba *);
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 90ae354..e85f273 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3602,6 +3602,13 @@ lpfc_get_wwpn(struct lpfc_hba *phba)
LPFC_MBOXQ_t *mboxq;
MAILBOX_t *mb;
  
+	if (phba->sli_rev < LPFC_SLI_REV4) {
+		/* Reset the port first */
+		lpfc_sli_brdrestart(phba);
+		rc = lpfc_sli_chipset_init(phba);
+		if (rc)
+			return (uint64_t)-1;
+	}
 
 	mboxq = (LPFC_MBOXQ_t *) mempool_alloc(phba->mbox_mem_pool,
 					       GFP_KERNEL);
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index cf19f49..22862b7 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -4446,7 +4446,7 @@ lpfc_sli_brdrestart(struct lpfc_hba *phba)
   * iteration, the function will restart the HBA again. The function returns
   * zero if HBA successfully restarted else returns negative error code.
   **/
-static int
+int
  lpfc_sli_chipset_init(struct lpfc_hba *phba)
  {
uint32_t status, i = 0;




Re: nvme-fc: fix status code handling in nvme_fc_fcpio_done

2017-04-18 Thread James Smart


On 4/18/2017 8:52 AM, Christoph Hellwig wrote:

From: Christoph Hellwig 

nvme_complete_async_event expects the little endian status code
including the phase bit, and a new completion handler I plan to
introduce will do so as well.

Change the status variable into the little endian format with the
phase bit used in the NVMe CQE to fix / enable this.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---



looks good

-- james

Signed-off-by: James Smart