Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
Hi James

Thanks for you detailed, kindly response and directive.
That's really appreciated.

On 01/18/2018 02:24 PM, James Smart wrote:
>> So in the patch, RESETTING in nvme-fc/rdma is changed to RESET_PREPARE. Then 
>> we get:
>> nvme-fc/rdma RESET_PREPARE -> RECONNECTING -> LIVE
>> nvme-pci RESET_PREPARE -> RESETTING    -> LIVE
> 
> Right - so another way to look at this is you renamed RESETTING to 
> RESET_PREPARE and added a new RESETTING state in the nvme-pci when in 
> reinits.  Why not simply have the nvme-pci transport transition to 
> RECONNECTING like the other transports. No new states, semantics are all the 
> same.
> 
> Here's what the responsibility of the states are:
> RESETTING:
> -quiescing the blk-mq queues os errors don't bubble back to callees and new 
> io is suspended
> -terminating the link-side association: resets the controller via register 
> access/SET_PROPERTY, deletes connections/queues, cleans out active io and 
> lets them queue for resending, etc.
> -end result is nvme block device is present, but idle, and no active 
> controller on the link side  (link being a transport specific link such as 
> pci, or ethernet/ib for rdma or fc link).
> 
> RECONNECTING:
> - "establish an association at the transport" on PCI this is immediate - you 
> can either talk to the pci function or you can't. With rdma/fc we send 
> transport traffic to create an admin q connection.
> - if association succeeded: the controller is init'd via register 
> accesses/fabric GET/SET_PROPERTIES and admin-queue command, io 
> connections/queues created, blk-mq queues unquiesced, and finally transition 
> to NVME_CTRL_LIVE.
> - if association failed: check controller timeout., if not exceeded, schedule 
> timer and retry later.  With pci, this could cover the temporary loss of the 
> pci function access (a hot plug event) while keeping the nvme blk device live 
> in the system.
> 
> It matches what you are describing but using what we already have in place - 
> with the only difference being the addition of your "boundary" by adding the 
> RECONNECTING state to nvme-pci.

Yes, absolutely. Thanks for your detailed summary here. 
Introducing RECONNECTING into nvme-pci is indeed a clearer way to fix the issue.

> 
> 
>>
 I don't believe RESETTING and RECONNECTING are that similar - unless - you 
 are looking at that "reinit" part that we left grandfathered into PCI.
>> Yes. I have got the point. Thanks for your directive.
>>
>> Both nvme-pc and nvme-fc/rdma have "shutdown" part to tear down 
>> queues/connects, quiesce queues, cancel outstanding requests...
>> We could call this part as "shutdowning" as well as the scheduling gap.
>> Because the difference of initializing between the nvme-pci and 
>> nvme-fc/rdma,  we could call "reiniting" for nvme-pci and
>> call "reconnecting" for nvme-fc/rdma
> 
> I don't think we need different states for the two. The actions taken are 
> really very similar. How they do the actions varies slightly, but not what 
> they are trying to do.   Thus, I'd prefer we keep the existing RECONNECTING 
> state and use it in nvme-pci as well. I'm open to renaming it if there's 
> something about the name that is not agreeable.

Yes. And I will use RECONNECTING in next version to fix this issue. If better 
name, could introduce in other patchset. :)


Many thanks again.
Jianchao


Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
Hi James

Thanks for you detailed, kindly response and directive.
That's really appreciated.

On 01/18/2018 02:24 PM, James Smart wrote:
>> So in the patch, RESETTING in nvme-fc/rdma is changed to RESET_PREPARE. Then 
>> we get:
>> nvme-fc/rdma RESET_PREPARE -> RECONNECTING -> LIVE
>> nvme-pci RESET_PREPARE -> RESETTING    -> LIVE
> 
> Right - so another way to look at this is you renamed RESETTING to 
> RESET_PREPARE and added a new RESETTING state in the nvme-pci when in 
> reinits.  Why not simply have the nvme-pci transport transition to 
> RECONNECTING like the other transports. No new states, semantics are all the 
> same.
> 
> Here's what the responsibility of the states are:
> RESETTING:
> -quiescing the blk-mq queues os errors don't bubble back to callees and new 
> io is suspended
> -terminating the link-side association: resets the controller via register 
> access/SET_PROPERTY, deletes connections/queues, cleans out active io and 
> lets them queue for resending, etc.
> -end result is nvme block device is present, but idle, and no active 
> controller on the link side  (link being a transport specific link such as 
> pci, or ethernet/ib for rdma or fc link).
> 
> RECONNECTING:
> - "establish an association at the transport" on PCI this is immediate - you 
> can either talk to the pci function or you can't. With rdma/fc we send 
> transport traffic to create an admin q connection.
> - if association succeeded: the controller is init'd via register 
> accesses/fabric GET/SET_PROPERTIES and admin-queue command, io 
> connections/queues created, blk-mq queues unquiesced, and finally transition 
> to NVME_CTRL_LIVE.
> - if association failed: check controller timeout., if not exceeded, schedule 
> timer and retry later.  With pci, this could cover the temporary loss of the 
> pci function access (a hot plug event) while keeping the nvme blk device live 
> in the system.
> 
> It matches what you are describing but using what we already have in place - 
> with the only difference being the addition of your "boundary" by adding the 
> RECONNECTING state to nvme-pci.

Yes, absolutely. Thanks for your detailed summary here. 
Introducing RECONNECTING into nvme-pci is indeed a clearer way to fix the issue.

> 
> 
>>
 I don't believe RESETTING and RECONNECTING are that similar - unless - you 
 are looking at that "reinit" part that we left grandfathered into PCI.
>> Yes. I have got the point. Thanks for your directive.
>>
>> Both nvme-pc and nvme-fc/rdma have "shutdown" part to tear down 
>> queues/connects, quiesce queues, cancel outstanding requests...
>> We could call this part as "shutdowning" as well as the scheduling gap.
>> Because the difference of initializing between the nvme-pci and 
>> nvme-fc/rdma,  we could call "reiniting" for nvme-pci and
>> call "reconnecting" for nvme-fc/rdma
> 
> I don't think we need different states for the two. The actions taken are 
> really very similar. How they do the actions varies slightly, but not what 
> they are trying to do.   Thus, I'd prefer we keep the existing RECONNECTING 
> state and use it in nvme-pci as well. I'm open to renaming it if there's 
> something about the name that is not agreeable.

Yes. And I will use RECONNECTING in next version to fix this issue. If better 
name, could introduce in other patchset. :)


Many thanks again.
Jianchao


Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread James Smart

Thanks jianchoa. This helped.

On 1/17/2018 7:13 PM, jianchao.wang wrote:

Actually, this patchset is to fix a issue in nvme_timeout.
Please consider the following scenario.

nvme_reset_ctrl
 -> set state to RESETTING
 -> queue reset_work
 (scheduling)
  nvme_reset_work  -> nvme_dev_disable
 -> quiesce queues
   -> nvme_cancel_request
  on outstanding requests
  _boundary_
   -> nvme initializing (may issue request on adminq)

Before the boundary, not only quiesce the queues, but only cancel all the 
outstanding requests.

A request could expire when the ctrl state is RESETTING.
  - If the timeout occur before the _boundary_, the expired requests are from 
the previous work.
  - Otherwise, the expired requests are from the controller initializing 
procedure, such as sending cq/sq
create commands to adminq to setup io queues.
In current implementation, nvme_timeout cannot identify the _boundary_ so only 
handles second case above.


So what you've described very well is the pci adapter and the fact that 
it doesn't use a RECONNECTING state when it starts to reinit the 
controller like rdma/fc does.  Note: we had left it that way as a 
"grandfathering" as the code already existed and we didn't see an issue 
just leaving the reinit in the resetting handler.






So in the patch, RESETTING in nvme-fc/rdma is changed to RESET_PREPARE. Then we 
get:
nvme-fc/rdma RESET_PREPARE -> RECONNECTING -> LIVE
nvme-pci RESET_PREPARE -> RESETTING-> LIVE


Right - so another way to look at this is you renamed RESETTING to 
RESET_PREPARE and added a new RESETTING state in the nvme-pci when in 
reinits.  Why not simply have the nvme-pci transport transition to 
RECONNECTING like the other transports. No new states, semantics are all 
the same.


Here's what the responsibility of the states are:
RESETTING:
-quiescing the blk-mq queues os errors don't bubble back to callees and 
new io is suspended
-terminating the link-side association: resets the controller via 
register access/SET_PROPERTY, deletes connections/queues, cleans out 
active io and lets them queue for resending, etc.
-end result is nvme block device is present, but idle, and no active 
controller on the link side  (link being a transport specific link such 
as pci, or ethernet/ib for rdma or fc link).


RECONNECTING:
- "establish an association at the transport" on PCI this is immediate - 
you can either talk to the pci function or you can't. With rdma/fc we 
send transport traffic to create an admin q connection.
- if association succeeded: the controller is init'd via register 
accesses/fabric GET/SET_PROPERTIES and admin-queue command, io 
connections/queues created, blk-mq queues unquiesced, and finally 
transition to NVME_CTRL_LIVE.
- if association failed: check controller timeout., if not exceeded, 
schedule timer and retry later.  With pci, this could cover the 
temporary loss of the pci function access (a hot plug event) while 
keeping the nvme blk device live in the system.


It matches what you are describing but using what we already have in 
place - with the only difference being the addition of your "boundary" 
by adding the RECONNECTING state to nvme-pci.






I don't believe RESETTING and RECONNECTING are that similar - unless - you are looking at 
that "reinit" part that we left grandfathered into PCI.

Yes. I have got the point. Thanks for your directive.

Both nvme-pc and nvme-fc/rdma have "shutdown" part to tear down 
queues/connects, quiesce queues, cancel outstanding requests...
We could call this part as "shutdowning" as well as the scheduling gap.
Because the difference of initializing between the nvme-pci and nvme-fc/rdma,  we could 
call "reiniting" for nvme-pci and
call "reconnecting" for nvme-fc/rdma


I don't think we need different states for the two. The actions taken 
are really very similar. How they do the actions varies slightly, but 
not what they are trying to do.   Thus, I'd prefer we keep the existing 
RECONNECTING state and use it in nvme-pci as well. I'm open to renaming 
it if there's something about the name that is not agreeable.



-- james



Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread James Smart

Thanks jianchoa. This helped.

On 1/17/2018 7:13 PM, jianchao.wang wrote:

Actually, this patchset is to fix a issue in nvme_timeout.
Please consider the following scenario.

nvme_reset_ctrl
 -> set state to RESETTING
 -> queue reset_work
 (scheduling)
  nvme_reset_work  -> nvme_dev_disable
 -> quiesce queues
   -> nvme_cancel_request
  on outstanding requests
  _boundary_
   -> nvme initializing (may issue request on adminq)

Before the boundary, not only quiesce the queues, but only cancel all the 
outstanding requests.

A request could expire when the ctrl state is RESETTING.
  - If the timeout occur before the _boundary_, the expired requests are from 
the previous work.
  - Otherwise, the expired requests are from the controller initializing 
procedure, such as sending cq/sq
create commands to adminq to setup io queues.
In current implementation, nvme_timeout cannot identify the _boundary_ so only 
handles second case above.


So what you've described very well is the pci adapter and the fact that 
it doesn't use a RECONNECTING state when it starts to reinit the 
controller like rdma/fc does.  Note: we had left it that way as a 
"grandfathering" as the code already existed and we didn't see an issue 
just leaving the reinit in the resetting handler.






So in the patch, RESETTING in nvme-fc/rdma is changed to RESET_PREPARE. Then we 
get:
nvme-fc/rdma RESET_PREPARE -> RECONNECTING -> LIVE
nvme-pci RESET_PREPARE -> RESETTING-> LIVE


Right - so another way to look at this is you renamed RESETTING to 
RESET_PREPARE and added a new RESETTING state in the nvme-pci when in 
reinits.  Why not simply have the nvme-pci transport transition to 
RECONNECTING like the other transports. No new states, semantics are all 
the same.


Here's what the responsibility of the states are:
RESETTING:
-quiescing the blk-mq queues os errors don't bubble back to callees and 
new io is suspended
-terminating the link-side association: resets the controller via 
register access/SET_PROPERTY, deletes connections/queues, cleans out 
active io and lets them queue for resending, etc.
-end result is nvme block device is present, but idle, and no active 
controller on the link side  (link being a transport specific link such 
as pci, or ethernet/ib for rdma or fc link).


RECONNECTING:
- "establish an association at the transport" on PCI this is immediate - 
you can either talk to the pci function or you can't. With rdma/fc we 
send transport traffic to create an admin q connection.
- if association succeeded: the controller is init'd via register 
accesses/fabric GET/SET_PROPERTIES and admin-queue command, io 
connections/queues created, blk-mq queues unquiesced, and finally 
transition to NVME_CTRL_LIVE.
- if association failed: check controller timeout., if not exceeded, 
schedule timer and retry later.  With pci, this could cover the 
temporary loss of the pci function access (a hot plug event) while 
keeping the nvme blk device live in the system.


It matches what you are describing but using what we already have in 
place - with the only difference being the addition of your "boundary" 
by adding the RECONNECTING state to nvme-pci.






I don't believe RESETTING and RECONNECTING are that similar - unless - you are looking at 
that "reinit" part that we left grandfathered into PCI.

Yes. I have got the point. Thanks for your directive.

Both nvme-pc and nvme-fc/rdma have "shutdown" part to tear down 
queues/connects, quiesce queues, cancel outstanding requests...
We could call this part as "shutdowning" as well as the scheduling gap.
Because the difference of initializing between the nvme-pci and nvme-fc/rdma,  we could 
call "reiniting" for nvme-pci and
call "reconnecting" for nvme-fc/rdma


I don't think we need different states for the two. The actions taken 
are really very similar. How they do the actions varies slightly, but 
not what they are trying to do.   Thus, I'd prefer we keep the existing 
RECONNECTING state and use it in nvme-pci as well. I'm open to renaming 
it if there's something about the name that is not agreeable.



-- james



Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
Hi James

Thanks for your kindly and detailed response.
That's really appreciated!
And sorry for my bad description.

On 01/18/2018 05:01 AM, James Smart wrote:
> I'm having a hard time following why this patch is being requested. Help me 
> catch on.
> 
Actually, this patchset is to fix a issue in nvme_timeout.
Please consider the following scenario.

nvme_reset_ctrl
-> set state to RESETTING
-> queue reset_work   
(scheduling)
 nvme_reset_work
  -> nvme_dev_disable
-> quiesce queues
  -> nvme_cancel_request 
 on outstanding requests
 _boundary_
  -> nvme initializing (may issue request on adminq)

Before the boundary, not only quiesce the queues, but only cancel all the 
outstanding requests.

A request could expire when the ctrl state is RESETTING.
 - If the timeout occur before the _boundary_, the expired requests are from 
the previous work.
 - Otherwise, the expired requests are from the controller initializing 
procedure, such as sending cq/sq  
   create commands to adminq to setup io queues.
In current implementation, nvme_timeout cannot identify the _boundary_ so only 
handles second case above.

This patchset is to identify the _boundary_ above, so introduce RESET_PREPARE.
Please refer to the following diagram.
nvme_reset_ctrl
-> set state to RESET_PREPARE
-> queue reset_work   
  (scheduling)
 nvme_reset_work
  -> nvme_dev_disable
-> quiesce queues
  -> nvme_cancel_request 
 on outstanding requests
  -> set state to RESETTING   --> boundary
  -> nvme initializing (may issue request on adminq)

Then we could identify the boundary by checking the ctrl-> state

After Sagi introduced d5bf4b7 (nvme-rdma: fix concurrent reset and reconnect), 
actually, same things as been done
in both nvme-fc/nvme.
- nvme_rdma_error_recovery_work
- nvme_rdma_reset_ctrl_work
- nvme_fc_reset_ctrl_work
But the state is from RESETTING to RECONNECTING.

So in the patch, RESETTING in nvme-fc/rdma is changed to RESET_PREPARE. Then we 
get:
nvme-fc/rdma RESET_PREPARE -> RECONNECTING -> LIVE
nvme-pci RESET_PREPARE -> RESETTING-> LIVE


> On 1/16/2018 8:54 PM, Jianchao Wang wrote:
>> Currently, the ctrl->state will be changed to NVME_CTRL_RESETTING
>> before queue the reset work. This is not so strict. There could be
>> a big gap before the reset_work callback is invoked.
> ok so what you're saying is you want something to know that you've 
> transitioned to RESETTING but not yet performed an action for the reset.   
> What is that something and what is it to do > 
> guessing - I assume it's in the transport  xxx_is_ready() and/or 
> nvmf_check_init_req(), which is called at the start of queue_rq(), that wants 
> to do something ?
> 

Should be explained above. :)

> 
>>   In addition,
>> there is some disable work in the reset_work callback, strictly
>> speaking, not part of reset work, and could lead to some confusion.
> 
> Can you explain this ?  what's the confusion ?
> 
> I assume by "disable" you mean it is quiescing queues ?

Sorry for my bad description. Not only quiesce queues, but also handle the 
outstanding requests. 

> 
> 
>>
>> In addition, after set state to RESETTING and disable procedure,
>> nvme-rdma/fc use NVME_CTRL_RECONNECTING to mark the setup and
>> reconnect procedure. The RESETTING state has been narrowed.
> 
> I still don't follow. Yes RECONNECTING is where we repetitively: try to 
> create a link-side association again: if it fails, delay and try again; or if 
> success, reinit the controller, and unquiesce all queues - allowing full 
> operation again, at which time we transition to LIVE.
> 
> by "narrowed" what do you mean ?    what "narrowed" ?

Currently , in nvme-pci, the RESETTING state includes both "shutdown" and 
"setup". But in nvme-fc/rdma, only "shutdown", "setup" is in RECONECTTING
shutdown - tear down queues/connections, quiesce queues, cancel requests 
setup- setup new IO queues or connections and some other initializing work 

> 
> In FC, as we have a lot of work that must occur to terminate io as part of 
> the reset, it can be a fairly long window.  I don't know that any 
> functionally in this path, regardless of time window, has narrowed.
> 
> 
>>
>> This patch add NVME_CTRL_RESET_PREPARE state to mark the reset_work
>> or error recovery work, scheduling gap and disable procedure.
>> After that,
>>   - For nvme-pci, nvmet-loop, set state to RESETTING, start
>>     initialization.
>>   - For nvme-rdma, nvme-fc, set state to RECONNECTING, start
>>     initialization or reconnect.
> 
> So I'm lost - so you've effectively renamed RESETTING to 

Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
Hi James

Thanks for your kindly and detailed response.
That's really appreciated!
And sorry for my bad description.

On 01/18/2018 05:01 AM, James Smart wrote:
> I'm having a hard time following why this patch is being requested. Help me 
> catch on.
> 
Actually, this patchset is to fix a issue in nvme_timeout.
Please consider the following scenario.

nvme_reset_ctrl
-> set state to RESETTING
-> queue reset_work   
(scheduling)
 nvme_reset_work
  -> nvme_dev_disable
-> quiesce queues
  -> nvme_cancel_request 
 on outstanding requests
 _boundary_
  -> nvme initializing (may issue request on adminq)

Before the boundary, not only quiesce the queues, but only cancel all the 
outstanding requests.

A request could expire when the ctrl state is RESETTING.
 - If the timeout occur before the _boundary_, the expired requests are from 
the previous work.
 - Otherwise, the expired requests are from the controller initializing 
procedure, such as sending cq/sq  
   create commands to adminq to setup io queues.
In current implementation, nvme_timeout cannot identify the _boundary_ so only 
handles second case above.

This patchset is to identify the _boundary_ above, so introduce RESET_PREPARE.
Please refer to the following diagram.
nvme_reset_ctrl
-> set state to RESET_PREPARE
-> queue reset_work   
  (scheduling)
 nvme_reset_work
  -> nvme_dev_disable
-> quiesce queues
  -> nvme_cancel_request 
 on outstanding requests
  -> set state to RESETTING   --> boundary
  -> nvme initializing (may issue request on adminq)

Then we could identify the boundary by checking the ctrl-> state

After Sagi introduced d5bf4b7 (nvme-rdma: fix concurrent reset and reconnect), 
actually, same things as been done
in both nvme-fc/nvme.
- nvme_rdma_error_recovery_work
- nvme_rdma_reset_ctrl_work
- nvme_fc_reset_ctrl_work
But the state is from RESETTING to RECONNECTING.

So in the patch, RESETTING in nvme-fc/rdma is changed to RESET_PREPARE. Then we 
get:
nvme-fc/rdma RESET_PREPARE -> RECONNECTING -> LIVE
nvme-pci RESET_PREPARE -> RESETTING-> LIVE


> On 1/16/2018 8:54 PM, Jianchao Wang wrote:
>> Currently, the ctrl->state will be changed to NVME_CTRL_RESETTING
>> before queue the reset work. This is not so strict. There could be
>> a big gap before the reset_work callback is invoked.
> ok so what you're saying is you want something to know that you've 
> transitioned to RESETTING but not yet performed an action for the reset.   
> What is that something and what is it to do > 
> guessing - I assume it's in the transport  xxx_is_ready() and/or 
> nvmf_check_init_req(), which is called at the start of queue_rq(), that wants 
> to do something ?
> 

Should be explained above. :)

> 
>>   In addition,
>> there is some disable work in the reset_work callback, strictly
>> speaking, not part of reset work, and could lead to some confusion.
> 
> Can you explain this ?  what's the confusion ?
> 
> I assume by "disable" you mean it is quiescing queues ?

Sorry for my bad description. Not only quiesce queues, but also handle the 
outstanding requests. 

> 
> 
>>
>> In addition, after set state to RESETTING and disable procedure,
>> nvme-rdma/fc use NVME_CTRL_RECONNECTING to mark the setup and
>> reconnect procedure. The RESETTING state has been narrowed.
> 
> I still don't follow. Yes RECONNECTING is where we repetitively: try to 
> create a link-side association again: if it fails, delay and try again; or if 
> success, reinit the controller, and unquiesce all queues - allowing full 
> operation again, at which time we transition to LIVE.
> 
> by "narrowed" what do you mean ?    what "narrowed" ?

Currently , in nvme-pci, the RESETTING state includes both "shutdown" and 
"setup". But in nvme-fc/rdma, only "shutdown", "setup" is in RECONECTTING
shutdown - tear down queues/connections, quiesce queues, cancel requests 
setup- setup new IO queues or connections and some other initializing work 

> 
> In FC, as we have a lot of work that must occur to terminate io as part of 
> the reset, it can be a fairly long window.  I don't know that any 
> functionally in this path, regardless of time window, has narrowed.
> 
> 
>>
>> This patch add NVME_CTRL_RESET_PREPARE state to mark the reset_work
>> or error recovery work, scheduling gap and disable procedure.
>> After that,
>>   - For nvme-pci, nvmet-loop, set state to RESETTING, start
>>     initialization.
>>   - For nvme-rdma, nvme-fc, set state to RECONNECTING, start
>>     initialization or reconnect.
> 
> So I'm lost - so you've effectively renamed RESETTING to 

Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread James Smart
I'm having a hard time following why this patch is being requested. Help 
me catch on.


On 1/16/2018 8:54 PM, Jianchao Wang wrote:

Currently, the ctrl->state will be changed to NVME_CTRL_RESETTING
before queue the reset work. This is not so strict. There could be
a big gap before the reset_work callback is invoked.
ok so what you're saying is you want something to know that you've 
transitioned to RESETTING but not yet performed an action for the 
reset.   What is that something and what is it to do ?


guessing - I assume it's in the transport  xxx_is_ready() and/or 
nvmf_check_init_req(), which is called at the start of queue_rq(), that 
wants to do something ?




  In addition,
there is some disable work in the reset_work callback, strictly
speaking, not part of reset work, and could lead to some confusion.


Can you explain this ?  what's the confusion ?

I assume by "disable" you mean it is quiescing queues ?




In addition, after set state to RESETTING and disable procedure,
nvme-rdma/fc use NVME_CTRL_RECONNECTING to mark the setup and
reconnect procedure. The RESETTING state has been narrowed.


I still don't follow. Yes RECONNECTING is where we repetitively: try to 
create a link-side association again: if it fails, delay and try again; 
or if success, reinit the controller, and unquiesce all queues - 
allowing full operation again, at which time we transition to LIVE.


by "narrowed" what do you mean ?    what "narrowed" ?

In FC, as we have a lot of work that must occur to terminate io as part 
of the reset, it can be a fairly long window.  I don't know that any 
functionally in this path, regardless of time window, has narrowed.





This patch add NVME_CTRL_RESET_PREPARE state to mark the reset_work
or error recovery work, scheduling gap and disable procedure.
After that,
  - For nvme-pci, nvmet-loop, set state to RESETTING, start
initialization.
  - For nvme-rdma, nvme-fc, set state to RECONNECTING, start
initialization or reconnect.


So I'm lost - so you've effectively renamed RESETTING to RESET_PREPARE 
for fc/rdma.  What do you define are the actions in RESETTING that went 
away and why is that different between pci and the other transports ?   
Why doesn't nvme-pci need to go through RESET_PREPARE ? doesn't it have 
the same scheduling window for a reset_work thread ?



On 1/17/2018 1:06 AM, Max Gurtovoy wrote:



+
+    case NVME_CTRL_RESETTING:
+    switch (old_state) {
+    case NVME_CTRL_RESET_PREPARE:
+    changed = true;
+    /* FALLTHRU */
+    default:
+    break;
+    }
+    break;
  case NVME_CTRL_RECONNECTING:
  switch (old_state) {
  case NVME_CTRL_LIVE:
-    case NVME_CTRL_RESETTING:
+    case NVME_CTRL_RESET_PREPARE:


As I suggested in V3, please don't allow this transition.
We'll move to NVME_CTRL_RECONNECTING from NVME_CTRL_RESETTING.

I look on it like that:

NVME_CTRL_RESET_PREPARE - "suspend" state
NVME_CTRL_RESETTING - "resume" state

you don't reconnect from "suspend" state, you must "resume" before you 
reconnect.


This makes no sense to me.

I could use a definition of what "suspend" and "resume" mean to you

from what I've seen so far:
NVME_CTRL_RESET_PREPARE:   means I've decided to reset, changed state, 
but the actual work for reset hasn't started yet.   As we haven't 
commonized who does the quiescing of the queues, the queues are still 
live at this state, although some nvme check routine may bounce them. In 
truth, the queues should be quiesced here.


NVME_CTRL_RESETTING: I'm resetting the controller, tearing down 
queues/connections, the link side association.  AFAIK - pci and all the 
other transports have to do things things.   Now is when the blk-mq 
queues get stopped.   We have a variance on whether the queues are 
unquiesced or left quiesced (I think this is what you meant by "resume", 
where resume means unquiesce) at the end of this.   The admin_q is 
unquiesced, meaning new admin cmds should fail.  rdma also has io queues 
unquiesced meaning new ios fail, while fc leaves them quiesced while 
background timers run - meaning no new ios issued, nor any fail back to 
a multipather. With the agreement that we would patch all of the 
transports to leave them quiesced with fast-fail-timeouts occuring to 
unquiesce them and start failing ios.


NVME_RECONNECTING: transitioned to after the link-side association is 
terminated and the transport will now attempt to reconnect (perhaps 
several attempts) to create a new link-side association. Stays in this 
state until the controller is fully reconnected and it transitions to 
NVME_LIVE.   Until the link side association is active, queues do what 
they do (as left by RESETTING and/or updated per timeouts) excepting 
that after an active assocation, they queues will be unquiesced at the 
time of the LIVE transition.   Note: we grandfathered PCI into not 
needing this state:   As you (almost) can't fail the 

Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread James Smart
I'm having a hard time following why this patch is being requested. Help 
me catch on.


On 1/16/2018 8:54 PM, Jianchao Wang wrote:

Currently, the ctrl->state will be changed to NVME_CTRL_RESETTING
before queue the reset work. This is not so strict. There could be
a big gap before the reset_work callback is invoked.
ok so what you're saying is you want something to know that you've 
transitioned to RESETTING but not yet performed an action for the 
reset.   What is that something and what is it to do ?


guessing - I assume it's in the transport  xxx_is_ready() and/or 
nvmf_check_init_req(), which is called at the start of queue_rq(), that 
wants to do something ?




  In addition,
there is some disable work in the reset_work callback, strictly
speaking, not part of reset work, and could lead to some confusion.


Can you explain this ?  what's the confusion ?

I assume by "disable" you mean it is quiescing queues ?




In addition, after set state to RESETTING and disable procedure,
nvme-rdma/fc use NVME_CTRL_RECONNECTING to mark the setup and
reconnect procedure. The RESETTING state has been narrowed.


I still don't follow. Yes RECONNECTING is where we repetitively: try to 
create a link-side association again: if it fails, delay and try again; 
or if success, reinit the controller, and unquiesce all queues - 
allowing full operation again, at which time we transition to LIVE.


by "narrowed" what do you mean ?    what "narrowed" ?

In FC, as we have a lot of work that must occur to terminate io as part 
of the reset, it can be a fairly long window.  I don't know that any 
functionally in this path, regardless of time window, has narrowed.





This patch add NVME_CTRL_RESET_PREPARE state to mark the reset_work
or error recovery work, scheduling gap and disable procedure.
After that,
  - For nvme-pci, nvmet-loop, set state to RESETTING, start
initialization.
  - For nvme-rdma, nvme-fc, set state to RECONNECTING, start
initialization or reconnect.


So I'm lost - so you've effectively renamed RESETTING to RESET_PREPARE 
for fc/rdma.  What do you define are the actions in RESETTING that went 
away and why is that different between pci and the other transports ?   
Why doesn't nvme-pci need to go through RESET_PREPARE ? doesn't it have 
the same scheduling window for a reset_work thread ?



On 1/17/2018 1:06 AM, Max Gurtovoy wrote:



+
+    case NVME_CTRL_RESETTING:
+    switch (old_state) {
+    case NVME_CTRL_RESET_PREPARE:
+    changed = true;
+    /* FALLTHRU */
+    default:
+    break;
+    }
+    break;
  case NVME_CTRL_RECONNECTING:
  switch (old_state) {
  case NVME_CTRL_LIVE:
-    case NVME_CTRL_RESETTING:
+    case NVME_CTRL_RESET_PREPARE:


As I suggested in V3, please don't allow this transition.
We'll move to NVME_CTRL_RECONNECTING from NVME_CTRL_RESETTING.

I look on it like that:

NVME_CTRL_RESET_PREPARE - "suspend" state
NVME_CTRL_RESETTING - "resume" state

you don't reconnect from "suspend" state, you must "resume" before you 
reconnect.


This makes no sense to me.

I could use a definition of what "suspend" and "resume" mean to you

from what I've seen so far:
NVME_CTRL_RESET_PREPARE:   means I've decided to reset, changed state, 
but the actual work for reset hasn't started yet.   As we haven't 
commonized who does the quiescing of the queues, the queues are still 
live at this state, although some nvme check routine may bounce them. In 
truth, the queues should be quiesced here.


NVME_CTRL_RESETTING: I'm resetting the controller, tearing down 
queues/connections, the link side association.  AFAIK - pci and all the 
other transports have to do things things.   Now is when the blk-mq 
queues get stopped.   We have a variance on whether the queues are 
unquiesced or left quiesced (I think this is what you meant by "resume", 
where resume means unquiesce) at the end of this.   The admin_q is 
unquiesced, meaning new admin cmds should fail.  rdma also has io queues 
unquiesced meaning new ios fail, while fc leaves them quiesced while 
background timers run - meaning no new ios issued, nor any fail back to 
a multipather. With the agreement that we would patch all of the 
transports to leave them quiesced with fast-fail-timeouts occuring to 
unquiesce them and start failing ios.


NVME_RECONNECTING: transitioned to after the link-side association is 
terminated and the transport will now attempt to reconnect (perhaps 
several attempts) to create a new link-side association. Stays in this 
state until the controller is fully reconnected and it transitions to 
NVME_LIVE.   Until the link side association is active, queues do what 
they do (as left by RESETTING and/or updated per timeouts) excepting 
that after an active assocation, they queues will be unquiesced at the 
time of the LIVE transition.   Note: we grandfathered PCI into not 
needing this state:   As you (almost) can't fail the 

Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
Hi max

On 01/17/2018 05:19 PM, jianchao.wang wrote:
> Hi Max
> 
> Thanks for your kindly response.
> 
> I have merged the response to you together below.
> On 01/17/2018 05:06 PM, Max Gurtovoy wrote:
>>>   case NVME_CTRL_RECONNECTING:
>>>   switch (old_state) {
>>>   case NVME_CTRL_LIVE:
>>> -    case NVME_CTRL_RESETTING:
>>> +    case NVME_CTRL_RESET_PREPARE:
>>
>> As I suggested in V3, please don't allow this transition.
>> We'll move to NVME_CTRL_RECONNECTING from NVME_CTRL_RESETTING.
>>
>> I look on it like that:
>>
>> NVME_CTRL_RESET_PREPARE - "suspend" state
>> NVME_CTRL_RESETTING - "resume" state
>>
>> you don't reconnect from "suspend" state, you must "resume" before you 
>> reconnect.
> 
>>> index d49b1e7..6b5f2f4 100644
>>> --- a/drivers/nvme/host/rdma.c
>>> +++ b/drivers/nvme/host/rdma.c
>>> @@ -985,7 +985,7 @@ static void nvme_rdma_error_recovery_work(struct 
>>> work_struct *work)
>>>     static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
>>>   {
>>> -    if (!nvme_change_ctrl_state(>ctrl, NVME_CTRL_RESETTING))
>>> +    if (!nvme_change_ctrl_state(>ctrl, NVME_CTRL_RESET_PREPARE))
>>>   return;
>>
>> We can add a NVME_CTRL_RESET_PREPARE --> NVME_CTRL_RESETTING transition and 
>> then move to NVME_CTRL_RECONNECTING (in nvme_rdma_reset_ctrl_work and 
>> nvme_rdma_error_recovery_work).
>> I want to add an ability to recover from device removal (actually wanted to 
>> send it today but I'm waiting to see what will happen with this patchset) 
>> for RDMA and your approach (enable transition to from both "suspend" and 
>> "resume" to "reconnect") might be problematic.
>>
>> Sagi/Christoph ?
> 
> I used to respond you in the V3 and wait for your feedback.
> Please refer to:

> After Sagi's nvme-rdma: fix concurrent reset and reconnect, the rdma ctrl 
> state is changed to RECONNECTING state
> after some clearing and shutdown work, then some initializing procedure,  no 
> matter reset work path or error recovery path.
> The fc reset work also does the same thing.
> So if we define the range that RESET_PREPARE includes scheduling gap and 
> disable and clear work, RESETTING includes initializing
> procedure,  RECONNECTING is very similar with RESETTING.
> 
> Maybe we could do like this;
> In nvme fc/rdma
> - set state to RESET_PREPARE, queue reset_work/err_work 
> - clear/shutdown works, set state to RECONNECTING
> - initialization, set state to LIVE
> 
> In nvme pci
> - set state to RESET_PREPARE, queue reset_work
> - clear/shutdown works, set state to RESETTING
> - initialization, set state to LIVE

> 
> Currently, RECONNECTING has overlapped with RESETTING.
> So I suggest to use RESET_PREPARE to mark the "suspend" part as you said.
> And use RECONNECTING to mark the "resume" part in nvme-rdma/fc
> use RESETTING part to mark the "resume" part in nvme-pci, nvmt-loop.

I have added a comment above the definition of nvme_ctrl_state as below:
  +/*
+ * RESET_PREPARE - mark the state of scheduling gap and disable procedure
+ * RESETTING - nvme-pci, nvmet loop use it to mark the state of setup
+ *   procedure
+ * RECONNECTING  - nvme-fc, nvme-rdma use it to mark the state of setup
+ *   and reconnect procedure
+ */

Consequently, nvme-fc/rdma don't use RESETTING state any more, but only 
RESET_PREPARE.

Thanks
Jianchao

> 
> I want to confirm with all of you, but no none had feedback, so I sent out 
> the patch directly.
> Please forgive my abrupt actions.
> 
> Thanks
> Jianchao
> 
> ___
> Linux-nvme mailing list
> linux-n...@lists.infradead.org
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.infradead.org_mailman_listinfo_linux-2Dnvme=DwIGAA=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ=lUkrKY6u8mGr5k1ajcQSKfsk1N2wKpmD60MJfojz4iY=AtvJd5ElCQ_WoO2Q5-LBTEdlZeTmWOHEGO-baUZyl-o=
> 


Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
Hi max

On 01/17/2018 05:19 PM, jianchao.wang wrote:
> Hi Max
> 
> Thanks for your kindly response.
> 
> I have merged the response to you together below.
> On 01/17/2018 05:06 PM, Max Gurtovoy wrote:
>>>   case NVME_CTRL_RECONNECTING:
>>>   switch (old_state) {
>>>   case NVME_CTRL_LIVE:
>>> -    case NVME_CTRL_RESETTING:
>>> +    case NVME_CTRL_RESET_PREPARE:
>>
>> As I suggested in V3, please don't allow this transition.
>> We'll move to NVME_CTRL_RECONNECTING from NVME_CTRL_RESETTING.
>>
>> I look on it like that:
>>
>> NVME_CTRL_RESET_PREPARE - "suspend" state
>> NVME_CTRL_RESETTING - "resume" state
>>
>> you don't reconnect from "suspend" state, you must "resume" before you 
>> reconnect.
> 
>>> index d49b1e7..6b5f2f4 100644
>>> --- a/drivers/nvme/host/rdma.c
>>> +++ b/drivers/nvme/host/rdma.c
>>> @@ -985,7 +985,7 @@ static void nvme_rdma_error_recovery_work(struct 
>>> work_struct *work)
>>>     static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
>>>   {
>>> -    if (!nvme_change_ctrl_state(>ctrl, NVME_CTRL_RESETTING))
>>> +    if (!nvme_change_ctrl_state(>ctrl, NVME_CTRL_RESET_PREPARE))
>>>   return;
>>
>> We can add a NVME_CTRL_RESET_PREPARE --> NVME_CTRL_RESETTING transition and 
>> then move to NVME_CTRL_RECONNECTING (in nvme_rdma_reset_ctrl_work and 
>> nvme_rdma_error_recovery_work).
>> I want to add an ability to recover from device removal (actually wanted to 
>> send it today but I'm waiting to see what will happen with this patchset) 
>> for RDMA and your approach (enable transition to from both "suspend" and 
>> "resume" to "reconnect") might be problematic.
>>
>> Sagi/Christoph ?
> 
> I used to respond you in the V3 and wait for your feedback.
> Please refer to:

> After Sagi's nvme-rdma: fix concurrent reset and reconnect, the rdma ctrl 
> state is changed to RECONNECTING state
> after some clearing and shutdown work, then some initializing procedure,  no 
> matter reset work path or error recovery path.
> The fc reset work also does the same thing.
> So if we define the range that RESET_PREPARE includes scheduling gap and 
> disable and clear work, RESETTING includes initializing
> procedure,  RECONNECTING is very similar with RESETTING.
> 
> Maybe we could do like this;
> In nvme fc/rdma
> - set state to RESET_PREPARE, queue reset_work/err_work 
> - clear/shutdown works, set state to RECONNECTING
> - initialization, set state to LIVE
> 
> In nvme pci
> - set state to RESET_PREPARE, queue reset_work
> - clear/shutdown works, set state to RESETTING
> - initialization, set state to LIVE

> 
> Currently, RECONNECTING has overlapped with RESETTING.
> So I suggest to use RESET_PREPARE to mark the "suspend" part as you said.
> And use RECONNECTING to mark the "resume" part in nvme-rdma/fc
> use RESETTING part to mark the "resume" part in nvme-pci, nvmt-loop.

I have added a comment above the definition of nvme_ctrl_state as below:
  +/*
+ * RESET_PREPARE - mark the state of scheduling gap and disable procedure
+ * RESETTING - nvme-pci, nvmet loop use it to mark the state of setup
+ *   procedure
+ * RECONNECTING  - nvme-fc, nvme-rdma use it to mark the state of setup
+ *   and reconnect procedure
+ */

Consequently, nvme-fc/rdma don't use RESETTING state any more, but only 
RESET_PREPARE.

Thanks
Jianchao

> 
> I want to confirm with all of you, but no none had feedback, so I sent out 
> the patch directly.
> Please forgive my abrupt actions.
> 
> Thanks
> Jianchao
> 
> ___
> Linux-nvme mailing list
> linux-n...@lists.infradead.org
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.infradead.org_mailman_listinfo_linux-2Dnvme=DwIGAA=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ=lUkrKY6u8mGr5k1ajcQSKfsk1N2wKpmD60MJfojz4iY=AtvJd5ElCQ_WoO2Q5-LBTEdlZeTmWOHEGO-baUZyl-o=
> 


Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
Hi Max

Thanks for your kindly response.

I have merged the response to you together below.
On 01/17/2018 05:06 PM, Max Gurtovoy wrote:
>>   case NVME_CTRL_RECONNECTING:
>>   switch (old_state) {
>>   case NVME_CTRL_LIVE:
>> -    case NVME_CTRL_RESETTING:
>> +    case NVME_CTRL_RESET_PREPARE:
> 
> As I suggested in V3, please don't allow this transition.
> We'll move to NVME_CTRL_RECONNECTING from NVME_CTRL_RESETTING.
> 
> I look on it like that:
> 
> NVME_CTRL_RESET_PREPARE - "suspend" state
> NVME_CTRL_RESETTING - "resume" state
> 
> you don't reconnect from "suspend" state, you must "resume" before you 
> reconnect.

>> index d49b1e7..6b5f2f4 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -985,7 +985,7 @@ static void nvme_rdma_error_recovery_work(struct 
>> work_struct *work)
>>     static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
>>   {
>> -    if (!nvme_change_ctrl_state(>ctrl, NVME_CTRL_RESETTING))
>> +    if (!nvme_change_ctrl_state(>ctrl, NVME_CTRL_RESET_PREPARE))
>>   return;
> 
> We can add a NVME_CTRL_RESET_PREPARE --> NVME_CTRL_RESETTING transition and 
> then move to NVME_CTRL_RECONNECTING (in nvme_rdma_reset_ctrl_work and 
> nvme_rdma_error_recovery_work).
> I want to add an ability to recover from device removal (actually wanted to 
> send it today but I'm waiting to see what will happen with this patchset) for 
> RDMA and your approach (enable transition to from both "suspend" and "resume" 
> to "reconnect") might be problematic.
> 
> Sagi/Christoph ?

I used to respond you in the V3 and wait for your feedback.
Please refer to:
>>>
After Sagi's nvme-rdma: fix concurrent reset and reconnect, the rdma ctrl state 
is changed to RECONNECTING state
after some clearing and shutdown work, then some initializing procedure,  no 
matter reset work path or error recovery path.
The fc reset work also does the same thing.
So if we define the range that RESET_PREPARE includes scheduling gap and 
disable and clear work, RESETTING includes initializing
procedure,  RECONNECTING is very similar with RESETTING.

Maybe we could do like this;
In nvme fc/rdma
- set state to RESET_PREPARE, queue reset_work/err_work 
- clear/shutdown works, set state to RECONNECTING
- initialization, set state to LIVE

In nvme pci
- set state to RESET_PREPARE, queue reset_work
- clear/shutdown works, set state to RESETTING
- initialization, set state to LIVE
>>>

Currently, RECONNECTING has overlapped with RESETTING.
So I suggest to use RESET_PREPARE to mark the "suspend" part as you said.
And use RECONNECTING to mark the "resume" part in nvme-rdma/fc
use RESETTING part to mark the "resume" part in nvme-pci, nvmt-loop.

I want to confirm with all of you, but no none had feedback, so I sent out the 
patch directly.
Please forgive my abrupt actions.

Thanks
Jianchao

Thanks
Jianchao


Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
Hi Max

Thanks for your kindly response.

I have merged the response to you together below.
On 01/17/2018 05:06 PM, Max Gurtovoy wrote:
>>   case NVME_CTRL_RECONNECTING:
>>   switch (old_state) {
>>   case NVME_CTRL_LIVE:
>> -    case NVME_CTRL_RESETTING:
>> +    case NVME_CTRL_RESET_PREPARE:
> 
> As I suggested in V3, please don't allow this transition.
> We'll move to NVME_CTRL_RECONNECTING from NVME_CTRL_RESETTING.
> 
> I look on it like that:
> 
> NVME_CTRL_RESET_PREPARE - "suspend" state
> NVME_CTRL_RESETTING - "resume" state
> 
> you don't reconnect from "suspend" state, you must "resume" before you 
> reconnect.

>> index d49b1e7..6b5f2f4 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -985,7 +985,7 @@ static void nvme_rdma_error_recovery_work(struct 
>> work_struct *work)
>>     static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
>>   {
>> -    if (!nvme_change_ctrl_state(>ctrl, NVME_CTRL_RESETTING))
>> +    if (!nvme_change_ctrl_state(>ctrl, NVME_CTRL_RESET_PREPARE))
>>   return;
> 
> We can add a NVME_CTRL_RESET_PREPARE --> NVME_CTRL_RESETTING transition and 
> then move to NVME_CTRL_RECONNECTING (in nvme_rdma_reset_ctrl_work and 
> nvme_rdma_error_recovery_work).
> I want to add an ability to recover from device removal (actually wanted to 
> send it today but I'm waiting to see what will happen with this patchset) for 
> RDMA and your approach (enable transition to from both "suspend" and "resume" 
> to "reconnect") might be problematic.
> 
> Sagi/Christoph ?

I used to respond you in the V3 and wait for your feedback.
Please refer to:
>>>
After Sagi's nvme-rdma: fix concurrent reset and reconnect, the rdma ctrl state 
is changed to RECONNECTING state
after some clearing and shutdown work, then some initializing procedure,  no 
matter reset work path or error recovery path.
The fc reset work also does the same thing.
So if we define the range that RESET_PREPARE includes scheduling gap and 
disable and clear work, RESETTING includes initializing
procedure,  RECONNECTING is very similar with RESETTING.

Maybe we could do like this;
In nvme fc/rdma
- set state to RESET_PREPARE, queue reset_work/err_work 
- clear/shutdown works, set state to RECONNECTING
- initialization, set state to LIVE

In nvme pci
- set state to RESET_PREPARE, queue reset_work
- clear/shutdown works, set state to RESETTING
- initialization, set state to LIVE
>>>

Currently, RECONNECTING has overlapped with RESETTING.
So I suggest to use RESET_PREPARE to mark the "suspend" part as you said.
And use RECONNECTING to mark the "resume" part in nvme-rdma/fc
use RESETTING part to mark the "resume" part in nvme-pci, nvmt-loop.

I want to confirm with all of you, but no none had feedback, so I sent out the 
patch directly.
Please forgive my abrupt actions.

Thanks
Jianchao

Thanks
Jianchao


Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread Max Gurtovoy

hi Jianchao Wang,

On 1/17/2018 6:54 AM, Jianchao Wang wrote:

Currently, the ctrl->state will be changed to NVME_CTRL_RESETTING
before queue the reset work. This is not so strict. There could be
a big gap before the reset_work callback is invoked. In addition,
there is some disable work in the reset_work callback, strictly
speaking, not part of reset work, and could lead to some confusion.

In addition, after set state to RESETTING and disable procedure,
nvme-rdma/fc use NVME_CTRL_RECONNECTING to mark the setup and
reconnect procedure. The RESETTING state has been narrowed.

This patch add NVME_CTRL_RESET_PREPARE state to mark the reset_work
or error recovery work, scheduling gap and disable procedure.
After that,
  - For nvme-pci, nvmet-loop, set state to RESETTING, start
initialization.
  - For nvme-rdma, nvme-fc, set state to RECONNECTING, start
initialization or reconnect.

Suggested-by: Christoph Hellwig 
Signed-off-by: Jianchao Wang 
---
  drivers/nvme/host/core.c   | 18 +++---
  drivers/nvme/host/fc.c |  4 ++--
  drivers/nvme/host/nvme.h   |  8 
  drivers/nvme/host/pci.c| 25 +
  drivers/nvme/host/rdma.c   |  2 +-
  drivers/nvme/target/loop.c |  5 +
  6 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 230cc09..87d209f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -87,7 +87,7 @@ static __le32 nvme_get_log_dw10(u8 lid, size_t size)
  
  int nvme_reset_ctrl(struct nvme_ctrl *ctrl)

  {
-   if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
+   if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESET_PREPARE))
return -EBUSY;
if (!queue_work(nvme_wq, >reset_work))
return -EBUSY;
@@ -260,7 +260,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
break;
}
break;
-   case NVME_CTRL_RESETTING:
+   case NVME_CTRL_RESET_PREPARE:
switch (old_state) {
case NVME_CTRL_NEW:
case NVME_CTRL_LIVE:
@@ -271,10 +271,20 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
break;
}
break;
+
+   case NVME_CTRL_RESETTING:
+   switch (old_state) {
+   case NVME_CTRL_RESET_PREPARE:
+   changed = true;
+   /* FALLTHRU */
+   default:
+   break;
+   }
+   break;
case NVME_CTRL_RECONNECTING:
switch (old_state) {
case NVME_CTRL_LIVE:
-   case NVME_CTRL_RESETTING:
+   case NVME_CTRL_RESET_PREPARE:


As I suggested in V3, please don't allow this transition.
We'll move to NVME_CTRL_RECONNECTING from NVME_CTRL_RESETTING.

I look on it like that:

NVME_CTRL_RESET_PREPARE - "suspend" state
NVME_CTRL_RESETTING - "resume" state

you don't reconnect from "suspend" state, you must "resume" before you 
reconnect.




changed = true;
/* FALLTHRU */
default:
@@ -286,6 +296,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
case NVME_CTRL_LIVE:
case NVME_CTRL_ADMIN_ONLY:
case NVME_CTRL_RESETTING:
+   case NVME_CTRL_RESET_PREPARE:
case NVME_CTRL_RECONNECTING:
changed = true;
/* FALLTHRU */
@@ -2660,6 +2671,7 @@ static ssize_t nvme_sysfs_show_state(struct device *dev,
[NVME_CTRL_LIVE]= "live",
[NVME_CTRL_ADMIN_ONLY]  = "only-admin",
[NVME_CTRL_RESETTING]   = "resetting",
+   [NVME_CTRL_RESET_PREPARE]   = "reset-prepare",
[NVME_CTRL_RECONNECTING]= "reconnecting",
[NVME_CTRL_DELETING]= "deleting",
[NVME_CTRL_DEAD]= "dead",
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 306aee4..1ba0669 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -546,7 +546,7 @@ nvme_fc_resume_controller(struct nvme_fc_ctrl *ctrl)
queue_delayed_work(nvme_wq, >connect_work, 0);
break;
  
-	case NVME_CTRL_RESETTING:

+   case NVME_CTRL_RESET_PREPARE:
/*
 * Controller is already in the process of terminating the
 * association. No need to do anything further. The reconnect
@@ -789,7 +789,7 @@ nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
 */
break;
  
-	case NVME_CTRL_RESETTING:

+   case NVME_CTRL_RESET_PREPARE:
/*
 * Controller is already in the process of terminating the
 * association.  No need to do anything further. The reconnect
diff 

Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread Max Gurtovoy

hi Jianchao Wang,

On 1/17/2018 6:54 AM, Jianchao Wang wrote:

Currently, the ctrl->state will be changed to NVME_CTRL_RESETTING
before queue the reset work. This is not so strict. There could be
a big gap before the reset_work callback is invoked. In addition,
there is some disable work in the reset_work callback, strictly
speaking, not part of reset work, and could lead to some confusion.

In addition, after set state to RESETTING and disable procedure,
nvme-rdma/fc use NVME_CTRL_RECONNECTING to mark the setup and
reconnect procedure. The RESETTING state has been narrowed.

This patch add NVME_CTRL_RESET_PREPARE state to mark the reset_work
or error recovery work, scheduling gap and disable procedure.
After that,
  - For nvme-pci, nvmet-loop, set state to RESETTING, start
initialization.
  - For nvme-rdma, nvme-fc, set state to RECONNECTING, start
initialization or reconnect.

Suggested-by: Christoph Hellwig 
Signed-off-by: Jianchao Wang 
---
  drivers/nvme/host/core.c   | 18 +++---
  drivers/nvme/host/fc.c |  4 ++--
  drivers/nvme/host/nvme.h   |  8 
  drivers/nvme/host/pci.c| 25 +
  drivers/nvme/host/rdma.c   |  2 +-
  drivers/nvme/target/loop.c |  5 +
  6 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 230cc09..87d209f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -87,7 +87,7 @@ static __le32 nvme_get_log_dw10(u8 lid, size_t size)
  
  int nvme_reset_ctrl(struct nvme_ctrl *ctrl)

  {
-   if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
+   if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESET_PREPARE))
return -EBUSY;
if (!queue_work(nvme_wq, >reset_work))
return -EBUSY;
@@ -260,7 +260,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
break;
}
break;
-   case NVME_CTRL_RESETTING:
+   case NVME_CTRL_RESET_PREPARE:
switch (old_state) {
case NVME_CTRL_NEW:
case NVME_CTRL_LIVE:
@@ -271,10 +271,20 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
break;
}
break;
+
+   case NVME_CTRL_RESETTING:
+   switch (old_state) {
+   case NVME_CTRL_RESET_PREPARE:
+   changed = true;
+   /* FALLTHRU */
+   default:
+   break;
+   }
+   break;
case NVME_CTRL_RECONNECTING:
switch (old_state) {
case NVME_CTRL_LIVE:
-   case NVME_CTRL_RESETTING:
+   case NVME_CTRL_RESET_PREPARE:


As I suggested in V3, please don't allow this transition.
We'll move to NVME_CTRL_RECONNECTING from NVME_CTRL_RESETTING.

I look on it like that:

NVME_CTRL_RESET_PREPARE - "suspend" state
NVME_CTRL_RESETTING - "resume" state

you don't reconnect from "suspend" state, you must "resume" before you 
reconnect.




changed = true;
/* FALLTHRU */
default:
@@ -286,6 +296,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
case NVME_CTRL_LIVE:
case NVME_CTRL_ADMIN_ONLY:
case NVME_CTRL_RESETTING:
+   case NVME_CTRL_RESET_PREPARE:
case NVME_CTRL_RECONNECTING:
changed = true;
/* FALLTHRU */
@@ -2660,6 +2671,7 @@ static ssize_t nvme_sysfs_show_state(struct device *dev,
[NVME_CTRL_LIVE]= "live",
[NVME_CTRL_ADMIN_ONLY]  = "only-admin",
[NVME_CTRL_RESETTING]   = "resetting",
+   [NVME_CTRL_RESET_PREPARE]   = "reset-prepare",
[NVME_CTRL_RECONNECTING]= "reconnecting",
[NVME_CTRL_DELETING]= "deleting",
[NVME_CTRL_DEAD]= "dead",
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 306aee4..1ba0669 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -546,7 +546,7 @@ nvme_fc_resume_controller(struct nvme_fc_ctrl *ctrl)
queue_delayed_work(nvme_wq, >connect_work, 0);
break;
  
-	case NVME_CTRL_RESETTING:

+   case NVME_CTRL_RESET_PREPARE:
/*
 * Controller is already in the process of terminating the
 * association. No need to do anything further. The reconnect
@@ -789,7 +789,7 @@ nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
 */
break;
  
-	case NVME_CTRL_RESETTING:

+   case NVME_CTRL_RESET_PREPARE:
/*
 * Controller is already in the process of terminating the
 * association.  No need to do anything further. The reconnect
diff --git a/drivers/nvme/host/nvme.h