Re: [PATCH v4] ibmvscsis: Do not send aborted task response

2017-05-08 Thread Nicholas A. Bellinger
Hi Bryant,

Given we're almost out of time for -rc1, I'd like to avoid
having to rebase the handful of patches that are atop the -v3 that was
applied to target-pending/for-next over the weekend...

So if you'd be so kind, please post an incremental patch atop -v3, and
I'll apply that instead.

On Mon, 2017-05-08 at 17:07 -0500, Bryant G. Ly wrote:
> The driver is sending a response to the actual scsi op that was
> aborted by an abort task TM, while LIO is sending a response to
> the abort task TM.
> 
> ibmvscsis_tgt does not send the response to the client until
> release_cmd time. The reason for this was because if we did it
> at queue_status time, then the client would be free to reuse the
> tag for that command, but we're still using the tag until the
> command is released at release_cmd time, so we chose to delay
> sending the response until then. That then caused this issue, because
> release_cmd is always called, even if queue_status is not.
> 
> The SCSI spec says that the initiator that sends the abort task
> TM NEVER gets a response to the aborted op, yet with the current
> code a response is sent. Thus this fix removes that response
> when CMD_T_ABORTED && !CMD_T_TAS.
> 
> Another case with a small timing window is the case where if LIO sends a
> TMR_DOES_NOT_EXIST, and the release_cmd callback is called for the TMR Abort
> cmd before the release_cmd for the (attempted) aborted cmd, then we need to
> ensure that we send the response for the (attempted) abort cmd to the client
> before we send the response for the TMR Abort cmd.
> 
> Forgot to reset the cmd->abort_cmd pointer after finding it in send_messages.
> 
> Cc:  # v4.8+
> Signed-off-by: Bryant G. Ly 
> Signed-off-by: Michael Cyr 
> ---
>  drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 117 ---
>  drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.h |   2 +
>  2 files changed, 94 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> index 0f80779..ee64241 100644
> --- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> +++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> @@ -1170,6 +1170,9 @@ static struct ibmvscsis_cmd *ibmvscsis_get_free_cmd(struct scsi_info *vscsi)
>   cmd = list_first_entry_or_null(&vscsi->free_cmd,
>  struct ibmvscsis_cmd, list);
>   if (cmd) {
> + if (cmd->abort_cmd)
> + cmd->abort_cmd = NULL;
> + cmd->flags &= ~(DELAY_SEND);
>   list_del(&cmd->list);
>   cmd->iue = iue;
>   cmd->type = UNSET_TYPE;
> @@ -1749,45 +1752,80 @@ static void srp_snd_msg_failed(struct scsi_info *vscsi, long rc)
>  static void ibmvscsis_send_messages(struct scsi_info *vscsi)
>  {
>   u64 msg_hi = 0;
> - /* note do not attmempt to access the IU_data_ptr with this pointer
> + /* note do not attempt to access the IU_data_ptr with this pointer
>* it is not valid
>*/
>   struct viosrp_crq *crq = (struct viosrp_crq *)&msg_hi;
>   struct ibmvscsis_cmd *cmd, *nxt;
>   struct iu_entry *iue;
>   long rc = ADAPT_SUCCESS;
> + bool retry = false;
>  
>   if (!(vscsi->flags & RESPONSE_Q_DOWN)) {
> - list_for_each_entry_safe(cmd, nxt, &vscsi->waiting_rsp, list) {
> - iue = cmd->iue;
> + do {
> + retry = false;
> + list_for_each_entry_safe(cmd, nxt, &vscsi->waiting_rsp,
> +  list) {
> + /*
> +  * Check to make sure abort cmd gets processed
> +  * prior to the abort tmr cmd
> +  */
> + if (cmd->flags & DELAY_SEND)
> + continue;
>  
> - crq->valid = VALID_CMD_RESP_EL;
> - crq->format = cmd->rsp.format;
> + if (cmd->abort_cmd) {
> + retry = true;
> + cmd->abort_cmd->flags &= ~(DELAY_SEND);
> + cmd->abort_cmd = NULL;
> + }
>  
> - if (cmd->flags & CMD_FAST_FAIL)
> - crq->status = VIOSRP_ADAPTER_FAIL;
> + /*
> +  * If CMD_T_ABORTED w/o CMD_T_TAS scenarios and
> +  * the case where LIO issued a
> +  * ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST
> +  * case then we dont send a response, since it
> +  * was already done.
> +  */
> 

Re: [PATCH] scsi: qedf: properly update arguments position in function call

2017-05-08 Thread Gustavo A. R. Silva

Hi Martin,

Quoting "Martin K. Petersen" :


Gustavo A.,


Properly update the position of the arguments in function call.


Applied to 4.12/scsi-fixes, thank you!



Awesome, glad to help. :)
--
Gustavo A. R. Silva







Re: [PATCH] scsi: pmcraid: remove redundant check to see if request_size is less than zero

2017-05-08 Thread Martin K. Petersen

Colin,

> The 2nd check to see if request_size is less than zero is redundant
> because the first check takes error exit path on this condition. So,
> since it is redundant, remove it.

Applied to 4.12/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: lpfc: ensure els_wq is being checked before destroying it

2017-05-08 Thread Martin K. Petersen

Colin,

> I believe there is a typo in the wq destroy of els_wq: currently the
> driver is checking that els_cq is not null, and I think this should be
> a check on els_wq instead.
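
The shape of the fix, as a generic sketch (placeholder struct and names, not
the actual lpfc code):

	#include <stdlib.h>

	struct els_queues {
		void *els_cq;	/* completion queue */
		void *els_wq;	/* work queue */
	};

	static void destroy_els_wq(struct els_queues *q)
	{
		/* Buggy form described above: tests els_cq but destroys els_wq.
		 *
		 *	if (q->els_cq)
		 *		free(q->els_wq);
		 */

		/* Intended form: check the object that is actually being destroyed. */
		if (q->els_wq) {
			free(q->els_wq);
			q->els_wq = NULL;
		}
	}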

Applied to 4.12/scsi-fixes. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] cxlflash: Select IRQ_POLL

2017-05-08 Thread Martin K. Petersen

Guenter,

> The driver now uses IRQ_POLL and needs to select it to avoid the following
> build error.

Applied to 4.12/scsi-fixes, thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: qedf: Avoid reading past end of buffer

2017-05-08 Thread Martin K. Petersen

Kees,

> Using memcpy() from a string that is shorter than the length copied
> means the destination buffer is being filled with arbitrary data from
> the kernel rodata segment. Instead, use strncpy() which will fill the
> trailing bytes with zeros.
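
For illustration only, a tiny userspace sketch of the pattern described above
(generic names, not the actual qedf code):

	#include <string.h>

	static void fill_name(char dst[16])
	{
		const char *src = "qedf";	/* 5 bytes including the terminating NUL */

		/* memcpy() with the full destination length reads 11 bytes past the
		 * end of the string literal, so whatever follows it in rodata lands
		 * in dst.
		 */
		memcpy(dst, src, 16);

		/* strncpy() copies "qedf" and zero-fills the remaining 11 bytes. */
		strncpy(dst, src, 16);
	}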

Applied to 4.12/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: remove REQ_OP_WRITE_SAME

2017-05-08 Thread Martin K. Petersen

Christoph,

> Any chance to get a sneak preview of that work?

I have been on the road since LSF/MM and just got back home. I'll make
it a priority.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [REEEEPOST] bnx2i + bnx2fc: convert to generic workqueue (#3)

2017-05-08 Thread Martin K. Petersen

Sebastian,

> Martin, do you see any chance to get this merged? Chad replied to the
> list that he is going to test it on 2017-04-10, didn't respond to the
> ping 10 days later. The series stalled last time in the same way.

I am very reluctant to merge something when a driver has an active
maintainer and that person has not acked the change.

That said, Chad: You have been sitting on this for quite a while. Please
make it a priority. In exchange for veto rights you do have to provide
timely feedback on anything that touches your driver.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] sd: Ignore sync cache failures when not supported

2017-05-08 Thread Martin K. Petersen

Christoph,

> Normally we'd just pass the scsi_sense_hdr structure in from the caller
> if we care about sense data.  Is this something you considered?
>
> Otherwise this looks fine to me.

I agree with Christoph that passing the sense header would be more
consistent with the rest of the SCSI code. Even if we only need the key
in this case.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: qedf: Cleanup the type of io_log->op

2017-05-08 Thread Martin K. Petersen

Dan,

> We store sc_cmd->cmnd[0] which is an unsigned char in io_log->op so
> this should also be unsigned char.  The other thing is that this is
> displayed in the debugfs:

Applied to 4.12/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: lpfc: double lock typo in lpfc_ns_rsp()

2017-05-08 Thread Martin K. Petersen

Dan,

> There is a double lock bug here so this will deadlock instead of
> unlocking.
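
The shape of that bug, as a tiny userspace sketch (the lpfc code takes the
SCSI host lock, but the pattern is the same):

	#include <pthread.h>

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

	static void update_state(void)
	{
		pthread_mutex_lock(&lock);
		/* ... modify shared state ... */
		pthread_mutex_lock(&lock);	/* typo: should be pthread_mutex_unlock();
						 * with a default (non-recursive) mutex this
						 * second lock simply never returns */
	}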

Applied to 4.12/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: qedf: properly update arguments position in function call

2017-05-08 Thread Martin K. Petersen

Gustavo A.,

> Properly update the position of the arguments in function call.

Applied to 4.12/scsi-fixes, thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi_lib: Add #include

2017-05-08 Thread Martin K. Petersen

Bart,

> This patch avoids that when building with W=1 the compiler complains
> that __scsi_init_queue() has not been declared.  See also commit
> d48777a633d6 ("scsi: remove __scsi_alloc_queue").

Applied to 4.12/scsi-fixes. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH, RFC] MAINTAINERS: update OSD entries

2017-05-08 Thread Martin K. Petersen

Christoph,

> The open-osd domain doesn't exist anymore, and mails to the list lead
> to really annoying bounces that repeat every day.
>
> Also the primarydata address for Benny bounces, and while I have a new
> one for him he doesn't seem to be maintaining the OSD code any more.
>
> Which begs the question: should we really leave the Supported status
> in MAINTAINERS given that the code is barely maintained?

Applied to 4.12/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH]scsi: megaraid_sas: fix raid card hotswap failure

2017-05-08 Thread Martin K. Petersen

Zhou,

> When a scsi_device is unplugged from the scsi controller, if the
> scsi_device is still being used by the application layer, it won't be
> released until its users release it. In this case, scsi_device_remove
> just sets the scsi_device's state to SDEV_DEL. But if you plug the disk
> back in before the old scsi_device is released, then there will be two
> scsi_device structures in scsi_host->__devices. When the next
> unplugging event happens, some low-level drivers will check whether the
> scsi_device has been added to the host (for example, the megaraid sas
> series controller) by calling scsi_device_lookup (which calls
> __scsi_device_lookup) in megasas_aen_polling. __scsi_device_lookup will
> return the first scsi_device. Because its state is SDEV_DEL,
> scsi_device_lookup will finally return NULL, making the low-level
> driver assume that the scsi_device has been removed, so it won't call
> scsi_device_remove, which leads to the failure of hot swap.
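
A simplified model of the lookup path being described, for illustration only
(placeholder types, not the actual SCSI core code):

	#include <stddef.h>

	enum sdev_state { SDEV_RUNNING, SDEV_DEL };

	struct sdev {
		int id;
		enum sdev_state state;
		struct sdev *next;		/* stand-in for scsi_host->__devices */
	};

	/* Like __scsi_device_lookup(): return the first id match, ignoring state. */
	static struct sdev *lookup_first(struct sdev *head, int id)
	{
		for (; head; head = head->next)
			if (head->id == id)
				return head;
		return NULL;
	}

	/* Like scsi_device_lookup(): the "get" step rejects a SDEV_DEL device, so
	 * while the stale entry is still first in the list the caller sees NULL,
	 * even though a freshly plugged, live entry sits further down the list.
	 */
	static struct sdev *lookup_live(struct sdev *head, int id)
	{
		struct sdev *sdev = lookup_first(head, id);

		if (sdev && sdev->state == SDEV_DEL)
			return NULL;
		return sdev;
	}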

Applied to 4.12/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v3] lpfc: Fix panic on BFS configuration.

2017-05-08 Thread Martin K. Petersen

James,

> Fix is to reset the sli-3 function before sending the mailbox command,
> thus synchronizing the function/driver on mailbox location.

Applied to 4.12/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] libfc: do not flood console with messages 'libfc: queue full ...'

2017-05-08 Thread Martin K. Petersen

Hannes,

> When the FCoE sending side becomes congested libfc tries to reduce the
> queue depth on the host; however due to the built-in lag before
> attempting to ramp down the queue depth _again_ the message log is
> flooded with messages
>
> libfc: queue full, reducing can_queue to 512
>
> With this patch the message is printed only once (ie when it's
> actually changed).
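
The idea behind the change, as a minimal userspace sketch (illustrative only,
not the actual libfc ramp-down code):

	#include <stdio.h>

	/* Halve the queue depth, but only log when the value actually changes,
	 * so repeated calls while the queue stays congested remain silent.
	 */
	static int ramp_down_can_queue(int can_queue)
	{
		int new_depth = can_queue / 2;

		if (new_depth < 1)
			new_depth = 1;
		if (new_depth != can_queue)
			printf("libfc: queue full, reducing can_queue to %d\n",
			       new_depth);
		return new_depth;
	}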

Applied to 4.12/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH v4] ibmvscsis: Do not send aborted task response

2017-05-08 Thread Bryant G. Ly
The driver is sending a response to the actual scsi op that was
aborted by an abort task TM, while LIO is sending a response to
the abort task TM.

ibmvscsis_tgt does not send the response to the client until
release_cmd time. The reason for this was because if we did it
at queue_status time, then the client would be free to reuse the
tag for that command, but we're still using the tag until the
command is released at release_cmd time, so we chose to delay
sending the response until then. That then caused this issue, because
release_cmd is always called, even if queue_status is not.

The SCSI spec says that the initiator that sends the abort task
TM NEVER gets a response to the aborted op, yet with the current
code a response is sent. Thus this fix removes that response
when CMD_T_ABORTED && !CMD_T_TAS.

Another case with a small timing window is the case where if LIO sends a
TMR_DOES_NOT_EXIST, and the release_cmd callback is called for the TMR Abort
cmd before the release_cmd for the (attempted) aborted cmd, then we need to
ensure that we send the response for the (attempted) abort cmd to the client
before we send the response for the TMR Abort cmd.

Forgot to reset the cmd->abort_cmd pointer after finding it in send_messages.

Cc:  # v4.8+
Signed-off-by: Bryant G. Ly 
Signed-off-by: Michael Cyr 
---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 117 ---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.h |   2 +
 2 files changed, 94 insertions(+), 25 deletions(-)

diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
index 0f80779..ee64241 100644
--- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
+++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
@@ -1170,6 +1170,9 @@ static struct ibmvscsis_cmd *ibmvscsis_get_free_cmd(struct scsi_info *vscsi)
cmd = list_first_entry_or_null(&vscsi->free_cmd,
   struct ibmvscsis_cmd, list);
if (cmd) {
+   if (cmd->abort_cmd)
+   cmd->abort_cmd = NULL;
+   cmd->flags &= ~(DELAY_SEND);
list_del(&cmd->list);
cmd->iue = iue;
cmd->type = UNSET_TYPE;
@@ -1749,45 +1752,80 @@ static void srp_snd_msg_failed(struct scsi_info *vscsi, long rc)
 static void ibmvscsis_send_messages(struct scsi_info *vscsi)
 {
u64 msg_hi = 0;
-   /* note do not attmempt to access the IU_data_ptr with this pointer
+   /* note do not attempt to access the IU_data_ptr with this pointer
 * it is not valid
 */
struct viosrp_crq *crq = (struct viosrp_crq *)&msg_hi;
struct ibmvscsis_cmd *cmd, *nxt;
struct iu_entry *iue;
long rc = ADAPT_SUCCESS;
+   bool retry = false;
 
if (!(vscsi->flags & RESPONSE_Q_DOWN)) {
-   list_for_each_entry_safe(cmd, nxt, &vscsi->waiting_rsp, list) {
-   iue = cmd->iue;
+   do {
+   retry = false;
+   list_for_each_entry_safe(cmd, nxt, &vscsi->waiting_rsp,
+list) {
+   /*
+* Check to make sure abort cmd gets processed
+* prior to the abort tmr cmd
+*/
+   if (cmd->flags & DELAY_SEND)
+   continue;
 
-   crq->valid = VALID_CMD_RESP_EL;
-   crq->format = cmd->rsp.format;
+   if (cmd->abort_cmd) {
+   retry = true;
+   cmd->abort_cmd->flags &= ~(DELAY_SEND);
+   cmd->abort_cmd = NULL;
+   }
 
-   if (cmd->flags & CMD_FAST_FAIL)
-   crq->status = VIOSRP_ADAPTER_FAIL;
+   /*
+* If CMD_T_ABORTED w/o CMD_T_TAS scenarios and
+* the case where LIO issued a
+* ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST
+* case then we dont send a response, since it
+* was already done.
+*/
+   if (cmd->se_cmd.transport_state & CMD_T_ABORTED &&
+       !(cmd->se_cmd.transport_state & CMD_T_TAS)) {
+   list_del(>list);
+   ibmvscsis_free_cmd_resources(vscsi,
+cmd);
+   } else {
+   

Re: Race to power off harming SATA SSDs

2017-05-08 Thread Pavel Machek
On Mon 2017-05-08 16:40:11, David Woodhouse wrote:
> On Mon, 2017-05-08 at 13:50 +0200, Boris Brezillon wrote:
> > On Mon, 08 May 2017 11:13:10 +0100
> > David Woodhouse  wrote:
> > 
> > > 
> > > On Mon, 2017-05-08 at 11:09 +0200, Hans de Goede wrote:
> > > > 
> > > > You're forgetting that the SSD itself (this thread is about SSDs) also 
> > > > has
> > > > a major software component which is doing housekeeping all the time, so 
> > > > even
> > > > if the main CPU gets reset the SSD's controller may still happily be 
> > > > erasing
> > > > blocks.  
> > > We're not really talking about SSDs at all any more; we're talking
> > > about real flash with real maintainable software.
> >
> > It's probably a good sign that this new discussion should take place in
> > a different thread :-).
> 
> Well, maybe. But it was a silly thread in the first place. SATA SSDs
> aren't *expected* to be reliable.

Citation needed?

I'm pretty sure SATA SSDs are expected to be reliable, up to maximum
amount of gigabytes written (specified by manufacturer), as long as
you don't cut power without warning.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


RE: [PATCH 00/19] aacraid: Patchset with reset rework and misc fixes

2017-05-08 Thread Raghava Aditya Renukunta
Hi,

The patch set has been sent twice. Please ignore the later sent/received
patch set.

I had sent the original patch set on Saturday, but I did not receive a
confirmation, nor did any mailing list archive (http://marc.info/?l=linux-scsi)
pick it up (I should have waited). The next day I resent the patch set and
here we are.

Regards,
Raghava Aditya

> -Original Message-
> From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
> ow...@vger.kernel.org] On Behalf Of Raghava Aditya Renukunta
> Sent: Sunday, May 7, 2017 6:34 AM
> To: j...@linux.vnet.ibm.com; martin.peter...@oracle.com; linux-
> s...@vger.kernel.org
> Cc: Dave Carroll ; Gana Sridaran
> ; Scott Benesh
> ; Prasad Munirathnam
> 
> Subject: [PATCH 00/19] aacraid: Patchset with reset rework and misc fixes
> 
> This patchset primarily focuses on tweaking and hardening the controller
> reset support for both ARC and HBA1000 devices. Now the driver can only
> reset the controller thru eh reset. Included a srb memory fix and pci
> dma allocation fix.
> 
> Raghava Aditya Renukunta (19):
>  [SCSI] aacraid: Remove GFP_DMA for raw srb memory
>  [SCSI] aacraid: Fix DMAR issues with  iommu=pt
>  [SCSI] aacraid: Added 32 and 64 queue depth for arc natives
>  [SCSI] aacraid: Set correct Queue Depth for HBA1000 RAW disks
>  [SCSI] aacraid: Remove reset support from check_health
>  [SCSI] aacraid: Change wait time for fib completion
>  [SCSI] aacraid: Log count info of scsi cmds before reset
>  [SCSI] aacraid: Print ctrl status before eh reset
>  [SCSI] aacraid: Using single reset mask for IOP reset
>  [SCSI] aacraid: Rework IOP reset
>  [SCSI] aacraid: Add periodic checks to see IOP reset status
>  [SCSI] aacraid: Rework SOFT reset code
>  [SCSI] aacraid: Rework aac_src_restart
>  [SCSI] aacraid: Use correct function to get ctrl health
>  [SCSI] aacraid: Make sure ioctl returns on controller reset
>  [SCSI] aacraid: Enable ctrl reset for both hba and arc
>  [SCSI] aacraid: Add reset debugging statements
>  [SCSI] aacraid: Remove reference to Series-9
>  [SCSI] aacraid: Update driver version to 50834
> 
>  drivers/scsi/aacraid/aachba.c   |  17 ++-
>  drivers/scsi/aacraid/aacraid.h  |  22 +++-
>  drivers/scsi/aacraid/commctrl.c |  13 ++-
>  drivers/scsi/aacraid/comminit.c |  18 +---
>  drivers/scsi/aacraid/commsup.c  |  73 +++--
>  drivers/scsi/aacraid/linit.c| 232 ----
>  drivers/scsi/aacraid/src.c  | 136 +--
>  7 files changed, 296 insertions(+), 215 deletions(-)
> 
> --
> 2.7.4



Re: Race to power off harming SATA SSDs

2017-05-08 Thread Tejun Heo
Hello,

On Mon, May 08, 2017 at 08:56:15PM +0200, Pavel Machek wrote:
> Well... the SMART counter tells us that the device was not shut down
> correctly. Do we have reason to believe that it is _not_ telling us
> truth? It is more than one device.

It also finished power off command successfully.

> SSDs die when you power them without warning:
> http://lkcl.net/reports/ssd_analysis.html
> 
> What kind of data would you like to see? "I have been using linux and
> my SSD died"? We have had such reports. "I have killed 10 SSDs in a
> week then I added one second delay, and this SSD survived 6 months"?

Repeating shutdown cycles and showing that the device actually is in
trouble would be great.  It doesn't have to reach full-on device
failure.  Showing some sign of corruption would be enough - increase
in CRC failure counts, bad block counts (a lot of devices report
remaining reserve or lifetime in one way or the other) and so on.
Right now, it might as well be just the SMART counter being funky.

Thanks.

-- 
tejun


Re: Race to power off harming SATA SSDs

2017-05-08 Thread Pavel Machek
On Mon 2017-05-08 13:43:03, Tejun Heo wrote:
> Hello,
> 
> On Mon, May 08, 2017 at 06:43:22PM +0200, Pavel Machek wrote:
> > What I was trying to point out was that storage people try to treat
> > SSDs as HDDs... and SSDs are very different. Harddrives mostly survive
> > powerfails (with emergency parking), while it is very, very difficult
> > to make SSD survive random powerfail, and we have to make sure we
> > always powerdown SSDs "cleanly".
> 
> We do.
> 
> The issue raised is that some SSDs still increment the unexpected
> power loss count even after clean shutdown sequence and that the
> kernel should wait for some secs before powering off.
> 
> We can do that for select devices but I want something more than "this
> SMART counter is getting incremented" before doing that.

Well... the SMART counter tells us that the device was not shut down
correctly. Do we have reason to believe that it is _not_ telling us
truth? It is more than one device.

SSDs die when you power them without warning:
http://lkcl.net/reports/ssd_analysis.html

What kind of data would you like to see? "I have been using linux and
my SSD died"? We have had such reports. "I have killed 10 SSDs in a
week then I added one second delay, and this SSD survived 6 months"?


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html




RE: Race to power off harming SATA SSDs

2017-05-08 Thread Atlant Schmidt
> Well, you are right.. and I'm responsible.
>
> What I was trying to point out was that storage people try to treat SSDs as 
> HDDs...
> and SSDs are very different. Harddrives mostly survive powerfails (with 
> emergency
> parking), while it is very, very difficult to make SSD survive random 
> powerfail,
> and we have to make sure we always powerdown SSDs "cleanly".

  It all depends on the class of SSD that we're discussing.
  "Enterprise class" SSDs will often use either ultracapacitors
  or batteries to allow them to successfully complete all of
  the necessary operations upon a power cut.



Re: Race to power off harming SATA SSDs

2017-05-08 Thread Tejun Heo
Hello,

On Mon, May 08, 2017 at 06:43:22PM +0200, Pavel Machek wrote:
> What I was trying to point out was that storage people try to treat
> SSDs as HDDs... and SSDs are very different. Harddrives mostly survive
> powerfails (with emergency parking), while it is very, very difficult
> to make SSD survive random powerfail, and we have to make sure we
> always powerdown SSDs "cleanly".

We do.

The issue raised is that some SSDs still increment the unexpected
power loss count even after clean shutdown sequence and that the
kernel should wait for some secs before powering off.

We can do that for select devices but I want something more than "this
SMART counter is getting incremented" before doing that.

Thanks.

-- 
tejun


Re: Race to power off harming SATA SSDs

2017-05-08 Thread Pavel Machek
On Mon 2017-05-08 13:50:05, Boris Brezillon wrote:
> On Mon, 08 May 2017 11:13:10 +0100
> David Woodhouse  wrote:
> 
> > On Mon, 2017-05-08 at 11:09 +0200, Hans de Goede wrote:
> > > You're forgetting that the SSD itself (this thread is about SSDs) also has
> > > a major software component which is doing housekeeping all the time, so 
> > > even
> > > if the main CPU gets reset the SSD's controller may still happily be 
> > > erasing
> > > blocks.  
> > 
> > We're not really talking about SSDs at all any more; we're talking
> > about real flash with real maintainable software.
> 
> It's probably a good sign that this new discussion should take place in
> a different thread :-).

Well, you are right.. and I'm responsible.

What I was trying to point out was that storage people try to treat
SSDs as HDDs... and SSDs are very different. Harddrives mostly survive
powerfails (with emergency parking), while it is very, very difficult
to make SSD survive random powerfail, and we have to make sure we
always powerdown SSDs "cleanly".

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html




Re: Race to power off harming SATA SSDs

2017-05-08 Thread David Woodhouse
On Mon, 2017-05-08 at 13:50 +0200, Boris Brezillon wrote:
> On Mon, 08 May 2017 11:13:10 +0100
> David Woodhouse  wrote:
> 
> > 
> > On Mon, 2017-05-08 at 11:09 +0200, Hans de Goede wrote:
> > > 
> > > You're forgetting that the SSD itself (this thread is about SSDs) also has
> > > a major software component which is doing housekeeping all the time, so 
> > > even
> > > if the main CPU gets reset the SSD's controller may still happily be 
> > > erasing
> > > blocks.  
> > We're not really talking about SSDs at all any more; we're talking
> > about real flash with real maintainable software.
>
> It's probably a good sign that this new discussion should take place in
> a different thread :-).

Well, maybe. But it was a silly thread in the first place. SATA SSDs
aren't *expected* to be reliable.



Re: [PATCH] cxlflash: Select IRQ_POLL

2017-05-08 Thread Matthew R. Ochs
> On May 5, 2017, at 6:31 PM, Guenter Roeck  wrote:
> 
> The driver now uses IRQ_POLL and needs to select it to avoid the following
> build error.
> 
> ERROR: ".irq_poll_complete" [drivers/scsi/cxlflash/cxlflash.ko] undefined!
> ERROR: ".irq_poll_sched" [drivers/scsi/cxlflash/cxlflash.ko] undefined!
> ERROR: ".irq_poll_disable" [drivers/scsi/cxlflash/cxlflash.ko] undefined!
> ERROR: ".irq_poll_init" [drivers/scsi/cxlflash/cxlflash.ko] undefined!
> 
> Fixes: cba06e6de403 ("scsi: cxlflash: Implement IRQ polling for RRQ 
> processing")
> Signed-off-by: Guenter Roeck 

Acked-by: Matthew R. Ochs 



Re: Race to power off harming SATA SSDs

2017-05-08 Thread Pavel Machek
Hi!

> > 'clean marker' is a good idea... empty pages have plenty of space.
> 
> Well... you lose that space permanently. Although I suppose you could
> do things differently and erase a block immediately prior to using it.
> But in that case why ever write the cleanmarker? Just maintain a set of
> blocks that you *will* erase and re-use.

Yes, but erase is slow so that would hurt performance...?

> > How do you handle the issue during regular write? Always ignore last
> > successfully written block?
> 
> Log nodes have a CRC. If you get interrupted during a write, that CRC
> should fail.

Umm. That is not what "unstable bits" issue is about, right?

If you are interrupted during write, you can get into a state where
readback will be correct on next boot (CRC, ECC ok), but then the bits
will go bad a few hours after that. You can't rely on checksums to
detect that.. because the bits will have the right values -- for a while.

> > Do you handle "paired pages" problem on MLC?
> 
> No. It would theoretically be possible, by not considering a write to
> the first page "committed" until the second page of the pair is also
> written. Essentially, it's not far off expanding the existing 'wbuf'
> which we use to gather writes into full pages for NAND, to cover the
> *whole* of the set of pages which are affected by MLC.
> 
> But we mostly consider JFFS2 to be obsolete these days, in favour of
> UBI/UBIFS or other approaches.

Yes, I guess MLC NAND chips are mostly too big for jffs2.
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Race to power off harming SATA SSDs

2017-05-08 Thread Richard Weinberger
Boris,

Am 08.05.2017 um 13:48 schrieb Boris Brezillon:
>>> How do you handle the issue during regular write? Always ignore last
>>> successfully written block?  
> 
> I guess UBIFS can know what was written last, because of the log-based
> approach + the seqnum stored along with FS nodes, but I'm pretty sure
> UBIFS does not re-write the last written block in case of an unclean
> mount. Richard, am I wrong?

Yes. UBIFS has the machinery but uses it differently. When it faces ECC
errors while replaying the journal it can recover good data from the LEB.
It assumes that an interrupted write always leads to ECC errors.

>>
>> The last page of a block is inspected and allowed to be corrupted.
> 
> Actually, it's not really about corrupted pages, it's about pages that
> might become unreadable after a few reads.

As stated before, it assumes an ECC error from an interrupted read.

We could automatically re-write everything in UBIFS that was written last
but we don't have this information for data UBI itself wrote since UBI has
no journal.

If unstable bits can be triggered with current systems we can think of a clever
trick to deal with that. So far nobody was able to show me the problem.

Thanks,
//richard


Re: [PATCH] scsi: qla4xxx: check for null return from iscsi_lookup_endpoint

2017-05-08 Thread Dan Carpenter
This should be CC'd to qlogic-storage-upstr...@qlogic.com as well.

regards,
dan carpenter

On Sun, May 07, 2017 at 10:30:20PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> iscsi_lookup_endpoint can potentially return null and in 9 out of
> the 10 calls to this function a null return is checked, so I think
> it is pertinent to perform a null check here too and return -EINVAL
> as in the other null cases.
> 
> Detected by CoverityScan, CID#147282 ("Dereference null return value")
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/scsi/qla4xxx/ql4_os.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
> index 64c6fa563fdb..803e342e1093 100644
> --- a/drivers/scsi/qla4xxx/ql4_os.c
> +++ b/drivers/scsi/qla4xxx/ql4_os.c
> @@ -3189,6 +3189,8 @@ static int qla4xxx_conn_bind(struct iscsi_cls_session *cls_session,
>   if (iscsi_conn_bind(cls_session, cls_conn, is_leading))
>   return -EINVAL;
>   ep = iscsi_lookup_endpoint(transport_fd);
> + if (!ep)
> + return -EINVAL;
>   conn = cls_conn->dd_data;
>   qla_conn = conn->dd_data;
>   qla_conn->qla_ep = ep->dd_data;
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Race to power off harming SATA SSDs

2017-05-08 Thread Boris Brezillon
On Mon, 8 May 2017 13:48:07 +0200
Boris Brezillon  wrote:

> On Mon, 8 May 2017 13:06:17 +0200
> Richard Weinberger  wrote:
> 
> > On Mon, May 8, 2017 at 12:49 PM, Pavel Machek  wrote:  
> > > Aha, nice, so it looks like ubifs is a step back here.
> > >
> > > 'clean marker' is a good idea... empty pages have plenty of space.
> > 
> > If UBI (not UBIFS) faces an empty block, it also re-erases it.  
> 
> Unfortunately, that's not the case, though UBI can easily be patched
> to do that (see below).

Sorry for the noise, I was wrong, UBI already re-erases empty blocks
[1].

[1]http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/ubi/attach.c#L983


Re: Race to power off harming SATA SSDs

2017-05-08 Thread Boris Brezillon
On Mon, 08 May 2017 11:13:10 +0100
David Woodhouse  wrote:

> On Mon, 2017-05-08 at 11:09 +0200, Hans de Goede wrote:
> > You're forgetting that the SSD itself (this thread is about SSDs) also has
> > a major software component which is doing housekeeping all the time, so even
> > if the main CPU gets reset the SSD's controller may still happily be erasing
> > blocks.  
> 
> We're not really talking about SSDs at all any more; we're talking
> about real flash with real maintainable software.

It's probably a good sign that this new discussion should take place in
a different thread :-).


[PATCH] qlogicpti: Fix resource releasing order in error handling path

2017-05-08 Thread Christophe JAILLET
Reorder 'fail_free_irq' and 'fail_unmap_regs' in order to correctly free
resources  in case of error.

Signed-off-by: Christophe JAILLET 
---
 drivers/scsi/qlogicpti.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/qlogicpti.c b/drivers/scsi/qlogicpti.c
index 69bfc0a1aea3..e5aeb26bd9b1 100644
--- a/drivers/scsi/qlogicpti.c
+++ b/drivers/scsi/qlogicpti.c
@@ -1385,6 +1385,9 @@ static int qpti_sbus_probe(struct platform_device *op)
  qpti->req_cpu, qpti->req_dvma);
 #undef QSIZE
 
+fail_free_irq:
+   free_irq(qpti->irq, qpti);
+
 fail_unmap_regs:
of_iounmap(&op->resource[0], qpti->qregs,
   resource_size(&op->resource[0]));
@@ -1392,9 +1395,6 @@ static int qpti_sbus_probe(struct platform_device *op)
of_iounmap(&op->resource[0], qpti->sreg,
   sizeof(unsigned char));
 
-fail_free_irq:
-   free_irq(qpti->irq, qpti);
-
 fail_unlink:
scsi_host_put(host);
 
-- 
2.11.0



Re: Race to power off harming SATA SSDs

2017-05-08 Thread David Woodhouse
On Mon, 2017-05-08 at 12:49 +0200, Pavel Machek wrote:
> On Mon 2017-05-08 10:34:08, David Woodhouse wrote:
> > 
> > On Mon, 2017-05-08 at 11:28 +0200, Pavel Machek wrote:
> > > 
> > > 
> > > Are you sure you have it right in JFFS2? Do you journal block erases?
> > > Apparently, that was pretty much non-issue on older flashes.
> > It isn't necessary in JFFS2. It is a *purely* log-structured file
> > system (which is why it doesn't scale well past the 1GiB or so that we
> > made it handle for OLPC).
> > 
> > So we don't erase a block until all its contents are obsolete. And if
> > we fail to complete the erase... well the contents are either going to
> > fail a CRC check, or... still be obsoleted by later entries elsewhere.
> > 
> > And even if it *looks* like an erase has completed and the block is all
> > 0xFF, we erase it again and write a 'clean marker' to it to indicate
> > that the erase was completed successfully. Because otherwise it can't
> > be trusted.
> Aha, nice, so it looks like ubifs is a step back here.
> 
> 'clean marker' is a good idea... empty pages have plenty of space.

Well... you lose that space permanently. Although I suppose you could
do things differently and erase a block immediately prior to using it.
But in that case why ever write the cleanmarker? Just maintain a set of
blocks that you *will* erase and re-use.

> How do you handle the issue during regular write? Always ignore last
> successfully written block?

Log nodes have a CRC. If you get interrupted during a write, that CRC
should fail.

> Do you handle "paired pages" problem on MLC?

No. It would theoretically be possible, by not considering a write to
the first page "committed" until the second page of the pair is also
written. Essentially, it's not far off expanding the existing 'wbuf'
which we use to gather writes into full pages for NAND, to cover the
*whole* of the set of pages which are affected by MLC.

But we mostly consider JFFS2 to be obsolete these days, in favour of
UBI/UBIFS or other approaches.



Re: Race to power off harming SATA SSDs

2017-05-08 Thread Richard Weinberger
On Mon, May 8, 2017 at 12:49 PM, Pavel Machek  wrote:
> Aha, nice, so it looks like ubifs is a step back here.
>
> 'clean marker' is a good idea... empty pages have plenty of space.

If UBI (not UBIFS) faces an empty block, it also re-erases it.
The EC header is uses as clean marker.

> How do you handle the issue during regular write? Always ignore last
> successfully written block?

The last page of a block is inspected and allowed to be corrupted.

> Do you handle "paired pages" problem on MLC?

Nope, no MLC support in mainline so far.

-- 
Thanks,
//richard


Re: Race to power off harming SATA SSDs

2017-05-08 Thread Pavel Machek
On Mon 2017-05-08 10:34:08, David Woodhouse wrote:
> On Mon, 2017-05-08 at 11:28 +0200, Pavel Machek wrote:
> > 
> > Are you sure you have it right in JFFS2? Do you journal block erases?
> > Apparently, that was pretty much non-issue on older flashes.
> 
> It isn't necessary in JFFS2. It is a *purely* log-structured file
> system (which is why it doesn't scale well past the 1GiB or so that we
> made it handle for OLPC).
> 
> So we don't erase a block until all its contents are obsolete. And if
> we fail to complete the erase... well the contents are either going to
> fail a CRC check, or... still be obsoleted by later entries elsewhere.
> 
> And even if it *looks* like an erase has completed and the block is all
> 0xFF, we erase it again and write a 'clean marker' to it to indicate
> that the erase was completed successfully. Because otherwise it can't
> be trusted.

Aha, nice, so it looks like ubifs is a step back here.

'clean marker' is a good idea... empty pages have plenty of space.

How do you handle the issue during regular write? Always ignore last
successfully written block?

Do you handle "paired pages" problem on MLC?

Best regards,
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Race to power off harming SATA SSDs

2017-05-08 Thread David Woodhouse
On Mon, 2017-05-08 at 11:06 +0200, Ricard Wanderlof wrote:
> 
> My point is really that, say, the problem is in fact not that the erase
> is cut short due to the power fail, but that the software issues a second
> command before the first erase command has completed, for instance, or
> some other situation. Then we'd have a concrete situation which we can
> resolve (i.e., fix the bug), rather than assuming that it's the hardware's
> fault and implementing various software workarounds.

On NOR flash we have *definitely* seen it during powerfail testing.

A block looks like it's all 0xFF when you read it back on mount, but if
you read it repeatedly, you may see bit flips because it wasn't
completely erased. And even if you read it ten times and 'trust' that
it's properly erased, it could start to show those bit flips when you
start to program it.

It was very repeatable, and that's when we implemented the 'clean
markers' written after a successful erase, rather than trusting a block
that "looks empty".




Re: Race to power off harming SATA SSDs

2017-05-08 Thread David Woodhouse
On Mon, 2017-05-08 at 11:09 +0200, Hans de Goede wrote:
> You're forgetting that the SSD itself (this thread is about SSDs) also has
> a major software component which is doing housekeeping all the time, so even
> if the main CPU gets reset the SSD's controller may still happily be erasing
> blocks.

We're not really talking about SSDs at all any more; we're talking
about real flash with real maintainable software.



Re: Race to power off harming SATA SSDs

2017-05-08 Thread Richard Weinberger
Pavel,

On Mon, May 8, 2017 at 11:28 AM, Pavel Machek  wrote:
> Are you sure you have it right in JFFS2? Do you journal block erases?
> Apparently, that was pretty much non-issue on older flashes.

This is what the website says, yes. Do you have hardware where you can
trigger it?
If so, I'd love to get access to it.

So far I never saw the issue, sometimes people claim to suffer from it
but when I
inspect the problems in detail it is always something else.

-- 
Thanks,
//richard


Re: Race to power off harming SATA SSDs

2017-05-08 Thread David Woodhouse
On Mon, 2017-05-08 at 11:28 +0200, Pavel Machek wrote:
> 
> Are you sure you have it right in JFFS2? Do you journal block erases?
> Apparently, that was pretty much non-issue on older flashes.

It isn't necessary in JFFS2. It is a *purely* log-structured file
system (which is why it doesn't scale well past the 1GiB or so that we
made it handle for OLPC).

So we don't erase a block until all its contents are obsolete. And if
we fail to complete the erase... well the contents are either going to
fail a CRC check, or... still be obsoleted by later entries elsewhere.

And even if it *looks* like an erase has completed and the block is all
0xFF, we erase it again and write a 'clean marker' to it to indicate
that the erase was completed successfully. Because otherwise it can't
be trusted.
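
A rough sketch of that mount-time rule, for illustration only (made-up types,
not JFFS2's actual scan code):

	#include <stdbool.h>

	struct eraseblock {
		bool reads_all_ff;	/* every byte reads back as 0xFF */
		bool has_cleanmarker;	/* marker node written after a completed erase */
	};

	/* A block that merely *looks* erased cannot be trusted: the erase may
	 * have been interrupted and the 0xFFs can decay into bit flips later.
	 * Only the cleanmarker proves the erase ran to completion, so anything
	 * else gets erased again before reuse.
	 */
	static bool must_reerase(const struct eraseblock *blk)
	{
		return blk->reads_all_ff && !blk->has_cleanmarker;
	}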



Re: Race to power off harming SATA SSDs

2017-05-08 Thread Pavel Machek
On Mon 2017-05-08 08:21:34, David Woodhouse wrote:
> On Sun, 2017-05-07 at 22:40 +0200, Pavel Machek wrote:
> > > > NOTE: unclean SSD power-offs are dangerous and may brick the device in
> > > > the worst case, or otherwise harm it (reduce longevity, damage flash
> > > > blocks).  It is also not impossible to get data corruption.
> >
> > > I get that the incrementing counters might not be pretty but I'm a bit
> > > skeptical about this being an actual issue.  Because if that were
> > > true, the device would be bricking itself from any sort of power
> > > losses be that an actual power loss, battery rundown or hard power off
> > > after crash.
> >
> > And that's exactly what users see. If you do enough power fails on a
> > SSD, you usually brick it, some die sooner than others. There was some
> > test results published, some are here
> > http://lkcl.net/reports/ssd_analysis.html, I believe I seen some
> > others too.
> > 
> > It is very hard for a NAND to work reliably in face of power
> > failures. In fact, not even Linux MTD + UBIFS works well in that
> > regards. See
> > http://www.linux-mtd.infradead.org/faq/ubi.html. (Unfortunately, its
> > down now?!). If we can't get it right, do you believe SSD manufactures
> > do?
> > 
> > [Issue is, if you powerdown during erase, you get "weakly erased"
> > page, which will contain expected 0xff's, but you'll get bitflips
> > there quickly. Similar issue exists for writes. It is solveable in
> > software, just hard and slow... and we don't do it.]
> 
> It's not that hard. We certainly do it in JFFS2. I was fairly sure that
> it was also part of the design considerations for UBI — it really ought
> to be right there too. I'm less sure about UBIFS but I would have
> expected it to be OK.

Are you sure you have it right in JFFS2? Do you journal block erases?
Apparently, that was pretty much non-issue on older flashes.

https://web-beta.archive.org/web/20160923094716/http://www.linux-mtd.infradead.org:80/doc/ubifs.html#L_unstable_bits

> SSDs however are often crap; power fail those at your peril. And of
> course there's nothing you can do when they do fail, whereas we accept
> patches for things which are implemented in Linux.

Agreed. If the SSD indicates an unexpected powerdown, it is a problem
and we need to fix it.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Race to power off harming SATA SSDs

2017-05-08 Thread Hans de Goede

Hi,

On 08-05-17 11:06, Ricard Wanderlof wrote:


> On Mon, 8 May 2017, David Woodhouse wrote:
>
> > > On Mon, 8 May 2017, David Woodhouse wrote:
> > > > Our empirical testing trumps your "can never happen" theory :)
> > >
> > > I'm sure it does. But what is the explanation then? Has anyone analyzed
> > > what is going on using an oscilloscope to verify relationship between
> > > erase command and supply voltage drop?
> >
> > Not that I'm aware of. Once we have reached the "it does happen and we
> > have to cope" there was not a lot of point in working out *why* it
> > happened.
> >
> > In fact, the only examples I *personally* remember were on NOR flash,
> > which takes longer to erase. So it's vaguely possible that it doesn't
> > happen on NAND. But really, it's not something we should be depending
> > on and the software mechanisms have to remain in place.
>
> My point is really that, say, the problem is in fact not that the erase
> is cut short due to the power fail, but that the software issues a second
> command before the first erase command has completed, for instance, or
> some other situation. Then we'd have a concrete situation which we can
> resolve (i.e., fix the bug), rather than assuming that it's the hardware's
> fault and implementing various software workarounds.

You're forgetting that the SSD itself (this thread is about SSDs) also has
a major software component which is doing housekeeping all the time, so even
if the main CPU gets reset the SSD's controller may still happily be erasing
blocks.

Regards,

Hans


Re: Race to power off harming SATA SSDs

2017-05-08 Thread Ricard Wanderlof

On Mon, 8 May 2017, David Woodhouse wrote:

> > On Mon, 8 May 2017, David Woodhouse wrote:
> > > Our empirical testing trumps your "can never happen" theory :)
> >
> > I'm sure it does. But what is the explanation then? Has anyone analyzed 
> > what is going on using an oscilloscope to verify relationship between 
> > erase command and supply voltage drop?
> 
> Not that I'm aware of. Once we have reached the "it does happen and we
> have to cope" there was not a lot of point in working out *why* it
> happened.
> 
> In fact, the only examples I *personally* remember were on NOR flash,
> which takes longer to erase. So it's vaguely possible that it doesn't
> happen on NAND. But really, it's not something we should be depending
> on and the software mechanisms have to remain in place.

My point is really that, say, the problem is in fact not that the erase
is cut short due to the power fail, but that the software issues a second
command before the first erase command has completed, for instance, or
some other situation. Then we'd have a concrete situation which we can
resolve (i.e., fix the bug), rather than assuming that it's the hardware's
fault and implementing various software workarounds.

On the other hand, making the software resilient to erase problems 
essentially makes the system more robust in any case, so it's not a bad 
thing of course.

It's just that I've seen this "we're software guys, and it must be the 
hardware's fault" (and vice versa) enough times to cause a small warning 
bell to go off here.

/Ricard
-- 
Ricard Wolf Wanderlöf   ricardw(at)axis.com
Axis Communications AB, Lund, Swedenwww.axis.com
Phone +46 46 272 2016   Fax +46 46 13 61 30


Re: Race to power off harming SATA SSDs

2017-05-08 Thread David Woodhouse
On Mon, 2017-05-08 at 10:36 +0200, Ricard Wanderlof wrote:
> On Mon, 8 May 2017, David Woodhouse wrote:
> > Our empirical testing trumps your "can never happen" theory :)
>
> I'm sure it does. But what is the explanation then? Has anyone analyzed 
> what is going on using an oscilloscope to verify relationship between 
> erase command and supply voltage drop?

Not that I'm aware of. Once we have reached the "it does happen and we
have to cope" there was not a lot of point in working out *why* it
happened.

In fact, the only examples I *personally* remember were on NOR flash,
which takes longer to erase. So it's vaguely possible that it doesn't
happen on NAND. But really, it's not something we should be depending
on and the software mechanisms have to remain in place.



Re: [PATCH] libfc: do not flood console with messages 'libfc: queue full ...'

2017-05-08 Thread Johannes Thumshirn
On Thu, Apr 27, 2017 at 04:25:03PM +0200, Hannes Reinecke wrote:
> When the FCoE sending side becomes congested libfc tries to
> reduce the queue depth on the host; however due to the built-in
> lag before attempting to ramp down the queue depth _again_ the
> message log is flooded with messages
> 
> libfc: queue full, reducing can_queue to 512
> 
> With this patch the message is printed only once (ie when it's
> actually changed).
> 
> Signed-off-by: Hannes Reinecke 
> ---

Acked-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: Race to power off harming SATA SSDs

2017-05-08 Thread Ricard Wanderlof

On Mon, 8 May 2017, David Woodhouse wrote:

> > I've got a problem with the underlying mechanism. How long does it take to 
> > erase a NAND block? A couple of milliseconds. That means that for an erase 
> > to be "weak" du to a power fail, the host CPU must issue an erase command, 
> > and then the power to the NAND must drop within those milliseconds. 
> > However, in most systems there will be a power monitor which will 
> > essentially reset the CPU as soon as the power starts dropping. So in 
> > practice, by the time the voltage is too low to successfully supply the 
> > NAND chip, the CPU has already been reset, hence, no reset command will 
> > have been given by the time NAND runs out of steam.
> > 
> > Sure, with switchmode power supplies, we don't have those large capacitors 
> > in the power supply which can keep the power going for a second or more, 
> > but still, I would think that the power wouldn't die fast enough for this 
> > to be an issue.
> > 
> Our empirical testing trumps your "can never happen" theory :)

I'm sure it does. But what is the explanation then? Has anyone analyzed 
what is going on using an oscilloscope to verify relationship between 
erase command and supply voltage drop?

/Ricard
-- 
Ricard Wolf Wanderlöf   ricardw(at)axis.com
Axis Communications AB, Lund, Swedenwww.axis.com
Phone +46 46 272 2016   Fax +46 46 13 61 30


Re: Race to power off harming SATA SSDs

2017-05-08 Thread David Woodhouse
On Mon, 2017-05-08 at 09:38 +0200, Ricard Wanderlof wrote:
> On Mon, 8 May 2017, David Woodhouse wrote:
> 
> > 
> > > 
> > > [Issue is, if you powerdown during erase, you get "weakly erased"
> > > page, which will contain expected 0xff's, but you'll get bitflips
> > > there quickly. Similar issue exists for writes. It is solveable in
> > > software, just hard and slow... and we don't do it.]
> > It's not that hard. We certainly do it in JFFS2. I was fairly sure that
> > it was also part of the design considerations for UBI — it really ought
> > to be right there too. I'm less sure about UBIFS but I would have
> > expected it to be OK.
> I've got a problem with the underlying mechanism. How long does it take to 
> erase a NAND block? A couple of milliseconds. That means that for an erase 
> to be "weak" du to a power fail, the host CPU must issue an erase command, 
> and then the power to the NAND must drop within those milliseconds. 
> However, in most systems there will be a power monitor which will 
> essentially reset the CPU as soon as the power starts dropping. So in 
> practice, by the time the voltage is too low to successfully supply the 
> NAND chip, the CPU has already been reset, hence, no reset command will 
> have been given by the time NAND runs out of steam.
> 
> Sure, with switchmode power supplies, we don't have those large capacitors 
> in the power supply which can keep the power going for a second or more, 
> but still, I would think that the power wouldn't die fast enough for this 
> to be an issue.
> 
> But I could very well be wrong and I haven't had experience with that many 
> NAND flash systems. But then please tell me where the above reasoning is 
> flawed.

Our empirical testing trumps your "can never happen" theory :)




[PATCH 19/19] aacraid: Update driver version to 50834

2017-05-08 Thread Raghava Aditya Renukunta
Update the driver version to 50834

Signed-off-by: Raghava Aditya Renukunta 
---
 drivers/scsi/aacraid/aacraid.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 58ccd2a..0995265 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -97,7 +97,7 @@ enum {
 #definePMC_GLOBAL_INT_BIT0 0x0001
 
 #ifndef AAC_DRIVER_BUILD
-# define AAC_DRIVER_BUILD 50792
+# define AAC_DRIVER_BUILD 50834
 # define AAC_DRIVER_BRANCH "-custom"
 #endif
 #define MAXIMUM_NUM_CONTAINERS 32
-- 
2.7.4



Re: Race to power off harming SATA SSDs

2017-05-08 Thread Ricard Wanderlof

On Mon, 8 May 2017, David Woodhouse wrote:

> > [Issue is, if you powerdown during erase, you get "weakly erased"
> > page, which will contain expected 0xff's, but you'll get bitflips
> > there quickly. Similar issue exists for writes. It is solveable in
> > software, just hard and slow... and we don't do it.]
> 
> It's not that hard. We certainly do it in JFFS2. I was fairly sure that
> it was also part of the design considerations for UBI — it really ought
> to be right there too. I'm less sure about UBIFS but I would have
> expected it to be OK.

I've got a problem with the underlying mechanism. How long does it take to 
erase a NAND block? A couple of milliseconds. That means that for an erase 
to be "weak" due to a power fail, the host CPU must issue an erase command, 
and then the power to the NAND must drop within those milliseconds. 
However, in most systems there will be a power monitor which will 
essentially reset the CPU as soon as the power starts dropping. So in 
practice, by the time the voltage is too low to successfully supply the 
NAND chip, the CPU has already been reset, hence, no reset command will 
have been given by the time NAND runs out of steam.

Sure, with switchmode power supplies, we don't have those large capacitors 
in the power supply which can keep the power going for a second or more, 
but still, I would think that the power wouldn't die fast enough for this 
to be an issue.

But I could very well be wrong and I haven't had experience with that many 
NAND flash systems. But then please tell me where the above reasoning is 
flawed.

/Ricard
-- 
Ricard Wolf Wanderlöf   ricardw(at)axis.com
Axis Communications AB, Lund, Swedenwww.axis.com
Phone +46 46 272 2016   Fax +46 46 13 61 30


Re: [PATCH] scsi_lib: Add #include

2017-05-08 Thread Johannes Thumshirn
On Tue, May 02, 2017 at 10:45:03AM -0700, Bart Van Assche wrote:
> This patch avoids that when building with W=1 the compiler
> complains that __scsi_init_queue() has not been declared.
> See also commit d48777a633d6 ("scsi: remove __scsi_alloc_queue").
> 
> Signed-off-by: Bart Van Assche 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 
> ---

Looks good,
Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: Race to power off harming SATA SSDs

2017-05-08 Thread David Woodhouse
On Sun, 2017-05-07 at 22:40 +0200, Pavel Machek wrote:
> > > NOTE: unclean SSD power-offs are dangerous and may brick the device in
> > > the worst case, or otherwise harm it (reduce longevity, damage flash
> > > blocks).  It is also not impossible to get data corruption.
>
> > I get that the incrementing counters might not be pretty but I'm a bit
> > skeptical about this being an actual issue.  Because if that were
> > true, the device would be bricking itself from any sort of power
> > losses be that an actual power loss, battery rundown or hard power off
> > after crash.
>
> And that's exactly what users see. If you do enough power fails on a
> SSD, you usually brick it, some die sooner than others. There was some
> test results published, some are here
> http://lkcl.net/reports/ssd_analysis.html, I believe I seen some
> others too.
> 
> It is very hard for a NAND to work reliably in face of power
> failures. In fact, not even Linux MTD + UBIFS works well in that
> regards. See
> http://www.linux-mtd.infradead.org/faq/ubi.html. (Unfortunately, its
> down now?!). If we can't get it right, do you believe SSD manufactures
> do?
> 
> [Issue is, if you powerdown during erase, you get "weakly erased"
> page, which will contain expected 0xff's, but you'll get bitflips
> there quickly. Similar issue exists for writes. It is solveable in
> software, just hard and slow... and we don't do it.]

It's not that hard. We certainly do it in JFFS2. I was fairly sure that
it was also part of the design considerations for UBI — it really ought
to be right there too. I'm less sure about UBIFS but I would have
expected it to be OK.

SSDs however are often crap; power fail those at your peril. And of
course there's nothing you can do when they do fail, whereas we accept
patches for things which are implemented in Linux.



Re: [PATCH v3] lpfc: Fix panic on BFS configuration.

2017-05-08 Thread Johannes Thumshirn
On Thu, Apr 27, 2017 at 03:08:26PM -0700, jsmart2...@gmail.com wrote:
> From: James Smart 
> 
> To select the appropriate shost template, the driver issues
> a mailbox command to retrieve the wwn. It turns out the sending of
> the command precedes the reset of the function.  On SLI-4 adapters,
> this is inconsequential, as the mailbox command location is specified
> by DMA via the BMBX register. However, on SLI-3 adapters, the
> location of the mailbox command submission area changes. When the
> function is first powered on or reset, the cmd is submitted via PCI
> BAR memory. Later the driver changes the function config to use
> host memory and DMA. The request to start a mailbox command is the
> same, a simple doorbell write, regardless of submission area.
> So, if no boot driver has been run against the adapter,
> the mailbox command works, as the defaults are OK. But if the boot
> driver has configured the card, and if no platform PCI
> function/slot reset occurs as the OS starts, the mailbox command
> will fail: the SLI-3 device will use the stale boot driver DMA
> location. This can cause PCI EEH errors.
> 
> The fix is to reset the SLI-3 function before sending the
> mailbox command, thus synchronizing the function/driver on mailbox
> location.
> 
> Note: The fix uses routines that are typically invoked later in the
> call flow to reset the SLI-3 device. The issue with using those routines is
> that the normal (non-fix) flow does additional initialization, namely the
> allocation of the pport structure. So, rather than significantly reworking
> the initialization flow so that the pport is alloc'd first, pointer checks
> are added to work around it. The checks are limited to the routines invoked
> by an SLI-3 adapter (s3 routines), as this fix/early call is only made on
> an SLI-3 adapter. Nothing changes after the fix. Subsequent initialization,
> and another adapter reset, still occur - both on SLI-3 and SLI-4 adapters.
> 
> Signed-off-by: Dick Kennedy 
> Signed-off-by: James Smart 
> Fixes: 96418b5e2c88 ("scsi: lpfc: Fix eh_deadline setting for sli3 adapters.")
> Cc: sta...@vger.kernel.org
> ---
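
[Editorial note: the ordering the description calls for, reduced to a
sketch with hypothetical helper names - this is not the driver's actual
call chain:]

	/* Hypothetical helpers, to illustrate the ordering only. */
	struct adapter;
	int adapter_is_sli3(struct adapter *a);
	void sli3_function_reset(struct adapter *a);	/* back to power-on defaults */
	int read_wwn_mailbox(struct adapter *a, unsigned long long *wwn);

	static int get_wwn_for_template(struct adapter *a, unsigned long long *wwn)
	{
		/*
		 * Reset the SLI-3 function first so the mailbox submission
		 * area is back at its PCI BAR default; a card left configured
		 * by a boot driver would otherwise expect the command in
		 * stale host-memory DMA space.
		 */
		if (adapter_is_sli3(a))
			sli3_function_reset(a);

		return read_wwn_mailbox(a, wwn);  /* lands where the HW expects it */
	}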

Looks good,
Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de                      +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[PATCH v2] sd: Write lock zone for REQ_OP_WRITE_ZEROES

2017-05-08 Thread damien . lemoal
From: Damien Le Moal 

For a zoned block device, sd_zbc_complete() handles zone write unlock
on completion of a REQ_OP_WRITE_ZEROES command but the zone write
locking is missing from sd_setup_write_zeroes_cmnd(). This patch fixes
this problem by locking the target zone of a REQ_OP_WRITE_ZEROES
request.

Signed-off-by: Damien Le Moal 
---

Changes from v1:
* Lock zone in all cases so that no warning is triggered in sd_zbc_complete()
  with unlocked zones for the unmap case.
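
[Editorial note: the invariant the patch restores, as a sketch - not the
actual sd.c code; build_cmnd() and cmd_is_zoned_write() are hypothetical
stand-ins. A setup path may only return BLKPREP_OK for a write to a
zoned disk while holding the zone write lock, because sd_zbc_complete()
will unlock that zone on completion:]

	#include <linux/blkdev.h>	/* BLKPREP_OK */

	struct scsi_cmnd;
	int build_cmnd(struct scsi_cmnd *cmd);			/* hypothetical */
	int cmd_is_zoned_write(struct scsi_cmnd *cmd);		/* hypothetical */
	int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd);	/* real sd_zbc helper */

	static int setup_zoned_write_sketch(struct scsi_cmnd *cmd)
	{
		int ret = build_cmnd(cmd);	/* e.g. a WRITE SAME(10/16) setup */

		if (cmd_is_zoned_write(cmd) && ret == BLKPREP_OK)
			ret = sd_zbc_write_lock_zone(cmd);	/* lock held from here on */

		return ret;	/* sd_zbc_complete() drops the lock on completion */
	}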

 drivers/scsi/sd.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e60a309..de9e2f2 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -827,21 +827,32 @@ static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
+   int ret;
 
if (!(rq->cmd_flags & REQ_NOUNMAP)) {
switch (sdkp->zeroing_mode) {
case SD_ZERO_WS16_UNMAP:
-   return sd_setup_write_same16_cmnd(cmd, true);
+   ret = sd_setup_write_same16_cmnd(cmd, true);
+   goto out;
case SD_ZERO_WS10_UNMAP:
-   return sd_setup_write_same10_cmnd(cmd, true);
+   ret = sd_setup_write_same10_cmnd(cmd, true);
+   goto out;
}
}
 
if (sdp->no_write_same)
return BLKPREP_INVALID;
+
if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
-   return sd_setup_write_same16_cmnd(cmd, false);
-   return sd_setup_write_same10_cmnd(cmd, false);
+   ret = sd_setup_write_same16_cmnd(cmd, false);
+   else
+   ret = sd_setup_write_same10_cmnd(cmd, false);
+
+out:
+   if (sd_is_zoned(sdkp) && ret == BLKPREP_OK)
+   return sd_zbc_write_lock_zone(cmd);
+
+   return ret;
 }
 
 static void sd_config_write_same(struct scsi_disk *sdkp)
-- 
2.9.3



[PATCH] sd: Write lock zone for REQ_OP_WRITE_ZEROES

2017-05-08 Thread damien . lemoal
From: Damien Le Moal 

For a zoned block device, sd_zbc_complete() handles zone write unlock on
completion of a REQ_OP_WRITE_ZEROES command but the zone write locking is
missing from sd_setup_write_zeroes_cmnd(). This patch fixes this problem by
locking the target zone of a REQ_OP_WRITE_ZEROES request, ignoring the
unmap == true case (as in this case, the write same command will necessarily
be aligned on the entire zone and be equivalent to a discard which does not
write lock zones).

Signed-off-by: Damien Le Moal 
---
 drivers/scsi/sd.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e60a309..b0a525d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -827,6 +827,7 @@ static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
+   int ret;
 
if (!(rq->cmd_flags & REQ_NOUNMAP)) {
switch (sdkp->zeroing_mode) {
@@ -839,9 +840,22 @@ static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
 
if (sdp->no_write_same)
return BLKPREP_INVALID;
+
+   if (sd_is_zoned(sdkp)) {
+   ret = sd_zbc_write_lock_zone(cmd);
+   if (ret != BLKPREP_OK)
+   return ret;
+   }
+
if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
-   return sd_setup_write_same16_cmnd(cmd, false);
-   return sd_setup_write_same10_cmnd(cmd, false);
+   ret = sd_setup_write_same16_cmnd(cmd, false);
+   else
+   ret = sd_setup_write_same10_cmnd(cmd, false);
+
+   if (sd_is_zoned(sdkp) && ret != BLKPREP_OK)
+   sd_zbc_write_unlock_zone(cmd);
+
+   return ret;
 }
 
 static void sd_config_write_same(struct scsi_disk *sdkp)
-- 
2.9.3



[PATCH] sd: Unlock zone in case of error in sd_setup_write_same_cmnd()

2017-05-08 Thread damien . lemoal
From: Damien Le Moal 

scsi_init_io() may fail, leaving a zone of a zoned block device locked.
Fix this by properly unlocking the write same request target zone if
scsi_init_io() fails.

Signed-off-by: Damien Le Moal 
Cc: sta...@vger.kernel.org
---
 drivers/scsi/sd.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f9d1432..e60a309 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -948,6 +948,10 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
rq->__data_len = sdp->sector_size;
ret = scsi_init_io(cmd);
rq->__data_len = nr_bytes;
+
+   if (sd_is_zoned(sdkp) && ret != BLKPREP_OK)
+   sd_zbc_write_unlock_zone(cmd);
+
return ret;
 }
 
-- 
2.9.3