from:"Tejun Heo"

Re: [PATCH 2/2] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-08-01 Thread Tejun Heo

On Mon, Jul 30, 2018 at 11:40:52AM -0700, Bart Van Assche wrote:
> A long time ago the unfortunate decision was taken to add a self-
> deletion attribute to the sysfs SCSI device directory. That decision
> was unfortunate because self-deletion is really tricky. We can't drop
> that attribute because widely used user space software depends on it,
> namely the rescan-scsi-bus.sh script. Hence this patch that avoids
> that writing into that attribute triggers a deadlock. See also commit
> 7973cbd9fbd9 ("[PATCH] add sysfs attributes to scan and delete
> scsi_devices").

Acked-by: Tejun Heo 

Thanks.

-- 
tejun

Re: [PATCH 1/2] sysfs: Introduce sysfs_{un,}break_active_protection()

2018-08-01 Thread Tejun Heo

On Mon, Jul 30, 2018 at 11:40:51AM -0700, Bart Van Assche wrote:
> Introduce these two functions and export them such that the next patch
> can add calls to these functions from the SCSI core.
> 
> Signed-off-by: Bart Van Assche 
> Cc: Greg Kroah-Hartman 
> Cc: Tejun Heo 
> Cc: 

Acked-by: Tejun Heo 

Thanks.

-- 
tejun

Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-26 Thread Tejun Heo

Hello,

ISTR giving the same feedback before.

On Wed, Jul 25, 2018 at 10:38:28AM -0700, Bart Van Assche wrote:
> +struct remove_dev_work {
> + struct callback_headhead;
> + struct scsi_device  *sdev;
> +};
> +
> +static void delete_sdev(struct callback_head *head)
> +{
> + struct remove_dev_work *work = container_of(head, typeof(*work), head);
> + struct scsi_device *sdev = work->sdev;
> +
> + scsi_remove_device(sdev);
> + kfree(work);
> + scsi_device_put(sdev);
> +}
> +
>  static ssize_t
>  sdev_store_delete(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
>  {
> - if (device_remove_file_self(dev, attr))
> - scsi_remove_device(to_scsi_device(dev));
> - return count;
> -};
> + struct scsi_device *sdev = to_scsi_device(dev);
> + struct remove_dev_work *work;
> + int ret;
> +
> + ret = scsi_device_get(sdev);
> + if (ret < 0)
> + goto out;
> + ret = -ENOMEM;
> + work = kmalloc(sizeof(*work), GFP_KERNEL);
> + if (!work)
> + goto put;
> + work->head.func = delete_sdev;
> + work->sdev = sdev;
> + ret = task_work_add(current, >head, false);
> + if (ret < 0)
> + goto free;
> + ret = count;
> +
> +out:
> + return ret;
> +
> +free:
> + kfree(work);
> +
> +put:
> + scsi_device_put(sdev);
> + goto out;
> +}

Making removal asynchronous this way sometimes causes issues because
whether the user sees the device released or not is racy.
kernfs/sysfs have mechanisms to deal with these cases - remove_self
and kernfs_break_active_protection().  Have you looked at those?

Thanks.

-- 
tejun

Re: [PATCH V2 0/2] ZBC_OUT command translation fixes

2018-07-02 Thread Tejun Heo

On Tue, Jun 26, 2018 at 08:56:53PM +0900, Damien Le Moal wrote:
> Tejun,
> 
> These two patches fix problems with the checks of the ZBC_OUT command fields
> prior to its translation to ZAC MANAGEMENT OUT.
> 
> The first patch fixes an incorrect out-of-range check and changes the returned
> asc/ascq to the ZBC defined INVALID FIELD IN CDB instead of (the more natural
> but incorrect) LBA OUT OF RANGE.
> 
> The second patch disables the ZBC_OUT command block address check if the ALL
> bit is set, as defined by the ZBC specifications.

Applied 1-2 to libata/for-4.18-fixes.

Thanks.

-- 
tejun

Re: [PATCH 15/25] workqueue,zfcp: set description for port work items with their WWPN as context

2018-05-18 Thread Tejun Heo

On Thu, May 17, 2018 at 07:14:57PM +0200, Steffen Maier wrote:
> As a prerequisite, complement commit 3d1cb2059d93 ("workqueue: include
> workqueue info when printing debug dump of a worker task") to be usable
> with kernel modules by exporting the symbol set_worker_desc().
> Current built-in user was introduced with commit ef3b101925f2 ("writeback:
> set worker desc to identify writeback workers in task dumps").
> 
> Can help distinguishing work items which do not have adapter scope.
> Description is printed out with task dump for debugging on
> WARN, BUG, panic, or magic-sysrq [show-task-states(t)].

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun

Re: [PATCH-next] scsi: libsas: dynamically allocate and free ata host

2018-05-10 Thread Tejun Heo

On Thu, May 10, 2018 at 11:05:16AM +0800, Jason Yan wrote:
> Commit 2623c7a5f2 ("libata: add refcounting to ata_host") v4.17+
> introduced refcounting to ata_host and will increase or decrease the
> refcount when adding or deleting transport ATA port.

libata side looks good to me.

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun

Re: [PATCH] scsi: libsas: add transport class for ATA devices

2018-03-26 Thread Tejun Heo

On Mon, Mar 26, 2018 at 05:27:41PM +0800, Jason Yan wrote:
> Now ata devices attached with sas controller do not have transport
> class, so that we can not see any information of these ata devices in
> /sys/class/ata_port(or ata_link or ata_device).
> 
> Add transport class for the ata devices attached with sas controller.
> The /sys/class directory will show the infomation of the ata devices
> as follows:
> 
> localhost:/sys/class # ls ata*
> ata_device:
> dev1.0  dev2.0
> 
> ata_link:
> link1  link2
> 
> ata_port:
> ata1  ata2
> 
> No functional change of the device scanning and io path. The ata
> transport class was deleted when destroying the sas devices.
> 
> Signed-off-by: Jason Yan <yanai...@huawei.com>
> CC: Dan Williams <dan.j.willi...@intel.com>
> CC: Tejun Heo <t...@kernel.org>

Looks good to me on the ata side.

 Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun

Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread Tejun Heo

On Fri, Mar 16, 2018 at 10:35:16AM -0700, Bart Van Assche wrote:
> Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce()
> must use synchronize_sched().

Is there a reason to use sched-RCU here instead of the regular one?
If not, it'd be better to switch to regular RCU than the other way
around.

Thanks.

-- 
tejun

Re: [PATCH] Improve ZBC/ZAC error handling

2018-03-04 Thread Tejun Heo

On Fri, Mar 02, 2018 at 04:40:18AM +0900, Damien Le Moal wrote:
> This series introduces changes to scsi and libata error handling for ZBC and 
> ZAC
> devices.
> 
> The first patch moves ZBC specific error handling in sd_zbc_complete() to a
> generic scsi error function that can be used also in libata (second patch). 
> The
> goal of this change is to limit retries for commands that are identify as not
> worth retrying (know failure condition) for both scsi/ZBC and libata/ZAC.
> Without these two patches, only ZBC behaves nicely (commands that are known to
> fail are retried in the ZAC case).
> 
> The following 2 snmall patches are simple fixes.
> 
> The last 2 pacthes are improvements in libata error handling verbosity.
> 
> Damien Le Moal (6):
>   scsi: Introduce scsi_zbc_noretry_cmd()
>   libata: Use scsi_zbc_noretry_cmd() for ZAC devices
>   libata: Fix comment typo in ata_eh_analyze_tf()
>   libata: Fix ata_err_string()
>   libata: Honor RQF_QUIET flag
>   libata: Be quiet when asked to

Except for the nit on the last patch, ata part looks good to me.
Martin, how do you wanna route the SCSI part?

Thanks.

-- 
tejun

Re: [PATCH 6/6] libata: Be quiet when asked to

2018-03-04 Thread Tejun Heo

Hello,

On Fri, Mar 02, 2018 at 04:40:24AM +0900, Damien Le Moal wrote:
> For a successful setting of the device transfer speed mode in
> ata_dev_set_mode(), do not print the message
> "ataX.XX: configured for xxx" if the EH context has the quiet flag set.
> 
> Signed-off-by: Damien Le Moal 
> ---
>  drivers/ata/libata-core.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 3c09122bf038..258afc2e8efd 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -3573,9 +3573,10 @@ static int ata_dev_set_mode(struct ata_device *dev)
>   DPRINTK("xfer_shift=%u, xfer_mode=0x%x\n",
>   dev->xfer_shift, (int)dev->xfer_mode);
>  
> - ata_dev_info(dev, "configured for %s%s\n",
> -  ata_mode_string(ata_xfer_mode2mask(dev->xfer_mode)),
> -  dev_err_whine);
> + if (!(ehc->i.flags & ATA_EHI_QUIET))
> + ata_dev_info(dev, "configured for %s%s\n",
> + ata_mode_string(ata_xfer_mode2mask(dev->xfer_mode)),
> + dev_err_whine);

EHI_QUIET is used during proble to print out the more releveant
messages, so doing the above may surprise some users.  Can we keep the
above message for initial probing?

Thanks.

-- 
tejun

Re: [PATCH] scsi: ata: don't reset three times if device is offline for SAS host

2018-03-01 Thread Tejun Heo

Hello,

On Wed, Feb 28, 2018 at 03:18:39PM +0800, chenxiang (M) wrote:
> If we can introduce a port flags such as ATA_LFLAG_DISABLED or
> ATA_EHI_NO_RECOVERY before ata_eh_recover,
> it will skip recovery. But we only get device status NODEV from
> ata_do_reset which is after ata_eh_recover, i don't
> think it is used to introduce a port flags at that time.
> From function ata_eh_reset, it seems there are two situations that
> end the recovery logic in current code:
> 1. Retry 3 times;
> 2. Reset function return value -ERESTART, but this return value
> seems be specal situation for ATA;

So, we have -ERESTART for restarting, -EPIPE for speeding down.  We
can just add another special return code - say, -ENOENT - to indicate
that there's nothing and it shouldn't keep trying.

Thanks.

-- 
tejun

Re: [PATCH v2] ata: do not schedule hot plug if it is a sas host

2018-03-01 Thread Tejun Heo

On Wed, Feb 28, 2018 at 09:11:10AM +0800, Jason Yan wrote:
> We've got a kernel panic when using sata disk with sas controller:
> 
> [115946.152283] Unable to handle kernel NULL pointer dereference at virtual 
> address 07d8
> [115946.223963] CPU: 0 PID: 22175 Comm: kworker/0:1 Tainted: G   W OEL  
> 4.14.0 #1
> [115946.232925] Workqueue: events ata_scsi_hotplug
> [115946.237938] task: 8021ee50b180 task.stack: 0d5d
> [115946.244717] PC is at sas_find_dev_by_rphy+0x44/0x114
> [115946.250224] LR is at sas_find_dev_by_rphy+0x3c/0x114
> ..
> [115946.355701] Process kworker/0:1 (pid: 22175, stack limit = 
> 0x0d5d)
> [115946.363369] Call trace:
> [115946.456356] [] sas_find_dev_by_rphy+0x44/0x114
> [115946.462908] [] sas_target_alloc+0x20/0x5c
> [115946.469408] [] scsi_alloc_target+0x250/0x308
> [115946.475781] [] __scsi_add_device+0xb0/0x154
> [115946.481991] [] ata_scsi_scan_host+0x180/0x218
> [115946.488367] [] ata_scsi_hotplug+0xb0/0xcc
> [115946.494801] [] process_one_work+0x144/0x390
> [115946.501115] [] worker_thread+0x144/0x418
> [115946.507093] [] kthread+0x10c/0x138
> [115946.512792] [] ret_from_fork+0x10/0x18
> 
> We found that Ding Xiang has reported a similar bug before:
> https://patchwork.kernel.org/patch/9179817/
> 
> And this bug still exists in mainline. Since libsas handles hotplug and
> device adding/removing itself, do not need to schedule ata hot plug task
> here if it is a sas host.
> 
> Signed-off-by: Jason Yan 
> CC: Ding Xiang 

Applied to libata/for-4.16-fixes w/ stable cc'd.

Thanks.

-- 
tejun

Re: [PATCH] ata: do not schedule hot plug if it is a sas host

2018-02-27 Thread Tejun Heo

On Tue, Feb 27, 2018 at 03:08:01PM +0800, Jason Yan wrote:
> We've got a kernel panic when using sata disk with sas controller:
> 
> [115946.152283] Unable to handle kernel NULL pointer dereference at virtual 
> address 07d8
> [115946.223963] CPU: 0 PID: 22175 Comm: kworker/0:1 Tainted: GW  OEL  
> 4.14.0 #1
> [115946.232925] Workqueue: events ata_scsi_hotplug
> [115946.237938] task: 8021ee50b180 task.stack: 0d5d
> [115946.244717] PC is at sas_find_dev_by_rphy+0x44/0x114
> [115946.250224] LR is at sas_find_dev_by_rphy+0x3c/0x114
> ..
> [115946.355701] Process kworker/0:1 (pid: 22175, stack limit = 
> 0x0d5d)
> [115946.363369] Call trace:
> [115946.456356] [] sas_find_dev_by_rphy+0x44/0x114
> [115946.462908] [] sas_target_alloc+0x20/0x5c
> [115946.469408] [] scsi_alloc_target+0x250/0x308
> [115946.475781] [] __scsi_add_device+0xb0/0x154
> [115946.481991] [] ata_scsi_scan_host+0x180/0x218
> [115946.488367] [] ata_scsi_hotplug+0xb0/0xcc
> [115946.494801] [] process_one_work+0x144/0x390
> [115946.501115] [] worker_thread+0x144/0x418
> [115946.507093] [] kthread+0x10c/0x138
> [115946.512792] [] ret_from_fork+0x10/0x18
> 
> We found that Ding Xiang has reported a similar bug before:
> https://patchwork.kernel.org/patch/9179817/
> 
> And this bug still exists in mainline. Since libsas handles hotplug and
> device adding/removing itself, do not need to schedule ata hot plug task
> here if it is a sas host.
> 
> Signed-off-by: Jason Yan 
> CC: Ding Xiang 
> ---
>  drivers/ata/libata-eh.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 11c3137d7b0a..97aeac45b22c 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -1384,7 +1384,9 @@ void ata_eh_detach_dev(struct ata_device *dev)
>  
>   if (ata_scsi_offline_dev(dev)) {
>   dev->flags |= ATA_DFLAG_DETACHED;
> - ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
> + /* libsas handles the hotplug itself */
> + if (!(ap->flags & ATA_FLAG_SAS_HOST))
> + ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;

Can you please move the conditional to where we're consuming the flag
instead?

Thanks.

-- 
tejun

Re: [PATCH] scsi: ata: don't reset three times if device is offline for SAS host

2018-02-27 Thread Tejun Heo

Hello,

On Mon, Feb 26, 2018 at 07:45:37PM +0800, chenxiang (M) wrote:
> >So, if there are real consequences, we can definitely add a way to
> >short-circuit the recovery logic but let's do that by adding proper
> >signaling rathr than testing for driver type.
> 
> I am not familiar with ata recovery logic, and do you have idea
> about how to add a way to
> short-circuit the recovery logic by adding proper signaling?

e.g. define a return value from reset function which indicates no
retry or introduce a port flag.  Basically, something which doens't
add special casing logic to the core logic.

Thanks.

-- 
tejun

Re: [PATCH] scsi: ata: don't reset three times if device is offline for SAS host

2018-02-13 Thread Tejun Heo

Hello,

On Tue, Feb 13, 2018 at 09:44:53AM +0800, chenxiang (M) wrote:
> For those drivers using libsas,  i think they have the same issue.
> It takes about 1 minute to
> recover but actually device is gone, so this recover is useless for
> this scenario (when enter EH,
> all normal IOs are blocked actually, so it will cause normal IOs are
> blocked one more minute which
> user doesn't want to).

Right, it'd block other devices sharing the port.  Doesn't sas map
each ata device to its own port tho?

> Actually in sas_ata_hard_reset, there are two situations returned
> -ENODEV which represent device is gone:
> - LLDD directly returns -ENODEV through lldd_I_T_nexus_reset;
> - It sends SMP DISCOVER to check local phy in smp_ata_check_ready,
> and find it is gone;

So, if there are real consequences, we can definitely add a way to
short-circuit the recovery logic but let's do that by adding proper
signaling rathr than testing for driver type.

Thanks.

-- 
tejun

Re: [PATCH] libata: don't try to pass through NCQ commands to non-NCQ devices

2018-02-12 Thread Tejun Heo

On Sat, Feb 03, 2018 at 08:33:51PM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> syzkaller hit a WARN() in ata_bmdma_qc_issue() when writing to /dev/sg0.
> This happened because it issued an ATA pass-through command (ATA_16)
> where the protocol field indicated that NCQ should be used -- but the
> device did not support NCQ.
> 
> We could just remove the WARN() from libata-sff.c, but the real problem
> seems to be that the SCSI -> ATA translation code passes through NCQ
> commands without verifying that the device actually supports NCQ.
> 
> Fix this by adding the appropriate check to ata_scsi_pass_thru().

Applied to libata/for-4.16-fixes.

Thanks.

-- 
tejun

Re: [PATCH] libata: remove WARN() for DMA or PIO command without data

2018-02-12 Thread Tejun Heo

On Sat, Feb 03, 2018 at 08:33:27PM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> syzkaller hit a WARN() in ata_qc_issue() when writing to /dev/sg0.  This
> happened because it issued a READ_6 command with no data buffer.
> 
> Just remove the WARN(), as it doesn't appear indicate a kernel bug.  The
> expected behavior is to fail the command, which the code does.
> 
> Here's a reproducer that works in QEMU when /dev/sg0 refers to a disk of
> the default type ("82371SB PIIX3 IDE"):
> 
> #include 
> #include 
> 
> int main()
> {
> char buf[42] = { [36] = 0x8 /* READ_6 */ };
> 
> write(open("/dev/sg0", O_RDWR), buf, sizeof(buf));
> }
> 
> Fixes: f92a26365a72 ("libata: change ATA_QCFLAG_DMAMAP semantics")
> Reported-by: 
> syzbot+f7b556d1766502a69d85071d2ff08bd87be53...@syzkaller.appspotmail.com
> Cc:  # v2.6.25+
> Signed-off-by: Eric Biggers 

Applied to libata/for-4.16-fixes.

Thanks.

-- 
tejun

Re: [PATCH] libata: fix length validation of ATAPI-relayed SCSI commands

2018-02-12 Thread Tejun Heo

On Sat, Feb 03, 2018 at 08:30:56PM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> syzkaller reported a crash in ata_bmdma_fill_sg() when writing to
> /dev/sg1.  The immediate cause was that the ATA command's scatterlist
> was not DMA-mapped, which causes 'pi - 1' to underflow, resulting in a
> write to 'qc->ap->bmdma_prd[0x]'.
> 
> Strangely though, the flag ATA_QCFLAG_DMAMAP was set in qc->flags.  The
> root cause is that when __ata_scsi_queuecmd() is preparing to relay a
> SCSI command to an ATAPI device, it doesn't correctly validate the CDB
> length before copying it into the 16-byte buffer 'cdb' in 'struct
> ata_queued_cmd'.  Namely, it validates the fixed CDB length expected
> based on the SCSI opcode but not the actual CDB length, which can be
> larger due to the use of the SG_NEXT_CMD_LEN ioctl.  Since 'flags' is
> the next member in ata_queued_cmd, a buffer overflow corrupts it.
> 
> Fix it by requiring that the actual CDB length be <= 16 (ATAPI_CDB_LEN).

Applied to libata/for-4.16-fixes.

Thanks.

-- 
tejun

Re: [PATCH] scsi: ata: don't reset three times if device is offline for SAS host

2018-02-12 Thread Tejun Heo

Hello,

On Wed, Jan 24, 2018 at 09:20:25PM +0800, chenxiang wrote:
> In ata_eh_reset, it will reset three times at most for sata disk. For
> some drivers through libsas, it calls sas_ata_hard_reset at last. When
> device is gone, function sas_ata_hard_reset will return -ENODEV. But
> it will still try to reset three times for offline device. This process
> lasts a long time:
> 
> [11248.344323] ata13.00: status: { ERR }
> [11248.344324] ata13.00: error: { ABRT }
> [11248.344327] ata13: hard resetting link
> [11248.503557] sas: ata: ex 500e004aaa1f phy02:U:A 
> attached: (no device)
> [11249.359524] sas: ata13: end_device-1:0:2: reset failed 
> (errno=-19)[eta03d:19h:35m:17s]
> [11249.365692] ata13: reset failed (errno=-19), retrying in 9 secs
> [11258.451402] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops][eta 
> 03d:22h:10m:48s]
> [11259.411508] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 
> 03d:22h:28m:05s]
> [11259.417683] ata13: reset failed (errno=-19), retrying in 10 secs
> [11268.695401] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops] [eta 
> 04d:01h:03m:37s]
> [11269.699513] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 
> 04d:01h:20m:54s]
> [11269.705689] ata13: reset failed (errno=-19), retrying in 34 secs
> [11304.275393] ata13: hard resetting linkKB/0KB/0KB /s] [0/0/0 iops] [eta 
> 04d:11h:25m:43s]
> [11305.283516] sas: ata13: end_device-1:0:2: reset failed (errno=-19)[eta 
> 04d:11h:43m:00s]
> [11305.289692] ata13: reset failed, giving up
> [11305.293785] ata13.00: disabled
> 
> Actually it is no need to reset three times for this scenario. So add
> a check to avoid it.

I'm a bit reluctant in changing this per-driver.  Does this actually
hurt something?

Thanks.

-- 
tejun

Re: [PATCH 1/2] ata: enhance the definition of SET MAX feature field value

2018-01-08 Thread Tejun Heo

On Mon, Jan 08, 2018 at 10:16:55PM -0500, Martin K. Petersen wrote:
> 
> Tejun,
> 
> Are you OK with me pulling this change through the SCSI tree?

Sure, yeah, please go ahead.

Thanks.

-- 
tejun

Re: [ata_scsi_offline_dev] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238

2017-11-06 Thread Tejun Heo

Hello,

On Mon, Nov 06, 2017 at 03:12:31PM -0800, Linus Torvalds wrote:
> But it does seem to be a new regression in 4.14, caused by commit
> 8a97712e5314 ("scsi: make 'state' device attribute pollable"), because
> that's what added the sysfs_notify() call to scsi_device_set_state(),
> which made that spinlock be a problem.

Yeah, pinged Hannes about it a couple of weeks ago.

> That commit came in through the SCSI merge this merge window, and it
> seems to still revert cleanly.
> 
> So I do suspect that by now we should just revert that commit. It's
> not clear why that state attribute should be pollable, and the new
> code is clearly very much buggy.

I think reverting is the right thing to do right now.  If necessary,
we can make kernfs_notify() safe to be called from atomic contexts.

Thanks.

-- 
tejun

Re: [PATCH v2] ata: fixes kernel crash while tracing ata_eh_link_autopsy event

2017-11-03 Thread Tejun Heo

On Thu, Nov 02, 2017 at 04:31:07PM +0530, Rameshwar Prasad Sahu wrote:
> When tracing ata link error event, the kernel crashes when the disk is
> removed due to NULL pointer access by trace_ata_eh_link_autopsy API.
> This occurs as the dev is NULL when the disk disappeared. This patch
> fixes this crash by calling trace_ata_eh_link_autopsy only if "dev"
> is not NULL.
> 
> v2 changes:
>  Removed direct passing "link" pointer instead of "dev" in trace API.
> 
> Signed-off-by: Rameshwar Prasad Sahu 

Applied to libata/for-4.15.

Thanks.

-- 
tejun

Re: [PATCH] ata: fixes kernel crash while tracing ata_eh_link_autopsy event

2017-11-01 Thread Tejun Heo

Hello,

On Tue, Oct 31, 2017 at 08:52:44PM +0530, Rameshwar Sahu wrote:
> > probably should take both link and dev and use dev iff it's not NULL.
> >
> 
> Instead of this would it be better to call trace_ata_eh_link_autopsy() if
> dev is not NULL from ata error handler ??

Oh yeah, that'd work too and be probably better.

Thanks.

-- 
tejun

Re: [PATCH] ata: fixes kernel crash while tracing ata_eh_link_autopsy event

2017-10-25 Thread Tejun Heo

Hello,

On Wed, Oct 25, 2017 at 03:52:56PM +0530, Rameshwar Prasad Sahu wrote:
> @@ -288,8 +289,8 @@
>   ),
> 
>   TP_fast_assign(
> - __entry->ata_port   = dev->link->ap->print_id;
> - __entry->ata_dev= dev->link->pmp + dev->devno;
> + __entry->ata_port   = link->ap->print_id;
> + __entry->ata_dev= link->pmp + link->device->devno;

The above is wrong if there are multiple devices on the link.  It
probably should take both link and dev and use dev iff it's not NULL.

Thanks.

-- 
tejun

Re: [PATCH V2] scsi: storvsc: Allow only one remove lun work item to be issued per lun

2017-10-21 Thread Tejun Heo

Hello,

On Thu, Oct 19, 2017 at 08:35:10AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 17, 2017 at 01:35:21PM -0400, Cathy Avery wrote:
> > +   /*
> > +* Set the error handler work queue.
> > +*/
> > +   snprintf(host_dev->work_q_name, sizeof(host_dev->work_q_name),
> > +"storvsc_error_wq_%d", host->host_no);
> > +   host_dev->handle_error_wq =
> > +   create_singlethread_workqueue(host_dev->work_q_name);
> 
> If you use alloc_ordered_workqueue directly instead of
> create_singlethread_workqueue you can pass a format string and don't
> need the separate allocation.
> 
> But I'm not sure if Tejun is fine with using __WQ_LEGACY directly..

The only thing that flag does is exempting the workqueue from possible
flush deadlock check as we don't know whether WQ_MEM_RECLAIM on a
legacy workqueue is intentional.  There's no reason to add it when
converting to alloc_ordered_workqueue().  Just decide whether it needs
forward progress guarantee and use WQ_MEM_RECLAIM if so.

Thanks.

-- 
tejun

Re: [PATCH V8 5/5] libata: Align DMA buffer to dma_get_cache_alignment()

2017-10-18 Thread Tejun Heo

On Tue, Oct 17, 2017 at 04:05:42PM +0800, Huacai Chen wrote:
> In non-coherent DMA mode, kernel uses cache flushing operations to
> maintain I/O coherency, so in ata_do_dev_read_id() the DMA buffer
> should be aligned to ARCH_DMA_MINALIGN. Otherwise, If a DMA buffer
> and a kernel structure share a same cache line, and if the kernel
> structure has dirty data, cache_invalidate (no writeback) will cause
> data corruption.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Huacai Chen 
> ---
>  drivers/ata/libata-core.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index ee4c1ec..e134955 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -1833,8 +1833,19 @@ static u32 ata_pio_mask_no_iordy(const struct 
> ata_device *adev)
>  unsigned int ata_do_dev_read_id(struct ata_device *dev,
>   struct ata_taskfile *tf, u16 *id)
>  {
> - return ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE,
> -  id, sizeof(id[0]) * ATA_ID_WORDS, 0);
> + u16 *devid;
> + int res, size = sizeof(u16) * ATA_ID_WORDS;
> +
> + if (IS_ALIGNED((unsigned long)id, dma_get_cache_alignment(>tdev)))
> + res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, id, 
> size, 0);
> + else {
> + devid = kmalloc(size, GFP_KERNEL);
> + res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, devid, 
> size, 0);
> + memcpy(id, devid, size);
> + kfree(devid);
> + }
> +
> + return res;

Hmm... I think it'd be a lot better to ensure that the buffers are
aligned properly to begin with.  There are only two buffers which are
used for id reading - ata_port->sector_buf and ata_device->id.  Both
are embedded arrays but making them separately allocated aligned
buffers shouldn't be difficult.

Thanks.

-- 
tejun

Re: [PATCH V8 5/8] percpu-refcount: introduce __percpu_ref_tryget_live

2017-10-03 Thread Tejun Heo

Hello,

On Wed, Oct 04, 2017 at 03:20:40AM +0800, Ming Lei wrote:
> On Tue, Oct 03, 2017 at 07:14:59AM -0700, Tejun Heo wrote:
> > Hello,
> > 
> > On Tue, Oct 03, 2017 at 10:04:03PM +0800, Ming Lei wrote:
> > > Block layer need to call this function after holding
> > > rcu lock in a real hot path, so introduce this helper.
> > 
> > The patch description is too anemic.  It doesn't even describe what
> > changes are being made, let alone justifying them.
> 
> Sorry for Cc you, and this change is needed by the following
> patch:
> 
>   https://marc.info/?l=linux-scsi=150703956929333=2
> 
> Block layer need to call percpu_ref_tryget_live()
> with holding RCU read lock, given it is a hot I/O
> path, that is why I make this patch so that we can
> avoid nested RCU read lock and save a bit cost.

I'm not necessarily against it but please name it sth like
percpu_ref_tryget_live_locked(), add lockdep assertions in the inner
function and document properly.

Thanks.

-- 
tejun

Re: [PATCH V8 5/8] percpu-refcount: introduce __percpu_ref_tryget_live

2017-10-03 Thread Tejun Heo

Hello,

On Tue, Oct 03, 2017 at 10:04:03PM +0800, Ming Lei wrote:
> Block layer need to call this function after holding
> rcu lock in a real hot path, so introduce this helper.

The patch description is too anemic.  It doesn't even describe what
changes are being made, let alone justifying them.

Thanks.

-- 
tejun

Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()

2017-09-01 Thread Tejun Heo

Hello, Ming.

> +/**
> + * percpu_ref_is_dead - test whether a percpu refcount is killed
> + * @ref: percpu_ref to test
> + *
> + * Returns %true if @ref is dead
> + *
> + * This function is safe to call as long as @ref is between init and exit.
> + */
> +static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
> +{
> + unsigned long __percpu *percpu_count;
> +
> + if (__ref_is_percpu(ref, _count))
> + return false;
> + return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
> +}

Can you please explain why percpu_ref_is_dying() isn't enough for your
use case?

Thanks.

-- 
tejun

Re: No I/O errors reported after SATA link hard reset

2017-08-17 Thread Tejun Heo

Hello,

On Thu, Aug 17, 2017 at 04:15:35PM +0200, Gionatan Danti wrote:
> Ok, so *this* is the root cause of the problem: libata not
> identifying spurious link renegotiations vs brief powerloss/powerup
> events. Out of curiosity: is this a SATA-specific problem (ie: in
> the SATA specification), or even SAS disks are affected?

No idea about SAS.  They're identical at the link layer tho.

> >Because we don't wanna be ditching disks on temporary link glitches,
> >which do happen once in a while.
> 
> Any chances to report I/O errors to the upper layers *without*
> offlining the device? In this manner, upper layers (ie: MDRAID) can
> act in a more informate way. For example: single disk device will
> simple retry the failed operation, while MDRAID can take the
> "badblocks" code path to deal with the error.

Upper layer can request to avoid retrying on errors but it won't help
too much.  It doesn't have much to do with specific commands.  A power
event can take place without any command in flight and lose the
buffered data.  Unless upper layer is tracking all that's being
written, there isn't much it can do outside doing full scan.  This is
a condition which should be handled from the driver side.

> >So, the right way to deal with the problem probably is making use of
> >the SMART counter which indicates power loss events and verify that
> >the counter hasn't increased over link issues.  If it changed, the
> >device should be detached and re-probed, which will make it come back
> >as a different block device.  Unfortunately, I haven't had the chance
> >to actually implement that.
> 
> This is a very good idea, maybe I can implement it in userspace with
> a simple, fast polling scheme (for example, each 60 seconds). Such a
> polling would not prevent all corruption scenarios, but will at
> least timely inform the user.

Yeah, looking into getting it implemented on the kernel side.

Thanks.

-- 
tejun

Re: No I/O errors reported after SATA link hard reset

2017-08-17 Thread Tejun Heo

Hello,

On Thu, Aug 17, 2017 at 03:18:06PM +0200, Bernd Schubert wrote:
> So for Gionatan the root cause was an instable power supply, but in my
> case there wasn't any power loss, there were just failed sata commands.
> I'm not sure if this was a port or cable issue - once I changed port and
> sata cable the errors disappeared. I didn't change the power supply or
> power cable. I'm now basically fighting with the data corruption that
> caused - for btrfs it at least has a checksum, but I didn't have ext4
> checksum enabled, so it is hard to figure out which files are corrupts -
> silent data corruption is not well handled by backups either.

No idea there.  Retried and recovered errors shouldn't cause data
corruptions.  Flaky power can behave in unexpected ways tho.  What
happens if you hook up the drive on a different power supply but
revert to the port / cable which showed the problem?  What does your
SMART counters say across those failures?

> Is it possible that sata eh recovery sends resets to the device, which
> makes it evict its cache?

That'd be a very broken device.  It sure is theoretically possible but
I haven't seen any reports on such behaviors yet.

Thanks.

-- 
tejun

Re: No I/O errors reported after SATA link hard reset

2017-08-17 Thread Tejun Heo

Hello,

On Thu, Aug 17, 2017 at 11:24:22AM +0200, Bernd Schubert wrote:
> > More concerning is the fact that these undetected errors can make their
> > way even when the higher application consistently calls sync() and/or
> > fsync. In other words, it seems than even acknowledged writes can fail
> > in this manner (and this is consistent with the first machine corrupting
> > its filesystem due to journal trashing - XFS journal surely uses sync()
> > where appropriate). The mechanism seems the following:
> > 
> > - an higher layer application issue sync();
> > - a write barrier is generated;
> > - a first FLUSH CACHE command is sent to the disk;
> > - data are written to the disk's DRAM cache;
> > - power is lost! The volatile cache lose its content;
> > - power is re-established and the disk become responsive again;
> > - a second FLUSH CACHE command is sent to the disk;
> > - the disk acks each SATA command, but real data are lost.

Recovered errors aren't reported as IO errors and at least from link
state proper there's no way for the driver to tell apart link
glitches and buffer-erasing power issues.

> > Now, I have few questions:
> > - is the above explanation plausible, or I am (horribly) missing something?

For the most part, yes.  To be more accurate, the failure is coming
from libata not being able to tell apart link glitches from the device
getting reset due to power issues.

> > - why the scsi midlevel does not respond to a power loss event by
> > immediately offlining the disks?

Because we don't wanna be ditching disks on temporary link glitches,
which do happen once in a while.

> > - is the scsi midlevel behavior configurable (I know I can lower eh
> > timeout, but is this the right solution)?
> > - how to deal with this problem (other than being 100% sure power is
> > never lost by any disks)?

So, the right way to deal with the problem probably is making use of
the SMART counter which indicates power loss events and verify that
the counter hasn't increased over link issues.  If it changed, the
device should be detached and re-probed, which will make it come back
as a different block device.  Unfortunately, I haven't had the chance
to actually implement that.

Thanks.

-- 
tejun

Re: Spurious DISK_EVENT_MEDIA_CHANGE on USB DVD hotplug?

2017-08-14 Thread Tejun Heo

Hello, Joe.

On Thu, Aug 10, 2017 at 10:45:54AM -0400, Joe Lawrence wrote:
> In the case of my USB DVD -> laptop example, there is no media in my
> device, however I still see the DISK_EVENT_MEDIA_CHANGE event.  This is
> a bit confusing, and I was wondering if there was an explanation for
> the following:
> 
> drivers/scsi/sr.c :: sr_probe()
> 
> disk->events = DISK_EVENT_MEDIA_CHANGE | DISK_EVENT_EJECT_REQUEST;
> ...
> cd->media_present = 1;
> 
> DISK_EVENT_MEDIA_CHANGE events will pass through to userspace and
> for some reason cd->media_present defaults to 1?  More on that below...

I don't have any concrete ideas but I think the only thing it's trying
to do is to always generate at least one changed event no matter what.

...
> sr_check_events() compares the previous (in this case, default)
> media_present value with what the TUR returns.  If it has changed, then
> turn on the DISK_EVENT_MEDIA_CHANGE event bit.
> 
> In my laptop USB DVD case, !scsi_status_is_good and sshdr.asc == 0x3a,
> so last_present (1) and cd->media_present (0) mis-compare and the change
> event is set.  That does not seem intuitive to me, is this a bug?

It's not incorrect.  We can try to change the behavior to avoid double
notifications but given that this has been like this for a really long
time and that it isn't technically incorrect, I'm not sure changing it
is a good idea.  It might as well break other things.

> Bringing this back to the reported BMC case, which presumably does have
> "media" present in the virtual device... is it reasonable to expect a
> DISK_EVENT_MEDIA_CHANGE even for a new device that contains media?  (I
> haven't verified, but in this case GET_EVENT_STATUS_NOTIFICATION might
> be enough to set media present.)

Yeah, I think so.

> If there is documentation that explains DISK_EVENT_MEDIA_CHANGE conditions
> somewhere, feel free to point me there.

AFAIK, there isn't any.  The only thing it tries to do is generating
at least one event after media change.  Given that media is reported
present after the last notification, I think userspace should be able
to do the right thing (and must have been doing the right thing until
recently).

Thanks.

-- 
tejun

Re: [PATCH] sd: add support for TCG OPAL self encrypting disks

2017-06-28 Thread Tejun Heo

On Mon, Jun 19, 2017 at 02:26:46PM +0200, Christoph Hellwig wrote:
> Just wire up the generic TCG OPAL infrastructure to the SCSI disk driver
> and the Security In/Out commands.
> 
> Note that I don't know of any actual SCSI disks that do support TCG OPAL,
> but this is required to support ATA disks through libata.
> 
> Signed-off-by: Christoph Hellwig 

Applied to libata/for-4.13 w/ Martin's ack added.

Thanks.

-- 
tejun

Re: [PATCH] libata: Support for an ATA PASS-THROUGH(32) command.

2017-06-27 Thread Tejun Heo

On Sat, Jun 24, 2017 at 03:41:10AM +0900, Minwoo Im wrote:
> SAT-4(SCSI/ATA Translation) supports for an ata pass-thru(32).
> This patch will allow to translate an ata pass-thru(32) SCSI cmd
> to an ATA cmd.
> 
> Signed-off-by: Minwoo Im 
> Reviewed-by: Bart Van Assche 

Applied to libata/for-4.13.

Thanks.

-- 
tejun

Re: [PATCH 02/11] dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs

2017-06-26 Thread Tejun Heo

On Mon, Jun 26, 2017 at 12:07:30AM -0700, Christoph Hellwig wrote:
> Tejun, does this look ok to you?

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun

Re: [PATCH 01/11] dma-mapping: remove dmam_free_noncoherent

2017-06-26 Thread Tejun Heo

On Mon, Jun 26, 2017 at 12:07:15AM -0700, Christoph Hellwig wrote:
> Tejun, does this look ok to you?

Sure,

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun

Re: [PATCH] sd: add support for TCG OPAL self encrypting disks

2017-06-26 Thread Tejun Heo

On Mon, Jun 26, 2017 at 12:43:27PM -0400, Martin K. Petersen wrote:
> 
> Christoph,
> 
> > ping?
> 
> Looks good to me. I'll queue it up for 4.13 as soon as Linus has pulled
> in the ata bits.

I can route it through libata tree w/ your ack if that's more convenient.

Thanks.

-- 
tejun

Re: TCG Opal support for libata

2017-06-05 Thread Tejun Heo

On Sun, Jun 04, 2017 at 02:42:19PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series adds support for using our new generic TCG OPAL code with
> SATA disks, and as side effect for SCSI disks (although so far it doesn't
> seem like none of those actually exist).

Applied 1-5 to libata/for-4.13.  Updated 4 for line continuation style
consistency as pointed out by Sergei.

Thanks.

-- 
tejun

Re: [PATCH 09/31] block: Avoid that blk_exit_rl() triggers a use-after-free

2017-05-24 Thread Tejun Heo

On Tue, May 23, 2017 at 05:33:58PM -0700, Bart Van Assche wrote:
> Since the introduction of the .init_rq_fn() and .exit_rq_fn() it
> is essential that the memory allocated for struct request_queue
> stays around until all blk_exit_rl() calls have finished. Hence
> make blk_init_rl() take a reference on struct request_queue.
> 
> This patch fixes the following crash:
> 
> general protection fault:  [#2] SMP
> CPU: 3 PID: 28 Comm: ksoftirqd/3 Tainted: G  D 4.12.0-rc2-dbg+ #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.0.0-prebuilt.qemu-project.org 04/01/2014
> task: 88013a108040 task.stack: c971c000
> RIP: 0010:free_request_size+0x1a/0x30
> RSP: 0018:c971fd38 EFLAGS: 00010202
> RAX: 6b6b6b6b6b6b6b6b RBX: 880067362a88 RCX: 0003
> RDX: 880067464178 RSI: 880067362a88 RDI: 880135ea4418
> RBP: c971fd40 R08:  R09: 000100180009
> R10: c971fd38 R11: 81110800 R12: 88006752d3d8
> R13: 88006752d3d8 R14: 88013a108040 R15: 000a
> FS:  () GS:88013fd8() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7fa8ec1edb00 CR3: 000138ee8000 CR4: 001406e0
> Call Trace:
>  mempool_destroy.part.10+0x21/0x40
>  mempool_destroy+0xe/0x10
>  blk_exit_rl+0x12/0x20
>  blkg_free+0x4d/0xa0
>  __blkg_release_rcu+0x59/0x170
>  rcu_process_callbacks+0x260/0x4e0
>  __do_softirq+0x116/0x250
>  smpboot_thread_fn+0x123/0x1e0
>  kthread+0x109/0x140
>  ret_from_fork+0x31/0x40
> 
> Fixes: commit e9c787e65c0c ("scsi: allocate scsi_cmnd structures as part of 
> struct request")
> Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com>
> Cc: Jens Axboe <ax...@fb.com>
> Cc: Christoph Hellwig <h...@lst.de>
> Cc: Tejun Heo <t...@kernel.org>
> Cc: Jan Kara <j...@suse.cz>
> Cc: Hannes Reinecke <h...@suse.com>
> Cc: <sta...@vger.kernel.org> # v4.11+

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun

Re: Race to power off harming SATA SSDs

2017-05-08 Thread Tejun Heo

Hello,

On Mon, May 08, 2017 at 08:56:15PM +0200, Pavel Machek wrote:
> Well... the SMART counter tells us that the device was not shut down
> correctly. Do we have reason to believe that it is _not_ telling us
> truth? It is more than one device.

It also finished power off command successfully.

> SSDs die when you power them without warning:
> http://lkcl.net/reports/ssd_analysis.html
> 
> What kind of data would you like to see? "I have been using linux and
> my SSD died"? We have had such reports. "I have killed 10 SSDs in a
> week then I added one second delay, and this SSD survived 6 months"?

Repeating shutdown cycles and showing that the device actually is in
trouble would be great.  It doesn't have to reach full-on device
failure.  Showing some sign of corruption would be enough - increase
in CRC failure counts, bad block counts (a lot of devices report
remaining reserve or lifetime in one way or the other) and so on.
Right now, it might as well be just the SMART counter being funky.

Thanks.

-- 
tejun

Re: Race to power off harming SATA SSDs

2017-05-08 Thread Tejun Heo

Hello,

On Mon, May 08, 2017 at 06:43:22PM +0200, Pavel Machek wrote:
> What I was trying to point out was that storage people try to treat
> SSDs as HDDs... and SSDs are very different. Harddrives mostly survive
> powerfails (with emergency parking), while it is very, very difficult
> to make SSD survive random powerfail, and we have to make sure we
> always powerdown SSDs "cleanly".

We do.

The issue raised is that some SSDs still increment the unexpected
power loss count even after clean shutdown sequence and that the
kernel should wait for some secs before powering off.

We can do that for select devices but I want something more than "this
SMART counter is getting incremented" before doing that.

Thanks.

-- 
tejun

Re: Race to power off harming SATA SSDs

2017-04-10 Thread Tejun Heo

Hello,

On Mon, Apr 10, 2017 at 08:21:19PM -0300, Henrique de Moraes Holschuh wrote:
...
> Per spec (and device manuals), SCSI, SATA and ATA-attached SSDs must be
> informed of an imminent poweroff to checkpoing background tasks, flush
> RAM caches and close logs.  For SCSI SSDs, you must tissue a
> START_STOP_UNIT (stop) command.  For SATA, you must issue a STANDBY
> IMMEDIATE command.  I haven't checked ATA, but it should be the same as
> SATA.

Yeah, it's the same.  Even hard drives are expected to survive a lot
of unexpected power losses tho.  They have to do emergency head
unloads but they're designed to withstand a healthy number of them.

> In order to comply with this requirement, the Linux SCSI "sd" device
> driver issues a START_STOP_UNIT command when the device is shutdown[1].
> For SATA SSD devices, the SCSI START_STOP_UNIT command is properly
> translated by the kernel SAT layer to STANDBY IMMEDIATE for SSDs.
> 
> After issuing the command, the kernel properly waits for the device to
> report that the command has been completed before it proceeds.
> 
> However, *IN PRACTICE*, SATA STANDBY IMMEDIATE command completion
> [often?] only indicates that the device is now switching to the target
> power management state, not that it has reached the target state.  Any
> further device status inquires would return that it is in STANDBY mode,
> even if it is still entering that state.
> 
> The kernel then continues the shutdown path while the SSD is still
> preparing itself to be powered off, and it becomes a race.  When the
> kernel + firmware wins, platform power is cut before the SSD has
> finished (i.e. the SSD is subject to an unclean power-off).

At that point, the device is fully flushed and in terms of data
integrity should be fine with losing power at any point anyway.

> Evidently, how often the SSD will lose the race depends on a platform
> and SSD combination, and also on how often the system is powered off.
> A sluggish firmware that takes its time to cut power can save the day...
> 
> 
> Observing the effects:
> 
> An unclean SSD power-off will be signaled by the SSD device through an
> increase on a specific S.M.A.R.T attribute.  These SMART attributes can
> be read using the smartmontools package from www.smartmontools.org,
> which should be available in just about every Linux distro.
> 
>   smartctl -A /dev/sd#
> 
> The SMART attribute related to unclean power-off is vendor-specific, so
> one might have to track down the SSD datasheet to know which attribute a
> particular SSD uses.  The naming of the attribute also varies.
> 
> For a Crucial M500 SSD with up-to-date firmware, this would be attribute
> 174 "Unexpect_Power_Loss_Ct", for example.
> 
> NOTE: unclean SSD power-offs are dangerous and may brick the device in
> the worst case, or otherwise harm it (reduce longevity, damage flash
> blocks).  It is also not impossible to get data corruption.

I get that the incrementing counters might not be pretty but I'm a bit
skeptical about this being an actual issue.  Because if that were
true, the device would be bricking itself from any sort of power
losses be that an actual power loss, battery rundown or hard power off
after crash.

> Testing, and working around the issue:
> 
> I've asked for several Debian developers to test a patch (attached) in
> any of their boxes that had SSDs complaining of unclean poweroffs.  This
> gave us a test corpus of Intel, Crucial and Samsung SSDs, on laptops,
> desktops, and a few workstations.
> 
> The proof-of-concept patch adds a delay of one second to the SD-device
> shutdown path.
> 
> Previously, the more sensitive devices/platforms in the test set would
> report at least one or two unclean SSD power-offs a month.  With the
> patch, there was NOT a single increase reported after several weeks of
> testing.
> 
> This is obviously not a test with 100% confidence, but it indicates very
> strongly that the above analysis was correct, and that an added delay
> was enough to work around the issue in the entire test set.
> 
> 
> 
> Fixing the issue properly:
> 
> The proof of concept patch works fine, but it "punishes" the system with
> too much delay.  Also, if sd device shutdown is serialized, it will
> punish systems with many /dev/sd devices severely.
> 
> 1. The delay needs to happen only once right before powering down for
>hibernation/suspend/power-off.  There is no need to delay per-device
>for platform power off/suspend/hibernate.
> 
> 2. A per-device delay needs to happen before signaling that a device
>can be safely removed when doing controlled hotswap (e.g. when
>deleting the SD device due to a sysfs command).
> 
> I am unsure how much *total* delay would be enough.  Two seconds seems
> like a safe bet.
> 
> Any comments?  Any clues on how to make the delay "smarter" to trigger
> only once during platform shutdown, but still trigger per-device when
> doing per-device hotswapping ?

So, if this is actually an issue, sure,

Re: [PATCH 0/2] Fix sysfs recursive removal splats in isci

2017-03-29 Thread Tejun Heo

Hello,

On Wed, Mar 29, 2017 at 11:41:07AM +0200, Johannes Thumshirn wrote:
> This series fixes a sysfs warning caused by isci not being able to cope with
> recursive sysfs path removals which are in place since commit bcdde7e
> ("sysfs: make __sysfs_remove_dir() recursive").

Thanks for fixing these.

> The mvsas, aic94xx and pm8001 and hisi_sas patches have been compile tested
> only hence they have no callstack of the affected path in their changelogs.
> 
> I'm not sure whether to mark this patches as stable or not. I tend to say no
> here, although we've seen complaints/bug reports on lkml and the scsi list.

Given that the failures aren't critical or all that common (only
happens on controller removal), I agree that not cc'ing stable is the
right call here.

Thanks.

-- 
tejun

Re: [PATCH v2] libsas: fix "sysfs group not found" warnings at port teardown time

2017-03-28 Thread Tejun Heo

Hello,

On Fri, Mar 24, 2017 at 05:53:54PM +0100, Johannes Thumshirn wrote:
> [ +Cc Tejun ]
> 
> On Fri, Mar 24, 2017 at 11:44:55AM +, John Garry wrote:
> > To be clear, was this the same test with isci which you initially reported?
> 
> Yes, just echo into the PCI device's sysfs remove file and it'll trigger the
> problem.
> 
> I did some archeology and it seems as if commit bcdde7e ("sysfs: make
> __sysfs_remove_dir() recursive") introduced/uncovered this behavior.

I couldn't reproduce it with other devices (don't have a sas controller).

> For reference, here's one of my calltraces (the first of 40!):
> [ cut here ]
> WARNING: CPU: 2 PID: 5 at fs/sysfs/group.c:241 sysfs_remove_group+0xc3/0xd0
> sysfs group 'power' not found for kobject 'end_device-6:0'
> CPU: 16 PID: 5884 Comm: repro.sh Not tainted 4.11.0-rc3-libsas+ #504
> Call Trace:
>  dump_stack+0x85/0xc2
>  __warn+0xc6/0xe0
>  warn_slowpath_fmt+0x4a/0x50
>  sysfs_remove_group+0xc3/0xd0
>  dpm_sysfs_remove+0x52/0x60
>  device_del+0x13c/0x360
>  ? device_remove_file+0x14/0x20
>  attribute_container_class_device_del+0x15/0x20
>  transport_remove_classdev+0x4c/0x60
>  ? transport_add_class_device+0x40/0x40
>  attribute_container_device_trigger+0xb3/0xc0
>  transport_remove_device+0x10/0x20
>  sas_port_delete+0x12d/0x160 [scsi_transport_sas]
>  sas_deform_port+0x1bf/0x1d0 [libsas]
>  sas_unregister_ports+0x36/0x50 [libsas]
>  sas_unregister_ha+0x1b/0x40 [libsas]
>  isci_unregister+0x2a/0x40 [isci]
>  isci_pci_remove+0x52/0xb0 [isci]
>  ? __pm_runtime_resume+0x56/0x80
>  pci_device_remove+0x34/0xb0
>  device_release_driver_internal+0x158/0x210
>  device_release_driver+0xd/0x10
>  pci_stop_bus_device+0x85/0x90
>  pci_stop_and_remove_bus_device_locked+0x15/0x30
>  remove_store+0x59/0x70
>  dev_attr_store+0x13/0x20
>  sysfs_kf_write+0x40/0x50
>  kernfs_fop_write+0x130/0x1b0
>  __vfs_write+0x23/0x130
>  ? rcu_read_lock_sched_held+0x6d/0x80
>  ? rcu_sync_lockdep_assert+0x2a/0x50
>  ? __sb_start_write+0xd7/0x1e0
>  ? vfs_write+0x1a4/0x1f0
>  vfs_write+0xc6/0x1f0
>  SyS_write+0x44/0xa0
>  entry_SYSCALL_64_fastpath+0x23/0xc6
> 
> But as I said, I don't belive this is a problem in the SAS transport or the
> SAS drivers, but a device core or transport class.

So, what's most likely happening is that the parent device or kobject
which contains the attribute group has already been removed earlier
and because the removal is recursive, the later explicit removal is
trying to remove already removed files.  It can be fixed either by
reordering so that the parent node is removed after the children or
simply dropping the explicit removal of children.

Thanks.

-- 
tejun

Re: support ranges TRIM for libata

2017-03-23 Thread Tejun Heo

Hello, Christoph.

On Thu, Mar 23, 2017 at 03:43:30PM +0100, Christoph Hellwig wrote:
> > That's up to you ... from the point of view of code documenting itself,
> > forming the ATA_16 TRIM in sd and not doing any satl transformation is
> > easier for others to follow, but if it's going to cause more code, I'm
> > only marginal on the advantages of easier to follow code.
> 
> I tried this earlier before giving up on it because it looked to ugly.
> But I can complete that version of it and post it for people to compare.

I kinda like the idea of sticking with satl as that's how libata has
been doing most things even if the implementation is uglier.  It'd be
great to find out whether the ugliness would be acceptable or too
much.

Thanks.

-- 
tejun

Re: support ranges TRIM for libata

2017-03-21 Thread Tejun Heo

Hello,

On Mon, Mar 20, 2017 at 04:43:12PM -0400, Christoph Hellwig wrote:
> This series implements rangeѕ discard for ATA SSDs.  Compared to the
> initial NVMe support there are two things that complicate the ATA
> support:
> 
>  - ATA only suports 16-bit long ranges
>  - the whole mess of generating a SCSI command first and then
>translating it to an ATA one.
> 
> This series adds support for limited range size to the block layer,
> and stops translating discard commands - instead we add a new
> Vendor Specific SCSI command that contains the TRIM payload when
> the device asks for it.

I do like the fact that this is a lot simpler than the previous
implementation but am not quite sure we want to deviate significantly
from what we do for other commands (command translation).  Is it
because fixing the existing implementation would involve invaisve
changes including memory allocations?

Thanks.

-- 
tejun

Re: [PATCH] libata: make ata_sg_clean static over again

2017-03-13 Thread Tejun Heo

On Fri, Mar 10, 2017 at 10:05:40AM +0800, Jason Yan wrote:
> Fixes the following sparse warning:
> 
> drivers/ata/libata-core.c:4913:6: warning: symbol 'ata_sg_clean' was not
> declared. Should it be static?
> 
> Signed-off-by: Jason Yan 

Applied to libata/for-4.12.

Thanks.

-- 
tejun

Re: [PATCH 1/4] block: Allow bdi re-registration

2017-03-09 Thread Tejun Heo

On Thu, Mar 09, 2017 at 11:16:21AM +0100, Jan Kara wrote:
> SCSI can call device_add_disk() several times for one request queue when
> a device in unbound and bound, creating new gendisk each time. This will
> lead to bdi being repeatedly registered and unregistered. This was not a
> big problem until commit 165a5e22fafb "block: Move bdi_unregister() to
> del_gendisk()" since bdi was only registered repeatedly (bdi_register()
> handles repeated calls fine, only we ended up leaking reference to
> gendisk due to overwriting bdi->owner) but unregistered only in
> blk_cleanup_queue() which didn't get called repeatedly. After
> 165a5e22fafb we were doing correct bdi_register() - bdi_unregister()
> cycles however bdi_unregister() is not prepared for it. So make sure
> bdi_unregister() cleans up bdi in such a way that it is prepared for
> a possible following bdi_register() call.
> 
> An easy way to provoke this behavior is to enable
> CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a
> scsi disk which immediately hangs without this fix.
> 
> Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820
> Tested-by: Omar Sandoval <osan...@fb.com>
> Signed-off-by: Jan Kara <j...@suse.cz>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks!

-- 
tejun

Re: [PATCH 1/4] block: Allow bdi re-registration

2017-03-08 Thread Tejun Heo

Hello,

On Wed, Mar 08, 2017 at 05:48:31PM +0100, Jan Kara wrote:
> @@ -710,6 +710,11 @@ static void cgwb_bdi_destroy(struct backing_dev_info 
> *bdi)
>*/
>   atomic_dec(>usage_cnt);
>   wait_event(cgwb_release_wait, !atomic_read(>usage_cnt));
> + /*
> +  * Grab back our reference so that we hold it when @bdi gets
> +  * re-registered.
> +  */
> + atomic_inc(>usage_cnt);

So, this is more re-initializing the ref to the initial state so that
it can be re-used, right?  Maybe ATOMIC_INIT() is a better choice here
just to clarify what's going on?

Thanks.

-- 
tejun

Re: [PATCH 3/4] block: Make del_gendisk() safer for disks without queues

2017-03-08 Thread Tejun Heo

On Wed, Mar 08, 2017 at 05:48:33PM +0100, Jan Kara wrote:
> Commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()"
> added disk->queue dereference to del_gendisk(). Although del_gendisk()
> is not supposed to be called without disk->queue valid and
> blk_unregister_queue() warns in that case, this change will make it oops
> instead. Return to the old more robust behavior of just warning when
> del_gendisk() gets called for gendisk with disk->queue being NULL.
> 
> Reported-by: Dan Carpenter <dan.carpen...@oracle.com>
> Signed-off-by: Jan Kara <j...@suse.cz>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun

Re: [PATCH 2/4] bdi: Fix use-after-free in wb_congested_put()

2017-03-08 Thread Tejun Heo

On Wed, Mar 08, 2017 at 05:48:32PM +0100, Jan Kara wrote:
> bdi_writeback_congested structures get created for each blkcg and bdi
> regardless whether bdi is registered or not. When they are created in
> unregistered bdi and the request queue (and thus bdi) is then destroyed
> while blkg still holds reference to bdi_writeback_congested structure,
> this structure will be referencing freed bdi and last wb_congested_put()
> will try to remove the structure from already freed bdi.
> 
> With commit 165a5e22fafb "block: Move bdi_unregister() to
> del_gendisk()", SCSI started to destroy bdis without calling
> bdi_unregister() first (previously it was calling bdi_unregister() even
> for unregistered bdis) and thus the code detaching
> bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we
> started hitting this use-after-free bug. It is enough to boot a KVM
> instance with virtio-scsi device to trigger this behavior.
> 
> Fix the problem by detaching bdi_writeback_congested structures in
> bdi_exit() instead of bdi_unregister(). This is also more logical as
> they can get attached to bdi regardless whether it ever got registered
> or not.
> 
> Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820
> Signed-off-by: Jan Kara <j...@suse.cz>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun

Re: [PATCH v2] ata: xgene: Enable NCQ support for APM X-Gene SATA controller hardware v1.1

2017-01-18 Thread Tejun Heo

Hello,

On Tue, Jan 17, 2017 at 08:25:21PM +0530, Rameshwar Sahu wrote:
> Hi Tejun,
> 
> On Fri, Nov 18, 2016 at 3:15 PM, Rameshwar Prasad Sahu  wrote:
> > This patch enables NCQ support for APM X-Gene SATA controller hardware v1.1
> > that was broken with hardware v1.0. Second thing, here we should not assume
> > XGENE_AHCI_V2 always in case of having valid _CID in ACPI table. I need to
> > remove this assumption because V1_1 also has a valid _CID for backward
> > compatibly with v1.
> >
> > v2 changes:
> > 1. Changed patch description
> >
> > Signed-off-by: Rameshwar Prasad Sahu 

Hmm... I don't have the patch in my queue.  Can you please resend it?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ata: xgene: Enable NCQ support for APM X-Gene SATA controller hardware v1.1

2016-11-15 Thread Tejun Heo

Hello, Rameshwar.

On Fri, Nov 11, 2016 at 01:36:28PM +0530, Rameshwar Sahu wrote:
> Hi Tejun,
> 
> On Wed, Nov 9, 2016 at 10:15 PM, Tejun Heo <t...@kernel.org> wrote:
> > Hello,
> >
> > On Wed, Sep 14, 2016 at 04:15:00PM +0530, Rameshwar Sahu wrote:
> >> > @@ -821,8 +823,6 @@ static int xgene_ahci_probe(struct platform_device
> >> > *pdev)
> >> > dev_warn(>dev, "%s: Error reading
> >> > device info. Assume version1\n",
> >> > __func__);
> >> > version = XGENE_AHCI_V1;
> >> > -   } else if (info->valid & ACPI_VALID_CID) {
> >> > -   version = XGENE_AHCI_V2;
> >
> > Can you please explain this part a bit?  Everything else looks good to
> > me.
> 
> Here we should not assume XGENE_AHCI_V2 always in case of having valid
> _CID in ACPI table.
> I need to remove this assumption because V1_1 has also valid _CID for
> backward compatibly with v1.

Can you please repost with the above explanation added to the commit
message?

Thanks!

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ata: xgene: Enable NCQ support for APM X-Gene SATA controller hardware v1.1

2016-11-09 Thread Tejun Heo

Hello,

On Wed, Sep 14, 2016 at 04:15:00PM +0530, Rameshwar Sahu wrote:
> > @@ -821,8 +823,6 @@ static int xgene_ahci_probe(struct platform_device
> > *pdev)
> > dev_warn(>dev, "%s: Error reading
> > device info. Assume version1\n",
> > __func__);
> > version = XGENE_AHCI_V1;
> > -   } else if (info->valid & ACPI_VALID_CID) {
> > -   version = XGENE_AHCI_V2;

Can you please explain this part a bit?  Everything else looks good to
me.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] mvsas: fix error return code in mvs_task_prep()

2016-11-01 Thread Tejun Heo

On Mon, Oct 31, 2016 at 10:29:26AM -0600, Tejun Heo wrote:
> On Mon, Oct 31, 2016 at 03:04:10PM +, Wei Yongjun wrote:
> > From: Wei Yongjun <weiyongj...@huawei.com>
> > 
> > Fix to return error code -ENOMEM from the error handling
> > case instead of 0, as done elsewhere in this function.
> > 
> > Signed-off-by: Wei Yongjun <weiyongj...@huawei.com>
> 
> Applied to libata/for-4.9-fixes.

I messed up.  This should have gone through scsi, not libata.  Chatted
with Martin and as the patch is a small obvious fix decided to leave
it in libata tree.  My apologies for the confusion.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] mvsas: fix error return code in mvs_task_prep()

2016-10-31 Thread Tejun Heo

On Mon, Oct 31, 2016 at 03:04:10PM +, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> Fix to return error code -ENOMEM from the error handling
> case instead of 0, as done elsewhere in this function.
> 
> Signed-off-by: Wei Yongjun 

Applied to libata/for-4.9-fixes.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Tejun Heo

Hello,

Removed ata_ncq_prio_on() and renamed _on to _enable.  If I messed up
anything, please let me know.

Also, can you please send a follow-up patch to make the store function
reject prio enabling if the device doesn't support it?

Thanks.

-- 8< --
>From 84f95243b5439a20c33837075b88926bfa00c4ec Mon Sep 17 00:00:00 2001
From: Adam Manzanares <adam.manzana...@hgst.com>
Date: Mon, 17 Oct 2016 11:27:30 -0700
Subject: [PATCH 2/2] ata: ATA Command Priority Disabled By Default

Add a sysfs entry to turn on priority information being passed
to a ATA device. By default this feature is turned off.

This patch depends on ata: Enabling ATA Command Priorities

tj: Renamed ncq_prio_on to ncq_prio_enable and removed trivial
ata_ncq_prio_on() and open-coded the test.

Signed-off-by: Adam Manzanares <adam.manzana...@hgst.com>
Signed-off-by: Tejun Heo <t...@kernel.org>
---
 drivers/ata/libahci.c |  1 +
 drivers/ata/libata-core.c |  3 ++-
 drivers/ata/libata-scsi.c | 68 +++
 include/linux/libata.h|  2 ++
 4 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 0d028ea..ee7db31 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -140,6 +140,7 @@ EXPORT_SYMBOL_GPL(ahci_shost_attrs);
 struct device_attribute *ahci_sdev_attrs[] = {
_attr_sw_activity,
_attr_unload_heads,
+   _attr_ncq_prio_enable,
NULL
 };
 EXPORT_SYMBOL_GPL(ahci_sdev_attrs);
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 8346faf..b294339 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -787,7 +787,8 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct 
ata_device *dev,
if (tf->flags & ATA_TFLAG_FUA)
tf->device |= 1 << 7;
 
-   if (dev->flags & ATA_DFLAG_NCQ_PRIO) {
+   if ((dev->flags & ATA_DFLAG_NCQ_PRIO) &&
+   (dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLE)) {
if (class == IOPRIO_CLASS_RT)
tf->hob_nsect |= ATA_PRIO_HIGH <<
 ATA_SHIFT_PRIO;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 2bccc3c..87597a3 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -271,6 +271,73 @@ DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR,
ata_scsi_park_show, ata_scsi_park_store);
 EXPORT_SYMBOL_GPL(dev_attr_unload_heads);
 
+static ssize_t ata_ncq_prio_enable_show(struct device *device,
+   struct device_attribute *attr, char 
*buf)
+{
+   struct scsi_device *sdev = to_scsi_device(device);
+   struct ata_port *ap;
+   struct ata_device *dev;
+   bool ncq_prio_enable;
+   int rc = 0;
+
+   ap = ata_shost_to_port(sdev->host);
+
+   spin_lock_irq(ap->lock);
+   dev = ata_scsi_find_dev(ap, sdev);
+   if (!dev) {
+   rc = -ENODEV;
+   goto unlock;
+   }
+
+   ncq_prio_enable = dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLE;
+
+unlock:
+   spin_unlock_irq(ap->lock);
+
+   return rc ? rc : snprintf(buf, 20, "%u\n", ncq_prio_enable);
+}
+
+static ssize_t ata_ncq_prio_enable_store(struct device *device,
+struct device_attribute *attr,
+const char *buf, size_t len)
+{
+   struct scsi_device *sdev = to_scsi_device(device);
+   struct ata_port *ap;
+   struct ata_device *dev;
+   long int input;
+   unsigned long flags;
+   int rc;
+
+   rc = kstrtol(buf, 10, );
+   if (rc)
+   return rc;
+   if ((input < 0) || (input > 1))
+   return -EINVAL;
+
+   ap = ata_shost_to_port(sdev->host);
+
+   spin_lock_irqsave(ap->lock, flags);
+   dev = ata_scsi_find_dev(ap, sdev);
+   if (unlikely(!dev)) {
+   rc = -ENODEV;
+   goto unlock;
+   }
+
+   if (input)
+   dev->flags |= ATA_DFLAG_NCQ_PRIO_ENABLE;
+   else
+   dev->flags &= ~ATA_DFLAG_NCQ_PRIO_ENABLE;
+
+unlock:
+   spin_unlock_irqrestore(ap->lock, flags);
+
+   return rc ? rc : len;
+}
+
+DEVICE_ATTR(ncq_prio_enable, S_IRUGO | S_IWUSR,
+   ata_ncq_prio_enable_show, ata_ncq_prio_enable_store);
+EXPORT_SYMBOL_GPL(dev_attr_ncq_prio_enable);
+
 void ata_scsi_set_sense(struct ata_device *dev, struct scsi_cmnd *cmd,
u8 sk, u8 asc, u8 ascq)
 {
@@ -402,6 +469,7 @@ EXPORT_SYMBOL_GPL(dev_attr_sw_activity);
 
 struct device_attribute *ata_common_sdev_attrs[] = {
_attr_unload_heads,
+   _attr_ncq_prio_enable,
NULL
 };
 EXPORT_SYMBOL_GPL(ata_common_sdev_attrs);
diff --git a/include/linux/libata.h b/includ

Re: [PATCH v6 0/3] Enabling ATA Command Priorities

2016-10-19 Thread Tejun Heo

On Mon, Oct 17, 2016 at 11:27:27AM -0700, Adam Manzanares wrote:
> This patch builds ATA commands with high priority if the iocontext of a 
> process
> is set to real time. The goal of the patch is to improve tail latencies of 
> workloads that use higher queue depths. This requires setting the iocontext 
> ioprio on the request when it is initialized
> 
> This patch has been tested with an Ultrastar HE8 HDD and cuts the 
> the p99.99 tail latency of foreground IO from 2s down to 72ms when
> using the deadline scheduler. This patch works independently of the
> scheduler so it can be used with all of the currently available 
> request based schedulers. 
> 
> Foreground IO, for the previously described results, is an async fio job 
> submitting 4K read requests at a QD of 1 to the HDD. The foreground IO is set 
> with the iopriority class of real time. The background workload is another fio
> job submitting read requests at a QD of 32 to the same HDD with default 
> iopriority.
> 
> This feature is enabled for ATA devices by setting the ata ncq_prio_on device 
> attribute to 1. An ATA device is also checked to see if the device supports 
> per
> command priority.

Applied 1-3 to libata/for-4.10 w/ some modifications.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 2/3] ata: Enabling ATA Command Priorities

2016-10-19 Thread Tejun Heo

Hello,

I removed ata_ncq_prio_enabled() as it was colliding with the sysfs
file rename and these trivial test functions tend to obscure more than
help anything.

Thanks.

-- 8< --
>From 8e061784b51ec4a4efed0deaafb5bd9725bf5b06 Mon Sep 17 00:00:00 2001
From: Adam Manzanares <adam.manzana...@hgst.com>
Date: Mon, 17 Oct 2016 11:27:29 -0700
Subject: [PATCH 1/2] ata: Enabling ATA Command Priorities

This patch checks to see if an ATA device supports NCQ command priorities.
If so and the user has specified an iocontext that indicates
IO_PRIO_CLASS_RT then we build a tf with a high priority command.

This is done to improve the tail latency of commands that are high
priority by passing priority to the device.

tj: Removed trivial ata_ncq_prio_enabled() and open-coded the test.

Signed-off-by: Adam Manzanares <adam.manzana...@hgst.com>
Signed-off-by: Tejun Heo <t...@kernel.org>
---
 drivers/ata/libata-core.c | 35 ++-
 drivers/ata/libata-scsi.c |  6 +-
 drivers/ata/libata.h  |  2 +-
 include/linux/ata.h   |  6 ++
 include/linux/libata.h|  3 +++
 5 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 223a770..8346faf 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -739,6 +739,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct 
ata_device *dev)
  * @n_block: Number of blocks
  * @tf_flags: RW/FUA etc...
  * @tag: tag
+ * @class: IO priority class
  *
  * LOCKING:
  * None.
@@ -753,7 +754,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct 
ata_device *dev)
  */
 int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
u64 block, u32 n_block, unsigned int tf_flags,
-   unsigned int tag)
+   unsigned int tag, int class)
 {
tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
tf->flags |= tf_flags;
@@ -785,6 +786,12 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct 
ata_device *dev,
tf->device = ATA_LBA;
if (tf->flags & ATA_TFLAG_FUA)
tf->device |= 1 << 7;
+
+   if (dev->flags & ATA_DFLAG_NCQ_PRIO) {
+   if (class == IOPRIO_CLASS_RT)
+   tf->hob_nsect |= ATA_PRIO_HIGH <<
+ATA_SHIFT_PRIO;
+   }
} else if (dev->flags & ATA_DFLAG_LBA) {
tf->flags |= ATA_TFLAG_LBA;
 
@@ -2156,6 +2163,30 @@ static void ata_dev_config_ncq_non_data(struct 
ata_device *dev)
}
 }
 
+static void ata_dev_config_ncq_prio(struct ata_device *dev)
+{
+   struct ata_port *ap = dev->link->ap;
+   unsigned int err_mask;
+
+   err_mask = ata_read_log_page(dev,
+ATA_LOG_SATA_ID_DEV_DATA,
+ATA_LOG_SATA_SETTINGS,
+ap->sector_buf,
+1);
+   if (err_mask) {
+   ata_dev_dbg(dev,
+   "failed to get Identify Device data, Emask 0x%x\n",
+   err_mask);
+   return;
+   }
+
+   if (ap->sector_buf[ATA_LOG_NCQ_PRIO_OFFSET] & BIT(3))
+   dev->flags |= ATA_DFLAG_NCQ_PRIO;
+   else
+   ata_dev_dbg(dev, "SATA page does not support priority\n");
+
+}
+
 static int ata_dev_config_ncq(struct ata_device *dev,
   char *desc, size_t desc_sz)
 {
@@ -2205,6 +2236,8 @@ static int ata_dev_config_ncq(struct ata_device *dev,
ata_dev_config_ncq_send_recv(dev);
if (ata_id_has_ncq_non_data(dev->id))
ata_dev_config_ncq_non_data(dev);
+   if (ata_id_has_ncq_prio(dev->id))
+   ata_dev_config_ncq_prio(dev);
}
 
return 0;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 9cceb4a..2bccc3c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "libata.h"
 #include "libata-transport.h"
@@ -1755,6 +1756,8 @@ static unsigned int ata_scsi_rw_xlat(struct 
ata_queued_cmd *qc)
 {
struct scsi_cmnd *scmd = qc->scsicmd;
const u8 *cdb = scmd->cmnd;
+   struct request *rq = scmd->request;
+   int class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq));
unsigned int tf_flags = 0;
u64 block;
u32 n_block;
@@ -1821,7 +1824,8 @@ static unsigned int ata_scsi_rw_xlat(struct 
ata_queued_cmd *qc)
qc->nbytes = n_block * scmd->device->sector_size;
 
rc = ata_build_rw_tf(>tf, qc->dev, block, n_block,

Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Tejun Heo

On Mon, Oct 17, 2016 at 11:27:30AM -0700, Adam Manzanares wrote:
> Add a sysfs entry to turn on priority information being passed
> to a ATA device. By default this feature is turned off.
> 
> This patch depends on ata: Enabling ATA Command Priorities
> 
> Signed-off-by: Adam Manzanares 
> ---
>  drivers/ata/libahci.c |  1 +
>  drivers/ata/libata-core.c |  2 +-
>  drivers/ata/libata-scsi.c | 68 
> +++
>  include/linux/libata.h|  7 +
>  4 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
> index 0d028ea..0e17285 100644
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -140,6 +140,7 @@ EXPORT_SYMBOL_GPL(ahci_shost_attrs);
>  struct device_attribute *ahci_sdev_attrs[] = {
>   _attr_sw_activity,
>   _attr_unload_heads,
> + _attr_ncq_prio_on,

I'll rename it to ncq_prio_enable while applying but otherwise looks
good to me.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 1/3] block: Add iocontext priority to request

2016-10-19 Thread Tejun Heo

Hello,

On Mon, Oct 17, 2016 at 11:27:28AM -0700, Adam Manzanares wrote:
> Patch adds an association between iocontext ioprio and the ioprio of a
> request. This is done to enable request based drivers the ability to
> act on priority information stored in the request. An example being
> ATA devices that support command priorities. If the ATA driver discovers
> that the device supports command priorities and the request has valid
> priority information indicating the request is high priority, then a high
> priority command can be sent to the device. This should improve tail
> latencies for high priority IO on any device that queues requests
> internally and can make use of the priority information stored in the
> request.
> 
> The ioprio of the request is set in blk_rq_set_prio which takes the
> request and the ioc as arguments. If the ioc is valid in blk_rq_set_prio
> then the iopriority of the request is set as the iopriority of the ioc.
> In init_request_from_bio a check is made to see if the ioprio of the bio
> is valid and if so then the request prio comes from the bio.
> 
> Signed-off-by: Adam Manzananares 

Jens, if you're okay with it, I can route this through
libata/for-4.10, or this can be applied to block and libata tree can
pull from it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 3/4] ata: Enabling ATA Command Priorities

2016-10-13 Thread Tejun Heo

Hello,

On Thu, Oct 13, 2016 at 04:00:30PM -0700, Adam Manzanares wrote:
> This patch checks to see if an ATA device supports NCQ command priorities.
> If so and the user has specified an iocontext that indicates
> IO_PRIO_CLASS_RT then we build a tf with a high priority command.
> 
> This is done to improve the tail latency of commands that are high
> priority by passing priority to the device.
> 
> Signed-off-by: Adam Manzanares 

Looks good to me.  Once the block changes are applied, I'll pull it
into libata tree and apply the ata part on top.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Some drives failing on SCT Write Same

2016-09-09 Thread Tejun Heo

On Fri, Sep 09, 2016 at 11:44:19AM -0500, Shaun Tancheff wrote:
> Restrict support SCT Write Same to devices which also support ZAC where 
> support is required.
> 
> Reported-by: Mike Krinkin 
> Signed-off-by: Shaun Tancheff 

Applied to libata/for-4.9.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: patch "libata: Add support for SCT Write Same" breaks system

2016-09-09 Thread Tejun Heo

Hello, Shaun.

On Fri, Sep 09, 2016 at 10:26:44AM -0500, Shaun Tancheff wrote:
> I'm looking into it now. Let me see if I can reproduce this on any of my
> hardware.
> 
> If not there are a couple of options ... one is to only enable for ZBC
> devices
> where this explicitly required by the spec.
> 
> Or disable for devices that report support trim?

I'd much prefer enabling this only on ZBC devices.  There isn't any
real benefits to !ZBC devices, right?  Using non-essential features on
ATA never goes well.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: patch "libata: Add support for SCT Write Same" breaks system

2016-09-09 Thread Tejun Heo

Hello,

On Thu, Sep 08, 2016 at 10:27:37PM +0300, Mike Krinkin wrote:
> Hi,
> 
> i tried recent linux-next on my laptop, and after boot system is
> almost unusable because most of apps just crash with segfaults and
> in dmesg output there are a lot of errors like this:
...
> git bisect points at commit 7b20309428598df00ffe ("libata: Add support for SCT
> Write Same". I temporary fixed problem with the following change:

Shaun, any ideas?  If this isn't an easy fix, I'm gonna disable
write_same for all devices for now.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] fusion: mptbase: Remove deprecated create_singlethread_workqueue

2016-08-31 Thread Tejun Heo

On Wed, Aug 31, 2016 at 12:33:25AM +0530, Bhaktipriya Shridhar wrote:
> The workqueues "ioc->reset_work_q" and "ioc->fw_event_q" queue a single
> work item >fault_reset_work and _event->work, respectively and
> hence don't require ordering. Hence, they have been converted to use
> alloc_workqueue().
> 
> The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
> memory pressure since the workqueue belongs to a storage driver which is
> being used on a memory reclaim path.
> 
> Since there are fixed number of work items, explicit concurrency
> limit is unnecessary here.
> 
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriy...@gmail.com>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] fusion/mptfc: Remove deprecated create_singlethread_workqueue

2016-08-31 Thread Tejun Heo

On Wed, Aug 31, 2016 at 12:33:05AM +0530, Bhaktipriya Shridhar wrote:
> The workqueue "fc_rescan_work_q" queues multiple work items viz
> >fc_rescan_work, >fc_lsc_work, >fc_setup_reset_work,
> which require strict execution ordering.
> Hence, an ordered dedicated workqueue has been used.
> 
> WQ_MEM_RECLAIM has been set since the workqueue is belongs to a storage
> driver which is being used on a memory reclaim path and hence, requires
> forward progress under memory pressure.
> 
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriy...@gmail.com>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Time to make dynamically allocated devt the default for scsi disks?

2016-08-13 Thread Tejun Heo

Hello, Dan.

On Fri, Aug 12, 2016 at 02:29:30PM -0700, Dan Williams wrote:
> Before spending effort trying to flush the destruction of old bdi
> instances before new ones are registered, is it rather time to
> complete the conversion of sd to only use dynamically allocated devt?

I think that probably would be too disruptive to userland, both
programs and humans, that has been seeing the 8:N numbers forever now.
I haven't been following this particular issue but in general I think
we should switch to the regular "a depdendee object is pinned till all
the dependents are gone" pattern.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/2] libata-core: do not set dev->max_sectors for LBA48 devices

2016-08-10 Thread Tejun Heo

Hello, Tom.

On Wed, Aug 10, 2016 at 04:32:39PM +0800, Tom Yan wrote:
> I have to admit that libata may not be the right place to deal with my
> concern over the current BLK_DEF_MAX_SECTORS, which seems non-sensical
> to me. In the original commit message:
> 
> (d2be537c3ba3, "block: bump BLK_DEF_MAX_SECTORS to 2560")
> "A value of 2560 (1280k) will accommodate a 10-data-disk stripe write
> with chunk size 128k...a value of 1280 does not show a big performance
> difference from 512, but will hopefully help software RAID setups
> using SATA disks, as reported by Christoph."
> 
> So I have no idea at all why the bump was allowed. What so special

In general, people are okay with upping the limits as long as there
are actual use cases which can benefit.  The devices are getting
constantly faster and the hardware limits are going up along with it
after all.

> about "10-data-disk stripe write with chunk size 128k", that we would
> want to make a block layer general default base on that?

It's not special.  It's just an actual use case which can benefit from
the raised limit.

> The macro appeared to be used by the aoeblk driver only. Yet since a
> later commit (ca369d51b3e1, "block/sd: Fix device-imposed transfer
> length limits"), the scsi disk driver will use it to set the effective
> max_sectors(_kb) for devices that does not report Optimal Transfer
> Length in the Block Limit VPD (which is the case of libata's SATL).
>
> So the consequence is, ATA drives with max_hw_sectors(_kb) (i.e.
> dev->max_sectors set in libata-core.c) set to higher than
> BLK_DEF_MAX_SECTORS will end up having it as the effective
> max_sectors. Not only the value is non-sensical, but also the logic.
> (Btw, ATA_HORKAGE_MAX_SEC_LBA48 will sort of become ineffective
> because of this.)
>
> Now let's just come back to libata. I've thought of reporting
> dev->max_sectors as Optimal Transfer Length in the SATL. However, I am
> not sure if it is a safe thing to do, because we set it as high as
> 65535 for devices with LBA48 devices. Does such a high max_sectors
> ever make sense in libata's case?

libata is just exposing whatever the underlying hardware supports.
Whether that should be exposed as optimal transfer length is a
different matter.

> That's why I took the approach of the USB Attached SCSI driver. That
> is, we only explicitly set max_hw_sectors in cases that is strictly
> necessary (e.g. no LBA48 support, horkages). Otherwise we'll leave
> request queue as-is.

I see.  Alright, let's do that.  Can you please respin the patch with
more detailed description?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-10 Thread Tejun Heo

Hello, Tom.

On Wed, Aug 10, 2016 at 06:04:10PM +0800, Tom Yan wrote:
> On 10 August 2016 at 11:26, Tejun Heo <t...@kernel.org> wrote:
> > Hmmm.. why not?  The hardware limit is 64k and the driver is using a
> 
> Is that referring to the maximum number of entries allowed in the
> PRDT, Physical Region Descriptor Table (which is, more precisely,
> 65535)?

Yeap.

> > Not necessarily.  A single sg entry can point to an area larger than
> > PAGE_SIZE.
> 
> You mean the 4MB limit of "Data Byte Count" in "DW3: Description
> Information" of the PRDT? Is that what max_segment_size (which is set
> to a general fallback of 65536:
> http://lxr.free-electrons.com/ident?i=dma_get_max_seg_size) is about
> in this case?

Ah, ahci isn't setting the hardware limit properly but yeah that's the
maximum segment size.

> And my point was, it will be a multiple of 168 anyway, if 1344 is just
> an example.
> 
> > As written above, that probably makes the ahci command table size
> > nicely aligned.
> 
> I think that's what bothers me ultimately, cause I don't see how 168
> makes it (more) nicely aligned (or even, aligned to what?).

Hmmm... Looked at the sizes and they don't seem to align to anything
meaningful.  No idea.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/2] libata-core: do not set dev->max_sectors for LBA48 devices

2016-08-09 Thread Tejun Heo

Hello,

On Tue, Aug 09, 2016 at 10:45:47PM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> Currently block layer limit max_hw_sectors is set to
> ATA_MAX_SECTORS_LBA48 (65535), for devices with LBA48 support.
> 
> However, block layer limit max_sectors (which is the effective
> one; also adjustable, upper-bounded by max_hw_sectors) is set to
> BLK_DEF_MAX_SECTORS (currently 2560) by the scsi disk driver,
> since libata's SATL does not report an Optimal Transfer Length.
> 
> This does not make much sense, especially when the current
> BLK_DEF_MAX_SECTORS appears to be unsafe for some ATA devices
> (see ATA_HORKAGE_MAX_SEC_1024). Truth is, the current value
> appears to be arbitrary anyway. See commit d2be537c3ba3
> ("block: bump BLK_DEF_MAX_SECTORS to 2560").
> 
> Therefore, avoid setting dev->max_sectors when it is strictly
> necessary. Leave it as 0 otherwise, so that both block layer
> limits will remain as SCSI_DEFAULT_MAX_SECTORS (currently 1024).

These changes feel rather gratuitous.  What's the upside here?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-09 Thread Tejun Heo

Hello, Tom.

On Sun, Aug 07, 2016 at 10:10:17PM +0800, Tom Yan wrote:
> So the (not so) recent bump of BLK_DEF_MAX_SECTORS from 1024 to 2560
> (commit d2be537c3ba3) seemed to have caused trouble to some of the ATA
> devices, which were then worked around with ATA_HORKAGE_MAX_SEC_1024.
> 
> However, I am suspecting that the bump of BLK_DEF_MAX_SECTORS is not
> the "real" cause of the trouble, but the fact that AHCI_MAX_SG has
> been set to a weird value of 168 (with a comment "hardware max is
> 64K", which neither seem to make any sense).

Hmmm.. why not?  The hardware limit is 64k and the driver is using a
lower limit of 168 most likely because it doesn't make noticeable
difference beyond certain point and it determines the size of
contiguous memory which has to be allocated for the command table.
Each sg entry is 16 bytes.  Pushing it to the hardware limit would
require an order 9 allocation for each port.

> AHCI_MAX_SG is used to set the sg_tablesize (i.e. max_segments,
> apparently), which is apparently used to derive the actual "request
> size" (that is, if it is lower than max_sectors(_kb), it will be the
> limiting factor instead).
>
> For example, no matter if the drive has max_sectors set to 2560, or to
> 65535 (by adding it as the Optimal Transfer Length to libata's SATL,
> which is also max_hw_sectors that is set from ATA_MAX_SECTORS_LBA48),
> "avgrq-sz" in `iostat` will be capped at 1344 (168 * 8).

Not necessarily.  A single sg entry can point to an area larger than
PAGE_SIZE.

> However, if I change AHCI_MAX_SG to 128 (which is also the
> sg_tablesize set in libata.h from LIBATA_MAX_PRD), "avgrq-sz" in
> `iostat` will be capped at 1024 (128 * 8), which should make
> ATA_HORKAGE_MAX_SEC_1024 unnecessary.
> 
> So why has AHCI_MAX_SG been set to 168 anyway?

As written above, that probably makes the ahci command table size
nicely aligned.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH resend 5/5] libata-scsi: fix MODE SELECT translation for Control mode page

2016-08-09 Thread Tejun Heo

On Fri, Jul 22, 2016 at 02:41:54AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> scsi_done() was called repeatedly and apparently because of that,
> the kernel would call trace when we touch the Control mode page:
> 
> Call Trace:
>  [] dump_stack+0x63/0x81
>  [] __warn+0xcb/0xf0
>  [] warn_slowpath_null+0x1d/0x20
>  [] ata_eh_finish+0xe0/0xf0 [libata]
>  [] sata_pmp_error_handler+0x640/0xa50 [libata]
>  [] ahci_error_handler+0x1d/0x70 [libahci]
>  [] ata_scsi_port_error_handler+0x430/0x770 [libata]
>  [] ? ata_scsi_cmd_error_handler+0xdd/0x160 [libata]
>  [] ata_scsi_error+0xa7/0xf0 [libata]
>  [] scsi_error_handler+0xaa/0x560 [scsi_mod]
>  [] ? scsi_eh_get_sense+0x180/0x180 [scsi_mod]
>  [] kthread+0xd8/0xf0
>  [] ret_from_fork+0x1f/0x40
>  [] ? kthread_worker_fn+0x170/0x170
> ---[ end trace 8b7501047e928a17 ]---
> 
> Removed the unnecessary code and let ata_scsi_translate() do the job.
> 
> Also, since ata_mselect_control() has no ATA command to send to the
> device, ata_scsi_mode_select_xlat() should return 1 for it, so that
> ata_scsi_translate() will finish early to avoid ata_qc_issue().
> 
> Signed-off-by: Tom Yan 

Applied to libata/for-4.9.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH resend v3 3/5] libata-scsi: use u8 array to store mode page copy

2016-08-09 Thread Tejun Heo

On Sat, Jul 23, 2016 at 02:34:08AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> ata_mselect_*() would initialize a char array for storing a copy of
> the current mode page. However, char could be signed char. In that
> case, bytes larger than 127 would be converted to negative number.
> 
> For example, 0xff from def_control_mpage[] would become -1. This
> prevented ata_mselect_control() from working at all, since when it
> did the read-only bits check, there would always be a mismatch.
> 
> Signed-off-by: Tom Yan 

Applied to libata/for-4.9.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH resend 5/5] libata-scsi: fix MODE SELECT translation for Control mode page

2016-07-25 Thread Tejun Heo

On Fri, Jul 22, 2016 at 05:50:18AM +0800, Tom Yan wrote:
> As I've mentioned in the comment/message, there is no ATA command
> needed to be sent to the device, since it only toggles a bit in
> dev->flags. See that there is no ata_taskfile constructed in
> ata_mselect_control().

But ata_mselect_caching() contructs a tf?

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH resend 3/5] libata-scsi: fix overflow in mode page copy

2016-07-21 Thread Tejun Heo

On Fri, Jul 22, 2016 at 05:39:27AM +0800, Tom Yan wrote:
> Let me know how I should polish the description for this.

The above is because the signed ones are getting sign-extended making
them different from the unsigned ones which don't get padded in the
high bits.  Converting to u8 is the right thing to do but nothing here
is getting truncated.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH resend 5/5] libata-scsi: fix MODE SELECT translation for Control mode page

2016-07-21 Thread Tejun Heo

On Fri, Jul 22, 2016 at 02:41:54AM +0800, tom.t...@gmail.com wrote:
> @@ -3854,6 +3852,8 @@ static unsigned int ata_scsi_mode_select_xlat(struct 
> ata_queued_cmd *qc)
>   if (ata_mselect_control(qc, p, pg_len, ) < 0) {
>   fp += hdr_len + bd_len;
>   goto invalid_param;
> + } else {
> + goto skip; /* No ATA command to send */

Hmmm... I'm a bit confused.  Why is mselect_control path different
from mselect_caching in terms of qc handling?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH resend 3/5] libata-scsi: fix overflow in mode page copy

2016-07-21 Thread Tejun Heo

Hello,

On Fri, Jul 22, 2016 at 02:41:52AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> ata_mselect_*() would initialize a char array for storing a copy of
> the current mode page. However, if char was actually signed char,
> overflow could occur.

Do you mean sign extension?

> For example, `0xff` from def_control_mpage[] would be "truncated"
> to `-1`. This prevented ata_mselect_control() from working at all,
> since when it did the read-only bits check, there would always be
> a mismatch.

Heh, the description doesn't really make sense.  Are you talking about
something like the following?

char ar[N];
int i;

i = ar[x];
if (i == 0xff)
asdf;

If so, the description isn't quite right.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] libata-scsi: fix read-only bits checking in ata_mselect_*()

2016-07-20 Thread Tejun Heo

So, just reverted this patch.

On Wed, Jul 20, 2016 at 06:59:23AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> Commit 7780081c1f04 ("libata-scsi: Set information sense field for
> invalid parameter") changed how ata_mselect_*() make sure read-only
> bits are not modified. The new implementation introduced a bug that
> the read-only bits in the byte that has a changeable bit will not
> be checked.
> 
> Added the necessary check, with comments explaining the heuristic.
> 
> Signed-off-by: Tom Yan 
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 06afe63..b47c3ce 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3617,8 +3617,18 @@ static int ata_mselect_caching(struct ata_queued_cmd 
> *qc,
>*/
>   ata_msense_caching(dev->id, mpage, false);
>   for (i = 0; i < CACHE_MPAGE_LEN - 2; i++) {
> - if (i == 0)
> - continue;
> + /* Check the first byte */
> + if (i == 0) {
> + /* except the WCE bit */
> + if (mpage[i + 2] & 0xfb != buf[i] & 0xfb) {

This not only triggered compiler warning but is actually wrong.  The
above is

mpage[i + 1] & (0xfb != buf[i]) & 0xfb

which is non-sensical.  So, this patch wasn't tested at all.  Not even
compile test.  Please don't do this.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] libata-scsi: fix read-only bits checking in ata_mselect_*()

2016-07-20 Thread Tejun Heo

On Wed, Jul 20, 2016 at 06:59:23AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> Commit 7780081c1f04 ("libata-scsi: Set information sense field for
> invalid parameter") changed how ata_mselect_*() make sure read-only
> bits are not modified. The new implementation introduced a bug that
> the read-only bits in the byte that has a changeable bit will not
> be checked.
> 
> Added the necessary check, with comments explaining the heuristic.
> 
> Signed-off-by: Tom Yan 

Applied to libata/for-4.8.

This thread was confusing to look at.  Please use the "n/N" sequence
tagging for the patches (the -n option of git-format-patch).

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4] libata-scsi: better style in ata_msense_*()

2016-07-20 Thread Tejun Heo

On Wed, Jul 20, 2016 at 04:39:28AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> `changeable` is the "version" of mode page requested by the user.
> It will be less confusing/misleading if we do not check it
> "together" with the setting bits of the drive.
> 
> Not to mention that we currently have ata_mselect_*() implemented
> in a way that each of them will serve exclusively a particular bit
> on each page. The old style will hence make the condition look even
> more unnecessarily arcane if the ata_msense_*() is reflecting more
> than one bit.
> 
> Signed-off-by: Tom Yan 

Applied to libata/for-4.8.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4] libata-scsi: minor cleanup in ata_mselect_*()

2016-07-20 Thread Tejun Heo

On Wed, Jul 20, 2016 at 05:11:50AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> 1. Removed a repeated bit masking in ata_mselect_control()
> 2. Moved `wce`/`d_sense` assignment below the page validity checks
> 3. Added/Removed empty lines where appropriate
> 
> Signed-off-by: Tom Yan 

Applied to libata/for-4.8.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Fluctuating "acceptance" on requested data length in SATA/AHCI

2016-07-14 Thread Tejun Heo

On Fri, Jul 15, 2016 at 03:10:40AM +0800, Tom Yan wrote:
> I hadn't been able to "locate" a sensible and/or solid point where the
> fluctuation start, but I did notice that the chance of "failure" would
> rise with the requested length. Also, if I boot with
> `libata.force=noncq`, the failure seems to start off from a lower
> point.
> 
> So is that an expected characteristic of SATA/AHCI? Or does that hint
> that there is something wrong in the kernel? And btw, should any block
> limit be set according to this? For example, max_sectors?

Host controllers and drivers have constraints other than max transfer
size such as max scatter gather table entries (how many disconnected
sections can be transferred on one command) and DMA boundary (the
memory boundary that a single DMA transfer can't cross), so if you go
to the high command size, it isn't surprising that commands of the
same length succeed sometimes while not on others.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 1/3] ata: bump ATA_MAX_SECTORS_LBA48 to 65536

2016-07-14 Thread Tejun Heo

On Fri, Jul 15, 2016 at 02:22:52AM +0800, Tom Yan wrote:
> On 14 July 2016 at 01:03, Tejun Heo <t...@kernel.org> wrote:
> >
> > It's used to device max_sectors.
> >
> 
> Not really. "max_sectors" of ATA drives have been set to
> BLK_DEF_MAX_SECTORS, for at least quite some time.
> 
> >
> > This is way too gratuitous.  There's no rationale for it while it has
> > actual real-world risks.  max_sectors is staying at 65535.
> >
> 
> That's alright. I don't have strong opinion on whether we should bump
> this anyway. So I suppose we should replace the TODO comment with
> something like "avoid count to be h"? And maybe also change
> ATA_MAX_SECTORS to 255 (with comment "avoid count to be 00h")?

I'd just leave both alone.  It's one sector difference one way or the
other and those numbers have a pretty long history.  There's no reason
to disturb them at this point.  There's no actual gain whatsoever but
there is some actual risk.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 1/3] ata: bump ATA_MAX_SECTORS_LBA48 to 65536

2016-07-13 Thread Tejun Heo

Hello,

On Wed, Jul 13, 2016 at 12:47:06PM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> ATA_MAX_SECTORS_LBA48 is only used for setting the queue limit
> "max_hw_sectors", which only serves as the cap for "max_sectors",
> which is in turn the actual limit being used. Therefore, it should

It's used to device max_sectors.

> be alright for us to bump ATA_MAX_SECTORS_LBA48 to 65536. Also,
> lba_48_ok() has been accepting the number of blocks to be 65536.

This is way too gratuitous.  There's no rationale for it while it has
actual real-world risks.  max_sectors is staying at 65535.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 3/3] libata-scsi: add optimal transfer length to block limits VPD

2016-07-13 Thread Tejun Heo

On Wed, Jul 13, 2016 at 12:47:08PM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> As of commit 6b7e9cde4969 ("sd: Fix rw_max for devices that report
> an optimal xfer size"), the scsi disk driver (correctly) derive both
> of the queue limits "io_opt" and "max_sectors" from the optimal
> transfer length field.
> 
> In case we would like the two limits to be derived from a value
> other than BLK_DEF_MAX_SECTORS for ATA disks in the future, this
> patch has made it easy.

What's the actual impact of this patch at this point?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 2/2] libata-scsi: avoid repeated calculation of number of TRIM ranges

2016-07-12 Thread Tejun Heo

On Wed, Jul 13, 2016 at 04:31:23AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> Currently libata statically allows only 1-block (512-byte) payload
> for each TRIM command. Each payload can carry 64 TRIM ranges since
> each range requires 8 bytes.
> 
> It is silly to keep doing the calculation (512 / 8) in different
> places. Hence, define the new ATA_MAX_TRIM_RNUM for the result.
> 
> Signed-off-by: Tom Yan 

Applied 1-2 to libata/for-4.8.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 1/2] libata-scsi: reject WRITE SAME (16) with n_block that exceeds limit

2016-07-12 Thread Tejun Heo

On Thu, Jul 07, 2016 at 01:19:05AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> Currently if a WRITE SAME (16) command is issued to the SATL with
> "number of blocks" that is larger than the "Maximum write same length"
> (which is the maximum number of blocks per TRIM command allowed in
> libata, currently 65535 * 512 / 8 blocks), the SATL will accept the
> command and translate it to a TRIM command with the upper limit.
> 
> However, according to SBC (as of sbc4r11.pdf), the "device server"
> should terminate the command with "Invalid field in CDB" in that case.
> 
> Signed-off-by: Tom Yan 
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index bfec66f..a1f061a 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3305,7 +3305,11 @@ static unsigned int ata_scsi_write_same_xlat(struct 
> ata_queued_cmd *qc)
>   goto invalid_param_len;
>  
>   buf = page_address(sg_page(scsi_sglist(scmd)));
> - size = ata_set_lba_range_entries(buf, 512, block, n_block);
> +
> + if (n_block <= 65535 * 512 / 8)
> + size = ata_set_lba_range_entries(buf, 512, block, n_block);
> + else
> + goto invalid_fld;

This triggers compiler warning about @fp used w/o initializing it.  I
reverted the patch.  Can you please update the patch?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 2/2] libata-scsi: avoid repeated calculation of number of TRIM ranges

2016-07-12 Thread Tejun Heo

Hello,

On Thu, Jul 07, 2016 at 01:19:06AM +0800, tom.t...@gmail.com wrote:
> @@ -1071,7 +1072,7 @@ static inline unsigned ata_set_lba_range_entries(void 
> *_buffer,
>   __le64 *buffer = _buffer;
>   unsigned i = 0, used_bytes;
>  
> - while (i < buf_size / 8 ) { /* 6-byte LBA + 2-byte range per entry */
> + while (i < buf_size) {

Please rename @buf_size.  It's confusing for a variable named size to
contain something other than bytes.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/2] libata-scsi: rename ata_msense_ctl_mode() to ata_msense_control()

2016-07-12 Thread Tejun Heo

On Wed, Jul 13, 2016 at 02:54:12AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> To make it consistent with the recently added ata_mselect_control().
> We probably shouldn't have the word "mode" in its name anyway, since
> that's not the case for other ata_msense_*() / ata_mselect_*() either.
> 
> Signed-off-by: Tom Yan 

Applied 1-2 to libata/for-4.8.

In the future, can you please link patches in the same series?  If
there are only several, either chain-replying or replying to the first
patch is fine.  If there are more, people usually write up a head
message and makes all patches replies to that message.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] libata-scsi: do not return designator for serial number

2016-07-12 Thread Tejun Heo

On Thu, Jul 07, 2016 at 06:12:12AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> SAT (as of sat4r05f.pdf) does not require this vendor specific
> designator. Besides, we already have the Unit Serial Number VPD.
> 
> Signed-off-by: Tom Yan 
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index bfec66f..9f478ad 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2210,14 +2210,6 @@ static unsigned int ata_scsiop_inq_83(struct 
> ata_scsi_args *args, u8 *rbuf)
>   rbuf[1] = 0x83; /* this page code */
>   num = 4;
>  
> - /* piv=0, assoc=lu, code_set=ACSII, designator=vendor */
> - rbuf[num + 0] = 2;
> - rbuf[num + 3] = ATA_ID_SERNO_LEN;
> - num += 4;
> - ata_id_string(args->id, (unsigned char *) rbuf + num,
> -   ATA_ID_SERNO, ATA_ID_SERNO_LEN);
> - num += ATA_ID_SERNO_LEN;
> -

If it isn't actually broken, I don't want to change it.  It's a
userland visible change which is gratuitous.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 2/2] libata-scsi: better style in ata_msense_caching()

2016-07-12 Thread Tejun Heo

On Wed, Jul 13, 2016 at 12:20:40AM +0800, Tom Yan wrote:
> First of all "changeable" is the "version" of mode page requested by
> the user, while ata_id_*_enabled(id) are the value of setting reported
> by the device. I think it's ugly and confusing to check values of
> totally different nature "together".
> 
> The worse thing is, since we have implemented MODE SELECT /
> ata_mselect_caching() to serve exclusively the write cache feature,
> the difference in the two if-conditions made the code look even more
> arcane.
> 
> In my version, it is clear that when the user request the changeable
> mode page, the value for the WCE bit will always be "1" / "y", since
> we've made (only) it changeable in ata_mselect_caching(); and when the
> user request the current / default page, the values reflect the
> settings from the drive.

Can you please put the rationale in the patch description and repost?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH resend 2/2] libata-scsi: correct SPC version descriptor

2016-07-12 Thread Tejun Heo

On Tue, Jul 12, 2016 at 09:29:35PM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> The comment suggests we should be having an SPC-3 version descriptor
> but the 0260h is the code for "SPC-2 (no version claimed)". Correct
> it to 0300h so that it has the "SPC-3 (no version claimed)" descriptor.
> 
> Note that we are claiming SPC-3 version compatibility in the VERSION
> field of the standard INQUIRY data. Therefore, I assume the typo was
> on the code but not on the comment.
> 
> Signed-off-by: Tom Yan 

Applied 1-2 to libata/for-4.8.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/2] libata-scsi: better style in ata_msense_caching()

2016-07-12 Thread Tejun Heo

On Tue, Jul 12, 2016 at 09:28:23PM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> Signed-off-by: Tom Yan 
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index bfec66f..6f7c626 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2424,10 +2424,13 @@ static void modecpy(u8 *dest, const u8 *src, int n, 
> bool changeable)
>  static unsigned int ata_msense_caching(u16 *id, u8 *buf, bool changeable)
>  {
>   modecpy(buf, def_cache_mpage, sizeof(def_cache_mpage), changeable);
> - if (changeable || ata_id_wcache_enabled(id))
> - buf[2] |= (1 << 2); /* write cache enable */
> - if (!changeable && !ata_id_rahead_enabled(id))
> - buf[12] |= (1 << 5);/* disable read ahead */
> + if (changeable) {
> + buf[2] |= 1 << 2; /* ata_mselect_caching() */
> + }
> + else {
> + buf[2] |= ata_id_wcache_enabled(id) << 2;   /* write cache 
> enable */
> + buf[12] |= !ata_id_rahead_enabled(id) << 5; /* disable read 
> ahead */
> + }

Again, why is this better?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 1/2] libata-scsi: fix SET FEATURES "filtering" for ata_msense_caching()

2016-07-12 Thread Tejun Heo

On Tue, Jul 12, 2016 at 09:37:02PM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> Without this fix, the DRA bit of the caching mode page would not
> be updated when the read look-ahead feature is toggled (e.g. with
> `smartctl --set`), but will only be until, for example, the write
> cache feature is touched.
> 
> Signed-off-by: Tom Yan 

Applied to libata/for-4.8.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 2/2] libata-scsi: better style in ata_msense_caching()

2016-07-12 Thread Tejun Heo

On Tue, Jul 12, 2016 at 09:37:03PM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> Signed-off-by: Tom Yan 
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index bfec66f..48ea887 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2424,10 +2424,12 @@ static void modecpy(u8 *dest, const u8 *src, int n, 
> bool changeable)
>  static unsigned int ata_msense_caching(u16 *id, u8 *buf, bool changeable)
>  {
>   modecpy(buf, def_cache_mpage, sizeof(def_cache_mpage), changeable);
> - if (changeable || ata_id_wcache_enabled(id))
> - buf[2] |= (1 << 2); /* write cache enable */
> - if (!changeable && !ata_id_rahead_enabled(id))
> - buf[12] |= (1 << 5);/* disable read ahead */
> + if (changeable) {
> + buf[2] |= 1 << 2; /* ata_mselect_caching() */
> + } else {
> + buf[2] |= ata_id_wcache_enabled(id) << 2;   /* write cache 
> enable */
> + buf[12] |= !ata_id_rahead_enabled(id) << 5; /* disable read 
> ahead */
> + }

It's different but I'm not sure this is better.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] libata-scsi: set correct VERSION field for ZAC devices

2016-07-12 Thread Tejun Heo

On Tue, Jul 12, 2016 at 10:12:01PM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> Commit 856c46639309 ("libata: support device-managed ZAC devices")
> had the line that "bumps" the VERSION field in standard INQUIRY data
> removed. Add it back and claim SPC-5 version compatibility, which
> matches with the current version descriptor "SPC-5 (no version claimed)"
> that is used for ZAC devices.
> 
> Signed-off-by: Tom Yan 

Applied to libata/for-4.8.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH resend 1/2] libata-scsi: do not respond with "invalid field" for FORMAT UNIT

2016-07-06 Thread Tejun Heo

On Thu, Jul 07, 2016 at 01:13:08AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> It does not make sense and is confusing to respond with "Invalid
> field in CDB" while we have no support at all implemented for
> FORMAT UNIT. It is decent to let it go to the default, which
> will respond with "Invalid command operation code" instead.
> 
> Signed-off-by: Tom Yan 

Applied to libata/for-4.8.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH resend 2/2] libata-scsi: correct cbd to CDB in comment

2016-07-06 Thread Tejun Heo

On Thu, Jul 07, 2016 at 01:13:09AM +0800, tom.t...@gmail.com wrote:
> From: Tom Yan 
> 
> It's Command Descriptor Block. Also capitalized it.
> 
> Signed-off-by: Tom Yan 

Applied to libata/for-4.8.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 3 4 5 6 7 >

1 - 100 of 637 matches

Mail list logo