[PATCH] target: fix potential memory leak in option parsing

2018-04-16 Thread Chengguang Xu
When specifying same string type option several times,
current option parsing will cause memory leak. Hence,
call kfree for previous one in this case.

Signed-off-by: Chengguang Xu 
---
 drivers/target/target_core_configfs.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/target/target_core_configfs.c 
b/drivers/target/target_core_configfs.c
index 3f4bf12..ee1a3a8 100644
--- a/drivers/target/target_core_configfs.c
+++ b/drivers/target/target_core_configfs.c
@@ -1659,6 +1659,7 @@ static ssize_t target_pr_res_aptpl_metadata_store(struct 
config_item *item,
token = match_token(ptr, tokens, args);
switch (token) {
case Opt_initiator_fabric:
+   kfree(i_fabric);
i_fabric = match_strdup(args);
if (!i_fabric) {
ret = -ENOMEM;
@@ -1666,6 +1667,7 @@ static ssize_t target_pr_res_aptpl_metadata_store(struct 
config_item *item,
}
break;
case Opt_initiator_node:
+   kfree(i_port);
i_port = match_strdup(args);
if (!i_port) {
ret = -ENOMEM;
@@ -1680,6 +1682,7 @@ static ssize_t target_pr_res_aptpl_metadata_store(struct 
config_item *item,
}
break;
case Opt_initiator_sid:
+   kfree(isid);
isid = match_strdup(args);
if (!isid) {
ret = -ENOMEM;
@@ -1737,6 +1740,7 @@ static ssize_t target_pr_res_aptpl_metadata_store(struct 
config_item *item,
 * PR APTPL Metadata for Target Port
 */
case Opt_target_fabric:
+   kfree(t_fabric);
t_fabric = match_strdup(args);
if (!t_fabric) {
ret = -ENOMEM;
@@ -1744,6 +1748,7 @@ static ssize_t target_pr_res_aptpl_metadata_store(struct 
config_item *item,
}
break;
case Opt_target_node:
+   kfree(t_port);
t_port = match_strdup(args);
if (!t_port) {
ret = -ENOMEM;
-- 
1.8.3.1



Re: MPT Fusion - ext4 delayed allocation failed errors on 4.14!

2018-04-16 Thread Martin K. Petersen

Nikola,

> thanks for explanation. but disabling write same for now is safe,
> right?

I was hoping we'd be able to disable it for SATA devices only.

> root@siv-70140:~ # sg_vpd /dev/sda
> Supported VPD pages VPD page:
>   Supported VPD pages [sv]
>   Device identification [di]
>   Supported VPD pages [sv]
>   Supported VPD pages [sv]

However, it doesn't look like we have much to work with since there is
no ATA Information VPD page exposed.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-16 Thread Kees Cook
On Mon, Apr 16, 2018 at 1:44 PM, Kees Cook  wrote:
> On Thu, Apr 12, 2018 at 8:02 PM, Kees Cook  wrote:
>> On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook  wrote:
>>> After fixing up some build issues in the middle of the 4.16 cycle, I
>>> get an unhelpful bisect result of commit 0a4b6e2f80aa ("Merge branch
>>> 'for-4.16/block'"). Instead of letting the test run longer, I'm going
>>> to switch to doing several shorter test boots per kernel and see if
>>> that helps. One more bisect coming...
>>
>> Okay, so I can confirm the bisect points at the _merge_ itself, not a
>> specific patch. I'm not sure how to proceed here. It looks like some
>> kind of interaction between separate trees? Jens, do you have
>> suggestions on how to track this down?
>
> Turning off HARDENED_USERCOPY and turning on KASAN, I see the same report:
>
> [   38.274106] BUG: KASAN: slab-out-of-bounds in _copy_to_user+0x42/0x60
> [   38.274841] Read of size 22 at addr 8800122b8c4b by task smartctl/1064
> [   38.275630]
> [   38.275818] CPU: 2 PID: 1064 Comm: smartctl Not tainted 4.17.0-rc1-ARCH+ 
> #266
> [   38.276631] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [   38.277690] Call Trace:
> [   38.277988]  dump_stack+0x71/0xab
> [   38.278397]  ? _copy_to_user+0x42/0x60
> [   38.278833]  print_address_description+0x6a/0x270
> [   38.279368]  ? _copy_to_user+0x42/0x60
> [   38.279800]  kasan_report+0x243/0x360
> [   38.280221]  _copy_to_user+0x42/0x60
> [   38.280635]  sg_io+0x459/0x660
> ...
>
> Though we get slightly more details (some we already knew):
>
> [   38.301330] Allocated by task 329:
> [   38.301734]  kmem_cache_alloc_node+0xca/0x220
> [   38.302239]  scsi_mq_init_request+0x64/0x130 [scsi_mod]
> [   38.302821]  blk_mq_alloc_rqs+0x2cf/0x370
> [   38.303265]  blk_mq_sched_alloc_tags.isra.4+0x7d/0xb0
> [   38.303820]  blk_mq_init_sched+0xf0/0x220
> [   38.304268]  elevator_switch+0x17a/0x2c0
> [   38.304705]  elv_iosched_store+0x173/0x220
> [   38.305171]  queue_attr_store+0x72/0xb0
> [   38.305602]  kernfs_fop_write+0x188/0x220
> [   38.306049]  __vfs_write+0xb6/0x330
> [   38.306436]  vfs_write+0xe9/0x240
> [   38.306804]  ksys_write+0x98/0x110
> [   38.307181]  do_syscall_64+0x6d/0x1d0
> [   38.307590]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   38.308142]
> [   38.308316] Freed by task 0:
> [   38.308652] (stack is not available)
> [   38.309060]
> [   38.309243] The buggy address belongs to the object at 8800122b8c00
> [   38.309243]  which belongs to the cache scsi_sense_cache of size 96
> [   38.310625] The buggy address is located 75 bytes inside of
> [   38.310625]  96-byte region [8800122b8c00, 8800122b8c60)

With a hardware watchpoint, I've isolated the corruption to here:

bfq_dispatch_request+0x2be/0x1610:
__bfq_dispatch_request at block/bfq-iosched.c:3902
3900if (rq) {
3901inc_in_driver_start_rq:
3902bfqd->rq_in_driver++;
3903start_rq:
3904rq->rq_flags |= RQF_STARTED;
3905}

Through some race condition(?), rq_in_driver is also sense_buffer, and
it can get incremented. Specifically, I am doing:

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 25c14c58385c..f50d5053d732 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -6,6 +6,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include 

@@ -428,6 +430,18 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx)
}
 }

+static void sample_hbp_handler(struct perf_event *bp,
+   struct perf_sample_data *data,
+   struct pt_regs *regs)
+{
+printk(KERN_INFO "sense_buffer value is changed\n");
+dump_stack();
+printk(KERN_INFO "Dump stack from sample_hbp_handler\n");
+}
+
+struct perf_event * __percpu *sample_hbp;
+int ok_to_go;
+
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 bool run_queue, bool async)
 {
@@ -435,6 +449,16 @@ void blk_mq_sched_insert_request(struct request
*rq, bool at_head,
struct elevator_queue *e = q->elevator;
struct blk_mq_ctx *ctx = rq->mq_ctx;
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
+   struct perf_event_attr attr;
+   struct scsi_request *req = scsi_req(rq);
+
+   if (ok_to_go) {
+   hw_breakpoint_init(&attr);
+   attr.bp_addr = (uintptr_t)&(req->sense);
+   attr.bp_len = HW_BREAKPOINT_LEN_8;
+   attr.bp_type = HW_BREAKPOINT_W;
+   sample_hbp = register_wide_hw_breakpoint(&attr,
sample_hbp_handler, NULL);
+   }

/* flush rq in flush machinery need to be dispatched directly */
if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
@@ -461,6 +485,9 @@ void blk_mq_sched_insert_request(struct request
*rq, bool at_head,
 run:
if (run_queue)
blk_mq_run_hw_queue(hctx

[PATCH 1/3] sd_zbc: Change the type of the ZBC fields into u32

2018-04-16 Thread Bart Van Assche
This patch does not change any functionality but makes it clear
that it is on purpose that these fields are 32 bits wide.

Signed-off-by: Bart Van Assche 
Cc: Damien Le Moal 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
---
 drivers/scsi/sd.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 0d663b5e45bb..392c7d078ae3 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -74,12 +74,12 @@ struct scsi_disk {
struct gendisk  *disk;
struct opal_dev *opal_dev;
 #ifdef CONFIG_BLK_DEV_ZONED
-   unsigned intnr_zones;
-   unsigned intzone_blocks;
-   unsigned intzone_shift;
-   unsigned intzones_optimal_open;
-   unsigned intzones_optimal_nonseq;
-   unsigned intzones_max_open;
+   u32 nr_zones;
+   u32 zone_blocks;
+   u32 zone_shift;
+   u32 zones_optimal_open;
+   u32 zones_optimal_nonseq;
+   u32 zones_max_open;
 #endif
atomic_topeners;
sector_tcapacity;   /* size in logical blocks */
-- 
2.16.3



[PATCH 0/3] sd_zbc: Avoid that resetting a zone fails sporadically

2018-04-16 Thread Bart Van Assche
Hello Martin,

This patch series, when combined with the block layer patch that prevents
that the report and reset zone ioctls are executed while a queue is frozen,
prevents that these ioctls fail sporadically with -EIO. Since this race is
relatively easy to trigger with the SMR support for fio that I'm working on,
please consider this patch series for kernel v4.17.

Thanks,

Bart.

Bart Van Assche (3):
  sd_zbc: Change the type of the ZBC fields into u32
  sd_zbc: Let the SCSI core handle ILLEGAL REQUEST / ASC 0x21
  sd_zbc: Avoid that resetting a zone fails sporadically

 drivers/scsi/sd.h  |  12 ++--
 drivers/scsi/sd_zbc.c  | 150 +++--
 include/linux/blkdev.h |   5 ++
 3 files changed, 93 insertions(+), 74 deletions(-)

-- 
2.16.3



[PATCH 3/3] sd_zbc: Avoid that resetting a zone fails sporadically

2018-04-16 Thread Bart Van Assche
Since SCSI scanning occurs asynchronously, since sd_revalidate_disk()
is called from sd_probe_async() and since sd_revalidate_disk() calls
sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called
concurrently with blkdev_report_zones() and/or blkdev_reset_zones().
That can cause these functions to fail with -EIO because
sd_zbc_read_zones() e.g. sets q->nr_zones to zero before restoring it
to the actual value, even if no drive characteristics have changed.
Avoid that this can happen by making the following changes:
- Protect the code that updates zone information with blk_queue_enter()
  and blk_queue_exit().
- Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that
  these functions do not modify struct scsi_disk before all zone
  information has been obtained.

Note: since commit 055f6e18e08f ("block: Make q_usage_counter also
track legacy requests"; kernel v4.15) the request queue freezing
mechanism also affects legacy request queues.

Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Signed-off-by: Bart Van Assche 
Cc: Jens Axboe 
Cc: Damien Le Moal 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: sta...@vger.kernel.org # v4.10
---
 drivers/scsi/sd_zbc.c  | 140 +
 include/linux/blkdev.h |   5 ++
 2 files changed, 87 insertions(+), 58 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 2d0c06f7db3e..323e3dc4bc59 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -390,8 +390,10 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, 
unsigned char *buf)
  *
  * Check that all zones of the device are equal. The last zone can however
  * be smaller. The zone size must also be a power of two number of LBAs.
+ *
+ * Returns the zone size in bytes upon success or an error code upon failure.
  */
-static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
+static s64 sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 {
u64 zone_blocks = 0;
sector_t block = 0;
@@ -402,8 +404,6 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
int ret;
u8 same;
 
-   sdkp->zone_blocks = 0;
-
/* Get a buffer */
buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
if (!buf)
@@ -435,16 +435,17 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 
/* Parse zone descriptors */
while (rec < buf + buf_len) {
-   zone_blocks = get_unaligned_be64(&rec[8]);
-   if (sdkp->zone_blocks == 0) {
-   sdkp->zone_blocks = zone_blocks;
-   } else if (zone_blocks != sdkp->zone_blocks &&
-  (block + zone_blocks < sdkp->capacity
-   || zone_blocks > sdkp->zone_blocks)) {
-   zone_blocks = 0;
+   u64 this_zone_blocks = get_unaligned_be64(&rec[8]);
+
+   if (zone_blocks == 0) {
+   zone_blocks = this_zone_blocks;
+   } else if (this_zone_blocks != zone_blocks &&
+  (block + this_zone_blocks < sdkp->capacity
+   || this_zone_blocks > zone_blocks)) {
+   this_zone_blocks = 0;
goto out;
}
-   block += zone_blocks;
+   block += this_zone_blocks;
rec += 64;
}
 
@@ -457,8 +458,6 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 
} while (block < sdkp->capacity);
 
-   zone_blocks = sdkp->zone_blocks;
-
 out:
if (!zone_blocks) {
if (sdkp->first_scan)
@@ -478,8 +477,7 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
  "Zone size too large\n");
ret = -ENODEV;
} else {
-   sdkp->zone_blocks = zone_blocks;
-   sdkp->zone_shift = ilog2(zone_blocks);
+   ret = zone_blocks;
}
 
 out_free:
@@ -490,15 +488,14 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 
 /**
  * sd_zbc_alloc_zone_bitmap - Allocate a zone bitmap (one bit per zone).
- * @sdkp: The disk of the bitmap
+ * @nr_zones: Number of zones to allocate space for.
+ * @numa_node: NUMA node to allocate the memory from.
  */
-static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
+static inline unsigned long *
+sd_zbc_alloc_zone_bitmap(u32 nr_zones, int numa_node)
 {
-   struct request_queue *q = sdkp->disk->queue;
-
-   return kzalloc_node(BITS_TO_LONGS(sdkp->nr_zones)
-   * sizeof(unsigned long),
-   GFP_KERNEL, q->node);
+   return kzalloc_node(BITS_TO_LONGS(nr_zones) * sizeof(unsigned long),
+   GFP_KERNEL, numa_node);
 }
 

[PATCH 2/3] sd_zbc: Let the SCSI core handle ILLEGAL REQUEST / ASC 0x21

2018-04-16 Thread Bart Van Assche
scsi_io_completion() translates the sense key ILLEGAL REQUEST / ASC
0x21 into ACTION_FAIL. That means that setting cmd->allowed to zero
in sd_zbc_complete() for this sense code / ASC combination is not
necessary. Hence remove the code that resets cmd->allowed from
sd_zbc_complete().

Signed-off-by: Bart Van Assche 
Cc: Damien Le Moal 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
---
 drivers/scsi/sd_zbc.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 41df75eea57b..2d0c06f7db3e 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -299,16 +299,6 @@ void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int 
good_bytes,
case REQ_OP_WRITE:
case REQ_OP_WRITE_ZEROES:
case REQ_OP_WRITE_SAME:
-
-   if (result &&
-   sshdr->sense_key == ILLEGAL_REQUEST &&
-   sshdr->asc == 0x21)
-   /*
-* INVALID ADDRESS FOR WRITE error: It is unlikely that
-* retrying write requests failed with any kind of
-* alignement error will result in success. So don't.
-*/
-   cmd->allowed = 0;
break;
 
case REQ_OP_ZONE_REPORT:
-- 
2.16.3



[Bug 199419] New: mpt3sas triggers KASAN complaint during reboot

2018-04-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199419

Bug ID: 199419
   Summary: mpt3sas triggers KASAN complaint during reboot
   Product: SCSI Drivers
   Version: 2.5
Kernel Version: v4.17-rc1
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: scsi_drivers-ot...@kernel-bugs.osdl.org
  Reporter: bvanass...@acm.org
Regression: No

Created attachment 275411
  --> https://bugzilla.kernel.org/attachment.cgi?id=275411&action=edit
KASAN complaint

Rebooting a system with an mpt3sas adapter causes the following complaint to be
reported on the serial console:

BUG: KASAN: use-after-free in mpt3sas_scsih_scsi_lookup_get+0xbd/0x120
[mpt3sas]
Read of size 1 at addr 880807f4030a by task systemd-shutdow/1

CPU: 26 PID: 1 Comm: systemd-shutdow Not tainted 4.17.0-rc1-dbg+ #2
Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407
03/10/2017
Call Trace:
 dump_stack+0x7c/0xbb
 print_address_description+0x65/0x270
 kasan_report+0x232/0x350
 mpt3sas_scsih_scsi_lookup_get+0xbd/0x120 [mpt3sas]
 _scsih_flush_running_cmds+0x85/0x130 [mpt3sas]
 scsih_shutdown+0x4f/0xe0 [mpt3sas]
 pci_device_shutdown+0x42/0x80
 device_shutdown+0x1af/0x2f0
 kernel_restart+0x9/0x50
 __do_sys_reboot+0x24e/0x2a0
 do_syscall_64+0x5d/0x200
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

(gdb) list *(mpt3sas_scsih_scsi_lookup_get+0xbd)
0x1fb2d is in mpt3sas_scsih_scsi_lookup_get
(drivers/scsi/mpt3sas/mpt3sas_scsih.c:1468).
1463u32 unique_tag = smid - 1;
1464
1465scmd = scsi_host_find_tag(ioc->shost, unique_tag);
1466if (scmd) {
1467st = scsi_cmd_priv(scmd);
1468if (st->cb_idx == 0xFF)
1469scmd = NULL;
1470}
1471}
1472return scmd;

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Steven Rostedt
On Mon, 16 Apr 2018 21:30:54 +
Bart Van Assche  wrote:

> Hello Steve,
> 
> The tool I'm most concerned about is blktrace. I'm not sure though how this
> tool receives event data from the block layer core.

Yeah, blktrace is "special", it looks like it registers its callbacks
from the tracepoints, and writes the data to its own relay buffer. As
it's not relying on the output from the tracing directory, additional
fields being added shouldn't affect it.

Looking at the trace event "block_rq_requeue" we have in the blktrace
kernel code:

static void blk_register_tracepoints(void)
{
int ret;

ret = register_trace_block_rq_insert(blk_add_trace_rq_insert, NULL);


Where the callback blk_add_trace_rq_insert() gets called when the
trace event is hit.

static void blk_add_trace_rq_insert(void *ignore,
struct request_queue *q, struct request *rq)
{
blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_INSERT,
 blk_trace_request_get_cgid(q, rq));
}

Where:

static void blk_add_trace_rq(struct request *rq, int error,
 unsigned int nr_bytes, u32 what,
 union kernfs_node_id *cgid)
{

calls

__blk_add_trace(bt, blk_rq_trace_sector(rq), nr_bytes, req_op(rq),
rq->cmd_flags, what, error, 0, NULL, cgid);

Which calls either the ftrace tracing file or its own relay buffer.

Looking at the code from
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git

It appears that it does not rely on the ftrace ring buffers.

So I'm guessing blktrace is not affected by this patch.

-- Steve


Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Bart Van Assche
On Mon, 2018-04-16 at 17:24 -0400, Steven Rostedt wrote:
> On Mon, 16 Apr 2018 20:49:12 +
> Bart Van Assche  wrote:
> 
> > Which tools process these strings? Has it been verified whether or not
> > the tools that process these strings still work fine with this patch
> > applied?
> 
> Ideally, tools shouldn't process trace event strings, but I'm sure some
> do. :-/
> 
> Getting libtraceevent out as a library is a high priority of mine, and
> hopefully I should get something out in a couple of months.
> 
> Ideally, tools should parse the raw data and then new fields will not
> affect them.

Hello Steve,

The tool I'm most concerned about is blktrace. I'm not sure though how this
tool receives event data from the block layer core.

Thanks,

Bart.




Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Steven Rostedt
On Mon, 16 Apr 2018 20:49:12 +
Bart Van Assche  wrote:

> Which tools process these strings? Has it been verified whether or not
> the tools that process these strings still work fine with this patch
> applied?

Ideally, tools shouldn't process trace event strings, but I'm sure some
do. :-/

Getting libtraceevent out as a library is a high priority of mine, and
hopefully I should get something out in a couple of months.

Ideally, tools should parse the raw data and then new fields will not
affect them.

-- Steve


Proposal

2018-04-16 Thread MS Zeliha Omer Faruk



Hello

Greeetings to you please did you get my previous email regarding my
investment proposal last week friday ?

MS.Zeliha ömer faruk
zeliha.omer.fa...@gmail.com



Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Bart Van Assche
On Mon, 2018-04-16 at 21:10 +, Bean Huo (beanhuo) wrote:
> Hi, Bart
> 
> > mi...@redhad.com; linux-bl...@vger.kernel.org; raja...@google.com
> > Subject: [EXT] Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in 
> > SCSI
> > trace events
> > 
> > On Mon, 2018-04-16 at 14:31 +, Bean Huo (beanhuo) wrote:
> > >   TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u
> > 
> > prot_sgl=%u" \
> > > -   " prot_op=%s cmnd=(%s %s raw=%s)",
> > > +   " prot_op=%s tag=%d cmnd=(%s %s raw=%s)",
> > > 
> > > [ ... ]
> > >   TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u
> > 
> > prot_sgl=%u" \
> > > -   " prot_op=%s cmnd=(%s %s raw=%s) rtn=%d",
> > > +   " prot_op=%s tag=%d cmnd=(%s %s raw=%s) rtn=%d",
> > > [ ... ]
> > >   TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u " \
> > > -   "prot_sgl=%u prot_op=%s cmnd=(%s %s raw=%s)
> > 
> > result=(driver=" \
> > > -   "%s host=%s message=%s status=%s)",
> > > +   "prot_sgl=%u prot_op=%s tag=%d cmnd=(%s %s raw=%s) " \
> > > +   "result=(driver=%s host=%s message=%s status=%s)",
> > 
> > Which tools process these strings? Has it been verified whether or not the
> > tools that process these strings still work fine with this patch applied?
> > 
> 
> I don't use one certain special tool to analyze this string, I am using 
> ftrace with event.
> With tag information, I can see how many tasks in storage device and easily 
> to trace each request
> under multiple thread workload. 
> Event there is someone who using certain tool to analyze this string, after 
> adding additional
> tag printout, in my opinion, they are happy to see it there. 

Since you want to change these strings it is your job to determine which
user space tools parse these strings and also whether or not this change
will break any of these tools.

Thanks,

Bart.





Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Bean Huo (beanhuo)
Hi, Bart

>mi...@redhad.com; linux-bl...@vger.kernel.org; raja...@google.com
>Subject: [EXT] Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI
>trace events
>
>On Mon, 2018-04-16 at 14:31 +, Bean Huo (beanhuo) wrote:
>>  TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u
>prot_sgl=%u" \
>> -  " prot_op=%s cmnd=(%s %s raw=%s)",
>> +  " prot_op=%s tag=%d cmnd=(%s %s raw=%s)",
>>
>> [ ... ]
>>  TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u
>prot_sgl=%u" \
>> -  " prot_op=%s cmnd=(%s %s raw=%s) rtn=%d",
>> +  " prot_op=%s tag=%d cmnd=(%s %s raw=%s) rtn=%d",
>> [ ... ]
>>  TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u " \
>> -  "prot_sgl=%u prot_op=%s cmnd=(%s %s raw=%s)
>result=(driver=" \
>> -  "%s host=%s message=%s status=%s)",
>> +  "prot_sgl=%u prot_op=%s tag=%d cmnd=(%s %s raw=%s) " \
>> +  "result=(driver=%s host=%s message=%s status=%s)",
>
>Which tools process these strings? Has it been verified whether or not the
>tools that process these strings still work fine with this patch applied?
> 

I don't use one certain special tool to analyze this string, I am using ftrace 
with event.
With tag information, I can see how many tasks in storage device and easily to 
trace each request
under multiple thread workload. 
Event there is someone who using certain tool to analyze this string, after 
adding additional
tag printout, in my opinion, they are happy to see it there. 

Again, you said if we add new feature in legacy block, we will also add new 
feature in blk-mq.
I still don't understand here.  "include/trace/event/block.h ... scsi.h" will 
be changed?
If yes, how? Because blk-mq is still using the events defined in 
include/trace/event/block.h.

Bean Huo

>Thanks,
>
>Bart.
>



Re: [PATCH v9] scsi: new zorro_esp.c for Amiga Zorro NCR53C9x boards

2018-04-16 Thread Michael Schmitz
Thanks guys!

Anything else needed for linux-scsi?

Cheers,

  Michael


On Mon, Apr 16, 2018 at 10:34 PM, John Paul Adrian Glaubitz
 wrote:
> On 04/16/2018 11:26 AM, Christian T. Steigies wrote:
>>
>> On Thu, Apr 12, 2018 at 01:53:26PM +1200, Michael Schmitz wrote:
>>>
>>> From: Michael Schmitz 
>>>
>>> New combined SCSI driver for all ESP based Zorro SCSI boards for
>>> m68k Amiga.
>>>
>>> Code largely based on board specific parts of the old drivers (blz1230.c,
>>> blz2060.c, cyberstorm.c, cyberstormII.c, fastlane.c which were removed
>>> after the 2.6 kernel series for lack of maintenance) with contributions
>>> by Tuomas Vainikka (TCQ bug tests and workaround) and Finn Thain (TCQ
>>> bugfix by use of PIO in extended message in transfer).
>>>
>>> New Kconfig option and Makefile entries for new Amiga Zorro ESP SCSI
>>> driver included in this patch.
>>>
>>> Use DMA transfers wherever possible, with board-specific DMA set-up
>>> functions copied from the old driver code. Three byte reselection
>>> messages
>>> do appear to cause DMA timeouts. So wire up a PIO transfer routine for
>>> these instead. esp_reselect_with_tag explicitly sets esp->cmd_block_dma
>>> as
>>> target address for the message bytes but PIO requires a virtual address.
>>> Substiute kernel virtual address esp->cmd_block in PIO transfer call if
>>> DMA address is esp->cmd_block_dma and phase is message in.
>>>
>>> PIO code taken from mac_esp.c where the reselection timeout issue was
>>> debugged and fixed first, with minor macro and function rename.
>>>
>>> Signed-off-by: Michael Schmitz 
>>> Reviewed-by: Finn Thain 
>>> Reviewed-by: Christoph Hellwig 
>>
>> Tested-by: Christian T. Steigies 
>>
>
> Also here on an Amiga 4000 with Cyberstorm I:
>
> Tested-by: John Paul Adrian Glaubitz 
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer - glaub...@debian.org
> `. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
>   `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [RESEND PATCH v1 2/2] trace: events: block: Add tag in block trace events

2018-04-16 Thread Bart Van Assche
On Mon, 2018-04-16 at 14:33 +, Bean Huo (beanhuo) wrote:
> [ ... ]
> - TP_printk("%d,%d %s (%s) %llu + %u [%d]",
> + TP_printk("%d,%d %s (%s) %llu + %u tag=%d [%d]",
> [ ... ]
> - TP_printk("%d,%d %s (%s) %llu + %u [%d]",
> + TP_printk("%d,%d %s (%s) %llu + %u tag=%d [%d]",
> [ ... ]

Which tools process these strings? Has it been verified whether or not
the tools that process these strings still work fine with this patch
applied?

Thanks,

Bart.





Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Bart Van Assche
On Mon, 2018-04-16 at 14:31 +, Bean Huo (beanhuo) wrote:
>   TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u prot_sgl=%u" \
> -   " prot_op=%s cmnd=(%s %s raw=%s)",
> +   " prot_op=%s tag=%d cmnd=(%s %s raw=%s)",
> 
> [ ... ]
>   TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u prot_sgl=%u" \
> -   " prot_op=%s cmnd=(%s %s raw=%s) rtn=%d",
> +   " prot_op=%s tag=%d cmnd=(%s %s raw=%s) rtn=%d",
> [ ... ]
>   TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u " \
> -   "prot_sgl=%u prot_op=%s cmnd=(%s %s raw=%s) result=(driver=" \
> -   "%s host=%s message=%s status=%s)",
> +   "prot_sgl=%u prot_op=%s tag=%d cmnd=(%s %s raw=%s) " \
> +   "result=(driver=%s host=%s message=%s status=%s)",

Which tools process these strings? Has it been verified whether or not
the tools that process these strings still work fine with this patch
applied?

Thanks,

Bart.




Re: usercopy whitelist woe in scsi_sense_cache

2018-04-16 Thread Kees Cook
On Thu, Apr 12, 2018 at 8:02 PM, Kees Cook  wrote:
> On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook  wrote:
>> After fixing up some build issues in the middle of the 4.16 cycle, I
>> get an unhelpful bisect result of commit 0a4b6e2f80aa ("Merge branch
>> 'for-4.16/block'"). Instead of letting the test run longer, I'm going
>> to switch to doing several shorter test boots per kernel and see if
>> that helps. One more bisect coming...
>
> Okay, so I can confirm the bisect points at the _merge_ itself, not a
> specific patch. I'm not sure how to proceed here. It looks like some
> kind of interaction between separate trees? Jens, do you have
> suggestions on how to track this down?

Turning off HARDENED_USERCOPY and turning on KASAN, I see the same report:

[   38.274106] BUG: KASAN: slab-out-of-bounds in _copy_to_user+0x42/0x60
[   38.274841] Read of size 22 at addr 8800122b8c4b by task smartctl/1064
[   38.275630]
[   38.275818] CPU: 2 PID: 1064 Comm: smartctl Not tainted 4.17.0-rc1-ARCH+ #266
[   38.276631] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   38.277690] Call Trace:
[   38.277988]  dump_stack+0x71/0xab
[   38.278397]  ? _copy_to_user+0x42/0x60
[   38.278833]  print_address_description+0x6a/0x270
[   38.279368]  ? _copy_to_user+0x42/0x60
[   38.279800]  kasan_report+0x243/0x360
[   38.280221]  _copy_to_user+0x42/0x60
[   38.280635]  sg_io+0x459/0x660
...

Though we get slightly more details (some we already knew):

[   38.301330] Allocated by task 329:
[   38.301734]  kmem_cache_alloc_node+0xca/0x220
[   38.302239]  scsi_mq_init_request+0x64/0x130 [scsi_mod]
[   38.302821]  blk_mq_alloc_rqs+0x2cf/0x370
[   38.303265]  blk_mq_sched_alloc_tags.isra.4+0x7d/0xb0
[   38.303820]  blk_mq_init_sched+0xf0/0x220
[   38.304268]  elevator_switch+0x17a/0x2c0
[   38.304705]  elv_iosched_store+0x173/0x220
[   38.305171]  queue_attr_store+0x72/0xb0
[   38.305602]  kernfs_fop_write+0x188/0x220
[   38.306049]  __vfs_write+0xb6/0x330
[   38.306436]  vfs_write+0xe9/0x240
[   38.306804]  ksys_write+0x98/0x110
[   38.307181]  do_syscall_64+0x6d/0x1d0
[   38.307590]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   38.308142]
[   38.308316] Freed by task 0:
[   38.308652] (stack is not available)
[   38.309060]
[   38.309243] The buggy address belongs to the object at 8800122b8c00
[   38.309243]  which belongs to the cache scsi_sense_cache of size 96
[   38.310625] The buggy address is located 75 bytes inside of
[   38.310625]  96-byte region [8800122b8c00, 8800122b8c60)


-Kees

-- 
Kees Cook
Pixel Security


Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Bart Van Assche
On Mon, 2018-04-16 at 20:27 +, Bean Huo (beanhuo) wrote:
> By the way, these patches are not to add new feature, they are just to
> add print tag along with the other exist printed request parameters.

Are you aware that there are two tag fields in struct request, namely "tag"
and "internal_tag"? Are you aware that how these fields are used depends on
whether or not a scheduler is attached to a request queue? Have you verified
that the "tag" field contains a meaningful value for blk-mq for every blk-mq
tracepoint?

Thanks,

Bart.





Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Bean Huo (beanhuo)
>>> This patch is not acceptable because it adds support for tag tracing
>>> to the legacy block layer only. Any patch that adds a new feature to
>>> the legacy block layer must also add it to blk-mq.
>>>
>> To be honest, I don't understand your point, can you give me more
>explanation?
>
>The legacy block layer will be removed as soon as blk-mq provides all the
>functionality of the legacy block layer and as soon as it performs at least as
>well as the legacy block layer for all use cases. If new features are added to
>the legacy block layer but not to blk-mq that prevents removal of the legacy
>block layer. Hence the requirement I explained in my previous e-mail.
>
>Bart.

Thanks for your information.
I have several questions again.
When the legacy block layer will be replaced by blk-mq? 
And "include/trece/event/block.h .. scsi.h" will also be changed? 
Do you have the related git rep or mail list about this topic?
Maybe this is great big change, I am very interested in that. And want to have 
a look at.

By the way, these patches are not to add new feature, they are just to add 
print tag along with the other exist
Printed request parameters.  The blk-mq is now still using 
"include/trace/evet/block.h" defined trace events.

For example: 
void blk_mq_start_request(struct request *rq)  
{
...
...
trace_block_rq_issue(q, rq);
...
...
}
Do you mean that this will also be removed/replaced by someone else?

Bean Huo


Re: [PATCH 08/12] mmc: reduce use of block bounce buffers (fwd)

2018-04-16 Thread Julia Lawall
There is a duplicated test on line 360.

julia

-- Forwarded message --
Date: Mon, 16 Apr 2018 23:04:18 +0800
From: kbuild test robot 
To: kbu...@01.org
Cc: Julia Lawall 
Subject: Re: [PATCH 08/12] mmc: reduce use of block bounce buffers

CC: kbuild-...@01.org
In-Reply-To: <20180416085032.7367-9-...@lst.de>
References: <20180416085032.7367-9-...@lst.de>
TO: Christoph Hellwig 
CC: io...@lists.linux-foundation.org, linux-a...@vger.kernel.org, 
linux-bl...@vger.kernel.org, linux-...@vger.kernel.org, 
linux-scsi@vger.kernel.org, net...@vger.kernel.org
CC: linux-ker...@vger.kernel.org

Hi Christoph,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.17-rc1 next-20180416]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Christoph-Hellwig/iscsi_tcp-don-t-set-a-bounce-limit/20180416-172618
:: branch date: 6 hours ago
:: commit date: 6 hours ago

>> drivers/mmc/core/queue.c:360:5-29: duplicated argument to && or ||

# 
https://github.com/0day-ci/linux/commit/6620a69f0eea8e8b7586f08f721c95a336022497
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 6620a69f0eea8e8b7586f08f721c95a336022497
vim +360 drivers/mmc/core/queue.c

81196976 Adrian Hunter 2017-11-29  350
c8b5fd03 Adrian Hunter 2017-09-22  351  static void mmc_setup_queue(struct 
mmc_queue *mq, struct mmc_card *card)
c8b5fd03 Adrian Hunter 2017-09-22  352  {
c8b5fd03 Adrian Hunter 2017-09-22  353  struct mmc_host *host = 
card->host;
c8b5fd03 Adrian Hunter 2017-09-22  354
8b904b5b Bart Van Assche   2018-03-07  355  
blk_queue_flag_set(QUEUE_FLAG_NONROT, mq->queue);
8b904b5b Bart Van Assche   2018-03-07  356  
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, mq->queue);
c8b5fd03 Adrian Hunter 2017-09-22  357  if (mmc_can_erase(card))
c8b5fd03 Adrian Hunter 2017-09-22  358  
mmc_queue_setup_discard(mq->queue, card);
c8b5fd03 Adrian Hunter 2017-09-22  359
6620a69f Christoph Hellwig 2018-04-16 @360  if (!mmc_dev(host)->dma_mask || 
!mmc_dev(host)->dma_mask)
6620a69f Christoph Hellwig 2018-04-16  361  
blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
c8b5fd03 Adrian Hunter 2017-09-22  362  
blk_queue_max_hw_sectors(mq->queue,
c8b5fd03 Adrian Hunter 2017-09-22  363  
min(host->max_blk_count, host->max_req_size / 512));
c8b5fd03 Adrian Hunter 2017-09-22  364  
blk_queue_max_segments(mq->queue, host->max_segs);
c8b5fd03 Adrian Hunter 2017-09-22  365  
blk_queue_max_segment_size(mq->queue, host->max_seg_size);
c8b5fd03 Adrian Hunter 2017-09-22  366
1e8e55b6 Adrian Hunter 2017-11-29  367  INIT_WORK(&mq->recovery_work, 
mmc_mq_recovery_handler);
81196976 Adrian Hunter 2017-11-29  368  INIT_WORK(&mq->complete_work, 
mmc_blk_mq_complete_work);
81196976 Adrian Hunter 2017-11-29  369
81196976 Adrian Hunter 2017-11-29  370  mutex_init(&mq->complete_lock);
81196976 Adrian Hunter 2017-11-29  371
81196976 Adrian Hunter 2017-11-29  372  init_waitqueue_head(&mq->wait);
81196976 Adrian Hunter 2017-11-29  373  }
81196976 Adrian Hunter 2017-11-29  374

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: Add udev-md-raid-safe-timeouts.rules

2018-04-16 Thread Alan Stern
On Mon, 16 Apr 2018, Chris Murphy wrote:

> Adding linux-usb@ and linux-scsi@
> (This email does contain the thread initiating email, but some replies
> are on the other lists.)
> 
> On Mon, Apr 16, 2018 at 5:43 AM, Austin S. Hemmelgarn
>  wrote:
> > On 2018-04-15 21:04, Chris Murphy wrote:
> >>
> >> I just ran into this:
> >>
> >> https://github.com/neilbrown/mdadm/pull/32/commits/af1ddca7d5311dfc9ed60a5eb6497db1296f1bec
> >>
> >> This solution is inadequate, can it be made more generic? This isn't
> >> an md specific problem, it affects Btrfs and LVM as well. And in fact
> >> raid0, and even none raid setups.
> >>
> >> There is no good reason to prevent deep recovery, which is what
> >> happens with the default command timer of 30 seconds, with this class
> >> of drive. Basically that value is going to cause data loss for the
> >> single device and also raid0 case, where the reset happens before deep
> >> recovery has a chance. And even if deep recovery fails to return user
> >> data, what we need to see is the proper error message: read error UNC,
> >> rather than a link reset message which just obfuscates the problem.
> >
> >
> > This has been discussed at least once here before (probably more times, hard
> > to be sure since it usually comes up as a side discussion in an only
> > marginally related thread).  Last I knew, the consensus here was that it
> > needs to be changed upstream in the kernel, not by adding a udev rule
> > because while the value is technically system policy, the default policy is
> > brain-dead for anything but the original disks it was i9ntended for (30
> > seconds works perfectly fine for actual SCSI devices because they behave
> > sanely in the face of media errors, but it's horribly inadequate for ATA
> > devices).
> >
> > To re-iterate what I've said before on the subject:
> >
> > For ATA drives it should probably be 150 seconds.  That's 30 seconds beyond
> > the typical amount of time most consumer drives will keep retrying a sector,
> > so even if it goes the full time to try and recover a sector this shouldn't
> > trigger.  The only people this change should negatively impact are those who
> > have failing drives which support SCT ERC and have it enabled, but aren't
> > already adjusting this timeout.
> >
> > For physical SCSI devices, it should continue to be 30 seconds.  SCSI disks
> > are sensible here and don't waste your time trying to recover a sector.  For
> > PV-SCSI devices, it should probably be adjusted too, but I don't know what a
> > reasonable value is.
> >
> > For USB devices it should probably be higher than 30 seconds, but again I
> > have no idea what a reasonable value is.
> 
> I don't know how all of this is designed but it seems like there's
> only one location for the command timer, and the SCSI driver owns it,
> and then everyone else (ATA and USB and for all I know SAN) are on top
> of that and lack any ability to have separate timeouts.

As far as mass-storage is concerned, USB is merely a transport.  It 
doesn't impose any timeout rules; the appropriate timeout value is 
whatever the device at the end of the USB link needs.  Thus, a SCSI 
drive connected over USB could use a 30-second timeout, an ATA drive 
could use 150 seconds, and so on.

Unfortunately, the only way to tell what sort of drive you've got is by
looking at the Vendor/Product IDs or other information provided by the
drive itself.  You can't tell anything just from knowing what sort of
bus it's on.

Alan Stern

> The nice thing about the udev rule is that it tests for SCT ERC before
> making a change. There certainly are enterprise and almost enterprise
> "NAS" SATA drives that have short SCT ERC times enabled out of the box
> - and the udev method makes them immune to the change.



Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Bart Van Assche

On 04/16/18 10:05, Bean Huo (beanhuo) wrote:

On Mon, 2018-04-16 at 09:41 -0700, Rajat Jain wrote:

On Mon, Apr 16, 2018 at 8:28 AM, Steven Rostedt 

wrote:

On Mon, 16 Apr 2018 14:31:49 +
"Bean Huo (beanhuo)"  wrote:


Print the request tag along with other information while tracing a
command.

Signed-off-by: Bean Huo 


Acked-by: Rajat Jain 


This patch is not acceptable because it adds support for tag tracing to the
legacy block layer only. Any patch that adds a new feature to the legacy block
layer must also add it to blk-mq.


To be honest, I don't understand your point, can you give me more explanation?


The legacy block layer will be removed as soon as blk-mq provides all 
the functionality of the legacy block layer and as soon as it performs 
at least as well as the legacy block layer for all use cases. If new 
features are added to the legacy block layer but not to blk-mq that 
prevents removal of the legacy block layer. Hence the requirement I 
explained in my previous e-mail.


Bart.


Re: Add udev-md-raid-safe-timeouts.rules

2018-04-16 Thread Chris Murphy
Adding linux-usb@ and linux-scsi@
(This email does contain the thread initiating email, but some replies
are on the other lists.)

On Mon, Apr 16, 2018 at 5:43 AM, Austin S. Hemmelgarn
 wrote:
> On 2018-04-15 21:04, Chris Murphy wrote:
>>
>> I just ran into this:
>>
>> https://github.com/neilbrown/mdadm/pull/32/commits/af1ddca7d5311dfc9ed60a5eb6497db1296f1bec
>>
>> This solution is inadequate, can it be made more generic? This isn't
>> an md specific problem, it affects Btrfs and LVM as well. And in fact
>> raid0, and even none raid setups.
>>
>> There is no good reason to prevent deep recovery, which is what
>> happens with the default command timer of 30 seconds, with this class
>> of drive. Basically that value is going to cause data loss for the
>> single device and also raid0 case, where the reset happens before deep
>> recovery has a chance. And even if deep recovery fails to return user
>> data, what we need to see is the proper error message: read error UNC,
>> rather than a link reset message which just obfuscates the problem.
>
>
> This has been discussed at least once here before (probably more times, hard
> to be sure since it usually comes up as a side discussion in an only
> marginally related thread).  Last I knew, the consensus here was that it
> needs to be changed upstream in the kernel, not by adding a udev rule
> because while the value is technically system policy, the default policy is
> brain-dead for anything but the original disks it was i9ntended for (30
> seconds works perfectly fine for actual SCSI devices because they behave
> sanely in the face of media errors, but it's horribly inadequate for ATA
> devices).
>
> To re-iterate what I've said before on the subject:
>
> For ATA drives it should probably be 150 seconds.  That's 30 seconds beyond
> the typical amount of time most consumer drives will keep retrying a sector,
> so even if it goes the full time to try and recover a sector this shouldn't
> trigger.  The only people this change should negatively impact are those who
> have failing drives which support SCT ERC and have it enabled, but aren't
> already adjusting this timeout.
>
> For physical SCSI devices, it should continue to be 30 seconds.  SCSI disks
> are sensible here and don't waste your time trying to recover a sector.  For
> PV-SCSI devices, it should probably be adjusted too, but I don't know what a
> reasonable value is.
>
> For USB devices it should probably be higher than 30 seconds, but again I
> have no idea what a reasonable value is.

I don't know how all of this is designed but it seems like there's
only one location for the command timer, and the SCSI driver owns it,
and then everyone else (ATA and USB and for all I know SAN) are on top
of that and lack any ability to have separate timeouts.

The nice thing about the udev rule is that it tests for SCT ERC before
making a change. There certainly are enterprise and almost enterprise
"NAS" SATA drives that have short SCT ERC times enabled out of the box
- and the udev method makes them immune to the change.


-- 
Chris Murphy


Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Bean Huo (beanhuo)
Hi, Bart

>On Mon, 2018-04-16 at 09:41 -0700, Rajat Jain wrote:
>> On Mon, Apr 16, 2018 at 8:28 AM, Steven Rostedt 
>wrote:
>> > On Mon, 16 Apr 2018 14:31:49 +
>> > "Bean Huo (beanhuo)"  wrote:
>> >
>> > > Print the request tag along with other information while tracing a
>> > > command.
>> > >
>> > > Signed-off-by: Bean Huo 
>>
>> Acked-by: Rajat Jain 
>
>This patch is not acceptable because it adds support for tag tracing to the
>legacy block layer only. Any patch that adds a new feature to the legacy block
>layer must also add it to blk-mq.
>
To be honest, I don't understand your point, can you give me more explanation?

>Bart.
>
>



Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Bart Van Assche
On Mon, 2018-04-16 at 09:41 -0700, Rajat Jain wrote:
> On Mon, Apr 16, 2018 at 8:28 AM, Steven Rostedt  wrote:
> > On Mon, 16 Apr 2018 14:31:49 +
> > "Bean Huo (beanhuo)"  wrote:
> > 
> > > Print the request tag along with other information
> > > while tracing a command.
> > > 
> > > Signed-off-by: Bean Huo 
> 
> Acked-by: Rajat Jain 

This patch is not acceptable because it adds support for tag tracing to the
legacy block layer only. Any patch that adds a new feature to the legacy block
layer must also add it to blk-mq.

Bart.





Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Rajat Jain
On Mon, Apr 16, 2018 at 8:28 AM, Steven Rostedt  wrote:
> On Mon, 16 Apr 2018 14:31:49 +
> "Bean Huo (beanhuo)"  wrote:
>
>> Print the request tag along with other information
>> while tracing a command.
>>
>> Signed-off-by: Bean Huo 
Acked-by: Rajat Jain 

>> ---
>
> I don't see any issue with the tracing part.
>
> Acked-by: Steven Rostedt (VMware) 
>
> Others need to check the content.
>
> -- Steve


Re: [RESEND PATCH v1 2/2] trace: events: block: Add tag in block trace events

2018-04-16 Thread Steven Rostedt
On Mon, 16 Apr 2018 14:33:29 +
"Bean Huo (beanhuo)"  wrote:

> Print the request tag along with other information in block trace events
> when tracing request , and unplug type (Sync / Async).
> 
> Signed-off-by: Bean Huo 

I don't see any issue with the tracing part.

Acked-by: Steven Rostedt (VMware) 

Others need to check the content.

-- Steve


> ---
>  include/trace/events/block.h | 36 +---
>  1 file changed, 25 insertions(+), 11 deletions(-)
> 


Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Steven Rostedt
On Mon, 16 Apr 2018 14:31:49 +
"Bean Huo (beanhuo)"  wrote:

> Print the request tag along with other information
> while tracing a command.
> 
> Signed-off-by: Bean Huo 
> ---

I don't see any issue with the tracing part.

Acked-by: Steven Rostedt (VMware) 

Others need to check the content.

-- Steve


[RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-16 Thread Bean Huo (beanhuo)
Print the request tag along with other information
while tracing a command.

Signed-off-by: Bean Huo 
---
 include/trace/events/scsi.h | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/trace/events/scsi.h b/include/trace/events/scsi.h
index f624969..a4ada90 100644
--- a/include/trace/events/scsi.h
+++ b/include/trace/events/scsi.h
@@ -210,6 +210,7 @@ TRACE_EVENT(scsi_dispatch_cmd_start,
__field( unsigned int,  lun )
__field( unsigned int,  opcode  )
__field( unsigned int,  cmd_len )
+   __field( int,   tag )
__field( unsigned int,  data_sglen )
__field( unsigned int,  prot_sglen )
__field( unsigned char, prot_op )
@@ -223,6 +224,7 @@ TRACE_EVENT(scsi_dispatch_cmd_start,
__entry->lun= cmd->device->lun;
__entry->opcode = cmd->cmnd[0];
__entry->cmd_len= cmd->cmd_len;
+   __entry->tag= cmd->request->tag;
__entry->data_sglen = scsi_sg_count(cmd);
__entry->prot_sglen = scsi_prot_sg_count(cmd);
__entry->prot_op= scsi_get_prot_op(cmd);
@@ -230,10 +232,10 @@ TRACE_EVENT(scsi_dispatch_cmd_start,
),
 
TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u prot_sgl=%u" \
- " prot_op=%s cmnd=(%s %s raw=%s)",
+ " prot_op=%s tag=%d cmnd=(%s %s raw=%s)",
  __entry->host_no, __entry->channel, __entry->id,
  __entry->lun, __entry->data_sglen, __entry->prot_sglen,
- show_prot_op_name(__entry->prot_op),
+ show_prot_op_name(__entry->prot_op), __entry->tag,
  show_opcode_name(__entry->opcode),
  __parse_cdb(__get_dynamic_array(cmnd), __entry->cmd_len),
  __print_hex(__get_dynamic_array(cmnd), __entry->cmd_len))
@@ -253,6 +255,7 @@ TRACE_EVENT(scsi_dispatch_cmd_error,
__field( int,   rtn )
__field( unsigned int,  opcode  )
__field( unsigned int,  cmd_len )
+   __field( int,   tag )
__field( unsigned int,  data_sglen )
__field( unsigned int,  prot_sglen )
__field( unsigned char, prot_op )
@@ -267,6 +270,7 @@ TRACE_EVENT(scsi_dispatch_cmd_error,
__entry->rtn= rtn;
__entry->opcode = cmd->cmnd[0];
__entry->cmd_len= cmd->cmd_len;
+   __entry->tag= cmd->request->tag;
__entry->data_sglen = scsi_sg_count(cmd);
__entry->prot_sglen = scsi_prot_sg_count(cmd);
__entry->prot_op= scsi_get_prot_op(cmd);
@@ -274,10 +278,10 @@ TRACE_EVENT(scsi_dispatch_cmd_error,
),
 
TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u prot_sgl=%u" \
- " prot_op=%s cmnd=(%s %s raw=%s) rtn=%d",
+ " prot_op=%s tag=%d cmnd=(%s %s raw=%s) rtn=%d",
  __entry->host_no, __entry->channel, __entry->id,
  __entry->lun, __entry->data_sglen, __entry->prot_sglen,
- show_prot_op_name(__entry->prot_op),
+ show_prot_op_name(__entry->prot_op), __entry->tag,
  show_opcode_name(__entry->opcode),
  __parse_cdb(__get_dynamic_array(cmnd), __entry->cmd_len),
  __print_hex(__get_dynamic_array(cmnd), __entry->cmd_len),
@@ -298,6 +302,7 @@ DECLARE_EVENT_CLASS(scsi_cmd_done_timeout_template,
__field( int,   result  )
__field( unsigned int,  opcode  )
__field( unsigned int,  cmd_len )
+   __field( int,   tag )
__field( unsigned int,  data_sglen )
__field( unsigned int,  prot_sglen )
__field( unsigned char, prot_op )
@@ -312,6 +317,7 @@ DECLARE_EVENT_CLASS(scsi_cmd_done_timeout_template,
__entry->result = cmd->result;
__entry->opcode = cmd->cmnd[0];
__entry->cmd_len= cmd->cmd_len;
+   __entry->tag= cmd->request->tag;
__entry->data_sglen = scsi_sg_count(cmd);
__entry->prot_sglen = scsi_prot_sg_count(cmd);
__entry->prot_op= scsi_get_prot_op(cmd);
@@ -319,11 +325,11 @@ DECLARE_EVENT_CLASS(scsi_cmd_done_timeout_template,
),
 
TP_printk("host_no=%u channel=%u id=%u lun=%u data_sgl=%u " \
- "prot_sgl=%u prot_op=%s cmnd=(%s %s raw=%s) result=(driver=" \
- "%s host=%s message=%s status=%s)",
+ "prot_sgl=%u prot_op=%s tag=%d cmnd=(%s %s raw=%s) " \
+ "result=(driver=%s host=%s message=%s status=%s)",
   

[RESEND PATCH v1 0/2] Print the request tag in Block/SCSI trace events

2018-04-16 Thread Bean Huo (beanhuo)
These patches are to add the printout of the request tag in Block/SCSI trace
events when tracing one request or command, this is very useful for tracing
the task running status in the storage device which supports multiple command
queue.

As for the first patch " Add tag in SCSI trace events", copied from
Rajat Jain [1]. I am just re-sending here.
 
[1]https://patchwork.kernel.org/patch/6399661/

-- Resend since patches need to go through the block and SCSI subsystem, and 
need related
maintainer's confirmation.

Bean Huo (2):
  trace: events: scsi: Add tag in SCSI trace events
  trace: events: block: Add tag in block trace events

 include/trace/events/block.h | 36 +---
 include/trace/events/scsi.h  | 20 +---
 2 files changed, 38 insertions(+), 18 deletions(-)

-- 
2.7.4


[RESEND PATCH v1 2/2] trace: events: block: Add tag in block trace events

2018-04-16 Thread Bean Huo (beanhuo)
Print the request tag along with other information in block trace events
when tracing request , and unplug type (Sync / Async).

Signed-off-by: Bean Huo 
---
 include/trace/events/block.h | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 81b43f5..f8c0b9e 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -81,6 +81,7 @@ TRACE_EVENT(block_rq_requeue,
__field(  dev_t,dev )
__field(  sector_t, sector  )
__field(  unsigned int, nr_sector   )
+   __field(  int,  tag )
__array(  char, rwbs,   RWBS_LEN)
__dynamic_array( char,  cmd,1   )
),
@@ -89,16 +90,17 @@ TRACE_EVENT(block_rq_requeue,
__entry->dev   = rq->rq_disk ? disk_devt(rq->rq_disk) : 0;
__entry->sector= blk_rq_trace_sector(rq);
__entry->nr_sector = blk_rq_trace_nr_sectors(rq);
+   __entry->tag   = rq->tag;
 
blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, blk_rq_bytes(rq));
__get_str(cmd)[0] = '\0';
),
 
-   TP_printk("%d,%d %s (%s) %llu + %u [%d]",
+   TP_printk("%d,%d %s (%s) %llu + %u tag=%d [%d]",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  __entry->rwbs, __get_str(cmd),
  (unsigned long long)__entry->sector,
- __entry->nr_sector, 0)
+ __entry->nr_sector, __entry->tag, 0)
 );
 
 /**
@@ -124,6 +126,7 @@ TRACE_EVENT(block_rq_complete,
__field(  sector_t, sector  )
__field(  unsigned int, nr_sector   )
__field(  int,  error   )
+   __field(  int,  tag )
__array(  char, rwbs,   RWBS_LEN)
__dynamic_array( char,  cmd,1   )
),
@@ -133,16 +136,17 @@ TRACE_EVENT(block_rq_complete,
__entry->sector= blk_rq_pos(rq);
__entry->nr_sector = nr_bytes >> 9;
__entry->error = error;
+   __entry->tag   = rq->tag;
 
blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, nr_bytes);
__get_str(cmd)[0] = '\0';
),
 
-   TP_printk("%d,%d %s (%s) %llu + %u [%d]",
+   TP_printk("%d,%d %s (%s) %llu + %u tag=%d [%d]",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  __entry->rwbs, __get_str(cmd),
  (unsigned long long)__entry->sector,
- __entry->nr_sector, __entry->error)
+ __entry->nr_sector, __entry->tag, __entry->error)
 );
 
 DECLARE_EVENT_CLASS(block_rq,
@@ -156,6 +160,7 @@ DECLARE_EVENT_CLASS(block_rq,
__field(  sector_t, sector  )
__field(  unsigned int, nr_sector   )
__field(  unsigned int, bytes   )
+   __field(  int,  tag )
__array(  char, rwbs,   RWBS_LEN)
__array(  char, comm,   TASK_COMM_LEN   )
__dynamic_array( char,  cmd,1   )
@@ -166,17 +171,18 @@ DECLARE_EVENT_CLASS(block_rq,
__entry->sector= blk_rq_trace_sector(rq);
__entry->nr_sector = blk_rq_trace_nr_sectors(rq);
__entry->bytes = blk_rq_bytes(rq);
+   __entry->tag   = rq->tag;
 
blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, blk_rq_bytes(rq));
__get_str(cmd)[0] = '\0';
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),
 
-   TP_printk("%d,%d %s %u (%s) %llu + %u [%s]",
+   TP_printk("%d,%d %s %u (%s) %llu + %u tag=%d [%s]",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  __entry->rwbs, __entry->bytes, __get_str(cmd),
  (unsigned long long)__entry->sector,
- __entry->nr_sector, __entry->comm)
+ __entry->nr_sector, __entry->tag, __entry->comm)
 );
 
 /**
@@ -297,6 +303,7 @@ DECLARE_EVENT_CLASS(block_bio_merge,
__field( dev_t, dev )
__field( sector_t,  sector  )
__field( unsigned int,  nr_sector   )
+   __field( int,   tag )
__array( char,  rwbs,   RWBS_LEN)
__array( char,  comm,   TASK_COMM_LEN   )
),
@@ -305,14 +312,15 @@ DECLARE_EVENT_CLASS(block_bio_merge,
__entry->dev= bio_dev(bio);
   

[PATCHv3] tcmu: allow userspace to reset netlink

2018-04-16 Thread xiubli
From: Xiubo Li 

This patch adds 1 tcmu attr to reset and complete all the blocked
netlink waiting threads. It's used when the userspace daemon like
tcmu-runner has crashed or forced to shutdown just before the
netlink requests be replied to the kernel, then the netlink requeting
threads will get stuck forever. We must reboot the machine to recover
from it and by this the rebootng is not a must then.

The Call Trace will be something like:
==
INFO: task targetctl:22655 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
targetctl   D 880169718fd0 0 22655  17249 0x0080
Call Trace:
 [] schedule+0x29/0x70
 [] schedule_timeout+0x239/0x2c0
 [] ? skb_release_data+0xf2/0x140
 [] wait_for_completion+0xfd/0x140
 [] ? wake_up_state+0x20/0x20
 [] tcmu_netlink_event+0x26a/0x3a0 [target_core_user]
 [] ? wake_up_atomic_t+0x30/0x30
 [] tcmu_configure_device+0x236/0x350 [target_core_user]
 [] target_configure_device+0x3f/0x3b0 [target_core_mod]
 [] target_core_store_dev_enable+0x2c/0x60 [target_core_mod]
 [] target_core_dev_store+0x24/0x40 [target_core_mod]
 [] configfs_write_file+0xc4/0x130
 [] vfs_write+0xbd/0x1e0
 [] SyS_write+0x7f/0xe0
 [] system_call_fastpath+0x16/0x1b
==

Signed-off-by: Xiubo Li 
---
Changes since v1(suggested by Mike Christie):
v2: - Makes the reset per device.
v3: - Remove nl_cmd->complete, use status instead
- Fix lock issue 
- Check if a nl command is even waiting before trying to wake up

 drivers/target/target_core_user.c | 60 +--
 1 file changed, 51 insertions(+), 9 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 4ad89ea..a831f5b7 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -104,8 +104,8 @@ struct tcmu_hba {
 #define TCMU_CONFIG_LEN 256
 
 struct tcmu_nl_cmd {
-   /* wake up thread waiting for reply */
-   struct completion complete;
+   unsigned int waiter;
+
int cmd;
int status;
 };
@@ -159,9 +159,12 @@ struct tcmu_dev {
 
spinlock_t nl_cmd_lock;
struct tcmu_nl_cmd curr_nl_cmd;
-   /* wake up threads waiting on curr_nl_cmd */
+   /* wake up threads waiting on nl_cmd_wq */
wait_queue_head_t nl_cmd_wq;
 
+   /* complete thread waiting complete_wq */
+   wait_queue_head_t complete_wq;
+
char dev_config[TCMU_CONFIG_LEN];
 
int nl_reply_supported;
@@ -307,11 +310,13 @@ static int tcmu_genl_cmd_done(struct genl_info *info, int 
completed_cmd)
nl_cmd->status = rc;
}
 
-   spin_unlock(&udev->nl_cmd_lock);
if (!is_removed)
 target_undepend_item(&dev->dev_group.cg_item);
-   if (!ret)
-   complete(&nl_cmd->complete);
+   if (!ret && nl_cmd->waiter) {
+   nl_cmd->waiter--;
+   wake_up(&udev->complete_wq);
+   }
+   spin_unlock(&udev->nl_cmd_lock);
return ret;
 }
 
@@ -1258,6 +1263,7 @@ static struct se_device *tcmu_alloc_device(struct se_hba 
*hba, const char *name)
timer_setup(&udev->cmd_timer, tcmu_cmd_timedout, 0);
 
init_waitqueue_head(&udev->nl_cmd_wq);
+   init_waitqueue_head(&udev->complete_wq);
spin_lock_init(&udev->nl_cmd_lock);
 
INIT_RADIX_TREE(&udev->data_blocks, GFP_KERNEL);
@@ -1554,8 +1560,8 @@ static void tcmu_init_genl_cmd_reply(struct tcmu_dev 
*udev, int cmd)
}
 
memset(nl_cmd, 0, sizeof(*nl_cmd));
+   nl_cmd->status = 1;
nl_cmd->cmd = cmd;
-   init_completion(&nl_cmd->complete);
 
spin_unlock(&udev->nl_cmd_lock);
 }
@@ -1572,13 +1578,16 @@ static int tcmu_wait_genl_cmd_reply(struct tcmu_dev 
*udev)
if (udev->nl_reply_supported <= 0)
return 0;
 
+   spin_lock(&udev->nl_cmd_lock);
+   nl_cmd->waiter++;
+   spin_unlock(&udev->nl_cmd_lock);
+
pr_debug("sleeping for nl reply\n");
-   wait_for_completion(&nl_cmd->complete);
+   wait_event(udev->complete_wq, nl_cmd->status != 1);
 
spin_lock(&udev->nl_cmd_lock);
nl_cmd->cmd = TCMU_CMD_UNSPEC;
ret = nl_cmd->status;
-   nl_cmd->status = 0;
spin_unlock(&udev->nl_cmd_lock);
 
wake_up_all(&udev->nl_cmd_wq);
@@ -2323,6 +2332,38 @@ static ssize_t tcmu_block_dev_store(struct config_item 
*item, const char *page,
 }
 CONFIGFS_ATTR(tcmu_, block_dev);
 
+static ssize_t tcmu_reset_netlink_store(struct config_item *item, const char 
*page,
+   size_t count)
+{
+   struct se_device *se_dev = container_of(to_config_group(item),
+   struct se_device,
+   dev_action_group);
+   struct tcmu_dev *udev = TCMU_DEV(se_dev);
+   struct tcmu_nl_cmd *nl_cmd = &udev->curr_nl_cmd;
+   u8 val;
+   int ret;
+
+   ret = kstrtou

Re: [PATCH 08/12] mmc: reduce use of block bounce buffers

2018-04-16 Thread Robin Murphy

On 16/04/18 09:50, Christoph Hellwig wrote:

We can rely on the dma-mapping code to handle any DMA limits that is
bigger than the ISA DMA mask for us (either using an iommu or swiotlb),
so remove setting the block layer bounce limit for anything but bouncing
for highmem pages.

Signed-off-by: Christoph Hellwig 
---
  drivers/mmc/core/queue.c | 7 ++-
  1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 56e9a803db21..60a02a763d01 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -351,17 +351,14 @@ static const struct blk_mq_ops mmc_mq_ops = {
  static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
  {
struct mmc_host *host = card->host;
-   u64 limit = BLK_BOUNCE_HIGH;
-
-   if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
-   limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
  
  	blk_queue_flag_set(QUEUE_FLAG_NONROT, mq->queue);

blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, mq->queue);
if (mmc_can_erase(card))
mmc_queue_setup_discard(mq->queue, card);
  
-	blk_queue_bounce_limit(mq->queue, limit);

+   if (!mmc_dev(host)->dma_mask || !mmc_dev(host)->dma_mask)


I'm almost surprised that GCC doesn't warn about "x || x", but 
nevertheless I think you've lost a "*" here...


Robin.


+   blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
blk_queue_max_hw_sectors(mq->queue,
min(host->max_blk_count, host->max_req_size / 512));
blk_queue_max_segments(mq->queue, host->max_segs);



Re: [PATCH v9] scsi: new zorro_esp.c for Amiga Zorro NCR53C9x boards

2018-04-16 Thread John Paul Adrian Glaubitz

On 04/16/2018 11:26 AM, Christian T. Steigies wrote:

On Thu, Apr 12, 2018 at 01:53:26PM +1200, Michael Schmitz wrote:

From: Michael Schmitz 

New combined SCSI driver for all ESP based Zorro SCSI boards for
m68k Amiga.

Code largely based on board specific parts of the old drivers (blz1230.c,
blz2060.c, cyberstorm.c, cyberstormII.c, fastlane.c which were removed
after the 2.6 kernel series for lack of maintenance) with contributions
by Tuomas Vainikka (TCQ bug tests and workaround) and Finn Thain (TCQ
bugfix by use of PIO in extended message in transfer).

New Kconfig option and Makefile entries for new Amiga Zorro ESP SCSI
driver included in this patch.

Use DMA transfers wherever possible, with board-specific DMA set-up
functions copied from the old driver code. Three byte reselection messages
do appear to cause DMA timeouts. So wire up a PIO transfer routine for
these instead. esp_reselect_with_tag explicitly sets esp->cmd_block_dma as
target address for the message bytes but PIO requires a virtual address.
Substiute kernel virtual address esp->cmd_block in PIO transfer call if
DMA address is esp->cmd_block_dma and phase is message in.

PIO code taken from mac_esp.c where the reselection timeout issue was
debugged and fixed first, with minor macro and function rename.

Signed-off-by: Michael Schmitz 
Reviewed-by: Finn Thain 
Reviewed-by: Christoph Hellwig 

Tested-by: Christian T. Steigies 



Also here on an Amiga 4000 with Cyberstorm I:

Tested-by: John Paul Adrian Glaubitz 

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH v9] scsi: new zorro_esp.c for Amiga Zorro NCR53C9x boards

2018-04-16 Thread Christian T. Steigies
On Thu, Apr 12, 2018 at 01:53:26PM +1200, Michael Schmitz wrote:
> From: Michael Schmitz 
> 
> New combined SCSI driver for all ESP based Zorro SCSI boards for
> m68k Amiga.
> 
> Code largely based on board specific parts of the old drivers (blz1230.c,
> blz2060.c, cyberstorm.c, cyberstormII.c, fastlane.c which were removed
> after the 2.6 kernel series for lack of maintenance) with contributions
> by Tuomas Vainikka (TCQ bug tests and workaround) and Finn Thain (TCQ
> bugfix by use of PIO in extended message in transfer).
> 
> New Kconfig option and Makefile entries for new Amiga Zorro ESP SCSI
> driver included in this patch.
> 
> Use DMA transfers wherever possible, with board-specific DMA set-up
> functions copied from the old driver code. Three byte reselection messages
> do appear to cause DMA timeouts. So wire up a PIO transfer routine for
> these instead. esp_reselect_with_tag explicitly sets esp->cmd_block_dma as
> target address for the message bytes but PIO requires a virtual address.
> Substiute kernel virtual address esp->cmd_block in PIO transfer call if
> DMA address is esp->cmd_block_dma and phase is message in.
> 
> PIO code taken from mac_esp.c where the reselection timeout issue was
> debugged and fixed first, with minor macro and function rename.
> 
> Signed-off-by: Michael Schmitz 
> Reviewed-by: Finn Thain 
> Reviewed-by: Christoph Hellwig 
Tested-by: Christian T. Steigies 

Christian


[PATCH] target: fix crash with iscsi target and dvd

2018-04-16 Thread Ming Lei
When the current page can't be added to bio, one new bio should be
created for adding this page again, instead of ignoring this page.

This patch fixes kernel crash with iscsi target and dvd, as reported
by Wakko.

Cc: Wakko Warner 
Cc: Bart Van Assche 
Cc: target-de...@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: "Nicholas A. Bellinger" 
Cc: Christoph Hellwig 
Fixes: 84c8590646d5b35804 ("target: avoid accessing .bi_vcnt directly")
Signed-off-by: Ming Lei 
---
 drivers/target/target_core_pscsi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/target/target_core_pscsi.c 
b/drivers/target/target_core_pscsi.c
index 0d99b242e82e..6cb933ecc084 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -890,6 +890,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, 
u32 sgl_nents,
bytes = min(bytes, data_len);
 
if (!bio) {
+new_bio:
nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
nr_pages -= nr_vecs;
/*
@@ -931,6 +932,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, 
u32 sgl_nents,
 * be allocated with pscsi_get_bio() above.
 */
bio = NULL;
+   goto new_bio;
}
 
data_len -= bytes;
-- 
2.9.5



Re: [PATCH v2] block: ratelimite pr_err on IO path

2018-04-16 Thread Sreekanth Reddy
On Mon, Apr 16, 2018 at 1:46 PM, Jinpu Wang  wrote:
> On Fri, Apr 13, 2018 at 6:59 PM, Martin K. Petersen
>  wrote:
>>
>> Jinpu,
>>
>> [CC:ed the mpt3sas maintainers]
>>
>> The ratelimit patch is just an attempt to treat the symptom, not the
>> cause.
> Agree. If we can fix the root cause, it will be great.
>>
>>> Thanks for asking, we updated mpt3sas driver which enables DIX support
>>> (prot_mask=0x7f), all disks are SATA SSDs, no DIF support.
>>> After reboot, kernel reports the IO errors from all the drives behind
>>> HBA, seems for almost every read IO, which turns the system unusable:
>>> [   13.079375] sda: ref tag error at location 0 (rcvd 143196159)
>>> [   13.079989] sda: ref tag error at location 937702912 (rcvd 143196159)
>>> [   13.080233] sda: ref tag error at location 937703072 (rcvd 143196159)
>>> [   13.080407] sda: ref tag error at location 0 (rcvd 143196159)
>>> [   13.080594] sda: ref tag error at location 8 (rcvd 143196159)
>>
>> That sounds like a bug in the mpt3sas driver or firmware. I guess the
>> HBA could conceivably be operating a SATA device as DIX Type 0 and strip
>> the PI on the drive side. But that doesn't seem to be a particularly
>> useful mode of operation.
>>
>> Jinpu: Which firmware are you running? Also, please send us the output
>> of:
>>
>> sg_readcap -l /dev/sda
>> sg_inq -x /dev/sda
>> sg_vpd /dev/sda
>>
> Disks are INTEL SSDSC2BX48, directly attached to HBA.
> LSISAS3008: FWVersion(13.00.00.00), ChipRevision(0x02), 
> BiosVersion(08.11.00.00)
> mpt3sas_cm2: Protocol=(Initiator,Target),
> Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set
> Full,NCQ)
>
> jwang@x:~$ sudo sg_vpd /dev/sdz
> Supported VPD pages VPD page:
>   Supported VPD pages [sv]
>   Unit serial number [sn]
>   Device identification [di]
>   Mode page policy [mpp]
>   ATA information (SAT) [ai]
>   Block limits (SBC) [bl]
>   Block device characteristics (SBC) [bdc]
>   Logical block provisioning (SBC) [lbpv]
> jwang@x:~$ sudo sg_inq -x /dev/sdz
> VPD INQUIRY: extended INQUIRY data page
> inquiry: field in cdb illegal (page not supported)
> jwang@x:~$ sudo sg_readcap -l /dev/sdz
> Read Capacity results:
>Protection: prot_en=0, p_type=0, p_i_exponent=0
>Logical block provisioning: lbpme=1, lbprz=1
>Last logical block address=937703087 (0x37e436af), Number of
> logical blocks=937703088
>Logical block length=512 bytes
>Logical blocks per physical block exponent=3 [so physical block
> length=4096 bytes]
>Lowest aligned logical block address=0
> Hence:
>Device size: 480103981056 bytes, 457862.8 MiB, 480.10 GB
>
>
>> Broadcom: How is DIX supposed to work for SATA drives behind an mpt3sas
>> controller?

[Sreekanth] Current Upstream mpt3sas driver doesn't have DIX support
capabilities,
it supports only DIF feature.


Thanks,
Sreekanth
>>
>> --
>> Martin K. Petersen  Oracle Linux Engineering
>
>
> Thanks!
>
> --
> Jack Wang
> Linux Kernel Developer
>
> ProfitBricks GmbH
> Greifswalder Str. 207
> D - 10405 Berlin


[PATCH 01/12] iscsi_tcp: don't set a bounce limit

2018-04-16 Thread Christoph Hellwig
The default already is to never bounce, so the call is a no-op.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/iscsi_tcp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 2ba4b68fdb73..b025a0b74341 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -962,7 +962,6 @@ static int iscsi_sw_tcp_slave_configure(struct scsi_device 
*sdev)
if (conn->datadgst_en)
sdev->request_queue->backing_dev_info->capabilities
|= BDI_CAP_STABLE_WRITES;
-   blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY);
blk_queue_dma_alignment(sdev->request_queue, 0);
return 0;
 }
-- 
2.17.0



[PATCH 02/12] storsvc: don't set a bounce limit

2018-04-16 Thread Christoph Hellwig
The default already is to never bounce, so the call is a no-op.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/storvsc_drv.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 8c51d628b52e..5f2d177c3bd9 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1382,9 +1382,6 @@ static int storvsc_device_alloc(struct scsi_device 
*sdevice)
 
 static int storvsc_device_configure(struct scsi_device *sdevice)
 {
-
-   blk_queue_bounce_limit(sdevice->request_queue, BLK_BOUNCE_ANY);
-
blk_queue_rq_timeout(sdevice->request_queue, (storvsc_timeout * HZ));
 
/* Ensure there are no gaps in presented sgls */
-- 
2.17.0



remove PCI_DMA_BUS_IS_PHYS

2018-04-16 Thread Christoph Hellwig
Hi all,

this series tries to get rid of the global and PCI_DMA_BUS_IS_PHYS flag,
which causes the block layer and networking code to bounce buffer memory
above the dma mask in some cases.  It is a leftover from i386 + highmem
days and is obsolete now that we have swiotlb or iommus so that the
dma ops implementations can always (well minus the ISA DMA case which
will require further attention) handle memory passed to them.


[PATCH 03/12] mtip32xx: don't use block layer bounce buffers

2018-04-16 Thread Christoph Hellwig
mtip32xx just sets the block bounce limit to the dma mask, which means
that the iommu or swiotlb already take care of the bounce buffering,
and the block bouncing can be removed.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/mtip32xx/mtip32xx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c 
b/drivers/block/mtip32xx/mtip32xx.c
index 769c551e3d71..b03bb27dcc58 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3862,7 +3862,6 @@ static int mtip_block_initialize(struct driver_data *dd)
blk_queue_max_hw_sectors(dd->queue, 0x);
blk_queue_max_segment_size(dd->queue, 0x40);
blk_queue_io_min(dd->queue, 4096);
-   blk_queue_bounce_limit(dd->queue, dd->pdev->dma_mask);
 
/* Signal trim support */
if (dd->trim_supp == true) {
-- 
2.17.0



[PATCH 04/12] DAC960: don't use block layer bounce buffers

2018-04-16 Thread Christoph Hellwig
DAC960 just sets the block bounce limit to the dma mask, which means
that the iommu or swiotlb already take care of the bounce buffering,
and the block bouncing can be removed.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/DAC960.c | 9 ++---
 drivers/block/DAC960.h | 1 -
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/block/DAC960.c b/drivers/block/DAC960.c
index f781eff7d23e..c9ba48519d0f 100644
--- a/drivers/block/DAC960.c
+++ b/drivers/block/DAC960.c
@@ -1179,7 +1179,6 @@ static bool 
DAC960_V1_EnableMemoryMailboxInterface(DAC960_Controller_T
 
   if (pci_set_dma_mask(Controller->PCIDevice, DMA_BIT_MASK(32)))
return DAC960_Failure(Controller, "DMA mask out of range");
-  Controller->BounceBufferLimit = DMA_BIT_MASK(32);
 
   if ((hw_type == DAC960_PD_Controller) || (hw_type == DAC960_P_Controller)) {
 CommandMailboxesSize =  0;
@@ -1380,11 +1379,8 @@ static bool 
DAC960_V2_EnableMemoryMailboxInterface(DAC960_Controller_T
   dma_addr_t   CommandMailboxDMA;
   DAC960_V2_CommandStatus_T CommandStatus;
 
-   if (!pci_set_dma_mask(Controller->PCIDevice, DMA_BIT_MASK(64)))
-   Controller->BounceBufferLimit = DMA_BIT_MASK(64);
-   else if (!pci_set_dma_mask(Controller->PCIDevice, DMA_BIT_MASK(32)))
-   Controller->BounceBufferLimit = DMA_BIT_MASK(32);
-   else
+   if (pci_set_dma_mask(Controller->PCIDevice, DMA_BIT_MASK(64)) &&
+   pci_set_dma_mask(Controller->PCIDevice, DMA_BIT_MASK(32)))
return DAC960_Failure(Controller, "DMA mask out of range");
 
   /* This is a temporary dma mapping, used only in the scope of this function 
*/
@@ -2540,7 +2536,6 @@ static bool 
DAC960_RegisterBlockDevice(DAC960_Controller_T *Controller)
continue;
}
Controller->RequestQueue[n] = RequestQueue;
-   blk_queue_bounce_limit(RequestQueue, Controller->BounceBufferLimit);
RequestQueue->queuedata = Controller;
blk_queue_max_segments(RequestQueue, 
Controller->DriverScatterGatherLimit);
blk_queue_max_hw_sectors(RequestQueue, Controller->MaxBlocksPerCommand);
diff --git a/drivers/block/DAC960.h b/drivers/block/DAC960.h
index 21aff470d268..1439e651928b 100644
--- a/drivers/block/DAC960.h
+++ b/drivers/block/DAC960.h
@@ -2295,7 +2295,6 @@ typedef struct DAC960_Controller
   unsigned short MaxBlocksPerCommand;
   unsigned short ControllerScatterGatherLimit;
   unsigned short DriverScatterGatherLimit;
-  u64  BounceBufferLimit;
   unsigned int CombinedStatusBufferLength;
   unsigned int InitialStatusLength;
   unsigned int CurrentStatusLength;
-- 
2.17.0



[PATCH 05/12] sata_nv: don't use block layer bounce buffers

2018-04-16 Thread Christoph Hellwig
sata_nv sets the block bounce limit to the reduce dma mask for ATAPI
devices, which means that the iommu or swiotlb already take care of
the bounce buffering, and the block bouncing can be removed.

Signed-off-by: Christoph Hellwig 
---
 drivers/ata/sata_nv.c | 62 +--
 1 file changed, 24 insertions(+), 38 deletions(-)

diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
index 8c683ddd0f58..b6e9ad6d33c9 100644
--- a/drivers/ata/sata_nv.c
+++ b/drivers/ata/sata_nv.c
@@ -740,32 +740,16 @@ static int nv_adma_slave_config(struct scsi_device *sdev)
sdev1 = ap->host->ports[1]->link.device[0].sdev;
if ((port0->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
(port1->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
-   /** We have to set the DMA mask to 32-bit if either port is in
-   ATAPI mode, since they are on the same PCI device which is
-   used for DMA mapping. If we set the mask we also need to set
-   the bounce limit on both ports to ensure that the block
-   layer doesn't feed addresses that cause DMA mapping to
-   choke. If either SCSI device is not allocated yet, it's OK
-   since that port will discover its correct setting when it
-   does get allocated.
-   Note: Setting 32-bit mask should not fail. */
-   if (sdev0)
-   blk_queue_bounce_limit(sdev0->request_queue,
-  ATA_DMA_MASK);
-   if (sdev1)
-   blk_queue_bounce_limit(sdev1->request_queue,
-  ATA_DMA_MASK);
-
-   dma_set_mask(&pdev->dev, ATA_DMA_MASK);
+   /*
+* We have to set the DMA mask to 32-bit if either port is in
+* ATAPI mode, since they are on the same PCI device which is
+* used for DMA mapping.  If either SCSI device is not allocated
+* yet, it's OK since that port will discover its correct
+* setting when it does get allocated.
+*/
+   rc = dma_set_mask(&pdev->dev, ATA_DMA_MASK);
} else {
-   /** This shouldn't fail as it was set to this value before */
-   dma_set_mask(&pdev->dev, pp->adma_dma_mask);
-   if (sdev0)
-   blk_queue_bounce_limit(sdev0->request_queue,
-  pp->adma_dma_mask);
-   if (sdev1)
-   blk_queue_bounce_limit(sdev1->request_queue,
-  pp->adma_dma_mask);
+   rc = dma_set_mask(&pdev->dev, pp->adma_dma_mask);
}
 
blk_queue_segment_boundary(sdev->request_queue, segment_boundary);
@@ -1131,12 +1115,11 @@ static int nv_adma_port_start(struct ata_port *ap)
 
VPRINTK("ENTER\n");
 
-   /* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
-  pad buffers */
-   rc = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
-   if (rc)
-   return rc;
-   rc = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
+   /*
+* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
+* pad buffers.
+*/
+   rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (rc)
return rc;
 
@@ -1156,13 +1139,16 @@ static int nv_adma_port_start(struct ata_port *ap)
pp->notifier_clear_block = pp->gen_block +
   NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
 
-   /* Now that the legacy PRD and padding buffer are allocated we can
-  safely raise the DMA mask to allocate the CPB/APRD table.
-  These are allowed to fail since we store the value that ends up
-  being used to set as the bounce limit in slave_config later if
-  needed. */
-   dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
-   dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
+   /*
+* Now that the legacy PRD and padding buffer are allocated we can
+* try to raise the DMA mask to allocate the CPB/APRD table.
+*/
+   rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+   if (rc) {
+   rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+   if (rc)
+   return rc;
+   }
pp->adma_dma_mask = *dev->dma_mask;
 
mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
-- 
2.17.0



[PATCH 07/12] scsi: reduce use of block bounce buffers

2018-04-16 Thread Christoph Hellwig
We can rely on the dma-mapping code to handle any DMA limits that is
bigger than the ISA DMA mask for us (either using an iommu or swiotlb),
so remove setting the block layer bounce limit for anything but the
unchecked_isa_dma case, or the bouncing for highmem pages.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/scsi_lib.c | 24 ++--
 1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0dfec0dedd5e..ebe2cbb48b80 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2142,27 +2142,6 @@ static int scsi_map_queues(struct blk_mq_tag_set *set)
return blk_mq_map_queues(set);
 }
 
-static u64 scsi_calculate_bounce_limit(struct Scsi_Host *shost)
-{
-   struct device *host_dev;
-   u64 bounce_limit = 0x;
-
-   if (shost->unchecked_isa_dma)
-   return BLK_BOUNCE_ISA;
-   /*
-* Platforms with virtual-DMA translation
-* hardware have no practical limit.
-*/
-   if (!PCI_DMA_BUS_IS_PHYS)
-   return BLK_BOUNCE_ANY;
-
-   host_dev = scsi_get_device(shost);
-   if (host_dev && host_dev->dma_mask)
-   bounce_limit = (u64)dma_max_pfn(host_dev) << PAGE_SHIFT;
-
-   return bounce_limit;
-}
-
 void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 {
struct device *dev = shost->dma_dev;
@@ -2182,7 +2161,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct 
request_queue *q)
}
 
blk_queue_max_hw_sectors(q, shost->max_sectors);
-   blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
+   if (shost->unchecked_isa_dma)
+   blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
blk_queue_segment_boundary(q, shost->dma_boundary);
dma_set_seg_boundary(dev, shost->dma_boundary);
 
-- 
2.17.0



[PATCH 06/12] memstick: don't call blk_queue_bounce_limit

2018-04-16 Thread Christoph Hellwig
All in-tree host drivers set up a proper dma mask and use the dma-mapping
helpers.  This means they will be able to deal with any address that we
are throwing at them.

Signed-off-by: Christoph Hellwig 
---
 drivers/memstick/core/ms_block.c| 5 -
 drivers/memstick/core/mspro_block.c | 5 -
 2 files changed, 10 deletions(-)

diff --git a/drivers/memstick/core/ms_block.c b/drivers/memstick/core/ms_block.c
index 57b13dfbd21e..b2d025f42d14 100644
--- a/drivers/memstick/core/ms_block.c
+++ b/drivers/memstick/core/ms_block.c
@@ -2096,12 +2096,8 @@ static int msb_init_disk(struct memstick_dev *card)
struct msb_data *msb = memstick_get_drvdata(card);
struct memstick_host *host = card->host;
int rc;
-   u64 limit = BLK_BOUNCE_HIGH;
unsigned long capacity;
 
-   if (host->dev.dma_mask && *(host->dev.dma_mask))
-   limit = *(host->dev.dma_mask);
-
mutex_lock(&msb_disk_lock);
msb->disk_id = idr_alloc(&msb_disk_idr, card, 0, 256, GFP_KERNEL);
mutex_unlock(&msb_disk_lock);
@@ -2123,7 +2119,6 @@ static int msb_init_disk(struct memstick_dev *card)
 
msb->queue->queuedata = card;
 
-   blk_queue_bounce_limit(msb->queue, limit);
blk_queue_max_hw_sectors(msb->queue, MS_BLOCK_MAX_PAGES);
blk_queue_max_segments(msb->queue, MS_BLOCK_MAX_SEGS);
blk_queue_max_segment_size(msb->queue,
diff --git a/drivers/memstick/core/mspro_block.c 
b/drivers/memstick/core/mspro_block.c
index 8897962781bb..a2fadc605750 100644
--- a/drivers/memstick/core/mspro_block.c
+++ b/drivers/memstick/core/mspro_block.c
@@ -1175,12 +1175,8 @@ static int mspro_block_init_disk(struct memstick_dev 
*card)
struct mspro_sys_info *sys_info = NULL;
struct mspro_sys_attr *s_attr = NULL;
int rc, disk_id;
-   u64 limit = BLK_BOUNCE_HIGH;
unsigned long capacity;
 
-   if (host->dev.dma_mask && *(host->dev.dma_mask))
-   limit = *(host->dev.dma_mask);
-
for (rc = 0; msb->attr_group.attrs[rc]; ++rc) {
s_attr = mspro_from_sysfs_attr(msb->attr_group.attrs[rc]);
 
@@ -1219,7 +1215,6 @@ static int mspro_block_init_disk(struct memstick_dev 
*card)
 
msb->queue->queuedata = card;
 
-   blk_queue_bounce_limit(msb->queue, limit);
blk_queue_max_hw_sectors(msb->queue, MSPRO_BLOCK_MAX_PAGES);
blk_queue_max_segments(msb->queue, MSPRO_BLOCK_MAX_SEGS);
blk_queue_max_segment_size(msb->queue,
-- 
2.17.0



[PATCH 08/12] mmc: reduce use of block bounce buffers

2018-04-16 Thread Christoph Hellwig
We can rely on the dma-mapping code to handle any DMA limits that is
bigger than the ISA DMA mask for us (either using an iommu or swiotlb),
so remove setting the block layer bounce limit for anything but bouncing
for highmem pages.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/core/queue.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 56e9a803db21..60a02a763d01 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -351,17 +351,14 @@ static const struct blk_mq_ops mmc_mq_ops = {
 static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
 {
struct mmc_host *host = card->host;
-   u64 limit = BLK_BOUNCE_HIGH;
-
-   if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
-   limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
 
blk_queue_flag_set(QUEUE_FLAG_NONROT, mq->queue);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, mq->queue);
if (mmc_can_erase(card))
mmc_queue_setup_discard(mq->queue, card);
 
-   blk_queue_bounce_limit(mq->queue, limit);
+   if (!mmc_dev(host)->dma_mask || !mmc_dev(host)->dma_mask)
+   blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
blk_queue_max_hw_sectors(mq->queue,
min(host->max_blk_count, host->max_req_size / 512));
blk_queue_max_segments(mq->queue, host->max_segs);
-- 
2.17.0



[PATCH 10/12] ide: remove the PCI_DMA_BUS_IS_PHYS check

2018-04-16 Thread Christoph Hellwig
We now have ways to deal with drainage in the block layer, and libata has
been using it for ages.  We also want to get rid of PCI_DMA_BUS_IS_PHYS
now, so just reduce the PCI transfer size for ide - anyone who cares for
performance on PCI controllers should have switched to libata long ago.

Signed-off-by: Christoph Hellwig 
---
 drivers/ide/ide-probe.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index 8d8ed036ca0a..56d7bc228cb3 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -796,8 +796,7 @@ static int ide_init_queue(ide_drive_t *drive)
 * This will be fixed once we teach pci_map_sg() about our boundary
 * requirements, hopefully soon. *FIXME*
 */
-   if (!PCI_DMA_BUS_IS_PHYS)
-   max_sg_entries >>= 1;
+   max_sg_entries >>= 1;
 #endif /* CONFIG_PCI */
 
blk_queue_max_segments(q, max_sg_entries);
-- 
2.17.0



[PATCH 11/12] net: remove the PCI_DMA_BUS_IS_PHYS check in illegal_highdma

2018-04-16 Thread Christoph Hellwig
These days the dma mapping routines must be able to handle any address
supported by the device, be that using an iommu, or swiotlb if none is
supported.  With that the PCI_DMA_BUS_IS_PHYS check in illegal_highdma
is not needed and can be removed.

Signed-off-by: Christoph Hellwig 
---
 net/core/dev.c | 20 +---
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 969462ebb296..5802cf177b07 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2884,11 +2884,7 @@ void netdev_rx_csum_fault(struct net_device *dev)
 EXPORT_SYMBOL(netdev_rx_csum_fault);
 #endif
 
-/* Actually, we should eliminate this check as soon as we know, that:
- * 1. IOMMU is present and allows to map all the memory.
- * 2. No high memory really exists on this machine.
- */
-
+/* XXX: check that highmem exists at all on the given machine. */
 static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
 {
 #ifdef CONFIG_HIGHMEM
@@ -2902,20 +2898,6 @@ static int illegal_highdma(struct net_device *dev, 
struct sk_buff *skb)
return 1;
}
}
-
-   if (PCI_DMA_BUS_IS_PHYS) {
-   struct device *pdev = dev->dev.parent;
-
-   if (!pdev)
-   return 0;
-   for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-   skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-   dma_addr_t addr = page_to_phys(skb_frag_page(frag));
-
-   if (!pdev->dma_mask || addr + PAGE_SIZE - 1 > 
*pdev->dma_mask)
-   return 1;
-   }
-   }
 #endif
return 0;
 }
-- 
2.17.0



[PATCH 09/12] ide: kill ide_toggle_bounce

2018-04-16 Thread Christoph Hellwig
ide_toggle_bounce did select various strange block bounce limits, including
not bouncing at all as soon as an iommu is present in the system.  Given
that the dma_map routines now handle any required bounce buffering except
for ISA DMA, and the ide code already must handle either ISA DMA or highmem
at least for iommu equipped systems we can get rid of the block layer
bounce limit setting entirely.

Signed-off-by: Christoph Hellwig 
---
 drivers/ide/ide-dma.c   |  2 --
 drivers/ide/ide-lib.c   | 26 --
 drivers/ide/ide-probe.c |  3 ---
 include/linux/ide.h |  2 --
 4 files changed, 33 deletions(-)

diff --git a/drivers/ide/ide-dma.c b/drivers/ide/ide-dma.c
index 54d4d78ca46a..6f344654ef22 100644
--- a/drivers/ide/ide-dma.c
+++ b/drivers/ide/ide-dma.c
@@ -180,7 +180,6 @@ EXPORT_SYMBOL_GPL(ide_dma_unmap_sg);
 void ide_dma_off_quietly(ide_drive_t *drive)
 {
drive->dev_flags &= ~IDE_DFLAG_USING_DMA;
-   ide_toggle_bounce(drive, 0);
 
drive->hwif->dma_ops->dma_host_set(drive, 0);
 }
@@ -211,7 +210,6 @@ EXPORT_SYMBOL(ide_dma_off);
 void ide_dma_on(ide_drive_t *drive)
 {
drive->dev_flags |= IDE_DFLAG_USING_DMA;
-   ide_toggle_bounce(drive, 1);
 
drive->hwif->dma_ops->dma_host_set(drive, 1);
 }
diff --git a/drivers/ide/ide-lib.c b/drivers/ide/ide-lib.c
index e1180fa46196..78cb79eddc8b 100644
--- a/drivers/ide/ide-lib.c
+++ b/drivers/ide/ide-lib.c
@@ -6,32 +6,6 @@
 #include 
 #include 
 
-/**
- * ide_toggle_bounce   -   handle bounce buffering
- * @drive: drive to update
- * @on: on/off boolean
- *
- * Enable or disable bounce buffering for the device. Drives move
- * between PIO and DMA and that changes the rules we need.
- */
-
-void ide_toggle_bounce(ide_drive_t *drive, int on)
-{
-   u64 addr = BLK_BOUNCE_HIGH; /* dma64_addr_t */
-
-   if (!PCI_DMA_BUS_IS_PHYS) {
-   addr = BLK_BOUNCE_ANY;
-   } else if (on && drive->media == ide_disk) {
-   struct device *dev = drive->hwif->dev;
-
-   if (dev && dev->dma_mask)
-   addr = *dev->dma_mask;
-   }
-
-   if (drive->queue)
-   blk_queue_bounce_limit(drive->queue, addr);
-}
-
 u64 ide_get_lba_addr(struct ide_cmd *cmd, int lba48)
 {
struct ide_taskfile *tf = &cmd->tf;
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index 2019e66eada7..8d8ed036ca0a 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -805,9 +805,6 @@ static int ide_init_queue(ide_drive_t *drive)
/* assign drive queue */
drive->queue = q;
 
-   /* needs drive->queue to be set */
-   ide_toggle_bounce(drive, 1);
-
return 0;
 }
 
diff --git a/include/linux/ide.h b/include/linux/ide.h
index ca9d34feb572..11f0dd03a4b4 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -1508,8 +1508,6 @@ static inline void ide_set_hwifdata (ide_hwif_t * hwif, 
void *data)
hwif->hwif_data = data;
 }
 
-extern void ide_toggle_bounce(ide_drive_t *drive, int on);
-
 u64 ide_get_lba_addr(struct ide_cmd *, int);
 u8 ide_dump_status(ide_drive_t *, const char *, u8);
 
-- 
2.17.0



[PATCH 12/12] PCI: remove PCI_DMA_BUS_IS_PHYS

2018-04-16 Thread Christoph Hellwig
This was used by the ide, scsi and networking code in the past to
determine if they should bounce payloads.  Now that the dma mapping
always have to support dma to all physical memory (thanks to swiotlb
for non-iommu systems) there is no need to this crude hack any more.

Signed-off-by: Christoph Hellwig 
---
 arch/alpha/include/asm/pci.h  |  5 -
 arch/arc/include/asm/pci.h|  6 --
 arch/arm/include/asm/pci.h|  7 ---
 arch/arm64/include/asm/pci.h  |  5 -
 arch/h8300/include/asm/pci.h  |  2 --
 arch/hexagon/kernel/dma.c |  1 -
 arch/ia64/hp/common/sba_iommu.c   |  3 ---
 arch/ia64/include/asm/pci.h   | 17 -
 arch/ia64/kernel/setup.c  | 12 
 arch/ia64/sn/kernel/io_common.c   |  5 -
 arch/m68k/include/asm/pci.h   |  6 --
 arch/microblaze/include/asm/pci.h |  6 --
 arch/mips/include/asm/pci.h   |  7 ---
 arch/parisc/include/asm/pci.h | 23 ---
 arch/parisc/kernel/setup.c|  5 -
 arch/powerpc/include/asm/pci.h| 18 --
 arch/riscv/include/asm/pci.h  |  3 ---
 arch/s390/include/asm/pci.h   |  2 --
 arch/s390/pci/pci_dma.c   |  2 --
 arch/sh/include/asm/pci.h |  6 --
 arch/sh/kernel/dma-nommu.c|  1 -
 arch/sparc/include/asm/pci_32.h   |  4 
 arch/sparc/include/asm/pci_64.h   |  6 --
 arch/x86/include/asm/pci.h|  3 ---
 arch/x86/kernel/pci-nommu.c   |  1 -
 arch/xtensa/include/asm/pci.h |  2 --
 drivers/parisc/ccio-dma.c |  2 --
 drivers/parisc/sba_iommu.c|  2 --
 include/asm-generic/pci.h |  8 
 include/linux/dma-mapping.h   |  1 -
 lib/dma-direct.c  |  1 -
 tools/virtio/linux/dma-mapping.h  |  2 --
 32 files changed, 174 deletions(-)

diff --git a/arch/alpha/include/asm/pci.h b/arch/alpha/include/asm/pci.h
index b9ec55351924..cf6bc1e64d66 100644
--- a/arch/alpha/include/asm/pci.h
+++ b/arch/alpha/include/asm/pci.h
@@ -56,11 +56,6 @@ struct pci_controller {
 
 /* IOMMU controls.  */
 
-/* The PCI address space does not equal the physical memory address space.
-   The networking and block device layers use this boolean for bounce buffer
-   decisions.  */
-#define PCI_DMA_BUS_IS_PHYS  0
-
 /* TODO: integrate with include/asm-generic/pci.h ? */
 static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
 {
diff --git a/arch/arc/include/asm/pci.h b/arch/arc/include/asm/pci.h
index ba56c23c1b20..4ff53c041c64 100644
--- a/arch/arc/include/asm/pci.h
+++ b/arch/arc/include/asm/pci.h
@@ -16,12 +16,6 @@
 #define PCIBIOS_MIN_MEM 0x10
 
 #define pcibios_assign_all_busses()1
-/*
- * The PCI address space does equal the physical memory address space.
- * The networking and block device layers use this boolean for bounce
- * buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS1
 
 #endif /* __KERNEL__ */
 
diff --git a/arch/arm/include/asm/pci.h b/arch/arm/include/asm/pci.h
index 1f0de808d111..0abd389cf0ec 100644
--- a/arch/arm/include/asm/pci.h
+++ b/arch/arm/include/asm/pci.h
@@ -19,13 +19,6 @@ static inline int pci_proc_domain(struct pci_bus *bus)
 }
 #endif /* CONFIG_PCI_DOMAINS */
 
-/*
- * The PCI address space does equal the physical memory address space.
- * The networking and block device layers use this boolean for bounce
- * buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS (1)
-
 #define HAVE_PCI_MMAP
 #define ARCH_GENERIC_PCI_MMAP_RESOURCE
 
diff --git a/arch/arm64/include/asm/pci.h b/arch/arm64/include/asm/pci.h
index 8747f7c5e0e7..9e690686e8aa 100644
--- a/arch/arm64/include/asm/pci.h
+++ b/arch/arm64/include/asm/pci.h
@@ -18,11 +18,6 @@
 #define pcibios_assign_all_busses() \
(pci_has_flag(PCI_REASSIGN_ALL_BUS))
 
-/*
- * PCI address space differs from physical memory address space
- */
-#define PCI_DMA_BUS_IS_PHYS(0)
-
 #define ARCH_GENERIC_PCI_MMAP_RESOURCE 1
 
 extern int isa_dma_bridge_buggy;
diff --git a/arch/h8300/include/asm/pci.h b/arch/h8300/include/asm/pci.h
index 7c9e55d62215..d4d345a52092 100644
--- a/arch/h8300/include/asm/pci.h
+++ b/arch/h8300/include/asm/pci.h
@@ -15,6 +15,4 @@ static inline void pcibios_penalize_isa_irq(int irq, int 
active)
/* We don't do dynamic PCI IRQ allocation */
 }
 
-#define PCI_DMA_BUS_IS_PHYS(1)
-
 #endif /* _ASM_H8300_PCI_H */
diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index ad8347c29dcf..77459df34e2e 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -208,7 +208,6 @@ const struct dma_map_ops hexagon_dma_ops = {
.sync_single_for_cpu = hexagon_sync_single_for_cpu,
.sync_single_for_device = hexagon_sync_single_for_device,
.mapping_error  = hexagon_mapping_error,
-   .is_phys= 1,
 };
 
 void __init hexagon_dma_init(void)
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index aec4a3354abe..6f05aba9012f 100644
--- a/arch/ia64/hp/com

Re: [PATCH v2] block: ratelimite pr_err on IO path

2018-04-16 Thread Jinpu Wang
On Fri, Apr 13, 2018 at 6:59 PM, Martin K. Petersen
 wrote:
>
> Jinpu,
>
> [CC:ed the mpt3sas maintainers]
>
> The ratelimit patch is just an attempt to treat the symptom, not the
> cause.
Agree. If we can fix the root cause, it will be great.
>
>> Thanks for asking, we updated mpt3sas driver which enables DIX support
>> (prot_mask=0x7f), all disks are SATA SSDs, no DIF support.
>> After reboot, kernel reports the IO errors from all the drives behind
>> HBA, seems for almost every read IO, which turns the system unusable:
>> [   13.079375] sda: ref tag error at location 0 (rcvd 143196159)
>> [   13.079989] sda: ref tag error at location 937702912 (rcvd 143196159)
>> [   13.080233] sda: ref tag error at location 937703072 (rcvd 143196159)
>> [   13.080407] sda: ref tag error at location 0 (rcvd 143196159)
>> [   13.080594] sda: ref tag error at location 8 (rcvd 143196159)
>
> That sounds like a bug in the mpt3sas driver or firmware. I guess the
> HBA could conceivably be operating a SATA device as DIX Type 0 and strip
> the PI on the drive side. But that doesn't seem to be a particularly
> useful mode of operation.
>
> Jinpu: Which firmware are you running? Also, please send us the output
> of:
>
> sg_readcap -l /dev/sda
> sg_inq -x /dev/sda
> sg_vpd /dev/sda
>
Disks are INTEL SSDSC2BX48, directly attached to HBA.
LSISAS3008: FWVersion(13.00.00.00), ChipRevision(0x02), BiosVersion(08.11.00.00)
mpt3sas_cm2: Protocol=(Initiator,Target),
Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set
Full,NCQ)

jwang@x:~$ sudo sg_vpd /dev/sdz
Supported VPD pages VPD page:
  Supported VPD pages [sv]
  Unit serial number [sn]
  Device identification [di]
  Mode page policy [mpp]
  ATA information (SAT) [ai]
  Block limits (SBC) [bl]
  Block device characteristics (SBC) [bdc]
  Logical block provisioning (SBC) [lbpv]
jwang@x:~$ sudo sg_inq -x /dev/sdz
VPD INQUIRY: extended INQUIRY data page
inquiry: field in cdb illegal (page not supported)
jwang@x:~$ sudo sg_readcap -l /dev/sdz
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=1, lbprz=1
   Last logical block address=937703087 (0x37e436af), Number of
logical blocks=937703088
   Logical block length=512 bytes
   Logical blocks per physical block exponent=3 [so physical block
length=4096 bytes]
   Lowest aligned logical block address=0
Hence:
   Device size: 480103981056 bytes, 457862.8 MiB, 480.10 GB


> Broadcom: How is DIX supposed to work for SATA drives behind an mpt3sas
> controller?
>
> --
> Martin K. Petersen  Oracle Linux Engineering


Thanks!

-- 
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin