Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-10-07 Thread Alexander Gordeev
On Thu, Oct 03, 2013 at 04:06:51AM -0700, Christoph Hellwig wrote:
 On Sat, Jul 20, 2013 at 09:48:28AM -0500, Mike Christie wrote:
  What about the attached only compile tested patch. The patch has the mq
  block code work like the non mq code for bio cleanups.
  
  
 
  blk-mq: blk-mq should free bios in pass through case
  
  For non mq calls, the block layer will free the bios when
  blk_finish_request is called.
 e 
  For mq calls, the blk mq code wants the caller to do this.
  
  This patch has the blk mq code work like the non mq code
  and has the block layer free the bios.
  
  Signed-off-by: Mike Christie micha...@cs.wisc.edu
 
 This patch breaks booting for me in the current blk multiqueue tree,
 with an apparent double free of a bio when using virtio-blk in writeback
 mode (cache=writeback or cache=none in qemu):

I am not sure if the root cause the same, but the panic I experience with
mounting a ahci device (and those double-tag usage described in another
thread) is somehow similar:


[  181.184510] general protection fault:  [#1] SMP 
[  181.184546] Modules linked in: lockd sunrpc snd_hda_codec_realtek 
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm mperf snd_page_alloc i5000_edac 
coretemp snd_timer edac_core iTCO_wdt snd kvm_intel iTCO_vendor_support lpc_ich 
mfd_core igb dca i5k_amb ppdev soundcore hp_wmi tg3 kvm sparse_keymap serio_raw 
ptp microcode pcspkr rfkill pps_core shpchp parport_pc parport mptsas 
scsi_transport_sas mptscsih mptbase floppy nouveau video mxm_wmi wmi 
i2c_algo_bit drm_kms_helper ttm drm i2c_core
[  181.184550] CPU: 6 PID: 0 Comm: swapper/6 Tainted: GW
3.10.0-rc5.debug+ #180
[  181.184552] Hardware name: Hewlett-Packard HP xw6400 Workstation/0A04h, BIOS 
786D4 v02.31 03/14/2008
[  181.184554] task: 88007b1a8000 ti: 88007b19c000 task.ti: 
88007b19c000
[  181.184563] RIP: 0010:[811fa97b]  [811fa97b] 
bio_endio+0x1b/0x40
[  181.184565] RSP: 0018:88007d203a28  EFLAGS: 00010002
[  181.184567] RAX: 6b6b6b6b6b6b6b6b RBX:  RCX: dead00200200
[  181.184568] RDX:  RSI:  RDI: 880068e29200
[  181.184570] RBP: 88007d203a28 R08: e8201240 R09: 
[  181.184571] R10:  R11: 0001 R12: 880074d86000
[  181.184572] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b R15: 0001
[  181.184575] FS:  () GS:88007d20() 
knlGS:
[  181.184576] CS:  0010 DS:  ES:  CR0: 8005003b
[  181.184578] CR2: 7f8afac8c45c CR3: 5f431000 CR4: 07e0
[  181.184580] DR0:  DR1:  DR2: 
[  181.184581] DR3:  DR6: 0ff0 DR7: 0400
[  181.184582] Stack:
[  181.184587]  88007d203a78 813202aa 88007d203a98 
0046
[  181.184591]  0046 880072d08000  
880074d860d8
[  181.184594]   0001 88007d203a88 
81320445
[  181.184595] Call Trace:
[  181.184597]  IRQ 
[  181.184603]  [813202aa] blk_mq_complete_request+0x5a/0x1d0
[  181.184607]  [81320445] __blk_mq_end_io+0x25/0x30
[  181.184609]  [81320535] blk_mq_end_io+0xe5/0xf0
[  181.184613]  [81319754] blk_flush_complete_seq+0xf4/0x360
[  181.184616]  [81319a4b] ? flush_end_io+0x4b/0x210
[  181.184619]  [81319b2a] flush_end_io+0x12a/0x210
[  181.184622]  [813202da] blk_mq_complete_request+0x8a/0x1d0
[  181.184626]  [8147b9fd] ? scsi_device_unbusy+0x9d/0xd0
[  181.184629]  [81320445] __blk_mq_end_io+0x25/0x30
[  181.184632]  [81320535] blk_mq_end_io+0xe5/0xf0
[  181.184635]  [8147cae5] scsi_mq_end_request+0x15/0x20
[  181.184638]  [8147bf20] scsi_io_completion+0xa0/0x650
[  181.184643]  [810bbc3d] ? trace_hardirqs_off+0xd/0x10
[  181.184648]  [814722f7] scsi_finish_command+0x87/0xe0
[  181.184650]  [8147bccf] scsi_softirq_done+0x13f/0x160
[  181.184653]  [8147cba5] scsi_mq_done+0x15/0x20
[  181.184658]  [81495a73] ata_scsi_qc_complete+0x63/0x470
[  181.184661]  [8148fad0] __ata_qc_complete+0x90/0x140
[  181.184664]  [8148fc1d] ata_qc_complete+0x9d/0x230
[  181.184667]  [8148fe51] ata_qc_complete_multiple+0xa1/0xe0
[  181.184673]  [814aa449] ahci_handle_port_interrupt+0x109/0x560
[  181.184676]  [814ab63f] ahci_port_intr+0x2f/0x40
[  181.184678]  [814ab6f1] ahci_interrupt+0xa1/0x100
[  181.184683]  [810ff7b5] handle_irq_event_percpu+0x75/0x3d0
[  181.184686]  [810ffb58] handle_irq_event+0x48/0x70
[  181.184689]  [81102d9e] ? handle_fasteoi_irq+0x1e/0x100
[  181.184692]  [81102dda] handle_fasteoi_irq+0x5a/0x100
[  181.184696]  [81004320] handle_irq+0x60/0x150
[  181.184702]  [816ff846] ? atomic_notifier_call_chain+0x16/0x20
[ 

Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-10-03 Thread Christoph Hellwig
On Sat, Jul 20, 2013 at 09:48:28AM -0500, Mike Christie wrote:
 What about the attached only compile tested patch. The patch has the mq
 block code work like the non mq code for bio cleanups.
 
 

 blk-mq: blk-mq should free bios in pass through case
 
 For non mq calls, the block layer will free the bios when
 blk_finish_request is called.
e 
 For mq calls, the blk mq code wants the caller to do this.
 
 This patch has the blk mq code work like the non mq code
 and has the block layer free the bios.
 
 Signed-off-by: Mike Christie micha...@cs.wisc.edu

This patch breaks booting for me in the current blk multiqueue tree,
with an apparent double free of a bio when using virtio-blk in writeback
mode (cache=writeback or cache=none in qemu):

[   15.253608] [ cut here ]
[   15.256422] kernel BUG at /work/hch/linux/fs/bio.c:498!
[   15.256879] invalid opcode:  [#1] SMP 
[   15.256879] Modules linked in:
[   15.256879] CPU: 3 PID: 353 Comm: kblockd Not tainted 3.11.0+ #25
[   15.256879] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   15.256879] task: 88007d75e0c0 ti: 88007d676000 task.ti: 
88007d676000
[   15.256879] RIP: 0010:[811b470a]  [811b470a] 
bio_put+0x8a/0x90
[   15.256879] RSP: 0018:88007fd83b50  EFLAGS: 00010046
[   15.256879] RAX:  RBX: 88007d713080 RCX: 0035
[   15.256879] RDX: 0002 RSI: 88007ad50338 RDI: 88007d713080
[   15.256879] RBP: 88007fd83b60 R08: 7010 R09: 007ad5033808
[   15.256879] R10: ff672b1b7d38ce02 R11: 028b R12: 
[   15.256879] R13:  R14: 88007b4c36c0 R15: 88007b40d608
[   15.256879] FS:  () GS:88007fd8() 
knlGS:
[   15.256879] CS:  0010 DS:  ES:  CR0: 8005003b
[   15.256879] CR2: 0138 CR3: 02124000 CR4: 06e0
[   15.256879] Stack:
[   15.256879]  88007d713080  88007fd83b80 
811ae8a3
[   15.256879]  88007fd83bf0 1000 88007fd83b90 
811b3268
[   15.256879]  88007fd83bc0 816ac847 88007b4c36c0 
88007fd99d00
[   15.256879] Call Trace:
[   15.256879]  IRQ 
[   15.256879]  [811ae8a3] end_bio_bh_io_sync+0x33/0x50
[   15.256879]  [811b3268] bio_endio+0x18/0x30
[   15.256879]  [816ac847] blk_mq_complete_request+0x47/0xd0
[   15.256879]  [816ac8e9] __blk_mq_end_io+0x19/0x20
[   15.256879]  [816ac958] blk_mq_end_io+0x68/0xd0
[   15.256879]  [816a6162] blk_flush_complete_seq+0xe2/0x370
[   15.256879]  [816a653b] flush_end_io+0x11b/0x200
[   15.256879]  [816ac875] blk_mq_complete_request+0x75/0xd0
[   15.256879]  [816ac8e9] __blk_mq_end_io+0x19/0x20
[   15.256879]  [816ac958] blk_mq_end_io+0x68/0xd0
[   15.256879]  [81844c2f] virtblk_done+0xef/0x260
[   15.256879]  [81753cc0] vring_interrupt+0x30/0x60
[   15.256879]  [81103724] handle_irq_event_percpu+0x54/0x1f0
[   15.256879]  [81103903] handle_irq_event+0x43/0x70
[   15.256879]  [8110609f] handle_edge_irq+0x6f/0x120
[   15.256879]  [810445b8] handle_irq+0x58/0x140
[   15.256879]  [81094bbf] ? irq_enter+0x4f/0x90
[   15.256879]  [810440b5] do_IRQ+0x55/0xd0
[   15.256879]  [81bd3972] common_interrupt+0x72/0x72
[   15.256879]  [810c5135] ? sched_clock_local+0x25/0xa0
[   15.256879]  [81094960] ? __do_softirq+0xb0/0x250
[   15.256879]  [81094959] ? __do_softirq+0xa9/0x250
[   15.256879]  [81094cae] irq_exit+0xae/0xd0
[   15.256879]  [8106dcd5] smp_apic_timer_interrupt+0x45/0x60
[   15.256879]  [81bdc772] apic_timer_interrupt+0x72/0x80
[   15.256879]  EOI 
[   15.256879]  [81bd3a33] ? retint_restore_args+0x13/0x13
[   15.256879]  [81bd3502] ? _raw_spin_unlock_irq+0x32/0x40
[   15.256879]  [81bd34fb] ? _raw_spin_unlock_irq+0x2b/0x40
[   15.256879]  [810ac0c4] rescuer_thread+0xe4/0x2f0
[   15.256879]  [810abfe0] ? process_scheduled_works+0x40/0x40
[   15.256879]  [810b3916] kthread+0xd6/0xe0
[   15.256879]  [81bd34fb] ? _raw_spin_unlock_irq+0x2b/0x40
[   15.256879]  [810b3840] ? __init_kthread_worker+0x70/0x70
[   15.256879]  [81bdbabc] ret_from_fork+0x7c/0xb0
[   15.256879]  [810b3840] ? __init_kthread_worker+0x70/0x70
[   15.256879] Code: ff 41 8b 44 24 08 48 89 df 49 8b 74 24 10 48 29 c7 e8 cb 
88 f8 ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 90 48 89 df e8 b8 5c fc ff eb 9b 0f 0b 
0f 1f 40 00 55 48 89 e5 41 57 45 31 ff 41 56 41 55 41 54 
[   15.256879] RIP  [811b470a] bio_put+0x8a/0x90
[   15.256879]  RSP 88007fd83b50
[   15.256879] ---[ end trace 1f201608bfddfca7 ]---
[   15.256879] Kernel panic - not syncing: Fatal exception in interrupt
[   15.256879] Shutting down cpus with NMI

--
To unsubscribe from this 

Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-09-20 Thread Alexander Gordeev
On Thu, Aug 15, 2013 at 07:19:29PM -0700, Nicholas A. Bellinger wrote:
 Ok, here's a bit better idea of what is going on now..
 
 The problem is that blkdev_issue_flush() - blk_mq_make_request() -
 __blk_mq_alloc_request() allocates the first tag, which calls
 blk_insert_flush() - blk_flush_complete_seq() - blk_flush_kick() -
 mq_flush_work() - blk_mq_alloc_request() to allocate a second tag for
 the struct request that actually gets dispatched into scsi-mq as a
 SYCHRONIZE_CACHE command..
 
 I'm not exactly sure why this double tag usage of struct request is
 occurring, but AFAICT it does happen for every flush, and is not
 specific to the blkdev_issue_flush() codepath..  I'm sure that Jens can
 fill us in on that bit.  ;)

I also played with the double tag using a reserved tag (below).

While it fixes 'fdisk /dev/sda' issue when trying to 'mount /dev/sda1 /mnt'
what appears to be a call to bio-bi_end_io() from the free'd bio hits in.

Not sure if I should pursue the root cause until the whole double-tag
thingy is confirmed.

Jens?


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6fc1df3..81794dc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -874,14 +874,14 @@ static void blk_mq_make_request(struct request_queue *q, 
struct bio *bio)
hctx = q-mq_ops-map_queue(q, ctx-cpu);
 
trace_block_getrq(q, bio, rw);
-   rq = __blk_mq_alloc_request(hctx, GFP_ATOMIC, false);
+   rq = __blk_mq_alloc_request(hctx, GFP_ATOMIC, is_flush_fua);
if (likely(rq))
blk_mq_rq_ctx_init(ctx, rq, rw);
else {
blk_mq_put_ctx(ctx);
trace_block_sleeprq(q, bio, rw);
rq = blk_mq_alloc_request_pinned(q, rw, __GFP_WAIT|GFP_ATOMIC,
-   false);
+   is_flush_fua);
ctx = rq-mq_ctx;
hctx = q-mq_ops-map_queue(q, ctx-cpu);
}
@@ -1317,6 +1317,9 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_reg 
*reg,
reg-queue_depth = BLK_MQ_MAX_DEPTH;
}
 
+   reg-queue_depth++;
+   reg-reserved_tags++;
+
ctx = alloc_percpu(struct blk_mq_ctx);
if (!ctx)
return ERR_PTR(-ENOMEM);

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-09-20 Thread Nicholas A. Bellinger
Hi Alexander!

Apologies for the long delay on this follow-up..  Comments below.

On Fri, 2013-09-20 at 17:19 +0200, Alexander Gordeev wrote:
 On Thu, Aug 15, 2013 at 07:19:29PM -0700, Nicholas A. Bellinger wrote:
  Ok, here's a bit better idea of what is going on now..
  
  The problem is that blkdev_issue_flush() - blk_mq_make_request() -
  __blk_mq_alloc_request() allocates the first tag, which calls
  blk_insert_flush() - blk_flush_complete_seq() - blk_flush_kick() -
  mq_flush_work() - blk_mq_alloc_request() to allocate a second tag for
  the struct request that actually gets dispatched into scsi-mq as a
  SYCHRONIZE_CACHE command..
  
  I'm not exactly sure why this double tag usage of struct request is
  occurring, but AFAICT it does happen for every flush, and is not
  specific to the blkdev_issue_flush() codepath..  I'm sure that Jens can
  fill us in on that bit.  ;)
 
 I also played with the double tag using a reserved tag (below).
 
 While it fixes 'fdisk /dev/sda' issue when trying to 'mount /dev/sda1 /mnt'
 what appears to be a call to bio-bi_end_io() from the free'd bio hits in.
 
 Not sure if I should pursue the root cause until the whole double-tag
 thingy is confirmed.
 
 Jens?
 
 
 diff --git a/block/blk-mq.c b/block/blk-mq.c
 index 6fc1df3..81794dc 100644
 --- a/block/blk-mq.c
 +++ b/block/blk-mq.c
 @@ -874,14 +874,14 @@ static void blk_mq_make_request(struct request_queue 
 *q, struct bio *bio)
   hctx = q-mq_ops-map_queue(q, ctx-cpu);
  
   trace_block_getrq(q, bio, rw);
 - rq = __blk_mq_alloc_request(hctx, GFP_ATOMIC, false);
 + rq = __blk_mq_alloc_request(hctx, GFP_ATOMIC, is_flush_fua);
   if (likely(rq))
   blk_mq_rq_ctx_init(ctx, rq, rw);
   else {
   blk_mq_put_ctx(ctx);
   trace_block_sleeprq(q, bio, rw);
   rq = blk_mq_alloc_request_pinned(q, rw, __GFP_WAIT|GFP_ATOMIC,
 - false);
 + is_flush_fua);
   ctx = rq-mq_ctx;
   hctx = q-mq_ops-map_queue(q, ctx-cpu);
   }

So this is what I ended up doing as well, and does address the specific
bug with queue_depth=1.

 @@ -1317,6 +1317,9 @@ struct request_queue *blk_mq_init_queue(struct 
 blk_mq_reg *reg,
   reg-queue_depth = BLK_MQ_MAX_DEPTH;
   }
  
 + reg-queue_depth++;
 + reg-reserved_tags++;
 +
   ctx = alloc_percpu(struct blk_mq_ctx);
   if (!ctx)
   return ERR_PTR(-ENOMEM);
 

I was actually setting this within scsi_mq_alloc_queue(), but given that
the queue_depth=1 issue is independent of scsi-mq, this does make more
sense.

Also, these extra increments should probably happen only when the passed
queue_depth == 1  reserved_tags == 0.

Other than that minor nit.

Reviewed-by: Nicholas Bellinger n...@linux-iscsi.org

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-28 Thread Alexander Gordeev
On Thu, Aug 15, 2013 at 07:19:29PM -0700, Nicholas A. Bellinger wrote:
 Anyways, before digging further into reserved tags logic, Jens, what are
 your thoughts for addressing this special queue_depth=1 case for libata
 + the like..?

Hi Jens,

Have some comments?

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-16 Thread Nicholas A. Bellinger
On Fri, 2013-08-16 at 18:41 +0200, Alexander Gordeev wrote:
 On Thu, Aug 15, 2013 at 07:19:29PM -0700, Nicholas A. Bellinger wrote:
  I'm playing with a patch to do this, but am currently getting hung-up on
  what appear to be some separate blk-mq reserved_tags  0 bugs, the first
  of which is passing queue_depth=1 + reserved_tags=1 is broken, and
  results in tags-nr_free = 0.
 
 That is not a bug - please look at Jens replies in this thread some week ago.
 In short, queue_depth=1 means 1 tags in total and reserved_tags=1 results
 in zero normal tags. You need to request the depth=2 and reserved_tags=1.
 

Ahhh, yes of course.  I'll re-work a proposed patch this afternoon with
this in mind..

 But yes, this is a separate topic and I am looking forward to hear from Jens
 wrt flushes.
 

Indeed.  ;)

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-15 Thread Alexander Gordeev
On Fri, Aug 09, 2013 at 01:17:37PM -0700, Nicholas A. Bellinger wrote:
 On Fri, 2013-08-09 at 21:15 +0200, Alexander Gordeev wrote:
 Mmmm, I'm able to reproduce over here with ahci + scsi-mq, and it
 appears to be a bug related with using sdev-sdev_md_req.queue_depth=1,
 that ends up causing the blkdev_issue_flush() to wait forever because
 blk_mq_wait_for_tags() never ends up getting the single tag back for the
 WRITE_FLUSH bio - SYNCHRONIZE_CACHE cdb.

It turns out this way - blkdev_issue_flush() claims the only tag, submits
the bio and waits for the completion. But because blk_mq_make_request()
does not mark any context in blk_mq_hw_ctx::ctx_map (nor enslists the request
into blk_mq_ctx::rq_list) it never gets processed from blk_mq_work_fn-
__blk_mq_run_hw_queue() and blkdev_issue_flush() waits endlessly. All other
requests are just waiting for the tag availability as result.

[...]

 Bumping queue_depth=2 seems to work-around the issue, but AFAICT it's a
 genuine tag starvation bug with queue_depth=1 and WRITE_FLUSH..

If I try to hack and force __blk_mq_run_hw_queue() to process the request...

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6fc1df3..c22b6f66 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -889,9 +962,12 @@ static void blk_mq_make_request(struct request_queue *q, 
struct bio *bio)
hctx-queued++;
 
if (unlikely(is_flush_fua)) {
+   list_add(rq-queuelist, hctx-dispatch);
blk_mq_bio_to_request(q, rq, bio);
blk_mq_put_ctx(ctx);
blk_insert_flush(rq);
goto run_queue;
}

... I get a kernel BUG at drivers/scsi/scsi_lib.c:1233

BUG_ON(!req-nr_phys_segments);

IOW I am not sure how to proceed.

 --nab
 

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-15 Thread Nicholas A. Bellinger
On Thu, 2013-08-15 at 18:23 +0200, Alexander Gordeev wrote:
 On Fri, Aug 09, 2013 at 01:17:37PM -0700, Nicholas A. Bellinger wrote:
  On Fri, 2013-08-09 at 21:15 +0200, Alexander Gordeev wrote:
  Mmmm, I'm able to reproduce over here with ahci + scsi-mq, and it
  appears to be a bug related with using sdev-sdev_md_req.queue_depth=1,
  that ends up causing the blkdev_issue_flush() to wait forever because
  blk_mq_wait_for_tags() never ends up getting the single tag back for the
  WRITE_FLUSH bio - SYNCHRONIZE_CACHE cdb.
 
 It turns out this way - blkdev_issue_flush() claims the only tag, submits
 the bio and waits for the completion. But because blk_mq_make_request()
 does not mark any context in blk_mq_hw_ctx::ctx_map (nor enslists the request
 into blk_mq_ctx::rq_list) it never gets processed from blk_mq_work_fn-
 __blk_mq_run_hw_queue() and blkdev_issue_flush() waits endlessly. All other
 requests are just waiting for the tag availability as result.
 

Ok, here's a bit better idea of what is going on now..

The problem is that blkdev_issue_flush() - blk_mq_make_request() -
__blk_mq_alloc_request() allocates the first tag, which calls
blk_insert_flush() - blk_flush_complete_seq() - blk_flush_kick() -
mq_flush_work() - blk_mq_alloc_request() to allocate a second tag for
the struct request that actually gets dispatched into scsi-mq as a
SYCHRONIZE_CACHE command..

I'm not exactly sure why this double tag usage of struct request is
occurring, but AFAICT it does happen for every flush, and is not
specific to the blkdev_issue_flush() codepath..  I'm sure that Jens can
fill us in on that bit.  ;)

So, assuming that this double tag usage is necessary and not a bug,
perhaps using a reserved tag for the first tag (eg: the one that's not
dispatched into scsi_mq_queue_rq) makes sense..?

I'm playing with a patch to do this, but am currently getting hung-up on
what appear to be some separate blk-mq reserved_tags  0 bugs, the first
of which is passing queue_depth=1 + reserved_tags=1 is broken, and
results in tags-nr_free = 0.

Here's the quick fix:

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 6718007..ffdf686 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -470,7 +470,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
 * Rest of the tags start at the queue list
 */
tags-nr_free = 0;
-   while (nr_tags - tags-nr_reserved) {
+   while (nr_tags) {
tags-freelist[tags-nr_free] = tags-nr_free +
tags-nr_reserved;
nr_tags--;

Anyways, before digging further into reserved tags logic, Jens, what are
your thoughts for addressing this special queue_depth=1 case for libata
+ the like..?

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-12 Thread Alexander Gordeev
On Fri, Aug 09, 2013 at 11:07:37AM -0600, Jens Axboe wrote:
 You don't have to resubmit, I'll get it reviewed and applied today.

Hi Jens,

I limited the minimal queue depth to 4, which is apparently wrong
in case of libata. I will post a new series.

 -- 
 Jens Axboe
 

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-09 Thread Alexander Gordeev
On Mon, Jul 29, 2013 at 07:46:53AM -0400, Tejun Heo wrote:
 One thing which would probably be worthwhile tho is getting rid of the
 bitmap based qc tag allocator in libata.  That one is just borderline
 stupid to keep around on any setup which is supposed to be scalable.

Hi Tejun,

How about this approach?

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index f218427..5c2a236 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -72,6 +72,8 @@
 #include libata.h
 #include libata-transport.h
 
+#include ../../block/blk-mq-tag.h
+
 /* debounce timing parameters in msecs { interval, duration, timeout } */
 const unsigned long sata_deb_timing_normal[]   = {   5,  100, 2000 };
 const unsigned long sata_deb_timing_hotplug[]  = {  25,  500, 2000 };
@@ -1569,18 +1571,8 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
 
/* initialize internal qc */
 
-   /* XXX: Tag 0 is used for drivers with legacy EH as some
-* drivers choke if any other tag is given.  This breaks
-* ata_tag_internal() test for those drivers.  Don't use new
-* EH stuff without converting to it.
-*/
-   if (ap-ops-error_handler)
-   tag = ATA_TAG_INTERNAL;
-   else
-   tag = 0;
-
-   if (test_and_set_bit(tag, ap-qc_allocated))
-   BUG();
+   tag = blk_mq_get_tag(ap-qc_tags, GFP_ATOMIC, true);
+   BUG_ON(!ata_tag_internal(tag));
qc = __ata_qc_from_tag(ap, tag);
 
qc-tag = tag;
@@ -4733,21 +4725,17 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
 static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
 {
struct ata_queued_cmd *qc = NULL;
-   unsigned int i;
+   unsigned int tag;
 
/* no command while frozen */
if (unlikely(ap-pflags  ATA_PFLAG_FROZEN))
return NULL;
 
-   /* the last tag is reserved for internal command. */
-   for (i = 0; i  ATA_MAX_QUEUE - 1; i++)
-   if (!test_and_set_bit(i, ap-qc_allocated)) {
-   qc = __ata_qc_from_tag(ap, i);
-   break;
-   }
+   tag = blk_mq_get_tag(ap-qc_tags, GFP_ATOMIC, false);
+   qc = __ata_qc_from_tag(ap, tag);
 
if (qc)
-   qc-tag = i;
+   qc-tag = tag;
 
return qc;
 }
@@ -4799,7 +4787,7 @@ void ata_qc_free(struct ata_queued_cmd *qc)
tag = qc-tag;
if (likely(ata_tag_valid(tag))) {
qc-tag = ATA_TAG_POISON;
-   clear_bit(tag, ap-qc_allocated);
+   blk_mq_put_tag(ap-qc_tags, tag);
}
 }
 
@@ -5639,6 +5627,12 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
if (!ap)
return NULL;
 
+   ap-qc_tags = blk_mq_init_tags(ATA_MAX_QUEUE, 1, NUMA_NO_NODE);
+   if (!ap-qc_tags) {
+   kfree(ap);
+   return NULL;
+   }
+
ap-pflags |= ATA_PFLAG_INITIALIZING | ATA_PFLAG_FROZEN;
ap-lock = host-lock;
ap-print_id = -1;
@@ -5677,6 +5671,14 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
return ap;
 }
 
+static void ata_port_free(struct ata_port *ap)
+{
+   kfree(ap-pmp_link);
+   kfree(ap-slave_link);
+   blk_mq_free_tags(ap-qc_tags);
+   kfree(ap);
+}
+
 static void ata_host_release(struct device *gendev, void *res)
 {
struct ata_host *host = dev_get_drvdata(gendev);
@@ -5691,9 +5693,7 @@ static void ata_host_release(struct device *gendev, void 
*res)
if (ap-scsi_host)
scsi_host_put(ap-scsi_host);
 
-   kfree(ap-pmp_link);
-   kfree(ap-slave_link);
-   kfree(ap);
+   ata_port_free(ap);
host-ports[i] = NULL;
}
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index eae7a05..4ff9494 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -126,7 +126,7 @@ enum {
ATA_DEF_QUEUE   = 1,
/* tag ATA_MAX_QUEUE - 1 is reserved for internal commands */
ATA_MAX_QUEUE   = 32,
-   ATA_TAG_INTERNAL= ATA_MAX_QUEUE - 1,
+   ATA_TAG_INTERNAL= 0,
ATA_SHORT_PAUSE = 16,
 
ATAPI_MAX_DRAIN = 16  10,
@@ -766,7 +766,7 @@ struct ata_port {
unsigned intcbl;/* cable type; ATA_CBL_xxx */
 
struct ata_queued_cmd   qcmd[ATA_MAX_QUEUE];
-   unsigned long   qc_allocated;
+   struct blk_mq_tags  *qc_tags;
unsigned intqc_active;
int nr_active_links; /* #links with active qcs */
 

 Thanks.
 
 -- 
 tejun

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-09 Thread Tejun Heo
On Fri, Aug 09, 2013 at 10:23:35AM +0200, Alexander Gordeev wrote:
 On Mon, Jul 29, 2013 at 07:46:53AM -0400, Tejun Heo wrote:
  One thing which would probably be worthwhile tho is getting rid of the
  bitmap based qc tag allocator in libata.  That one is just borderline
  stupid to keep around on any setup which is supposed to be scalable.
 
 Hi Tejun,
 
 How about this approach?

Haven't looked at it thoroughly and I still don't know anything about
blk-mq but it looks good on the first glance.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-09 Thread Jens Axboe
On 08/09/2013 02:23 AM, Alexander Gordeev wrote:
 On Mon, Jul 29, 2013 at 07:46:53AM -0400, Tejun Heo wrote:
 One thing which would probably be worthwhile tho is getting rid of the
 bitmap based qc tag allocator in libata.  That one is just borderline
 stupid to keep around on any setup which is supposed to be scalable.
 
 Hi Tejun,
 
 How about this approach?
 
 @@ -5639,6 +5627,12 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
   if (!ap)
   return NULL;
  
 + ap-qc_tags = blk_mq_init_tags(ATA_MAX_QUEUE, 1, NUMA_NO_NODE);
 + if (!ap-qc_tags) {
 + kfree(ap);
 + return NULL;
 + }

This should be blk_mq_init_tags(ATA_MAX_QUEUE - 1, 1, ...) since the
total depth is normal_tags + reserved_tags. Apart from that, I think it
looks alright based on a cursory look.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-09 Thread Alexander Gordeev
On Fri, Aug 09, 2013 at 08:24:38AM -0600, Jens Axboe wrote:
 On 08/09/2013 02:23 AM, Alexander Gordeev wrote:
  +   ap-qc_tags = blk_mq_init_tags(ATA_MAX_QUEUE, 1, NUMA_NO_NODE);
  +   if (!ap-qc_tags) {
  +   kfree(ap);
  +   return NULL;
  +   }
 
 This should be blk_mq_init_tags(ATA_MAX_QUEUE - 1, 1, ...) since the
 total depth is normal_tags + reserved_tags.

Aha.. If blk_mq_init_tags() should be like this then?

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index dcbc2a4..b131a48 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -468,10 +468,9 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
 * Rest of the tags start at the queue list
 */
tags-nr_free = 0;
-   while (nr_tags - tags-nr_reserved) {
+   while (nr_tags--) {
tags-freelist[tags-nr_free] = tags-nr_free +
tags-nr_reserved;
-   nr_tags--;
tags-nr_free++;
}

 -- 
 Jens Axboe

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-09 Thread Jens Axboe
On 08/09/2013 09:07 AM, Alexander Gordeev wrote:
 On Fri, Aug 09, 2013 at 08:24:38AM -0600, Jens Axboe wrote:
 On 08/09/2013 02:23 AM, Alexander Gordeev wrote:
 +   ap-qc_tags = blk_mq_init_tags(ATA_MAX_QUEUE, 1, NUMA_NO_NODE);
 +   if (!ap-qc_tags) {
 +   kfree(ap);
 +   return NULL;
 +   }

 This should be blk_mq_init_tags(ATA_MAX_QUEUE - 1, 1, ...) since the
 total depth is normal_tags + reserved_tags.
 
 Aha.. If blk_mq_init_tags() should be like this then?
 
 diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
 index dcbc2a4..b131a48 100644
 --- a/block/blk-mq-tag.c
 +++ b/block/blk-mq-tag.c
 @@ -468,10 +468,9 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int 
 nr_tags,
* Rest of the tags start at the queue list
*/
   tags-nr_free = 0;
 - while (nr_tags - tags-nr_reserved) {
 + while (nr_tags--) {
   tags-freelist[tags-nr_free] = tags-nr_free +
   tags-nr_reserved;
 - nr_tags--;
   tags-nr_free++;
   }

I misremembered, just checked the code. I think I used to have it like I
described, but changed it since I thought it would be more logical to
pass in full depth, and then what part of that is reserved. Looking at
the current code, your patch looks correct as-is.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-09 Thread Alexander Gordeev
On Fri, Aug 09, 2013 at 09:52:19AM -0600, Jens Axboe wrote:
 On 08/09/2013 09:07 AM, Alexander Gordeev wrote:
  diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
  index dcbc2a4..b131a48 100644
  --- a/block/blk-mq-tag.c
  +++ b/block/blk-mq-tag.c
  @@ -468,10 +468,9 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int 
  nr_tags,
   * Rest of the tags start at the queue list
   */
  tags-nr_free = 0;
  -   while (nr_tags - tags-nr_reserved) {
  +   while (nr_tags--) {
  tags-freelist[tags-nr_free] = tags-nr_free +
  tags-nr_reserved;
  -   nr_tags--;
  tags-nr_free++;
  }
 
 I misremembered, just checked the code. I think I used to have it like I
 described, but changed it since I thought it would be more logical to
 pass in full depth, and then what part of that is reserved. Looking at
 the current code, your patch looks correct as-is.

Ok, then a whole series [PATCH 0/3] blk-mq: Avoid effects of a weird queue
depth (I posted earlier in a separate thread) should make sense. Besides
the hunk above it limits the per-cpu cache size and sanity-checks total vs
reserved length. I can resubmit if you want.

 
 -- 
 Jens Axboe
 

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-09 Thread Jens Axboe
On 08/09/2013 10:46 AM, Alexander Gordeev wrote:
 On Fri, Aug 09, 2013 at 09:52:19AM -0600, Jens Axboe wrote:
 On 08/09/2013 09:07 AM, Alexander Gordeev wrote:
 diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
 index dcbc2a4..b131a48 100644
 --- a/block/blk-mq-tag.c
 +++ b/block/blk-mq-tag.c
 @@ -468,10 +468,9 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int 
 nr_tags,
  * Rest of the tags start at the queue list
  */
 tags-nr_free = 0;
 -   while (nr_tags - tags-nr_reserved) {
 +   while (nr_tags--) {
 tags-freelist[tags-nr_free] = tags-nr_free +
 tags-nr_reserved;
 -   nr_tags--;
 tags-nr_free++;
 }

 I misremembered, just checked the code. I think I used to have it like I
 described, but changed it since I thought it would be more logical to
 pass in full depth, and then what part of that is reserved. Looking at
 the current code, your patch looks correct as-is.
 
 Ok, then a whole series [PATCH 0/3] blk-mq: Avoid effects of a weird queue
 depth (I posted earlier in a separate thread) should make sense. Besides
 the hunk above it limits the per-cpu cache size and sanity-checks total vs
 reserved length. I can resubmit if you want.

You don't have to resubmit, I'll get it reviewed and applied today.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-08-09 Thread Alexander Gordeev
On Sat, Jul 20, 2013 at 09:48:28AM -0500, Mike Christie wrote:
 What about the attached only compile tested patch. The patch has the mq
 block code work like the non mq code for bio cleanups.

Not sure if it is related to the patch or not, but it never returns from
wait_for_completion_io(wait) in blkdev_issue_flush():

# ps axl | awk '$10 ~ /D\+/'
4 0   938   879  20   0 111216   656 blkdev D+   pts/1  0:00 
fdisk/dev/sda
#
# cat /proc/938/stack
[812a8a5c] blkdev_issue_flush+0xfc/0x160
[811ac606] blkdev_fsync+0x96/0xc0
[811a2f4d] do_fsync+0x5d/0x90
[811a3330] SyS_fsync+0x10/0x20
[81611582] system_call_fastpath+0x16/0x1b
[] 0x

Any ideas?

Thanks!

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-31 Thread Tejun Heo
Hello,

On Tue, Jul 30, 2013 at 09:16:02PM -0700, Marc C wrote:
  One thing which would probably be worthwhile tho is getting rid of the
  bitmap based qc tag allocator in libata.  That one is just borderline
  stupid to keep around on any setup which is supposed to be scalable.
  Your border might be wider than mine :-). Yes, the bitmap should
  definitely go.

 A naive implementation is obviously less-than-efficient. However, what
 other problems exist with the libata QC tag allocator? I highly doubt
 SATA will change to beyond 32 queue tags, primarily because it would
 be a pain to change SDB FIS (it's likely to break the dozens of AHCI
 controller implementations out there). Further, it seems like the
 industry stopped caring about SATA and is pushing NVMe for future
 offerings.
 
 In any event, most modern systems should have instructions to count
 leading zeroes and modify bits atomically.

It's inefficient not because scanning is expensive but because it
makes all CPUs in the system to hit on the exact same cacheline over
and over and over again.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-31 Thread Alexander Gordeev
On Thu, Jul 25, 2013 at 12:16:41PM +0200, Alexander Gordeev wrote:
 On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
  Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
  target-pending/scsi-mq, which now has functioning scsi-generic support.
 
 Survives a boot, a kernel build and the build's result :)

Not that rosy. Turned out the old code is called. Hangs with this hunk..

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index ac05cd6..a75fd41 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -1103,6 +1103,8 @@ static struct device_attribute *piix_sidpr_shost_attrs[] 
= {
 static struct scsi_host_template piix_sidpr_sht = {
ATA_BMDMA_SHT(DRV_NAME),
.shost_attrs= piix_sidpr_shost_attrs,
+   .scsi_mq= true,
+   .queuecommand_mq= ata_scsi_queuecmd,
 };
 
 static struct ata_port_operations piix_sidpr_sata_ops = {


-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-30 Thread Marc C
 One thing which would probably be worthwhile tho is getting rid of the
 bitmap based qc tag allocator in libata.  That one is just borderline
 stupid to keep around on any setup which is supposed to be scalable.
 Your border might be wider than mine :-). Yes, the bitmap should
 definitely go.
A naive implementation is obviously less-than-efficient. However, what
other problems exist with the libata QC tag allocator? I highly doubt
SATA will change to beyond 32 queue tags, primarily because it would
be a pain to change SDB FIS (it's likely to break the dozens of AHCI
controller implementations out there). Further, it seems like the
industry stopped caring about SATA and is pushing NVMe for future
offerings.

In any event, most modern systems should have instructions to count
leading zeroes and modify bits atomically.

-MC

On Mon, Jul 29, 2013 at 12:19 PM, Nicholas A. Bellinger
n...@linux-iscsi.org wrote:
 On Mon, 2013-07-29 at 13:18 +0200, Alexander Gordeev wrote:
 On Fri, Jul 26, 2013 at 05:43:13PM -0700, Nicholas A. Bellinger wrote:
  On Fri, 2013-07-26 at 14:14 -0700, Nicholas A. Bellinger wrote:

 SNIP

 I also tried to make a quick conversion and hit the same issue(s) as you.
 Generally, I am concerned with these assumptions in such approach:

 1. While libata concept of tags matches nicely with blk-mq (blk_mq_hw_ctx::
 rqs[] vs ata_port::qcmd[]) right now, it is too exposed to changes in blk-mq
 in the long run. I.e. ata_link::sactive limits tags to indices, while tags
 might become hashes. Easily fixable, but still.

 2. Unallocated requests in blk-mq are accessed/analized from libata-eh.c as
 result of such iterations:

 for (tag = 0; tag  ATA_MAX_QUEUE; tag++) {
 qc = __ata_qc_from_tag(ap, tag);

 if (!(qc-flags  ATA_QCFLAG_FAILED))
 continue;

   ...
   }

 While it is probably okay right now, it is still based on a premise that
 blk-mq will not change the contents/concept of payload, i.e. from embedded
 to (re-)allocated memory.

  The thing that I'm hung up on now for existing __ata_qc_from_tag() usage
  outside of the main blk_mq_ops-queue_rq - SHT-queuecommand_mq()
  dispatch path, is how to actually locate the underlying scsi_device -
  request_queue - blk_mq_ctx - blk_mq_hw_hctx from the passed
  ata_port..?

 I am actually in favor of getting rid of ata_queued_cmd::tag. Converting
 ata_link::sactive to a list, making ata_link::active_tag as struct
 ata_queued_cmd *ata_link::active_qc and converting ata_port::qc_allocated to 
 a
 list seems solves it all, including [2]. Have not checked it though.

 Anyway, if we need a blk-mq tag (why?), we have qc-scsicmd-request-tag.


 Hi Alexander,

 So given the feedback from Tejun, I'm going to setup back for the moment
 from a larger conversion, and keep the SHT-cmd_size=0 setting for
 libata in the scsi-mq WIP branch.

 I'm happy to accept patches to drop the bitmap piece that Tejun
 mentioned if your interested, but at least on my end right now there are
 bigger fish to fry for scsi-mq.  ;)

 Thanks,

 --nab

 --
 To unsubscribe from this list: send the line unsubscribe linux-ide in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-29 Thread Hannes Reinecke
On 07/26/2013 04:09 AM, Jens Axboe wrote:
 On Thu, Jul 25 2013, Nicholas A. Bellinger wrote:
 On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
 On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
 Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
 target-pending/scsi-mq, which now has functioning scsi-generic support.

 Survives a boot, a kernel build and the build's result :)

 Great.  Thanks for the feedback Alexander!

 So the next step on my end is to enable -mq for ahci, and verify initial
 correctness using QEMU/KVM hardware emulation.

 Btw, I've been looking at enabling the SHT-cmd_size for struct
 ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
 are already all pre-allocated by libata and obtained via ata_qc_new() -
 __ata_qc_from_tag() during ata_scsi_queuecmd().
 
 Might still not be a bad idea to do it:
 
 - Cleans up a driver, getting rid of the need to alloc, maintain, and
   free those structures.
 
 - Should be some cache locality benefits to having it all sequential.
 
 So that said, with the struct request + struct scsi_cmnd pre-allocations
 already provided by blk-mq - scsi-mq code, all memory allocations
 should have already been eliminated from I/O fast path.
 
 Nice!
 
Hmm.

I'm trying to work out if it would be possible to move multipath
handling over to scsi-mq.
However, when doing so I would need to reconfigure 'nr_hw_queues' on
the fly. Now with all the static cmd preallocation going on this is
going to be tricky.

This leaves me with two choices:
- Tear down the command pool altogether whenever I need to
  reconfigure the device (which is going to be painful)
- Allocate some max nr_hw_queues, and mark the superfluous
  ones as 'unused' or something. Seeing the a sane max nr_hw_queues
  will be possibly the number of cpus this might end up hogging
  quite some memory.

Would you accept patches moving the static command allocation over
to pools or is this a desired feature?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-29 Thread Alexander Gordeev
On Fri, Jul 26, 2013 at 05:43:13PM -0700, Nicholas A. Bellinger wrote:
 On Fri, 2013-07-26 at 14:14 -0700, Nicholas A. Bellinger wrote:
  On Thu, 2013-07-25 at 20:09 -0600, Jens Axboe wrote:
   On Thu, Jul 25 2013, Nicholas A. Bellinger wrote:
On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
 On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
  Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
  target-pending/scsi-mq, which now has functioning scsi-generic 
  support.
 
 Survives a boot, a kernel build and the build's result :)

Great.  Thanks for the feedback Alexander!

So the next step on my end is to enable -mq for ahci, and verify initial
correctness using QEMU/KVM hardware emulation.

Btw, I've been looking at enabling the SHT-cmd_size for struct
ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
are already all pre-allocated by libata and obtained via ata_qc_new() -
__ata_qc_from_tag() during ata_scsi_queuecmd().
   
   Might still not be a bad idea to do it:
   
   - Cleans up a driver, getting rid of the need to alloc, maintain, and
 free those structures.
   
   - Should be some cache locality benefits to having it all sequential.
   
  
  Looking at this some more, there are a number of locations outside of
  the main blk_mq_ops-queue_rq() - SHT-queuecommand_mq() dispatch that
  use *ata_qc_from_tag() to obtain *ata_queued_cmd, and a few without a
  associated struct scsi_cmnd like libata-core.c:ata_exec_internal_sg()
  for example..
  
  So I don't think (completely) getting rid of ata_port-qcmds[] will be
  possible, and just converting the ata_scsi_queuecmd() path to use the
  extra SHT-cmd_size pre-allocation for *ata_queued_cmd might end up
  being more trouble that it's worth.  Still undecided on that part..
  
  Tejun, do you have any thoughts + input here..?
  
 
 OK, so I decided to give this a shot anyways..  Here is a quick
 conversion for libata + AHCI to use blk-mq - scsi-mq pre-allocation for
 ata_queued_cmd descriptors:
 
 diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
 index 2b50dfd..61b3db8 100644
 --- a/drivers/ata/ahci.c
 +++ b/drivers/ata/ahci.c
 @@ -92,6 +92,9 @@ static int ahci_pci_device_resume(struct pci_dev *pdev);
  
  static struct scsi_host_template ahci_sht = {
 AHCI_SHT(ahci),
 +   .scsi_mq = true,
 +   .cmd_size = sizeof(struct ata_queued_cmd),
 +   .queuecommand_mq = ata_scsi_queuecmd,
  };
 diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
 index f218427..e21814d 100644
 --- a/drivers/ata/libata-core.c
 +++ b/drivers/ata/libata-core.c
 @@ -4725,29 +4725,25 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
  /**
   * ata_qc_new - Request an available ATA command, for queueing
   * @ap: target port
 + * @sc: incoming scsi_cmnd descriptor
   *
   * LOCKING:
   * None.
   */
  
 -static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
 +static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap,
 +struct scsi_cmnd *sc)
  {
 struct ata_queued_cmd *qc = NULL;
 -   unsigned int i;
 +   struct request *rq = sc-request;
  
 /* no command while frozen */
 if (unlikely(ap-pflags  ATA_PFLAG_FROZEN))
 return NULL;
  
 -   /* the last tag is reserved for internal command. */
 -   for (i = 0; i  ATA_MAX_QUEUE - 1; i++)

blk-mq does not prevent tag ATA_TAG_INTERNAL from being using. Would it make
sense to promote queue depth of length (ATA_MAX_QUEUE - 1) while always
pointing ATA_TAG_INTERNAL to qcmd (see below)?

 -   if (!test_and_set_bit(i, ap-qc_allocated)) {

ata_port::qc_allocated becomes redundant.

 -   qc = __ata_qc_from_tag(ap, i);
 -   break;
 -   }
 -
 -   if (qc)
 -   qc-tag = i;
 +   qc = (struct ata_queued_cmd *)sc-SCp.ptr;
 +   qc-scsicmd = sc;
 +   qc-tag = rq-tag;
  
 return qc;
  }
 @@ -4755,19 +4751,20 @@ static struct ata_queued_cmd *ata_qc_new(struct 
 ata_port *ap)
  /**
   * ata_qc_new_init - Request an available ATA command, and initialize it
   * @dev: Device from whom we request an available command structure
 + * @sc: incoming scsi_cmnd descriptor
   *
   * LOCKING:
   * None.
   */
  
 -struct ata_queued_cmd *ata_qc_new_init(struct ata_device *dev)
 +struct ata_queued_cmd *ata_qc_new_init(struct ata_device *dev,
 +  struct scsi_cmnd *sc)
  {
 struct ata_port *ap = dev-link-ap;
 struct ata_queued_cmd *qc;
  
 -   qc = ata_qc_new(ap);
 +   qc = ata_qc_new(ap, sc);
 if (qc) {
 -   qc-scsicmd = NULL;
 qc-ap = ap;
 qc-dev = dev;
  
 @@ -4797,10 +4794,9 @@ void ata_qc_free(struct ata_queued_cmd *qc)
  
 qc-flags 

Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-29 Thread Tejun Heo
Hello,

On Fri, Jul 26, 2013 at 02:14:36PM -0700, Nicholas A. Bellinger wrote:
 So I don't think (completely) getting rid of ata_port-qcmds[] will be
 possible, and just converting the ata_scsi_queuecmd() path to use the
 extra SHT-cmd_size pre-allocation for *ata_queued_cmd might end up
 being more trouble that it's worth.  Still undecided on that part..
 
 Tejun, do you have any thoughts + input here..?

libata exception handling which includes probing doesn't go through
SCSI at all.  It all works inside libata proper using ata_queuecmds
and only the result is exposed to SCSI.  Most of those SCSI semantics
need to be emulated anyway, so this makes things a lot easier than
going through SCSI for each command.  As it currently stands, it'd be
a lot of effort to try to embed ata_qc's into higher layer construct.
Given how it's used, I don't think it's a high priority task.

One thing which would probably be worthwhile tho is getting rid of the
bitmap based qc tag allocator in libata.  That one is just borderline
stupid to keep around on any setup which is supposed to be scalable.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-29 Thread Tejun Heo
Yo,

On Fri, Jul 26, 2013 at 05:43:13PM -0700, Nicholas A. Bellinger wrote:
 Considering there can be more than a single ata_device hanging off each
 ata_port, the '*sdev = ap-link.device[0].sdev' in __ata_qc_from_tag()
 is definitely bogus, but I'm not sure how else to correlate
 blk-mq/scsi-mq per device descriptors to existing code expecting
 ata_port-qcmd[] descriptors to be shared across multiple devices..
 
 Tejun..?

I have no idea.  Let's please just do simpler conversion and worry
about embedding qc's into scsi_cmnds later.  libata isn't a normal
SCSI driver and has a rather its own thick midlayer doing the
impedance matching inbetween  I really don't think there is too much
benefit to be reaped from embedding qc's into scsi_cmnds.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-29 Thread Jens Axboe
On 07/29/2013 05:46 AM, Tejun Heo wrote:
 Hello,
 
 On Fri, Jul 26, 2013 at 02:14:36PM -0700, Nicholas A. Bellinger wrote:
 So I don't think (completely) getting rid of ata_port-qcmds[] will be
 possible, and just converting the ata_scsi_queuecmd() path to use the
 extra SHT-cmd_size pre-allocation for *ata_queued_cmd might end up
 being more trouble that it's worth.  Still undecided on that part..

 Tejun, do you have any thoughts + input here..?
 
 libata exception handling which includes probing doesn't go through
 SCSI at all.  It all works inside libata proper using ata_queuecmds
 and only the result is exposed to SCSI.  Most of those SCSI semantics
 need to be emulated anyway, so this makes things a lot easier than
 going through SCSI for each command.  As it currently stands, it'd be
 a lot of effort to try to embed ata_qc's into higher layer construct.
 Given how it's used, I don't think it's a high priority task.
 
 One thing which would probably be worthwhile tho is getting rid of the
 bitmap based qc tag allocator in libata.  That one is just borderline
 stupid to keep around on any setup which is supposed to be scalable.

Your border might be wider than mine :-). Yes, the bitmap should
definitely go.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-29 Thread Jens Axboe
On 07/29/2013 05:18 AM, Alexander Gordeev wrote:
 -static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
 +static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap,
 +struct scsi_cmnd *sc)
  {
 struct ata_queued_cmd *qc = NULL;
 -   unsigned int i;
 +   struct request *rq = sc-request;
  
 /* no command while frozen */
 if (unlikely(ap-pflags  ATA_PFLAG_FROZEN))
 return NULL;
  
 -   /* the last tag is reserved for internal command. */
 -   for (i = 0; i  ATA_MAX_QUEUE - 1; i++)
 
 blk-mq does not prevent tag ATA_TAG_INTERNAL from being using. Would it make
 sense to promote queue depth of length (ATA_MAX_QUEUE - 1) while always
 pointing ATA_TAG_INTERNAL to qcmd (see below)?

blk-mq does support a number of reserved tags, information just needs to
be passed in appropriately. So there is support for reserving X number
of error handling / emergency tags.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-29 Thread Nicholas A. Bellinger
On Mon, 2013-07-29 at 07:50 -0400, Tejun Heo wrote:
 Yo,
 
 On Fri, Jul 26, 2013 at 05:43:13PM -0700, Nicholas A. Bellinger wrote:
  Considering there can be more than a single ata_device hanging off each
  ata_port, the '*sdev = ap-link.device[0].sdev' in __ata_qc_from_tag()
  is definitely bogus, but I'm not sure how else to correlate
  blk-mq/scsi-mq per device descriptors to existing code expecting
  ata_port-qcmd[] descriptors to be shared across multiple devices..
  
  Tejun..?
 
 I have no idea.  Let's please just do simpler conversion and worry
 about embedding qc's into scsi_cmnds later.  libata isn't a normal
 SCSI driver and has a rather its own thick midlayer doing the
 impedance matching inbetween  I really don't think there is too much
 benefit to be reaped from embedding qc's into scsi_cmnds.
 

That is essentially the same conclusion that I came to, but wanted to at
least give you a chance to comment here.  ;)

So that said, I'll include a simple conversion for libata into the
scsi-mq WIP branch, and folks who are interested in more detailed
conversions can pursue them as separate items.

--nab 

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-29 Thread Nicholas A. Bellinger
On Mon, 2013-07-29 at 13:18 +0200, Alexander Gordeev wrote:
 On Fri, Jul 26, 2013 at 05:43:13PM -0700, Nicholas A. Bellinger wrote:
  On Fri, 2013-07-26 at 14:14 -0700, Nicholas A. Bellinger wrote:

SNIP

 I also tried to make a quick conversion and hit the same issue(s) as you.
 Generally, I am concerned with these assumptions in such approach:
 
 1. While libata concept of tags matches nicely with blk-mq (blk_mq_hw_ctx::
 rqs[] vs ata_port::qcmd[]) right now, it is too exposed to changes in blk-mq
 in the long run. I.e. ata_link::sactive limits tags to indices, while tags
 might become hashes. Easily fixable, but still.
 
 2. Unallocated requests in blk-mq are accessed/analized from libata-eh.c as
 result of such iterations:
 
 for (tag = 0; tag  ATA_MAX_QUEUE; tag++) {
 qc = __ata_qc_from_tag(ap, tag);
 
 if (!(qc-flags  ATA_QCFLAG_FAILED))
 continue;
 
   ...
   }
 
 While it is probably okay right now, it is still based on a premise that
 blk-mq will not change the contents/concept of payload, i.e. from embedded
 to (re-)allocated memory.
 
  The thing that I'm hung up on now for existing __ata_qc_from_tag() usage
  outside of the main blk_mq_ops-queue_rq - SHT-queuecommand_mq()
  dispatch path, is how to actually locate the underlying scsi_device -
  request_queue - blk_mq_ctx - blk_mq_hw_hctx from the passed
  ata_port..?
 
 I am actually in favor of getting rid of ata_queued_cmd::tag. Converting
 ata_link::sactive to a list, making ata_link::active_tag as struct
 ata_queued_cmd *ata_link::active_qc and converting ata_port::qc_allocated to a
 list seems solves it all, including [2]. Have not checked it though.
 
 Anyway, if we need a blk-mq tag (why?), we have qc-scsicmd-request-tag.
 

Hi Alexander,

So given the feedback from Tejun, I'm going to setup back for the moment
from a larger conversion, and keep the SHT-cmd_size=0 setting for
libata in the scsi-mq WIP branch.

I'm happy to accept patches to drop the bitmap piece that Tejun
mentioned if your interested, but at least on my end right now there are
bigger fish to fry for scsi-mq.  ;)

Thanks,

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-26 Thread Nicholas A. Bellinger
On Thu, 2013-07-25 at 20:09 -0600, Jens Axboe wrote:
 On Thu, Jul 25 2013, Nicholas A. Bellinger wrote:
  On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
   On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
target-pending/scsi-mq, which now has functioning scsi-generic support.
   
   Survives a boot, a kernel build and the build's result :)
  
  Great.  Thanks for the feedback Alexander!
  
  So the next step on my end is to enable -mq for ahci, and verify initial
  correctness using QEMU/KVM hardware emulation.
  
  Btw, I've been looking at enabling the SHT-cmd_size for struct
  ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
  are already all pre-allocated by libata and obtained via ata_qc_new() -
  __ata_qc_from_tag() during ata_scsi_queuecmd().
 
 Might still not be a bad idea to do it:
 
 - Cleans up a driver, getting rid of the need to alloc, maintain, and
   free those structures.
 
 - Should be some cache locality benefits to having it all sequential.
 

Looking at this some more, there are a number of locations outside of
the main blk_mq_ops-queue_rq() - SHT-queuecommand_mq() dispatch that
use *ata_qc_from_tag() to obtain *ata_queued_cmd, and a few without a
associated struct scsi_cmnd like libata-core.c:ata_exec_internal_sg()
for example..

So I don't think (completely) getting rid of ata_port-qcmds[] will be
possible, and just converting the ata_scsi_queuecmd() path to use the
extra SHT-cmd_size pre-allocation for *ata_queued_cmd might end up
being more trouble that it's worth.  Still undecided on that part..

Tejun, do you have any thoughts + input here..?

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-26 Thread Nicholas A. Bellinger
On Fri, 2013-07-26 at 14:14 -0700, Nicholas A. Bellinger wrote:
 On Thu, 2013-07-25 at 20:09 -0600, Jens Axboe wrote:
  On Thu, Jul 25 2013, Nicholas A. Bellinger wrote:
   On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
 Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
 target-pending/scsi-mq, which now has functioning scsi-generic 
 support.

Survives a boot, a kernel build and the build's result :)
   
   Great.  Thanks for the feedback Alexander!
   
   So the next step on my end is to enable -mq for ahci, and verify initial
   correctness using QEMU/KVM hardware emulation.
   
   Btw, I've been looking at enabling the SHT-cmd_size for struct
   ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
   are already all pre-allocated by libata and obtained via ata_qc_new() -
   __ata_qc_from_tag() during ata_scsi_queuecmd().
  
  Might still not be a bad idea to do it:
  
  - Cleans up a driver, getting rid of the need to alloc, maintain, and
free those structures.
  
  - Should be some cache locality benefits to having it all sequential.
  
 
 Looking at this some more, there are a number of locations outside of
 the main blk_mq_ops-queue_rq() - SHT-queuecommand_mq() dispatch that
 use *ata_qc_from_tag() to obtain *ata_queued_cmd, and a few without a
 associated struct scsi_cmnd like libata-core.c:ata_exec_internal_sg()
 for example..
 
 So I don't think (completely) getting rid of ata_port-qcmds[] will be
 possible, and just converting the ata_scsi_queuecmd() path to use the
 extra SHT-cmd_size pre-allocation for *ata_queued_cmd might end up
 being more trouble that it's worth.  Still undecided on that part..
 
 Tejun, do you have any thoughts + input here..?
 

OK, so I decided to give this a shot anyways..  Here is a quick
conversion for libata + AHCI to use blk-mq - scsi-mq pre-allocation for
ata_queued_cmd descriptors:

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 2b50dfd..61b3db8 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -92,6 +92,9 @@ static int ahci_pci_device_resume(struct pci_dev *pdev);
 
 static struct scsi_host_template ahci_sht = {
AHCI_SHT(ahci),
+   .scsi_mq = true,
+   .cmd_size = sizeof(struct ata_queued_cmd),
+   .queuecommand_mq = ata_scsi_queuecmd,
 };
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index f218427..e21814d 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4725,29 +4725,25 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
 /**
  * ata_qc_new - Request an available ATA command, for queueing
  * @ap: target port
+ * @sc: incoming scsi_cmnd descriptor
  *
  * LOCKING:
  * None.
  */
 
-static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
+static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap,
+struct scsi_cmnd *sc)
 {
struct ata_queued_cmd *qc = NULL;
-   unsigned int i;
+   struct request *rq = sc-request;
 
/* no command while frozen */
if (unlikely(ap-pflags  ATA_PFLAG_FROZEN))
return NULL;
 
-   /* the last tag is reserved for internal command. */
-   for (i = 0; i  ATA_MAX_QUEUE - 1; i++)
-   if (!test_and_set_bit(i, ap-qc_allocated)) {
-   qc = __ata_qc_from_tag(ap, i);
-   break;
-   }
-
-   if (qc)
-   qc-tag = i;
+   qc = (struct ata_queued_cmd *)sc-SCp.ptr;
+   qc-scsicmd = sc;
+   qc-tag = rq-tag;
 
return qc;
 }
@@ -4755,19 +4751,20 @@ static struct ata_queued_cmd *ata_qc_new(struct 
ata_port *ap)
 /**
  * ata_qc_new_init - Request an available ATA command, and initialize it
  * @dev: Device from whom we request an available command structure
+ * @sc: incoming scsi_cmnd descriptor
  *
  * LOCKING:
  * None.
  */
 
-struct ata_queued_cmd *ata_qc_new_init(struct ata_device *dev)
+struct ata_queued_cmd *ata_qc_new_init(struct ata_device *dev,
+  struct scsi_cmnd *sc)
 {
struct ata_port *ap = dev-link-ap;
struct ata_queued_cmd *qc;
 
-   qc = ata_qc_new(ap);
+   qc = ata_qc_new(ap, sc);
if (qc) {
-   qc-scsicmd = NULL;
qc-ap = ap;
qc-dev = dev;
 
@@ -4797,10 +4794,9 @@ void ata_qc_free(struct ata_queued_cmd *qc)
 
qc-flags = 0;
tag = qc-tag;
-   if (likely(ata_tag_valid(tag))) {
+
+   if (likely(ata_tag_valid(tag)))
qc-tag = ATA_TAG_POISON;
-   clear_bit(tag, ap-qc_allocated);
-   }
 }
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0101af5..e5ab880 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -742,9 +742,8 @@ static struct ata_queued_cmd 

Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-25 Thread Alexander Gordeev
On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
 Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
 target-pending/scsi-mq, which now has functioning scsi-generic support.

Survives a boot, a kernel build and the build's result :)

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-25 Thread Nicholas A. Bellinger
On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
 On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
  Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
  target-pending/scsi-mq, which now has functioning scsi-generic support.
 
 Survives a boot, a kernel build and the build's result :)

Great.  Thanks for the feedback Alexander!

So the next step on my end is to enable -mq for ahci, and verify initial
correctness using QEMU/KVM hardware emulation.

Btw, I've been looking at enabling the SHT-cmd_size for struct
ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
are already all pre-allocated by libata and obtained via ata_qc_new() -
__ata_qc_from_tag() during ata_scsi_queuecmd().

So that said, with the struct request + struct scsi_cmnd pre-allocations
already provided by blk-mq - scsi-mq code, all memory allocations
should have already been eliminated from I/O fast path.

--nab








--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-25 Thread Jens Axboe
On Thu, Jul 25 2013, Nicholas A. Bellinger wrote:
 On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
  On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
   Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
   target-pending/scsi-mq, which now has functioning scsi-generic support.
  
  Survives a boot, a kernel build and the build's result :)
 
 Great.  Thanks for the feedback Alexander!
 
 So the next step on my end is to enable -mq for ahci, and verify initial
 correctness using QEMU/KVM hardware emulation.
 
 Btw, I've been looking at enabling the SHT-cmd_size for struct
 ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
 are already all pre-allocated by libata and obtained via ata_qc_new() -
 __ata_qc_from_tag() during ata_scsi_queuecmd().

Might still not be a bad idea to do it:

- Cleans up a driver, getting rid of the need to alloc, maintain, and
  free those structures.

- Should be some cache locality benefits to having it all sequential.

 So that said, with the struct request + struct scsi_cmnd pre-allocations
 already provided by blk-mq - scsi-mq code, all memory allocations
 should have already been eliminated from I/O fast path.

Nice!

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-22 Thread Alexander Gordeev
On Fri, Jul 19, 2013 at 09:56:02PM -0700, Nicholas A. Bellinger wrote:
 On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote:
 OK, after further investigation the root cause is a actually a missing
 bio-bio_end_io() - bio_copy_kern_endio() - bio_put() from the
 blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is
 currently using.

Yes, missing bio_copy_kern_endio() callback is exactly the reason I
turned to blk_mq_execute_rq() initially. I should have been more
specific on this :|

I will try Mike's and your other change, hopefully soon (sorry,
constantly getting distracted).

 Including the following patch into the scsi-mq working branch now, and
 reverting the libata dma_alignment=0x03 hack.

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-22 Thread Nicholas A. Bellinger
On Mon, 2013-07-22 at 17:03 +0200, Alexander Gordeev wrote:
 On Fri, Jul 19, 2013 at 09:56:02PM -0700, Nicholas A. Bellinger wrote:
  On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote:
  OK, after further investigation the root cause is a actually a missing
  bio-bio_end_io() - bio_copy_kern_endio() - bio_put() from the
  blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is
  currently using.
 
 Yes, missing bio_copy_kern_endio() callback is exactly the reason I
 turned to blk_mq_execute_rq() initially. I should have been more
 specific on this :|
 
 I will try Mike's and your other change, hopefully soon (sorry,
 constantly getting distracted).
 

Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
target-pending/scsi-mq, which now has functioning scsi-generic support.

Also, your scsi_times_out patch from earlier has not been included just
yet, but that should be the only extra patch you need to apply in order
to get scsi-mq enabled libata/ata_piix running.

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-20 Thread Mike Christie
On 07/19/2013 11:56 PM, Nicholas A. Bellinger wrote:
 On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote:
 On Fri, 2013-07-19 at 08:33 -0700, James Bottomley wrote:
 On Thu, 2013-07-18 at 23:34 -0700, Nicholas A. Bellinger wrote:
 diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
 index 0101af5..191bc15 100644
 --- a/drivers/ata/libata-scsi.c
 +++ b/drivers/ata/libata-scsi.c
 @@ -1144,7 +1144,11 @@ static int ata_scsi_dev_config(struct scsi_device 
 *sdev,
 sector_size=%u  PAGE_SIZE, PIO may 
 malfunction\n,
 sdev-sector_size);
  
 -   blk_queue_update_dma_alignment(q, sdev-sector_size - 1);
 +   if (!q-mq_ops) {
 +   blk_queue_update_dma_alignment(q, sdev-sector_size - 1);
 +   } else {
 +   printk(Skipping dma_alignment for libata w/ scsi-mq\n);
 +   }

 Amazingly enough there is a reason for the dma alignment, and it wasn't
 just to annoy you, so you can't blindly do this.

 The email thread is probably lost in the mists of time, but if I
 remember correctly the problem is that some ahci DMA controllers barf if
 the sector they're doing DMA on crosses a page boundary.  Some are
 annoying enough to actually cause silent data corruption.  You won't
 find every ahci DMA controller doing this, so the change will work for
 some, but it will be hard to identify those it won't work for until
 people start losing data.

 Thanks for the extra background.

 So at least from what I gather thus far this shouldn't be an issue for
 initial testing with scsi-mq - libata w/ ata_piix.


 The correct fix, obviously, is to do the bio copy on the kernel path for
 unaligned data.  It is OK to assume that REQ_TYPE_FS data is correctly
 aligned (because of the block to page alignment).


 Indeed.  Looking into the bio_copy_kern() breakage next..

 
 OK, after further investigation the root cause is a actually a missing
 bio-bio_end_io() - bio_copy_kern_endio() - bio_put() from the
 blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is
 currently using.
 
 Including the following patch into the scsi-mq working branch now, and
 reverting the libata dma_alignment=0x03 hack.
 
 Alexander, care to give this a try..?
 
 --nab
 
 diff --git a/block/blk-exec.c b/block/blk-exec.c
 index 0761c89..70303d2 100644
 --- a/block/blk-exec.c
 +++ b/block/blk-exec.c
 @@ -25,7 +25,10 @@ static void blk_end_sync_rq(struct request *rq, int error)
 struct completion *waiting = rq-end_io_data;
  
 rq-end_io_data = NULL;
 -   if (!rq-q-mq_ops) {
 +   if (rq-q-mq_ops) {
 +   if (rq-bio)
 +   bio_endio(rq-bio, error);
 +   } else {
 __blk_put_request(rq-q, rq);
 }
 


This does not handle requests with multiple bios, and for the mq stye
passthrough insertion completions you actually want to call
blk_mq_finish_request in scsi_execute. Same for all the other
passthrough code in your scsi mq tree. That is your root bug. Instead of
doing that though I think we want to have the block layer free the bios
like before.

For the non mq calls, blk_end_request type of calls will complete the
bios when blk_finish_request is called from that path. It will then call
the rq end_io callback.

I think the blk mq code assumes if the end_io callack is set that the
end_io function will do the bio cleanup. See __blk_mq_end_io. Also see
how blk_mq_execute_rq calls blk_mq_finish_request for an example of how
rq passthrough execution and cleanup is being done in the mq paths.

Now with the scsi mq changes, when blk_execute_rq_nowait calls
blk_mq_insert_request it calls it with a old non mq style of end io
function that does not complete the bios.

What about the attached only compile tested patch. The patch has the mq
block code work like the non mq code for bio cleanups.


blk-mq: blk-mq should free bios in pass through case

For non mq calls, the block layer will free the bios when
blk_finish_request is called.

For mq calls, the blk mq code wants the caller to do this.

This patch has the blk mq code work like the non mq code
and has the block layer free the bios.

Signed-off-by: Mike Christie micha...@cs.wisc.edu

diff --git a/block/blk-flush.c b/block/blk-flush.c
index c56c37d..3e4cc9c 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -231,7 +231,7 @@ static void flush_end_io(struct request *flush_rq, int 
error)
unsigned long flags = 0;
 
if (q-mq_ops) {
-   blk_mq_finish_request(flush_rq, error);
+   blk_mq_free_request(flush_rq);
spin_lock_irqsave(q-mq_flush_lock, flags);
}
running = q-flush_queue[q-flush_running_idx];
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 799d305..5489b5a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -270,7 +270,7 @@ void blk_mq_free_request(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_free_request);
 
-void blk_mq_finish_request(struct 

Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-20 Thread Nicholas A. Bellinger
On Sat, 2013-07-20 at 09:48 -0500, Mike Christie wrote:
 On 07/19/2013 11:56 PM, Nicholas A. Bellinger wrote:
  On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote:
  On Fri, 2013-07-19 at 08:33 -0700, James Bottomley wrote:

SNIP

 
  Indeed.  Looking into the bio_copy_kern() breakage next..
 
  
  OK, after further investigation the root cause is a actually a missing
  bio-bio_end_io() - bio_copy_kern_endio() - bio_put() from the
  blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is
  currently using.
  
  Including the following patch into the scsi-mq working branch now, and
  reverting the libata dma_alignment=0x03 hack.
  
  Alexander, care to give this a try..?
  
  --nab
  
  diff --git a/block/blk-exec.c b/block/blk-exec.c
  index 0761c89..70303d2 100644
  --- a/block/blk-exec.c
  +++ b/block/blk-exec.c
  @@ -25,7 +25,10 @@ static void blk_end_sync_rq(struct request *rq, int 
  error)
  struct completion *waiting = rq-end_io_data;
   
  rq-end_io_data = NULL;
  -   if (!rq-q-mq_ops) {
  +   if (rq-q-mq_ops) {
  +   if (rq-bio)
  +   bio_endio(rq-bio, error);
  +   } else {
  __blk_put_request(rq-q, rq);
  }
  
 
 
 This does not handle requests with multiple bios, and for the mq stye
 passthrough insertion completions you actually want to call
 blk_mq_finish_request in scsi_execute. Same for all the other
 passthrough code in your scsi mq tree. That is your root bug. Instead of
 doing that though I think we want to have the block layer free the bios
 like before.
 
 For the non mq calls, blk_end_request type of calls will complete the
 bios when blk_finish_request is called from that path. It will then call
 the rq end_io callback.
 
 I think the blk mq code assumes if the end_io callack is set that the
 end_io function will do the bio cleanup. See __blk_mq_end_io. Also see
 how blk_mq_execute_rq calls blk_mq_finish_request for an example of how
 rq passthrough execution and cleanup is being done in the mq paths.
 
 Now with the scsi mq changes, when blk_execute_rq_nowait calls
 blk_mq_insert_request it calls it with a old non mq style of end io
 function that does not complete the bios.
 
 What about the attached only compile tested patch. The patch has the mq
 block code work like the non mq code for bio cleanups.
 
 

OK, so with your blk-mq patch in place to always call
blk_mq_finish_request() in blk_mq_complete_request() regardless of
rq-end_io, the preceding scsi_mq_end_request() can now simply call
blk_mq_end_io() for both BLOCK_RQ and FS request types.

diff --git a/drivers/scsi/scsi-mq.c b/drivers/scsi/scsi-mq.c
index 81b2633..f1d4789 100644
--- a/drivers/scsi/scsi-mq.c
+++ b/drivers/scsi/scsi-mq.c
@@ -93,19 +93,7 @@ struct scsi_cmnd *scsi_mq_end_request(struct scsi_cmnd *sc, 
int error,
 #endif
 
 //FIXME: Add proper blk_mq_end_io residual bytes + requeue
-   if (rq-end_io) {
-#if 0
-   printk(scsi_mq_end_request: Calling rq-end_io BLOCK_PC for
-CDB: 0x%02x\n, sc-cmnd[0]);
-#endif
-   rq-end_io(rq, error);
-   } else {
-#if 0
-   printk(scsi_mq_end_request: Calling blk_mq_end_io for CDB: 
0x%02x\n,
-   sc-cmnd[0]);
-#endif
-   blk_mq_end_io(rq, error);
-   }
+   blk_mq_end_io(rq, error);
 //FIXME: Need to do equiv of scsi_next_command to kick hctx..?
 
return NULL;

Thanks for fixing up that bit of ugliness.  ;)

Jens, care to review+include Mike's change into your working branch..?

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-20 Thread Jens Axboe
On Sat, Jul 20 2013, Mike Christie wrote:
 blk-mq: blk-mq should free bios in pass through case
 
 For non mq calls, the block layer will free the bios when
 blk_finish_request is called.
 
 For mq calls, the blk mq code wants the caller to do this.
 
 This patch has the blk mq code work like the non mq code
 and has the block layer free the bios.

Thanks Mike, looks good, applied.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-19 Thread Nicholas A. Bellinger
On Thu, 2013-07-18 at 18:03 -0700, Nicholas A. Bellinger wrote:
 On Thu, 2013-07-18 at 18:30 -0600, Jens Axboe wrote:
  On Thu, Jul 18 2013, Nicholas A. Bellinger wrote:
   On Thu, 2013-07-18 at 13:12 -0600, Mike Christie wrote:
On 07/18/2013 12:51 PM, Nicholas A. Bellinger wrote:

SNIP

   nod, just noticed the blk_queue_bounce() in blk_rq_map_kern().  
   
   Not sure why this doesn't seem to be doing what it's supposed to for
   libata just yet..
  
  How are you make the request from the bio? It'd be pretty trivial to
  ensure that it gets bounced properly... blk_mq_execute_rq() assumes a
  fully complete request, so it wont bounce anything.
  
 
 From what I gather for REQ_TYPE_BLOCK_PC, scsi_execute() -
 blk_rq_map_kern() - blk_rq_append_bio() - blk_rq_bio_prep() is what
 does the request setup from the bios returned by bio_[copy,map]_kern()
 in blk_rq_map_kern() code.
 
 blk_queue_bounce() is called immediately after blk_rq_append_bio() here,
 which AFAICT looks like it's doing the correct thing for scsi-mq..
 
 What is strange here is that libata-scsi.c CDB emulation code is doing
 the same stuff for both INQUIRY (that seems to be OK) and READ_CAPACITY
 (that is returning zeros), which makes me think that something else is
 going on..
 
 Alexander, where you able to re-test using sdev-sdev_mq_reg.queue_depth
 = 1 in scsi_mq_alloc_queue()..?
 

So after a bit more hacking tonight it appears the explicit setting of
dma_alignment by libata (sdev-sector_size - 1) is what is making
blk_rq_map_kern() invoke blk_copy_kern() instead of blk_map_kern(), and
triggering this scsi-mq specific bug with libata.  I'm able to confirm
with QEMU IDE emulation the bug is occurring only after INQUIRY and
before READ_CAPACITY, as reported by Alexander.

Below is a quick hack to skip this setting in ata_scsi_dev_config() for
blk-mq, and leaves the default dma_alignment=0x03 for REQ_TYPE_BLOCK_PC
requests as initially set by scsi-core in scsi_init_request_queue().

Also included is the change for using queue_depth = min(SHT-can_queue,
SHT-cmd_per_lun) during scsi-mq request_queue initialization, along
with a very basic ata_piix conversion for testing purposes.

With these three changes in place, I'm able to register a single 1GB
ata_piix LUN using QEMU IDE emulation, and successfully run simple fio
writeverify tests with blocksize=4k @ queue_depth=1.

Obviously this is not a proper bugfix for unaligned blk_copy_kern() with
scsi-mq + REQ_TYPE_BLOCK_PC case, but should be enough to at least get
libata LUN scanning to work.  Need to take a deeper look at the problem
here..

Alexander, care to give this work-around a shot on your bare-metal setup
using ata_piix..?

--nab

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 9a8a674..ac05cd6 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -1066,6 +1066,8 @@ static u8 piix_vmw_bmdma_status(struct ata_port *ap)
 
 static struct scsi_host_template piix_sht = {
ATA_BMDMA_SHT(DRV_NAME),
+   .scsi_mq= true,
+   .queuecommand_mq = ata_scsi_queuecmd,
 };
 
 static struct ata_port_operations piix_sata_ops = {
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0101af5..191bc15 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1144,7 +1144,11 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
sector_size=%u  PAGE_SIZE, PIO may malfunction\n,
sdev-sector_size);
 
-   blk_queue_update_dma_alignment(q, sdev-sector_size - 1);
+   if (!q-mq_ops) {
+   blk_queue_update_dma_alignment(q, sdev-sector_size - 1);
+   } else {
+   printk(Skipping dma_alignment for libata w/ scsi-mq\n);
+   }
 
if (dev-flags  ATA_DFLAG_AN)
set_bit(SDEV_EVT_MEDIA_CHANGE, sdev-supported_events);
diff --git a/drivers/scsi/scsi-mq.c b/drivers/scsi/scsi-mq.c
index ca6ff67..81b2633 100644
--- a/drivers/scsi/scsi-mq.c
+++ b/drivers/scsi/scsi-mq.c
@@ -199,11 +199,11 @@ int scsi_mq_alloc_queue(struct Scsi_Host *sh, struct 
scsi_device *sdev)
int i, j;
 
sdev-sdev_mq_reg.ops = scsi_mq_ops;
-   sdev-sdev_mq_reg.queue_depth = sdev-queue_depth;
+   sdev-sdev_mq_reg.queue_depth = min((short)sh-hostt-can_queue,
+   sh-hostt-cmd_per_lun);
sdev-sdev_mq_reg.cmd_size = sizeof(struct scsi_cmnd) + 
sh-hostt-cmd_size;
sdev-sdev_mq_reg.numa_node = NUMA_NO_NODE;
sdev-sdev_mq_reg.nr_hw_queues = 1;
-   sdev-sdev_mq_reg.queue_depth = 64;
sdev-sdev_mq_reg.flags = BLK_MQ_F_SHOULD_MERGE;
 
printk(Calling blk_mq_init_queue: scsi_mq_ops: %p, queue_depth: %d,

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-19 Thread James Bottomley
On Thu, 2013-07-18 at 23:34 -0700, Nicholas A. Bellinger wrote:
 diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
 index 0101af5..191bc15 100644
 --- a/drivers/ata/libata-scsi.c
 +++ b/drivers/ata/libata-scsi.c
 @@ -1144,7 +1144,11 @@ static int ata_scsi_dev_config(struct scsi_device 
 *sdev,
 sector_size=%u  PAGE_SIZE, PIO may malfunction\n,
 sdev-sector_size);
  
 -   blk_queue_update_dma_alignment(q, sdev-sector_size - 1);
 +   if (!q-mq_ops) {
 +   blk_queue_update_dma_alignment(q, sdev-sector_size - 1);
 +   } else {
 +   printk(Skipping dma_alignment for libata w/ scsi-mq\n);
 +   }

Amazingly enough there is a reason for the dma alignment, and it wasn't
just to annoy you, so you can't blindly do this.

The email thread is probably lost in the mists of time, but if I
remember correctly the problem is that some ahci DMA controllers barf if
the sector they're doing DMA on crosses a page boundary.  Some are
annoying enough to actually cause silent data corruption.  You won't
find every ahci DMA controller doing this, so the change will work for
some, but it will be hard to identify those it won't work for until
people start losing data.

The correct fix, obviously, is to do the bio copy on the kernel path for
unaligned data.  It is OK to assume that REQ_TYPE_FS data is correctly
aligned (because of the block to page alignment).

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-19 Thread Mike Christie
On 07/18/2013 06:23 PM, Nicholas A. Bellinger wrote:
 Just saw this while trying out iscsi with the scsi-mq stuff :)
  
 Took at stab at this a while back, but ended getting distracted on other
 items.  Do you have an initial conversion running yet..?

Not running well :) Have a patch but I am debugging it now. I messed up
something with the cmd_size stuff.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-19 Thread Nicholas A. Bellinger
On Fri, 2013-07-19 at 08:33 -0700, James Bottomley wrote:
 On Thu, 2013-07-18 at 23:34 -0700, Nicholas A. Bellinger wrote:
  diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
  index 0101af5..191bc15 100644
  --- a/drivers/ata/libata-scsi.c
  +++ b/drivers/ata/libata-scsi.c
  @@ -1144,7 +1144,11 @@ static int ata_scsi_dev_config(struct scsi_device 
  *sdev,
  sector_size=%u  PAGE_SIZE, PIO may malfunction\n,
  sdev-sector_size);
   
  -   blk_queue_update_dma_alignment(q, sdev-sector_size - 1);
  +   if (!q-mq_ops) {
  +   blk_queue_update_dma_alignment(q, sdev-sector_size - 1);
  +   } else {
  +   printk(Skipping dma_alignment for libata w/ scsi-mq\n);
  +   }
 
 Amazingly enough there is a reason for the dma alignment, and it wasn't
 just to annoy you, so you can't blindly do this.
 
 The email thread is probably lost in the mists of time, but if I
 remember correctly the problem is that some ahci DMA controllers barf if
 the sector they're doing DMA on crosses a page boundary.  Some are
 annoying enough to actually cause silent data corruption.  You won't
 find every ahci DMA controller doing this, so the change will work for
 some, but it will be hard to identify those it won't work for until
 people start losing data.

Thanks for the extra background.

So at least from what I gather thus far this shouldn't be an issue for
initial testing with scsi-mq - libata w/ ata_piix.

 
 The correct fix, obviously, is to do the bio copy on the kernel path for
 unaligned data.  It is OK to assume that REQ_TYPE_FS data is correctly
 aligned (because of the block to page alignment).
 

Indeed.  Looking into the bio_copy_kern() breakage next..

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-19 Thread Nicholas A. Bellinger
On Fri, 2013-07-19 at 09:58 -0600, Mike Christie wrote:
 On 07/18/2013 06:23 PM, Nicholas A. Bellinger wrote:
  Just saw this while trying out iscsi with the scsi-mq stuff :)
   
  Took at stab at this a while back, but ended getting distracted on other
  items.  Do you have an initial conversion running yet..?
 
 Not running well :) Have a patch but I am debugging it now. I messed up
 something with the cmd_size stuff.

nod, looking forward to see ib_iser move beyond the existing
scsi_request_fn() bottleneck.   ;)

Feel free to send me the WIP off-list if you'd like another pair of
eyes.

--nab



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-19 Thread Nicholas A. Bellinger
On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote:
 On Fri, 2013-07-19 at 08:33 -0700, James Bottomley wrote:
  On Thu, 2013-07-18 at 23:34 -0700, Nicholas A. Bellinger wrote:
   diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
   index 0101af5..191bc15 100644
   --- a/drivers/ata/libata-scsi.c
   +++ b/drivers/ata/libata-scsi.c
   @@ -1144,7 +1144,11 @@ static int ata_scsi_dev_config(struct scsi_device 
   *sdev,
   sector_size=%u  PAGE_SIZE, PIO may 
   malfunction\n,
   sdev-sector_size);

   -   blk_queue_update_dma_alignment(q, sdev-sector_size - 1);
   +   if (!q-mq_ops) {
   +   blk_queue_update_dma_alignment(q, sdev-sector_size - 1);
   +   } else {
   +   printk(Skipping dma_alignment for libata w/ scsi-mq\n);
   +   }
  
  Amazingly enough there is a reason for the dma alignment, and it wasn't
  just to annoy you, so you can't blindly do this.
  
  The email thread is probably lost in the mists of time, but if I
  remember correctly the problem is that some ahci DMA controllers barf if
  the sector they're doing DMA on crosses a page boundary.  Some are
  annoying enough to actually cause silent data corruption.  You won't
  find every ahci DMA controller doing this, so the change will work for
  some, but it will be hard to identify those it won't work for until
  people start losing data.
 
 Thanks for the extra background.
 
 So at least from what I gather thus far this shouldn't be an issue for
 initial testing with scsi-mq - libata w/ ata_piix.
 
  
  The correct fix, obviously, is to do the bio copy on the kernel path for
  unaligned data.  It is OK to assume that REQ_TYPE_FS data is correctly
  aligned (because of the block to page alignment).
  
 
 Indeed.  Looking into the bio_copy_kern() breakage next..
 

OK, after further investigation the root cause is a actually a missing
bio-bio_end_io() - bio_copy_kern_endio() - bio_put() from the
blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is
currently using.

Including the following patch into the scsi-mq working branch now, and
reverting the libata dma_alignment=0x03 hack.

Alexander, care to give this a try..?

--nab

diff --git a/block/blk-exec.c b/block/blk-exec.c
index 0761c89..70303d2 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -25,7 +25,10 @@ static void blk_end_sync_rq(struct request *rq, int error)
struct completion *waiting = rq-end_io_data;
 
rq-end_io_data = NULL;
-   if (!rq-q-mq_ops) {
+   if (rq-q-mq_ops) {
+   if (rq-bio)
+   bio_endio(rq-bio, error);
+   } else {
__blk_put_request(rq-q, rq);
}

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-18 Thread Nicholas A. Bellinger
On Wed, 2013-07-17 at 18:19 +0200, Alexander Gordeev wrote:
 On Tue, Jul 16, 2013 at 02:38:03PM -0700, Nicholas A. Bellinger wrote:
  [7.927818] scsi_execute(): Calling blk_mq_free_request 
  [7.927826] scsi 0:0:0:0: Direct-Access ATA  ST9500530NS  
  CC03 PQ: 0 ANSI: 5
  
  OK, so INQUIRY response payload is looking as expected here.
 
 Yep. It is not on the top of my head, but I remember something like INQUIRYs
 are emulated and thus do not have payload.
 
  [7.927960] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
  [7.927964] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 B/512 B)
  [7.927965] sd 0:0:0:0: [sda] 0-byte physical blocks
  
  Strange..  READ_CAPACITY appears to be returning a payload as zeros..?
 
 Yep. Because blk_execute_rq() does not put the proper callback and data do
 not get copied from sg's to bounce buffer. That is why I tried to use
 blk_mq_execute_rq() instead. Once I do that, data start getting read and
 booting stops elsewhere.

Mm.

The call to blk_queue_bounce() exists within blk_mq_make_request(), but
AFAICT this should still be getting invoked regardless of if the struct
request is dispatched into blk-mq via the modified blk_execute_rq() -
blk_execute_rq_nowait() - blk_mq_insert_request() codepath, or directly
via blk_mq_execute_rq()..

Jens..?

 
 Of course, I was suspecting that change alone is not valid and wondered
 about the status of scsi-mq in the first place, and if more changes are
 coming.

Most certainly.  ;)

 
 So I it turns out req-errors + req-resid_len issue (you described
 earlier) needs to be addressed before going forward with libata (only?).

AFAICT, getting an initial conversion of libata up does not depend upon
this specific issue being addressed first.  I could be wrong however..

 
  Not sure why yet some control CDBs are getting back the expected
  payload, while others are returning zeros..
 
 Bio buffers do not get updated from callback.
 
  Also, looking at the included stack back-trace:
 
 [...]
 
 Thanks a lot for these and other your comments, Nicholas!
 

Sure.  I should have a few extra cycles to hack on this over the
weekend.

Also, thinking about this some more, trying to convert ahci to scsi-mq
first (using QEMU emulation), while keeping a rootfs on PIIX_IDE might
make debugging slightly easier..

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-18 Thread Nicholas A. Bellinger
On Thu, 2013-07-18 at 11:51 -0700, Nicholas A. Bellinger wrote:
 On Wed, 2013-07-17 at 18:19 +0200, Alexander Gordeev wrote:
  On Tue, Jul 16, 2013 at 02:38:03PM -0700, Nicholas A. Bellinger wrote:
   [7.927818] scsi_execute(): Calling blk_mq_free_request 
   [7.927826] scsi 0:0:0:0: Direct-Access ATA  ST9500530NS  
   CC03 PQ: 0 ANSI: 5
   
   OK, so INQUIRY response payload is looking as expected here.
  
  Yep. It is not on the top of my head, but I remember something like INQUIRYs
  are emulated and thus do not have payload.
  
   [7.927960] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
   [7.927964] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 B/512 B)
   [7.927965] sd 0:0:0:0: [sda] 0-byte physical blocks
   
   Strange..  READ_CAPACITY appears to be returning a payload as zeros..?
  
  Yep. Because blk_execute_rq() does not put the proper callback and data do
  not get copied from sg's to bounce buffer. That is why I tried to use
  blk_mq_execute_rq() instead. Once I do that, data start getting read and
  booting stops elsewhere.
 
 Mm.
 
 The call to blk_queue_bounce() exists within blk_mq_make_request(), but
 AFAICT this should still be getting invoked regardless of if the struct
 request is dispatched into blk-mq via the modified blk_execute_rq() -
 blk_execute_rq_nowait() - blk_mq_insert_request() codepath, or directly
 via blk_mq_execute_rq()..
 
 Jens..?
 

Actually sorry, your right.  A call to blk_mq_insert_request() for
REQ_TYPE_BLOCK_PC will not invoke blk_queue_bounce() located near the
top of blk_mq_execute_rq(), which means that only REQ_TYPE_FS is
currently using bounce buffers, if required.

Need to think a bit more about what to do here for REQ_TYPE_BLOCK_PC
bounce buffer special case with blk_execute_rq(), but I'm thinking that
blk_mq_execute_rq() should really not be used here..

Jens..?

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-18 Thread Mike Christie
On 07/18/2013 12:51 PM, Nicholas A. Bellinger wrote:
 On Wed, 2013-07-17 at 18:19 +0200, Alexander Gordeev wrote:
 On Tue, Jul 16, 2013 at 02:38:03PM -0700, Nicholas A. Bellinger wrote:
 [7.927818] scsi_execute(): Calling blk_mq_free_request 
 [7.927826] scsi 0:0:0:0: Direct-Access ATA  ST9500530NS  
 CC03 PQ: 0 ANSI: 5

 OK, so INQUIRY response payload is looking as expected here.

 Yep. It is not on the top of my head, but I remember something like INQUIRYs
 are emulated and thus do not have payload.

 [7.927960] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
 [7.927964] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 B/512 B)
 [7.927965] sd 0:0:0:0: [sda] 0-byte physical blocks

 Strange..  READ_CAPACITY appears to be returning a payload as zeros..?

 Yep. Because blk_execute_rq() does not put the proper callback and data do
 not get copied from sg's to bounce buffer. That is why I tried to use
 blk_mq_execute_rq() instead. Once I do that, data start getting read and
 booting stops elsewhere.
 
 Mm.
 
 The call to blk_queue_bounce() exists within blk_mq_make_request(), but
 AFAICT this should still be getting invoked regardless of if the struct
 request is dispatched into blk-mq via the modified blk_execute_rq() -
 blk_execute_rq_nowait() - blk_mq_insert_request() codepath, or directly
 via blk_mq_execute_rq()..
 

blk_mq_make_request is not called from the blk insert/execute paths.
blk_mq_make_request takes a bio and tries to merge it with a request and
adds it to the queue. It is only called when the make_request_fn is
called like when generic_make_request is called.

blk_mq_insert_request adds a already formed request to the queue. It is
already formed so that is why that path does not bounce bios. The
bios/pages should already be added within the drivers restrictions. So
for the read_cap path, the call to blk_rq_map_kern in scsi_execute does
the blk_queue_bounce call.

Just saw this while trying out iscsi with the scsi-mq stuff :)

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-18 Thread Jens Axboe
On 07/18/2013 01:14 PM, Nicholas A. Bellinger wrote:
 On Thu, 2013-07-18 at 11:51 -0700, Nicholas A. Bellinger wrote:
 On Wed, 2013-07-17 at 18:19 +0200, Alexander Gordeev wrote:
 On Tue, Jul 16, 2013 at 02:38:03PM -0700, Nicholas A. Bellinger wrote:
 [7.927818] scsi_execute(): Calling blk_mq_free_request 
 [7.927826] scsi 0:0:0:0: Direct-Access ATA  ST9500530NS  
 CC03 PQ: 0 ANSI: 5

 OK, so INQUIRY response payload is looking as expected here.

 Yep. It is not on the top of my head, but I remember something like INQUIRYs
 are emulated and thus do not have payload.

 [7.927960] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
 [7.927964] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 B/512 B)
 [7.927965] sd 0:0:0:0: [sda] 0-byte physical blocks

 Strange..  READ_CAPACITY appears to be returning a payload as zeros..?

 Yep. Because blk_execute_rq() does not put the proper callback and data do
 not get copied from sg's to bounce buffer. That is why I tried to use
 blk_mq_execute_rq() instead. Once I do that, data start getting read and
 booting stops elsewhere.

 Mm.

 The call to blk_queue_bounce() exists within blk_mq_make_request(), but
 AFAICT this should still be getting invoked regardless of if the struct
 request is dispatched into blk-mq via the modified blk_execute_rq() -
 blk_execute_rq_nowait() - blk_mq_insert_request() codepath, or directly
 via blk_mq_execute_rq()..

 Jens..?

 
 Actually sorry, your right.  A call to blk_mq_insert_request() for
 REQ_TYPE_BLOCK_PC will not invoke blk_queue_bounce() located near the
 top of blk_mq_execute_rq(), which means that only REQ_TYPE_FS is
 currently using bounce buffers, if required.
 
 Need to think a bit more about what to do here for REQ_TYPE_BLOCK_PC
 bounce buffer special case with blk_execute_rq(), but I'm thinking that
 blk_mq_execute_rq() should really not be used here..
 
 Jens..?

It needs to be pre-bounced, blk-mq will only bounce incoming bios and
not requests merely added to the queue(s). Might be useful to add an
equiv blk_mq_make_request() for this.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-18 Thread Nicholas A. Bellinger
On Thu, 2013-07-18 at 13:12 -0600, Mike Christie wrote:
 On 07/18/2013 12:51 PM, Nicholas A. Bellinger wrote:
  On Wed, 2013-07-17 at 18:19 +0200, Alexander Gordeev wrote:
  On Tue, Jul 16, 2013 at 02:38:03PM -0700, Nicholas A. Bellinger wrote:
  [7.927818] scsi_execute(): Calling blk_mq_free_request 
  [7.927826] scsi 0:0:0:0: Direct-Access ATA  ST9500530NS  
  CC03 PQ: 0 ANSI: 5
 
  OK, so INQUIRY response payload is looking as expected here.
 
  Yep. It is not on the top of my head, but I remember something like 
  INQUIRYs
  are emulated and thus do not have payload.
 
  [7.927960] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
  [7.927964] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 B/512 B)
  [7.927965] sd 0:0:0:0: [sda] 0-byte physical blocks
 
  Strange..  READ_CAPACITY appears to be returning a payload as zeros..?
 
  Yep. Because blk_execute_rq() does not put the proper callback and data do
  not get copied from sg's to bounce buffer. That is why I tried to use
  blk_mq_execute_rq() instead. Once I do that, data start getting read and
  booting stops elsewhere.
  
  Mm.
  
  The call to blk_queue_bounce() exists within blk_mq_make_request(), but
  AFAICT this should still be getting invoked regardless of if the struct
  request is dispatched into blk-mq via the modified blk_execute_rq() -
  blk_execute_rq_nowait() - blk_mq_insert_request() codepath, or directly
  via blk_mq_execute_rq()..
  
 
 blk_mq_make_request is not called from the blk insert/execute paths.
 blk_mq_make_request takes a bio and tries to merge it with a request and
 adds it to the queue. It is only called when the make_request_fn is
 called like when generic_make_request is called.
 
 blk_mq_insert_request adds a already formed request to the queue. It is
 already formed so that is why that path does not bounce bios. The
 bios/pages should already be added within the drivers restrictions. So
 for the read_cap path, the call to blk_rq_map_kern in scsi_execute does
 the blk_queue_bounce call.
 

nod, just noticed the blk_queue_bounce() in blk_rq_map_kern().  

Not sure why this doesn't seem to be doing what it's supposed to for
libata just yet..

 Just saw this while trying out iscsi with the scsi-mq stuff :)
 

Took at stab at this a while back, but ended getting distracted on other
items.  Do you have an initial conversion running yet..?

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-18 Thread Jens Axboe
On Thu, Jul 18 2013, Nicholas A. Bellinger wrote:
 On Thu, 2013-07-18 at 13:12 -0600, Mike Christie wrote:
  On 07/18/2013 12:51 PM, Nicholas A. Bellinger wrote:
   On Wed, 2013-07-17 at 18:19 +0200, Alexander Gordeev wrote:
   On Tue, Jul 16, 2013 at 02:38:03PM -0700, Nicholas A. Bellinger wrote:
   [7.927818] scsi_execute(): Calling blk_mq_free_request 
   [7.927826] scsi 0:0:0:0: Direct-Access ATA  ST9500530NS 
CC03 PQ: 0 ANSI: 5
  
   OK, so INQUIRY response payload is looking as expected here.
  
   Yep. It is not on the top of my head, but I remember something like 
   INQUIRYs
   are emulated and thus do not have payload.
  
   [7.927960] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
   [7.927964] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 B/512 
   B)
   [7.927965] sd 0:0:0:0: [sda] 0-byte physical blocks
  
   Strange..  READ_CAPACITY appears to be returning a payload as zeros..?
  
   Yep. Because blk_execute_rq() does not put the proper callback and data 
   do
   not get copied from sg's to bounce buffer. That is why I tried to use
   blk_mq_execute_rq() instead. Once I do that, data start getting read and
   booting stops elsewhere.
   
   Mm.
   
   The call to blk_queue_bounce() exists within blk_mq_make_request(), but
   AFAICT this should still be getting invoked regardless of if the struct
   request is dispatched into blk-mq via the modified blk_execute_rq() -
   blk_execute_rq_nowait() - blk_mq_insert_request() codepath, or directly
   via blk_mq_execute_rq()..
   
  
  blk_mq_make_request is not called from the blk insert/execute paths.
  blk_mq_make_request takes a bio and tries to merge it with a request and
  adds it to the queue. It is only called when the make_request_fn is
  called like when generic_make_request is called.
  
  blk_mq_insert_request adds a already formed request to the queue. It is
  already formed so that is why that path does not bounce bios. The
  bios/pages should already be added within the drivers restrictions. So
  for the read_cap path, the call to blk_rq_map_kern in scsi_execute does
  the blk_queue_bounce call.
  
 
 nod, just noticed the blk_queue_bounce() in blk_rq_map_kern().  
 
 Not sure why this doesn't seem to be doing what it's supposed to for
 libata just yet..

How are you make the request from the bio? It'd be pretty trivial to
ensure that it gets bounced properly... blk_mq_execute_rq() assumes a
fully complete request, so it wont bounce anything.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-18 Thread Nicholas A. Bellinger
On Thu, 2013-07-18 at 18:30 -0600, Jens Axboe wrote:
 On Thu, Jul 18 2013, Nicholas A. Bellinger wrote:
  On Thu, 2013-07-18 at 13:12 -0600, Mike Christie wrote:
   On 07/18/2013 12:51 PM, Nicholas A. Bellinger wrote:
On Wed, 2013-07-17 at 18:19 +0200, Alexander Gordeev wrote:
On Tue, Jul 16, 2013 at 02:38:03PM -0700, Nicholas A. Bellinger wrote:
[7.927818] scsi_execute(): Calling blk_mq_free_request 
[7.927826] scsi 0:0:0:0: Direct-Access ATA  ST9500530NS   
   CC03 PQ: 0 ANSI: 5
   
OK, so INQUIRY response payload is looking as expected here.
   
Yep. It is not on the top of my head, but I remember something like 
INQUIRYs
are emulated and thus do not have payload.
   
[7.927960] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
[7.927964] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 
B/512 B)
[7.927965] sd 0:0:0:0: [sda] 0-byte physical blocks
   
Strange..  READ_CAPACITY appears to be returning a payload as zeros..?
   
Yep. Because blk_execute_rq() does not put the proper callback and 
data do
not get copied from sg's to bounce buffer. That is why I tried to use
blk_mq_execute_rq() instead. Once I do that, data start getting read 
and
booting stops elsewhere.

Mm.

The call to blk_queue_bounce() exists within blk_mq_make_request(), but
AFAICT this should still be getting invoked regardless of if the struct
request is dispatched into blk-mq via the modified blk_execute_rq() -
blk_execute_rq_nowait() - blk_mq_insert_request() codepath, or directly
via blk_mq_execute_rq()..

   
   blk_mq_make_request is not called from the blk insert/execute paths.
   blk_mq_make_request takes a bio and tries to merge it with a request and
   adds it to the queue. It is only called when the make_request_fn is
   called like when generic_make_request is called.
   
   blk_mq_insert_request adds a already formed request to the queue. It is
   already formed so that is why that path does not bounce bios. The
   bios/pages should already be added within the drivers restrictions. So
   for the read_cap path, the call to blk_rq_map_kern in scsi_execute does
   the blk_queue_bounce call.
   
  
  nod, just noticed the blk_queue_bounce() in blk_rq_map_kern().  
  
  Not sure why this doesn't seem to be doing what it's supposed to for
  libata just yet..
 
 How are you make the request from the bio? It'd be pretty trivial to
 ensure that it gets bounced properly... blk_mq_execute_rq() assumes a
 fully complete request, so it wont bounce anything.
 

From what I gather for REQ_TYPE_BLOCK_PC, scsi_execute() -
blk_rq_map_kern() - blk_rq_append_bio() - blk_rq_bio_prep() is what
does the request setup from the bios returned by bio_[copy,map]_kern()
in blk_rq_map_kern() code.

blk_queue_bounce() is called immediately after blk_rq_append_bio() here,
which AFAICT looks like it's doing the correct thing for scsi-mq..

What is strange here is that libata-scsi.c CDB emulation code is doing
the same stuff for both INQUIRY (that seems to be OK) and READ_CAPACITY
(that is returning zeros), which makes me think that something else is
going on..

Alexander, where you able to re-test using sdev-sdev_mq_reg.queue_depth
= 1 in scsi_mq_alloc_queue()..?

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-17 Thread Alexander Gordeev
On Tue, Jul 16, 2013 at 02:38:03PM -0700, Nicholas A. Bellinger wrote:
 [7.927818] scsi_execute(): Calling blk_mq_free_request 
 [7.927826] scsi 0:0:0:0: Direct-Access ATA  ST9500530NS  CC03 
 PQ: 0 ANSI: 5
 
 OK, so INQUIRY response payload is looking as expected here.

Yep. It is not on the top of my head, but I remember something like INQUIRYs
are emulated and thus do not have payload.

 [7.927960] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
 [7.927964] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 B/512 B)
 [7.927965] sd 0:0:0:0: [sda] 0-byte physical blocks
 
 Strange..  READ_CAPACITY appears to be returning a payload as zeros..?

Yep. Because blk_execute_rq() does not put the proper callback and data do
not get copied from sg's to bounce buffer. That is why I tried to use
blk_mq_execute_rq() instead. Once I do that, data start getting read and
booting stops elsewhere.

Of course, I was suspecting that change alone is not valid and wondered
about the status of scsi-mq in the first place, and if more changes are
coming.

So I it turns out req-errors + req-resid_len issue (you described
earlier) needs to be addressed before going forward with libata (only?).

 Not sure why yet some control CDBs are getting back the expected
 payload, while others are returning zeros..

Bio buffers do not get updated from callback.

 Also, looking at the included stack back-trace:

[...]

Thanks a lot for these and other your comments, Nicholas!

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-16 Thread Nicholas A. Bellinger
On Tue, 2013-07-16 at 20:32 +0200, Alexander Gordeev wrote:
 On Fri, Jul 12, 2013 at 10:20:12PM -0700, Nicholas A. Bellinger wrote:
  On Fri, 2013-07-12 at 09:46 +0200, Alexander Gordeev wrote:
 diff --git a/drivers/scsi/scsi-mq.c b/drivers/scsi/scsi-mq.c
 index ca6ff67..d8cc7a4 100644
 --- a/drivers/scsi/scsi-mq.c
 +++ b/drivers/scsi/scsi-mq.c
 @@ -155,6 +155,7 @@ void scsi_mq_done(struct scsi_cmnd *sc)
  static struct blk_mq_ops scsi_mq_ops = {
   .queue_rq   = scsi_mq_queue_rq,
   .map_queue  = blk_mq_map_queue,
 + .timeout= scsi_times_out,
   .alloc_hctx = blk_mq_alloc_single_hw_queue,
   .free_hctx  = blk_mq_free_single_hw_queue,
  };
  
  So your actually triggering a blk-mq timeout with ata_piix..?
 
 No.
 That is to avoid a NULL-pointer assignment from -timeout elsewhere.
 In fact I return -ENODEV for sr_probe() to not hit it.
 
  That is why scsi-mq still uses blk_execute_rq() for this reason, and
  this will need be addressed in order to safely use blk_mq_execute_rq()
  in the above context.
 
 Got it.
 
  Do you have an OOPs backtrace handy to post w/o the last two changes in
  place..?
 
 Attaching the output. No oops actually (due to aforementioned .timeout).
 

Hi Alexander,

Thanks for the logs.  I'm In-lining some of the output here for
reference:

[7.927596] Calling blk_mq_init_queue: scsi_mq_ops: 81ca13e0, 
queue_depth: 64, cmd_size: 296 SCSI cmd_size: 0

Just FYI, a SCSI cmd_size of zero here means that scsi-mq will not be
providing pre-allocated LLD descriptors (located at scsi_cmnd-SCp.ptr)
for use by libata driver code.

That is fine for initial testing, but libata will eventually want to
take advantage of scsi_host_template-cmd_size = sizeof(ata_queued_cmd)
in order to remove (all) memory allocations from the I/O fast-path.

[7.927639] blk-mq: CPU - queue map
[7.927640]   CPU 0 - Queue 0
[7.927640]   CPU 1 - Queue 0
[7.927640]   CPU 2 - Queue 0
[7.927641]   CPU 3 - Queue 0
[7.927641]   CPU 4 - Queue 0
[7.927642]   CPU 5 - Queue 0
[7.927642]   CPU 6 - Queue 0
[7.927643]   CPU 7 - Queue 0
[7.927643]   CPU 8 - Queue 0
[7.927643]   CPU 9 - Queue 0
[7.927644]   CPU10 - Queue 0
[7.927644]   CPU11 - Queue 0
[7.927645]   CPU12 - Queue 0
[7.927645]   CPU13 - Queue 0
[7.927646]   CPU14 - Queue 0
[7.927646]   CPU15 - Queue 0
[7.927673] Performing sc map setup on q: 88046243 hctx: 
880462a14200 i: 0
[7.927780] scsi_mq_alloc_queue() complete !! 
[7.927784] Entering scsi_execute with q-mq_ops: 81ca13e0
[7.927785] Allocated blk-mq req: 88046244ad40, req-tag: 63
[7.927790] Calling blk_mq_insert_request from blk_execute_rq_nowait 

[7.927803] scsi_execute(): Calling blk_mq_free_request 
[7.927815] Entering scsi_execute with q-mq_ops: 81ca13e0
[7.927816] Allocated blk-mq req: 88046244ad40, req-tag: 63
[7.927817] Calling blk_mq_insert_request from blk_execute_rq_nowait 

[7.927818] scsi_execute(): Calling blk_mq_free_request 
[7.927826] scsi 0:0:0:0: Direct-Access ATA  ST9500530NS  CC03 
PQ: 0 ANSI: 5

OK, so INQUIRY response payload is looking as expected here.

[7.927944] Entering scsi_execute with q-mq_ops: 81ca13e0
[7.927946] Allocated blk-mq req: 88046244a7c0, req-tag: 61
[7.927946] Calling blk_mq_insert_request from blk_execute_rq_nowait 

[7.927949] scsi_execute(): Calling blk_mq_free_request 
[7.927951] Entering scsi_execute with q-mq_ops: 81ca13e0
[7.927952] Allocated blk-mq req: 88046244a7c0, req-tag: 61
[7.927955] Calling blk_mq_insert_request from blk_execute_rq_nowait 

[7.927958] scsi_execute(): Calling blk_mq_free_request 
[7.927960] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
[7.927964] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 B/512 B)
[7.927965] sd 0:0:0:0: [sda] 0-byte physical blocks

Strange..  READ_CAPACITY appears to be returning a payload as zeros..?

[7.927966] Entering scsi_execute with q-mq_ops: 81ca13e0
[7.927967] Allocated blk-mq req: 88046244a7c0, req-tag: 61
[7.927968] Calling blk_mq_insert_request from blk_execute_rq_nowait 

[7.927970] scsi_execute(): Calling blk_mq_free_request 
[7.927970] Entering scsi_execute with q-mq_ops: 81ca13e0
[7.927971] Allocated blk-mq req: 88046244a7c0, req-tag: 61
[7.927972] Calling blk_mq_insert_request from blk_execute_rq_nowait 

[7.927973] scsi_execute(): Calling blk_mq_free_request 
[7.927975] Entering scsi_execute with q-mq_ops: 81ca13e0
[7.927976] Allocated blk-mq req: 88046244a7c0, req-tag: 61
[7.927977] Calling blk_mq_insert_request from blk_execute_rq_nowait 

[7.927979] scsi_execute(): Calling blk_mq_free_request 
[7.927981] sd 0:0:0:0: [sda] Write Protect is off
[7.927982] sd 0:0:0:0: [sda]