Re: scsi-mq V2

2014-07-14 Thread Benjamin LaHaise
Hi Robert,

On Sun, Jul 13, 2014 at 05:15:15PM +, Elliott, Robert (Server Storage) 
wrote:
   I will see if that solves the problem with the scsi-mq-3 tree, or
   at least some of the bisect trees leading up to it.
  
  scsi-mq-3 is still going after 45 minutes.  I'll leave it running
  overnight.
  
 
 That has been going strong for 18 hours, so I think that's the patch
 we need.

Thanks for taking the time to narrow this down.  I've applied the fix to 
my aio-fixes tree at git://git.kvack.org/~bcrl/aio-fixes.git and forwarded 
it on to Linus as well.

-ben
-- 
Thought is the essence of where you are now.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: scsi-mq V2

2014-07-11 Thread Benjamin LaHaise
On Fri, Jul 11, 2014 at 02:33:12PM +, Elliott, Robert (Server Storage) 
wrote:
 That ran 9 total hours with no problem.
 
 Rather than revert in the bisect trees, I added just this single additional
 patch to the no-rebase tree, and the problem appeared:

Can you try the below totally untested patch instead?  It looks like
put_reqs_available() is not irq-safe.

-ben
-- 
Thought is the essence of where you are now.


diff --git a/fs/aio.c b/fs/aio.c
index 955947e..4b97180 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -830,16 +830,20 @@ void exit_aio(struct mm_struct *mm)
 static void put_reqs_available(struct kioctx *ctx, unsigned nr)
 {
 	struct kioctx_cpu *kcpu;
+	unsigned long flags;
 
 	preempt_disable();
 	kcpu = this_cpu_ptr(ctx->cpu);
 
+	local_irq_save(flags);
 	kcpu->reqs_available += nr;
+
 	while (kcpu->reqs_available >= ctx->req_batch * 2) {
 		kcpu->reqs_available -= ctx->req_batch;
 		atomic_add(ctx->req_batch, &ctx->reqs_available);
 	}
 
+	local_irq_restore(flags);
 	preempt_enable();
 }
 


Re: scsi-mq V2

2014-07-10 Thread Benjamin LaHaise
On Wed, Jul 09, 2014 at 11:20:40PM -0700, Christoph Hellwig wrote:
 On Thu, Jul 10, 2014 at 12:53:36AM +, Elliott, Robert (Server Storage) 
 wrote:
  the problem still occurs - fio results in low or 0 IOPS, with perf top 
  reporting unusual amounts of time spent in do_io_submit and io_submit.
 
 The diff between the two version doesn't show too much other possible
 interesting commits, the most interesting being some minor block
 updates.
 
 I guess we'll have to do a manual bisect. I've pushed out a
 scsi-mq.3-bisect-1 branch that is rebased to just before the merge of
 the block tree, and a scsi-mq.3-bisect-2 branch that is just after
 the merge of the block tree to get started.

There is one possible concern that could be exacerbated by other changes in 
the system: if the application is running close to the bare minimum number 
of requests allocated in io_setup(), the per cpu reference counters will 
have a hard time batching things.  It might be worth testing with an 
increased number of requests being allocated if this is the case.

-ben
-- 
Thought is the essence of where you are now.


Re: scsi-mq V2

2014-07-10 Thread Benjamin LaHaise
On Thu, Jul 10, 2014 at 03:39:57PM +0200, Jens Axboe wrote:
 That's how fio always runs, it sets up the context with the exact queue 
 depth that it needs. Do we have a good enough understanding of other aio 
 use cases to say that this isn't the norm? I would expect it to be, it's 
 the way that the API would most obviously be used.

The problem with this approach is that it works very poorly with per cpu 
reference counting's batching of references, which is pretty much a 
requirement now that many core systems are the norm.  Allocating the bare 
minimum is not the right thing to do today.  That said, the default limits 
on the number of requests probably need to be raised.

-ben
-- 
Thought is the essence of where you are now.


Re: scsi-mq V2

2014-07-10 Thread Benjamin LaHaise
On Thu, Jul 10, 2014 at 03:48:10PM +0200, Jens Axboe wrote:
 On 2014-07-10 15:44, Benjamin LaHaise wrote:
 On Thu, Jul 10, 2014 at 03:39:57PM +0200, Jens Axboe wrote:
 That's how fio always runs, it sets up the context with the exact queue
 depth that it needs. Do we have a good enough understanding of other aio
 use cases to say that this isn't the norm? I would expect it to be, it's
 the way that the API would most obviously be used.
 
 The problem with this approach is that it works very poorly with per cpu
 reference counting's batching of references, which is pretty much a
 requirement now that many core systems are the norm.  Allocating the bare
 minimum is not the right thing to do today.  That said, the default limits
 on the number of requests probably need to be raised.
 
 Sorry, that's a complete cop-out. Then you handle this internally, 
 allocate a bigger pool and cap the limit if you need to. Look at the 
 API. You pass in the number of requests you will use. Do you expect 
 anyone to double up, just in case? Will never happen.
 
 But all of this is side stepping the point that there's a real bug 
 reported here. The above could potentially explain "it's using X
 more CPU" or "it's Y slower". The above is a softlock; it never completes.

I'm not trying to cop out on this -- I'm asking for a data point to see 
if changing the request limits has any effect.

-ben

 -- 
 Jens Axboe

-- 
Thought is the essence of where you are now.


Re: scsi-mq V2

2014-07-10 Thread Benjamin LaHaise
On Thu, Jul 10, 2014 at 02:36:40PM +, Elliott, Robert (Server Storage) 
wrote:
 
 
  -Original Message-
  From: Jens Axboe [mailto:ax...@kernel.dk]
  Sent: Thursday, 10 July, 2014 8:53 AM
  To: Christoph Hellwig; Benjamin LaHaise
  Cc: Elliott, Robert (Server Storage); dgilb...@interlog.com; James 
  Bottomley;
  Bart Van Assche; linux-scsi@vger.kernel.org; linux-ker...@vger.kernel.org
  Subject: Re: scsi-mq V2
  
  On 2014-07-10 15:50, Christoph Hellwig wrote:
   On Thu, Jul 10, 2014 at 09:36:09AM -0400, Benjamin LaHaise wrote:
   There is one possible concern that could be exacerbated by other changes
  in
   the system: if the application is running close to the bare minimum 
   number
   of requests allocated in io_setup(), the per cpu reference counters will
   have a hard time batching things.  It might be worth testing with an
   increased number of requests being allocated if this is the case.
  
   Well, Robert said reverting the two aio commits didn't help.  Either he
   didn't manage to boot into the right kernel, or we need to look
   elsewhere for the culprit.
  
  Rob, let me know what scsi_debug setup you use, and I can try and
  reproduce it here as well.
  
  --
  Jens Axboe
 
 This system has 6 online CPUs and 64 possible CPUs.
 
 Printing avail and req_batch in that loop results in many of these:
 ** 3813 printk messages dropped ** [10643.503772] ctx 88042d8d4cc0 
 avail=0 req_batch=2
 
 Adding CFLAGS_aio.o := -DDEBUG to the Makefile to enable
 those pr_debug prints results in nothing extra printing,
 so it's not hitting an error.
 
 Printing nr_events and aio_max_nr at the top of ioctx_alloc results in
 these as fio starts:
 
 [  186.339064] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.339065] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.339067] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.339068] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.339069] ioctx_alloc: nr_events=-2 aio_max_nr=65536

Something is horribly wrong here.  There is no way that value for nr_events 
should be passed in to ioctx_alloc().  This implies that userland is calling 
io_setup() with an impossibly large value for nr_events.  Can you post the 
actual diff for your fs/aio.c relative to Linus' tree?

-ben


 [  186.339070] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.339071] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.339071] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.339074] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.339076] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.339076] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.359772] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.359971] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.359972] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.359985] sd 5:0:3:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.359986] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.359987] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.359995] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.359995] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.359998] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.359998] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.362529] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.362529] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.363510] sd 5:0:1:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.363513] sd 5:0:4:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.363520] sd 5:0:2:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.363521] sd 5:0:3:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.398113] sd 5:0:5:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.398115] sd 5:0:1:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.398121] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.398122] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.398124] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.398124] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.398130] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.398131] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.398164] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.398165] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.398499] sd 5:0:4:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.400489] sd 5:0:1:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.401478] sd 5:0:1:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.401491] sd 5:0:3:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.434522] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.434523] sd 5:0:3:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.434526] sd 5:0:5:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.434533] sd 5:0:0:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]
 [  186.435370] hrtimer: interrupt took 6868 ns
 [  186.435491] ioctx_alloc: nr_events=-2 aio_max_nr=65536
 [  186.435492] ioctx_alloc: nr_events=512 aio_max_nr=65536
 [  186.447864] sd 5:0:0:0: scsi_debug_ioctl: BLKFLSBUF [0x1261]

Re: [RFC PATCH 0/5] userspace PI passthrough via AIO/DIO

2014-03-21 Thread Benjamin LaHaise
Hi folks,

On Fri, Mar 21, 2014 at 11:23:32AM -0700, Zach Brown wrote:
 On Thu, Mar 20, 2014 at 09:30:41PM -0700, Darrick J. Wong wrote:
  This RFC provides a rough implementation of a mechanism to allow
  userspace to attach protection information (e.g. T10 DIF) data to a
  disk write and to receive the information alongside a disk read.  The
  interface is an extension to the AIO interface: two new commands
  (IOCB_CMD_P{READ,WRITE}VM) are provided.  The last struct iovec in the
  arg list is interpreted to point to a buffer containing a header,
 followed by the PI data.
 
 Instead of adding commands that indicate that the final element is a
 magical pi buffer, why not expand the iocb?
 
 In the user iocb, a bit in aio_flags could indicate that aio_reserved2
 is a pointer to an extension of the iocb.  In that extension could be a
 full iov *, nr_segs for PI data.

I'm inclined to agree with Zach on this item.  Ultimately, we need an 
extensible data structure that can be grown without completely revising 
the ABI as new parameters are added.  We need something that is either 
TLV based, or an extensible array.

 You'd then translate that into a bigger kernel kiocb with a specific
 pointer to PI data rather than having to bubble the tests for this magic
 final iovec down through the kernel.
 
 +	if (iocb->ki_flags & KIOCB_USE_PI) {
 +		nr_segs--;
 +		pi_iov = (struct iovec *)(iov + nr_segs);
 +	}
 
 I suggest this because there's already pressure to extend the iocb.
 Folks want io priority inputs, completion time outputs, etc.

There are already folks at other companies looking at similar extensions.  
I think there are folks at Google who have similar requirements.

Do you have time to put in some effort into defining these extensions?

-ben

 It's a much cleaner way to extend the interface without an explosion of
 command enums that are really combinations of per-io arguments that are
 present or not.
 
 And heck, on the sync rw syscall side, add variant that have a pointer
 to this same extension struct.  There's nothing inherently aio specific
 about having lots more per-io inputs and outputs.
 
 - z

-- 
Thought is the essence of where you are now.


sysfs makes scaling suck Re: Asynchronous scsi scanning

2007-05-17 Thread Benjamin LaHaise
On Wed, May 16, 2007 at 04:57:52AM +0530, Satyam Sharma wrote:
 
 echo 1 > /sys/module/scsi_mod/.../wait_for_async_scans
 
 somewhere in some script. In fact, the latter method seems simpler,
 saner, better (in every which way)!

Please don't force sysfs on people.  Just watch how it keels over and dies 
when you end up with lots of disks/network interfaces on reasonably sized 
boxes.  16 statically allocated files per network interface starts being 
a problem once you've got a few thousand interfaces.  Anything that forces 
me to use sysfs is not gonna fly.

-ben
-- 
Time is of no importance, Mr. President, only life is important.
Don't Email: [EMAIL PROTECTED].


Re: sysfs makes scaling suck Re: Asynchronous scsi scanning

2007-05-17 Thread Benjamin LaHaise
On Thu, May 17, 2007 at 01:45:24PM -0400, James Bottomley wrote:
 But also, the sysfs with over 4,000 (and higher) devices was
 specifically checked by OSDL (actually as part of the CGL testing) some
 of the Manoj changes (for unpinning entries etc) were needed to get it
 to function, but as of now, I believe an enterprise scaling test works
 reasonably well for it ... there certainly wasn't any evidence of it
 dying horribly in the tests.

i386 exhausts lowmem very quickly.  SCSI is in a bit better shape than 
network devices as the multiplier is only around 4 compared to 16 for network 
devices.

My point stands, though.  Forcing every new feature to go in via sysfs is 
not the right thing to do.  Some people don't/can't use it, please remember 
them.

-ben
-- 
Time is of no importance, Mr. President, only life is important.
Don't Email: [EMAIL PROTECTED].


Re: Asynchronous scsi scanning

2007-05-17 Thread Benjamin LaHaise
On Thu, May 17, 2007 at 01:39:54PM -0600, Matthew Wilcox wrote:
 On Fri, May 18, 2007 at 12:34:40AM +0530, Satyam Sharma wrote:
  Hmmm, actually those other users could easily write and maintain
  a 20-line patch that does the wait for async scans thing for them
  using /proc/scsi/scsi in any case.
 
 How about the three users who're bothered by this extra module being
 built maintain a one-line patch to Kconfig and leave well enough alone?

The module has an added bonus that it doesn't require any new tools to 
make work.  Doing it via sysfs/procfs means a new rev of whatever tool 
generates the boot initrd, plus fixing up boot scripts.  Loading a module 
can be done via a simple option to the existing boot tools.

-ben
-- 
Time is of no importance, Mr. President, only life is important.
Don't Email: [EMAIL PROTECTED].


Re: [PATCH 2.6.13 1/20] aic94xx: Makefile

2005-09-09 Thread Benjamin LaHaise
A single file per patch is not really a properly split-up patch series.  Patches 
should stand on their own, leaving the tree in a compilable, functioning state 
at each step.

-ben