Re: [Cluster-devel] [PATCH v1 00/11] locks: scalability improvements for file locking

2013-06-04 Thread Jeff Layton
On Tue, 4 Jun 2013 07:56:44 -0400
Jim Rees  wrote:

> Jeff Layton wrote:
> 
>   > Might be nice to look at some profiles to confirm all of that.  I'd also
>   > be curious how much variation there was in the results above, as they're
>   > pretty close.
>   > 
>   
>   The above is just a random representative sample. The results are
>   pretty close when running this test, but I can average up several runs
>   and present the numbers. I plan to get a bare-metal test box on which
>   to run some more detailed testing and maybe some profiling this week.
> 
> Just averaging in more runs doesn't tell us anything about the variance.
> With numbers that close, you need the variance to tell whether the change
> is significant.

Thanks. I'll see if I can get some standard deviation numbers here, and
I'll do it on some bare metal to ensure that virtualization doesn't
skew any results.

FWIW, they were all consistently very close to one another when I ran
these tests, and the times were all consistently shorter than the
unpatched kernel.

That said, this test is pretty rough. Doing this with "time" measures
other things that aren't related to locking, so I'll also see if I can
come up with a way to measure the actual locking performance more
accurately.
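
One way to take "time" out of the picture is to measure only the lock/unlock
calls themselves. A minimal userspace sketch of that idea follows; the file
name, iteration count and the use of CLOCK_MONOTONIC are illustrative
assumptions, not anything from the actual test programs (link with -lrt on
older glibc):

/* Sketch: time only the fcntl() lock/unlock cycle, not the whole process.
 * Path and iteration count are made up for illustration.
 */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>

#define ITERS	100000

int main(void)
{
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	struct timespec a, b;
	int i, fd = open("/tmp/locktest", O_RDWR | O_CREAT, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (i = 0; i < ITERS; i++) {
		fl.l_type = F_WRLCK;
		fcntl(fd, F_SETLKW, &fl);	/* acquire */
		fl.l_type = F_UNLCK;
		fcntl(fd, F_SETLK, &fl);	/* release */
	}
	clock_gettime(CLOCK_MONOTONIC, &b);

	printf("%.1f ns per lock/unlock cycle\n",
	       ((b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec)) / ITERS);
	return 0;
}

Dividing by the iteration count gives a per-cycle cost that averages cleanly
across runs, which also makes the variance question easier to answer.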

-- 
Jeff Layton 



Re: [Cluster-devel] [PATCH v1 00/11] locks: scalability improvements for file locking

2013-06-04 Thread Davidlohr Bueso
On Fri, 2013-05-31 at 23:07 -0400, Jeff Layton wrote:
> This is not the first attempt at doing this. The conversion to the
> i_lock was originally attempted by Bruce Fields a few years ago. His
> approach was NAK'ed since it involved ripping out the deadlock
> detection. People also really seem to like /proc/locks for debugging, so
> keeping that in is probably worthwhile.

Yep, we need to keep this. FWIW, lslocks(8) relies on /proc/locks.

Thanks,
Davidlohr
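
For reference, /proc/locks is plain text with one lock per line, and that is
what lslocks(8) ultimately parses. A trivial reader is sketched below; the
field summary in the comment is from memory and worth checking against the
kernel's procfs documentation:

/* Sketch: dump /proc/locks, the interface lslocks(8) parses.
 * Each line is roughly: ordinal, class (POSIX/FLOCK/LEASE),
 * ADVISORY/MANDATORY, READ/WRITE, pid, maj:min:inode, start, end
 * (field list from memory; check the kernel's procfs docs).
 */
#include <stdio.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/locks", "r");

	if (!f) {
		perror("/proc/locks");
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}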



Re: [Cluster-devel] [PATCH v1 00/11] locks: scalability improvements for file locking

2013-06-04 Thread Jim Rees
Jeff Layton wrote:

  > Might be nice to look at some profiles to confirm all of that.  I'd also
  > be curious how much variation there was in the results above, as they're
  > pretty close.
  > 
  
  The above is just a random representative sample. The results are
  pretty close when running this test, but I can average up several runs
  and present the numbers. I plan to get a bare-metal test box on which
  to run some more detailed testing and maybe some profiling this week.

Just averaging in more runs doesn't tell us anything about the variance.
With numbers that close, you need the variance to tell whether the change
is significant.
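
For numbers this close, the sample standard deviation across runs is the
quantity to report. A small helper along these lines would do; the run times
in main() are illustrative placeholders, not measured data (link with -lm):

/* Sketch: sample mean and standard deviation over N benchmark runs. */
#include <math.h>
#include <stdio.h>

static void mean_stddev(const double *t, int n, double *mean, double *sd)
{
	double sum = 0.0, sq = 0.0;
	int i;

	for (i = 0; i < n; i++)
		sum += t[i];
	*mean = sum / n;

	for (i = 0; i < n; i++)
		sq += (t[i] - *mean) * (t[i] - *mean);
	*sd = sqrt(sq / (n - 1));	/* sample (n-1) variance */
}

int main(void)
{
	/* Illustrative wall-clock times in seconds; not measured data. */
	double runs[] = { 5.28, 5.11, 5.19, 5.31, 5.07 };
	double m, sd;

	mean_stddev(runs, 5, &m, &sd);
	printf("mean %.3fs, stddev %.3fs\n", m, sd);
	return 0;
}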



Re: [Cluster-devel] [PATCH v1 00/11] locks: scalability improvements for file locking

2013-06-04 Thread Jeff Layton
On Mon, 3 Jun 2013 17:31:01 -0400
"J. Bruce Fields"  wrote:

> On Fri, May 31, 2013 at 11:07:23PM -0400, Jeff Layton wrote:
> > Executive summary (tl;dr version): This patchset represents an overhaul
> > of the file locking code with an aim toward improving its scalability
> > and making the code a bit easier to understand.
> 
> Thanks for working on this, that code could use some love!
> 
> > Longer version:
> > 
> > When the BKL was finally ripped out of the kernel in 2010, the strategy
> > taken for the file locking code was to simply turn it into a new
> > file_lock_lock spinlock. It was an expedient way to deal with the file
> > locking code at the time, but having a giant spinlock around all of this
> > code is clearly not great for scalability. Red Hat has bug reports that
> > go back into the 2.6.18 era that point to BKL scalability problems in
> > the file locking code, and the file_lock_lock suffers from the same
> > issues.
> > 
> > This patchset is my first attempt to make this code less dependent on
> > global locking. The main change is to switch most of the file locking
> > code to be protected by the inode->i_lock instead of the file_lock_lock.
> > 
> > While that works for most things, there are a couple of global data
> > structures (lists in the current code) that need a global lock to
> > protect them. So we still need a global lock in order to deal with
> > those. The remaining patches are intended to make that global locking
> > less painful. The big gain is made by turning the blocked_list into a
> > hashtable, which greatly speeds up the deadlock detection code.
> > 
> > I rolled a couple of small programs in order to test this code. The
> > first one just forks off 128 children and has them lock and unlock the
> > same file 10k times. Running this under "time" against a file on tmpfs
> > gives typical values like this:
> 
> What kind of hardware was this?
> 

Mostly a KVM guest on Intel hardware. I've done some testing on bare
metal too, and the results are pretty similar.
 
> > 
> > Unpatched (3.10-rc3-ish):
> > real    0m5.283s
> > user    0m0.380s
> > sys     0m20.469s
> > 
> > Patched (same base kernel):
> > real    0m5.099s
> > user    0m0.478s
> > sys     0m19.662s
> > 
> > ...so there seems to be some modest performance gain in this test. I
> > think that's almost entirely due to the conversion to a hashtable and to
> > optimizing how blocked locks are removed from and re-added to the global
> > lists. Note that with this code we have to take two spinlocks instead of
> > just one, and that has some performance impact too, so the real
> > performance gain from the hashtable conversion is eaten up to some
> > degree by this.
> 
> Might be nice to look at some profiles to confirm all of that.  I'd also
> be curious how much variation there was in the results above, as they're
> pretty close.
> 

The above is just a random representative sample. The results are
pretty close when running this test, but I can average up several runs
and present the numbers. I plan to get a bare-metal test box on which
to run some more detailed testing and maybe some profiling this week.
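
For reference, the contended test described above (128 children hammering
lock/unlock on one file) has roughly the following shape. This is only a
sketch with made-up paths and counts, not the actual program behind the
numbers quoted above; the second, uncontended test would differ only in each
child opening its own file:

/* Sketch: N children lock/unlock the same file M times (contended case).
 * Illustrative path and counts; not the program that produced the posted
 * numbers. Run under time(1) against a file on tmpfs, as in the tests.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define NCHILDREN	128
#define NITERS		10000

static void child(const char *path)
{
	struct flock fl = { .l_whence = SEEK_SET };
	int i, fd = open(path, O_RDWR);

	if (fd < 0)
		exit(1);

	for (i = 0; i < NITERS; i++) {
		fl.l_type = F_WRLCK;
		fcntl(fd, F_SETLKW, &fl);	/* blocks until granted */
		fl.l_type = F_UNLCK;
		fcntl(fd, F_SETLK, &fl);
	}
	exit(0);
}

int main(void)
{
	const char *path = "/tmp/locktest";	/* put this on tmpfs */
	int i, fd = open(path, O_RDWR | O_CREAT, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	close(fd);

	for (i = 0; i < NCHILDREN; i++)
		if (fork() == 0)
			child(path);	/* uncontended test: per-child path here */

	for (i = 0; i < NCHILDREN; i++)
		wait(NULL);
	return 0;
}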

> > The next test just forks off a bunch of children that each create their
> > own file and then lock and unlock it 20k times. Obviously, the locks in
> > this case are uncontended. Running that under "time" typically gives
> > these rough numbers.
> > 
> > Unpatched (3.10-rc3-ish):
> > real    0m8.836s
> > user    0m1.018s
> > sys     0m34.094s
> > 
> > Patched (same base kernel):
> > real    0m4.965s
> > user    0m1.043s
> > sys     0m18.651s
> > 
> > This test shows the real benefit of moving to the i_lock for most of
> > this code: the run time is almost cut in half. With these changes,
> > locking different inodes requires very little serialization.
> > 
> > If people know of other file locking performance tests, then I'd be
> > happy to try them out too. It's possible that this might make some
> > workloads slower, and it would be helpful to know what they are (and
> > address them) if so.
> > 
> > This is not the first attempt at doing this. The conversion to the
> > i_lock was originally attempted by Bruce Fields a few years ago. His
> > approach was NAK'ed since it involved ripping out the deadlock
> > detection. People also really seem to like /proc/locks for debugging, so
> > keeping that in is probably worthwhile.
> 
> Yes, there's already code that depends on it.
> 
> The deadlock detection, though--I still wonder if we could get away with
> ripping it out.  Might be worth at least giving an option to configure
> it out as a first step.
> 
> --b.
> 


I considered that, and have patches that add such a config option. Some
of the later patches in this set optimize the code needed to support
deadlock detection. In particular, turning the blocked_list into a
hashtable speeds up the list walking so much that I think the case for
compiling it out is less obvious.
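
To make the hashtable point concrete: with fl_link converted to an
hlist_node (as a later patch in the series does), a blocked-waiter table
keyed on the lock owner can be declared with linux/hashtable.h roughly as
below. The table size, key choice and helper names are assumptions for
illustration, not the code from the patch:

/* Sketch (kernel side): hash blocked waiters by lock owner so deadlock
 * detection can ask "what is this owner blocked on?" without walking
 * every blocked lock in the system. Assumes fl_link is an hlist_node.
 */
#include <linux/fs.h>
#include <linux/hashtable.h>

#define BLOCKED_HASH_BITS	7	/* 128 buckets, size is illustrative */
static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);

static void hash_blocked(struct file_lock *waiter)
{
	hash_add(blocked_hash, &waiter->fl_link,
		 (unsigned long)waiter->fl_owner);
}

static struct file_lock *find_blocked_by_owner(fl_owner_t owner)
{
	struct file_lock *fl;

	/* Only the bucket for this owner is walked, not the whole list. */
	hash_for_each_possible(blocked_hash, fl, fl_link, (unsigned long)owner)
		if (fl->fl_owner == owner)
			return fl;
	return NULL;
}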

Should we a

Re: [Cluster-devel] [PATCH v1 00/11] locks: scalability improvements for file locking

2013-06-03 Thread J. Bruce Fields
On Fri, May 31, 2013 at 11:07:23PM -0400, Jeff Layton wrote:
> Executive summary (tl;dr version): This patchset represents an overhaul
> of the file locking code with an aim toward improving its scalability
> and making the code a bit easier to understand.

Thanks for working on this, that code could use some love!

> Longer version:
> 
> When the BKL was finally ripped out of the kernel in 2010, the strategy
> taken for the file locking code was to simply turn it into a new
> file_lock_lock spinlock. It was an expedient way to deal with the file
> locking code at the time, but having a giant spinlock around all of this
> code is clearly not great for scalability. Red Hat has bug reports that
> go back into the 2.6.18 era that point to BKL scalability problems in
> the file locking code, and the file_lock_lock suffers from the same
> issues.
> 
> This patchset is my first attempt to make this code less dependent on
> global locking. The main change is to switch most of the file locking
> code to be protected by the inode->i_lock instead of the file_lock_lock.
> 
> While that works for most things, there are a couple of global data
> structures (lists in the current code) that need a global lock to
> protect them. So we still need a global lock in order to deal with
> those. The remaining patches are intended to make that global locking
> less painful. The big gain is made by turning the blocked_list into a
> hashtable, which greatly speeds up the deadlock detection code.
> 
> I rolled a couple of small programs in order to test this code. The
> first one just forks off 128 children and has them lock and unlock the
> same file 10k times. Running this under "time" against a file on tmpfs
> gives typical values like this:

What kind of hardware was this?

> 
> Unpatched (3.10-rc3-ish):
> real  0m5.283s
> user  0m0.380s
> sys   0m20.469s
> 
> Patched (same base kernel):
> real  0m5.099s
> user  0m0.478s
> sys   0m19.662s
> 
> ...so there seems to be some modest performance gain in this test. I
> think that's almost entirely due to the conversion to a hashtable and to
> optimizing how blocked locks are removed from and re-added to the global
> lists. Note that with this code we have to take two spinlocks instead of
> just one, and that has some performance impact too, so the real
> performance gain from the hashtable conversion is eaten up to some
> degree by this.

Might be nice to look at some profiles to confirm all of that.  I'd also
be curious how much variation there was in the results above, as they're
pretty close.

> The next test just forks off a bunch of children that each create their
> own file and then lock and unlock it 20k times. Obviously, the locks in
> this case are uncontended. Running that under "time" typically gives
> these rough numbers.
> 
> Unpatched (3.10-rc3-ish):
> real  0m8.836s
> user  0m1.018s
> sys   0m34.094s
> 
> Patched (same base kernel):
> real  0m4.965s
> user  0m1.043s
> sys   0m18.651s
> 
> This test shows the real benefit of moving to the i_lock for most of
> this code: the run time is almost cut in half. With these changes,
> locking different inodes requires very little serialization.
> 
> If people know of other file locking performance tests, then I'd be
> happy to try them out too. It's possible that this might make some
> workloads slower, and it would be helpful to know what they are (and
> address them) if so.
> 
> This is not the first attempt at doing this. The conversion to the
> i_lock was originally attempted by Bruce Fields a few years ago. His
> approach was NAK'ed since it involved ripping out the deadlock
> detection. People also really seem to like /proc/locks for debugging, so
> keeping that in is probably worthwhile.

Yes, there's already code that depends on it.

The deadlock detection, though--I still wonder if we could get away with
ripping it out.  Might be worth at least giving an option to configure
it out as a first step.
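
If deadlock detection did become configurable, the runtime side could be as
small as a wrapper that compiles down to "no deadlock" when the option is
off. A sketch, using a hypothetical Kconfig symbol and a made-up wrapper
name around the existing posix_locks_deadlock() checker in fs/locks.c:

/* Sketch, as it might appear in fs/locks.c.
 * CONFIG_FILE_LOCK_DEADLOCK_DETECTION and locks_check_deadlock() are
 * hypothetical names; posix_locks_deadlock() is the existing checker.
 */
#ifdef CONFIG_FILE_LOCK_DEADLOCK_DETECTION
static int locks_check_deadlock(struct file_lock *request,
				struct file_lock *block_fl)
{
	return posix_locks_deadlock(request, block_fl);
}
#else
static inline int locks_check_deadlock(struct file_lock *request,
				       struct file_lock *block_fl)
{
	return 0;	/* detection compiled out: never report -EDEADLK */
}
#endif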

--b.

> There's more work to be done in this area and this patchset is just a
> start. There's a horrible thundering herd problem when a blocking lock
> is released, for instance. There was also interest in solving the goofy
> "unlock on any close" POSIX lock semantics at this year's LSF. I think
> this patchset will help lay the groundwork for those changes as well.
> 
> Comments and suggestions welcome.
> 
> Jeff Layton (11):
>   cifs: use posix_unblock_lock instead of locks_delete_block
>   locks: make generic_add_lease and generic_delete_lease static
>   locks: comment cleanups and clarifications
>   locks: make "added" in __posix_lock_file a bool
>   locks: encapsulate the fl_link list handling
>   locks: convert to i_lock to protect i_flock list
>   locks: only pull entries off of blocked_list when they are really
> unblocked
>   locks: convert fl_link to a hlist_node
>   locks: turn the blocked_list into a hashtable
>   locks: add a new "lm_owner_key" lock operation
>   locks: give the blocked_hash its own spinlock
> 
>
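
As a footnote to the series list above, the central change ("locks: convert
to i_lock to protect i_flock list") means that walking a given inode's
i_flock list is serialized on that inode's spinlock rather than on a global
lock. In outline, the pattern looks like the sketch below; this shows only
the locking scheme, with a made-up helper name, not code from the patch:

/* Sketch: per-inode locking of the i_flock list after the conversion.
 * Before the series this walk would have been under the global
 * file_lock_lock; afterwards only locks on *this* inode are serialized.
 */
#include <linux/fs.h>
#include <linux/spinlock.h>

static bool inode_has_posix_lock(struct inode *inode)
{
	struct file_lock *fl;
	bool found = false;

	spin_lock(&inode->i_lock);
	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
		if (IS_POSIX(fl)) {
			found = true;
			break;
		}
	}
	spin_unlock(&inode->i_lock);

	return found;
}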