Re: [PATCH] libceph: ceph_pagelist_append might sleep while atomic

2013-05-14 Thread Jim Schutt
On 05/14/2013 10:44 AM, Alex Elder wrote:
> On 05/09/2013 09:42 AM, Jim Schutt wrote:
>> Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc while
>> holding a lock, but it's spoiled because ceph_pagelist_addpage() always
>> calls kmap(), which might sleep.  Here's the result:
> 
> I finally took a close look at this today, Jim.  Sorry
> for the delay.
> 

No worries - thanks for taking a look.

> The issue is formatting the reconnect message--which can hold
> an arbitrary amount of data and therefore requires some
> allocation (and kmap)--while the flock spinlock has to be held.
> 
> And as you found, ceph_pagelist_addpage(), which is called
> by ceph_pagelist_append(), calls kmap() even if it doesn't
> need to allocate anything.  So even though the pages were
> reserved ahead of time on the free list, they still have to
> be kmap()'d, and that preallocation doesn't keep us from
> sleeping.
> 
> Your solution was to pre-allocate a buffer, format the locks
> into that buffer while holding the lock, then append the
> buffer contents to a pagelist after releasing the lock.  You
> check for a changing (increasing) lock count while you format
> the locks, which is good.
> 
> So...  Given that, I think your change looks good.  It's a shame
> we can't format directly into the pagelist buffer but this won't
> happen much so it's not a big deal.  I have a few small suggestions,
> below.
> 
> I do find some byte order bugs though.   They aren't your doing,
> but I think they ought to be fixed first, as a separate patch
> that would precede this one.  The bug is that the lock counts
> that are put into the buffer (num_fcntl_locks and num_flock_locks)
> are not properly byte-swapped.  I'll point it out inline
> in your code, below.
> 
> I'll say that what you have is OK.  Consider my suggestions, and
> if you choose not to fix the byte order bugs, please let me know.

I'll happily fix up a v2 series with your suggestions addressed.
Thanks for catching those issues.  Stay tuned...

Thanks -- Jim


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] libceph: ceph_pagelist_append might sleep while atomic

2013-05-14 Thread Alex Elder
On 05/09/2013 09:42 AM, Jim Schutt wrote:
> Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc while
> holding a lock, but it's spoiled because ceph_pagelist_addpage() always
> calls kmap(), which might sleep.  Here's the result:

I finally took a close look at this today, Jim.  Sorry
for the delay.

The issue is formatting the reconnect message--which can hold
an arbitrary amount of data and therefore requires some
allocation (and kmap)--while the flock spinlock has to be held.

And as you found, ceph_pagelist_addpage(), which is called
by ceph_pagelist_append(), calls kmap() even if it doesn't
need to allocate anything.  So even though the pages were
reserved ahead of time on the free list, they still have to
be kmap()'d, and that preallocation doesn't keep us from
sleeping.

Your solution was to pre-allocate a buffer, format the locks
into that buffer while holding the lock, then append the
buffer contents to a pagelist after releasing the lock.  You
check for a changing (increasing) lock count while you format
the locks, which is good.
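
For anyone following along, the resulting shape is roughly this (my
shorthand, not the patch text; it assumes the flocks array was sized
and allocated outside the spinlock, and error handling is trimmed):

        lock_flocks();
        err = ceph_encode_locks_to_buffer(inode, flocks,
                                          num_fcntl_locks,
                                          num_flock_locks);
        unlock_flocks();

        /* No spinlock held here, so the kmap() buried in the
         * pagelist code is free to sleep. */
        if (!err)
                err = ceph_locks_to_pagelist(flocks, pagelist,
                                             num_fcntl_locks,
                                             num_flock_locks);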

So...  Given that, I think your change looks good.  It's a shame
we can't format directly into the pagelist buffer but this won't
happen much so it's not a big deal.  I have a few small suggestions,
below.

I do find some byte order bugs though.   They aren't your doing,
but I think they ought to be fixed first, as a separate patch
that would precede this one.  The bug is that the lock counts
that are put into the buffer (num_fcntl_locks and num_flock_locks)
are not properly byte-swapped.  I'll point it out inline
in your code, below.

I'll say that what you have is OK.  Consider my suggestions, and
if you choose not to fix the byte order bugs, please let me know.

Reviewed-by: Alex Elder <el...@inktank.com>


> [13439.295457] ceph: mds0 reconnect start
> [13439.300572] BUG: sleeping function called from invalid context at 
> include/linux/highmem.h:58
> [13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: 
> kworker/1:1
> [13439.316464] 5 locks held by kworker/1:1/12059:
> [13439.320998]  #0:  (ceph-msgr){..}, at: [] 
> process_one_work+0x218/0x480
> [13439.329701]  #1:  ((&(&con->work)->work)){..}, at: 
> [] process_one_work+0x218/0x480
> [13439.339446]  #2:  (&s->s_mutex){..}, at: [] 
> send_mds_reconnect+0xec/0x450 [ceph]
> [13439.349081]  #3:  (&mdsc->snap_rwsem){..}, at: [] 
> send_mds_reconnect+0x16e/0x450 [ceph]
> [13439.359278]  #4:  (file_lock_lock){..}, at: [] 
> lock_flocks+0x15/0x20
> [13439.367816] Pid: 12059, comm: kworker/1:1 Tainted: GW
> 3.9.0-00358-g308ae61 #557
> [13439.376225] Call Trace:
> [13439.378757]  [] __might_sleep+0xfc/0x110

. . .

> [13501.300419] ceph: mds0 caps renewed
> 
> Fix it up by encoding locks into a buffer first, and when the
> number of encoded locks is stable, copy that into a ceph_pagelist.
> 
> Signed-off-by: Jim Schutt <jasc...@sandia.gov>
> ---
>  fs/ceph/locks.c  |   73 +
>  fs/ceph/mds_client.c |   62 ++
>  fs/ceph/super.h  |9 +-
>  3 files changed, 88 insertions(+), 56 deletions(-)
> 
> diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
> index 202dd3d..9a46161 100644
> --- a/fs/ceph/locks.c
> +++ b/fs/ceph/locks.c

Unrelated, but I noticed that the comment above
ceph_count_locks() is out of date ("BKL").  Maybe
you could fix that.
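
Something like this, perhaps (just a suggested wording, going by how
the function is called in this patch):

        /*
         * Count the POSIX (fcntl) and flock locks on the given inode.
         * Must be called with lock_flocks() already held.
         */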

. . .

> @@ -239,12 +228,48 @@ int ceph_encode_locks(struct inode *inode, struct 
> ceph_pagelist *pagelist,
>   err = -ENOSPC;
>   goto fail;
>   }
> - err = lock_to_ceph_filelock(lock, &cephlock);
> + err = lock_to_ceph_filelock(lock, &flocks[l]);
>   if (err)
>   goto fail;
> - err = ceph_pagelist_append(pagelist, &cephlock,
> -sizeof(struct ceph_filelock));
> + ++l;
>   }
> + }
> +fail:
> + return err;
> +}
> +
> +/**
> + * Copy the encoded flock and fcntl locks into the pagelist.
> + * Format is: #fcntl locks, sequential fcntl locks, #flock locks,
> + * sequential flock locks.
> + * Returns zero on success.
> + */
> +int ceph_locks_to_pagelist(struct ceph_filelock *flocks,
> +struct ceph_pagelist *pagelist,
> +int num_fcntl_locks, int num_flock_locks)
> +{
> + int err = 0;
> + int l;
> +
> + err = ceph_pagelist_append(pagelist, &num_fcntl_locks, sizeof(u32));

This is a bug, but I realize you're preserving the existing
functionality.  The fcntl lock count should be converted to
le32 for over-the-wire format.  (I haven't checked the other
end--if it's not expecting le32, it's got a bug too.)
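
Something along these lines would do it (a sketch only -- "nlocks" is
just a scratch variable here -- and num_flock_locks further down needs
the same treatment):

        __le32 nlocks;

        nlocks = cpu_to_le32(num_fcntl_locks);
        err = ceph_pagelist_append(pagelist, &nlocks, sizeof(nlocks));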

> + if (err)
> + goto fail;
> +
> + for (l = 0; l < num_fcntl_locks; l++) {
> + err = 


[PATCH] libceph: ceph_pagelist_append might sleep while atomic

2013-05-09 Thread Jim Schutt
Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc while
holding a lock, but it's spoiled because ceph_pagelist_addpage() always
calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at 
include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
[13439.316464] 5 locks held by kworker/1:1/12059:
[13439.320998]  #0:  (ceph-msgr){..}, at: [] 
process_one_work+0x218/0x480
[13439.329701]  #1:  ((&(&con->work)->work)){..}, at: [] 
process_one_work+0x218/0x480
[13439.339446]  #2:  (&s->s_mutex){..}, at: [] 
send_mds_reconnect+0xec/0x450 [ceph]
[13439.349081]  #3:  (&mdsc->snap_rwsem){..}, at: [] 
send_mds_reconnect+0x16e/0x450 [ceph]
[13439.359278]  #4:  (file_lock_lock){..}, at: [] 
lock_flocks+0x15/0x20
[13439.367816] Pid: 12059, comm: kworker/1:1 Tainted: GW
3.9.0-00358-g308ae61 #557
[13439.376225] Call Trace:
[13439.378757]  [] __might_sleep+0xfc/0x110
[13439.384353]  [] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [] ? lock_flocks+0x15/0x20
[13439.409277]  [] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [] ? igrab+0x28/0x70
[13439.420610]  [] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [] ? mutex_unlock+0xe/0x10
[13439.473934]  [] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [] ? read_partial_message+0x3e8/0x520 
[libceph]
[13439.500583]  [] ? kernel_recvmsg+0x44/0x60
[13439.506324]  [] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
[13439.513140]  [] try_read+0x5fe/0x7e0 [libceph]
[13439.519246]  [] con_work+0x378/0x4a0 [libceph]
[13439.525345]  [] ? finish_task_switch+0x3f/0x110
[13439.531515]  [] process_one_work+0x2b5/0x480
[13439.537439]  [] ? process_one_work+0x218/0x480
[13439.543526]  [] worker_thread+0x1f5/0x320
[13439.549191]  [] ? manage_workers+0x170/0x170
[13439.555102]  [] kthread+0xe1/0xf0
[13439.560075]  [] ? __init_kthread_worker+0x70/0x70
[13439.566419]  [] ret_from_fork+0x7c/0xb0
[13439.571918]  [] ? __init_kthread_worker+0x70/0x70
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the
number of encoded locks is stable, copy that into a ceph_pagelist.
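
The caller side lives in encode_caps_cb() in fs/ceph/mds_client.c;
roughly (this is only a sketch -- labels and the cleanup path are
simplified -- but it shows the count/allocate/encode/copy ordering):

        int num_fcntl_locks, num_flock_locks;
        struct ceph_filelock *flocks;

encode_again:
        /* Count with the spinlock held, but allocate without it. */
        lock_flocks();
        ceph_count_locks(inode, &num_fcntl_locks, &num_flock_locks);
        unlock_flocks();

        flocks = kmalloc((num_fcntl_locks + num_flock_locks) *
                         sizeof(struct ceph_filelock), GFP_NOFS);
        if (!flocks) {
                err = -ENOMEM;
                goto out;
        }

        /* Encode into the flat buffer while atomic; no kmap() here. */
        lock_flocks();
        err = ceph_encode_locks_to_buffer(inode, flocks,
                                          num_fcntl_locks,
                                          num_flock_locks);
        unlock_flocks();
        if (err) {
                kfree(flocks);
                if (err == -ENOSPC)     /* more locks appeared; recount */
                        goto encode_again;
                goto out;
        }

        /* The count is stable; it is now safe to sleep in kmap(). */
        err = ceph_locks_to_pagelist(flocks, pagelist,
                                     num_fcntl_locks, num_flock_locks);
        kfree(flocks);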

Signed-off-by: Jim Schutt <jasc...@sandia.gov>
---
 fs/ceph/locks.c  |   73 +
 fs/ceph/mds_client.c |   62 ++
 fs/ceph/super.h  |9 +-
 3 files changed, 88 insertions(+), 56 deletions(-)

diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index 202dd3d..9a46161 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -191,27 +191,23 @@ void ceph_count_locks(struct inode *inode, int 
*fcntl_count, int *flock_count)
 }
 
 /**
- * Encode the flock and fcntl locks for the given inode into the pagelist.
- * Format is: #fcntl locks, sequential fcntl locks, #flock locks,
- * sequential flock locks.
- * Must be called with lock_flocks() already held.
- * If we encounter more of a specific lock type than expected,
- * we return the value 1.
+ * Encode the flock and fcntl locks for the given inode into the ceph_filelock
+ * array. Must be called with lock_flocks() already held.
+ * If we encounter more of a specific lock type than expected, return -ENOSPC.
  */
-int ceph_encode_locks(struct inode *inode, struct ceph_pagelist *pagelist,
- int num_fcntl_locks, int num_flock_locks)
+int ceph_encode_locks_to_buffer(struct inode *inode,
+   struct ceph_filelock *flocks,
+   int num_fcntl_locks, int num_flock_locks)
 {
struct file_lock *lock;
-   struct ceph_filelock cephlock;
int err = 0;
int seen_fcntl = 0;
int seen_flock = 0;
+   int l = 0;
 
dout("encoding %d flock and %d fcntl locks", num_flock_locks,
 num_fcntl_locks);
-   err = ceph_pagelist_append(pagelist, &num_fcntl_locks, sizeof(u32));
-   if (err)
-   goto fail;
+
for (lock = inode->i_flock; lock != NULL; lock = lock->fl_next) {
if (lock->fl_flags & FL_POSIX) {
++seen_fcntl;
@@ -219,19 +215,12 @@ int ceph_encode_locks(struct inode *inode, struct 
ceph_pagelist *pagelist,
  
