Re: [PATCH -next] fs: dlm: fix missing unlock on error in accept_from_sock()

2021-03-29 Thread David Teigland
On Sat, Mar 27, 2021 at 04:37:04PM +0800, Yang Yingliang wrote:
> Add the missing unlock before return from accept_from_sock()
> in the error handling case.

Thanks, applied to the next branch.
Dave

> Fixes: 6cde210a9758 ("fs: dlm: add helper for init connection")
> Reported-by: Hulk Robot 
> Signed-off-by: Yang Yingliang 
> ---
>  fs/dlm/lowcomms.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> index 73cc1809050a..166e36fcf3e4 100644
> --- a/fs/dlm/lowcomms.c
> +++ b/fs/dlm/lowcomms.c
> @@ -931,6 +931,7 @@ static int accept_from_sock(struct listen_connection *con)
>   result = dlm_con_init(othercon, nodeid);
>   if (result < 0) {
>   kfree(othercon);
> + mutex_unlock(&newcon->sock_mutex);
>   goto accept_err;
>   }
>  
> -- 
> 2.25.1
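
For readers outside the kernel tree, the bug class fixed above (returning on an error path while a mutex is still held) can be sketched in userspace C. This is only an illustration under stated assumptions: struct conn, conn_init() and attach_other() are hypothetical names, pthread mutexes stand in for the kernel mutex API, and the sketch routes every exit through one unlock label, whereas the patch itself unlocks inline before the goto.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct connection. */
struct conn {
	pthread_mutex_t lock;
	struct conn *other;
};

/* Hypothetical init helper; fails on request to exercise the error path. */
static int conn_init(struct conn *c, int should_fail)
{
	if (should_fail)
		return -ENOMEM;
	c->other = NULL;
	return pthread_mutex_init(&c->lock, NULL) ? -EINVAL : 0;
}

/*
 * Attach a secondary connection while holding con->lock.
 * Every exit taken after the lock is acquired passes through out_unlock,
 * so no error path can return with the mutex still held.
 */
static int attach_other(struct conn *con, int should_fail)
{
	struct conn *other;
	int result = 0;

	assert(con != NULL);
	pthread_mutex_lock(&con->lock);

	other = calloc(1, sizeof(*other));
	if (!other) {
		result = -ENOMEM;
		goto out_unlock;
	}

	result = conn_init(other, should_fail);
	if (result < 0) {
		free(other);
		goto out_unlock;
	}

	con->other = other;

out_unlock:
	pthread_mutex_unlock(&con->lock);
	return result;
}
```

Calling attach_other() with should_fail set returns a negative errno, and the caller can take con->lock again afterwards, which is the property the missing unlock broke.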



[GIT PULL] dlm updates for 5.11

2020-12-14 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-5.11

This set includes more low level communication layer cleanups.
The main change is that the listening socket is no longer handled
as a special case of node connection sockets.  There is one small
fix for checking the number of local connections.

Thanks,
Dave

Alexander Aring (13):
  fs: dlm: fix proper srcu api call
  fs: dlm: define max send buffer
  fs: dlm: add get buffer error handling
  fs: dlm: flush othercon at close
  fs: dlm: handle non blocked connect event
  fs: dlm: add helper for init connection
  fs: dlm: move connect callback in node creation
  fs: dlm: move shutdown action to node creation
  fs: dlm: refactor sctp sock parameter
  fs: dlm: listen socket out of connection hash
  fs: dlm: fix check for multi-homed hosts
  fs: dlm: constify addr_compare
  fs: dlm: check on existing node address

 fs/dlm/lockspace.c |   2 +-
 fs/dlm/lowcomms.c  | 304 -
 fs/dlm/lowcomms.h  |   2 +
 fs/dlm/member.c    |   2 +-
 fs/dlm/rcom.c      |   6 +-
 5 files changed, 168 insertions(+), 148 deletions(-)



[GIT PULL] dlm updates for 5.10

2020-10-12 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-5.10

This set continues the ongoing rework of the low level
communication layer in the dlm.  The focus here is on
improvements to connection handling, and reworking the
receiving of messages.

Thanks,
Dave

Alexander Aring (13):
  fs: dlm: synchronize dlm before shutdown
  fs: dlm: make connection hash lockless
  fs: dlm: fix dlm_local_addr memory leak
  fs: dlm: fix configfs memory leak
  fs: dlm: move free writequeue into con free
  fs: dlm: handle possible othercon writequeues
  fs: dlm: use free_con to free connection
  fs: dlm: remove lock dependency warning
  fs: dlm: fix mark per nodeid setting
  fs: dlm: handle range check as callback
  fs: dlm: disallow buffer size below default
  fs: dlm: rework receive handling
  fs: dlm: fix race in nodeid2con

 fs/dlm/Kconfig    |   1 +
 fs/dlm/config.c   |  66 ++-
 fs/dlm/config.h   |   4 +-
 fs/dlm/lowcomms.c | 329 ++
 fs/dlm/midcomms.c | 136 +-
 fs/dlm/midcomms.h |   3 +-
 6 files changed, 260 insertions(+), 279 deletions(-)



[GIT PULL] dlm updates for 5.9

2020-08-06 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-5.9

This set includes some improvements to the dlm networking layer:
improving the ability to trace dlm messages for debugging, and improved
handling of bad messages or disrupted connections.

Note two unusual things:
- There is a commit under net that was posted to netdev, which adds a
  socket helper for setting a mark value on a socket.
- This branch was just rebased to drop a commit that was adding a tuning
  knob to adjust blocking during recovery; we decided there's not enough
  evidence it's necessary.

Thanks,
Dave

Alexander Aring (6):
  net: sock: add sock_set_mark
  fs: dlm: set skb mark for listen socket
  fs: dlm: set skb mark per peer socket
  fs: dlm: don't close socket on invalid message
  fs: dlm: change handling of reconnects
  fs: dlm: implement tcp graceful shutdown

Wang Hai (1):
  dlm: Fix kobject memleak

 fs/dlm/config.c    |  44 +
 fs/dlm/config.h    |   2 ++
 fs/dlm/lockspace.c |   6 ++--
 fs/dlm/lowcomms.c  | 131 -
 include/net/sock.h |   1 +
 net/core/sock.c    |   8 ++
 6 files changed, 164 insertions(+), 28 deletions(-)



[GIT PULL] dlm updates for 5.8

2020-06-05 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-5.8

This set includes a couple minor cleanups, and dropping the
interruptible from a wait_event that waits for an event from
the userspace cluster management.

Thanks,
Dave

Arnd Bergmann (1):
  dlm: remove BUG() before panic()

Gustavo A. R. Silva (2):
  dlm: dlm_internal: Replace zero-length array with flexible-array member
  dlm: user: Replace zero-length array with flexible-array member

Ross Lagerwall (1):
  dlm: Switch to using wait_event()

Wu Bo (1):
  fs:dlm:remove unneeded semicolon in rcom.c

 fs/dlm/dlm_internal.h |  7 +++
 fs/dlm/lockspace.c    | 18 --
 fs/dlm/rcom.c         |  2 +-
 fs/dlm/user.c         |  2 +-
 4 files changed, 9 insertions(+), 20 deletions(-)



Re: [PATCH 1/4] sctp: add sctp_sock_set_nodelay

2020-05-29 Thread David Teigland
On Fri, May 29, 2020 at 02:09:40PM +0200, Christoph Hellwig wrote:
> Add a helper to directly set the SCTP_NODELAY sockopt from kernel space
> without going through a fake uaccess.

Ack, they look fine to me, thanks.
Dave



Re: is it ok to always pull in sctp for dlm, was: Re: [PATCH 27/33] sctp: export sctp_setsockopt_bindx

2020-05-14 Thread David Teigland
On Thu, May 14, 2020 at 12:40:40PM +0200, Christoph Hellwig wrote:
> On Wed, May 13, 2020 at 03:00:58PM -0300, Marcelo Ricardo Leitner wrote:
> > On Wed, May 13, 2020 at 08:26:42AM +0200, Christoph Hellwig wrote:
> > > And call it directly from dlm instead of going through kernel_setsockopt.
> > 
> > The advantage on using kernel_setsockopt here is that sctp module will
> > only be loaded if dlm actually creates a SCTP socket.  With this
> > change, sctp will be loaded on setups that may not be actually using
> > it. It's a quite big module and might expose the system.
> > 
> > I'm okay with the SCTP changes, but I'll defer to DLM folks to whether
> > that's too bad or what for DLM.
> 
> So for ipv6 I could just move the helpers inline as they were trivial
> and avoid that issue.  But some of the sctp stuff really is way too
> big for that, so the only other option would be to use symbol_get.

Let's try symbol_get; having the sctp module always loaded caused problems
the last time it happened (almost nobody uses dlm with it).
Dave 



[GIT PULL] dlm updates for 5.3 (second try)

2019-07-12 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-5.3

This set removes some unnecessary debugfs error handling, and
checks that lowcomms workqueues are not NULL before destroying.

(Dropped the commits related to incorrect wait_event usage from the
first pull request.)

Thanks,
Dave

David Windsor (1):
  dlm: check if workqueues are NULL before flushing/destroying

Greg Kroah-Hartman (1):
  dlm: no need to check return value of debugfs_create functions

 fs/dlm/debug_fs.c     | 21 ++---
 fs/dlm/dlm_internal.h |  8
 fs/dlm/lowcomms.c     | 18 --
 fs/dlm/main.c         |  5 +
 4 files changed, 19 insertions(+), 33 deletions(-)



Re: [GIT PULL] dlm updates for 5.3

2019-07-11 Thread David Teigland
On Wed, Jul 10, 2019 at 09:05:21PM -0700, Linus Torvalds wrote:
> If wait_event_interruptible() returns -ERESTARTSYS, it means that we
> have a signal pending.
> 
> And if we have a signal pending, then you can't go back and call
> wait_event_interruptible() in a loop, because the signal will
> *continue* to be pending, so now your "wait event" becomes a kernel
> busy loop.
> 
> If you don't want to react to signals, then you shouldn't use the
> "interruptible()" version of wait-event.

Right, a simple wait_event looks obvious; I'll have the submitters test
that before sending that next time around.  I'll put together another pull
with the two trivial commits.
Thanks
Dave
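
The busy-loop argument quoted above can be illustrated with a toy userspace model. It is a sketch only, not kernel code: struct task_model, model_wait_interruptible() and retry_wait() are hypothetical names, MODEL_ERESTARTSYS mirrors the kernel-internal error value, and the spin cap exists only so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ERESTARTSYS 512	/* kernel-internal value, modelled here */

struct task_model {
	bool signal_pending;	/* stays set until the task returns to userspace */
	bool condition;		/* the event being waited for */
	int sleeps;		/* how many times the task actually blocked */
};

/* Toy interruptible wait: it refuses to sleep while a signal is pending. */
static int model_wait_interruptible(struct task_model *t)
{
	if (t->condition)
		return 0;
	if (t->signal_pending)
		return -MODEL_ERESTARTSYS;
	t->sleeps++;
	t->condition = true;	/* pretend the event arrived while asleep */
	return 0;
}

/* The retry loop under discussion, capped so that the model terminates. */
static int retry_wait(struct task_model *t, int max_spins)
{
	int spins = 0;

	assert(max_spins > 0);
	while (model_wait_interruptible(t) == -MODEL_ERESTARTSYS) {
		if (++spins >= max_spins)
			break;
	}
	return spins;
}
```

With signal_pending set, retry_wait() uses up its whole spin budget and sleeps stays at zero; that is the busy loop. With no signal pending, the task blocks once and the loop body never runs.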


[GIT PULL] dlm updates for 5.3

2019-07-09 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-5.3

Apart from a couple trivial fixes, the more notable fix makes the dlm
continue waiting for a user space result if a signal interrupts the
wait event.

Thanks,
Dave


David Teigland (1):
  dlm: Fix test for -ERESTARTSYS

David Windsor (1):
  dlm: check if workqueues are NULL before flushing/destroying

Greg Kroah-Hartman (1):
  dlm: no need to check return value of debugfs_create functions

Mark Syms (1):
  dlm: retry wait_event_interruptible in event of ERESTARTSYS


 fs/dlm/debug_fs.c     | 21 ++---
 fs/dlm/dlm_internal.h |  8
 fs/dlm/lockspace.c    |  6 --
 fs/dlm/lowcomms.c     | 18 --
 fs/dlm/main.c         |  5 +
 5 files changed, 23 insertions(+), 35 deletions(-)



[GIT PULL] dlm updates for 4.21

2018-12-19 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.21

This set is entirely trivial fixes, mainly around correct cleanup
on error paths and improved error checks.  One patch adds scheduling
in a potentially long recovery loop.

Thanks,
Dave

Bob Peterson (1):
  dlm: Don't swamp the CPU with callbacks queued during recovery

David Teigland (2):
  dlm: fix missing idr_destroy for recover_idr
  dlm: fix invalid cluster name warning

Denis V. Lunev (1):
  dlm: fix possible call to kfree() for non-initialized pointer

Thomas Meyer (1):
  dlm: NULL check before some freeing functions is not needed

Tycho Andersen (3):
  dlm: fix invalid free
  dlm: don't allow zero length names
  dlm: don't leak kernel pointer to userspace

Vasily Averin (4):
  dlm: fixed memory leaks after failed ls_remove_names allocation
  dlm: possible memory leak on error path in create_lkb()
  dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
  dlm: memory leaks on error path in dlm_user_request()

Wen Yang (1):
  dlm: NULL check before kmem_cache_destroy is not needed

 fs/dlm/ast.c       | 10 ++
 fs/dlm/lock.c      | 17 ++---
 fs/dlm/lockspace.c |  9 -
 fs/dlm/member.c    |  7 ---
 fs/dlm/memory.c    |  9 +++--
 fs/dlm/user.c      |  5 +++--
 6 files changed, 34 insertions(+), 23 deletions(-)



[GIT PULL] dlm updates for 4.18

2018-06-04 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.18

These three commits fix and clean up the flags dlm was
using on its SCTP sockets.  The result improves the
performance and fixes some bad connection delays.

Thanks,
Dave

Gang He (3):
  dlm: fix a clerical error when set SCTP_NODELAY
  dlm: make sctp_connect_to_sock() return in specified time
  dlm: remove O_NONBLOCK flag in sctp_connect_to_sock

 fs/dlm/lowcomms.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)




Re: [PATCH] dlm: prompt the user SCTP is experimental

2018-04-02 Thread David Teigland
On Thu, Mar 22, 2018 at 10:27:56PM -0600, Gang He wrote:
> Hello David,
> 
> Do you agree to add this prompt to the user? 
> Since sometimes customers attempted to setup SCTP protocol with two rings, 
> but they could not get the expected result, then it maybe bring some concerns 
> to the customer for DLM qualities.

I don't think the kernel message is a good way to communicate this to users.
Dave


> > As you know, DLM module can use TCP or SCTP protocols to
> > communicate among the cluster.
> > But, according to our testing, SCTP protocol is still considered
> > experimental, since not all aspects are working correctly and
> > it is not full tested.
> > e.g. SCTP connection channel switch needs about 5mins hang in case
> > one connection(ring) is broken.
> > Then, I suggest to add a kernel print, which prompts the user SCTP
> > protocol for DLM should be considered experimental, it is not
> > recommended in production environment.
> > 
> > Signed-off-by: Gang He 
> > ---
> >  fs/dlm/lowcomms.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> > index cff79ea..18fd85d 100644
> > --- a/fs/dlm/lowcomms.c
> > +++ b/fs/dlm/lowcomms.c
> > @@ -1307,6 +1307,7 @@ static int sctp_listen_for_all(void)
> > return -ENOMEM;
> >  
> > log_print("Using SCTP for communications");
> > +   log_print("SCTP protocol is experimental, use at your own risk");
> >  
> > result = sock_create_kern(&init_net, dlm_local_addr[0]->ss_family,
> >   SOCK_STREAM, IPPROTO_SCTP, &sock);
> > -- 
> > 1.8.5.6


[GIT PULL] dlm updates for 4.15

2017-11-13 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.15

This set focuses, as usual, on fixes to the comms layer.
New testing of the dlm with ocfs2 uncovered a number of
bugs in the TCP connection handling during recovery,
starting, and stopping.

Thanks,
Dave


Bob Peterson (3):
  DLM: Eliminate CF_CONNECT_PENDING flag
  DLM: Eliminate CF_WRITE_PENDING flag
  DLM: Fix saving of NULL callbacks

David Teigland (1):
  dlm: remove dlm_send_rcom_lookup_dump

Guoqing Jiang (1):
  dlm: recheck kthread_should_stop() before schedule()

tsutomu@toshiba.co.jp (15):
  DLM: fix remove save_cb argument from add_sock()
  DLM: fix double list_del()
  DLM: fix race condition between dlm_send and dlm_recv
  DLM: fix to use sock_mutex correctly in xxx_accept_from_sock
  DLM: retry rcom when dlm_wait_function is timed out.
  DLM: close othercon at send/receive error
  DLM: fix race condition between dlm_recoverd_stop and dlm_recoverd
  DLM: Reanimate CF_WRITE_PENDING flag
  DLM: use CF_CLOSE flag to stop dlm_send correctly
  DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set
  DLM: fix memory leak in tcp_accept_from_sock()
  DLM: fix overflow dlm_cb_seq
  DLM: fix to use sk_callback_lock correctly
  DLM: fix to reschedule rwork
  DLM: fix NULL pointer dereference in send_to_sock()


 fs/dlm/ast.c      |   2 +
 fs/dlm/lock.c     |  43 ++-
 fs/dlm/lowcomms.c | 218 ++
 fs/dlm/rcom.c     |  26 ++-
 fs/dlm/rcom.h     |   1 -
 fs/dlm/recover.c  |   4 +
 fs/dlm/recoverd.c |  16 +++-
 7 files changed, 186 insertions(+), 124 deletions(-)



Re: [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup

2017-10-09 Thread David Teigland
On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote:
> On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote:
> > According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
> > and the function call path is:
> > dlm_master_lookup (acquire the spinlock)
> >   dlm_send_rcom_lookup_dump
> >     create_rcom
> >       dlm_lowcomms_get_buffer
> >         nodeid2con
> >           mutex_lock --> may sleep
> > 
> > This bug is found by my static analysis tool and my code review.
> 
> Umm...  dlm_master_lookup() locking is not nice, but to trigger that
> you would need a combination of
> 
> * from_nodeid != our_nodeid (or we would've buggered off long before that point)
> * dir_nodeid == our_nodeid
> * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r)
> (success would have the lock dropped)
> * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r)
> * from_master being true
> * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid
> (the former follows from the latter, actually)
> 
> The last one might or might not be impossible - I'm not familiar with dlm
> guts, but it does have
> log_error(ls, "from_master %d our_master", from_nodeid);
> just before that call, so it's worth a further look.

dlm_send_rcom_lookup_dump() was for debugging and can be removed.  It's a
condition that shouldn't happen, and I'm guessing I added that to catch
any evidence if it did.  I'm surprised it wasn't removed in the final
version of the patch, but after 5 years I don't remember what I was
thinking.  I've pushed a commit dropping it to linux-dlm.git next.

Thanks,
Dave


Re: linux-next: Signed-off-by missing for commits in the dlm tree

2017-09-18 Thread David Teigland
On Tue, Sep 19, 2017 at 07:49:19AM +1000, Stephen Rothwell wrote:
> Hi David,
> 
> Commits
> 
>   bcc976a145c9 ("DLM: Eliminate CF_CONNECT_PENDING flag")
>   c071b28b2bd5 ("DLM: Eliminate CF_WRITE_PENDING flag")
>   782551aac851 ("DLM: Fix saving of NULL callbacks")
> 
> are missing a Signed-off-by from their author.

Thanks, should be fixed now.
Dave


[GIT PULL] dlm updates for 4.14

2017-09-05 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.14

This set includes a bunch of minor code cleanups that
have accumulated, probably from code analyzers people
like to run.  There is one nice fix that avoids some
socket leaks by switching to use sock_create_lite().

Thanks,
Dave


Bhumika Goyal (1):
  dlm: constify kset_uevent_ops structure

Edwin Török (1):
  dlm: avoid double-free on error path in dlm_device_{register,unregister}

Gang He (1):
  dlm: Make dismatch error message more clear

Guoqing Jiang (1):
  dlm: use sock_create_lite inside tcp_accept_from_sock

Markus Elfring (10):
  dlm: Replace six seq_puts() calls by seq_putc()
  dlm: Add spaces for better code readability
  dlm: Improve a size determination in table_seq_start()
  dlm: Use kcalloc() in dlm_scan_waiters()
  dlm: Improve a size determination in dlm_recover_waiters_pre()
  dlm: Delete an error message for a failed memory allocation in dlm_recover_waiters_pre()
  dlm: Use kmalloc_array() in make_member_array()
  dlm: Use kcalloc() in two functions
  dlm: Improve a size determination in two functions
  dlm: Delete an unnecessary variable initialisation in dlm_ls_start()

Mikko Rapeli (1):
  uapi linux/dlm_netlink.h: include linux/dlmconstants.h

Vlad Tsyrklevich (1):
  dlm: Fix kernel memory disclosure

Zhu Lingshan (1):
  dlm: print log message when cluster name is not set


 fs/dlm/debug_fs.c                | 25 -
 fs/dlm/lock.c                    |  8 +++-
 fs/dlm/lockspace.c               |  9 +++--
 fs/dlm/lowcomms.c                |  2 +-
 fs/dlm/member.c                  | 15 ++-
 fs/dlm/user.c                    |  6 ++
 include/uapi/linux/dlm_netlink.h |  1 +
 7 files changed, 36 insertions(+), 30 deletions(-)


[GIT PULL] dlm fixes for 4.10

2016-12-12 Thread David Teigland
Hi Linus,

Please pull dlm fixes from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.10

This set fixes error reporting for dlm sockets, removes the unbound
property on the dlm callback workqueue to improve performance, and
includes a couple trivial changes.

Thanks,
Dave


Bob Peterson (3):
  dlm: don't save callbacks after accept
  dlm: remove lock_sock to avoid scheduling while atomic
  dlm: don't specify WQ_UNBOUND for the ast callback workqueue

Paul Gortmaker (1):
  dlm: audit and remove any unnecessary uses of module.h

Stephen Hemminger (1):
  dlm: make genl_ops const

Wei Yongjun (1):
  dlm: fix error return code in sctp_accept_from_sock()


 fs/dlm/ast.c          |  2 +-
 fs/dlm/config.c       |  2 +-
 fs/dlm/debug_fs.c     |  2 +-
 fs/dlm/dlm_internal.h |  1 -
 fs/dlm/lockspace.c    |  2 ++
 fs/dlm/lowcomms.c     | 28 ++--
 fs/dlm/main.c         |  2 ++
 fs/dlm/netlink.c      |  2 +-
 fs/dlm/user.c         |  1 -
 9 files changed, 22 insertions(+), 20 deletions(-)



[GIT PULL] dlm fixes for 4.9

2016-10-10 Thread David Teigland
Hi Linus,

Please pull dlm fixes from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.9

This includes a bug fix for a bad memory access during workqueue
cleanup, which can happen while shutting down the dlm networking
layer.  (This was found and fixed in the past week, so has not
appeared in -next.)

Thanks,
Dave

Marcelo Ricardo Leitner (1):
  dlm: free workqueues after the connections

 fs/dlm/lowcomms.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)



[GIT PULL] dlm fixes for 4.8

2016-08-26 Thread David Teigland
Hi Linus,

Please pull dlm fixes from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.8-fixes

This fixes a bug introduced by recent debugfs cleanup.

Thanks,
Dave

Eric Ren (1):
  dlm: fix malfunction of dlm_tool caused by debugfs changes

 fs/dlm/debug_fs.c | 62 --
 1 file changed, 48 insertions(+), 14 deletions(-)



[GIT PULL] dlm updates for 4.8

2016-07-27 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.8

This set includes two trivial changes, one to
use kmemdup and another to control the log level
of recovery messages.

Thanks,
Dave

Amitoj Kaur Chawla (1):
  dlm: Use kmemdup instead of kmalloc and memcpy

Zhilong Liu (1):
  dlm: add log_info config option

 fs/dlm/config.c       |  7 +++
 fs/dlm/config.h       |  1 +
 fs/dlm/dlm_internal.h | 10 +-
 fs/dlm/lowcomms.c     |  3 +--
 4 files changed, 18 insertions(+), 3 deletions(-)



[GIT PULL] dlm fixes for 4.6

2016-03-29 Thread David Teigland
Hi Linus,

Please pull dlm fixes from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.6-fixes

This fixes a bug from the configfs cleanup.

Thanks,
Dave

Andrew Price (1):
  dlm: config: Fix ENOMEM failures in make_cluster()

 fs/dlm/config.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)



[GIT PULL] dlm updates for 4.6

2016-03-18 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.6

Previous changes introduced the use of socket error reporting
for dlm sockets.  This set includes two fixes in how the
socket error callbacks are used.

Thanks,
Dave

Bob Peterson (2):
  DLM: Replace nodeid_to_addr with kernel_getpeername
  DLM: Save and restore socket callbacks properly

 fs/dlm/lowcomms.c | 74 ++-
 1 file changed, 62 insertions(+), 12 deletions(-)



[GIT PULL] dlm updates for 4.4

2015-11-05 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.4

This includes one simple fix to make posix locks
interruptible by signals in cases where a signal
handler is used.

Thanks,
Dave

Eric Ren (1):
  dlm: make posix locks interruptible

 fs/dlm/plock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PULL] dlm updates for 4.3

2015-08-31 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-4.3

This set mainly includes a change to the way the
dlm uses the SCTP API in the kernel, removing the
direct dependency on the sctp module.  Other odd
SCTP-related fixes are also included.  The other
notable fix is for a long standing regression in
the behavior of lock value blocks for user space
locks.

Thanks,
Dave


Bob Peterson (1):
  dlm: print error from kernel_sendpage

David Teigland (1):
  dlm: fix lvb copy for user locks

Marcelo Ricardo Leitner (6):
  dlm: fix connection stealing if using SCTP
  dlm: fix race while closing connections
  dlm: fix not reconnecting on connecting error handling
  dlm: use sctp 1-to-1 API
  dlm: replace BUG_ON with a less severe handling
  dlm: fix reconnecting but not sending data

kbuild test robot (1):
  dlm: sctp_accept_from_sock() can be static


 fs/dlm/lowcomms.c               | 743 +++-
 fs/dlm/user.c                   |   7 +-
 include/uapi/linux/dlm_device.h |   2 +-
 3 files changed, 305 insertions(+), 447 deletions(-)



Re: [PATCH 0/8] watchdog: Add support for keepalives triggered by infrastructure

2015-08-05 Thread David Teigland
On Wed, Aug 05, 2015 at 12:01:38PM -0700, Guenter Roeck wrote:
> I think I can understand why Wim was reluctant to accept your patch;
> I must admit I don't understand your use case either.

Very briefly, sanlock is a shared storage based lease manager, and the
expiration of a lease is tied to the expiration of the watchdog.  I have
to ensure that the watchdog expires at or before the time that the lease
expires.  This means that I cannot allow a watchdog heartbeat apart from a
corresponding lease renewal on the shared storage.  Otherwise, the
calculation by other hosts of the time of the hard reset will be wrong,
and the data on shared storage could be corrupted.
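
As a rough illustration of that constraint, here is a minimal Python sketch.
The function names and the numbers (a 60 second watchdog timeout, an 80 second
lease) are assumptions made up for the example, not taken from sanlock or wdmd.
It shows how a heartbeat that is not paired with a lease renewal pushes the
reset time past the lease expiration that other hosts compute.

```python
def watchdog_expiry(last_heartbeat: float, watchdog_timeout: float) -> float:
    """Time at which the watchdog resets the host if no further heartbeat arrives."""
    return last_heartbeat + watchdog_timeout


def lease_expiry(last_renewal: float, lease_duration: float) -> float:
    """Time at which other hosts consider the lease expired."""
    return last_renewal + lease_duration


def is_safe(last_heartbeat: float, last_renewal: float,
            watchdog_timeout: float, lease_duration: float) -> bool:
    """The host must be reset at or before the moment its lease expires."""
    return (watchdog_expiry(last_heartbeat, watchdog_timeout)
            <= lease_expiry(last_renewal, lease_duration))


# Hypothetical values: 60 s watchdog timeout, 80 s lease duration.
# Heartbeat paired with a renewal at t=100: reset at 160, lease ends at 180.
paired = is_safe(last_heartbeat=100, last_renewal=100,
                 watchdog_timeout=60, lease_duration=80)

# Extra heartbeat at t=130 with no renewal: reset at 190, lease still ends at 180.
unpaired = is_safe(last_heartbeat=130, last_renewal=100,
                   watchdog_timeout=60, lease_duration=80)
```

In the second case another host could take the lease at t=180 while the first
host is still running, which is the corruption scenario described above.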

> I wonder if you are actually mis-using the watchdog subsystem to generate
> hard resets.

I am indeed using it to generate hard resets.

> After all, you could avoid the unexpected close situation with
> an exit handler in your application. That handler could catch anything but
> SIGKILL, but anyone using SIGKILL doesn't really deserve better.

I avoid the unexpected close situation by prematurely closing the device
to generate the heartbeat from close, and then reopening if needed.  That
covers the SIGKILL case.  So, I have a work around, but the patch would
still be nice.

> If the intent is to reset the system after the application closes,
> executing "/sbin/restart -f" might be a safer approach than just killing
> the watchdog.

I need to reset the system if the application crashes, or if the
application is running but can't renew its lease.  In the former case,
executing something doesn't work.  In the latter case, I have done similar
(with /proc/sysrq-trigger), but it doesn't always apply and I'd still want
the hardware reset as redundancy.

> In addition to that, I don't think it is a good idea to rely on the assumption
> that the watchdog will expire exactly after the configured timeout.
> Many watchdog drivers implement a soft timeout on top of the hardware timeout,
> and thus already implement the internal heartbeat. Most of those drivers
> will stop sending internal heartbeats if user space did not send a heartbeat
> within the configured timeout period. The actual reset will then occur later,
> after the actual hardware watchdog timed out. This can be as much as the
> hardware timeout period, which may be substantial.

OK, thanks, I'll look into this in more detail.  Is there a way I can
identify which cases these are, or do you know an example I can look at?
In the worst case I'd have to extend the lease expiration time by a full
timeout period when the dubious drivers are used.
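
The worst case described above could be estimated roughly as follows.  This
is only a sketch under the assumption (from the quoted text) that a driver
with a soft timeout stops its internal heartbeats once the configured timeout
passes, and the hardware then fires up to one hardware timeout period later.
The function names and the numbers are hypothetical.

```python
def worst_case_reset_time(last_user_heartbeat: float,
                          configured_timeout: float,
                          hardware_timeout: float) -> float:
    """Latest possible reset when a soft timeout sits on top of the hardware timeout.

    The driver keeps feeding the hardware until the configured (soft) timeout
    passes; the hardware may then take up to one full hardware period to fire.
    """
    return last_user_heartbeat + configured_timeout + hardware_timeout


def padded_lease_expiry(lease_expiry: float, hardware_timeout: float) -> float:
    """Lease expiration extended by one hardware timeout period as a safety margin."""
    return lease_expiry + hardware_timeout


# Hypothetical values: last heartbeat at t=100, 60 s configured timeout,
# 8 s hardware timeout, lease that would otherwise expire at t=160.
latest_reset = worst_case_reset_time(100, 60, 8)
safe_expiry = padded_lease_expiry(160, 8)
```

With these numbers the reset may come as late as t=168, so other hosts would
have to wait until at least t=168 before treating the lease as free.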

Dave


Re: [PATCH 0/8] watchdog: Add support for keepalives triggered by infrastructure

2015-08-05 Thread David Teigland
On Wed, Aug 05, 2015 at 10:41:51AM -0700, Guenter Roeck wrote:
> Not really. The heartbeats will be generated such that the watchdog expires
> no later than <last heartbeat from userspace + configured timeout>. I discussed
> this already with Uwe; he had the same concern. This isn't in the current
> version of the patch set, but it will be in the next version. That means
> that nothing will change from user space perspective.

Sounds good, thanks.

> >A related issue from some years ago is the unfortunate fact that closing
> >the watchdog device also generates a heartbeat.  I'd like to disable that
> >also, and submitted a patch for it here:
> >http://www.spinics.net/lists/linux-watchdog/msg01477.html
> >
> 
> That is a different issue, though, and unrelated to this patch set.
> Wim had a good point there: Presumably the problem you are trying to solve
> applies to the entire system, not to a specific watchdog. What you are looking
> for looks more like a system parameter, not like something to set with an ioctl
> message. The reason here is that you'd still want to be able to use standard
> applications such as systemd or watchdogd to trigger heartbeats, and not depend
> on your own.

I'd need this behavior when the system is running my program (sanlock with
wdmd), which uses /dev/watchdog.  No other programs (systemd or watchdogd)
could be using /dev/watchdog at the same time.

Dave


Re: [PATCH 0/8] watchdog: Add support for keepalives triggered by infrastructure

2015-08-05 Thread David Teigland
On Mon, Aug 03, 2015 at 07:13:26PM -0700, Guenter Roeck wrote:
> - Some watchdogs have a very short maximum timeout, in the range of just a few
>   seconds. Such low timeouts are difficult if not impossible to support from
>   user space. Drivers supporting such watchdog hardware need to implement
>   a timer function to augment heartbeats from user space.

> - A new status flag, WDOG_RUNNING, informs the watchdog subsystem that a
>   watchdog is running, and that the watchdog subsystem needs to generate
>   heartbeat requests while the associated watchdog device is closed.

> Patch #2 adds timer functionality to the watchdog core. It solves the problem
> of short maximum hardware timeouts by augmenting heartbeats triggered from
> user space with internally triggered heartbeats.
> 
> Patch #3 adds functionality to generate heartbeats while the watchdog device is
> closed. It handles situations where the watchdog is running after
> the driver has been instantiated, but the device is not yet opened,
> and post-close situations necessary if a watchdog can not be stopped.

These sound concerning because it seems that heartbeats could be generated
outside of the direct control of userspace.  I have a program that depends
on having direct control over whether heartbeats are generated (or more
specifically, *not* generated.)  If these new features introduce a new way
for heartbeats to be generated, is there a way I can detect or disable
that behavior from userspace?  Unwanted heartbeats could break my program
and may lead to data corruption.

A related issue from some years ago is the unfortunate fact that closing
the watchdog device also generates a heartbeat.  I'd like to disable that
also, and submitted a patch for it here:
http://www.spinics.net/lists/linux-watchdog/msg01477.html

(Without the patch, I have to work around it by closing the device
prematurely as a way to generate the potentially final heartbeat, and then
reopen it again if I want to continue the heartbeats.)

Dave


Re: clustered MD

2015-06-12 Thread David Teigland
When a node fails, its dirty areas get special treatment from other nodes
using the area_resyncing() function.  Should the suspend_list be created
before any reads or writes from the file system are processed by md?  It
seems to me that gfs journal recovery could read/write to dirty regions
(from the failed node) before md was finished setting up the suspend_list.
md could probably prevent that by using the recover_prep() dlm callback to
set a flag that would block any i/o that arrived before the suspend_list
was ready.
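
The idea could be modelled in user space roughly like this.  It is a sketch
only, not md or dlm code: the class RecoveryGate and its method names are
made up for illustration (recover_prep here just mirrors the name of the dlm
callback), and a threading.Event stands in for the flag.

```python
import threading


class RecoveryGate:
    """Blocks i/o between recover_prep() and the suspend_list becoming ready."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self._ready.set()  # no recovery in progress, i/o may proceed

    def recover_prep(self) -> None:
        """Called when a node failure is reported: start blocking new i/o."""
        self._ready.clear()

    def suspend_list_ready(self) -> None:
        """Called once the suspend_list is set up: release blocked i/o."""
        self._ready.set()

    def submit_io(self, do_io, timeout=None):
        """Run do_io() only when no recovery setup is pending."""
        if not self._ready.wait(timeout):
            raise TimeoutError("suspend_list not ready")
        return do_io()
```

Any read or write arriving after recover_prep() waits in submit_io() until
suspend_list_ready() is called, so journal recovery cannot touch the failed
node's dirty regions before the suspend_list exists.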



Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-11 Thread David Teigland
On Thu, Jun 11, 2015 at 05:47:28PM +0800, Guoqing Jiang wrote:
> Do you consider take the following clean up? If yes, I will send a
> formal patch, otherwise pls ignore it.

On first glance, the old and new code do not appear to do the same thing,
so let's leave it as it is.

> -   to_nodeid = dlm_dir_nodeid(r);



Re: clustered MD

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 04:07:44PM -0500, David Teigland wrote:
> > Also a slightly less adversarial tone would make me feel more
> > comfortable, though maybe I'm misreading your intent.
> 
> You're probably misreading "concerned".
> 
> The initial responses to my inquiry were severely lacking in any
> substance, even dismissive, which raised "concerned" to "troubled".

Reading those messages again I see what you mean, they don't sound very
nice, so sorry about that.  I'll repeat the one positive note, which is
that the brief things I've noticed make it look much better than the dm
approach from several years ago.



Re: clustered MD

2015-06-10 Thread David Teigland
On Thu, Jun 11, 2015 at 06:31:31AM +1000, Neil Brown wrote:
> What is your interest in this?  I'm always happy for open discussion and
> varied input, but it would help to know to what extent you are a stake
> holder?

Using the dlm correctly is non-trivial and should be reviewed.
If the dlm is misused, some part of that may fall in my lap, if
only so far as having to debug problems to distinguish between dlm
bugs or md-cluster bugs.  This has been learned the hard way.

I have yet to find time to look up the previous review discussion.
I will be more than happy if I find the dlm usage has already been
thoroughly reviewed.

> Also a slightly less adversarial tone would make me feel more
> comfortable, though maybe I'm misreading your intent.

You're probably misreading "concerned".

The initial responses to my inquiry were severely lacking in any
substance, even dismissive, which raised "concerned" to "troubled".



Re: clustered MD

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 12:05:33PM -0500, David Teigland wrote:
> Separate bitmaps for each node sounds like a much better design than the
> cmirror design which used a single shared bitmap (I argued for using a
> single bitmap when cmirror was being designed.)

Sorry misspoke, I argued for one bitmap per node, like you're doing, so in
general I think you're starting off in a much better direction than I saw
before.  (I still doubt there's enough value in this to do it at all,
which is another reason I'm particularly interested to see some real world
success with this.)



Re: clustered MD

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 11:23:25AM -0500, Goldwyn Rodrigues wrote:
> To start with, the goal of (basic) MD RAID1 is to keep the two
> mirrored devices consistent _all_ of the time. In case of a device
> failure, it should degrade the array pointing to the failed device,
> so it can be (hot)removed/replaced. Now, take the same concepts to
> multiple nodes using the same MD-RAID1 device..

"multiple nodes using the same MD-RAID1 device" concurrently!?  That's a
crucial piece of information that really frames the entire topic.  That needs
to be your very first point defining the purpose of this.

How would you use the same MD-RAID1 device concurrently on multiple nodes
without a cluster file system?  Does this imply that your work is only
useful for the tiny segment of people who could use MD-RAID1 under a
cluster file system?  There was a previous implementation of this in user
space called "cmirror", built on dm, which turned out to be quite useless,
and is being deprecated.  Did you talk to cluster file system developers
and users to find out if this is worth doing?  Or are you just hoping it
turns out to be worthwhile?  That might be answered by examples of
successful real world usage that I asked about.  We don't want to be tied
down with long term maintenance of something that isn't worth it.


> >What's different about disks being on SAN that breaks data consistency vs
> >disks being locally attached?  Where did the dlm come into the picture?
> 
> There are multiple nodes using the same shared device. Different
> nodes would be writing their own data to the shared device possibly
> using a shared filesystem such as ocfs2 on top of it. Each node
> maintains a bitmap to co-ordinate syncs between the two devices of
> the RAID. Since there are two devices, writes on the two devices can
> end at different times and must be co-ordinated.

Thank you, this is the kind of technical detail that I'm looking for.
Separate bitmaps for each node sounds like a much better design than the
cmirror design which used a single shared bitmap (I argued for using a
single bitmap when cmirror was being designed.)

Given that the cluster file system does locking to prevent concurrent
writes to the same blocks, you shouldn't need any locking in raid1 for
that.  Could you elaborate on exactly when inter-node locking is needed,
i.e. what specific steps need to be coordinated?


> >>Device failure can be partial. Say, only node 1 sees that one of the
> >>devices has failed (link break).  You need to "tell" other nodes not
> >>to use the device and that the array is degraded.
> >
> >Why?
> 
> Data consistency. Because the node which continues to "see" the
> failed device (on another node) as working will read stale data.

I still don't understand, but I suspect this will become clear from other
examples.


> Different nodes will be writing to different
> blocks. So, if a node fails, you need to make sure that what the
> other node has not synced between the two devices is completed by
> the one performing recovery. You need to provide a consistent view
> to all nodes.

This is getting closer to the kind of detail we need, but it's not quite
there yet.  I think a full-blown example is probably required, e.g. in
terms of specific reads and writes:

1. node1 writes to block X
2. node2 ...


> Also, may I point you to linux/Documentation/md-cluster.txt?

That looks like it will be very helpful when I get to the point of
reviewing the implementation.



Re: clustered MD

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 10:27:27AM -0500, Goldwyn Rodrigues wrote:
> I thought I answered that:
> To use a software RAID1 across multiple nodes of a cluster. Let me
> explain in more words..
> 
> In a cluster with multiple nodes with a shared storage, such as a
> SAN. The shared device becomes a single point of failure.

OK, shared storage, that's an important starting point that was never
clear.

> If the
> device loses power, you will lose everything. A solution proposed is
> to use software RAID, say with two SAN switches with different
> devices and create a RAID1 on it. So if you lose power on one switch
> or one of the devices fails, the other is still available. Once you
> get the other switch/device back up, it would resync the devices.

OK, MD RAID1 on shared disks.

> >, and exactly
> >what breaks when you use raid1 in that way?  Once we've established the
> >technical problem, then I can fairly evaluate your solution for it.
> 
> Data consistency breaks. If node 1 is writing to the RAID1 device,
> you have to make sure the data between the two RAID devices is
> consistent. With software raid, this is performed with bitmaps. The
> DLM is used to maintain data consistency.

What's different about disks being on SAN that breaks data consistency vs
disks being locally attached?  Where did the dlm come into the picture?

> Device failure can be partial. Say, only node 1 sees that one of the
> devices has failed (link break).  You need to "tell" other nodes not
> to use the device and that the array is degraded.

Why?

> In case of node failure, the blocks of the failed nodes must be
> synced before the cluster can continue operation.

What do cluster/node failures have to do with syncing mirror copies?

> Does that explain the situation?

No.  I don't see what clusters have to do with MD RAID1 devices, they seem
like completely orthogonal concepts.



Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 11:10:44AM +0800, Guoqing Jiang wrote:
> The remove_from_waiters could  only be invoked after failed to
> create_message, right?
> Since send_message always returns 0, this patch doesn't touch anything
> about the failure
> path, and it also doesn't change the original semantic.

I'm not inclined to take any patches unless there's a problem identified.


Re: clustered MD

2015-06-10 Thread David Teigland
On Tue, Jun 09, 2015 at 10:33:08PM -0500, Goldwyn Rodrigues wrote:
> >>>some real world utility to warrant the potential maintenance effort.
> >>
> >>We do have a valid real world utility. It is to provide
> >>high-availability of RAID1 storage  over the cluster. The
> >>distributed locking is required only during cases of error and
> >>superblock updates and is not required during normal operations,
> >>which makes it fast enough for usual case scenarios.
> >
> >That's the theory, how much evidence do you have of that in practice?
> 
> We wanted to develop a solution which is lock free (or at least
> minimum) for the most common/frequent usage scenario. Also, we
> compared it with iozone on top of ocfs2 to find that it is very
> close to local device performance numbers. We compared it with cLVM
> mirroring to find it better as well. However, in the future we would
> want to use it with other RAID (10?) scenarios which is missing
> now.

OK, but that's the second time you've missed the question I asked about
examples of real world usage.  Given the early stage of development, I'm
supposing there is none, which also implies it's too early for merging.

> >>What are the doubts you have about it?
> >
> >Before I begin reviewing the implementation, I'd like to better understand
> >what it is about the existing raid1 that doesn't work correctly for what
> >you'd like to do with it, i.e. I don't know what the problem is.
> 
> David Lang has already responded: The idea is to use a RAID device
> (currently only level 1 mirroring is supported) with multiple nodes
> of the cluster.

That doesn't come close to answering the question: exactly how do you want
to use raid1 (I have no idea from the statements you've made), and exactly
what breaks when you use raid1 in that way?  Once we've established the
technical problem, then I can fairly evaluate your solution for it.

Isn't this process what staging is for?

Dave


Re: clustered MD

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 12:05:33PM -0500, David Teigland wrote:
> Separate bitmaps for each node sounds like a much better design than the
> cmirror design which used a single shared bitmap (I argued for using a
> single bitmap when cmirror was being designed.)

Sorry, I misspoke: I argued for one bitmap per node, like you're doing, so in
general I think you're starting off in a much better direction than I saw
before.  (I still doubt there's enough value in this to do it at all,
which is another reason I'm particularly interested to see some real world
success with this.)


Re: clustered MD

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 11:23:25AM -0500, Goldwyn Rodrigues wrote:
> To start with, the goal of (basic) MD RAID1 is to keep the two
> mirrored device consistent _all_ of the time. In case of a device
> failure, it should degrade the array pointing to the failed device,
> so it can be (hot)removed/replaced. Now, take the same concepts to
> multiple nodes using the same MD-RAID1 device..

Multiple nodes using the same MD-RAID1 device concurrently!?  That's a
crucial piece of information that really frames the entire topic.  That needs
to be your very first point defining the purpose of this.

How would you use the same MD-RAID1 device concurrently on multiple nodes
without a cluster file system?  Does this imply that your work is only
useful for the tiny segment of people who could use MD-RAID1 under a
cluster file system?  There was a previous implementation of this in user
space called cmirror, built on dm, which turned out to be quite useless,
and is being deprecated.  Did you talk to cluster file system developers
and users to find out if this is worth doing?  Or are you just hoping it
turns out to be worthwhile?  That might be answered by examples of
successful real world usage that I asked about.  We don't want to be tied
down with long term maintenance of something that isn't worth it.


> > What's different about disks being on SAN that breaks data consistency vs
> > disks being locally attached?  Where did the dlm come into the picture?
>
> There are multiple nodes using the same shared device. Different
> nodes would be writing their own data to the shared device possibly
> using a shared filesystem such as ocfs2 on top of it. Each node
> maintains a bitmap to co-ordinate syncs between the two devices of
> the RAID. Since there are two devices, writes on the two devices can
> end at different times and must be co-ordinated.

Thank you, this is the kind of technical detail that I'm looking for.
Separate bitmaps for each node sounds like a much better design than the
cmirror design which used a single shared bitmap (I argued for using a
single bitmap when cmirror was being designed.)

Given that the cluster file system does locking to prevent concurrent
writes to the same blocks, you shouldn't need any locking in raid1 for
that.  Could you elaborate on exactly when inter-node locking is needed,
i.e. what specific steps need to be coordinated?
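
The per-node bitmap idea described above can be pictured with a small
userspace model. This is only a sketch under assumptions, not md-cluster
code: the names node_bitmap, write_begin, write_end and chunk_dirty, and
the sizes NODES and CHUNKS, are all hypothetical. It shows why the normal
write path would need no inter-node locking if each node only ever touches
its own bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NODES  2    /* hypothetical cluster size */
#define CHUNKS 64   /* hypothetical number of bitmap chunks */

/* One write-intent bitmap per node. A set bit means "the two mirror
 * legs may differ in this chunk". Illustrative only. */
struct node_bitmap {
	uint64_t dirty;
};

static struct node_bitmap bitmaps[NODES];

/* A node marks its own bitmap before issuing the mirrored write. */
static void write_begin(int node, unsigned chunk)
{
	assert(node >= 0 && node < NODES && chunk < CHUNKS);
	bitmaps[node].dirty |= UINT64_C(1) << chunk;
}

/* The bit is cleared only once both legs have completed the write. */
static void write_end(int node, unsigned chunk, bool leg0_done, bool leg1_done)
{
	assert(node >= 0 && node < NODES && chunk < CHUNKS);
	if (leg0_done && leg1_done)
		bitmaps[node].dirty &= ~(UINT64_C(1) << chunk);
}

/* True while the chunk may still be out of sync between the legs. */
static bool chunk_dirty(int node, unsigned chunk)
{
	assert(node >= 0 && node < NODES && chunk < CHUNKS);
	return (bitmaps[node].dirty >> chunk) & 1;
}
```

In this model a write from node 0 never reads or modifies node 1's bitmap,
so whatever coordination remains would be confined to events such as
failures, which is the open question in the paragraph above.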


> >> Device failure can be partial. Say, only node 1 sees that one of the
> >> device has failed (link break).  You need to "tell" other nodes not
> >> to use the device and that the array is degraded.
> >
> > Why?
>
> Data consistency. Because the node which continues to see the
> failed device (on another node) as working will read stale data.

I still don't understand, but I suspect this will become clear from other
examples.


> Different nodes will be writing to different
> blocks. So, if a node fails, you need to make sure that what the
> other node has not synced between the two devices is completed by
> the one performing recovery. You need to provide a consistent view
> to all nodes.

This is getting closer to the kind of detail we need, but it's not quite
there yet.  I think a full-blown example is probably required, e.g. in
terms of specific reads and writes

1. node1 writes to block X
2. node2 ...
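
Separately from the step-by-step example requested above, the recovery
case in the quoted paragraph can be modelled in a few lines. This is a
hypothetical sketch, not the md-cluster implementation: leg0, leg1,
CHUNKS and recover_failed_node are invented names, and choosing leg0 as
the copy source is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNKS 8  /* hypothetical array size, in chunks */

/* Two mirror legs, modelled as arrays of chunk contents. */
static int leg0[CHUNKS], leg1[CHUNKS];

/* Recover on behalf of a failed node: for every chunk that node's
 * bitmap still marks dirty, copy leg0 to leg1 so both legs agree,
 * then clear the bit. Returns the number of chunks resynced. */
static int recover_failed_node(uint8_t *failed_bitmap)
{
	int resynced = 0;

	for (unsigned c = 0; c < CHUNKS; c++) {
		if (*failed_bitmap & (1u << c)) {
			leg1[c] = leg0[c];
			*failed_bitmap &= (uint8_t)~(1u << c);
			resynced++;
		}
	}
	return resynced;
}
```

The point of the model is only that a surviving node needs the failed
node's bitmap to know which chunks may differ between the legs; how that
bitmap is shared and locked is exactly what the thread is asking about.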


> Also, may I point you to linux/Documentation/md-cluster.txt?

That looks like it will be very helpful when I get to the point of
reviewing the implementation.


Re: clustered MD

2015-06-10 Thread David Teigland
On Thu, Jun 11, 2015 at 06:31:31AM +1000, Neil Brown wrote:
> What is your interest in this?  I'm always happy for open discussion and
> varied input, but it would help to know to what extent you are a stake
> holder?

Using the dlm correctly is non-trivial and should be reviewed.
If the dlm is misused, some part of that may fall in my lap, if
only so far as having to debug problems to distinguish between dlm
bugs or md-cluster bugs.  This has been learned the hard way.

I have yet to find time to look up the previous review discussion.
I will be more than happy if I find the dlm usage has already been
thoroughly reviewed.

> Also a slightly less adversarial tone would make me feel more
> comfortable, though maybe I'm misreading your intent.

You're probably misreading concerned.

The initial responses to my inquiry were severely lacking in any
substance, even dismissive, which raised concerned to troubled.


Re: clustered MD

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 04:07:44PM -0500, David Teigland wrote:
> > Also a slightly less adversarial tone would make me feel more
> > comfortable, though maybe I'm misreading your intent.
>
> You're probably misreading concerned.
>
> The initial responses to my inquiry were severely lacking in any
> substance, even dismissive, which raised concerned to troubled.

Reading those messages again I see what you mean, they don't sound very
nice, so sorry about that.  I'll repeat the one positive note, which is
that the brief things I've noticed make it look much better than the dm
approach from several years ago.


Re: clustered MD

2015-06-09 Thread David Teigland
On Tue, Jun 09, 2015 at 03:08:11PM -0500, Goldwyn Rodrigues wrote:
> Hi David,
> 
> On 06/09/2015 02:45 PM, David Teigland wrote:
> >On Tue, Jun 09, 2015 at 02:26:25PM -0500, Goldwyn Rodrigues wrote:
> >>On 06/09/2015 01:22 PM, David Teigland wrote:
> >>>I've just noticed the existence of clustered MD for the first time.
> >>>It is a major new user of the dlm, and I have some doubts about it.
> >>>When did this appear on the mailing list for review?
> >>
> >>It first appeared in December, 2014 on the RAID mailing list.
> >>http://marc.info/?l=linux-raid&m=141891941330336&w=2
> >
> >I don't read that mailing list.  Searching my archives of linux-kernel, it
> >has never been mentioned.  I can't even find an email for the md pull
> >request that included it.
> 
> Is this what you are looking for?
> http://marc.info/?l=linux-kernel&m=142976971510061&w=2

Yes, I guess gmail lost it, or put it in spam.

> >- "experimental" code for managing md/raid1 across a cluster using
> >  DLM.  Code is not ready for general use and triggers a WARNING if
> >  used.  However it is looking good and mostly done and having in
> >  mainline will help co-ordinate development.
> >
> >That falls far short of the bar for adding it to the kernel.  It not only
> >needs to work, it needs to be reviewed and justified, usually by showing
> 
> Why do you say it does not work?

It's just my abbreviation of that summary paragraph.

> It did go through its round of reviews on the RAID mailing list. I
> understand that you missed it because you are not subscribed to the raid
> mailing list.

I will look for that.

> >some real world utility to warrant the potential maintenance effort.
> 
> We do have a valid real world utility. It is to provide
> high-availability of RAID1 storage  over the cluster. The
> distributed locking is required only during cases of error and
> superblock updates and is not required during normal operations,
> which makes it fast enough for usual case scenarios.

That's the theory, how much evidence do you have of that in practice?

> What are the doubts you have about it?

Before I begin reviewing the implementation, I'd like to better understand
what it is about the existing raid1 that doesn't work correctly for what
you'd like to do with it, i.e. I don't know what the problem is.


Re: clustered MD

2015-06-09 Thread David Teigland
On Tue, Jun 09, 2015 at 02:26:25PM -0500, Goldwyn Rodrigues wrote:
> On 06/09/2015 01:22 PM, David Teigland wrote:
> >I've just noticed the existence of clustered MD for the first time.
> >It is a major new user of the dlm, and I have some doubts about it.
> >When did this appear on the mailing list for review?
> 
> It first appeared in December, 2014 on the RAID mailing list.
> http://marc.info/?l=linux-raid&m=141891941330336&w=2

I don't read that mailing list.  Searching my archives of linux-kernel, it
has never been mentioned.  I can't even find an email for the md pull
request that included it.

The merge commit states:

- "experimental" code for managing md/raid1 across a cluster using
 DLM.  Code is not ready for general use and triggers a WARNING if
 used.  However it is looking good and mostly done and having in
 mainline will help co-ordinate development.

That falls far short of the bar for adding it to the kernel.  It not only
needs to work, it needs to be reviewed and justified, usually by showing
some real world utility to warrant the potential maintenance effort.


clustered MD

2015-06-09 Thread David Teigland
I've just noticed the existence of clustered MD for the first time.
It is a major new user of the dlm, and I have some doubts about it.
When did this appear on the mailing list for review?
Dave


[GIT PULL] dlm updates for 3.19

2014-12-10 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-3.19

This set includes one feature, which allows locks that
have been orphaned to be reacquired.
Thanks,
Dave

David Teigland (1):
  dlm: adopt orphan locks

 fs/dlm/lock.c | 76 +--
 fs/dlm/lock.h |  3 ++
 fs/dlm/user.c | 13 +--
 include/uapi/linux/dlmconstants.h |  2 +-
 4 files changed, 89 insertions(+), 5 deletions(-)


Re: [RFA][PATCH 5/8] dlm: Remove seq_printf() return checks and use seq_has_overflowed()

2014-11-04 Thread David Teigland
On Tue, Nov 04, 2014 at 08:08:52AM -0500, Steven Rostedt wrote:
> On Wed, 29 Oct 2014 17:56:07 -0400
> Steven Rostedt <rost...@goodmis.org> wrote:
> 
> > From: Joe Perches <j...@perches.com>
> > 
> > [ REQUEST FOR ACKS ]
> 
> Can any of the DLM maintainers give me an Acked-by for this?

Looks ok,
Dave


[GIT PULL] dlm updates for 3.18

2014-10-17 Thread David Teigland
(v2: add cc lkml)

Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-3.18

This includes a single commit fixing a missing endian conversion.
Thanks,
Dave

Neale Ferguson (1):
  dlm: fix missing endian conversion of rcom_status flags

 fs/dlm/rcom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
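
For readers unfamiliar with this class of bug, here is a minimal
userspace sketch of what a missing endian conversion looks like. It is
not the patch itself: RSF_NEED_SLOTS, le32_decode and needs_slots are
hypothetical names, and the sketch assumes the flags field travels
little-endian on the wire.

```c
#include <assert.h>
#include <stdint.h>

#define RSF_NEED_SLOTS 0x00000001u  /* hypothetical flag bit for this sketch */

/* Decode a 32-bit little-endian wire field regardless of host byte order. */
static uint32_t le32_decode(const unsigned char b[4])
{
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Test the flag only after converting the wire value to host order.
 * Testing the raw bytes as a host integer would give the wrong answer
 * on a big-endian machine. */
static int needs_slots(const unsigned char wire_flags[4])
{
	return (le32_decode(wire_flags) & RSF_NEED_SLOTS) != 0;
}
```

On a little-endian host the unconverted test happens to work, which is
how this kind of bug tends to go unnoticed until a big-endian node joins.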


Re: [PATCH 9/9] fs: dlm: lockd: Convert int result to unsigned char type

2014-07-23 Thread David Teigland
On Wed, Jul 23, 2014 at 02:11:39PM -0400, Jeff Layton wrote:
> On Sun, 20 Jul 2014 11:23:43 -0700
> Joe Perches <j...@perches.com> wrote:
> 
> > op->info.rv is an s32, but it's only used as a u8.
> > 
> 
> I don't understand this patch. info.rv is s32 (and I assume that "rv"
> stands for "return value"). What I don't get is why you think it's just
> used as a u8. It seems to be used more like a bool than anything else,

Thank you, Jeff.

/* info.rv from userspace is 1 for conflict, 0 for no-conflict,
   -ENOENT if there are no locks on the file */

rv = op->info.rv;
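
To make the comment above concrete, here is a small userspace sketch of
what narrowing the value to a u8 would do. The helper names classify_rv
and narrow_to_u8 and the enum are hypothetical; treating any positive
value as "conflict" is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical classification of the three cases named in the comment. */
enum get_result { NO_CONFLICT, CONFLICT, NO_LOCKS, OTHER_ERROR };

static enum get_result classify_rv(int32_t rv)
{
	if (rv == -ENOENT)
		return NO_LOCKS;      /* no locks on the file */
	if (rv == 0)
		return NO_CONFLICT;
	if (rv > 0)
		return CONFLICT;
	return OTHER_ERROR;
}

/* What an unsigned char would hold after the proposed narrowing. */
static uint8_t narrow_to_u8(int32_t rv)
{
	return (uint8_t)rv;
}
```

With the signed 32-bit value, -ENOENT is recognisable. After narrowing,
it becomes a small positive number and would be classified as a conflict,
so the value is not "only used as a u8".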

> and I'm not sure that "type" is really a good description for it. Maybe
> it should be a "bool" and named "conflict", given the comments in
> dlm_posix_get ?

type is not a good name.

Sorry Joe, I'm not a fan of your patches.

Dave


Re: [Cluster-devel] [RFC PATCH] dlm: Remove unused conf from lm_grant

2014-07-01 Thread David Teigland
On Tue, Jul 01, 2014 at 01:16:32PM -0400, Bob Peterson wrote:
> - Original Message -
> > On Tue, Jul 01, 2014 at 10:43:13AM -0400, Jeff Layton wrote:
> > > On Tue, 01 Jul 2014 06:20:10 -0700
> > > Joe Perches <j...@perches.com> wrote:
> > > 
> > > > While doing a bit of adding argument names to fs.h,
> > > > I looked at lm_grant and it seems the 2nd argument
> > > > is always NULL.
> > > > 
> > > > How about removing it?
> > > > 
> > > > This doesn't apply as it depends on some other patches
> > > > but it should be clear enough...
> > > > 
> > > 
> > > ACK on the general idea from my standpoint. Anything that simplifies
> > > the file locking interfaces is a good thing, particularly the deferred
> > > locking code.
> > 
> > Fine with me.  I'd be happy to remove all the deferred locking code from
> > dlm; it never really worked.

> GFS2 uses deferred locks, at the very least in its direct_io path
> (gfs2_direct_IO in aops.c). So AFAIK we can't remove THAT without a certain
> amount of pain. Steve is on vacation / holiday this week, but he will
> be back on Thursday and Friday (which is a holiday).

This is about deferred file locks from NFS, not gfs2's "deferred" lock mode.
Dave


Re: [RFC PATCH] dlm: Remove unused conf from lm_grant

2014-07-01 Thread David Teigland
On Tue, Jul 01, 2014 at 10:43:13AM -0400, Jeff Layton wrote:
> On Tue, 01 Jul 2014 06:20:10 -0700
> Joe Perches <j...@perches.com> wrote:
> 
> > While doing a bit of adding argument names to fs.h,
> > I looked at lm_grant and it seems the 2nd argument
> > is always NULL.
> > 
> > How about removing it?
> > 
> > This doesn't apply as it depends on some other patches
> > but it should be clear enough...
> > 
> 
> ACK on the general idea from my standpoint. Anything that simplifies
> the file locking interfaces is a good thing, particularly the deferred
> locking code.

Fine with me.  I'd be happy to remove all the deferred locking code from
dlm; it never really worked.

Dave


[GIT PULL] dlm updates for 3.16

2014-06-13 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-3.16

This set includes one small fix related to resending SCTP messages.
Thanks,
Dave

Lidong Zhong (1):
  dlm: keep listening connection alive with sctp mode

 fs/dlm/lowcomms.c | 5 +++++
 1 file changed, 5 insertions(+)



[GIT PULL] dlm updates for 3.15

2014-04-02 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-3.15

This set includes a couple trivial cleanups and changes recovery
log messages from DEBUG to INFO.
Thanks,
Dave

Dan Carpenter (1):
  dlm: silence a harmless use after free warning

David Teigland (1):
  dlm: use INFO for recovery messages

Rashika Kheria (1):
  fs: Include appropriate header file in dlm/ast.c

 fs/dlm/ast.c  |  3 ++-
 fs/dlm/dir.c  |  4 ++--
 fs/dlm/dlm_internal.h |  2 ++
 fs/dlm/lock.c |  7 ---
 fs/dlm/lockspace.c|  8 
 fs/dlm/member.c   | 27 ---
 fs/dlm/recover.c  | 10 +-
 fs/dlm/recoverd.c | 34 +-
 8 files changed, 48 insertions(+), 47 deletions(-)



[GIT PULL] dlm updates for 3.14

2014-01-21 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-3.14

This set includes a single change to speed up
recovery times when using SCTP connections.
Thanks,
Dave

Dongmao Zhang (1):
  dlm: set zero linger time on sctp socket

 fs/dlm/lowcomms.c | 8 ++++++++
 1 file changed, 8 insertions(+)



[GIT PULL] dlm updates for 3.13

2013-11-11 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-3.13

This set includes a single fix to resolve a race that could cause
lockspace shutdown to incorrectly return -EBUSY.
Thanks,
Dave

Bart Van Assche (1):
  dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSY

 fs/dlm/lockspace.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)



[GIT PULL] dlm updates for 3.12

2013-09-04 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-3.12

This set includes a workqueue cleanup and the removal
of incorrect and unneeded signal blocking.

The removal of signal blocking conflicts with Oleg's
201d3df dlm: kill the unnecessary and wrong device_close()->recalc_sigpending()

Let me know if you'd like me to send a resolved patch.
Thanks,
Dave

David Teigland (1):
  dlm: remove signal blocking

Tejun Heo (1):
  dlm: WQ_NON_REENTRANT is meaningless and going away

 fs/dlm/ast.c  |  5 +
 fs/dlm/user.c | 25 ++---
 2 files changed, 7 insertions(+), 23 deletions(-)



Re: linux-next: manual merge of the dlm tree with Linus' tree

2013-08-16 Thread David Teigland
On Fri, Aug 16, 2013 at 11:40:50AM +1000, Stephen Rothwell wrote:
> Hi David,
> 
> Today's linux-next merge of the dlm tree got a conflict in fs/dlm/user.c
> between commit 201d3dfa4da1 ("dlm: kill the unnecessary and wrong
> device_close()->recalc_sigpending()") from Linus' tree and commit
> c6ca7bc91d51 ("dlm: remove signal blocking") from the dlm tree.
> 
> I fixed it up (the latter is a superset of the former, so I just used it)
> and can carry the fix as necessary (no action is required).

Thanks, what's the procedure to get the right thing merged in the end?
Apply Oleg's patch to my tree followed by my own?



Re: [PATCH 1/1] dlm: kill the unnecessary and wrong device_close()->recalc_sigpending()

2013-08-12 Thread David Teigland
On Fri, Aug 09, 2013 at 05:19:13PM +0200, Oleg Nesterov wrote:
> device_close()->recalc_sigpending() is not needed, sigprocmask()
> takes care of TIF_SIGPENDING correctly.
> 
> And without ->siglock it is racy and wrong, it can wrongly clear
> TIF_SIGPENDING and miss a signal.
> 
> But even with this patch device_close() is still buggy:
> 
>   1. sigprocmask() should not be used, we have set_task_blocked(),
>  but this is minor.
> 
>   2. We should never block SIGKILL or SIGSTOP, and this is what
>  the code tries to do.
> 
>   3. This can't protect against SIGKILL or SIGSTOP anyway. Another
>  thread can do signal_wake_up(), say, do_signal_stop() or
>  complete_signal() or debugger.
> 
>   4. sigprocmask(SIG_BLOCK, allsigs) doesn't necessarily clear
>      TIF_SIGPENDING, say, freezing() or ->jobctl.
> 
>   5. device_write() looks equally wrong for the same reason.
> 
> Looks like, this tries to protect some wait_event_interruptible() logic
> from signals, it should be turned into uninterruptible wait. Or we need
> to implement something like signals_stop/start for such a use-case.

I can't remember why that signal code exists, or if I ever knew; it was
there when the code was added seven years ago.  I agree that if there's
something we cannot interrupt, we should use uninterruptible, but I don't
see any cases of that either.  I think we should just remove it all
(untested):

From: David Teigland 
Date: Mon, 12 Aug 2013 15:22:43 -0500
Subject: [PATCH] dlm: remove signal blocking

The signal blocking was incorrect and unnecessary
so just remove it.

Signed-off-by: David Teigland 
---
 fs/dlm/user.c | 25 ++---
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/fs/dlm/user.c b/fs/dlm/user.c
index 911649a..142e216 100644
--- a/fs/dlm/user.c
+++ b/fs/dlm/user.c
@@ -493,7 +493,6 @@ static ssize_t device_write(struct file *file, const char __user *buf,
 {
 	struct dlm_user_proc *proc = file->private_data;
 	struct dlm_write_request *kbuf;
-	sigset_t tmpsig, allsigs;
 	int error;
 
 #ifdef CONFIG_COMPAT
@@ -557,9 +556,6 @@ static ssize_t device_write(struct file *file, const char __user *buf,
 		goto out_free;
 	}
 
-	sigfillset(&allsigs);
-	sigprocmask(SIG_BLOCK, &allsigs, &tmpsig);
-
 	error = -EINVAL;
 
 	switch (kbuf->cmd)
@@ -567,7 +563,7 @@ static ssize_t device_write(struct file *file, const char __user *buf,
 	case DLM_USER_LOCK:
 		if (!proc) {
 			log_print("no locking on control device");
-			goto out_sig;
+			goto out_free;
 		}
 		error = device_user_lock(proc, &kbuf->i.lock);
 		break;
@@ -575,7 +571,7 @@ static ssize_t device_write(struct file *file, const char __user *buf,
 	case DLM_USER_UNLOCK:
 		if (!proc) {
 			log_print("no locking on control device");
-			goto out_sig;
+			goto out_free;
 		}
 		error = device_user_unlock(proc, &kbuf->i.lock);
 		break;
@@ -583,7 +579,7 @@ static ssize_t device_write(struct file *file, const char __user *buf,
 	case DLM_USER_DEADLOCK:
 		if (!proc) {
 			log_print("no locking on control device");
-			goto out_sig;
+			goto out_free;
 		}
 		error = device_user_deadlock(proc, &kbuf->i.lock);
 		break;
@@ -591,7 +587,7 @@ static ssize_t device_write(struct file *file, const char __user *buf,
 	case DLM_USER_CREATE_LOCKSPACE:
 		if (proc) {
 			log_print("create/remove only on control device");
-			goto out_sig;
+			goto out_free;
 		}
 		error = device_create_lockspace(&kbuf->i.lspace);
 		break;
@@ -599,7 +595,7 @@ static ssize_t device_write(struct file *file, const char __user *buf,
 	case DLM_USER_REMOVE_LOCKSPACE:
 		if (proc) {
 			log_print("create/remove only on control device");
-			goto out_sig;
+			goto out_free;
 		}
 		error = device_remove_lockspace(&kbuf->i.lspace);
 		break;
@@ -607,7 +603,7 @@ static ssize_t device_write(struct file *file, const char __user *buf,
 	case DLM_USER_PURGE:
 		if (!proc) {
 			log_print("no locking on control device");
-			goto out_sig;
+			goto out_free;
 		}
 		error = device_user_purge(proc, &kbuf->i.purge);
 		break;
@@ -617,8 +613,6 @@ static ssize_t device_write(struct file *file, const char __user *buf,
 			  kbuf->cmd);
 	}
 
- out_sig


[GIT PULL] dlm updates for 3.11

2013-07-01 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-3.11

This set includes a number of SCTP related fixes in the dlm,
and a few other minor fixes and changes.

Thanks,
Dave


Bart Van Assche (1):
  dlm: Avoid LVB truncation

David Teigland (1):
  dlm: log an error for unmanaged lockspaces

Mike Christie (6):
  dlm: clear correct init bit during sctp setup
  dlm: set sctp assoc id during setup
  dlm: clear correct bit during sctp init failure handling
  dlm: try other IPs when sctp init assoc fails
  dlm: retry failed SCTP sends
  dlm: disable nagle for SCTP

Wei Yongjun (1):
  dlm: remove duplicated include from lowcomms.c

Zhao Hongjiang (1):
  dlm: config: using strlcpy instead of strncpy


 fs/dlm/config.c|   5 +-
 fs/dlm/lock.c  |   8 +--
 fs/dlm/lockspace.c |   9 ++-
 fs/dlm/lowcomms.c  | 177 -
 4 files changed, 149 insertions(+), 50 deletions(-)


Re: [PATCH 03/10] idr: Rewrite ida

2013-06-21 Thread David Teigland
On Wed, Jun 19, 2013 at 04:38:36PM -0700, Kent Overstreet wrote:
> On Wed, Jun 19, 2013 at 10:40:22AM +0100, Steven Whitehouse wrote:
> > Millions of IDs is something that is fairly normal for DLM, since there
> > will be two DLM locks per cached inode with GFS2 and people tend to use
> > it on pretty large servers with lots of memory,
> 
> Thanks, I wasn't aware of that. Is the 31 bits for the id limitation an
> issue for you? While I'm at it, changing ids to longs should be fairly
> trivial.

There is a dlm_lkb struct in memory for each id, so 31 bits will not
be a problem.
Dave
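
The reasoning in the reply can be made concrete with some arithmetic. This is
a rough sketch only: the thread does not give the size of struct dlm_lkb, so
the 128 bytes used in the usage note is a made-up figure, and the helper name
bytes_for_full_id_space() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Memory needed to hold one in-memory record for every ID in an ID space
 * that is `bits` wide (bits must be below 64).  `bytes_per_record` stands
 * in for the per-ID structure size, which is an assumption here.
 */
static uint64_t bytes_for_full_id_space(unsigned bits, uint64_t bytes_per_record)
{
    return ((uint64_t)1 << bits) * bytes_per_record;
}
```

With an assumed 128 bytes per record, bytes_for_full_id_space(31, 128) comes
to 2^38 bytes, or 256 GiB, for the lock structures alone, so memory would run
out well before a 31-bit ID space does.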



Re: linux-next: Tree for May 8 (dlm)

2013-05-09 Thread David Teigland
On Thu, May 09, 2013 at 09:47:45AM +1000, Stephen Rothwell wrote:
> [Just forwarding to David ...]
> 
> On Wed, 08 May 2013 11:04:45 -0700 Randy Dunlap wrote:
> >
> > on x86_64:
> > 
> > when CONFIG_GFS2_FS_LOCKING_DLM=y and CONFIG_DLM=m:
> > 
> > fs/built-in.o: In function `gfs2_lock':
> > file.c:(.text+0xa512c): undefined reference to `dlm_posix_get'
> > file.c:(.text+0xa5140): undefined reference to `dlm_posix_unlock'
> > file.c:(.text+0xa514a): undefined reference to `dlm_posix_lock'

gfs2/file.c calls the dlm directly, so I suppose gfs2 itself needs
to depend on the dlm.  It's been like this for a long time, so I
don't know why it only appeared now.

> > fs/built-in.o: In function `gdlm_cancel':
> > lock_dlm.c:(.text+0xb3f57): undefined reference to `dlm_unlock'
> > fs/built-in.o: In function `gdlm_unmount':
> > lock_dlm.c:(.text+0xb40ff): undefined reference to `dlm_release_lockspace'
> > fs/built-in.o: In function `sync_unlock.isra.4':
> > lock_dlm.c:(.text+0xb420d): undefined reference to `dlm_unlock'
> > fs/built-in.o: In function `sync_lock.isra.5':
> > lock_dlm.c:(.text+0xb42d9): undefined reference to `dlm_lock'
> > fs/built-in.o: In function `gdlm_put_lock':
> > lock_dlm.c:(.text+0xb45e7): undefined reference to `dlm_unlock'
> > fs/built-in.o: In function `gdlm_mount':
> > lock_dlm.c:(.text+0xb4928): undefined reference to `dlm_new_lockspace'
> > lock_dlm.c:(.text+0xb4c75): undefined reference to `dlm_release_lockspace'
> > fs/built-in.o: In function `gdlm_lock':
> > lock_dlm.c:(.text+0xb529f): undefined reference to `dlm_lock'

lock_dlm.c is GFS2_FS_LOCKING_DLM which depends on DLM.
Is that not correct?

Dave
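
The reported combination (GFS2_FS_LOCKING_DLM=y with DLM=m) is consistent
with how Kconfig limits symbol values. The following is a rough model in C,
under the assumption (not stated in this thread) that GFS2_FS_LOCKING_DLM is
a bool option: a bool that depends on a tristate set to m can still be set
to y, while a tristate that depends on it is capped at m. All names below are
hypothetical and only model the rule; they are not part of Kconfig or the
kernel.

```c
#include <assert.h>

/* Kconfig tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Highest value a tristate symbol can take with a single "depends on DEP". */
static enum tristate tristate_limit(enum tristate dep)
{
    return dep;
}

/* Highest value a bool symbol can take: any non-n dependency allows y. */
static enum tristate bool_limit(enum tristate dep)
{
    return dep == TRI_N ? TRI_N : TRI_Y;
}

/*
 * Built-in code (y) can only resolve symbols from built-in providers;
 * modular code (m) can use either a module or a built-in provider.
 */
static int links_ok(enum tristate user, enum tristate provider)
{
    if (user == TRI_Y)
        return provider == TRI_Y;
    if (user == TRI_M)
        return provider != TRI_N;
    return 1; /* user is not built at all */
}
```

In this model, bool_limit(TRI_M) is TRI_Y and links_ok(TRI_Y, TRI_M) is 0,
which matches the undefined references reported above; a tristate user capped
by tristate_limit(TRI_M) links cleanly against a modular provider.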


