Re: Lease semantic proposal

2019-10-02 Thread Jeff Layton
On Wed, 2019-10-02 at 15:27 -0400, J. Bruce Fields wrote:
> On Wed, Oct 02, 2019 at 08:28:40AM -0400, Jeff Layton wrote:
> > On Tue, 2019-10-01 at 11:17 -0700, Ira Weiny wrote:
> > > On Mon, Sep 23, 2019 at 04:17:59PM -0400, Jeff Layton wrote:
> > > > On Mon, 2019-09-23 at 12:08 -0700, Ira Weiny wrote:
> > > > > Since the last RFC patch set[1] much of the discussion of supporting 
> > > > > RDMA with
> > > > > FS DAX has been around the semantics of the lease mechanism.[2]  
> > > > > Within that
> > > > > thread it was suggested I try and write some documentation and/or 
> > > > > tests for the
> > > > > new mechanism being proposed.  I have created a foundation to test 
> > > > > lease
> > > > > functionality within xfstests.[3] This should be close to being 
> > > > > accepted.
> > > > > Before writing additional lease tests, or changing lots of kernel 
> > > > > code, this
> > > > > email presents documentation for the new proposed "layout lease" 
> > > > > semantic.
> > > > > 
> > > > > At Linux Plumbers[4] just over a week ago, I presented the current 
> > > > > state of the
> > > > > patch set and the outstanding issues.  Based on the discussion there, 
> > > > > well as
> > > > > follow up emails, I propose the following addition to the fcntl() man 
> > > > > page.
> > > > > 
> > > > > Thank you,
> > > > > Ira
> > > > > 
> > > > > [1] https://lkml.org/lkml/2019/8/9/1043
> > > > > [2] https://lkml.org/lkml/2019/8/9/1062
> > > > > [3] https://www.spinics.net/lists/fstests/msg12620.html
> > > > > [4] https://linuxplumbersconf.org/event/4/contributions/368/
> > > > > 
> > > > > 
> > > > 
> > > > Thank you so much for doing this, Ira. This allows us to debate the
> > > > user-visible behavior semantics without getting bogged down in the
> > > > implementation details. More comments below:
> > > 
> > > Thanks.  Sorry for the delay in response.  Turns out this email was in my
> > > spam...  :-/  I'll need to work out why.
> > > 
> > > > > 
> > > > > Layout Leases
> > > > > -
> > > > > 
> > > > > Layout (F_LAYOUT) leases are special leases which can be used to 
> > > > > control and/or
> > > > > be informed about the manipulation of the underlying layout of a file.
> > > > > 
> > > > > A layout is defined as the logical file block -> physical file block 
> > > > > mapping
> > > > > including the file size and sharing of physical blocks among files.  
> > > > > Note that
> > > > > the unwritten state of a block is not considered part of file layout.
> > > > > 
> > > > > **Read layout lease F_RDLCK | F_LAYOUT**
> > > > > 
> > > > > Read layout leases can be used to be informed of layout changes by the
> > > > > system or other users.  This lease is similar to the standard read 
> > > > > (F_RDLCK)
> > > > > lease in that any attempt to change the _layout_ of the file will be 
> > > > > reported to
> > > > > the process through the lease break process.  But this lease is 
> > > > > different
> > > > > because the file can be opened for write and data can be read and/or 
> > > > > written to
> > > > > the file as long as the underlying layout of the file does not change.
> > > > > Therefore, the lease is not broken if the file is simply open for 
> > > > > write, but
> > > > > _may_ be broken if an operation such as, truncate(), fallocate() or 
> > > > > write()
> > > > > results in changing the underlying layout.
> > > > > 
> > > > > **Write layout lease (F_WRLCK | F_LAYOUT)**
> > > > > 
> > > > > Write Layout leases can be used to break read layout leases to 
> > > > > indicate that
> > > > > the process intends to change the underlying layout lease of the file.
> > > > > 
> > > > > A process which has taken a write layout lease has exclusive 
> > > > > ownership of the
> > > > > file layout and can modify that 

Re: Lease semantic proposal

2019-10-02 Thread Jeff Layton
On Tue, 2019-10-01 at 11:17 -0700, Ira Weiny wrote:
> On Mon, Sep 23, 2019 at 04:17:59PM -0400, Jeff Layton wrote:
> > On Mon, 2019-09-23 at 12:08 -0700, Ira Weiny wrote:
> > > Since the last RFC patch set[1] much of the discussion of supporting RDMA 
> > > with
> > > FS DAX has been around the semantics of the lease mechanism.[2]  Within 
> > > that
> > > thread it was suggested I try and write some documentation and/or tests 
> > > for the
> > > new mechanism being proposed.  I have created a foundation to test lease
> > > functionality within xfstests.[3] This should be close to being accepted.
> > > Before writing additional lease tests, or changing lots of kernel code, 
> > > this
> > > email presents documentation for the new proposed "layout lease" semantic.
> > > 
> > > At Linux Plumbers[4] just over a week ago, I presented the current state 
> > > of the
> > > patch set and the outstanding issues.  Based on the discussion there, 
> > > well as
> > > follow up emails, I propose the following addition to the fcntl() man 
> > > page.
> > > 
> > > Thank you,
> > > Ira
> > > 
> > > [1] https://lkml.org/lkml/2019/8/9/1043
> > > [2] https://lkml.org/lkml/2019/8/9/1062
> > > [3] https://www.spinics.net/lists/fstests/msg12620.html
> > > [4] https://linuxplumbersconf.org/event/4/contributions/368/
> > > 
> > > 
> > 
> > Thank you so much for doing this, Ira. This allows us to debate the
> > user-visible behavior semantics without getting bogged down in the
> > implementation details. More comments below:
> 
> Thanks.  Sorry for the delay in response.  Turns out this email was in my
> spam...  :-/  I'll need to work out why.
> 
> > > 
> > > Layout Leases
> > > -
> > > 
> > > Layout (F_LAYOUT) leases are special leases which can be used to control 
> > > and/or
> > > be informed about the manipulation of the underlying layout of a file.
> > > 
> > > A layout is defined as the logical file block -> physical file block 
> > > mapping
> > > including the file size and sharing of physical blocks among files.  Note 
> > > that
> > > the unwritten state of a block is not considered part of file layout.
> > > 
> > > **Read layout lease F_RDLCK | F_LAYOUT**
> > > 
> > > Read layout leases can be used to be informed of layout changes by the
> > > system or other users.  This lease is similar to the standard read 
> > > (F_RDLCK)
> > > lease in that any attempt to change the _layout_ of the file will be 
> > > reported to
> > > the process through the lease break process.  But this lease is different
> > > because the file can be opened for write and data can be read and/or 
> > > written to
> > > the file as long as the underlying layout of the file does not change.
> > > Therefore, the lease is not broken if the file is simply open for write, 
> > > but
> > > _may_ be broken if an operation such as, truncate(), fallocate() or 
> > > write()
> > > results in changing the underlying layout.
> > > 
> > > **Write layout lease (F_WRLCK | F_LAYOUT)**
> > > 
> > > Write Layout leases can be used to break read layout leases to indicate 
> > > that
> > > the process intends to change the underlying layout lease of the file.
> > > 
> > > A process which has taken a write layout lease has exclusive ownership of 
> > > the
> > > file layout and can modify that layout as long as the lease is held.
> > > Operations which change the layout are allowed by that process.  But 
> > > operations
> > > from other file descriptors which attempt to change the layout will break 
> > > the
> > > lease through the standard lease break process.  The F_LAYOUT flag is 
> > > used to
> > > indicate a difference between a regular F_WRLCK and F_WRLCK with 
> > > F_LAYOUT.  In
> > > the F_LAYOUT case opens for write do not break the lease.  But some 
> > > operations,
> > > if they change the underlying layout, may.
> > > 
> > > The distinction between read layout leases and write layout leases is that
> > > write layout leases can change the layout without breaking the lease 
> > > within the
> > > owning process.  This is useful to guarantee a layout prior to specifying 
> > > the
> > > unbreakable flag described bel

Re: Lease semantic proposal

2019-09-26 Thread Jeff Layton
 > multiple unbreakable read layout leases can be taken by multiple 
> > > processes and
> > > used to pin the underlying pages of that file.
> > 
> > Ok, so what happens when someone now takes a
> > F_WRLOCK|F_LAYOUT|F_UNBREAK? Is that supposed to break
> > F_RDLCK|F_LAYOUT|F_UNBREAK, as the wording about F_WRLCK behaviour
> > implies it should?
> 
> Ah no.  F_RDLCK|F_LAYOUT|F_UNBREAK could not be broken.  I'll have to update
> the text for this.  The idea here is that no one can be changing the layout 
> but
> multiple readers could be using that layout.  So I'll update the text that a
> F_WRLCK|F_LAYOUT|F_UNBREAK would fail in this case.
> 
> > > Care must therefore be taken to ensure that the layout of the file is as 
> > > the
> > > user wants prior to using the unbreakable read layout lease.  A safe 
> > > mechanism
> > > to do this would be to take a write layout lease and use fallocate() to 
> > > set the
> > > layout of the file.  The layout lease can then be "downgraded" to 
> > > unbreakable
> > > read layout as long as no other user broke the write layout lease.
> > 
> > What are the semantics of this "downgrade" behaviour you speak of? :)
> 
> As I said above it may be a downgrade or an upgrade but the idea is to
> atomically convert the lease to F_UNBREAK.
> 
> > My thoughts are:
> > - RDLCK can only be used for O_RDONLY because write()
> >   requires breaking of leases
> 
> Does the file system require write() break the layout lease?  Or is that just
> the way the file system is currently implemented?  What about mmap()?  I need
> to have the file open WR to mmap() the file for RDMA.
> 
> To be clear I'm intending F_RDLCK|F_LAYOUT to be something new.  As I said
> above we could use something like F_RDLAYOUT instead?
> 
> > - WRLCK is open to abuse simply by not using a layout lease
> >   to do a "no change" layout modification
> 
> I'm sorry, I don't understand this comment.
> 
> > - RDLCK|F_UNBREAK is entirely unusable
> 
> Well even if write() fails with ETXTBSY this should give us the ability to do
> RDMA and XDP to these areas from multiple processes.  Furthermore, for FS DAX
> which bypasses the page cache mmap'ed areas can be written without write() 
> with
> CPU stores.  Which is how many RDMA applications are likely to write this 
> data.
> 
> > - WRLCK|F_UNBREAK will be what every application uses
> >   because everything else either doesn't work or is too easy
> >   to abuse.
> 
> Maybe.  IMO that is still a step in the right direction as at least 1 process
> can use this now.  And these semantics allow for a shared unbreakable lease
> (F_RDLCK|F_LAYOUT|F_UNBREAK) which can be used with some configurations (FS 
> DAX
> in particular).
> 
> Also, I do think we will need something to ensure file ownership for 
> F_UNBREAK.
> 
> It sounds like the difficulty here is in potential implementation of allowing
> write() to not break layouts.  And dealing with writes to mmap'ed page cache
> pages.  The file system is free to do better later.
> 

Whatever the semantics, from an API standpoint, I think we should not
try to multiplex layout behavior on top of F_SETLEASE. Layouts should
get new fcntl cmd values (maybe F_SETLAYOUT/F_GETLAYOUT) and then you
wouldn't need to expose the F_LAYOUT flag to userland.

The behavior of layouts is different enough from "traditional" leases
that trying to mash them together like this is only going to cause
confusion.

Note that I'm mainly concerned with the user-facing API with this. The
kernel internals can still use F_LAYOUT flag, and you can do the
F_SETLAYOUT -> F_LAYOUT translation at a high level.
-- 
Jeff Layton 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: Lease semantic proposal

2019-09-23 Thread Jeff Layton
 layout leases can have the unbreakable flag (F_UNBREAK)
> specified.  The difference between an unbreakable read layout lease and an
> unbreakable write layout lease are that an unbreakable read layout lease is
> _not_ exclusive.  This means that once a layout is established on a file,
> multiple unbreakable read layout leases can be taken by multiple processes and
> used to pin the underlying pages of that file.
> 
> Care must therefore be taken to ensure that the layout of the file is as the
> user wants prior to using the unbreakable read layout lease.  A safe mechanism
> to do this would be to take a write layout lease and use fallocate() to set 
> the
> layout of the file.  The layout lease can then be "downgraded" to unbreakable
> read layout as long as no other user broke the write layout lease.
> 

Will userland require any special privileges in order to set an
F_UNBREAK lease? This seems like something that could be used for DoS. I
assume that these will never time out.

How will we deal with the case where something is is squatting on an
F_UNBREAK lease and isn't letting it go?

Leases are technically "owned" by the file description -- we can't
necessarily trace it back to a single task in a threaded program. The
kernel task that set the lease may have exited by the time we go
looking.

Will we be content trying to determine this using /proc/locks+lsof, etc,
or will we need something better?

> 

-- 
Jeff Layton 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [RFC PATCH v2 02/19] fs/locks: Add Exclusive flag to user Layout lease

2019-09-04 Thread Jeff Layton
On Thu, 2019-08-29 at 16:34 -0700, Ira Weiny wrote:
> Missed this.  sorry.
> 
> On Mon, Aug 26, 2019 at 06:41:07AM -0400, Jeff Layton wrote:
> > On Thu, 2019-08-15 at 07:56 +1000, Dave Chinner wrote:
> > > On Wed, Aug 14, 2019 at 10:15:06AM -0400, Jeff Layton wrote:
> > > > On Fri, 2019-08-09 at 15:58 -0700, ira.we...@intel.com wrote:
> > > > > From: Ira Weiny 
> > > > > 
> > > > > Add an exclusive lease flag which indicates that the layout mechanism
> > > > > can not be broken.
> > > > > 
> > > > > Exclusive layout leases allow the file system to know that pages may 
> > > > > be
> > > > > GUP pined and that attempts to change the layout, ie truncate, should 
> > > > > be
> > > > > failed.
> > > > > 
> > > > > A process which attempts to break it's own exclusive lease gets an
> > > > > EDEADLOCK return to help determine that this is likely a programming 
> > > > > bug
> > > > > vs someone else holding a resource.
> > > .
> > > > > diff --git a/include/uapi/asm-generic/fcntl.h 
> > > > > b/include/uapi/asm-generic/fcntl.h
> > > > > index baddd54f3031..88b175ceccbc 100644
> > > > > --- a/include/uapi/asm-generic/fcntl.h
> > > > > +++ b/include/uapi/asm-generic/fcntl.h
> > > > > @@ -176,6 +176,8 @@ struct f_owner_ex {
> > > > >  
> > > > >  #define F_LAYOUT 16  /* layout lease to allow longterm pins 
> > > > > such as
> > > > >  RDMA */
> > > > > +#define F_EXCLUSIVE  32  /* layout lease is exclusive */
> > > > > + /* FIXME or shoudl this be F_EXLCK??? */
> > > > >  
> > > > >  /* operations for bsd flock(), also used by the kernel 
> > > > > implementation */
> > > > >  #define LOCK_SH  1   /* shared lock */
> > > > 
> > > > This interface just seems weird to me. The existing F_*LCK values aren't
> > > > really set up to be flags, but are enumerated values (even if there are
> > > > some gaps on some arches). For instance, on parisc and sparc:
> > > 
> > > I don't think we need to worry about this - the F_WRLCK version of
> > > the layout lease should have these exclusive access semantics (i.e
> > > other ops fail rather than block waiting for lease recall) and hence
> > > the API shouldn't need a new flag to specify them.
> > > 
> > > i.e. the primary difference between F_RDLCK and F_WRLCK layout
> > > leases is that the F_RDLCK is a shared, co-operative lease model
> > > where only delays in operations will be seen, while F_WRLCK is a
> > > "guarantee exclusive access and I don't care what it breaks"
> > > model... :)
> > > 
> > 
> > Not exactly...
> > 
> > F_WRLCK and F_RDLCK leases can both be broken, and will eventually time
> > out if there is conflicting access. The F_EXCLUSIVE flag on the other
> > hand is there to prevent any sort of lease break from 
> 
> Right EXCLUSIVE will not break for any reason.  It will fail truncate and hole
> punch as we discussed back in June.  This is for the use case where the user
> has handed this file/pages off to some hardware for which removing the lease
> would be impossible.  _And_ we don't anticipate any valid use case that 
> someone
> will need to truncate short of killing the process to free up file system
> space.
> 
> > I'm guessing what Ira really wants with the F_EXCLUSIVE flag is
> > something akin to what happens when we set fl_break_time to 0 in the
> > nfsd code. nfsd never wants the locks code to time out a lease of any
> > sort, since it handles that timeout itself.
> > 
> > If you're going to add this functionality, it'd be good to also convert
> > knfsd to use it as well, so we don't end up with multiple ways to deal
> > with that situation.
> 
> Could you point me at the source for knfsd?  I looked in 
> 
> git://git.linux-nfs.org/projects/steved/nfs-utils.git
> 
> but I don't see anywhere leases are used in that source?
> 

Ahh sorry that wasn't clear. It's the fs/nfsd directory in the Linux
kernel sources. See nfsd4_layout_lm_break and nfsd_break_deleg_cb in
particular.

-- 
Jeff Layton 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [RFC PATCH v2 02/19] fs/locks: Add Exclusive flag to user Layout lease

2019-08-26 Thread Jeff Layton
On Thu, 2019-08-15 at 07:56 +1000, Dave Chinner wrote:
> On Wed, Aug 14, 2019 at 10:15:06AM -0400, Jeff Layton wrote:
> > On Fri, 2019-08-09 at 15:58 -0700, ira.we...@intel.com wrote:
> > > From: Ira Weiny 
> > > 
> > > Add an exclusive lease flag which indicates that the layout mechanism
> > > can not be broken.
> > > 
> > > Exclusive layout leases allow the file system to know that pages may be
> > > GUP pined and that attempts to change the layout, ie truncate, should be
> > > failed.
> > > 
> > > A process which attempts to break it's own exclusive lease gets an
> > > EDEADLOCK return to help determine that this is likely a programming bug
> > > vs someone else holding a resource.
> .
> > > diff --git a/include/uapi/asm-generic/fcntl.h 
> > > b/include/uapi/asm-generic/fcntl.h
> > > index baddd54f3031..88b175ceccbc 100644
> > > --- a/include/uapi/asm-generic/fcntl.h
> > > +++ b/include/uapi/asm-generic/fcntl.h
> > > @@ -176,6 +176,8 @@ struct f_owner_ex {
> > >  
> > >  #define F_LAYOUT 16  /* layout lease to allow longterm pins such as
> > >  RDMA */
> > > +#define F_EXCLUSIVE  32  /* layout lease is exclusive */
> > > + /* FIXME or shoudl this be F_EXLCK??? */
> > >  
> > >  /* operations for bsd flock(), also used by the kernel implementation */
> > >  #define LOCK_SH  1   /* shared lock */
> > 
> > This interface just seems weird to me. The existing F_*LCK values aren't
> > really set up to be flags, but are enumerated values (even if there are
> > some gaps on some arches). For instance, on parisc and sparc:
> 
> I don't think we need to worry about this - the F_WRLCK version of
> the layout lease should have these exclusive access semantics (i.e
> other ops fail rather than block waiting for lease recall) and hence
> the API shouldn't need a new flag to specify them.
> 
> i.e. the primary difference between F_RDLCK and F_WRLCK layout
> leases is that the F_RDLCK is a shared, co-operative lease model
> where only delays in operations will be seen, while F_WRLCK is a
> "guarantee exclusive access and I don't care what it breaks"
> model... :)
> 

Not exactly...

F_WRLCK and F_RDLCK leases can both be broken, and will eventually time
out if there is conflicting access. The F_EXCLUSIVE flag on the other
hand is there to prevent any sort of lease break from 

I'm guessing what Ira really wants with the F_EXCLUSIVE flag is
something akin to what happens when we set fl_break_time to 0 in the
nfsd code. nfsd never wants the locks code to time out a lease of any
sort, since it handles that timeout itself.

If you're going to add this functionality, it'd be good to also convert
knfsd to use it as well, so we don't end up with multiple ways to deal
with that situation.
-- 
Jeff Layton 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [RFC PATCH v2 02/19] fs/locks: Add Exclusive flag to user Layout lease

2019-08-14 Thread Jeff Layton
   03

While your new flag values are well above these values, it's still a bit
sketchy to do what you're proposing from a cross-platform interface
standpoint.

I think this would be a lot cleaner if you weren't overloading the
F_SETLEASE command with new flags, and instead added new
F_SETLAYOUT/F_GETLAYOUT cmd values.

You'd then be free to define a new set of "arg" values for use with
layouts, and there's be a clear distinction interface-wise between
setting a layout and a lease.

-- 
Jeff Layton 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [RFC PATCH v2 01/19] fs/locks: Export F_LAYOUT lease to user space

2019-08-14 Thread Jeff Layton
On Wed, 2019-08-14 at 18:05 +1000, Dave Chinner wrote:
> On Mon, Aug 12, 2019 at 10:36:26AM -0700, Ira Weiny wrote:
> > On Sat, Aug 10, 2019 at 09:52:31AM +1000, Dave Chinner wrote:
> > > On Fri, Aug 09, 2019 at 03:58:15PM -0700, ira.we...@intel.com wrote:
> > > > +   /*
> > > > +* NOTE on F_LAYOUT lease
> > > > +*
> > > > +* LAYOUT lease types are taken on files which the user knows 
> > > > that
> > > > +* they will be pinning in memory for some indeterminate amount 
> > > > of
> > > > +* time.
> > > 
> > > Indeed, layout leases have nothing to do with pinning of memory.
> > 
> > Yep, Fair enough.  I'll rework the comment.
> > 
> > > That's something an application taht uses layout leases might do,
> > > but it largely irrelevant to the functionality layout leases
> > > provide. What needs to be done here is explain what the layout lease
> > > API actually guarantees w.r.t. the physical file layout, not what
> > > some application is going to do with a lease. e.g.
> > > 
> > >   The layout lease F_RDLCK guarantees that the holder will be
> > >   notified that the physical file layout is about to be
> > >   changed, and that it needs to release any resources it has
> > >   over the range of this lease, drop the lease and then
> > >   request it again to wait for the kernel to finish whatever
> > >   it is doing on that range.
> > > 
> > >   The layout lease F_RDLCK also allows the holder to modify
> > >   the physical layout of the file. If an operation from the
> > >   lease holder occurs that would modify the layout, that lease
> > >   holder does not get notification that a change will occur,
> > >   but it will block until all other F_RDLCK leases have been
> > >   released by their holders before going ahead.
> > > 
> > >   If there is a F_WRLCK lease held on the file, then a F_RDLCK
> > >   holder will fail any operation that may modify the physical
> > >   layout of the file. F_WRLCK provides exclusive physical
> > >   modification access to the holder, guaranteeing nothing else
> > >   will change the layout of the file while it holds the lease.
> > > 
> > >   The F_WRLCK holder can change the physical layout of the
> > >   file if it so desires, this will block while F_RDLCK holders
> > >   are notified and release their leases before the
> > >   modification will take place.
> > > 
> > > We need to define the semantics we expose to userspace first.

Absolutely.

> > 
> > Agreed.  I believe I have implemented the semantics you describe above.  Do 
> > I
> > have your permission to use your verbiage as part of reworking the comment 
> > and
> > commit message?
> 
> Of course. :)
> 
> Cheers,
> 

I'll review this in more detail soon, but subsequent postings of the set
should probably also go to linux-api mailing list. This is a significant
API change. It might not also hurt to get the glibc folks involved here
too since you'll probably want to add the constants to the headers there
as well.

Finally, consider going ahead and drafting a patch to the fcntl(2)
manpage if you think you have the API mostly nailed down. This API is a
little counterintuitive (i.e. you can change the layout with an F_RDLCK
lease), so it will need to be very clearly documented. I've also found
that when creating a new API, documenting it tends to help highlight its
warts and areas where the behavior is not clearly defined.

-- 
Jeff Layton 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH RFC 01/10] fs/locks: Add trace_leases_conflict

2019-06-09 Thread Jeff Layton
On Wed, 2019-06-05 at 18:45 -0700, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> Signed-off-by: Ira Weiny 
> ---
>  fs/locks.c  | 20 ++-
>  include/trace/events/filelock.h | 35 +
>  2 files changed, 50 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index ec1e4a5df629..0cc2b9f30e22 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1534,11 +1534,21 @@ static void time_out_leases(struct inode *inode, 
> struct list_head *dispose)
>  
>  static bool leases_conflict(struct file_lock *lease, struct file_lock 
> *breaker)
>  {
> - if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT))
> - return false;
> - if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE))
> - return false;
> - return locks_conflict(breaker, lease);
> + bool rc;
> +
> + if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT)) {
> + rc = false;
> + goto trace;
> + }
> + if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE)) {
> + rc = false;
> + goto trace;
> + }
> +
> + rc = locks_conflict(breaker, lease);
> +trace:
> + trace_leases_conflict(rc, lease, breaker);
> + return rc;
>  }
>  
>  static bool
> diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h
> index fad7befa612d..4b735923f2ff 100644
> --- a/include/trace/events/filelock.h
> +++ b/include/trace/events/filelock.h
> @@ -203,6 +203,41 @@ TRACE_EVENT(generic_add_lease,
>   show_fl_type(__entry->fl_type))
>  );
>  
> +TRACE_EVENT(leases_conflict,
> + TP_PROTO(bool conflict, struct file_lock *lease, struct file_lock 
> *breaker),
> +
> + TP_ARGS(conflict, lease, breaker),
> +
> + TP_STRUCT__entry(
> + __field(void *, lease)
> + __field(void *, breaker)
> + __field(unsigned int, l_fl_flags)
> + __field(unsigned int, b_fl_flags)
> + __field(unsigned char, l_fl_type)
> + __field(unsigned char, b_fl_type)
> + __field(bool, conflict)
> + ),
> +
> + TP_fast_assign(
> + __entry->lease = lease;
> + __entry->l_fl_flags = lease->fl_flags;
> + __entry->l_fl_type = lease->fl_type;
> + __entry->breaker = breaker;
> + __entry->b_fl_flags = breaker->fl_flags;
> + __entry->b_fl_type = breaker->fl_type;
> + __entry->conflict = conflict;
> + ),
> +
> + TP_printk("conflict %d: lease=0x%p fl_flags=%s fl_type=%s; breaker=0x%p 
> fl_flags=%s fl_type=%s",
> + __entry->conflict,
> + __entry->lease,
> + show_fl_flags(__entry->l_fl_flags),
> +     show_fl_type(__entry->l_fl_type),
> + __entry->breaker,
> + show_fl_flags(__entry->b_fl_flags),
> + show_fl_type(__entry->b_fl_type))
> +);
> +
>  #endif /* _TRACE_FILELOCK_H */
>  
>  /* This part must be outside protection */

This looks useful. I'll plan to merge this one for v5.3 unless there
are objections.

Reviewed-by: Jeff Layton 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 12/13] dax: handle truncate of dma-busy pages

2017-10-20 Thread Jeff Layton
On Thu, 2017-10-19 at 19:40 -0700, Dan Williams wrote:
> get_user_pages() pins file backed memory pages for access by dma
> devices. However, it only pins the memory pages not the page-to-file
> offset association. If a file is truncated the pages are mapped out of
> the file and dma may continue indefinitely into a page that is owned by
> a device driver. This breaks coherency of the file vs dma, but the
> assumption is that if userspace wants the file-space truncated it does
> not matter what data is inbound from the device, it is not relevant
> anymore.
> 
> The assumptions of the truncate-page-cache model are broken by DAX where
> the target DMA page *is* the filesystem block. Leaving the page pinned
> for DMA, but truncating the file block out of the file, means that the
> filesytem is free to reallocate a block under active DMA to another
> file!
> 
> Here are some possible options for fixing this situation ('truncate' and
> 'fallocate(punch hole)' are synonymous below):
> 
> 1/ Fail truncate while any file blocks might be under dma
> 
> 2/ Block (sleep-wait) truncate while any file blocks might be under
>dma
> 
> 3/ Remap file blocks to a "lost+found"-like file-inode where
>dma can continue and we might see what inbound data from DMA was
>mapped out of the original file. Blocks in this file could be
>freed back to the filesystem when dma eventually ends.
> 
> 4/ Disable dax until option 3 or another long term solution has been
>implemented. However, filesystem-dax is still marked experimental
>for concerns like this.
> 
> Option 1 will throw failures where userspace has never expected them
> before, option 2 might hang the truncating process indefinitely, and
> option 3 requires per filesystem enabling to remap blocks from one inode
> to another.  Option 2 is implemented in this patch for the DAX path with
> the expectation that non-transient users of get_user_pages() (RDMA) are
> disallowed from setting up dax mappings and that the potential delay
> introduced to the truncate path is acceptable compared to the response
> time of the page cache case. This can only be seen as a stop-gap until
> we can solve the problem of safely sequestering unallocated filesystem
> blocks under active dma.
> 

FWIW, I like #3 a lot more than #2 here. I get that it's quite a bit
more work though, so no objection to this as a stop-gap fix.


> The solution introduces a new FL_ALLOCATED lease to pin the allocated
> blocks in a dax file while dma might be accessing them. It behaves
> identically to an FL_LAYOUT lease save for the fact that it is
> immediately sheduled to be reaped, and that the only path that waits for
> its removal is the truncate path. We can not reuse FL_LAYOUT directly
> since that would deadlock in the case where userspace did a direct-I/O
> operation with a target buffer backed by an mmap range of the same file.
> 
> Credit / inspiration for option 3 goes to Dave Hansen, who proposed
> something similar as an alternative way to solve the problem that
> MAP_DIRECT was trying to solve.
> 
> Cc: Jan Kara <j...@suse.cz>
> Cc: Jeff Moyer <jmo...@redhat.com>
> Cc: Dave Chinner <da...@fromorbit.com>
> Cc: Matthew Wilcox <mawil...@microsoft.com>
> Cc: Alexander Viro <v...@zeniv.linux.org.uk>
> Cc: "Darrick J. Wong" <darrick.w...@oracle.com>
> Cc: Ross Zwisler <ross.zwis...@linux.intel.com>
> Cc: Jeff Layton <jlay...@poochiereds.net>
> Cc: "J. Bruce Fields" <bfie...@fieldses.org>
> Cc: Dave Hansen <dave.han...@linux.intel.com>
> Reported-by: Christoph Hellwig <h...@lst.de>
> Signed-off-by: Dan Williams <dan.j.willi...@intel.com>
> ---
>  fs/Kconfig  |1 
>  fs/dax.c|  188 
> +++
>  fs/locks.c  |   17 -
>  include/linux/dax.h |   23 ++
>  include/linux/fs.h  |   22 +-
>  mm/gup.c|   27 ++-
>  6 files changed, 268 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 7aee6d699fd6..a7b31a96a753 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -37,6 +37,7 @@ source "fs/f2fs/Kconfig"
>  config FS_DAX
>   bool "Direct Access (DAX) support"
>   depends on MMU
> + depends on FILE_LOCKING
>   depends on !(ARM || MIPS || SPARC)
>   select FS_IOMAP
>   select DAX
> diff --git a/fs/dax.c b/fs/dax.c
> index b03f547b36e7..e0a3958fc5f2 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #inclu