Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-23 Thread Christoph Hellwig
On Sat, Dec 20, 2014 at 06:51:33AM +, Al Viro wrote:
> > The problem is that the use of ->direct_IO by the swap code is a gross
> > layering violation.  ->direct_IO is a callback for the filesystem, and
> > the swap code need to call ->read_iter instead of ->readpage and
> > ->write_tier instead of ->direct_IO, and leave the locking to the
> > filesystem.
> 
> The thing is, ->read_iter() and ->write_iter() might decide to fall back to 
> buffered IO path.  XFS is unusual in that respect - there O_DIRECT ends up
> with short write in such case.  Other filesystems, OTOH...

We'll just need a ->swap_activate method that makes sure we really do
direct I/O.  For all filesystems currently suporting swap just checking
that all blocks are allocated (as the ->bmap path already does) should
be enough.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-23 Thread Christoph Hellwig
On Sat, Dec 20, 2014 at 06:51:33AM +, Al Viro wrote:
  The problem is that the use of -direct_IO by the swap code is a gross
  layering violation.  -direct_IO is a callback for the filesystem, and
  the swap code need to call -read_iter instead of -readpage and
  -write_tier instead of -direct_IO, and leave the locking to the
  filesystem.
 
 The thing is, -read_iter() and -write_iter() might decide to fall back to 
 buffered IO path.  XFS is unusual in that respect - there O_DIRECT ends up
 with short write in such case.  Other filesystems, OTOH...

We'll just need a -swap_activate method that makes sure we really do
direct I/O.  For all filesystems currently suporting swap just checking
that all blocks are allocated (as the -bmap path already does) should
be enough.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-21 Thread Omar Sandoval
On Sat, Dec 20, 2014 at 06:51:33AM +, Al Viro wrote:
> On Mon, Dec 15, 2014 at 08:56:15AM -0800, Christoph Hellwig wrote:
> > On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote:
> > > On Sun 14-12-14 21:26:56, Omar Sandoval wrote:
> > > > The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
> > > > doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
> > > > be held, but most direct_IO implementations do.
> > >   I think you are speaking about direct IO writes only, aren't you? For 
> > > DIO
> > > reads we don't hold i_mutex AFAICS. And also for DIO writes we don't
> > > necessarily hold i_mutex - see for example XFS which doesn't take i_mutex
> > > for direct IO writes. It uses it's internal rwlock for this (see
> > > xfs_file_dio_aio_write()). So I think this is just wrong.
> > 
> > The problem is that the use of ->direct_IO by the swap code is a gross
> > layering violation.  ->direct_IO is a callback for the filesystem, and
> > the swap code need to call ->read_iter instead of ->readpage and
> > ->write_tier instead of ->direct_IO, and leave the locking to the
> > filesystem.
> 
> The thing is, ->read_iter() and ->write_iter() might decide to fall back to 
> buffered IO path.  XFS is unusual in that respect - there O_DIRECT ends up
> with short write in such case.  Other filesystems, OTOH...

Alright, now what? Using ->direct_IO directly is pretty much a no go
because of the different locking conventions as was pointed out. Maybe
some "no, really, just direct I/O" iocb flag?

-- 
Omar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-21 Thread Omar Sandoval
On Sat, Dec 20, 2014 at 06:51:33AM +, Al Viro wrote:
 On Mon, Dec 15, 2014 at 08:56:15AM -0800, Christoph Hellwig wrote:
  On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote:
   On Sun 14-12-14 21:26:56, Omar Sandoval wrote:
The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
be held, but most direct_IO implementations do.
 I think you are speaking about direct IO writes only, aren't you? For 
   DIO
   reads we don't hold i_mutex AFAICS. And also for DIO writes we don't
   necessarily hold i_mutex - see for example XFS which doesn't take i_mutex
   for direct IO writes. It uses it's internal rwlock for this (see
   xfs_file_dio_aio_write()). So I think this is just wrong.
  
  The problem is that the use of -direct_IO by the swap code is a gross
  layering violation.  -direct_IO is a callback for the filesystem, and
  the swap code need to call -read_iter instead of -readpage and
  -write_tier instead of -direct_IO, and leave the locking to the
  filesystem.
 
 The thing is, -read_iter() and -write_iter() might decide to fall back to 
 buffered IO path.  XFS is unusual in that respect - there O_DIRECT ends up
 with short write in such case.  Other filesystems, OTOH...

Alright, now what? Using -direct_IO directly is pretty much a no go
because of the different locking conventions as was pointed out. Maybe
some no, really, just direct I/O iocb flag?

-- 
Omar
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-19 Thread Al Viro
On Mon, Dec 15, 2014 at 08:56:15AM -0800, Christoph Hellwig wrote:
> On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote:
> > On Sun 14-12-14 21:26:56, Omar Sandoval wrote:
> > > The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
> > > doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
> > > be held, but most direct_IO implementations do.
> >   I think you are speaking about direct IO writes only, aren't you? For DIO
> > reads we don't hold i_mutex AFAICS. And also for DIO writes we don't
> > necessarily hold i_mutex - see for example XFS which doesn't take i_mutex
> > for direct IO writes. It uses it's internal rwlock for this (see
> > xfs_file_dio_aio_write()). So I think this is just wrong.
> 
> The problem is that the use of ->direct_IO by the swap code is a gross
> layering violation.  ->direct_IO is a callback for the filesystem, and
> the swap code need to call ->read_iter instead of ->readpage and
> ->write_tier instead of ->direct_IO, and leave the locking to the
> filesystem.

The thing is, ->read_iter() and ->write_iter() might decide to fall back to 
buffered IO path.  XFS is unusual in that respect - there O_DIRECT ends up
with short write in such case.  Other filesystems, OTOH...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-19 Thread Al Viro
On Mon, Dec 15, 2014 at 08:56:15AM -0800, Christoph Hellwig wrote:
 On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote:
  On Sun 14-12-14 21:26:56, Omar Sandoval wrote:
   The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
   doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
   be held, but most direct_IO implementations do.
I think you are speaking about direct IO writes only, aren't you? For DIO
  reads we don't hold i_mutex AFAICS. And also for DIO writes we don't
  necessarily hold i_mutex - see for example XFS which doesn't take i_mutex
  for direct IO writes. It uses it's internal rwlock for this (see
  xfs_file_dio_aio_write()). So I think this is just wrong.
 
 The problem is that the use of -direct_IO by the swap code is a gross
 layering violation.  -direct_IO is a callback for the filesystem, and
 the swap code need to call -read_iter instead of -readpage and
 -write_tier instead of -direct_IO, and leave the locking to the
 filesystem.

The thing is, -read_iter() and -write_iter() might decide to fall back to 
buffered IO path.  XFS is unusual in that respect - there O_DIRECT ends up
with short write in such case.  Other filesystems, OTOH...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-18 Thread Al Viro
On Thu, Dec 18, 2014 at 10:24:05PM -0800, Omar Sandoval wrote:
> +   swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_DIRECT, 0);
> +   if (IS_ERR(swap_file) && PTR_ERR(swap_file) == -EINVAL)

ITYM if (swap_file == ERR_PTR(-EINVAL)) here...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-18 Thread Omar Sandoval
On Wed, Dec 17, 2014 at 10:03:13PM +, Al Viro wrote:
> On Wed, Dec 17, 2014 at 10:52:56AM -0800, Christoph Hellwig wrote:
> > On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote:
> > > See my previous message. If we use O_DIRECT on the original open, then
> > > filesystems that implement bmap but not direct_IO will no longer work.
> > > These are the ones that I found in my tree:
> > 
> > In the long run I don't think they are worth keeping.  But to keep you
> > out of that discussion you can just try an open without O_DIRECT if the
> > open with the flag failed.
> 
> Umm...  That's one possibility, of course (and if swapon(2) is on someone's
> hotpath, I really would like to see what the hell they are doing - it has
> to be interesting in a sick way).

If this is the approach you'd prefer, I'll go ahead and do that for v2.
I personally think it looks pretty kludgey, but I'm fine either way:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 63f55cc..c1b3073 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2379,7 +2379,16 @@ SYSCALL_DEFINE2(swapon, const char __user *, 
specialfile, int, swap_flags)
name = NULL;
goto bad_swap;
}
-   swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+   swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_DIRECT, 0);
+   if (IS_ERR(swap_file) && PTR_ERR(swap_file) == -EINVAL)
+   swap_file = file_open_name(name, O_RDWR | O_LARGEFILE, 0);
if (IS_ERR(swap_file)) {
error = PTR_ERR(swap_file);
swap_file = NULL;

> BTW, speaking of read/write vs. swap - what's the story with e.g. AFS
> write() checking IS_SWAPFILE() and failing with -EBUSY?  Note that
>   * it's done before acquiring i_mutex, so it isn't race-free
>   * it's dubious from the POSIX POV - EBUSY isn't in the error
> list for write(2).
>   * other filesystems generally don't have anything of that sort.
> NFS does, but local ones do not...
> Besides, do we even allow swapfiles on AFS?

AFS doesn't implement ->bmap or ->swap_activate, so that code is dead,
probably cargo-culted from the NFS code. It seems pretty pointless, not
only because it's inconsistent with the local filesystems like you
mentioned, but also because it's trivial to bypass with O_DIRECT on NFS:

ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
unsigned long written = 0;
ssize_t result;
size_t count = iov_iter_count(from);
loff_t pos = iocb->ki_pos;

result = nfs_key_timeout_notify(file, inode);
if (result)
return result;

if (file->f_flags & O_DIRECT)
return nfs_file_direct_write(iocb, from, pos);

dprintk("NFS: write(%pD2, %zu@%Ld)\n",
file, count, (long long) pos);

result = -EBUSY;
if (IS_SWAPFILE(inode))
goto out_swapfile;

I think it's safe to scrap that code. However, this also led me to find that
NFS doesn't prevent truncates on an active swapfile. I'm submitting a patch for
that now.

-- 
Omar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-18 Thread Omar Sandoval
On Wed, Dec 17, 2014 at 10:03:13PM +, Al Viro wrote:
 On Wed, Dec 17, 2014 at 10:52:56AM -0800, Christoph Hellwig wrote:
  On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote:
   See my previous message. If we use O_DIRECT on the original open, then
   filesystems that implement bmap but not direct_IO will no longer work.
   These are the ones that I found in my tree:
  
  In the long run I don't think they are worth keeping.  But to keep you
  out of that discussion you can just try an open without O_DIRECT if the
  open with the flag failed.
 
 Umm...  That's one possibility, of course (and if swapon(2) is on someone's
 hotpath, I really would like to see what the hell they are doing - it has
 to be interesting in a sick way).

If this is the approach you'd prefer, I'll go ahead and do that for v2.
I personally think it looks pretty kludgey, but I'm fine either way:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 63f55cc..c1b3073 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2379,7 +2379,16 @@ SYSCALL_DEFINE2(swapon, const char __user *, 
specialfile, int, swap_flags)
name = NULL;
goto bad_swap;
}
-   swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+   swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_DIRECT, 0);
+   if (IS_ERR(swap_file)  PTR_ERR(swap_file) == -EINVAL)
+   swap_file = file_open_name(name, O_RDWR | O_LARGEFILE, 0);
if (IS_ERR(swap_file)) {
error = PTR_ERR(swap_file);
swap_file = NULL;

 BTW, speaking of read/write vs. swap - what's the story with e.g. AFS
 write() checking IS_SWAPFILE() and failing with -EBUSY?  Note that
   * it's done before acquiring i_mutex, so it isn't race-free
   * it's dubious from the POSIX POV - EBUSY isn't in the error
 list for write(2).
   * other filesystems generally don't have anything of that sort.
 NFS does, but local ones do not...
 Besides, do we even allow swapfiles on AFS?

AFS doesn't implement -bmap or -swap_activate, so that code is dead,
probably cargo-culted from the NFS code. It seems pretty pointless, not
only because it's inconsistent with the local filesystems like you
mentioned, but also because it's trivial to bypass with O_DIRECT on NFS:

ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb-ki_filp;
struct inode *inode = file_inode(file);
unsigned long written = 0;
ssize_t result;
size_t count = iov_iter_count(from);
loff_t pos = iocb-ki_pos;

result = nfs_key_timeout_notify(file, inode);
if (result)
return result;

if (file-f_flags  O_DIRECT)
return nfs_file_direct_write(iocb, from, pos);

dprintk(NFS: write(%pD2, %zu@%Ld)\n,
file, count, (long long) pos);

result = -EBUSY;
if (IS_SWAPFILE(inode))
goto out_swapfile;

I think it's safe to scrap that code. However, this also led me to find that
NFS doesn't prevent truncates on an active swapfile. I'm submitting a patch for
that now.

-- 
Omar
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-18 Thread Al Viro
On Thu, Dec 18, 2014 at 10:24:05PM -0800, Omar Sandoval wrote:
 +   swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_DIRECT, 0);
 +   if (IS_ERR(swap_file)  PTR_ERR(swap_file) == -EINVAL)

ITYM if (swap_file == ERR_PTR(-EINVAL)) here...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Al Viro
On Wed, Dec 17, 2014 at 10:52:56AM -0800, Christoph Hellwig wrote:
> On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote:
> > See my previous message. If we use O_DIRECT on the original open, then
> > filesystems that implement bmap but not direct_IO will no longer work.
> > These are the ones that I found in my tree:
> 
> In the long run I don't think they are worth keeping.  But to keep you
> out of that discussion you can just try an open without O_DIRECT if the
> open with the flag failed.

Umm...  That's one possibility, of course (and if swapon(2) is on someone's
hotpath, I really would like to see what the hell they are doing - it has
to be interesting in a sick way).

Said that, there's an interesting problem with O_DIRECT.  It's irrelevant
in this case, but it *can* be changed halfway through e.g write(2) and
AFAICS we have at least some suspicious codepaths.  Look at
ext4_file_write_iter(), for example.  We check O_DIRECT, then grab some
locks, then proceed to look at the results of that check, do some work...
and call __generic_file_write_iter(), which checks O_DIRECT again.  If
it has been cleared (or, probably worse, set) it looks like we'll get
an interesting bunch of holes.

Should we just labda-expand that call of __generic_file_write_iter() and
replace its 
if (unlikely(file->f_flags & O_DIRECT)) {
with
if (odirect)
to be guaranteed that it'll match the things we'd done before the call?

I'm looking through the callchains right now in search of similar places
right now, will follow up when I'm done...

BTW, speaking of read/write vs. swap - what's the story with e.g. AFS
write() checking IS_SWAPFILE() and failing with -EBUSY?  Note that
* it's done before acquiring i_mutex, so it isn't race-free
* it's dubious from the POSIX POV - EBUSY isn't in the error
list for write(2).
* other filesystems generally don't have anything of that sort.
NFS does, but local ones do not...
Besides, do we even allow swapfiles on AFS?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Christoph Hellwig
On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote:
> See my previous message. If we use O_DIRECT on the original open, then
> filesystems that implement bmap but not direct_IO will no longer work.
> These are the ones that I found in my tree:

In the long run I don't think they are worth keeping.  But to keep you
out of that discussion you can just try an open without O_DIRECT if the
open with the flag failed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Omar Sandoval
On Wed, Dec 17, 2014 at 12:24:37AM -0800, Christoph Hellwig wrote:
> On Wed, Dec 17, 2014 at 08:20:21AM +, Al Viro wrote:
> > Where the hell would those other references come from?  We open the damn
> > thing in sys_swapon(), never put it into descriptor tables, etc. and
> > the only reason why we use filp_close() instead of fput() is that we
> > would miss ->flush() otherwise.
> > 
> > Said that, why not simply *open* it with O_DIRECT to start with and be done
> > with that?  It's not as if those guys came preopened by caller - swapon(2)
> > gets a pathname and does opening itself.
> 
> Oops, should have dug deeper into the code.  For some reason I assumed
> the fd is passed in from userspace.
> 
> The suggestion from Al is much better, given that we never do normal
> I/O on the swapfile, just the bmap + direct bio submission which I hope
> could go away in favor of the direct I/O variant in the long run.

See my previous message. If we use O_DIRECT on the original open, then
filesystems that implement bmap but not direct_IO will no longer work.
These are the ones that I found in my tree:

adfs
befs
bfs
ecryptfs
efs
freevxfs
hpfs
isofs
minix
ntfs
omfs
qnx4
qnx6
sysv
ufs

Several of these are read only, and I can't imagine that anyone is using
a swapfile on any of the rest, but if someone is, this would be a
regression, wouldn't it?

-- 
Omar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Christoph Hellwig
On Wed, Dec 17, 2014 at 08:20:21AM +, Al Viro wrote:
> Where the hell would those other references come from?  We open the damn
> thing in sys_swapon(), never put it into descriptor tables, etc. and
> the only reason why we use filp_close() instead of fput() is that we
> would miss ->flush() otherwise.
> 
> Said that, why not simply *open* it with O_DIRECT to start with and be done
> with that?  It's not as if those guys came preopened by caller - swapon(2)
> gets a pathname and does opening itself.

Oops, should have dug deeper into the code.  For some reason I assumed
the fd is passed in from userspace.

The suggestion from Al is much better, given that we never do normal
I/O on the swapfile, just the bmap + direct bio submission which I hope
could go away in favor of the direct I/O variant in the long run.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Al Viro
On Wed, Dec 17, 2014 at 12:06:10AM -0800, Christoph Hellwig wrote:

> > This seems to be more or less equivalent to doing a fcntl(F_SETFL) to
> > add the O_DIRECT flag to swap_file (which is a struct file *). Swapoff
> > calls filp_close on swap_file, so I don't see why it's necessary to
> > clear the flag.
> 
> filp_lose doesn't nessecarily destroy the file structure, there might be
> other reference to it, e.g. from dup() or descriptor passing.

Where the hell would those other references come from?  We open the damn
thing in sys_swapon(), never put it into descriptor tables, etc. and
the only reason why we use filp_close() instead of fput() is that we
would miss ->flush() otherwise.

Said that, why not simply *open* it with O_DIRECT to start with and be done
with that?  It's not as if those guys came preopened by caller - swapon(2)
gets a pathname and does opening itself.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Christoph Hellwig
On Tue, Dec 16, 2014 at 12:56:24AM -0800, Omar Sandoval wrote:
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1728,6 +1728,9 @@ static int setup_swap_extents(struct swap_info_struct 
> *sis, sector_t *span)
> }
>  
> if (mapping->a_ops->swap_activate) {
> +   if (!mapping->a_ops->direct_IO)
> +   return -EINVAL;
> +   swap_file->f_flags |= O_DIRECT;
> ret = mapping->a_ops->swap_activate(sis, swap_file, span);
> if (!ret) {
> sis->flags |= SWP_FILE;

This needs to hold swap_file->f_lock, but otherwise looks good.

> This seems to be more or less equivalent to doing a fcntl(F_SETFL) to
> add the O_DIRECT flag to swap_file (which is a struct file *). Swapoff
> calls filp_close on swap_file, so I don't see why it's necessary to
> clear the flag.

filp_lose doesn't nessecarily destroy the file structure, there might be
other reference to it, e.g. from dup() or descriptor passing.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Christoph Hellwig
On Tue, Dec 16, 2014 at 12:56:24AM -0800, Omar Sandoval wrote:
 --- a/mm/swapfile.c
 +++ b/mm/swapfile.c
 @@ -1728,6 +1728,9 @@ static int setup_swap_extents(struct swap_info_struct 
 *sis, sector_t *span)
 }
  
 if (mapping-a_ops-swap_activate) {
 +   if (!mapping-a_ops-direct_IO)
 +   return -EINVAL;
 +   swap_file-f_flags |= O_DIRECT;
 ret = mapping-a_ops-swap_activate(sis, swap_file, span);
 if (!ret) {
 sis-flags |= SWP_FILE;

This needs to hold swap_file-f_lock, but otherwise looks good.

 This seems to be more or less equivalent to doing a fcntl(F_SETFL) to
 add the O_DIRECT flag to swap_file (which is a struct file *). Swapoff
 calls filp_close on swap_file, so I don't see why it's necessary to
 clear the flag.

filp_lose doesn't nessecarily destroy the file structure, there might be
other reference to it, e.g. from dup() or descriptor passing.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Al Viro
On Wed, Dec 17, 2014 at 12:06:10AM -0800, Christoph Hellwig wrote:

  This seems to be more or less equivalent to doing a fcntl(F_SETFL) to
  add the O_DIRECT flag to swap_file (which is a struct file *). Swapoff
  calls filp_close on swap_file, so I don't see why it's necessary to
  clear the flag.
 
 filp_lose doesn't nessecarily destroy the file structure, there might be
 other reference to it, e.g. from dup() or descriptor passing.

Where the hell would those other references come from?  We open the damn
thing in sys_swapon(), never put it into descriptor tables, etc. and
the only reason why we use filp_close() instead of fput() is that we
would miss -flush() otherwise.

Said that, why not simply *open* it with O_DIRECT to start with and be done
with that?  It's not as if those guys came preopened by caller - swapon(2)
gets a pathname and does opening itself.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Christoph Hellwig
On Wed, Dec 17, 2014 at 08:20:21AM +, Al Viro wrote:
 Where the hell would those other references come from?  We open the damn
 thing in sys_swapon(), never put it into descriptor tables, etc. and
 the only reason why we use filp_close() instead of fput() is that we
 would miss -flush() otherwise.
 
 Said that, why not simply *open* it with O_DIRECT to start with and be done
 with that?  It's not as if those guys came preopened by caller - swapon(2)
 gets a pathname and does opening itself.

Oops, should have dug deeper into the code.  For some reason I assumed
the fd is passed in from userspace.

The suggestion from Al is much better, given that we never do normal
I/O on the swapfile, just the bmap + direct bio submission which I hope
could go away in favor of the direct I/O variant in the long run.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Omar Sandoval
On Wed, Dec 17, 2014 at 12:24:37AM -0800, Christoph Hellwig wrote:
 On Wed, Dec 17, 2014 at 08:20:21AM +, Al Viro wrote:
  Where the hell would those other references come from?  We open the damn
  thing in sys_swapon(), never put it into descriptor tables, etc. and
  the only reason why we use filp_close() instead of fput() is that we
  would miss -flush() otherwise.
  
  Said that, why not simply *open* it with O_DIRECT to start with and be done
  with that?  It's not as if those guys came preopened by caller - swapon(2)
  gets a pathname and does opening itself.
 
 Oops, should have dug deeper into the code.  For some reason I assumed
 the fd is passed in from userspace.
 
 The suggestion from Al is much better, given that we never do normal
 I/O on the swapfile, just the bmap + direct bio submission which I hope
 could go away in favor of the direct I/O variant in the long run.

See my previous message. If we use O_DIRECT on the original open, then
filesystems that implement bmap but not direct_IO will no longer work.
These are the ones that I found in my tree:

adfs
befs
bfs
ecryptfs
efs
freevxfs
hpfs
isofs
minix
ntfs
omfs
qnx4
qnx6
sysv
ufs

Several of these are read only, and I can't imagine that anyone is using
a swapfile on any of the rest, but if someone is, this would be a
regression, wouldn't it?

-- 
Omar
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Christoph Hellwig
On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote:
 See my previous message. If we use O_DIRECT on the original open, then
 filesystems that implement bmap but not direct_IO will no longer work.
 These are the ones that I found in my tree:

In the long run I don't think they are worth keeping.  But to keep you
out of that discussion you can just try an open without O_DIRECT if the
open with the flag failed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-17 Thread Al Viro
On Wed, Dec 17, 2014 at 10:52:56AM -0800, Christoph Hellwig wrote:
 On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote:
  See my previous message. If we use O_DIRECT on the original open, then
  filesystems that implement bmap but not direct_IO will no longer work.
  These are the ones that I found in my tree:
 
 In the long run I don't think they are worth keeping.  But to keep you
 out of that discussion you can just try an open without O_DIRECT if the
 open with the flag failed.

Umm...  That's one possibility, of course (and if swapon(2) is on someone's
hotpath, I really would like to see what the hell they are doing - it has
to be interesting in a sick way).

Said that, there's an interesting problem with O_DIRECT.  It's irrelevant
in this case, but it *can* be changed halfway through e.g write(2) and
AFAICS we have at least some suspicious codepaths.  Look at
ext4_file_write_iter(), for example.  We check O_DIRECT, then grab some
locks, then proceed to look at the results of that check, do some work...
and call __generic_file_write_iter(), which checks O_DIRECT again.  If
it has been cleared (or, probably worse, set) it looks like we'll get
an interesting bunch of holes.

Should we just labda-expand that call of __generic_file_write_iter() and
replace its 
if (unlikely(file-f_flags  O_DIRECT)) {
with
if (odirect)
to be guaranteed that it'll match the things we'd done before the call?

I'm looking through the callchains right now in search of similar places
right now, will follow up when I'm done...

BTW, speaking of read/write vs. swap - what's the story with e.g. AFS
write() checking IS_SWAPFILE() and failing with -EBUSY?  Note that
* it's done before acquiring i_mutex, so it isn't race-free
* it's dubious from the POSIX POV - EBUSY isn't in the error
list for write(2).
* other filesystems generally don't have anything of that sort.
NFS does, but local ones do not...
Besides, do we even allow swapfiles on AFS?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-16 Thread Omar Sandoval
On Tue, Dec 16, 2014 at 12:35:43AM -0800, Christoph Hellwig wrote:
> On Mon, Dec 15, 2014 at 02:11:00PM -0800, Omar Sandoval wrote:
> > Ok, I got the swap code working with ->read_iter/->write_iter without
> > too much trouble. I wanted to double check before I submit if there's
> > any gotchas involved with adding the O_DIRECT flag to a file pointer
> > after it has been opened -- swapon opens the swapfile before we know if
> > we're using the SWP_FILE infrastructure, and we need to add O_DIRECT so
> > ->{read,write}_iter use direct I/O, but we can't add O_DIRECT to the
> > original open without excluding filesystems that support the old bmap
> > path but not direct I/O.
> 
> In general just adding O_DIRECT is a problem.  However given that the
> swap file is locked against any other access while in use it seems ok
> in this particular case.  Just make sure to clear it on swapoff, and
> write a detailed comment explaining the situation.

I'll admit that I'm a bit confused. I want to do this:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8798b2e..5145c09 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1728,6 +1728,9 @@ static int setup_swap_extents(struct swap_info_struct 
*sis, sector_t *span)
}
 
if (mapping->a_ops->swap_activate) {
+   if (!mapping->a_ops->direct_IO)
+   return -EINVAL;
+   swap_file->f_flags |= O_DIRECT;
ret = mapping->a_ops->swap_activate(sis, swap_file, span);
if (!ret) {
sis->flags |= SWP_FILE;

This seems to be more or less equivalent to doing a fcntl(F_SETFL) to
add the O_DIRECT flag to swap_file (which is a struct file *). Swapoff
calls filp_close on swap_file, so I don't see why it's necessary to
clear the flag.

-- 
Omar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-16 Thread Christoph Hellwig
On Mon, Dec 15, 2014 at 02:11:00PM -0800, Omar Sandoval wrote:
> Ok, I got the swap code working with ->read_iter/->write_iter without
> too much trouble. I wanted to double check before I submit if there's
> any gotchas involved with adding the O_DIRECT flag to a file pointer
> after it has been opened -- swapon opens the swapfile before we know if
> we're using the SWP_FILE infrastructure, and we need to add O_DIRECT so
> ->{read,write}_iter use direct I/O, but we can't add O_DIRECT to the
> original open without excluding filesystems that support the old bmap
> path but not direct I/O.

In general just adding O_DIRECT is a problem.  However given that the
swap file is locked against any other access while in use it seems ok
in this particular case.  Just make sure to clear it on swapoff, and
write a detailed comment explaining the situation.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-16 Thread Christoph Hellwig
On Mon, Dec 15, 2014 at 02:11:00PM -0800, Omar Sandoval wrote:
 Ok, I got the swap code working with -read_iter/-write_iter without
 too much trouble. I wanted to double check before I submit if there's
 any gotchas involved with adding the O_DIRECT flag to a file pointer
 after it has been opened -- swapon opens the swapfile before we know if
 we're using the SWP_FILE infrastructure, and we need to add O_DIRECT so
 -{read,write}_iter use direct I/O, but we can't add O_DIRECT to the
 original open without excluding filesystems that support the old bmap
 path but not direct I/O.

In general just adding O_DIRECT is a problem.  However given that the
swap file is locked against any other access while in use it seems ok
in this particular case.  Just make sure to clear it on swapoff, and
write a detailed comment explaining the situation.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-16 Thread Omar Sandoval
On Tue, Dec 16, 2014 at 12:35:43AM -0800, Christoph Hellwig wrote:
 On Mon, Dec 15, 2014 at 02:11:00PM -0800, Omar Sandoval wrote:
  Ok, I got the swap code working with -read_iter/-write_iter without
  too much trouble. I wanted to double check before I submit if there's
  any gotchas involved with adding the O_DIRECT flag to a file pointer
  after it has been opened -- swapon opens the swapfile before we know if
  we're using the SWP_FILE infrastructure, and we need to add O_DIRECT so
  -{read,write}_iter use direct I/O, but we can't add O_DIRECT to the
  original open without excluding filesystems that support the old bmap
  path but not direct I/O.
 
 In general just adding O_DIRECT is a problem.  However given that the
 swap file is locked against any other access while in use it seems ok
 in this particular case.  Just make sure to clear it on swapoff, and
 write a detailed comment explaining the situation.

I'll admit that I'm a bit confused. I want to do this:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8798b2e..5145c09 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1728,6 +1728,9 @@ static int setup_swap_extents(struct swap_info_struct 
*sis, sector_t *span)
}
 
if (mapping-a_ops-swap_activate) {
+   if (!mapping-a_ops-direct_IO)
+   return -EINVAL;
+   swap_file-f_flags |= O_DIRECT;
ret = mapping-a_ops-swap_activate(sis, swap_file, span);
if (!ret) {
sis-flags |= SWP_FILE;

This seems to be more or less equivalent to doing a fcntl(F_SETFL) to
add the O_DIRECT flag to swap_file (which is a struct file *). Swapoff
calls filp_close on swap_file, so I don't see why it's necessary to
clear the flag.

-- 
Omar
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-15 Thread Omar Sandoval
On Mon, Dec 15, 2014 at 08:56:15AM -0800, Christoph Hellwig wrote:
> On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote:
> > On Sun 14-12-14 21:26:56, Omar Sandoval wrote:
> > > The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
> > > doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
> > > be held, but most direct_IO implementations do.
> >   I think you are speaking about direct IO writes only, aren't you? For DIO
> > reads we don't hold i_mutex AFAICS. And also for DIO writes we don't
> > necessarily hold i_mutex - see for example XFS which doesn't take i_mutex
> > for direct IO writes. It uses it's internal rwlock for this (see
> > xfs_file_dio_aio_write()). So I think this is just wrong.
> 
> The problem is that the use of ->direct_IO by the swap code is a gross
> layering violation.  ->direct_IO is a callback for the filesystem, and
> the swap code need to call ->read_iter instead of ->readpage and
> ->write_tier instead of ->direct_IO, and leave the locking to the
> filesystem.
>
Ok, I got the swap code working with ->read_iter/->write_iter without
too much trouble. I wanted to double check before I submit if there's
any gotchas involved with adding the O_DIRECT flag to a file pointer
after it has been opened -- swapon opens the swapfile before we know if
we're using the SWP_FILE infrastructure, and we need to add O_DIRECT so
->{read,write}_iter use direct I/O, but we can't add O_DIRECT to the
original open without excluding filesystems that support the old bmap
path but not direct I/O.

-- 
Omar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-15 Thread Christoph Hellwig
On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote:
> On Sun 14-12-14 21:26:56, Omar Sandoval wrote:
> > The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
> > doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
> > be held, but most direct_IO implementations do.
>   I think you are speaking about direct IO writes only, aren't you? For DIO
> reads we don't hold i_mutex AFAICS. And also for DIO writes we don't
> necessarily hold i_mutex - see for example XFS which doesn't take i_mutex
> for direct IO writes. It uses it's internal rwlock for this (see
> xfs_file_dio_aio_write()). So I think this is just wrong.

The problem is that the use of ->direct_IO by the swap code is a gross
layering violation.  ->direct_IO is a callback for the filesystem, and
the swap code need to call ->read_iter instead of ->readpage and
->write_tier instead of ->direct_IO, and leave the locking to the
filesystem.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-15 Thread Jan Kara
On Sun 14-12-14 21:26:56, Omar Sandoval wrote:
> The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
> doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
> be held, but most direct_IO implementations do.
  I think you are speaking about direct IO writes only, aren't you? For DIO
reads we don't hold i_mutex AFAICS. And also for DIO writes we don't
necessarily hold i_mutex - see for example XFS which doesn't take i_mutex
for direct IO writes. It uses it's internal rwlock for this (see
xfs_file_dio_aio_write()). So I think this is just wrong.

Honza

> Signed-off-by: Omar Sandoval 
> ---
>  mm/page_io.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 955db8b..1630ac0 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -263,6 +263,7 @@ int __swap_writepage(struct page *page, struct 
> writeback_control *wbc,
>   if (sis->flags & SWP_FILE) {
>   struct kiocb kiocb;
>   struct file *swap_file = sis->swap_file;
> + struct inode *inode = file_inode(swap_file);
>   struct address_space *mapping = swap_file->f_mapping;
>   struct bio_vec bv = {
>   .bv_page = page,
> @@ -283,9 +284,11 @@ int __swap_writepage(struct page *page, struct 
> writeback_control *wbc,
>  
>   set_page_writeback(page);
>   unlock_page(page);
> + mutex_lock(>i_mutex);
>   ret = mapping->a_ops->direct_IO(ITER_BVEC | WRITE,
>   , ,
>   kiocb.ki_pos);
> + mutex_unlock(>i_mutex);
>   if (ret == PAGE_SIZE) {
>   count_vm_event(PSWPOUT);
>   ret = 0;
> -- 
> 2.1.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-15 Thread Jan Kara
On Sun 14-12-14 21:26:56, Omar Sandoval wrote:
 The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
 doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
 be held, but most direct_IO implementations do.
  I think you are speaking about direct IO writes only, aren't you? For DIO
reads we don't hold i_mutex AFAICS. And also for DIO writes we don't
necessarily hold i_mutex - see for example XFS which doesn't take i_mutex
for direct IO writes. It uses it's internal rwlock for this (see
xfs_file_dio_aio_write()). So I think this is just wrong.

Honza

 Signed-off-by: Omar Sandoval osan...@osandov.com
 ---
  mm/page_io.c | 3 +++
  1 file changed, 3 insertions(+)
 
 diff --git a/mm/page_io.c b/mm/page_io.c
 index 955db8b..1630ac0 100644
 --- a/mm/page_io.c
 +++ b/mm/page_io.c
 @@ -263,6 +263,7 @@ int __swap_writepage(struct page *page, struct 
 writeback_control *wbc,
   if (sis-flags  SWP_FILE) {
   struct kiocb kiocb;
   struct file *swap_file = sis-swap_file;
 + struct inode *inode = file_inode(swap_file);
   struct address_space *mapping = swap_file-f_mapping;
   struct bio_vec bv = {
   .bv_page = page,
 @@ -283,9 +284,11 @@ int __swap_writepage(struct page *page, struct 
 writeback_control *wbc,
  
   set_page_writeback(page);
   unlock_page(page);
 + mutex_lock(inode-i_mutex);
   ret = mapping-a_ops-direct_IO(ITER_BVEC | WRITE,
   kiocb, from,
   kiocb.ki_pos);
 + mutex_unlock(inode-i_mutex);
   if (ret == PAGE_SIZE) {
   count_vm_event(PSWPOUT);
   ret = 0;
 -- 
 2.1.3
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara j...@suse.cz
SUSE Labs, CR
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-15 Thread Christoph Hellwig
On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote:
 On Sun 14-12-14 21:26:56, Omar Sandoval wrote:
  The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
  doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
  be held, but most direct_IO implementations do.
   I think you are speaking about direct IO writes only, aren't you? For DIO
 reads we don't hold i_mutex AFAICS. And also for DIO writes we don't
 necessarily hold i_mutex - see for example XFS which doesn't take i_mutex
 for direct IO writes. It uses it's internal rwlock for this (see
 xfs_file_dio_aio_write()). So I think this is just wrong.

The problem is that the use of -direct_IO by the swap code is a gross
layering violation.  -direct_IO is a callback for the filesystem, and
the swap code need to call -read_iter instead of -readpage and
-write_tier instead of -direct_IO, and leave the locking to the
filesystem.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-15 Thread Omar Sandoval
On Mon, Dec 15, 2014 at 08:56:15AM -0800, Christoph Hellwig wrote:
 On Mon, Dec 15, 2014 at 05:27:05PM +0100, Jan Kara wrote:
  On Sun 14-12-14 21:26:56, Omar Sandoval wrote:
   The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
   doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
   be held, but most direct_IO implementations do.
I think you are speaking about direct IO writes only, aren't you? For DIO
  reads we don't hold i_mutex AFAICS. And also for DIO writes we don't
  necessarily hold i_mutex - see for example XFS which doesn't take i_mutex
  for direct IO writes. It uses it's internal rwlock for this (see
  xfs_file_dio_aio_write()). So I think this is just wrong.
 
 The problem is that the use of -direct_IO by the swap code is a gross
 layering violation.  -direct_IO is a callback for the filesystem, and
 the swap code need to call -read_iter instead of -readpage and
 -write_tier instead of -direct_IO, and leave the locking to the
 filesystem.

Ok, I got the swap code working with -read_iter/-write_iter without
too much trouble. I wanted to double check before I submit if there's
any gotchas involved with adding the O_DIRECT flag to a file pointer
after it has been opened -- swapon opens the swapfile before we know if
we're using the SWP_FILE infrastructure, and we need to add O_DIRECT so
-{read,write}_iter use direct I/O, but we can't add O_DIRECT to the
original open without excluding filesystems that support the old bmap
path but not direct I/O.

-- 
Omar
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-14 Thread Omar Sandoval
The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
be held, but most direct_IO implementations do.

Signed-off-by: Omar Sandoval 
---
 mm/page_io.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_io.c b/mm/page_io.c
index 955db8b..1630ac0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -263,6 +263,7 @@ int __swap_writepage(struct page *page, struct 
writeback_control *wbc,
if (sis->flags & SWP_FILE) {
struct kiocb kiocb;
struct file *swap_file = sis->swap_file;
+   struct inode *inode = file_inode(swap_file);
struct address_space *mapping = swap_file->f_mapping;
struct bio_vec bv = {
.bv_page = page,
@@ -283,9 +284,11 @@ int __swap_writepage(struct page *page, struct 
writeback_control *wbc,
 
set_page_writeback(page);
unlock_page(page);
+   mutex_lock(>i_mutex);
ret = mapping->a_ops->direct_IO(ITER_BVEC | WRITE,
, ,
kiocb.ki_pos);
+   mutex_unlock(>i_mutex);
if (ret == PAGE_SIZE) {
count_vm_event(PSWPOUT);
ret = 0;
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO

2014-12-14 Thread Omar Sandoval
The generic write code locks i_mutex for a direct_IO. Swap-over-NFS
doesn't grab the mutex because nfs_direct_IO doesn't expect i_mutex to
be held, but most direct_IO implementations do.

Signed-off-by: Omar Sandoval osan...@osandov.com
---
 mm/page_io.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_io.c b/mm/page_io.c
index 955db8b..1630ac0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -263,6 +263,7 @@ int __swap_writepage(struct page *page, struct 
writeback_control *wbc,
if (sis-flags  SWP_FILE) {
struct kiocb kiocb;
struct file *swap_file = sis-swap_file;
+   struct inode *inode = file_inode(swap_file);
struct address_space *mapping = swap_file-f_mapping;
struct bio_vec bv = {
.bv_page = page,
@@ -283,9 +284,11 @@ int __swap_writepage(struct page *page, struct 
writeback_control *wbc,
 
set_page_writeback(page);
unlock_page(page);
+   mutex_lock(inode-i_mutex);
ret = mapping-a_ops-direct_IO(ITER_BVEC | WRITE,
kiocb, from,
kiocb.ki_pos);
+   mutex_unlock(inode-i_mutex);
if (ret == PAGE_SIZE) {
count_vm_event(PSWPOUT);
ret = 0;
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/