[PATCH 01/10] fs: Separate out kiocb flags setup based on RWF_* flags

2017-06-04 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Signed-off-by: Goldwyn Rodrigues 
---
 fs/read_write.c| 12 +++-
 include/linux/fs.h | 14 ++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 47c1d4484df9..53c816c61122 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -678,16 +678,10 @@ static ssize_t do_iter_readv_writev(struct file *filp, 
struct iov_iter *iter,
struct kiocb kiocb;
ssize_t ret;
 
-   if (flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC))
-   return -EOPNOTSUPP;
-
init_sync_kiocb(&kiocb, filp);
-   if (flags & RWF_HIPRI)
-   kiocb.ki_flags |= IOCB_HIPRI;
-   if (flags & RWF_DSYNC)
-   kiocb.ki_flags |= IOCB_DSYNC;
-   if (flags & RWF_SYNC)
-   kiocb.ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
+   ret = kiocb_set_rw_flags(&kiocb, flags);
+   if (ret)
+   return ret;
kiocb.ki_pos = *ppos;
 
if (type == READ)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 803e5a9b2654..f53867140f43 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3056,6 +3056,20 @@ static inline int iocb_flags(struct file *file)
return res;
 }
 
+static inline int kiocb_set_rw_flags(struct kiocb *ki, int flags)
+{
+   if (unlikely(flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC)))
+   return -EOPNOTSUPP;
+
+   if (flags & RWF_HIPRI)
+   ki->ki_flags |= IOCB_HIPRI;
+   if (flags & RWF_DSYNC)
+   ki->ki_flags |= IOCB_DSYNC;
+   if (flags & RWF_SYNC)
+   ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
+   return 0;
+}
+
 static inline ino_t parent_ino(struct dentry *dentry)
 {
ino_t res;
-- 
2.12.0



[PATCH 0/10 v10] No wait AIO

2017-06-04 Thread Goldwyn Rodrigues
Formerly known as non-blocking AIO.

This series adds a nonblocking feature to asynchronous I/O writes.
io_submit() can be delayed for a number of reasons:
 - Block allocation for files
 - Data writebacks for direct I/O
 - Sleeping while waiting to acquire i_rwsem
 - Congested block device

The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
any of these conditions are met. This way userspace can push as many
write()s to the kernel as can complete without blocking, and defer any
that return -EAGAIN to another thread.

In order to enable this, IOCB_RW_FLAG_NOWAIT is introduced in
uapi/linux/aio_abi.h. If set for aio_rw_flags, it translates to
IOCB_NOWAIT for struct iocb, REQ_NOWAIT for bio.bi_opf and IOMAP_NOWAIT for
iomap. aio_rw_flags is a new field replacing aio_reserved1. We could
not use aio_flags because the kernel does not currently check it for
invalid values.

This feature is provided for direct I/O of asynchronous I/O only. I have
tested it against xfs, ext4, and btrfs, and I intend to add more filesystems.
The nowait feature is for request-based devices. In the future, I intend to
add support for stacked devices such as md.

Applications will have to check for support by sending an async direct
write; any error other than -EAGAIN means the feature is not supported.
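
For illustration, a minimal userspace probe could look like the sketch
below. It assumes the RWF_NOWAIT value and the aio_rw_flags field
introduced later in this series (patches 03 and 04) and needs the updated
uapi headers; it is not part of the series itself.

#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT	0x00000008	/* assumed value; take from uapi once merged */
#endif

/* Returns bytes written or a negative errno; -EAGAIN means "supported but
 * would have blocked", any other error means nowait is not supported. */
static long probe_nowait_write(int fd, void *buf, size_t len, off_t off)
{
	aio_context_t ctx = 0;
	struct iocb cb;
	struct iocb *cbs[1] = { &cb };
	struct io_event ev;
	long ret;

	if (syscall(__NR_io_setup, 1, &ctx) < 0)
		return -errno;

	memset(&cb, 0, sizeof(cb));
	cb.aio_lio_opcode = IOCB_CMD_PWRITE;
	cb.aio_fildes = fd;			/* fd must be opened with O_DIRECT */
	cb.aio_buf = (__u64)(unsigned long)buf;	/* buf must be suitably aligned */
	cb.aio_nbytes = len;
	cb.aio_offset = off;
	cb.aio_rw_flags = RWF_NOWAIT;		/* field added by patch 03 (was aio_reserved1) */

	ret = syscall(__NR_io_submit, ctx, 1, cbs);
	if (ret < 0)
		ret = -errno;			/* e.g. -EINVAL on kernels without this series */
	else if (ret == 1 &&
		 syscall(__NR_io_getevents, ctx, 1, 1, &ev, NULL) == 1)
		ret = ev.res;			/* -EAGAIN here: would have blocked */

	syscall(__NR_io_destroy, ctx);
	return ret;
}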

The first two patches are preparation patches for nowait I/O.

Changes since v1:
 + changed name from _NONBLOCKING to *_NOWAIT
 + filemap_range_has_page() call moved closer to (just before) the call to
filemap_write_and_wait_range().
 + BIO_NOWAIT limited to get_request()
 + XFS fixes 
- included reflink 
- use of xfs_ilock_nowait() instead of a XFS_IOLOCK_NONBLOCKING flag
- Translate the flag through IOMAP_NOWAIT (iomap) to check for
  block allocation for the file.
 + ext4 coding style

Changes since v2:
 + Using aio_reserved1 as aio_rw_flags instead of aio_flags
 + blk-mq support
 + xfs uptodate with kernel and reflink changes

 Changes since v3:
  + Added FS_NOWAIT, which is set if the filesystem supports NOWAIT feature.
  + Checks in generic_make_request() to make sure BIO_NOWAIT comes in
for async direct writes only.
  + Added QUEUE_FLAG_NOWAIT, which is set if the device supports BIO_NOWAIT.
This is added (rather not set) to block devices such as dm/md currently.

 Changes since v4:
  + Ported AIO code to use RWF_* flags. Check for RWF_* flags in
generic_file_write_iter().
  + Changed IOCB_RW_FLAGS_NOWAIT to RWF_NOWAIT.

 Changes since v5:
  + BIO_NOWAIT to REQ_NOWAIT
  + Common helper for RWF flags.

 Changes since v6:
  + REQ_NOWAIT will be ignored for request based devices since they
cannot block. So, removed QUEUE_FLAG_NOWAIT since it is not
required in the current implementation. It will be resurrected
when we program for stacked devices.
  + changed kiocb_rw_flags() to kiocb_set_rw_flags() in order to accommodate
errors. Moved the checks into the function.

 Changes since v7:
  + split patches into prep so the main patches are smaller and easier
to understand
  + All patches are reviewed or acked!
 
 Changes since v8:
 + Err out AIO reads with -EINVAL flagged as RWF_NOWAIT

 Changes since v9:
 + Retract - Err out AIO reads with -EINVAL flagged as RWF_NOWAIT
 + XFS returns EAGAIN if extent list is not in memory
 + Man page updates to io_submit with iocb description and nowait features.

-- 
Goldwyn




[PATCH 06/10] fs: Introduce IOMAP_NOWAIT

2017-06-04 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

IOCB_NOWAIT translates to IOMAP_NOWAIT for iomaps.
This is used by XFS in the XFS patch.
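
For illustration, an ->iomap_begin implementation could honor the flag
roughly as in the sketch below; needs_alloc() is a hypothetical placeholder
for the filesystem's own check, and the real XFS logic is in patch 09.

/* Sketch only, not part of this patch. */
static int example_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
			       unsigned flags, struct iomap *iomap)
{
	if ((flags & IOMAP_NOWAIT) && needs_alloc(inode, pos, length))
		return -EAGAIN;	/* caller retries from a context that may block */

	/* ... normal lookup/allocation path fills *iomap ... */
	return 0;
}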

Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Signed-off-by: Goldwyn Rodrigues 
---
 fs/iomap.c| 2 ++
 include/linux/iomap.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index 4b10892967a5..5d85ec6e7b20 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -879,6 +879,8 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
} else {
dio->flags |= IOMAP_DIO_WRITE;
flags |= IOMAP_WRITE;
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   flags |= IOMAP_NOWAIT;
}
 
ret = filemap_write_and_wait_range(mapping, start, end);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index f753e788da31..69f4e9470084 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -52,6 +52,7 @@ struct iomap {
 #define IOMAP_REPORT   (1 << 2) /* report extent status, e.g. FIEMAP */
 #define IOMAP_FAULT(1 << 3) /* mapping for page fault */
 #define IOMAP_DIRECT   (1 << 4) /* direct I/O */
+#define IOMAP_NOWAIT   (1 << 5) /* Don't wait for writeback */
 
 struct iomap_ops {
/*
-- 
2.12.0



[PATCH 02/10] fs: Introduce filemap_range_has_page()

2017-06-04 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

filemap_range_has_page() returns true if the file's mapping has
a page within the given range. This function will be used
to check whether a write() call will cause a writeback of previous
writes.
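
For illustration, the calling pattern added in patch 05 looks roughly like
the sketch below; dio_write_prep() is a hypothetical helper name used only
for this example.

/* Sketch only, not part of this patch. */
static ssize_t dio_write_prep(struct kiocb *iocb, struct address_space *mapping,
			      loff_t pos, size_t count)
{
	if (iocb->ki_flags & IOCB_NOWAIT) {
		/* A page in the range means the write would wait on writeback. */
		if (filemap_range_has_page(mapping, pos, pos + count - 1))
			return -EAGAIN;
		return 0;
	}
	return filemap_write_and_wait_range(mapping, pos, pos + count - 1);
}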

Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Signed-off-by: Goldwyn Rodrigues 
---
 include/linux/fs.h |  2 ++
 mm/filemap.c   | 33 +
 2 files changed, 35 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f53867140f43..dc0ab585cd56 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2517,6 +2517,8 @@ extern int filemap_fdatawait(struct address_space *);
 extern void filemap_fdatawait_keep_errors(struct address_space *);
 extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
   loff_t lend);
+extern int filemap_range_has_page(struct address_space *, loff_t lstart,
+ loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index 6f1be573a5e6..87aba7698584 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,6 +376,39 @@ int filemap_flush(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_flush);
 
+/**
+ * filemap_range_has_page - check if a page exists in range.
+ * @mapping:   address space structure to wait for
+ * @start_byte:offset in bytes where the range starts
+ * @end_byte:  offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback.
+ */
+int filemap_range_has_page(struct address_space *mapping,
+  loff_t start_byte, loff_t end_byte)
+{
+   pgoff_t index = start_byte >> PAGE_SHIFT;
+   pgoff_t end = end_byte >> PAGE_SHIFT;
+   struct pagevec pvec;
+   int ret;
+
+   if (end_byte < start_byte)
+   return 0;
+
+   if (mapping->nrpages == 0)
+   return 0;
+
+   pagevec_init(&pvec, 0);
+   ret = pagevec_lookup(&pvec, mapping, index, 1);
+   if (!ret)
+   return 0;
+   ret = (pvec.pages[0]->index <= end);
+   pagevec_release(&pvec);
+   return ret;
+}
+EXPORT_SYMBOL(filemap_range_has_page);
+
 static int __filemap_fdatawait_range(struct address_space *mapping,
 loff_t start_byte, loff_t end_byte)
 {
-- 
2.12.0



[PATCH 05/10] fs: return if direct write will trigger writeback

2017-06-04 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

Find out if the write will trigger a wait due to writeback. If yes,
return -EAGAIN.

Return -EINVAL for buffered AIO: there are multiple causes of delay,
such as page locks, dirty throttling logic, loading pages from disk,
etc., which cannot be handled without blocking.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Signed-off-by: Goldwyn Rodrigues 
---
 mm/filemap.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 097213275461..bc146efa6815 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2675,6 +2675,9 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, 
struct iov_iter *from)
 
pos = iocb->ki_pos;
 
+   if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
+   return -EINVAL;
+
if (limit != RLIM_INFINITY) {
if (iocb->ki_pos >= limit) {
send_sig(SIGXFSZ, current, 0);
@@ -2743,9 +2746,17 @@ generic_file_direct_write(struct kiocb *iocb, struct 
iov_iter *from)
write_len = iov_iter_count(from);
end = (pos + write_len - 1) >> PAGE_SHIFT;
 
-   written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 
1);
-   if (written)
-   goto out;
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   /* If there are pages to writeback, return */
+   if (filemap_range_has_page(inode->i_mapping, pos,
+  pos + iov_iter_count(from)))
+   return -EAGAIN;
+   } else {
+   written = filemap_write_and_wait_range(mapping, pos,
+   pos + write_len - 1);
+   if (written)
+   goto out;
+   }
 
/*
 * After a write we want buffered reads to be sure to go to disk to get
-- 
2.12.0



[PATCH 09/10] xfs: nowait aio support

2017-06-04 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

If IOCB_NOWAIT is set, bail if the i_rwsem is not immediately
lockable.

If IOMAP_NOWAIT is set, return -EAGAIN in xfs_file_iomap_begin
if the write needs allocation, whether due to file extension, writing to
a hole, or COW, or if it would have to wait for other DIOs to finish.

Return -EAGAIN if the extent list is not in memory.

Signed-off-by: Goldwyn Rodrigues 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_file.c  | 19 ++-
 fs/xfs/xfs_iomap.c | 22 ++
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5fb5a0958a14..f87a8a66e6f7 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -541,8 +541,11 @@ xfs_file_dio_aio_write(
iolock = XFS_IOLOCK_SHARED;
}
 
-   xfs_ilock(ip, iolock);
-
+   if (!xfs_ilock_nowait(ip, iolock)) {
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   return -EAGAIN;
+   xfs_ilock(ip, iolock);
+   }
ret = xfs_file_aio_write_checks(iocb, from, &iolock);
if (ret)
goto out;
@@ -553,9 +556,15 @@ xfs_file_dio_aio_write(
 * otherwise demote the lock if we had to take the exclusive lock
 * for other reasons in xfs_file_aio_write_checks.
 */
-   if (unaligned_io)
-   inode_dio_wait(inode);
-   else if (iolock == XFS_IOLOCK_EXCL) {
+   if (unaligned_io) {
+   /* If we are going to wait for other DIO to finish, bail */
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   if (atomic_read(&inode->i_dio_count))
+   return -EAGAIN;
+   } else {
+   inode_dio_wait(inode);
+   }
+   } else if (iolock == XFS_IOLOCK_EXCL) {
xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
iolock = XFS_IOLOCK_SHARED;
}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 94e5bdf7304c..05dc87e8c1f5 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -995,6 +995,11 @@ xfs_file_iomap_begin(
lockmode = xfs_ilock_data_map_shared(ip);
}
 
+   if ((flags & IOMAP_NOWAIT) && !(ip->i_df.if_flags & XFS_IFEXTENTS)) {
+   error = -EAGAIN;
+   goto out_unlock;
+   }
+
ASSERT(offset <= mp->m_super->s_maxbytes);
if ((xfs_fsize_t)offset + length > mp->m_super->s_maxbytes)
length = mp->m_super->s_maxbytes - offset;
@@ -1016,6 +1021,15 @@ xfs_file_iomap_begin(
 
if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
if (flags & IOMAP_DIRECT) {
+   /*
+* A reflinked inode will result in CoW alloc.
+* FIXME: It could still overwrite on unshared extents
+* and not need allocation.
+*/
+   if (flags & IOMAP_NOWAIT) {
+   error = -EAGAIN;
+   goto out_unlock;
+   }
/* may drop and re-acquire the ilock */
error = xfs_reflink_allocate_cow(ip, &imap, &shared, &lockmode);
@@ -1033,6 +1047,14 @@ xfs_file_iomap_begin(
 
if ((flags & IOMAP_WRITE) && imap_needs_alloc(inode, &imap, nimaps)) {
/*
+* If nowait is set bail since we are going to make
+* allocations.
+*/
+   if (flags & IOMAP_NOWAIT) {
+   error = -EAGAIN;
+   goto out_unlock;
+   }
+   /*
 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
 * pages to keep the chunks of work done where somewhat 
symmetric
 * with the work writeback does. This is a completely arbitrary
-- 
2.12.0



[PATCH 04/10] fs: Introduce RWF_NOWAIT

2017-06-04 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

RWF_NOWAIT informs the kernel to bail out if an AIO request will block,
for reasons such as file block allocation, a writeback being triggered,
or blocking on request allocation while performing direct I/O.

RWF_NOWAIT is translated to IOCB_NOWAIT in iocb->ki_flags.

The check for -EOPNOTSUPP is placed in generic_file_write_iter(), which
most filesystems reach through their .write_iter() implementation.
Filesystems that do not use it get an explicit check in their own
.write_iter() (or direct write path), which covers direct I/O for them.

Support for xfs, ext4 and btrfs is added in the following patches.
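
For reference, the RWF_NOWAIT handling described above amounts to extending
the kiocb_set_rw_flags() helper from patch 01 roughly as follows (a sketch
based on the description, not the verbatim include/linux/fs.h hunk):

static inline int kiocb_set_rw_flags(struct kiocb *ki, int flags)
{
	if (unlikely(flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT)))
		return -EOPNOTSUPP;

	if (flags & RWF_HIPRI)
		ki->ki_flags |= IOCB_HIPRI;
	if (flags & RWF_DSYNC)
		ki->ki_flags |= IOCB_DSYNC;
	if (flags & RWF_SYNC)
		ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
	if (flags & RWF_NOWAIT)
		ki->ki_flags |= IOCB_NOWAIT;	/* new flag added by this patch */
	return 0;
}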

Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Signed-off-by: Goldwyn Rodrigues 
---
 fs/9p/vfs_file.c| 3 +++
 fs/aio.c| 6 ++
 fs/ceph/file.c  | 3 +++
 fs/cifs/file.c  | 3 +++
 fs/fuse/file.c  | 3 +++
 fs/nfs/direct.c | 3 +++
 fs/ocfs2/file.c | 3 +++
 include/linux/fs.h  | 5 -
 include/uapi/linux/fs.h | 1 +
 mm/filemap.c| 3 +++
 10 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 3de3b4a89d89..403681db7723 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -411,6 +411,9 @@ v9fs_file_write_iter(struct kiocb *iocb, struct iov_iter 
*from)
loff_t origin;
int err = 0;
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   return -EOPNOTSUPP;
+
retval = generic_write_checks(iocb, from);
if (retval <= 0)
return retval;
diff --git a/fs/aio.c b/fs/aio.c
index 020fa0045e3c..34027b67e2f4 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1592,6 +1592,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb 
__user *user_iocb,
goto out_put_req;
}
 
+   if ((req->common.ki_flags & IOCB_NOWAIT) &&
+   !(req->common.ki_flags & IOCB_DIRECT)) {
+   ret = -EOPNOTSUPP;
+   goto out_put_req;
+   }
+
ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
if (unlikely(ret)) {
pr_debug("EFAULT: aio_key\n");
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 29308a80d66f..366b0bb71f97 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1300,6 +1300,9 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct 
iov_iter *from)
int err, want, got;
loff_t pos;
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   return -EOPNOTSUPP;
+
if (ceph_snap(inode) != CEPH_NOSNAP)
return -EROFS;
 
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0fd081bd2a2f..ff84fa9ddb6c 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2725,6 +2725,9 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct 
iov_iter *from)
 * write request.
 */
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   return -EOPNOTSUPP;
+
rc = generic_write_checks(iocb, from);
if (rc <= 0)
return rc;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 3ee4fdc3da9e..812c7bd0c290 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1425,6 +1425,9 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(file);
ssize_t res;
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   return -EOPNOTSUPP;
+
if (is_bad_inode(inode))
return -EIO;
 
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 6fb9fad2d1e6..c8e7dd76126c 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -979,6 +979,9 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct 
iov_iter *iter)
dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n",
file, iov_iter_count(iter), (long long) iocb->ki_pos);
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   return -EOPNOTSUPP;
+
result = generic_write_checks(iocb, iter);
if (result <= 0)
return result;
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index bfeb647459d9..e7f8ba890305 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2235,6 +2235,9 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
if (count == 0)
return 0;
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   return -EOPNOTSUPP;
+
direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0;
 
inode_lock(inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dc0ab585cd56..2a7d14af6d12 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -268,6 +268,7 @@ struct writeback_control;
 #define IOCB_DSYNC (1 << 4)
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
+#define IOCB_NOWAIT(1 << 7)
 
 struct kiocb {
struct file *ki_filp;
@@ -3060,7 +3061,7 @@ static inline int iocb_flags(struct file 

[PATCH 08/10] ext4: nowait aio support

2017-06-04 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

For direct I/O, return -EAGAIN if any of the following hold:
  + i_rwsem cannot be locked immediately
  + The write extends beyond the end of the file (would trigger allocation)
  + Blocks are not already allocated at the write location

Signed-off-by: Goldwyn Rodrigues 
Reviewed-by: Jan Kara 
---
 fs/ext4/file.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 831fd6beebf0..07f08ff2c11b 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -216,7 +216,13 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter 
*from)
return ext4_dax_write_iter(iocb, from);
 #endif
 
-   inode_lock(inode);
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   if (!inode_trylock(inode))
+   return -EAGAIN;
+   } else {
+   inode_lock(inode);
+   }
+
ret = ext4_write_checks(iocb, from);
if (ret <= 0)
goto out;
@@ -235,9 +241,15 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter 
*from)
 
iocb->private = &overwrite;
/* Check whether we do a DIO overwrite or not */
-   if (o_direct && ext4_should_dioread_nolock(inode) && !unaligned_aio &&
-   ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from)))
-   overwrite = 1;
+   if (o_direct && !unaligned_aio) {
+   if (ext4_overwrite_io(inode, iocb->ki_pos, 
iov_iter_count(from))) {
+   if (ext4_should_dioread_nolock(inode))
+   overwrite = 1;
+   } else if (iocb->ki_flags & IOCB_NOWAIT) {
+   ret = -EAGAIN;
+   goto out;
+   }
+   }
 
ret = __generic_file_write_iter(iocb, from);
inode_unlock(inode);
-- 
2.12.0



[PATCH 10/10] btrfs: nowait aio support

2017-06-04 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

Return -EAGAIN if any of the following hold:
 + i_rwsem cannot be locked immediately
 + NODATACOW or PREALLOC is not set
 + We cannot nocow at the desired location
 + The write extends beyond the end of the file and the range is not allocated

Acked-by: David Sterba 
Signed-off-by: Goldwyn Rodrigues 
---
 fs/btrfs/file.c  | 25 -
 fs/btrfs/inode.c |  3 +++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index da1096eb1a40..aae088e49915 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1875,12 +1875,29 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
ssize_t num_written = 0;
bool sync = (file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host);
ssize_t err;
-   loff_t pos;
-   size_t count;
+   loff_t pos = iocb->ki_pos;
+   size_t count = iov_iter_count(from);
loff_t oldsize;
int clean_page = 0;
 
-   inode_lock(inode);
+   if ((iocb->ki_flags & IOCB_NOWAIT) &&
+   (iocb->ki_flags & IOCB_DIRECT)) {
+   /* Don't sleep on inode rwsem */
+   if (!inode_trylock(inode))
+   return -EAGAIN;
+   /*
+* We will allocate space in case nodatacow is not set,
+* so bail
+*/
+   if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+ BTRFS_INODE_PREALLOC)) ||
check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
+   inode_unlock(inode);
+   return -EAGAIN;
+   }
+   } else
+   inode_lock(inode);
+
err = generic_write_checks(iocb, from);
if (err <= 0) {
inode_unlock(inode);
@@ -1914,8 +1931,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 */
update_time_for_write(inode);
 
-   pos = iocb->ki_pos;
-   count = iov_iter_count(from);
start_pos = round_down(pos, fs_info->sectorsize);
oldsize = i_size_read(inode);
if (start_pos > oldsize) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 17cbe9306faf..2ab71b946829 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8755,6 +8755,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct 
iov_iter *iter)
dio_data.overwrite = 1;
inode_unlock(inode);
relock = true;
+   } else if (iocb->ki_flags & IOCB_NOWAIT) {
+   ret = -EAGAIN;
+   goto out;
}
ret = btrfs_delalloc_reserve_space(inode, offset, count);
if (ret)
-- 
2.12.0



[PATCH 03/10] fs: Use RWF_* flags for AIO operations

2017-06-04 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

aio_rw_flags is introduced in struct iocb (reusing aio_reserved1) to
carry the RWF_* flags. We cannot use aio_flags because it is not
checked for validity, which may break existing applications.

Note that the only place RWF_HIPRI takes effect is dio_await_one().
In all the other locations, the aio code returns -EIOCBQUEUED before
the RWF_HIPRI checks are reached.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Signed-off-by: Goldwyn Rodrigues 
---
 fs/aio.c | 8 +++-
 include/uapi/linux/aio_abi.h | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f52d925ee259..020fa0045e3c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1541,7 +1541,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb 
__user *user_iocb,
ssize_t ret;
 
/* enforce forwards compatibility on users */
-   if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved2)) {
+   if (unlikely(iocb->aio_reserved2)) {
pr_debug("EINVAL: reserve field set\n");
return -EINVAL;
}
@@ -1586,6 +1586,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb 
__user *user_iocb,
req->common.ki_flags |= IOCB_EVENTFD;
}
 
+   ret = kiocb_set_rw_flags(&req->common, iocb->aio_rw_flags);
+   if (unlikely(ret)) {
+   pr_debug("EINVAL: aio_rw_flags\n");
+   goto out_put_req;
+   }
+
ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
if (unlikely(ret)) {
pr_debug("EFAULT: aio_key\n");
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index bb2554f7fbd1..a2d4a8ac94ca 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -79,7 +79,7 @@ struct io_event {
 struct iocb {
/* these are internal to the kernel/libc. */
__u64   aio_data;   /* data to be returned in event's data */
-   __u32   PADDED(aio_key, aio_reserved1);
+   __u32   PADDED(aio_key, aio_rw_flags);
/* the kernel sets aio_key to the req # */
 
/* common fields */
-- 
2.12.0



[PATCH 07/10] fs: return on congested block device

2017-06-04 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

A new bio operation flag REQ_NOWAIT is introduced to identify bios
originating from an iocb with IOCB_NOWAIT. This flag indicates that the
request should fail immediately instead of being retried if it cannot
be made.

Stacked devices such as md (the ones with make_request_fn hooks) are
currently not supported because they may block for housekeeping.
For example, an md device can have a part of the device suspended.
For this reason, only request-based devices are supported.
In the future, this feature will be expanded to stacked devices
by teaching them how to handle the REQ_NOWAIT flag.
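
For reference, the flag is propagated from the iocb to the bio roughly as
in the sketch below; bio_set_nowait() is a hypothetical helper used only
for illustration, the actual change lives in the fs/direct-io.c hunk.

/* Sketch only: mark a bio so that get_request()/blk-mq fail with -EAGAIN
 * instead of sleeping for a free request. */
static void bio_set_nowait(struct bio *bio, const struct kiocb *iocb)
{
	if (iocb->ki_flags & IOCB_NOWAIT)
		bio->bi_opf |= REQ_NOWAIT;
}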

Reviewed-by: Christoph Hellwig 
Signed-off-by: Goldwyn Rodrigues 
---
 block/blk-core.c  | 24 ++--
 block/blk-mq-sched.c  |  3 +++
 block/blk-mq.c|  2 ++
 fs/direct-io.c| 10 --
 include/linux/bio.h   |  6 ++
 include/linux/blk_types.h |  2 ++
 6 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a7421b772d0e..a6ee659fd56b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1256,6 +1256,11 @@ static struct request *get_request(struct request_queue 
*q, unsigned int op,
if (!IS_ERR(rq))
return rq;
 
+   if (op & REQ_NOWAIT) {
+   blk_put_rl(rl);
+   return ERR_PTR(-EAGAIN);
+   }
+
if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) 
{
blk_put_rl(rl);
return rq;
@@ -1900,6 +1905,17 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
 
+   /*
+* For a REQ_NOWAIT based request, return -EOPNOTSUPP
+* if queue does not have QUEUE_FLAG_NOWAIT_SUPPORT set
+* and if it is not a request based queue.
+*/
+
+   if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q)) {
+   err = -EOPNOTSUPP;
+   goto end_io;
+   }
+
part = bio->bi_bdev->bd_part;
if (should_fail_request(part, bio->bi_iter.bi_size) ||
should_fail_request(&part_to_disk(part)->part0,
@@ -2057,7 +2073,7 @@ blk_qc_t generic_make_request(struct bio *bio)
do {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
-   if (likely(blk_queue_enter(q, false) == 0)) {
+   if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
struct bio_list lower, same;
 
/* Create a fresh bio_list for all subordinate requests 
*/
@@ -2082,7 +2098,11 @@ blk_qc_t generic_make_request(struct bio *bio)
bio_list_merge(&bio_list_on_stack[0], &lower);
bio_list_merge(&bio_list_on_stack[0], &same);
bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
} else {
-   bio_io_error(bio);
+   if (unlikely(!blk_queue_dying(q) &&
+   (bio->bi_opf & REQ_NOWAIT)))
+   bio_wouldblock_error(bio);
+   else
+   bio_io_error(bio);
}
bio = bio_list_pop(_list_on_stack[0]);
} while (bio);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 1f5b692526ae..9a1dea8b964e 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -83,6 +83,9 @@ struct request *blk_mq_sched_get_request(struct request_queue 
*q,
if (likely(!data->hctx))
data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
 
+   if (op & REQ_NOWAIT)
+   data->flags |= BLK_MQ_REQ_NOWAIT;
+
if (e) {
data->flags |= BLK_MQ_REQ_INTERNAL;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1bcccedcc74f..b0608f1955b2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1556,6 +1556,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue 
*q, struct bio *bio)
rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
if (unlikely(!rq)) {
__wbt_done(q->rq_wb, wb_acct);
+   if (bio->bi_opf & REQ_NOWAIT)
+   bio_wouldblock_error(bio);
return BLK_QC_T_NONE;
}
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index a04ebea77de8..139ebd5ae1c7 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -480,8 +480,12 @@ static int dio_bio_complete(struct dio *dio, struct bio 
*bio)
unsigned i;
int err;
 
-   if (bio->bi_error)
-   dio->io_error = -EIO;
+   if (bio->bi_error) {
+   if (bio->bi_error == -EAGAIN && (bio->bi_opf & REQ_NOWAIT))
+   dio->io_error = -EAGAIN;
+   else
+   dio->io_error = -EIO;
+   }
 
if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
err = bio->bi_error;
@@ -1197,6 +1201,8 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode 
*inode,

Re: [PATCH] nvme: fix hang in remove path

2017-06-04 Thread Sagi Grimberg



It would make sense to still add:

if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
return

inside nvme_configure_apst at the top irrespective of this change.


I'm not sure what the value would be, given that it is taken care of in
.queue_rq?


Re: [PATCH 4/8] genirq/affinity: assign vectors to all present CPUs

2017-06-04 Thread Sagi Grimberg



On 03/06/17 17:03, Christoph Hellwig wrote:

Currently we only assign spread vectors to online CPUs, which ties the
IRQ mapping to the currently online devices and doesn't deal nicely with
the fact that CPUs could come and go rapidly due to e.g. power management.

Instead assign vectors to all present CPUs to avoid this churn.

For this we have to build a map of all possible CPUs for a give node, as


s/give/given/

Reviewed-by: Sagi Grimberg 


Re: [PATCH 2/8] genirq: move pending helpers to internal.h

2017-06-04 Thread Sagi Grimberg

Looks good to me,

Reviewed-by: Sagi Grimberg 


Re: [PATCH 3/8] genirq/affinity: factor out a irq_affinity_set helper

2017-06-04 Thread Sagi Grimberg

Looks good to me,

Reviewed-by: Sagi Grimberg 


Re: [PATCH 1/8] genirq: allow assigning affinity to present but not online CPUs

2017-06-04 Thread Sagi Grimberg

Looks good to me,

Reviewed-by: Sagi Grimberg 


Re: [PATCH 8/8] nvme: allocate queues for all possible CPUs

2017-06-04 Thread Sagi Grimberg

Reviewed-by: Sagi Grimberg 


Re: [PATCH 6/8] blk-mq: include all present CPUs in the default queue mapping

2017-06-04 Thread Sagi Grimberg

Looks good,

Reviewed-by: Sagi Grimberg 


Re: [PATCH 7/8] blk-mq: create hctx for each present CPU

2017-06-04 Thread Sagi Grimberg

Nice cleanup!

Reviewed-by: Sagi Grimberg 


[PATCH 6/6] sd: add support for TCG OPAL self encrypting disks

2017-06-04 Thread Christoph Hellwig
Just wire up the generic TCG OPAL infrastructure to the SCSI disk driver
and the Security In/Out commands.

Note that I don't know of any actual SCSI disks that do support TCG OPAL,
but this is required to support ATA disks through libata.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sd.c | 44 
 drivers/scsi/sd.h |  2 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b6bb4e0ce0e3..782f909a223c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -643,6 +644,26 @@ static void scsi_disk_put(struct scsi_disk *sdkp)
mutex_unlock(&sd_ref_mutex);
 }
 
+#ifdef CONFIG_BLK_SED_OPAL
+static int sd_sec_submit(void *data, u16 spsp, u8 secp, void *buffer,
+   size_t len, bool send)
+{
+   struct scsi_device *sdev = data;
+   u8 cdb[12] = { 0, };
+   int ret;
+
+   cdb[0] = send ? SECURITY_PROTOCOL_OUT : SECURITY_PROTOCOL_IN;
+   cdb[1] = secp;
+   put_unaligned_be16(spsp, &cdb[2]);
+   put_unaligned_be32(len, &cdb[6]);
+
+   ret = scsi_execute_req(sdev, cdb,
+   send ? DMA_TO_DEVICE : DMA_FROM_DEVICE,
+   buffer, len, NULL, SD_TIMEOUT, SD_MAX_RETRIES, NULL);
+   return ret <= 0 ? ret : -EIO;
+}
+#endif /* CONFIG_BLK_SED_OPAL */
+
 static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd *scmd,
   unsigned int dix, unsigned int dif)
 {
@@ -1454,6 +1475,9 @@ static int sd_ioctl(struct block_device *bdev, fmode_t 
mode,
if (error)
goto out;
 
+   if (is_sed_ioctl(cmd))
+   return sed_ioctl(sdkp->opal_dev, cmd, p);
+
/*
 * Send SCSI addressing ioctls directly to mid level, send other
 * ioctls to block level and then onto mid level if they can't be
@@ -3014,6 +3038,17 @@ static void sd_read_write_same(struct scsi_disk *sdkp, 
unsigned char *buffer)
sdkp->ws10 = 1;
 }
 
+static void sd_read_security(struct scsi_disk *sdkp, unsigned char *buffer)
+{
+   struct scsi_device *sdev = sdkp->device;
+
+   if (scsi_report_opcode(sdev, buffer, SD_BUF_SIZE,
+   SECURITY_PROTOCOL_IN) == 1 &&
+   scsi_report_opcode(sdev, buffer, SD_BUF_SIZE,
+   SECURITY_PROTOCOL_OUT) == 1)
+   sdkp->security = 1;
+}
+
 /**
  * sd_revalidate_disk - called the first time a new disk is seen,
  * performs disk spin up, read_capacity, etc.
@@ -3067,6 +3102,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
sd_read_cache_type(sdkp, buffer);
sd_read_app_tag_own(sdkp, buffer);
sd_read_write_same(sdkp, buffer);
+   sd_read_security(sdkp, buffer);
}
 
sdkp->first_scan = 0;
@@ -3227,6 +3263,12 @@ static void sd_probe_async(void *data, async_cookie_t 
cookie)
 
sd_revalidate_disk(gd);
 
+   if (sdkp->security) {
+   sdkp->opal_dev = init_opal_dev(sdp, &sd_sec_submit);
+   if (sdkp->opal_dev)
+   sd_printk(KERN_NOTICE, sdkp, "supports TCG Opal\n");
+   }
+
sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
  sdp->removable ? "removable " : "");
scsi_autopm_put_device(sdp);
@@ -3376,6 +3418,8 @@ static int sd_remove(struct device *dev)
 
sd_zbc_remove(sdkp);
 
+   free_opal_dev(sdkp->opal_dev);
+
blk_register_region(devt, SD_MINORS, NULL,
sd_default_probe, NULL, NULL);
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 61d02efd366c..99c4dde9b6bf 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -71,6 +71,7 @@ struct scsi_disk {
struct scsi_device *device;
struct device   dev;
struct gendisk  *disk;
+   struct opal_dev *opal_dev;
 #ifdef CONFIG_BLK_DEV_ZONED
unsigned intnr_zones;
unsigned intzone_blocks;
@@ -114,6 +115,7 @@ struct scsi_disk {
unsignedrc_basis: 2;
unsignedzoned: 2;
unsignedurswrz : 1;
+   unsignedsecurity : 1;
unsignedignore_medium_access_errors : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
-- 
2.11.0



[PATCH 5/6] libata: implement SECURITY PROTOCOL IN/OUT

2017-06-04 Thread Christoph Hellwig
This allows us to use the generic OPAL code with ATA devices.

Signed-off-by: Christoph Hellwig 
---
 drivers/ata/libata-core.c | 32 
 drivers/ata/libata-scsi.c | 76 +++
 include/linux/ata.h   |  1 +
 include/linux/libata.h|  1 +
 4 files changed, 110 insertions(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index f57131115594..6eb08595a1b5 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2405,6 +2405,37 @@ static void ata_dev_config_zac(struct ata_device *dev)
}
 }
 
+static void ata_dev_config_trusted(struct ata_device *dev)
+{
+   struct ata_port *ap = dev->link->ap;
+   u64 trusted_cap;
+   unsigned int err;
+
+   if (!ata_identify_page_supported(dev, ATA_LOG_SECURITY)) {
+   ata_dev_warn(dev,
+"Security Log not supported\n");
+   return;
+   }
+
+   err = ata_read_log_page(dev, ATA_LOG_IDENTIFY_DEVICE, ATA_LOG_SECURITY,
+   ap->sector_buf, 1);
+   if (err) {
+   ata_dev_dbg(dev,
+   "failed to read Security Log, Emask 0x%x\n", err);
+   return;
+   }
+
+   trusted_cap = get_unaligned_le64(&ap->sector_buf[40]);
+   if (!(trusted_cap & (1ULL << 63))) {
+   ata_dev_dbg(dev,
+   "Trusted Computing capability qword not valid!\n");
+   return;
+   }
+
+   if (trusted_cap & (1 << 0))
+   dev->flags |= ATA_DFLAG_TRUSTED;
+}
+
 /**
  * ata_dev_configure - Configure the specified ATA/ATAPI device
  * @dev: Target device to configure
@@ -2629,6 +2660,7 @@ int ata_dev_configure(struct ata_device *dev)
}
ata_dev_config_sense_reporting(dev);
ata_dev_config_zac(dev);
+   ata_dev_config_trusted(dev);
dev->cdb_len = 16;
}
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 49ba9834c715..3d28f2bd79af 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3563,6 +3563,11 @@ static unsigned int ata_scsiop_maint_in(struct 
ata_scsi_args *args, u8 *rbuf)
dev->class == ATA_DEV_ZAC)
supported = 3;
break;
+   case SECURITY_PROTOCOL_IN:
+   case SECURITY_PROTOCOL_OUT:
+   if (dev->flags & ATA_DFLAG_TRUSTED)
+   supported = 3;
+   break;
default:
break;
}
@@ -4067,6 +4072,71 @@ static unsigned int ata_scsi_mode_select_xlat(struct 
ata_queued_cmd *qc)
return 1;
 }
 
+static u8 ata_scsi_trusted_op(u32 len, bool send, bool dma)
+{
+   if (len == 0)
+   return ATA_CMD_TRUSTED_NONDATA;
+   else if (send)
+   return dma ? ATA_CMD_TRUSTED_SND_DMA : ATA_CMD_TRUSTED_SND;
+   else
+   return dma ? ATA_CMD_TRUSTED_RCV_DMA : ATA_CMD_TRUSTED_RCV;
+}
+
+static unsigned int ata_scsi_security_inout_xlat(struct ata_queued_cmd *qc)
+{
+   struct scsi_cmnd *scmd = qc->scsicmd;
+   const u8 *cdb = scmd->cmnd;
+   struct ata_taskfile *tf = &qc->tf;
+   u8 secp = cdb[1];
+   bool send = (cdb[0] == SECURITY_PROTOCOL_OUT);
+   u16 spsp = get_unaligned_be16(&cdb[2]);
+   u32 len = get_unaligned_be32(&cdb[6]);
+   bool dma = !(qc->dev->flags & ATA_DFLAG_PIO);
+
+   /*
+* We don't support the ATA "security" protocol.
+*/
+   if (secp == 0xef) {
+   ata_scsi_set_invalid_field(qc->dev, scmd, 1, 0);
+   return 1;
+   }
+
+   if (cdb[4] & 7) { /* INC_512 */
if (len > 0xffff) {
+   ata_scsi_set_invalid_field(qc->dev, scmd, 6, 0);
+   return 1;
+   }
+   } else {
+   if (len > 0x01fffe00) {
+   ata_scsi_set_invalid_field(qc->dev, scmd, 6, 0);
+   return 1;
+   }
+
+   /* convert to the sector-based ATA addressing */
+   len = (len + 511) / 512;
+   }
+
+   tf->protocol = dma ? ATA_PROT_DMA : ATA_PROT_PIO;
+   tf->flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR | ATA_TFLAG_LBA;
+   if (send)
+   tf->flags |= ATA_TFLAG_WRITE;
+   tf->command = ata_scsi_trusted_op(len, send, dma);
+   tf->feature = secp;
+   tf->lbam = spsp & 0xff;
+   tf->lbah = spsp >> 8;
+
+   if (len) {
+   tf->nsect = len & 0xff;
+   tf->lbal = len >> 8;
+   } else {
+   if (!send)
+   tf->lbah = (1 << 7);
+   }
+
+   ata_qc_set_pc_nbytes(qc);
+   return 0;
+}
+
 /**
  * ata_get_xlat_func - check if SCSI to ATA translation is possible
  * @dev: ATA device
@@ -4118,6 +4188,12 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct 
ata_device 

[PATCH 4/6] libata: factor out a ata_identify_page_supported helper

2017-06-04 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/ata/libata-core.c | 59 +--
 1 file changed, 32 insertions(+), 27 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 445e7050637b..f57131115594 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2120,6 +2120,37 @@ static bool ata_log_supported(struct ata_device *dev, u8 
log)
return get_unaligned_le16(&ap->sector_buf[log * 2]) ? true : false;
 }
 
+static bool ata_identify_page_supported(struct ata_device *dev, u8 page)
+{
+   struct ata_port *ap = dev->link->ap;
+   unsigned int err, i;
+
+   if (!ata_log_supported(dev, ATA_LOG_IDENTIFY_DEVICE)) {
+   ata_dev_warn(dev, "ATA Identify Device Log not supported\n");
+   return false;
+   }
+
+   /*
+* Read IDENTIFY DEVICE data log, page 0, to figure out if the page is
+* supported.
+*/
+   err = ata_read_log_page(dev, ATA_LOG_IDENTIFY_DEVICE, 0, ap->sector_buf,
+   1);
+   if (err) {
+   ata_dev_info(dev,
+"failed to get Device Identify Log Emask 0x%x\n",
+err);
+   return false;
+   }
+
+   for (i = 0; i < ap->sector_buf[8]; i++) {
+   if (ap->sector_buf[9 + i] == page)
+   return true;
+   }
+
+   return false;
+}
+
 static int ata_do_link_spd_horkage(struct ata_device *dev)
 {
struct ata_link *plink = ata_dev_phys_link(dev);
@@ -2325,8 +2356,6 @@ static void ata_dev_config_zac(struct ata_device *dev)
struct ata_port *ap = dev->link->ap;
unsigned int err_mask;
u8 *identify_buf = ap->sector_buf;
-   int i, found = 0;
-   u16 log_pages;
 
dev->zac_zones_optimal_open = U32_MAX;
dev->zac_zones_optimal_nonseq = U32_MAX;
@@ -2346,31 +2375,7 @@ static void ata_dev_config_zac(struct ata_device *dev)
if (!(dev->flags & ATA_DFLAG_ZAC))
return;
 
-   if (!ata_log_supported(dev, ATA_LOG_IDENTIFY_DEVICE)) {
-   ata_dev_warn(dev, "ATA Identify Device Log not supported\n");
-   return;
-   }
-
-   /*
-* Read IDENTIFY DEVICE data log, page 0, to figure out
-* if page 9 is supported.
-*/
-   err_mask = ata_read_log_page(dev, ATA_LOG_IDENTIFY_DEVICE, 0,
-identify_buf, 1);
-   if (err_mask) {
-   ata_dev_info(dev,
-"failed to get Device Identify Log Emask 0x%x\n",
-err_mask);
-   return;
-   }
-   log_pages = identify_buf[8];
-   for (i = 0; i < log_pages; i++) {
-   if (identify_buf[9 + i] == ATA_LOG_ZONED_INFORMATION) {
-   found++;
-   break;
-   }
-   }
-   if (!found) {
+   if (!ata_identify_page_supported(dev, ATA_LOG_ZONED_INFORMATION)) {
ata_dev_warn(dev,
 "ATA Zoned Information Log not supported\n");
return;
-- 
2.11.0



TCG Opal support for libata

2017-06-04 Thread Christoph Hellwig
Hi all,

this series adds support for using our new generic TCG OPAL code with
SATA disks, and as a side effect for SCSI disks (although so far it
doesn't seem like any such disks actually exist).


[PATCH 1/6] libata: move ata_read_log_page to libata-core.c

2017-06-04 Thread Christoph Hellwig
It is core functionality, and only one of the users is in the EH code.

Signed-off-by: Christoph Hellwig 
---
 drivers/ata/libata-core.c | 64 +++
 drivers/ata/libata-eh.c   | 64 ---
 drivers/ata/libata.h  |  4 +--
 3 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 2d83b8c75965..d4bab5052268 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2047,6 +2047,70 @@ int ata_dev_read_id(struct ata_device *dev, unsigned int 
*p_class,
return rc;
 }
 
+/**
+ * ata_read_log_page - read a specific log page
+ * @dev: target device
+ * @log: log to read
+ * @page: page to read
+ * @buf: buffer to store read page
+ * @sectors: number of sectors to read
+ *
+ * Read log page using READ_LOG_EXT command.
+ *
+ * LOCKING:
+ * Kernel thread context (may sleep).
+ *
+ * RETURNS:
+ * 0 on success, AC_ERR_* mask otherwise.
+ */
+unsigned int ata_read_log_page(struct ata_device *dev, u8 log,
+  u8 page, void *buf, unsigned int sectors)
+{
+   unsigned long ap_flags = dev->link->ap->flags;
+   struct ata_taskfile tf;
+   unsigned int err_mask;
+   bool dma = false;
+
+   DPRINTK("read log page - log 0x%x, page 0x%x\n", log, page);
+
+   /*
+* Return error without actually issuing the command on controllers
+* which e.g. lockup on a read log page.
+*/
+   if (ap_flags & ATA_FLAG_NO_LOG_PAGE)
+   return AC_ERR_DEV;
+
+retry:
+   ata_tf_init(dev, &tf);
+   if (dev->dma_mode && ata_id_has_read_log_dma_ext(dev->id) &&
+   !(dev->horkage & ATA_HORKAGE_NO_NCQ_LOG)) {
+   tf.command = ATA_CMD_READ_LOG_DMA_EXT;
+   tf.protocol = ATA_PROT_DMA;
+   dma = true;
+   } else {
+   tf.command = ATA_CMD_READ_LOG_EXT;
+   tf.protocol = ATA_PROT_PIO;
+   dma = false;
+   }
+   tf.lbal = log;
+   tf.lbam = page;
+   tf.nsect = sectors;
+   tf.hob_nsect = sectors >> 8;
+   tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_LBA48 | ATA_TFLAG_DEVICE;
+
+   err_mask = ata_exec_internal(dev, &tf, NULL, DMA_FROM_DEVICE,
+buf, sectors * ATA_SECT_SIZE, 0);
+
+   if (err_mask && dma) {
+   dev->horkage |= ATA_HORKAGE_NO_NCQ_LOG;
+   ata_dev_warn(dev, "READ LOG DMA EXT failed, trying unqueued\n");
+   goto retry;
+   }
+
+   DPRINTK("EXIT, err_mask=%x\n", err_mask);
+   return err_mask;
+}
+
 static int ata_do_link_spd_horkage(struct ata_device *dev)
 {
struct ata_link *plink = ata_dev_phys_link(dev);
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index ef68232b5222..528a4e1b2af3 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1488,70 +1488,6 @@ static const char *ata_err_string(unsigned int err_mask)
 }
 
 /**
- * ata_read_log_page - read a specific log page
- * @dev: target device
- * @log: log to read
- * @page: page to read
- * @buf: buffer to store read page
- * @sectors: number of sectors to read
- *
- * Read log page using READ_LOG_EXT command.
- *
- * LOCKING:
- * Kernel thread context (may sleep).
- *
- * RETURNS:
- * 0 on success, AC_ERR_* mask otherwise.
- */
-unsigned int ata_read_log_page(struct ata_device *dev, u8 log,
-  u8 page, void *buf, unsigned int sectors)
-{
-   unsigned long ap_flags = dev->link->ap->flags;
-   struct ata_taskfile tf;
-   unsigned int err_mask;
-   bool dma = false;
-
-   DPRINTK("read log page - log 0x%x, page 0x%x\n", log, page);
-
-   /*
-* Return error without actually issuing the command on controllers
-* which e.g. lockup on a read log page.
-*/
-   if (ap_flags & ATA_FLAG_NO_LOG_PAGE)
-   return AC_ERR_DEV;
-
-retry:
-   ata_tf_init(dev, &tf);
-   if (dev->dma_mode && ata_id_has_read_log_dma_ext(dev->id) &&
-   !(dev->horkage & ATA_HORKAGE_NO_NCQ_LOG)) {
-   tf.command = ATA_CMD_READ_LOG_DMA_EXT;
-   tf.protocol = ATA_PROT_DMA;
-   dma = true;
-   } else {
-   tf.command = ATA_CMD_READ_LOG_EXT;
-   tf.protocol = ATA_PROT_PIO;
-   dma = false;
-   }
-   tf.lbal = log;
-   tf.lbam = page;
-   tf.nsect = sectors;
-   tf.hob_nsect = sectors >> 8;
-   tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_LBA48 | ATA_TFLAG_DEVICE;
-
-   err_mask = ata_exec_internal(dev, &tf, NULL, DMA_FROM_DEVICE,
-buf, sectors * ATA_SECT_SIZE, 0);
-
-   if (err_mask && dma) {
-   dev->horkage |= ATA_HORKAGE_NO_NCQ_LOG;
-   ata_dev_warn(dev, "READ LOG DMA EXT 

[PATCH 2/6] libata: factor out a ata_log_supported helper

2017-06-04 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/ata/libata-core.c | 59 +--
 1 file changed, 16 insertions(+), 43 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index d4bab5052268..0672733997bb 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2111,6 +2111,15 @@ unsigned int ata_read_log_page(struct ata_device *dev, 
u8 log,
return err_mask;
 }
 
+static bool ata_log_supported(struct ata_device *dev, u8 log)
+{
+   struct ata_port *ap = dev->link->ap;
+
+   if (ata_read_log_page(dev, ATA_LOG_DIRECTORY, 0, ap->sector_buf, 1))
+   return false;
+   return get_unaligned_le16(&ap->sector_buf[log * 2]) ? true : false;
+}
+
 static int ata_do_link_spd_horkage(struct ata_device *dev)
 {
struct ata_link *plink = ata_dev_phys_link(dev);
@@ -2158,21 +2167,9 @@ static void ata_dev_config_ncq_send_recv(struct 
ata_device *dev)
 {
struct ata_port *ap = dev->link->ap;
unsigned int err_mask;
-   int log_index = ATA_LOG_NCQ_SEND_RECV * 2;
-   u16 log_pages;
 
-   err_mask = ata_read_log_page(dev, ATA_LOG_DIRECTORY,
-0, ap->sector_buf, 1);
-   if (err_mask) {
-   ata_dev_dbg(dev,
-   "failed to get Log Directory Emask 0x%x\n",
-   err_mask);
-   return;
-   }
-   log_pages = get_unaligned_le16(&ap->sector_buf[log_index]);
-   if (!log_pages) {
-   ata_dev_warn(dev,
-"NCQ Send/Recv Log not supported\n");
+   if (!ata_log_supported(dev, ATA_LOG_NCQ_SEND_RECV)) {
+   ata_dev_warn(dev, "NCQ Send/Recv Log not supported\n");
return;
}
err_mask = ata_read_log_page(dev, ATA_LOG_NCQ_SEND_RECV,
@@ -2199,19 +2196,8 @@ static void ata_dev_config_ncq_non_data(struct 
ata_device *dev)
 {
struct ata_port *ap = dev->link->ap;
unsigned int err_mask;
-   int log_index = ATA_LOG_NCQ_NON_DATA * 2;
-   u16 log_pages;
 
-   err_mask = ata_read_log_page(dev, ATA_LOG_DIRECTORY,
-0, ap->sector_buf, 1);
-   if (err_mask) {
-   ata_dev_dbg(dev,
-   "failed to get Log Directory Emask 0x%x\n",
-   err_mask);
-   return;
-   }
-   log_pages = get_unaligned_le16(&ap->sector_buf[log_index]);
-   if (!log_pages) {
+   if (!ata_log_supported(dev, ATA_LOG_NCQ_NON_DATA)) {
ata_dev_warn(dev,
 "NCQ Send/Recv Log not supported\n");
return;
@@ -2339,7 +2325,7 @@ static void ata_dev_config_zac(struct ata_device *dev)
struct ata_port *ap = dev->link->ap;
unsigned int err_mask;
u8 *identify_buf = ap->sector_buf;
-   int log_index = ATA_LOG_SATA_ID_DEV_DATA * 2, i, found = 0;
+   int i, found = 0;
u16 log_pages;
 
dev->zac_zones_optimal_open = U32_MAX;
@@ -2360,24 +2346,11 @@ static void ata_dev_config_zac(struct ata_device *dev)
if (!(dev->flags & ATA_DFLAG_ZAC))
return;
 
-   /*
-* Read Log Directory to figure out if IDENTIFY DEVICE log
-* is supported.
-*/
-   err_mask = ata_read_log_page(dev, ATA_LOG_DIRECTORY,
-0, ap->sector_buf, 1);
-   if (err_mask) {
-   ata_dev_info(dev,
-"failed to get Log Directory Emask 0x%x\n",
-err_mask);
-   return;
-   }
-   log_pages = get_unaligned_le16(&ap->sector_buf[log_index]);
-   if (log_pages == 0) {
-   ata_dev_warn(dev,
-"ATA Identify Device Log not supported\n");
+   if (!ata_log_supported(dev, ATA_LOG_SATA_ID_DEV_DATA)) {
+   ata_dev_warn(dev, "ATA Identify Device Log not supported\n");
return;
}
+
/*
 * Read IDENTIFY DEVICE data log, page 0, to figure out
 * if page 9 is supported.
-- 
2.11.0



[PATCH 3/6] libata: clarify log page naming / grouping

2017-06-04 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/ata/libata-core.c | 10 +-
 include/linux/ata.h   | 10 +++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 0672733997bb..445e7050637b 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2226,7 +2226,7 @@ static void ata_dev_config_ncq_prio(struct ata_device 
*dev)
}
 
err_mask = ata_read_log_page(dev,
-ATA_LOG_SATA_ID_DEV_DATA,
+ATA_LOG_IDENTIFY_DEVICE,
 ATA_LOG_SATA_SETTINGS,
 ap->sector_buf,
 1);
@@ -2346,7 +2346,7 @@ static void ata_dev_config_zac(struct ata_device *dev)
if (!(dev->flags & ATA_DFLAG_ZAC))
return;
 
-   if (!ata_log_supported(dev, ATA_LOG_SATA_ID_DEV_DATA)) {
+   if (!ata_log_supported(dev, ATA_LOG_IDENTIFY_DEVICE)) {
ata_dev_warn(dev, "ATA Identify Device Log not supported\n");
return;
}
@@ -2355,7 +2355,7 @@ static void ata_dev_config_zac(struct ata_device *dev)
 * Read IDENTIFY DEVICE data log, page 0, to figure out
 * if page 9 is supported.
 */
-   err_mask = ata_read_log_page(dev, ATA_LOG_SATA_ID_DEV_DATA, 0,
+   err_mask = ata_read_log_page(dev, ATA_LOG_IDENTIFY_DEVICE, 0,
 identify_buf, 1);
if (err_mask) {
ata_dev_info(dev,
@@ -2379,7 +2379,7 @@ static void ata_dev_config_zac(struct ata_device *dev)
/*
 * Read IDENTIFY DEVICE data log, page 9 (Zoned-device information)
 */
-   err_mask = ata_read_log_page(dev, ATA_LOG_SATA_ID_DEV_DATA,
+   err_mask = ata_read_log_page(dev, ATA_LOG_IDENTIFY_DEVICE,
 ATA_LOG_ZONED_INFORMATION,
 identify_buf, 1);
if (!err_mask) {
@@ -2608,7 +2608,7 @@ int ata_dev_configure(struct ata_device *dev)
 
dev->flags |= ATA_DFLAG_DEVSLP;
err_mask = ata_read_log_page(dev,
-ATA_LOG_SATA_ID_DEV_DATA,
+ATA_LOG_IDENTIFY_DEVICE,
 ATA_LOG_SATA_SETTINGS,
 sata_setting,
 1);
diff --git a/include/linux/ata.h b/include/linux/ata.h
index ad7d9ee89ff0..44de34c954d8 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -336,11 +336,15 @@ enum {
/* READ_LOG_EXT pages */
ATA_LOG_DIRECTORY   = 0x0,
ATA_LOG_SATA_NCQ= 0x10,
-   ATA_LOG_NCQ_NON_DATA  = 0x12,
-   ATA_LOG_NCQ_SEND_RECV = 0x13,
-   ATA_LOG_SATA_ID_DEV_DATA  = 0x30,
+   ATA_LOG_NCQ_NON_DATA= 0x12,
+   ATA_LOG_NCQ_SEND_RECV   = 0x13,
+   ATA_LOG_IDENTIFY_DEVICE = 0x30,
+
+   /* Identify device log pages: */
ATA_LOG_SATA_SETTINGS = 0x08,
ATA_LOG_ZONED_INFORMATION = 0x09,
+
+   /* Identify device SATA settings log:*/
ATA_LOG_DEVSLP_OFFSET = 0x30,
ATA_LOG_DEVSLP_SIZE   = 0x08,
ATA_LOG_DEVSLP_MDAT   = 0x00,
-- 
2.11.0



Opal userspace

2017-06-04 Thread Christoph Hellwig
Hi Scott,

is https://github.com/ScottyBauer/sed-opal-temp/ still the latest and
greatest in terms of OPAL userspace?  The temp name always sounds a bit
odd..