Re: [PATCH v5 02/11] xfs, dax: introduce xfs_dax_aops

2018-03-12 Thread Christoph Hellwig
On Sun, Mar 11, 2018 at 12:16:25PM -0700, Dan Williams wrote:
> I did the rename, and am housing these in fs/dax.c, I assume that's
> what you wanted.

libfs.c would seem ok too, but we're into micro-management land now :)


Re: [PATCH v5 02/11] xfs, dax: introduce xfs_dax_aops

2018-03-11 Thread Dan Williams
On Sat, Mar 10, 2018 at 9:40 AM, Dan Williams  wrote:
> On Sat, Mar 10, 2018 at 1:46 AM, Christoph Hellwig  wrote:
>>> +int dax_set_page_dirty(struct page *page)
>>> +{
>>> + /*
>>> +  * Unlike __set_page_dirty_no_writeback that handles dirty page
>>> +  * tracking in the page object, dax does all dirty tracking in
>>> +  * the inode address_space in response to mkwrite faults. In the
>>> +  * dax case we only need to worry about potentially dirty CPU
>>> +  * caches, not dirty page cache pages to write back.
>>> +  *
>>> +  * This callback is defined to prevent fallback to
>>> +  * __set_page_dirty_buffers() in set_page_dirty().
>>> +  */
>>> + return 0;
>>> +}
>>
>> Make this a generic noop_set_page_dirty maybe?
>>
>>> +EXPORT_SYMBOL(dax_set_page_dirty);
>>> +
>>> +void dax_invalidatepage(struct page *page, unsigned int offset,
>>> + unsigned int length)
>>> +{
>>> + /*
>>> +  * There is no page cache to invalidate in the dax case, however
>>> +  * we need this callback defined to prevent falling back to
>>> +  * block_invalidatepage() in do_invalidatepage().
>>> +  */
>>> +}
>>
>> Same here.
>
> I guess I'm not sure what you mean. These nops are specific to dax; I
> don't think they make sense in any context besides dax.
>

I did the rename, and am housing these in fs/dax.c, I assume that's
what you wanted.


Re: [PATCH v5 02/11] xfs, dax: introduce xfs_dax_aops

2018-03-10 Thread Dan Williams
On Sat, Mar 10, 2018 at 1:46 AM, Christoph Hellwig  wrote:
>> +int dax_set_page_dirty(struct page *page)
>> +{
>> + /*
>> +  * Unlike __set_page_dirty_no_writeback that handles dirty page
>> +  * tracking in the page object, dax does all dirty tracking in
>> +  * the inode address_space in response to mkwrite faults. In the
>> +  * dax case we only need to worry about potentially dirty CPU
>> +  * caches, not dirty page cache pages to write back.
>> +  *
>> +  * This callback is defined to prevent fallback to
>> +  * __set_page_dirty_buffers() in set_page_dirty().
>> +  */
>> + return 0;
>> +}
>
> Make this a generic noop_set_page_dirty maybe?
>
>> +EXPORT_SYMBOL(dax_set_page_dirty);
>> +
>> +void dax_invalidatepage(struct page *page, unsigned int offset,
>> + unsigned int length)
>> +{
>> + /*
>> +  * There is no page cache to invalidate in the dax case, however
>> +  * we need this callback defined to prevent falling back to
>> +  * block_invalidatepage() in do_invalidatepage().
>> +  */
>> +}
>
> Same here.

I guess I'm not sure what you mean. These nops are specific to dax; I
don't think they make sense in any context besides dax.

>
>> +EXPORT_SYMBOL(dax_invalidatepage);
>
> And EXPORT_SYMBOL_GPL for anything dax-related, please.
>
>> +const struct address_space_operations xfs_dax_aops = {
>> + .writepages = xfs_vm_writepages,
>
> Please split out the DAX case from xfs_vm_writepages.

Will do.

> This patch should probably also split into VFS and XFS parts.

Ok.


Re: [PATCH v5 02/11] xfs, dax: introduce xfs_dax_aops

2018-03-10 Thread Christoph Hellwig
> +int dax_set_page_dirty(struct page *page)
> +{
> + /*
> +  * Unlike __set_page_dirty_no_writeback that handles dirty page
> +  * tracking in the page object, dax does all dirty tracking in
> +  * the inode address_space in response to mkwrite faults. In the
> +  * dax case we only need to worry about potentially dirty CPU
> +  * caches, not dirty page cache pages to write back.
> +  *
> +  * This callback is defined to prevent fallback to
> +  * __set_page_dirty_buffers() in set_page_dirty().
> +  */
> + return 0;
> +}

Make this a generic noop_set_page_dirty maybe?

> +EXPORT_SYMBOL(dax_set_page_dirty);
> +
> +void dax_invalidatepage(struct page *page, unsigned int offset,
> + unsigned int length)
> +{
> + /*
> +  * There is no page cache to invalidate in the dax case, however
> +  * we need this callback defined to prevent falling back to
> +  * block_invalidatepage() in do_invalidatepage().
> +  */
> +}

Same here.

> +EXPORT_SYMBOL(dax_invalidatepage);

And EXPORT_SYMBOL_GPL for anything dax-related, please.

> +const struct address_space_operations xfs_dax_aops = {
> + .writepages = xfs_vm_writepages,

Please split out the DAX case from xfs_vm_writepages.

This patch should probably also split into VFS and XFS parts.
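
The fallback behaviour the patch comments describe, and the generic
no-op helpers suggested above, can be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions: `struct page`
and `struct aops_model` are simplified stand-ins rather than the kernel
definitions, the `model_*` functions are hypothetical names that imitate
set_page_dirty(), do_invalidatepage() and their buffer-based defaults,
and only the `noop_*` names come from the review. Kernel versions of the
helpers would additionally be exported (EXPORT_SYMBOL_GPL, per the
comment above), which has no userspace equivalent.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; these are NOT the kernel definitions. */
struct page {
	int dirty;
};

struct aops_model {
	int  (*set_page_dirty)(struct page *page);
	void (*invalidatepage)(struct page *page, unsigned int offset,
			       unsigned int length);
};

/* Counts trips through the buffer-based defaults. */
static int fallback_calls;

/* Models __set_page_dirty_buffers(): dirty state lives in the page. */
static int model_set_page_dirty_buffers(struct page *page)
{
	fallback_calls++;
	page->dirty = 1;
	return 1;
}

/* Models block_invalidatepage(). */
static void model_block_invalidatepage(struct page *page, unsigned int offset,
				       unsigned int length)
{
	(void)offset;
	(void)length;
	fallback_calls++;
	page->dirty = 0;
}

/* Generic no-ops, named after the review suggestion. */
int noop_set_page_dirty(struct page *page)
{
	(void)page;
	return 0;
}

void noop_invalidatepage(struct page *page, unsigned int offset,
			 unsigned int length)
{
	(void)page;
	(void)offset;
	(void)length;
}

/* Models the dispatch in set_page_dirty(): a NULL callback means fallback. */
int model_set_page_dirty(const struct aops_model *ops, struct page *page)
{
	assert(ops != NULL);
	if (ops->set_page_dirty)
		return ops->set_page_dirty(page);
	return model_set_page_dirty_buffers(page);
}

/* Models the dispatch in do_invalidatepage(). */
void model_do_invalidatepage(const struct aops_model *ops, struct page *page,
			     unsigned int offset, unsigned int length)
{
	assert(ops != NULL);
	if (ops->invalidatepage)
		ops->invalidatepage(page, offset, length);
	else
		model_block_invalidatepage(page, offset, length);
}
```

With both callbacks pointing at the no-ops, neither dispatcher reaches
the buffer-based default and the page state is left alone; with NULL
callbacks, both do. That is the reason the patch defines callbacks that
do nothing instead of leaving the fields unset.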


[PATCH v5 02/11] xfs, dax: introduce xfs_dax_aops

2018-03-09 Thread Dan Williams
In preparation for the dax implementation to start associating dax pages
to inodes via page->mapping, we need to provide a 'struct
address_space_operations' instance for dax. Otherwise, direct-I/O
triggers incorrect page cache assumptions and warnings like the
following:

 WARNING: CPU: 27 PID: 1783 at fs/xfs/xfs_aops.c:1468
 xfs_vm_set_page_dirty+0xf3/0x1b0 [xfs]
 [..]
 CPU: 27 PID: 1783 Comm: dma-collision Tainted: G   O 4.15.0-rc2+ #984
 [..]
 Call Trace:
  set_page_dirty_lock+0x40/0x60
  bio_set_pages_dirty+0x37/0x50
  iomap_dio_actor+0x2b7/0x3b0
  ? iomap_dio_zero+0x110/0x110
  iomap_apply+0xa4/0x110
  iomap_dio_rw+0x29e/0x3b0
  ? iomap_dio_zero+0x110/0x110
  ? xfs_file_dio_aio_read+0x7c/0x1a0 [xfs]
  xfs_file_dio_aio_read+0x7c/0x1a0 [xfs]
  xfs_file_read_iter+0xa0/0xc0 [xfs]
  __vfs_read+0xf9/0x170
  vfs_read+0xa6/0x150
  SyS_pread64+0x93/0xb0
  entry_SYSCALL_64_fastpath+0x1f/0x96

...where the default set_page_dirty() handler assumes that dirty state
is being tracked in 'struct page' flags.

Cc: Jeff Moyer 
Cc: Christoph Hellwig 
Cc: Matthew Wilcox 
Cc: Ross Zwisler 
Suggested-by: Jan Kara 
Suggested-by: Dave Chinner 
Signed-off-by: Dan Williams 
---
 fs/dax.c            |   27 +++
 fs/xfs/xfs_aops.c   |    7 +++
 fs/xfs/xfs_aops.h   |    1 +
 fs/xfs/xfs_iops.c   |    5 -
 include/linux/dax.h |    6 ++
 5 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index b646a46e4d12..ba02772fccbc 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -46,6 +46,33 @@
 #define PG_PMD_COLOUR  ((PMD_SIZE >> PAGE_SHIFT) - 1)
 #define PG_PMD_NR  (PMD_SIZE >> PAGE_SHIFT)
 
+int dax_set_page_dirty(struct page *page)
+{
+	/*
+	 * Unlike __set_page_dirty_no_writeback that handles dirty page
+	 * tracking in the page object, dax does all dirty tracking in
+	 * the inode address_space in response to mkwrite faults. In the
+	 * dax case we only need to worry about potentially dirty CPU
+	 * caches, not dirty page cache pages to write back.
+	 *
+	 * This callback is defined to prevent fallback to
+	 * __set_page_dirty_buffers() in set_page_dirty().
+	 */
+	return 0;
+}
+EXPORT_SYMBOL(dax_set_page_dirty);
+
+void dax_invalidatepage(struct page *page, unsigned int offset,
+		unsigned int length)
+{
+	/*
+	 * There is no page cache to invalidate in the dax case, however
+	 * we need this callback defined to prevent falling back to
+	 * block_invalidatepage() in do_invalidatepage().
+	 */
+}
+EXPORT_SYMBOL(dax_invalidatepage);
+
 static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES];
 
 static int __init init_dax_wait_table(void)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 9c6a830da0ee..5788b680fa01 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1505,3 +1505,10 @@ const struct address_space_operations xfs_address_space_operations = {
 	.is_partially_uptodate  = block_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,
 };
+
+const struct address_space_operations xfs_dax_aops = {
+	.writepages		= xfs_vm_writepages,
+	.direct_IO		= xfs_vm_direct_IO,
+	.set_page_dirty		= dax_set_page_dirty,
+	.invalidatepage		= dax_invalidatepage,
+};
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 88c85ea63da0..69346d460dfa 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -54,6 +54,7 @@ struct xfs_ioend {
 };
 
 extern const struct address_space_operations xfs_address_space_operations;
+extern const struct address_space_operations xfs_dax_aops;
 
 int	xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 56475fcd76f2..951e84df5576 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1272,7 +1272,10 @@ xfs_setup_iops(
 	case S_IFREG:
 		inode->i_op = &xfs_inode_operations;
 		inode->i_fop = &xfs_file_operations;
-		inode->i_mapping->a_ops = &xfs_address_space_operations;
+		if (IS_DAX(inode))
+			inode->i_mapping->a_ops = &xfs_dax_aops;
+		else
+			inode->i_mapping->a_ops = &xfs_address_space_operations;
 		break;
 	case S_IFDIR:
 		if (xfs_sb_version_hasasciici(&XFS_M(inode->i_sb)->m_sb))
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0185ecdae135..3045c0d9c804 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -57,6 +57,9 @@ static inline void fs_put_dax(struct dax_device *dax_dev)
 }
 
 struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev);
+int dax_set_page_dirty(struct page *page);
+void dax_invalidatepage(struct page *page, unsigned int offset,
+   unsigned int length);
 #else
 static inline int bdev_dax_supported(struct super_block *sb, int blocksize)
 {
@@ -76,6 +79,9 @@