from:"Lukas Czerner"

Re: [PATCH] Fix use after free in get_tree_bdev()

2020-04-29 Thread Lukas Czerner

On Tue, Apr 28, 2020 at 09:27:48PM +0100, David Howells wrote:
> Commit 6fcf0c72e4b9, a fix to get_tree_bdev() put a missing blkdev_put() in
> the wrong place, before a warnf() that displays the bdev under
> consideration rather than after it.
> 
> This results in a silent lockup in printk("%pg") called via warnf() from
> get_tree_bdev() under some circumstances when there's a race with the
> blockdev being frozen.  This can be caused by xfstests/tests/generic/085 in
> combination with Lukas Czerner's ext4 mount API conversion patchset.  It
> looks like it ought to occur with other users of get_tree_bdev() such as
> XFS, but apparently doesn't.
> 
> Fix this by switching the order of the lines.

This fixes the problem I was seeing. Thanks David.

Reviewed-by: Lukas Czerner 

> 
> Fixes: 6fcf0c72e4b9 ("vfs: add missing blkdev_put() in get_tree_bdev()")
> Reported-by: Lukas Czerner 
> Signed-off-by: David Howells 
> cc: Ian Kent 
> cc: Al Viro 
> ---
> 
>  fs/super.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index cd352530eca9..a288cd60d2ae 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1302,8 +1302,8 @@ int get_tree_bdev(struct fs_context *fc,
>  	mutex_lock(&bdev->bd_fsfreeze_mutex);
>  	if (bdev->bd_fsfreeze_count > 0) {
>  		mutex_unlock(&bdev->bd_fsfreeze_mutex);
> -		blkdev_put(bdev, mode);
>  		warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
> +		blkdev_put(bdev, mode);
>  		return -EBUSY;
>  	}
>  
> 
>
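
The ordering problem fixed above can be illustrated with a small user-space sketch. It is not the kernel code: `demo_bdev`, `demo_bdev_get()`, `demo_bdev_put()` and `demo_refuse_frozen()` are hypothetical names invented for illustration, and a plain counter stands in for the real block device reference. The point is only that the message is formatted while the reference is still held, and the reference is dropped afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a reference-counted block device. */
struct demo_bdev {
	int refcount;
	char name[16];
};

/* Allocate a device object holding one reference. */
static struct demo_bdev *demo_bdev_get(const char *name)
{
	struct demo_bdev *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	b->refcount = 1;
	snprintf(b->name, sizeof(b->name), "%s", name);
	return b;
}

/* Drop one reference; returns 1 if the object was freed. */
static int demo_bdev_put(struct demo_bdev *b)
{
	if (--b->refcount == 0) {
		free(b);
		return 1;
	}
	return 0;
}

/*
 * Corrected ordering: build the warning while our reference is still
 * held, and only then drop it.
 */
static int demo_refuse_frozen(struct demo_bdev *b, char *msg, size_t len)
{
	snprintf(msg, len, "%s: Can't mount, blockdev is frozen", b->name);
	demo_bdev_put(b);	/* b must not be touched after this line */
	return -EBUSY;
}
```

Swapping the two statements in `demo_refuse_frozen()` would read `b->name` after the last reference is gone, which is the same class of bug as calling warnf() after blkdev_put().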

Re: [PATCH] vfs: Handle fs_param_neg_with_empty

2019-10-16 Thread Lukas Czerner

On Wed, Oct 16, 2019 at 11:37:54AM +0100, David Howells wrote:
> Make fs_param_neg_with_empty work.  It says that a parameter with no value
> or an empty value should be marked as negated.
> 
> This is intended for use with ext4, which hadn't yet been converted.

Hi David,

thanks for the fix, this seems to be working fine for me. However this
will only work for fs_param_is_string, not anything else. I do not need
anything else, but unless you want to make it work for all the value types
some changes in documentation might be needed as well.

Thanks!
-Lukas

> 
> Fixes: 31d921c7fb96 ("vfs: Add configuration parser helpers")
> Reported-by: Lukas Czerner 
> Signed-off-by: David Howells 
> ---
> 
>  fs/fs_parser.c |5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/fs_parser.c b/fs/fs_parser.c
> index d1930adce68d..f95997a76738 100644
> --- a/fs/fs_parser.c
> +++ b/fs/fs_parser.c
> @@ -129,6 +129,11 @@ int fs_parse(struct fs_context *fc,
>  	case fs_param_is_string:
>  		if (param->type != fs_value_is_string)
>  			goto bad_value;
> +		if ((p->flags & fs_param_neg_with_empty) &&
> +		    (!result->has_value || !param->string[0])) {
> +			result->negated = true;
> +			goto okay;
> +		}
>  		if (!result->has_value) {
>  			if (p->flags & fs_param_v_optional)
>  				goto okay;
>
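
As a rough illustration of the rule discussed above, the following user-space sketch classifies a string parameter as negated when it has no value or an empty value. It is only a model: `DEMO_NEG_WITH_EMPTY`, `struct demo_result` and `demo_parse_string()` are hypothetical names, and a NULL pointer is assumed to stand for a parameter given without any value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NEG_WITH_EMPTY 0x1u	/* hypothetical flag value */

struct demo_result {
	bool has_value;
	bool negated;
};

/*
 * Classify a string parameter. value == NULL models "key" with no '=',
 * value == "" models "key=".
 */
static struct demo_result demo_parse_string(unsigned int flags,
					    const char *value)
{
	struct demo_result r = {
		.has_value = (value != NULL),
		.negated = false,
	};

	/* No value or an empty value means "negated" when the flag is set. */
	if ((flags & DEMO_NEG_WITH_EMPTY) &&
	    (!r.has_value || value[0] == '\0'))
		r.negated = true;
	return r;
}
```

With the flag set, both `demo_parse_string(DEMO_NEG_WITH_EMPTY, NULL)` and `demo_parse_string(DEMO_NEG_WITH_EMPTY, "")` come back negated, while a non-empty string does not.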

Re: [PATCH v2] VFS: Handle lazytime in do_mount()

2017-09-19 Thread Lukas Czerner

On Tue, Sep 19, 2017 at 12:37:24PM +0200, Markus Trippelsdorf wrote:
> Since commit e462ec50cb5fa ("VFS: Differentiate mount flags (MS_*) from
> internal superblock flags") the lazytime mount option didn't get passed
> on anymore.
> 
> Fix the issue by handling the option in do_mount().
> 
> Signed-off-by: Markus Trippelsdorf <mar...@trippelsdorf.de>
> ---
>  fs/namespace.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 54059b142d6b..b633838b8f02 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2823,7 +2823,8 @@ long do_mount(const char *dev_name, const char __user *dir_name,
>  			    SB_MANDLOCK |
>  			    SB_DIRSYNC |
>  			    SB_SILENT |
> -			    SB_POSIXACL);
> +			    SB_POSIXACL |
> +			    SB_LAZYTIME);

Looks good, although I still think this could be a per-mountpoint option.

Regardless of that, you can add
Reviewed-by: Lukas Czerner <lczer...@redhat.com>

>  
>  	if (flags & MS_REMOUNT)
>  		retval = do_remount(&path, flags, sb_flags, mnt_flags,
> -- 
> Markus
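
The effect of the patch above can be sketched in user space: a requested flag only survives if it is named in the mask applied to the mount flags. The bit values and names below (`DEMO_RDONLY`, `DEMO_LAZYTIME`, `demo_sb_flags()` and so on) are hypothetical and chosen for this sketch only; they are not the kernel's MS_* or SB_* constants.

```c
#include <assert.h>

/* Hypothetical bit values, chosen for this sketch only. */
#define DEMO_RDONLY	(1u << 0)
#define DEMO_POSIXACL	(1u << 1)
#define DEMO_LAZYTIME	(1u << 2)
#define DEMO_NOSUID	(1u << 3)	/* a per-mount flag, never passed on */

/* Keep only the requested bits that are named in the whitelist. */
static unsigned int demo_sb_flags(unsigned int flags, unsigned int whitelist)
{
	return flags & whitelist;
}
```

If `DEMO_LAZYTIME` is missing from the whitelist, a request that includes it is silently reduced to the remaining bits, which mirrors how the lazytime option stopped being passed on.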

[RFC][PATCH] fs: Prevent syncing frozen file system

2015-07-09 Thread Lukas Czerner

Currently we can end up in a deadlock because of broken
sb_start_write -> s_umount ordering.

The race goes like this:

 - write the file
 - unlink the file - iput_final() will not be called as the file is
   still open
 - freeze the file system
 - Now simultaneously close the file and call sync (or syncfs on that
   particular file system). Sync will get to wait_sb_inodes() where it will
   grab a reference to the inode (__iget()) and later call iput().
   If we manage to close the file and drop the reference in between those
   calls, sync will attempt to do an iput_final() because the inode is now
   unlinked and we're holding the last reference to it. This will
   however block on a frozen file system (ext4_delete_inode for
   example).
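
The interleaving above can be replayed deterministically in a small user-space sketch. This is only a model of the reference counting, not kernel code: `struct demo_inode`, `demo_iput()` and `demo_replay()` are hypothetical names, and the freeze itself is not simulated. It shows which side ends up dropping the last reference of an unlinked inode.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_inode {
	int count;	/* reference count */
	bool unlinked;	/* no directory entry left */
};

enum demo_actor { DEMO_NOBODY, DEMO_CLOSE, DEMO_SYNC };

/* Drop a reference; report whether this drop triggers final deletion. */
static bool demo_iput(struct demo_inode *inode)
{
	return --inode->count == 0 && inode->unlinked;
}

/*
 * Replay the sequence: the open file holds one reference, sync takes a
 * second one, and close() drops the file's reference either in between
 * or after sync has dropped its own. Returns who performs the final
 * deletion.
 */
static enum demo_actor demo_replay(bool close_between)
{
	struct demo_inode inode = { .count = 1, .unlinked = true };
	enum demo_actor who = DEMO_NOBODY;

	inode.count++;					/* sync: __iget() */
	if (close_between && demo_iput(&inode))		/* close() races in */
		who = DEMO_CLOSE;
	if (demo_iput(&inode))				/* sync: iput() */
		who = DEMO_SYNC;
	if (!close_between && demo_iput(&inode))	/* close() afterwards */
		who = DEMO_CLOSE;
	return who;
}
```

When close() lands between the two calls, `demo_replay(true)` reports that sync performs the final deletion, which is the path that would block on a frozen file system.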

Note that I've not been able to reproduce the issue; I've only seen it
happen once. However, with some instrumentation (like an msleep() in
wait_sb_inodes()) it can be triggered.

Fix this by properly doing sb_start_write()/sb_end_write() to protect us
from fsfreeze.

Note that with this patch syncfs will block on the frozen file system
which is probably ok, but sync will block if any file system happens to
be frozen - not sure if that's a problem, but it's certainly different
from what we've been used to.

Signed-off-by: Lukas Czerner 
---
 fs/super.c | 7 +++++++
 fs/sync.c  | 8 ++++++++
 2 files changed, 15 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index b613723..d337c91 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -514,10 +514,17 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
 		sb->s_count++;
 		spin_unlock(&sb_lock);
 
+		/*
+		 * Whatever we're going to do to the file system we have to
+		 * make sure that we'll not end up blocking on frozen file
+		 * system.
+		 */
+		sb_start_write(sb);
 		down_read(&sb->s_umount);
 		if (sb->s_root && (sb->s_flags & MS_BORN))
 			f(sb, arg);
 		up_read(&sb->s_umount);
+		sb_end_write(sb);
 
 		spin_lock(&sb_lock);
 		if (p)
diff --git a/fs/sync.c b/fs/sync.c
index fbc98ee..074247f 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -156,9 +156,17 @@ SYSCALL_DEFINE1(syncfs, int, fd)
 		return -EBADF;
 	sb = f.file->f_path.dentry->d_sb;
 
+	/*
+	 * If the file system is frozen we can't proceed because we
+	 * could potentially block on frozen file system. This would
+	 * lead to a deadlock, because we'll be holding s_umount which
+	 * has to be taken in order to thaw the file system as well
+	 */
+	sb_start_write(sb);
 	down_read(&sb->s_umount);
 	ret = sync_filesystem(sb);
 	up_read(&sb->s_umount);
+	sb_end_write(sb);
 
 	fdput(f);
 	return ret;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 03/20] ext4: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in all ext4 invalidatepage routines.

Signed-off-by: Lukas Czerner 
Reviewed-by: Jan Kara 
---
 fs/ext4/inode.c |   30 +++---
 include/trace/events/ext4.h |   22 --
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 96d5927..ae58749 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1415,21 +1415,28 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
 }
 
 static void ext4_da_page_release_reservation(struct page *page,
-unsigned long offset)
+unsigned int offset,
+unsigned int length)
 {
int to_release = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   unsigned int stop = offset + length;
int num_clusters;
ext4_fsblk_t lblk;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;
 
+   if (next_off > stop)
+   break;
+
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
@@ -2839,7 +2846,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned int offset,
if (!page_has_buffers(page))
goto out;
 
-   ext4_da_page_release_reservation(page, offset);
+   ext4_da_page_release_reservation(page, offset, length);
 
 out:
ext4_invalidatepage(page, offset, length);
@@ -2993,29 +3000,29 @@ ext4_readpages(struct file *file, struct address_space *mapping,
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
 {
-   trace_ext4_invalidatepage(page, offset);
+   trace_ext4_invalidatepage(page, offset, length);
 
/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));
 
-   block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+   block_invalidatepage(page, offset, length);
 }
 
 static int __ext4_journalled_invalidatepage(struct page *page,
-   unsigned long offset)
+   unsigned int offset,
+   unsigned int length)
 {
journal_t *journal = EXT4_JOURNAL(page->mapping->host);
 
-   trace_ext4_journalled_invalidatepage(page, offset);
+   trace_ext4_journalled_invalidatepage(page, offset, length);
 
/*
 * If it's a full truncate we just forget about the pending dirtying
 */
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
-   return jbd2_journal_invalidatepage(journal, page, offset,
-  PAGE_CACHE_SIZE - offset);
+   return jbd2_journal_invalidatepage(journal, page, offset, length);
 }
 
 /* Wrapper for aops... */
@@ -3023,7 +3030,7 @@ static void ext4_journalled_invalidatepage(struct page *page,
   unsigned int offset,
   unsigned int length)
 {
-   WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
+   WARN_ON(__ext4_journalled_invalidatepage(page, offset, length) < 0);
 }
 
 static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -4627,7 +4634,8 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
  inode->i_size >> PAGE_CACHE_SHIFT);
if (!page)
return;
-   ret = __ext4_journalled_invalidatepage(page, offset);
+   ret = __ext4_journalled_invalidatepage(page, offset,
+   PAGE_CACHE_SIZE - offset);
unlock_page(page);
page_cache_release(page);
if (ret != -EBUSY)
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 8ee15b9..dcfce96 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -444,16 +444,16 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
 );
 
 DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
-   TP_PROTO(struct page *page, unsigned long offset),
+   TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
 
-   TP_ARGS(page, offset),
+   TP_ARGS(page, offset, length),

[PATCH v4 05/20] xfs: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in xfs_vm_invalidatepage().

Signed-off-by: Lukas Czerner 
Reviewed-by: Ben Myers 
Cc: x...@oss.sgi.com
---
v4: use xfs_page_class instead of separate tracepoint

 fs/xfs/xfs_aops.c  |9 +
 fs/xfs/xfs_trace.h |   15 ++-
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e426796..55c85ec 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -826,8 +826,9 @@ xfs_vm_invalidatepage(
 	unsigned int		offset,
 	unsigned int		length)
 {
-	trace_xfs_invalidatepage(page->mapping->host, page, offset);
-	block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+	trace_xfs_invalidatepage(page->mapping->host, page, offset,
+				 length);
+	block_invalidatepage(page, offset, length);
 }
 
 /*
@@ -921,7 +922,7 @@ xfs_vm_writepage(
 	int			count = 0;
 	int			nonblocking = 0;
 
-	trace_xfs_writepage(inode, page, 0);
+	trace_xfs_writepage(inode, page, 0, 0);
 
 	ASSERT(page_has_buffers(page));
 
@@ -1152,7 +1153,7 @@ xfs_vm_releasepage(
 {
 	int			delalloc, unwritten;
 
-	trace_xfs_releasepage(page->mapping->host, page, 0);
+	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
 
 	xfs_count_page_state(page, &delalloc, &unwritten);
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 16a8129..7f075ed 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -950,14 +950,16 @@ DEFINE_RW_EVENT(xfs_file_splice_read);
 DEFINE_RW_EVENT(xfs_file_splice_write);
 
 DECLARE_EVENT_CLASS(xfs_page_class,
-   TP_PROTO(struct inode *inode, struct page *page, unsigned long off),
-   TP_ARGS(inode, page, off),
+   TP_PROTO(struct inode *inode, struct page *page, unsigned long off,
+unsigned int len),
+   TP_ARGS(inode, page, off, len),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_ino_t, ino)
__field(pgoff_t, pgoff)
__field(loff_t, size)
__field(unsigned long, offset)
+   __field(unsigned int, length)
__field(int, delalloc)
__field(int, unwritten)
),
@@ -971,24 +973,27 @@ DECLARE_EVENT_CLASS(xfs_page_class,
__entry->pgoff = page_offset(page);
__entry->size = i_size_read(inode);
__entry->offset = off;
+   __entry->length = len;
__entry->delalloc = delalloc;
__entry->unwritten = unwritten;
),
TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %lx "
- "delalloc %d unwritten %d",
+ "length %x delalloc %d unwritten %d",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  __entry->ino,
  __entry->pgoff,
  __entry->size,
  __entry->offset,
+ __entry->length,
  __entry->delalloc,
  __entry->unwritten)
 )
 
 #define DEFINE_PAGE_EVENT(name)		\
 DEFINE_EVENT(xfs_page_class, name,	\
-	TP_PROTO(struct inode *inode, struct page *page, unsigned long off),	\
-	TP_ARGS(inode, page, off))
+	TP_PROTO(struct inode *inode, struct page *page, unsigned long off, \
+		 unsigned int len),	\
+	TP_ARGS(inode, page, off, len))
 DEFINE_PAGE_EVENT(xfs_writepage);
 DEFINE_PAGE_EVENT(xfs_releasepage);
 DEFINE_PAGE_EVENT(xfs_invalidatepage);
-- 
1.7.7.6

[PATCH v4 06/20] ocfs2: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in ocfs2_invalidatepage().

Signed-off-by: Lukas Czerner 
Reviewed-by: Jan Kara 
Acked-by: Joel Becker 
---
 fs/ocfs2/aops.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7c47755..79736a2 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,8 +608,7 @@ static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
 {
 	journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
 
-	jbd2_journal_invalidatepage(journal, page, offset,
-				    PAGE_CACHE_SIZE - offset);
+	jbd2_journal_invalidatepage(journal, page, offset, length);
 }
 
 static int ocfs2_releasepage(struct page *page, gfp_t wait)
-- 
1.7.7.6

[PATCH v4 02/20] jbd2: change jbd2_journal_invalidatepage to accept length

2013-05-14 Thread Lukas Czerner

->invalidatepage() now accepts a range to invalidate, and two file
systems that use jbd2 also implement the punch hole feature and can
benefit from this. We need to implement the same thing in the jbd2 layer
to allow those file systems to take advantage of this functionality.

This commit adds a length argument to jbd2_journal_invalidatepage()
and updates all callers in ext4 and ocfs2.

Signed-off-by: Lukas Czerner 
Reviewed-by: Jan Kara 
---
 fs/ext4/inode.c   |3 ++-
 fs/jbd2/transaction.c |   24 +---
 fs/ocfs2/aops.c   |3 ++-
 include/linux/jbd2.h  |2 +-
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2930eb7..96d5927 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3014,7 +3014,8 @@ static int __ext4_journalled_invalidatepage(struct page *page,
if (offset == 0)
ClearPageChecked(page);
 
-   return jbd2_journal_invalidatepage(journal, page, offset);
+   return jbd2_journal_invalidatepage(journal, page, offset,
+  PAGE_CACHE_SIZE - offset);
 }
 
 /* Wrapper for aops... */
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 10f524c..5d8268a 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2034,18 +2034,23 @@ zap_buffer_unlocked:
  * void jbd2_journal_invalidatepage()
  * @journal: journal to use for flush...
  * @page:page to flush
- * @offset:  length of page to invalidate.
+ * @offset:  start of the range to invalidate
+ * @length:  length of the range to invalidate
  *
- * Reap page buffers containing data after offset in page. Can return -EBUSY
- * if buffers are part of the committing transaction and the page is straddling
- * i_size. Caller then has to wait for current commit and try again.
+ * Reap page buffers containing data in the specified range in page.
+ * Can return -EBUSY if buffers are part of the committing transaction and
+ * the page is straddling i_size. Caller then has to wait for current commit
+ * and try again.
  */
 int jbd2_journal_invalidatepage(journal_t *journal,
struct page *page,
-   unsigned long offset)
+   unsigned int offset,
+   unsigned int length)
 {
struct buffer_head *head, *bh, *next;
+   unsigned int stop = offset + length;
unsigned int curr_off = 0;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
int ret = 0;
 
@@ -2054,6 +2059,8 @@ int jbd2_journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return 0;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
 * data lists (especially for journaled data mode), so be
 * cautious in our locking. */
@@ -2063,10 +2070,13 @@ int jbd2_journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   return 0;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
-   ret = journal_unmap_buffer(journal, bh, offset > 0);
+   ret = journal_unmap_buffer(journal, bh, partial_page);
unlock_buffer(bh);
if (ret < 0)
return ret;
@@ -2077,7 +2087,7 @@ int jbd2_journal_invalidatepage(journal_t *journal,
 
} while (bh != head);
 
-   if (!offset) {
+   if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index ecb86ca..7c47755 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,7 +608,8 @@ static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
 {
 	journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
 
-	jbd2_journal_invalidatepage(journal, page, offset);
+	jbd2_journal_invalidatepage(journal, page, offset,
+				    PAGE_CACHE_SIZE - offset);
 }
 
 static int ocfs2_releasepage(struct page *page, gfp_t wait)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 6e051f4..682a63c 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1090,7 +1090,7 @@ extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
 extern int  jbd2_journal_forget (handle_t *, struct buffer_head *);
 extern void journal_sync_buffer (struct buffer_head *);
 extern int  jbd2_journal_invalidatepage(j
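
The range handling this patch adds to the buffer walk can be sketched in user space as follows. This is a simplified model under stated assumptions: `DEMO_PAGE_SIZE` is assumed to be 4096, all buffers in the page are assumed to share one non-zero size, and `demo_buffers_in_range()` and `demo_partial_page()` are hypothetical helpers that only count buffers instead of unmapping them.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Count the buffers of size bsize (must be non-zero) that lie wholly
 * inside [offset, offset + length) within one page.
 */
static unsigned int demo_buffers_in_range(unsigned int offset,
					  unsigned int length,
					  unsigned int bsize)
{
	unsigned int stop = offset + length;
	unsigned int curr_off = 0;
	unsigned int hits = 0;

	/* Same sanity check as the patch: range fits the page, no wraparound. */
	assert(stop <= DEMO_PAGE_SIZE && stop >= length);

	while (curr_off < DEMO_PAGE_SIZE) {
		unsigned int next_off = curr_off + bsize;

		if (next_off > stop)
			break;		/* buffer extends past the range */
		if (offset <= curr_off)
			hits++;		/* buffer starts at or after offset */
		curr_off = next_off;
	}
	return hits;
}

/* A page is only partially invalidated unless the range covers all of it. */
static int demo_partial_page(unsigned int offset, unsigned int length)
{
	return offset != 0 || length < DEMO_PAGE_SIZE;
}
```

For example, with 1024-byte buffers a range starting at 1024 with length 2048 covers two whole buffers, while a range starting at 512 with length 1024 covers none, because no buffer lies entirely inside it.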

[PATCH v4 01/20] mm: change invalidatepage prototype to accept length

2013-05-14 Thread Lukas Czerner

Currently there is no way to truncate a partial page where the end
truncate point is not at the end of the page. This was not needed so
far, because the existing functionality was enough for the file system
truncate operation to work properly. However, more file systems now
support the punch hole feature, and it can benefit from mm supporting
truncation of a page just up to a certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so that it supports truncating a partial page at the end of
the range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept the range to be invalidated and updates all of its
instances.

We also change block_invalidatepage() in the same way and actually
make use of the new length argument, implementing range invalidation.

Actual file system implementations will follow, except for the file
systems where the changes are really simple and should not change the
behaviour in any way. An implementation of truncate_page_range(), which
will be able to accept page-unaligned ranges, will follow as well.

Signed-off-by: Lukas Czerner 
Cc: Andrew Morton 
Cc: Hugh Dickins 
---
 Documentation/filesystems/Locking |6 +++---
 Documentation/filesystems/vfs.txt |   20 ++--
 fs/9p/vfs_addr.c  |5 +++--
 fs/afs/file.c |   10 ++
 fs/btrfs/disk-io.c|3 ++-
 fs/btrfs/extent_io.c  |2 +-
 fs/btrfs/inode.c  |3 ++-
 fs/buffer.c   |   21 ++---
 fs/ceph/addr.c|5 +++--
 fs/cifs/file.c|5 +++--
 fs/exofs/inode.c  |6 --
 fs/ext3/inode.c   |3 ++-
 fs/ext4/inode.c   |   18 +++---
 fs/f2fs/data.c|3 ++-
 fs/f2fs/node.c|3 ++-
 fs/gfs2/aops.c|8 +---
 fs/jfs/jfs_metapage.c |5 +++--
 fs/logfs/file.c   |3 ++-
 fs/logfs/segment.c|3 ++-
 fs/nfs/file.c |8 +---
 fs/ntfs/aops.c|2 +-
 fs/ocfs2/aops.c   |3 ++-
 fs/reiserfs/inode.c   |3 ++-
 fs/ubifs/file.c   |5 +++--
 fs/xfs/xfs_aops.c |7 ---
 include/linux/buffer_head.h   |3 ++-
 include/linux/fs.h|2 +-
 include/linux/mm.h|3 ++-
 mm/readahead.c|2 +-
 mm/truncate.c |   15 +--
 30 files changed, 116 insertions(+), 69 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 0706d32..cbbac3f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -189,7 +189,7 @@ prototypes:
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
-   int (*invalidatepage) (struct page *, unsigned long);
+   void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -310,8 +310,8 @@ filesystems and by the swapper. The latter will eventually go away.  Please,
 keep it that way and don't breed new callers.
 
->invalidatepage() is called when the filesystem must attempt to drop
-some or all of the buffers from the page when it is being truncated.  It
-returns zero on success.  If ->invalidatepage is zero, the kernel uses
+some or all of the buffers from the page when it is being truncated. It
+returns zero on success. If ->invalidatepage is zero, the kernel uses
 block_invalidatepage() instead.
 
->releasepage() is called when the kernel is about to try to drop the
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index bc4b06b..e445b95 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -549,7 +549,7 @@ struct address_space_operations
 ---
 
 This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.22, the following members are defined:
+your filesystem. The following members are defined:
 
 struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -566,7 +566,7 @@ struct address_space_operations {
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
-   int (*invalidatepage)
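
To make the (offset, length) pair in the new prototype concrete, here is a user-space sketch of how a byte range could be split into per-page pieces. It is an illustration only and not the mm implementation: it assumes a 4096-byte page, uses a half-open byte range [start, end), and `struct demo_span` and `demo_page_span()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SIZE 4096ull	/* assumed page size for this sketch */

struct demo_span {
	unsigned int offset;	/* start within the page */
	unsigned int length;	/* number of bytes to invalidate */
};

/*
 * Intersect the half-open byte range [start, end) with page number
 * 'index'. Returns false if they do not overlap.
 */
static bool demo_page_span(unsigned long long start, unsigned long long end,
			   unsigned long long index, struct demo_span *out)
{
	unsigned long long page_start = index * DEMO_PAGE_SIZE;
	unsigned long long page_end = page_start + DEMO_PAGE_SIZE;
	unsigned long long lo = start > page_start ? start : page_start;
	unsigned long long hi = end < page_end ? end : page_end;

	if (lo >= hi)
		return false;
	out->offset = (unsigned int)(lo - page_start);
	out->length = (unsigned int)(hi - lo);
	return true;
}
```

For the range [1000, 9000) this yields a partial first page (offset 1000, length 3096), a full middle page (offset 0, length 4096) and a partial last page (offset 0, length 808). The last case, a range ending before the end of the page, is the one the old single-offset prototype could not express.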

[PATCH v4 07/20] ceph: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in ceph_invalidatepage().

Signed-off-by: Lukas Czerner 
Acked-by: Sage Weil 
Cc: ceph-de...@vger.kernel.org
---
 fs/ceph/addr.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 168a35a..d953afd 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -164,20 +164,20 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
 	if (!PageDirty(page))
 		pr_err("%p invalidatepage %p page not dirty\n", inode, page);
 
-	if (offset == 0)
+	if (offset == 0 && length == PAGE_CACHE_SIZE)
 		ClearPageChecked(page);
 
 	ci = ceph_inode(inode);
-	if (offset == 0) {
-		dout("%p invalidatepage %p idx %lu full dirty page %u\n",
-		     inode, page, page->index, offset);
+	if (offset == 0 && length == PAGE_CACHE_SIZE) {
+		dout("%p invalidatepage %p idx %lu full dirty page\n",
+		     inode, page, page->index);
 		ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
 		ceph_put_snap_context(snapc);
 		page->private = 0;
 		ClearPagePrivate(page);
 	} else {
-		dout("%p invalidatepage %p idx %lu partial dirty page\n",
-		     inode, page, page->index);
+		dout("%p invalidatepage %p idx %lu partial dirty page %u(%u)\n",
+		     inode, page, page->index, offset, length);
 	}
 }
 
-- 
1.7.7.6

[PATCH v4 11/20] Revert "ext4: remove no longer used functions in inode.c"

2013-05-14 Thread Lukas Czerner

This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.

This commit reintroduces the functions ext4_block_truncate_page() and
ext4_block_zero_page_range(), which were previously removed in favour
of ext4_discard_partial_page_buffers().

In future commits we want to bring those functions back into use and
remove ext4_discard_partial_page_buffers(), since it duplicates some
code and also partially duplicates the work of
truncate_pagecache_range(); moreover, the old implementation was much
clearer.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/ext4.h  |4 ++
 fs/ext4/inode.c |  120 +++
 2 files changed, 124 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5aae3d1..9f9719f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2096,6 +2096,10 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
+extern int ext4_block_truncate_page(handle_t *handle,
+   struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+   struct address_space *mapping, loff_t from, loff_t length);
 extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ae58749..11c07e1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3569,6 +3569,126 @@ next:
return err;
 }
 
+/*
+ * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This is required during truncate. We need to physically zero the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ */
+int ext4_block_truncate_page(handle_t *handle,
+   struct address_space *mapping, loff_t from)
+{
+   unsigned offset = from & (PAGE_CACHE_SIZE-1);
+   unsigned length;
+   unsigned blocksize;
+   struct inode *inode = mapping->host;
+
+   blocksize = inode->i_sb->s_blocksize;
+   length = blocksize - (offset & (blocksize - 1));
+
+   return ext4_block_zero_page_range(handle, mapping, from, length);
+}
+
+/*
+ * ext4_block_zero_page_range() zeros out a mapping of length 'length'
+ * starting from file offset 'from'.  The range to be zero'd must
+ * be contained within one block.  If the specified range exceeds
+ * the end of the block it will be shortened to end of the block
+ * that corresponds to 'from'
+ */
+int ext4_block_zero_page_range(handle_t *handle,
+   struct address_space *mapping, loff_t from, loff_t length)
+{
+   ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
+   unsigned offset = from & (PAGE_CACHE_SIZE-1);
+   unsigned blocksize, max, pos;
+   ext4_lblk_t iblock;
+   struct inode *inode = mapping->host;
+   struct buffer_head *bh;
+   struct page *page;
+   int err = 0;
+
+   page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
+  mapping_gfp_mask(mapping) & ~__GFP_FS);
+   if (!page)
+   return -ENOMEM;
+
+   blocksize = inode->i_sb->s_blocksize;
+   max = blocksize - (offset & (blocksize - 1));
+
+   /*
+* correct length if it does not fall between
+* 'from' and the end of the block
+*/
+   if (length > max || length < 0)
+   length = max;
+
+   iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+
+   if (!page_has_buffers(page))
+   create_empty_buffers(page, blocksize, 0);
+
+   /* Find the buffer that contains "offset" */
+   bh = page_buffers(page);
+   pos = blocksize;
+   while (offset >= pos) {
+   bh = bh->b_this_page;
+   iblock++;
+   pos += blocksize;
+   }
+
+   err = 0;
+   if (buffer_freed(bh)) {
+   BUFFER_TRACE(bh, "freed: skip");
+   goto unlock;
+   }
+
+   if (!buffer_mapped(bh)) {
+   BUFFER_TRACE(bh, "unmapped");
+   ext4_get_block(inode, iblock, bh, 0);
+   /* unmapped? It's a hole - nothing to do */
+   if (!buffer_mapped(bh)) {
+   BUFFER_TRACE(bh, "still unmapped");
+   goto unlock;
+   }
+   }
+
+   /* Ok, it's mapped. Make sure it's up-to-date */
+   if (PageUptodate(page))
+   set_buffer_uptodate(bh);
+
+   if (!buffer_uptodate(bh)) {
+   err = -EIO;
+   ll_rw_block(READ, 1, &bh);
+   wait_on_buffer(bh);
+   /* Uhhu

[PATCH v4 08/20] gfs2: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can make
use of it in gfs2_invalidatepage().
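
As a rough userspace illustration of the range semantics used by the patch below (not kernel code), the following sketch models the "partial page" test and the buffer walk. The page size constant and the helper names `is_partial_page()` and `buffers_discarded()` are assumptions made up for this example.

```c
#include <stdbool.h>

#define PAGE_SIZE_BYTES 4096u	/* assumed page size for this sketch */

/* True when [offset, offset + length) does not cover the whole page. */
static bool is_partial_page(unsigned int offset, unsigned int length)
{
	return offset != 0 || length < PAGE_SIZE_BYTES;
}

/*
 * Count the buffers of size bsize that lie entirely inside
 * [offset, offset + length), walking the page front to back the same
 * way the patched loop does.
 */
static unsigned int buffers_discarded(unsigned int offset,
				      unsigned int length,
				      unsigned int bsize)
{
	unsigned int stop = offset + length;
	unsigned int pos = 0, n = 0;

	while (pos < PAGE_SIZE_BYTES) {
		if (pos + bsize > stop)
			break;		/* buffer extends past the range */
		if (offset <= pos)
			n++;		/* buffer starts inside the range */
		pos += bsize;
	}
	return n;
}
```

With 1k buffers, invalidating bytes 1024..3071 touches only the two middle buffers, and the page is treated as partial so it is neither unchecked nor released.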

Signed-off-by: Lukas Czerner 
Acked-by: Steven Whitehouse 
Cc: cluster-de...@redhat.com
---
 fs/gfs2/aops.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 37093ba..ea920bf 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -947,24 +947,29 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
 {
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
+   unsigned int stop = offset + length;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
struct buffer_head *bh, *head;
unsigned long pos = 0;
 
BUG_ON(!PageLocked(page));
-   if (offset == 0)
+   if (!partial_page)
ClearPageChecked(page);
if (!page_has_buffers(page))
goto out;
 
bh = head = page_buffers(page);
do {
+   if (pos + bh->b_size > stop)
+   return;
+
if (offset <= pos)
gfs2_discard(sdp, bh);
pos += bh->b_size;
bh = bh->b_this_page;
} while (bh != head);
 out:
-   if (offset == 0)
+   if (!partial_page)
try_to_release_page(page, 0);
 }
 
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 13/20] Revert "ext4: fix fsx truncate failure"

2013-05-14 Thread Lukas Czerner

This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.

This commit reintroduces the use of ext4_block_truncate_page() in the ext4
truncate operation instead of ext4_discard_partial_page_buffers().

The statement in the commit description that the truncate operation only
zeroes the block-unaligned portion of the last page is not exactly right,
since truncate_pagecache_range() also zeroes and invalidates the unaligned
portion of the page. There is then no need to zero and unmap it once more,
and ext4_block_truncate_page() was doing the right job. We still need to
update the buffer head containing the last block, which is exactly what
ext4_block_truncate_page() does.

Moreover the problem described in the commit is fixed more properly with
commit

15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

This was tested on a ppc64 machine with a block size of 1024 bytes without
any problems.
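
For readers following the arithmetic, here is a small userspace sketch of how many bytes need zeroing in the last block after truncate. It folds the caller's "is i_size block aligned?" check and the length computation from ext4_block_truncate_page() into one hypothetical helper, `tail_len_to_zero()`; the name and the fixed-width types are assumptions for the example only.

```c
#include <stdint.h>

/*
 * Number of bytes from 'size' up to the end of the block containing it.
 * Returns 0 when 'size' is block aligned, in which case there is no
 * tail to zero. 'blocksize' must be a power of two.
 */
static uint32_t tail_len_to_zero(uint64_t size, uint32_t blocksize)
{
	uint32_t in_block = (uint32_t)(size & (blocksize - 1));

	return in_block ? blocksize - in_block : 0;
}
```

For example, truncating to 1500 bytes with 1k blocks leaves 548 bytes (offsets 1500..2047) to be zeroed in the final block.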

Signed-off-by: Lukas Czerner 
Reviewed-by: Jan Kara 
---
 fs/ext4/inode.c |   11 ++-
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8187c3e..34ebb62 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3937,7 +3937,6 @@ void ext4_truncate(struct inode *inode)
unsigned int credits;
handle_t *handle;
struct address_space *mapping = inode->i_mapping;
-   loff_t page_len;
 
/*
 * There is a possibility that we're either freeing the inode
@@ -3981,14 +3980,8 @@ void ext4_truncate(struct inode *inode)
return;
}
 
-   if (inode->i_size % PAGE_CACHE_SIZE != 0) {
-   page_len = PAGE_CACHE_SIZE -
-   (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
-   if (ext4_discard_partial_page_buffers(handle,
-   mapping, inode->i_size, page_len, 0))
-   goto out_stop;
-   }
+   if (inode->i_size & (inode->i_sb->s_blocksize - 1))
+   ext4_block_truncate_page(handle, mapping, inode->i_size);
 
/*
 * We add the inode to the orphan list, so that if this
-- 
1.7.7.6


[PATCH v4 16/20] ext4: remove unused discard_partial_page_buffers

2013-05-14 Thread Lukas Czerner

ext4_discard_partial_page_buffers() is no longer used anywhere, so we can
simply remove it, including the *_no_lock variant and the
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.

Signed-off-by: Lukas Czerner 
Reviewed-by: Jan Kara 
---
 fs/ext4/ext4.h  |8 --
 fs/ext4/inode.c |  206 ---
 2 files changed, 0 insertions(+), 214 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2d4b0aa..019db3c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -581,11 +581,6 @@ enum {
 #define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER   0x0020
 
 /*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED  0x0001
-
-/*
  * ioctl commands
  */
 #define EXT4_IOC_GETFLAGS   FS_IOC_GETFLAGS
@@ -2102,9 +2097,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
 extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 loff_t lstart, loff_t lend);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
-   struct address_space *mapping, loff_t from,
-   loff_t length, int flags);
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
 extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 44333a5..f504efa 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -135,9 +135,6 @@ static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length);
 static int __ext4_journalled_writepage(struct page *page, unsigned int len);
 static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
-   struct inode *inode, struct page *page, loff_t from,
-   loff_t length, int flags);
 
 /*
  * Test whether an inode is a fast symlink.
@@ -3366,209 +3363,6 @@ void ext4_set_aops(struct inode *inode)
inode->i_mapping->a_ops = _aops;
 }
 
-
-/*
- * ext4_discard_partial_page_buffers()
- * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
- * This function finds and locks the page containing the offset
- * "from" and passes it to ext4_discard_partial_page_buffers_no_lock.
- * Calling functions that already have the page locked should call
- * ext4_discard_partial_page_buffers_no_lock directly.
- */
-int ext4_discard_partial_page_buffers(handle_t *handle,
-   struct address_space *mapping, loff_t from,
-   loff_t length, int flags)
-{
-   struct inode *inode = mapping->host;
-   struct page *page;
-   int err = 0;
-
-   page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
-  mapping_gfp_mask(mapping) & ~__GFP_FS);
-   if (!page)
-   return -ENOMEM;
-
-   err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
-   from, length, flags);
-
-   unlock_page(page);
-   page_cache_release(page);
-   return err;
-}
-
-/*
- * ext4_discard_partial_page_buffers_no_lock()
- * Zeros a page range of length 'length' starting from offset 'from'.
- * Buffer heads that correspond to the block aligned regions of the
- * zeroed range will be unmapped.  Unblock aligned regions
- * will have the corresponding buffer head mapped if needed so that
- * that region of the page can be updated with the partial zero out.
- *
- * This function assumes that the page has already been  locked.  The
- * The range to be discarded must be contained with in the given page.
- * If the specified range exceeds the end of the page it will be shortened
- * to the end of the page that corresponds to 'from'.  This function is
- * appropriate for updating a page and it buffer heads to be unmapped and
- * zeroed for blocks that have been either released, or are going to be
- * released.
- *
- * handle: The journal handle
- * inode:  The files inode
- * page:   A locked page that contains the offset "from"
- * from:   The starting byte offset (from the beginning of the file)
- * to begin discarding
- * len:The length of bytes to discard
- * flags:  Optional flags that may be used:
- *
- * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
- * Only zero the regions of the page whose buffer heads
- * have already been unmapped.  This flag is appropriate
- * for updating the contents of a page whose blocks may
- * have already been released, and we only want to zero
- * out the regions that correspond to those released blocks.
- *
- * Returns zero on success or negative on failure.
- */
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *hand

[PATCH v4 14/20] ext4: truncate_inode_pages() in orphan cleanup path

2013-05-14 Thread Lukas Czerner

Currently we do not tell mm to zero out the tail of the page before truncate
in orphan_cleanup(). This is OK because the page should not be
uptodate; however, this may eventually change and it might cause problems.

Call truncate_inode_pages() as a precautionary measure. Thanks to Jan Kara
for pointing this out.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/super.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dbc7c09..b971066 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2173,6 +2173,7 @@ static void ext4_orphan_cleanup(struct super_block *sb,
jbd_debug(2, "truncating inode %lu to %lld bytes\n",
  inode->i_ino, inode->i_size);
mutex_lock(&inode->i_mutex);
+   truncate_inode_pages(inode->i_mapping, inode->i_size);
ext4_truncate(inode);
mutex_unlock(&inode->i_mutex);
nr_truncates++;
-- 
1.7.7.6


[PATCH v4 10/20] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

2013-05-14 Thread Lukas Czerner

This commit changes truncate_inode_pages_range() so that it can handle
non-page-aligned ranges. Currently we can hit a BUG_ON when
the end of the range is not page aligned, although we can handle an
unaligned start of the range.

Being able to handle non-page-aligned regions of the page can help file
system punch_hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.

In previous commits we've changed the ->invalidatepage() prototype to accept
a 'length' argument to be able to specify the range to invalidate. Now we can
use that new ability in truncate_inode_pages_range().
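
The index arithmetic this relies on can be sketched in userspace as follows. This is only a model of the computation in the patch below: the 4k page size, `struct trunc_range` and `split_range()` are assumptions made up for the example, and the sketch ignores everything but the offsets.

```c
#include <stdint.h>

#define PG_SHIFT 12			/* assumed 4k pages */
#define PG_SIZE  (1ull << PG_SHIFT)

struct trunc_range {
	uint64_t start;			/* first fully truncated page, inclusive */
	uint64_t end;			/* one past the last such page, exclusive */
	unsigned int partial_start;	/* offset in the partial first page */
	unsigned int partial_end;	/* bytes to zero in the partial last page */
};

/* Split the inclusive byte range [lstart, lend]; lend == -1 means EOF. */
static struct trunc_range split_range(int64_t lstart, int64_t lend)
{
	struct trunc_range r;

	r.partial_start = (unsigned int)(lstart & (PG_SIZE - 1));
	r.partial_end = (unsigned int)((lend + 1) & (PG_SIZE - 1));
	r.start = ((uint64_t)lstart + PG_SIZE - 1) >> PG_SHIFT;
	if (lend == -1)
		r.end = UINT64_MAX;	/* highest possible page index */
	else
		r.end = ((uint64_t)lend + 1) >> PG_SHIFT;
	return r;
}
```

For the byte range 1000..9999 this yields one fully truncated page (index 1), a partial first page zeroed from offset 1000, and a partial last page zeroed up to offset 1808.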

Signed-off-by: Lukas Czerner 
Cc: Andrew Morton 
Cc: Hugh Dickins 
---
 mm/truncate.c |  104 -
 1 files changed, 73 insertions(+), 31 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index fdba083..e2e8a8a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -52,14 +52,6 @@ void do_invalidatepage(struct page *page, unsigned int offset,
(*invalidatepage)(page, offset, length);
 }
 
-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
-   zero_user_segment(page, partial, PAGE_CACHE_SIZE);
-   cleancache_invalidate_page(page->mapping, page);
-   if (page_has_private(page))
-   do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
-}
-
 /*
  * This cancels just the dirty bit on the kernel page itself, it
  * does NOT actually remove dirty bits on any mmap's that may be
@@ -188,11 +180,11 @@ int invalidate_inode_page(struct page *page)
  * truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets
  * @mapping: mapping to truncate
  * @lstart: offset from which to truncate
- * @lend: offset to which to truncate
+ * @lend: offset to which to truncate (inclusive)
  *
  * Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
  *
  * Truncate takes two passes - the first pass is nonblocking.  It will not
  * block on page locks and it will not block on writeback.  The second pass
@@ -203,35 +195,58 @@ int invalidate_inode_page(struct page *page)
  * We pass down the cache-hot hint to the page freeing code.  Even if the
  * mapping is large, it is probably the case that the final pages are the most
  * recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that since ->invalidatepage() accepts range to invalidate
+ * truncate_inode_pages_range is able to handle cases where lend + 1 is not
+ * page aligned properly.
  */
 void truncate_inode_pages_range(struct address_space *mapping,
loff_t lstart, loff_t lend)
 {
-   const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
-   const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
-   struct pagevec pvec;
-   pgoff_t index;
-   pgoff_t end;
-   int i;
+   pgoff_t         start;          /* inclusive */
+   pgoff_t         end;            /* exclusive */
+   unsigned int    partial_start;  /* inclusive */
+   unsigned int    partial_end;    /* exclusive */
+   struct pagevec  pvec;
+   pgoff_t         index;
+   int             i;
 
cleancache_invalidate_inode(mapping);
if (mapping->nrpages == 0)
return;
 
-   BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
-   end = (lend >> PAGE_CACHE_SHIFT);
+   /* Offsets within partial pages */
+   partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+   partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
+
+   /*
+* 'start' and 'end' always cover the range of pages to be fully
+* truncated. Partial pages are covered with 'partial_start' at the
+* start of the range and 'partial_end' at the end of the range.
+* Note that 'end' is exclusive while 'lend' is inclusive.
+*/
+   start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+   if (lend == -1)
+   /*
+* lend == -1 indicates end-of-file so we have to set 'end'
+* to the highest possible pgoff_t and since the type is
+* unsigned we're using -1.
+*/
+   end = -1;
+   else
+   end = (lend + 1) >> PAGE_CACHE_SHIFT;
 
pagevec_init(&pvec, 0);
index = start;
-   while (index <= end && pagevec_lookup(&pvec, mapping, index,
-   min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+   while (index < end && pagevec_lookup(&pvec, mapping, index,
+   min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
m

[PATCH v4 09/20] reiserfs: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can make
use of it in reiserfs_invalidatepage().

Signed-off-by: Lukas Czerner 
Cc: reiserfs-de...@vger.kernel.org
---
 fs/reiserfs/inode.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 808e02e..e963164 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2975,11 +2975,13 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
unsigned int curr_off = 0;
+   unsigned int stop = offset + length;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int ret = 1;
 
BUG_ON(!PageLocked(page));
 
-   if (offset == 0)
+   if (!partial_page)
ClearPageChecked(page);
 
if (!page_has_buffers(page))
@@ -2991,6 +2993,9 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   goto out;
+
/*
 * is this block fully invalidated?
 */
@@ -3009,7 +3014,7 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
 * The get_block cached value has been unconditionally invalidated,
 * so real IO is not possible anymore.
 */
-   if (!offset && ret) {
+   if (!partial_page && ret) {
ret = try_to_release_page(page, 0);
/* maybe should BUG_ON(!ret); - neilb */
}
-- 
1.7.7.6


[PATCH v4 20/20] ext4: Allow punch hole with bigalloc enabled

2013-05-14 Thread Lukas Czerner

In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked the punch_hole
implementation and there is nothing holding us back from using punch hole
on file systems with the bigalloc feature enabled.

This has been tested with fsx and xfstests.

Signed-off-by: Lukas Czerner 
Reviewed-by: Jan Kara 
---
 fs/ext4/inode.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f504efa..daffbb8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3557,11 +3557,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
if (!S_ISREG(inode->i_mode))
return -EOPNOTSUPP;
 
-   if (EXT4_SB(sb)->s_cluster_ratio > 1) {
-   /* TODO: Add support for bigalloc file systems */
-   return -EOPNOTSUPP;
-   }
-
trace_ext4_punch_hole(inode, offset, length);
 
/*
-- 
1.7.7.6


[PATCH v4 18/20] ext4: update ext4_ext_remove_space trace point

2013-05-14 Thread Lukas Czerner

Add "end" variable.

Signed-off-by: Lukas Czerner 
Reviewed-by: Jan Kara 
---
 fs/ext4/extents.c   |6 +++---
 include/trace/events/ext4.h |   21 ++---
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 18c3f1a..fb9b414 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2663,7 +2663,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
return PTR_ERR(handle);
 
 again:
-   trace_ext4_ext_remove_space(inode, start, depth);
+   trace_ext4_ext_remove_space(inode, start, end, depth);
 
/*
 * Check if we are removing extents inside the extent tree. If that
@@ -2831,8 +2831,8 @@ again:
}
}
 
-   trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
-   path->p_hdr->eh_entries);
+   trace_ext4_ext_remove_space_done(inode, start, end, depth,
+   partial_cluster, path->p_hdr->eh_entries);
 
/* If we still have something in the partial cluster and we have removed
 * even the first extent, then we should free the blocks in the partial
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index dcfce96..bcb5a02 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2027,14 +2027,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
 );
 
 TRACE_EVENT(ext4_ext_remove_space,
-   TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+   TP_PROTO(struct inode *inode, ext4_lblk_t start,
+ext4_lblk_t end, int depth),
 
-   TP_ARGS(inode, start, depth),
+   TP_ARGS(inode, start, end, depth),
 
TP_STRUCT__entry(
__field(dev_t,  dev )
__field(ino_t,  ino )
__field(ext4_lblk_t,start   )
+   __field(ext4_lblk_t,end )
__field(int,depth   )
),
 
@@ -2042,26 +2044,29 @@ TRACE_EVENT(ext4_ext_remove_space,
__entry->dev= inode->i_sb->s_dev;
__entry->ino= inode->i_ino;
__entry->start  = start;
+   __entry->end= end;
__entry->depth  = depth;
),
 
-   TP_printk("dev %d,%d ino %lu since %u depth %d",
+   TP_printk("dev %d,%d ino %lu since %u end %u depth %d",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  (unsigned long) __entry->ino,
  (unsigned) __entry->start,
+ (unsigned) __entry->end,
  __entry->depth)
 );
 
 TRACE_EVENT(ext4_ext_remove_space_done,
-   TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
-   ext4_lblk_t partial, __le16 eh_entries),
+   TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+int depth, ext4_lblk_t partial, __le16 eh_entries),
 
-   TP_ARGS(inode, start, depth, partial, eh_entries),
+   TP_ARGS(inode, start, end, depth, partial, eh_entries),
 
TP_STRUCT__entry(
__field(dev_t,  dev )
__field(ino_t,  ino )
__field(ext4_lblk_t,start   )
+   __field(ext4_lblk_t,end )
__field(int,depth   )
__field(ext4_lblk_t,partial )
__field(unsigned short, eh_entries  )
@@ -2071,16 +2076,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->dev= inode->i_sb->s_dev;
__entry->ino= inode->i_ino;
__entry->start  = start;
+   __entry->end= end;
__entry->depth  = depth;
__entry->partial= partial;
__entry->eh_entries = le16_to_cpu(eh_entries);
),
 
-   TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+   TP_printk("dev %d,%d ino %lu since %u end %u depth %d partial %u "
  "remaining_entries %u",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  (unsigned long) __entry->ino,
  (unsigned) __entry->start,
+ (unsigned) __entry->end,
  __entry->depth,
  (unsigned) __entry->partial,
  (unsigned short) __entry->eh_entries)
-- 
1.7.7.6


[PATCH v4 19/20] ext4: make punch hole code path work with bigalloc

2013-05-14 Thread Lukas Czerner

Currently punch hole is disabled in file systems with the bigalloc
feature enabled. However, the recent punch hole changes should
make it easier to support punching holes on bigalloc enabled file
systems.

This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is of unsigned long long type and it makes sure that we
will free the partial cluster if all extents have been released from that
cluster. However, it has been specifically designed only for truncate.

With punch hole we can be freeing just some extents in the cluster,
leaving the rest untouched. So we have to make sure that we notice a
cluster which still has some extents. To do this I've changed
partial_cluster to be of signed long long type. The only scenario where
this could be a problem is when cluster_size == block size; however, in
that case there would not be any partial clusters, so we're safe. For
bigger clusters the signed type is enough. Now we use a negative value
in partial_cluster to mark such a cluster as used, hence we know that we
must not free it even if all other extents have been freed from that cluster.

This scenario can be described in simple diagram:

|FFF...FF..FF.UUU|
 ^--^
  punch hole

. - free space
| - cluster boundary
F - freed extent
U - used extent

Also update respective tracepoints to use signed long long type for
partial_cluster.
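
The sign convention can be modelled in a few lines of plain C. This is only an illustrative sketch of the two checks visible in the diff below; the helper names (`encode_used()`, `encode_pending()`, `keep_last_cluster()`, `free_pending_cluster()`) are invented for the example and do not exist in ext4.

```c
#include <stdbool.h>

/*
 * partial_cluster convention described above:
 *   > 0  cluster number that may still need to be freed
 *   < 0  negated number of a cluster still used by another extent
 *   = 0  no partial cluster is being tracked
 */
static long long encode_used(long long cluster)    { return -cluster; }
static long long encode_pending(long long cluster) { return cluster; }

/* Must the last cluster of the range being freed be left allocated? */
static bool keep_last_cluster(long long partial_cluster,
			      long long last_cluster)
{
	return partial_cluster < 0 && -partial_cluster == last_cluster;
}

/* Should the remembered cluster be freed now that we are in another one? */
static bool free_pending_cluster(long long partial_cluster,
				 long long current_cluster)
{
	return partial_cluster > 0 && partial_cluster != current_cluster;
}
```

Because cluster 0 cannot be negated, the scheme depends on the observation in the message that partial clusters only exist when a cluster is larger than a block.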

Signed-off-by: Lukas Czerner 
Reviewed-by: Jan Kara 
---
 fs/ext4/extents.c   |   69 +++---
 include/trace/events/ext4.h |   25 ---
 2 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index fb9b414..214e68a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2359,7 +2359,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
 
 static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
  struct ext4_extent *ex,
- ext4_fsblk_t *partial_cluster,
+ long long *partial_cluster,
  ext4_lblk_t from, ext4_lblk_t to)
 {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2388,7 +2388,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
 * partial cluster here.
 */
pblk = ext4_ext_pblock(ex) + ee_len - 1;
-   if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+   if ((*partial_cluster > 0) &&
+   (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
ext4_free_blocks(handle, inode, NULL,
 EXT4_C2B(sbi, *partial_cluster),
 sbi->s_cluster_ratio, flags);
@@ -2414,23 +2415,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
&& to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
ext4_lblk_t num;
+   unsigned int unaligned;
 
num = le32_to_cpu(ex->ee_block) + ee_len - from;
pblk = ext4_ext_pblock(ex) + ee_len - num;
-   ext_debug("free last %u blocks starting %llu\n", num, pblk);
+   /*
+* Usually we want to free partial cluster at the end of the
+* extent, except for the situation when the cluster is still
+* used by any other extent (partial_cluster is negative).
+*/
+   if (*partial_cluster < 0 &&
+   -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+   flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+   ext_debug("free last %u blocks starting %llu partial %lld\n",
+ num, pblk, *partial_cluster);
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
/*
 * If the block range to be freed didn't start at the
 * beginning of a cluster, and we removed the entire
-* extent, save the partial cluster here, since we
-* might need to delete if we determine that the
-* truncate operation has removed all of the blocks in
-* the cluster.
+* extent and the cluster is not used by any other extent,
+* save the partial cluster here, since we might need to
+* delete if we determine that the truncate operation has
+* removed all of the blocks in the cluster.
+*
+* On the other hand, if we did not manage to free the whole
+* extent, we have to mark the cluster as used (store negative
+* cluster number in partial_cluster).
 */
-   if (pblk & (sbi->s_cluster

[PATCH v4 17/20] ext4: remove unused code from ext4_remove_blocks()

2013-05-14 Thread Lukas Czerner

The "head removal" branch in the condition is never used in any code
path in ext4, since the function's only caller, ext4_ext_rm_leaf(), will make
sure that the extent is properly split before removing blocks. Note that
there is a bug in this branch anyway.

This commit removes the unused code completely and makes use of
ext4_error() instead of printk if a dubious range is provided.

Signed-off-by: Lukas Czerner 
Reviewed-by: Jan Kara 
---
 fs/ext4/extents.c |   21 -
 1 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bc0f191..18c3f1a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2432,23 +2432,10 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
*partial_cluster = EXT4_B2C(sbi, pblk);
else
*partial_cluster = 0;
-   } else if (from == le32_to_cpu(ex->ee_block)
-  && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
-   /* head removal */
-   ext4_lblk_t num;
-   ext4_fsblk_t start;
-
-   num = to - from;
-   start = ext4_ext_pblock(ex);
-
-   ext_debug("free first %u blocks starting %llu\n", num, start);
-   ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
-   } else {
-   printk(KERN_INFO "strange request: removal(2) "
-   "%u-%u from %u:%u\n",
-   from, to, le32_to_cpu(ex->ee_block), ee_len);
-   }
+   } else
+   ext4_error(sbi->s_sb, "strange request: removal(2) "
+  "%u-%u from %u:%u\n",
+  from, to, le32_to_cpu(ex->ee_block), ee_len);
return 0;
 }
 
-- 
1.7.7.6


[PATCH v4 12/20] ext4: Call ext4_jbd2_file_inode() after zeroing block

2013-05-14 Thread Lukas Czerner

In data=ordered mode we should call ext4_jbd2_file_inode() so that a crash
after the truncate transaction has committed does not expose stale data
in the tail of the block.

Thanks to Jan Kara for pointing that out.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/inode.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 11c07e1..8187c3e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3680,8 +3680,11 @@ int ext4_block_zero_page_range(handle_t *handle,
err = 0;
if (ext4_should_journal_data(inode)) {
err = ext4_handle_dirty_metadata(handle, inode, bh);
-   } else
+   } else {
mark_buffer_dirty(bh);
+   if (ext4_test_inode_state(inode, EXT4_STATE_ORDERED_MODE))
+   err = ext4_jbd2_file_inode(handle, inode);
+   }
 
 unlock:
unlock_page(page);
-- 
1.7.7.6


[PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole

2013-05-14 Thread Lukas Czerner

We're going to get rid of ext4_discard_partial_page_buffers() since it
duplicates some code and also partially duplicates the work of
truncate_pagecache_range(); moreover, the old implementation was much
clearer.

Now that truncate_inode_pages_range() can handle truncating non-page-aligned
regions, we can use it to invalidate and zero out the block-aligned
region of the punched out range and then use ext4_block_truncate_page()
to zero the unaligned blocks at the start and end of the range. This
will greatly simplify the punch hole code. Moreover, after this commit we
can get rid of ext4_discard_partial_page_buffers() completely.

We also introduce the function ext4_prepare_punch_hole() to do some common
operations before we attempt to do the actual punch hole on an
indirect or extent file, which saves us some code duplication.

This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.
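
To make the head/tail split concrete, here is a userspace sketch of which byte ranges end up being zeroed by hand for a given punched range. It mirrors the logic of ext4_zero_partial_blocks() in the diff below, but `struct zero_plan` and `plan_partial_zeroing()` are hypothetical names, division stands in for the blocksize shift, and the head length is computed directly instead of relying on the callee clamping it to the end of the block.

```c
#include <stdint.h>

struct zero_plan {
	int64_t head_off, head_len;	/* partial block at the start; len 0 if none */
	int64_t tail_off, tail_len;	/* partial block at the end; len 0 if none */
};

/* bs is the block size in bytes and must be a power of two. */
static struct zero_plan plan_partial_zeroing(int64_t lstart, int64_t length,
					     int64_t bs)
{
	struct zero_plan p = { 0, 0, 0, 0 };
	int64_t byte_end = lstart + length - 1;
	int64_t partial = lstart & (bs - 1);

	/* Whole range inside a single block: zero exactly that range. */
	if (lstart / bs == byte_end / bs) {
		p.head_off = lstart;
		p.head_len = length;
		return p;
	}
	/* Unaligned start: zero from lstart to the end of its block. */
	if (partial) {
		p.head_off = lstart;
		p.head_len = bs - partial;
	}
	/* Unaligned end: zero from the start of the last block to byte_end. */
	partial = byte_end & (bs - 1);
	if (partial != bs - 1) {
		p.tail_off = byte_end - partial;
		p.tail_len = partial + 1;
	}
	return p;
}
```

Punching 3000 bytes at offset 1000 with 1k blocks gives a 24 byte head (1000..1023) and a 928 byte tail (3072..3999); the two whole blocks in between are left to truncate_pagecache_range().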

Signed-off-by: Lukas Czerner 
---
v4: Use start-len arguments in ext4_zero_partial_blocks()

 fs/ext4/ext4.h  |2 +
 fs/ext4/inode.c |  118 +-
 2 files changed, 48 insertions(+), 72 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9f9719f..2d4b0aa 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2100,6 +2100,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
struct address_space *mapping, loff_t from);
 extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+loff_t lstart, loff_t lend);
 extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 34ebb62..44333a5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3692,6 +3692,41 @@ unlock:
return err;
 }
 
+int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+loff_t lstart, loff_t length)
+{
+   struct super_block *sb = inode->i_sb;
+   struct address_space *mapping = inode->i_mapping;
+   unsigned partial = lstart & (sb->s_blocksize - 1);
+   ext4_fsblk_t start, end;
+   loff_t byte_end = (lstart + length - 1);
+   int err = 0;
+
+   start = lstart >> sb->s_blocksize_bits;
+   end = byte_end >> sb->s_blocksize_bits;
+
+   /* Handle partial zero within the single block */
+   if (start == end) {
+   err = ext4_block_zero_page_range(handle, mapping,
+lstart, length);
+   return err;
+   }
+   /* Handle partial zero out on the start of the range */
+   if (partial) {
+   err = ext4_block_zero_page_range(handle, mapping,
+lstart, sb->s_blocksize);
+   if (err)
+   return err;
+   }
+   /* Handle partial zero out on the end of the range */
+   partial = byte_end & (sb->s_blocksize - 1);
+   if (partial != sb->s_blocksize - 1)
+   err = ext4_block_zero_page_range(handle, mapping,
+byte_end - partial,
+partial + 1);
+   return err;
+}
+
 int ext4_can_truncate(struct inode *inode)
 {
if (S_ISREG(inode->i_mode))
@@ -3720,8 +3755,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, 
loff_t length)
struct super_block *sb = inode->i_sb;
ext4_lblk_t first_block, stop_block;
struct address_space *mapping = inode->i_mapping;
-   loff_t first_page, last_page, page_len;
-   loff_t first_page_offset, last_page_offset;
+   loff_t first_block_offset, last_block_offset;
handle_t *handle;
unsigned int credits;
int ret = 0;
@@ -3772,17 +3806,13 @@ int ext4_punch_hole(struct file *file, loff_t offset, 
loff_t length)
   offset;
}
 
-   first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-   last_page = (offset + length) >> PAGE_CACHE_SHIFT;
+   first_block_offset = round_up(offset, sb->s_blocksize);
+   last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
 
-   first_page_offset = first_page << PAGE_CACHE_SHIFT;
-   last_page_offset = last_page << PAGE_CACHE_SHIFT;
-
-   /* Now release the pages */
-   if (last_page_offset > first_page_offset) {
-   truncate_pagecache_range(inode, first_page_offset,
-last_page_offset - 1);
-   }
+   /* Now release the pages and zero block aligned part of pages*/
+   if (last_block_offset > first_block_offset)
+   truncate_pagecache_range(inode, first_block_offset

[PATCH v4 04/20] jbd: change journal_invalidatepage() to accept length

2013-05-14 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can make
use of it in journal_invalidatepage() and all its users in the ext3 file
system. Also update the ext3 tracepoint to print the length argument.
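
As a rough illustration of the new range semantics, the buffer walk can be
modelled in user space like this. It is only a sketch: MODEL_PAGE_SIZE,
buffers_to_invalidate() and is_partial_page() are hypothetical names, the
page size of 4096 is an assumption, and the real function unmaps journal
buffers instead of building a bitmask.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/*
 * Return a bitmask with bit i set when buffer i of the page lies wholly
 * inside [offset, offset + length). Buffers that only partially overlap
 * the range are left alone.
 */
static unsigned int buffers_to_invalidate(unsigned int offset,
					  unsigned int length,
					  unsigned int bsize)
{
	unsigned int stop = offset + length;
	unsigned int curr_off = 0, idx = 0, mask = 0;

	/* The range must stay within the page and must not wrap. */
	assert(stop <= MODEL_PAGE_SIZE && stop >= length);
	/* Keep the buffer count small enough for a 32-bit mask. */
	assert(bsize > 0 && MODEL_PAGE_SIZE % bsize == 0);
	assert(MODEL_PAGE_SIZE / bsize <= 32);

	while (curr_off < MODEL_PAGE_SIZE) {
		unsigned int next_off = curr_off + bsize;

		/* This buffer ends past the range: stop walking. */
		if (next_off > stop)
			break;
		/* This buffer starts at or after the range start. */
		if (offset <= curr_off)
			mask |= 1u << idx;
		curr_off = next_off;
		idx++;
	}
	return mask;
}

/* Buffers may only be freed from the page when the whole page is covered. */
static bool is_partial_page(unsigned int offset, unsigned int length)
{
	return offset != 0 || length < MODEL_PAGE_SIZE;
}
```

For 1k buffers, offset 1024 and length 2048 select buffers 1 and 2 only,
and the page counts as partial.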

Signed-off-by: Lukas Czerner 
Reviewed-by: Jan Kara 
---
 fs/ext3/inode.c |6 +++---
 fs/jbd/transaction.c|   19 ++-
 include/linux/jbd.h |2 +-
 include/trace/events/ext3.h |   12 +++-
 4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 349d4ce..b12936b 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1828,15 +1828,15 @@ static void ext3_invalidatepage(struct page *page, 
unsigned int offset,
 {
journal_t *journal = EXT3_JOURNAL(page->mapping->host);
 
-   trace_ext3_invalidatepage(page, offset);
+   trace_ext3_invalidatepage(page, offset, length);
 
/*
 * If it's a full truncate we just forget about the pending dirtying
 */
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
-   journal_invalidatepage(journal, page, offset);
+   journal_invalidatepage(journal, page, offset, length);
 }
 
 static int ext3_releasepage(struct page *page, gfp_t wait)
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 071d690..a1fef89 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -2020,16 +2020,20 @@ zap_buffer_unlocked:
  * void journal_invalidatepage() - invalidate a journal page
  * @journal: journal to use for flush
  * @page:page to flush
- * @offset:  length of page to invalidate.
+ * @offset:  offset of the range to invalidate
+ * @length:  length of the range to invalidate
  *
- * Reap page buffers containing data after offset in page.
+ * Reap page buffers containing data in specified range in page.
  */
 void journal_invalidatepage(journal_t *journal,
  struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
 {
struct buffer_head *head, *bh, *next;
+   unsigned int stop = offset + length;
unsigned int curr_off = 0;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
 
if (!PageLocked(page))
@@ -2037,6 +2041,8 @@ void journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
 * data lists (especially for journaled data mode), so be
 * cautious in our locking. */
@@ -2046,11 +2052,14 @@ void journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   return;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
may_free &= journal_unmap_buffer(journal, bh,
-offset > 0);
+partial_page);
unlock_buffer(bh);
}
curr_off = next_off;
@@ -2058,7 +2067,7 @@ void journal_invalidatepage(journal_t *journal,
 
} while (bh != head);
 
-   if (!offset) {
+   if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index c8f3297..d02e16c 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -840,7 +840,7 @@ extern void  journal_release_buffer (handle_t *, struct 
buffer_head *);
 extern int  journal_forget (handle_t *, struct buffer_head *);
 extern void journal_sync_buffer (struct buffer_head *);
 extern void journal_invalidatepage(journal_t *,
-   struct page *, unsigned long);
+   struct page *, unsigned int, unsigned int);
 extern int  journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
 extern int  journal_stop(handle_t *);
 extern int  journal_flush (journal_t *);
diff --git a/include/trace/events/ext3.h b/include/trace/events/ext3.h
index 15d11a3..6797b9d 100644
--- a/include/trace/events/ext3.h
+++ b/include/trace/events/ext3.h
@@ -290,13 +290,14 @@ DEFINE_EVENT(ext3__page_op, ext3_releasepage,
 );
 
 TRACE_EVENT(ext3_invalidatepage,
-   TP_PROTO(struct page *page, unsigned long offset),
+   TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
 
-   TP_ARGS(page, offset),
+   TP_ARGS(page, offset, length),
 
TP_STRUCT__entry

[PATCH v4 00/20] change invalidatepage prototype to accept length

2013-05-14 Thread Lukas Czerner

Hi,

This patch set aims to allow truncate_inode_pages_range() to handle
ranges which are not aligned at the end of the page. Currently it will
hit BUG_ON() when the end of the range is not aligned. The punch hole feature,
however, can benefit from this ability: it saves file systems some work by not
forcing them to implement their own invalidate code to handle unaligned
ranges.

In order for this to work we need to change the ->invalidatepage() address space
operation to accept a range to invalidate by adding a 'length' argument in
addition to 'offset'. This is different from my previous attempt to create a
new aop ->invalidatepage_range (http://lwn.net/Articles/514828/), which I
have reconsidered and now find unnecessary.

It would be best if this series could go through the ext4 branch since
there are a lot of ext4 changes which are based on the dev branch of ext4
(git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git)

For description purposes this patch set can be divided into the following
groups:

patch 0001:
Change the ->invalidatepage() prototype by adding a 'length' argument
and update all the instances. In very simple cases the file
system methods are completely adapted, otherwise only the
prototype is changed and the rest will follow. This patch
also implements the 'range' invalidation in
block_invalidatepage().

patch 0002 - 0009:
Make use of the new 'length' argument in the file systems
themselves. Some file systems can take advantage of it by trying
to invalidate only a portion of the page if possible, some
can't; however, none of the file systems currently attempt
to truncate non-page-aligned ranges.

patch 0010:
Teach truncate_inode_pages_range() to handle non-page-aligned
ranges.

patch 0011 - 0020:
Ext4 changes built on top of the previous changes, simplifying
the punch hole code. Not all changes are related specifically
to the invalidatepage() change, but all are related to the punch
hole feature.
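
To make the "non page aligned" case concrete, here is a small user-space
sketch of how an inclusive byte range [lstart, lend] splits into a partial
head page, whole pages, and a partial tail page. It is a model under stated
assumptions, not the mm/truncate.c code: struct page_split, split_range()
and the 4096-byte MODEL_PAGE_SIZE are hypothetical.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096ULL	/* assumed page size for this model */

/* Hypothetical result type describing the three parts of the range. */
struct page_split {
	unsigned long long first_full;	/* index of first wholly covered page */
	unsigned long long end_full;	/* one past the last wholly covered page */
	unsigned int partial_start;	/* offset in the head page, 0 if aligned */
	unsigned int partial_end;	/* bytes covered in the tail page, 0 if aligned */
};

/* Split the inclusive byte range [lstart, lend]; requires lstart <= lend. */
static struct page_split split_range(unsigned long long lstart,
				     unsigned long long lend)
{
	struct page_split s;

	assert(lstart <= lend);

	s.partial_start = (unsigned int)(lstart & (MODEL_PAGE_SIZE - 1));
	s.partial_end = (unsigned int)((lend + 1) & (MODEL_PAGE_SIZE - 1));
	/* Round the start up and the end down to page boundaries. */
	s.first_full = (lstart + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	s.end_full = (lend + 1) / MODEL_PAGE_SIZE;
	return s;
}
```

When first_full >= end_full the range covers no whole page, for example
when it lies inside a single page. The partial parts are the ones that need
the 'length' argument passed down to ->invalidatepage().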

Even though this patch set would mainly affect the functionality of the
file systems implementing punch hole, I've tested all the following file
systems using xfstests without noticing any bugs related to this change.

ext3, ext4, xfs, btrfs, gfs2 and reiserfs

I've also tested block size < page size on ext4 with xfstests and fsx.


v3 -> v4: Some minor changes based on the reviews. Added two ext4 patches
  as suggested by Jan Kara.

Thanks!
-Lukas

-- 
 Documentation/filesystems/Locking |6 +-
 Documentation/filesystems/vfs.txt |   20 +-
 fs/9p/vfs_addr.c  |5 +-
 fs/afs/file.c |   10 +-
 fs/btrfs/disk-io.c|3 +-
 fs/btrfs/extent_io.c  |2 +-
 fs/btrfs/inode.c  |3 +-
 fs/buffer.c   |   21 ++-
 fs/ceph/addr.c|   15 +-
 fs/cifs/file.c|5 +-
 fs/exofs/inode.c  |6 +-
 fs/ext3/inode.c   |9 +-
 fs/ext4/ext4.h|   14 +-
 fs/ext4/extents.c |   96 ++
 fs/ext4/inode.c   |  402 ++---
 fs/ext4/super.c   |1 +
 fs/f2fs/data.c|3 +-
 fs/f2fs/node.c|3 +-
 fs/gfs2/aops.c|   17 +-
 fs/jbd/transaction.c  |   19 ++-
 fs/jbd2/transaction.c |   24 ++-
 fs/jfs/jfs_metapage.c |5 +-
 fs/logfs/file.c   |3 +-
 fs/logfs/segment.c|3 +-
 fs/nfs/file.c |8 +-
 fs/ntfs/aops.c|2 +-
 fs/ocfs2/aops.c   |5 +-
 fs/reiserfs/inode.c   |   12 +-
 fs/ubifs/file.c   |5 +-
 fs/xfs/xfs_aops.c |   14 +-
 fs/xfs/xfs_trace.h|   15 +-
 include/linux/buffer_head.h   |3 +-
 include/linux/fs.h|2 +-
 include/linux/jbd.h   |2 +-
 include/linux/jbd2.h  |2 +-
 include/linux/mm.h|3 +-
 include/trace/events/ext3.h   |   12 +-
 include/trace/events/ext4.h   |   64 ---
 mm/readahead.c|2 +-
 mm/truncate.c |  117 
 40 files changed, 503 insertions(+), 460 deletions(-)

[PATCH v4 01/20] mm: change invalidatepage prototype to accept
[PATCH v4 02/20] jbd2: change jbd2_journal_invalidatepage to accept
[PATCH v4 03/20] ext4: use ->invalidatepage() length argument
[PATCH v4 04/20] jbd: change journal_invalidatepage() to accept
[PATCH v4 05/20] xfs: use ->invalidatepage() length argument
[PATCH v4 06/20] ocfs2: use ->invalidatepage() length argument
[PATCH v4 07/20] ceph: use ->invalidatepage() length argument
[PATCH v4 08/20] gfs2: use ->invalidatepage() length argument
[PATCH v4 09/20] reiserfs: use ->invalidatepage() length argument
[PATCH v4 10/20] mm: teach

[PATCH v4 12/20] ext4: Call ext4_jbd2_file_inode() after zeroing block

2013-05-14 Thread Lukas Czerner

In data=ordered mode we should call ext4_jbd2_file_inode() so that a crash
after the truncate transaction has committed does not expose stale data
in the tail of the block.

Thanks Jan Kara for pointing that out.

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/inode.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 11c07e1..8187c3e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3680,8 +3680,11 @@ int ext4_block_zero_page_range(handle_t *handle,
err = 0;
if (ext4_should_journal_data(inode)) {
err = ext4_handle_dirty_metadata(handle, inode, bh);
-   } else
+   } else {
mark_buffer_dirty(bh);
+   if (ext4_test_inode_state(inode, EXT4_STATE_ORDERED_MODE))
+   err = ext4_jbd2_file_inode(handle, inode);
+   }
 
 unlock:
unlock_page(page);
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 17/20] ext4: remove unused code from ext4_remove_blocks()

2013-05-14 Thread Lukas Czerner

The head removal branch in the condition is never used in any code
path in ext4 since the function's only caller, ext4_ext_rm_leaf(), will make
sure that the extent is properly split before removing blocks. Note that
there is a bug in this branch anyway.

This commit removes the unused code completely and makes use of
ext4_error() instead of printk if a dubious range is provided.

Signed-off-by: Lukas Czerner lczer...@redhat.com
Reviewed-by: Jan Kara j...@suse.cz
---
 fs/ext4/extents.c |   21 -
 1 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bc0f191..18c3f1a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2432,23 +2432,10 @@ static int ext4_remove_blocks(handle_t *handle, struct 
inode *inode,
*partial_cluster = EXT4_B2C(sbi, pblk);
else
*partial_cluster = 0;
-   } else if (from == le32_to_cpu(ex->ee_block)
-   && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
-   /* head removal */
-   ext4_lblk_t num;
-   ext4_fsblk_t start;
-
-   num = to - from;
-   start = ext4_ext_pblock(ex);
-
-   ext_debug("free first %u blocks starting %llu\n", num, start);
-   ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
-   } else {
-   printk(KERN_INFO "strange request: removal(2) "
-   "%u-%u from %u:%u\n",
-   from, to, le32_to_cpu(ex->ee_block), ee_len);
-   }
+   } else
+   ext4_error(sbi->s_sb, "strange request: removal(2) "
+  "%u-%u from %u:%u\n",
+  from, to, le32_to_cpu(ex->ee_block), ee_len);
return 0;
 }
 
-- 
1.7.7.6

[PATCH v4 19/20] ext4: make punch hole code path work with bigalloc

2013-05-14 Thread Lukas Czerner

Currently punch hole is disabled in file systems with the bigalloc
feature enabled. However, the recent changes in the punch hole patch should
make it easier to support punching holes on bigalloc-enabled file
systems.

This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is of unsigned long long type and it makes sure that we
will free the partial cluster if all extents have been released from that
cluster. However, it has been specifically designed only for truncate.

With punch hole we can be freeing just some extents in the cluster,
leaving the rest untouched. So we have to make sure that we will notice a
cluster which still has some extents. To do this I've changed
partial_cluster to be of signed long long type. The only scenario where
this could be a problem is when cluster size == block size; however, in
that case there would not be any partial clusters, so we're safe. For
bigger clusters the signed type is enough. Now we use a negative value
in partial_cluster to mark such a cluster as used, hence we know that we must
not free it even if all other extents have been freed from such a cluster.

This scenario can be described in a simple diagram:

|FFF...FF..FF.UUU|
 ^--^
  punch hole

. - free space
| - cluster boundary
F - freed extent
U - used extent

Also update respective tracepoints to use signed long long type for
partial_cluster.
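
The sign convention described above can be sketched in user space as
follows. This is an illustration only: the pc_* helpers and enum pc_state
are hypothetical names and do not exist in ext4; the patch itself works on
the raw long long value.

```c
#include <assert.h>

/* Hypothetical states encoded in the signed partial_cluster value. */
enum pc_state {
	PC_NONE,		/* 0: no partial cluster is being tracked */
	PC_FREE_CANDIDATE,	/* > 0: cluster may be freed once it is empty */
	PC_IN_USE		/* < 0: cluster is still used by another extent */
};

static enum pc_state pc_classify(long long partial_cluster)
{
	if (partial_cluster == 0)
		return PC_NONE;
	return partial_cluster > 0 ? PC_FREE_CANDIDATE : PC_IN_USE;
}

/* Mark a cluster as still in use by storing its number negated. */
static long long pc_mark_used(unsigned long long cluster)
{
	return -(long long)cluster;
}

/* Recover the cluster number regardless of the sign. */
static unsigned long long pc_cluster(long long partial_cluster)
{
	return (unsigned long long)(partial_cluster < 0 ?
				    -partial_cluster : partial_cluster);
}

/*
 * Tail removal check: the last cluster of the freed range must be kept
 * when partial_cluster says that this very cluster is still in use.
 */
static int pc_must_keep_last_cluster(long long partial_cluster,
				     unsigned long long last_cluster)
{
	return partial_cluster < 0 &&
	       (unsigned long long)(-partial_cluster) == last_cluster;
}
```

In this model a positive value behaves as partial_cluster did before the
change, while a negative value blocks freeing of that one cluster.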

Signed-off-by: Lukas Czerner lczer...@redhat.com
Reviewed-by: Jan Kara j...@suse.cz
---
 fs/ext4/extents.c   |   69 +++---
 include/trace/events/ext4.h |   25 ---
 2 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index fb9b414..214e68a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2359,7 +2359,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int 
nrblocks, int chunk)
 
 static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
  struct ext4_extent *ex,
- ext4_fsblk_t *partial_cluster,
+ long long *partial_cluster,
  ext4_lblk_t from, ext4_lblk_t to)
 {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2388,7 +2388,8 @@ static int ext4_remove_blocks(handle_t *handle, struct 
inode *inode,
 * partial cluster here.
 */
pblk = ext4_ext_pblock(ex) + ee_len - 1;
-   if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+   if ((*partial_cluster > 0) &&
+   (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
ext4_free_blocks(handle, inode, NULL,
 EXT4_C2B(sbi, *partial_cluster),
 sbi->s_cluster_ratio, flags);
@@ -2414,23 +2415,41 @@ static int ext4_remove_blocks(handle_t *handle, struct 
inode *inode,
 to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
ext4_lblk_t num;
+   unsigned int unaligned;
 
num = le32_to_cpu(ex->ee_block) + ee_len - from;
pblk = ext4_ext_pblock(ex) + ee_len - num;
-   ext_debug("free last %u blocks starting %llu\n", num, pblk);
+   /*
+* Usually we want to free partial cluster at the end of the
+* extent, except for the situation when the cluster is still
+* used by any other extent (partial_cluster is negative).
+*/
+   if (*partial_cluster < 0 &&
+   -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+   flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+   ext_debug("free last %u blocks starting %llu partial %lld\n",
+ num, pblk, *partial_cluster);
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
/*
 * If the block range to be freed didn't start at the
 * beginning of a cluster, and we removed the entire
-* extent, save the partial cluster here, since we
-* might need to delete if we determine that the
-* truncate operation has removed all of the blocks in
-* the cluster.
+* extent and the cluster is not used by any other extent,
+* save the partial cluster here, since we might need to
+* delete if we determine that the truncate operation has
+* removed all of the blocks in the cluster.
+*
+* On the other hand, if we did not manage to free the whole
+* extent, we have to mark the cluster as used (store negative
+* cluster number in partial_cluster).
 */
-   if (pblk & (sbi->s_cluster_ratio - 1) &&
-   (ee_len == num

[PATCH v4 20/20] ext4: Allow punch hole with bigalloc enabled

2013-05-14 Thread Lukas Czerner

In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked the punch_hole
implementation and there is nothing holding us back from using punch hole
on file systems with the bigalloc feature enabled.

This has been tested with fsx and xfstests.

Signed-off-by: Lukas Czerner lczer...@redhat.com
Reviewed-by: Jan Kara j...@suse.cz
---
 fs/ext4/inode.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f504efa..daffbb8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3557,11 +3557,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, 
loff_t length)
if (!S_ISREG(inode->i_mode))
return -EOPNOTSUPP;
 
-   if (EXT4_SB(sb)->s_cluster_ratio > 1) {
-   /* TODO: Add support for bigalloc file systems */
-   return -EOPNOTSUPP;
-   }
-
trace_ext4_punch_hole(inode, offset, length);
 
/*
-- 
1.7.7.6

[PATCH v4 18/20] ext4: update ext4_ext_remove_space trace point

2013-05-14 Thread Lukas Czerner

Add the end variable to the ext4_ext_remove_space and
ext4_ext_remove_space_done trace points.

Signed-off-by: Lukas Czerner lczer...@redhat.com
Reviewed-by: Jan Kara j...@suse.cz
---
 fs/ext4/extents.c   |6 +++---
 include/trace/events/ext4.h |   21 ++---
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 18c3f1a..fb9b414 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2663,7 +2663,7 @@ int ext4_ext_remove_space(struct inode *inode, 
ext4_lblk_t start,
return PTR_ERR(handle);
 
 again:
-   trace_ext4_ext_remove_space(inode, start, depth);
+   trace_ext4_ext_remove_space(inode, start, end, depth);
 
/*
 * Check if we are removing extents inside the extent tree. If that
@@ -2831,8 +2831,8 @@ again:
}
}
 
-   trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
-   path->p_hdr->eh_entries);
+   trace_ext4_ext_remove_space_done(inode, start, end, depth,
+   partial_cluster, path->p_hdr->eh_entries);
 
/* If we still have something in the partial cluster and we have removed
 * even the first extent, then we should free the blocks in the partial
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index dcfce96..bcb5a02 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2027,14 +2027,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
 );
 
 TRACE_EVENT(ext4_ext_remove_space,
-   TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+   TP_PROTO(struct inode *inode, ext4_lblk_t start,
+ext4_lblk_t end, int depth),
 
-   TP_ARGS(inode, start, depth),
+   TP_ARGS(inode, start, end, depth),
 
TP_STRUCT__entry(
__field(dev_t,  dev )
__field(ino_t,  ino )
__field(ext4_lblk_t,start   )
+   __field(ext4_lblk_t,end )
__field(int,depth   )
),
 
@@ -2042,26 +2044,29 @@ TRACE_EVENT(ext4_ext_remove_space,
__entry->dev= inode->i_sb->s_dev;
__entry->ino= inode->i_ino;
__entry->start  = start;
+   __entry->end= end;
__entry->depth  = depth;
),
 
-   TP_printk("dev %d,%d ino %lu since %u depth %d",
+   TP_printk("dev %d,%d ino %lu since %u end %u depth %d",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  (unsigned long) __entry->ino,
  (unsigned) __entry->start,
+ (unsigned) __entry->end,
  __entry->depth)
 );
 
 TRACE_EVENT(ext4_ext_remove_space_done,
-   TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
-   ext4_lblk_t partial, __le16 eh_entries),
+   TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+int depth, ext4_lblk_t partial, __le16 eh_entries),
 
-   TP_ARGS(inode, start, depth, partial, eh_entries),
+   TP_ARGS(inode, start, end, depth, partial, eh_entries),
 
TP_STRUCT__entry(
__field(dev_t,  dev )
__field(ino_t,  ino )
__field(ext4_lblk_t,start   )
+   __field(ext4_lblk_t,end )
__field(int,depth   )
__field(ext4_lblk_t,partial )
__field(unsigned short, eh_entries  )
@@ -2071,16 +2076,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->dev= inode->i_sb->s_dev;
__entry->ino= inode->i_ino;
__entry->start  = start;
+   __entry->end= end;
__entry->depth  = depth;
__entry->partial= partial;
__entry->eh_entries = le16_to_cpu(eh_entries);
),
 
-   TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+   TP_printk("dev %d,%d ino %lu since %u end %u depth %d partial %u "
  "remaining_entries %u",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  (unsigned long) __entry->ino,
  (unsigned) __entry->start,
+ (unsigned) __entry->end,
  __entry->depth,
  (unsigned) __entry->partial,
  (unsigned short) __entry->eh_entries)
-- 
1.7.7.6

[PATCH v4 09/20] reiserfs: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in reiserfs_invalidatepage().

Signed-off-by: Lukas Czerner lczer...@redhat.com
Cc: reiserfs-de...@vger.kernel.org
---
 fs/reiserfs/inode.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)
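
For readers following the series, the buffer walk this patch relies on can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the reiserfs code: SKETCH_PAGE_SIZE, count_discarded() and is_partial_page() are made-up names, and the page's buffer list is modelled as equally sized slices of one page.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Count the buffers of size 'bsize' that lie entirely inside
 * [offset, offset + length) within a single page. The walk stops at the
 * first buffer that extends past the end of the range.
 */
static unsigned count_discarded(unsigned offset, unsigned length,
				unsigned bsize)
{
	unsigned stop = offset + length;
	unsigned curr_off = 0;
	unsigned discarded = 0;

	assert(bsize != 0 && SKETCH_PAGE_SIZE % bsize == 0);
	assert(stop <= SKETCH_PAGE_SIZE && stop >= length);

	while (curr_off < SKETCH_PAGE_SIZE) {
		unsigned next_off = curr_off + bsize;

		if (next_off > stop)	/* buffer reaches past the range */
			break;
		if (offset <= curr_off)	/* buffer starts inside the range */
			discarded++;
		curr_off = next_off;
	}
	return discarded;
}

/* Non-zero when the range does not cover the whole page. */
static int is_partial_page(unsigned offset, unsigned length)
{
	return offset || length < SKETCH_PAGE_SIZE;
}
```

With 1024-byte buffers, a range starting at 1024 with length 2048 covers exactly the second and third buffers, while a range starting at 512 with length 1024 covers no buffer completely.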

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 808e02e..e963164 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2975,11 +2975,13 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
unsigned int curr_off = 0;
+   unsigned int stop = offset + length;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int ret = 1;
 
BUG_ON(!PageLocked(page));
 
-   if (offset == 0)
+   if (!partial_page)
ClearPageChecked(page);
 
if (!page_has_buffers(page))
@@ -2991,6 +2993,9 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   goto out;
+
/*
 * is this block fully invalidated?
 */
@@ -3009,7 +3014,7 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
 * The get_block cached value has been unconditionally invalidated,
 * so real IO is not possible anymore.
 */
-   if (!offset && ret) {
+   if (!partial_page && ret) {
ret = try_to_release_page(page, 0);
/* maybe should BUG_ON(!ret); - neilb */
}
-- 
1.7.7.6

[PATCH v4 10/20] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

2013-05-14 Thread Lukas Czerner

This commit changes truncate_inode_pages_range() so it can handle non
page aligned regions of the truncate. Currently we can hit BUG_ON when
the end of the range is not page aligned, but we can handle unaligned
start of the range.

Being able to handle non page aligned regions of the page can help file
system punch_hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.

In previous commits we've changed the ->invalidatepage() prototype to accept
a 'length' argument so that a range to invalidate can be specified. Now we
can use that new ability in truncate_inode_pages_range().

Signed-off-by: Lukas Czerner lczer...@redhat.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Hugh Dickins hu...@google.com
---
 mm/truncate.c |  104 -
 1 files changed, 73 insertions(+), 31 deletions(-)
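
As a reading aid, the way the patch splits a byte range into fully truncated pages plus partial pages at both ends can be sketched in userspace C. The sketch assumes 4 KiB pages; SKETCH_PAGE_SHIFT, struct trunc_range and split_range() are hypothetical names that only mirror the arithmetic in the hunk below, not the kernel function itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PAGE_CACHE_SHIFT and PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

struct trunc_range {
	uint64_t start;		/* first fully truncated page, inclusive */
	uint64_t end;		/* one past the last fully truncated page */
	unsigned partial_start;	/* in-page offset where zeroing begins */
	unsigned partial_end;	/* in-page offset where zeroing stops */
};

/* 'lend' is inclusive; lend == -1 means "up to end of file". */
static struct trunc_range split_range(int64_t lstart, int64_t lend)
{
	struct trunc_range r;

	assert(lstart >= 0);

	/* Offsets within the partial pages at either end of the range. */
	r.partial_start = (unsigned)((uint64_t)lstart & (SKETCH_PAGE_SIZE - 1));
	r.partial_end = (unsigned)(((uint64_t)lend + 1) & (SKETCH_PAGE_SIZE - 1));

	/* Round the start up to the next page boundary. */
	r.start = ((uint64_t)lstart + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	if (lend == -1)
		r.end = UINT64_MAX;	/* highest possible page index */
	else
		r.end = ((uint64_t)lend + 1) >> SKETCH_PAGE_SHIFT;
	return r;
}
```

For example, truncating bytes 100 to 8291 inclusive fully drops page 1 only, zeroes page 0 from offset 100 onwards, and zeroes the first 100 bytes of page 2.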

diff --git a/mm/truncate.c b/mm/truncate.c
index fdba083..e2e8a8a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -52,14 +52,6 @@ void do_invalidatepage(struct page *page, unsigned int 
offset,
(*invalidatepage)(page, offset, length);
 }
 
-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
-   zero_user_segment(page, partial, PAGE_CACHE_SIZE);
-   cleancache_invalidate_page(page->mapping, page);
-   if (page_has_private(page))
-   do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
-}
-
 /*
  * This cancels just the dirty bit on the kernel page itself, it
  * does NOT actually remove dirty bits on any mmap's that may be
@@ -188,11 +180,11 @@ int invalidate_inode_page(struct page *page)
  * truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets
  * @mapping: mapping to truncate
  * @lstart: offset from which to truncate
- * @lend: offset to which to truncate
+ * @lend: offset to which to truncate (inclusive)
  *
  * Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
  *
  * Truncate takes two passes - the first pass is nonblocking.  It will not
  * block on page locks and it will not block on writeback.  The second pass
@@ -203,35 +195,58 @@ int invalidate_inode_page(struct page *page)
  * We pass down the cache-hot hint to the page freeing code.  Even if the
  * mapping is large, it is probably the case that the final pages are the most
  * recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that since ->invalidatepage() accepts range to invalidate
+ * truncate_inode_pages_range is able to handle cases where lend + 1 is not
+ * page aligned properly.
  */
 void truncate_inode_pages_range(struct address_space *mapping,
loff_t lstart, loff_t lend)
 {
-   const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
-   const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
-   struct pagevec pvec;
-   pgoff_t index;
-   pgoff_t end;
-   int i;
+   pgoff_t start;  /* inclusive */
+   pgoff_t end;/* exclusive */
+   unsigned intpartial_start;  /* inclusive */
+   unsigned intpartial_end;/* exclusive */
+   struct pagevec  pvec;
+   pgoff_t index;
+   int i;
 
cleancache_invalidate_inode(mapping);
if (mapping->nrpages == 0)
return;
 
-   BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
-   end = (lend >> PAGE_CACHE_SHIFT);
+   /* Offsets within partial pages */
+   partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+   partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
+
+   /*
+* 'start' and 'end' always covers the range of pages to be fully
+* truncated. Partial pages are covered with 'partial_start' at the
+* start of the range and 'partial_end' at the end of the range.
+* Note that 'end' is exclusive while 'lend' is inclusive.
+*/
+   start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+   if (lend == -1)
+   /*
+* lend == -1 indicates end-of-file so we have to set 'end'
+* to the highest possible pgoff_t and since the type is
+* unsigned we're using -1.
+*/
+   end = -1;
+   else
+   end = (lend + 1) >> PAGE_CACHE_SHIFT;
 
pagevec_init(&pvec, 0);
index = start;
-   while (index <= end && pagevec_lookup(&pvec, mapping, index,
-   min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+   while (index < end && pagevec_lookup(&pvec, mapping, index,
+   min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
mem_cgroup_uncharge_start();
for (i

[PATCH v4 14/20] ext4: truncate_inode_pages() in orphan cleanup path

2013-05-14 Thread Lukas Czerner

Currently we do not tell mm to zero out the tail of the page before truncate
in orphan_cleanup(). This is ok, because the page should not be
uptodate, however this may eventually change and it might cause problems.

Call truncate_inode_pages() as a precautionary measure. Thanks Jan Kara
for pointing this out.

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/super.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dbc7c09..b971066 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2173,6 +2173,7 @@ static void ext4_orphan_cleanup(struct super_block *sb,
jbd_debug(2, "truncating inode %lu to %lld bytes\n",
  inode->i_ino, inode->i_size);
mutex_lock(&inode->i_mutex);
+   truncate_inode_pages(inode->i_mapping, inode->i_size);
ext4_truncate(inode);
mutex_unlock(&inode->i_mutex);
nr_truncates++;
-- 
1.7.7.6

[PATCH v4 16/20] ext4: remove unused discard_partial_page_buffers

2013-05-14 Thread Lukas Czerner

The discard_partial_page_buffers is no longer used anywhere so we can
simply remove it including the *_no_lock variant and
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.

Signed-off-by: Lukas Czerner lczer...@redhat.com
Reviewed-by: Jan Kara j...@suse.cz
---
 fs/ext4/ext4.h  |8 --
 fs/ext4/inode.c |  206 ---
 2 files changed, 0 insertions(+), 214 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2d4b0aa..019db3c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -581,11 +581,6 @@ enum {
 #define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER   0x0020
 
 /*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED  0x0001
-
-/*
  * ioctl commands
  */
 #define EXT4_IOC_GETFLAGS   FS_IOC_GETFLAGS
@@ -2102,9 +2097,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
 extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 loff_t lstart, loff_t lend);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
-   struct address_space *mapping, loff_t from,
-   loff_t length, int flags);
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
 extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 44333a5..f504efa 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -135,9 +135,6 @@ static void ext4_invalidatepage(struct page *page, unsigned 
int offset,
unsigned int length);
 static int __ext4_journalled_writepage(struct page *page, unsigned int len);
 static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head 
*bh);
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
-   struct inode *inode, struct page *page, loff_t from,
-   loff_t length, int flags);
 
 /*
  * Test whether an inode is a fast symlink.
@@ -3366,209 +3363,6 @@ void ext4_set_aops(struct inode *inode)
inode->i_mapping->a_ops = &ext4_aops;
 }
 
-
-/*
- * ext4_discard_partial_page_buffers()
- * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
- * This function finds and locks the page containing the offset
- * from and passes it to ext4_discard_partial_page_buffers_no_lock.
- * Calling functions that already have the page locked should call
- * ext4_discard_partial_page_buffers_no_lock directly.
- */
-int ext4_discard_partial_page_buffers(handle_t *handle,
-   struct address_space *mapping, loff_t from,
-   loff_t length, int flags)
-{
-   struct inode *inode = mapping->host;
-   struct page *page;
-   int err = 0;
-
-   page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
-  mapping_gfp_mask(mapping) & ~__GFP_FS);
-   if (!page)
-   return -ENOMEM;
-
-   err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
-   from, length, flags);
-
-   unlock_page(page);
-   page_cache_release(page);
-   return err;
-}
-
-/*
- * ext4_discard_partial_page_buffers_no_lock()
- * Zeros a page range of length 'length' starting from offset 'from'.
- * Buffer heads that correspond to the block aligned regions of the
- * zeroed range will be unmapped.  Unblock aligned regions
- * will have the corresponding buffer head mapped if needed so that
- * that region of the page can be updated with the partial zero out.
- *
- * This function assumes that the page has already been  locked.  The
- * The range to be discarded must be contained with in the given page.
- * If the specified range exceeds the end of the page it will be shortened
- * to the end of the page that corresponds to 'from'.  This function is
- * appropriate for updating a page and it buffer heads to be unmapped and
- * zeroed for blocks that have been either released, or are going to be
- * released.
- *
- * handle: The journal handle
- * inode:  The files inode
- * page:   A locked page that contains the offset from
- * from:   The starting byte offset (from the beginning of the file)
- * to begin discarding
- * len:The length of bytes to discard
- * flags:  Optional flags that may be used:
- *
- * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
- * Only zero the regions of the page whose buffer heads
- * have already been unmapped.  This flag is appropriate
- * for updating the contents of a page whose blocks may
- * have already been released, and we only want to zero
- * out the regions that correspond to those released blocks.
- *
- * Returns zero on success or negative on failure.
- */
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle

[PATCH v4 13/20] Revert "ext4: fix fsx truncate failure"

2013-05-14 Thread Lukas Czerner

This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.

This commit reintroduces the use of ext4_block_truncate_page() in the ext4
truncate operation instead of ext4_discard_partial_page_buffers().

The statement in the commit description that the truncate operation only
zeroes the block-unaligned portion of the last page is not exactly right,
since truncate_pagecache_range() also zeroes and invalidates the unaligned
portion of the page. There is then no need to zero and unmap it once more,
and ext4_block_truncate_page() was doing the right job, although we
still need to update the buffer head containing the last block, which is
exactly what ext4_block_truncate_page() is doing.

Moreover, the problem described in the commit is fixed more properly by
commit

15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

This was tested on ppc64 machine with block size of 1024 bytes without
any problems.

Signed-off-by: Lukas Czerner lczer...@redhat.com
Reviewed-by: Jan Kara j...@suse.cz
---
 fs/ext4/inode.c |   11 ++-
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8187c3e..34ebb62 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3937,7 +3937,6 @@ void ext4_truncate(struct inode *inode)
unsigned int credits;
handle_t *handle;
struct address_space *mapping = inode->i_mapping;
-   loff_t page_len;
 
/*
 * There is a possibility that we're either freeing the inode
@@ -3981,14 +3980,8 @@ void ext4_truncate(struct inode *inode)
return;
}
 
-   if (inode->i_size % PAGE_CACHE_SIZE != 0) {
-   page_len = PAGE_CACHE_SIZE -
-   (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
-   if (ext4_discard_partial_page_buffers(handle,
-   mapping, inode->i_size, page_len, 0))
-   goto out_stop;
-   }
+   if (inode->i_size & (inode->i_sb->s_blocksize - 1))
+   ext4_block_truncate_page(handle, mapping, inode->i_size);
 
/*
 * We add the inode to the orphan list, so that if this
-- 
1.7.7.6

[PATCH v4 11/20] Revert "ext4: remove no longer used functions in inode.c"

2013-05-14 Thread Lukas Czerner

This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.

This commit reintroduces the functions ext4_block_truncate_page() and
ext4_block_zero_page_range(), which were previously removed in favour
of ext4_discard_partial_page_buffers().

In future commits we want to reintroduce the use of those functions and
remove ext4_discard_partial_page_buffers(), since it duplicates some code
and also partially duplicates the work of truncate_pagecache_range();
moreover, the old implementation was much clearer.

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/ext4.h  |4 ++
 fs/ext4/inode.c |  120 +++
 2 files changed, 124 insertions(+), 0 deletions(-)
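
To make the length arithmetic in the two reintroduced helpers easier to follow, here is a small userspace sketch. It assumes the block size is a power of two no larger than the page size, so masking the file offset directly gives the same result as masking the in-page offset; tail_len() and clamp_zero_len() are hypothetical names, not ext4 functions.

```c
#include <assert.h>

/* Bytes from file offset 'from' up to the end of the block containing it. */
static unsigned tail_len(unsigned long long from, unsigned blocksize)
{
	/* blocksize must be a non-zero power of two */
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	return blocksize - (unsigned)(from & (blocksize - 1));
}

/*
 * Shorten 'length' so the zeroed range never crosses the end of the block
 * containing 'from'. A negative length is treated as "to end of block".
 */
static long long clamp_zero_len(unsigned long long from, long long length,
				unsigned blocksize)
{
	long long max = tail_len(from, blocksize);

	if (length > max || length < 0)
		length = max;
	return length;
}
```

With 1024-byte blocks and a file offset of 5000, the block ends 120 bytes later, so any request longer than 120 bytes is shortened to 120.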

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5aae3d1..9f9719f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2096,6 +2096,10 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
+extern int ext4_block_truncate_page(handle_t *handle,
+   struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+   struct address_space *mapping, loff_t from, loff_t length);
 extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ae58749..11c07e1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3569,6 +3569,126 @@ next:
return err;
 }
 
+/*
+ * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This required during truncate. We need to physically zero the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ */
+int ext4_block_truncate_page(handle_t *handle,
+   struct address_space *mapping, loff_t from)
+{
+   unsigned offset = from & (PAGE_CACHE_SIZE-1);
+   unsigned length;
+   unsigned blocksize;
+   struct inode *inode = mapping->host;
+
+   blocksize = inode->i_sb->s_blocksize;
+   length = blocksize - (offset & (blocksize - 1));
+
+   return ext4_block_zero_page_range(handle, mapping, from, length);
+}
+
+/*
+ * ext4_block_zero_page_range() zeros out a mapping of length 'length'
+ * starting from file offset 'from'.  The range to be zero'd must
+ * be contained with in one block.  If the specified range exceeds
+ * the end of the block it will be shortened to end of the block
+ * that cooresponds to 'from'
+ */
+int ext4_block_zero_page_range(handle_t *handle,
+   struct address_space *mapping, loff_t from, loff_t length)
+{
+   ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
+   unsigned offset = from & (PAGE_CACHE_SIZE-1);
+   unsigned blocksize, max, pos;
+   ext4_lblk_t iblock;
+   struct inode *inode = mapping->host;
+   struct buffer_head *bh;
+   struct page *page;
+   int err = 0;
+
+   page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
+  mapping_gfp_mask(mapping) & ~__GFP_FS);
+   if (!page)
+   return -ENOMEM;
+
+   blocksize = inode->i_sb->s_blocksize;
+   max = blocksize - (offset & (blocksize - 1));
+
+   /*
+* correct length if it does not fall between
+* 'from' and the end of the block
+*/
+   if (length > max || length < 0)
+   length = max;
+
+   iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+
+   if (!page_has_buffers(page))
+   create_empty_buffers(page, blocksize, 0);
+
+   /* Find the buffer that contains offset */
+   bh = page_buffers(page);
+   pos = blocksize;
+   while (offset >= pos) {
+   bh = bh->b_this_page;
+   iblock++;
+   pos += blocksize;
+   }
+
+   err = 0;
+   if (buffer_freed(bh)) {
+   BUFFER_TRACE(bh, "freed: skip");
+   goto unlock;
+   }
+
+   if (!buffer_mapped(bh)) {
+   BUFFER_TRACE(bh, "unmapped");
+   ext4_get_block(inode, iblock, bh, 0);
+   /* unmapped? It's a hole - nothing to do */
+   if (!buffer_mapped(bh)) {
+   BUFFER_TRACE(bh, "still unmapped");
+   goto unlock;
+   }
+   }
+
+   /* Ok, it's mapped. Make sure it's up-to-date */
+   if (PageUptodate(page))
+   set_buffer_uptodate(bh);
+
+   if (!buffer_uptodate(bh)) {
+   err = -EIO;
+   ll_rw_block(READ, 1, bh);
+   wait_on_buffer(bh);
+   /* Uhhuh. Read error. Complain and punt. */
+   if (!buffer_uptodate(bh))
+   goto unlock

[PATCH v4 08/20] gfs2: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in gfs2_invalidatepage().

Signed-off-by: Lukas Czerner lczer...@redhat.com
Acked-by: Steven Whitehouse swhit...@redhat.com
Cc: cluster-de...@redhat.com
---
 fs/gfs2/aops.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 37093ba..ea920bf 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -947,24 +947,29 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
 {
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
+   unsigned int stop = offset + length;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
struct buffer_head *bh, *head;
unsigned long pos = 0;
 
BUG_ON(!PageLocked(page));
-   if (offset == 0)
+   if (!partial_page)
ClearPageChecked(page);
if (!page_has_buffers(page))
goto out;
 
bh = head = page_buffers(page);
do {
+   if (pos + bh->b_size > stop)
+   return;
+
if (offset <= pos)
gfs2_discard(sdp, bh);
pos += bh->b_size;
bh = bh->b_this_page;
} while (bh != head);
 out:
-   if (offset == 0)
+   if (!partial_page)
try_to_release_page(page, 0);
 }
 
-- 
1.7.7.6

[PATCH v4 07/20] ceph: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in ceph_invalidatepage().

Signed-off-by: Lukas Czerner lczer...@redhat.com
Acked-by: Sage Weil s...@inktank.com
Cc: ceph-de...@vger.kernel.org
---
 fs/ceph/addr.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 168a35a..d953afd 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -164,20 +164,20 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
if (!PageDirty(page))
pr_err("%p invalidatepage %p page not dirty\n", inode, page);
 
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
ci = ceph_inode(inode);
-   if (offset == 0) {
-   dout("%p invalidatepage %p idx %lu full dirty page %u\n",
-inode, page, page->index, offset);
+   if (offset == 0 && length == PAGE_CACHE_SIZE) {
+   dout("%p invalidatepage %p idx %lu full dirty page\n",
+inode, page, page->index);
ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
ceph_put_snap_context(snapc);
page->private = 0;
ClearPagePrivate(page);
} else {
-   dout("%p invalidatepage %p idx %lu partial dirty page\n",
-inode, page, page->index);
+   dout("%p invalidatepage %p idx %lu partial dirty page %u(%u)\n",
+inode, page, page->index, offset, length);
}
 }
 
-- 
1.7.7.6

[PATCH v4 01/20] mm: change invalidatepage prototype to accept length

2013-05-14 Thread Lukas Czerner

Currently there is no way to truncate a partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the existing functionality was enough for the file system
truncate operation to work properly. However, more file systems now support
the punch hole feature, and it can benefit from mm supporting truncating a
page just up to a certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating a partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept the range to be invalidated and updates all the
instances of it.

We also change block_invalidatepage() in the same way and actually
make use of the new length argument, implementing range invalidation.

Actual file system implementations will follow, except for the file systems
where the changes are really simple and should not change the behaviour
in any way. An implementation of truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.

Signed-off-by: Lukas Czerner lczer...@redhat.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Hugh Dickins hu...@google.com
---
 Documentation/filesystems/Locking |6 +++---
 Documentation/filesystems/vfs.txt |   20 ++--
 fs/9p/vfs_addr.c  |5 +++--
 fs/afs/file.c |   10 ++
 fs/btrfs/disk-io.c|3 ++-
 fs/btrfs/extent_io.c  |2 +-
 fs/btrfs/inode.c  |3 ++-
 fs/buffer.c   |   21 ++---
 fs/ceph/addr.c|5 +++--
 fs/cifs/file.c|5 +++--
 fs/exofs/inode.c  |6 --
 fs/ext3/inode.c   |3 ++-
 fs/ext4/inode.c   |   18 +++---
 fs/f2fs/data.c|3 ++-
 fs/f2fs/node.c|3 ++-
 fs/gfs2/aops.c|8 +---
 fs/jfs/jfs_metapage.c |5 +++--
 fs/logfs/file.c   |3 ++-
 fs/logfs/segment.c|3 ++-
 fs/nfs/file.c |8 +---
 fs/ntfs/aops.c|2 +-
 fs/ocfs2/aops.c   |3 ++-
 fs/reiserfs/inode.c   |3 ++-
 fs/ubifs/file.c   |5 +++--
 fs/xfs/xfs_aops.c |7 ---
 include/linux/buffer_head.h   |3 ++-
 include/linux/fs.h|2 +-
 include/linux/mm.h|3 ++-
 mm/readahead.c|2 +-
 mm/truncate.c |   15 +--
 30 files changed, 116 insertions(+), 69 deletions(-)
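
The shape of the prototype change can be illustrated with a userspace sketch. Nothing here is kernel API: struct sketch_page, sketch_invalidate_fn, sketch_invalidatepage() and sketch_invalidate_tail() are hypothetical names, and the "page" merely records which byte range it was asked to invalidate. The sketch assumes 4 KiB pages.

```c
#include <assert.h>

/* Hypothetical stand-in for PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Toy page that only remembers the last invalidated range [start, end). */
struct sketch_page {
	unsigned inv_start;
	unsigned inv_end;
};

/* New-style callback: both the start and the length of the range. */
typedef void (*sketch_invalidate_fn)(struct sketch_page *page,
				     unsigned int offset,
				     unsigned int length);

static void sketch_invalidatepage(struct sketch_page *page,
				  unsigned int offset, unsigned int length)
{
	/* the range must stay within one page */
	assert(offset <= SKETCH_PAGE_SIZE);
	assert(length <= SKETCH_PAGE_SIZE - offset);
	page->inv_start = offset;
	page->inv_end = offset + length;
}

/*
 * An old-style call meant "from offset to the end of the page". A caller
 * that wants to keep that behaviour passes SKETCH_PAGE_SIZE - offset.
 */
static void sketch_invalidate_tail(sketch_invalidate_fn fn,
				   struct sketch_page *page,
				   unsigned int offset)
{
	fn(page, offset, SKETCH_PAGE_SIZE - offset);
}
```

A caller converted mechanically keeps its old behaviour by going through the tail helper, while a punch hole path can call the callback directly with a shorter length.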

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 0706d32..cbbac3f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -189,7 +189,7 @@ prototypes:
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
-   int (*invalidatepage) (struct page *, unsigned long);
+   void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -310,8 +310,8 @@ filesystems and by the swapper. The latter will eventually go away.  Please,
 keep it that way and don't breed new callers.
 
 ->invalidatepage() is called when the filesystem must attempt to drop
-some or all of the buffers from the page when it is being truncated.  It
-returns zero on success.  If ->invalidatepage is zero, the kernel uses
+some or all of the buffers from the page when it is being truncated. It
+returns zero on success. If ->invalidatepage is zero, the kernel uses
 block_invalidatepage() instead.
 
 ->releasepage() is called when the kernel is about to try to drop the
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index bc4b06b..e445b95 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -549,7 +549,7 @@ struct address_space_operations
 ---
 
 This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.22, the following members are defined:
+your filesystem. The following members are defined:
 
 struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -566,7 +566,7 @@ struct address_space_operations {
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space

[PATCH v4 02/20] jbd2: change jbd2_journal_invalidatepage to accept length

2013-05-14 Thread Lukas Czerner

invalidatepage now accepts a range to invalidate, and there are two file
systems using jbd2 that also implement the punch hole feature and can
benefit from this. We need to implement the same thing in the jbd2 layer in
order to allow those file systems to take advantage of this functionality.

This commit adds length argument to the jbd2_journal_invalidatepage()
and updates all instances in ext4 and ocfs2.

Signed-off-by: Lukas Czerner lczer...@redhat.com
Reviewed-by: Jan Kara j...@suse.cz
---
 fs/ext4/inode.c   |3 ++-
 fs/jbd2/transaction.c |   24 +---
 fs/ocfs2/aops.c   |3 ++-
 include/linux/jbd2.h  |2 +-
 4 files changed, 22 insertions(+), 10 deletions(-)
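
The sanity check added to jbd2_journal_invalidatepage() guards against two things at once: a range that runs past the end of the page, and an unsigned wrap-around of offset + length. A minimal userspace sketch of that check follows; range_is_valid() and checked_stop() are hypothetical names and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Returns non-zero when [offset, offset + length) fits inside one page.
 * If the unsigned addition wraps, 'stop' ends up smaller than 'length',
 * which is what the second half of the condition detects.
 */
static int range_is_valid(unsigned int offset, unsigned int length)
{
	unsigned int stop = offset + length;	/* may wrap around */

	return !(stop > SKETCH_PAGE_SIZE || stop < length);
}

/* End of the range, asserting that the range is sane first. */
static unsigned int checked_stop(unsigned int offset, unsigned int length)
{
	assert(range_is_valid(offset, length));
	return offset + length;
}
```

The callers touched by this patch pass PAGE_CACHE_SIZE - offset as the length, so for them the end of the range is always exactly the end of the page.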

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2930eb7..96d5927 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3014,7 +3014,8 @@ static int __ext4_journalled_invalidatepage(struct page 
*page,
if (offset == 0)
ClearPageChecked(page);
 
-   return jbd2_journal_invalidatepage(journal, page, offset);
+   return jbd2_journal_invalidatepage(journal, page, offset,
+  PAGE_CACHE_SIZE - offset);
 }
 
 /* Wrapper for aops... */
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 10f524c..5d8268a 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2034,18 +2034,23 @@ zap_buffer_unlocked:
  * void jbd2_journal_invalidatepage()
  * @journal: journal to use for flush...
  * @page:page to flush
- * @offset:  length of page to invalidate.
+ * @offset:  start of the range to invalidate
+ * @length:  length of the range to invalidate
  *
- * Reap page buffers containing data after offset in page. Can return -EBUSY
- * if buffers are part of the committing transaction and the page is straddling
- * i_size. Caller then has to wait for current commit and try again.
+ * Reap page buffers containing data after in the specified range in page.
+ * Can return -EBUSY if buffers are part of the committing transaction and
+ * the page is straddling i_size. Caller then has to wait for current commit
+ * and try again.
  */
 int jbd2_journal_invalidatepage(journal_t *journal,
struct page *page,
-   unsigned long offset)
+   unsigned int offset,
+   unsigned int length)
 {
struct buffer_head *head, *bh, *next;
+   unsigned int stop = offset + length;
unsigned int curr_off = 0;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
int ret = 0;
 
@@ -2054,6 +2059,8 @@ int jbd2_journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return 0;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
 * data lists (especially for journaled data mode), so be
 * cautious in our locking. */
@@ -2063,10 +2070,13 @@ int jbd2_journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   return 0;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
-   ret = journal_unmap_buffer(journal, bh, offset > 0);
+   ret = journal_unmap_buffer(journal, bh, partial_page);
unlock_buffer(bh);
if (ret < 0)
return ret;
@@ -2077,7 +2087,7 @@ int jbd2_journal_invalidatepage(journal_t *journal,
 
} while (bh != head);
 
-   if (!offset) {
+   if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index ecb86ca..7c47755 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,7 +608,8 @@ static void ocfs2_invalidatepage(struct page *page, 
unsigned int offset,
 {
journal_t *journal = 
OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
 
-   jbd2_journal_invalidatepage(journal, page, offset);
+   jbd2_journal_invalidatepage(journal, page, offset,
+   PAGE_CACHE_SIZE - offset);
 }
 
 static int ocfs2_releasepage(struct page *page, gfp_t wait)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 6e051f4..682a63c 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1090,7 +1090,7 @@ extern int jbd2_journal_dirty_metadata (handle_t 
*, struct buffer_head *);
 extern int  jbd2_journal_forget (handle_t *, struct buffer_head *);
 extern void journal_sync_buffer (struct buffer_head *);
 extern int  jbd2_journal_invalidatepage(journal_t

[PATCH v4 06/20] ocfs2: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in ocfs2_invalidatepage().

Signed-off-by: Lukas Czerner lczer...@redhat.com
Reviewed-by: Jan Kara j...@suse.cz
Acked-by: Joel Becker jl...@evilplan.org
---
 fs/ocfs2/aops.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7c47755..79736a2 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,8 +608,7 @@ static void ocfs2_invalidatepage(struct page *page, 
unsigned int offset,
 {
journal_t *journal = 
OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
 
-   jbd2_journal_invalidatepage(journal, page, offset,
-   PAGE_CACHE_SIZE - offset);
+   jbd2_journal_invalidatepage(journal, page, offset, length);
 }
 
 static int ocfs2_releasepage(struct page *page, gfp_t wait)
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 05/20] xfs: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in xfs_vm_invalidatepage()

Signed-off-by: Lukas Czerner lczer...@redhat.com
Reviewed-by: Ben Myers b...@sgi.com
Cc: x...@oss.sgi.com
---
v4: use xfs_page_class instead of separate tracepoint

 fs/xfs/xfs_aops.c  |9 +
 fs/xfs/xfs_trace.h |   15 ++-
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e426796..55c85ec 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -826,8 +826,9 @@ xfs_vm_invalidatepage(
unsigned intoffset,
unsigned intlength)
 {
-   trace_xfs_invalidatepage(page->mapping->host, page, offset);
-   block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+   trace_xfs_invalidatepage(page->mapping->host, page, offset,
+length);
+   block_invalidatepage(page, offset, length);
 }
 
 /*
@@ -921,7 +922,7 @@ xfs_vm_writepage(
int count = 0;
int nonblocking = 0;
 
-   trace_xfs_writepage(inode, page, 0);
+   trace_xfs_writepage(inode, page, 0, 0);
 
ASSERT(page_has_buffers(page));
 
@@ -1152,7 +1153,7 @@ xfs_vm_releasepage(
 {
int delalloc, unwritten;
 
-   trace_xfs_releasepage(page->mapping->host, page, 0);
+   trace_xfs_releasepage(page->mapping->host, page, 0, 0);
 
xfs_count_page_state(page, &delalloc, &unwritten);
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 16a8129..7f075ed 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -950,14 +950,16 @@ DEFINE_RW_EVENT(xfs_file_splice_read);
 DEFINE_RW_EVENT(xfs_file_splice_write);
 
 DECLARE_EVENT_CLASS(xfs_page_class,
-   TP_PROTO(struct inode *inode, struct page *page, unsigned long off),
-   TP_ARGS(inode, page, off),
+   TP_PROTO(struct inode *inode, struct page *page, unsigned long off,
+unsigned int len),
+   TP_ARGS(inode, page, off, len),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_ino_t, ino)
__field(pgoff_t, pgoff)
__field(loff_t, size)
__field(unsigned long, offset)
+   __field(unsigned int, length)
__field(int, delalloc)
__field(int, unwritten)
),
@@ -971,24 +973,27 @@ DECLARE_EVENT_CLASS(xfs_page_class,
__entry->pgoff = page_offset(page);
__entry->size = i_size_read(inode);
__entry->offset = off;
+   __entry->length = len;
__entry->delalloc = delalloc;
__entry->unwritten = unwritten;
),
TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %lx "
- "delalloc %d unwritten %d",
+ "length %x delalloc %d unwritten %d",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  __entry->ino,
  __entry->pgoff,
  __entry->size,
  __entry->offset,
+ __entry->length,
  __entry->delalloc,
  __entry->unwritten)
 )
 
 #define DEFINE_PAGE_EVENT(name)\
 DEFINE_EVENT(xfs_page_class, name, \
-   TP_PROTO(struct inode *inode, struct page *page, unsigned long off), \
-   TP_ARGS(inode, page, off))
+   TP_PROTO(struct inode *inode, struct page *page, unsigned long off, \
+unsigned int len), \
+   TP_ARGS(inode, page, off, len))
 DEFINE_PAGE_EVENT(xfs_writepage);
 DEFINE_PAGE_EVENT(xfs_releasepage);
 DEFINE_PAGE_EVENT(xfs_invalidatepage);
-- 
1.7.7.6


[PATCH v4 03/20] ext4: use ->invalidatepage() length argument

2013-05-14 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in all ext4 invalidatepage routines.

Signed-off-by: Lukas Czerner lczer...@redhat.com
Reviewed-by: Jan Kara j...@suse.cz
---
 fs/ext4/inode.c |   30 +++---
 include/trace/events/ext4.h |   22 --
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 96d5927..ae58749 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1415,21 +1415,28 @@ static void ext4_da_release_space(struct inode *inode, 
int to_free)
 }
 
 static void ext4_da_page_release_reservation(struct page *page,
-unsigned long offset)
+unsigned int offset,
+unsigned int length)
 {
int to_release = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   unsigned int stop = offset + length;
int num_clusters;
ext4_fsblk_t lblk;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;
 
+   if (next_off > stop)
+   break;
+
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
@@ -2839,7 +2846,7 @@ static void ext4_da_invalidatepage(struct page *page, 
unsigned int offset,
if (!page_has_buffers(page))
goto out;
 
-   ext4_da_page_release_reservation(page, offset);
+   ext4_da_page_release_reservation(page, offset, length);
 
 out:
ext4_invalidatepage(page, offset, length);
@@ -2993,29 +3000,29 @@ ext4_readpages(struct file *file, struct address_space 
*mapping,
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
 {
-   trace_ext4_invalidatepage(page, offset);
+   trace_ext4_invalidatepage(page, offset, length);
 
/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));
 
-   block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+   block_invalidatepage(page, offset, length);
 }
 
 static int __ext4_journalled_invalidatepage(struct page *page,
-   unsigned long offset)
+   unsigned int offset,
+   unsigned int length)
 {
journal_t *journal = EXT4_JOURNAL(page->mapping->host);
 
-   trace_ext4_journalled_invalidatepage(page, offset);
+   trace_ext4_journalled_invalidatepage(page, offset, length);
 
/*
 * If it's a full truncate we just forget about the pending dirtying
 */
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
-   return jbd2_journal_invalidatepage(journal, page, offset,
-  PAGE_CACHE_SIZE - offset);
+   return jbd2_journal_invalidatepage(journal, page, offset, length);
 }
 
 /* Wrapper for aops... */
@@ -3023,7 +3030,7 @@ static void ext4_journalled_invalidatepage(struct page 
*page,
   unsigned int offset,
   unsigned int length)
 {
-   WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
+   WARN_ON(__ext4_journalled_invalidatepage(page, offset, length) < 0);
 }
 
 static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -4627,7 +4634,8 @@ static void ext4_wait_for_tail_page_commit(struct inode 
*inode)
  inode->i_size >> PAGE_CACHE_SHIFT);
if (!page)
return;
-   ret = __ext4_journalled_invalidatepage(page, offset);
+   ret = __ext4_journalled_invalidatepage(page, offset,
+   PAGE_CACHE_SIZE - offset);
unlock_page(page);
page_cache_release(page);
if (ret != -EBUSY)
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 8ee15b9..dcfce96 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -444,16 +444,16 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
 );
 
 DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
-   TP_PROTO(struct page *page, unsigned long offset),
+   TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
 
-   TP_ARGS(page, offset),
+   TP_ARGS(page, offset, length),
 
TP_STRUCT__entry(
__field(dev_t,  dev

[PATCH v3 01/18] mm: change invalidatepage prototype to accept length

2013-04-09 Thread Lukas Czerner

Currently there is no way to truncate a partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the existing functionality was enough for the file system
truncate operation to work properly. However, more file systems now
support the punch hole feature, and it can benefit from mm supporting
truncation of a page just up to a certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so that it supports truncating a partial page at the end of
the range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept the range to be invalidated and updates all the
instances of it.

We also change block_invalidatepage() in the same way and actually
make use of the new length argument to implement range invalidation.

Actual file system implementations will follow, except for the file
systems where the changes are really simple and should not change the
behaviour in any way. An implementation of truncate_page_range() that is
able to accept page-unaligned ranges will follow as well.

Signed-off-by: Lukas Czerner 
Cc: Andrew Morton 
Cc: Hugh Dickins 
---
 Documentation/filesystems/Locking |6 +++---
 Documentation/filesystems/vfs.txt |   20 ++--
 fs/9p/vfs_addr.c  |5 +++--
 fs/afs/file.c |   10 ++
 fs/btrfs/disk-io.c|3 ++-
 fs/btrfs/extent_io.c  |2 +-
 fs/btrfs/inode.c  |3 ++-
 fs/buffer.c   |   21 ++---
 fs/ceph/addr.c|5 +++--
 fs/cifs/file.c|5 +++--
 fs/exofs/inode.c  |6 --
 fs/ext3/inode.c   |3 ++-
 fs/ext4/inode.c   |   18 +++---
 fs/f2fs/data.c|3 ++-
 fs/f2fs/node.c|3 ++-
 fs/gfs2/aops.c|8 +---
 fs/jfs/jfs_metapage.c |5 +++--
 fs/logfs/file.c   |3 ++-
 fs/logfs/segment.c|3 ++-
 fs/nfs/file.c |8 +---
 fs/ntfs/aops.c|2 +-
 fs/ocfs2/aops.c   |3 ++-
 fs/reiserfs/inode.c   |3 ++-
 fs/ubifs/file.c   |5 +++--
 fs/xfs/xfs_aops.c |7 ---
 include/linux/buffer_head.h   |3 ++-
 include/linux/fs.h|2 +-
 include/linux/mm.h|3 ++-
 mm/readahead.c|2 +-
 mm/truncate.c |   15 +--
 30 files changed, 116 insertions(+), 69 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 0706d32..cbbac3f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -189,7 +189,7 @@ prototypes:
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
-   int (*invalidatepage) (struct page *, unsigned long);
+   void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -310,8 +310,8 @@ filesystems and by the swapper. The latter will eventually 
go away.  Please,
 keep it that way and don't breed new callers.
 
->invalidatepage() is called when the filesystem must attempt to drop
-some or all of the buffers from the page when it is being truncated.  It
-returns zero on success.  If ->invalidatepage is zero, the kernel uses
+some or all of the buffers from the page when it is being truncated. It
+returns zero on success. If ->invalidatepage is zero, the kernel uses
 block_invalidatepage() instead.
 
->releasepage() is called when the kernel is about to try to drop the
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index bc4b06b..e445b95 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -549,7 +549,7 @@ struct address_space_operations
 ---
 
 This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.22, the following members are defined:
+your filesystem. The following members are defined:
 
 struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -566,7 +566,7 @@ struct address_space_operations {
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
-   int (*invalidatepage)

[PATCH v3 03/18] ext4: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in all ext4 invalidatepage routines.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/inode.c |   30 +++---
 include/trace/events/ext4.h |   22 --
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 69595f5..f80e0c3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1411,21 +1411,28 @@ static void ext4_da_release_space(struct inode *inode, 
int to_free)
 }
 
 static void ext4_da_page_release_reservation(struct page *page,
-unsigned long offset)
+unsigned int offset,
+unsigned int length)
 {
int to_release = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   unsigned int stop = offset + length;
int num_clusters;
ext4_fsblk_t lblk;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;
 
+   if (next_off > stop)
+   break;
+
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
@@ -2825,7 +2832,7 @@ static void ext4_da_invalidatepage(struct page *page, 
unsigned int offset,
if (!page_has_buffers(page))
goto out;
 
-   ext4_da_page_release_reservation(page, offset);
+   ext4_da_page_release_reservation(page, offset, length);
 
 out:
ext4_invalidatepage(page, offset, length);
@@ -2979,29 +2986,29 @@ ext4_readpages(struct file *file, struct address_space 
*mapping,
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
 {
-   trace_ext4_invalidatepage(page, offset);
+   trace_ext4_invalidatepage(page, offset, length);
 
/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));
 
-   block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+   block_invalidatepage(page, offset, length);
 }
 
 static int __ext4_journalled_invalidatepage(struct page *page,
-   unsigned long offset)
+   unsigned int offset,
+   unsigned int length)
 {
journal_t *journal = EXT4_JOURNAL(page->mapping->host);
 
-   trace_ext4_journalled_invalidatepage(page, offset);
+   trace_ext4_journalled_invalidatepage(page, offset, length);
 
/*
 * If it's a full truncate we just forget about the pending dirtying
 */
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
-   return jbd2_journal_invalidatepage(journal, page, offset,
-  PAGE_CACHE_SIZE - offset);
+   return jbd2_journal_invalidatepage(journal, page, offset, length);
 }
 
 /* Wrapper for aops... */
@@ -3009,7 +3016,7 @@ static void ext4_journalled_invalidatepage(struct page 
*page,
   unsigned int offset,
   unsigned int length)
 {
-   WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
+   WARN_ON(__ext4_journalled_invalidatepage(page, offset, length) < 0);
 }
 
 static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -4607,7 +4614,8 @@ static void ext4_wait_for_tail_page_commit(struct inode 
*inode)
  inode->i_size >> PAGE_CACHE_SHIFT);
if (!page)
return;
-   ret = __ext4_journalled_invalidatepage(page, offset);
+   ret = __ext4_journalled_invalidatepage(page, offset,
+   PAGE_CACHE_SIZE - offset);
unlock_page(page);
page_cache_release(page);
if (ret != -EBUSY)
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 58459b7..60b329a 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -444,16 +444,16 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
 );
 
 DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
-   TP_PROTO(struct page *page, unsigned long offset),
+   TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
 
-   TP_ARGS(page, offset),
+   TP_ARGS(page, offset, length),

[PATCH v3 04/18] jbd: change journal_invalidatepage() to accept length

2013-04-09 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in journal_invalidatepage() and all the users in ext3 file
system. Also update ext3 trace point to print out length argument.

Signed-off-by: Lukas Czerner 
---
 fs/ext3/inode.c |6 +++---
 fs/jbd/transaction.c|   19 ++-
 include/linux/jbd.h |2 +-
 include/trace/events/ext3.h |   12 +++-
 4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 349d4ce..b12936b 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1828,15 +1828,15 @@ static void ext3_invalidatepage(struct page *page, 
unsigned int offset,
 {
journal_t *journal = EXT3_JOURNAL(page->mapping->host);
 
-   trace_ext3_invalidatepage(page, offset);
+   trace_ext3_invalidatepage(page, offset, length);
 
/*
 * If it's a full truncate we just forget about the pending dirtying
 */
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
-   journal_invalidatepage(journal, page, offset);
+   journal_invalidatepage(journal, page, offset, length);
 }
 
 static int ext3_releasepage(struct page *page, gfp_t wait)
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 071d690..a1fef89 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -2020,16 +2020,20 @@ zap_buffer_unlocked:
  * void journal_invalidatepage() - invalidate a journal page
  * @journal: journal to use for flush
  * @page:page to flush
- * @offset:  length of page to invalidate.
+ * @offset:  offset of the range to invalidate
+ * @length:  length of the range to invalidate
  *
- * Reap page buffers containing data after offset in page.
+ * Reap page buffers containing data in specified range in page.
  */
 void journal_invalidatepage(journal_t *journal,
  struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
 {
struct buffer_head *head, *bh, *next;
+   unsigned int stop = offset + length;
unsigned int curr_off = 0;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
 
if (!PageLocked(page))
@@ -2037,6 +2041,8 @@ void journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
 * data lists (especially for journaled data mode), so be
 * cautious in our locking. */
@@ -2046,11 +2052,14 @@ void journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   return;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
may_free &= journal_unmap_buffer(journal, bh,
-offset > 0);
+partial_page);
unlock_buffer(bh);
}
curr_off = next_off;
@@ -2058,7 +2067,7 @@ void journal_invalidatepage(journal_t *journal,
 
} while (bh != head);
 
-   if (!offset) {
+   if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index c8f3297..d02e16c 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -840,7 +840,7 @@ extern void  journal_release_buffer (handle_t *, struct 
buffer_head *);
 extern int  journal_forget (handle_t *, struct buffer_head *);
 extern void journal_sync_buffer (struct buffer_head *);
 extern void journal_invalidatepage(journal_t *,
-   struct page *, unsigned long);
+   struct page *, unsigned int, unsigned int);
 extern int  journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
 extern int  journal_stop(handle_t *);
 extern int  journal_flush (journal_t *);
diff --git a/include/trace/events/ext3.h b/include/trace/events/ext3.h
index 15d11a3..6797b9d 100644
--- a/include/trace/events/ext3.h
+++ b/include/trace/events/ext3.h
@@ -290,13 +290,14 @@ DEFINE_EVENT(ext3__page_op, ext3_releasepage,
 );
 
 TRACE_EVENT(ext3_invalidatepage,
-   TP_PROTO(struct page *page, unsigned long offset),
+   TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
 
-   TP_ARGS(page, offset),
+   TP_ARGS(page, offset, length),
 
TP_S

[PATCH v3 07/18] ceph: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in ceph_invalidatepage().

Signed-off-by: Lukas Czerner 
Cc: ceph-de...@vger.kernel.org
---
 fs/ceph/addr.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 168a35a..d953afd 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -164,20 +164,20 @@ static void ceph_invalidatepage(struct page *page, 
unsigned int offset,
if (!PageDirty(page))
pr_err("%p invalidatepage %p page not dirty\n", inode, page);
 
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
ci = ceph_inode(inode);
-   if (offset == 0) {
-   dout("%p invalidatepage %p idx %lu full dirty page %u\n",
-inode, page, page->index, offset);
+   if (offset == 0 && length == PAGE_CACHE_SIZE) {
+   dout("%p invalidatepage %p idx %lu full dirty page\n",
+inode, page, page->index);
ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
ceph_put_snap_context(snapc);
page->private = 0;
ClearPagePrivate(page);
} else {
-   dout("%p invalidatepage %p idx %lu partial dirty page\n",
-inode, page, page->index);
+   dout("%p invalidatepage %p idx %lu partial dirty page %u(%u)\n",
+inode, page, page->index, offset, length);
}
 }
 
-- 
1.7.7.6
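
The "full page" test used in the hunk above can be sketched in user space. This is only an illustration under stated assumptions, not kernel code: SKETCH_PAGE_SIZE, covers_whole_page() and legacy_length() are hypothetical names, and a 4096-byte page is assumed in place of PAGE_CACHE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed page size for this sketch; the kernel uses PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* True only when [offset, offset + length) spans the whole page. */
static bool covers_whole_page(unsigned int offset, unsigned int length)
{
    return offset == 0 && length == SKETCH_PAGE_SIZE;
}

/*
 * Length that reproduces the old single-argument behaviour, where
 * everything from offset to the end of the page was invalidated.
 */
static unsigned int legacy_length(unsigned int offset)
{
    assert(offset <= SKETCH_PAGE_SIZE);
    return SKETCH_PAGE_SIZE - offset;
}
```

Once a range can end in the middle of a page, offset can be 0 while length is smaller than the page, so testing offset == 0 alone no longer identifies a full-page invalidation.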


[PATCH v3 05/18] xfs: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in xfs_vm_invalidatepage()

Signed-off-by: Lukas Czerner 
Cc: x...@oss.sgi.com
---
 fs/xfs/xfs_aops.c  |5 +++--
 fs/xfs/xfs_trace.h |   41 -
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e426796..e8018d3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -826,8 +826,9 @@ xfs_vm_invalidatepage(
unsigned intoffset,
unsigned intlength)
 {
-   trace_xfs_invalidatepage(page->mapping->host, page, offset);
-   block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+   trace_xfs_invalidatepage(page->mapping->host, page, offset,
+length);
+   block_invalidatepage(page, offset, length);
 }
 
 /*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 16a8129..91d6434 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -991,7 +991,46 @@ DEFINE_EVENT(xfs_page_class, name, \
TP_ARGS(inode, page, off))
 DEFINE_PAGE_EVENT(xfs_writepage);
 DEFINE_PAGE_EVENT(xfs_releasepage);
-DEFINE_PAGE_EVENT(xfs_invalidatepage);
+
+TRACE_EVENT(xfs_invalidatepage,
+   TP_PROTO(struct inode *inode, struct page *page, unsigned int off,
+unsigned int len),
+   TP_ARGS(inode, page, off, len),
+   TP_STRUCT__entry(
+   __field(dev_t, dev)
+   __field(xfs_ino_t, ino)
+   __field(pgoff_t, pgoff)
+   __field(loff_t, size)
+   __field(unsigned int, offset)
+   __field(unsigned int, length)
+   __field(int, delalloc)
+   __field(int, unwritten)
+   ),
+   TP_fast_assign(
+   int delalloc = -1, unwritten = -1;
+
+   if (page_has_buffers(page))
+   xfs_count_page_state(page, &delalloc, &unwritten);
+   __entry->dev = inode->i_sb->s_dev;
+   __entry->ino = XFS_I(inode)->i_ino;
+   __entry->pgoff = page_offset(page);
+   __entry->size = i_size_read(inode);
+   __entry->offset = off;
+   __entry->length = len;
+   __entry->delalloc = delalloc;
+   __entry->unwritten = unwritten;
+   ),
+   TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %x "
+ "length %x delalloc %d unwritten %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->pgoff,
+ __entry->size,
+ __entry->offset,
+ __entry->length,
+ __entry->delalloc,
+ __entry->unwritten)
+)
 
 DECLARE_EVENT_CLASS(xfs_imap_class,
TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
-- 
1.7.7.6


[PATCH v3 06/18] ocfs2: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in ocfs2_invalidatepage().

Signed-off-by: Lukas Czerner 
Cc: Joel Becker 
---
 fs/ocfs2/aops.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7c47755..79736a2 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,8 +608,7 @@ static void ocfs2_invalidatepage(struct page *page, 
unsigned int offset,
 {
journal_t *journal = 
OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
 
-   jbd2_journal_invalidatepage(journal, page, offset,
-   PAGE_CACHE_SIZE - offset);
+   jbd2_journal_invalidatepage(journal, page, offset, length);
 }
 
 static int ocfs2_releasepage(struct page *page, gfp_t wait)
-- 
1.7.7.6


[PATCH v3 09/18] reiserfs: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in reiserfs_invalidatepage()

Signed-off-by: Lukas Czerner 
Cc: reiserfs-de...@vger.kernel.org
---
 fs/reiserfs/inode.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 808e02e..e963164 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2975,11 +2975,13 @@ static void reiserfs_invalidatepage(struct page *page, 
unsigned int offset,
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
unsigned int curr_off = 0;
+   unsigned int stop = offset + length;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int ret = 1;
 
BUG_ON(!PageLocked(page));
 
-   if (offset == 0)
+   if (!partial_page)
ClearPageChecked(page);
 
if (!page_has_buffers(page))
@@ -2991,6 +2993,9 @@ static void reiserfs_invalidatepage(struct page *page, 
unsigned int offset,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   goto out;
+
/*
 * is this block fully invalidated?
 */
@@ -3009,7 +3014,7 @@ static void reiserfs_invalidatepage(struct page *page, 
unsigned int offset,
 * The get_block cached value has been unconditionally invalidated,
 * so real IO is not possible anymore.
 */
-   if (!offset && ret) {
+   if (!partial_page && ret) {
ret = try_to_release_page(page, 0);
/* maybe should BUG_ON(!ret); - neilb */
}
-- 
1.7.7.6
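
The buffer walk that the hunks above bound with 'stop' can be modelled in user space as follows. This is a minimal sketch, not the reiserfs code: count_invalidated_blocks() and SKETCH_PAGE_SIZE are hypothetical names, a 4096-byte page is assumed, and buffer heads are replaced by plain block offsets.

```c
#include <assert.h>

/* Assumed page size for this sketch; the kernel uses PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Count the blocks of one page that lie wholly inside
 * [offset, offset + length), following the loop shape in the patch.
 */
static unsigned int count_invalidated_blocks(unsigned int offset,
                                             unsigned int length,
                                             unsigned int blocksize)
{
    unsigned int stop = offset + length;
    unsigned int curr_off = 0;
    unsigned int count = 0;

    /* Range must stay inside the page and must not wrap around. */
    assert(stop <= SKETCH_PAGE_SIZE && stop >= length);
    assert(blocksize != 0 && SKETCH_PAGE_SIZE % blocksize == 0);

    while (curr_off < SKETCH_PAGE_SIZE) {
        unsigned int next_off = curr_off + blocksize;

        /* This block extends past the end of the range: stop walking. */
        if (next_off > stop)
            break;

        /* Block starts at or after offset, so it is fully covered. */
        if (offset <= curr_off)
            count++;

        curr_off = next_off;
    }
    return count;
}
```

With 1024-byte blocks, a range of offset 1024 and length 2048 selects the two middle blocks, while a range that only partially overlaps blocks (offset 512, length 1024) selects none.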


[PATCH v3 12/18] Revert "ext4: fix fsx truncate failure"

2013-04-09 Thread Lukas Czerner

This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.

This commit reintroduces the use of ext4_block_truncate_page() in ext4
truncate operation instead of ext4_discard_partial_page_buffers().

The statement in that commit's description that the truncate operation only
zeroes the block-unaligned portion of the last page is not exactly right,
since truncate_pagecache_range() also zeroes and invalidates the unaligned
portion of the page. There is therefore no need to zero and unmap it once
more, and ext4_block_truncate_page() was doing the right job. We still
need to update the buffer head containing the last block, which is
exactly what ext4_block_truncate_page() does.

Moreover the problem described in the commit is fixed more properly with
commit

15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

This was tested on a ppc64 machine with a block size of 1024 bytes
without any problems.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/inode.c |   11 ++-
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5729d21..d58e13c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3920,7 +3920,6 @@ void ext4_truncate(struct inode *inode)
unsigned int credits;
handle_t *handle;
struct address_space *mapping = inode->i_mapping;
-   loff_t page_len;
 
/*
 * There is a possibility that we're either freeing the inode
@@ -3964,14 +3963,8 @@ void ext4_truncate(struct inode *inode)
return;
}
 
-   if (inode->i_size % PAGE_CACHE_SIZE != 0) {
-   page_len = PAGE_CACHE_SIZE -
-   (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
-   if (ext4_discard_partial_page_buffers(handle,
-   mapping, inode->i_size, page_len, 0))
-   goto out_stop;
-   }
+   if (inode->i_size & (inode->i_sb->s_blocksize - 1))
+   ext4_block_truncate_page(handle, mapping, inode->i_size);
 
/*
 * We add the inode to the orphan list, so that if this
-- 
1.7.7.6


[PATCH v3 11/18] Revert "ext4: remove no longer used functions in inode.c"

2013-04-09 Thread Lukas Czerner

This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.

This commit reintroduces the functions ext4_block_truncate_page() and
ext4_block_zero_page_range(), which were previously removed in favour
of ext4_discard_partial_page_buffers().

In future commits we want to use those functions again and remove
ext4_discard_partial_page_buffers(), since it duplicates some code
and also partially duplicates the work of truncate_pagecache_range();
moreover, the old implementation was much clearer.
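The reintroduced ext4_block_zero_page_range() shortens a request that would
cross the end of the block containing 'from'. A minimal user-space sketch of
that clamping rule follows; the helper name clamp_to_block() is hypothetical
and only mirrors the length check in the patch, it is not part of ext4.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp a zeroing request so that it never crosses the end of the block
 * containing 'from'. Hypothetical helper; blocksize must be a power of two.
 */
static int64_t clamp_to_block(uint64_t from, int64_t length, uint32_t blocksize)
{
	int64_t max;

	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

	/* Bytes remaining between 'from' and the end of its block. */
	max = (int64_t)blocksize - (int64_t)(from & (blocksize - 1));

	/* Out-of-range lengths fall back to "up to the end of the block". */
	if (length > max || length < 0)
		length = max;
	return length;
}
```

With a 1024-byte block, a request for 100 bytes at offset 1000 is shortened
to 24 bytes, while a request for 10 bytes is left alone.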

Signed-off-by: Lukas Czerner 
---
 fs/ext4/ext4.h  |4 ++
 fs/ext4/inode.c |  120 +++
 2 files changed, 124 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a0637e5..3aa5943 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2105,6 +2105,10 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
+extern int ext4_block_truncate_page(handle_t *handle,
+   struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+   struct address_space *mapping, loff_t from, loff_t length);
 extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f80e0c3..5729d21 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3555,6 +3555,126 @@ next:
return err;
 }
 
+/*
+ * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This required during truncate. We need to physically zero the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ */
+int ext4_block_truncate_page(handle_t *handle,
+   struct address_space *mapping, loff_t from)
+{
+   unsigned offset = from & (PAGE_CACHE_SIZE-1);
+   unsigned length;
+   unsigned blocksize;
+   struct inode *inode = mapping->host;
+
+   blocksize = inode->i_sb->s_blocksize;
+   length = blocksize - (offset & (blocksize - 1));
+
+   return ext4_block_zero_page_range(handle, mapping, from, length);
+}
+
+/*
+ * ext4_block_zero_page_range() zeros out a mapping of length 'length'
+ * starting from file offset 'from'.  The range to be zero'd must
+ * be contained with in one block.  If the specified range exceeds
+ * the end of the block it will be shortened to end of the block
+ * that cooresponds to 'from'
+ */
+int ext4_block_zero_page_range(handle_t *handle,
+   struct address_space *mapping, loff_t from, loff_t length)
+{
+   ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
+   unsigned offset = from & (PAGE_CACHE_SIZE-1);
+   unsigned blocksize, max, pos;
+   ext4_lblk_t iblock;
+   struct inode *inode = mapping->host;
+   struct buffer_head *bh;
+   struct page *page;
+   int err = 0;
+
+   page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
+  mapping_gfp_mask(mapping) & ~__GFP_FS);
+   if (!page)
+   return -ENOMEM;
+
+   blocksize = inode->i_sb->s_blocksize;
+   max = blocksize - (offset & (blocksize - 1));
+
+   /*
+* correct length if it does not fall between
+* 'from' and the end of the block
+*/
+   if (length > max || length < 0)
+   length = max;
+
+   iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+
+   if (!page_has_buffers(page))
+   create_empty_buffers(page, blocksize, 0);
+
+   /* Find the buffer that contains "offset" */
+   bh = page_buffers(page);
+   pos = blocksize;
+   while (offset >= pos) {
+   bh = bh->b_this_page;
+   iblock++;
+   pos += blocksize;
+   }
+
+   err = 0;
+   if (buffer_freed(bh)) {
+   BUFFER_TRACE(bh, "freed: skip");
+   goto unlock;
+   }
+
+   if (!buffer_mapped(bh)) {
+   BUFFER_TRACE(bh, "unmapped");
+   ext4_get_block(inode, iblock, bh, 0);
+   /* unmapped? It's a hole - nothing to do */
+   if (!buffer_mapped(bh)) {
+   BUFFER_TRACE(bh, "still unmapped");
+   goto unlock;
+   }
+   }
+
+   /* Ok, it's mapped. Make sure it's up-to-date */
+   if (PageUptodate(page))
+   set_buffer_uptodate(bh);
+
+   if (!buffer_uptodate(bh)) {
+   err = -EIO;
+   ll_rw_block(READ, 1, &bh);
+   wait_on_buffer(bh);
+   /* Uhhu

[PATCH v3 14/18] ext4: remove unused discard_partial_page_buffers

2013-04-09 Thread Lukas Czerner

ext4_discard_partial_page_buffers() is no longer used anywhere, so we can
simply remove it, including the *_no_lock variant and the
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/ext4.h  |8 --
 fs/ext4/inode.c |  206 ---
 2 files changed, 0 insertions(+), 214 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2428244..cc9020e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -593,11 +593,6 @@ enum {
 #define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER   0x0020
 
 /*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED  0x0001
-
-/*
  * ioctl commands
  */
 #define EXT4_IOC_GETFLAGS   FS_IOC_GETFLAGS
@@ -2111,9 +2106,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
 extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 loff_t lstart, loff_t lend);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
-   struct address_space *mapping, loff_t from,
-   loff_t length, int flags);
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
 extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6003fd1..0d452c1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -135,9 +135,6 @@ static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length);
 static int __ext4_journalled_writepage(struct page *page, unsigned int len);
 static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
-   struct inode *inode, struct page *page, loff_t from,
-   loff_t length, int flags);
 
 /*
  * Test whether an inode is a fast symlink.
@@ -3352,209 +3349,6 @@ void ext4_set_aops(struct inode *inode)
inode->i_mapping->a_ops = _aops;
 }
 
-
-/*
- * ext4_discard_partial_page_buffers()
- * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
- * This function finds and locks the page containing the offset
- * "from" and passes it to ext4_discard_partial_page_buffers_no_lock.
- * Calling functions that already have the page locked should call
- * ext4_discard_partial_page_buffers_no_lock directly.
- */
-int ext4_discard_partial_page_buffers(handle_t *handle,
-   struct address_space *mapping, loff_t from,
-   loff_t length, int flags)
-{
-   struct inode *inode = mapping->host;
-   struct page *page;
-   int err = 0;
-
-   page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
-  mapping_gfp_mask(mapping) & ~__GFP_FS);
-   if (!page)
-   return -ENOMEM;
-
-   err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
-   from, length, flags);
-
-   unlock_page(page);
-   page_cache_release(page);
-   return err;
-}
-
-/*
- * ext4_discard_partial_page_buffers_no_lock()
- * Zeros a page range of length 'length' starting from offset 'from'.
- * Buffer heads that correspond to the block aligned regions of the
- * zeroed range will be unmapped.  Unblock aligned regions
- * will have the corresponding buffer head mapped if needed so that
- * that region of the page can be updated with the partial zero out.
- *
- * This function assumes that the page has already been  locked.  The
- * The range to be discarded must be contained with in the given page.
- * If the specified range exceeds the end of the page it will be shortened
- * to the end of the page that corresponds to 'from'.  This function is
- * appropriate for updating a page and it buffer heads to be unmapped and
- * zeroed for blocks that have been either released, or are going to be
- * released.
- *
- * handle: The journal handle
- * inode:  The files inode
- * page:   A locked page that contains the offset "from"
- * from:   The starting byte offset (from the beginning of the file)
- * to begin discarding
- * len:The length of bytes to discard
- * flags:  Optional flags that may be used:
- *
- * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
- * Only zero the regions of the page whose buffer heads
- * have already been unmapped.  This flag is appropriate
- * for updating the contents of a page whose blocks may
- * have already been released, and we only want to zero
- * out the regions that correspond to those released blocks.
- *
- * Returns zero on success or negative on failure.
- */
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
-   struct

[PATCH v3 16/18] ext4: update ext4_ext_remove_space trace point

2013-04-09 Thread Lukas Czerner

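Add the "end" argument to the ext4_ext_remove_space and
ext4_ext_remove_space_done trace points and print it next to "start".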

Signed-off-by: Lukas Czerner 
---
 fs/ext4/extents.c   |6 +++---
 include/trace/events/ext4.h |   21 ++---
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4adaa8a..9023b76 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2666,7 +2666,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
return PTR_ERR(handle);
 
 again:
-   trace_ext4_ext_remove_space(inode, start, depth);
+   trace_ext4_ext_remove_space(inode, start, end, depth);
 
/*
 * Check if we are removing extents inside the extent tree. If that
@@ -2832,8 +2832,8 @@ again:
}
}
 
-   trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
-   path->p_hdr->eh_entries);
+   trace_ext4_ext_remove_space_done(inode, start, end, depth,
+   partial_cluster, path->p_hdr->eh_entries);
 
/* If we still have something in the partial cluster and we have removed
 * even the first extent, then we should free the blocks in the partial
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 60b329a..c92500c 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2027,14 +2027,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
 );
 
 TRACE_EVENT(ext4_ext_remove_space,
-   TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+   TP_PROTO(struct inode *inode, ext4_lblk_t start,
+ext4_lblk_t end, int depth),
 
-   TP_ARGS(inode, start, depth),
+   TP_ARGS(inode, start, end, depth),
 
TP_STRUCT__entry(
__field(dev_t,  dev )
__field(ino_t,  ino )
__field(ext4_lblk_t,start   )
+   __field(ext4_lblk_t,end )
__field(int,depth   )
),
 
@@ -2042,26 +2044,29 @@ TRACE_EVENT(ext4_ext_remove_space,
__entry->dev= inode->i_sb->s_dev;
__entry->ino= inode->i_ino;
__entry->start  = start;
+   __entry->end= end;
__entry->depth  = depth;
),
 
-   TP_printk("dev %d,%d ino %lu since %u depth %d",
+   TP_printk("dev %d,%d ino %lu start %u end %u depth %d",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  (unsigned long) __entry->ino,
  (unsigned) __entry->start,
+ (unsigned) __entry->end,
  __entry->depth)
 );
 
 TRACE_EVENT(ext4_ext_remove_space_done,
-   TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
-   ext4_lblk_t partial, unsigned short eh_entries),
+   TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+int depth, ext4_lblk_t partial, unsigned short eh_entries),
 
-   TP_ARGS(inode, start, depth, partial, eh_entries),
+   TP_ARGS(inode, start, end, depth, partial, eh_entries),
 
TP_STRUCT__entry(
__field(dev_t,  dev )
__field(ino_t,  ino )
__field(ext4_lblk_t,start   )
+   __field(ext4_lblk_t,end )
__field(int,depth   )
__field(ext4_lblk_t,partial )
__field(unsigned short, eh_entries  )
@@ -2071,16 +2076,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->dev= inode->i_sb->s_dev;
__entry->ino= inode->i_ino;
__entry->start  = start;
+   __entry->end= end;
__entry->depth  = depth;
__entry->partial= partial;
__entry->eh_entries = eh_entries;
),
 
-   TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+   TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
  "remaining_entries %u",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  (unsigned long) __entry->ino,
  (unsigned) __entry->start,
+ (unsigned) __entry->end,
  __entry->depth,
  (unsigned) __entry->partial,
  (unsigned short) __entry->eh_entries)
-- 
1.7.7.6


[PATCH v3 15/18] ext4: remove unused code from ext4_remove_blocks()

2013-04-09 Thread Lukas Czerner

The "head removal" branch in the condition is never used in any code
path in ext4, since the function's only caller, ext4_ext_rm_leaf(), makes
sure that the extent is properly split before removing blocks. Note that
there is a bug in this branch anyway.

This commit removes the unused code completely and makes use of
ext4_error() instead of printk() if a dubious range is provided.
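As a rough illustration of what remains after this change, the sketch below
classifies a removal request against an extent. It assumes that the
surviving branch accepts any range that starts inside the extent and ends
exactly at its last block (the "tail removal" case); everything else,
including the former head removal, is reported as a strange request. The
enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum removal_kind {
	REMOVE_TAIL,	/* range ends at the last block of the extent */
	REMOVE_STRANGE	/* anything else; ext4_error() would be raised */
};

/* Classify the inclusive block range [from, to] against an extent. */
static enum removal_kind classify_removal(uint32_t ee_block, uint32_t ee_len,
					  uint32_t from, uint32_t to)
{
	uint32_t last;

	assert(ee_len > 0);
	last = ee_block + ee_len - 1;

	if (from >= ee_block && to == last)
		return REMOVE_TAIL;
	return REMOVE_STRANGE;
}
```

For an extent covering blocks 100 to 109, removing 105-109 or the whole
extent counts as tail removal, while removing only 100-104 (head removal) is
now rejected.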

Signed-off-by: Lukas Czerner 
---
 fs/ext4/extents.c |   21 -
 1 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 6c5a70a..4adaa8a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2435,23 +2435,10 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
*partial_cluster = EXT4_B2C(sbi, pblk);
else
*partial_cluster = 0;
-   } else if (from == le32_to_cpu(ex->ee_block)
-  && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
-   /* head removal */
-   ext4_lblk_t num;
-   ext4_fsblk_t start;
-
-   num = to - from;
-   start = ext4_ext_pblock(ex);
-
-   ext_debug("free first %u blocks starting %llu\n", num, start);
-   ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
-   } else {
-   printk(KERN_INFO "strange request: removal(2) "
-   "%u-%u from %u:%u\n",
-   from, to, le32_to_cpu(ex->ee_block), ee_len);
-   }
+   } else
+   ext4_error(sbi->s_sb, "strange request: removal(2) "
+  "%u-%u from %u:%u\n",
+  from, to, le32_to_cpu(ex->ee_block), ee_len);
return 0;
 }
 
-- 
1.7.7.6


[PATCH v3 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

2013-04-09 Thread Lukas Czerner

This commit changes truncate_inode_pages_range() so it can handle non
page aligned regions of the truncate. Currently we can hit BUG_ON when
the end of the range is not page aligned, but we can handle an unaligned
start of the range.

Being able to handle non page aligned regions of the page can help file
system punch_hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.

In previous commits we've changed the ->invalidatepage() prototype to
accept a 'length' argument to be able to specify the range to invalidate.
Now we can use that new ability in truncate_inode_pages_range().
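The way the byte range is split into fully truncated pages and partial pages
can be sketched in user space as follows. The arithmetic follows the patch;
the struct, the function name and the fixed 4096-byte page size are
assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page geometry; the kernel uses PAGE_CACHE_SHIFT/PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ull << SKETCH_PAGE_SHIFT)

struct trunc_range {
	uint64_t start;		/* first fully truncated page, inclusive */
	uint64_t end;		/* last fully truncated page, exclusive */
	unsigned partial_start;	/* offset within the leading partial page */
	unsigned partial_end;	/* offset within the trailing partial page */
};

/* Split the inclusive byte range [lstart, lend]; lend == -1 means EOF. */
static struct trunc_range split_range(int64_t lstart, int64_t lend)
{
	struct trunc_range r;

	assert(lstart >= 0);

	r.partial_start = (unsigned)(lstart & (SKETCH_PAGE_SIZE - 1));
	r.partial_end = (unsigned)((lend + 1) & (SKETCH_PAGE_SIZE - 1));

	/* Round the start up to the first page that is removed entirely. */
	r.start = ((uint64_t)lstart + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	if (lend == -1)
		r.end = UINT64_MAX;	/* highest possible page index */
	else
		r.end = ((uint64_t)lend + 1) >> SKETCH_PAGE_SHIFT;
	return r;
}
```

For the range 1000..9999, only page 1 is removed entirely (start 1, end 2);
page 0 is zeroed from offset 1000 and page 2 is zeroed up to offset 1808.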

Signed-off-by: Lukas Czerner 
Cc: Andrew Morton 
Cc: Hugh Dickins 
---
 mm/truncate.c |  104 -
 1 files changed, 73 insertions(+), 31 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index fdba083..e2e8a8a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -52,14 +52,6 @@ void do_invalidatepage(struct page *page, unsigned int offset,
(*invalidatepage)(page, offset, length);
 }
 
-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
-   zero_user_segment(page, partial, PAGE_CACHE_SIZE);
-   cleancache_invalidate_page(page->mapping, page);
-   if (page_has_private(page))
-   do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
-}
-
 /*
  * This cancels just the dirty bit on the kernel page itself, it
  * does NOT actually remove dirty bits on any mmap's that may be
@@ -188,11 +180,11 @@ int invalidate_inode_page(struct page *page)
  * truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets
  * @mapping: mapping to truncate
  * @lstart: offset from which to truncate
- * @lend: offset to which to truncate
+ * @lend: offset to which to truncate (inclusive)
  *
  * Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
  *
  * Truncate takes two passes - the first pass is nonblocking.  It will not
  * block on page locks and it will not block on writeback.  The second pass
@@ -203,35 +195,58 @@ int invalidate_inode_page(struct page *page)
  * We pass down the cache-hot hint to the page freeing code.  Even if the
  * mapping is large, it is probably the case that the final pages are the most
  * recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that since ->invalidatepage() accepts range to invalidate
+ * truncate_inode_pages_range is able to handle cases where lend + 1 is not
+ * page aligned properly.
  */
 void truncate_inode_pages_range(struct address_space *mapping,
loff_t lstart, loff_t lend)
 {
-   const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
-   const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
-   struct pagevec pvec;
-   pgoff_t index;
-   pgoff_t end;
-   int i;
+   pgoff_t         start;          /* inclusive */
+   pgoff_t         end;            /* exclusive */
+   unsigned int    partial_start;  /* inclusive */
+   unsigned int    partial_end;    /* exclusive */
+   struct pagevec  pvec;
+   pgoff_t         index;
+   int             i;
 
cleancache_invalidate_inode(mapping);
if (mapping->nrpages == 0)
return;
 
-   BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
-   end = (lend >> PAGE_CACHE_SHIFT);
+   /* Offsets within partial pages */
+   partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+   partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
+
+   /*
+* 'start' and 'end' always covers the range of pages to be fully
+* truncated. Partial pages are covered with 'partial_start' at the
+* start of the range and 'partial_end' at the end of the range.
+* Note that 'end' is exclusive while 'lend' is inclusive.
+*/
+   start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+   if (lend == -1)
+   /*
+* lend == -1 indicates end-of-file so we have to set 'end'
+* to the highest possible pgoff_t and since the type is
+* unsigned we're using -1.
+*/
+   end = -1;
+   else
+   end = (lend + 1) >> PAGE_CACHE_SHIFT;
 
pagevec_init(&pvec, 0);
index = start;
-   while (index <= end && pagevec_lookup(&pvec, mapping, index,
-   min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+   while (index < end && pagevec_lookup(&pvec, mapping, index,
+   min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
m

[PATCH v3 18/18] ext4: Allow punch hole with bigalloc enabled

2013-04-09 Thread Lukas Czerner

In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked the punch_hole
implementation and there is nothing holding us back from using punch hole
on file systems with the bigalloc feature enabled.

This has been tested with fsx and xfstests.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/inode.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0d452c1..87d6171 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3536,11 +3536,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
if (!S_ISREG(inode->i_mode))
return -EOPNOTSUPP;
 
-   if (EXT4_SB(sb)->s_cluster_ratio > 1) {
-   /* TODO: Add support for bigalloc file systems */
-   return -EOPNOTSUPP;
-   }
-
trace_ext4_punch_hole(inode, offset, length);
 
/*
-- 
1.7.7.6


[PATCH v3 17/18] ext4: make punch hole code path work with bigalloc

2013-04-09 Thread Lukas Czerner

Currently punch hole is disabled in file systems with the bigalloc
feature enabled. However, the recent changes in the punch hole patch
should make it easier to support punching holes on bigalloc enabled file
systems.

This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is of unsigned long long type and it makes sure that we
will free the partial cluster if all extents have been released from that
cluster. However it has been specifically designed only for truncate.

With punch hole we can be freeing just some extents in the cluster,
leaving the rest untouched. So we have to make sure that we notice a
cluster which still has some extents. To do this I've changed
partial_cluster to be of signed long long type. The only scenario where
this could be a problem is when cluster_size == block size, however in
that case there would not be any partial clusters so we're safe. For
bigger clusters the signed type is enough. Now we use a negative value
in partial_cluster to mark such a cluster as used, hence we know that we
must not free it even if all other extents have been freed from it.

This scenario can be described in simple diagram:

|FFF...FF..FF.UUU|
 ^--^
  punch hole

. - free space
| - cluster boundary
F - freed extent
U - used extent

Also update respective tracepoints to use signed long long type for
partial_cluster.
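The sign convention for partial_cluster described above can be sketched with
a few small helpers. The helper names are hypothetical and exist only for
this illustration: a positive value names a cluster that may still be freed,
a negative value names a cluster that is still used by another extent, and
zero means there is no partial cluster.

```c
#include <assert.h>

/* Record that 'cluster' is still used by some other extent. */
static long long mark_cluster_used(long long cluster)
{
	assert(cluster > 0);
	return -cluster;
}

/* A positive value is a candidate for freeing once all extents are gone. */
static int may_free_partial(long long partial_cluster)
{
	return partial_cluster > 0;
}

/* True if partial_cluster marks exactly 'cluster' as still in use. */
static int is_marked_used(long long partial_cluster, long long cluster)
{
	return partial_cluster < 0 && -partial_cluster == cluster;
}
```

In the patch, the equivalent of is_marked_used() is what decides whether
EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER is added to the flags during tail
removal.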

Signed-off-by: Lukas Czerner 
---
 fs/ext4/extents.c   |   69 +++---
 include/trace/events/ext4.h |   25 ---
 2 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 9023b76..577c4f5 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2362,7 +2362,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
 
 static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
  struct ext4_extent *ex,
- ext4_fsblk_t *partial_cluster,
+ signed long long *partial_cluster,
  ext4_lblk_t from, ext4_lblk_t to)
 {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2391,7 +2391,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
 * partial cluster here.
 */
pblk = ext4_ext_pblock(ex) + ee_len - 1;
-   if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+   if ((*partial_cluster > 0) &&
+   (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
ext4_free_blocks(handle, inode, NULL,
 EXT4_C2B(sbi, *partial_cluster),
 sbi->s_cluster_ratio, flags);
@@ -2417,23 +2418,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
&& to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
ext4_lblk_t num;
+   unsigned int unaligned;
 
num = le32_to_cpu(ex->ee_block) + ee_len - from;
pblk = ext4_ext_pblock(ex) + ee_len - num;
-   ext_debug("free last %u blocks starting %llu\n", num, pblk);
+   /*
+* Usually we want to free partial cluster at the end of the
+* extent, except for the situation when the cluster is still
+* used by any other extent (partial_cluster is negative).
+*/
+   if (*partial_cluster < 0 &&
+   -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+   flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+   ext_debug("free last %u blocks starting %llu partial %lld\n",
+ num, pblk, *partial_cluster);
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
/*
 * If the block range to be freed didn't start at the
 * beginning of a cluster, and we removed the entire
-* extent, save the partial cluster here, since we
-* might need to delete if we determine that the
-* truncate operation has removed all of the blocks in
-* the cluster.
+* extent and the cluster is not used by any other extent,
+* save the partial cluster here, since we might need to
+* delete if we determine that the truncate operation has
+* removed all of the blocks in the cluster.
+*
+* On the other hand, if we did not manage to free the whole
+* extent, we have to mark the cluster as used (store negative
+* cluster number in partial_cluster).
 */
-   if (pblk & (sbi->s_cluster_ratio - 1) &a

[PATCH v3 08/18] gfs2: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in gfs2_invalidatepage().
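A minimal user-space sketch of the buffer walk this enables: only buffers
that lie entirely inside [offset, offset + length) are discarded. The
function name, the fixed 4096-byte page size and the uniform buffer size are
assumptions for illustration; in the real code each buffer head carries its
own b_size and gfs2_discard() is called instead of counting.

```c
#include <assert.h>

/* Assumed page size for this sketch; the kernel uses PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Count buffers of size 'bsize' fully covered by [offset, offset + length). */
static unsigned buffers_to_discard(unsigned offset, unsigned length,
				   unsigned bsize)
{
	unsigned stop = offset + length;
	unsigned pos, count = 0;

	assert(bsize != 0 && SKETCH_PAGE_SIZE % bsize == 0);
	assert(stop <= SKETCH_PAGE_SIZE);

	for (pos = 0; pos < SKETCH_PAGE_SIZE; pos += bsize) {
		if (pos + bsize > stop)
			break;		/* buffer ends past the range: stop */
		if (offset <= pos)
			count++;	/* gfs2_discard() would run here */
	}
	return count;
}
```

With 1024-byte buffers, invalidating the whole page discards four buffers,
invalidating 2048 bytes at offset 1024 discards two, and invalidating 1024
bytes at offset 512 discards none because no buffer is fully covered.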

Signed-off-by: Lukas Czerner 
Cc: cluster-de...@redhat.com
---
 fs/gfs2/aops.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 37093ba..ea920bf 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -947,24 +947,29 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
 {
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
+   unsigned int stop = offset + length;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
struct buffer_head *bh, *head;
unsigned long pos = 0;
 
BUG_ON(!PageLocked(page));
-   if (offset == 0)
+   if (!partial_page)
ClearPageChecked(page);
if (!page_has_buffers(page))
goto out;
 
bh = head = page_buffers(page);
do {
+   if (pos + bh->b_size > stop)
+   return;
+
if (offset <= pos)
gfs2_discard(sdp, bh);
pos += bh->b_size;
bh = bh->b_this_page;
} while (bh != head);
 out:
-   if (offset == 0)
+   if (!partial_page)
try_to_release_page(page, 0);
 }
 
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v3 13/18] ext4: use ext4_zero_partial_blocks in punch_hole

2013-04-09 Thread Lukas Czerner

We're going to get rid of ext4_discard_partial_page_buffers() since it
duplicates some code and also partially duplicates the work of
truncate_pagecache_range(); moreover, the old implementation was much
clearer.

Now that truncate_inode_pages_range() can handle truncating non page
aligned regions, we can use it to invalidate and zero out the block
aligned region of the punched out range and then use
ext4_block_truncate_page() to zero the unaligned blocks at the start and
end of the range. This will greatly simplify the punch hole code. Moreover,
after this commit we can get rid of ext4_discard_partial_page_buffers()
completely.

We also introduce the function ext4_prepare_punch_hole() to do some common
operations before we attempt to do the actual punch hole on an
indirect or extent file, which saves us some code duplication.
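The partial-block handling that ext4_zero_partial_blocks() performs in this
patch can be sketched in user space as a function that reports which byte
ranges would be zeroed. The struct and function names are hypothetical, and
unlike the patch (which passes a full block size and relies on
ext4_block_zero_page_range() to shorten it) the sketch computes the
shortened head length directly.

```c
#include <assert.h>
#include <stdint.h>

struct zero_plan {
	uint64_t head_from, head_len;	/* head_len == 0: nothing to zero */
	uint64_t tail_from, tail_len;	/* tail_len == 0: nothing to zero */
};

/* Plan partial-block zeroing for the inclusive byte range [lstart, lend]. */
static struct zero_plan plan_partial_zeroing(uint64_t lstart, uint64_t lend,
					     uint32_t blocksize)
{
	struct zero_plan p = { 0, 0, 0, 0 };
	uint64_t mask = (uint64_t)blocksize - 1;
	uint64_t partial;

	assert(blocksize != 0 && (blocksize & mask) == 0 && lstart <= lend);

	/* Whole range inside a single block: one request covers it. */
	if (lstart / blocksize == lend / blocksize) {
		p.head_from = lstart;
		p.head_len = lend - lstart + 1;
		return p;
	}
	/* Unaligned start: zero from lstart to the end of its block. */
	partial = lstart & mask;
	if (partial) {
		p.head_from = lstart;
		p.head_len = blocksize - partial;
	}
	/* Unaligned end: zero from the start of the last block up to lend. */
	partial = lend & mask;
	if (partial != mask) {
		p.tail_from = lend - partial;
		p.tail_len = partial + 1;
	}
	return p;
}
```

For the range 1000..5000 with 1024-byte blocks this yields a 24-byte head
request at offset 1000 and a 905-byte tail request at offset 4096; a fully
block aligned range such as 1024..4095 yields no requests at all.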

This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/ext4.h  |2 +
 fs/ext4/inode.c |  110 ---
 2 files changed, 42 insertions(+), 70 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3aa5943..2428244 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2109,6 +2109,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
struct address_space *mapping, loff_t from);
 extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+loff_t lstart, loff_t lend);
 extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d58e13c..6003fd1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3675,6 +3675,37 @@ unlock:
return err;
 }
 
+int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+loff_t lstart, loff_t lend)
+{
+   struct super_block *sb = inode->i_sb;
+   struct address_space *mapping = inode->i_mapping;
+   unsigned partial = lstart & (sb->s_blocksize - 1);
+   ext4_fsblk_t start = lstart >> sb->s_blocksize_bits;
+   ext4_fsblk_t end = lend >> sb->s_blocksize_bits;
+   int err = 0;
+
+   /* Handle partial zero within the single block */
+   if (start == end) {
+   err = ext4_block_zero_page_range(handle, mapping,
+lstart, lend - lstart + 1);
+   return err;
+   }
+   /* Handle partial zero out on the start of the range */
+   if (partial) {
+   err = ext4_block_zero_page_range(handle, mapping,
+lstart, sb->s_blocksize);
+   if (err)
+   return err;
+   }
+   /* Handle partial zero out on the end of the range */
+   partial = lend & (sb->s_blocksize - 1);
+   if (partial != sb->s_blocksize - 1)
+   err = ext4_block_zero_page_range(handle, mapping,
+lend - partial, partial + 1);
+   return err;
+}
+
 int ext4_can_truncate(struct inode *inode)
 {
if (S_ISREG(inode->i_mode))
@@ -3703,7 +3734,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
struct super_block *sb = inode->i_sb;
ext4_lblk_t first_block, stop_block;
struct address_space *mapping = inode->i_mapping;
-   loff_t first_page, last_page, page_len;
loff_t first_page_offset, last_page_offset;
handle_t *handle;
unsigned int credits;
@@ -3755,17 +3785,13 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
   offset;
}
 
-   first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-   last_page = (offset + length) >> PAGE_CACHE_SHIFT;
-
-   first_page_offset = first_page << PAGE_CACHE_SHIFT;
-   last_page_offset = last_page << PAGE_CACHE_SHIFT;
+   first_page_offset = round_up(offset, sb->s_blocksize);
+   last_page_offset = round_down((offset + length), sb->s_blocksize) - 1;
 
-   /* Now release the pages */
-   if (last_page_offset > first_page_offset) {
+   /* Now release the pages and zero block aligned part of pages*/
+   if (last_page_offset > first_page_offset)
truncate_pagecache_range(inode, first_page_offset,
-last_page_offset - 1);
-   }
+last_page_offset);
 
/* Wait all existing dio workers, newcomers will block on i_mutex */
ext4_inode_block_unlocked_dio(inode);
@@ -3785,66 +3811,10 @@ int ext4_

[PATCH v3 00/18] change invalidatepage prototype to accept length

2013-04-09 Thread Lukas Czerner

Hi,

This set of patches aims to allow truncate_inode_pages_range() to handle
ranges which are not aligned at the end of the page. Currently it will
hit BUG_ON() when the end of the range is not aligned. The punch hole
feature, however, can benefit from this ability, saving file systems some
work by not forcing them to implement their own invalidate code to handle
unaligned ranges.

In order for this to work we need to change the ->invalidatepage() address
space operation to accept a range to invalidate by adding a 'length'
argument in addition to 'offset'. This is different from my previous
attempt to create a new aop ->invalidatepage_range
(http://lwn.net/Articles/514828/), which I reconsidered to be unnecessary.
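With both 'offset' and 'length' available, an ->invalidatepage()
implementation can tell a full-page invalidation from a partial one, which is
what the reiserfs and gfs2 patches in this series do with their partial_page
variable. A minimal user-space sketch of that predicate, assuming a
4096-byte page and a hypothetical function name:

```c
#include <assert.h>

/* Assumed page size for this sketch; the kernel uses PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Nonzero when [offset, offset + length) does not cover the whole page. */
static int is_partial_invalidate(unsigned offset, unsigned length)
{
	assert(offset + length <= SKETCH_PAGE_SIZE);
	return offset != 0 || length < SKETCH_PAGE_SIZE;
}
```

Only when this predicate is false can a file system safely go on to release
the whole page, which previously had to be inferred from offset == 0 alone.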

It would be best if this series could go through the ext4 branch, since
there are a lot of ext4 changes which are based on the dev branch of ext4
(git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git)

For description purposes this patch set can be divided into following
groups:

patch 0001: Change ->invalidatepage() prototype adding 'length' argument
and changing all the instances. In very simple cases file
system methods are completely adapted, otherwise only
prototype is changed and the rest will follow. This patch
also implements the 'range' invalidation in
block_invalidatepage().

patch 0002 - 0009:
Make use of the new 'length' argument in the file systems
themselves. Some file systems can take advantage of it by trying
to invalidate only a portion of the page if possible, some
can't; however, none of the file systems currently attempt
to truncate non page aligned ranges.


patch 0010: Teach truncate_inode_pages_range() to handle non page aligned
ranges.

patch 0011 - 0018:
Ext4 changes built on top of the previous changes, simplifying
the punch hole code. Not all changes are related specifically
to the invalidatepage() change, but all are related to the punch
hole feature.

Even though this patch set would mainly affect the functionality of the
file systems implementing punch hole, I've tested all the following file
systems using xfstests without noticing any bugs related to this change.

ext3, ext4, xfs, btrfs, gfs2 and reiserfs

The much smaller changes in other file systems have not been directly tested,
so please review.


--- 
 Documentation/filesystems/Locking |6 +-
 Documentation/filesystems/vfs.txt |   20 +-
 fs/9p/vfs_addr.c  |5 +-
 fs/afs/file.c |   10 +-
 fs/btrfs/disk-io.c|3 +-
 fs/btrfs/extent_io.c  |2 +-
 fs/btrfs/inode.c  |3 +-
 fs/buffer.c   |   21 ++-
 fs/ceph/addr.c|   15 +-
 fs/cifs/file.c|5 +-
 fs/exofs/inode.c  |6 +-
 fs/ext3/inode.c   |9 +-
 fs/ext4/ext4.h|   14 +-
 fs/ext4/extents.c |   96 ++
 fs/ext4/inode.c   |  393 +
 fs/f2fs/data.c|3 +-
 fs/f2fs/node.c|3 +-
 fs/gfs2/aops.c|   17 ++-
 fs/jbd/transaction.c  |   19 ++-
 fs/jbd2/transaction.c |   24 ++-
 fs/jfs/jfs_metapage.c |5 +-
 fs/logfs/file.c   |3 +-
 fs/logfs/segment.c|3 +-
 fs/nfs/file.c |8 +-
 fs/ntfs/aops.c|2 +-
 fs/ocfs2/aops.c   |5 +-
 fs/reiserfs/inode.c   |   12 +-
 fs/ubifs/file.c   |5 +-
 fs/xfs/xfs_aops.c |   10 +-
 fs/xfs/xfs_trace.h|   41 -
 include/linux/buffer_head.h   |3 +-
 include/linux/fs.h|2 +-
 include/linux/jbd.h   |2 +-
 include/linux/jbd2.h  |2 +-
 include/linux/mm.h|3 +-
 include/trace/events/ext3.h   |   12 +-
 include/trace/events/ext4.h   |   64 ---
 mm/readahead.c|2 +-
 mm/truncate.c |  117 
 39 files changed, 522 insertions(+), 453 deletions(-)

[PATCH v3 01/18] mm: change invalidatepage prototype to accept
[PATCH v3 02/18] jbd2: change jbd2_journal_invalidatepage to accept
[PATCH v3 03/18] ext4: use ->invalidatepage() length argument
[PATCH v3 04/18] jbd: change journal_invalidatepage() to accept
[PATCH v3 05/18] xfs: use ->invalidatepage() length argument
[PATCH v3 06/18] ocfs2: use ->invalidatepage() length argument
[PATCH v3 07/18] ceph: use ->invalidatepage() length argument
[PATCH v3 08/18] gfs2: use ->invalidatepage() length argument
[PATCH v3 09/18] reiserfs: use ->invalidatepage() length argument
[PATCH v3 10/18] mm: teach

[PATCH v3 02/18] jbd2: change jbd2_journal_invalidatepage to accept length

2013-04-09 Thread Lukas Czerner

invalidatepage now accepts a range to invalidate, and there are two file
systems using jbd2 that also implement the punch hole feature and can benefit
from this. We need to implement the same thing for the jbd2 layer in order to
allow those file systems to take advantage of this functionality.

This commit adds length argument to the jbd2_journal_invalidatepage()
and updates all instances in ext4 and ocfs2.
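
For illustration only (this is not part of the patch): a minimal userspace
sketch of the range walk that the new 'length' argument makes possible. It
assumes 4 KiB pages; MODEL_PAGE_SIZE, is_partial_page() and
blocks_to_invalidate() are hypothetical names invented for this sketch, and
a bitmask stands in for the per-buffer journal_unmap_buffer() calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for this sketch: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096u

/* A range is "partial" unless it starts at 0 and spans the whole page. */
static bool is_partial_page(unsigned int offset, unsigned int length)
{
	return offset != 0 || length < MODEL_PAGE_SIZE;
}

/*
 * Walk the page in block_size steps and return a bitmask of the blocks
 * lying entirely inside [offset, offset + length). Bit i stands for the
 * i-th block of the page.
 */
static unsigned int blocks_to_invalidate(unsigned int offset,
					 unsigned int length,
					 unsigned int block_size)
{
	unsigned int stop = offset + length;
	unsigned int curr_off = 0;
	unsigned int mask = 0;
	unsigned int bit = 0;

	/* Same sanity check as the BUG_ON() added by the patch. */
	assert(stop <= MODEL_PAGE_SIZE && stop >= length);
	/* Sketch-only limits: blocks tile the page and fit in the mask. */
	assert(block_size != 0 && MODEL_PAGE_SIZE % block_size == 0);
	assert(MODEL_PAGE_SIZE / block_size <= 32);

	while (curr_off < MODEL_PAGE_SIZE) {
		unsigned int next_off = curr_off + block_size;

		if (next_off > stop)	/* block ends past the range: done */
			break;
		if (offset <= curr_off)	/* block starts inside the range */
			mask |= 1u << bit;
		curr_off = next_off;
		bit++;
	}
	return mask;
}
```

With 1 KiB blocks, a call such as blocks_to_invalidate(1024, 2048, 1024)
selects only the second and third block of the page, and the page counts as
partial, so its buffers would not be freed as a whole.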

Signed-off-by: Lukas Czerner 
---
 fs/ext4/inode.c   |3 ++-
 fs/jbd2/transaction.c |   24 +---
 fs/ocfs2/aops.c   |3 ++-
 include/linux/jbd2.h  |2 +-
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f5bf189..69595f5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3000,7 +3000,8 @@ static int __ext4_journalled_invalidatepage(struct page 
*page,
if (offset == 0)
ClearPageChecked(page);
 
-   return jbd2_journal_invalidatepage(journal, page, offset);
+   return jbd2_journal_invalidatepage(journal, page, offset,
+  PAGE_CACHE_SIZE - offset);
 }
 
 /* Wrapper for aops... */
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 325bc01..d334e17 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2027,18 +2027,23 @@ zap_buffer_unlocked:
  * void jbd2_journal_invalidatepage()
  * @journal: journal to use for flush...
  * @page:page to flush
- * @offset:  length of page to invalidate.
+ * @offset:  start of the range to invalidate
+ * @length:  length of the range to invalidate
  *
- * Reap page buffers containing data after offset in page. Can return -EBUSY
- * if buffers are part of the committing transaction and the page is straddling
- * i_size. Caller then has to wait for current commit and try again.
+ * Reap page buffers containing data after in the specified range in page.
+ * Can return -EBUSY if buffers are part of the committing transaction and
+ * the page is straddling i_size. Caller then has to wait for current commit
+ * and try again.
  */
 int jbd2_journal_invalidatepage(journal_t *journal,
struct page *page,
-   unsigned long offset)
+   unsigned int offset,
+   unsigned int length)
 {
struct buffer_head *head, *bh, *next;
+   unsigned int stop = offset + length;
unsigned int curr_off = 0;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
int ret = 0;
 
@@ -2047,6 +2052,8 @@ int jbd2_journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return 0;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
 * data lists (especially for journaled data mode), so be
 * cautious in our locking. */
@@ -2056,10 +2063,13 @@ int jbd2_journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   return 0;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
-   ret = journal_unmap_buffer(journal, bh, offset > 0);
+   ret = journal_unmap_buffer(journal, bh, partial_page);
unlock_buffer(bh);
if (ret < 0)
return ret;
@@ -2070,7 +2080,7 @@ int jbd2_journal_invalidatepage(journal_t *journal,
 
} while (bh != head);
 
-   if (!offset) {
+   if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index ecb86ca..7c47755 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,7 +608,8 @@ static void ocfs2_invalidatepage(struct page *page, 
unsigned int offset,
 {
journal_t *journal = 
OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
 
-   jbd2_journal_invalidatepage(journal, page, offset);
+   jbd2_journal_invalidatepage(journal, page, offset,
+   PAGE_CACHE_SIZE - offset);
 }
 
 static int ocfs2_releasepage(struct page *page, gfp_t wait)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f9fe889..8c34abd 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1090,7 +1090,7 @@ extern int jbd2_journal_dirty_metadata (handle_t 
*, struct buffer_head *);
 extern int  jbd2_journal_forget (handle_t *, struct buffer_head *);
 extern void journal_sync_buffer (struct buffer_head *);
 extern int  jbd2_journal_invalidatepage(journal_t *,
-

[PATCH v3 08/18] gfs2: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

->invalidatepage() aop now accepts a range to invalidate, so we can make
use of it in gfs2_invalidatepage().

Signed-off-by: Lukas Czerner lczer...@redhat.com
Cc: cluster-de...@redhat.com
---
 fs/gfs2/aops.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 37093ba..ea920bf 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -947,24 +947,29 @@ static void gfs2_invalidatepage(struct page *page, 
unsigned int offset,
unsigned int length)
 {
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
+   unsigned int stop = offset + length;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
struct buffer_head *bh, *head;
unsigned long pos = 0;
 
BUG_ON(!PageLocked(page));
-   if (offset == 0)
+   if (!partial_page)
ClearPageChecked(page);
if (!page_has_buffers(page))
goto out;
 
bh = head = page_buffers(page);
do {
+   if (pos + bh->b_size > stop)
+   return;
+
if (offset <= pos)
gfs2_discard(sdp, bh);
pos += bh->b_size;
bh = bh->b_this_page;
} while (bh != head);
 out:
-   if (offset == 0)
+   if (!partial_page)
try_to_release_page(page, 0);
 }
 
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v3 13/18] ext4: use ext4_zero_partial_blocks in punch_hole

2013-04-09 Thread Lukas Czerner

We're going to get rid of ext4_discard_partial_page_buffers() since it is
duplicating some code and also partially duplicating work of
truncate_pagecache_range(), moreover the old implementation was much
clearer.

Now when the truncate_inode_pages_range() can handle truncating non page
aligned regions we can use this to invalidate and zero out block aligned
region of the punched out range and then use ext4_block_truncate_page()
to zero the unaligned blocks on the start and end of the range. This
will greatly simplify the punch hole code. Moreover after this commit we
can get rid of the ext4_discard_partial_page_buffers() completely.
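
For illustration only (not part of the patch): a minimal userspace sketch
of how a punched range splits into a block aligned interior and unaligned
edges. The struct zero_range type and the *_model/round_*_block names are
hypothetical and exist only in this sketch. One assumption is labeled in
the code: the head length is computed as "block size minus partial",
whereas the patch passes sb->s_blocksize and relies on the callee to clamp
the length to the end of the block.

```c
#include <assert.h>
#include <stdint.h>

/* One byte range to zero; hypothetical type for this sketch only. */
struct zero_range {
	int64_t from;
	int64_t len;
};

/* Rounding helpers for a power-of-two block size. */
static int64_t round_up_block(int64_t x, int64_t bs)
{
	return (x + bs - 1) & ~(bs - 1);
}

static int64_t round_down_block(int64_t x, int64_t bs)
{
	return x & ~(bs - 1);
}

/*
 * Fill out[] with the sub-block ranges that must be zeroed for the
 * inclusive byte range [lstart, lend]; returns the number of entries
 * written (0, 1 or 2).
 */
static int zero_partial_blocks_model(int64_t lstart, int64_t lend,
				     int64_t bs, struct zero_range out[2])
{
	int64_t partial = lstart & (bs - 1);
	int n = 0;

	assert(bs > 0 && (bs & (bs - 1)) == 0);
	assert(lstart >= 0 && lend >= lstart);

	/* Whole range sits inside a single block. */
	if (lstart / bs == lend / bs) {
		out[0].from = lstart;
		out[0].len = lend - lstart + 1;
		return 1;
	}
	/*
	 * Unaligned head: zero from lstart to the end of its block.
	 * Assumption: this mirrors a callee that clamps the length.
	 */
	if (partial) {
		out[n].from = lstart;
		out[n].len = bs - partial;
		n++;
	}
	/* Unaligned tail: zero from the start of the last block to lend. */
	partial = lend & (bs - 1);
	if (partial != bs - 1) {
		out[n].from = lend - partial;
		out[n].len = partial + 1;
		n++;
	}
	return n;
}
```

For a hole starting at byte 100 with length 4901 and 1 KiB blocks, the
aligned interior handed to the page cache truncation would be
[round_up_block(100, 1024), round_down_block(5001, 1024) - 1], and the two
edges returned by the model are what remains to be zeroed.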

We also introduce function ext4_prepare_punch_hole() to do some common
operations before we attempt to do the actual punch hole on
indirect or extent file which saves us some code duplication.

This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/ext4.h  |2 +
 fs/ext4/inode.c |  110 ---
 2 files changed, 42 insertions(+), 70 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3aa5943..2428244 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2109,6 +2109,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
struct address_space *mapping, loff_t from);
 extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+loff_t lstart, loff_t lend);
 extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d58e13c..6003fd1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3675,6 +3675,37 @@ unlock:
return err;
 }
 
+int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+loff_t lstart, loff_t lend)
+{
+   struct super_block *sb = inode->i_sb;
+   struct address_space *mapping = inode->i_mapping;
+   unsigned partial = lstart & (sb->s_blocksize - 1);
+   ext4_fsblk_t start = lstart >> sb->s_blocksize_bits;
+   ext4_fsblk_t end = lend >> sb->s_blocksize_bits;
+   int err = 0;
+
+   /* Handle partial zero within the single block */
+   if (start == end) {
+   err = ext4_block_zero_page_range(handle, mapping,
+lstart, lend - lstart + 1);
+   return err;
+   }
+   /* Handle partial zero out on the start of the range */
+   if (partial) {
+   err = ext4_block_zero_page_range(handle, mapping,
+lstart, sb->s_blocksize);
+   if (err)
+   return err;
+   }
+   /* Handle partial zero out on the end of the range */
+   partial = lend & (sb->s_blocksize - 1);
+   if (partial != sb->s_blocksize - 1)
+   err = ext4_block_zero_page_range(handle, mapping,
+lend - partial, partial + 1);
+   return err;
+}
+
 int ext4_can_truncate(struct inode *inode)
 {
if (S_ISREG(inode->i_mode))
@@ -3703,7 +3734,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, 
loff_t length)
struct super_block *sb = inode->i_sb;
ext4_lblk_t first_block, stop_block;
struct address_space *mapping = inode->i_mapping;
-   loff_t first_page, last_page, page_len;
loff_t first_page_offset, last_page_offset;
handle_t *handle;
unsigned int credits;
@@ -3755,17 +3785,13 @@ int ext4_punch_hole(struct file *file, loff_t offset, 
loff_t length)
   offset;
}
 
-   first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-   last_page = (offset + length) >> PAGE_CACHE_SHIFT;
-
-   first_page_offset = first_page << PAGE_CACHE_SHIFT;
-   last_page_offset = last_page << PAGE_CACHE_SHIFT;
+   first_page_offset = round_up(offset, sb->s_blocksize);
+   last_page_offset = round_down((offset + length), sb->s_blocksize) - 1;
 
-   /* Now release the pages */
-   if (last_page_offset > first_page_offset) {
+   /* Now release the pages and zero block aligned part of pages*/
+   if (last_page_offset > first_page_offset)
truncate_pagecache_range(inode, first_page_offset,
-last_page_offset - 1);
-   }
+last_page_offset);
 
/* Wait all existing dio workers, newcomers will block on i_mutex */
ext4_inode_block_unlocked_dio(inode);
@@ -3785,66 +3811,10 @@ int ext4_punch_hole(struct file *file, loff_t offset, 
loff_t length)
goto out_dio

[PATCH v3 17/18] ext4: make punch hole code path work with bigalloc

2013-04-09 Thread Lukas Czerner

Currently punch hole is disabled in file systems with bigalloc
feature enabled. However the recent changes in punch hole patch should
make it easier to support punching holes on bigalloc enabled file
systems.

This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is of type unsigned long long and it makes sure that we
will free the partial cluster if all extents have been released from that
cluster. However it has been specifically designed only for truncate.

With punch hole we can be freeing just some extents in the cluster,
leaving the rest untouched. So we have to make sure that we will notice
a cluster which still has some extents. To do this I've changed
partial_cluster to be of type signed long long. The only scenario where
this could be a problem is when cluster_size == block size, however in
that case there would not be any partial clusters so we're safe. For
bigger clusters the signed type is enough. Now we use a negative value
in partial_cluster to mark such a cluster as used, hence we know that we
must not free it even if all other extents have been freed from such a
cluster.

This scenario can be described in simple diagram:

|FFF...FF..FF.UUU|
 ^--^
  punch hole

. - free space
| - cluster boundary
F - freed extent
U - used extent

Also update respective tracepoints to use signed long long type for
partial_cluster.
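
For illustration only (not part of the patch): a minimal userspace sketch
of the sign convention described above. A positive partial_cluster names a
cluster that is pending and may be freed, a negative one names a cluster
that is still used by another extent, and zero means there is none. The
helper names and block_to_cluster() (a stand-in for EXT4_B2C()) are
hypothetical and exist only in this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for EXT4_B2C(): block number to cluster number. */
static long long block_to_cluster(unsigned long long pblk,
				  unsigned int cluster_bits)
{
	return (long long)(pblk >> cluster_bits);
}

/*
 * partial_cluster > 0: the cluster is pending; free it once the blocks
 * being removed now belong to a different cluster.
 */
static bool should_free_pending(long long partial_cluster, long long cluster)
{
	return partial_cluster > 0 && cluster != partial_cluster;
}

/*
 * partial_cluster < 0: cluster number -partial_cluster is still used by
 * another extent, so the last cluster of this removal must be kept.
 */
static bool must_keep_last_cluster(long long partial_cluster,
				   long long last_cluster)
{
	return partial_cluster < 0 && -partial_cluster == last_cluster;
}
```

The second helper corresponds to the condition under which the patch sets
EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER in the tail removal branch.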

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/extents.c   |   69 +++---
 include/trace/events/ext4.h |   25 ---
 2 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 9023b76..577c4f5 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2362,7 +2362,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int 
nrblocks, int chunk)
 
 static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
  struct ext4_extent *ex,
- ext4_fsblk_t *partial_cluster,
+ signed long long *partial_cluster,
  ext4_lblk_t from, ext4_lblk_t to)
 {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2391,7 +2391,8 @@ static int ext4_remove_blocks(handle_t *handle, struct 
inode *inode,
 * partial cluster here.
 */
pblk = ext4_ext_pblock(ex) + ee_len - 1;
-   if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+   if ((*partial_cluster > 0) &&
+   (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
ext4_free_blocks(handle, inode, NULL,
 EXT4_C2B(sbi, *partial_cluster),
 sbi->s_cluster_ratio, flags);
@@ -2417,23 +2418,41 @@ static int ext4_remove_blocks(handle_t *handle, struct 
inode *inode,
 to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
ext4_lblk_t num;
+   unsigned int unaligned;
 
num = le32_to_cpu(ex->ee_block) + ee_len - from;
pblk = ext4_ext_pblock(ex) + ee_len - num;
-   ext_debug("free last %u blocks starting %llu\n", num, pblk);
+   /*
+* Usually we want to free partial cluster at the end of the
+* extent, except for the situation when the cluster is still
+* used by any other extent (partial_cluster is negative).
+*/
+   if (*partial_cluster < 0 &&
+   -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+   flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+   ext_debug("free last %u blocks starting %llu partial %lld\n",
+ num, pblk, *partial_cluster);
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
/*
 * If the block range to be freed didn't start at the
 * beginning of a cluster, and we removed the entire
-* extent, save the partial cluster here, since we
-* might need to delete if we determine that the
-* truncate operation has removed all of the blocks in
-* the cluster.
+* extent and the cluster is not used by any other extent,
+* save the partial cluster here, since we might need to
+* delete if we determine that the truncate operation has
+* removed all of the blocks in the cluster.
+*
+* On the other hand, if we did not manage to free the whole
+* extent, we have to mark the cluster as used (store negative
+* cluster number in partial_cluster).
 */
-   if (pblk & (sbi->s_cluster_ratio - 1) &&
-   (ee_len == num))
+   unaligned = pblk & (sbi

[PATCH v3 18/18] ext4: Allow punch hole with bigalloc enabled

2013-04-09 Thread Lukas Czerner

In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked the punch_hole
implementation and there is nothing holding us back from using punch hole
on file systems with the bigalloc feature enabled.

This has been tested with fsx and xfstests.

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/inode.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0d452c1..87d6171 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3536,11 +3536,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, 
loff_t length)
if (!S_ISREG(inode->i_mode))
return -EOPNOTSUPP;
 
-   if (EXT4_SB(sb)->s_cluster_ratio > 1) {
-   /* TODO: Add support for bigalloc file systems */
-   return -EOPNOTSUPP;
-   }
-
trace_ext4_punch_hole(inode, offset, length);
 
/*
-- 
1.7.7.6


[PATCH v3 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

2013-04-09 Thread Lukas Czerner

This commit changes truncate_inode_pages_range() so it can handle non
page aligned regions of the truncate. Currently we can hit BUG_ON when
the end of the range is not page aligned, but we can handle unaligned
start of the range.

Being able to handle non page aligned regions of the page can help file
system punch_hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.

In previous commits we've changed the ->invalidatepage() prototype to accept
a 'length' argument to be able to specify the range to invalidate. Now we can
use that new ability in truncate_inode_pages_range().
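
For illustration only (not part of the patch): a minimal userspace sketch
of how an inclusive byte range [lstart, lend] splits into fully truncated
pages plus partial head and tail pages. It assumes 4 KiB pages; struct
trunc_plan, plan_truncate() and the MODEL_* macros are hypothetical names
used only in this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  ((int64_t)1 << MODEL_PAGE_SHIFT)

struct trunc_plan {
	uint64_t start;             /* first fully truncated page, inclusive */
	uint64_t end;               /* end of the full pages, exclusive */
	unsigned int partial_start; /* zeroing starts here in the head page */
	unsigned int partial_end;   /* zeroing stops here in the tail page */
};

static struct trunc_plan plan_truncate(int64_t lstart, int64_t lend)
{
	struct trunc_plan p;

	assert(lstart >= 0 && (lend == -1 || lend >= lstart));

	/* Offsets within the partial pages. */
	p.partial_start = (unsigned int)(lstart & (MODEL_PAGE_SIZE - 1));
	p.partial_end = (unsigned int)((lend + 1) & (MODEL_PAGE_SIZE - 1));

	/* Pages in [start, end) are truncated as a whole. */
	p.start = (uint64_t)((lstart + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT);
	if (lend == -1)
		p.end = UINT64_MAX;	/* -1 means "up to end of file" */
	else
		p.end = (uint64_t)((lend + 1) >> MODEL_PAGE_SHIFT);
	return p;
}
```

For example, plan_truncate(100, 9999) leaves page 1 as the only fully
truncated page, with page 0 zeroed from byte 100 onward and page 2 zeroed
up to (but not including) byte 1808.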

Signed-off-by: Lukas Czerner lczer...@redhat.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Hugh Dickins hu...@google.com
---
 mm/truncate.c |  104 -
 1 files changed, 73 insertions(+), 31 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index fdba083..e2e8a8a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -52,14 +52,6 @@ void do_invalidatepage(struct page *page, unsigned int 
offset,
(*invalidatepage)(page, offset, length);
 }
 
-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
-   zero_user_segment(page, partial, PAGE_CACHE_SIZE);
-   cleancache_invalidate_page(page->mapping, page);
-   if (page_has_private(page))
-   do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
-}
-
 /*
  * This cancels just the dirty bit on the kernel page itself, it
  * does NOT actually remove dirty bits on any mmap's that may be
@@ -188,11 +180,11 @@ int invalidate_inode_page(struct page *page)
  * truncate_inode_pages_range - truncate range of pages specified by start & 
end byte offsets
  * @mapping: mapping to truncate
  * @lstart: offset from which to truncate
- * @lend: offset to which to truncate
+ * @lend: offset to which to truncate (inclusive)
  *
  * Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
  *
  * Truncate takes two passes - the first pass is nonblocking.  It will not
  * block on page locks and it will not block on writeback.  The second pass
@@ -203,35 +195,58 @@ int invalidate_inode_page(struct page *page)
  * We pass down the cache-hot hint to the page freeing code.  Even if the
  * mapping is large, it is probably the case that the final pages are the most
  * recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that since ->invalidatepage() accepts range to invalidate
+ * truncate_inode_pages_range is able to handle cases where lend + 1 is not
+ * page aligned properly.
  */
 void truncate_inode_pages_range(struct address_space *mapping,
loff_t lstart, loff_t lend)
 {
-   const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
-   const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
-   struct pagevec pvec;
-   pgoff_t index;
-   pgoff_t end;
-   int i;
+   pgoff_t         start;          /* inclusive */
+   pgoff_t         end;            /* exclusive */
+   unsigned int    partial_start;  /* inclusive */
+   unsigned int    partial_end;    /* exclusive */
+   struct pagevec  pvec;
+   pgoff_t         index;
+   int             i;
 
cleancache_invalidate_inode(mapping);
if (mapping->nrpages == 0)
return;
 
-   BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
-   end = (lend >> PAGE_CACHE_SHIFT);
+   /* Offsets within partial pages */
+   partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+   partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
+
+   /*
+* 'start' and 'end' always covers the range of pages to be fully
+* truncated. Partial pages are covered with 'partial_start' at the
+* start of the range and 'partial_end' at the end of the range.
+* Note that 'end' is exclusive while 'lend' is inclusive.
+*/
+   start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+   if (lend == -1)
+   /*
+* lend == -1 indicates end-of-file so we have to set 'end'
+* to the highest possible pgoff_t and since the type is
+* unsigned we're using -1.
+*/
+   end = -1;
+   else
+   end = (lend + 1) >> PAGE_CACHE_SHIFT;
 
pagevec_init(&pvec, 0);
index = start;
-   while (index <= end && pagevec_lookup(&pvec, mapping, index,
-   min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+   while (index < end && pagevec_lookup(&pvec, mapping, index,
+   min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
mem_cgroup_uncharge_start();
for (i

[PATCH v3 15/18] ext4: remove unused code from ext4_remove_blocks()

2013-04-09 Thread Lukas Czerner

The head removal branch in the condition is never used in any code
path in ext4 since the function's only caller, ext4_ext_rm_leaf(), will make
sure that the extent is properly split before removing blocks. Note that
there is a bug in this branch anyway.

This commit removes the unused code completely and makes use of
ext4_error() instead of printk if dubious range is provided.

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/extents.c |   21 -
 1 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 6c5a70a..4adaa8a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2435,23 +2435,10 @@ static int ext4_remove_blocks(handle_t *handle, struct 
inode *inode,
*partial_cluster = EXT4_B2C(sbi, pblk);
else
*partial_cluster = 0;
-   } else if (from == le32_to_cpu(ex->ee_block)
-   && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
-   /* head removal */
-   ext4_lblk_t num;
-   ext4_fsblk_t start;
-
-   num = to - from;
-   start = ext4_ext_pblock(ex);
-
-   ext_debug("free first %u blocks starting %llu\n", num, start);
-   ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
-   } else {
-   printk(KERN_INFO "strange request: removal(2) "
-   "%u-%u from %u:%u\n",
-   from, to, le32_to_cpu(ex->ee_block), ee_len);
-   }
+   } else
+   ext4_error(sbi->s_sb, "strange request: removal(2) "
+  "%u-%u from %u:%u\n",
+  from, to, le32_to_cpu(ex->ee_block), ee_len);
return 0;
 }
 
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v3 16/18] ext4: update ext4_ext_remove_space trace point

2013-04-09 Thread Lukas Czerner

Add end variable.

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/extents.c   |6 +++---
 include/trace/events/ext4.h |   21 ++---
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4adaa8a..9023b76 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2666,7 +2666,7 @@ int ext4_ext_remove_space(struct inode *inode, 
ext4_lblk_t start,
return PTR_ERR(handle);
 
 again:
-   trace_ext4_ext_remove_space(inode, start, depth);
+   trace_ext4_ext_remove_space(inode, start, end, depth);
 
/*
 * Check if we are removing extents inside the extent tree. If that
@@ -2832,8 +2832,8 @@ again:
}
}
 
-   trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
-   path->p_hdr->eh_entries);
+   trace_ext4_ext_remove_space_done(inode, start, end, depth,
+   partial_cluster, path->p_hdr->eh_entries);
 
/* If we still have something in the partial cluster and we have removed
 * even the first extent, then we should free the blocks in the partial
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 60b329a..c92500c 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2027,14 +2027,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
 );
 
 TRACE_EVENT(ext4_ext_remove_space,
-   TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+   TP_PROTO(struct inode *inode, ext4_lblk_t start,
+ext4_lblk_t end, int depth),
 
-   TP_ARGS(inode, start, depth),
+   TP_ARGS(inode, start, end, depth),
 
TP_STRUCT__entry(
__field(dev_t,  dev )
__field(ino_t,  ino )
__field(ext4_lblk_t,start   )
+   __field(ext4_lblk_t,end )
__field(int,depth   )
),
 
@@ -2042,26 +2044,29 @@ TRACE_EVENT(ext4_ext_remove_space,
__entry->dev    = inode->i_sb->s_dev;
__entry->ino    = inode->i_ino;
__entry->start  = start;
+   __entry->end    = end;
__entry->depth  = depth;
),
 
-   TP_printk("dev %d,%d ino %lu since %u depth %d",
+   TP_printk("dev %d,%d ino %lu start %u end %u depth %d",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  (unsigned long) __entry->ino,
  (unsigned) __entry->start,
+ (unsigned) __entry->end,
  __entry->depth)
 );
 
 TRACE_EVENT(ext4_ext_remove_space_done,
-   TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
-   ext4_lblk_t partial, unsigned short eh_entries),
+   TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+int depth, ext4_lblk_t partial, unsigned short eh_entries),
 
-   TP_ARGS(inode, start, depth, partial, eh_entries),
+   TP_ARGS(inode, start, end, depth, partial, eh_entries),
 
TP_STRUCT__entry(
__field(dev_t,  dev )
__field(ino_t,  ino )
__field(ext4_lblk_t,start   )
+   __field(ext4_lblk_t,end )
__field(int,depth   )
__field(ext4_lblk_t,partial )
__field(unsigned short, eh_entries  )
@@ -2071,16 +2076,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->dev        = inode->i_sb->s_dev;
__entry->ino        = inode->i_ino;
__entry->start      = start;
+   __entry->end        = end;
__entry->depth      = depth;
__entry->partial    = partial;
__entry->eh_entries = eh_entries;
),
 
-   TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+   TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
  "remaining_entries %u",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  (unsigned long) __entry->ino,
  (unsigned) __entry->start,
+ (unsigned) __entry->end,
  __entry->depth,
  (unsigned) __entry->partial,
  (unsigned short) __entry->eh_entries)
-- 
1.7.7.6


[PATCH v3 14/18] ext4: remove unused discard_partial_page_buffers

2013-04-09 Thread Lukas Czerner

ext4_discard_partial_page_buffers() is no longer used anywhere, so we can
simply remove it, including the *_no_lock variant and the
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/ext4.h  |8 --
 fs/ext4/inode.c |  206 ---
 2 files changed, 0 insertions(+), 214 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2428244..cc9020e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -593,11 +593,6 @@ enum {
 #define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER   0x0020
 
 /*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED  0x0001
-
-/*
  * ioctl commands
  */
 #define EXT4_IOC_GETFLAGS   FS_IOC_GETFLAGS
@@ -2111,9 +2106,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
 extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 loff_t lstart, loff_t lend);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
-   struct address_space *mapping, loff_t from,
-   loff_t length, int flags);
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
 extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6003fd1..0d452c1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -135,9 +135,6 @@ static void ext4_invalidatepage(struct page *page, unsigned 
int offset,
unsigned int length);
 static int __ext4_journalled_writepage(struct page *page, unsigned int len);
 static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head 
*bh);
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
-   struct inode *inode, struct page *page, loff_t from,
-   loff_t length, int flags);
 
 /*
  * Test whether an inode is a fast symlink.
@@ -3352,209 +3349,6 @@ void ext4_set_aops(struct inode *inode)
inode->i_mapping->a_ops = &ext4_aops;
 }
 
-
-/*
- * ext4_discard_partial_page_buffers()
- * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
- * This function finds and locks the page containing the offset
- * from and passes it to ext4_discard_partial_page_buffers_no_lock.
- * Calling functions that already have the page locked should call
- * ext4_discard_partial_page_buffers_no_lock directly.
- */
-int ext4_discard_partial_page_buffers(handle_t *handle,
-   struct address_space *mapping, loff_t from,
-   loff_t length, int flags)
-{
-   struct inode *inode = mapping->host;
-   struct page *page;
-   int err = 0;
-
-   page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
-  mapping_gfp_mask(mapping) & ~__GFP_FS);
-   if (!page)
-   return -ENOMEM;
-
-   err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
-   from, length, flags);
-
-   unlock_page(page);
-   page_cache_release(page);
-   return err;
-}
-
-/*
- * ext4_discard_partial_page_buffers_no_lock()
- * Zeros a page range of length 'length' starting from offset 'from'.
- * Buffer heads that correspond to the block aligned regions of the
- * zeroed range will be unmapped.  Unblock aligned regions
- * will have the corresponding buffer head mapped if needed so that
- * that region of the page can be updated with the partial zero out.
- *
- * This function assumes that the page has already been  locked.  The
- * The range to be discarded must be contained with in the given page.
- * If the specified range exceeds the end of the page it will be shortened
- * to the end of the page that corresponds to 'from'.  This function is
- * appropriate for updating a page and it buffer heads to be unmapped and
- * zeroed for blocks that have been either released, or are going to be
- * released.
- *
- * handle: The journal handle
- * inode:  The files inode
- * page:   A locked page that contains the offset from
- * from:   The starting byte offset (from the beginning of the file)
- * to begin discarding
- * len:The length of bytes to discard
- * flags:  Optional flags that may be used:
- *
- * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
- * Only zero the regions of the page whose buffer heads
- * have already been unmapped.  This flag is appropriate
- * for updating the contents of a page whose blocks may
- * have already been released, and we only want to zero
- * out the regions that correspond to those released blocks.
- *
- * Returns zero on success or negative on failure.
- */
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
-   struct inode *inode, struct page

[PATCH v3 11/18] Revert "ext4: remove no longer used functions in inode.c"

2013-04-09 Thread Lukas Czerner

This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.

This commit reintroduces the functions ext4_block_truncate_page() and
ext4_block_zero_page_range(), which were previously removed in favour
of ext4_discard_partial_page_buffers().

In future commits we want to use those functions again and remove
ext4_discard_partial_page_buffers(), since it duplicates some code
and also partially duplicates the work of truncate_pagecache_range().
Moreover, the old implementation was much clearer.
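
For illustration only, the length computation done by
ext4_block_truncate_page() in the hunk below can be sketched in plain
userspace C. PAGE_SIZE_SKETCH and block_tail_length() are made-up names
for this example, the 4096-byte page size is an assumption, and the block
size is assumed to be a power of two no larger than the page size.

```c
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_CACHE_SIZE. */
#define PAGE_SIZE_SKETCH 4096u

/*
 * Hypothetical helper: number of bytes from file offset 'from' up to the
 * end of the block that contains it. 'blocksize' must be a power of two
 * and must not exceed PAGE_SIZE_SKETCH.
 */
static unsigned int block_tail_length(uint64_t from, unsigned int blocksize)
{
	/* Offset of 'from' within its page. */
	unsigned int offset = (unsigned int)(from & (PAGE_SIZE_SKETCH - 1));

	/* Distance from that offset to the end of its block. */
	return blocksize - (offset & (blocksize - 1));
}
```

With 1024-byte blocks, an offset of 1000 leaves 24 bytes to zero, and a
block-aligned offset yields a whole block.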

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/ext4.h  |4 ++
 fs/ext4/inode.c |  120 +++
 2 files changed, 124 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a0637e5..3aa5943 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2105,6 +2105,10 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
+extern int ext4_block_truncate_page(handle_t *handle,
+   struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+   struct address_space *mapping, loff_t from, loff_t length);
 extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f80e0c3..5729d21 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3555,6 +3555,126 @@ next:
return err;
 }
 
+/*
+ * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This required during truncate. We need to physically zero the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ */
+int ext4_block_truncate_page(handle_t *handle,
+   struct address_space *mapping, loff_t from)
+{
+   unsigned offset = from & (PAGE_CACHE_SIZE-1);
+   unsigned length;
+   unsigned blocksize;
+   struct inode *inode = mapping->host;
+
+   blocksize = inode->i_sb->s_blocksize;
+   length = blocksize - (offset & (blocksize - 1));
+
+   return ext4_block_zero_page_range(handle, mapping, from, length);
+}
+
+/*
+ * ext4_block_zero_page_range() zeros out a mapping of length 'length'
+ * starting from file offset 'from'.  The range to be zero'd must
+ * be contained with in one block.  If the specified range exceeds
+ * the end of the block it will be shortened to end of the block
+ * that cooresponds to 'from'
+ */
+int ext4_block_zero_page_range(handle_t *handle,
+   struct address_space *mapping, loff_t from, loff_t length)
+{
+   ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
+   unsigned offset = from & (PAGE_CACHE_SIZE-1);
+   unsigned blocksize, max, pos;
+   ext4_lblk_t iblock;
+   struct inode *inode = mapping->host;
+   struct buffer_head *bh;
+   struct page *page;
+   int err = 0;
+
+   page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
+  mapping_gfp_mask(mapping) & ~__GFP_FS);
+   if (!page)
+   return -ENOMEM;
+
+   blocksize = inode->i_sb->s_blocksize;
+   max = blocksize - (offset & (blocksize - 1));
+
+   /*
+* correct length if it does not fall between
+* 'from' and the end of the block
+*/
+   if (length > max || length < 0)
+   length = max;
+
+   iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+
+   if (!page_has_buffers(page))
+   create_empty_buffers(page, blocksize, 0);
+
+   /* Find the buffer that contains offset */
+   bh = page_buffers(page);
+   pos = blocksize;
+   while (offset >= pos) {
+   bh = bh->b_this_page;
+   iblock++;
+   pos += blocksize;
+   }
+
+   err = 0;
+   if (buffer_freed(bh)) {
+   BUFFER_TRACE(bh, "freed: skip");
+   goto unlock;
+   }
+
+   if (!buffer_mapped(bh)) {
+   BUFFER_TRACE(bh, "unmapped");
+   ext4_get_block(inode, iblock, bh, 0);
+   /* unmapped? It's a hole - nothing to do */
+   if (!buffer_mapped(bh)) {
+   BUFFER_TRACE(bh, "still unmapped");
+   goto unlock;
+   }
+   }
+
+   /* Ok, it's mapped. Make sure it's up-to-date */
+   if (PageUptodate(page))
+   set_buffer_uptodate(bh);
+
+   if (!buffer_uptodate(bh)) {
+   err = -EIO;
+   ll_rw_block(READ, 1, bh);
+   wait_on_buffer(bh);
+   /* Uhhuh. Read error. Complain and punt. */
+   if (!buffer_uptodate(bh))
+   goto unlock

[PATCH v3 12/18] Revert "ext4: fix fsx truncate failure"

2013-04-09 Thread Lukas Czerner

This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.

This commit reintroduces the use of ext4_block_truncate_page() in ext4
truncate operation instead of ext4_discard_partial_page_buffers().

The statement in the commit description that the truncate operation only
zeroes the block-unaligned portion of the last page is not exactly right,
since truncate_pagecache_range() also zeroes and invalidates the unaligned
portion of the page. There is therefore no need to zero and unmap it once
more, and ext4_block_truncate_page() was doing the right job. We still
need to update the buffer head containing the last block, which is
exactly what ext4_block_truncate_page() does.

Moreover the problem described in the commit is fixed more properly with
commit

15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

This was tested on ppc64 machine with block size of 1024 bytes without
any problems.

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/inode.c |   11 ++-
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5729d21..d58e13c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3920,7 +3920,6 @@ void ext4_truncate(struct inode *inode)
unsigned int credits;
handle_t *handle;
struct address_space *mapping = inode->i_mapping;
-   loff_t page_len;
 
/*
 * There is a possibility that we're either freeing the inode
@@ -3964,14 +3963,8 @@ void ext4_truncate(struct inode *inode)
return;
}
 
-   if (inode->i_size % PAGE_CACHE_SIZE != 0) {
-   page_len = PAGE_CACHE_SIZE -
-   (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
-   if (ext4_discard_partial_page_buffers(handle,
-   mapping, inode->i_size, page_len, 0))
-   goto out_stop;
-   }
+   if (inode->i_size & (inode->i_sb->s_blocksize - 1))
+   ext4_block_truncate_page(handle, mapping, inode->i_size);
 
/*
 * We add the inode to the orphan list, so that if this
-- 
1.7.7.6


[PATCH v3 09/18] reiserfs: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in reiserfs_invalidatepage().

Signed-off-by: Lukas Czerner lczer...@redhat.com
Cc: reiserfs-de...@vger.kernel.org
---
 fs/reiserfs/inode.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 808e02e..e963164 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2975,11 +2975,13 @@ static void reiserfs_invalidatepage(struct page *page, 
unsigned int offset,
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
unsigned int curr_off = 0;
+   unsigned int stop = offset + length;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int ret = 1;
 
BUG_ON(!PageLocked(page));
 
-   if (offset == 0)
+   if (!partial_page)
ClearPageChecked(page);
 
if (!page_has_buffers(page))
@@ -2991,6 +2993,9 @@ static void reiserfs_invalidatepage(struct page *page, 
unsigned int offset,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   goto out;
+
/*
 * is this block fully invalidated?
 */
@@ -3009,7 +3014,7 @@ static void reiserfs_invalidatepage(struct page *page, 
unsigned int offset,
 * The get_block cached value has been unconditionally invalidated,
 * so real IO is not possible anymore.
 */
-   if (!offset && ret) {
+   if (!partial_page && ret) {
ret = try_to_release_page(page, 0);
/* maybe should BUG_ON(!ret); - neilb */
}
-- 
1.7.7.6


[PATCH v3 06/18] ocfs2: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in ocfs2_invalidatepage().

Signed-off-by: Lukas Czerner lczer...@redhat.com
Cc: Joel Becker jl...@evilplan.org
---
 fs/ocfs2/aops.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7c47755..79736a2 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,8 +608,7 @@ static void ocfs2_invalidatepage(struct page *page, 
unsigned int offset,
 {
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
 
-   jbd2_journal_invalidatepage(journal, page, offset,
-   PAGE_CACHE_SIZE - offset);
+   jbd2_journal_invalidatepage(journal, page, offset, length);
 }
 
 static int ocfs2_releasepage(struct page *page, gfp_t wait)
-- 
1.7.7.6


[PATCH v3 05/18] xfs: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in xfs_vm_invalidatepage().

Signed-off-by: Lukas Czerner lczer...@redhat.com
Cc: x...@oss.sgi.com
---
 fs/xfs/xfs_aops.c  |5 +++--
 fs/xfs/xfs_trace.h |   41 -
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e426796..e8018d3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -826,8 +826,9 @@ xfs_vm_invalidatepage(
unsigned int            offset,
unsigned int            length)
 {
-   trace_xfs_invalidatepage(page->mapping->host, page, offset);
-   block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+   trace_xfs_invalidatepage(page->mapping->host, page, offset,
+                            length);
+   block_invalidatepage(page, offset, length);
 }
 
 /*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 16a8129..91d6434 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -991,7 +991,46 @@ DEFINE_EVENT(xfs_page_class, name, \
TP_ARGS(inode, page, off))
 DEFINE_PAGE_EVENT(xfs_writepage);
 DEFINE_PAGE_EVENT(xfs_releasepage);
-DEFINE_PAGE_EVENT(xfs_invalidatepage);
+
+TRACE_EVENT(xfs_invalidatepage,
+   TP_PROTO(struct inode *inode, struct page *page, unsigned int off,
+unsigned int len),
+   TP_ARGS(inode, page, off, len),
+   TP_STRUCT__entry(
+   __field(dev_t, dev)
+   __field(xfs_ino_t, ino)
+   __field(pgoff_t, pgoff)
+   __field(loff_t, size)
+   __field(unsigned int, offset)
+   __field(unsigned int, length)
+   __field(int, delalloc)
+   __field(int, unwritten)
+   ),
+   TP_fast_assign(
+   int delalloc = -1, unwritten = -1;
+
+   if (page_has_buffers(page))
+   xfs_count_page_state(page, &delalloc, &unwritten);
+   __entry->dev = inode->i_sb->s_dev;
+   __entry->ino = XFS_I(inode)->i_ino;
+   __entry->pgoff = page_offset(page);
+   __entry->size = i_size_read(inode);
+   __entry->offset = off;
+   __entry->length = len;
+   __entry->delalloc = delalloc;
+   __entry->unwritten = unwritten;
+   ),
+   TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %x "
+ "length %x delalloc %d unwritten %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->pgoff,
+ __entry->size,
+ __entry->offset,
+ __entry->length,
+ __entry->delalloc,
+ __entry->unwritten)
+)
 
 DECLARE_EVENT_CLASS(xfs_imap_class,
TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
-- 
1.7.7.6


[PATCH v3 07/18] ceph: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in ceph_invalidatepage().

Signed-off-by: Lukas Czerner lczer...@redhat.com
Cc: ceph-de...@vger.kernel.org
---
 fs/ceph/addr.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 168a35a..d953afd 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -164,20 +164,20 @@ static void ceph_invalidatepage(struct page *page, 
unsigned int offset,
if (!PageDirty(page))
pr_err("%p invalidatepage %p page not dirty\n", inode, page);
 
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
ci = ceph_inode(inode);
-   if (offset == 0) {
-   dout("%p invalidatepage %p idx %lu full dirty page %u\n",
-inode, page, page->index, offset);
+   if (offset == 0 && length == PAGE_CACHE_SIZE) {
+   dout("%p invalidatepage %p idx %lu full dirty page\n",
+inode, page, page->index);
ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
ceph_put_snap_context(snapc);
page->private = 0;
ClearPagePrivate(page);
} else {
-   dout("%p invalidatepage %p idx %lu partial dirty page\n",
-inode, page, page->index);
+   dout("%p invalidatepage %p idx %lu partial dirty page %u(%u)\n",
+inode, page, page->index, offset, length);
}
 }
 
-- 
1.7.7.6


[PATCH v3 04/18] jbd: change journal_invalidatepage() to accept length

2013-04-09 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can make
use of it in journal_invalidatepage() and all of its users in the ext3 file
system. Also update the ext3 trace point to print out the length argument.
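
As a rough userspace sketch of the two checks this patch relies on (the
"whole page" test and the sanity check on offset + length), something
like the following could be used. The names PAGE_SIZE_SKETCH,
covers_full_page() and range_is_valid() are invented for this example,
and the 4096-byte page size is an assumption.

```c
#include <stdbool.h>

/* Assumed page size for this sketch; the kernel uses PAGE_CACHE_SIZE. */
#define PAGE_SIZE_SKETCH 4096u

/* True when [offset, offset + length) covers the entire page. */
static bool covers_full_page(unsigned int offset, unsigned int length)
{
	return offset == 0 && length == PAGE_SIZE_SKETCH;
}

/*
 * True when the range ends within the page and offset + length did not
 * wrap around. This is the inverse of the condition in
 * BUG_ON(stop > PAGE_CACHE_SIZE || stop < length).
 */
static bool range_is_valid(unsigned int offset, unsigned int length)
{
	unsigned int stop = offset + length;

	return stop <= PAGE_SIZE_SKETCH && stop >= length;
}
```

Only a range for which covers_full_page() holds is treated as a full
truncate of the page; anything else is a partial invalidation.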

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext3/inode.c |6 +++---
 fs/jbd/transaction.c|   19 ++-
 include/linux/jbd.h |2 +-
 include/trace/events/ext3.h |   12 +++-
 4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 349d4ce..b12936b 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1828,15 +1828,15 @@ static void ext3_invalidatepage(struct page *page, 
unsigned int offset,
 {
journal_t *journal = EXT3_JOURNAL(page->mapping->host);
 
-   trace_ext3_invalidatepage(page, offset);
+   trace_ext3_invalidatepage(page, offset, length);
 
/*
 * If it's a full truncate we just forget about the pending dirtying
 */
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
-   journal_invalidatepage(journal, page, offset);
+   journal_invalidatepage(journal, page, offset, length);
 }
 
 static int ext3_releasepage(struct page *page, gfp_t wait)
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 071d690..a1fef89 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -2020,16 +2020,20 @@ zap_buffer_unlocked:
  * void journal_invalidatepage() - invalidate a journal page
  * @journal: journal to use for flush
  * @page:page to flush
- * @offset:  length of page to invalidate.
+ * @offset:  offset of the range to invalidate
+ * @length:  length of the range to invalidate
  *
- * Reap page buffers containing data after offset in page.
+ * Reap page buffers containing data in specified range in page.
  */
 void journal_invalidatepage(journal_t *journal,
  struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
 {
struct buffer_head *head, *bh, *next;
+   unsigned int stop = offset + length;
unsigned int curr_off = 0;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
 
if (!PageLocked(page))
@@ -2037,6 +2041,8 @@ void journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
 * data lists (especially for journaled data mode), so be
 * cautious in our locking. */
@@ -2046,11 +2052,14 @@ void journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   return;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
may_free &= journal_unmap_buffer(journal, bh,
-offset > 0);
+partial_page);
unlock_buffer(bh);
}
curr_off = next_off;
@@ -2058,7 +2067,7 @@ void journal_invalidatepage(journal_t *journal,
 
} while (bh != head);
 
-   if (!offset) {
+   if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index c8f3297..d02e16c 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -840,7 +840,7 @@ extern void  journal_release_buffer (handle_t *, struct 
buffer_head *);
 extern int  journal_forget (handle_t *, struct buffer_head *);
 extern void journal_sync_buffer (struct buffer_head *);
 extern void journal_invalidatepage(journal_t *,
-   struct page *, unsigned long);
+   struct page *, unsigned int, unsigned int);
 extern int  journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
 extern int  journal_stop(handle_t *);
 extern int  journal_flush (journal_t *);
diff --git a/include/trace/events/ext3.h b/include/trace/events/ext3.h
index 15d11a3..6797b9d 100644
--- a/include/trace/events/ext3.h
+++ b/include/trace/events/ext3.h
@@ -290,13 +290,14 @@ DEFINE_EVENT(ext3__page_op, ext3_releasepage,
 );
 
 TRACE_EVENT(ext3_invalidatepage,
-   TP_PROTO(struct page *page, unsigned long offset),
+   TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
 
-   TP_ARGS(page, offset),
+   TP_ARGS(page, offset, length),
 
TP_STRUCT__entry(
__field(pgoff_t

[PATCH v3 03/18] ext4: use ->invalidatepage() length argument

2013-04-09 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in all ext4 invalidatepage routines.

Signed-off-by: Lukas Czerner lczer...@redhat.com
---
 fs/ext4/inode.c |   30 +++---
 include/trace/events/ext4.h |   22 --
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 69595f5..f80e0c3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1411,21 +1411,28 @@ static void ext4_da_release_space(struct inode *inode, 
int to_free)
 }
 
 static void ext4_da_page_release_reservation(struct page *page,
-unsigned long offset)
+unsigned int offset,
+unsigned int length)
 {
int to_release = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   unsigned int stop = offset + length;
int num_clusters;
ext4_fsblk_t lblk;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;
 
+   if (next_off > stop)
+   break;
+
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
@@ -2825,7 +2832,7 @@ static void ext4_da_invalidatepage(struct page *page, 
unsigned int offset,
if (!page_has_buffers(page))
goto out;
 
-   ext4_da_page_release_reservation(page, offset);
+   ext4_da_page_release_reservation(page, offset, length);
 
 out:
ext4_invalidatepage(page, offset, length);
@@ -2979,29 +2986,29 @@ ext4_readpages(struct file *file, struct address_space 
*mapping,
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
 {
-   trace_ext4_invalidatepage(page, offset);
+   trace_ext4_invalidatepage(page, offset, length);
 
/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));
 
-   block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+   block_invalidatepage(page, offset, length);
 }
 
 static int __ext4_journalled_invalidatepage(struct page *page,
-   unsigned long offset)
+   unsigned int offset,
+   unsigned int length)
 {
journal_t *journal = EXT4_JOURNAL(page->mapping->host);
 
-   trace_ext4_journalled_invalidatepage(page, offset);
+   trace_ext4_journalled_invalidatepage(page, offset, length);
 
/*
 * If it's a full truncate we just forget about the pending dirtying
 */
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
-   return jbd2_journal_invalidatepage(journal, page, offset,
-  PAGE_CACHE_SIZE - offset);
+   return jbd2_journal_invalidatepage(journal, page, offset, length);
 }
 
 /* Wrapper for aops... */
@@ -3009,7 +3016,7 @@ static void ext4_journalled_invalidatepage(struct page 
*page,
   unsigned int offset,
   unsigned int length)
 {
-   WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
+   WARN_ON(__ext4_journalled_invalidatepage(page, offset, length) < 0);
 }
 
 static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -4607,7 +4614,8 @@ static void ext4_wait_for_tail_page_commit(struct inode 
*inode)
  inode->i_size >> PAGE_CACHE_SHIFT);
if (!page)
return;
-   ret = __ext4_journalled_invalidatepage(page, offset);
+   ret = __ext4_journalled_invalidatepage(page, offset,
+   PAGE_CACHE_SIZE - offset);
unlock_page(page);
page_cache_release(page);
if (ret != -EBUSY)
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 58459b7..60b329a 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -444,16 +444,16 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
 );
 
 DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
-   TP_PROTO(struct page *page, unsigned long offset),
+   TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
 
-   TP_ARGS(page, offset),
+   TP_ARGS(page, offset, length),
 
TP_STRUCT__entry(
__field(dev_t,  dev

[PATCH v3 01/18] mm: change invalidatepage prototype to accept length

2013-04-09 Thread Lukas Czerner

Currently there is no way to truncate a partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the existing functionality was enough for the file system
truncate operation to work properly. However, more file systems now
support the punch hole feature, and it can benefit from mm supporting
truncating a page just up to a certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept the range to be invalidated and updates all the
instances of it.

We also change block_invalidatepage() in the same way and actually
make use of the new length argument by implementing range invalidation.

Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way. Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.
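
To make the new (offset, length) semantics concrete, here is a minimal userspace sketch of range invalidation over the blocks of one page. It is only a model under stated assumptions: `MODEL_PAGE_SIZE`, `model_invalidate_range()` and `model_invalidate_to_end()` are hypothetical names, the page is represented by a flag array with one entry per block, and nothing here is the actual block_invalidatepage() code.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: stands in for PAGE_CACHE_SIZE */

/*
 * Mark every block of the page that lies wholly inside
 * [offset, offset + length) and return how many were marked.
 * `discarded` needs MODEL_PAGE_SIZE / blocksize entries.
 */
static unsigned int model_invalidate_range(unsigned char *discarded,
                                           unsigned int blocksize,
                                           unsigned int offset,
                                           unsigned int length)
{
    unsigned int stop = offset + length;
    unsigned int curr_off = 0;
    unsigned int i = 0, count = 0;

    /* The range must stay inside the page and must not wrap around. */
    assert(stop <= MODEL_PAGE_SIZE && stop >= length);

    while (curr_off < MODEL_PAGE_SIZE) {
        unsigned int next_off = curr_off + blocksize;

        if (next_off > stop)        /* this block extends past the range */
            break;
        if (offset <= curr_off) {   /* this block starts inside the range */
            discarded[i] = 1;
            count++;
        }
        curr_off = next_off;
        i++;
    }
    return count;
}

/* The old single-argument behaviour: invalidate from offset to end of page. */
static unsigned int model_invalidate_to_end(unsigned char *discarded,
                                            unsigned int blocksize,
                                            unsigned int offset)
{
    return model_invalidate_range(discarded, blocksize, offset,
                                  MODEL_PAGE_SIZE - offset);
}
```

With 1024 byte blocks, a call with offset 1024 and length 2048 marks only the two middle blocks, which the old offset-only interface could not express. Existing callers keep their behaviour by passing PAGE_CACHE_SIZE - offset as the length.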

Signed-off-by: Lukas Czerner lczer...@redhat.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Hugh Dickins hu...@google.com
---
 Documentation/filesystems/Locking |    6 +++---
 Documentation/filesystems/vfs.txt |   20 ++--
 fs/9p/vfs_addr.c                  |    5 +++--
 fs/afs/file.c                     |   10 ++
 fs/btrfs/disk-io.c                |    3 ++-
 fs/btrfs/extent_io.c              |    2 +-
 fs/btrfs/inode.c                  |    3 ++-
 fs/buffer.c                       |   21 ++---
 fs/ceph/addr.c                    |    5 +++--
 fs/cifs/file.c                    |    5 +++--
 fs/exofs/inode.c                  |    6 --
 fs/ext3/inode.c                   |    3 ++-
 fs/ext4/inode.c                   |   18 +++---
 fs/f2fs/data.c                    |    3 ++-
 fs/f2fs/node.c                    |    3 ++-
 fs/gfs2/aops.c                    |    8 +---
 fs/jfs/jfs_metapage.c             |    5 +++--
 fs/logfs/file.c                   |    3 ++-
 fs/logfs/segment.c                |    3 ++-
 fs/nfs/file.c                     |    8 +---
 fs/ntfs/aops.c                    |    2 +-
 fs/ocfs2/aops.c                   |    3 ++-
 fs/reiserfs/inode.c               |    3 ++-
 fs/ubifs/file.c                   |    5 +++--
 fs/xfs/xfs_aops.c                 |    7 ---
 include/linux/buffer_head.h       |    3 ++-
 include/linux/fs.h                |    2 +-
 include/linux/mm.h                |    3 ++-
 mm/readahead.c                    |    2 +-
 mm/truncate.c                     |   15 +--
 30 files changed, 116 insertions(+), 69 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 0706d32..cbbac3f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -189,7 +189,7 @@ prototypes:
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
-   int (*invalidatepage) (struct page *, unsigned long);
+   void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -310,8 +310,8 @@ filesystems and by the swapper. The latter will eventually 
go away.  Please,
 keep it that way and don't breed new callers.
 
->invalidatepage() is called when the filesystem must attempt to drop
-some or all of the buffers from the page when it is being truncated.  It
-returns zero on success.  If ->invalidatepage is zero, the kernel uses
+some or all of the buffers from the page when it is being truncated. It
+returns zero on success. If ->invalidatepage is zero, the kernel uses
 block_invalidatepage() instead.
 
->releasepage() is called when the kernel is about to try to drop the
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index bc4b06b..e445b95 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -549,7 +549,7 @@ struct address_space_operations
 ---
 
 This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.22, the following members are defined:
+your filesystem. The following members are defined:
 
 struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -566,7 +566,7 @@ struct address_space_operations {
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space

[PATCH v2 03/18] ext4: use ->invalidatepage() length argument

2013-02-05 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in all ext4 invalidatepage routines.
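
The range check that this patch adds to the ext4 helpers, BUG_ON(stop > PAGE_CACHE_SIZE || stop < length), guards against two different mistakes. The following userspace sketch shows why both halves are needed; `MODEL_PAGE_SIZE` and `model_range_ok()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: stands in for PAGE_CACHE_SIZE */

/*
 * Return true when [offset, offset + length) is a sane range within a page.
 * Unsigned addition wraps around on overflow, and a wrapped sum is always
 * smaller than either operand, so `stop < length` detects the wrap.
 */
static bool model_range_ok(unsigned int offset, unsigned int length)
{
    unsigned int stop = offset + length;

    if (stop > MODEL_PAGE_SIZE)   /* range runs past the end of the page */
        return false;
    if (stop < length)            /* offset + length wrapped around */
        return false;
    return true;
}
```

A range such as (4095, 2) fails the first test, while (UINT_MAX, 2) wraps to stop == 1, passes the first test and is only caught by the second.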

Signed-off-by: Lukas Czerner 
---
 fs/ext4/inode.c |   47 ++
 include/trace/events/ext4.h |   22 +++-
 2 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a3be92c..eb61fb7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1289,21 +1289,28 @@ static void ext4_da_release_space(struct inode *inode, 
int to_free)
 }
 
 static void ext4_da_page_release_reservation(struct page *page,
-unsigned long offset)
+unsigned int offset,
+unsigned int length)
 {
int to_release = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   unsigned int stop = offset + length;
int num_clusters;
ext4_fsblk_t lblk;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;
 
+   if (next_off > stop)
+   break;
+
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
@@ -2709,7 +2716,7 @@ static void ext4_da_invalidatepage(struct page *page, 
unsigned int offset,
if (!page_has_buffers(page))
goto out;
 
-   ext4_da_page_release_reservation(page, offset);
+   ext4_da_page_release_reservation(page, offset, length);
 
 out:
ext4_invalidatepage(page, offset, length);
@@ -2860,22 +2867,33 @@ ext4_readpages(struct file *file, struct address_space 
*mapping,
return mpage_readpages(mapping, pages, nr_pages, ext4_get_block);
 }
 
-static void ext4_invalidatepage_free_endio(struct page *page, unsigned long 
offset)
+static void ext4_invalidatepage_free_endio(struct page *page,
+  unsigned int offset,
+  unsigned int length)
 {
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
+   unsigned int stop = offset + length;
 
if (!page_has_buffers(page))
return;
+
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = bh = page_buffers(page);
do {
+   unsigned int next_off = curr_off + bh->b_size;
+
+   if (next_off > stop)
+   return;
+
if (offset <= curr_off && test_clear_buffer_uninit(bh)
&& bh->b_private) {
ext4_free_io_end(bh->b_private);
bh->b_private = NULL;
bh->b_end_io = NULL;
}
-   curr_off = curr_off + bh->b_size;
+   curr_off = next_off;
bh = bh->b_this_page;
} while (bh != head);
 }
@@ -2883,35 +2901,35 @@ static void ext4_invalidatepage_free_endio(struct page 
*page, unsigned long offs
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
 {
-   trace_ext4_invalidatepage(page, offset);
+   trace_ext4_invalidatepage(page, offset, length);
 
/*
 * free any io_end structure allocated for buffers to be discarded
 */
if (ext4_should_dioread_nolock(page->mapping->host))
-   ext4_invalidatepage_free_endio(page, offset);
+   ext4_invalidatepage_free_endio(page, offset, length);
 
/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));
 
-   block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+   block_invalidatepage(page, offset, length);
 }
 
 static int __ext4_journalled_invalidatepage(struct page *page,
-   unsigned long offset)
+   unsigned int offset,
+   unsigned int length)
 {
journal_t *journal = EXT4_JOURNAL(page->mapping->host);
 
-   trace_ext4_journalled_invalidatepage(page, offset);
+   trace_ext4_journalled_invalidatepage(page, offset, length);
 
/*
 * If it's a full truncate we just forget about the pending dirtying
 */
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
-   return jbd2_journal_invalidate

[PATCH v2 01/18] mm: change invalidatepage prototype to accept length

2013-02-05 Thread Lukas Czerner

Currently there is no way to truncate a partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the existing functionality was enough for the file system
truncate operation to work properly. However, more file systems now
support the punch hole feature, and it can benefit from mm supporting
truncating a page just up to a certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept the range to be invalidated and updates all the
instances of it.

We also change block_invalidatepage() in the same way and actually
make use of the new length argument by implementing range invalidation.

Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way. Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.

Signed-off-by: Lukas Czerner 
Cc: Andrew Morton 
Cc: Hugh Dickins 
---
 Documentation/filesystems/Locking |    6 +++---
 Documentation/filesystems/vfs.txt |   20 ++--
 fs/9p/vfs_addr.c                  |    5 +++--
 fs/afs/file.c                     |   10 ++
 fs/btrfs/disk-io.c                |    3 ++-
 fs/btrfs/extent_io.c              |    2 +-
 fs/btrfs/inode.c                  |    3 ++-
 fs/buffer.c                       |   21 ++---
 fs/ceph/addr.c                    |    5 +++--
 fs/cifs/file.c                    |    5 +++--
 fs/exofs/inode.c                  |    6 --
 fs/ext3/inode.c                   |    3 ++-
 fs/ext4/inode.c                   |   18 +++---
 fs/f2fs/data.c                    |    3 ++-
 fs/f2fs/node.c                    |    3 ++-
 fs/gfs2/aops.c                    |    8 +---
 fs/jfs/jfs_metapage.c             |    5 +++--
 fs/logfs/file.c                   |    3 ++-
 fs/logfs/segment.c                |    3 ++-
 fs/nfs/file.c                     |    8 +---
 fs/ntfs/aops.c                    |    2 +-
 fs/ocfs2/aops.c                   |    3 ++-
 fs/reiserfs/inode.c               |    3 ++-
 fs/ubifs/file.c                   |    5 +++--
 fs/xfs/xfs_aops.c                 |    7 ---
 include/linux/buffer_head.h       |    3 ++-
 include/linux/fs.h                |    2 +-
 include/linux/mm.h                |    3 ++-
 mm/readahead.c                    |    2 +-
 mm/truncate.c                     |   15 +--
 30 files changed, 116 insertions(+), 69 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index f48e0c6..8043a87 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -187,7 +187,7 @@ prototypes:
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
-   int (*invalidatepage) (struct page *, unsigned long);
+   void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -308,8 +308,8 @@ filesystems and by the swapper. The latter will eventually 
go away.  Please,
 keep it that way and don't breed new callers.
 
->invalidatepage() is called when the filesystem must attempt to drop
-some or all of the buffers from the page when it is being truncated.  It
-returns zero on success.  If ->invalidatepage is zero, the kernel uses
+some or all of the buffers from the page when it is being truncated. It
+returns zero on success. If ->invalidatepage is zero, the kernel uses
 block_invalidatepage() instead.
 
->releasepage() is called when the kernel is about to try to drop the
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index e3869098..279fb73 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -549,7 +549,7 @@ struct address_space_operations
 ---
 
 This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.22, the following members are defined:
+your filesystem. The following members are defined:
 
 struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -566,7 +566,7 @@ struct address_space_operations {
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
-   int (*invalidatep

[PATCH v2 04/18] jbd: change journal_invalidatepage() to accept length

2013-02-05 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in journal_invalidatepage() and all the users in ext3 file
system. Also update ext3 trace point to print out length argument.
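
The key behavioural point in this patch is how "full page" is decided once a length is available. A small userspace sketch, with hypothetical names `MODEL_PAGE_SIZE`, `model_old_is_full()` and `model_is_partial()`, compares the old offset-only test with the new one:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: stands in for PAGE_CACHE_SIZE */

/* Old test: any invalidation starting at offset 0 was treated as full. */
static bool model_old_is_full(unsigned int offset)
{
    return offset == 0;
}

/* New test: the page is partial unless the range covers all of it. */
static bool model_is_partial(unsigned int offset, unsigned int length)
{
    return offset != 0 || length < MODEL_PAGE_SIZE;
}
```

For the range (0, 1024) the old test would report a full page, so the buffers could be released even though most of the page is still in use. The new test reports a partial page for that range and only treats (0, PAGE_CACHE_SIZE) as full.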

Signed-off-by: Lukas Czerner 
---
 fs/ext3/inode.c             |    6 +++---
 fs/jbd/transaction.c        |   19 ++-
 include/linux/jbd.h         |    2 +-
 include/trace/events/ext3.h |   12 +++-
 4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 0b8d382..88404fd 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1824,15 +1824,15 @@ static void ext3_invalidatepage(struct page *page, 
unsigned int offset,
 {
journal_t *journal = EXT3_JOURNAL(page->mapping->host);
 
-   trace_ext3_invalidatepage(page, offset);
+   trace_ext3_invalidatepage(page, offset, length);
 
/*
 * If it's a full truncate we just forget about the pending dirtying
 */
-   if (offset == 0)
+   if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
 
-   journal_invalidatepage(journal, page, offset);
+   journal_invalidatepage(journal, page, offset, length);
 }
 
 static int ext3_releasepage(struct page *page, gfp_t wait)
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 071d690..a1fef89 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -2020,16 +2020,20 @@ zap_buffer_unlocked:
  * void journal_invalidatepage() - invalidate a journal page
  * @journal: journal to use for flush
  * @page:page to flush
- * @offset:  length of page to invalidate.
+ * @offset:  offset of the range to invalidate
+ * @length:  length of the range to invalidate
  *
- * Reap page buffers containing data after offset in page.
+ * Reap page buffers containing data in specified range in page.
  */
 void journal_invalidatepage(journal_t *journal,
  struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
 {
struct buffer_head *head, *bh, *next;
+   unsigned int stop = offset + length;
unsigned int curr_off = 0;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
 
if (!PageLocked(page))
@@ -2037,6 +2041,8 @@ void journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return;
 
+   BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
 * data lists (especially for journaled data mode), so be
 * cautious in our locking. */
@@ -2046,11 +2052,14 @@ void journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   return;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
may_free &= journal_unmap_buffer(journal, bh,
-offset > 0);
+partial_page);
unlock_buffer(bh);
}
curr_off = next_off;
@@ -2058,7 +2067,7 @@ void journal_invalidatepage(journal_t *journal,
 
} while (bh != head);
 
-   if (!offset) {
+   if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index c8f3297..d02e16c 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -840,7 +840,7 @@ extern void  journal_release_buffer (handle_t *, struct 
buffer_head *);
 extern int  journal_forget (handle_t *, struct buffer_head *);
 extern void journal_sync_buffer (struct buffer_head *);
 extern void journal_invalidatepage(journal_t *,
-   struct page *, unsigned long);
+   struct page *, unsigned int, unsigned int);
 extern int  journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
 extern int  journal_stop(handle_t *);
 extern int  journal_flush (journal_t *);
diff --git a/include/trace/events/ext3.h b/include/trace/events/ext3.h
index 15d11a3..6797b9d 100644
--- a/include/trace/events/ext3.h
+++ b/include/trace/events/ext3.h
@@ -290,13 +290,14 @@ DEFINE_EVENT(ext3__page_op, ext3_releasepage,
 );
 
 TRACE_EVENT(ext3_invalidatepage,
-   TP_PROTO(struct page *page, unsigned long offset),
+   TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
 
-   TP_ARGS(page, offset),
+   TP_ARGS(page, offset, length),
 
TP_S

[PATCH v2 05/18] xfs: use ->invalidatepage() length argument

2013-02-05 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in xfs_vm_invalidatepage()

Signed-off-by: Lukas Czerner 
Cc: x...@oss.sgi.com
---
 fs/xfs/xfs_aops.c  |    5 +++--
 fs/xfs/xfs_trace.h |   41 -
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e426796..e8018d3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -826,8 +826,9 @@ xfs_vm_invalidatepage(
unsigned int            offset,
unsigned int            length)
 {
-   trace_xfs_invalidatepage(page->mapping->host, page, offset);
-   block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+   trace_xfs_invalidatepage(page->mapping->host, page, offset,
+length);
+   block_invalidatepage(page, offset, length);
 }
 
 /*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 16a8129..91d6434 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -991,7 +991,46 @@ DEFINE_EVENT(xfs_page_class, name, \
TP_ARGS(inode, page, off))
 DEFINE_PAGE_EVENT(xfs_writepage);
 DEFINE_PAGE_EVENT(xfs_releasepage);
-DEFINE_PAGE_EVENT(xfs_invalidatepage);
+
+TRACE_EVENT(xfs_invalidatepage,
+   TP_PROTO(struct inode *inode, struct page *page, unsigned int off,
+unsigned int len),
+   TP_ARGS(inode, page, off, len),
+   TP_STRUCT__entry(
+   __field(dev_t, dev)
+   __field(xfs_ino_t, ino)
+   __field(pgoff_t, pgoff)
+   __field(loff_t, size)
+   __field(unsigned int, offset)
+   __field(unsigned int, length)
+   __field(int, delalloc)
+   __field(int, unwritten)
+   ),
+   TP_fast_assign(
+   int delalloc = -1, unwritten = -1;
+
+   if (page_has_buffers(page))
+   xfs_count_page_state(page, &delalloc, &unwritten);
+   __entry->dev = inode->i_sb->s_dev;
+   __entry->ino = XFS_I(inode)->i_ino;
+   __entry->pgoff = page_offset(page);
+   __entry->size = i_size_read(inode);
+   __entry->offset = off;
+   __entry->length = len;
+   __entry->delalloc = delalloc;
+   __entry->unwritten = unwritten;
+   ),
+   TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %x "
+ "length %x delalloc %d unwritten %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->pgoff,
+ __entry->size,
+ __entry->offset,
+ __entry->length,
+ __entry->delalloc,
+ __entry->unwritten)
+)
 
 DECLARE_EVENT_CLASS(xfs_imap_class,
TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2 11/18] Revert "ext4: remove no longer used functions in inode.c"

2013-02-05 Thread Lukas Czerner

This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.

This commit reintroduces functions ext4_block_truncate_page() and
ext4_block_zero_page_range() which were previously removed in favour
of ext4_discard_partial_page_buffers().

In future commits we want to reintroduce those functions and remove
ext4_discard_partial_page_buffers() since it is duplicating some code
and also partially duplicating work of truncate_pagecache_range(),
moreover the old implementation was much clearer.
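
The length arithmetic in the two reintroduced functions can be checked in isolation. The sketch below is a userspace model only: `MODEL_PAGE_SIZE`, `model_tail_length()` and `model_clamp_zero_length()` are hypothetical names, and it assumes a power-of-two block size no larger than the page size.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: stands in for PAGE_CACHE_SIZE */

/*
 * Number of bytes from file offset `from` to the end of the block that
 * contains it (the length ext4_block_truncate_page() passes down).
 */
static unsigned int model_tail_length(int64_t from, unsigned int blocksize)
{
    unsigned int offset = (unsigned int)(from & (MODEL_PAGE_SIZE - 1));

    return blocksize - (offset & (blocksize - 1));
}

/*
 * Clamp a requested zeroing length so it never crosses the end of the
 * block containing `from`; negative lengths are treated as "to the end".
 */
static unsigned int model_clamp_zero_length(int64_t from, int64_t length,
                                            unsigned int blocksize)
{
    unsigned int max = model_tail_length(from, blocksize);

    if (length > max || length < 0)
        return max;
    return (unsigned int)length;
}
```

Note that a block-aligned `from` yields a full block length rather than zero, so a caller that only wants to zero a partial tail has to test for alignment first.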

Signed-off-by: Lukas Czerner 
---
 fs/ext4/ext4.h  |    4 ++
 fs/ext4/inode.c |  120 +++
 2 files changed, 124 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8462eb3..4246a55 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2086,6 +2086,10 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
+extern int ext4_block_truncate_page(handle_t *handle,
+   struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+   struct address_space *mapping, loff_t from, loff_t length);
 extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index eb61fb7..5ccf556 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3552,6 +3552,126 @@ next:
return err;
 }
 
+/*
+ * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This required during truncate. We need to physically zero the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ */
+int ext4_block_truncate_page(handle_t *handle,
+   struct address_space *mapping, loff_t from)
+{
+   unsigned offset = from & (PAGE_CACHE_SIZE-1);
+   unsigned length;
+   unsigned blocksize;
+   struct inode *inode = mapping->host;
+
+   blocksize = inode->i_sb->s_blocksize;
+   length = blocksize - (offset & (blocksize - 1));
+
+   return ext4_block_zero_page_range(handle, mapping, from, length);
+}
+
+/*
+ * ext4_block_zero_page_range() zeros out a mapping of length 'length'
+ * starting from file offset 'from'.  The range to be zero'd must
+ * be contained with in one block.  If the specified range exceeds
+ * the end of the block it will be shortened to end of the block
+ * that cooresponds to 'from'
+ */
+int ext4_block_zero_page_range(handle_t *handle,
+   struct address_space *mapping, loff_t from, loff_t length)
+{
+   ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
+   unsigned offset = from & (PAGE_CACHE_SIZE-1);
+   unsigned blocksize, max, pos;
+   ext4_lblk_t iblock;
+   struct inode *inode = mapping->host;
+   struct buffer_head *bh;
+   struct page *page;
+   int err = 0;
+
+   page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
+  mapping_gfp_mask(mapping) & ~__GFP_FS);
+   if (!page)
+   return -ENOMEM;
+
+   blocksize = inode->i_sb->s_blocksize;
+   max = blocksize - (offset & (blocksize - 1));
+
+   /*
+* correct length if it does not fall between
+* 'from' and the end of the block
+*/
+   if (length > max || length < 0)
+   length = max;
+
+   iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+
+   if (!page_has_buffers(page))
+   create_empty_buffers(page, blocksize, 0);
+
+   /* Find the buffer that contains "offset" */
+   bh = page_buffers(page);
+   pos = blocksize;
+   while (offset >= pos) {
+   bh = bh->b_this_page;
+   iblock++;
+   pos += blocksize;
+   }
+
+   err = 0;
+   if (buffer_freed(bh)) {
+   BUFFER_TRACE(bh, "freed: skip");
+   goto unlock;
+   }
+
+   if (!buffer_mapped(bh)) {
+   BUFFER_TRACE(bh, "unmapped");
+   ext4_get_block(inode, iblock, bh, 0);
+   /* unmapped? It's a hole - nothing to do */
+   if (!buffer_mapped(bh)) {
+   BUFFER_TRACE(bh, "still unmapped");
+   goto unlock;
+   }
+   }
+
+   /* Ok, it's mapped. Make sure it's up-to-date */
+   if (PageUptodate(page))
+   set_buffer_uptodate(bh);
+
+   if (!buffer_uptodate(bh)) {
+   err = -EIO;
+   ll_rw_block(READ, 1, &bh);
+   wait_on_buffer(bh);
+   /* Uhhu

[PATCH v2 09/18] reiserfs: use ->invalidatepage() length argument

2013-02-05 Thread Lukas Czerner

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in reiserfs_invalidatepage()

Signed-off-by: Lukas Czerner 
Cc: reiserfs-de...@vger.kernel.org
---
 fs/reiserfs/inode.c |    9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 5ca4fb4..cf949b9 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2975,11 +2975,13 @@ static void reiserfs_invalidatepage(struct page *page, 
unsigned int offset,
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
unsigned int curr_off = 0;
+   unsigned int stop = offset + length;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
int ret = 1;
 
BUG_ON(!PageLocked(page));
 
-   if (offset == 0)
+   if (!partial_page)
ClearPageChecked(page);
 
if (!page_has_buffers(page))
@@ -2991,6 +2993,9 @@ static void reiserfs_invalidatepage(struct page *page, 
unsigned int offset,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
 
+   if (next_off > stop)
+   goto out;
+
/*
 * is this block fully invalidated?
 */
@@ -3009,7 +3014,7 @@ static void reiserfs_invalidatepage(struct page *page, 
unsigned int offset,
 * The get_block cached value has been unconditionally invalidated,
 * so real IO is not possible anymore.
 */
-   if (!offset && ret) {
+   if (!partial_page && ret) {
ret = try_to_release_page(page, 0);
/* maybe should BUG_ON(!ret); - neilb */
}
-- 
1.7.7.6


[PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

2013-02-05 Thread Lukas Czerner

This commit changes truncate_inode_pages_range() so it can handle non
page aligned regions of the truncate. Currently we can hit BUG_ON when
the end of the range is not page aligned, but we can handle unaligned
start of the range.

Being able to handle non page aligned regions of the page can help file
system punch_hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.

In previous commits we've changed ->invalidatepage() prototype to accept
'length' argument to be able to specify range to invalidate. Now we can
use that new ability in truncate_inode_pages_range().
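
As a rough illustration of the arithmetic involved, the sketch below splits an inclusive byte range [lstart, lend] into the page indexes that can be dropped whole and the two in-page offsets that bound the partial pages. It is a userspace model, not the mm code: `MODEL_PAGE_SHIFT`, `struct model_trunc` and `model_split_range()` are hypothetical names and a 4096 byte page is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12                         /* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  (1u << MODEL_PAGE_SHIFT)

struct model_trunc {
    uint64_t start;              /* first fully truncated page, inclusive */
    uint64_t end;                /* last fully truncated page + 1, exclusive */
    unsigned int partial_start;  /* offset in the first page, inclusive */
    unsigned int partial_end;    /* offset in the last page, exclusive */
};

/* lend is inclusive; lend == -1 means "up to end of file". */
static struct model_trunc model_split_range(int64_t lstart, int64_t lend)
{
    struct model_trunc r;

    r.partial_start = (unsigned int)(lstart & (MODEL_PAGE_SIZE - 1));
    r.partial_end = (unsigned int)((lend + 1) & (MODEL_PAGE_SIZE - 1));

    r.start = (uint64_t)(lstart + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
    if (lend == -1)
        r.end = UINT64_MAX;      /* highest possible page index */
    else
        r.end = (uint64_t)(lend + 1) >> MODEL_PAGE_SHIFT;
    return r;
}
```

For lstart = 1000 and lend = 9999 this gives start = 1 and end = 2, so only page 1 is removed whole, while page 0 is zeroed from byte 1000 and page 2 is zeroed up to byte 1808. In this sketch, when both offsets fall inside the same page, start ends up greater than end and only the partial handling applies.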

Signed-off-by: Lukas Czerner 
Cc: Andrew Morton 
Cc: Hugh Dickins 
---
 mm/truncate.c |  104 -
 1 files changed, 73 insertions(+), 31 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index fdba083..e2e8a8a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -52,14 +52,6 @@ void do_invalidatepage(struct page *page, unsigned int 
offset,
(*invalidatepage)(page, offset, length);
 }
 
-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
-   zero_user_segment(page, partial, PAGE_CACHE_SIZE);
-   cleancache_invalidate_page(page->mapping, page);
-   if (page_has_private(page))
-   do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
-}
-
 /*
  * This cancels just the dirty bit on the kernel page itself, it
  * does NOT actually remove dirty bits on any mmap's that may be
@@ -188,11 +180,11 @@ int invalidate_inode_page(struct page *page)
  * truncate_inode_pages_range - truncate range of pages specified by start & 
end byte offsets
  * @mapping: mapping to truncate
  * @lstart: offset from which to truncate
- * @lend: offset to which to truncate
+ * @lend: offset to which to truncate (inclusive)
  *
  * Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
  *
  * Truncate takes two passes - the first pass is nonblocking.  It will not
  * block on page locks and it will not block on writeback.  The second pass
@@ -203,35 +195,58 @@ int invalidate_inode_page(struct page *page)
  * We pass down the cache-hot hint to the page freeing code.  Even if the
  * mapping is large, it is probably the case that the final pages are the most
  * recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that since ->invalidatepage() accepts range to invalidate
+ * truncate_inode_pages_range is able to handle cases where lend + 1 is not
+ * page aligned properly.
  */
 void truncate_inode_pages_range(struct address_space *mapping,
loff_t lstart, loff_t lend)
 {
-   const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
-   const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
-   struct pagevec pvec;
-   pgoff_t index;
-   pgoff_t end;
-   int i;
+   pgoff_t         start;          /* inclusive */
+   pgoff_t         end;            /* exclusive */
+   unsigned int    partial_start;  /* inclusive */
+   unsigned int    partial_end;    /* exclusive */
+   struct pagevec  pvec;
+   pgoff_t         index;
+   int             i;
 
cleancache_invalidate_inode(mapping);
if (mapping->nrpages == 0)
return;
 
-   BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
-   end = (lend >> PAGE_CACHE_SHIFT);
+   /* Offsets within partial pages */
+   partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+   partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
+
+   /*
+* 'start' and 'end' always covers the range of pages to be fully
+* truncated. Partial pages are covered with 'partial_start' at the
+* start of the range and 'partial_end' at the end of the range.
+* Note that 'end' is exclusive while 'lend' is inclusive.
+*/
+   start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+   if (lend == -1)
+   /*
+* lend == -1 indicates end-of-file so we have to set 'end'
+* to the highest possible pgoff_t and since the type is
+* unsigned we're using -1.
+*/
+   end = -1;
+   else
+   end = (lend + 1) >> PAGE_CACHE_SHIFT;
 
pagevec_init(&pvec, 0);
index = start;
-   while (index <= end && pagevec_lookup(&pvec, mapping, index,
-   min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+   while (index < end && pagevec_lookup(&pvec, mapping, index,
+   min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
m

[PATCH v2 12/18] Revert "ext4: fix fsx truncate failure"

2013-02-05 Thread Lukas Czerner

This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.

This commit reintroduces the use of ext4_block_truncate_page() in ext4
truncate operation instead of ext4_discard_partial_page_buffers().

The statement in the commit description that the truncate operation only
zeroes the block unaligned portion of the last page is not exactly right,
since truncate_pagecache_range() also zeroes and invalidates the unaligned
portion of the page. Then there is no need to zero and unmap it once more
and ext4_block_truncate_page() was doing the right job, although we
still need to update the buffer head containing the last block, which is
exactly what ext4_block_truncate_page() is doing.

Moreover the problem described in the commit is fixed more properly with
commit

15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

This was tested on ppc64 machine with block size of 1024 bytes without
any problems.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/extents.c  |   13 ++---
 fs/ext4/indirect.c |   13 ++---
 2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 5ae1674..5ce5a14 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4271,7 +4271,6 @@ void ext4_ext_truncate(struct inode *inode)
struct super_block *sb = inode->i_sb;
ext4_lblk_t last_block;
handle_t *handle;
-   loff_t page_len;
int err = 0;
 
/*
@@ -4288,16 +4287,8 @@ void ext4_ext_truncate(struct inode *inode)
if (IS_ERR(handle))
return;
 
-   if (inode->i_size % PAGE_CACHE_SIZE != 0) {
-   page_len = PAGE_CACHE_SIZE -
-   (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
-   err = ext4_discard_partial_page_buffers(handle,
-   mapping, inode->i_size, page_len, 0);
-
-   if (err)
-   goto out_stop;
-   }
+   if (inode->i_size & (sb->s_blocksize - 1))
+   ext4_block_truncate_page(handle, mapping, inode->i_size);
 
if (ext4_orphan_add(handle, inode))
goto out_stop;
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 20862f9..48d10b2 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1363,9 +1363,7 @@ void ext4_ind_truncate(struct inode *inode)
__le32 nr = 0;
int n = 0;
ext4_lblk_t last_block, max_block;
-   loff_t page_len;
unsigned blocksize = inode->i_sb->s_blocksize;
-   int err;
 
handle = start_transaction(inode);
if (IS_ERR(handle))
@@ -1376,16 +1374,9 @@ void ext4_ind_truncate(struct inode *inode)
max_block = (EXT4_SB(inode->i_sb)->s_bitmap_maxbytes + blocksize-1)
>> EXT4_BLOCK_SIZE_BITS(inode->i_sb);
 
-   if (inode->i_size % PAGE_CACHE_SIZE != 0) {
-   page_len = PAGE_CACHE_SIZE -
-   (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
-   err = ext4_discard_partial_page_buffers(handle,
-   mapping, inode->i_size, page_len, 0);
-
-   if (err)
+   if (inode->i_size & (blocksize - 1))
+   if (ext4_block_truncate_page(handle, mapping, inode->i_size))
goto out_stop;
-   }
 
if (last_block != max_block) {
n = ext4_block_to_path(inode, last_block, offsets, NULL);
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2 13/18] ext4: use ext4_zero_partial_blocks in punch_hole

2013-02-05 Thread Lukas Czerner

We're going to get rid of ext4_discard_partial_page_buffers() since it is
duplicating some code and also partially duplicating the work of
truncate_pagecache_range(); moreover, the old implementation was much
clearer.

Now that truncate_inode_pages_range() can handle truncating non-page-aligned
regions, we can use it to invalidate and zero out the block-aligned
region of the punched-out range and then use ext4_block_truncate_page()
to zero the unaligned blocks at the start and end of the range. This
will greatly simplify the punch hole code. Moreover, after this commit we
can get rid of ext4_discard_partial_page_buffers() completely.

This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.
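
As a rough user-space sketch of the range arithmetic used in the diff below
(round_up() for the start, round_down() minus one for the inclusive end), and
not the kernel code itself: struct hole_range, split_hole() and the two
rounding helpers are hypothetical names introduced here, and the block size
is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the block-aligned interior of a hole. */
struct hole_range {
	int64_t first;		/* first byte of the aligned interior */
	int64_t last;		/* last byte of the interior, inclusive */
	int has_interior;	/* non-zero if there is anything to release */
};

/* Power-of-two rounding helpers, standing in for round_up()/round_down(). */
static int64_t round_up_pow2(int64_t x, int64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

static int64_t round_down_pow2(int64_t x, int64_t align)
{
	return x & ~(align - 1);
}

/* Split [offset, offset + length) into its block-aligned interior. */
static struct hole_range split_hole(int64_t offset, int64_t length,
				    int64_t blocksize)
{
	struct hole_range r;

	assert(blocksize > 0 && (blocksize & (blocksize - 1)) == 0);
	assert(offset >= 0 && length > 0);

	r.first = round_up_pow2(offset, blocksize);
	r.last = round_down_pow2(offset + length, blocksize) - 1;
	/* Same comparison as the patch: only act if last > first. */
	r.has_interior = r.last > r.first;
	return r;
}
```

For example, with a 1024-byte block size, punching 3000 bytes at offset 1500
gives an interior of bytes 2048 through 4095; the bytes before 2048 and after
4095 are the unaligned edges that still need to be zeroed separately.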

Signed-off-by: Lukas Czerner 
---
 fs/ext4/ext4.h|2 +
 fs/ext4/extents.c |   79 ++---
 fs/ext4/inode.c   |   31 +
 3 files changed, 42 insertions(+), 70 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 4246a55..665b975 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2090,6 +2090,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
struct address_space *mapping, loff_t from);
 extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+loff_t lstart, loff_t lend);
 extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 5ce5a14..7b44dc1 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4639,7 +4639,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
ext4_lblk_t first_block, stop_block;
struct address_space *mapping = inode->i_mapping;
handle_t *handle;
-   loff_t first_page, last_page, page_len;
loff_t first_page_offset, last_page_offset;
int credits, err = 0;
 
@@ -4680,17 +4679,13 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
   offset;
}
 
-   first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-   last_page = (offset + length) >> PAGE_CACHE_SHIFT;
+   first_page_offset = round_up(offset, sb->s_blocksize);
+   last_page_offset = round_down((offset + length), sb->s_blocksize) - 1;
 
-   first_page_offset = first_page << PAGE_CACHE_SHIFT;
-   last_page_offset = last_page << PAGE_CACHE_SHIFT;
-
-   /* Now release the pages */
-   if (last_page_offset > first_page_offset) {
+   /* Now release the pages and zero block aligned part of pages*/
+   if (last_page_offset > first_page_offset)
truncate_pagecache_range(inode, first_page_offset,
-last_page_offset - 1);
-   }
+last_page_offset);
 
/* Wait all existing dio workers, newcomers will block on i_mutex */
ext4_inode_block_unlocked_dio(inode);
@@ -4707,66 +4702,10 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
}
 
 
-   /*
-* Now we need to zero out the non-page-aligned data in the
-* pages at the start and tail of the hole, and unmap the buffer
-* heads for the block aligned regions of the page that were
-* completely zeroed.
-*/
-   if (first_page > last_page) {
-   /*
-* If the file space being truncated is contained within a page
-* just zero out and unmap the middle of that page
-*/
-   err = ext4_discard_partial_page_buffers(handle,
-   mapping, offset, length, 0);
-
-   if (err)
-   goto out;
-   } else {
-   /*
-* zero out and unmap the partial page that contains
-* the start of the hole
-*/
-   page_len  = first_page_offset - offset;
-   if (page_len > 0) {
-   err = ext4_discard_partial_page_buffers(handle, mapping,
-  offset, page_len, 0);
-   if (err)
-   goto out;
-   }
-
-   /*
-* zero out and unmap the partial page that contains
-* the end of the hole
-*/
-   page_len = offset + length - last_page_offset;
-   if (page_len > 0) {
-   err = ext4_discard_partial_page_buffers(handle, mapping,
-   last_page_offset, page_len, 0);
-   i

[PATCH v2 08/18] gfs2: use ->invalidatepage() length argument

2013-02-05 Thread Lukas Czerner

The ->invalidatepage() aop now accepts a range to invalidate, so we can make
use of it in gfs2_invalidatepage().
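
A simplified user-space model of the buffer walk in the diff below may help
when reading it. It is only a sketch: SKETCH_PAGE_SIZE, is_partial_page() and
buffers_to_discard() are hypothetical names, the page size is assumed to be
4096 bytes, and the buffers are modelled as equally sized slots rather than a
buffer_head ring.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Mirrors the partial_page test: anything less than the whole page. */
static int is_partial_page(unsigned int offset, unsigned int length)
{
	return offset || length < SKETCH_PAGE_SIZE;
}

/*
 * Returns a bitmask with bit N set if buffer N (of size bsize) would be
 * discarded for the range [offset, offset + length). A buffer is only
 * discarded when it lies entirely inside the range.
 */
static unsigned int buffers_to_discard(unsigned int offset,
				       unsigned int length,
				       unsigned int bsize)
{
	unsigned int stop = offset + length;
	unsigned int mask = 0, idx = 0, pos;

	assert(bsize != 0 && SKETCH_PAGE_SIZE % bsize == 0);
	assert(SKETCH_PAGE_SIZE / bsize <= 32);
	assert(stop <= SKETCH_PAGE_SIZE);

	for (pos = 0; pos < SKETCH_PAGE_SIZE; pos += bsize, idx++) {
		if (pos + bsize > stop)	/* buffer extends past the range */
			break;
		if (offset <= pos)	/* buffer starts inside the range */
			mask |= 1u << idx;
	}
	return mask;
}
```

With 1024-byte buffers, invalidating 2048 bytes at offset 1024 selects only
the second and third buffers, and a range that covers no whole buffer selects
none.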

Signed-off-by: Lukas Czerner 
Cc: cluster-de...@redhat.com
---
 fs/gfs2/aops.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 5bd558c..3cf3dc8 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -949,24 +949,29 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
 {
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
+   unsigned int stop = offset + length;
+   int partial_page = (offset || length < PAGE_CACHE_SIZE);
struct buffer_head *bh, *head;
unsigned long pos = 0;
 
BUG_ON(!PageLocked(page));
-   if (offset == 0)
+   if (!partial_page)
ClearPageChecked(page);
if (!page_has_buffers(page))
goto out;
 
bh = head = page_buffers(page);
do {
+   if (pos + bh->b_size > stop)
+   return;
+
if (offset <= pos)
gfs2_discard(sdp, bh);
pos += bh->b_size;
bh = bh->b_this_page;
} while (bh != head);
 out:
-   if (offset == 0)
+   if (!partial_page)
try_to_release_page(page, 0);
 }
 
-- 
1.7.7.6


[PATCH v2 14/18] ext4: remove unused discard_partial_page_buffers

2013-02-05 Thread Lukas Czerner

ext4_discard_partial_page_buffers() is no longer used anywhere, so we can
simply remove it, including the *_no_lock variant and the
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/ext4.h  |8 --
 fs/ext4/inode.c |  206 ---
 2 files changed, 0 insertions(+), 214 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 665b975..c276dcb 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -594,11 +594,6 @@ enum {
 #define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER   0x0020
 
 /*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED  0x0001
-
-/*
  * ioctl commands
  */
 #define EXT4_IOC_GETFLAGS               FS_IOC_GETFLAGS
@@ -2092,9 +2087,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
 extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 loff_t lstart, loff_t lend);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
-   struct address_space *mapping, loff_t from,
-   loff_t length, int flags);
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
 extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5df9716..e7bf594 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -139,9 +139,6 @@ static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
 static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate);
 static int __ext4_journalled_writepage(struct page *page, unsigned int len);
 static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
-   struct inode *inode, struct page *page, loff_t from,
-   loff_t length, int flags);
 
 /*
  * Test whether an inode is a fast symlink.
@@ -3349,209 +3346,6 @@ void ext4_set_aops(struct inode *inode)
}
 }
 
-
-/*
- * ext4_discard_partial_page_buffers()
- * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
- * This function finds and locks the page containing the offset
- * "from" and passes it to ext4_discard_partial_page_buffers_no_lock.
- * Calling functions that already have the page locked should call
- * ext4_discard_partial_page_buffers_no_lock directly.
- */
-int ext4_discard_partial_page_buffers(handle_t *handle,
-   struct address_space *mapping, loff_t from,
-   loff_t length, int flags)
-{
-   struct inode *inode = mapping->host;
-   struct page *page;
-   int err = 0;
-
-   page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
-  mapping_gfp_mask(mapping) & ~__GFP_FS);
-   if (!page)
-   return -ENOMEM;
-
-   err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
-   from, length, flags);
-
-   unlock_page(page);
-   page_cache_release(page);
-   return err;
-}
-
-/*
- * ext4_discard_partial_page_buffers_no_lock()
- * Zeros a page range of length 'length' starting from offset 'from'.
- * Buffer heads that correspond to the block aligned regions of the
- * zeroed range will be unmapped.  Unblock aligned regions
- * will have the corresponding buffer head mapped if needed so that
- * that region of the page can be updated with the partial zero out.
- *
- * This function assumes that the page has already been  locked.  The
- * The range to be discarded must be contained with in the given page.
- * If the specified range exceeds the end of the page it will be shortened
- * to the end of the page that corresponds to 'from'.  This function is
- * appropriate for updating a page and it buffer heads to be unmapped and
- * zeroed for blocks that have been either released, or are going to be
- * released.
- *
- * handle: The journal handle
- * inode:  The files inode
- * page:   A locked page that contains the offset "from"
- * from:   The starting byte offset (from the beginning of the file)
- * to begin discarding
- * len:The length of bytes to discard
- * flags:  Optional flags that may be used:
- *
- * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
- * Only zero the regions of the page whose buffer heads
- * have already been unmapped.  This flag is appropriate
- * for updating the contents of a page whose blocks may
- * have already been released, and we only want to zero
- * out the regions that correspond to those released blocks.
- *
- * Returns zero on success or negative on failure.
- */
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
-   struct inode *inod

[PATCH v2 16/18] ext4: update ext4_ext_remove_space trace point

2013-02-05 Thread Lukas Czerner

Add "end" variable.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/extents.c   |6 +++---
 include/trace/events/ext4.h |   21 ++---
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 966a09e..2b7e521 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2647,7 +2647,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
 again:
ext4_ext_invalidate_cache(inode);
 
-   trace_ext4_ext_remove_space(inode, start, depth);
+   trace_ext4_ext_remove_space(inode, start, end, depth);
 
/*
 * Check if we are removing extents inside the extent tree. If that
@@ -2813,8 +2813,8 @@ again:
}
}
 
-   trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
-   path->p_hdr->eh_entries);
+   trace_ext4_ext_remove_space_done(inode, start, end, depth,
+   partial_cluster, path->p_hdr->eh_entries);
 
/* If we still have something in the partial cluster and we have removed
 * even the first extent, then we should free the blocks in the partial
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 6ab6f8a..5fd98f9 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2010,14 +2010,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
 );
 
 TRACE_EVENT(ext4_ext_remove_space,
-   TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+   TP_PROTO(struct inode *inode, ext4_lblk_t start,
+ext4_lblk_t end, int depth),
 
-   TP_ARGS(inode, start, depth),
+   TP_ARGS(inode, start, end, depth),
 
TP_STRUCT__entry(
__field(dev_t,  dev )
__field(ino_t,  ino )
__field(ext4_lblk_t,start   )
+   __field(ext4_lblk_t,end )
__field(int,depth   )
),
 
@@ -2025,26 +2027,29 @@ TRACE_EVENT(ext4_ext_remove_space,
__entry->dev= inode->i_sb->s_dev;
__entry->ino= inode->i_ino;
__entry->start  = start;
+   __entry->end= end;
__entry->depth  = depth;
),
 
-   TP_printk("dev %d,%d ino %lu since %u depth %d",
+   TP_printk("dev %d,%d ino %lu start %u end %u depth %d",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  (unsigned long) __entry->ino,
  (unsigned) __entry->start,
+ (unsigned) __entry->end,
  __entry->depth)
 );
 
 TRACE_EVENT(ext4_ext_remove_space_done,
-   TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
-   ext4_lblk_t partial, unsigned short eh_entries),
+   TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+int depth, ext4_lblk_t partial, unsigned short eh_entries),
 
-   TP_ARGS(inode, start, depth, partial, eh_entries),
+   TP_ARGS(inode, start, end, depth, partial, eh_entries),
 
TP_STRUCT__entry(
__field(dev_t,  dev )
__field(ino_t,  ino )
__field(ext4_lblk_t,start   )
+   __field(ext4_lblk_t,end )
__field(int,depth   )
__field(ext4_lblk_t,partial )
__field(unsigned short, eh_entries  )
@@ -2054,16 +2059,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->dev= inode->i_sb->s_dev;
__entry->ino= inode->i_ino;
__entry->start  = start;
+   __entry->end= end;
__entry->depth  = depth;
__entry->partial= partial;
__entry->eh_entries = eh_entries;
),
 
-   TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+   TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
  "remaining_entries %u",
  MAJOR(__entry->dev), MINOR(__entry->dev),
  (unsigned long) __entry->ino,
  (unsigned) __entry->start,
+ (unsigned) __entry->end,
  __entry->depth,
  (unsigned) __entry->partial,
  (unsigned short) __entry->eh_entries)
-- 
1.7.7.6


[PATCH v2 17/18] ext4: make punch hole code path work with bigalloc

2013-02-05 Thread Lukas Czerner

Currently punch hole is disabled in file systems with the bigalloc
feature enabled. However, the recent changes in the punch hole patch should
make it easier to support punching holes on bigalloc-enabled file
systems.

This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is of type unsigned long long and it makes sure that we
will free the partial cluster if all extents have been released from that
cluster. However, it has been specifically designed only for truncate.

With punch hole we can be freeing just some extents in the cluster,
leaving the rest untouched. So we have to make sure that we will notice
a cluster which still has some extents. To do this I've changed
partial_cluster to be of type signed long long. The only scenario where
this could be a problem is when cluster_size == block size; however, in
that case there would not be any partial clusters, so we're safe. For
bigger clusters the signed type is enough. Now we use a negative value
in partial_cluster to mark such a cluster as used, hence we know that we
must not free it even if all other extents have been freed from it.

This scenario can be described in simple diagram:

|FFF...FF..FF.UUU|
 ^--^
  punch hole

. - free space
| - cluster boundary
F - freed extent
U - used extent

Also update the respective tracepoints to use the signed long long type for
partial_cluster.
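
The sign convention described above can be sketched in user space roughly as
follows. This is only an illustration of the encoding, not the kernel code:
the three helper names are hypothetical, and cluster numbers are assumed to
be positive so that negation is unambiguous.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the partial_cluster encoding (hypothetical helpers):
 *    0  nothing pending
 *   >0  cluster number that may still be freed later
 *   <0  negated cluster number that is still used by another extent
 */
static long long mark_cluster_used(long long cluster)
{
	assert(cluster > 0);
	return -cluster;
}

/* A pending cluster may be freed once we have moved on to another one. */
static bool may_free_pending(long long partial_cluster,
			     long long current_cluster)
{
	return partial_cluster > 0 && partial_cluster != current_cluster;
}

/* The last cluster of a tail removal must be kept if it is marked used. */
static bool must_keep_last_cluster(long long partial_cluster,
				   long long last_cluster)
{
	return partial_cluster < 0 && -partial_cluster == last_cluster;
}
```

The two predicates correspond to the "*partial_cluster > 0" test and the
"*partial_cluster < 0" test added to ext4_remove_blocks() in the diff below.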

Signed-off-by: Lukas Czerner 
---
 fs/ext4/extents.c   |   69 +++---
 include/trace/events/ext4.h |   25 ---
 2 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 2b7e521..fbb2940 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2341,7 +2341,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
 
 static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
  struct ext4_extent *ex,
- ext4_fsblk_t *partial_cluster,
+ signed long long *partial_cluster,
  ext4_lblk_t from, ext4_lblk_t to)
 {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2370,7 +2370,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
 * partial cluster here.
 */
pblk = ext4_ext_pblock(ex) + ee_len - 1;
-   if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+   if ((*partial_cluster > 0) &&
+   (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
ext4_free_blocks(handle, inode, NULL,
 EXT4_C2B(sbi, *partial_cluster),
 sbi->s_cluster_ratio, flags);
@@ -2396,23 +2397,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
&& to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
ext4_lblk_t num;
+   unsigned int unaligned;
 
num = le32_to_cpu(ex->ee_block) + ee_len - from;
pblk = ext4_ext_pblock(ex) + ee_len - num;
-   ext_debug("free last %u blocks starting %llu\n", num, pblk);
+   /*
+* Usually we want to free partial cluster at the end of the
+* extent, except for the situation when the cluster is still
+* used by any other extent (partial_cluster is negative).
+*/
+   if (*partial_cluster < 0 &&
+   -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+   flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+   ext_debug("free last %u blocks starting %llu partial %lld\n",
+ num, pblk, *partial_cluster);
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
/*
 * If the block range to be freed didn't start at the
 * beginning of a cluster, and we removed the entire
-* extent, save the partial cluster here, since we
-* might need to delete if we determine that the
-* truncate operation has removed all of the blocks in
-* the cluster.
+* extent and the cluster is not used by any other extent,
+* save the partial cluster here, since we might need to
+* delete if we determine that the truncate operation has
+* removed all of the blocks in the cluster.
+*
+* On the other hand, if we did not manage to free the whole
+* extent, we have to mark the cluster as used (store negative
+* cluster number in partial_cluster).
 */
-   if (pblk & (sbi->s_cluster_ratio - 1) &a

[PATCH v2 15/18] ext4: remove unused code from ext4_remove_blocks()

2013-02-05 Thread Lukas Czerner

The "head removal" branch in the condition is never used in any code
path in ext4, since the function's only caller, ext4_ext_rm_leaf(), makes
sure that the extent is properly split before removing blocks. Note that
there is a bug in this branch anyway.

This commit removes the unused code completely and makes use of
ext4_error() instead of printk if a dubious range is provided.

Signed-off-by: Lukas Czerner 
---
 fs/ext4/extents.c |   21 -
 1 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 7b44dc1..966a09e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2414,23 +2414,10 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
*partial_cluster = EXT4_B2C(sbi, pblk);
else
*partial_cluster = 0;
-   } else if (from == le32_to_cpu(ex->ee_block)
-  && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
-   /* head removal */
-   ext4_lblk_t num;
-   ext4_fsblk_t start;
-
-   num = to - from;
-   start = ext4_ext_pblock(ex);
-
-   ext_debug("free first %u blocks starting %llu\n", num, start);
-   ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
-   } else {
-   printk(KERN_INFO "strange request: removal(2) "
-   "%u-%u from %u:%u\n",
-   from, to, le32_to_cpu(ex->ee_block), ee_len);
-   }
+   } else
+   ext4_error(sbi->s_sb, "strange request: removal(2) "
+  "%u-%u from %u:%u\n",
+  from, to, le32_to_cpu(ex->ee_block), ee_len);
return 0;
 }
 
-- 
1.7.7.6

