Re: [PATCH] f2fs: fix reference leaks in f2fs_acl_create
Reviewed-by: Changman Lee

On Mon, Mar 09, 2015 at 06:18:19PM +0800, Chao Yu wrote:
> Our f2fs_acl_create is copied and modified from posix_acl_create to avoid
> deadlock bug when inline_dentry feature is enabled.
>
> Now, we got reference leaks in posix_acl_create, and this has been fixed in
> commit fed0b588be2f ("posix_acl: fix reference leaks in posix_acl_create")
> by Omar Sandoval.
> https://lkml.org/lkml/2015/2/9/5
>
> Let's fix this issue in f2fs_acl_create too.
>
> Signed-off-by: Chao Yu
> ---
>  fs/f2fs/acl.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
> index 7422027..4320ffa 100644
> --- a/fs/f2fs/acl.c
> +++ b/fs/f2fs/acl.c
> @@ -351,13 +351,11 @@ static int f2fs_acl_create(struct inode *dir, umode_t *mode,
>
>  	*acl = f2fs_acl_clone(p, GFP_NOFS);
>  	if (!*acl)
> -		return -ENOMEM;
> +		goto no_mem;
>
>  	ret = f2fs_acl_create_masq(*acl, mode);
> -	if (ret < 0) {
> -		posix_acl_release(*acl);
> -		return -ENOMEM;
> -	}
> +	if (ret < 0)
> +		goto no_mem_clone;
>
>  	if (ret == 0) {
>  		posix_acl_release(*acl);
> @@ -378,6 +376,12 @@ no_acl:
>  	*default_acl = NULL;
>  	*acl = NULL;
>  	return 0;
> +
> +no_mem_clone:
> +	posix_acl_release(*acl);
> +no_mem:
> +	posix_acl_release(p);
> +	return -ENOMEM;
>  }
>
>  int f2fs_init_acl(struct inode *inode, struct inode *dir, struct page *ipage,
> --
> 2.3.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
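The fix above converts the two early error returns into a shared cleanup path, so that `p` (the ACL obtained earlier in the function) is released on every failure, not just the clone. A minimal userspace sketch of that control flow, with a toy refcounted object standing in for `posix_acl` (the names `acl_alloc`, `create_acls`, and the `live_objects` leak counter are illustrative, not f2fs code; the two `fail_*` knobs simulate the clone and masq failure points):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a posix_acl: just a refcount. */
struct acl { int refcount; };

static int live_objects;	/* leak detector: allocations minus frees */

static struct acl *acl_alloc(void)
{
	struct acl *a = malloc(sizeof(*a));
	a->refcount = 1;
	live_objects++;
	return a;
}

static void acl_release(struct acl *a)
{
	if (a && --a->refcount == 0) {
		free(a);
		live_objects--;
	}
}

/* Mirrors the patched control flow: a failure before the clone exists
 * releases only 'p' (no_mem); a failure after it releases both
 * (no_mem_clone falls through to no_mem). */
static int create_acls(int fail_clone, int fail_masq)
{
	struct acl *p = acl_alloc();	/* stands in for the ACL from get_acl() */
	struct acl *clone = NULL;

	if (fail_clone)
		goto no_mem;		/* f2fs_acl_clone() returned NULL */
	clone = acl_alloc();

	if (fail_masq)
		goto no_mem_clone;	/* f2fs_acl_create_masq() failed */

	acl_release(clone);
	acl_release(p);
	return 0;

no_mem_clone:
	acl_release(clone);
no_mem:
	acl_release(p);
	return -12;			/* -ENOMEM */
}
```

The point of the `goto` ladder is that every exit path releases exactly the references it holds; the pre-patch code returned from the clone-failure path without releasing `p`.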
Re: [f2fs-dev] [PATCH 1/3] f2fs:remove unnecessary condition judgment
Hi Yuan,

On Sat, Mar 07, 2015 at 10:05:25AM +0000, Yuan Zhong wrote:
> Remove the unnecessary condition judgment, because
> 'max_slots' has been initialized to '0' at the beginning
> of the function, as follows:
> 	if (max_slots)
> 		max_len = 0;

That statement is wrong; it should be fixed to read *max_slots = 0.

Thanks,

>
> Signed-off-by: Yuan Zhong
> ---
>  fs/f2fs/dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index 590aeef..1f1a1bc 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -139,7 +139,7 @@ struct f2fs_dir_entry *find_target_dentry(struct qstr *name, int *max_slots,
>  			!memcmp(d->filename[bit_pos], name->name, name->len))
>  			goto found;
>
> -		if (max_slots && *max_slots >= 0 && max_len > *max_slots) {
> +		if (max_slots && max_len > *max_slots) {
>  			*max_slots = max_len;
>  			max_len = 0;
>  		}
> --
> 1.7.9.5
>
> --
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is
> your hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> ___
> Linux-f2fs-devel mailing list
> linux-f2fs-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [PATCH v2] f2fs: fix max orphan inodes calculation
-- >8 --
From ce2462523dd5940b59f770c09a50d4babff5fcdb Mon Sep 17 00:00:00 2001
From: Changman Lee
Date: Mon, 9 Mar 2015 08:07:04 +0900
Subject: [PATCH] f2fs: cleanup statement about max orphan inodes calc

Through each macro, we can read the meaning easily.

Signed-off-by: Changman Lee
---
 fs/f2fs/checkpoint.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 53bc328..384bfc4 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1104,13 +1104,6 @@ void init_ino_entry_info(struct f2fs_sb_info *sbi)
 		im->ino_num = 0;
 	}

-	/*
-	 * considering 512 blocks in a segment 8+cp_payload blocks are
-	 * needed for cp and log segment summaries. Remaining blocks are
-	 * used to keep orphan entries with the limitation one reserved
-	 * segment for cp pack we can have max 1020*(504-cp_payload)
-	 * orphan entries
-	 */
 	sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
 			NR_CURSEG_TYPE - __cp_payload(sbi)) *
 				F2FS_ORPHANS_PER_BLOCK;
--
1.9.1
Re: [PATCH v2] f2fs: fix max orphan inodes calculation
On Fri, Feb 27, 2015 at 05:38:13PM +0800, Wanpeng Li wrote:
> cp_payload is introduced for the sit bitmap to support large volumes, and
> it sits just after the block of f2fs_checkpoint + nat bitmap, so the first
> segment should include F2FS_CP_PACKS + NR_CURSEG_TYPE + cp_payload + orphan
> blocks. However, the current max orphan inodes calculation doesn't consider
> cp_payload. This patch fixes it by subtracting the number of cp_payload
> blocks from the total blocks of the first segment when calculating max
> orphan inodes.
>
> Signed-off-by: Wanpeng Li
> ---
> v1 -> v2:
>  * adjust comments above the codes
>  * fix coding style issue
>
>  fs/f2fs/checkpoint.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index db82e09..a914e99 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1103,13 +1103,15 @@ void init_ino_entry_info(struct f2fs_sb_info *sbi)
>  	}
>
>  	/*
> -	 * considering 512 blocks in a segment 8 blocks are needed for cp
> -	 * and log segment summaries. Remaining blocks are used to keep
> -	 * orphan entries with the limitation one reserved segment
> -	 * for cp pack we can have max 1020*504 orphan entries
> +	 * considering 512 blocks in a segment 8+cp_payload blocks are
> +	 * needed for cp and log segment summaries. Remaining blocks are
> +	 * used to keep orphan entries with the limitation one reserved
> +	 * segment for cp pack we can have max 1020*(504-cp_payload)
> +	 * orphan entries
>  	 */

Hi all,

I think the code below gives us enough information, so we don't need the
comment above it. Also, someone could get confused by the 1020 constant.
What do you think about removing the comment?

Regards,
Changman

>  	sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
> -			NR_CURSEG_TYPE) * F2FS_ORPHANS_PER_BLOCK;
> +			NR_CURSEG_TYPE - __cp_payload(sbi)) *
> +				F2FS_ORPHANS_PER_BLOCK;
>  }
>
>  int __init create_checkpoint_caches(void)
> --
> 1.9.1
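For concreteness, the corrected formula can be checked against the constants in the old comment: 8 blocks reserved (cp packs plus current segment summaries) out of 512 leaves 504 orphan blocks, each holding 1020 entries, i.e. the 1020*504 figure, and each cp_payload block removes one more orphan block. A standalone sketch (the constant values below match the comment's arithmetic; treat them as illustrative rather than authoritative f2fs definitions):

```c
#include <assert.h>

/* Values consistent with the "8 blocks" / "1020*504" comment above. */
#define F2FS_CP_PACKS		2	/* checkpoint pack blocks */
#define NR_CURSEG_TYPE		6	/* current segment summary blocks */
#define F2FS_ORPHANS_PER_BLOCK	1020	/* orphan ino entries per block */

/* Blocks of the cp segment left after cp packs, summaries and the sit
 * bitmap payload are subtracted; each remaining block holds
 * F2FS_ORPHANS_PER_BLOCK orphan entries. */
static unsigned int max_orphans(unsigned int blocks_per_seg,
				unsigned int cp_payload)
{
	return (blocks_per_seg - F2FS_CP_PACKS - NR_CURSEG_TYPE - cp_payload)
			* F2FS_ORPHANS_PER_BLOCK;
}
```

With 512 blocks per segment and no cp_payload this yields 504 * 1020 entries; the bug being fixed is that cp_payload blocks were previously not subtracted, overstating the limit.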
Re: [PATCH] f2fs: fix to issue small discard in real-time mode discard
On Sat, Feb 28, 2015 at 05:23:30PM +0800, Chao Yu wrote:
> Now in f2fs, we share functions and structures for batch mode and real-time
> mode discard. For real-time mode discard, in the shared function
> add_discard_addrs, we use an uninitialized trim_minlen in struct cp_control
> to compare against the length of contiguous free blocks when deciding
> whether to skip discarding fragmented free space; this makes us ignore
> small discards sometimes. Fix it.
>
> Signed-off-by: Chao Yu
> ---
>  fs/f2fs/segment.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index daee4ab..fcc1cc2 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -549,7 +549,7 @@ static void add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>
>  		end = __find_rev_next_zero_bit(dmap, max_blocks, start + 1);
>
> -		if (end - start < cpc->trim_minlen)
> +		if (force && end - start < cpc->trim_minlen)
>  			continue;

Reviewed-by: Changman Lee

>
>  		__add_discard_entry(sbi, cpc, start, end);
> --
> 2.3.1
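The effect of the one-line change is that the `trim_minlen` filter now applies only when `force` (batch-mode discard via fstrim) is set; real-time discard keeps even short free extents. A userspace sketch of the scan, using a plain byte array as a free-block map (the bitmap walk is simplified; only the `force && length` check mirrors the patch, and all names here are illustrative):

```c
/* Collect [start, end) discard ranges of consecutive free blocks from
 * free_map, skipping ranges shorter than trim_minlen only in "force"
 * (batch/fstrim) mode. Returns the number of ranges recorded. */
static int collect_discards(const unsigned char *free_map, int nblocks,
			    int force, int trim_minlen,
			    int (*ranges)[2], int max_ranges)
{
	int i = 0, n = 0;

	while (i < nblocks) {
		int start, end;

		while (i < nblocks && !free_map[i])	/* skip used blocks */
			i++;
		if (i >= nblocks)
			break;
		start = i;
		while (i < nblocks && free_map[i])	/* span free run */
			i++;
		end = i;

		/* the patched check: honor the user's minimum length
		 * only in batch mode, never for real-time discard */
		if (force && end - start < trim_minlen)
			continue;

		if (n < max_ranges) {
			ranges[n][0] = start;
			ranges[n][1] = end;
			n++;
		}
	}
	return n;
}
```

With `force == 0` every free run is collected, which is the behavior the patch restores for real-time discard.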
Re: [f2fs-dev] [PATCH 5/5 v2] f2fs: introduce a batched trim
Hi Jaegeuk,

IMHO, it would be better if the user could decide the size of each trim,
considering trim latency. Otherwise, additional checkpoints the user
doesn't want will occur.

Regards,
Changman

On Mon, Feb 02, 2015 at 03:29:25PM -0800, Jaegeuk Kim wrote:
> Change log from v1:
>  o add description
>  o change the # of batched segments suggested by Chao
>  o make consistent for # of batched segments
>
> This patch introduces a batched trimming feature, which submits split
> discard commands.
>
> This is to avoid long latency due to huge trim commands. If fstrim was
> triggered ranging from 0 to the end of the device, we should lock all the
> checkpoint-related mutexes, resulting in very long latency.
>
> Signed-off-by: Jaegeuk Kim
> ---
>  fs/f2fs/f2fs.h    |  2 ++
>  fs/f2fs/segment.c | 16 +++-
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 8231a59..ec5e66f 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -105,6 +105,8 @@ enum {
>  	CP_DISCARD,
>  };
>
> +#define BATCHED_TRIM_SEGMENTS(sbi)	(((sbi)->segs_per_sec) << 5)
> +
>  struct cp_control {
>  	int reason;
>  	__u64 trim_start;
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 5ea57ec..b85bb97 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1066,14 +1066,20 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
>  	end_segno = (end >= MAX_BLKADDR(sbi)) ? MAIN_SEGS(sbi) - 1 :
>  						GET_SEGNO(sbi, end);
>  	cpc.reason = CP_DISCARD;
> -	cpc.trim_start = start_segno;
> -	cpc.trim_end = end_segno;
>  	cpc.trim_minlen = range->minlen >> sbi->log_blocksize;
>
>  	/* do checkpoint to issue discard commands safely */
> -	mutex_lock(&sbi->gc_mutex);
> -	write_checkpoint(sbi, &cpc);
> -	mutex_unlock(&sbi->gc_mutex);
> +	for (; start_segno <= end_segno;
> +			start_segno += BATCHED_TRIM_SEGMENTS(sbi)) {
> +		cpc.trim_start = start_segno;
> +		cpc.trim_end = min_t(unsigned int,
> +			start_segno + BATCHED_TRIM_SEGMENTS(sbi) - 1,
> +			end_segno);
> +
> +		mutex_lock(&sbi->gc_mutex);
> +		write_checkpoint(sbi, &cpc);
> +		mutex_unlock(&sbi->gc_mutex);
> +	}
> out:
>  	range->len = cpc.trimmed << sbi->log_blocksize;
>  	return 0;
> --
> 2.1.1
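The batching loop in the patch reduces to a simple shape: split the inclusive range [start_segno, end_segno] into fixed-size chunks and clamp the last chunk to the range end. A standalone sketch that counts one checkpoint per batch (`min_u` stands in for the kernel's `min_t`; the function name is illustrative):

```c
static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Returns the number of checkpoints (batches) needed to trim the
 * inclusive segment range [start_segno, end_segno] in chunks of
 * 'batch' segments, clamping the final chunk like the patch does. */
static unsigned int trim_in_batches(unsigned int start_segno,
				    unsigned int end_segno,
				    unsigned int batch)
{
	unsigned int nr_checkpoints = 0;
	unsigned int s;

	for (s = start_segno; s <= end_segno; s += batch) {
		unsigned int trim_start = s;
		unsigned int trim_end = min_u(s + batch - 1, end_segno);

		/* here the kernel code takes gc_mutex and calls
		 * write_checkpoint() for [trim_start, trim_end] */
		(void)trim_start;
		(void)trim_end;
		nr_checkpoints++;
	}
	return nr_checkpoints;
}
```

This also makes Changman's concern concrete: the checkpoint count grows with the range divided by the batch size, so a batch size fixed by the filesystem (rather than chosen by the user) can force more checkpoints than the caller wanted.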
Re: [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
On Wed, Jan 21, 2015 at 04:41:17PM +0800, Chao Yu wrote:
> Hi Changman,
>
> > -----Original Message-----
> > From: Changman Lee [mailto:cm224@gmail.com]
> > Sent: Tuesday, January 20, 2015 11:06 PM
> > To: Chao Yu
> > Cc: Jaegeuk Kim; Changman Lee; linux-f2fs-de...@lists.sourceforge.net;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
> >
> > Hi Chao,
> >
> > Great works. :)
>
> Thanks! :)
>
> > 2015-01-12 16:14 GMT+09:00 Chao Yu:
> > > This patch adds core functions including slab cache init function and
> > > init/lookup/update/shrink/destroy function for rb-tree based extent cache.
> > >
> > > Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about
> > > detail design and implementation of extent cache.
> > >
> > > Todo:
> > >  * add a cached_ei into struct extent_tree for a quick recent cache.
> > >  * register rb-based extent cache shrink with mm shrink interface.
> > >  * disable dir inode's extent cache.
> > >
> > > Signed-off-by: Chao Yu
> > > Signed-off-by: Jaegeuk Kim
> > > Signed-off-by: Changman Lee
>
> If you do not object, I'd like to keep these, as lots of the details and
> ideas are from you and Jaegeuk.

I have no objection.

> > > ---
> > >  fs/f2fs/data.c | 458 +
> > >  fs/f2fs/node.c |   9 +-
> > >  2 files changed, 466 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > index 4f5b871e..bf8c5eb 100644
> > > --- a/fs/f2fs/data.c
> > > +++ b/fs/f2fs/data.c
> > > @@ -25,6 +25,9 @@
> > >  #include "trace.h"
> > >  #include <trace/events/f2fs.h>
> > >
> >
> > ~ snip ~
> >
> > > +
> > > +static void f2fs_update_extent_tree(struct inode *inode, pgoff_t fofs,
> > > +					block_t blkaddr)
> > > +{
> > > +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > > +	nid_t ino = inode->i_ino;
> > > +	struct extent_tree *et;
> > > +	struct extent_node *en = NULL, *en1 = NULL, *en2 = NULL, *en3 = NULL;
> > > +	struct extent_node *den = NULL;
> > > +	struct extent_info *pei;
> > > +	struct extent_info ei;
> > > +	unsigned int endofs;
> > > +
> > > +	if (is_inode_flag_set(F2FS_I(inode), FI_NO_EXTENT))
> > > +		return;
> > > +
> > > +retry:
> > > +	down_write(&sbi->extent_tree_lock);
> > > +	et = radix_tree_lookup(&sbi->extent_tree_root, ino);
> > > +	if (!et) {
> >
> > We've already made some useful functions.
> > How about using f2fs_kmem_cache_alloc and f2fs_radix_tree_insert?
>
> IMO, we'd better use the original functions kmem_cache_alloc and
> radix_tree_insert, because if we use f2fs_{kmem_cache_alloc,
> radix_tree_insert}, we may loop in these functions without releasing the
> extent_tree_lock lock on OOM, so it will block lock grabbers for a long
> time, which we do not wish to see.

I see. If so, let's use cond_resched() in front of goto retry after
up_write. And also look into kmem_cache_alloc in __insert_extent_tree,
please.

> > > +		et = kmem_cache_alloc(extent_tree_slab, GFP_ATOMIC);
> > > +		if (!et) {
> > > +			up_write(&sbi->extent_tree_lock);
> > > +			goto retry;
> > > +		}
> > > +		if (radix_tree_insert(&sbi->extent_tree_root, ino, et)) {
> > > +			up_write(&sbi->extent_tree_lock);
> > > +			kmem_cache_free(extent_tree_slab, et);
> > > +			goto retry;
> > > +		}
> > > +		memset(et, 0, sizeof(struct extent_tree));
> > > +		et->ino = ino;
> > > +		et->root = RB_ROOT;
> > > +		rwlock_init(&et->lock);
> > > +		atomic_set(&et->refcount, 0);
> > > +		et->count = 0;
> > > +		sbi->total_ext_tree++;
> > > +	}
> > > +	atomic_inc(&et->refcount);
> > > +	up_write(&sbi->extent_tree_lock);
> > > +
> >
> > ~ snip ~
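The allocate-or-retry pattern being discussed hinges on one detail: the writer lock must be dropped before retrying (with a `cond_resched()` in between, per Changman's suggestion), or other lockers starve under OOM. A userspace sketch of that control flow — all names (`lookup_or_create`, `try_alloc`, `slot`, the knob variables) are illustrative, not f2fs code; a simple flag stands in for the rwsem, and its assertion documents that the lock is never held across a retry:

```c
#include <assert.h>
#include <stdlib.h>

static int tree_locked;		/* stands in for extent_tree_lock */
static void *slot;		/* stands in for the radix tree slot */
static int alloc_failures;	/* knob: fail the next N allocations */
static int retries;		/* how many times we looped */

static void lock_tree(void)
{
	assert(!tree_locked);	/* fires if we retry while still holding it */
	tree_locked = 1;
}

static void unlock_tree(void)
{
	tree_locked = 0;
}

static void *try_alloc(void)	/* kmem_cache_alloc(..., GFP_ATOMIC) analog */
{
	if (alloc_failures > 0) {
		alloc_failures--;
		return NULL;
	}
	return malloc(16);
}

static void *lookup_or_create(void)
{
	void *et;
retry:
	lock_tree();
	et = slot;		/* radix_tree_lookup() analog */
	if (!et) {
		et = try_alloc();
		if (!et) {
			unlock_tree();	/* drop the lock first ... */
			retries++;	/* ... then cond_resched(); goto retry */
			goto retry;
		}
		slot = et;	/* radix_tree_insert() analog */
	}
	unlock_tree();
	return et;
}
```

The alternative criticized in the thread — a helper that loops internally on allocation failure — would spin while `tree_locked` is still set, which is exactly the lock-holder stall Chao wanted to avoid.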
Re: [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
Hi Chao,

Great works. :)

2015-01-12 16:14 GMT+09:00 Chao Yu:
> This patch adds core functions including slab cache init function and
> init/lookup/update/shrink/destroy function for rb-tree based extent cache.
>
> Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about
> detail design and implementation of extent cache.
>
> Todo:
>  * add a cached_ei into struct extent_tree for a quick recent cache.
>  * register rb-based extent cache shrink with mm shrink interface.
>  * disable dir inode's extent cache.
>
> Signed-off-by: Chao Yu
> Signed-off-by: Jaegeuk Kim
> Signed-off-by: Changman Lee
> ---
>  fs/f2fs/data.c | 458 +
>  fs/f2fs/node.c |   9 +-
>  2 files changed, 466 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 4f5b871e..bf8c5eb 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -25,6 +25,9 @@
>  #include "trace.h"
>  #include <trace/events/f2fs.h>
>

~ snip ~

> +
> +static void f2fs_update_extent_tree(struct inode *inode, pgoff_t fofs,
> +					block_t blkaddr)
> +{
> +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +	nid_t ino = inode->i_ino;
> +	struct extent_tree *et;
> +	struct extent_node *en = NULL, *en1 = NULL, *en2 = NULL, *en3 = NULL;
> +	struct extent_node *den = NULL;
> +	struct extent_info *pei;
> +	struct extent_info ei;
> +	unsigned int endofs;
> +
> +	if (is_inode_flag_set(F2FS_I(inode), FI_NO_EXTENT))
> +		return;
> +
> +retry:
> +	down_write(&sbi->extent_tree_lock);
> +	et = radix_tree_lookup(&sbi->extent_tree_root, ino);
> +	if (!et) {

We've already made some useful functions.
How about using f2fs_kmem_cache_alloc and f2fs_radix_tree_insert?

> +		et = kmem_cache_alloc(extent_tree_slab, GFP_ATOMIC);
> +		if (!et) {
> +			up_write(&sbi->extent_tree_lock);
> +			goto retry;
> +		}
> +		if (radix_tree_insert(&sbi->extent_tree_root, ino, et)) {
> +			up_write(&sbi->extent_tree_lock);
> +			kmem_cache_free(extent_tree_slab, et);
> +			goto retry;
> +		}
> +		memset(et, 0, sizeof(struct extent_tree));
> +		et->ino = ino;
> +		et->root = RB_ROOT;
> +		rwlock_init(&et->lock);
> +		atomic_set(&et->refcount, 0);
> +		et->count = 0;
> +		sbi->total_ext_tree++;
> +	}
> +	atomic_inc(&et->refcount);
> +	up_write(&sbi->extent_tree_lock);
> +

~ snip ~

> +
> +	write_unlock(&et->lock);
> +	atomic_dec(&et->refcount);
> +}
> +
> +void f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
> +{
> +	struct extent_tree *treevec[EXT_TREE_VEC_SIZE];
> +	struct extent_node *en, *tmp;
> +	unsigned long ino = F2FS_ROOT_INO(sbi);
> +	struct radix_tree_iter iter;
> +	void **slot;
> +	unsigned int found;
> +	unsigned int node_cnt = 0, tree_cnt = 0;
> +
> +	if (available_free_memory(sbi, EXTENT_CACHE))
> +		return;
> +
> +	spin_lock(&sbi->extent_lock);
> +	list_for_each_entry_safe(en, tmp, &sbi->extent_list, list) {
> +		if (!nr_shrink--)
> +			break;
> +		list_del_init(&en->list);
> +	}
> +	spin_unlock(&sbi->extent_lock);
> +

IMHO, it's expensive to retrieve every extent_tree just to free the
extent_nodes whose list_empty() is true. Is there any idea to improve this?
For example, if each extent_node kept its extent_root, it would be faster,
since we would not have to retrieve all the trees. Of course, that would
use more memory.

But I think your patchset might just as well be merged, because the patches
are well made and the feature is cleanly separated behind a mount option.
We could improve this next time.

Regards,
Changman

> +	down_read(&sbi->extent_tree_lock);
> +	while ((found = radix_tree_gang_lookup(&sbi->extent_tree_root,
> +			(void **)treevec, ino, EXT_TREE_VEC_SIZE))) {
> +		unsigned i;
> +
> +		ino = treevec[found - 1]->ino + 1;
> +		for (i = 0; i < found; i++) {
> +			struct extent_tree *et = treevec[i];
> +
> +			atomic_inc(&et->refcount);
> +			write_lock(&et->lock);
> +			node_cnt += __free_extent_tree(sbi, et, false);
> +			write_unlock(&et->lock);
> +
Re: [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
Hi Chao,

Great works. :)

2015-01-12 16:14 GMT+09:00 Chao Yu <chao2...@samsung.com>:
> This patch adds core functions including the slab cache init function and
> init/lookup/update/shrink/destroy functions for the rb-tree based extent
> cache.
>
> Thanks to Jaegeuk Kim and Changman Lee, who gave many suggestions about the
> detailed design and implementation of the extent cache.
>
> Todo:
>  * add a cached_ei into struct extent_tree for a quick recent cache.
>  * register rb-based extent cache shrink with the mm shrink interface.
>  * disable dir inode's extent cache.
>
> Signed-off-by: Chao Yu <chao2...@samsung.com>
> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> Signed-off-by: Changman Lee <cm224@samsung.com>
> ---
>  fs/f2fs/data.c | 458 +
>  fs/f2fs/node.c |   9 +-
>  2 files changed, 466 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 4f5b871e..bf8c5eb 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -25,6 +25,9 @@
>  #include "trace.h"
>  #include <trace/events/f2fs.h>

~ snip ~

> +
> +static void f2fs_update_extent_tree(struct inode *inode, pgoff_t fofs,
> +					block_t blkaddr)
> +{
> +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +	nid_t ino = inode->i_ino;
> +	struct extent_tree *et;
> +	struct extent_node *en = NULL, *en1 = NULL, *en2 = NULL, *en3 = NULL;
> +	struct extent_node *den = NULL;
> +	struct extent_info *pei;
> +	struct extent_info ei;
> +	unsigned int endofs;
> +
> +	if (is_inode_flag_set(F2FS_I(inode), FI_NO_EXTENT))
> +		return;
> +
> +retry:
> +	down_write(&sbi->extent_tree_lock);
> +	et = radix_tree_lookup(&sbi->extent_tree_root, ino);
> +	if (!et) {

We've already made some useful functions. How about using
f2fs_kmem_cache_alloc and f2fs_radix_tree_insert?

> +		et = kmem_cache_alloc(extent_tree_slab, GFP_ATOMIC);
> +		if (!et) {
> +			up_write(&sbi->extent_tree_lock);
> +			goto retry;
> +		}
> +		if (radix_tree_insert(&sbi->extent_tree_root, ino, et)) {
> +			up_write(&sbi->extent_tree_lock);
> +			kmem_cache_free(extent_tree_slab, et);
> +			goto retry;
> +		}
> +		memset(et, 0, sizeof(struct extent_tree));
> +		et->ino = ino;
> +		et->root = RB_ROOT;
> +		rwlock_init(&et->lock);
> +		atomic_set(&et->refcount, 0);
> +		et->count = 0;
> +		sbi->total_ext_tree++;
> +	}
> +	atomic_inc(&et->refcount);
> +	up_write(&sbi->extent_tree_lock);
> +

~ snip ~

> +
> +	write_unlock(&et->lock);
> +	atomic_dec(&et->refcount);
> +}
> +
> +void f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
> +{
> +	struct extent_tree *treevec[EXT_TREE_VEC_SIZE];
> +	struct extent_node *en, *tmp;
> +	unsigned long ino = F2FS_ROOT_INO(sbi);
> +	struct radix_tree_iter iter;
> +	void **slot;
> +	unsigned int found;
> +	unsigned int node_cnt = 0, tree_cnt = 0;
> +
> +	if (available_free_memory(sbi, EXTENT_CACHE))
> +		return;
> +
> +	spin_lock(&sbi->extent_lock);
> +	list_for_each_entry_safe(en, tmp, &sbi->extent_list, list) {
> +		if (!nr_shrink--)
> +			break;
> +		list_del_init(&en->list);
> +	}
> +	spin_unlock(&sbi->extent_lock);
> +

IMHO, it's expensive to retrieve all extent_trees just to free the
extent_nodes whose list_empty() is true. Is there any idea to improve this?
For example, if each extent_node had its extent_root, it would be faster
because we would not have to retrieve all the trees. Of course, that uses
more memory.

But I think your patchset might just as well be merged, because the patches
are well made and the feature is cleanly separated behind a mount option. We
could improve this next time.

Regards,
Changman

> +	down_read(&sbi->extent_tree_lock);
> +	while ((found = radix_tree_gang_lookup(&sbi->extent_tree_root,
> +				(void **)treevec, ino, EXT_TREE_VEC_SIZE))) {
> +		unsigned i;
> +
> +		ino = treevec[found - 1]->ino + 1;
> +		for (i = 0; i < found; i++) {
> +			struct extent_tree *et = treevec[i];
> +
> +			atomic_inc(&et->refcount);
> +			write_lock(&et->lock);
> +			node_cnt += __free_extent_tree(sbi, et, false);
> +			write_unlock(&et->lock);
> +			atomic_dec(&et->refcount);
> +		}
> +	}
> +	up_read(&sbi->extent_tree_lock);
> +
> +	down_write(&sbi->extent_tree_lock);
> +	radix_tree_for_each_slot(slot, &sbi->extent_tree_root, iter,
> +			F2FS_ROOT_INO(sbi)) {
> +		struct extent_tree *et = (struct extent_tree *)*slot
Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi Chao,

On Sun, Jan 04, 2015 at 11:19:28AM +0800, Chao Yu wrote:

Hi Changman,

Sorry for replying late!

-----Original Message-----
From: Changman Lee [mailto:cm224@samsung.com]
Sent: Tuesday, December 30, 2014 8:32 AM
To: Jaegeuk Kim
Cc: Chao Yu; linux-f2fs-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi all,

On Mon, Dec 29, 2014 at 01:23:00PM -0800, Jaegeuk Kim wrote:

Hi Chao,

On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:

[snip]

Nice draft. :)

Please see the draft below.

1) Extent management:
If we use global management that manages all extents from different inodes in
sbi, we will face serious lock contention when we access extents belonging to
different inodes concurrently; the loss may outweigh the gain.

Agreed.

So we choose local management for extents, which means all extents are
managed by the inode itself, to avoid the above lock contention.
Additionally, we manage all extents globally by linking all inodes into a
global lru list for the extent cache shrinker.

Approach:
a) build extent tree/rwlock/lru list/extent count in each inode.
   * extent tree: link all extents in an rb-tree;
   * rwlock: protect fields when accessing the extent cache concurrently;
   * lru list: sort all extents in access-time order;
   * extent count: record the total count of extents in the cache.
b) use an lru shrink list in sbi to manage all inodes which cache extents.
   * an inode will be added or repositioned in this global list whenever an
     extent is accessed in this inode.
   * use a spinlock to protect this shrink list.

1. How about adding a data structure with an inode number instead of
   referring to the inode pointer?

2. How about managing extent entries globally and setting an upper bound on
   the number of extent entries instead of limiting them per inode?
   (The rb-tree will handle many extents per inode.)

3. It needs to set a minimum length for a candidate of the extent cache.
   (e.g., 64)

Agreed.

So, for example,

struct ino_entry_for_extents {
	inode number;
	rb_tree for extent_entry objects;
	rwlock;
};

struct extent_entry {
	blkaddr, len;
	list_head *;
};

Something like this.

[A, B, C, ... are extent entries]

The sbi has
1. an extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
2. a radix_tree: ino_entry_for_extents (#10) has D, B in its rb-tree
               ` ino_entry_for_extents (#11) has A, C in its rb-tree
               ` ino_entry_for_extents (#12) has F in its rb-tree
               ` ino_entry_for_extents (#13) has G, E in its rb-tree

In f2fs_update_extent_cache and __get_data_block for #10,
ino_entry_for_extents (#10) is found and D or B is updated. Then, the updated
entries are moved to the MRU end.

In f2fs_evict_inode for #11, A and C are moved to the LRU end. But, if this
inode is unlinked, all of A, C, and ino_entry_for_extents (#11) should be
released.

In f2fs_balance_fs_bg, some LRU extents are released according to the amount
of consumed memory. Then, it frees any ino_entry_for_extents having no
extent.

IMO, we don't need to consider readahead for this, since get_data_block will
be called by VFS readahead.

Furthermore, we need to think about whether LRU is really best or not. IMO,
the extent cache aims to improve second-access speed rather than initial
cold misses. So maybe MRU or another algorithm would be better.

Right. It's very complicated to judge which is better. In the read or write
path, extents can be created every time. At that point, if we set an upper
bound, we should decide which extent to evict in favor of the new extents. In
an update, one extent can be separated into 3, which requires 3 insertions
and 1 deletion. So if updates happen frequently, we could give up extent
management for some ranges. And we need to bring in ideas from vm management,
for example active/inactive lists with a second chance for promotion, or
batched work for insertion/deletion. I suddenly thought: 'Simple is best'.
Let's think about better ideas together.

Yeah, how about using the opposite way to the way the page cache manager
works?

For example:
node pages A, B, C, D are in the page cache;
extents a, b, c, d are in the extent cache;
extent a is built from page A, ..., d is built from page D.

page cache:   (LRU) A -> B -> C -> D (MRU)
extent cache: (LRU) a -> b -> c -> d (MRU)

If we use
1) the same way, LRU, the cache pairs A-a, B-b, ... may be reclaimed at the
   same time under OOM.
2) the opposite way, maybe A, B in the page cache and d, c in the extent
   cache will be reclaimed, but we can still hit the whole cache
Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi all,

On Mon, Dec 29, 2014 at 01:23:00PM -0800, Jaegeuk Kim wrote:

Hi Chao,

On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:

[snip]

Nice draft. :)

Please see the draft below.

1) Extent management:
If we use global management that manages all extents from different inodes in
sbi, we will face serious lock contention when we access extents belonging to
different inodes concurrently; the loss may outweigh the gain.

Agreed.

So we choose local management for extents, which means all extents are
managed by the inode itself, to avoid the above lock contention.
Additionally, we manage all extents globally by linking all inodes into a
global lru list for the extent cache shrinker.

Approach:
a) build extent tree/rwlock/lru list/extent count in each inode.
   * extent tree: link all extents in an rb-tree;
   * rwlock: protect fields when accessing the extent cache concurrently;
   * lru list: sort all extents in access-time order;
   * extent count: record the total count of extents in the cache.
b) use an lru shrink list in sbi to manage all inodes which cache extents.
   * an inode will be added or repositioned in this global list whenever an
     extent is accessed in this inode.
   * use a spinlock to protect this shrink list.

1. How about adding a data structure with an inode number instead of
   referring to the inode pointer?

2. How about managing extent entries globally and setting an upper bound on
   the number of extent entries instead of limiting them per inode?
   (The rb-tree will handle many extents per inode.)

3. It needs to set a minimum length for a candidate of the extent cache.
   (e.g., 64)

Agreed.

So, for example,

struct ino_entry_for_extents {
	inode number;
	rb_tree for extent_entry objects;
	rwlock;
};

struct extent_entry {
	blkaddr, len;
	list_head *;
};

Something like this.

[A, B, C, ... are extent entries]

The sbi has
1. an extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
2. a radix_tree: ino_entry_for_extents (#10) has D, B in its rb-tree
               ` ino_entry_for_extents (#11) has A, C in its rb-tree
               ` ino_entry_for_extents (#12) has F in its rb-tree
               ` ino_entry_for_extents (#13) has G, E in its rb-tree

In f2fs_update_extent_cache and __get_data_block for #10,
ino_entry_for_extents (#10) is found and D or B is updated. Then, the updated
entries are moved to the MRU end.

In f2fs_evict_inode for #11, A and C are moved to the LRU end. But, if this
inode is unlinked, all of A, C, and ino_entry_for_extents (#11) should be
released.

In f2fs_balance_fs_bg, some LRU extents are released according to the amount
of consumed memory. Then, it frees any ino_entry_for_extents having no
extent.

IMO, we don't need to consider readahead for this, since get_data_block will
be called by VFS readahead.

Furthermore, we need to think about whether LRU is really best or not. IMO,
the extent cache aims to improve second-access speed rather than initial
cold misses. So maybe MRU or another algorithm would be better.

Right. It's very complicated to judge which is better. In the read or write
path, extents can be created every time. At that point, if we set an upper
bound, we should decide which extent to evict in favor of the new extents. In
an update, one extent can be separated into 3, which requires 3 insertions
and 1 deletion. So if updates happen frequently, we could give up extent
management for some ranges. And we need to bring in ideas from vm management,
for example active/inactive lists with a second chance for promotion, or
batched work for insertion/deletion. I suddenly thought: 'Simple is best'.
Let's think about better ideas together.

Thanks,

2) Limitation:
In one inode, as we split or add extents in the extent cache during
read/write, the extent number will grow, so memory and CPU overhead will
increase. In order to control the overhead of memory and CPU, we try to set
an upper bound to limit the total extent number in each inode. This number is
a global configuration which is visible to all inodes. This number will be
exported to sysfs for configuring according to the user's requirement. By
default, the designed number is 8.

Chao, it's better that the # of extents is controlled globally rather than
limiting extents per inode, as Jaegeuk said, to reduce extent management
overhead.

3) Shrinker:
There are two shrink paths:
a) one is triggered when the extent count exceeds the upper bound of the
   inode's extent cache. We will try to release extent(s) from the head of
   the inode's inner extent lru list until the extent count is equal to the
   upper bound. This operation could be in f2fs_update_extent_cache().
b) the other one is triggered when memory util exceeds a threshold; we try to
   get inodes from the head of the global lru list(s), and release extent(s)
   with a fixed number (by default: 64 extents)
Re: linux-next: Tree for Dec 26 (f2fs)
On Fri, Dec 26, 2014 at 12:59:05PM -0800, Jaegeuk Kim wrote:
> I fixed the merged patch directly.
>
> Changman,
> The patch was initially made by you, so let me know if you have an objection.
>
> Thanks,

Sorry for my mistake. Thanks, Stephen and Jaegeuk.

> On Fri, Dec 26, 2014 at 11:17:15AM -0800, Randy Dunlap wrote:
> > On 12/26/14 00:30, Stephen Rothwell wrote:
> > > Hi all,
> > >
> > > There will only be intermittent releases of linux-next between now and
> > > Jan 5.
> > >
> > > Changes since 20141221:
> >
> > on x86_64:
> > when CONFIG_F2FS_STAT_FS is not enabled:
> >
> > ../fs/f2fs/segment.c: In function 'rewrite_data_page':
> > ../fs/f2fs/segment.c:1233:2: error: implicit declaration of function
> > 'stat_inc_inplace_blocks' [-Werror=implicit-function-declaration]
> >
> > --
> > ~Randy
Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
On Mon, Dec 22, 2014 at 11:36:09PM -0800, Jaegeuk Kim wrote:

Hi Chao,

On Tue, Dec 23, 2014 at 11:01:39AM +0800, Chao Yu wrote:

Hi Jaegeuk,

-----Original Message-----
From: Jaegeuk Kim [mailto:jaeg...@kernel.org]
Sent: Tuesday, December 23, 2014 7:16 AM
To: Chao Yu
Cc: 'Changman Lee'; linux-f2fs-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi Chao,

On Mon, Dec 22, 2014 at 03:10:30PM +0800, Chao Yu wrote:

Hi Changman,

-----Original Message-----
From: Changman Lee [mailto:cm224@samsung.com]
Sent: Monday, December 22, 2014 10:03 AM
To: Chao Yu
Cc: Jaegeuk Kim; linux-f2fs-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi Yu,

Good approach.

Thank you. :)

As you know, however, f2fs breaks extents itself due to COW.

Yes, and sometimes f2fs uses IPU when overwriting; in this condition, by
using this approach we can cache more contiguous mapping extents for better
performance.

Hmm. When f2fs faces this case, there is no chance to make an extent itself
at all.

With the new implementation of this patch, f2fs will build the extent cache
in readpage/readpages.

I don't understand your point exactly. :( If there are no on-disk extents,
it doesn't matter when the caches are built. Could you define what scenarios
you're looking at?

Unlike other filesystems like btrfs, a minimum extent of f2fs can have 4KB
granularity. So we would have lots of extents per inode, and it could lead
to overhead to manage the extents.

Agreed; the more the number of extents grows in one inode, the more memory
pressure and the longer rb-tree operation latency we face. IMO, to solve
this problem, we'd better add a limitation or shrink ability to the extent
cache:
1. limit the extent number per inode with a value set from sysfs, and
   discard extents from the inode's extent lru list if we hit the limit
   (e.g. in FAT, the max number of mapping extents per inode is fixed: 8);
2. add all extents of inodes into a global lru list; we will try to shrink
   this list if we're facing memory pressure.

How do you think? Any better ideas are welcome. :)

Historically, the reason that I added only one small extent cache is that I
wanted to avoid additional data structures having any overhead in the
critical data write path.

Thank you for telling me the history of the original extent cache.

Instead, I intended to use the well-operating node page cache. We need to
consider what the benefit would be when using an extent cache rather than
the existing node page cache.

IMO, the node page cache belongs to the system-level cache; the filesystem
subsystem cannot control it completely. Cached uptodate node pages will be
invalidated by using drop_caches from sysfs, or by the reclaimer of mm,
resulting in more IO when we need these node pages next time.

Yes, that's exactly what I wanted.

The new extent cache belongs to the filesystem-level cache; it is completely
controlled by the filesystem itself. What we can profit from is: on the one
hand, it is used as a first-level cache above the node page cache, which can
also increase the cache hit ratio.

I don't think so. The hit ratio depends on the cache policy. The node page
cache is managed globally by the kernel in LRU manner, so I think this can
show an affordable hit ratio.

On the other hand, it is more stable and controllable than the node page
cache.

It depends on how you can control the extent cache. But I'm not sure that
would be better than the page cache managed by MM. So, my concerns are:

1. Redundant memory overhead:
   The extent cache is likely on top of the node page cache, which will
   consume memory redundantly.

2. CPU overhead:
   On every block address update, it needs to traverse extent cache entries.

3. Effectiveness:
   We have a node page cache that is managed by MM in LRU order. I think
   this provides a good hit ratio, system-wide memory reclaiming algorithms,
   and a well-defined locking mechanism.

4. Cache reclaiming policy:
   a. global approach: it needs to consider lock contention, CPU overhead,
      and a shrinker. I don't think it is better than the page cache.
   b. local approach: there still exist cold misses at the initial read
      operations. After that, how does the extent cache increase the hit
      ratio more than the node page cache does? For example, in the case of
      a pretty normal scenario like open - read - close - open - read ...,
      we can't get benefits from a locally-managed extent cache, while node page
Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi,

On Mon, Dec 22, 2014 at 03:10:30PM +0800, Chao Yu wrote:
> Hi Changman,
>
> > -----Original Message-----
> > From: Changman Lee [mailto:cm224@samsung.com]
> > Sent: Monday, December 22, 2014 10:03 AM
> > To: Chao Yu
> > Cc: Jaegeuk Kim; linux-f2fs-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
> > Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
> >
> > Hi Yu,
> >
> > Good approach.
>
> Thank you. :)
>
> > As you know, however, f2fs breaks extent itself due to COW.
>
> Yes, and sometimes f2fs use IPU when override writing, in this condition,
> by using this approach we can cache more contiguous mapping extent for
> better performance.
>
> > Unlike other filesystem like btrfs, minimum extent of f2fs could have 4KB
> > granularity.
> > So we would have lots of extents per inode and it could lead to overhead
> > to manage extents.
>
> Agree, the more number of extents are growing in one inode, the more memory
> pressure and longer latency operating in rb-tree we are facing.
> IMO, to solve this problem, we'd better to add limitation or shrink ability
> into extent cache:
> 1.limit extent number per inode with the value set from sysfs and discard
> extent from inode's extent lru list if we touch the limitation; (e.g. in
> FAT, max number of mapping extent per inode is fixed: 8)
> 2.add all extents of inodes into a global lru list, we will try to shrink
> this list if we're facing memory pressure.
>
> How do you think? or any better ideas are welcome. :)

I think both of them are considerable options. How about letting the user add
an extent to a selected inode using ioctl or xattr? In the case of read-mostly
files having large size, the user could surely get a benefit although they
are separated into some pieces.

Thanks,

> > Anyway, mount option could be alternative for this patch.
>
> Yes, will do.
>
> Thanks,
> Yu
>
> > On Fri, Dec 19, 2014 at 06:49:29PM +0800, Chao Yu wrote:
> > > Now f2fs have page-block mapping cache which can cache only one extent
> > > mapping between contiguous logical address and physical address.
> > > Normally, this design will work well because f2fs will expand coverage
> > > area of the mapping extent when we write forward sequentially. But when
> > > we write data randomly in Out-Place-Update mode, the extent will be
> > > shorten and hardly be expanded for most time as following reasons:
> > > 1.The short part of extent will be discarded if we break contiguous
> > > mapping in the middle of extent.
> > > 2.The new mapping will be added into mapping cache only at head or tail
> > > of the extent.
> > > 3.We will drop the extent cache when the extent became very fragmented.
> > > 4.We will not update the extent with mapping which we get from readpages
> > > or readpage.
> > >
> > > To solve above problems, this patch adds extent cache base on rb-tree
> > > like other filesystems (e.g.: ext4/btrfs) in f2fs. By this way, f2fs can
> > > support another more effective cache between dnode page cache and disk.
> > > It will supply high hit ratio in the cache with fewer memory when dnode
> > > page cache are reclaimed in environment of low memory.
> > >
> > > Todo:
> > > *introduce mount option for extent cache.
> > > *add shrink ability for extent cache.
> > >
> > > Signed-off-by: Chao Yu
> > > ---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi Yu,

Good approach.
As you know, however, f2fs breaks extent itself due to COW.
Unlike other filesystem like btrfs, minimum extent of f2fs could have 4KB
granularity. So we would have lots of extents per inode and it could lead to
overhead to manage extents.

Anyway, mount option could be alternative for this patch.

On Fri, Dec 19, 2014 at 06:49:29PM +0800, Chao Yu wrote:
> Now f2fs have page-block mapping cache which can cache only one extent
> mapping between contiguous logical address and physical address.
> Normally, this design will work well because f2fs will expand coverage area
> of the mapping extent when we write forward sequentially. But when we write
> data randomly in Out-Place-Update mode, the extent will be shorten and
> hardly be expanded for most time as following reasons:
> 1.The short part of extent will be discarded if we break contiguous mapping
> in the middle of extent.
> 2.The new mapping will be added into mapping cache only at head or tail of
> the extent.
> 3.We will drop the extent cache when the extent became very fragmented.
> 4.We will not update the extent with mapping which we get from readpages or
> readpage.
>
> To solve above problems, this patch adds extent cache base on rb-tree like
> other filesystems (e.g.: ext4/btrfs) in f2fs. By this way, f2fs can support
> another more effective cache between dnode page cache and disk. It will
> supply high hit ratio in the cache with fewer memory when dnode page cache
> are reclaimed in environment of low memory.
>
> Todo:
> *introduce mount option for extent cache.
> *add shrink ability for extent cache.
>
> Signed-off-by: Chao Yu
> ---
>  fs/f2fs/data.c  | 348 +---
>  fs/f2fs/debug.c |   2 +
>  fs/f2fs/f2fs.h  |  49
>  fs/f2fs/inode.c |   5 +-
>  fs/f2fs/super.c |  11 +-
>  5 files changed, 291 insertions(+), 124 deletions(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 7ec697b..20592e2 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -24,6 +24,8 @@
>  #include "segment.h"
>  #include <trace/events/f2fs.h>
>
> +struct kmem_cache *extent_info_cache;
> +
>  static void f2fs_read_end_io(struct bio *bio, int err)
>  {
>  	struct bio_vec *bvec;
> @@ -247,126 +249,264 @@ int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index)
>  	return err;
>  }
>
> -static int check_extent_cache(struct inode *inode, pgoff_t pgofs,
> -					struct buffer_head *bh_result)
> +static struct extent_info *__insert_extent_cache(struct inode *inode,
> +				unsigned int fofs, unsigned int len, u32 blk)
>  {
> -	struct f2fs_inode_info *fi = F2FS_I(inode);
> -	pgoff_t start_fofs, end_fofs;
> -	block_t start_blkaddr;
> -
> -	if (is_inode_flag_set(fi, FI_NO_EXTENT))
> -		return 0;
> -
> -	read_lock(&fi->ext.ext_lock);
> -	if (fi->ext.len == 0) {
> -		read_unlock(&fi->ext.ext_lock);
> -		return 0;
> +	struct rb_root *root = &F2FS_I(inode)->ei_tree.root;
> +	struct rb_node *p = root->rb_node;
> +	struct rb_node *parent = NULL;
> +	struct extent_info *ei;
> +
> +	while (p) {
> +		parent = p;
> +		ei = rb_entry(parent, struct extent_info, rb_node);
> +
> +		if (fofs < ei->fofs)
> +			p = p->rb_left;
> +		else if (fofs >= ei->fofs + ei->len)
> +			p = p->rb_right;
> +		else
> +			f2fs_bug_on(F2FS_I_SB(inode), 1);
>  	}
>
> -	stat_inc_total_hit(inode->i_sb);
> +	ei = kmem_cache_alloc(extent_info_cache, GFP_ATOMIC);
> +	ei->fofs = fofs;
> +	ei->blk = blk;
> +	ei->len = len;
> +
> +	rb_link_node(&ei->rb_node, parent, &p);
> +	rb_insert_color(&ei->rb_node, root);
> +	stat_inc_extent_count(inode->i_sb);
> +	return ei;
> +}
>
> -	start_fofs = fi->ext.fofs;
> -	end_fofs = fi->ext.fofs + fi->ext.len - 1;
> -	start_blkaddr = fi->ext.blk_addr;
> +static bool __remove_extent_cache(struct inode *inode, unsigned int fofs,
> +					struct extent_info *cei)
> +{
> +	struct rb_root *root = &F2FS_I(inode)->ei_tree.root;
> +	struct rb_node *p = root->rb_node;
> +	struct extent_info *ei;
>
> -	if (pgofs >= start_fofs && pgofs <= end_fofs) {
> -		unsigned int blkbits = inode->i_sb->s_blocksize_bits;
> -		size_t count;
> +	while (p) {
> +		ei = rb_entry(p, struct extent_info, rb_node);
>
> -		clear_buffer_new(bh_result);
> -		map_bh(bh_result, inode->i_sb,
> -				start_blkaddr + pgofs - start_fofs);
> -		count = end_fofs - pgofs + 1;
> -		if (count < (UINT_MAX >> blkbits))
> -			bh_result->b_size = (count << blkbits);
> +		if (fofs < ei->fofs)
> +			p = p->rb_left;
> +		else if (fofs >=
Re: [f2fs-dev] [PATCH v2] f2fs: merge two uchar variable in struct node_info to reduce memory cost
On Thu, Dec 18, 2014 at 02:29:51PM +0800, Chao Yu wrote:
> Hi Changman,
>
> > -----Original Message-----
> > From: Changman Lee [mailto:cm224@gmail.com]
> > Sent: Wednesday, December 17, 2014 11:09 PM
> > To: Chao Yu
> > Cc: Jaegeuk Kim; Changman Lee; linux-fsde...@vger.kernel.org;
> > linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net
> > Subject: Re: [f2fs-dev] [PATCH v2] f2fs: merge two uchar variable in
> > struct node_info to reduce memory cost
> >
> > Hi Yu,
> >
> > This patch is effective only in 32 bit machine. In case of 64 bit
> > machine, nat_entry will be aligned in 8 bytes due to pointer variable
> > (i.e. struct list_head). So it can't get any benefit to reduce memory
> > usage. In the case of node_info, however, it will be gain in terms of
> > memory usage.
> > Hence, I think it's not correct for commit log to describe this patch.
>
> Thanks for your review! :)
>
> AFAIK, in 64 bit machine, size of struct nat_entry is 40 bytes before this
> patch apply, the reason is that our compiler will fill 3 bytes pads after
> flag as nid's offset should align to type size of nid, and then fill 7 byte
> pads after version as size of structure should align to 64 bits when the
> struct size is bigger than 64 bits.
> layout of struct nat_entry:
> |-8 bytes-|
> |list.next|
> |list.prev|
> |flag|nid |
> |ino |blk_addr|
> |version  |
> After we apply this patch, size of struct nat_entry will be reduced to 32
> bytes.
> Please correct me if I'm wrong.

Hi,

Sorry, you're right. I miscalculated.

Thanks,

> Anyway, I agreed that commit log should be uptodate.
>
> Thanks,
> Yu
>
> > Thanks,
> >
> > Reviewed-by: Changman Lee
Re: [f2fs-dev] [PATCH v2] f2fs: use ra_meta_pages to simplify readahead code in restore_node_summary
Hi,

Is there any reason to use truncate_inode_pages_range instead of
invalidate_mapping_pages?
IMHO, it seems nice to just use invalidate_mapping_pages because pages of
meta_inode shouldn't be dirty, locked, under writeback or mapped in this
function. If there is my misunderstanding, let me know.

Thanks,

Reviewed-by: Changman Lee

2014-12-17 19:10 GMT+09:00 Chao Yu:
> Use more common function ra_meta_pages() with META_POR to readahead node
> blocks in restore_node_summary() instead of ra_sum_pages(), hence we can
> simplify the readahead code there, and also we can remove unused function
> ra_sum_pages().
>
> changes from v1:
>  o fix one bug when using truncate_inode_pages_range which is pointed out
>    by Jaegeuk Kim.
>
> Signed-off-by: Chao Yu
> ---
>  fs/f2fs/node.c | 68 +-
>  1 file changed, 15 insertions(+), 53 deletions(-)
>
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 5aa54a0..ab48b4c 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1726,80 +1726,42 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page)
>  	return 0;
>  }
>
> -/*
> - * ra_sum_pages() merge contiguous pages into one bio and submit.
> - * these pre-read pages are allocated in bd_inode's mapping tree.
> - */
> -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages,
> -				int start, int nrpages)
> -{
> -	struct inode *inode = sbi->sb->s_bdev->bd_inode;
> -	struct address_space *mapping = inode->i_mapping;
> -	int i, page_idx = start;
> -	struct f2fs_io_info fio = {
> -		.type = META,
> -		.rw = READ_SYNC | REQ_META | REQ_PRIO
> -	};
> -
> -	for (i = 0; page_idx < start + nrpages; page_idx++, i++) {
> -		/* alloc page in bd_inode for reading node summary info */
> -		pages[i] = grab_cache_page(mapping, page_idx);
> -		if (!pages[i])
> -			break;
> -		f2fs_submit_page_mbio(sbi, pages[i], page_idx, &fio);
> -	}
> -
> -	f2fs_submit_merged_bio(sbi, META, READ);
> -	return i;
> -}
> -
>  int restore_node_summary(struct f2fs_sb_info *sbi,
>  			unsigned int segno, struct f2fs_summary_block *sum)
>  {
>  	struct f2fs_node *rn;
>  	struct f2fs_summary *sum_entry;
> -	struct inode *inode = sbi->sb->s_bdev->bd_inode;
>  	block_t addr;
>  	int bio_blocks = MAX_BIO_BLOCKS(sbi);
> -	struct page *pages[bio_blocks];
> -	int i, idx, last_offset, nrpages, err = 0;
> +	int i, idx, last_offset, nrpages;
>
>  	/* scan the node segment */
>  	last_offset = sbi->blocks_per_seg;
>  	addr = START_BLOCK(sbi, segno);
>  	sum_entry = &sum->entries[0];
>
> -	for (i = 0; !err && i < last_offset; i += nrpages, addr += nrpages) {
> +	for (i = 0; i < last_offset; i += nrpages, addr += nrpages) {
>  		nrpages = min(last_offset - i, bio_blocks);
>
>  		/* readahead node pages */
> -		nrpages = ra_sum_pages(sbi, pages, addr, nrpages);
> -		if (!nrpages)
> -			return -ENOMEM;
> +		ra_meta_pages(sbi, addr, nrpages, META_POR);
>
> -		for (idx = 0; idx < nrpages; idx++) {
> -			if (err)
> -				goto skip;
> +		for (idx = addr; idx < addr + nrpages; idx++) {
> +			struct page *page = get_meta_page(sbi, idx);
>
> -			lock_page(pages[idx]);
> -			if (unlikely(!PageUptodate(pages[idx]))) {
> -				err = -EIO;
> -			} else {
> -				rn = F2FS_NODE(pages[idx]);
> -				sum_entry->nid = rn->footer.nid;
> -				sum_entry->version = 0;
> -				sum_entry->ofs_in_node = 0;
> -				sum_entry++;
> -			}
> -			unlock_page(pages[idx]);
> -skip:
> -			page_cache_release(pages[idx]);
> +			rn = F2FS_NODE(page);
> +			sum_entry->nid = rn->footer.nid;
> +			sum_entry->version = 0;
> +			sum_entry->ofs_in_node = 0;
> +			sum_entry++;
> +			f2fs_put_page(page, 1);
>  		}
>
> -		invalidate_mapping_pages(inode->i_map
Re: [f2fs-dev] [PATCH v2] f2fs: merge two uchar variable in struct node_info to reduce memory cost
Hi Yu,

This patch is effective only in 32 bit machine. In case of 64 bit machine,
nat_entry will be aligned in 8 bytes due to pointer variable (i.e. struct
list_head). So it can't get any benefit to reduce memory usage. In the case
of node_info, however, it will be gain in terms of memory usage.
Hence, I think it's not correct for commit log to describe this patch.

Thanks,

Reviewed-by: Changman Lee

2014-12-15 18:33 GMT+09:00 Chao Yu:
> This patch moves one member of struct nat_entry: _flag_ to struct node_info,
> so _version_ in struct node_info and _flag_ with unsigned char type will
> merge to one 32-bit space in register/memory. Then the size of nat_entry
> will reduce its size from 28 bytes to 24 bytes and slab memory using by
> f2fs will be reduced.
>
> changes from v1:
>  o introduce inline copy_node_info() to copy valid data from node info
>    suggested by Jaegeuk Kim, it can avoid bug.
>
> Signed-off-by: Chao Yu
> ---
>  fs/f2fs/node.c |  4 ++--
>  fs/f2fs/node.h | 33 ++---
>  2 files changed, 24 insertions(+), 13 deletions(-)
>
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index f83326c..5aa54a0 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -268,7 +268,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
>  	e = __lookup_nat_cache(nm_i, ni->nid);
>  	if (!e) {
>  		e = grab_nat_entry(nm_i, ni->nid);
> -		e->ni = *ni;
> +		copy_node_info(&e->ni, ni);
>  		f2fs_bug_on(sbi, ni->blk_addr == NEW_ADDR);
>  	} else if (new_blkaddr == NEW_ADDR) {
>  		/*
> @@ -276,7 +276,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
>  		 * previous nat entry can be remained in nat cache.
>  		 * So, reinitialize it with new information.
>  		 */
> -		e->ni = *ni;
> +		copy_node_info(&e->ni, ni);
>  		f2fs_bug_on(sbi, ni->blk_addr != NULL_ADDR);
>  	}
>
> diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
> index d10b644..eb59167 100644
> --- a/fs/f2fs/node.h
> +++ b/fs/f2fs/node.h
> @@ -29,6 +29,14 @@
>  /* return value for read_node_page */
>  #define LOCKED_PAGE	1
>
> +/* For flag in struct node_info */
> +enum {
> +	IS_CHECKPOINTED,	/* is it checkpointed before? */
> +	HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
> +	HAS_LAST_FSYNC,		/* has the latest node fsync mark? */
> +	IS_DIRTY,		/* this nat entry is dirty? */
> +};
> +
>  /*
>   * For node information
>   */
> @@ -37,18 +45,11 @@ struct node_info {
>  	nid_t ino;		/* inode number of the node's owner */
>  	block_t blk_addr;	/* block address of the node */
>  	unsigned char version;	/* version of the node */
> -};
> -
> -enum {
> -	IS_CHECKPOINTED,	/* is it checkpointed before? */
> -	HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
> -	HAS_LAST_FSYNC,		/* has the latest node fsync mark? */
> -	IS_DIRTY,		/* this nat entry is dirty? */
> +	unsigned char flag;	/* for node information bits */
>  };
>
>  struct nat_entry {
>  	struct list_head list;	/* for clean or dirty nat list */
> -	unsigned char flag;	/* for node information bits */
>  	struct node_info ni;	/* in-memory node information */
>  };
>
> @@ -63,20 +64,30 @@ struct nat_entry {
>
>  #define inc_node_version(version)	(++version)
>
> +static inline void copy_node_info(struct node_info *dst,
> +						struct node_info *src)
> +{
> +	dst->nid = src->nid;
> +	dst->ino = src->ino;
> +	dst->blk_addr = src->blk_addr;
> +	dst->version = src->version;
> +	/* should not copy flag here */
> +}
> +
>  static inline void set_nat_flag(struct nat_entry *ne,
>  				unsigned int type, bool set)
>  {
>  	unsigned char mask = 0x01 << type;
>  	if (set)
> -		ne->flag |= mask;
> +		ne->ni.flag |= mask;
>  	else
> -		ne->flag &= ~mask;
> +		ne->ni.flag &= ~mask;
>  }
>
>  static inline bool get_nat_flag(struct nat_entry *ne, unsigned int type)
>  {
>  	unsigned char mask = 0x01 << type;
> -	return ne->flag & mask;
> +	return ne->ni.flag & mask;
>  }
>
>  static inline void nat_reset_flag(struct nat_entry *ne)
> --
> 2.1.2
Re: [f2fs-dev] [PATCH v2] f2fs: use ra_meta_pages to simplify readahead code in restore_node_summary
Hi, Is there any reason to use truncate_inode_pages_range instead of invalidate_mapping_pages? IMHO, it seems nice to just use invalidate_mapping_pages because pages of meta_inode shouldn't be dirty, locked, under writeback or mapped in this function. If there is my misunderstanding, let me know. Thanks,

Reviewed-by: Changman Lee cm224@samsung.com

2014-12-17 19:10 GMT+09:00 Chao Yu chao2...@samsung.com:

Use more common function ra_meta_pages() with META_POR to readahead node blocks in restore_node_summary() instead of ra_sum_pages(), hence we can simplify the readahead code there, and also we can remove unused function ra_sum_pages().

changes from v1:
o fix one bug when using truncate_inode_pages_range which is pointed out by Jaegeuk Kim.

Signed-off-by: Chao Yu chao2...@samsung.com
---
 fs/f2fs/node.c | 68 +-
 1 file changed, 15 insertions(+), 53 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 5aa54a0..ab48b4c 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1726,80 +1726,42 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page)
 	return 0;
 }

-/*
- * ra_sum_pages() merge contiguous pages into one bio and submit.
- * these pre-read pages are allocated in bd_inode's mapping tree.
- */
-static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages,
-				int start, int nrpages)
-{
-	struct inode *inode = sbi->sb->s_bdev->bd_inode;
-	struct address_space *mapping = inode->i_mapping;
-	int i, page_idx = start;
-	struct f2fs_io_info fio = {
-		.type = META,
-		.rw = READ_SYNC | REQ_META | REQ_PRIO
-	};
-
-	for (i = 0; page_idx < start + nrpages; page_idx++, i++) {
-		/* alloc page in bd_inode for reading node summary info */
-		pages[i] = grab_cache_page(mapping, page_idx);
-		if (!pages[i])
-			break;
-		f2fs_submit_page_mbio(sbi, pages[i], page_idx, &fio);
-	}
-
-	f2fs_submit_merged_bio(sbi, META, READ);
-	return i;
-}
-
 int restore_node_summary(struct f2fs_sb_info *sbi,
 			unsigned int segno, struct f2fs_summary_block *sum)
 {
 	struct f2fs_node *rn;
 	struct f2fs_summary *sum_entry;
-	struct inode *inode = sbi->sb->s_bdev->bd_inode;
 	block_t addr;
 	int bio_blocks = MAX_BIO_BLOCKS(sbi);
-	struct page *pages[bio_blocks];
-	int i, idx, last_offset, nrpages, err = 0;
+	int i, idx, last_offset, nrpages;

 	/* scan the node segment */
 	last_offset = sbi->blocks_per_seg;
 	addr = START_BLOCK(sbi, segno);
 	sum_entry = &sum->entries[0];

-	for (i = 0; !err && i < last_offset; i += nrpages, addr += nrpages) {
+	for (i = 0; i < last_offset; i += nrpages, addr += nrpages) {
 		nrpages = min(last_offset - i, bio_blocks);

 		/* readahead node pages */
-		nrpages = ra_sum_pages(sbi, pages, addr, nrpages);
-		if (!nrpages)
-			return -ENOMEM;
+		ra_meta_pages(sbi, addr, nrpages, META_POR);

-		for (idx = 0; idx < nrpages; idx++) {
-			if (err)
-				goto skip;
+		for (idx = addr; idx < addr + nrpages; idx++) {
+			struct page *page = get_meta_page(sbi, idx);

-			lock_page(pages[idx]);
-			if (unlikely(!PageUptodate(pages[idx]))) {
-				err = -EIO;
-			} else {
-				rn = F2FS_NODE(pages[idx]);
-				sum_entry->nid = rn->footer.nid;
-				sum_entry->version = 0;
-				sum_entry->ofs_in_node = 0;
-				sum_entry++;
-			}
-			unlock_page(pages[idx]);
-skip:
-			page_cache_release(pages[idx]);
+			rn = F2FS_NODE(page);
+			sum_entry->nid = rn->footer.nid;
+			sum_entry->version = 0;
+			sum_entry->ofs_in_node = 0;
+			sum_entry++;
+			f2fs_put_page(page, 1);
 		}

-		invalidate_mapping_pages(inode->i_mapping, addr,
-							addr + nrpages);
+		truncate_inode_pages_range(META_MAPPING(sbi),
+			addr << PAGE_CACHE_SHIFT,
+			((addr + nrpages) << PAGE_CACHE_SHIFT) - 1);
 	}
-	return err;
+	return 0;
 }

 static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
--
2.1.2
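The rewritten loop above processes a node segment in bio-sized chunks: each iteration advances by `nrpages = min(remaining, bio_blocks)`, issues readahead for the whole chunk, then consumes it page by page. A small stand-alone sketch of that chunking pattern (function names and `MAX_CHUNK` are made up for illustration; this is not f2fs code):

```c
#include <assert.h>

#define MAX_CHUNK 8	/* stands in for MAX_BIO_BLOCKS(sbi) */

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Visit blocks [start, start + total) in chunks of at most MAX_CHUNK,
 * the way restore_node_summary() walks a node segment: one readahead
 * call per chunk, then a per-block consume loop.  Returns the number
 * of chunks issued; *visited counts blocks consumed. */
static int walk_in_chunks(int start, int total, int *visited)
{
	int chunks = 0;
	int i, nr, idx;

	for (i = 0; i < total; i += nr) {
		nr = min_int(total - i, MAX_CHUNK);
		/* kernel analogue: ra_meta_pages(sbi, addr, nr, META_POR); */
		chunks++;
		for (idx = start + i; idx < start + i + nr; idx++)
			(*visited)++;	/* get_meta_page() + read + put */
	}
	return chunks;
}
```

With `total = 20` and `MAX_CHUNK = 8`, the walk issues three chunks (8 + 8 + 4) and touches every block exactly once, which is the invariant the loop indices `i`, `addr`, and `nrpages` maintain in the patch.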
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Simon, Thanks very much for your interest. It becomes more clear due to your explanation. Regards, Changman On Tue, Nov 25, 2014 at 08:05:23PM +0100, Simon Baatz wrote: > Hi Changman, > > On Mon, Nov 24, 2014 at 11:46:46AM +0900, Changman Lee wrote: > > Hi Simon, > > Thanks for your explanation kindly. > > > > On Sun, Nov 23, 2014 at 11:08:54AM +0100, Simon Baatz wrote: > > > Hi Changman, Jaegeuk, > > > > > > On Thu, Nov 20, 2014 at 05:47:29PM +0900, Changman Lee wrote: > > > > On Wed, Nov 19, 2014 at 10:45:33PM -0800, Jaegeuk Kim wrote: > > > > > On Thu, Nov 20, 2014 at 03:04:10PM +0900, Changman Lee wrote: > > > > > > Hi Jaegeuk, > > > > > > > > > > > > We should call flush_dcache_page before kunmap because the purpose > > > > > > of the cache flush is to address aliasing problem related to > > > > > > virtual address. > > > > > > > > > > Oh, I just followed zero_user_segments below. > > > > > > > > > > static inline void zero_user_segments(struct page *page, > > > > > unsigned start1, unsigned end1, > > > > > unsigned start2, unsigned end2) > > > > > { > > > > > void *kaddr = kmap_atomic(page); > > > > > > > > > > BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE); > > > > > > > > > > if (end1 > start1) > > > > > memset(kaddr + start1, 0, end1 - start1); > > > > > > > > > > if (end2 > start2) > > > > > memset(kaddr + start2, 0, end2 - start2); > > > > > > > > > > kunmap_atomic(kaddr); > > > > > flush_dcache_page(page); > > > > > } > > > > > > > > > > Is this a wrong reference? Or, a bug? > > > > > > > > > > > > > Well.. Data in cache only have to be flushed until before other users > > > > read the data. > > > > If so, it's not a bug. > > > > > > > > > > Yes, it is not a bug, since flush_dcache_page() needs to be able to > > > deal with non-kmapped pages. However, this may create overhead in > > > some situations. > > > > > > > Previously, I was vague but I thought that it should be different > > according to vaddr exists or not. 
So I told jaegeuk that it should > > be better to change an order between flush_dache_page and kunmap. > > But actually, it doesn't matter the order between them except > > the situation you said. > > Could you explain the situation that makes overhead by flushing after > > kummap. > > I can't imagine it by just seeing flush_dcache_page code. > > > > I was a not very precise here. Yes, flush_dcache_page() on ARM does > the same in both situations since it has no idea whether it is called > before or after kunmap. However, flush_kernel_dcache_page() can > assume that it is called before kunmap and thus, for example, does not > need to pin a highmem page by kmap_high_get() (apart from not having > to care about flushing user space mappings) > > > > According to documentation (see Documentation/cachetlb.txt), this is > > > a use for flush_kernel_dcache_page(), since the page has been > > > modified by the kernel only. In contrast to flush_dcache_page(), > > > this function must be called before kunmap(). > > > > > > flush_kernel_dcache_page() does not need to flush the user space > > > aliases. Additionally, at least on ARM, it does not flush at all > > > when called within kmap_atomic()/kunmap_atomic(), when > > > kunmap_atomic() is going to flush the page anyway. (I know that > > > almost no one uses flush_kernel_dcache_page() (probably because > > > almost no one knows when to use which of the two functions), but it > > > may save a few cache flushes on architectures which are affected by > > > aliasing) > > > > > > > > > > > Anyway I modified as below. > > > > > > > > > > Thanks, > > > > > > > > > > >From 7cb7b27c8cd2efc8a31d79239bef5b41c6e79216 Mon Sep 17 00:00:00 > > > > > >2001 > > > > > From: Jaegeuk Kim > > > > > Date: Tue, 18 Nov 2014 10:50:21 -0800 > > > > > Subject: [PATCH] f2fs: call flush_dcache_page when the page was > > > > > updated > > > > >
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Simon, Thanks for your kind explanation. On Sun, Nov 23, 2014 at 11:08:54AM +0100, Simon Baatz wrote: > Hi Changman, Jaegeuk, > > On Thu, Nov 20, 2014 at 05:47:29PM +0900, Changman Lee wrote: > > On Wed, Nov 19, 2014 at 10:45:33PM -0800, Jaegeuk Kim wrote: > > > On Thu, Nov 20, 2014 at 03:04:10PM +0900, Changman Lee wrote: > > > > Hi Jaegeuk, > > > > > > > > We should call flush_dcache_page before kunmap because the purpose of > > > > the cache flush is to address aliasing problem related to virtual > > > > address. > > > > > > Oh, I just followed zero_user_segments below. > > > > > > static inline void zero_user_segments(struct page *page, > > > unsigned start1, unsigned end1, > > > unsigned start2, unsigned end2) > > > { > > > void *kaddr = kmap_atomic(page); > > > > > > BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE); > > > > > > if (end1 > start1) > > > memset(kaddr + start1, 0, end1 - start1); > > > > > > if (end2 > start2) > > > memset(kaddr + start2, 0, end2 - start2); > > > > > > kunmap_atomic(kaddr); > > > flush_dcache_page(page); > > > } > > > > > > Is this a wrong reference? Or, a bug? > > > > > > > Well.. Data in cache only have to be flushed until before other users read > > the data. > > If so, it's not a bug. > > > > Yes, it is not a bug, since flush_dcache_page() needs to be able to > deal with non-kmapped pages. However, this may create overhead in > some situations. > Previously, I was vague, but I thought it should differ depending on whether the vaddr exists or not. So I told Jaegeuk that it would be better to change the order between flush_dcache_page and kunmap. But actually, the order between them doesn't matter except in the situation you described. Could you explain the situation where flushing after kunmap creates overhead? I can't see it just from reading the flush_dcache_page code.
In contrast to flush_dcache_page(), > this function must be called before kunmap(). > > flush_kernel_dcache_page() does not need to flush the user space > aliases. Additionally, at least on ARM, it does not flush at all > when called within kmap_atomic()/kunmap_atomic(), when > kunmap_atomic() is going to flush the page anyway. (I know that > almost no one uses flush_kernel_dcache_page() (probably because > almost no one knows when to use which of the two functions), but it > may save a few cache flushes on architectures which are affected by > aliasing) > > > > > Anyway I modified as below. > > > > > > Thanks, > > > > > > >From 7cb7b27c8cd2efc8a31d79239bef5b41c6e79216 Mon Sep 17 00:00:00 2001 > > > From: Jaegeuk Kim > > > Date: Tue, 18 Nov 2014 10:50:21 -0800 > > > Subject: [PATCH] f2fs: call flush_dcache_page when the page was updated > > > > > > Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. > > > > > > Signed-off-by: Jaegeuk Kim > > > --- > > > fs/f2fs/dir.c| 7 ++- > > > fs/f2fs/inline.c | 2 ++ > > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > > > index 5a49995..fabf4ee 100644 > > > --- a/fs/f2fs/dir.c > > > +++ b/fs/f2fs/dir.c > > > @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct > > > f2fs_dir_entry *de, > > > f2fs_wait_on_page_writeback(page, type); > > > de->ino = cpu_to_le32(inode->i_ino); > > > set_de_type(de, inode); > > > - if (!f2fs_has_inline_dentry(dir)) > > > + if (!f2fs_has_inline_dentry(dir)) { > > > + flush_dcache_page(page); > > > kunmap(page); > > > + } > > Is this a page that may be mapped into user space? (I may be > completely wrong here, since I have no idea how this code works. But > it looks like as if the answer is "no" ;-) ). > > It is not necessary to flush pages that cannot be seen by user space > (see also the NOTE in the documentation of flush_dcache_page() in > cachetlb.txt). 
Thus, if you know that a page will not be mapped into > user space, please don't create the overhead of flushing it. > In the case of a dentry page, unlike inline data, it is not mapped to user space, so the dcache flush adds overhead. Is that what you mean? Best regards, Changman > > - Simon -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
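The aliasing issue being debated above can be illustrated with a toy write-back model (a pure user-space simulation with made-up names, not kernel code): a store through one alias of the page, e.g. the kernel's kmap address, sits in that alias's cache and is invisible through a second alias, e.g. a user-space mapping, until it is written back.

```c
#include <string.h>

/* Toy model: "memory" is the page, "cache" is a dirty write-back cache
 * in front of the kernel alias.  A reader through a different virtual
 * alias sees memory, not the writer's cache, until a flush writes the
 * data back.  This only illustrates the ordering argument; real cache
 * behavior is per-line and architecture-specific. */
static char memory[16];
static char cache[16];
static int dirty;

/* Store through the kernel alias: lands in the cache only. */
static void kernel_write(const char *src, size_t n)
{
	memcpy(cache, src, n);
	dirty = 1;
}

/* The flush_dcache_page() analogue: write the dirty data back so other
 * aliases can see it. */
static void flush_dcache(void)
{
	if (dirty)
		memcpy(memory, cache, sizeof(memory));
	dirty = 0;
}

/* Read through the other alias: sees only what reached memory. */
static char user_read(size_t i)
{
	return memory[i];
}
```

This is why the thread concludes the flush only has to happen before another user reads the data; whether it comes before or after kunmap changes nothing in this model, which is the point Simon and Changman converge on (the kunmap-ordering distinction only matters for flush_kernel_dcache_page()).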
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
On Wed, Nov 19, 2014 at 10:45:33PM -0800, Jaegeuk Kim wrote: > On Thu, Nov 20, 2014 at 03:04:10PM +0900, Changman Lee wrote: > > Hi Jaegeuk, > > > > We should call flush_dcache_page before kunmap because the purpose of the > > cache flush is to address aliasing problem related to virtual address. > > Oh, I just followed zero_user_segments below. > > static inline void zero_user_segments(struct page *page, > unsigned start1, unsigned end1, > unsigned start2, unsigned end2) > { > void *kaddr = kmap_atomic(page); > > BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE); > > if (end1 > start1) > memset(kaddr + start1, 0, end1 - start1); > > if (end2 > start2) > memset(kaddr + start2, 0, end2 - start2); > > kunmap_atomic(kaddr); > flush_dcache_page(page); > } > > Is this a wrong reference? Or, a bug? > Well.. Data in cache only have to be flushed until before other users read the data. If so, it's not a bug. > Anyway I modified as below. > > Thanks, > > >From 7cb7b27c8cd2efc8a31d79239bef5b41c6e79216 Mon Sep 17 00:00:00 2001 > From: Jaegeuk Kim > Date: Tue, 18 Nov 2014 10:50:21 -0800 > Subject: [PATCH] f2fs: call flush_dcache_page when the page was updated > > Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. 
> > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/dir.c | 7 ++- > fs/f2fs/inline.c | 2 ++ > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > index 5a49995..fabf4ee 100644 > --- a/fs/f2fs/dir.c > +++ b/fs/f2fs/dir.c > @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct > f2fs_dir_entry *de, > f2fs_wait_on_page_writeback(page, type); > de->ino = cpu_to_le32(inode->i_ino); > set_de_type(de, inode); > - if (!f2fs_has_inline_dentry(dir)) > + if (!f2fs_has_inline_dentry(dir)) { > + flush_dcache_page(page); > kunmap(page); > + } > set_page_dirty(page); > dir->i_mtime = dir->i_ctime = CURRENT_TIME; > mark_inode_dirty(dir); > @@ -365,6 +367,7 @@ static int make_empty_dir(struct inode *inode, > make_dentry_ptr(&d, (void *)dentry_blk, 1); > do_make_empty_dir(inode, parent, &d); > > + flush_dcache_page(dentry_page); > kunmap_atomic(dentry_blk); > > set_page_dirty(dentry_page); > @@ -578,6 +581,7 @@ fail: > update_inode_page(dir); > clear_inode_flag(F2FS_I(dir), FI_UPDATE_DIR); > } > + flush_dcache_page(dentry_page); > kunmap(dentry_page); > f2fs_put_page(dentry_page, 1); > return err; > @@ -660,6 +664,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, > struct page *page, > bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap, > NR_DENTRY_IN_BLOCK, > 0); > + flush_dcache_page(page); > kunmap(page); /* kunmap - pair of f2fs_find_entry */ > set_page_dirty(page); > > diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c > index f26fb87..4291c1f 100644 > --- a/fs/f2fs/inline.c > +++ b/fs/f2fs/inline.c > @@ -106,6 +106,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, > struct page *page) > src_addr = inline_data_addr(dn->inode_page); > dst_addr = kmap_atomic(page); > memcpy(dst_addr, src_addr, MAX_INLINE_DATA); > + flush_dcache_page(page); > kunmap_atomic(dst_addr); > SetPageUptodate(page); > no_update: > @@ -357,6 +358,7 @@ static int f2fs_convert_inline_dir(struct inode *dir, > struct page *ipage, >
memcpy(dentry_blk->filename, inline_dentry->filename, > NR_INLINE_DENTRY * F2FS_SLOT_LEN); > > + flush_dcache_page(page); > kunmap_atomic(dentry_blk); > SetPageUptodate(page); > set_page_dirty(page); > -- > 2.1.1
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Jaegeuk, We should call flush_dcache_page before kunmap because the purpose of the cache flush is to address the aliasing problem related to virtual addresses. On Wed, Nov 19, 2014 at 02:35:08PM -0800, Jaegeuk Kim wrote: > Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/dir.c | 7 ++- > fs/f2fs/inline.c | 4 +++- > 2 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > index 5a49995..312fbfc 100644 > --- a/fs/f2fs/dir.c > +++ b/fs/f2fs/dir.c > @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct > f2fs_dir_entry *de, > f2fs_wait_on_page_writeback(page, type); > de->ino = cpu_to_le32(inode->i_ino); > set_de_type(de, inode); > - if (!f2fs_has_inline_dentry(dir)) > + if (!f2fs_has_inline_dentry(dir)) { > kunmap(page); > + flush_dcache_page(page); > + } > set_page_dirty(page); > dir->i_mtime = dir->i_ctime = CURRENT_TIME; > mark_inode_dirty(dir); > @@ -366,6 +368,7 @@ static int make_empty_dir(struct inode *inode, > do_make_empty_dir(inode, parent, &d); > > kunmap_atomic(dentry_blk); > + flush_dcache_page(dentry_page); > > set_page_dirty(dentry_page); > f2fs_put_page(dentry_page, 1); > @@ -579,6 +582,7 @@ fail: > clear_inode_flag(F2FS_I(dir), FI_UPDATE_DIR); > } > kunmap(dentry_page); > + flush_dcache_page(dentry_page); > f2fs_put_page(dentry_page, 1); > return err; > } > @@ -661,6 +665,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, > struct page *page, > NR_DENTRY_IN_BLOCK, > 0); > kunmap(page); /* kunmap - pair of f2fs_find_entry */ > + flush_dcache_page(page); > set_page_dirty(page); > > dir->i_ctime = dir->i_mtime = CURRENT_TIME; > diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c > index f26fb87..8b7cc51 100644 > --- a/fs/f2fs/inline.c > +++ b/fs/f2fs/inline.c > @@ -45,8 +45,8 @@ void read_inline_data(struct page *page, struct page *ipage) > src_addr = inline_data_addr(ipage); > dst_addr = kmap_atomic(page); > memcpy(dst_addr, src_addr,
MAX_INLINE_DATA); > - flush_dcache_page(page); > kunmap_atomic(dst_addr); > + flush_dcache_page(page); > SetPageUptodate(page); > } > > @@ -107,6 +107,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, > struct page *page) > dst_addr = kmap_atomic(page); > memcpy(dst_addr, src_addr, MAX_INLINE_DATA); > kunmap_atomic(dst_addr); > + flush_dcache_page(page); > SetPageUptodate(page); > no_update: > /* write data page to try to make data consistent */ > @@ -358,6 +359,7 @@ static int f2fs_convert_inline_dir(struct inode *dir, > struct page *ipage, > NR_INLINE_DENTRY * F2FS_SLOT_LEN); > > kunmap_atomic(dentry_blk); > + flush_dcache_page(page); > SetPageUptodate(page); > set_page_dirty(page); > > -- > 2.1.1
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Jaegeuk,

We should call flush_dcache_page before kunmap because the purpose of the cache flush is to address the aliasing problem related to virtual addresses.

On Wed, Nov 19, 2014 at 02:35:08PM -0800, Jaegeuk Kim wrote: > Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. > > Signed-off-by: Jaegeuk Kim jaeg...@kernel.org > --- > fs/f2fs/dir.c| 7 ++- > fs/f2fs/inline.c | 4 +++- > 2 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > index 5a49995..312fbfc 100644 > --- a/fs/f2fs/dir.c > +++ b/fs/f2fs/dir.c > @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de, > f2fs_wait_on_page_writeback(page, type); > de->ino = cpu_to_le32(inode->i_ino); > set_de_type(de, inode); > - if (!f2fs_has_inline_dentry(dir)) > + if (!f2fs_has_inline_dentry(dir)) { > kunmap(page); > + flush_dcache_page(page); > + } > set_page_dirty(page); > dir->i_mtime = dir->i_ctime = CURRENT_TIME; > mark_inode_dirty(dir); > @@ -366,6 +368,7 @@ static int make_empty_dir(struct inode *inode, > do_make_empty_dir(inode, parent, &d); > > kunmap_atomic(dentry_blk); > + flush_dcache_page(dentry_page); > set_page_dirty(dentry_page); > f2fs_put_page(dentry_page, 1); > @@ -579,6 +582,7 @@ fail: > clear_inode_flag(F2FS_I(dir), FI_UPDATE_DIR); > } > kunmap(dentry_page); > + flush_dcache_page(dentry_page); > f2fs_put_page(dentry_page, 1); > return err; > } > @@ -661,6 +665,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, > NR_DENTRY_IN_BLOCK, 0); > kunmap(page); /* kunmap - pair of f2fs_find_entry */ > + flush_dcache_page(page); > set_page_dirty(page); > > dir->i_ctime = dir->i_mtime = CURRENT_TIME; > diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c > index f26fb87..8b7cc51 100644 > --- a/fs/f2fs/inline.c > +++ b/fs/f2fs/inline.c > @@ -45,8 +45,8 @@ void read_inline_data(struct page *page, struct page *ipage) > src_addr = inline_data_addr(ipage); > dst_addr = kmap_atomic(page); > memcpy(dst_addr, src_addr, MAX_INLINE_DATA); > - flush_dcache_page(page); > kunmap_atomic(dst_addr); > + flush_dcache_page(page); > SetPageUptodate(page); > } > @@ -107,6 +107,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page) > dst_addr = kmap_atomic(page); > memcpy(dst_addr, src_addr, MAX_INLINE_DATA); > kunmap_atomic(dst_addr); > + flush_dcache_page(page); > SetPageUptodate(page); > no_update: > /* write data page to try to make data consistent */ > @@ -358,6 +359,7 @@ static int f2fs_convert_inline_dir(struct inode *dir, struct page *ipage, > NR_INLINE_DENTRY * F2FS_SLOT_LEN); > kunmap_atomic(dentry_blk); > + flush_dcache_page(page); > SetPageUptodate(page); > set_page_dirty(page); > -- > 2.1.1
___
Linux-f2fs-devel mailing list
linux-f2fs-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
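The ordering question above (which side of kunmap() the dcache flush belongs on) can be sketched in plain userspace C. The *_stub helpers below are stand-ins for the kernel API, not the real implementation; the sketch only records the order of operations that the patch uses, without deciding the review question:

```c
#include <assert.h>
#include <string.h>

/* Stand-ins (assumptions, not the kernel API) that record operation order. */
static char page_mem[64];   /* fake page payload */
static char trace[8];       /* m = kmap, u = kunmap, f = flush_dcache */
static int ti;

static void *kmap_stub(void)        { trace[ti++] = 'm'; return page_mem; }
static void kunmap_stub(void)       { trace[ti++] = 'u'; }
static void flush_dcache_stub(void) { trace[ti++] = 'f'; }

/* Order used by the patch: modify through the mapping, unmap, then flush.
 * Changman's review argues the flush should come before the unmap on
 * virtually indexed caches; only the recorded order differs. */
static const char *update_page_patch_order(void)
{
    char *dst = kmap_stub();
    memcpy(dst, "inline data", 12); /* update the mapped page */
    kunmap_stub();
    flush_dcache_stub();
    return trace;
}
```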
Re: [f2fs-dev] [PATCH 1/5] f2fs: disable roll-forward when active_logs = 2
On Mon, Nov 10, 2014 at 07:07:59AM -0800, Jaegeuk Kim wrote: > Hi Changman, > > On Mon, Nov 10, 2014 at 06:54:37PM +0900, Changman Lee wrote: > > On Sat, Nov 08, 2014 at 11:36:05PM -0800, Jaegeuk Kim wrote: > > > The roll-forward mechanism should be activated when the number of active > > > logs is not 2. > > > > > > Signed-off-by: Jaegeuk Kim > > > --- > > > fs/f2fs/file.c| 2 ++ > > > fs/f2fs/segment.c | 4 ++-- > > > 2 files changed, 4 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > > index 46311e7..54722a0 100644 > > > --- a/fs/f2fs/file.c > > > +++ b/fs/f2fs/file.c > > > @@ -132,6 +132,8 @@ static inline bool need_do_checkpoint(struct inode > > > *inode) > > > need_cp = true; > > > else if (test_opt(sbi, FASTBOOT)) > > > need_cp = true; > > > + else if (sbi->active_logs == 2) > > > + need_cp = true; > > > > > > return need_cp; > > > } > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > > > index 2fb3d7f..16721b5d 100644 > > > --- a/fs/f2fs/segment.c > > > +++ b/fs/f2fs/segment.c > > > @@ -1090,8 +1090,8 @@ static int __get_segment_type_4(struct page *page, > > > enum page_type p_type) > > > else > > > return CURSEG_COLD_DATA; > > > } else { > > > - if (IS_DNODE(page) && !is_cold_node(page)) > > > - return CURSEG_HOT_NODE; > > > + if (IS_DNODE(page) && is_cold_node(page)) > > > + return CURSEG_WARM_NODE; > > > > Hi Jaegeuk, > > > > We should take hot/cold separation into account as well. > > In case of a dir inode, it will be mixed with COLD_NODE. > > If it's a trade-off, let's note it kindly in the comments. > > NAK. > This patch tries to fix a bug, which is not a trade-off. > We should write files' direct node blocks in CURSEG_WARM_NODE for recovery. > > Thanks, Okay, the word 'trade-off' was wrong. We must be able to do recovery. However, we break the hot/cold separation rule we want. So I thought we should note its negative effect. Anyway, how about putting WARM and HOT together instead of HOT and COLD?
We can distinguish them well enough at recovery time if they are direct nodes and carry fsync_mark, even though HOT/WARM are mixed. Let me know if I'm misunderstanding something. Thanks, > > > > > Regards, > > Changman > > > > > else > > > return CURSEG_COLD_NODE; > > > } > > > -- > > > 2.1.1
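The node-side change being debated can be sketched as two small selector functions; the booleans stand in for IS_DNODE()/is_cold_node() and the enum for the curseg types (a simplified sketch, not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>

enum curseg { HOT_NODE, WARM_NODE, COLD_NODE };

/* Before the patch: non-cold (e.g. directory) direct nodes went to the
 * hot node log; file direct nodes fell into COLD_NODE. */
static enum curseg node_seg_before(bool is_dnode, bool is_cold)
{
    if (is_dnode && !is_cold)
        return HOT_NODE;
    return COLD_NODE;
}

/* After the patch: cold direct nodes (regular files) go to the warm
 * node log, which roll-forward recovery scans; directory direct nodes
 * now fall through to COLD_NODE -- the mixing Changman points out. */
static enum curseg node_seg_after(bool is_dnode, bool is_cold)
{
    if (is_dnode && is_cold)
        return WARM_NODE;
    return COLD_NODE;
}
```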
Re: [f2fs-dev] [PATCH 1/5] f2fs: disable roll-forward when active_logs = 2
On Sat, Nov 08, 2014 at 11:36:05PM -0800, Jaegeuk Kim wrote: > The roll-forward mechanism should be activated when the number of active > logs is not 2. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/file.c| 2 ++ > fs/f2fs/segment.c | 4 ++-- > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index 46311e7..54722a0 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -132,6 +132,8 @@ static inline bool need_do_checkpoint(struct inode *inode) > need_cp = true; > else if (test_opt(sbi, FASTBOOT)) > need_cp = true; > + else if (sbi->active_logs == 2) > + need_cp = true; > > return need_cp; > } > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 2fb3d7f..16721b5d 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -1090,8 +1090,8 @@ static int __get_segment_type_4(struct page *page, enum > page_type p_type) > else > return CURSEG_COLD_DATA; > } else { > - if (IS_DNODE(page) && !is_cold_node(page)) > - return CURSEG_HOT_NODE; > + if (IS_DNODE(page) && is_cold_node(page)) > + return CURSEG_WARM_NODE; Hi Jaegeuk, We should take hot/cold separation into account as well. In case of a dir inode, it will be mixed with COLD_NODE. If it's a trade-off, let's note it kindly in the comments. Regards, Changman > else > return CURSEG_COLD_NODE; > } > -- > 2.1.1
Re: [f2fs-dev] [PATCH] f2fs: implement -o dirsync
On Sun, Nov 09, 2014 at 10:24:22PM -0800, Jaegeuk Kim wrote: > If a mount option has dirsync, we should call checkpoint for all the directory > operations. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/namei.c | 24 > 1 file changed, 24 insertions(+) > > diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c > index 6312dd2..db3ee09 100644 > --- a/fs/f2fs/namei.c > +++ b/fs/f2fs/namei.c > @@ -138,6 +138,9 @@ static int f2fs_create(struct inode *dir, struct dentry > *dentry, umode_t mode, > stat_inc_inline_inode(inode); > d_instantiate(dentry, inode); > unlock_new_inode(inode); > + > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > out: > handle_failed_inode(inode); > @@ -164,6 +167,9 @@ static int f2fs_link(struct dentry *old_dentry, struct > inode *dir, > f2fs_unlock_op(sbi); > > d_instantiate(dentry, inode); > + > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > out: > clear_inode_flag(F2FS_I(inode), FI_INC_LINK); > @@ -233,6 +239,9 @@ static int f2fs_unlink(struct inode *dir, struct dentry > *dentry) > f2fs_delete_entry(de, page, dir, inode); > f2fs_unlock_op(sbi); > > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > + > /* In order to evict this inode, we set it dirty */ > mark_inode_dirty(inode); Let's move it below mark_inode_dirty. After sync, it's unnecessary inserting inode into dirty_list. 
> fail: > @@ -268,6 +277,9 @@ static int f2fs_symlink(struct inode *dir, struct dentry > *dentry, > > d_instantiate(dentry, inode); > unlock_new_inode(inode); > + > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > return err; > out: > handle_failed_inode(inode); > @@ -304,6 +316,8 @@ static int f2fs_mkdir(struct inode *dir, struct dentry > *dentry, umode_t mode) > d_instantiate(dentry, inode); > unlock_new_inode(inode); > > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > > out_fail: > @@ -346,8 +360,12 @@ static int f2fs_mknod(struct inode *dir, struct dentry > *dentry, > f2fs_unlock_op(sbi); > > alloc_nid_done(sbi, inode->i_ino); > + > d_instantiate(dentry, inode); > unlock_new_inode(inode); > + > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > out: > handle_failed_inode(inode); > @@ -461,6 +479,9 @@ static int f2fs_rename(struct inode *old_dir, struct > dentry *old_dentry, > } > > f2fs_unlock_op(sbi); > + > + if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > > put_out_dir: > @@ -600,6 +621,9 @@ static int f2fs_cross_rename(struct inode *old_dir, > struct dentry *old_dentry, > update_inode_page(new_dir); > > f2fs_unlock_op(sbi); > + > + if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > out_undo: > /* Still we may fail to recover name info of f2fs_inode here */ > -- > 2.1.1
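The pattern the patch repeats across every namespace operation can be condensed to a few lines; sync_fs_stub() stands in for f2fs_sync_fs(sb, 1) (an assumption for illustration, not the kernel call):

```c
#include <assert.h>
#include <stdbool.h>

static int checkpoints;                 /* counts forced checkpoints */

/* Stand-in for f2fs_sync_fs(sb, 1). */
static void sync_fs_stub(void) { checkpoints++; }

/* When the parent directory is marked dirsync, the metadata update must
 * be made durable before the syscall returns; otherwise nothing extra
 * happens on this path. */
static int dir_op_sketch(bool dir_is_dirsync)
{
    /* ... the directory update itself would happen here ... */
    if (dir_is_dirsync)
        sync_fs_stub();
    return 0;
}
```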
Re: [f2fs-dev] [PATCH 4/5] f2fs: write node pages if checkpoint is not doing
On Sat, Nov 08, 2014 at 11:36:08PM -0800, Jaegeuk Kim wrote: > It needs to write node pages if checkpoint is not doing in order to avoid > memory pressure. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/node.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 4ea2c47..6f514fb 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1314,10 +1314,12 @@ static int f2fs_write_node_page(struct page *page, > return 0; > } > > - if (wbc->for_reclaim) > - goto redirty_out; > - > - down_read(&sbi->node_write); > + if (wbc->for_reclaim) { > + if (!down_read_trylock(&sbi->node_write)) > + goto redirty_out; Previously, we skipped write_page for the reclaim path, but from now on, we will write out node pages to reclaim memory at any time except during checkpoint. We should keep in mind that this may break bio merging. Got it. Reviewed-by: Changman Lee > + } else { > + down_read(&sbi->node_write); > + } > set_page_writeback(page); > write_node_page(sbi, page, &fio, nid, ni.blk_addr, &new_addr); > set_node_addr(sbi, &ni, new_addr, is_fsync_dnode(page)); > -- > 2.1.1
Re: [f2fs-dev] [PATCH 3/5] f2fs: control the memory footprint used by ino entries
On Sat, Nov 08, 2014 at 11:36:07PM -0800, Jaegeuk Kim wrote: > This patch adds to control the memory footprint used by ino entries. > This will conduct best effort, not strictly. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/node.c| 28 ++-- > fs/f2fs/node.h| 3 ++- > fs/f2fs/segment.c | 3 ++- > 3 files changed, 26 insertions(+), 8 deletions(-) > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 44b8afe..4ea2c47 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -31,22 +31,38 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int > type) > { > struct f2fs_nm_info *nm_i = NM_I(sbi); > struct sysinfo val; > + unsigned long avail_ram; > unsigned long mem_size = 0; > bool res = false; > > si_meminfo(&val); > - /* give 25%, 25%, 50% memory for each components respectively */ > + > + /* only uses low memory */ > + avail_ram = val.totalram - val.totalhigh; > + > + /* give 25%, 25%, 50%, 50% memory for each components respectively */ Hi Jaegeuk, The memory usage of nm_i should be 100% but it's 125%. Mistake or intended? 
> if (type == FREE_NIDS) { > - mem_size = (nm_i->fcnt * sizeof(struct free_nid)) >> 12; > - res = mem_size < ((val.totalram * nm_i->ram_thresh / 100) >> 2); > + mem_size = (nm_i->fcnt * sizeof(struct free_nid)) >> > + PAGE_CACHE_SHIFT; > + res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 2); > } else if (type == NAT_ENTRIES) { > - mem_size = (nm_i->nat_cnt * sizeof(struct nat_entry)) >> 12; > - res = mem_size < ((val.totalram * nm_i->ram_thresh / 100) >> 2); > + mem_size = (nm_i->nat_cnt * sizeof(struct nat_entry)) >> > + PAGE_CACHE_SHIFT; > + res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 2); > } else if (type == DIRTY_DENTS) { > if (sbi->sb->s_bdi->dirty_exceeded) > return false; > mem_size = get_pages(sbi, F2FS_DIRTY_DENTS); > - res = mem_size < ((val.totalram * nm_i->ram_thresh / 100) >> 1); > + res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1); > + } else if (type == INO_ENTRIES) { > + int i; > + > + if (sbi->sb->s_bdi->dirty_exceeded) > + return false; > + for (i = 0; i <= UPDATE_INO; i++) > + mem_size += (sbi->ino_num[i] * sizeof(struct ino_entry)) > + >> PAGE_CACHE_SHIFT; > + res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1); > } > return res; > } > diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h > index acb71e5..d10b644 100644 > --- a/fs/f2fs/node.h > +++ b/fs/f2fs/node.h > @@ -106,7 +106,8 @@ static inline void raw_nat_from_node_info(struct > f2fs_nat_entry *raw_ne, > enum mem_type { > FREE_NIDS, /* indicates the free nid list */ > NAT_ENTRIES,/* indicates the cached nat entry */ > - DIRTY_DENTS /* indicates dirty dentry pages */ > + DIRTY_DENTS,/* indicates dirty dentry pages */ > + INO_ENTRIES,/* indicates inode entries */ > }; > > struct nat_entry_set { > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 16721b5d..e094675 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -276,7 +276,8 @@ void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi) > { > /* check the # of cached NAT entries and prefree 
segments */ > if (try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK) || > - excess_prefree_segs(sbi)) > + excess_prefree_segs(sbi) || > + available_free_memory(sbi, INO_ENTRIES)) > f2fs_sync_fs(sbi->sb, true); > } > > -- > 2.1.1
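The budget arithmetic behind available_free_memory() is easy to check with illustrative numbers (the values below are hypothetical): avail_ram is in pages, ram_thresh is the percentage reserved for these caches, and the shift hands a cache 1/4 (">> 2") or 1/2 (">> 1") of that reservation. Summing the four components at 25% + 25% + 50% + 50% gives more than 100% of the reservation, which is the over-commit Changman asks about.

```c
#include <assert.h>

/* Per-cache budget: (avail_ram * ram_thresh / 100) >> shift,
 * mirroring the expressions in available_free_memory(). */
static unsigned long cache_budget(unsigned long avail_ram_pages,
                                  unsigned int ram_thresh_pct, int shift)
{
    return (avail_ram_pages * ram_thresh_pct / 100) >> shift;
}
```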
Re: [f2fs-dev] [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users
Hi JK, I think it's nicer if this can be used as an 'OR' together with other policies. If so, we can also cover the weakness in high utilization. Regards, Changman On Sun, Sep 14, 2014 at 03:14:18PM -0700, Jaegeuk Kim wrote: > If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file > only starts to try in-place-updates. > And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it > keeps out-of-order manner. Otherwise, it triggers in-place-updates. > > This may be used by storage showing very high random write performance. > > For example, it can be used when, > > Seq. writes (Data) + wait + Seq. writes (Node) > > is pretty much slower than, > > Rand. writes (Data) > > Signed-off-by: Jaegeuk Kim > --- > Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++ > Documentation/filesystems/f2fs.txt | 9 - > fs/f2fs/f2fs.h | 1 + > fs/f2fs/file.c | 7 +++ > fs/f2fs/segment.c | 3 ++- > fs/f2fs/segment.h | 14 ++ > fs/f2fs/super.c | 2 ++ > 7 files changed, 33 insertions(+), 10 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs > b/Documentation/ABI/testing/sysfs-fs-f2fs > index 62dd725..6f9157f 100644 > --- a/Documentation/ABI/testing/sysfs-fs-f2fs > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs > @@ -44,6 +44,13 @@ Description: >Controls the FS utilization condition for the in-place-update >policies. > > +What:/sys/fs/f2fs/<disk>/min_fsync_blocks > +Date:September 2014 > +Contact: "Jaegeuk Kim" > +Description: > + Controls the dirty page count condition for the in-place-update > + policies. > + > What:/sys/fs/f2fs/<disk>/max_small_discards > Date:November 2013 > Contact: "Jaegeuk Kim" > diff --git a/Documentation/filesystems/f2fs.txt > b/Documentation/filesystems/f2fs.txt > index a2046a7..d010da8 100644 > --- a/Documentation/filesystems/f2fs.txt > +++ b/Documentation/filesystems/f2fs.txt > @@ -194,13 +194,20 @@ Files in /sys/fs/f2fs/<devname> >updates in f2fs. 
There are five policies: > 0: F2FS_IPU_FORCE, 1: F2FS_IPU_SSR, > 2: F2FS_IPU_UTIL, 3: F2FS_IPU_SSR_UTIL, > - 4: F2FS_IPU_DISABLE. > + 4: F2FS_IPU_FSYNC, 5: F2FS_IPU_DISABLE. > > min_ipu_util This parameter controls the threshold to > trigger >in-place-updates. The number indicates > percentage >of the filesystem utilization, and used by >F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies. > > + min_fsync_blocks This parameter controls the threshold to > trigger > + in-place-updates when F2FS_IPU_FSYNC mode is > set. > + The number indicates the number of dirty pages > + when fsync needs to flush on its call path. If > + the number is less than this value, it triggers > + in-place-updates. > + > max_victim_search This parameter controls the number of trials to > find a victim segment when conducting SSR and > cleaning operations. The default value is 4096 > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 2756c16..4f84d2a 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -386,6 +386,7 @@ struct f2fs_sm_info { > > unsigned int ipu_policy;/* in-place-update policy */ > unsigned int min_ipu_util; /* in-place-update threshold */ > + unsigned int min_fsync_blocks; /* threshold for fsync */ > > /* for flush command control */ > struct flush_cmd_control *cmd_control_info; > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index 77426c7..af06e22 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -154,12 +154,11 @@ int f2fs_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) > trace_f2fs_sync_file_enter(inode); > > /* if fdatasync is triggered, let's do in-place-update */ > - if (datasync) > + if (get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks) > set_inode_flag(fi, FI_NEED_IPU); > - > ret = filemap_write_and_wait_range(inode->i_mapping, start, end); > - if (datasync) > - clear_inode_flag(fi, FI_NEED_IPU); > + clear_inode_flag(fi, FI_NEED_IPU); > + > if (ret) { > trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); > return ret; > 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index e158d63..c6f627b 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -1928,8
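Changman's suggestion above can be sketched as a bitmask: let ipu_policy be a set of OR-able flags instead of exclusive values, so F2FS_IPU_FSYNC can combine with a utilization-based policy and cover the high-utilization weakness too. The names and the helper below are hypothetical, not the f2fs API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical OR-able policy flags. */
enum {
    IPU_SSR   = 1 << 0,
    IPU_UTIL  = 1 << 1,
    IPU_FSYNC = 1 << 2,
};

/* In-place update triggers if ANY enabled policy's condition holds. */
static bool need_ipu(unsigned policy, unsigned util_pct,
                     unsigned min_ipu_util, bool in_fsync_path)
{
    if ((policy & IPU_UTIL) && util_pct >= min_ipu_util)
        return true;
    if ((policy & IPU_FSYNC) && in_fsync_path)
        return true;
    return false;
}
```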
Re: [f2fs-dev] [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users
Hi JK, I think it's nicer if this can be used as 'OR' with other policies together. If so, we can also cover the weakness in high utilization. Regards, Changman On Sun, Sep 14, 2014 at 03:14:18PM -0700, Jaegeuk Kim wrote: If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file only starts to try in-place-updates. And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it keeps out-of-order manner. Otherwise, it triggers in-place-updates. This may be used by storage showing very high random write performance. For example, it can be used when, Seq. writes (Data) + wait + Seq. writes (Node) is pretty much slower than, Rand. writes (Data) Signed-off-by: Jaegeuk Kim jaeg...@kernel.org --- Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++ Documentation/filesystems/f2fs.txt | 9 - fs/f2fs/f2fs.h | 1 + fs/f2fs/file.c | 7 +++ fs/f2fs/segment.c | 3 ++- fs/f2fs/segment.h | 14 ++ fs/f2fs/super.c | 2 ++ 7 files changed, 33 insertions(+), 10 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index 62dd725..6f9157f 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -44,6 +44,13 @@ Description: Controls the FS utilization condition for the in-place-update policies. +What: /sys/fs/f2fs/<disk>/min_fsync_blocks +Date: September 2014 +Contact: Jaegeuk Kim jaeg...@kernel.org +Description: + Controls the dirty page count condition for the in-place-update + policies. + What: /sys/fs/f2fs/<disk>/max_small_discards Date: November 2013 Contact: Jaegeuk Kim jaegeuk@samsung.com diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index a2046a7..d010da8 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -194,13 +194,20 @@ Files in /sys/fs/f2fs/<devname> updates in f2fs. There are five policies: 0: F2FS_IPU_FORCE, 1: F2FS_IPU_SSR, 2: F2FS_IPU_UTIL, 3: F2FS_IPU_SSR_UTIL, - 4: F2FS_IPU_DISABLE. 
+ 4: F2FS_IPU_FSYNC, 5: F2FS_IPU_DISABLE. min_ipu_util This parameter controls the threshold to trigger in-place-updates. The number indicates percentage of the filesystem utilization, and used by F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies. + min_fsync_blocks This parameter controls the threshold to trigger + in-place-updates when F2FS_IPU_FSYNC mode is set. + The number indicates the number of dirty pages + when fsync needs to flush on its call path. If + the number is less than this value, it triggers + in-place-updates. + max_victim_search This parameter controls the number of trials to find a victim segment when conducting SSR and cleaning operations. The default value is 4096 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 2756c16..4f84d2a 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -386,6 +386,7 @@ struct f2fs_sm_info { unsigned int ipu_policy;/* in-place-update policy */ unsigned int min_ipu_util; /* in-place-update threshold */ + unsigned int min_fsync_blocks; /* threshold for fsync */ /* for flush command control */ struct flush_cmd_control *cmd_control_info; diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 77426c7..af06e22 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -154,12 +154,11 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) trace_f2fs_sync_file_enter(inode); /* if fdatasync is triggered, let's do in-place-update */ - if (datasync) + if (get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks) set_inode_flag(fi, FI_NEED_IPU); - ret = filemap_write_and_wait_range(inode->i_mapping, start, end); - if (datasync) - clear_inode_flag(fi, FI_NEED_IPU); + clear_inode_flag(fi, FI_NEED_IPU); + if (ret) { trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); return ret; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index e158d63..c6f627b 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1928,8 +1928,9 @@ int build_segment_manager(struct
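The policy selection in the patch above can be sketched in userspace C. This is a hypothetical simplification, not f2fs code: SSR availability is passed in as a flag rather than computed from segment state, and only the per-policy decision is modeled. (Changman's 'OR' suggestion corresponds to treating `ipu_policy` as a bitmask of these conditions, which mainline f2fs later adopted.)

```c
#include <stdbool.h>

/* Hypothetical userspace model of f2fs's in-place-update policy check.
 * Names mirror the kernel's, but the inputs are simplified. */
enum ipu_policy {
	F2FS_IPU_FORCE,		/* always update in place */
	F2FS_IPU_SSR,		/* only when SSR segments are available */
	F2FS_IPU_UTIL,		/* only above a utilization threshold */
	F2FS_IPU_SSR_UTIL,	/* both of the above */
	F2FS_IPU_FSYNC,		/* only on the fsync path (this patch) */
	F2FS_IPU_DISABLE,	/* never */
};

struct sm_info {
	enum ipu_policy ipu_policy;
	unsigned int min_ipu_util;	/* utilization threshold, percent */
	unsigned int min_fsync_blocks;	/* dirty page count threshold */
};

static bool need_inplace_update(const struct sm_info *sm, unsigned int util,
				bool has_ssr, bool in_fsync,
				unsigned int dirty_pages)
{
	switch (sm->ipu_policy) {
	case F2FS_IPU_FORCE:
		return true;
	case F2FS_IPU_SSR:
		return has_ssr;
	case F2FS_IPU_UTIL:
		return util > sm->min_ipu_util;
	case F2FS_IPU_SSR_UTIL:
		return has_ssr && util > sm->min_ipu_util;
	case F2FS_IPU_FSYNC:
		/* IPU only on the fsync path, and only for small flushes */
		return in_fsync && dirty_pages <= sm->min_fsync_blocks;
	case F2FS_IPU_DISABLE:
	default:
		return false;
	}
}
```

With `F2FS_IPU_FSYNC`, a small fsync (few dirty pages) takes the in-place path and skips node writes, while a large flush keeps the normal out-of-place ordering.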
Re: [f2fs-dev] [PATCH] f2fs: reposition unlock_new_inode to prevent accessing invalid inode
Hi, On Thu, Aug 28, 2014 at 04:53:01PM +0800, Chao Yu wrote: > Hi Changman, > > > -Original Message- > > From: Changman Lee [mailto:cm224@samsung.com] > > Sent: Thursday, August 28, 2014 9:48 AM > > To: Chao Yu > > Cc: Jaegeuk Kim; linux-kernel@vger.kernel.org; > > linux-f2fs-de...@lists.sourceforge.net > > Subject: Re: [f2fs-dev] [PATCH] f2fs: reposition unlock_new_inode to > > prevent accessing invalid > > inode > > > > Hi Chao, > > > > I agree it's correct unlock_new_inode should be located after > > make_bad_inode. > > > > About this scenario, > > I think we should check some condition if this could be occured; > > I think this condition is the almost impossible but which can happen > theoretically. > > > A inode allocated newly could be victim by gc thread. > > Then, f2fs_iget called by Thread A have to fail because we handled it as > > bad_inode in Thread B. However, f2fs_iget could still get inode. > > How about check it using is_bad_inode() in f2fs_iget. > > Yes, agreed. How about return -EIO when this inode we iget_locked is bad? Hmm.. It might be better to check return value of f2fs_iget like other f/s. 
--- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -595,6 +595,8 @@ next_step: inode = f2fs_iget(sb, dni.ino); if (IS_ERR(inode)) continue; + else if (is_bad_inode(inode)) + continue; Thanks, Changman > > Thanks, > Yu > > > > > Thanks, > > > > On Tue, Aug 26, 2014 at 06:35:29PM +0800, Chao Yu wrote: > > > As the race condition on the inode cache, following scenario can appear: > > > [Thread a][Thread b] > > > ->f2fs_mkdir > > > ->f2fs_add_link > > > ->__f2fs_add_link > > > ->init_inode_metadata failed here > > > ->gc_thread_func > > > ->f2fs_gc > > > ->do_garbage_collect > > > ->gc_data_segment > > > ->f2fs_iget > > > ->iget_locked > > > ->wait_on_inode > > > ->unlock_new_inode > > > ->move_data_page > > > ->make_bad_inode > > > ->iput > > > > > > When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated > > > inode > > > should be set as bad to avoid being accessed by other thread. But in above > > > scenario, it allows f2fs to access the invalid inode before this inode > > > was set > > > as bad. > > > This patch fix the potential problem, and this issue was found by code > > > review. 
> > > > > > Signed-off-by: Chao Yu > > > --- > > > fs/f2fs/namei.c | 10 +- > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c > > > index 6b53ce9..845f1be 100644 > > > --- a/fs/f2fs/namei.c > > > +++ b/fs/f2fs/namei.c > > > @@ -134,8 +134,8 @@ static int f2fs_create(struct inode *dir, struct > > > dentry *dentry, umode_t > > mode, > > > return 0; > > > out: > > > clear_nlink(inode); > > > - unlock_new_inode(inode); > > > make_bad_inode(inode); > > > + unlock_new_inode(inode); > > > iput(inode); > > > alloc_nid_failed(sbi, ino); > > > return err; > > > @@ -267,8 +267,8 @@ static int f2fs_symlink(struct inode *dir, struct > > > dentry *dentry, > > > return err; > > > out: > > > clear_nlink(inode); > > > - unlock_new_inode(inode); > > > make_bad_inode(inode); > > > + unlock_new_inode(inode); > > > iput(inode); > > > alloc_nid_failed(sbi, inode->i_ino); > > > return err; > > > @@ -308,8 +308,8 @@ static int f2fs_mkdir(struct inode *dir, struct > > > dentry *dentry, umode_t > > mode) > > > out_fail: > > > clear_inode_flag(F2FS_I(inode), FI_INC_LINK); > > > clear_nlink(inode); > > > - unlock_new_inode(inode); > > > make_bad_inode(inode); > > > + unlock_new_inode(inode); > > > iput(inode); > > > alloc_nid_failed(sbi, inode->i_ino); > > >
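The reordering in the patch above hinges on one invariant: the inode must be marked bad *before* waiters on the I_NEW state are released, so that a concurrent `f2fs_iget` can never observe a valid-looking but half-initialized inode. The handshake can be modeled in userspace with a mutex and a condition variable; this is a hypothetical sketch (the struct and function names only mirror the kernel's), not f2fs code:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the I_NEW handshake: the creating thread fails,
 * marks the inode bad, then "unlocks" it; the GC thread waits for the
 * inode to leave the I_NEW state and must then see the bad flag.
 * With the pre-patch order (unlock before mark-bad) there would be a
 * window where the waiter sees a live-looking but invalid inode. */
struct model_inode {
	pthread_mutex_t lock;
	pthread_cond_t unlocked;
	bool i_new;	/* models the I_NEW state bit */
	bool bad;	/* models make_bad_inode() */
};

static void model_make_bad_inode(struct model_inode *in)
{
	pthread_mutex_lock(&in->lock);
	in->bad = true;
	pthread_mutex_unlock(&in->lock);
}

static void model_unlock_new_inode(struct model_inode *in)
{
	pthread_mutex_lock(&in->lock);
	in->i_new = false;			/* release waiters */
	pthread_cond_broadcast(&in->unlocked);
	pthread_mutex_unlock(&in->lock);
}

/* The GC side: wait_on_inode() followed by an is_bad_inode() check. */
static bool iget_sees_bad(struct model_inode *in)
{
	bool bad;

	pthread_mutex_lock(&in->lock);
	while (in->i_new)
		pthread_cond_wait(&in->unlocked, &in->lock);
	bad = in->bad;
	pthread_mutex_unlock(&in->lock);
	return bad;
}

static void *creator(void *arg)
{
	struct model_inode *in = arg;

	/* patched order: mark bad first, then release waiters */
	model_make_bad_inode(in);
	model_unlock_new_inode(in);
	return NULL;
}

static bool run_model(void)
{
	struct model_inode in = {
		PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
		true, false,
	};
	pthread_t t;
	bool bad;

	pthread_create(&t, NULL, creator, &in);
	bad = iget_sees_bad(&in);	/* plays the GC thread */
	pthread_join(t, NULL);
	return bad;
}
```

Because `bad` is set before `i_new` is cleared, and both are read under the same lock, the waiter deterministically observes the bad flag; swapping the two calls back would reintroduce the race.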
Re: [f2fs-dev] [PATCH] f2fs: reposition unlock_new_inode to prevent accessing invalid inode
Hi Chao, I agree it's correct that unlock_new_inode should be located after make_bad_inode. About this scenario, I think we should check some condition in case this could occur; an inode allocated newly could be a victim of the gc thread. Then, f2fs_iget called by Thread A has to fail because we handled it as bad_inode in Thread B. However, f2fs_iget could still get the inode. How about checking it using is_bad_inode() in f2fs_iget? Thanks, On Tue, Aug 26, 2014 at 06:35:29PM +0800, Chao Yu wrote: > As the race condition on the inode cache, following scenario can appear: > [Thread a][Thread b] > ->f2fs_mkdir > ->f2fs_add_link > ->__f2fs_add_link > ->init_inode_metadata failed here > ->gc_thread_func > ->f2fs_gc > ->do_garbage_collect > ->gc_data_segment > ->f2fs_iget > ->iget_locked > ->wait_on_inode > ->unlock_new_inode > ->move_data_page > ->make_bad_inode > ->iput > > When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated inode > should be set as bad to avoid being accessed by other thread. But in above > scenario, it allows f2fs to access the invalid inode before this inode was set > as bad. > This patch fix the potential problem, and this issue was found by code review. 
> > Signed-off-by: Chao Yu > --- > fs/f2fs/namei.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c > index 6b53ce9..845f1be 100644 > --- a/fs/f2fs/namei.c > +++ b/fs/f2fs/namei.c > @@ -134,8 +134,8 @@ static int f2fs_create(struct inode *dir, struct dentry > *dentry, umode_t mode, > return 0; > out: > clear_nlink(inode); > - unlock_new_inode(inode); > make_bad_inode(inode); > + unlock_new_inode(inode); > iput(inode); > alloc_nid_failed(sbi, ino); > return err; > @@ -267,8 +267,8 @@ static int f2fs_symlink(struct inode *dir, struct dentry > *dentry, > return err; > out: > clear_nlink(inode); > - unlock_new_inode(inode); > make_bad_inode(inode); > + unlock_new_inode(inode); > iput(inode); > alloc_nid_failed(sbi, inode->i_ino); > return err; > @@ -308,8 +308,8 @@ static int f2fs_mkdir(struct inode *dir, struct dentry > *dentry, umode_t mode) > out_fail: > clear_inode_flag(F2FS_I(inode), FI_INC_LINK); > clear_nlink(inode); > - unlock_new_inode(inode); > make_bad_inode(inode); > + unlock_new_inode(inode); > iput(inode); > alloc_nid_failed(sbi, inode->i_ino); > return err; > @@ -354,8 +354,8 @@ static int f2fs_mknod(struct inode *dir, struct dentry > *dentry, > return 0; > out: > clear_nlink(inode); > - unlock_new_inode(inode); > make_bad_inode(inode); > + unlock_new_inode(inode); > iput(inode); > alloc_nid_failed(sbi, inode->i_ino); > return err; > @@ -688,8 +688,8 @@ release_out: > out: > f2fs_unlock_op(sbi); > clear_nlink(inode); > - unlock_new_inode(inode); > make_bad_inode(inode); > + unlock_new_inode(inode); > iput(inode); > alloc_nid_failed(sbi, inode->i_ino); > return err; > -- > 2.0.0.421.g786a89d > > > > -- > Slashdot TV. > Video for Nerds. Stuff that matters. 
> http://tv.slashdot.org/ > ___ > Linux-f2fs-devel mailing list > linux-f2fs-de...@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
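The suggestion above is that callers of f2fs_iget should screen the returned inode the way other filesystems do: first for an error pointer, then for a bad inode. The kernel's ERR_PTR convention (from `<linux/err.h>`) can be re-created in userspace to show the shape of that check; the inode struct and lookup below are hypothetical stand-ins, not f2fs code:

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace re-creation of the kernel's error-pointer helpers: error
 * codes are encoded in the top 4095 pointer values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct inode { bool bad; };	/* toy inode: only the "bad" state */

static inline bool is_bad_inode(const struct inode *inode)
{
	return inode->bad;
}

/* Toy iget(): ino 0 fails outright; other inos return a cached inode
 * that may still have been marked bad by a failing creator. */
static struct inode *toy_iget(unsigned long ino, struct inode *cache)
{
	if (ino == 0)
		return ERR_PTR(-ENOENT);
	return &cache[ino];
}

/* The gc_data_segment()-style loop with the proposed extra check:
 * skip both error pointers and bad inodes. */
static int count_usable(struct inode *cache, unsigned long n)
{
	int usable = 0;

	for (unsigned long ino = 0; ino < n; ino++) {
		struct inode *inode = toy_iget(ino, cache);

		if (IS_ERR(inode))
			continue;
		if (is_bad_inode(inode))	/* the proposed check */
			continue;
		usable++;
	}
	return usable;
}
```

The two checks are not redundant: `IS_ERR` catches lookups that failed outright, while `is_bad_inode` catches inodes that were found in the cache but invalidated by a failing create path, which is exactly the race discussed in this thread.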
Re: [f2fs-dev] [PATCH] f2fs: reduce competition among node page writes
Hi Chao, On Wed, Jul 30, 2014 at 09:07:49PM +0800, Chao Yu wrote: > Hi Jaegeuk Changman, > > > -Original Message- > > From: Chao Yu [mailto:chao2...@samsung.com] > > Sent: Thursday, July 03, 2014 6:59 PM > > To: Jaegeuk Kim; Changman Lee > > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > > linux-f2fs-de...@lists.sourceforge.net > > Subject: [f2fs-dev] [PATCH] f2fs: reduce competition among node page writes > > > > We do not need to block on ->node_write among different node page writers > > e.g. > > fsync/flush, unless we have a node page writer from write_checkpoint. > > So it's better use rw_semaphore instead of mutex type for ->node_write to > > promote performance. > > If you could have time to help explaining the problem of this patch, I will be > appreciated for that. I have no clue. Except checkpoint, I don't know why need to block to write node page. Do you have any problem when you test with this patch? > > Another question is what is ->writepages in sbi used for? I'm not quite clear. > I remember it is for writing data pages per thread as much as possible. When multi-threads write some files simultaneously, multi-threads contended with each other to allocate a block. So block allocation was interleaved across threads. It makes fragmentation of file. Thanks, > Thanks, > > > > > Signed-off-by: Chao Yu > > --- > > fs/f2fs/checkpoint.c |6 +++--- > > fs/f2fs/f2fs.h |2 +- > > fs/f2fs/node.c |4 ++-- > > fs/f2fs/super.c |2 +- > > 4 files changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > > index 0b4710c..eec406b 100644 > > --- a/fs/f2fs/checkpoint.c > > +++ b/fs/f2fs/checkpoint.c > > @@ -714,10 +714,10 @@ retry_flush_dents: > > * until finishing nat/sit flush. 
> > */ > > retry_flush_nodes: > > - mutex_lock(&sbi->node_write); > > + down_write(&sbi->node_write); > > > > if (get_pages(sbi, F2FS_DIRTY_NODES)) { > > - mutex_unlock(&sbi->node_write); > > + up_write(&sbi->node_write); > > sync_node_pages(sbi, 0, &wbc); > > goto retry_flush_nodes; > > } > > @@ -726,7 +726,7 @@ retry_flush_nodes: > > > > static void unblock_operations(struct f2fs_sb_info *sbi) > > { > > - mutex_unlock(&sbi->node_write); > > + up_write(&sbi->node_write); > > f2fs_unlock_all(sbi); > > } > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > index ae3b4ac..ca30b5a 100644 > > --- a/fs/f2fs/f2fs.h > > +++ b/fs/f2fs/f2fs.h > > @@ -444,7 +444,7 @@ struct f2fs_sb_info { > > struct inode *meta_inode; /* cache meta blocks */ > > struct mutex cp_mutex; /* checkpoint procedure lock */ > > struct rw_semaphore cp_rwsem; /* blocking FS operations */ > > - struct mutex node_write;/* locking node writes */ > > + struct rw_semaphore node_write; /* locking node writes */ > > struct mutex writepages;/* mutex for writepages() */ > > bool por_doing; /* recovery is doing or not */ > > wait_queue_head_t cp_wait; > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > index a90f51d..7b5b5de 100644 > > --- a/fs/f2fs/node.c > > +++ b/fs/f2fs/node.c > > @@ -1231,12 +1231,12 @@ static int f2fs_write_node_page(struct page *page, > > if (wbc->for_reclaim) > > goto redirty_out; > > > > - mutex_lock(&sbi->node_write); > > + down_read(&sbi->node_write); > > set_page_writeback(page); > > write_node_page(sbi, page, &fio, nid, ni.blk_addr, &new_addr); > > set_node_addr(sbi, &ni, new_addr, is_fsync_dnode(page)); > > dec_page_count(sbi, F2FS_DIRTY_NODES); > > - mutex_unlock(&sbi->node_write); > > + up_read(&sbi->node_write); > > unlock_page(page); > > return 0; > > > > diff --git a/fs/f2fs/super.c > > b/fs/f2fs/super.c > > index 8f96d93..bed9413 100644 > > --- a/fs/f2fs/super.c > > +++ b/fs/f2fs/super.c > > @@ -947,7 +947,7 @@ static int f2fs_fill_super(struct super_block *sb, void > > *data, int silent) > > mutex_init(&sbi->gc_mutex); > > 
mutex_init(&sbi->writepages); > > mutex_init(&sbi->cp_mutex); > > - mutex_init(&sbi->node_write); > > + init_rwsem(&sbi->node_write); > > sbi->por_doing = false; > > spin_lock_init(&sbi->stat_lock); > > > > -- > > 1.7.9.5
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
On Tue, Jul 29, 2014 at 06:08:21PM -0700, Jaegeuk Kim wrote: > On Wed, Jul 30, 2014 at 08:54:55AM +0900, Changman Lee wrote: > > On Tue, Jul 29, 2014 at 05:22:15AM -0700, Jaegeuk Kim wrote: > > > Hi Changman, > > > > > > On Tue, Jul 29, 2014 at 09:41:11AM +0900, Changman Lee wrote: > > > > Hi Jaegeuk, > > > > > > > > On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: > > > > > This patch enforces in-place-updates only when fdatasync is requested. > > > > > If we adopt this in-place-updates for the fdatasync, we can skip to > > > > > write the > > > > > recovery information. > > > > > > > > But, as you know, random write occurs when changing into > > > > in-place-updates. > > > > It will degrade write performance. Is there any case in-place-updates is > > > > better, except recovery or high utilization? > > > > > > As I described, you can easily imagine, if users requested small amount > > > of data > > > writes with fdatasync, we should do data writes + node writes. > > > But, if we can do in-place-update, we don't need to write node blocks. > > > Surely it triggers random writes, however, the amount of data is preety > > > small > > > and the device handles them very fast by its inside cache, so that it can > > > enhance the performance. > > > > > > Thanks, > > > > Partially agree. Sometimes, I see that SSR shows lower performance than > > IPU. One of the reasons might be node writes. > > What did you mean? That's why I consider IPU eagarly instead of SSR and LFS > under the very strict cases. > Okay, I understood your intention. This discussion seems to be a little far from this thread. The background I mentioned above is that I got better numbers from IPU when I tested fio under fragmentation by varmail and dd, at about 93% utilization. The result of perf shows f2fs spends the most cpu time searching for a victim in SSR mode, and f2fs had to write node data additionally. I think this condition could be one of the strict cases you mentioned. 
Thanks, > > Anyway, if so, we should know total dirty pages for fdatasync and it's very > > tunable according to a random write performance of device. > > Agreed. We can do that either by comparing the number of dirty pages, > additional data/node writes, and cost of checkpoint at the same time. > And there is another thing is that we need to consider the number of > waiting time for end_io. > I'll look into this at some time. > > Thanks, > > > > > Thanks, > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > Signed-off-by: Jaegeuk Kim > > > > > --- > > > > > fs/f2fs/f2fs.h| 1 + > > > > > fs/f2fs/file.c| 7 +++ > > > > > fs/f2fs/segment.h | 4 > > > > > 3 files changed, 12 insertions(+) > > > > > > > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > > > > index ab36025..8f8685e 100644 > > > > > --- a/fs/f2fs/f2fs.h > > > > > +++ b/fs/f2fs/f2fs.h > > > > > @@ -998,6 +998,7 @@ enum { > > > > > FI_INLINE_DATA, /* used for inline data*/ > > > > > FI_APPEND_WRITE,/* inode has appended data */ > > > > > FI_UPDATE_WRITE,/* inode has in-place-update data */ > > > > > + FI_NEED_IPU,/* used fo ipu for fdatasync */ > > > > > }; > > > > > > > > > > static inline void set_inode_flag(struct f2fs_inode_info *fi, int > > > > > flag) > > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > > > > index 121689a..e339856 100644 > > > > > --- a/fs/f2fs/file.c > > > > > +++ b/fs/f2fs/file.c > > > > > @@ -127,11 +127,18 @@ int f2fs_sync_file(struct file *file, loff_t > > > > > start, loff_t end, int datasync) > > > > > return 0; > > > > > > > > > > trace_f2fs_sync_file_enter(inode); > > > > > + > > > > > + /* if fdatasync is triggered, let's do in-place-update */ > > > > > + if (datasync) > > > > > + set_inode_flag(fi, FI_NEED_IPU); > > > > > + > > > > > ret = filemap_write_and_wait_range(inode->i_mapping, start, > > > > > end); > > > >
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
On Tue, Jul 29, 2014 at 05:22:15AM -0700, Jaegeuk Kim wrote: > Hi Changman, > > On Tue, Jul 29, 2014 at 09:41:11AM +0900, Changman Lee wrote: > > Hi Jaegeuk, > > > > On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: > > > This patch enforces in-place-updates only when fdatasync is requested. > > > If we adopt this in-place-updates for the fdatasync, we can skip to write > > > the > > > recovery information. > > > > But, as you know, random write occurs when changing into in-place-updates. > > It will degrade write performance. Is there any case in-place-updates is > > better, except recovery or high utilization? > > As I described, you can easily imagine, if users requested small amount of > data > writes with fdatasync, we should do data writes + node writes. > But, if we can do in-place-update, we don't need to write node blocks. > Surely it triggers random writes, however, the amount of data is preety small > and the device handles them very fast by its inside cache, so that it can > enhance the performance. > > Thanks, Partially agree. Sometimes, I see that SSR shows lower performance than IPU. One of the reasons might be node writes. Anyway, if so, we should know total dirty pages for fdatasync and it's very tunable according to a random write performance of device. 
Thanks, > > > > > Thanks > > > > > > > > Signed-off-by: Jaegeuk Kim > > > --- > > > fs/f2fs/f2fs.h| 1 + > > > fs/f2fs/file.c| 7 +++ > > > fs/f2fs/segment.h | 4 > > > 3 files changed, 12 insertions(+) > > > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > > index ab36025..8f8685e 100644 > > > --- a/fs/f2fs/f2fs.h > > > +++ b/fs/f2fs/f2fs.h > > > @@ -998,6 +998,7 @@ enum { > > > FI_INLINE_DATA, /* used for inline data*/ > > > FI_APPEND_WRITE,/* inode has appended data */ > > > FI_UPDATE_WRITE,/* inode has in-place-update data */ > > > + FI_NEED_IPU,/* used fo ipu for fdatasync */ > > > }; > > > > > > static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag) > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > > index 121689a..e339856 100644 > > > --- a/fs/f2fs/file.c > > > +++ b/fs/f2fs/file.c > > > @@ -127,11 +127,18 @@ int f2fs_sync_file(struct file *file, loff_t start, > > > loff_t end, int datasync) > > > return 0; > > > > > > trace_f2fs_sync_file_enter(inode); > > > + > > > + /* if fdatasync is triggered, let's do in-place-update */ > > > + if (datasync) > > > + set_inode_flag(fi, FI_NEED_IPU); > > > + > > > ret = filemap_write_and_wait_range(inode->i_mapping, start, end); > > > if (ret) { > > > trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); > > > return ret; > > > } > > > + if (datasync) > > > + clear_inode_flag(fi, FI_NEED_IPU); > > > > > > /* > > >* if there is no written data, don't waste time to write recovery info. 
> > > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h > > > index ee5c75e..55973f7 100644 > > > --- a/fs/f2fs/segment.h > > > +++ b/fs/f2fs/segment.h > > > @@ -486,6 +486,10 @@ static inline bool need_inplace_update(struct inode > > > *inode) > > > if (S_ISDIR(inode->i_mode)) > > > return false; > > > > > > + /* this is only set during fdatasync */ > > > + if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU)) > > > + return true; > > > + > > > switch (SM_I(sbi)->ipu_policy) { > > > case F2FS_IPU_FORCE: > > > return true; > > > -- > > > 1.8.5.2 (Apple Git-48) > > > > > > > > > -- > > > Want fast and easy access to all the code in your enterprise? Index and > > > search up to 200,000 lines of code with a free copy of Black Duck > > > Code Sight - the same software that powers the world's largest code > > > search on Ohloh, the Black Duck Open Hub! Try it now. > > > http://p.sf.net/sfu/bds > > > ___ > > > Linux-f2fs-devel mailing list > > > linux-f2fs-de...@lists.sourceforge.net > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
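The mechanism quoted above is small but easy to misread: `FI_NEED_IPU` is raised only for the duration of the fdatasync writeback, so `need_inplace_update()` answers true exactly while those pages are being flushed and at no other time. A hypothetical userspace model (names mirror the kernel's, with every other policy check stripped out; the writeback is a callback so the flag can be probed mid-flush):

```c
#include <stdbool.h>

/* Model of the FI_NEED_IPU window from the patch above. */
enum { FI_NEED_IPU = 1u << 0 };

struct f2fs_inode_info { unsigned int flags; };

static void set_inode_flag(struct f2fs_inode_info *fi, unsigned int f)
{
	fi->flags |= f;
}

static void clear_inode_flag(struct f2fs_inode_info *fi, unsigned int f)
{
	fi->flags &= ~f;
}

/* need_inplace_update() reduced to just the per-inode flag. */
static bool need_inplace_update(const struct f2fs_inode_info *fi)
{
	return fi->flags & FI_NEED_IPU;
}

static bool during;	/* flag state observed from inside the writeback */

/* Stands in for filemap_write_and_wait_range(): records whether the
 * IPU decision would fire while pages are being written. */
static void writeback(struct f2fs_inode_info *fi)
{
	during = need_inplace_update(fi);
}

/* Models the f2fs_sync_file() sequence: set flag, write, clear flag.
 * Returns true iff IPU was enabled during the flush and off after. */
static bool ipu_only_during_fdatasync(bool datasync)
{
	struct f2fs_inode_info fi = { 0 };

	if (datasync)
		set_inode_flag(&fi, FI_NEED_IPU);
	writeback(&fi);
	if (datasync)
		clear_inode_flag(&fi, FI_NEED_IPU);
	return during && !need_inplace_update(&fi);
}
```

This also shows why Changman's tunability point matters: the window is all-or-nothing per call, so gating it on the dirty page count (as the later patch 04/10 does with `min_fsync_blocks`) is what turns it into a device-dependent knob.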
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
On Tue, Jul 29, 2014 at 05:22:15AM -0700, Jaegeuk Kim wrote: Hi Changman, On Tue, Jul 29, 2014 at 09:41:11AM +0900, Changman Lee wrote: Hi Jaegeuk, On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: This patch enforces in-place-updates only when fdatasync is requested. If we adopt this in-place-updates for the fdatasync, we can skip writing the recovery information. But, as you know, random write occurs when changing into in-place-updates. It will degrade write performance. Is there any case in-place-updates is better, except recovery or high utilization? As I described, you can easily imagine, if users requested a small amount of data writes with fdatasync, we should do data writes + node writes. But, if we can do in-place-update, we don't need to write node blocks. Surely it triggers random writes; however, the amount of data is pretty small and the device handles them very fast by its internal cache, so that it can enhance the performance. Thanks,

Partially agree. Sometimes, I see that SSR shows lower performance than IPU. One of the reasons might be node writes. Anyway, if so, we should know total dirty pages for fdatasync and it's very tunable according to the random write performance of the device.
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
On Tue, Jul 29, 2014 at 06:08:21PM -0700, Jaegeuk Kim wrote: On Wed, Jul 30, 2014 at 08:54:55AM +0900, Changman Lee wrote: On Tue, Jul 29, 2014 at 05:22:15AM -0700, Jaegeuk Kim wrote: Hi Changman, On Tue, Jul 29, 2014 at 09:41:11AM +0900, Changman Lee wrote: Hi Jaegeuk, On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: This patch enforces in-place-updates only when fdatasync is requested. If we adopt this in-place-updates for the fdatasync, we can skip writing the recovery information. But, as you know, random write occurs when changing into in-place-updates. It will degrade write performance. Is there any case in-place-updates is better, except recovery or high utilization? As I described, you can easily imagine, if users requested a small amount of data writes with fdatasync, we should do data writes + node writes. But, if we can do in-place-update, we don't need to write node blocks. Surely it triggers random writes; however, the amount of data is pretty small and the device handles them very fast by its internal cache, so that it can enhance the performance. Thanks, Partially agree. Sometimes, I see that SSR shows lower performance than IPU. One of the reasons might be node writes. What did you mean? That's why I consider IPU eagerly instead of SSR and LFS under the very strict cases.

Okay, I understood your intention. This discussion seems to be a little bit far from this thread. The background I told as above is that I got better numbers from IPU when I tested fio under fragmentation by varmail and dd, at a utilization of about 93%. The result of perf shows f2fs spends the most CPU time searching for a victim in SSR mode. And f2fs had to write node data additionally. I think this condition could be one of the strict cases you told. Thanks,

Anyway, if so, we should know total dirty pages for fdatasync and it's very tunable according to the random write performance of the device. Agreed.
We can do that, e.g., by comparing the number of dirty pages, the additional data/node writes, and the cost of a checkpoint at the same time. Another thing we need to consider is the waiting time for end_io. I'll look into this at some time. Thanks,
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
Hi Jaegeuk, On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: > This patch enforces in-place-updates only when fdatasync is requested. > If we adopt this in-place-updates for the fdatasync, we can skip to write the > recovery information. But, as you know, random write occurs when changing into in-place-updates. It will degrade write performance. Is there any case in-place-updates is better, except recovery or high utilization? Thanks > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/f2fs.h| 1 + > fs/f2fs/file.c| 7 +++ > fs/f2fs/segment.h | 4 > 3 files changed, 12 insertions(+) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index ab36025..8f8685e 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -998,6 +998,7 @@ enum { > FI_INLINE_DATA, /* used for inline data*/ > FI_APPEND_WRITE,/* inode has appended data */ > FI_UPDATE_WRITE,/* inode has in-place-update data */ > + FI_NEED_IPU,/* used fo ipu for fdatasync */ > }; > > static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag) > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index 121689a..e339856 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -127,11 +127,18 @@ int f2fs_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) > return 0; > > trace_f2fs_sync_file_enter(inode); > + > + /* if fdatasync is triggered, let's do in-place-update */ > + if (datasync) > + set_inode_flag(fi, FI_NEED_IPU); > + > ret = filemap_write_and_wait_range(inode->i_mapping, start, end); > if (ret) { > trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); > return ret; > } > + if (datasync) > + clear_inode_flag(fi, FI_NEED_IPU); > > /* >* if there is no written data, don't waste time to write recovery info. 
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h > index ee5c75e..55973f7 100644 > --- a/fs/f2fs/segment.h > +++ b/fs/f2fs/segment.h > @@ -486,6 +486,10 @@ static inline bool need_inplace_update(struct inode > *inode) > if (S_ISDIR(inode->i_mode)) > return false; > > + /* this is only set during fdatasync */ > + if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU)) > + return true; > + > switch (SM_I(sbi)->ipu_policy) { > case F2FS_IPU_FORCE: > return true; > -- > 1.8.5.2 (Apple Git-48)
Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
On Tue, May 27, 2014 at 02:32:57PM +0800, Chao Yu wrote: > Hi Changman, > > > -Original Message- > > From: Changman Lee [mailto:cm224@samsung.com] > > Sent: Tuesday, May 27, 2014 9:25 AM > > To: Chao Yu > > Cc: Jaegeuk Kim; linux-fsde...@vger.kernel.org; > > linux-kernel@vger.kernel.org; > > linux-f2fs-de...@lists.sourceforge.net > > Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace > > f2fs_submit_page_mbio event > > in ra_sum_pages > > > > Hi, Chao > > > > Could you think about following once. > > move node_inode in front of build_segment_manager, then use node_inode > > instead of bd_inode. > > Jaegeuk and I discussed this solution previously in > [PATCH 3/3 V3] f2fs: introduce f2fs_cache_node_page() to add page into > node_inode cache > > You can see it from this url: > http://sourceforge.net/p/linux-f2fs/mailman/linux-f2fs-devel/?viewmonth=201312=5 > > And it seems not easy to change order of build_*_manager and make node_inode, > because there are dependency between them. > Sorry to make a mess of your patch thread. I've understood it. In your patch, using NAT journal seems to be possible. Anyway, thanks for your answer. > > > > On Tue, May 27, 2014 at 08:41:07AM +0800, Chao Yu wrote: > > > Previously we allocate pages with no mapping in ra_sum_pages(), so we may > > > encounter a crash in event trace of f2fs_submit_page_mbio where we access > > > mapping data of the page. > > > > > > We'd better allocate pages in bd_inode mapping and invalidate these pages > > > after > > > we restore data from pages. It could avoid crash in above scenario. > > > > > > Changes from V1 > > > o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim. > > > > > > Call Trace: > > > [] ?
ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] > > > [] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] > > > [] restore_node_summary+0x13a/0x280 [f2fs] > > > [] build_curseg+0x2bd/0x620 [f2fs] > > > [] build_segment_manager+0x1cb/0x920 [f2fs] > > > [] f2fs_fill_super+0x535/0x8e0 [f2fs] > > > [] mount_bdev+0x16a/0x1a0 > > > [] f2fs_mount+0x1f/0x30 [f2fs] > > > [] mount_fs+0x36/0x170 > > > [] vfs_kern_mount+0x55/0xe0 > > > [] do_mount+0x1e8/0x900 > > > [] SyS_mount+0x82/0xc0 > > > [] sysenter_do_call+0x12/0x22 > > > > > > Suggested-by: Jaegeuk Kim > > > Signed-off-by: Chao Yu > > > --- > > > fs/f2fs/node.c | 52 > > > > > > 1 file changed, 24 insertions(+), 28 deletions(-) > > > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > > index 3d60d3d..02a59e9 100644 > > > --- a/fs/f2fs/node.c > > > +++ b/fs/f2fs/node.c > > > @@ -1658,35 +1658,29 @@ int recover_inode_page(struct f2fs_sb_info *sbi, > > > struct page *page) > > > > > > /* > > > * ra_sum_pages() merge contiguous pages into one bio and submit. > > > - * these pre-readed pages are linked in pages list. > > > + * these pre-readed pages are alloced in bd_inode's mapping tree. 
> > > */ > -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head > > > *pages, > +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, > > > int start, int nrpages) > > > { > > > - struct page *page; > > > - int page_idx = start; > > > + struct inode *inode = sbi->sb->s_bdev->bd_inode; > > > + struct address_space *mapping = inode->i_mapping; > > > + int i, page_idx = start; > > > struct f2fs_io_info fio = { > > > .type = META, > > > .rw = READ_SYNC | REQ_META | REQ_PRIO > > > }; > > > > > > - for (; page_idx < start + nrpages; page_idx++) { > > > - /* alloc temporal page for read node summary info*/ > > > - page = alloc_page(GFP_F2FS_ZERO); > > > - if (!page) > > > + for (i = 0; page_idx < start + nrpages; page_idx++, i++) { > > > + /* alloc page in bd_inode for reading node summary info */ > > > + pages[i] = grab_cache_page(mapping, page_idx); > > > + if (!pages[i]) > > > break; > > > - > > > - lock_page(page); > > > - page->index = page_idx; > > > - list_add_tail(&page->lru, pages); > > >
Re: [f2fs-dev] [PATCH] f2fs: avoid overflow when large directory feathure is enabled
Hi Chao, Good catch. Please also modify Documentation/filesystems/f2fs.txt. On Tue, May 27, 2014 at 09:06:52AM +0800, Chao Yu wrote: > When large directory feathure is enable, We have one case which could cause > overflow in dir_buckets() as following: > special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2. > > Here we define MAX_DIR_BUCKETS to limit the return value when the condition > could trigger potential overflow. > > Signed-off-by: Chao Yu > --- > fs/f2fs/dir.c |4 ++-- > include/linux/f2fs_fs.h |3 +++ > 2 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > index c3f1485..966acb0 100644 > --- a/fs/f2fs/dir.c > +++ b/fs/f2fs/dir.c > @@ -23,10 +23,10 @@ static unsigned long dir_blocks(struct inode *inode) > > static unsigned int dir_buckets(unsigned int level, int dir_level) > { > - if (level < MAX_DIR_HASH_DEPTH / 2) > + if (level + dir_level < MAX_DIR_HASH_DEPTH / 2) > return 1 << (level + dir_level); > else > - return 1 << ((MAX_DIR_HASH_DEPTH / 2 + dir_level) - 1); > + return MAX_DIR_BUCKETS; > } > > static unsigned int bucket_blocks(unsigned int level) > diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h > index 8c03f71..ba6f312 100644 > --- a/include/linux/f2fs_fs.h > +++ b/include/linux/f2fs_fs.h > @@ -394,6 +394,9 @@ typedef __le32f2fs_hash_t; > /* MAX level for dir lookup */ > #define MAX_DIR_HASH_DEPTH 63 > > +/* MAX buckets in one level of dir */ > +#define MAX_DIR_BUCKETS (1 << ((MAX_DIR_HASH_DEPTH / 2) - 1)) > + > #define SIZE_OF_DIR_ENTRY11 /* by byte */ > #define SIZE_OF_DENTRY_BITMAP((NR_DENTRY_IN_BLOCK + BITS_PER_BYTE - > 1) / \ > BITS_PER_BYTE) > -- > 1.7.10.4 > > > > -- > The best possible search technologies are now affordable for all companies. > Download your FREE open source Enterprise Search Engine today! > Our experts will assist you in its installation for $59/mo, no commitment. > Test it for FREE on our Cloud platform anytime!
Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
Hi, Chao Could you think about following once. move node_inode in front of build_segment_manager, then use node_inode instead of bd_inode. On Tue, May 27, 2014 at 08:41:07AM +0800, Chao Yu wrote: > Previously we allocate pages with no mapping in ra_sum_pages(), so we may > encounter a crash in event trace of f2fs_submit_page_mbio where we access > mapping data of the page. > > We'd better allocate pages in bd_inode mapping and invalidate these pages > after > we restore data from pages. It could avoid crash in above scenario. > > Changes from V1 > o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim. > > Call Trace: > [] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] > [] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] > [] restore_node_summary+0x13a/0x280 [f2fs] > [] build_curseg+0x2bd/0x620 [f2fs] > [] build_segment_manager+0x1cb/0x920 [f2fs] > [] f2fs_fill_super+0x535/0x8e0 [f2fs] > [] mount_bdev+0x16a/0x1a0 > [] f2fs_mount+0x1f/0x30 [f2fs] > [] mount_fs+0x36/0x170 > [] vfs_kern_mount+0x55/0xe0 > [] do_mount+0x1e8/0x900 > [] SyS_mount+0x82/0xc0 > [] sysenter_do_call+0x12/0x22 > > Suggested-by: Jaegeuk Kim > Signed-off-by: Chao Yu > --- > fs/f2fs/node.c | 52 > 1 file changed, 24 insertions(+), 28 deletions(-) > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 3d60d3d..02a59e9 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1658,35 +1658,29 @@ int recover_inode_page(struct f2fs_sb_info *sbi, > struct page *page) > > /* > * ra_sum_pages() merge contiguous pages into one bio and submit. > - * these pre-readed pages are linked in pages list. > + * these pre-readed pages are alloced in bd_inode's mapping tree. 
> */ > -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, > +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, > int start, int nrpages) > { > - struct page *page; > - int page_idx = start; > + struct inode *inode = sbi->sb->s_bdev->bd_inode; > + struct address_space *mapping = inode->i_mapping; > + int i, page_idx = start; > struct f2fs_io_info fio = { > .type = META, > .rw = READ_SYNC | REQ_META | REQ_PRIO > }; > > - for (; page_idx < start + nrpages; page_idx++) { > - /* alloc temporal page for read node summary info*/ > - page = alloc_page(GFP_F2FS_ZERO); > - if (!page) > + for (i = 0; page_idx < start + nrpages; page_idx++, i++) { > + /* alloc page in bd_inode for reading node summary info */ > + pages[i] = grab_cache_page(mapping, page_idx); > + if (!pages[i]) > break; > - > - lock_page(page); > - page->index = page_idx; > - list_add_tail(&page->lru, pages); > + f2fs_submit_page_mbio(sbi, pages[i], page_idx, &fio); > } > > - list_for_each_entry(page, pages, lru) > - f2fs_submit_page_mbio(sbi, page, page->index, &fio); > - > f2fs_submit_merged_bio(sbi, META, READ); > - > - return page_idx - start; > + return i; > } > > int restore_node_summary(struct f2fs_sb_info *sbi, > @@ -1694,11 +1688,11 @@ int restore_node_summary(struct f2fs_sb_info *sbi, > { > struct f2fs_node *rn; > struct f2fs_summary *sum_entry; > - struct page *page, *tmp; > + struct inode *inode = sbi->sb->s_bdev->bd_inode; > block_t addr; > int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi)); > - int i, last_offset, nrpages, err = 0; > - LIST_HEAD(page_list); > + struct page *pages[bio_blocks]; > + int i, idx, last_offset, nrpages, err = 0; > > /* scan the node segment */ > last_offset = sbi->blocks_per_seg; > @@ -1709,29 +1703,31 @@ int restore_node_summary(struct f2fs_sb_info *sbi, > nrpages = min(last_offset - i, bio_blocks); > > /* read ahead node pages */ > - nrpages = ra_sum_pages(sbi, &page_list, addr, nrpages); > + nrpages = ra_sum_pages(sbi, pages, addr, nrpages); > if (!nrpages) > return -ENOMEM; > > - list_for_each_entry_safe(page, tmp, &page_list, lru) { > + for (idx = 0; idx < nrpages; idx++) { > if (err) > goto skip; > > - lock_page(page); > - if (unlikely(!PageUptodate(page))) { > + lock_page(pages[idx]); > + if (unlikely(!PageUptodate(pages[idx]))) { > err = -EIO; > } else { > - rn = F2FS_NODE(page); > + rn = F2FS_NODE(pages[idx]); > sum_entry->nid = rn->footer.nid; > sum_entry->version = 0; >
Re: [f2fs-dev] [PATCH] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
On Mon, May 26, 2014 at 02:26:24PM +0800, Chao Yu wrote: > Hi Changman, > > > -----Original Message----- > > From: Changman Lee [mailto:cm224@samsung.com] > > Sent: Friday, May 23, 2014 1:14 PM > > To: Jaegeuk Kim > > Cc: Chao Yu; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > > linux-f2fs-de...@lists.sourceforge.net > > Subject: Re: [f2fs-dev] [PATCH] f2fs: avoid crash when trace > > f2fs_submit_page_mbio event in > > ra_sum_pages > > > > On Wed, May 21, 2014 at 12:36:46PM +0900, Jaegeuk Kim wrote: > > > Hi Chao, > > > > > > 2014-05-16 (금), 17:14 +0800, Chao Yu: > > > > Previously we allocate pages with no mapping in ra_sum_pages(), so we > > > > may encounter a crash in event trace of f2fs_submit_page_mbio where we > > > > access mapping data of the page. > > > > > > > > We'd better allocate pages in bd_inode mapping and invalidate these > > > > pages after we restore data from pages. It could avoid crash in above scenario. > > > > > > > > Call Trace: > > > > [f1031630] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] > > > > [f10377bb] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] > > > > [f103c5da] restore_node_summary+0x13a/0x280 [f2fs] > > > > [f103e22d] build_curseg+0x2bd/0x620 [f2fs] > > > > [f104043b] build_segment_manager+0x1cb/0x920 [f2fs] > > > > [f1032c85] f2fs_fill_super+0x535/0x8e0 [f2fs] > > > > [c115b66a] mount_bdev+0x16a/0x1a0 > > > > [f102f63f] f2fs_mount+0x1f/0x30 [f2fs] > > > > [c115c096] mount_fs+0x36/0x170 > > > > [c1173635] vfs_kern_mount+0x55/0xe0 > > > > [c1175388] do_mount+0x1e8/0x900 > > > > [c1175d72] SyS_mount+0x82/0xc0 > > > > [c16059cc] sysenter_do_call+0x12/0x22 > > > > > > > > Signed-off-by: Chao Yu > > > > --- > > > > fs/f2fs/node.c | 49 - > > > > 1 file changed, 28 insertions(+), 21 deletions(-) > > > > > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > > > index 3d60d3d..b5cd814 100644 > > > > --- a/fs/f2fs/node.c > > > > +++ b/fs/f2fs/node.c > > > > @@ -1658,13 +1658,16 @@ int recover_inode_page(struct f2fs_sb_info > > > > *sbi, struct page *page) > > > > > > > > /* > > > > * ra_sum_pages() merge contiguous pages into one bio and submit. > > > > - * these pre-readed pages are linked in pages list. > > > > + * these pre-readed pages are alloced in bd_inode's mapping tree. > > > > */ > > > > -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head > > > > *pages, > > > > +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, > > > > int start, int nrpages) > > > > { > > > > struct page *page; > > > > + struct inode *inode = sbi->sb->s_bdev->bd_inode; > > How about using sbi->meta_inode instead of bd_inode? Then we can do > > caching of summary pages for further I/O. > > In my understanding, in ra_sum_pages() we readahead node pages in the NODE > segment, > then we pad the current summary cache with the nid of each node page's footer. > So we should not cache these readaheaded pages in meta_inode's mapping. > Do I miss something? > > Regards > Sorry, you're right. Forget about caching. I've confused ra_sum_pages with summary segments. > > > > + struct address_space *mapping = inode->i_mapping; > > > > int page_idx = start; > > > > + int alloced, readed; > > > > struct f2fs_io_info fio = { > > > > .type = META, > > > > .rw = READ_SYNC | REQ_META | REQ_PRIO > > > > @@ -1672,21 +1675,23 @@ static int ra_sum_pages(struct f2fs_sb_info > > > > *sbi, struct list_head *pages, > > > > > > > > for (; page_idx < start + nrpages; page_idx++) { > > > > /* alloc temporal page for read node summary info*/ > > > > - page = alloc_page(GFP_F2FS_ZERO); > > > > + page = grab_cache_page(mapping, page_idx); > > > > if (!page) > > > > break; > > > > - > > > > - lock_page(page); > > > > - page->index = page_idx; > > > > -
Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
Hi Chao, could you think about the following once: move node_inode in front of build_segment_manager, then use node_inode instead of bd_inode. On Tue, May 27, 2014 at 08:41:07AM +0800, Chao Yu wrote: Previously we allocate pages with no mapping in ra_sum_pages(), so we may encounter a crash in event trace of f2fs_submit_page_mbio where we access mapping data of the page. We'd better allocate pages in bd_inode mapping and invalidate these pages after we restore data from pages. It could avoid crash in above scenario. Changes from V1 o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim. Call Trace: [f1031630] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] [f10377bb] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] [f103c5da] restore_node_summary+0x13a/0x280 [f2fs] [f103e22d] build_curseg+0x2bd/0x620 [f2fs] [f104043b] build_segment_manager+0x1cb/0x920 [f2fs] [f1032c85] f2fs_fill_super+0x535/0x8e0 [f2fs] [c115b66a] mount_bdev+0x16a/0x1a0 [f102f63f] f2fs_mount+0x1f/0x30 [f2fs] [c115c096] mount_fs+0x36/0x170 [c1173635] vfs_kern_mount+0x55/0xe0 [c1175388] do_mount+0x1e8/0x900 [c1175d72] SyS_mount+0x82/0xc0 [c16059cc] sysenter_do_call+0x12/0x22 Suggested-by: Jaegeuk Kim jaegeuk@samsung.com Signed-off-by: Chao Yu chao2...@samsung.com --- fs/f2fs/node.c | 52 1 file changed, 24 insertions(+), 28 deletions(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3d60d3d..02a59e9 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1658,35 +1658,29 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) /* * ra_sum_pages() merge contiguous pages into one bio and submit. - * these pre-readed pages are linked in pages list. + * these pre-readed pages are alloced in bd_inode's mapping tree. */ -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, int start, int nrpages) { - struct page *page; - int page_idx = start; + struct inode *inode = sbi->sb->s_bdev->bd_inode; + struct address_space *mapping = inode->i_mapping; + int i, page_idx = start; struct f2fs_io_info fio = { .type = META, .rw = READ_SYNC | REQ_META | REQ_PRIO }; - for (; page_idx < start + nrpages; page_idx++) { - /* alloc temporal page for read node summary info*/ - page = alloc_page(GFP_F2FS_ZERO); - if (!page) + for (i = 0; page_idx < start + nrpages; page_idx++, i++) { + /* alloc page in bd_inode for reading node summary info */ + pages[i] = grab_cache_page(mapping, page_idx); + if (!pages[i]) break; - - lock_page(page); - page->index = page_idx; - list_add_tail(&page->lru, pages); + f2fs_submit_page_mbio(sbi, pages[i], page_idx, &fio); } - list_for_each_entry(page, pages, lru) - f2fs_submit_page_mbio(sbi, page, page->index, &fio); - f2fs_submit_merged_bio(sbi, META, READ); - - return page_idx - start; + return i; } int restore_node_summary(struct f2fs_sb_info *sbi, @@ -1694,11 +1688,11 @@ int restore_node_summary(struct f2fs_sb_info *sbi, { struct f2fs_node *rn; struct f2fs_summary *sum_entry; - struct page *page, *tmp; + struct inode *inode = sbi->sb->s_bdev->bd_inode; block_t addr; int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi)); - int i, last_offset, nrpages, err = 0; - LIST_HEAD(page_list); + struct page *pages[bio_blocks]; + int i, idx, last_offset, nrpages, err = 0; /* scan the node segment */ last_offset = sbi->blocks_per_seg; @@ -1709,29 +1703,31 @@ int restore_node_summary(struct f2fs_sb_info *sbi, nrpages = min(last_offset - i, bio_blocks); /* read ahead node pages */ - nrpages = ra_sum_pages(sbi, &page_list, addr, nrpages); + nrpages = ra_sum_pages(sbi, pages, addr, nrpages); if (!nrpages) return -ENOMEM; - list_for_each_entry_safe(page, tmp, &page_list, lru) { + for (idx = 0; idx < nrpages; idx++) { if (err) goto skip; - lock_page(page); - if (unlikely(!PageUptodate(page))) { + lock_page(pages[idx]); + if (unlikely(!PageUptodate(pages[idx]))) { err = -EIO; } else { - rn = F2FS_NODE(page); + rn = F2FS_NODE(pages[idx]); sum_entry->nid = rn->footer.nid; sum_entry->version = 0;
Re: [f2fs-dev] [PATCH] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
On Wed, May 21, 2014 at 12:36:46PM +0900, Jaegeuk Kim wrote: > Hi Chao, > > 2014-05-16 (금), 17:14 +0800, Chao Yu: > > Previously we allocate pages with no mapping in ra_sum_pages(), so we may > > encounter a crash in event trace of f2fs_submit_page_mbio where we access > > mapping data of the page. > > > > We'd better allocate pages in bd_inode mapping and invalidate these pages > > after > > we restore data from pages. It could avoid crash in above scenario. > > > > Call Trace: > > [f1031630] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] > > [f10377bb] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] > > [f103c5da] restore_node_summary+0x13a/0x280 [f2fs] > > [f103e22d] build_curseg+0x2bd/0x620 [f2fs] > > [f104043b] build_segment_manager+0x1cb/0x920 [f2fs] > > [f1032c85] f2fs_fill_super+0x535/0x8e0 [f2fs] > > [c115b66a] mount_bdev+0x16a/0x1a0 > > [f102f63f] f2fs_mount+0x1f/0x30 [f2fs] > > [c115c096] mount_fs+0x36/0x170 > > [c1173635] vfs_kern_mount+0x55/0xe0 > > [c1175388] do_mount+0x1e8/0x900 > > [c1175d72] SyS_mount+0x82/0xc0 > > [c16059cc] sysenter_do_call+0x12/0x22 > > > > Signed-off-by: Chao Yu > > --- > > fs/f2fs/node.c | 49 - > > 1 file changed, 28 insertions(+), 21 deletions(-) > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > index 3d60d3d..b5cd814 100644 > > --- a/fs/f2fs/node.c > > +++ b/fs/f2fs/node.c > > @@ -1658,13 +1658,16 @@ int recover_inode_page(struct f2fs_sb_info *sbi, > > struct page *page) > > > > /* > > * ra_sum_pages() merge contiguous pages into one bio and submit. > > - * these pre-readed pages are linked in pages list. > > + * these pre-readed pages are alloced in bd_inode's mapping tree. > > */ > > -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, > > +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, > > int start, int nrpages) > > { > > struct page *page; > > + struct inode *inode = sbi->sb->s_bdev->bd_inode; How about using sbi->meta_inode instead of bd_inode? Then we can do caching of summary pages for further I/O. > > + struct address_space *mapping = inode->i_mapping; > > int page_idx = start; > > + int alloced, readed; > > struct f2fs_io_info fio = { > > .type = META, > > .rw = READ_SYNC | REQ_META | REQ_PRIO > > @@ -1672,21 +1675,23 @@ static int ra_sum_pages(struct f2fs_sb_info *sbi, > > struct list_head *pages, > > > > for (; page_idx < start + nrpages; page_idx++) { > > /* alloc temporal page for read node summary info*/ > > - page = alloc_page(GFP_F2FS_ZERO); > > + page = grab_cache_page(mapping, page_idx); > > if (!page) > > break; > > - > > - lock_page(page); > > - page->index = page_idx; > > - list_add_tail(&page->lru, pages); > > + page_cache_release(page); > > IMO, we don't need to do like this. > Instead, > for() { > page = grab_cache_page(); > if (!page) > break; > pages[page_idx] = page; > f2fs_submit_page_mbio(sbi, page, &fio); > } > f2fs_submit_merged_bio(sbi, META, READ); > return page_idx - start; > > Afterwards, in restore_node_summary(), > lock_page() will wait the end_io for read. > ... > f2fs_put_page(pages[index], 1); > > Thanks, > > > } > > > > - list_for_each_entry(page, pages, lru) > > - f2fs_submit_page_mbio(sbi, page, page->index, &fio); > > + alloced = page_idx - start; > > + readed = find_get_pages_contig(mapping, start, alloced, pages); > > + BUG_ON(alloced != readed); > > + > > + for (page_idx = 0; page_idx < readed; page_idx++) > > + f2fs_submit_page_mbio(sbi, pages[page_idx], > > + pages[page_idx]->index, &fio); > > > > f2fs_submit_merged_bio(sbi, META, READ); > > > > - return page_idx - start; > > + return readed; > > } > > > > int restore_node_summary(struct f2fs_sb_info *sbi, > > @@ -1694,11 +1699,11 @@ int restore_node_summary(struct f2fs_sb_info *sbi, > > { > > struct f2fs_node *rn; > > struct f2fs_summary *sum_entry; > > - struct page *page, *tmp; > > + struct inode *inode = sbi->sb->s_bdev->bd_inode; > > block_t addr; > > int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi)); > > - int i, last_offset, nrpages, err = 0; > > - LIST_HEAD(page_list); > > + struct page *pages[bio_blocks]; > > + int i, index, last_offset, nrpages, err = 0; > > > > /* scan the node segment */ > > last_offset = sbi->blocks_per_seg; > > @@ -1709,29 +1714,31 @@ int restore_node_summary(struct f2fs_sb_info *sbi, > > nrpages = min(last_offset - i, bio_blocks); > > > > /* read ahead node pages */ > > - nrpages = ra_sum_pages(sbi, &page_list, addr, nrpages); > > + nrpages = ra_sum_pages(sbi, pages, addr, nrpages); > > if (!nrpages) > > return -ENOMEM; > > >
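The restore_node_summary() loop discussed above walks the node segment in bio-sized readahead batches: each pass requests min(last_offset - i, bio_blocks) pages. As a rough userspace model of that batching arithmetic (the helper name below is illustrative, not f2fs code):

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Models restore_node_summary()'s outer loop: the segment holds
 * blocks_per_seg node blocks and each ra_sum_pages() call reads at
 * most bio_blocks of them.  Returns how many batches are issued. */
int count_ra_batches(int blocks_per_seg, int bio_blocks)
{
    int i, nrpages, batches = 0;
    for (i = 0; i < blocks_per_seg; i += nrpages) {
        nrpages = min_int(blocks_per_seg - i, bio_blocks);
        batches++;
    }
    return batches;
}
```

With blocks_per_seg = 512 and bio_blocks = 64, for example, the scan issues eight readahead batches; a partial tail batch is counted once.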
Re: [f2fs-dev] [PATCH 5/5] f2fs: add a wait queue to avoid unnecessary build_free_nid
On 금, 2014-03-07 at 18:43 +0800, Gu Zheng wrote: > Previously, when we try to alloc a free nid while the build of free nids > is going on, the allocator will run into the flow that waits for > "nm_i->build_lock", see the following: > /* We should not use stale free nids created by build_free_nids */ > > if (nm_i->fcnt && !on_build_free_nids(nm_i)) { > f2fs_bug_on(list_empty(&nm_i->free_nid_list)); > list_for_each(this, &nm_i->free_nid_list) { > i = list_entry(this, struct free_nid, list); > if (i->state == NID_NEW) > break; > } > > f2fs_bug_on(i->state != NID_NEW); > *nid = i->nid; > i->state = NID_ALLOC; > nm_i->fcnt--; > spin_unlock(&nm_i->free_nid_list_lock); > return true; > } > spin_unlock(&nm_i->free_nid_list_lock); > > /* Let's scan nat pages and its caches to get free nids */ > > mutex_lock(&nm_i->build_lock); > build_free_nids(sbi); > mutex_unlock(&nm_i->build_lock); > and this will cause another unnecessary build of free nids if the current > build job is already done. > So here we introduce a wait_queue to avoid this issue. > > Signed-off-by: Gu Zheng > --- > fs/f2fs/f2fs.h |1 + > fs/f2fs/node.c | 10 +- > 2 files changed, 10 insertions(+), 1 deletions(-) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index f845e92..7ae193e 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -256,6 +256,7 @@ struct f2fs_nm_info { > spinlock_t free_nid_list_lock; /* protect free nid list */ > unsigned int fcnt; /* the number of free node id */ > struct mutex build_lock;/* lock for build free nids */ > + wait_queue_head_t build_wq; /* wait queue for build free nids */ > > /* for checkpoint */ > char *nat_bitmap; /* NAT bitmap pointer */ > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 4b7861d..ab44711 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1422,7 +1422,13 @@ retry: > spin_lock(&nm_i->free_nid_list_lock); > > /* We should not use stale free nids created by build_free_nids */ > - if (nm_i->fcnt && !on_build_free_nids(nm_i)) { > + if (on_build_free_nids(nm_i)) { > + spin_unlock(&nm_i->free_nid_list_lock); > + wait_event(nm_i->build_wq, !on_build_free_nids(nm_i)); > + goto retry; > + } > + It would be better to move spin_lock(free_nid_list_lock) here after removing the spin_unlock() above. > + if (nm_i->fcnt) { > f2fs_bug_on(list_empty(&nm_i->free_nid_list)); > list_for_each(this, &nm_i->free_nid_list) { > i = list_entry(this, struct free_nid, list); > @@ -1443,6 +1449,7 @@ retry: > mutex_lock(&nm_i->build_lock); > build_free_nids(sbi); > mutex_unlock(&nm_i->build_lock); > + wake_up_all(&nm_i->build_wq); > goto retry; > } > > @@ -1813,6 +1820,7 @@ static int init_node_manager(struct f2fs_sb_info *sbi) > INIT_LIST_HEAD(&nm_i->dirty_nat_entries); > > mutex_init(&nm_i->build_lock); > + init_waitqueue_head(&nm_i->build_wq); > spin_lock_init(&nm_i->free_nid_list_lock); > rwlock_init(&nm_i->nat_tree_lock);
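The behavior the patch aims for — sleep on a wait queue while another task is inside build_free_nids(), then retry, instead of piling up on build_lock and rebuilding redundantly — can be sketched in userspace with a pthread condition variable. Everything below (names, counts, the usleep standing in for the NAT scan) is an illustrative model, not the kernel API:

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

#define NWAITERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t build_done = PTHREAD_COND_INITIALIZER;
static int fcnt;     /* free nids available (models nm_i->fcnt)        */
static int building; /* a builder is running (models on_build_free_nids) */
static int builds;   /* how many times the expensive build actually ran */

/* One allocator: take a free nid if available; if a build is already in
 * flight, sleep on the condition variable and retry after the broadcast;
 * otherwise become the builder and wake everyone when done. */
static void *alloc_nid(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        if (fcnt > 0) {                 /* fast path: consume a nid */
            fcnt--;
            break;
        }
        if (building) {                 /* build in flight: wait, retry */
            pthread_cond_wait(&build_done, &lock);
            continue;
        }
        building = 1;                   /* we become the builder */
        builds++;
        pthread_mutex_unlock(&lock);
        usleep(10000);                  /* models scanning NAT pages */
        pthread_mutex_lock(&lock);
        fcnt = NWAITERS;                /* the build produced free nids */
        building = 0;
        pthread_cond_broadcast(&build_done);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs NWAITERS allocators against an empty free-nid list and returns how
 * many builds were triggered; with the wait-queue scheme it is exactly one. */
int run_build_demo(void)
{
    pthread_t t[NWAITERS];
    fcnt = 0; building = 0; builds = 0;
    for (int i = 0; i < NWAITERS; i++)
        pthread_create(&t[i], NULL, alloc_nid, NULL);
    for (int i = 0; i < NWAITERS; i++)
        pthread_join(t[i], NULL);
    return builds;
}
```

Because the builder flag and the free count are checked under one mutex, only the first thread runs the build; the others either consume a free nid or sleep until the broadcast, which is exactly the redundant-rebuild case the patch removes.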
Re: [f2fs-dev] [PATCH 2/4] f2fs: handle dirty segments inside refresh_sit_entry
Hi, I found some redundant code in your patch. I think that locate_dirty_segment(sbi, old_cursegno) is equal to locate_dirty_segment(sbi, GET_SEGNO(sbi, new)) in refresh_sit_entry, because *new_blkaddr is a block belonging to old_cursegno. What do you think? On 화, 2014-01-28 at 14:54 +0900, Jaegeuk Kim wrote: > This patch cleans up the refresh_sit_entry to handle locate_dirty_segments. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/f2fs.h| 1 + > fs/f2fs/segment.c | 19 --- > 2 files changed, 9 insertions(+), 11 deletions(-) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 42903c3..6e9515d 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -1132,6 +1132,7 @@ void destroy_node_manager_caches(void); > void f2fs_balance_fs(struct f2fs_sb_info *); > void f2fs_balance_fs_bg(struct f2fs_sb_info *); > void invalidate_blocks(struct f2fs_sb_info *, block_t); > +void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t); > void clear_prefree_segments(struct f2fs_sb_info *); > int npages_for_summary_flush(struct f2fs_sb_info *); > void allocate_new_segments(struct f2fs_sb_info *); > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 7caac5f..89aa503 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -434,12 +434,14 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, > block_t blkaddr, int del) > get_sec_entry(sbi, segno)->valid_blocks += del; > } > > -static void refresh_sit_entry(struct f2fs_sb_info *sbi, > - block_t old_blkaddr, block_t new_blkaddr) > +void refresh_sit_entry(struct f2fs_sb_info *sbi, block_t old, block_t new) > { > - update_sit_entry(sbi, new_blkaddr, 1); > - if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO) > - update_sit_entry(sbi, old_blkaddr, -1); > + update_sit_entry(sbi, new, 1); > + if (GET_SEGNO(sbi, old) != NULL_SEGNO) > + update_sit_entry(sbi, old, -1); > + > + locate_dirty_segment(sbi, GET_SEGNO(sbi, old)); > + locate_dirty_segment(sbi, GET_SEGNO(sbi, new)); > } > > void invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr) > @@ -886,12 +888,11 @@ void allocate_data_block(struct f2fs_sb_info *sbi, > struct page *page, >* since SSR needs latest valid block information. >*/ > refresh_sit_entry(sbi, old_blkaddr, *new_blkaddr); > + locate_dirty_segment(sbi, old_cursegno); > > if (!__has_curseg_space(sbi, type)) > sit_i->s_ops->allocate_segment(sbi, type, false); > > - locate_dirty_segment(sbi, old_cursegno); > - locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr)); > mutex_unlock(&sit_i->sentry_lock); > > if (page && IS_NODESEG(type)) > @@ -992,9 +993,7 @@ void recover_data_page(struct f2fs_sb_info *sbi, > __add_sum_entry(sbi, type, sum); > > refresh_sit_entry(sbi, old_blkaddr, new_blkaddr); > - > locate_dirty_segment(sbi, old_cursegno); > - locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr)); > > mutex_unlock(&sit_i->sentry_lock); > mutex_unlock(&curseg->curseg_mutex); > @@ -1045,9 +1044,7 @@ void rewrite_node_page(struct f2fs_sb_info *sbi, > f2fs_submit_page_mbio(sbi, page, new_blkaddr, &fio); > f2fs_submit_merged_bio(sbi, NODE, WRITE); > refresh_sit_entry(sbi, old_blkaddr, new_blkaddr); > - > locate_dirty_segment(sbi, old_cursegno); > - locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr)); > > mutex_unlock(&sit_i->sentry_lock); > mutex_unlock(&curseg->curseg_mutex);
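The effect of this cleanup — refresh_sit_entry() adjusts the valid-block counts for the old and new blocks' segments and then marks both segments dirty itself, so callers drop their duplicated locate_dirty_segment() calls — can be modelled with a toy segment table. This is a deliberate simplification (plain ints for segment numbers, -1 standing in for NULL_SEGNO), not the f2fs structures:

```c
#include <assert.h>

#define NSEGS 8
static int valid_blocks[NSEGS]; /* per-segment valid block count */
static int dirty[NSEGS];        /* per-segment dirty flag        */

static void locate_dirty_segment(int segno)
{
    if (segno >= 0)             /* -1 models NULL_SEGNO: nothing to mark */
        dirty[segno] = 1;
}

/* Toy refresh_sit_entry(): bump the new block's segment, drop the old
 * block's segment (if any), then mark both segments dirty in one place. */
void refresh_sit_entry(int old_segno, int new_segno)
{
    valid_blocks[new_segno] += 1;      /* update_sit_entry(new, 1)  */
    if (old_segno >= 0)
        valid_blocks[old_segno] -= 1;  /* update_sit_entry(old, -1) */
    locate_dirty_segment(old_segno);
    locate_dirty_segment(new_segno);
}
```

A caller such as allocate_data_block() then only has to handle its own old_cursegno; the per-block dirty marking lives in one function instead of being repeated at every call site.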
RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance
As you know, if data or a function is used only once, we can use keywords like __initdata for data and __init for functions. -Original Message- From: Chao Yu [mailto:chao2...@samsung.com] Sent: Tuesday, October 29, 2013 7:52 PM To: 'Changman Lee'; jaegeuk@samsung.com Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net Subject: RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance Hi Lee, > -Original Message- > From: Changman Lee [mailto:cm224@samsung.com] > Sent: Tuesday, October 29, 2013 3:36 PM > To: 'Chao Yu'; jaegeuk@samsung.com > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > linux-f2fs-de...@lists.sourceforge.net > Subject: RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or > zeros bitmap > with bitops for better mount performance > > Review attached patch, please. Could we hide the pre-calculated value by generating it in memory allocated by a function, because the value will be of no use after build_sit_entries()? Regards Yu > > -Original Message- > From: Chao Yu [mailto:chao2...@samsung.com] > Sent: Tuesday, October 29, 2013 3:51 PM > To: jaegeuk@samsung.com > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > linux-f2fs-de...@lists.sourceforge.net > Subject: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros > bitmap with > bitops for better mount performance > > Previously, check_block_count check valid_map with bit data type in > common scenario that sit has all ones or zeros bitmap, it makes low > mount performance. > So let's check the special bitmap with integer data type instead of > the bit one. > > v1-->v2: > use find_next_{zero_}bit_le for better performance and readable as > Jaegeuk suggested. > use neat logogram in comment as Gu Zheng suggested. > search continuous ones or zeros for better performance when checking > mixed bitmap. 
> > Suggested-by: Jaegeuk Kim > Signed-off-by: Shu Tan > Signed-off-by: Chao Yu > --- > fs/f2fs/segment.h | 19 +++ > 1 file changed, 15 insertions(+), 4 deletions(-) > > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index > abe7094..a7abfa8 > 100644 > --- a/fs/f2fs/segment.h > +++ b/fs/f2fs/segment.h > @@ -550,8 +550,9 @@ static inline void check_block_count(struct > f2fs_sb_info *sbi, { > struct f2fs_sm_info *sm_info = SM_I(sbi); > unsigned int end_segno = sm_info->segment_count - 1; > + bool is_valid = test_bit_le(0, raw_sit->valid_map) ? true : false; > int valid_blocks = 0; > - int i; > + int cur_pos = 0, next_pos; > > /* check segment usage */ > BUG_ON(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg); @@ -560,9 > +561,19 @@ static inline void check_block_count(struct f2fs_sb_info > +*sbi, > BUG_ON(segno > end_segno); > > /* check bitmap with valid block count */ > - for (i = 0; i < sbi->blocks_per_seg; i++) > - if (f2fs_test_bit(i, raw_sit->valid_map)) > - valid_blocks++; > + do { > + if (is_valid) { > + next_pos = > find_next_zero_bit_le(&raw_sit->valid_map, > + sbi->blocks_per_seg, > + cur_pos); > + valid_blocks += next_pos - cur_pos; > + } else > + next_pos = find_next_bit_le(&raw_sit->valid_map, > + sbi->blocks_per_seg, > + cur_pos); > + cur_pos = next_pos; > + is_valid = !is_valid; > + } while (cur_pos < sbi->blocks_per_seg); > BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks); } > > -- > 1.7.9.5 > > > > -- > Android is increasing in popularity, but the open development platform that > developers love is also attractive to malware creators. Download this white > paper to learn more about secure code signing practices that can help > keep Android apps secure. 
> http://pubads.g.doubleclick.net/gampad/clk?id=65839951=/4140/ostg.c > lktr > k > ___ > Linux-f2fs-devel mailing list > linux-f2fs-de...@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance
Firstly, thanks. You're right. And I don't know whether it would be optimized, but consider the pipeline: for (i = 0; i < SIT_VBLOCK_MAP_SIZE; i += 4) { valid_blocks += bit_count_byte(raw_sit->valid_map[i]); valid_blocks += bit_count_byte(raw_sit->valid_map[i+1]); valid_blocks += bit_count_byte(raw_sit->valid_map[i+2]); valid_blocks += bit_count_byte(raw_sit->valid_map[i+3]); } Secondly, I also think your patch is good in many cases where the filesystem has NOT been aging for a long time. -Original Message- From: Chao Yu [mailto:chao2...@samsung.com] Sent: Tuesday, October 29, 2013 7:07 PM To: 'Changman Lee'; jaegeuk@samsung.com Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net Subject: RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance Hi Lee, It's a good point. Firstly, in your patch: /* check bitmap with valid block count */ for (i = 0; i < sbi->blocks_per_seg; i++) - if (f2fs_test_bit(i, raw_sit->valid_map)) - valid_blocks++; + valid_blocks += bit_count_byte(raw_sit->valid_map[i]); + BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks); } for (i = 0; i < sbi->blocks_per_seg; i++) should be replaced with for (i = 0; i < SIT_VBLOCK_MAP_SIZE; i++) Secondly, I tested your patch and mine with SD and eMMC with an all-zeros bitmap. It shows my patch takes less time. Could you test and compare the performance of the two patches? -- 1.7.10.4 > -Original Message- > From: Changman Lee [mailto:cm224@samsung.com] > Sent: Tuesday, October 29, 2013 3:36 PM > To: 'Chao Yu'; jaegeuk@samsung.com > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > linux-f2fs-de...@lists.sourceforge.net > Subject: RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap > with bitops for better mount performance > > Review attached patch, please. 
> > -Original Message- > From: Chao Yu [mailto:chao2...@samsung.com] > Sent: Tuesday, October 29, 2013 3:51 PM > To: jaegeuk@samsung.com > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > linux-f2fs-de...@lists.sourceforge.net > Subject: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with > bitops for better mount performance > > Previously, check_block_count check valid_map with bit data type in common > scenario that sit has all ones or zeros bitmap, it makes low mount > performance. > So let's check the special bitmap with integer data type instead of the bit one. > > v1-->v2: > use find_next_{zero_}bit_le for better performance and readable as > Jaegeuk suggested. > use neat logogram in comment as Gu Zheng suggested. > search continuous ones or zeros for better performance when checking > mixed bitmap. > > Suggested-by: Jaegeuk Kim > Signed-off-by: Shu Tan > Signed-off-by: Chao Yu > --- > fs/f2fs/segment.h | 19 +++ > 1 file changed, 15 insertions(+), 4 deletions(-) > > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index abe7094..a7abfa8 > 100644 > --- a/fs/f2fs/segment.h > +++ b/fs/f2fs/segment.h > @@ -550,8 +550,9 @@ static inline void check_block_count(struct > f2fs_sb_info *sbi, { > struct f2fs_sm_info *sm_info = SM_I(sbi); > unsigned int end_segno = sm_info->segment_count - 1; > + bool is_valid = test_bit_le(0, raw_sit->valid_map) ? 
true : false; > int valid_blocks = 0; > - int i; > + int cur_pos = 0, next_pos; > > /* check segment usage */ > BUG_ON(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg); @@ -560,9 > +561,19 @@ static inline void check_block_count(struct f2fs_sb_info > +*sbi, > BUG_ON(segno > end_segno); > > /* check bitmap with valid block count */ > - for (i = 0; i < sbi->blocks_per_seg; i++) > - if (f2fs_test_bit(i, raw_sit->valid_map)) > - valid_blocks++; > + do { > + if (is_valid) { > + next_pos = > find_next_zero_bit_le(&raw_sit->valid_map, > + sbi->blocks_per_seg, > + cur_pos); > + valid_blocks += next_pos - cur_pos; > + } else > + next_pos = find_next_bit_le(&raw_sit->valid_map, > + sbi->blocks_per_seg, > + cur_pos); > + cur_pos = next_pos; > + is_valid = !is_valid; > + } while (cur_pos < sbi->blocks_per_seg); > BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks); } > > -- > 1.7.9.5 > >
RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance
Review attached patch, please. -Original Message- From: Chao Yu [mailto:chao2...@samsung.com] Sent: Tuesday, October 29, 2013 3:51 PM To: jaegeuk@samsung.com Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net Subject: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance Previously, check_block_count check valid_map with bit data type in common scenario that sit has all ones or zeros bitmap, it makes low mount performance. So let's check the special bitmap with integer data type instead of the bit one. v1-->v2: use find_next_{zero_}bit_le for better performance and readable as Jaegeuk suggested. use neat logogram in comment as Gu Zheng suggested. search continuous ones or zeros for better performance when checking mixed bitmap. Suggested-by: Jaegeuk Kim Signed-off-by: Shu Tan Signed-off-by: Chao Yu --- fs/f2fs/segment.h | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index abe7094..a7abfa8 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -550,8 +550,9 @@ static inline void check_block_count(struct f2fs_sb_info *sbi, { struct f2fs_sm_info *sm_info = SM_I(sbi); unsigned int end_segno = sm_info->segment_count - 1; + bool is_valid = test_bit_le(0, raw_sit->valid_map) ? 
true : false; int valid_blocks = 0; - int i; + int cur_pos = 0, next_pos; /* check segment usage */ BUG_ON(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg); @@ -560,9 +561,19 @@ static inline void check_block_count(struct f2fs_sb_info *sbi, BUG_ON(segno > end_segno); /* check bitmap with valid block count */ - for (i = 0; i < sbi->blocks_per_seg; i++) - if (f2fs_test_bit(i, raw_sit->valid_map)) - valid_blocks++; + do { + if (is_valid) { + next_pos = find_next_zero_bit_le(&raw_sit->valid_map, + sbi->blocks_per_seg, + cur_pos); + valid_blocks += next_pos - cur_pos; + } else + next_pos = find_next_bit_le(&raw_sit->valid_map, + sbi->blocks_per_seg, + cur_pos); + cur_pos = next_pos; + is_valid = !is_valid; + } while (cur_pos < sbi->blocks_per_seg); BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks); } -- 1.7.9.5 -- Android is increasing in popularity, but the open development platform that developers love is also attractive to malware creators. Download this white paper to learn more about secure code signing practices that can help keep Android apps secure. http://pubads.g.doubleclick.net/gampad/clk?id=65839951=/4140/ostg.clktrk ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel 0001-f2fs-use-pre-calculated-value-to-get-sum-of-valid-bl.patch Description: Binary data
Re: [f2fs-dev] [PATCH 1/4] f2fs: reorganize the f2fs_setattr() function.
On 금, 2013-06-14 at 13:20 +0900, Namjae Jeon wrote: > 2013/6/11, Namjae Jeon : > > 2013/6/11, Changman Lee : > >> On 화, 2013-06-11 at 07:57 +0900, Namjae Jeon wrote: > >>> 2013/6/10, Changman Lee : > >>> > Hello, Namjae > >>> Hi. Changman. > >>> > > >>> > If using ACL, whenever i_mode is changed we should update acl_mode > >>> > which > >>> > is written to xattr block, too. And vice versa. > >>> > Because update_inode() is called at any reason and anytime, so we > >>> > should > >>> > sync both the moment xattr is written. > >>> > We don't hope that only i_mode is written to disk and xattr is not. So > >>> > f2fs_setattr is dirty. > >>> Yes, agreed this could be issue. > >>> > > >>> > And, below code has a bug. When error is occurred, inode->i_mode > >>> > shouldn't be changed. Please, check one more time, Namjae. > >>> And, below code has a bug. When error is occurred, inode->i_mode > >>> shouldn't be changed. Please, check one more time, Namjae. > >>> > >>> This was part of the default code, when ‘acl’ is not set for file’ > >>> Then, inode should be updated by these conditions (i.e., it covers the > >>> ‘chmod’ and ‘setacl’ scenario). > >>> When ACL is not present on the file and ‘chmod’ is done, then mode is > >>> changed from this part, as f2fs_get_acl() will fail and cause the > >>> below code to be executed: > >>> > >>> if (err || is_inode_flag_set(fi, FI_ACL_MODE)) { > >>> inode->i_mode = fi->i_acl_mode; > >>> clear_inode_flag(fi, FI_ACL_MODE); > >>> } > >>> > >>> Now, in order to make it consistent and work on all scenario we need > >>> to make further change like this in addition to the patch changes. > >>> setattr_copy(inode, attr); > >>> if (attr->ia_valid & ATTR_MODE) { > >>> + set_acl_inode(fi, inode->i_mode); > >>> err = f2fs_acl_chmod(inode); > >>> if (err || is_inode_flag_set(fi, FI_ACL_MODE)) { > >>> > >>> Let me know your opinion. > >>> Thanks. > >>> > >> > >> setattr_copy changes inode->i_mode, this is not our expectation. 
> >> So I made redundant __setatt_copy that copy attr->mode to > >> fi->i_acl_mode. > >> When acl_mode is reflected in xattr, acl_mode is copied to > >> inode->i_mode. > >> > >> Agree? > >> > > Hi Changman. > > > > First, Sorry for interrupt. > > I think that inode i_mode should be updated regardless of f2fs_acl_chmod. > > Actually I am still not understand the reason why we should use > > temporarily acl mode(i_acl_mode). > > I wroted the v2 patch to not use i_acl_mode like this. > > Am I missing something ? > To Changman, > I am still waiting for your reply. Correct us if we are wrong or > missing something. > > Hi Jaegeuk, > Could you please share your views on this? > > Thanks. Sorry for the late reply. I was very busy. Could you tell me: if a difference arises between the xattr and i_mode, what will you do? The purpose of i_acl_mode is to update i_mode and the xattr together in the same lock region. > > > > > > > Subject: [PATCH v2] f2fs: reorganize the f2fs_setattr(), f2fs_set_acl, > > f2fs_setxattr() > > From: Namjae Jeon > > > > Remove the redundant code from f2fs_setattr() function and make it aligned > > with usages of generic vfs layer function e.g using the setattr_copy() > > instead of using the f2fs specific function. > > > > Also correct the condition for updating the size of file via > > truncate_setsize(). > > > > Also modify the code of f2fs_set_acl and f2fs_setxattr for removing the > > redundant code & add the required changes to correct the requested > > operations. > > > > Remove the variable "i_acl_mode" from the f2fs_inode_info struct since > > i_mode will > > hold the latest 'mode' value which can be used for any further > > references. And in > > order to make 'chmod' work without ACL support, inode i_mode should be > > first > > updated correctly. > > > > Remove the helper functions to access and set the i_acl_mode. > > > > Signed-off
Re: [f2fs-dev] [PATCH 1/4] f2fs: reorganize the f2fs_setattr() function.
On 화, 2013-06-11 at 07:57 +0900, Namjae Jeon wrote: > 2013/6/10, Changman Lee : > > Hello, Namjae > Hi. Changman. > > > > If using ACL, whenever i_mode is changed we should update acl_mode which > > is written to xattr block, too. And vice versa. > > Because update_inode() is called at any reason and anytime, so we should > > sync both the moment xattr is written. > > We don't hope that only i_mode is written to disk and xattr is not. So > > f2fs_setattr is dirty. > Yes, agreed this could be issue. > > > > And, below code has a bug. When error is occurred, inode->i_mode > > shouldn't be changed. Please, check one more time, Namjae. > And, below code has a bug. When error is occurred, inode->i_mode > shouldn't be changed. Please, check one more time, Namjae. > > This was part of the default code, when ‘acl’ is not set for file’ > Then, inode should be updated by these conditions (i.e., it covers the > ‘chmod’ and ‘setacl’ scenario). > When ACL is not present on the file and ‘chmod’ is done, then mode is > changed from this part, as f2fs_get_acl() will fail and cause the > below code to be executed: > > if (err || is_inode_flag_set(fi, FI_ACL_MODE)) { > inode->i_mode = fi->i_acl_mode; > clear_inode_flag(fi, FI_ACL_MODE); > } > > Now, in order to make it consistent and work on all scenario we need > to make further change like this in addition to the patch changes. > setattr_copy(inode, attr); > if (attr->ia_valid & ATTR_MODE) { > + set_acl_inode(fi, inode->i_mode); > err = f2fs_acl_chmod(inode); > if (err || is_inode_flag_set(fi, FI_ACL_MODE)) { > > Let me know your opinion. > Thanks. > setattr_copy changes inode->i_mode, which is not our expectation. So I made a redundant __setattr_copy that copies attr->mode to fi->i_acl_mode. When acl_mode is reflected in the xattr, acl_mode is copied to inode->i_mode. Agree? 
> > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > index deefd25..29cd449 100644 > > --- a/fs/f2fs/file.c > > +++ b/fs/f2fs/file.c > > @@ -352,10 +352,8 @@ int f2fs_setattr(struct dentry *dentry, struct > > iattr *attr) > > > > if (attr->ia_valid & ATTR_MODE) { > > err = f2fs_acl_chmod(inode); > > - if (err || is_inode_flag_set(fi, FI_ACL_MODE)) { > > - inode->i_mode = fi->i_acl_mode; > > + if (err || is_inode_flag_set(fi, FI_ACL_MODE)) > > clear_inode_flag(fi, FI_ACL_MODE); > > - } > > } > > > > Thanks. > > > > > > On 토, 2013-06-08 at 21:25 +0900, Namjae Jeon wrote: > >> From: Namjae Jeon > >> > >> Remove the redundant code from this function and make it aligned with > >> usages of latest generic vfs layer function e.g using the setattr_copy() > >> instead of using the f2fs specific function. > >> > >> Also correct the condition for updating the size of file via > >> truncate_setsize(). > >> > >> Signed-off-by: Namjae Jeon > >> Signed-off-by: Pankaj Kumar > >> --- > >> fs/f2fs/acl.c |5 + > >> fs/f2fs/file.c | 47 +-- > >> 2 files changed, 6 insertions(+), 46 deletions(-) > >> > >> diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c > >> index 44abc2f..2d13f44 100644 > >> --- a/fs/f2fs/acl.c > >> +++ b/fs/f2fs/acl.c > >> @@ -17,9 +17,6 @@ > >> #include "xattr.h" > >> #include "acl.h" > >> > >> -#define get_inode_mode(i) ((is_inode_flag_set(F2FS_I(i), FI_ACL_MODE)) ? 
> >> \ > >> - (F2FS_I(i)->i_acl_mode) : ((i)->i_mode)) > >> - > >> static inline size_t f2fs_acl_size(int count) > >> { > >>if (count <= 4) { > >> @@ -299,7 +296,7 @@ int f2fs_acl_chmod(struct inode *inode) > >>struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb); > >>struct posix_acl *acl; > >>int error; > >> - umode_t mode = get_inode_mode(inode); > >> + umode_t mode = inode->i_mode; > >> > >>if (!test_opt(sbi, POSIX_ACL)) > >>return 0; > >> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > >> index deefd25..8dfc1da 100644 > >> --- a/fs/f2fs/file.c > >> +++ b/fs/f2fs/file.c > >> @@ -300,63 +300,26 @@ static int f2fs_getattr(struct vfsmount *mnt, >
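The FI_ACL_MODE dance debated above — park the new mode in i_acl_mode, write the ACL xattr, then either commit the mode to i_mode or roll it back — can be sketched as a small userspace model. This is illustrative C, not kernel code: the struct, the bool flag, and finish_mode_change() are simplified stand-ins for f2fs_inode_info, is_inode_flag_set()/clear_inode_flag(), and the tail of f2fs_setattr(), and it implements the corrected semantics (i_mode unchanged on error) that the thread converges on.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct inode + struct f2fs_inode_info. */
struct model_inode {
	unsigned int i_mode;      /* mode visible to the VFS */
	unsigned int i_acl_mode;  /* pending mode, valid while acl_mode_set */
	bool acl_mode_set;        /* stands in for the FI_ACL_MODE flag */
};

/* __setattr_copy() parks the requested mode instead of touching i_mode. */
static void set_acl_inode(struct model_inode *inode, unsigned int mode)
{
	inode->i_acl_mode = mode;
	inode->acl_mode_set = true;
}

/* After the ACL xattr write: commit the pending mode on success,
 * discard it on error, and clear the flag either way. */
static void finish_mode_change(struct model_inode *inode, int err)
{
	if (inode->acl_mode_set) {
		if (!err)
			inode->i_mode = inode->i_acl_mode;
		inode->acl_mode_set = false;  /* clear_inode_flag(FI_ACL_MODE) */
	}
}
```

On success the chmod becomes visible; on failure i_mode keeps its old value, which is exactly the invariant Changman asks for.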
Re: [f2fs-dev] [PATCH 1/4] f2fs: reorganize the f2fs_setattr() function.

Hello, Namjae

If using ACL, whenever i_mode is changed we should also update acl_mode,
which is written to the xattr block, and vice versa.
Because update_inode() can be called for any reason at any time, we should
sync both at the moment the xattr is written.
We don't want only i_mode to be written to disk while the xattr is not.
That is why f2fs_setattr is dirty.

Also, the code below has a bug. When an error occurs, inode->i_mode
shouldn't be changed. Please check one more time, Namjae.

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index deefd25..29cd449 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -352,10 +352,8 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
 
 	if (attr->ia_valid & ATTR_MODE) {
 		err = f2fs_acl_chmod(inode);
-		if (err || is_inode_flag_set(fi, FI_ACL_MODE)) {
-			inode->i_mode = fi->i_acl_mode;
+		if (err || is_inode_flag_set(fi, FI_ACL_MODE))
 			clear_inode_flag(fi, FI_ACL_MODE);
-		}
 	}

Thanks.

On Sat, 2013-06-08 at 21:25 +0900, Namjae Jeon wrote:
> From: Namjae Jeon
>
> Remove the redundant code from this function and make it aligned with
> usage of the latest generic VFS layer functions, e.g. using setattr_copy()
> instead of the f2fs-specific function.
>
> Also correct the condition for updating the size of the file via
> truncate_setsize().
>
> Signed-off-by: Namjae Jeon
> Signed-off-by: Pankaj Kumar
> ---
>  fs/f2fs/acl.c  |  5 +
>  fs/f2fs/file.c | 47 +--
>  2 files changed, 6 insertions(+), 46 deletions(-)
>
> diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
> index 44abc2f..2d13f44 100644
> --- a/fs/f2fs/acl.c
> +++ b/fs/f2fs/acl.c
> @@ -17,9 +17,6 @@
>  #include "xattr.h"
>  #include "acl.h"
>
> -#define get_inode_mode(i)	((is_inode_flag_set(F2FS_I(i), FI_ACL_MODE)) ? \
> -			(F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
> -
>  static inline size_t f2fs_acl_size(int count)
>  {
>  	if (count <= 4) {
> @@ -299,7 +296,7 @@ int f2fs_acl_chmod(struct inode *inode)
>  	struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
>  	struct posix_acl *acl;
>  	int error;
> -	umode_t mode = get_inode_mode(inode);
> +	umode_t mode = inode->i_mode;
>
>  	if (!test_opt(sbi, POSIX_ACL))
>  		return 0;
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index deefd25..8dfc1da 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -300,63 +300,26 @@ static int f2fs_getattr(struct vfsmount *mnt,
>  	return 0;
>  }
>
> -#ifdef CONFIG_F2FS_FS_POSIX_ACL
> -static void __setattr_copy(struct inode *inode, const struct iattr *attr)
> -{
> -	struct f2fs_inode_info *fi = F2FS_I(inode);
> -	unsigned int ia_valid = attr->ia_valid;
> -
> -	if (ia_valid & ATTR_UID)
> -		inode->i_uid = attr->ia_uid;
> -	if (ia_valid & ATTR_GID)
> -		inode->i_gid = attr->ia_gid;
> -	if (ia_valid & ATTR_ATIME)
> -		inode->i_atime = timespec_trunc(attr->ia_atime,
> -						inode->i_sb->s_time_gran);
> -	if (ia_valid & ATTR_MTIME)
> -		inode->i_mtime = timespec_trunc(attr->ia_mtime,
> -						inode->i_sb->s_time_gran);
> -	if (ia_valid & ATTR_CTIME)
> -		inode->i_ctime = timespec_trunc(attr->ia_ctime,
> -						inode->i_sb->s_time_gran);
> -	if (ia_valid & ATTR_MODE) {
> -		umode_t mode = attr->ia_mode;
> -
> -		if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> -			mode &= ~S_ISGID;
> -		set_acl_inode(fi, mode);
> -	}
> -}
> -#else
> -#define __setattr_copy setattr_copy
> -#endif
> -
>  int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
>  {
>  	struct inode *inode = dentry->d_inode;
> -	struct f2fs_inode_info *fi = F2FS_I(inode);
>  	int err;
>
>  	err = inode_change_ok(inode, attr);
>  	if (err)
>  		return err;
>
> -	if ((attr->ia_valid & ATTR_SIZE) &&
> -		attr->ia_size != i_size_read(inode)) {
> -		truncate_setsize(inode, attr->ia_size);
> +	if ((attr->ia_valid & ATTR_SIZE)) {
> +		if (attr->ia_size != i_size_read(inode))
> +			truncate_setsize(inode, attr->ia_size);
>  		f2fs_truncate(inode);
>  		f2fs_balance_fs(F2FS_SB(inode->i_sb));
>  	}
>
> -	__setattr_copy(inode, attr);
> +	setattr_copy(inode, attr);
>
> -	if (attr->ia_valid & ATTR_MODE) {
> +	if (attr->ia_valid & ATTR_MODE)
>  		err = f2fs_acl_chmod(inode);
> -		if (err || is_inode_flag_set(fi, FI_ACL_MODE)) {
> -			inode->i_mode = fi->i_acl_mode;
> -			clear_inode_flag(fi, FI_ACL_MODE);
> -		}
> -	}
>
>  	mark_inode_dirty(inode);
>  	return err;
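One detail worth pulling out of the __setattr_copy() removed above is the setgid rule it preserves from the generic setattr_copy(): a chmod by a caller who is neither in the file's owning group nor holding CAP_FSETID must drop S_ISGID. A hedged sketch of just that rule — in_group and has_fsetid are hypothetical stand-ins for in_group_p() and capable(CAP_FSETID), and S_ISGID_BIT restates the standard 02000 value:

```c
#include <assert.h>

#define S_ISGID_BIT 02000  /* same value as S_ISGID in <sys/stat.h> */

/* Compute the mode a chmod should actually install: callers outside the
 * owning group without CAP_FSETID lose the setgid bit. */
static unsigned int apply_mode(unsigned int requested_mode,
			       int in_group, int has_fsetid)
{
	unsigned int mode = requested_mode;

	if (!in_group && !has_fsetid)
		mode &= ~S_ISGID_BIT;  /* mode &= ~S_ISGID in the kernel */
	return mode;
}
```

In the patched flow this computed mode is what set_acl_inode() parks in i_acl_mode rather than being written straight to i_mode.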
Re: [PATCH 2/6] f2fs: move out f2fs_balance_fs from gc_thread_func

As you know, f2fs_balance_fs conducts GC if f2fs does not have enough free
sections. But the purpose of background GC is to conduct GC during idle time
without checking free sections, so that f2fs can relieve fragmentation.
Could you review this?

-- >8 --
From fbda3262dac81c4f0d7ae8b9b757c820da593120 Mon Sep 17 00:00:00 2001
From: Changman Lee
Date: Mon, 4 Feb 2013 10:05:09 +0900
Subject: [PATCH] f2fs: remove unnecessary gc option check and balance_fs

1. If f2fs is mounted with the background_gc_off option, checking BG_GC in
   the GC thread is not necessary.
2. f2fs_balance_fs is checked in f2fs_gc, so this is also redundant.

Signed-off-by: Changman Lee
Signed-off-by: Namjae Jeon
Signed-off-by: Amit Sahrawat
---
 fs/f2fs/gc.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 8fe43f3..e5c47f6 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -49,11 +49,6 @@ static int gc_thread_func(void *data)
 			continue;
 		}
 
-		f2fs_balance_fs(sbi);
-
-		if (!test_opt(sbi, BG_GC))
-			continue;
-
 		/*
 		 * [GC triggering condition]
 		 * 0. GC is not conducted currently.
@@ -96,6 +91,8 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
 {
 	struct f2fs_gc_kthread *gc_th;
 
+	if (!test_opt(sbi, BG_GC))
+		return 0;
 	gc_th = kmalloc(sizeof(struct f2fs_gc_kthread), GFP_KERNEL);
 	if (!gc_th)
 		return -ENOMEM;
-- 
1.7.10.4
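The effect of the patch above can be modeled in a few lines of userspace C: rather than every loop iteration of the GC thread testing BG_GC, start_gc_thread() simply never starts the thread when background GC is off. The bitmask, struct, and malloc() below are simplified stand-ins for the f2fs mount-option flags, f2fs_sb_info, and kmalloc()/kthread_run(); this is a sketch, not the kernel code:

```c
#include <assert.h>
#include <stdlib.h>

#define MOUNT_BG_GC 0x1  /* stands in for the BG_GC mount option bit */

/* Simplified stand-in for struct f2fs_sb_info. */
struct sb_info {
	unsigned int mount_opt;
	void *gc_thread;  /* non-NULL once the thread is "started" */
};

/* After the patch: the option is checked once, at thread-start time. */
static int start_gc_thread(struct sb_info *sbi)
{
	if (!(sbi->mount_opt & MOUNT_BG_GC))
		return 0;  /* background_gc_off: nothing to start */

	/* stands in for kmalloc() of f2fs_gc_kthread + kthread_run() */
	sbi->gc_thread = malloc(sizeof(int));
	if (!sbi->gc_thread)
		return -1;  /* -ENOMEM in the kernel */
	return 0;
}
```

With the check hoisted here, gc_thread_func() no longer needs the per-iteration `if (!test_opt(sbi, BG_GC)) continue;` that the diff removes.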
RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> -----Original Message-----
> From: Namjae Jeon [mailto:linkinj...@gmail.com]
> Sent: Wednesday, October 17, 2012 8:14 PM
> To: Changman Lee
> Cc: Jaegeuk Kim; Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro;
> ty...@mit.edu; gre...@linuxfoundation.org; linux-kernel@vger.kernel.org;
> chur@samsung.com; cm224@samsung.com; jooyoung.hw...@samsung.com;
> linux-fsde...@vger.kernel.org
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> 2012/10/11, Changman Lee:
> > On Thursday, 11 October 2012, Namjae Jeon wrote:
> >> 2012/10/10 Jaegeuk Kim:
> >>
> >>>> I mean that every volume is placed inside a partition (MTD or GPT).
> >>>> Every partition begins at some physical sector. So, as I understand,
> >>>> an f2fs volume can begin at a physical sector that lies inside a
> >>>> physical erase block. Thereby, in such a case of formatting, the f2fs
> >>>> operation units will be unaligned in relation to physical erase
> >>>> blocks, from my point of view. Maybe I misunderstand something, but
> >>>> it can lead to additional FTL operations and performance degradation,
> >>>> from my point of view.
> >>>
> >>> I think mkfs already calculates the offset to align that.
> >>
> >> I think this answer is not what he wants.
> >> If you don't use a partition table such as a DOS partition table or
> >> GPT, I think it is possible to align using mkfs.
> >> But if we should consider partition table space in storage, I don't
> >> understand how it could be aligned using mkfs.
> >>
> >> Thanks.
> >
> > We can know the physical starting sector address of any partition from
> > the HDIO geometry information obtained by ioctl.
>
> If so, are the first block and end block of the partition useless?
>
> Thanks.

For example: if we try to align the start point of F2FS to 2MB but the start
sector of the partition is not aligned to 2MB, then of course F2FS will have
some unused blocks. In exchange, F2FS could reduce the GC cost of the FTL.
I don't know whether my answer is what you want.
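The 2MB alignment trade-off Changman describes can be made concrete with a little arithmetic: given a partition that starts at an arbitrary 512-byte sector, count the sectors mkfs would have to leave unused so that the first 2MB f2fs segment lands on a 2MB boundary. The function and constants below are illustrative, not mkfs.f2fs's actual code:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE     512u
#define SEGMENT_BYTES   (2u * 1024 * 1024)             /* 2MB f2fs segment */
#define SEGMENT_SECTORS (SEGMENT_BYTES / SECTOR_SIZE)  /* 4096 sectors */

/* Sectors left unused ("wasted") between the partition start and the
 * first 2MB-aligned segment boundary. */
static uint64_t sectors_to_skip(uint64_t part_start_sector)
{
	uint64_t misalign = part_start_sector % SEGMENT_SECTORS;

	return misalign ? SEGMENT_SECTORS - misalign : 0;
}
```

A legacy DOS-style partition starting at sector 63 would waste 4033 sectors (just under 2MB) to alignment, while a partition starting at sector 4096 (a 2MB boundary) wastes nothing — the unused blocks Changman accepts in exchange for cheaper FTL garbage collection.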