Re: exfatprogs-1.0.3 version released

2020-07-01 Thread Hyunchul Lee
Hello Sedat,

For v1.0.3 and later releases, we can provide tar.xz tarballs, hashes
and detached signatures.
But is there a reason why hashes are required despite the signature?

We will let you know when it's done.

Thanks.

Regards,
Hyunchul

On Tue, Jun 30, 2020 at 7:16 PM Sedat Dilek wrote:
>
> On Tue, May 12, 2020 at 10:17 AM Namjae Jeon  wrote:
> >
> > Hi folk,
> >
> > We have released exfatprogs-1.0.3 version.
> > Any feedback is welcome!:)
> >
> > CHANGES :
> >  * Rename label.exfat to tune.exfat.
> > >  * tune.exfat: change argument style (-l option for print level,
> > >    -L option for setting label)
> >  * mkfs.exfat: harmonize set volume label option with tune.exfat.
> >
> > NEW FEATURES :
> >  * Add man page.
> >
> > BUG FIXES :
> >  * Fix the reported build warnings/errors.
> >  * Add memset to clean garbage in allocation.
> >  * Fix wrong volume label array size.
> >  * Open a device using O_EXCL to avoid formatting it while it is mounted.
> >  * Fix incomplete "make dist" generated tarball.
> >
> > The git tree is at:
> >   https://github.com/exfatprogs/exfatprogs
> >
> > > The tarballs can be found at:
> > >   https://github.com/exfatprogs/exfatprogs/releases/download/1.0.3/exfatprogs-1.0.3.tar.gz
> >
>
> Hi,
>
> thanks for the upgrade.
>
> Today, I contacted the Debian maintainer on how he wants to
> distinguish between exfat-utils vs. exfatprogs as Linux v5.7 entered
> Debian/unstable.
>
> When I looked at the release/tags page on github:
>
> Can you please offer tar.xz tarballs, please?
> Hashes? Like sha256sum
> Signing keys? (Signed tarballs?)
>
> Thanks.
>
> Regards,
> - Sedat -


[PATCH v2] exfat: call sync_filesystem for read-only remount

2020-06-15 Thread Hyunchul Lee
We need to commit dirty metadata and pages to disk
before remounting exfat as read-only.

This fixes a failure in xfstests generic/452

generic/452 does the following:
cp something /
mount -o remount,ro 

The /something file ends up corrupted because, while exfat
is being remounted read-only, exfat has no chance to
commit its metadata before the VFS invalidates the
page cache of the block device.

Signed-off-by: Hyunchul Lee 
---
Changes from v1:
- Do not check the return value of sync_filesystem(), so that
  the change from "rw" to "ro" is allowed even when this
  function fails.
- Add a detailed explanation of why generic/452 fails

 fs/exfat/super.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index e650e65536f8..253a92460d52 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -693,10 +693,20 @@ static void exfat_free(struct fs_context *fc)
}
 }
 
+static int exfat_reconfigure(struct fs_context *fc)
+{
+   fc->sb_flags |= SB_NODIRATIME;
+
+   /* volume flag will be updated in exfat_sync_fs */
+   sync_filesystem(fc->root->d_sb);
+   return 0;
+}
+
 static const struct fs_context_operations exfat_context_ops = {
.parse_param= exfat_parse_param,
.get_tree   = exfat_get_tree,
.free   = exfat_free,
+   .reconfigure= exfat_reconfigure,
 };
 
 static int exfat_init_fs_context(struct fs_context *fc)
-- 
2.17.1



Re: [PATCH 2/2] exfat: allow to change some mount options for remount

2020-06-14 Thread Hyunchul Lee
On Mon, Jun 15, 2020 at 9:18 AM Namjae Jeon wrote:
>
> > Allow to change permission masks, allow_utime, errors. But ignore other 
> > options.
> >
> > Signed-off-by: Hyunchul Lee 
> > ---
> >  fs/exfat/super.c | 40 +++++++++++++++++++++++++++++-----------
> >  1 file changed, 29 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/exfat/super.c b/fs/exfat/super.c
> > index 61c6cf240c19..3c1d47289ba2 100644
> > --- a/fs/exfat/super.c
> > +++ b/fs/exfat/super.c
> > @@ -696,9 +696,13 @@ static void exfat_free(struct fs_context *fc)
> >  static int exfat_reconfigure(struct fs_context *fc)
> >  {
> >   struct super_block *sb = fc->root->d_sb;
> > + struct exfat_sb_info *sbi = EXFAT_SB(sb);
> > + struct exfat_mount_options *new_opts;
> >   int ret;
> >   bool new_rdonly;
> >
> > + new_opts = &((struct exfat_sb_info *)fc->s_fs_info)->options;
> > +
> >   new_rdonly = fc->sb_flags & SB_RDONLY;
> >   if (new_rdonly != sb_rdonly(sb)) {
> >   if (new_rdonly) {
> > @@ -708,6 +712,12 @@ static int exfat_reconfigure(struct fs_context *fc)
> >   return ret;
> >   }
> >   }
> > +
> > + /* allow to change these options but ignore others */
> > + sbi->options.fs_fmask = new_opts->fs_fmask;
> > + sbi->options.fs_dmask = new_opts->fs_dmask;
> > + sbi->options.allow_utime = new_opts->allow_utime;
> > + sbi->options.errors = new_opts->errors;
> Is there any reason why you allow a few options on remount ?

While exfat is being remounted, inodes are not reclaimed, so I think
changing fs_uid, fs_gid, or time_offset is not possible: inodes that are
already cached would presumably keep their old values.
I am also not sure that changing the iocharset is safe.
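
To make the intent concrete, here is a minimal user-space sketch of the merge rule the patch applies on remount. The names mount_opts and apply_remount_opts are hypothetical stand-ins, not the kernel's exfat_mount_options or exfat_reconfigure(); only the list of copied fields follows the patch.

```c
/* Hypothetical user-space stand-in for the per-mount options. */
struct mount_opts {
	unsigned int fs_uid;
	unsigned int fs_gid;
	unsigned short fs_fmask;
	unsigned short fs_dmask;
	unsigned short allow_utime;
	int errors;
	int time_offset;
	const char *iocharset;
};

/*
 * Copy only the options that the patch lets a remount change
 * (fs_fmask, fs_dmask, allow_utime, errors). Every other field of
 * cur keeps the value chosen at the first mount.
 */
static void apply_remount_opts(struct mount_opts *cur,
			       const struct mount_opts *req)
{
	cur->fs_fmask = req->fs_fmask;
	cur->fs_dmask = req->fs_dmask;
	cur->allow_utime = req->allow_utime;
	cur->errors = req->errors;
}
```

With this rule, a remount request that also carries a different uid, gid, time offset, or iocharset is silently ignored for those fields rather than rejected.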

I am curious about your opinion.

Thanks.


> >   return 0;
> >  }
> >
> > @@ -726,17 +736,25 @@ static int exfat_init_fs_context(struct fs_context *fc)
> >   if (!sbi)
> >   return -ENOMEM;
> >
> > - mutex_init(&sbi->s_lock);
> > - ratelimit_state_init(&sbi->ratelimit, DEFAULT_RATELIMIT_INTERVAL,
> > - DEFAULT_RATELIMIT_BURST);
> > -
> > - sbi->options.fs_uid = current_uid();
> > - sbi->options.fs_gid = current_gid();
> > - sbi->options.fs_fmask = current->fs->umask;
> > - sbi->options.fs_dmask = current->fs->umask;
> > - sbi->options.allow_utime = -1;
> > - sbi->options.iocharset = exfat_default_iocharset;
> > - sbi->options.errors = EXFAT_ERRORS_RO;
> > + if (fc->root) {
> > + /* reconfiguration */
> > + memcpy(&sbi->options, &EXFAT_SB(fc->root->d_sb)->options,
> > + sizeof(struct exfat_mount_options));
> > + sbi->options.iocharset = exfat_default_iocharset;
> > + } else {
> > + mutex_init(&sbi->s_lock);
> > + ratelimit_state_init(&sbi->ratelimit,
> > + DEFAULT_RATELIMIT_INTERVAL,
> > + DEFAULT_RATELIMIT_BURST);
> > +
> > + sbi->options.fs_uid = current_uid();
> > + sbi->options.fs_gid = current_gid();
> > + sbi->options.fs_fmask = current->fs->umask;
> > + sbi->options.fs_dmask = current->fs->umask;
> > + sbi->options.allow_utime = -1;
> > + sbi->options.iocharset = exfat_default_iocharset;
> > + sbi->options.errors = EXFAT_ERRORS_RO;
> > + }
> >
> >   fc->s_fs_info = sbi;
> >   fc->ops = &exfat_context_ops;
> > --
> > 2.17.1
>
>


Re: [PATCH 1/2] exfat: call sync_filesystem for read-only remount

2020-06-14 Thread Hyunchul Lee
Hi Namjae,

On Mon, Jun 15, 2020 at 9:14 AM Namjae Jeon wrote:
>
> Hi Hyunchul,
> > We need to commit dirty metadata and pages to disk before remounting exfat 
> > as read-only.
> >
> > This fixes a failure in xfstests generic/452
> Could you please elaborate on why generic/452 in xfstests failed?

xfstests generic/452 does the following.
cp /bin/ls /
mount -o remount,ro 

The /ls file ends up corrupted because, while exfat is being remounted
read-only, exfat has no chance to commit its metadata before the VFS
invalidates the page cache of the block device.

I will put this explanation in the commit message.

> >
> > Signed-off-by: Hyunchul Lee 
> > ---
> >  fs/exfat/super.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/fs/exfat/super.c b/fs/exfat/super.c
> > index e650e65536f8..61c6cf240c19 100644
> > --- a/fs/exfat/super.c
> > +++ b/fs/exfat/super.c
> > @@ -693,10 +693,29 @@ static void exfat_free(struct fs_context *fc)
> >   }
> >  }
> >
> > +static int exfat_reconfigure(struct fs_context *fc)
> > +{
> > + struct super_block *sb = fc->root->d_sb;
> > + int ret;
> int ret = 0;
> > + bool new_rdonly;
> > +
> > + new_rdonly = fc->sb_flags & SB_RDONLY;
> > + if (new_rdonly != sb_rdonly(sb)) {
> If you modify it like this, would not we need new_rdonly?
> if (fc->sb_flags & SB_RDONLY && !sb_rdonly(sb))
>
This condition means that mount options are changed from "rw" to "ro",
or "ro" to "rw".

> > + if (new_rdonly) {

And this condition means the options are changed from "rw" to "ro".
It seems better either to merge the two conditions into the one you
suggested or to remove them altogether, because sync_filesystem() returns
0 when the filesystem is already mounted read-only.
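
As a quick sanity check that the nested form and the single condition agree, here is a small user-space truth-table sketch. Both helper names are hypothetical; new_rdonly stands for (fc->sb_flags & SB_RDONLY) and cur_rdonly for sb_rdonly(sb).

```c
#include <stdbool.h>

/* Nested form, as posted in PATCH 1/2. */
static bool sync_needed_nested(bool new_rdonly, bool cur_rdonly)
{
	if (new_rdonly != cur_rdonly) {
		if (new_rdonly)
			return true;
	}
	return false;
}

/* Single condition, as suggested in the review. */
static bool sync_needed_single(bool new_rdonly, bool cur_rdonly)
{
	return new_rdonly && !cur_rdonly;
}
```

Both helpers return true only for the "rw" to "ro" transition, so the new_rdonly variable is not strictly needed.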

> > + /* volume flag will be updated in exfat_sync_fs */
> > + ret = sync_filesystem(sb);
> > + if (ret < 0)
> > + return ret;
> I think that this ret check can be removed by using return ret; below ?

Okay, I will apply this.
Thank you for your comments!


> > + }
> > + }
> > + return 0;
> return ret;
> > +}
> > +
> >  static const struct fs_context_operations exfat_context_ops = {
> >   .parse_param= exfat_parse_param,
> >   .get_tree   = exfat_get_tree,
> >   .free   = exfat_free,
> > + .reconfigure= exfat_reconfigure,
> >  };
> >
> >  static int exfat_init_fs_context(struct fs_context *fc)
> > --
> > 2.17.1
>
>


[PATCH 1/2] exfat: call sync_filesystem for read-only remount

2020-06-12 Thread Hyunchul Lee
We need to commit dirty metadata and pages to disk
before remounting exfat as read-only.

This fixes a failure in xfstests generic/452

Signed-off-by: Hyunchul Lee 
---
 fs/exfat/super.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index e650e65536f8..61c6cf240c19 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -693,10 +693,29 @@ static void exfat_free(struct fs_context *fc)
}
 }
 
+static int exfat_reconfigure(struct fs_context *fc)
+{
+   struct super_block *sb = fc->root->d_sb;
+   int ret;
+   bool new_rdonly;
+
+   new_rdonly = fc->sb_flags & SB_RDONLY;
+   if (new_rdonly != sb_rdonly(sb)) {
+   if (new_rdonly) {
+   /* volume flag will be updated in exfat_sync_fs */
+   ret = sync_filesystem(sb);
+   if (ret < 0)
+   return ret;
+   }
+   }
+   return 0;
+}
+
 static const struct fs_context_operations exfat_context_ops = {
.parse_param= exfat_parse_param,
.get_tree   = exfat_get_tree,
.free   = exfat_free,
+   .reconfigure= exfat_reconfigure,
 };
 
 static int exfat_init_fs_context(struct fs_context *fc)
-- 
2.17.1



[PATCH 2/2] exfat: allow to change some mount options for remount

2020-06-12 Thread Hyunchul Lee
Allow the permission masks, allow_utime, and errors options
to be changed on remount, and ignore the other options.

Signed-off-by: Hyunchul Lee 
---
 fs/exfat/super.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index 61c6cf240c19..3c1d47289ba2 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -696,9 +696,13 @@ static void exfat_free(struct fs_context *fc)
 static int exfat_reconfigure(struct fs_context *fc)
 {
struct super_block *sb = fc->root->d_sb;
+   struct exfat_sb_info *sbi = EXFAT_SB(sb);
+   struct exfat_mount_options *new_opts;
int ret;
bool new_rdonly;
 
+   new_opts = &((struct exfat_sb_info *)fc->s_fs_info)->options;
+
new_rdonly = fc->sb_flags & SB_RDONLY;
if (new_rdonly != sb_rdonly(sb)) {
if (new_rdonly) {
@@ -708,6 +712,12 @@ static int exfat_reconfigure(struct fs_context *fc)
return ret;
}
}
+
+   /* allow to change these options but ignore others */
+   sbi->options.fs_fmask = new_opts->fs_fmask;
+   sbi->options.fs_dmask = new_opts->fs_dmask;
+   sbi->options.allow_utime = new_opts->allow_utime;
+   sbi->options.errors = new_opts->errors;
return 0;
 }
 
@@ -726,17 +736,25 @@ static int exfat_init_fs_context(struct fs_context *fc)
if (!sbi)
return -ENOMEM;
 
-   mutex_init(&sbi->s_lock);
-   ratelimit_state_init(&sbi->ratelimit, DEFAULT_RATELIMIT_INTERVAL,
-   DEFAULT_RATELIMIT_BURST);
-
-   sbi->options.fs_uid = current_uid();
-   sbi->options.fs_gid = current_gid();
-   sbi->options.fs_fmask = current->fs->umask;
-   sbi->options.fs_dmask = current->fs->umask;
-   sbi->options.allow_utime = -1;
-   sbi->options.iocharset = exfat_default_iocharset;
-   sbi->options.errors = EXFAT_ERRORS_RO;
+   if (fc->root) {
+   /* reconfiguration */
+   memcpy(&sbi->options, &EXFAT_SB(fc->root->d_sb)->options,
+   sizeof(struct exfat_mount_options));
+   sbi->options.iocharset = exfat_default_iocharset;
+   } else {
+   mutex_init(&sbi->s_lock);
+   ratelimit_state_init(&sbi->ratelimit,
+   DEFAULT_RATELIMIT_INTERVAL,
+   DEFAULT_RATELIMIT_BURST);
+
+   sbi->options.fs_uid = current_uid();
+   sbi->options.fs_gid = current_gid();
+   sbi->options.fs_fmask = current->fs->umask;
+   sbi->options.fs_dmask = current->fs->umask;
+   sbi->options.allow_utime = -1;
+   sbi->options.iocharset = exfat_default_iocharset;
+   sbi->options.errors = EXFAT_ERRORS_RO;
+   }
 
fc->s_fs_info = sbi;
fc->ops = &exfat_context_ops;
-- 
2.17.1



[PATCH v3] f2fs: add nowait aio support

2018-03-08 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

This patch adds nowait aio support[1].

Return EAGAIN if any of the following checks fail for direct I/O:
  - i_rwsem is not lockable
  - Blocks are not allocated at the write location

With this patch, xfstests generic/471 passes.

 [1]: 6be96d "Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT"

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
Changes from v2:
 - Minor fixes in f2fs_overwrite_io
Changes from v1:
 - Return EAGAIN if dio_rwsem is not lockable in f2fs_direct_IO
 - Fix the wrong calculation of last_lblk in f2fs_overwrite_io 
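
For readers less familiar with the trylock-then-bail pattern used for IOCB_NOWAIT, here is a rough user-space analogy. It is only a sketch under stated assumptions: it uses flock(2) on a file descriptor in place of the kernel rwsem, and read_lock_nowait is a hypothetical helper, not an f2fs function.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/file.h>

/*
 * Take a shared lock on fd. The non-blocking attempt always comes
 * first. If it would block and the caller asked for nowait
 * semantics, report -EAGAIN instead of sleeping; otherwise fall
 * back to the blocking lock, as the patch does with down_read().
 */
static int read_lock_nowait(int fd, bool nowait)
{
	if (flock(fd, LOCK_SH | LOCK_NB) == 0)
		return 0;
	if (errno != EWOULDBLOCK)
		return -errno;
	if (nowait)
		return -EAGAIN;
	return flock(fd, LOCK_SH) == 0 ? 0 : -errno;
}
```

The point of trying first and only then checking the flag is that callers without the nowait flag keep their old blocking behaviour unchanged.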

 fs/f2fs/data.c | 47 +--
 fs/f2fs/f2fs.h |  8 
 fs/f2fs/file.c | 35 +--
 3 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6c3c978..251a141 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -839,13 +839,6 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
return 0;
 }
 
-static inline bool __force_buffered_io(struct inode *inode, int rw)
-{
-   return (f2fs_encrypted_file(inode) ||
-   (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) ||
-   F2FS_I_SB(inode)->s_ndevs);
-}
-
 int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
 {
struct inode *inode = file_inode(iocb->ki_filp);
@@ -877,7 +870,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
 
if (direct_io) {
map.m_seg_type = rw_hint_to_seg_type(iocb->ki_hint);
-   flag = __force_buffered_io(inode, WRITE) ?
+   flag = f2fs_force_buffered_io(inode, WRITE) ?
F2FS_GET_BLOCK_PRE_AIO :
F2FS_GET_BLOCK_PRE_DIO;
goto map_blocks;
@@ -1121,6 +1114,31 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
return err;
 }
 
+bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len)
+{
+   struct f2fs_map_blocks map;
+   block_t last_lblk;
+   int err;
+
+   if (pos + len > i_size_read(inode))
+   return false;
+
+   map.m_lblk = F2FS_BYTES_TO_BLK(pos);
+   map.m_next_pgofs = NULL;
+   map.m_next_extent = NULL;
+   map.m_seg_type = NO_CHECK_TYPE;
+   last_lblk = F2FS_BLK_ALIGN(pos + len);
+
+   while (map.m_lblk < last_lblk) {
+   map.m_len = last_lblk - map.m_lblk;
+   err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_DEFAULT);
+   if (err || map.m_len == 0)
+   return false;
+   map.m_lblk += map.m_len;
+   }
+   return true;
+}
+
 static int __get_data_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create, int flag,
pgoff_t *next_pgofs, int seg_type)
@@ -2306,7 +2324,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (err)
return err;
 
-   if (__force_buffered_io(inode, rw))
+   if (f2fs_force_buffered_io(inode, rw))
return 0;
 
trace_f2fs_direct_IO_enter(inode, offset, count, rw);
@@ -2314,7 +2332,15 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (rw == WRITE && whint_mode == WHINT_MODE_OFF)
iocb->ki_hint = WRITE_LIFE_NOT_SET;
 
-   down_read(&F2FS_I(inode)->dio_rwsem[rw]);
+   if (!down_read_trylock(&F2FS_I(inode)->dio_rwsem[rw])) {
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   iocb->ki_hint = hint;
+   err = -EAGAIN;
+   goto out;
+   }
+   down_read(&F2FS_I(inode)->dio_rwsem[rw]);
+   }
+
err = blockdev_direct_IO(iocb, inode, iter, get_data_block_dio);
up_read(&F2FS_I(inode)->dio_rwsem[rw]);
 
@@ -2330,6 +2356,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
}
}
 
+out:
trace_f2fs_direct_IO_exit(inode, offset, count, rw, err);
 
return err;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index f6dc706..351226e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2875,6 +2875,7 @@ void f2fs_invalidate_page(struct page *page, unsigned int offset,
 int f2fs_migrate_page(struct address_space *mapping, struct page *newpage,
struct page *page, enum migrate_mode mode);
 #endif
+bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
 
 /*
  * gc.c
@@ -3259,4 +3260,11 @@ static inline bool f2fs_may_encrypt(struct inode *inode)
 #endif
 }
 
+static inline bool f2fs_force_buffered_io(struct inode *inode, int rw)
+{
+   return (f2fs_encrypted_file(inode) ||
+   (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) ||
+   F2FS_I_SB(inode)->s_ndevs);
+}

[PATCH v2] f2fs: add nowait aio support

2018-03-07 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

This patch adds nowait aio support[1].

Return EAGAIN if any of the following checks fail for direct I/O:
  - i_rwsem is not lockable
  - Blocks are not allocated at the write location

With this patch, xfstests generic/471 passes.

 [1]: 6be96d "Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT"

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
Changes from v1:
 - Return EAGAIN if dio_rwsem is not lockable in f2fs_direct_IO
 - Fix the wrong calculation of last_lblk in f2fs_overwrite_io 
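
To illustrate the last_lblk fix, here is a small user-space sketch of the arithmetic. It assumes a 4 KiB block size, and the macros are local stand-ins written for this example rather than copies of the f2fs headers; last_lblk_v1 mirrors the first posting, last_lblk_v2 this version, and last_lblk_aligned the round-up form that F2FS_BLK_ALIGN expresses.

```c
#include <stdint.h>

/* Local stand-ins, assuming 4 KiB blocks. */
#define BLKSIZE_BITS	12
#define BLKSIZE		(1ULL << BLKSIZE_BITS)
#define BYTES_TO_BLK(b)	((uint64_t)(b) >> BLKSIZE_BITS)
#define BLK_ALIGN(b)	BYTES_TO_BLK((uint64_t)(b) + BLKSIZE - 1)

/* v1: one block too many when pos + len ends on a block boundary. */
static uint64_t last_lblk_v1(uint64_t pos, uint64_t len)
{
	return BYTES_TO_BLK(pos + len) + 1;
}

/* v2: block index of the last byte written, plus one. */
static uint64_t last_lblk_v2(uint64_t pos, uint64_t len)
{
	return BYTES_TO_BLK(pos + len - 1) + 1;
}

/* Round-up form: end offset rounded up to a whole block. */
static uint64_t last_lblk_aligned(uint64_t pos, uint64_t len)
{
	return BLK_ALIGN(pos + len);
}
```

For pos = 0 and len = 4096 the v1 formula yields 2 while the other two yield 1, so v1 would also probe block 1, which the write never touches. If that block is unallocated, the overwrite check would fail even though the write itself is a pure overwrite.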

 fs/f2fs/data.c | 47 +--
 fs/f2fs/f2fs.h |  8 
 fs/f2fs/file.c | 35 +--
 3 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6c3c978..b27ea1e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -839,13 +839,6 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
return 0;
 }
 
-static inline bool __force_buffered_io(struct inode *inode, int rw)
-{
-   return (f2fs_encrypted_file(inode) ||
-   (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) ||
-   F2FS_I_SB(inode)->s_ndevs);
-}
-
 int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
 {
struct inode *inode = file_inode(iocb->ki_filp);
@@ -877,7 +870,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
 
if (direct_io) {
map.m_seg_type = rw_hint_to_seg_type(iocb->ki_hint);
-   flag = __force_buffered_io(inode, WRITE) ?
+   flag = f2fs_force_buffered_io(inode, WRITE) ?
F2FS_GET_BLOCK_PRE_AIO :
F2FS_GET_BLOCK_PRE_DIO;
goto map_blocks;
@@ -1121,6 +1114,31 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
return err;
 }
 
+bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len)
+{
+   struct f2fs_map_blocks map;
+   block_t last_lblk;
+   int err;
+
+   if (pos + len > i_size_read(inode))
+   return false;
+
+   map.m_lblk = F2FS_BYTES_TO_BLK(pos);
+   map.m_next_pgofs = 0;
+   map.m_next_extent = NULL;
+   map.m_seg_type = NO_CHECK_TYPE;
+   last_lblk = F2FS_BYTES_TO_BLK(pos + len - 1) + 1;
+
+   while (map.m_lblk < last_lblk) {
+   map.m_len = last_lblk - map.m_lblk;
+   err = f2fs_map_blocks(inode, &map, 0, 0);
+   if (err || map.m_len == 0)
+   return false;
+   map.m_lblk += map.m_len;
+   }
+   return true;
+}
+
 static int __get_data_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create, int flag,
pgoff_t *next_pgofs, int seg_type)
@@ -2306,7 +2324,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (err)
return err;
 
-   if (__force_buffered_io(inode, rw))
+   if (f2fs_force_buffered_io(inode, rw))
return 0;
 
trace_f2fs_direct_IO_enter(inode, offset, count, rw);
@@ -2314,7 +2332,15 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (rw == WRITE && whint_mode == WHINT_MODE_OFF)
iocb->ki_hint = WRITE_LIFE_NOT_SET;
 
-   down_read(&F2FS_I(inode)->dio_rwsem[rw]);
+   if (!down_read_trylock(&F2FS_I(inode)->dio_rwsem[rw])) {
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   iocb->ki_hint = hint;
+   err = -EAGAIN;
+   goto out;
+   }
+   down_read(&F2FS_I(inode)->dio_rwsem[rw]);
+   }
+
err = blockdev_direct_IO(iocb, inode, iter, get_data_block_dio);
up_read(&F2FS_I(inode)->dio_rwsem[rw]);
 
@@ -2330,6 +2356,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
}
}
 
+out:
trace_f2fs_direct_IO_exit(inode, offset, count, rw, err);
 
return err;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index f6dc706..351226e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2875,6 +2875,7 @@ void f2fs_invalidate_page(struct page *page, unsigned int offset,
 int f2fs_migrate_page(struct address_space *mapping, struct page *newpage,
struct page *page, enum migrate_mode mode);
 #endif
+bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
 
 /*
  * gc.c
@@ -3259,4 +3260,11 @@ static inline bool f2fs_may_encrypt(struct inode *inode)
 #endif
 }
 
+static inline bool f2fs_force_buffered_io(struct inode *inode, int rw)
+{
+   return (f2fs_encrypted_file(inode) ||
+   (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) ||
+   F2FS_I_SB(inode)->s_ndevs);
+}
+
 #endif
diff -

[PATCH] f2fs: add nowait aio support

2018-03-01 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

This patch adds nowait aio support[1].

Return EAGAIN if any of the following checks fail for direct I/O:
  - i_rwsem is not lockable
  - Blocks are not allocated at the write location

With this patch, xfstests generic/471 passes.

 [1]: 6be96d "Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT"

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 fs/f2fs/data.c | 36 +++-
 fs/f2fs/f2fs.h |  8 
 fs/f2fs/file.c | 35 +--
 3 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6c3c978..9a550c5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -839,13 +839,6 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
return 0;
 }
 
-static inline bool __force_buffered_io(struct inode *inode, int rw)
-{
-   return (f2fs_encrypted_file(inode) ||
-   (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) ||
-   F2FS_I_SB(inode)->s_ndevs);
-}
-
 int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
 {
struct inode *inode = file_inode(iocb->ki_filp);
@@ -877,7 +870,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
 
if (direct_io) {
map.m_seg_type = rw_hint_to_seg_type(iocb->ki_hint);
-   flag = __force_buffered_io(inode, WRITE) ?
+   flag = f2fs_force_buffered_io(inode, WRITE) ?
F2FS_GET_BLOCK_PRE_AIO :
F2FS_GET_BLOCK_PRE_DIO;
goto map_blocks;
@@ -1121,6 +1114,31 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
return err;
 }
 
+bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len)
+{
+   struct f2fs_map_blocks map;
+   block_t last_lblk;
+   int err;
+
+   if (pos + len > i_size_read(inode))
+   return false;
+
+   map.m_lblk = F2FS_BYTES_TO_BLK(pos);
+   map.m_next_pgofs = 0;
+   map.m_next_extent = NULL;
+   map.m_seg_type = NO_CHECK_TYPE;
+   last_lblk = F2FS_BYTES_TO_BLK(pos + len) + 1;
+
+   while (map.m_lblk < last_lblk) {
+   map.m_len = last_lblk - map.m_lblk;
+   err = f2fs_map_blocks(inode, &map, 0, 0);
+   if (err || map.m_len == 0)
+   return false;
+   map.m_lblk += map.m_len;
+   }
+   return true;
+}
+
 static int __get_data_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create, int flag,
pgoff_t *next_pgofs, int seg_type)
@@ -2306,7 +2324,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (err)
return err;
 
-   if (__force_buffered_io(inode, rw))
+   if (f2fs_force_buffered_io(inode, rw))
return 0;
 
trace_f2fs_direct_IO_enter(inode, offset, count, rw);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index f6dc706..351226e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2875,6 +2875,7 @@ void f2fs_invalidate_page(struct page *page, unsigned int offset,
 int f2fs_migrate_page(struct address_space *mapping, struct page *newpage,
struct page *page, enum migrate_mode mode);
 #endif
+bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
 
 /*
  * gc.c
@@ -3259,4 +3260,11 @@ static inline bool f2fs_may_encrypt(struct inode *inode)
 #endif
 }
 
+static inline bool f2fs_force_buffered_io(struct inode *inode, int rw)
+{
+   return (f2fs_encrypted_file(inode) ||
+   (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) ||
+   F2FS_I_SB(inode)->s_ndevs);
+}
+
 #endif
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 6a202e5..1051edd 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -479,6 +479,9 @@ static int f2fs_file_open(struct inode *inode, struct file *filp)
 
if (err)
return err;
+
+   filp->f_mode |= FMODE_NOWAIT;
+
return dquot_file_open(inode, filp);
 }
 
@@ -2895,7 +2898,15 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
return -EIO;
 
-   inode_lock(inode);
+   if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
+   return -EINVAL;
+
+   if (!inode_trylock(inode)) {
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   return -EAGAIN;
+   inode_lock(inode);
+   }
+
ret = generic_write_checks(iocb, from);
if (ret > 0) {
int err;
@@ -2903,11 +2914,23 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (iov
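
Editorial note: the checks this patch adds to the write path can be modelled outside the kernel. What follows is a minimal user-space sketch, not the kernel code. `struct toy_file`, `range_is_allocated()` and `nowait_write_precheck()` are hypothetical names, the 4 KiB block size is an assumption, and the block range is computed as first..last inclusive, which is this sketch's own convention rather than the patch's `last_lblk` arithmetic.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLKSIZE_BITS 12 /* assumption: 4 KiB blocks */

/* Hypothetical stand-in for an inode: a size plus a per-block "allocated" map. */
struct toy_file {
	uint64_t size;      /* file size in bytes */
	const bool *mapped; /* mapped[i]: logical block i already has a block address */
	size_t nblocks;     /* number of entries in mapped[] */
};

/* True when [pos, pos + len) lies inside the file and every block it touches is allocated. */
static bool range_is_allocated(const struct toy_file *f, uint64_t pos, uint64_t len)
{
	if (len == 0)
		return true; /* sketch choice: an empty write touches no block */
	if (pos + len > f->size)
		return false; /* the write would extend the file */

	uint64_t first = pos >> TOY_BLKSIZE_BITS;
	uint64_t last = (pos + len - 1) >> TOY_BLKSIZE_BITS;

	for (uint64_t blk = first; blk <= last; blk++) {
		if (blk >= f->nblocks || !f->mapped[blk])
			return false; /* a hole: allocating it could block */
	}
	return true;
}

/*
 * Decision order modelled on the hunks above: a nowait request must be
 * direct I/O (-EINVAL otherwise), and it gets -EAGAIN when the inode lock
 * is contended or when the target range is not fully allocated.
 */
static int nowait_write_precheck(const struct toy_file *f, bool nowait, bool direct,
				 bool lock_is_free, uint64_t pos, uint64_t len)
{
	if (nowait && !direct)
		return -EINVAL;
	if (nowait && !lock_is_free)
		return -EAGAIN;
	if (nowait && !range_is_allocated(f, pos, len))
		return -EAGAIN;
	return 0; /* a blocking request, or a nowait request that cannot block here */
}
```

A caller without the nowait flag always gets 0 from the precheck, which mirrors the patch falling back to a blocking `inode_lock()` in that case.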

[PATCH v2 1/3] f2fs: support passing down write hints given by users to block layer

2018-01-30 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Add the 'whint_mode' mount option, which controls which write
hints are passed down to the block layer. It supports two modes,
"off" and "user-based"; the default is "off".

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given
by users.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_NOT_SET
                      HOT_NODE                 "
                      WARM_NODE                "
                      COLD_NODE                "
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.
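
Editorial note: the "user-based" data rows of the table can be read as two small functions. This is a stand-alone sketch derived from the table, not the kernel implementation. The enum and function names (`life_hint`, `data_temp`, `user_hint_to_temp`, `user_mode_block_hint`) are hypothetical and only mirror the kernel's WRITE_LIFE_* and HOT/WARM/COLD names.

```c
#include <stdbool.h>

/* Local mirrors of the names used in the table; the numeric values are arbitrary. */
enum life_hint { LIFE_NOT_SET, LIFE_NONE, LIFE_SHORT, LIFE_MEDIUM, LIFE_LONG, LIFE_EXTREME };
enum data_temp { TEMP_HOT, TEMP_WARM, TEMP_COLD };

/* "User" column -> "F2FS" column for data writes. */
static enum data_temp user_hint_to_temp(enum life_hint user)
{
	switch (user) {
	case LIFE_SHORT:
		return TEMP_HOT;
	case LIFE_EXTREME:
		return TEMP_COLD;
	default:
		return TEMP_WARM; /* NOT_SET, NONE, MEDIUM and LONG all land in WARM_DATA */
	}
}

/* "Block" column in user-based mode. */
static enum life_hint user_mode_block_hint(enum life_hint user, bool direct_io)
{
	if (direct_io)
		return user; /* direct I/O rows: the user's hint passes through unchanged */

	switch (user_hint_to_temp(user)) {
	case TEMP_HOT:
		return LIFE_SHORT;
	case TEMP_COLD:
		return LIFE_EXTREME;
	default:
		return LIFE_NOT_SET; /* buffered WARM_DATA carries no lifetime hint */
	}
}
```

The only difference between the buffered and direct rows shows up for NONE, MEDIUM and LONG: buffered writes collapse them to WRITE_LIFE_NOT_SET, while direct writes keep the value the user supplied.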

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
v2:
 - Set "whint_mode" to off if "active_logs" is two or four
 - Use a local variable to check "whint_mode" instead of sbi->whint_mode
   in f2fs_direct_IO
 - Fix comments about rw_hint_to_seg_type()

 fs/f2fs/data.c    | 28 +-
 fs/f2fs/f2fs.h    |  9 +
 fs/f2fs/segment.c | 59 +++
 fs/f2fs/super.c   | 30 +++-
 4 files changed, 120 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6cba74e..726b0ef 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -175,15 +175,22 @@ static bool __same_bdev(struct f2fs_sb_info *sbi,
  */
 static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr,
struct writeback_control *wbc,
-   int npages, bool is_read)
+   int npages, bool is_read,
+   enum page_type type, enum temp_type temp)
 {
struct bio *bio;
 
bio = f2fs_bio_alloc(sbi, npages, true);
 
f2fs_target_device(sbi, blk_addr, bio);
-   bio->bi_end_io = is_read ? f2fs_read_end_io : f2fs_write_end_io;
-   bio->bi_private = is_read ? NULL : sbi;
+   if (is_read) {
+   bio->bi_end_io = f2fs_read_end_io;
+   bio->bi_private = NULL;
+   } else {
+   bio->bi_end_io = f2fs_write_end_io;
+   bio->bi_private = sbi;
+   bio->bi_write_hint = io_type_to_rw_hint(sbi, type, temp);
+   }
if (wbc)
wbc_init_bio(wbc, bio);
 
@@ -382,7 +389,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
 
/* Allocate a new bio */
bio = __bio_alloc(fio->sbi, fio->new_blkaddr, fio->io_wbc,
-   1, is_read_io(fio->op));
+   1, is_read_io(fio->op), fio->type, fio->temp);
 
if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
bio_put(bio);
@@ -445,7 +452,8 @@ int f2fs_submit_page_write(struct f2fs_io_info *fio)
goto out_fail;
}
io->bio = __bio_alloc(sbi, fio->new_blkaddr, fio->io_wbc,
-   BIO_MAX_PAGES, false);
+   BIO_MAX_PAGES, false,
+   fio->type, fio->temp);
io->fio = *fio;
}
 
@@ -2287,10 +2295,13 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
+   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
size_t count = iov_iter_count(iter);
loff_t offset = iocb->ki_pos;
int rw = iov_iter_rw(iter);
int err;
+   enum rw_hint hint;
+   int whint_mode = sbi->whint_mode;
 
err = check_direct_IO(inode, iter, offset);
if (err)
@@ -2301,11 +2312,18 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, 
struct iov_it

[PATCH v2 3/3] f2fs: Add the 'whint_mode' mount option to f2fs documentation

2018-01-30 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 Documentation/filesystems/f2fs.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index 13c2ff0..414a160 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -174,6 +174,12 @@ offgrpjquota           Turn off group journelled quota.
 offprjjquota           Turn off project journelled quota.
 quota                  Enable plain user disk quota accounting.
 noquota                Disable all plain disk quota option.
+whint_mode=%s          Control which write hints are passed down to block
+                       layer. This supports "off", "user-based", and
+                       "fs-based".  In "off" mode (default), f2fs does not pass
+                       down hints. In "user-based" mode, f2fs tries to pass
+                       down hints given by users. And in "fs-based" mode, f2fs
+                       passes down hints with its policy.
 
 

 DEBUGFS ENTRIES
-- 
1.9.1
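
Editorial note: the three documented values, together with the v2 changelog rule that 'whint_mode' is set to off when 'active_logs' is two or four, can be sketched as a small parser. This is an illustration only. `parse_whint_mode()` and the `WHINT_*` names below are hypothetical and do not reproduce the option parsing in fs/f2fs/super.c.

```c
#include <string.h>

enum whint_mode { WHINT_OFF, WHINT_USER, WHINT_FS };

/*
 * Map the option string to a mode. Returns 0 and stores the result in *out,
 * or -1 when the value is not one of the documented strings.
 */
static int parse_whint_mode(const char *value, int active_logs, enum whint_mode *out)
{
	enum whint_mode mode;

	if (strcmp(value, "off") == 0)
		mode = WHINT_OFF;
	else if (strcmp(value, "user-based") == 0)
		mode = WHINT_USER;
	else if (strcmp(value, "fs-based") == 0)
		mode = WHINT_FS;
	else
		return -1;

	/* Changelog rule: with two or four active logs, hints are switched off. */
	if (active_logs == 2 || active_logs == 4)
		mode = WHINT_OFF;

	*out = mode;
	return 0;
}
```

Under this reading, mounting with whint_mode=fs-based and active_logs=4 behaves like whint_mode=off.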



[PATCH v2 2/3] f2fs: support passing down write hints to block layer with F2FS policy

2018-01-30 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Add the 'whint_mode=fs-based' mount option. In this mode, F2FS passes
down write hints according to its own policy.

* whint_mode=fs-based. F2FS passes down hints with its policy.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_MEDIUM;
                      HOT_NODE                 WRITE_LIFE_NOT_SET
                      WARM_NODE                "
                      COLD_NODE                WRITE_LIFE_NONE
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.
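
Editorial note: the "fs-based" policy for metadata, node and buffered data writes reduces to a lookup on page type and temperature. The sketch below is derived from the table only. `page_kind`, `temp`, `life_hint` and `fs_mode_block_hint()` are hypothetical names, and the direct I/O rows (where the user's hint is passed through) are deliberately left out.

```c
/* Local mirrors of the names used in the table; the numeric values are arbitrary. */
enum page_kind { KIND_DATA, KIND_NODE, KIND_META };
enum temp { TEMP_HOT, TEMP_WARM, TEMP_COLD };
enum life_hint { LIFE_NOT_SET, LIFE_NONE, LIFE_SHORT, LIFE_MEDIUM, LIFE_LONG, LIFE_EXTREME };

/* "Block" column in fs-based mode for META, NODE and buffered DATA writes. */
static enum life_hint fs_mode_block_hint(enum page_kind kind, enum temp temp)
{
	switch (kind) {
	case KIND_META:
		return LIFE_MEDIUM;
	case KIND_NODE:
		/* HOT_NODE and WARM_NODE share NOT_SET; only COLD_NODE differs. */
		return temp == TEMP_COLD ? LIFE_NONE : LIFE_NOT_SET;
	case KIND_DATA:
		if (temp == TEMP_HOT)
			return LIFE_SHORT;
		if (temp == TEMP_COLD)
			return LIFE_EXTREME;
		return LIFE_LONG; /* WARM_DATA */
	}
	return LIFE_NOT_SET; /* unreachable for valid input; keeps the function total */
}
```

Compared with "user-based" mode, the visible changes are that metadata and cold nodes now carry a hint and that warm buffered data is tagged WRITE_LIFE_LONG instead of WRITE_LIFE_NOT_SET.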

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
v2:
 - Change "case" statements to "if" statements in rw_hint_to_seg_type()

 fs/f2fs/f2fs.h    |  1 +
 fs/f2fs/segment.c | 57 ++-
 fs/f2fs/super.c   |  5 +
 3 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 475637d..8273bc7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1040,6 +1040,7 @@ enum {
 enum {
WHINT_MODE_OFF, /* not pass down write hints */
WHINT_MODE_USER,/* try to pass down hints given by users */
+   WHINT_MODE_FS,  /* pass down hints with F2FS policy */
 };
 
 struct f2fs_sb_info {
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 840c8ff..733a733 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2488,6 +2488,32 @@ int rw_hint_to_seg_type(enum rw_hint hint)
  * WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
  * WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
  *
+ * 3) whint_mode=fs-based. F2FS passes down hints with its policy.
+ *
+ * User                  F2FS                     Block
+ * ----                  ----                     -----
+ *                       META                     WRITE_LIFE_MEDIUM;
+ *                       HOT_NODE                 WRITE_LIFE_NOT_SET
+ *                       WARM_NODE                "
+ *                       COLD_NODE                WRITE_LIFE_NONE
+ * ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+ * extension list        "                        "
+ *
+ * -- buffered io
+ * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
+ * WRITE_LIFE_NONE       "                        "
+ * WRITE_LIFE_MEDIUM     "                        "
+ * WRITE_LIFE_LONG       "                        "
+ *
+ * -- direct io
+ * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+ * WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
+ * WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
+ * WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
  */
 
 enum rw_hint io_type_to_rw_hint(struct f2fs_sb_info *sbi,
@@ -2495,20 +2521,33 @@ enum rw_hint io_type_to_rw_hint(struct f2fs_sb_info *sbi,
 {
if (sbi->whint_mode == WHINT_MODE_USER) {
if (type == DATA) {
-   switch (temp) {
-   case COLD:
-   return WRITE_LIFE_EXTREME;
-   case HOT:
-   return WRITE_LIFE_SHORT;
-   default:
+   if (temp == WARM)
return WRITE_LIFE_NOT_SET;
-   }
+   else if (temp == HOT)
+   return WRITE_LIFE_SHORT;
+   else if (temp == COLD)
+   return WRITE_LIFE_EXTREME;
} else {
   

[PATCH v2 0/3] f2fs: support passing down write hints to block layer

2018-01-30 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Changes since version 1:
 - Set 'whint_mode' to off if 'active_logs' is two or four 
 - Minor fixes suggested by Chao

This series implements passing write hints down to the block layer using
the mapping below. The mapping follows the conclusion of the discussion at
https://sourceforge.net/p/linux-f2fs/mailman/message/36170969/
with two exceptions: (1) the 'iohint_mode' mount option is renamed to
'whint_mode'; (2) in "user-based" mode, WRITE_LIFE_EXTREME is passed down
instead of WRITE_LIFE_NOT_SET for files flagged with ioctl(COLD) or matched
by the extension list.

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given by
users.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_NOT_SET
                      HOT_NODE                 "
                      WARM_NODE                "
                      COLD_NODE                "
*ioctl(COLD)          COLD_DATA                WRITE_LIFE_EXTREME
*extension list       "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

3) whint_mode=fs-based. F2FS passes down hints with its policy.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_MEDIUM;
                      HOT_NODE                 WRITE_LIFE_NOT_SET
                      WARM_NODE                "
                      COLD_NODE                WRITE_LIFE_NONE
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Hyunchul Lee (3):
  f2fs: support passing down write hints given by users to block layer
  f2fs: support passing down write hints to block layer with F2FS policy
  f2fs: Add the 'whint_mode' mount option to f2fs documentation

 Documentation/filesystems/f2fs.txt |  6 +++
 fs/f2fs/data.c                     | 28 +--
 fs/f2fs/f2fs.h                     | 10
 fs/f2fs/segment.c                  | 98 ++
 fs/f2fs/super.c                    | 35 +-
 5 files changed, 171 insertions(+), 6 deletions(-)

-- 
1.9.1



Re: [f2fs-dev] [PATCH 1/3] f2fs: support passing down write hints given by users to block layer

2018-01-28 Thread Hyunchul Lee

On 01/26/2018 11:10 AM, Chao Yu wrote:
> On 2018/1/26 7:46, Hyunchul Lee wrote:
>> On 01/25/2018 05:01 PM, Chao Yu wrote:
>>> Hi Hyunchul,
>>>
>>> On 2018/1/25 10:47, Hyunchul Lee wrote:
>>>> Hi Chao,
>>>>
>>>> On 01/25/2018 12:32 AM, Chao Yu wrote:
>>>>> On 2018/1/22 18:49, Hyunchul Lee wrote:
>>>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>>>
>>>>>> Add the 'whint_mode' mount option that controls which write
>>>>>> hints are passed down to block layer. There are "off" and
>>>>>> "user-based" mode. The default mode is "off".
>>>>>>
>>>>>> 1) whint_mode=user-based. F2FS tries to pass down hints given
>>>>>> by users.
>>>>>
>>>>> Minor,
>>>>>
>>>>> 1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET
>>>>>
>>>>> 2) whint_mode=user-based. F2FS tries to pass down hints given by users.
>>>>> ...
>>>>>
>>>>
>>>> Okay, I will reflect this.
>>>>
>>>>> How about changing all comments and codes with above order?
>>>>>
>>>>>>
>>>>>> User                  F2FS                     Block
>>>>>> ----                  ----                     -----
>>>>>>                       META                     WRITE_LIFE_NOT_SET
>>>>>>                       HOT_NODE                 "
>>>>>>                       WARM_NODE                "
>>>>>>                       COLD_NODE                "
>>>>>> ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
>>>>>> extension list        "                        "
>>>>>>
>>>>>> -- buffered io
>>>>>> WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
>>>>>> WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
>>>>>> WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
>>>>>> WRITE_LIFE_NONE       "                        "
>>>>>> WRITE_LIFE_MEDIUM     "                        "
>>>>>> WRITE_LIFE_LONG       "                        "
>>>>>>
>>>>>> -- direct io
>>>>>> WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
>>>>>> WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
>>>>>> WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
>>>>>> WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
>>>>>> WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
>>>>>> WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
>>>>>>
>>>>>> 2) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
>>>>>>
>>>>>> Many thanks to Chao Yu and Jaegeuk Kim for comments to
>>>>>> implement this patch.
>>>>>>
>>>>>> Signed-off-by: Hyunchul Lee <cheol@lge.com>
>>>>>> ---
>>>>>>  fs/f2fs/data.c    | 27 -
>>>>>>  fs/f2fs/f2fs.h    |  9 +
>>>>>>  fs/f2fs/segment.c | 59 +++
>>>>>>  fs/f2fs/super.c   | 24 +-
>>>>>>  4 files changed, 113 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>> index 6cba74e..c76ddc2 100644
>>>>>> --- a/fs/f2fs/data.c
>>>>>> +++ b/fs/f2fs/data.c
>>>>>> @@ -175,15 +175,22 @@ static bool __same_bdev(struct f2fs_sb_info *sbi,
>>>>>>   */
>>>>>>  static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr,
>>>>>>  struct writeback_control *wbc,
>>>>>> -int npages, bool is_read)
>>>>>> +int npages, bool is_read,
>>>>>> +enum page_type type, enum temp_type temp)
>>>

Re: [f2fs-dev] [PATCH 1/3] f2fs: support passing down write hints given by users to block layer

2018-01-25 Thread Hyunchul Lee
On 01/25/2018 05:01 PM, Chao Yu wrote:
> Hi Hyunchul,
> 
> On 2018/1/25 10:47, Hyunchul Lee wrote:
>> Hi Chao,
>>
>> On 01/25/2018 12:32 AM, Chao Yu wrote:
>>> On 2018/1/22 18:49, Hyunchul Lee wrote:
>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>
>>>> Add the 'whint_mode' mount option that controls which write
>>>> hints are passed down to block layer. There are "off" and
>>>> "user-based" mode. The default mode is "off".
>>>>
>>>> 1) whint_mode=user-based. F2FS tries to pass down hints given
>>>> by users.
>>>
>>> Minor,
>>>
>>> 1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET
>>>
>>> 2) whint_mode=user-based. F2FS tries to pass down hints given by users.
>>> ...
>>>
>>
>> Okay, I will reflect this.
>>
>>> How about changing all comments and code to follow the above order?
>>>
>>>>
>>>> User                  F2FS                     Block
>>>> ----                  ----                     -----
>>>>                       META                     WRITE_LIFE_NOT_SET
>>>>                       HOT_NODE                 "
>>>>                       WARM_NODE                "
>>>>                       COLD_NODE                "
>>>> ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
>>>> extension list        "                        "
>>>>
>>>> -- buffered io
>>>> WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
>>>> WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
>>>> WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
>>>> WRITE_LIFE_NONE       "                        "
>>>> WRITE_LIFE_MEDIUM     "                        "
>>>> WRITE_LIFE_LONG       "                        "
>>>>
>>>> -- direct io
>>>> WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
>>>> WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
>>>> WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
>>>> WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
>>>> WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
>>>> WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
>>>>
>>>> 2) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
>>>>
>>>> Many thanks to Chao Yu and Jaegeuk Kim for comments to
>>>> implement this patch.
>>>>
>>>> Signed-off-by: Hyunchul Lee <cheol@lge.com>
>>>> ---
>>>>  fs/f2fs/data.c| 27 -
>>>>  fs/f2fs/f2fs.h|  9 +
>>>>  fs/f2fs/segment.c | 59 
>>>> +++
>>>>  fs/f2fs/super.c   | 24 +-
>>>>  4 files changed, 113 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>> index 6cba74e..c76ddc2 100644
>>>> --- a/fs/f2fs/data.c
>>>> +++ b/fs/f2fs/data.c
>>>> @@ -175,15 +175,22 @@ static bool __same_bdev(struct f2fs_sb_info *sbi,
>>>>   */
>>>>  static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr,
>>>>struct writeback_control *wbc,
>>>> -  int npages, bool is_read)
>>>> +  int npages, bool is_read,
>>>> +  enum page_type type, enum temp_type temp)
>>>>  {
>>>>struct bio *bio;
>>>>  
>>>>bio = f2fs_bio_alloc(sbi, npages, true);
>>>>  
>>>>f2fs_target_device(sbi, blk_addr, bio);
>>>> -  bio->bi_end_io = is_read ? f2fs_read_end_io : f2fs_write_end_io;
>>>> -  bio->bi_private = is_read ? NULL : sbi;
>>>> +  if (is_read) {
>>>> +  bio->bi_end_io = f2fs_read_end_io;
>>>> +  bio->bi_private = NULL;
>>>> +  } else {
>>>> +  bio->bi_end_io = f2fs_write_end_io;
>>>> +  bio->bi_private = sbi;
>>>> +  bio->bi_write_hint = io_type_to_rw_hint(sbi, type, temp);
>>>> +  }
>>>>if (wbc)
>>>>wbc_init_bio(wbc, bio);
>>&

Re: [f2fs-dev] [PATCH 1/3] f2fs: support passing down write hints given by users to block layer

2018-01-24 Thread Hyunchul Lee
Hi Chao,

On 01/25/2018 12:32 AM, Chao Yu wrote:
> On 2018/1/22 18:49, Hyunchul Lee wrote:
>> From: Hyunchul Lee <cheol@lge.com>
>>
>> Add the 'whint_mode' mount option that controls which write
>> hints are passed down to block layer. There are "off" and
>> "user-based" mode. The default mode is "off".
>>
>> 1) whint_mode=user-based. F2FS tries to pass down hints given
>> by users.
> 
> Minor,
> 
> 1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET
> 
> 2) whint_mode=user-based. F2FS tries to pass down hints given by users.
> ...
> 

Okay, I will reflect this.

> How about changing all comments and code to follow the above order?
> 
>>
>> User                  F2FS                     Block
>> ----                  ----                     -----
>>                       META                     WRITE_LIFE_NOT_SET
>>                       HOT_NODE                 "
>>                       WARM_NODE                "
>>                       COLD_NODE                "
>> ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
>> extension list        "                        "
>>
>> -- buffered io
>> WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
>> WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
>> WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
>> WRITE_LIFE_NONE       "                        "
>> WRITE_LIFE_MEDIUM     "                        "
>> WRITE_LIFE_LONG       "                        "
>>
>> -- direct io
>> WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
>> WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
>> WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
>> WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
>> WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
>> WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
>>
>> 2) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
>>
>> Many thanks to Chao Yu and Jaegeuk Kim for comments to
>> implement this patch.
>>
>> Signed-off-by: Hyunchul Lee <cheol@lge.com>
>> ---
>>  fs/f2fs/data.c| 27 -
>>  fs/f2fs/f2fs.h|  9 +
>>  fs/f2fs/segment.c | 59 
>> +++
>>  fs/f2fs/super.c   | 24 +-
>>  4 files changed, 113 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index 6cba74e..c76ddc2 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -175,15 +175,22 @@ static bool __same_bdev(struct f2fs_sb_info *sbi,
>>   */
>>  static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr,
>>  struct writeback_control *wbc,
>> -int npages, bool is_read)
>> +int npages, bool is_read,
>> +enum page_type type, enum temp_type temp)
>>  {
>>  struct bio *bio;
>>  
>>  bio = f2fs_bio_alloc(sbi, npages, true);
>>  
>>  f2fs_target_device(sbi, blk_addr, bio);
>> -bio->bi_end_io = is_read ? f2fs_read_end_io : f2fs_write_end_io;
>> -bio->bi_private = is_read ? NULL : sbi;
>> +if (is_read) {
>> +bio->bi_end_io = f2fs_read_end_io;
>> +bio->bi_private = NULL;
>> +} else {
>> +bio->bi_end_io = f2fs_write_end_io;
>> +bio->bi_private = sbi;
>> +bio->bi_write_hint = io_type_to_rw_hint(sbi, type, temp);
>> +}
>>  if (wbc)
>>  wbc_init_bio(wbc, bio);
>>  
>> @@ -382,7 +389,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
>>  
>>  /* Allocate a new bio */
>>  bio = __bio_alloc(fio->sbi, fio->new_blkaddr, fio->io_wbc,
>> -1, is_read_io(fio->op));
>> +1, is_read_io(fio->op), fio->type, fio->temp);
>>  
>>  if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
>>  bio_put(bio);
>> @@ -445,7 +452,8 @@ int f2fs_submit_page_write(struct f2fs_io_info *fio)
>>  goto out_fail;
>>  }
>>  io->bio = __bio_alloc(sbi, fio->new_blkaddr, fio->io_wbc,
>> -BIO_MAX_PAGES, false);
>> +  

[PATCH 2/3] f2fs: support passing down write hints to block layer with F2FS policy

2018-01-22 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Add the 'whint_mode=fs-based' mount option. In this mode, F2FS passes
down write hints according to its own policy.

* whint_mode=fs-based. F2FS passes down hints according to its own policy.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_MEDIUM
                      HOT_NODE                 WRITE_LIFE_NOT_SET
                      WARM_NODE                "
                      COLD_NODE                WRITE_LIFE_NONE
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
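
For readers who want to check the fs-based rows by hand, here is a minimal
userspace sketch of the META/NODE/DATA part of the table. It is not the
kernel code: the enums are stand-ins with illustrative values, and
fs_based_hint() is a hypothetical helper name. The direct io rows, where
the user's hint is passed through, are not modelled.

```c
/* Stand-ins for the kernel enums; the numeric values are illustrative. */
enum page_type { DATA, NODE, META };
enum temp_type { HOT, WARM, COLD };
enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Hypothetical helper: hint passed to the block layer in fs-based mode. */
static enum rw_hint fs_based_hint(enum page_type type, enum temp_type temp)
{
	switch (type) {
	case DATA:
		if (temp == COLD)
			return WRITE_LIFE_EXTREME;
		if (temp == HOT)
			return WRITE_LIFE_SHORT;
		return WRITE_LIFE_LONG;		/* WARM_DATA */
	case NODE:
		/* Only COLD_NODE is singled out; HOT and WARM stay NOT_SET. */
		return temp == COLD ? WRITE_LIFE_NONE : WRITE_LIFE_NOT_SET;
	case META:
		return WRITE_LIFE_MEDIUM;
	}
	return WRITE_LIFE_NOT_SET;
}
```

Calling fs_based_hint(DATA, WARM) gives WRITE_LIFE_LONG, which is the one
buffered io row that differs from "user-based" mode.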

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 fs/f2fs/f2fs.h|  1 +
 fs/f2fs/segment.c | 49 +++--
 fs/f2fs/super.c   |  5 +
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index d7c2797..898f37d 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1038,6 +1038,7 @@ enum {
 enum {
 	WHINT_MODE_OFF,		/* not pass down write hints */
 	WHINT_MODE_USER,	/* try to pass down hints given by users */
+	WHINT_MODE_FS,		/* pass down hints with F2FS policy */
 };
 
 struct f2fs_sb_info {
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 8bc1fc1..b7cae61 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2491,6 +2491,32 @@ int rw_hint_to_seg_type(enum rw_hint hint)
  *
  * 2) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
  *
+ * 3) whint_mode=fs-based. F2FS passes down hints with its policy.
+ *
+ * User                  F2FS                     Block
+ * ----                  ----                     -----
+ *                       META                     WRITE_LIFE_MEDIUM;
+ *                       HOT_NODE                 WRITE_LIFE_NOT_SET
+ *                       WARM_NODE                "
+ *                       COLD_NODE                WRITE_LIFE_NONE
+ * ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+ * extension list        "                        "
+ *
+ * -- buffered io
+ * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
+ * WRITE_LIFE_NONE       "                        "
+ * WRITE_LIFE_MEDIUM     "                        "
+ * WRITE_LIFE_LONG       "                        "
+ *
+ * -- direct io
+ * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+ * WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
+ * WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
+ * WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
  */
 
 enum rw_hint io_type_to_rw_hint(struct f2fs_sb_info *sbi,
@@ -2509,9 +2535,28 @@ enum rw_hint io_type_to_rw_hint(struct f2fs_sb_info *sbi,
 		} else {
 			return WRITE_LIFE_NOT_SET;
 		}
-	} else {
-		return WRITE_LIFE_NOT_SET;
+	} else if (sbi->whint_mode == WHINT_MODE_FS) {
+		if (type == DATA) {
+			switch (temp) {
+			case COLD:
+				return WRITE_LIFE_EXTREME;
+			case HOT:
+				return WRITE_LIFE_SHORT;
+			default:
+				return WRITE_LIFE_LONG;
+			}
+		} else if (type == NODE) {
+			switch (temp) {
+			case COLD:
+				return WRITE_LIFE_NONE;
+			default:
+				return WRITE_LIFE_NOT_SET;
+			}
+		} else if (type == META) {
+			return WRITE_LIFE_MEDIUM;
+ 

[PATCH 0/3] f2fs: support passing down write hints to block layer

2018-01-22 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

This set implements passing down write hints to the block layer with the
following mapping. The mapping follows the conclusion of the discussion at
https://sourceforge.net/p/linux-f2fs/mailman/message/36170969/

There are two exceptions, however. (1) The 'iohint_mode' mount option is
renamed to 'whint_mode'. (2) In "user-based" mode, WRITE_LIFE_EXTREME is
passed down instead of WRITE_LIFE_NOT_SET for files flagged with ioctl(COLD)
or matched by the extension list.

Sorry for the late patch.

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given by
users.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_NOT_SET
                      HOT_NODE                 "
                      WARM_NODE                "
                      COLD_NODE                "
*ioctl(COLD)          COLD_DATA                WRITE_LIFE_EXTREME
*extension list       "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

3) whint_mode=fs-based. F2FS passes down hints with its policy.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_MEDIUM
                      HOT_NODE                 WRITE_LIFE_NOT_SET
                      WARM_NODE                "
                      COLD_NODE                WRITE_LIFE_NONE
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
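
The buffered io rows for modes 1) and 2) can be read as two steps: the
user's hint first selects a data temperature, and the temperature then
selects the hint sent to the block layer. The sketch below models only
that reading in userspace. The enums are stand-ins with illustrative
values, and hint_to_temp() and buffered_data_hint() are hypothetical
names, not functions from this series.

```c
/* Stand-ins for the kernel enums; the numeric values are illustrative. */
enum temp_type { HOT, WARM, COLD };
enum whint_mode { WHINT_MODE_OFF, WHINT_MODE_USER };
enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Step 1: user hint -> F2FS data temperature (middle column). */
static enum temp_type hint_to_temp(enum rw_hint hint)
{
	switch (hint) {
	case WRITE_LIFE_EXTREME:
		return COLD;
	case WRITE_LIFE_SHORT:
		return HOT;
	default:
		return WARM;	/* NOT_SET, NONE, MEDIUM and LONG */
	}
}

/* Step 2: temperature -> block hint for buffered data writes. */
static enum rw_hint buffered_data_hint(enum whint_mode mode,
				       enum rw_hint user_hint)
{
	if (mode == WHINT_MODE_OFF)
		return WRITE_LIFE_NOT_SET;

	switch (hint_to_temp(user_hint)) {
	case COLD:
		return WRITE_LIFE_EXTREME;
	case HOT:
		return WRITE_LIFE_SHORT;
	default:
		return WRITE_LIFE_NOT_SET;
	}
}
```

Because the temperature has only three values, NONE, MEDIUM and LONG
cannot be told apart on the buffered path, which is why those rows end in
a ditto mark while the direct io rows keep the user's value.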

Hyunchul Lee (3):
  f2fs: support passing down write hints given by users to block layer
  f2fs: support passing down write hints to block layer with F2FS policy
  f2fs: Add the 'whint_mode' mount option to f2fs documentation

 Documentation/filesystems/f2fs.txt |   6 +++
 fs/f2fs/data.c |  27 --
 fs/f2fs/f2fs.h |  10 
 fs/f2fs/segment.c  | 104 +
 fs/f2fs/super.c|  29 ++-
 5 files changed, 170 insertions(+), 6 deletions(-)

-- 
1.9.1



[PATCH 3/3] f2fs: Add the 'whint_mode' mount option to f2fs documentation

2018-01-22 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 Documentation/filesystems/f2fs.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index 13c2ff0..414a160 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -174,6 +174,12 @@ offgrpjquota   Turn off group journelled quota.
 offprjjquota           Turn off project journelled quota.
 quota                  Enable plain user disk quota accounting.
 noquota                Disable all plain disk quota option.
+whint_mode=%s          Control which write hints are passed down to block
+                       layer. This supports "off", "user-based", and
+                       "fs-based".  In "off" mode (default), f2fs does not pass
+                       down hints. In "user-based" mode, f2fs tries to pass
+                       down hints given by users. And in "fs-based" mode, f2fs
+                       passes down hints with its policy.
 
 

 DEBUGFS ENTRIES
-- 
1.9.1



[PATCH 1/3] f2fs: support passing down write hints given by users to block layer

2018-01-22 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Add the 'whint_mode' mount option, which controls which write
hints are passed down to the block layer. There are two modes, "off"
and "user-based". The default mode is "off".

1) whint_mode=user-based. F2FS tries to pass down hints given
by users.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_NOT_SET
                      HOT_NODE                 "
                      WARM_NODE                "
                      COLD_NODE                "
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

2) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
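
For context, the "hints given by users" in the User column are the ones an
application attaches to a file with fcntl(F_SET_RW_HINT), available since
Linux 4.13. The sketch below shows that call from userspace. The wrapper
names set_write_hint() and get_write_hint() are hypothetical, and the
fallback defines are only there for libc headers that predate the
constants.

```c
#include <fcntl.h>
#include <stdint.h>

/* Fallbacks for older libc headers; values as in include/uapi/linux/fcntl.h. */
#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT 1035
#endif
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT 1036
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif

/* Attach a write-lifetime hint to the inode behind fd; 0 on success. */
static int set_write_hint(int fd, uint64_t hint)
{
	return fcntl(fd, F_SET_RW_HINT, &hint);
}

/* Read the hint back; 0 on success, with the value stored in *hint. */
static int get_write_hint(int fd, uint64_t *hint)
{
	return fcntl(fd, F_GET_RW_HINT, hint);
}
```

With whint_mode=user-based, a file tagged RWH_WRITE_LIFE_SHORT this way
would land in the WRITE_LIFE_SHORT row of the table above; with
whint_mode=off the hint is not propagated to the block layer.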

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 fs/f2fs/data.c| 27 -
 fs/f2fs/f2fs.h|  9 +
 fs/f2fs/segment.c | 59 +++
 fs/f2fs/super.c   | 24 +-
 4 files changed, 113 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6cba74e..c76ddc2 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -175,15 +175,22 @@ static bool __same_bdev(struct f2fs_sb_info *sbi,
  */
 static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr,
 				struct writeback_control *wbc,
-				int npages, bool is_read)
+				int npages, bool is_read,
+				enum page_type type, enum temp_type temp)
 {
 	struct bio *bio;
 
 	bio = f2fs_bio_alloc(sbi, npages, true);
 
 	f2fs_target_device(sbi, blk_addr, bio);
-	bio->bi_end_io = is_read ? f2fs_read_end_io : f2fs_write_end_io;
-	bio->bi_private = is_read ? NULL : sbi;
+	if (is_read) {
+		bio->bi_end_io = f2fs_read_end_io;
+		bio->bi_private = NULL;
+	} else {
+		bio->bi_end_io = f2fs_write_end_io;
+		bio->bi_private = sbi;
+		bio->bi_write_hint = io_type_to_rw_hint(sbi, type, temp);
+	}
 	if (wbc)
 		wbc_init_bio(wbc, bio);
 
@@ -382,7 +389,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
 
 	/* Allocate a new bio */
 	bio = __bio_alloc(fio->sbi, fio->new_blkaddr, fio->io_wbc,
-				1, is_read_io(fio->op));
+				1, is_read_io(fio->op), fio->type, fio->temp);
 
 	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
 		bio_put(bio);
@@ -445,7 +452,8 @@ int f2fs_submit_page_write(struct f2fs_io_info *fio)
 			goto out_fail;
 		}
 		io->bio = __bio_alloc(sbi, fio->new_blkaddr, fio->io_wbc,
-						BIO_MAX_PAGES, false);
+						BIO_MAX_PAGES, false,
+						fio->type, fio->temp);
 		io->fio = *fio;
 	}
 
@@ -2287,10 +2295,12 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct address_space *mapping = iocb->ki_filp->f_mapping;
 	struct inode *inode = mapping->host;
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	size_t count = iov_iter_count(iter);
 	loff_t offset = iocb->ki_pos;
 	int rw = iov_iter_rw(iter);
 	int err;
+	enum rw_hint hint;
 
 	err = check_direct_IO(inode, iter, offset);
 	if (err)
@@ -2301,11 +2311,18 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 	trace_f2fs_direct_IO_enter(inode, offset, count, rw);
 
+	if (rw == WRITE && sbi->whint_mode == WHINT_MODE_OFF) {
+		hint = iocb->ki_hint;
+		iocb->ki_hint = WRITE_LIFE_NOT_SET;
+	}
+
 	down_read(&F2FS_I(inode)->dio_rwsem[rw]);
 	err = blockdev_direct_IO(
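
(Editor's sketch, not part of the patch.) The whint_mode table in the commit message above can be read as three small functions. The following is a user-space model only, assuming the table is complete. The enum values are redefined locally with the kernel's names, WHINT_MODE_USER_BASED is a name chosen here for illustration (only WHINT_MODE_OFF is visible in the diff), and the function names are hypothetical.

```c
#include <assert.h>

/* Same names and order as the kernel's enum rw_hint, redefined for user space. */
enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

enum page_type { DATA, NODE, META };
enum temp_type { HOT, WARM, COLD };

/* WHINT_MODE_OFF is in the patch; WHINT_MODE_USER_BASED is an assumed name. */
enum whint_mode { WHINT_MODE_OFF, WHINT_MODE_USER_BASED };

/* "F2FS" column: which data log a user hint selects. */
enum temp_type data_temp_for_user_hint(enum rw_hint user)
{
	switch (user) {
	case WRITE_LIFE_SHORT:
		return HOT;
	case WRITE_LIFE_EXTREME:
		return COLD;
	default:
		return WARM;
	}
}

/* "Block" column for buffered writes. */
enum rw_hint buffered_block_hint(enum whint_mode mode,
				 enum page_type type, enum temp_type temp)
{
	if (mode == WHINT_MODE_OFF || type != DATA)
		return WRITE_LIFE_NOT_SET;	/* META and every NODE log */
	switch (temp) {
	case HOT:
		return WRITE_LIFE_SHORT;
	case COLD:
		return WRITE_LIFE_EXTREME;
	default:
		return WRITE_LIFE_NOT_SET;	/* WARM_DATA */
	}
}

/* "Block" column for direct writes: the user's hint is kept unless the mode is off. */
enum rw_hint direct_block_hint(enum whint_mode mode, enum rw_hint user)
{
	return mode == WHINT_MODE_OFF ? WRITE_LIFE_NOT_SET : user;
}
```

Read this way, the only difference between buffered and direct IO in user-based mode is the WARM_DATA log: buffered writes collapse NONE, MEDIUM and LONG to WRITE_LIFE_NOT_SET, while direct writes pass them through.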

Re: [f2fs-dev] [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-12-27 Thread Hyunchul Lee
Hi Jaegeuk,

On 12/28/2017 12:26 PM, Jaegeuk Kim wrote:
> On 12/23, Chao Yu wrote:
>> On 2017/12/15 10:06, Jaegeuk Kim wrote:
>>> On 12/14, Hyunchul Lee wrote:
>>>> Hi Jaegeuk,
>>>>
>>>> I need your comment about the fs_iohint mount option.
>>>>
>>>> a) w/o fs_iohint, propagate user hints to the lower layer.
>>>> b) w/ fs_iohint, ignore user hints, and use hints generated
>>>> by F2FS.
>>>>
>>>> Chao suggests this option because user hints are more accurate than
>>>> the file system's.
>>>>
>>>> This is reasonable, but I have some concerns about this option.
>>>> The first is that blocks of a segment can have different hints. This
>>>> could make GC less effective.
>>>> The second is whether the separation between LIFE_MEDIUM and LIFE_LONG is
>>>> really needed. I think the difference between them is a little ambiguous
>>>> for users, and LIFE_SHORT and LIFE_EXTREME are converted to different
>>>> hints by F2FS.
>>>
>>> I think what we really can do is assign the many user hints to our 3 DATA
>>> logs, like rw_hint_to_seg_type(), since they are just hints for user data.
>>> Then, we can decide how to keep that as much as possible, since we have
>>> other filesystem metadata such as meta and nodes. In addition, I don't
>>> think we have to keep the original user hints, which would mess up the
>>> F2FS logs.
>>>
>>> With that in mind, I can think of the cases below. In particular, if users
>>> want to keep their io_hints, we'd better recommend using direct_io w/o
>>> fs_iohints.
>>
>>
>>
>>> In order to keep this policy, I think fs_iohints would be better to be a
>>> feature set by mkfs.f2fs and detected by sysfs entries for users.
>>>
>>> 1) w/ fs_iohints
>>>
>>> User                  F2FS         Block
>>> ----------------------------------------------------
>>>                       Meta         WRITE_LIFE_MEDIUM
>>>                       HOT_NODE     WRITE_LIFE_NOT_SET
>>>                       WARM_NODE    -'
>>>                       COLD_NODE    WRITE_LIFE_NONE
>>> ioctl(cold)           COLD_DATA    WRITE_LIFE_EXTREME
>>> extension list        -'           -'
>>> WRITE_LIFE_EXTREME    -'           -'
>>> WRITE_LIFE_SHORT      HOT_DATA     WRITE_LIFE_SHORT
>>>
>>> -- buffered_io
>>> WRITE_LIFE_NOT_SET    WARM_DATA    WRITE_LIFE_LONG
>>> WRITE_LIFE_NONE       -'           -'
>>> WRITE_LIFE_MEDIUM     -'           -'
>>> WRITE_LIFE_LONG       -'           -'
>>>
>>> -- direct_io (Not recommendable)
>>> WRITE_LIFE_NOT_SET    WARM_DATA    WRITE_LIFE_NOT_SET
>>> WRITE_LIFE_NONE       -'           WRITE_LIFE_NONE
>>> WRITE_LIFE_MEDIUM     -'           WRITE_LIFE_MEDIUM
>>> WRITE_LIFE_LONG       -'           WRITE_LIFE_LONG
>>
>> Agreed with above IO hint mapping rule.
>>
>>>
>>> 2) w/o fs_iohints
>>>
>>> User                  F2FS         Block
>>> ----------------------------------------------------
>>>                       Meta         -
>>>                       HOT_NODE     -
>>>                       WARM_NODE    -
>>>                       COLD_NODE    -
>>> ioctl(cold)           COLD_DATA    -
>>> extension list        -'           -
>>>
>>> -- buffered_io
>>> WRITE_LIFE_EXTREME    COLD_DATA    -
>>> WRITE_LIFE_SHORT      HOT_DATA     -
>>> WRITE_LIFE_NOT_SET    WARM_DATA    -
>>> WRITE_LIFE_NONE       -'           -
>>> WRITE_LIFE_MEDIUM     -'           -
>>> WRITE_LIFE_LONG       -'           -
>>
>> Now we recommend direct_io if users want to give IO hints to storage; I
>> suspect that users would suffer a performance regression w/o buffered IO.
>>
>> Another problem is that, now, in Android, it will be very hard to prompt
>> application to migrate their IO pattern from buffered IO to direct IO, one
>> possible way is distinguishing user data lifetime from FWK, e.g. set
>> WRITE_LIFE_SH

Re: [f2fs-dev] [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-12-17 Thread Hyunchul Lee
Hi Jaegeuk,

Agreed. If Chao agrees with this policy, I will implement it.

Thanks for the comment.
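
(Editor's sketch.) The policy agreed to here is the pair of tables in Jaegeuk's quoted message below. As a rough illustration, it can be modeled as a lookup table plus two exceptions for direct IO. This is a user-space model, not kernel code: the function name proposed_block_hint is hypothetical, the enums are redefined locally, and a "-" in the Block column is modeled as WRITE_LIFE_NOT_SET, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Same names and order as the kernel's enum rw_hint, redefined for user space. */
enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

enum page_type { DATA, NODE, META };
enum temp_type { HOT, WARM, COLD };

/* Rows: page type. Columns: HOT, WARM, COLD. Values from the "w/ fs_iohints" table. */
static const enum rw_hint fs_iohint_table[3][3] = {
	[DATA] = { WRITE_LIFE_SHORT,   WRITE_LIFE_LONG,    WRITE_LIFE_EXTREME },
	[NODE] = { WRITE_LIFE_NOT_SET, WRITE_LIFE_NOT_SET, WRITE_LIFE_NONE },
	[META] = { WRITE_LIFE_MEDIUM,  WRITE_LIFE_MEDIUM,  WRITE_LIFE_MEDIUM },
};

/* Hypothetical helper: hint handed to the block layer under the proposal. */
enum rw_hint proposed_block_hint(bool fs_iohints, bool direct_io,
				 enum page_type type, enum temp_type temp,
				 enum rw_hint user)
{
	if (fs_iohints) {
		/* Only direct writes to WARM_DATA keep the user's hint. */
		if (direct_io && type == DATA && temp == WARM)
			return user;
		return fs_iohint_table[type][temp];
	}
	/* w/o fs_iohints: direct data writes carry the user's hint ... */
	if (direct_io && type == DATA)
		return user;
	/* ... and "-" is modeled as "no hint" (assumption). */
	return WRITE_LIFE_NOT_SET;
}
```

The table form makes the trade-off visible: with fs_iohints every log gets one fixed hint, so a segment never mixes hints except for direct writes into WARM_DATA.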

On 12/15/2017 11:06 AM, Jaegeuk Kim wrote:
> On 12/14, Hyunchul Lee wrote:
>> Hi Jaegeuk,
>>
>> I need your comment about the fs_iohint mount option.
>>
>> a) w/o fs_iohint, propagate user hints to the lower layer.
>> b) w/ fs_iohint, ignore user hints, and use hints generated
>> by F2FS.
>>
>> Chao suggests this option because user hints are more accurate than
>> the file system's.
>>
>> This is reasonable, but I have some concerns about this option.
>> The first is that blocks of a segment can have different hints. This
>> could make GC less effective.
>> The second is whether the separation between LIFE_MEDIUM and LIFE_LONG is
>> really needed. I think the difference between them is a little ambiguous
>> for users, and LIFE_SHORT and LIFE_EXTREME are converted to different
>> hints by F2FS.
> 
> I think what we really can do is assign the many user hints to our 3 DATA
> logs, like rw_hint_to_seg_type(), since they are just hints for user data.
> Then, we can decide how to keep that as much as possible, since we have
> other filesystem metadata such as meta and nodes. In addition, I don't
> think we have to keep the original user hints, which would mess up the
> F2FS logs.
>
> With that in mind, I can think of the cases below. In particular, if users
> want to keep their io_hints, we'd better recommend using direct_io w/o
> fs_iohints. To keep this policy, I think fs_iohints would be better as a
> feature set by mkfs.f2fs and detected by users through sysfs entries.
> 
> 1) w/ fs_iohints
> 
> User                  F2FS         Block
> ----------------------------------------------------
>                       Meta         WRITE_LIFE_MEDIUM
>                       HOT_NODE     WRITE_LIFE_NOT_SET
>                       WARM_NODE    -'
>                       COLD_NODE    WRITE_LIFE_NONE
> ioctl(cold)           COLD_DATA    WRITE_LIFE_EXTREME
> extension list        -'           -'
> WRITE_LIFE_EXTREME    -'           -'
> WRITE_LIFE_SHORT      HOT_DATA     WRITE_LIFE_SHORT
>
> -- buffered_io
> WRITE_LIFE_NOT_SET    WARM_DATA    WRITE_LIFE_LONG
> WRITE_LIFE_NONE       -'           -'
> WRITE_LIFE_MEDIUM     -'           -'
> WRITE_LIFE_LONG       -'           -'
>
> -- direct_io (Not recommendable)
> WRITE_LIFE_NOT_SET    WARM_DATA    WRITE_LIFE_NOT_SET
> WRITE_LIFE_NONE       -'           WRITE_LIFE_NONE
> WRITE_LIFE_MEDIUM     -'           WRITE_LIFE_MEDIUM
> WRITE_LIFE_LONG       -'           WRITE_LIFE_LONG
> 
> 2) w/o fs_iohints
> 
> User                  F2FS         Block
> ----------------------------------------------------
>                       Meta         -
>                       HOT_NODE     -
>                       WARM_NODE    -
>                       COLD_NODE    -
> ioctl(cold)           COLD_DATA    -
> extension list        -'           -
>
> -- buffered_io
> WRITE_LIFE_EXTREME    COLD_DATA    -
> WRITE_LIFE_SHORT      HOT_DATA     -
> WRITE_LIFE_NOT_SET    WARM_DATA    -
> WRITE_LIFE_NONE       -'           -
> WRITE_LIFE_MEDIUM     -'           -
> WRITE_LIFE_LONG       -'           -
>
> -- direct_io
> WRITE_LIFE_EXTREME    COLD_DATA    WRITE_LIFE_EXTREME
> WRITE_LIFE_SHORT      HOT_DATA     WRITE_LIFE_SHORT
> WRITE_LIFE_NOT_SET    WARM_DATA    WRITE_LIFE_NOT_SET
> WRITE_LIFE_NONE       -'           WRITE_LIFE_NONE
> WRITE_LIFE_MEDIUM     -'           WRITE_LIFE_MEDIUM
> WRITE_LIFE_LONG       -'           WRITE_LIFE_LONG
> 
> 
> Note that I don't much care about how the nvme driver manipulates streamid
> in terms of LIFE_NONE or LIFE_NOTSET, since other drivers can handle them
> in different ways. Taking a look at the definition, at least, we don't need
> to assume that those are the same at all. For example, if we can exploit this
> in the UFS driver, we can pass all the stream ids to the device as context ids.
> 
> Thanks,
> 
>>
>> Thanks.
>>
>> On 12/12/2017 11:45 AM, Chao Yu wrote:
>>> Hi Hyunchul,
>>>
>>> On 2017/12/12 10:15, Hyunchul Lee wrote:
>>>> Hi Chao,
>>>>
>>>> On 12/11/2017 10:15 PM, Chao Yu wro

Re: [f2fs-dev] [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-12-13 Thread Hyunchul Lee
Hi Jaegeuk,

I need your comment about the fs_iohint mount option.

a) w/o fs_iohint, propagate user hints to the lower layer.
b) w/ fs_iohint, ignore user hints, and use hints generated
by F2FS.

Chao suggests this option because user hints are more accurate than
the file system's.

This is reasonable, but I have some concerns about this option.
The first is that blocks of a segment can have different hints. This
could make GC less effective.
The second is whether the separation between LIFE_MEDIUM and LIFE_LONG is
really needed. I think the difference between them is a little ambiguous
for users, and LIFE_SHORT and LIFE_EXTREME are converted to different
hints by F2FS.

Thanks.
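
(Editor's sketch.) The first concern can be made concrete with a few lines of C. The rw_hint_to_seg_type() body below is reconstructed from the tables in this thread (SHORT goes to the hot log, EXTREME to the cold log, everything else to the warm log), not copied from the kernel tree, and count_hints_per_log() is a hypothetical helper. Under those assumptions, propagating user hints unchanged lets the WARM_DATA log hold blocks with four different hints.

```c
#include <assert.h>

/* Same names and order as the kernel's enum rw_hint, redefined for user space. */
enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

enum seg_type { CURSEG_HOT_DATA, CURSEG_WARM_DATA, CURSEG_COLD_DATA };

#define NR_DATA_LOGS 3

/* Reconstructed from the thread's tables: which data log a user hint selects. */
int rw_hint_to_seg_type(enum rw_hint hint)
{
	switch (hint) {
	case WRITE_LIFE_SHORT:
		return CURSEG_HOT_DATA;
	case WRITE_LIFE_EXTREME:
		return CURSEG_COLD_DATA;
	default:
		return CURSEG_WARM_DATA;
	}
}

/* Count how many distinct user hints can land in each data log. */
void count_hints_per_log(int counts[NR_DATA_LOGS])
{
	for (int log = 0; log < NR_DATA_LOGS; log++)
		counts[log] = 0;
	for (int h = WRITE_LIFE_NOT_SET; h <= WRITE_LIFE_EXTREME; h++)
		counts[rw_hint_to_seg_type((enum rw_hint)h)]++;
}
```

With this mapping the hot and cold logs each see exactly one hint, so only the warm log would mix hints within a segment.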

On 12/12/2017 11:45 AM, Chao Yu wrote:
> Hi Hyunchul,
> 
> On 2017/12/12 10:15, Hyunchul Lee wrote:
>> Hi Chao,
>>
>> On 12/11/2017 10:15 PM, Chao Yu wrote:
>>> Hi Hyunchul,
>>>
>>> On 2017/12/1 16:28, Hyunchul Lee wrote:
>>>> Hi Chao,
>>>>
>>>> On 11/30/2017 04:06 PM, Chao Yu wrote:
>>>>> Hi Hyunchul,
>>>>>
>>>>> On 2017/11/28 8:23, Hyunchul Lee wrote:
>>>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>>>
>>>>>> This defines which hint is passed down to the block layer
>>>>>> for data from each segment type.
>>>>>>
>>>>>> segment type             hints
>>>>>> ------------             -----
>>>>>> COLD_NODE & COLD_DATA    WRITE_LIFE_EXTREME
>>>>>> WARM_DATA                WRITE_LIFE_NONE
>>>>>> HOT_NODE & WARM_NODE     WRITE_LIFE_LONG
>>>>>> HOT_DATA                 WRITE_LIFE_MEDIUM
>>>>>> META_DATA                WRITE_LIFE_SHORT
>>>>>
>>>>> Just noticed: if our user does not give a hint via ioctl, f2fs can
>>>>> provide a hint to the lower layer according to its hot/cold separation
>>>>> ability, and that will be okay. But once the user gives a hint, which
>>>>> may be more accurate than the filesystem's, the hint converted by f2fs
>>>>> may be wrong.
>>>>>
>>>>> So what do you think of adding an option to control whether the
>>>>> filesystem can convert the hint given by the user?
>>>>>
>>>>
>>>> I think it is okay for LIFE_SHORT and LIFE_EXTREME, because they are
>>>> converted to different hints.
>>>
>>> What I mean is introducing a mount option, e.g. fs_iohint,
>>> a) w/o fs_iohint, propagate file/inode io_hint to low layer.
>>> b) w/ fs_iohint, ignore file/inode io_hint, use io_hint which is generated
>>> with filesystem's private rule.
>>>
>>
>> Okay, I will implement this option and send this patch again.
> 
> Let's wait for Jaegeuk's comments first?
> 
>>
>> Without fs_iohint, even if data blocks are moved due to GC,
>> we should keep user hints. And if user hints are not given,
>> no hints are passed down to the block layer, right?
> 
> Hmm.. that will be a problem. IMO, we can store the user's last io_hint in
> the inode layout, so later when we trigger GC, we can use the last io_hint
> in the inode rather than giving no hint or the fs' hint.
>
> I think this needs to be discussed with the original author of IO hint: what
> is the IO hint policy when the filesystem moves blocks by itself after the
> inode has been released in the system?
> 
> Thanks,
> 
>>
>> Thank you for comments.
>>
>>> Thanks,
>>>
>>>>
>>>> file hint      segment type    io hint
>>>> ---------      ------------    -------
>>>> LIFE_SHORT     HOT_DATA        LIFE_MEDIUM
>>>> LIFE_MEDIUM    WARM_DATA       LIFE_NONE
>>>> LIFE_LONG      WARM_DATA       LIFE_NONE
>>>> LIFE_EXTREME   COLD_DATA       LIFE_EXTREME
>>>>
>>>> The problem is that LIFE_MEDIUM and LIFE_LONG are converted to
>>>> the same hint, LIFE_NONE. I am not sure that the separation between
>>>> LIFE_MEDIUM and LIFE_LONG is really needed, because I guess that the
>>>> difference between them is a little ambiguous for users, and if a
>>>> WARM_DATA segment has two different hints, it can make GC inefficient.
>>>>
>>>> I wonder your thought about this.
>>>>
>>>> Thanks.
>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>
>>>> --
>>>> Check out the vibrant tech community on one of the world's most
>>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>> ___
>>>> Linux-f2fs-devel mailing list
>>>> linux-f2fs-de...@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>>>>
>>>
>>
>> .
>>
> 
> 
> 


Re: [f2fs-dev] [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-12-11 Thread Hyunchul Lee
Hi Chao,

On 12/11/2017 10:15 PM, Chao Yu wrote:
> Hi Hyunchul,
> 
> On 2017/12/1 16:28, Hyunchul Lee wrote:
>> Hi Chao,
>>
>> On 11/30/2017 04:06 PM, Chao Yu wrote:
>>> Hi Hyunchul,
>>>
>>> On 2017/11/28 8:23, Hyunchul Lee wrote:
>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>
>>>> This defines which hint is passed down to the block layer
>>>> for data from each segment type.
>>>>
>>>> segment type             hints
>>>> ------------             -----
>>>> COLD_NODE & COLD_DATA    WRITE_LIFE_EXTREME
>>>> WARM_DATA                WRITE_LIFE_NONE
>>>> HOT_NODE & WARM_NODE     WRITE_LIFE_LONG
>>>> HOT_DATA                 WRITE_LIFE_MEDIUM
>>>> META_DATA                WRITE_LIFE_SHORT
>>>
>>> Just noticed: if our user does not give a hint via ioctl, f2fs can
>>> provide a hint to the lower layer according to its hot/cold separation
>>> ability, and that will be okay. But once the user gives a hint, which
>>> may be more accurate than the filesystem's, the hint converted by f2fs
>>> may be wrong.
>>>
>>> So what do you think of adding an option to control whether the
>>> filesystem can convert the hint given by the user?
>>>
>>
>> I think it is okay for LIFE_SHORT and LIFE_EXTREME, because they are
>> converted to different hints.
> 
> What I mean is introducing a mount option, e.g. fs_iohint,
> a) w/o fs_iohint, propagate file/inode io_hint to low layer.
> b) w/ fs_iohint, ignore file/inode io_hint, use io_hint which is generated
> with filesystem's private rule.
> 

Okay, I will implement this option and send this patch again.

Without fs_iohint, even if data blocks are moved due to GC,
we should keep user hints. And if user hints are not given,
no hints are passed down to the block layer, right?

Thank you for comments.
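
(Editor's sketch.) For context on what a "user hint" is: on Linux 4.13 and later an application can attach a write-lifetime hint to an inode with fcntl(F_SET_RW_HINT) and read it back with F_GET_RW_HINT. The example below is a minimal user-space round trip. The helper names and the temporary file path are made up for illustration, and the fallback #defines mirror the uapi values for libc headers that do not provide them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values as in the Linux uapi <linux/fcntl.h>. */
#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT 1035	/* F_LINUX_SPECIFIC_BASE (1024) + 11 */
#define F_SET_RW_HINT 1036	/* F_LINUX_SPECIFIC_BASE (1024) + 12 */
#endif
#ifndef RWH_WRITE_LIFE_NOT_SET
#define RWH_WRITE_LIFE_NOT_SET	0
#define RWH_WRITE_LIFE_NONE	1
#define RWH_WRITE_LIFE_SHORT	2
#define RWH_WRITE_LIFE_MEDIUM	3
#define RWH_WRITE_LIFE_LONG	4
#define RWH_WRITE_LIFE_EXTREME	5
#endif

/*
 * Set a lifetime hint on the inode behind fd and read it back.
 * Returns the hint the kernel reports, or -errno on failure.
 */
int set_and_get_write_hint(int fd, uint64_t hint)
{
	uint64_t readback = 0;

	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
		return -errno;
	if (fcntl(fd, F_GET_RW_HINT, &readback) < 0)
		return -errno;
	return (int)readback;
}

/* Hypothetical demo: mark a scratch file as short-lived, then clean up. */
int demo_hint_roundtrip(void)
{
	char path[] = "/tmp/rwhint-XXXXXX";
	int fd = mkstemp(path);
	int rc;

	if (fd < 0)
		return -errno;
	rc = set_and_get_write_hint(fd, RWH_WRITE_LIFE_SHORT);
	close(fd);
	unlink(path);
	return rc;
}
```

The hint set this way lives on the in-memory inode, which is why the question above about what GC should use once the inode is gone is a real one. On kernels or sandboxes without support the call fails and the helper returns a negative errno value.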

> Thanks,
> 
>>
>> file hint      segment type    io hint
>> ---------      ------------    -------
>> LIFE_SHORT     HOT_DATA        LIFE_MEDIUM
>> LIFE_MEDIUM    WARM_DATA       LIFE_NONE
>> LIFE_LONG      WARM_DATA       LIFE_NONE
>> LIFE_EXTREME   COLD_DATA       LIFE_EXTREME
>>
>> The problem is that LIFE_MEDIUM and LIFE_LONG are converted to
>> the same hint, LIFE_NONE. I am not sure that the separation between
>> LIFE_MEDIUM and LIFE_LONG is really needed, because I guess that the
>> difference between them is a little ambiguous for users, and if a
>> WARM_DATA segment has two different hints, it can make GC inefficient.
>>
>> I wonder your thought about this.
>>
>> Thanks.
>>
>>> Thanks,
>>>
>>>
>>
>>
> 


Re: [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-12-01 Thread Hyunchul Lee
Hi Jaegeuk,

On 12/01/2017 04:28 PM, Jaegeuk Kim wrote:
> On 11/30, Chao Yu wrote:
>> On 2017/11/28 8:23, Hyunchul Lee wrote:
>>> From: Hyunchul Lee <cheol@lge.com>
>>>
>>> This defines which hint is passed down to the block layer
>>> for data from each segment type.
>>>
>>> segment type             hints
>>> ------------             -----
>>> COLD_NODE & COLD_DATA    WRITE_LIFE_EXTREME
>>> WARM_DATA                WRITE_LIFE_NONE
>>> HOT_NODE & WARM_NODE     WRITE_LIFE_LONG
>>> HOT_DATA                 WRITE_LIFE_MEDIUM
>>> META_DATA                WRITE_LIFE_SHORT
>>
>> The correspondence looks good to me.
> 
> Still, I don't fully get the point. Why do we have to assign WRITE_LIFE_NONE
> to WARM_DATA? Why not use WRITE_LIFE_NOT_SET?
> 

I think LIFE_NONE and LIFE_NOT_SET are the same, so I simply chose LIFE_NONE.

Thanks.
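
(Editor's sketch.) For reference, the two hints being compared are separate constants in the user-visible interface, with values 0 and 1, and fcntl(2) describes them differently: "not set" is the default when no hint was given, while "none" says that no specific lifetime is associated with the file. The snippet below only records those values and paraphrased meanings. The shortened LIFE_* names and the hint_meaning() helper are illustrative, and how a given driver treats the two is not something this sketch establishes.

```c
#include <assert.h>
#include <string.h>

/* Values as defined for RWH_WRITE_LIFE_* in the Linux uapi <linux/fcntl.h>. */
enum {
	LIFE_NOT_SET = 0,
	LIFE_NONE    = 1,
	LIFE_SHORT   = 2,
	LIFE_MEDIUM  = 3,
	LIFE_LONG    = 4,
	LIFE_EXTREME = 5,
};

/* Paraphrase of the fcntl(2) descriptions; hypothetical helper. */
const char *hint_meaning(int hint)
{
	switch (hint) {
	case LIFE_NOT_SET:
		return "no hint has been set (default)";
	case LIFE_NONE:
		return "no specific write lifetime";
	case LIFE_SHORT:
		return "short lifetime";
	case LIFE_MEDIUM:
		return "medium lifetime";
	case LIFE_LONG:
		return "long lifetime";
	case LIFE_EXTREME:
		return "extremely long lifetime";
	default:
		return "unknown";
	}
}
```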

>>>
>>> Many thanks to Chao Yu and Jaegeuk Kim for comments to
>>> implement this patch.
>>>
>>> Signed-off-by: Hyunchul Lee <cheol@lge.com>
>>> ---
>>>  fs/f2fs/data.c|  1 +
>>>  fs/f2fs/f2fs.h|  2 ++
>>>  fs/f2fs/segment.c | 29 +
>>>  fs/f2fs/super.c   |  2 ++
>>>  4 files changed, 34 insertions(+)
>>>
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index a7ae418..0919a43 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -437,6 +437,7 @@ int f2fs_submit_page_write(struct f2fs_io_info *fio)
>>>  		}
>>>  		io->bio = __bio_alloc(sbi, fio->new_blkaddr,
>>>  						BIO_MAX_PAGES, false);
>>> +		io->bio->bi_write_hint = io->write_hint;
>>
>> Need to assign bi_write_hint for IPU path?
>>
>> - rewrite_data_page
>>  - f2fs_submit_page_bio
>>
>>>  		io->fio = *fio;
>>>  	}
>>>  
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 7bcd148..be3cb0c 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -969,6 +969,7 @@ struct f2fs_bio_info {
>>>  	struct rw_semaphore io_rwsem;	/* blocking op for bio */
>>>  	spinlock_t io_lock;		/* serialize DATA/NODE IOs */
>>>  	struct list_head io_list;	/* track fios */
>>> +	enum rw_hint write_hint;
>>
>> Add missing comment?
>>
>>>  };
>>>  
>>>  #define FDEV(i)				(sbi->devs[i])
>>> @@ -2674,6 +2675,7 @@ int lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>>>  int __init create_segment_manager_caches(void);
>>>  void destroy_segment_manager_caches(void);
>>>  int rw_hint_to_seg_type(enum rw_hint hint);
>>> +enum rw_hint io_type_to_rw_hint(enum page_type page, enum temp_type temp);
>>>  
>>>  /*
>>>   * checkpoint.c
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index c117e09..0570db7 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -2446,6 +2446,35 @@ int rw_hint_to_seg_type(enum rw_hint hint)
>>>  	}
>>>  }
>>>  
>>
>> It will be better to copy commit log here to declare correspondence
>> more clearly.
>>
>> Thanks,
>>
>>> +enum rw_hint io_type_to_rw_hint(enum page_type page, enum temp_type temp)
>>> +{
>>> +   if (page == DATA) {
>>> +   switch (temp) {
>>> +   case WARM:
>>> +   return WRITE_LIFE_NONE;
>>> +   case COLD:
>>> +   return WRITE_LIFE_EXTREME;
>>> +   case HOT:
>>> +   return WRITE_LIFE_MEDIUM;
>>> +   default:
>>> +   return WRITE_LIFE_NOT_SET;
>>> +   }
>>> +   } else if (page == NODE) {
>>> +   switch (temp) {
>>> +   case WARM:
>>> +   case HOT:
>>> +   return WRITE_LIFE_LONG;
>>> +   case COLD:
>>> +   return WRITE_LIFE_EXTREME;
>>> +   default:
>>> +   return WRITE_LIFE_NOT_SET;
>>> +   }
>>> +
>>> +   } else {
>>> +   return WRITE_LIFE_SHORT;
>>> +   }
>>> +}
>>> +
>>>  static int __get_segment_type_2(struct f2fs_io_info *fio)
>>>  {
>>> if (fio->type == DATA)
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index a6c5dd4..51c19a0 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -2508,6 +2508,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>> sbi->write_io[i][j].bio = NULL;
>>> spin_lock_init(&sbi->write_io[i][j].io_lock);
>>> INIT_LIST_HEAD(&sbi->write_io[i][j].io_list);
>>> +   sbi->write_io[i][j].write_hint =
>>> +   io_type_to_rw_hint(i, j);
>>> }
>>> }
>>>  
>>>
> 
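
For reference on the LIFE_NONE versus LIFE_NOT_SET question above: at the
fcntl() interface the two are distinct values (RWH_WRITE_LIFE_NOT_SET is 0,
RWH_WRITE_LIFE_NONE is 1). Below is a minimal userspace sketch of setting and
reading a per-inode write hint, assuming Linux 4.13 or later. The fallback
#defines mirror the uapi header for older libc headers, and set_write_hint()
and get_write_hint() are hypothetical helper names, not part of any patch in
this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>

/* Fallbacks for libc headers that predate write hints
 * (values as in include/uapi/linux/fcntl.h). */
#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE 1024
#endif
#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT (F_LINUX_SPECIFIC_BASE + 11)
#define F_SET_RW_HINT (F_LINUX_SPECIFIC_BASE + 12)
#endif
#ifndef RWH_WRITE_LIFE_NOT_SET
#define RWH_WRITE_LIFE_NOT_SET 0
#define RWH_WRITE_LIFE_NONE    1
#define RWH_WRITE_LIFE_SHORT   2
#define RWH_WRITE_LIFE_MEDIUM  3
#define RWH_WRITE_LIFE_LONG    4
#define RWH_WRITE_LIFE_EXTREME 5
#endif

/* Hypothetical helper: set the inode write hint of an open file.
 * Returns 0 on success, -1 (with errno set) if the request fails. */
int set_write_hint(int fd, uint64_t hint)
{
	assert(hint <= RWH_WRITE_LIFE_EXTREME);
	return fcntl(fd, F_SET_RW_HINT, &hint) < 0 ? -1 : 0;
}

/* Hypothetical helper: read the inode write hint back into *hint. */
int get_write_hint(int fd, uint64_t *hint)
{
	return fcntl(fd, F_GET_RW_HINT, hint) < 0 ? -1 : 0;
}
```

A caller would typically open a file, call set_write_hint(fd,
RWH_WRITE_LIFE_SHORT) before writing, and treat a -1 return as "hints not
supported here" rather than as a fatal error.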


Re: [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-12-01 Thread Hyunchul Lee
Hi Chao,

On 11/30/2017 04:06 PM, Chao Yu wrote:
> Hi Hyunchul,
> 
> On 2017/11/28 8:23, Hyunchul Lee wrote:
>> From: Hyunchul Lee <cheol@lge.com>
>>
>> This implements which hint is passed down to block layer
>> for datas from the specific segment type.
>>
>> segment type            hints
>> ------------            -----
>> COLD_NODE & COLD_DATA   WRITE_LIFE_EXTREME
>> WARM_DATA               WRITE_LIFE_NONE
>> HOT_NODE & WARM_NODE    WRITE_LIFE_LONG
>> HOT_DATA                WRITE_LIFE_MEDIUM
>> META_DATA               WRITE_LIFE_SHORT
> 
> Just noticed: if the user does not give a hint via ioctl, f2fs can
> provide a hint to the lower layer according to its hot/cold separation
> ability, and that will be okay. But once the user gives a hint, which may
> be more accurate than the filesystem's, the hint converted by f2fs may be
> wrong.
> 
> So what do you think of adding an option to control whether the filesystem
> can convert the hint the user gave?
> 

I think it is okay for LIFE_SHORT and LIFE_EXTREME, because they are
converted to different hints.

file hint      segment type   io hint
---------      ------------   -------
LIFE_SHORT     HOT_DATA       LIFE_MEDIUM
LIFE_MEDIUM    WARM_DATA      LIFE_NONE
LIFE_LONG      WARM_DATA      LIFE_NONE
LIFE_EXTREME   COLD_DATA      LIFE_EXTREME

The problem is that LIFE_MEDIUM and LIFE_LONG are converted to
the same hint, LIFE_NONE. I am not sure that the separation between
LIFE_MEDIUM and LIFE_LONG is really needed, because I guess that the
difference between them is a little ambiguous for users, and if the
WARM_DATA segment has two different hints, it can make GC inefficient.

I would like to hear your thoughts about this.

Thanks.

> Thanks,
> 
> 
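
The conversion table in the reply above can be checked mechanically by
composing the two mappings. The following is a standalone sketch, not kernel
code: the enums are simplified stand-ins for the kernel's rw_hint and f2fs
temperature types, and file_hint_to_temp(), data_temp_to_io_hint(),
file_hint_to_io_hint() and hints_collide() are hypothetical names that mirror
only the data-segment rows of the tables quoted in this thread.

```c
#include <assert.h>

/* Stand-in for the kernel's enum rw_hint. */
enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Stand-in for the f2fs data segment temperature. */
enum temp_type { HOT, WARM, COLD };

/* File hint -> data segment type (first two columns of the table). */
enum temp_type file_hint_to_temp(enum rw_hint hint)
{
	switch (hint) {
	case WRITE_LIFE_SHORT:
		return HOT;
	case WRITE_LIFE_EXTREME:
		return COLD;
	default:
		return WARM;
	}
}

/* Data segment type -> hint passed down to the block layer. */
enum rw_hint data_temp_to_io_hint(enum temp_type temp)
{
	switch (temp) {
	case HOT:
		return WRITE_LIFE_MEDIUM;
	case COLD:
		return WRITE_LIFE_EXTREME;
	default:
		return WRITE_LIFE_NONE;
	}
}

/* Composition: the hint the device sees for a given file hint. */
enum rw_hint file_hint_to_io_hint(enum rw_hint hint)
{
	return data_temp_to_io_hint(file_hint_to_temp(hint));
}

/* Returns 1 if two distinct file hints collapse to the same io hint. */
int hints_collide(enum rw_hint a, enum rw_hint b)
{
	assert(a != b);
	return file_hint_to_io_hint(a) == file_hint_to_io_hint(b);
}
```

With these definitions, hints_collide(WRITE_LIFE_MEDIUM, WRITE_LIFE_LONG)
reports a collision, which is the case the reply is asking about.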


[PATCH v2] f2fs: apply write hints to select the type of segment for direct write

2017-11-27 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

When blocks are allocated for direct write, select the type of
segment using the kiocb hint. But if an inode has FI_NO_ALLOC,
use the inode hint.

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
v2: 
- Add a new member, m_seg_type to struct f2fs_map_blocks.
- Assign the segment type to m_seg_type and pass it to f2fs_map_blocks().
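
A rough standalone sketch of the selection this patch makes, for readers
skimming the diff: direct writes derive the segment type from the write hint,
while the other callers shown in the diff pass NO_CHECK_TYPE. The enum values
and the pick_seg_type() helper are illustrative assumptions and do not appear
in the patch; only the names CURSEG_*_DATA, NO_CHECK_TYPE and
rw_hint_to_seg_type() come from it, and the body of rw_hint_to_seg_type() here
follows the "file hint / segment type" table in the [PATCH 2/2] message.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's enum rw_hint. */
enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Stand-in for the f2fs current-segment types used for data. */
enum seg_type {
	CURSEG_HOT_DATA,
	CURSEG_WARM_DATA,
	CURSEG_COLD_DATA,
	NO_CHECK_TYPE,	/* value the patch passes on non-direct paths */
};

/* Assumed mapping: SHORT -> hot, EXTREME -> cold, everything else -> warm. */
int rw_hint_to_seg_type(enum rw_hint hint)
{
	switch (hint) {
	case WRITE_LIFE_SHORT:
		return CURSEG_HOT_DATA;
	case WRITE_LIFE_EXTREME:
		return CURSEG_COLD_DATA;
	default:
		return CURSEG_WARM_DATA;
	}
}

/* Hypothetical helper: the value stored in m_seg_type before mapping. */
int pick_seg_type(bool is_direct, enum rw_hint hint)
{
	assert(hint <= WRITE_LIFE_EXTREME);
	return is_direct ? rw_hint_to_seg_type(hint) : NO_CHECK_TYPE;
}
```
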

 fs/f2fs/data.c | 26 ++
 fs/f2fs/f2fs.h |  2 ++
 fs/f2fs/file.c |  6 --
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 516fa0d..a7ae418 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -783,7 +783,7 @@ struct page *get_new_data_page(struct inode *inode,
return page;
 }
 
-static int __allocate_data_block(struct dnode_of_data *dn)
+static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
 {
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
struct f2fs_summary sum;
@@ -808,7 +808,7 @@ static int __allocate_data_block(struct dnode_of_data *dn)
set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version);
 
allocate_data_block(sbi, NULL, dn->data_blkaddr, &dn->data_blkaddr,
-   &sum, CURSEG_WARM_DATA, NULL, false);
+   &sum, seg_type, NULL, false);
set_data_blkaddr(dn);
 
/* update i_size */
@@ -851,12 +851,15 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
map.m_len = 0;
 
map.m_next_pgofs = NULL;
+   map.m_seg_type = NO_CHECK_TYPE;
 
-   if (iocb->ki_flags & IOCB_DIRECT)
+   if (iocb->ki_flags & IOCB_DIRECT) {
+   map.m_seg_type = rw_hint_to_seg_type(iocb->ki_hint);
return f2fs_map_blocks(inode, &map, 1,
__force_buffered_io(inode, WRITE) ?
F2FS_GET_BLOCK_PRE_AIO :
F2FS_GET_BLOCK_PRE_DIO);
+   }
if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
err = f2fs_convert_inline_inode(inode);
if (err)
@@ -960,7 +963,8 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
last_ofs_in_node = dn.ofs_in_node;
}
} else {
-   err = __allocate_data_block(&dn);
+   err = __allocate_data_block(&dn,
+   map->m_seg_type);
if (!err)
set_inode_flag(inode, FI_APPEND_WRITE);
}
@@ -1053,7 +1057,7 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 
 static int __get_data_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create, int flag,
-   pgoff_t *next_pgofs)
+   pgoff_t *next_pgofs, int seg_type)
 {
struct f2fs_map_blocks map;
int err;
@@ -1061,6 +1065,7 @@ static int __get_data_block(struct inode *inode, sector_t iblock,
map.m_lblk = iblock;
map.m_len = bh->b_size >> inode->i_blkbits;
map.m_next_pgofs = next_pgofs;
+   map.m_seg_type = seg_type;
 
err = f2fs_map_blocks(inode, &map, create, flag);
if (!err) {
@@ -1076,14 +1081,17 @@ static int get_data_block(struct inode *inode, sector_t iblock,
pgoff_t *next_pgofs)
 {
return __get_data_block(inode, iblock, bh_result, create,
-   flag, next_pgofs);
+   flag, next_pgofs,
+   NO_CHECK_TYPE);
 }
 
 static int get_data_block_dio(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
 {
return __get_data_block(inode, iblock, bh_result, create,
-   F2FS_GET_BLOCK_DEFAULT, NULL);
+   F2FS_GET_BLOCK_DEFAULT, NULL,
+   rw_hint_to_seg_type(
+   inode->i_write_hint));
 }
 
 static int get_data_block_bmap(struct inode *inode, sector_t iblock,
@@ -1094,7 +1102,8 @@ static int get_data_block_bmap(struct inode *inode, sector_t iblock,
return -EFBIG;
 
return __get_data_block(inode, iblock, bh_result, create,
-   F2FS_GET_BLOCK_BMAP, NULL);
+   F2FS_GET_BLOCK_BMAP, NULL,
+   NO_CHECK_TYPE);
 }
 
 static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)

[PATCH 2/2] f2fs: pass down write hints to block layer for direct write

2017-11-27 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

This implements which hint is passed down to block layer
for direct write.

                     (allocated
file hint            segment type)   io hint
---------            -------------   -------
WRITE_LIFE_SHORT     HOT_DATA        WRITE_LIFE_MEDIUM
WRITE_LIFE_EXTREME   COLD_DATA       WRITE_LIFE_EXTREME
others               WARM_DATA       WRITE_LIFE_NONE

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 fs/f2fs/data.c|  6 ++
 fs/f2fs/f2fs.h|  1 +
 fs/f2fs/segment.c | 12 
 3 files changed, 19 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 0919a43..eabd569 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2093,6 +2093,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
loff_t offset = iocb->ki_pos;
int rw = iov_iter_rw(iter);
int err;
+   enum rw_hint orig_hint;
 
err = check_direct_IO(inode, iter, offset);
if (err)
@@ -2103,10 +2104,15 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
trace_f2fs_direct_IO_enter(inode, offset, count, rw);
 
+   orig_hint = iocb->ki_hint;
+   iocb->ki_hint = file_rwhint_to_io_rwhint(iocb->ki_hint);
+
down_read(&F2FS_I(inode)->dio_rwsem[rw]);
err = blockdev_direct_IO(iocb, inode, iter, get_data_block_dio);
up_read(&F2FS_I(inode)->dio_rwsem[rw]);
 
+   iocb->ki_hint = orig_hint;
+
if (rw == WRITE) {
if (err > 0) {
f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_IO,
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index be3cb0c..426625a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2676,6 +2676,7 @@ int lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
 void destroy_segment_manager_caches(void);
 int rw_hint_to_seg_type(enum rw_hint hint);
 enum rw_hint io_type_to_rw_hint(enum page_type page, enum temp_type temp);
+enum rw_hint file_rwhint_to_io_rwhint(enum rw_hint hint);
 
 /*
  * checkpoint.c
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 0570db7..b4496a5 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2475,6 +2475,18 @@ enum rw_hint io_type_to_rw_hint(enum page_type page, enum temp_type temp)
}
 }
 
+enum rw_hint file_rwhint_to_io_rwhint(enum rw_hint hint)
+{
+   switch(hint) {
+   case WRITE_LIFE_SHORT:
+   return WRITE_LIFE_MEDIUM;
+   case WRITE_LIFE_EXTREME:
+   return WRITE_LIFE_EXTREME;
+   default:
+   return WRITE_LIFE_NONE;
+   }
+}
+
 static int __get_segment_type_2(struct f2fs_io_info *fio)
 {
if (fio->type == DATA)
-- 
1.9.1
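
The f2fs_direct_IO() hunk above temporarily replaces iocb->ki_hint with the
converted hint and restores the original value afterwards. A standalone sketch
of that save/override/restore pattern follows. struct fake_kiocb, submit_fn,
direct_io_with_hint() and report_hint() are hypothetical stand-ins for the
kernel's struct kiocb and blockdev_direct_IO(), used only so the pattern
compiles outside the kernel; file_rwhint_to_io_rwhint() is copied from the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's enum rw_hint. */
enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Hypothetical, heavily reduced stand-in for struct kiocb. */
struct fake_kiocb {
	enum rw_hint ki_hint;
};

/* Hypothetical stand-in for the direct I/O submission call. */
typedef long (*submit_fn)(struct fake_kiocb *iocb);

/* Same mapping as in the patch. */
enum rw_hint file_rwhint_to_io_rwhint(enum rw_hint hint)
{
	switch (hint) {
	case WRITE_LIFE_SHORT:
		return WRITE_LIFE_MEDIUM;
	case WRITE_LIFE_EXTREME:
		return WRITE_LIFE_EXTREME;
	default:
		return WRITE_LIFE_NONE;
	}
}

/* Submit with the converted hint, then restore the caller's hint. */
long direct_io_with_hint(struct fake_kiocb *iocb, submit_fn submit)
{
	enum rw_hint orig_hint;
	long ret;

	assert(iocb != NULL && submit != NULL);
	orig_hint = iocb->ki_hint;
	iocb->ki_hint = file_rwhint_to_io_rwhint(iocb->ki_hint);
	ret = submit(iocb);
	iocb->ki_hint = orig_hint;
	return ret;
}

/* Example callback: reports the hint visible during submission. */
long report_hint(struct fake_kiocb *iocb)
{
	return (long)iocb->ki_hint;
}
```

Restoring the field keeps the file-level hint intact for later use of the
same kiocb, while the submission itself sees only the converted io hint.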



[PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-11-27 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

This implements which hint is passed down to the block layer
for data from a specific segment type.

segment type            hints
------------            -----
COLD_NODE & COLD_DATA   WRITE_LIFE_EXTREME
WARM_DATA               WRITE_LIFE_NONE
HOT_NODE & WARM_NODE    WRITE_LIFE_LONG
HOT_DATA                WRITE_LIFE_MEDIUM
META_DATA               WRITE_LIFE_SHORT

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 fs/f2fs/data.c|  1 +
 fs/f2fs/f2fs.h|  2 ++
 fs/f2fs/segment.c | 29 +
 fs/f2fs/super.c   |  2 ++
 4 files changed, 34 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a7ae418..0919a43 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -437,6 +437,7 @@ int f2fs_submit_page_write(struct f2fs_io_info *fio)
}
io->bio = __bio_alloc(sbi, fio->new_blkaddr,
BIO_MAX_PAGES, false);
+   io->bio->bi_write_hint = io->write_hint;
io->fio = *fio;
}
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 7bcd148..be3cb0c 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -969,6 +969,7 @@ struct f2fs_bio_info {
struct rw_semaphore io_rwsem;   /* blocking op for bio */
spinlock_t io_lock; /* serialize DATA/NODE IOs */
struct list_head io_list;   /* track fios */
+   enum rw_hint write_hint;
 };
 
 #define FDEV(i)(sbi->devs[i])
@@ -2674,6 +2675,7 @@ int lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
 int __init create_segment_manager_caches(void);
 void destroy_segment_manager_caches(void);
 int rw_hint_to_seg_type(enum rw_hint hint);
+enum rw_hint io_type_to_rw_hint(enum page_type page, enum temp_type temp);
 
 /*
  * checkpoint.c
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c117e09..0570db7 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2446,6 +2446,35 @@ int rw_hint_to_seg_type(enum rw_hint hint)
}
 }
 
+enum rw_hint io_type_to_rw_hint(enum page_type page, enum temp_type temp)
+{
+   if (page == DATA) {
+   switch (temp) {
+   case WARM:
+   return WRITE_LIFE_NONE;
+   case COLD:
+   return WRITE_LIFE_EXTREME;
+   case HOT:
+   return WRITE_LIFE_MEDIUM;
+   default:
+   return WRITE_LIFE_NOT_SET;
+   }
+   } else if (page == NODE) {
+   switch (temp) {
+   case WARM:
+   case HOT:
+   return WRITE_LIFE_LONG;
+   case COLD:
+   return WRITE_LIFE_EXTREME;
+   default:
+   return WRITE_LIFE_NOT_SET;
+   }
+
+   } else {
+   return WRITE_LIFE_SHORT;
+   }
+}
+
 static int __get_segment_type_2(struct f2fs_io_info *fio)
 {
if (fio->type == DATA)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index a6c5dd4..51c19a0 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2508,6 +2508,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
sbi->write_io[i][j].bio = NULL;
spin_lock_init(&sbi->write_io[i][j].io_lock);
INIT_LIST_HEAD(&sbi->write_io[i][j].io_list);
+   sbi->write_io[i][j].write_hint =
+   io_type_to_rw_hint(i, j);
}
}
 
-- 
1.9.1
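
The f2fs_fill_super() hunk above stores one write hint per (page type,
temperature) log at mount time, so the submit path only copies a stored value
into the bio. A standalone sketch of that precompute-then-lookup idea follows.
The enum definitions, the hint_table array, init_hint_table() and
lookup_hint() are simplified assumptions for illustration; only
io_type_to_rw_hint() follows the patch.

```c
#include <assert.h>

/* Stand-in for the kernel's enum rw_hint. */
enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Simplified stand-ins for the f2fs page and temperature types. */
enum page_type { DATA, NODE, META, NR_PAGE_TYPE };
enum temp_type { HOT, WARM, COLD, NR_TEMP_TYPE };

/* Same mapping as in the patch. */
enum rw_hint io_type_to_rw_hint(enum page_type page, enum temp_type temp)
{
	if (page == DATA) {
		switch (temp) {
		case WARM:
			return WRITE_LIFE_NONE;
		case COLD:
			return WRITE_LIFE_EXTREME;
		case HOT:
			return WRITE_LIFE_MEDIUM;
		default:
			return WRITE_LIFE_NOT_SET;
		}
	} else if (page == NODE) {
		switch (temp) {
		case WARM:
		case HOT:
			return WRITE_LIFE_LONG;
		case COLD:
			return WRITE_LIFE_EXTREME;
		default:
			return WRITE_LIFE_NOT_SET;
		}
	} else {
		return WRITE_LIFE_SHORT;
	}
}

/* Hypothetical per-log table, playing the role of write_io[i][j].write_hint. */
static enum rw_hint hint_table[NR_PAGE_TYPE][NR_TEMP_TYPE];

/* Fill the table once, as the patch does while setting up write_io. */
void init_hint_table(void)
{
	for (int i = 0; i < NR_PAGE_TYPE; i++)
		for (int j = 0; j < NR_TEMP_TYPE; j++)
			hint_table[i][j] = io_type_to_rw_hint(
				(enum page_type)i, (enum temp_type)j);
}

/* Cheap lookup for the write path. */
enum rw_hint lookup_hint(enum page_type page, enum temp_type temp)
{
	assert(page < NR_PAGE_TYPE && temp < NR_TEMP_TYPE);
	return hint_table[page][temp];
}
```

This sketch fills every temperature slot for every page type for simplicity;
it makes no claim about how many logs f2fs actually allocates per page type.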



Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-19 Thread Hyunchul Lee
On 11/18/2017 03:53 AM, Jaegeuk Kim wrote:
> ...
>>>>>>>>>>>>>>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Using write hints[1], applications can inform the lifetime of the data
>>>>>>>>>>>>>>>>> written to devices, and this[2] reported that the write hints patch
>>>>>>>>>>>>>>>>> decreased writes in NAND by 25%.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> These hints help F2FS determine the following:
>>>>>>>>>>>>>>>>>   1) the segment types where the data will be written.
>>>>>>>>>>>>>>>>>   2) the hints that will be passed down to devices with the data of segments.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This patch set implements the first mapping, from write hints to segment types,
>>>>>>>>>>>>>>>>> as shown below.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   hints                 segment type
>>>>>>>>>>>>>>>>>   -----                 ------------
>>>>>>>>>>>>>>>>>   WRITE_LIFE_SHORT      CURSEG_COLD_DATA
>>>>>>>>>>>>>>>>>   WRITE_LIFE_EXTREME    CURSEG_HOT_DATA
>>>>>>>>>>>>>>>>>   others                CURSEG_WARM_DATA
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The F2FS policy for hot/cold separation has precedence over these hints, and
>>>>>>>>>>>>>>>>> hints are not applied in in-place updates.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Could we change to disable IPU if a file/inode write hint exists?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am afraid that this has side effects. For example, this could cause
>>>>>>>>>>>>>>> out-of-place updates even when there are not enough free segments.
>>>>>>>>>>>>>>> I can write a patch that handles these situations, but I wonder
>>>>>>>>>>>>>>> whether this is required, and I am not sure which IPU policies can be disabled.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Oh, as I replied in another thread, I think IPU just affects filesystem
>>>>>>>>>>>>>> hot/cold separation, rather than this feature. So I think it will be okay
>>>>>>>>>>>>>> not to consider it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Before the second mapping is implemented, write hints are not passed down
>>>>>>>>>>>>>>>>> to devices. Because it is better that the data of a segment have the same
>>>>>>

Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-15 Thread Hyunchul Lee
On 11/16/2017 12:58 PM, Jaegeuk Kim wrote:
> On 11/16, Chao Yu wrote:
>> On 2017/11/16 8:56, Hyunchul Lee wrote:
>>>
>>> On 11/16/2017 01:27 AM, Jaegeuk Kim wrote:
>>>> On 11/14, Chao Yu wrote:
>>>>> On 2017/11/14 12:20, Jaegeuk Kim wrote:
>>>>>> On 11/13, Hyunchul Lee wrote:
>>>>>>> On 11/13/2017 10:59 AM, Chao Yu wrote:
>>>>>>>> On 2017/11/13 9:35, Hyunchul Lee wrote:
>>>>>>>>> On 11/13/2017 10:26 AM, Chao Yu wrote:
>>>>>>>>>> On 2017/11/13 8:24, Hyunchul Lee wrote:
>>>>>>>>>>> On 11/10/2017 03:42 PM, Chao Yu wrote:
>>>>>>>>>>>> On 2017/11/10 8:23, Hyunchul Lee wrote:
>>>>>>>>>>>>> Hello, Chao
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/09/2017 06:12 PM, Chao Yu wrote:
>>>>>>>>>>>>>> On 2017/11/9 13:51, Hyunchul Lee wrote:
>>>>>>>>>>>>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Using write hints[1], applications can indicate the lifetime of
>>>>>>>>>>>>>>> the data written to devices, and [2] reported that the write
>>>>>>>>>>>>>>> hints patch decreased writes in NAND by 25%.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> These hints help F2FS determine the following:
>>>>>>>>>>>>>>>   1) the segment types where the data will be written.
>>>>>>>>>>>>>>>   2) the hints that will be passed down to devices with the
>>>>>>>>>>>>>>>      data of segments.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This patch set implements the first mapping from write hints to
>>>>>>>>>>>>>>> segment types, as shown below.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   hints               segment type
>>>>>>>>>>>>>>>   -----               ------------
>>>>>>>>>>>>>>>   WRITE_LIFE_SHORT    CURSEG_COLD_DATA
>>>>>>>>>>>>>>>   WRITE_LIFE_EXTREME  CURSEG_HOT_DATA
>>>>>>>>>>>>>>>   others              CURSEG_WARM_DATA
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The F2FS policy for hot/cold separation has precedence over
>>>>>>>>>>>>>>> these hints, and hints are not applied in in-place update.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could we disable IPU if a file/inode write hint exists?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am afraid that this has side effects. For example, this could
>>>>>>>>>>>>> cause out-of-place updates even when there are not enough free
>>>>>>>>>>>>> segments. I can write the patch that handles these situations.
>>>>>>>>>>>>> But I wonder whether this is required, and I am not sure which
>>>>>>>>>>>>> IPU policies can be disabled.
>>>>>>>>>>>>
>>>>>>>>>>>> Oh, as I replied in another thread, I think IPU just affects
>>>>>>>>>>>> filesystem hot/cold separating, rather than this feature. So I
>>>>>>>>>>>> think it will be okay to not consider it.

Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-15 Thread Hyunchul Lee

On 11/16/2017 01:27 AM, Jaegeuk Kim wrote:
> On 11/14, Chao Yu wrote:
>> On 2017/11/14 12:20, Jaegeuk Kim wrote:
>>> On 11/13, Hyunchul Lee wrote:
>>>> On 11/13/2017 10:59 AM, Chao Yu wrote:
>>>>> On 2017/11/13 9:35, Hyunchul Lee wrote:
>>>>>> On 11/13/2017 10:26 AM, Chao Yu wrote:
>>>>>>> On 2017/11/13 8:24, Hyunchul Lee wrote:
>>>>>>>> On 11/10/2017 03:42 PM, Chao Yu wrote:
>>>>>>>>> On 2017/11/10 8:23, Hyunchul Lee wrote:
>>>>>>>>>> Hello, Chao
>>>>>>>>>>
>>>>>>>>>> On 11/09/2017 06:12 PM, Chao Yu wrote:
>>>>>>>>>>> On 2017/11/9 13:51, Hyunchul Lee wrote:
>>>>>>>>>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Using write hints[1], applications can indicate the lifetime of
>>>>>>>>>>>> the data written to devices, and [2] reported that the write hints
>>>>>>>>>>>> patch decreased writes in NAND by 25%.
>>>>>>>>>>>>
>>>>>>>>>>>> These hints help F2FS determine the following:
>>>>>>>>>>>>   1) the segment types where the data will be written.
>>>>>>>>>>>>   2) the hints that will be passed down to devices with the data
>>>>>>>>>>>>      of segments.
>>>>>>>>>>>>
>>>>>>>>>>>> This patch set implements the first mapping from write hints to
>>>>>>>>>>>> segment types, as shown below.
>>>>>>>>>>>>
>>>>>>>>>>>>   hints               segment type
>>>>>>>>>>>>   -----               ------------
>>>>>>>>>>>>   WRITE_LIFE_SHORT    CURSEG_COLD_DATA
>>>>>>>>>>>>   WRITE_LIFE_EXTREME  CURSEG_HOT_DATA
>>>>>>>>>>>>   others              CURSEG_WARM_DATA
>>>>>>>>>>>>
>>>>>>>>>>>> The F2FS policy for hot/cold separation has precedence over these
>>>>>>>>>>>> hints, and hints are not applied in in-place update.
>>>>>>>>>>>
>>>>>>>>>>> Could we disable IPU if a file/inode write hint exists?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I am afraid that this has side effects. For example, this could
>>>>>>>>>> cause out-of-place updates even when there are not enough free
>>>>>>>>>> segments. I can write the patch that handles these situations. But
>>>>>>>>>> I wonder whether this is required, and I am not sure which IPU
>>>>>>>>>> policies can be disabled.
>>>>>>>>>
>>>>>>>>> Oh, as I replied in another thread, I think IPU just affects
>>>>>>>>> filesystem hot/cold separating, rather than this feature. So I think
>>>>>>>>> it will be okay to not consider it.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Before the second mapping is implemented, write hints are not
>>>>>>>>>>>> passed down to devices, because it is better that the data of a
>>>>>>>>>>>> segment have the same hint.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>>>>>>>>>>>> [2]: https://lwn.net/Articles/726477/
>>>>>>>>>>>
>>>>>>>>>>> Could you write a patch to support passing write hint to block 
>>>>>>>>>

Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-12 Thread Hyunchul Lee
On 11/13/2017 10:59 AM, Chao Yu wrote:
> On 2017/11/13 9:35, Hyunchul Lee wrote:
>> On 11/13/2017 10:26 AM, Chao Yu wrote:
>>> On 2017/11/13 8:24, Hyunchul Lee wrote:
>>>> On 11/10/2017 03:42 PM, Chao Yu wrote:
>>>>> On 2017/11/10 8:23, Hyunchul Lee wrote:
>>>>>> Hello, Chao
>>>>>>
>>>>>> On 11/09/2017 06:12 PM, Chao Yu wrote:
>>>>>>> On 2017/11/9 13:51, Hyunchul Lee wrote:
>>>>>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>>>>>
>>>>>>>> Using write hints[1], applications can indicate the lifetime of the
>>>>>>>> data written to devices, and [2] reported that the write hints patch
>>>>>>>> decreased writes in NAND by 25%.
>>>>>>>>
>>>>>>>> These hints help F2FS determine the following:
>>>>>>>>   1) the segment types where the data will be written.
>>>>>>>>   2) the hints that will be passed down to devices with the data of
>>>>>>>>      segments.
>>>>>>>>
>>>>>>>> This patch set implements the first mapping from write hints to
>>>>>>>> segment types, as shown below.
>>>>>>>>
>>>>>>>>   hints               segment type
>>>>>>>>   -----               ------------
>>>>>>>>   WRITE_LIFE_SHORT    CURSEG_COLD_DATA
>>>>>>>>   WRITE_LIFE_EXTREME  CURSEG_HOT_DATA
>>>>>>>>   others              CURSEG_WARM_DATA
>>>>>>>>
>>>>>>>> The F2FS policy for hot/cold separation has precedence over these
>>>>>>>> hints, and hints are not applied in in-place update.
>>>>>>>
>>>>>>> Could we disable IPU if a file/inode write hint exists?
>>>>>>>
>>>>>>
>>>>>> I am afraid that this has side effects. For example, this could cause
>>>>>> out-of-place updates even when there are not enough free segments.
>>>>>> I can write the patch that handles these situations. But I wonder
>>>>>> whether this is required, and I am not sure which IPU policies can be
>>>>>> disabled.
>>>>>
>>>>> Oh, as I replied in another thread, I think IPU just affects filesystem
>>>>> hot/cold separating, rather than this feature. So I think it will be okay
>>>>> to not consider it.
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Before the second mapping is implemented, write hints are not passed
>>>>>>>> down to devices, because it is better that the data of a segment have
>>>>>>>> the same hint.
>>>>>>>>
>>>>>>>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>>>>>>>> [2]: https://lwn.net/Articles/726477/
>>>>>>>
>>>>>>> Could you write a patch to support passing write hint to block layer for
>>>>>>> buffered writes as below commit:
>>>>>>> 0127251c45ae ("ext4: add support for passing in write hints for 
>>>>>>> buffered writes")
>>>>>>>
>>>>>>
>>>>>> Sure I will. I wrote it already ;)
>>>>>
>>>>> Cool, ;)
>>>>>
>>>>>> I think that data from the same segment should be passed down with the
>>>>>> same hint, and the following mapping is reasonable. I wonder what your
>>>>>> opinion about it is.
>>>>>>
>>>>>>   segment type       hints
>>>>>>   ------------       -----
>>>>>>   CURSEG_COLD_DATA   WRITE_LIFE_EXTREME
>>>>>>   CURSEG_HOT_DATA    WRITE_LIFE_SHORT
>>>>>>   CURSEG_COLD_NODE   WRITE_LIFE_NORMAL
>>>>>
>>>>> We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in fs.h?
>>>>>
>>>>>>   CURSEG_HOT_NODE    WRITE_LIFE_MEDIUM
>>>>>
>>>>> As I know, in scenari

Re: [f2fs-dev] [RFC PATHC 2/2] f2fs: apply write hints to select the type of segment for direct write

2017-11-12 Thread Hyunchul Lee
On 11/13/2017 10:24 AM, Chao Yu wrote:
> On 2017/11/13 8:07, Hyunchul Lee wrote:
>> On 11/11/2017 09:38 AM, Chao Yu wrote:
>>> On 2017/11/9 13:51, Hyunchul Lee wrote:
>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>
>>>> Select the type of the segment using write hints, when blocks are
>>>> allocated for direct write.
>>>>
>>>> There are unhandled corner cases. Hints are not applied in
>>>> in-place update. And if the blocks of a file are not pre-allocated
>>>> because of an invalid user buffer, the CURSEG_WARM_DATA segment will
>>>> be selected.
>>>>
>>>> Signed-off-by: Hyunchul Lee <cheol@lge.com>
>>>> ---
>>>>  fs/f2fs/data.c | 101 ++---
>>>>  fs/f2fs/f2fs.h |   1 +
>>>>  2 files changed, 61 insertions(+), 41 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>> index 36b5352..d06048a 100644
>>>> --- a/fs/f2fs/data.c
>>>> +++ b/fs/f2fs/data.c
>>>> @@ -783,7 +783,7 @@ struct page *get_new_data_page(struct inode *inode,
>>>>  	return page;
>>>>  }
>>>>  
>>>> -static int __allocate_data_block(struct dnode_of_data *dn)
>>>> +static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
>>>>  {
>>>>  	struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
>>>>  	struct f2fs_summary sum;
>>>> @@ -808,7 +808,7 @@ static int __allocate_data_block(struct dnode_of_data *dn)
>>>>  	set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version);
>>>>  
>>>>  	allocate_data_block(sbi, NULL, dn->data_blkaddr, &dn->data_blkaddr,
>>>> -					&sum, CURSEG_WARM_DATA, NULL, false);
>>>> +					&sum, seg_type, NULL, false);
>>>>  	set_data_blkaddr(dn);
>>>>  
>>>>  	/* update i_size */
>>>> @@ -827,42 +827,6 @@ static inline bool __force_buffered_io(struct inode *inode, int rw)
>>>>  			F2FS_I_SB(inode)->s_ndevs);
>>>>  }
>>>>  
>>>> -int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>>> -{
>>>> -	struct inode *inode = file_inode(iocb->ki_filp);
>>>> -	struct f2fs_map_blocks map;
>>>> -	int err = 0;
>>>> -
>>>> -	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>>>> -		return 0;
>>>> -
>>>> -	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
>>>> -	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
>>>> -	if (map.m_len > map.m_lblk)
>>>> -		map.m_len -= map.m_lblk;
>>>> -	else
>>>> -		map.m_len = 0;
>>>> -
>>>> -	map.m_next_pgofs = NULL;
>>>> -
>>>> -	if (iocb->ki_flags & IOCB_DIRECT) {
>>>> -		err = f2fs_convert_inline_inode(inode);
>>>> -		if (err)
>>>> -			return err;
>>>> -		return f2fs_map_blocks(inode, &map, 1,
>>>> -			__force_buffered_io(inode, WRITE) ?
>>>> -				F2FS_GET_BLOCK_PRE_AIO :
>>>> -				F2FS_GET_BLOCK_PRE_DIO);
>>>> -	}
>>>> -	if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
>>>> -		err = f2fs_convert_inline_inode(inode);
>>>> -		if (err)
>>>> -			return err;
>>>> -	}
>>>> -	if (!f2fs_has_inline_data(inode))
>>>> -		return f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO);
>>>> -	return err;
>>>> -}
>>>>  
>>>>  static inline void __do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock)
>>>>  {
>>>> @@ -888,8 +852,8 @@ static inline void __do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock)
>>>>   * b. do not use extent cache for better performance
>>>>   * c. give the block addresses to blockdev
>>>>   */
>>>> -int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
>>>> -						int create, int flag)
>>>> +static int __f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,

Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-12 Thread Hyunchul Lee
On 11/13/2017 10:26 AM, Chao Yu wrote:
> On 2017/11/13 8:24, Hyunchul Lee wrote:
>> On 11/10/2017 03:42 PM, Chao Yu wrote:
>>> On 2017/11/10 8:23, Hyunchul Lee wrote:
>>>> Hello, Chao
>>>>
>>>> On 11/09/2017 06:12 PM, Chao Yu wrote:
>>>>> On 2017/11/9 13:51, Hyunchul Lee wrote:
>>>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>>>
>>>>>> Using write hints[1], applications can indicate the lifetime of the data
>>>>>> written to devices, and [2] reported that the write hints patch
>>>>>> decreased writes in NAND by 25%.
>>>>>>
>>>>>> These hints help F2FS determine the following:
>>>>>>   1) the segment types where the data will be written.
>>>>>>   2) the hints that will be passed down to devices with the data of
>>>>>>      segments.
>>>>>>
>>>>>> This patch set implements the first mapping from write hints to segment
>>>>>> types, as shown below.
>>>>>>
>>>>>>   hints               segment type
>>>>>>   -----               ------------
>>>>>>   WRITE_LIFE_SHORT    CURSEG_COLD_DATA
>>>>>>   WRITE_LIFE_EXTREME  CURSEG_HOT_DATA
>>>>>>   others              CURSEG_WARM_DATA
>>>>>>
>>>>>> The F2FS policy for hot/cold separation has precedence over these
>>>>>> hints, and hints are not applied in in-place update.
>>>>>
>>>>> Could we disable IPU if a file/inode write hint exists?
>>>>>
>>>>
>>>> I am afraid that this has side effects. For example, this could cause
>>>> out-of-place updates even when there are not enough free segments.
>>>> I can write the patch that handles these situations. But I wonder
>>>> whether this is required, and I am not sure which IPU policies can be
>>>> disabled.
>>>
>>> Oh, as I replied in another thread, I think IPU just affects filesystem
>>> hot/cold separating, rather than this feature. So I think it will be okay
>>> to not consider it.
>>>
>>>>
>>>>>>
>>>>>> Before the second mapping is implemented, write hints are not passed down
>>>>>> to devices, because it is better that the data of a segment have the
>>>>>> same hint.
>>>>>>
>>>>>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>>>>>> [2]: https://lwn.net/Articles/726477/
>>>>>
>>>>> Could you write a patch to support passing write hint to block layer for
>>>>> buffered writes as below commit:
>>>>> 0127251c45ae ("ext4: add support for passing in write hints for buffered 
>>>>> writes")
>>>>>
>>>>
>>>> Sure I will. I wrote it already ;)
>>>
>>> Cool, ;)
>>>
>>>> I think that data from the same segment should be passed down with the
>>>> same hint, and the following mapping is reasonable. I wonder what your
>>>> opinion about it is.
>>>>
>>>>   segment type       hints
>>>>   ------------       -----
>>>>   CURSEG_COLD_DATA   WRITE_LIFE_EXTREME
>>>>   CURSEG_HOT_DATA    WRITE_LIFE_SHORT
>>>>   CURSEG_COLD_NODE   WRITE_LIFE_NORMAL
>>>
>>> We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in fs.h?
>>>
>>>>   CURSEG_HOT_NODE    WRITE_LIFE_MEDIUM
>>>
>>> As far as I know, in the cell phone scenario, data of meta_inode is
>>> hottest, then hot data, warm node, and cold node should be coldest. So I
>>> suggest we define the mapping as below:
>>>
>>> META_DATA               WRITE_LIFE_SHORT
>>> HOT_DATA & WARM_NODE    WRITE_LIFE_MEDIUM
>>> HOT_NODE & WARM_DATA    WRITE_LIFE_LONG
>>> COLD_NODE & COLD_DATA   WRITE_LIFE_EXTREME
>>>
>>
>> I agree, but I am not sure that assigning the same hint to a node and data
>> segment is good, because NVMe is likely to write them in the same erase
>> block if they have the same hint.
> 
> If we do not give the hint, they can still be written to the same erase block,
> right? It will not be worse?
> 

If the hint is not given, I think that they may or may not be written to
the same erase block. But if we give the same hint, they are written
to the same block.
I am not sure ;)

> Thanks,
> 
>>
>> Thanks.
>>
>>> Thanks,
>>>
>>>>   others             WRITE_LIFE_NONE
>>>>  
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>> Hyunchul Lee (2):
>>>>>>   f2fs: apply write hints to select the type of segments for buffered
>>>>>> write
>>>>>>   f2fs: apply write hints to select the type of segment for direct write
>>>>>>
>>>>>>  fs/f2fs/data.c    | 101 --
>>>>>>  fs/f2fs/f2fs.h    |   1 +
>>>>>>  fs/f2fs/segment.c |  14 +++-
>>>>>>  3 files changed, 74 insertions(+), 42 deletions(-)
>>>>>>
>>>>>
>>>>>
>>>>
>>>> Thanks
>>>>
>>>> .
>>>>
>>>
>>>
>>
>> .
>>
> 
> 
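
The segment-type-to-hint mapping that Chao Yu suggests in this message can be
sketched as a small lookup function. This is only an illustration under
assumptions: the enums below are local user-space stand-ins that reuse the
names from the thread (they are not the kernel definitions, and their numeric
values are arbitrary), META_DATA is a hypothetical label for the data of
meta_inode, and seg_type_to_hint() is a made-up helper name, not a function
from the patch set.

```c
#include <assert.h>

/* Local stand-in for the write-hint values named in the thread. */
enum rw_hint {
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Local stand-in for the segment types; META_DATA is a hypothetical label. */
enum seg_type {
	CURSEG_HOT_DATA,
	CURSEG_WARM_DATA,
	CURSEG_COLD_DATA,
	CURSEG_HOT_NODE,
	CURSEG_WARM_NODE,
	CURSEG_COLD_NODE,
	META_DATA,
	NR_SEG_TYPES,
};

/*
 * Return the hint to pass down with a segment of the given type,
 * following the four-row table suggested in the message:
 * hottest data gets the shortest lifetime, coldest the longest.
 */
static enum rw_hint seg_type_to_hint(enum seg_type type)
{
	assert(type < NR_SEG_TYPES);

	switch (type) {
	case META_DATA:
		return WRITE_LIFE_SHORT;
	case CURSEG_HOT_DATA:
	case CURSEG_WARM_NODE:
		return WRITE_LIFE_MEDIUM;
	case CURSEG_HOT_NODE:
	case CURSEG_WARM_DATA:
		return WRITE_LIFE_LONG;
	case CURSEG_COLD_NODE:
	case CURSEG_COLD_DATA:
		return WRITE_LIFE_EXTREME;
	default:
		/* Anything unlisted carries no lifetime hint. */
		return WRITE_LIFE_NONE;
	}
}
```

Note that node and data segments share a hint in this table, which is exactly
the point questioned in the reply above; splitting them would mean giving each
row its own hint value, and only four non-default values are named here.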


Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-12 Thread Hyunchul Lee
On 11/13/2017 10:26 AM, Chao Yu wrote:
> On 2017/11/13 8:24, Hyunchul Lee wrote:
>> On 11/10/2017 03:42 PM, Chao Yu wrote:
>>> On 2017/11/10 8:23, Hyunchul Lee wrote:
>>>> Hello, Chao
>>>>
>>>> On 11/09/2017 06:12 PM, Chao Yu wrote:
>>>>> On 2017/11/9 13:51, Hyunchul Lee wrote:
>>>>>> From: Hyunchul Lee 
>>>>>>
>>>>>> Using write hints[1], applications can inform the life time of the data
>>>>>> written to devices. and this[2] reported that the write hints patch
>>>>>> decreased writes in NAND by 25%.
>>>>>>
>>>>>> This hints help F2FS to determine the followings.
>>>>>>   1) the segment types where the data will be written.
>>>>>>   2) the hints that will be passed down to devices with the data of 
>>>>>> segments.
>>>>>>
>>>>>> This patch set implements the first mapping from write hints to segment 
>>>>>> types
>>>>>> as shown below.
>>>>>>
>>>>>>   hints segment type
>>>>>>   - 
>>>>>>   WRITE_LIFE_SHORT  CURSEG_COLD_DATA
>>>>>>   WRITE_LIFE_EXTREMECURSEG_HOT_DATA
>>>>>>   othersCURSEG_WARM_DATA
>>>>>>
>>>>>> The F2FS poliy for hot/cold seperation has precedence over this hints, 
>>>>>> And
>>>>>> hints are not applied in in-place update.
>>>>>
>>>>> Could we change to disable IPU if file/inode write hint is existing?
>>>>>
>>>>
>>>> I am afraid that this makes side effects. for example, this could cause
>>>> out-of-place updates even when there are not enough free segments. 
>>>> I can write the patch that handles these situations. But I wonder 
>>>> that this is required, and I am not sure which IPU polices can be disabled.
>>>
>>> Oh, As I replied in another thread, I think IPU just affects filesystem
>>> hot/cold separating, rather than this feature. So I think it will be okay
>>> to not consider it.
>>>
>>>>
>>>>>>
>>>>>> Before the second mapping is implemented, write hints are not passed down
>>>>>> to devices. Because it is better that the data of a segment have the 
>>>>>> same 
>>>>>> hint.
>>>>>>
>>>>>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>>>>>> [2]: https://lwn.net/Articles/726477/
>>>>>
>>>>> Could you write a patch to support passing write hint to block layer for
>>>>> buffered writes as below commit:
>>>>> 0127251c45ae ("ext4: add support for passing in write hints for buffered 
>>>>> writes")
>>>>>
>>>>
>>>> Sure I will. I wrote it already ;)
>>>
>>> Cool, ;)
>>>
>>>> I think that datas from the same segment should be passed down with the 
>>>> same
>>>> hint, and the following mapping is reasonable. I wonder what is your 
>>>> opinion
>>>> about it.
>>>>
>>>>   segment type   hints
>>>>      -
>>>>   CURSEG_COLD_DATA   WRITE_LIFE_EXTREME
>>>>   CURSEG_HOT_DATAWRITE_LIFE_SHORT
>>>>   CURSEG_COLD_NODE   WRITE_LIFE_NORMAL
>>>
>>> We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in fs.h?
>>>
>>>>   CURSEG_HOT_NODEWRITE_LIFE_MEDIUM
>>>
>>> As far as I know, in the cell phone scenario, data of meta_inode is the
>>> hottest, then hot data, then warm node, and cold node should be the
>>> coldest. So I suggest we define the mapping as below:
>>>
>>> META_DATA               WRITE_LIFE_SHORT
>>> HOT_DATA & WARM_NODE    WRITE_LIFE_MEDIUM
>>> HOT_NODE & WARM_DATA    WRITE_LIFE_LONG
>>> COLD_NODE & COLD_DATA   WRITE_LIFE_EXTREME
>>>
>>
>> I agree, but I am not sure that assigning the same hint to a node segment
>> and a data segment is good, because NVMe is likely to write them to the
>> same erase block if they have the same hint.
> 
> If we do not give the hint, they can still be written to the same erase block,
> right? it will not be worse?
> 

If the hint is not given, I think they may or may not be written to
the same erase block. But if we give the same hint, they are written
to the same erase block.
I am not sure ;)

> Thanks,
> 
>>
>> Thanks.
>>
>>> Thanks,
>>>
>>>>   others             WRITE_LIFE_NONE
>>>>  
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>> Hyunchul Lee (2):
>>>>>>   f2fs: apply write hints to select the type of segments for buffered
>>>>>> write
>>>>>>   f2fs: apply write hints to select the type of segment for direct write
>>>>>>
>>>>>>  fs/f2fs/data.c| 101 
>>>>>> --
>>>>>>  fs/f2fs/f2fs.h|   1 +
>>>>>>  fs/f2fs/segment.c |  14 +++-
>>>>>>  3 files changed, 74 insertions(+), 42 deletions(-)
>>>>>>
>>>>>
>>>>>
>>>>
>>>> Thanks
>>>>
>>>> .
>>>>
>>>
>>>
>>
>> .
>>
> 
> 


Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-12 Thread Hyunchul Lee
On 11/10/2017 03:42 PM, Chao Yu wrote:
> On 2017/11/10 8:23, Hyunchul Lee wrote:
>> Hello, Chao
>>
>> On 11/09/2017 06:12 PM, Chao Yu wrote:
>>> On 2017/11/9 13:51, Hyunchul Lee wrote:
>>>> From: Hyunchul Lee <cheol@lge.com>
>>>>
>>>> Using write hints[1], applications can tell the kernel the expected
>>>> lifetime of the data written to devices, and [2] reported that the write
>>>> hints patch decreased writes in NAND by 25%.
>>>>
>>>> These hints help F2FS to determine the following:
>>>>   1) the segment types where the data will be written.
>>>>   2) the hints that will be passed down to devices with the data of
>>>>      segments.
>>>>
>>>> This patch set implements the first mapping, from write hints to segment
>>>> types, as shown below.
>>>>
>>>>   hints               segment type
>>>>   -----               ------------
>>>>   WRITE_LIFE_SHORT    CURSEG_HOT_DATA
>>>>   WRITE_LIFE_EXTREME  CURSEG_COLD_DATA
>>>>   others              CURSEG_WARM_DATA
>>>>
>>>> The F2FS policy for hot/cold separation has precedence over these hints,
>>>> and hints are not applied to in-place updates.
>>>
>>> Could we change to disable IPU if file/inode write hint is existing?
>>>
>>
>> I am afraid that this has side effects. For example, this could cause
>> out-of-place updates even when there are not enough free segments.
>> I can write a patch that handles these situations, but I wonder whether
>> this is required, and I am not sure which IPU policies can be disabled.
> 
> Oh, As I replied in another thread, I think IPU just affects filesystem
> hot/cold separating, rather than this feature. So I think it will be okay
> to not consider it.
> 
>>
>>>>
>>>> Until the second mapping is implemented, write hints are not passed down
>>>> to devices, because it is better that all the data of a segment have the
>>>> same hint.
>>>>
>>>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>>>> [2]: https://lwn.net/Articles/726477/
>>>
>>> Could you write a patch to support passing write hint to block layer for
>>> buffered writes as below commit:
>>> 0127251c45ae ("ext4: add support for passing in write hints for buffered 
>>> writes")
>>>
>>
>> Sure I will. I wrote it already ;)
> 
> Cool, ;)
> 
>> I think that data from the same segment should be passed down with the same
>> hint, and the following mapping is reasonable. I would like to hear your
>> opinion about it.
>>
>>   segment type       hints
>>   ------------       -----
>>   CURSEG_COLD_DATA   WRITE_LIFE_EXTREME
>>   CURSEG_HOT_DATA    WRITE_LIFE_SHORT
>>   CURSEG_COLD_NODE   WRITE_LIFE_NORMAL
> 
> We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in fs.h?
> 
>>   CURSEG_HOT_NODE    WRITE_LIFE_MEDIUM
> 
> As far as I know, in the cell phone scenario, data of meta_inode is the
> hottest, then hot data, then warm node, and cold node should be the coldest.
> So I suggest we define the mapping as below:
> 
> META_DATA               WRITE_LIFE_SHORT
> HOT_DATA & WARM_NODE    WRITE_LIFE_MEDIUM
> HOT_NODE & WARM_DATA    WRITE_LIFE_LONG
> COLD_NODE & COLD_DATA   WRITE_LIFE_EXTREME
> 

I agree, but I am not sure that assigning the same hint to a node segment and
a data segment is good, because NVMe is likely to write them to the same erase
block if they have the same hint.

Thanks.
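
For illustration only, the table Chao proposes above could be written as a
small lookup. This is a minimal sketch, not code from the patch set: the
function name seg_write_hint() is hypothetical, and the enums are user-space
stand-ins that only borrow the names used in the discussion.

```c
/* Stand-ins for the kernel enums; names follow the thread, the numeric
 * values are illustrative only. */
enum rw_hint {
	WRITE_LIFE_NOT_SET, WRITE_LIFE_NONE, WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM, WRITE_LIFE_LONG, WRITE_LIFE_EXTREME
};
enum page_type { DATA, NODE, META };
enum temp_type { HOT, WARM, COLD };

/* Hypothetical helper: map (page type, temperature) to the hint that would
 * be passed down to the device, following the proposed table. */
static enum rw_hint seg_write_hint(enum page_type type, enum temp_type temp)
{
	if (type == META)
		return WRITE_LIFE_SHORT;	/* META_DATA */
	if (temp == COLD)
		return WRITE_LIFE_EXTREME;	/* COLD_NODE & COLD_DATA */
	if ((type == DATA && temp == HOT) || (type == NODE && temp == WARM))
		return WRITE_LIFE_MEDIUM;	/* HOT_DATA & WARM_NODE */
	return WRITE_LIFE_LONG;			/* HOT_NODE & WARM_DATA */
}
```

In this form the concern raised above is easy to see: every hint value except
WRITE_LIFE_SHORT is shared by one node log and one data log.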

> Thanks,
> 
>>   others             WRITE_LIFE_NONE
>>  
>>> Thanks,
>>>
>>>>
>>>> Hyunchul Lee (2):
>>>>   f2fs: apply write hints to select the type of segments for buffered
>>>> write
>>>>   f2fs: apply write hints to select the type of segment for direct write
>>>>
>>>>  fs/f2fs/data.c| 101 
>>>> --
>>>>  fs/f2fs/f2fs.h|   1 +
>>>>  fs/f2fs/segment.c |  14 +++-
>>>>  3 files changed, 74 insertions(+), 42 deletions(-)
>>>>
>>>
>>>
>>
>> Thanks
>>
>> .
>>
> 
> 


Re: [f2fs-dev] [RFC PATCH 2/2] f2fs: apply write hints to select the type of segment for direct write

2017-11-12 Thread Hyunchul Lee
On 11/11/2017 09:38 AM, Chao Yu wrote:
> On 2017/11/9 13:51, Hyunchul Lee wrote:
>> From: Hyunchul Lee <cheol@lge.com>
>>
>> Select the type of the segment using write hints, when blocks are
>> allocated for direct write.
>>
>> There are unhandled corner cases. Hints are not applied to
>> in-place updates, and if the blocks of a file are not pre-allocated
>> because of an invalid user buffer, the CURSEG_WARM_DATA segment will
>> be selected.
>>
>> Signed-off-by: Hyunchul Lee <cheol@lge.com>
>> ---
>>  fs/f2fs/data.c | 101 
>> ++---
>>  fs/f2fs/f2fs.h |   1 +
>>  2 files changed, 61 insertions(+), 41 deletions(-)
>>
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index 36b5352..d06048a 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -783,7 +783,7 @@ struct page *get_new_data_page(struct inode *inode,
>>  return page;
>>  }
>>  
>> -static int __allocate_data_block(struct dnode_of_data *dn)
>> +static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
>>  {
>>  struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
>>  struct f2fs_summary sum;
>> @@ -808,7 +808,7 @@ static int __allocate_data_block(struct dnode_of_data *dn)
>>  set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version);
>>  
>>  allocate_data_block(sbi, NULL, dn->data_blkaddr, &dn->data_blkaddr,
>> -&sum, CURSEG_WARM_DATA, NULL, false);
>> +&sum, seg_type, NULL, false);
>>  set_data_blkaddr(dn);
>>  
>>  /* update i_size */
>> @@ -827,42 +827,6 @@ static inline bool __force_buffered_io(struct inode 
>> *inode, int rw)
>>  F2FS_I_SB(inode)->s_ndevs);
>>  }
>>  
>> -int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>> -{
>> -struct inode *inode = file_inode(iocb->ki_filp);
>> -struct f2fs_map_blocks map;
>> -int err = 0;
>> -
>> -if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>> -return 0;
>> -
>> -map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
>> -map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
>> -if (map.m_len > map.m_lblk)
>> -map.m_len -= map.m_lblk;
>> -else
>> -map.m_len = 0;
>> -
>> -map.m_next_pgofs = NULL;
>> -
>> -if (iocb->ki_flags & IOCB_DIRECT) {
>> -err = f2fs_convert_inline_inode(inode);
>> -if (err)
>> -return err;
>> -return f2fs_map_blocks(inode, &map, 1,
>> -__force_buffered_io(inode, WRITE) ?
>> -F2FS_GET_BLOCK_PRE_AIO :
>> -F2FS_GET_BLOCK_PRE_DIO);
>> -}
>> -if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
>> -err = f2fs_convert_inline_inode(inode);
>> -if (err)
>> -return err;
>> -}
>> -if (!f2fs_has_inline_data(inode))
>> -return f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO);
>> -return err;
>> -}
>>  
>>  static inline void __do_map_lock(struct f2fs_sb_info *sbi, int flag, bool 
>> lock)
>>  {
>> @@ -888,8 +852,8 @@ static inline void __do_map_lock(struct f2fs_sb_info 
>> *sbi, int flag, bool lock)
>>   * b. do not use extent cache for better performance
>>   * c. give the block addresses to blockdev
>>   */
>> -int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
>> -int create, int flag)
>> +static int __f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks 
>> *map,
>> +int create, int flag, int seg_type)
>>  {
>>  unsigned int maxblocks = map->m_len;
>>  struct dnode_of_data dn;
>> @@ -957,7 +921,12 @@ int f2fs_map_blocks(struct inode *inode, struct 
>> f2fs_map_blocks *map,
>>  last_ofs_in_node = dn.ofs_in_node;
>>  }
>>  } else {
>> -err = __allocate_data_block(&dn);
>> +/* if this inode is marked with FI_NO_PREALLOC,
>> + * @seg_type is NO_CHECK_TYPE
>> + */
>> +if (seg_typ
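
The quoted hunk is cut off at this point. In the original patch, further down
this thread, the check falls back to CURSEG_WARM_DATA, and f2fs_map_blocks()
becomes a thin wrapper that passes NO_CHECK_TYPE. A minimal user-space sketch
of that sentinel pattern follows; both function names are hypothetical, and
the enum is a stand-in whose values are only illustrative.

```c
/* Stand-in for the f2fs log types; NO_CHECK_TYPE acts as the
 * "caller has no preference" sentinel. */
enum {
	CURSEG_HOT_DATA, CURSEG_WARM_DATA, CURSEG_COLD_DATA,
	CURSEG_HOT_NODE, CURSEG_WARM_NODE, CURSEG_COLD_NODE,
	NO_CHECK_TYPE
};

/* Hypothetical reduction of the new __f2fs_map_blocks() logic: returns the
 * segment type that would be handed to __allocate_data_block(). */
static int resolve_seg_type(int seg_type)
{
	if (seg_type == NO_CHECK_TYPE)
		seg_type = CURSEG_WARM_DATA;	/* the pre-patch behaviour */
	return seg_type;
}

/* Hypothetical reduction of the f2fs_map_blocks() wrapper: existing callers
 * keep the old signature and therefore the old default. */
static int default_seg_type(void)
{
	return resolve_seg_type(NO_CHECK_TYPE);
}
```

Keeping the public signature unchanged means only the preallocation path for
direct writes has to know about the extra seg_type argument.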

Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-09 Thread Hyunchul Lee
Hello, Chao

On 11/09/2017 06:12 PM, Chao Yu wrote:
> On 2017/11/9 13:51, Hyunchul Lee wrote:
>> From: Hyunchul Lee <cheol@lge.com>
>>
>> Using write hints[1], applications can tell the kernel the expected
>> lifetime of the data written to devices, and [2] reported that the write
>> hints patch decreased writes in NAND by 25%.
>>
>> These hints help F2FS to determine the following:
>>   1) the segment types where the data will be written.
>>   2) the hints that will be passed down to devices with the data of segments.
>>
>> This patch set implements the first mapping, from write hints to segment
>> types, as shown below.
>>
>>   hints               segment type
>>   -----               ------------
>>   WRITE_LIFE_SHORT    CURSEG_HOT_DATA
>>   WRITE_LIFE_EXTREME  CURSEG_COLD_DATA
>>   others              CURSEG_WARM_DATA
>>
>> The F2FS policy for hot/cold separation has precedence over these hints,
>> and hints are not applied to in-place updates.
> 
> Could we change to disable IPU if file/inode write hint is existing?
> 

I am afraid that this has side effects. For example, this could cause
out-of-place updates even when there are not enough free segments.
I can write a patch that handles these situations, but I wonder whether
this is required, and I am not sure which IPU policies can be disabled.

>>
>> Until the second mapping is implemented, write hints are not passed down
>> to devices, because it is better that all the data of a segment have the
>> same hint.
>>
>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>> [2]: https://lwn.net/Articles/726477/
> 
> Could you write a patch to support passing write hint to block layer for
> buffered writes as below commit:
> 0127251c45ae ("ext4: add support for passing in write hints for buffered 
> writes")
> 

Sure I will. I wrote it already ;)
I think that data from the same segment should be passed down with the same
hint, and the following mapping is reasonable. I would like to hear your opinion
about it.

  segment type       hints
  ------------       -----
  CURSEG_COLD_DATA   WRITE_LIFE_EXTREME
  CURSEG_HOT_DATA    WRITE_LIFE_SHORT
  CURSEG_COLD_NODE   WRITE_LIFE_NORMAL
  CURSEG_HOT_NODE    WRITE_LIFE_MEDIUM
  others             WRITE_LIFE_NONE
 
> Thanks,
> 
>>
>> Hyunchul Lee (2):
>>   f2fs: apply write hints to select the type of segments for buffered
>> write
>>   f2fs: apply write hints to select the type of segment for direct write
>>
>>  fs/f2fs/data.c| 101 
>> --
>>  fs/f2fs/f2fs.h|   1 +
>>  fs/f2fs/segment.c |  14 +++-
>>  3 files changed, 74 insertions(+), 42 deletions(-)
>>
> 
> 

Thanks


[RFC PATCH 2/2] f2fs: apply write hints to select the type of segment for direct write

2017-11-08 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Select the type of the segment using write hints, when blocks are
allocated for direct write.

There are unhandled corner cases. Hints are not applied to
in-place updates, and if the blocks of a file are not pre-allocated
because of an invalid user buffer, the CURSEG_WARM_DATA segment will
be selected.

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 fs/f2fs/data.c | 101 ++---
 fs/f2fs/f2fs.h |   1 +
 2 files changed, 61 insertions(+), 41 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 36b5352..d06048a 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -783,7 +783,7 @@ struct page *get_new_data_page(struct inode *inode,
 	return page;
 }
 
-static int __allocate_data_block(struct dnode_of_data *dn)
+static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
 	struct f2fs_summary sum;
@@ -808,7 +808,7 @@ static int __allocate_data_block(struct dnode_of_data *dn)
 	set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version);
 
 	allocate_data_block(sbi, NULL, dn->data_blkaddr, &dn->data_blkaddr,
-					&sum, CURSEG_WARM_DATA, NULL, false);
+					&sum, seg_type, NULL, false);
 	set_data_blkaddr(dn);
 
 	/* update i_size */
@@ -827,42 +827,6 @@ static inline bool __force_buffered_io(struct inode *inode, int rw)
 			F2FS_I_SB(inode)->s_ndevs);
 }
 
-int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
-{
-	struct inode *inode = file_inode(iocb->ki_filp);
-	struct f2fs_map_blocks map;
-	int err = 0;
-
-	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
-		return 0;
-
-	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
-	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
-	if (map.m_len > map.m_lblk)
-		map.m_len -= map.m_lblk;
-	else
-		map.m_len = 0;
-
-	map.m_next_pgofs = NULL;
-
-	if (iocb->ki_flags & IOCB_DIRECT) {
-		err = f2fs_convert_inline_inode(inode);
-		if (err)
-			return err;
-		return f2fs_map_blocks(inode, &map, 1,
-			__force_buffered_io(inode, WRITE) ?
-				F2FS_GET_BLOCK_PRE_AIO :
-				F2FS_GET_BLOCK_PRE_DIO);
-	}
-	if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
-		err = f2fs_convert_inline_inode(inode);
-		if (err)
-			return err;
-	}
-	if (!f2fs_has_inline_data(inode))
-		return f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO);
-	return err;
-}
 
 static inline void __do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock)
 {
@@ -888,8 +852,8 @@ static inline void __do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock)
  * b. do not use extent cache for better performance
  * c. give the block addresses to blockdev
  */
-int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
-						int create, int flag)
+static int __f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
+					int create, int flag, int seg_type)
 {
 	unsigned int maxblocks = map->m_len;
 	struct dnode_of_data dn;
@@ -957,7 +921,12 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 					last_ofs_in_node = dn.ofs_in_node;
 				}
 			} else {
-				err = __allocate_data_block(&dn);
+				/* if this inode is marked with FI_NO_PREALLOC,
+				 * @seg_type is NO_CHECK_TYPE
+				 */
+				if (seg_type == NO_CHECK_TYPE)
+					seg_type = CURSEG_WARM_DATA;
+				err = __allocate_data_block(&dn, seg_type);
 				if (!err)
 					set_inode_flag(inode, FI_APPEND_WRITE);
 			}
@@ -1048,6 +1017,51 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 	return err;
 }
 
+int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
+						int create, int flag)
+{
+	return __f2fs_map_blocks(inode, map, create, flag, NO_CHECK_TYPE);
+}
+
+int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	struct f2fs_map_blocks map;
+	int err = 0;
+
+	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
+ 
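
The archived diff is cut off here. As a worked example of the block-range
arithmetic in f2fs_preallocate_blocks() shown in the removed hunk above, the
following user-space sketch reproduces the m_lblk / m_len computation. It
assumes 4 KiB blocks and that F2FS_BLK_ALIGN() rounds up while
F2FS_BYTES_TO_BLK() rounds down; the macro bodies, the struct and the
function name prealloc_range() are stand-ins, not the kernel definitions.

```c
#include <stdint.h>

#define BLKSIZE_BITS	12			/* assumed 4 KiB block size */
#define BLKSIZE		(1ULL << BLKSIZE_BITS)
#define BYTES_TO_BLK(b)	((b) >> BLKSIZE_BITS)		   /* round down */
#define BLK_ALIGN(b)	BYTES_TO_BLK((b) + BLKSIZE - 1)	   /* round up */

struct blk_range {
	uint64_t lblk;	/* first block to preallocate */
	uint64_t len;	/* number of blocks to preallocate */
};

/* Hypothetical helper mirroring the m_lblk / m_len computation for a write
 * of @count bytes at byte offset @pos. */
static struct blk_range prealloc_range(uint64_t pos, uint64_t count)
{
	struct blk_range r;

	r.lblk = BLK_ALIGN(pos);
	r.len = BYTES_TO_BLK(pos + count);
	if (r.len > r.lblk)
		r.len -= r.lblk;
	else
		r.len = 0;
	return r;
}
```

Under these assumptions, a 10000-byte write at offset 5000 yields lblk = 2 and
len = 1: only the block that the write covers completely is in the range,
while the partially covered blocks at either end are not.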

[RFC PATCH 1/2] f2fs: apply write hints to select the type of segments for buffered write

2017-11-08 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Write hints help F2FS to determine which type of segment is
selected for buffered writes.

This patch implements the mapping from write hints to segment types
as shown below.

  hints               segment type
  -----               ------------
  WRITE_LIFE_SHORT    CURSEG_HOT_DATA
  WRITE_LIFE_EXTREME  CURSEG_COLD_DATA
  others              CURSEG_WARM_DATA

The F2FS policy for hot/cold separation has precedence over these hints,
and hints are not applied to in-place updates.

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 fs/f2fs/segment.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c695ff4..45aef53 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2258,6 +2258,18 @@ static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
 	return false;
 }
 
+int rw_hint_to_seg_type(enum rw_hint hint)
+{
+	switch (hint) {
+	case WRITE_LIFE_SHORT:
+		return CURSEG_HOT_DATA;
+	case WRITE_LIFE_EXTREME:
+		return CURSEG_COLD_DATA;
+	default:
+		return CURSEG_WARM_DATA;
+	}
+}
+
 static int __get_segment_type_2(struct f2fs_io_info *fio)
 {
 	if (fio->type == DATA)
@@ -2292,7 +2304,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
 			return CURSEG_COLD_DATA;
 		if (is_inode_flag_set(inode, FI_HOT_DATA))
 			return CURSEG_HOT_DATA;
-		return CURSEG_WARM_DATA;
+		return rw_hint_to_seg_type(inode->i_write_hint);
 	} else {
 		if (IS_DNODE(fio->page))
 			return is_cold_node(fio->page) ? CURSEG_WARM_NODE :
-- 
1.9.1
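
To make the precedence rule in the commit message concrete, here is a minimal
user-space model of the DATA branch touched by the second hunk. It is a sketch
under stated assumptions: the enums are stand-ins that reuse the kernel names,
data_seg_type() and its two boolean parameters are hypothetical, and the real
function inspects the page and inode rather than taking flags.

```c
#include <stdbool.h>

/* Stand-ins for the kernel definitions; values are illustrative only. */
enum rw_hint {
	WRITE_LIFE_NOT_SET, WRITE_LIFE_NONE, WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM, WRITE_LIFE_LONG, WRITE_LIFE_EXTREME
};
enum { CURSEG_HOT_DATA, CURSEG_WARM_DATA, CURSEG_COLD_DATA };

/* Same switch as rw_hint_to_seg_type() in the diff above. */
static int rw_hint_to_seg_type(enum rw_hint hint)
{
	switch (hint) {
	case WRITE_LIFE_SHORT:
		return CURSEG_HOT_DATA;
	case WRITE_LIFE_EXTREME:
		return CURSEG_COLD_DATA;
	default:
		return CURSEG_WARM_DATA;
	}
}

/* Hypothetical condensed model: the filesystem's own hot/cold decision is
 * checked first, and the write hint is consulted only as a fallback. */
static int data_seg_type(bool fs_says_cold, bool fs_says_hot,
			 enum rw_hint hint)
{
	if (fs_says_cold)
		return CURSEG_COLD_DATA;
	if (fs_says_hot)
		return CURSEG_HOT_DATA;
	return rw_hint_to_seg_type(hint);
}
```

In this model, data that F2FS already classifies as cold stays in the cold log
even if the application marked it WRITE_LIFE_SHORT.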



[RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-08 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Using write hints [1], applications can tell the kernel the expected
lifetime of the data written to devices, and [2] reported that the write
hints patch decreased writes to NAND by 25%.

These hints help F2FS to determine the following:
  1) the segment types where the data will be written.
  2) the hints that will be passed down to devices with the data of segments.

This patch set implements the first mapping from write hints to segment types
as shown below.

  hints               segment type
  -----               ------------
  WRITE_LIFE_SHORT    CURSEG_HOT_DATA
  WRITE_LIFE_EXTREME  CURSEG_COLD_DATA
  others              CURSEG_WARM_DATA

The F2FS policy for hot/cold separation takes precedence over these hints,
and hints are not applied to in-place updates.

Until the second mapping is implemented, write hints are not passed down
to devices, because it is better for all the data in a segment to have the
same hint.
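
As a rough illustration of the first mapping, here is a user-space sketch
in C. The enum values are local stand-ins, not the kernel's definitions,
and pick_data_segment() with its two boolean parameters is a hypothetical
simplification of the existing F2FS hot/cold checks. Only the switch
statement follows rw_hint_to_seg_type() from patch 1/2 (WRITE_LIFE_SHORT
to CURSEG_HOT_DATA, WRITE_LIFE_EXTREME to CURSEG_COLD_DATA).

```c
#include <stdbool.h>

/* Stand-ins for the kernel's enum rw_hint; the numeric values are arbitrary. */
enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Stand-ins for the three F2FS data log types. */
enum seg_type {
	CURSEG_HOT_DATA,
	CURSEG_WARM_DATA,
	CURSEG_COLD_DATA,
};

/* Same switch as rw_hint_to_seg_type() in patch 1/2. */
static enum seg_type rw_hint_to_seg_type(enum rw_hint hint)
{
	switch (hint) {
	case WRITE_LIFE_SHORT:
		return CURSEG_HOT_DATA;
	case WRITE_LIFE_EXTREME:
		return CURSEG_COLD_DATA;
	default:
		return CURSEG_WARM_DATA;
	}
}

/*
 * Hypothetical simplification of the data branch of
 * __get_segment_type_6(): the existing cold/hot checks run first, and
 * the hint is only consulted where CURSEG_WARM_DATA used to be returned.
 */
static enum seg_type pick_data_segment(bool is_cold, bool is_hot,
				       enum rw_hint hint)
{
	if (is_cold)
		return CURSEG_COLD_DATA;
	if (is_hot)
		return CURSEG_HOT_DATA;
	return rw_hint_to_seg_type(hint);
}
```

In this sketch a file already classified as cold stays in the cold log
even if the application passes WRITE_LIFE_SHORT, which is what "takes
precedence" means here.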

[1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
[2]: https://lwn.net/Articles/726477/

Hyunchul Lee (2):
  f2fs: apply write hints to select the type of segments for buffered
write
  f2fs: apply write hints to select the type of segment for direct write

 fs/f2fs/data.c| 101 --
 fs/f2fs/f2fs.h|   1 +
 fs/f2fs/segment.c |  14 +++-
 3 files changed, 74 insertions(+), 42 deletions(-)

-- 
1.9.1



[PATCH] ubi: Remove ubi_io_is_bad call from scan_peb

2017-09-25 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

ubi_io_is_bad is already called when the erase count and volume
identifier headers are read. So instead of calling ubi_io_is_bad
again from scan_peb, use that result.

This patch reduces the attach time by about 15% in my
environment.

ARMv7 1GHz based board, 66.8MiB MTD partition
              before         after
attach time   308.365 usec   257.100 usec
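
To illustrate the idea, here is a small user-space model in C. It is not
the actual UBI code: read_hdr(), scan_all() and the bad_map array are
hypothetical, and the return codes are only modelled on the UBI_IO_*
values. The point is that the bad-block status comes back as one of the
header read's return codes, so the scan loop needs no separate bad-block
query per PEB.

```c
#include <stddef.h>

/* Hypothetical return codes modelled on the UBI_IO_* values. */
enum { IO_OK = 0, IO_BAD_BLK = 1 };

struct attach_info {
	int bad_peb_count;
	int good_peb_count;
};

/*
 * Hypothetical header read: reports a bad block through its return
 * code instead of requiring a separate is-bad query beforehand.
 */
static int read_hdr(const unsigned char *bad_map, int pnum)
{
	return bad_map[pnum] ? IO_BAD_BLK : IO_OK;
}

/* Scan one PEB; a bad block is counted and skipped, as in the patch. */
static int scan_peb(const unsigned char *bad_map, int pnum,
		    struct attach_info *ai)
{
	int err = read_hdr(bad_map, pnum);

	switch (err) {
	case IO_OK:
		ai->good_peb_count += 1;
		return 0;
	case IO_BAD_BLK:
		ai->bad_peb_count += 1;
		return 0;
	default:
		return err;
	}
}

/* Scan all PEBs, stopping at the first hard error. */
static int scan_all(const unsigned char *bad_map, size_t peb_count,
		    struct attach_info *ai)
{
	for (size_t pnum = 0; pnum < peb_count; pnum++) {
		int err = scan_peb(bad_map, (int)pnum, ai);

		if (err)
			return err;
	}
	return 0;
}
```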

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 drivers/mtd/ubi/attach.c | 15 ++-
 drivers/mtd/ubi/io.c |  9 ++---
 drivers/mtd/ubi/ubi.h|  2 ++
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/mtd/ubi/attach.c b/drivers/mtd/ubi/attach.c
index 93ceea4..07b9162 100644
--- a/drivers/mtd/ubi/attach.c
+++ b/drivers/mtd/ubi/attach.c
@@ -962,15 +962,6 @@ static int scan_peb(struct ubi_device *ubi, struct 
ubi_attach_info *ai,
 
dbg_bld("scan PEB %d", pnum);
 
-   /* Skip bad physical eraseblocks */
-   err = ubi_io_is_bad(ubi, pnum);
-   if (err < 0)
-   return err;
-   else if (err) {
-   ai->bad_peb_count += 1;
-   return 0;
-   }
-
err = ubi_io_read_ec_hdr(ubi, pnum, ech, 0);
if (err < 0)
return err;
@@ -999,6 +990,9 @@ static int scan_peb(struct ubi_device *ubi, struct 
ubi_attach_info *ai,
ec = UBI_UNKNOWN;
bitflips = 1;
break;
+   case UBI_IO_BAD_BLK:
+   ai->bad_peb_count += 1;
+   return 0;
default:
ubi_err(ubi, "'ubi_io_read_ec_hdr()' returned unknown code %d",
err);
@@ -1136,6 +1130,9 @@ static int scan_peb(struct ubi_device *ubi, struct 
ubi_attach_info *ai,
if (err)
return err;
goto adjust_mean_ec;
+   case UBI_IO_BAD_BLK:
+   ai->bad_peb_count += 1;
+   return 0;
default:
ubi_err(ubi, "'ubi_io_read_vid_hdr()' returned unknown code %d",
err);
diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c
index 8290432..ae52e7e 100644
--- a/drivers/mtd/ubi/io.c
+++ b/drivers/mtd/ubi/io.c
@@ -117,6 +117,7 @@ static int self_check_write(struct ubi_device *ubi, const 
void *buf, int pnum,
  * o %UBI_IO_BITFLIPS if all the requested data were successfully read, but
  *   correctable bit-flips were detected; this is harmless but may indicate
  *   that this eraseblock may become bad soon (but do not have to);
+ * o %UBI_IO_BAD_BLK if the eraseblock is bad
  * o %-EBADMSG if the MTD subsystem reported about data integrity problems, for
  *   example it can be an ECC error in case of NAND; this most probably means
  *   that the data is corrupted;
@@ -137,7 +138,9 @@ int ubi_io_read(const struct ubi_device *ubi, void *buf, 
int pnum, int offset,
ubi_assert(len > 0);
 
err = self_check_not_bad(ubi, pnum);
-   if (err)
+   if (err == -EBADSLT)
+   return UBI_IO_BAD_BLK;
+   else if (err)
return err;
 
/*
@@ -1131,7 +1134,7 @@ int ubi_io_write_vid_hdr(struct ubi_device *ubi, int pnum,
  * @ubi: UBI device description object
  * @pnum: physical eraseblock number to check
  *
- * This function returns zero if the physical eraseblock is good, %-EINVAL if
+ * This function returns zero if the physical eraseblock is good, %-EBADSLT if
  * it is bad and a negative error code if an error occurred.
  */
 static int self_check_not_bad(const struct ubi_device *ubi, int pnum)
@@ -1147,7 +1150,7 @@ static int self_check_not_bad(const struct ubi_device 
*ubi, int pnum)
 
ubi_err(ubi, "self-check failed for PEB %d", pnum);
dump_stack();
-   return err > 0 ? -EINVAL : err;
+   return err > 0 ? -EBADSLT : err;
 }
 
 /**
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index 5fe6265..5c5207d 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -107,6 +107,7 @@
  * data integrity error reported by the MTD driver
  * (uncorrectable ECC error in case of NAND)
  * UBI_IO_BITFLIPS: bit-flips were detected and corrected
+ * UBI_IO_BAD_BLK: bad erase block
  *
  * Note, it is probably better to have bit-flip and ebadmsg as flags which can
  * be or'ed with other error code. But this is a big change because there are
@@ -118,6 +119,7 @@ enum {
UBI_IO_BAD_HDR,
UBI_IO_BAD_HDR_EBADMSG,
UBI_IO_BITFLIPS,
+   UBI_IO_BAD_BLK,
 };
 
 /*
-- 
1.9.1



[PATCH v3] ubifs: Change gfp flags in page allocation for bulk read

2017-06-13 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

In low memory situations, page allocations for bulk read
can kill applications to reclaim memory, and print a
failure message when an allocation fails.
Because bulk read is just an optimization, we don't have
to do either and can simply stop allocating pages.

Though this situation happens rarely, add __GFP_NORETRY
to prevent excessive memory reclaim and the killing of
applications, and __GFP_NOWARN to suppress the failure
message.

To do this, use readahead_gfp_mask for the gfp flags when
allocating pages.
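
As a sketch of the flag arithmetic, the following user-space C models how
the mask is built. The bit values are made up for illustration and are not
the kernel's, and readahead_mask() is a hypothetical stand-in for
readahead_gfp_mask(), assumed here to add the no-retry and no-warn bits to
the mapping's mask as described above.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values only; the kernel defines its own. */
#define GFP_IO       0x01u
#define GFP_FS       0x02u
#define GFP_NORETRY  0x04u
#define GFP_NOWARN   0x08u

/* Hypothetical stand-in for readahead_gfp_mask(mapping). */
static gfp_t readahead_mask(gfp_t mapping_mask)
{
	return mapping_mask | GFP_NORETRY | GFP_NOWARN;
}

/*
 * Mask used for bulk-read page allocation in the patch: the readahead
 * mask with the FS bit cleared, so reclaim does not call back into
 * the file system.
 */
static gfp_t bulk_read_mask(gfp_t mapping_mask)
{
	return readahead_mask(mapping_mask) & ~GFP_FS;
}
```

With this mask a failed allocation just returns NULL quietly, and the
bulk-read loop in the diff below stops at that page.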

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
v3:
* fix the invalid format of the version history in
 this patch.

v2:
* rewrite the commit message to explain why this
 patch is needed.

 fs/ubifs/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index d9ae86f..4396c04 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -735,6 +735,7 @@ static int ubifs_do_bulk_read(struct ubifs_info *c, struct 
bu_info *bu,
int err, page_idx, page_cnt, ret = 0, n = 0;
int allocate = bu->buf ? 0 : 1;
loff_t isize;
+   gfp_t ra_gfp_mask = readahead_gfp_mask(mapping) & ~__GFP_FS;
 
err = ubifs_tnc_get_bu_keys(c, bu);
if (err)
@@ -796,8 +797,7 @@ static int ubifs_do_bulk_read(struct ubifs_info *c, struct 
bu_info *bu,
 
if (page_offset > end_index)
break;
-   page = find_or_create_page(mapping, page_offset,
-  GFP_NOFS | __GFP_COLD);
+   page = find_or_create_page(mapping, page_offset, ra_gfp_mask);
if (!page)
break;
if (!PageUptodate(page))
-- 
1.9.1



[PATCH v2] ubifs: Change gfp flags in page allocation for bulk read

2017-06-13 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

In low memory situations, page allocations for bulk read
can kill applications to reclaim memory, and print a
failure message when an allocation fails.
Because bulk read is just an optimization, we don't have
to do either and can simply stop allocating pages.

Though this situation happens rarely, add __GFP_NORETRY
to prevent excessive memory reclaim and the killing of
applications, and __GFP_NOWARN to suppress the failure
message.

To do this, use readahead_gfp_mask for the gfp flags when
allocating pages.

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
v2:
* rewrite the commit message to explain why this
 patch is needed.
---
 fs/ubifs/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index d9ae86f..4396c04 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -735,6 +735,7 @@ static int ubifs_do_bulk_read(struct ubifs_info *c, struct 
bu_info *bu,
int err, page_idx, page_cnt, ret = 0, n = 0;
int allocate = bu->buf ? 0 : 1;
loff_t isize;
+   gfp_t ra_gfp_mask = readahead_gfp_mask(mapping) & ~__GFP_FS;
 
err = ubifs_tnc_get_bu_keys(c, bu);
if (err)
@@ -796,8 +797,7 @@ static int ubifs_do_bulk_read(struct ubifs_info *c, struct 
bu_info *bu,
 
if (page_offset > end_index)
break;
-   page = find_or_create_page(mapping, page_offset,
-  GFP_NOFS | __GFP_COLD);
+   page = find_or_create_page(mapping, page_offset, ra_gfp_mask);
if (!page)
break;
if (!PageUptodate(page))
-- 
1.9.1



[PATCH] ubifs: Change gfp flags in page allocation for bulk read

2017-06-13 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

Use readahead_gfp_mask for the gfp flags when allocating pages.
This sets the additional flags __GFP_NORETRY and
__GFP_NOWARN, so OOM kills and a failure message can be
avoided.
We should also remove __GFP_FS from the flags to prevent
ubifs_writepage from being called during page reclaim.

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 fs/ubifs/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index d9ae86f..4396c04 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -735,6 +735,7 @@ static int ubifs_do_bulk_read(struct ubifs_info *c, struct 
bu_info *bu,
int err, page_idx, page_cnt, ret = 0, n = 0;
int allocate = bu->buf ? 0 : 1;
loff_t isize;
+   gfp_t ra_gfp_mask = readahead_gfp_mask(mapping) & ~__GFP_FS;
 
err = ubifs_tnc_get_bu_keys(c, bu);
if (err)
@@ -796,8 +797,7 @@ static int ubifs_do_bulk_read(struct ubifs_info *c, struct 
bu_info *bu,
 
if (page_offset > end_index)
break;
-   page = find_or_create_page(mapping, page_offset,
-  GFP_NOFS | __GFP_COLD);
+   page = find_or_create_page(mapping, page_offset, ra_gfp_mask);
if (!page)
break;
if (!PageUptodate(page))
-- 
1.9.1



Re: [PATCH] ubifs: Add freeze support

2017-05-29 Thread Hyunchul Lee
On Mon, May 29, 2017 at 10:42:37AM +0200, Richard Weinberger wrote:
> Hyunchul,
> 
> Am 29.05.2017 um 04:24 schrieb Hyunchul Lee:
> >>> This is just broken.  First ubifs should still properly propagate
> >>> the errors, and second freezing/unfreezing read only file systems is
> >>> perfectly valid, 
> >>
> >> it is right.
> > 
> > If updating the TNC fails, ubifs might become inconsistent and be switched
> > to read-only mode. For example, when ubifs_jnl_update is called to create
> > a file, if inserting a znode for the new inode fails, the TNC has only a
> > znode for the new dentry, and this can only be recovered by replay.
> > 
> > is it required to fix this?
> 
> UBIFS is designed to be power-cut tolerant.
> So, UBIFS must not corrupt in any case.
> 
> Which failure are you facing?
> 
> I have the feeling that you try to paper over some other issue. :-)

The failure hasn't actually happened. I was wondering whether the following
situation should be handled.

ubifs_create
  ubifs_jnl_update
    write_head
    ubifs_tnc_add_nm  /* (1) add dentry to TNC */
    ubifs_tnc_add     /* (2) add new inode to TNC */
    ubifs_tnc_add     /* (3) add parent inode to TNC */

If ubifs_tnc_add (2) fails, the TNC would have the index of a dentry
which points to an invalid inode. So, although ubifs_readdir
emits the dentry, the inode cannot be accessed, because
the index of the inode does not exist.

I know this situation is very unlikely. But UBIFS would
be read-only and inconsistent in this situation until replay
is completed.
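
To make the concern concrete, here is a toy model in C. It is not UBIFS
code: the index is a flat array, and tnc_add(), create_file() and
has_dangling_dentry() are hypothetical helpers with an injectable failure.
It only shows that if step (2) fails after step (1) succeeded, the index
holds a dentry key without the matching inode key.

```c
#include <stdbool.h>

#define MAX_KEYS 8

enum key_type { KEY_DENTRY, KEY_INODE };

struct key {
	enum key_type type;
	int inum;	/* inode number the key refers to */
};

/* Toy index: a flat array instead of the real tree. */
struct index {
	struct key keys[MAX_KEYS];
	int count;
	int fail_at;	/* 1-based call number that should fail, 0 for none */
	int calls;
};

/* Hypothetical insert helper with an injectable failure. */
static int tnc_add(struct index *idx, enum key_type type, int inum)
{
	idx->calls += 1;
	if (idx->calls == idx->fail_at || idx->count == MAX_KEYS)
		return -1;
	idx->keys[idx->count].type = type;
	idx->keys[idx->count].inum = inum;
	idx->count += 1;
	return 0;
}

/* Mirrors the three index updates in the call sequence above. */
static int create_file(struct index *idx, int parent_inum, int new_inum)
{
	if (tnc_add(idx, KEY_DENTRY, new_inum))		/* (1) dentry */
		return -1;
	if (tnc_add(idx, KEY_INODE, new_inum))		/* (2) new inode */
		return -1;
	return tnc_add(idx, KEY_INODE, parent_inum);	/* (3) parent inode */
}

/* True if some dentry key has no inode key with the same number. */
static bool has_dangling_dentry(const struct index *idx)
{
	for (int i = 0; i < idx->count; i++) {
		bool found = false;

		if (idx->keys[i].type != KEY_DENTRY)
			continue;
		for (int j = 0; j < idx->count; j++)
			if (idx->keys[j].type == KEY_INODE &&
			    idx->keys[j].inum == idx->keys[i].inum)
				found = true;
		if (!found)
			return true;
	}
	return false;
}
```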

> 
> Thanks,
> //richard

-- 

Thanks,
Hyunchul


Re: [PATCH] ubifs: Add freeze support

2017-05-28 Thread Hyunchul Lee

And I missed the following use case.

In some embedded systems, clean-up for shutdown should be fast.
During this clean-up, the file system is frozen to guarantee integrity.
umount with MNT_DETACH is not suitable because tasks are not killed.

On Mon, May 29, 2017 at 10:18:34AM +0900, Hyunchul Lee wrote:
> Hi, Richard.
> 
> On Fri, May 26, 2017 at 11:52:42AM +0200, Richard Weinberger wrote:
> > Hyunchul,
> > 
> > Am 26.05.2017 um 01:30 schrieb Hyunchul Lee:
> > > From: Hyunchul Lee <cheol@lge.com>
> > > 
> > > for un/freeze support, implement freeze_super and un/freeze_fs
> > > of super_operations.
> > > ubifs_freeze_super just calls freeze_super. because freeze_super always
> > > succeeds if file system is read-only,  UBIFS errors should be checked.
> > > if there are errors, UBIFS is switched to read-only mode.
> > > ubifs_freeze_fs runs commit if TNC/LPT isn't clean. though all writes
> > > > are blocked and sync_fs is called before, if commit already was started
> > > before writes are blocked, TNC/LPT might have dirty COW nodes.
> > 
> > you explain how you implement that feature, but not why.
> > What is the use-case?
> > I always thought this interface is only being used by LVM.
> 
> Sorry, I forgot this. I implemented this to make a backup of some files, and
> to support the fsfreeze utility and SysRq's freeze/thaw command.
> 
> > 
> > Thanks,
> > //richard
> > 
> > __
> > Linux MTD discussion mailing list
> > http://lists.infradead.org/mailman/listinfo/linux-mtd/
> 
> -- 
> 
> Thanks,
> Hyunchul
> 
> __
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/

-- 

Thanks,
Hyunchul


Re: [PATCH] ubifs: Add freeze support

2017-05-28 Thread Hyunchul Lee
Hi, Richard.

On Mon, May 29, 2017 at 09:43:46AM +0900, Hyunchul Lee wrote:
> On Sat, May 27, 2017 at 01:23:38AM -0700, Christoph Hellwig wrote:
> > > +static int ubifs_freeze_super(struct super_block *sb)
> > > +{
> > > + struct ubifs_info *c = sb->s_fs_info;
> > > + int err;
> > > +
> > > + dbg_gen("starting");
> > > + /* freeze_super always succeeds if file system is in read-only.
> > > +  * however if there are errors, UBIFS is switched to read-only mode.
> > > +  * so @ro_error should be checked.
> > > +  */
> > > + err = freeze_super(sb);
> > > + if (!err && c->ro_error) {
> > > + thaw_super(sb);
> > > + return -EIO;
> > > + }
> > > + return err;
> > 
> > This is just broken.  First ubifs should still properly propagate
> > the errors, and second freezing/unfreezing read only file systems is
> > perfectly valid, 
> 
> it is right.

If updating the TNC fails, ubifs might become inconsistent and be switched to
read-only mode. For example, when ubifs_jnl_update is called to create a file,
if inserting a znode for the new inode fails, the TNC has only a znode for the
new dentry, and this can only be recovered by replay.

Is it required to fix this?

> 
> > and third the freeze_super method is a special
> > hack for gfs2 that should not gain additional users.
> 
> I thought that it was OK, because commit 48b6bca says "every filesystem
> that implements this hooks must call the vfs freeze_super ..."
> 
> Thank you for the comment.
> > 
> > __
> > Linux MTD discussion mailing list
> > http://lists.infradead.org/mailman/listinfo/linux-mtd/
> 
> -- 
> 
> Thanks,
> Hyunchul

-- 

Thanks,
Hyunchul


Re: [PATCH] ubifs: Add freeze support

2017-05-28 Thread Hyunchul Lee
Hi, Richard.

On Fri, May 26, 2017 at 11:52:42AM +0200, Richard Weinberger wrote:
> Hyunchul,
> 
> Am 26.05.2017 um 01:30 schrieb Hyunchul Lee:
> > From: Hyunchul Lee <cheol@lge.com>
> > 
> > for un/freeze support, implement freeze_super and un/freeze_fs
> > of super_operations.
> > ubifs_freeze_super just calls freeze_super. because freeze_super always
> > succeeds if file system is read-only,  UBIFS errors should be checked.
> > if there are errors, UBIFS is switched to read-only mode.
> > ubifs_freeze_fs runs commit if TNC/LPT isn't clean. though all writes
> > are blocked and sync_fs is called before, if commit already was started
> > before writes are blocked, TNC/LPT might have dirty COW nodes.
> 
> you explain how you implement that feature, but not why.
> What is the use-case?
> I always thought this interface is only being used by LVM.

Sorry, I forgot this. I implemented this to make a backup of some files, and
to support the fsfreeze utility and SysRq's freeze/thaw command.
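
For reference, a backup tool could drive freeze/thaw roughly as follows.
This is a user-space sketch, assuming Linux and the FIFREEZE and FITHAW
ioctls from <linux/fs.h>, which is what the fsfreeze utility uses.
freeze_then_thaw() and its work callback are hypothetical names, and the
caller needs sufficient privileges.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FIFREEZE, FITHAW */

/*
 * Freeze the file system mounted at mount_point, run work() while it
 * is frozen, then thaw it again. Returns 0 on success and -1 on
 * failure.
 */
static int freeze_then_thaw(const char *mount_point, int (*work)(void))
{
	int ret = -1;
	int fd = open(mount_point, O_RDONLY);

	if (fd < 0)
		return -1;

	if (ioctl(fd, FIFREEZE, 0) == 0) {
		/* Writes are blocked here; e.g. copy files for a backup. */
		ret = work ? work() : 0;
		if (ioctl(fd, FITHAW, 0) != 0)
			ret = -1;
	}

	close(fd);
	return ret;
}
```

On a file system without freeze support the FIFREEZE call is expected to
fail, and the function then returns -1 without running the callback.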

> 
> Thanks,
> //richard
> 
> __
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/

-- 

Thanks,
Hyunchul


Re: [PATCH] ubifs: Add freeze support

2017-05-28 Thread Hyunchul Lee
On Sat, May 27, 2017 at 01:23:38AM -0700, Christoph Hellwig wrote:
> > +static int ubifs_freeze_super(struct super_block *sb)
> > +{
> > +	struct ubifs_info *c = sb->s_fs_info;
> > +	int err;
> > +
> > +	dbg_gen("starting");
> > +	/* freeze_super always succeeds if file system is in read-only.
> > +	 * however if there are errors, UBIFS is switched to read-only mode.
> > +	 * so @ro_error should be checked.
> > +	 */
> > +	err = freeze_super(sb);
> > +	if (!err && c->ro_error) {
> > +		thaw_super(sb);
> > +		return -EIO;
> > +	}
> > +	return err;
> 
> This is just broken.  First ubifs should still properly propagate
> the errors, and second freezing/unfreezing read only file systems is
> perfectly valid, 

You are right.

> and third the freeze_super method is a special
> hack for gfs2 that should not gain additional users.

I thought it was OK, because commit 48b6bca says "every filesystem
that implements this hooks must call the vfs freeze_super ..."

Thank you for the comment.

-- 

Thanks,
Hyunchul


[PATCH] ubifs: Add freeze support

2017-05-25 Thread Hyunchul Lee
From: Hyunchul Lee <cheol@lge.com>

For freeze/unfreeze support, implement the freeze_super, freeze_fs and
unfreeze_fs methods of super_operations.
ubifs_freeze_super just calls freeze_super. Because freeze_super always
succeeds if the file system is read-only, UBIFS errors must be checked:
if there are errors, UBIFS is switched to read-only mode.
ubifs_freeze_fs runs a commit if TNC/LPT isn't clean. Although all writes
are blocked and sync_fs has been called by then, if a commit had already
started before writes were blocked, TNC/LPT might still have dirty COW nodes.

Signed-off-by: Hyunchul Lee <cheol@lge.com>
---
 fs/ubifs/commit.c |  6 +++---
 fs/ubifs/super.c  | 63 +++
 fs/ubifs/ubifs.h  |  1 +
 3 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
index 63f5661..ab347ff 100644
--- a/fs/ubifs/commit.c
+++ b/fs/ubifs/commit.c
@@ -49,7 +49,7 @@
 #include "ubifs.h"
 
 /*
- * nothing_to_commit - check if there is nothing to commit.
+ * ubifs_nothing_to_commit - check if there is nothing to commit.
  * @c: UBIFS file-system description object
  *
  * This is a helper function which checks if there is anything to commit. It is
@@ -65,7 +65,7 @@
  *
  * This function returns %1 if there is nothing to commit and %0 otherwise.
  */
-static int nothing_to_commit(struct ubifs_info *c)
+int ubifs_nothing_to_commit(struct ubifs_info *c)
 {
 	/*
 	 * During mounting or remounting from R/O mode to R/W mode we may
@@ -120,7 +120,7 @@ static int do_commit(struct ubifs_info *c)
 		goto out_up;
 	}
 
-	if (nothing_to_commit(c)) {
+	if (ubifs_nothing_to_commit(c)) {
 		up_write(&c->commit_sem);
 		err = 0;
 		goto out_cancel;
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index b73811b..16fc22c 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -356,6 +356,7 @@ static void ubifs_evict_inode(struct inode *inode)
 	if (inode->i_nlink)
 		goto done;
 
+	sb_start_intwrite(inode->i_sb);
 	if (is_bad_inode(inode))
 		goto out;
 
@@ -377,6 +378,7 @@ static void ubifs_evict_inode(struct inode *inode)
 		c->bi.nospace = c->bi.nospace_rp = 0;
 		smp_wmb();
 	}
+	sb_end_intwrite(inode->i_sb);
 done:
 	clear_inode(inode);
 #ifdef CONFIG_UBIFS_FS_ENCRYPTION
@@ -486,6 +488,64 @@ static int ubifs_sync_fs(struct super_block *sb, int wait)
 	return ubi_sync(c->vi.ubi_num);
 }
 
+static int ubifs_freeze_super(struct super_block *sb)
+{
+	struct ubifs_info *c = sb->s_fs_info;
+	int err;
+
+	dbg_gen("starting");
+	/* freeze_super always succeeds if file system is in read-only.
+	 * however if there are errors, UBIFS is switched to read-only mode.
+	 * so @ro_error should be checked.
+	 */
+	err = freeze_super(sb);
+	if (!err && c->ro_error) {
+		thaw_super(sb);
+		return -EIO;
+	}
+	return err;
+}
+
+static int ubifs_freeze(struct super_block *sb)
+{
+	struct ubifs_info *c = sb->s_fs_info;
+	int ret;
+
+	if (c->ro_error)
+		return -EIO;
+
+	if (c->ro_mount)
+		return 0;
+
+	down_write(&c->commit_sem);
+	ret = ubifs_nothing_to_commit(c);
+	up_write(&c->commit_sem);
+
+	/* writes were blocked and ubifs_sync_fs was called before.
+	 * but TNC/LPT isn't guaranteed to be clean. because if commit was
+	 * already started before writes were blocked, TNC/LPT might have
+	 * COW nodes. so we try to commit again in this case.
+	 */
+	if (!ret) {
+		ret = ubifs_run_commit(c);
+		if (ret)
+			return ret;
+
+		down_write(&c->commit_sem);
+		ret = ubifs_nothing_to_commit(c);
+		up_write(&c->commit_sem);
+		if (!ret)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ubifs_unfreeze(struct super_block *sb)
+{
+	return 0;
+}
+
+
 /**
  * init_constants_early - initialize UBIFS constants.
  * @c: UBIFS file-system description object
@@ -1889,6 +1949,9 @@ static int ubifs_remount_fs(struct super_block *sb, int *flags, char *data)
 	.remount_fs    = ubifs_remount_fs,
 	.show_options  = ubifs_show_options,
 	.sync_fs       = ubifs_sync_fs,
+	.freeze_super  = ubifs_freeze_super,
+	.freeze_fs     = ubifs_freeze,
+	.unfreeze_fs   = ubifs_unfreeze,
 };
 
 /**
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index abdd116..545796e 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -1645,6 +1645,7 @@ unsigned long ubifs_shrink_count(struct shrinker *shrink,
 void ubifs_recovery_commit(struct ubifs_info *c);
 int ubifs_gc_should_commit(struct ubifs_info *c);
 void ubifs_wait_for_commit(s
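
The control flow of ubifs_freeze() in the patch above (check, commit once if
needed, then check again) can be tried out in user space with a small model.
This is only a sketch under simplifying assumptions: struct model_fs and the
model_* functions are hypothetical stand-ins, there is no locking, and a
"commit" simply clears a counter of dirty nodes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ubifs_info; only what the sketch needs. */
struct model_fs {
	bool ro_error;		/* forced read-only after an error */
	bool ro_mount;		/* mounted read-only */
	int dirty_nodes;	/* dirty TNC/LPT nodes, e.g. left by COW */
	bool commit_fails;	/* make the modelled commit fail */
};

bool model_nothing_to_commit(const struct model_fs *fs)
{
	return fs->dirty_nodes == 0;
}

int model_run_commit(struct model_fs *fs)
{
	if (fs->commit_fails)
		return -EIO;
	fs->dirty_nodes = 0;
	return 0;
}

/* Mirrors the order of checks in ubifs_freeze() from the patch. */
int model_freeze(struct model_fs *fs)
{
	int err;

	assert(fs != NULL);

	if (fs->ro_error)
		return -EIO;
	if (fs->ro_mount)
		return 0;
	if (model_nothing_to_commit(fs))
		return 0;

	/* Something is still dirty: commit once more, then re-check. */
	err = model_run_commit(fs);
	if (err)
		return err;

	return model_nothing_to_commit(fs) ? 0 : -EINVAL;
}
```

In this model a file system with leftover dirty nodes freezes successfully
after one extra commit, while one with ro_error set is refused with -EIO
before anything else is looked at.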

Re: [PATCH 4/6] ubifs: Maintain a parent pointer

2017-05-22 Thread Hyunchul Lee
Hi Richard,

On Mon, May 22, 2017 at 10:45:08AM +0200, Richard Weinberger wrote:
> Hyunchul,
> 
> Am 22.05.2017 um 06:30 schrieb Hyunchul Lee:
> >> +  if (move)
> >> +  old_inode_ui->parent_inum = new_dir->i_ino;
> >> +
> >>err = ubifs_jnl_rename(c, old_dir, old_inode, &old_nm, new_dir,
> >>   new_inode, &new_nm, whiteout, sync);
> > 
> > I think old_inode_ui->parent_inum could still point to old_dir even though
> > old_inode is now a child of new_dir. This can happen if there is a power cut
> > before old_inode is synced. So I guess old_inode needs to be written as part
> > of the rename's node group in ubifs_jnl_rename. Is that right?
> 
> I assumed that the journal does this already because we change
> old_inode->i_ctime in this function too.
> But checking the code showed the opposite.
> So, if we face a power-cut the rename can succeed but we lose the ctime
> change.
> 
> This needs to be addressed before we can add the parent pointer.

Is writing old_inode->i_ctime required? I guess it is needed only when
IS_SYNC(old_inode) is true; otherwise we don't need to guarantee that ctime
is synced.

> 
> Thanks,
> //richard

-- 

Thanks,
Hyunchul


Re: [PATCH 4/6] ubifs: Maintain a parent pointer

2017-05-21 Thread Hyunchul Lee
Hi Richard,

On Sun, May 21, 2017 at 10:20:49PM +0200, Richard Weinberger wrote:
> The new feature UBIFS_FLG_PARENTPOINTER allows looking
> up the parent. Usually the Linux VFS walks down the filesystem
> and no parent pointers are needed. But when a filesystem
> is exportable via NFS such a lookup is needed.
> We can use a padding field in struct ubifs_ino_node to
> maintain a pointer to the parent inode.
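
To make the use of such a pointer concrete: with a parent inode number stored
in every inode, code that only has an inode in hand (as an NFS server can have
after decoding a file handle) can walk upwards to the root. The sketch below
is a user-space toy, not UBIFS code; struct model_inode, MODEL_ROOT_INO and
the flat lookup table are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ROOT_INO 1u	/* hypothetical root inode number */

struct model_inode {
	uint32_t ino;
	uint32_t parent_inum;	/* the root points to itself */
};

const struct model_inode *model_find(const struct model_inode *tbl,
				     size_t n, uint32_t ino)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].ino == ino)
			return &tbl[i];
	return NULL;
}

/*
 * Count the levels between @ino and the root by following parent
 * pointers. Returns -1 if an inode is missing or the chain does not
 * reach the root within @n steps (which would indicate a cycle).
 */
int model_depth(const struct model_inode *tbl, size_t n, uint32_t ino)
{
	int depth = 0;

	assert(tbl != NULL || n == 0);

	for (size_t steps = 0; steps <= n; steps++) {
		const struct model_inode *in = model_find(tbl, n, ino);

		if (!in)
			return -1;
		if (in->ino == MODEL_ROOT_INO)
			return depth;
		ino = in->parent_inum;
		depth++;
	}
	return -1;
}
```

The walk is only as good as the stored pointers, which is why keeping
parent_inum consistent across rename matters.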
> 
> Signed-off-by: Richard Weinberger 
> ---
>  fs/ubifs/dir.c | 21 +++--
>  fs/ubifs/journal.c |  5 -
>  fs/ubifs/sb.c  |  2 ++
>  fs/ubifs/super.c   |  1 +
>  fs/ubifs/ubifs-media.h | 12 +---
>  fs/ubifs/ubifs.h   |  4 
>  6 files changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
> index e79b529df9c3..a6eadb52a1a8 100644
> --- a/fs/ubifs/dir.c
> +++ b/fs/ubifs/dir.c
> @@ -171,6 +171,7 @@ struct inode *ubifs_new_inode(struct ubifs_info *c, struct inode *dir,
>   }
>  
>   inode->i_ino = ++c->highest_inum;
> + ui->parent_inum = dir->i_ino;
>   /*
>* The creation sequence number remains with this inode for its
>* lifetime. All nodes for this inode have a greater sequence number,
> @@ -1374,7 +1375,7 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
>   if (unlink)
>   ubifs_assert(inode_is_locked(new_inode));
>  
> - if (old_dir != new_dir) {
> + if (move) {
>   if (ubifs_crypt_is_encrypted(new_dir) &&
>   !fscrypt_has_permitted_context(new_dir, old_inode))
>   return -EPERM;
> @@ -1528,8 +1529,12 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
>   mark_inode_dirty(whiteout);
>   whiteout->i_state &= ~I_LINKABLE;
>   iput(whiteout);
> + whiteout_ui->parent_inum = new_dir->i_ino;
>   }
>  
> + if (move)
> + old_inode_ui->parent_inum = new_dir->i_ino;
> +
>   err = ubifs_jnl_rename(c, old_dir, old_inode, &old_nm, new_dir,
>  new_inode, &new_nm, whiteout, sync);

I think old_inode_ui->parent_inum could still point to old_dir even though
old_inode is now a child of new_dir. This can happen if there is a power cut
before old_inode is synced. So I guess old_inode needs to be written as part
of the rename's node group in ubifs_jnl_rename. Is that right?
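
To illustrate the concern, here is a toy user-space model of a journal whose
records are grouped, where replay after a power cut drops any group that was
not completely written. It is only a sketch: the record types, struct state
and replay() are invented for the example and do not reflect the real UBIFS
on-flash format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum rec_type { REC_DENT_MOVE, REC_INODE_PARENT };

struct rec {
	enum rec_type type;
	uint32_t value;		/* new directory inum for both record types */
	bool last_of_group;	/* this record closes an atomic group */
};

struct state {
	uint32_t dir_of_dentry;	/* directory that currently holds the name */
	uint32_t parent_inum;	/* parent pointer stored in the inode */
};

/*
 * Replay the first @written records of @log, as if power was cut right
 * after them. Records of a group whose closing record never reached the
 * log are discarded.
 */
struct state replay(const struct rec *log, size_t written, struct state st)
{
	size_t start = 0;

	assert(log != NULL || written == 0);

	for (size_t i = 0; i < written; i++) {
		if (!log[i].last_of_group)
			continue;
		/* Group [start, i] is complete: apply all of it. */
		for (size_t j = start; j <= i; j++) {
			if (log[j].type == REC_DENT_MOVE)
				st.dir_of_dentry = log[j].value;
			else
				st.parent_inum = log[j].value;
		}
		start = i + 1;
	}
	return st;
}
```

If the inode record is written later in a group of its own, a power cut
between the two groups leaves the name in the new directory while parent_inum
still names the old one. With both records in one group, replay yields either
the old state or the new state, never a mix.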

>   if (err)
> @@ -1571,6 +1576,8 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
>   inc_nlink(old_dir);
>   }
>   }
> + if (move)
> + old_inode_ui->parent_inum = old_dir->i_ino;
>   if (whiteout) {
>   drop_nlink(whiteout);
>   iput(whiteout);
> @@ -1592,6 +1599,8 @@ static int ubifs_xrename(struct inode *old_dir, struct dentry *old_dentry,
>   int sync = IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir);
>   struct inode *fst_inode = d_inode(old_dentry);
>   struct inode *snd_inode = d_inode(new_dentry);
> + struct ubifs_inode *fst_inode_ui = ubifs_inode(fst_inode);
> + struct ubifs_inode *snd_inode_ui = ubifs_inode(snd_inode);
>   struct timespec time;
>   int err;
>   struct fscrypt_name fst_nm, snd_nm;
> @@ -1623,7 +1632,10 @@ static int ubifs_xrename(struct inode *old_dir, struct dentry *old_dentry,
>   old_dir->i_mtime = old_dir->i_ctime = time;
>   new_dir->i_mtime = new_dir->i_ctime = time;
>  
> - if (old_dir != new_dir) {
> + if (new_dir != old_dir) {
> + fst_inode_ui->parent_inum = new_dir->i_ino;
> + snd_inode_ui->parent_inum = old_dir->i_ino;
> +
>   if (S_ISDIR(fst_inode->i_mode) && !S_ISDIR(snd_inode->i_mode)) {
>   inc_nlink(new_dir);
>   drop_nlink(old_dir);
> @@ -1637,6 +1649,11 @@ static int ubifs_xrename(struct inode *old_dir, struct dentry *old_dentry,
>   err = ubifs_jnl_xrename(c, old_dir, fst_inode, &fst_nm, new_dir,
>   snd_inode, &snd_nm, sync);
>  
> + if (err && new_dir != old_dir) {
> + fst_inode_ui->parent_inum = old_dir->i_ino;
> + snd_inode_ui->parent_inum = new_dir->i_ino;
> + }
> +
>   unlock_4_inodes(old_dir, new_dir, NULL, NULL);
>   ubifs_release_budget(c, &req);
>  
> diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
> index 294519b98874..8eaf8f2f1fe1 100644
> --- a/fs/ubifs/journal.c
> +++ b/fs/ubifs/journal.c
> @@ -66,7 +66,6 @@
>   */
>  static inline void zero_ino_node_unused(struct ubifs_ino_node *ino)
>  {
> - memset(ino->padding1, 0, 4);
>   memset(ino->padding2, 0, 26);
>  }
>  
> @@ -470,6 +469,10 @@ static void pack_inode(struct ubifs_info *c, struct ubifs_ino_node *ino,
>   ino->xattr_cnt   = cpu_to_le32(ui->xattr_cnt);
>   ino->xattr_size  = cpu_to_le32(ui->xattr_size);
>   ino->xattr_names = cpu_to_le32(ui->xattr_names);
> 
