Re: [PATCH 2/2] f2fs: support data compression

2019-10-22 Thread Eric Biggers
On Tue, Oct 22, 2019 at 10:16:02AM -0700, Jaegeuk Kim wrote:
> From: Chao Yu 
> 
> This patch tries to support compression in f2fs.
> 
> - New term named cluster is defined as basic unit of compression, file can
> be divided into multiple clusters logically. One cluster includes 4 << n
> (n >= 0) logical pages, compression size is also cluster size, each of
> cluster can be compressed or not.
> 
> - In cluster metadata layout, one special flag is used to indicate cluster
> is compressed one or normal one, for compressed cluster, following metadata
> maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
> data including compress header and compressed data.
> 
> - In order to eliminate write amplification during overwrite, F2FS only
> support compression on write-once file, data can be compressed only when
> all logical blocks in file are valid and cluster compress ratio is lower
> than specified threshold.
> 
> - To enable compression on regular inode, there are three ways:
> * chattr +c file
> * chattr +c dir; touch dir/file
> * mount w/ -o compress_extension=ext; touch file.ext
> 
> Compress metadata layout:
>  [Dnode Structure]
>  +---+
>  | cluster 1 | cluster 2 | . | cluster N |
>  +---+
>  .   .   .   .
>.   ..  .
>   . Compressed Cluster   ..Normal Cluster 
>.
> +--+-+-+-+  
> +-+-+-+-+
> |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | 
> block 4 |
> +--+-+-+-+  
> +-+-+-+-+
>. .
>  .   .
>.   .
>   +-+-+--++
>   | data length | data chksum | reserved |  compressed data   |
>   +-+-+--++
> 
> Changelog:
> 
> 20190326:
> - fix error handling of read_end_io().
> - remove unneeded comments in f2fs_encrypt_one_page().
> 
> 20190327:
> - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
> - don't jump into loop directly to avoid uninitialized variables.
> - add TODO tag in error path of f2fs_write_cache_pages().
> 
> 20190328:
> - fix wrong merge condition in f2fs_read_multi_pages().
> - check compressed file in f2fs_post_read_required().
> 
> 20190401
> - allow overwrite on non-compressed cluster.
> - check cluster meta before writing compressed data.
> 
> 20190402
> - don't preallocate blocks for compressed file.
> 
> - add lz4 compress algorithm
> - process multiple post read works in one workqueue
>   Now f2fs supports processing post read work in multiple workqueue,
>   it shows low performance due to schedule overhead of multiple
>   workqueue executing orderly.
> 
> - compress: support buffered overwrite
> C: compress cluster flag
> V: valid block address
> N: NEW_ADDR
> 
> One cluster contain 4 blocks
> 
>  before overwrite   after overwrite
> 
> - ->  CVNN
> - CVNN->  
> 
> - CVNN->  CVNN
> - CVNN->  CVVV
> 
> - CVVV->  CVNN
> - CVVV->  CVVV
> 
> [Jaegeuk Kim]
> - add tracepoint for f2fs_{,de}compress_pages()
> - fix many bugs and add some compression stats
> 
> Signed-off-by: Chao Yu 
> Signed-off-by: Jaegeuk Kim 

How was this tested?  Shouldn't there a mount option analogous to
test_dummy_encryption that causes all files to be auto-compressed, so that a
full run of xfstests can be done with compression?  I see "compress_extension",
but apparently it's only for a file extension?  Also, since reads can involve
any combination of decryption, compression, and verity, it's important to test
as many combinations as possible, including all at once.  Has that been done?

I also tried running the fs-verity xfstests on this with
'kvm-xfstests -c f2fs -g verity', but the kernel immediately crashes:

BUG: kernel NULL pointer dereference, address: 0182
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 0 P4D 0 
Oops:  [#1] SMP
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-rc1-00119-g60f351f4c50f #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
?-20191013_105130-anatol 04/01/2014
RIP: 0010:__queue_work+0x3e/0x5f0 kernel/workqueue.c:1409
Code: d4 53 48 83 ec 18 89 7d d4 8b 3d c1 bf 2a 01 85 ff 74 17 65 48 8b 04 25 
80 5d 01 00 8b b0 0c 07 00 00 85 f6 0f 84 1
RSP: 0018:c90a8db0 EFLAGS: 00010046
RAX: 88807d94e340 RBX: 00

Re: [f2fs-dev] [PATCH v5 3/8] libf2fs_zoned: Introduce f2fs_reset_zone() helper function

2019-10-22 Thread Shinichiro Kawasaki
On Oct 22, 2019 / 17:24, Chao Yu wrote:
> On 2019/10/22 17:10, Damien Le Moal wrote:
> > On 2019/10/22 17:59, Chao Yu wrote:
> >> On 2019/10/18 14:37, Shin'ichiro Kawasaki wrote:
> >>> To prepare for write pointer consistency fix by fsck, add
> >>> f2fs_reset_zone() helper function which calls RESET ZONE command. The
> >>> function is added to lib/libf2fs_zoned which gathers zoned block device
> >>> related functions.
> >>>
> >>> Signed-off-by: Shin'ichiro Kawasaki 
> >>> ---
> >>>  include/f2fs_fs.h   |  1 +
> >>>  lib/libf2fs_zoned.c | 26 ++
> >>>  2 files changed, 27 insertions(+)
> >>>
> >>> diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
> >>> index 1f7ef05..a36927b 100644
> >>> --- a/include/f2fs_fs.h
> >>> +++ b/include/f2fs_fs.h
> >>> @@ -1303,6 +1303,7 @@ extern int f2fs_report_zone(int, u_int64_t, void *);
> >>>  typedef int (report_zones_cb_t)(int i, void *, void *);
> >>>  extern int f2fs_report_zones(int, report_zones_cb_t *, void *);
> >>>  extern int f2fs_check_zones(int);
> >>> +int f2fs_reset_zone(int, void *);
> >>>  extern int f2fs_reset_zones(int);
> >>>  
> >>>  #define SIZE_ALIGN(val, size)((val) + (size) - 1) / (size)
> >>> diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
> >>> index 10d6d0b..1335038 100644
> >>> --- a/lib/libf2fs_zoned.c
> >>> +++ b/lib/libf2fs_zoned.c
> >>> @@ -388,6 +388,26 @@ out:
> >>>   return ret;
> >>>  }
> >>>  
> >>> +int f2fs_reset_zone(int i, void *blkzone)
> >>> +{
> >>> + struct blk_zone *blkz = (struct blk_zone *)blkzone;
> >>> + struct device_info *dev = c.devices + i;
> >>> + struct blk_zone_range range;
> >>> + int ret;
> >>> +
> >>> + if (!blk_zone_seq(blkz) || blk_zone_empty(blkz))
> >>> + return 0;
> >>> +
> >>> + /* Non empty sequential zone: reset */
> >>> + range.sector = blk_zone_sector(blkz);
> >>> + range.nr_sectors = blk_zone_length(blkz);
> >>> + ret = ioctl(dev->fd, BLKRESETZONE, &range);
> >>> + if (ret != 0)
> >>
> >> As you did in other zoned block device code, errno would be preferred as 
> >> return
> >> value?

Yes, should return -errno. This made me think that it's the better to print
errno in ERR_MSG below. Will reflect these changes in the next version.

> >>
> >>> + ERR_MSG("ioctl BLKRESETZONE failed\n");
> >>> +
> >>> + return ret;
> >>> +}
> >>> +
> >>>  int f2fs_reset_zones(int j)
> >>>  {
> >>>   struct device_info *dev = c.devices + j;
> >>> @@ -491,6 +511,12 @@ int f2fs_check_zones(int i)
> >>>   return -1;
> >>>  }
> >>>  
> >>> +int f2fs_reset_zone(int i, void *blkzone)
> >>> +{
> >>> + ERR_MSG("%d: Zoned block devices are not supported\n", i);
> >>
> >> Minor thing:
> >>
> >> "device is"?
> > 
> > ERR_MSG("%d: Unsupported zoned block device\n", i);
> > 
> > may be better.
> 
> Looks better.

Thanks. Will reflect in the next version. Same change will be applied to
1st and 2nd patches for f2fs_report_zones() and f2fs_report_zone().

> 
> > 
> > Note that we should never hit this anyway since the validity of the set
> > of devices used should have been checked way before we start resetting
> > zones.
> 
> Yup.
> 
> Thanks,
> 
> > 
> >>
> >>> + return -1;
> >>> +}
> >>> +
> >>>  int f2fs_reset_zones(int i)
> >>>  {
> >>>   ERR_MSG("%d: Zoned block devices are not supported\n", i);
> >>
> >> "device is"?
> > 
> > Same as above.

Not only f2fs_reset_zones() but also f2fs_check_zones() have the same
ERR_MSG string. I will replace the message string of these functions
with suggested one in this 3rd patch.

Thanks!

--
Best Regards,
Shin'ichiro Kawasaki

___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v14 1/5] Add flags option to get xattr method paired to __vfs_getxattr

2019-10-22 Thread Mark Salyzyn via Linux-f2fs-devel
Replace arguments for get and set xattr methods, and __vfs_getxattr
and __vfs_setaxtr functions with a reference to the following now
common argument structure:

struct xattr_gs_args {
struct dentry *dentry;
struct inode *inode;
const char *name;
union {
void *buffer;
const void *value;
};
size_t size;
int flags;
};

Which in effect adds a flags option to the get method and
__vfs_getxattr function.

Add a flag option to get xattr method that has bit flag of
XATTR_NOSECURITY passed to it.  XATTR_NOSECURITY is generally then
set in the __vfs_getxattr path when called by security
infrastructure.

This handles the case of a union filesystem driver that is being
requested by the security layer to report back the xattr data.

For the use case where access is to be blocked by the security layer.

The path then could be security(dentry) ->
__vfs_getxattr({dentry...XATTR_NOSECURITY}) ->
handler->get({dentry...XATTR_NOSECURITY}) ->
__vfs_getxattr({lower_dentry...XATTR_NOSECURITY}) ->
lower_handler->get({lower_dentry...XATTR_NOSECURITY})
which would report back through the chain data and success as
expected, the logging security layer at the top would have the
data to determine the access permissions and report back the target
context that was blocked.

Without the get handler flag, the path on a union filesystem would be
the errant security(dentry) -> __vfs_getxattr(dentry) ->
handler->get(dentry) -> vfs_getxattr(lower_dentry) -> nested ->
security(lower_dentry, log off) -> lower_handler->get(lower_dentry)
which would report back through the chain no data, and -EACCES.

For selinux for both cases, this would translate to a correctly
determined blocked access. In the first case with this change a correct avc
log would be reported, in the second legacy case an incorrect avc log
would be reported against an uninitialized u:object_r:unlabeled:s0
context making the logs cosmetically useless for audit2allow.

This patch series is inert and is the wide-spread addition of the
flags option for xattr functions, and a replacement of __vfs_getxattr
with __vfs_getxattr({...XATTR_NOSECURITY}).

Signed-off-by: Mark Salyzyn 
Reviewed-by: Jan Kara 
Acked-by: Jan Kara 
Acked-by: Jeff Layton 
Acked-by: David Sterba 
Acked-by: Darrick J. Wong 
Acked-by: Mike Marshall 
Cc: Stephen Smalley 
Cc: linux-ker...@vger.kernel.org
Cc: kernel-t...@android.com
Cc: linux-security-mod...@vger.kernel.org

---
v14 (new series):
- Reincorporate back into the bugfix series for overlayfs

v8:
- Documentation reported 'struct xattr_gs_flags' rather than
  'struct xattr_gs_flags *args' as argument to get and set methods.

v7:
- missed spots in fs/9p/acl.c, fs/afs/xattr.c, fs/ecryptfs/crypto.c,
  fs/ubifs/xattr.c, fs/xfs/libxfs/xfs_attr.c,
  security/integrity/evm/evm_main.c and security/smack/smack_lsm.c.

v6:
- kernfs missed a spot

v5:
- introduce struct xattr_gs_args for get and set methods,
  __vfs_getxattr and __vfs_setxattr functions.
- cover a missing spot in ext2.
- switch from snprintf to scnprintf for correctness.

v4:
- ifdef __KERNEL__ around XATTR_NOSECURITY to
  keep it colocated in uapi headers.

v3:
- poor aim on ubifs not ubifs_xattr_get, but static xattr_get

v2:
- Missed a spot: ubifs, erofs and afs.

v1:
- Removed from an overlayfs patch set, and made independent.
  Expect this to be the basis of some security improvements.

---
 Documentation/filesystems/locking.rst |  10 +--
 fs/9p/acl.c   |  51 ++--
 fs/9p/xattr.c |  19 ++---
 fs/afs/xattr.c| 112 --
 fs/btrfs/xattr.c  |  36 -
 fs/ceph/xattr.c   |  17 ++--
 fs/cifs/xattr.c   |  72 +
 fs/ecryptfs/crypto.c  |  20 +++--
 fs/ecryptfs/inode.c   |  36 +
 fs/ecryptfs/mmap.c|  39 -
 fs/erofs/xattr.c  |   8 +-
 fs/ext2/xattr_security.c  |  16 ++--
 fs/ext2/xattr_trusted.c   |  15 ++--
 fs/ext2/xattr_user.c  |  19 ++---
 fs/ext4/xattr_security.c  |  15 ++--
 fs/ext4/xattr_trusted.c   |  15 ++--
 fs/ext4/xattr_user.c  |  19 ++---
 fs/f2fs/xattr.c   |  42 +-
 fs/fuse/xattr.c   |  23 +++---
 fs/gfs2/xattr.c   |  18 ++---
 fs/hfs/attr.c |  15 ++--
 fs/hfsplus/xattr.c|  17 ++--
 fs/hfsplus/xattr_security.c   |  13 ++-
 fs/hfsplus/xattr_trusted.c|  13 ++-
 fs/hfsplus/xattr_user.c   |  13 ++-
 fs/jffs2/security.c   |  16 ++--
 fs/jffs2/xattr_trusted.c  |  16 ++--
 fs/jffs2/xattr_user.c |  16 ++--
 fs/jfs/xattr.c|  33 
 fs/kernfs/inode.c  

Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

2019-10-22 Thread Ju Hyung Park
Hi Jaegeuk and Chao,

Nice to see this finally getting into shape :) Great work
I'm excited to see possible use-cases for this in the future.

Would f2fs compress files automatically like how btrfs' "compress" option works?
Or is it per-extension basis for now?

On Wed, Oct 23, 2019 at 2:16 AM Jaegeuk Kim  wrote:
> +compress_algorithm=%s  Control compress algorithm, currently f2fs supports 
> "lzo"
> +   and "lz4" algorithm.

I see absolutely no reason to support regular lzo variant at this time.
Everyone should use lz4 instead of lzo. If one wants zlib-level
compression, they should use zstd.

However, there's recent conversation on new lzo-rle and how it could
be a better candidate than lz4.

Since the mainline now have lz4, zstd and lzo-rle, I don't think
supporting lzo is a good idea.

> diff --git a/fs/f2fs/Kconfig b/fs/f2fs/Kconfig
> index 652fd2e2b23d..c12854c3b1a1 100644
> --- a/fs/f2fs/Kconfig
> +++ b/fs/f2fs/Kconfig
> @@ -6,6 +6,10 @@ config F2FS_FS
> select CRYPTO
> select CRYPTO_CRC32
> select F2FS_FS_XATTR if FS_ENCRYPTION
> +   select LZO_COMPRESS
> +   select LZO_DECOMPRESS
> +   select LZ4_COMPRESS
> +   select LZ4_DECOMPRESS

This is a bad idea.
This unnecessarily increases kernel binary image when no the user
intends to change the defaults.

For example, my Android kernel doesn't use lzo anywhere and this
wouldn't be welcome.

> diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> new file mode 100644
> index ..f276d82a67aa
> --- /dev/null
> +++ b/fs/f2fs/compress.c
> @@ -0,0 +1,1066 @@
> +static unsigned int offset_in_cluster(struct compress_ctx *cc, pgoff_t index)
> +static unsigned int cluster_idx(struct compress_ctx *cc, pgoff_t index)
> +static unsigned int start_idx_of_cluster(struct compress_ctx *cc)

Looks like these would be better if they were explicitly marked as inline.

> +static void f2fs_init_compress_ops(struct f2fs_sb_info *sbi)
> +{
> +   sbi->cops[COMPRESS_LZO] = &f2fs_lzo_ops;
> +   sbi->cops[COMPRESS_LZ4] = &f2fs_lz4_ops;
> +}

Would it be possible for f2fs to use generic crypto compression APIs?
Hardcoding for lzo/lz4 would make it harder to venture future different options.

Have a look at mm/zswap.c:__zswap_pool_create_fallback().

> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index c681f51e351b..775c96291490 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -155,6 +163,7 @@ struct f2fs_mount_info {
>  #define F2FS_FEATURE_VERITY0x0400
>  #define F2FS_FEATURE_SB_CHKSUM 0x0800
>  #define F2FS_FEATURE_CASEFOLD  0x1000
> +#define F2FS_FEATURE_COMPRESSION   0x2000

How would older versions of f2fs behave if an image was used by the
latest f2fs and have compressed pages?
I hope fail-safes are in place.

Thanks.

> --
> 2.19.0.605.g01d371f741-goog
>
>
>
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 1/2] f2fs: support aligned pinned file

2019-10-22 Thread Jaegeuk Kim
This patch supports 2MB-aligned pinned file, which can guarantee no GC at all
by allocating fully valid 2MB segment.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h |  4 +++-
 fs/f2fs/file.c | 39 ++-
 fs/f2fs/recovery.c |  2 +-
 fs/f2fs/segment.c  | 21 -
 fs/f2fs/segment.h  |  2 ++
 fs/f2fs/super.c|  1 +
 fs/f2fs/sysfs.c|  2 ++
 7 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ca342f4c7db1..c681f51e351b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -890,6 +890,7 @@ enum {
CURSEG_WARM_NODE,   /* direct node blocks of normal files */
CURSEG_COLD_NODE,   /* indirect node blocks */
NO_CHECK_TYPE,
+   CURSEG_COLD_DATA_PINNED,/* cold data for pinned file */
 };
 
 struct flush_cmd {
@@ -1301,6 +1302,7 @@ struct f2fs_sb_info {
 
/* threshold for gc trials on pinned files */
u64 gc_pin_file_threshold;
+   struct rw_semaphore pin_sem;
 
/* maximum # of trials to find a victim segment for SSR and GC */
unsigned int max_victim_search;
@@ -3116,7 +3118,7 @@ void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
 int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
 void allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type,
unsigned int start, unsigned int end);
-void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi);
+void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi, int type);
 int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range);
 bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi,
struct cp_control *cpc);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 29bc0a542759..f6c038e8a6a7 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1545,12 +1545,41 @@ static int expand_inode_data(struct inode *inode, 
loff_t offset,
if (off_end)
map.m_len++;
 
-   if (f2fs_is_pinned_file(inode))
-   map.m_seg_type = CURSEG_COLD_DATA;
+   if (!map.m_len)
+   return 0;
+
+   if (f2fs_is_pinned_file(inode)) {
+   block_t len = (map.m_len >> sbi->log_blocks_per_seg) <<
+   sbi->log_blocks_per_seg;
+   block_t done = 0;
+
+   if (map.m_len % sbi->blocks_per_seg)
+   len += sbi->blocks_per_seg;
 
-   err = f2fs_map_blocks(inode, &map, 1, (f2fs_is_pinned_file(inode) ?
-   F2FS_GET_BLOCK_PRE_DIO :
-   F2FS_GET_BLOCK_PRE_AIO));
+   map.m_len = sbi->blocks_per_seg;
+next_alloc:
+   mutex_lock(&sbi->gc_mutex);
+   err = f2fs_gc(sbi, true, false, NULL_SEGNO);
+   if (err && err != -ENODATA && err != -EAGAIN)
+   goto out_err;
+
+   down_write(&sbi->pin_sem);
+   map.m_seg_type = CURSEG_COLD_DATA_PINNED;
+   f2fs_allocate_new_segments(sbi, CURSEG_COLD_DATA);
+   err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_DIO);
+   up_write(&sbi->pin_sem);
+
+   done += map.m_len;
+   len -= map.m_len;
+   map.m_lblk += map.m_len;
+   if (!err && len)
+   goto next_alloc;
+
+   map.m_len = done;
+   } else {
+   err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO);
+   }
+out_err:
if (err) {
pgoff_t last_off;
 
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 783773e4560d..76477f71d4ee 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -711,7 +711,7 @@ static int recover_data(struct f2fs_sb_info *sbi, struct 
list_head *inode_list,
f2fs_put_page(page, 1);
}
if (!err)
-   f2fs_allocate_new_segments(sbi);
+   f2fs_allocate_new_segments(sbi, NO_CHECK_TYPE);
return err;
 }
 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 25c750cd0272..253d72c2663c 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2690,7 +2690,7 @@ void allocate_segment_for_resize(struct f2fs_sb_info 
*sbi, int type,
up_read(&SM_I(sbi)->curseg_lock);
 }
 
-void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi)
+void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi, int type)
 {
struct curseg_info *curseg;
unsigned int old_segno;
@@ -2699,6 +2699,9 @@ void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi)
down_write(&SIT_I(sbi)->sentry_lock);
 
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
+   if (type != NO_CHECK_TYPE && i != type)
+   continue;
+
curseg = CURSEG_I(sbi, i);
old_segno = curseg->segno;
SIT_I

[f2fs-dev] [PATCH 2/2] f2fs: support data compression

2019-10-22 Thread Jaegeuk Kim
From: Chao Yu 

This patch tries to support compression in f2fs.

- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.

- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.

- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.

- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext

Compress metadata layout:
 [Dnode Structure]
 +---+
 | cluster 1 | cluster 2 | . | cluster N |
 +---+
 .   .   .   .
   .   ..  .
  . Compressed Cluster   ..Normal Cluster   
 .
+--+-+-+-+  
+-+-+-+-+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | 
block 4 |
+--+-+-+-+  
+-+-+-+-+
   . .
 .   .
   .   .
  +-+-+--++
  | data length | data chksum | reserved |  compressed data   |
  +-+-+--++

Changelog:

20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().

20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().

20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().

20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.

20190402
- don't preallocate blocks for compressed file.

- add lz4 compress algorithm
- process multiple post read works in one workqueue
  Now f2fs supports processing post read work in multiple workqueue,
  it shows low performance due to schedule overhead of multiple
  workqueue executing orderly.

- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR

One cluster contain 4 blocks

 before overwrite   after overwrite

-   ->  CVNN
- CVNN  ->  

- CVNN  ->  CVNN
- CVNN  ->  CVVV

- CVVV  ->  CVNN
- CVVV  ->  CVVV

[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats

Signed-off-by: Chao Yu 
Signed-off-by: Jaegeuk Kim 
---
 Documentation/filesystems/f2fs.txt |   48 ++
 fs/f2fs/Kconfig|4 +
 fs/f2fs/Makefile   |2 +-
 fs/f2fs/compress.c | 1066 
 fs/f2fs/data.c |  468 ++--
 fs/f2fs/debug.c|6 +
 fs/f2fs/f2fs.h |  205 +-
 fs/f2fs/file.c |  124 +++-
 fs/f2fs/inode.c|   29 +
 fs/f2fs/namei.c|   47 ++
 fs/f2fs/segment.c  |5 +-
 fs/f2fs/segment.h  |   12 -
 fs/f2fs/super.c|  115 ++-
 fs/f2fs/sysfs.c|7 +
 include/linux/f2fs_fs.h|8 +
 include/trace/events/f2fs.h|   99 +++
 16 files changed, 2139 insertions(+), 106 deletions(-)
 create mode 100644 fs/f2fs/compress.c

diff --git a/Documentation/filesystems/f2fs.txt 
b/Documentation/filesystems/f2fs.txt
index 29020af0cff9..d1accf665c86 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -235,6 +235,13 @@ checkpoint=%s[:%u[%]] Set to "disable" to turn off 
checkpointing. Set to "en
hide up to all remaining free space. The actual space 
that
would be unusable can be viewed at 
/sys/fs/f2fs//unusable
This space is reclaimed once checkpoint=enable.
+compress_algorithm=%s  Control compress algorithm, currently f2fs sup

Re: [f2fs-dev] [PATCH v5 6/8] fsck: Check fsync data always for zoned block devices

2019-10-22 Thread Jaegeuk Kim
On 10/22, Chao Yu wrote:
> On 2019/10/18 14:37, Shin'ichiro Kawasaki wrote:
> > Fsck checks fsync data when UMOUNT flag is not set. When the f2fs was not
> > cleanly unmouted, UMOUNT flag is not recorded in meta data and fsync data
> > can be left in the f2fs. The first fsck run checks fsync data to reflect
> > it on quota status recovery. After that, fsck writes UMOUNT flag in the
> > f2fs meta data so that second fsck run can skip fsync data check.
> 
> Oh, IMO, we need that flag to let fsck know there is still fsynced data in the
> image, could we just leave it as it is?
> 
> To Jaegeuk, thoughts?

I don't have objection to apply this patch. :P

> 
> Thanks,
> 
> > 
> > However, fsck for zoned block devices need to care fsync data for all
> > fsck runs. The first fsck run checks fsync data, then fsck can check
> > write pointer consistency with fsync data. However, since second fsck run
> > does not check fsync data, fsck detects write pointer at fsync data end
> > is not consistent with f2fs meta data. This results in meta data update
> > by fsck and fsync data gets lost.
> > 
> > To have fsck check fsync data always for zoned block devices, introduce
> > need_fsync_data_record() helper function which returns boolean to tell
> > if fsck needs fsync data check or not. For zoned block devices, always
> > return true. Otherwise, return true if UMOUNT flag is not set in CP.
> > Replace UMOUNT flag check codes for fsync data with the function call.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki 
> > ---
> >  fsck/fsck.h|  6 ++
> >  fsck/mount.c   | 14 +++---
> >  fsck/segment.c |  2 +-
> >  3 files changed, 14 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fsck/fsck.h b/fsck/fsck.h
> > index 8da0ebb..75052d8 100644
> > --- a/fsck/fsck.h
> > +++ b/fsck/fsck.h
> > @@ -133,6 +133,12 @@ enum seg_type {
> >  
> >  struct selabel_handle;
> >  
> > +static inline bool need_fsync_data_record(struct f2fs_sb_info *sbi)
> > +{
> > +   return !is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG) ||
> > +   c.zoned_model == F2FS_ZONED_HM;
> > +}
> > +
> >  extern int fsck_chk_orphan_node(struct f2fs_sb_info *);
> >  extern int fsck_chk_quota_node(struct f2fs_sb_info *);
> >  extern int fsck_chk_quota_files(struct f2fs_sb_info *);
> > diff --git a/fsck/mount.c b/fsck/mount.c
> > index 915f487..2979865 100644
> > --- a/fsck/mount.c
> > +++ b/fsck/mount.c
> > @@ -1525,7 +1525,7 @@ int build_sit_info(struct f2fs_sb_info *sbi)
> >  
> > bitmap_size = TOTAL_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE;
> >  
> > -   if (!is_set_ckpt_flags(cp, CP_UMOUNT_FLAG))
> > +   if (need_fsync_data_record(sbi))
> > bitmap_size += bitmap_size;
> >  
> > sit_i->bitmap = calloc(bitmap_size, 1);
> > @@ -1540,7 +1540,7 @@ int build_sit_info(struct f2fs_sb_info *sbi)
> > sit_i->sentries[start].cur_valid_map = bitmap;
> > bitmap += SIT_VBLOCK_MAP_SIZE;
> >  
> > -   if (!is_set_ckpt_flags(cp, CP_UMOUNT_FLAG)) {
> > +   if (need_fsync_data_record(sbi)) {
> > sit_i->sentries[start].ckpt_valid_map = bitmap;
> > bitmap += SIT_VBLOCK_MAP_SIZE;
> > }
> > @@ -1887,7 +1887,7 @@ void seg_info_from_raw_sit(struct f2fs_sb_info *sbi, 
> > struct seg_entry *se,
> >  {
> > __seg_info_from_raw_sit(se, raw_sit);
> >  
> > -   if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
> > +   if (!need_fsync_data_record(sbi))
> > return;
> > se->ckpt_valid_blocks = se->valid_blocks;
> > memcpy(se->ckpt_valid_map, se->cur_valid_map, SIT_VBLOCK_MAP_SIZE);
> > @@ -1903,7 +1903,7 @@ struct seg_entry *get_seg_entry(struct f2fs_sb_info 
> > *sbi,
> >  
> >  unsigned short get_seg_vblocks(struct f2fs_sb_info *sbi, struct seg_entry 
> > *se)
> >  {
> > -   if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
> > +   if (!need_fsync_data_record(sbi))
> > return se->valid_blocks;
> > else
> > return se->ckpt_valid_blocks;
> > @@ -1911,7 +1911,7 @@ unsigned short get_seg_vblocks(struct f2fs_sb_info 
> > *sbi, struct seg_entry *se)
> >  
> >  unsigned char *get_seg_bitmap(struct f2fs_sb_info *sbi, struct seg_entry 
> > *se)
> >  {
> > -   if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
> > +   if (!need_fsync_data_record(sbi))
> > return se->cur_valid_map;
> > else
> > return se->ckpt_valid_map;
> > @@ -1919,7 +1919,7 @@ unsigned char *get_seg_bitmap(struct f2fs_sb_info 
> > *sbi, struct seg_entry *se)
> >  
> >  unsigned char get_seg_type(struct f2fs_sb_info *sbi, struct seg_entry *se)
> >  {
> > -   if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
> > +   if (!need_fsync_data_record(sbi))
> > return se->type;
> > else
> > return se->ckpt_type;
> > @@ -3242,7 +3242,7 @@ static int record_fsync_data(struct f2fs_sb_info *sbi)
> > struct list_head inode_list = LIST_HEAD_INIT(inode_list);
> > int ret;
> >  
> > -   if (is_set_ckpt

Re: [f2fs-dev] [PATCH 3/3] f2fs: add support for INLINE_CRYPT_OPTIMIZED encryption policies

2019-10-22 Thread Jaegeuk Kim
On 10/21, Eric Biggers wrote:
> From: Eric Biggers 
> 
> f2fs inode numbers are stable across filesystem resizing, and f2fs inode
> and file logical block numbers are always 32-bit.  So f2fs can always
> support INLINE_CRYPT_OPTIMIZED encryption policies.  Wire up the needed
> fscrypt_operations to declare support.
> 
> Signed-off-by: Eric Biggers 

Acked-by: Jaegeuk Kim 

> ---
>  fs/f2fs/super.c | 26 --
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 1443cee158633..851ac95229263 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -2308,13 +2308,27 @@ static bool f2fs_dummy_context(struct inode *inode)
>   return DUMMY_ENCRYPTION_ENABLED(F2FS_I_SB(inode));
>  }
>  
> +static bool f2fs_has_stable_inodes(struct super_block *sb)
> +{
> + return true;
> +}
> +
> +static void f2fs_get_ino_and_lblk_bits(struct super_block *sb,
> +int *ino_bits_ret, int *lblk_bits_ret)
> +{
> + *ino_bits_ret = 8 * sizeof(nid_t);
> + *lblk_bits_ret = 8 * sizeof(block_t);
> +}
> +
>  static const struct fscrypt_operations f2fs_cryptops = {
> - .key_prefix = "f2fs:",
> - .get_context= f2fs_get_context,
> - .set_context= f2fs_set_context,
> - .dummy_context  = f2fs_dummy_context,
> - .empty_dir  = f2fs_empty_dir,
> - .max_namelen= F2FS_NAME_LEN,
> + .key_prefix = "f2fs:",
> + .get_context= f2fs_get_context,
> + .set_context= f2fs_set_context,
> + .dummy_context  = f2fs_dummy_context,
> + .empty_dir  = f2fs_empty_dir,
> + .max_namelen= F2FS_NAME_LEN,
> + .has_stable_inodes  = f2fs_has_stable_inodes,
> + .get_ino_and_lblk_bits  = f2fs_get_ino_and_lblk_bits,
>  };
>  #endif
>  
> -- 
> 2.23.0.866.gb869b98d4c-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/3] ext4: add support for INLINE_CRYPT_OPTIMIZED encryption policies

2019-10-22 Thread Eric Biggers
On Tue, Oct 22, 2019 at 09:37:16AM -0400, Theodore Y. Ts'o wrote:
> On Mon, Oct 21, 2019 at 04:03:54PM -0700, Eric Biggers wrote:
> > From: Eric Biggers 
> > 
> > INLINE_CRYPT_OPTIMIZED encryption policies have special requirements
> > from the filesystem:
> > 
> > - Inode numbers must never change, even if the filesystem is resized
> > - Inode numbers must be <= 32 bits
> > - File logical block numbers must be <= 32 bits
> 
> You need to guarantee more than this; you also need to guarantee that
> the logical block number may not change.  Fortunately, because the
> original per-file key scheme used a logical block tweak, we've
> prohibited this already, and we didn't relax this restriction for
> files encrpyted using DIRECT_KEY.  So it's a requirement which we
> already meet, but we should document this requirement explicitly ---
> both here and also in Documentations/filesystems/fscrypt.rst.
> 
> Otherwise, looks good.  Feel free to add:
> 
> Reviewed-by: Theodore Ts'o 
> 

This is meant to list the requirements over the current policies.  If we wanted
to list all requirements on filesystems to support any fscrypt policy at all,
we'd also have to list a lot of other things like that the filesystem must
implement all the fscrypt_operations, must call all the needed hooks, must
support encrypted filenames and symlinks, etc...

I'll change the beginning of this commit message to
"INLINE_CRYPT_OPTIMIZED encryption policies have special requirements
from the filesystem, in comparison to the current encryption policies:"

... and in the previous patch I'll add a note in the "Contents encryption"
section of Documentation/filesystems/fscrypt.rst that the use of the file
logical block number means that filesystems must not allow operations that would
change it.

- Eric


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 1/3] fscrypt: add support for inline-encryption-optimized policies

2019-10-22 Thread Eric Biggers
On Tue, Oct 22, 2019 at 09:30:01AM -0400, Theodore Y. Ts'o wrote:
> > An alternative which would work nicely on ext4 and xfs (if xfs supported
> > fscrypt) would be to pass the physical block number as the DUN.  However, 
> > that
> > wouldn't work at all on f2fs because f2fs moves data blocks around.  And 
> > since
> > most people who want to use this are using f2fs, f2fs support is essential.
> 
> And that is something fscrypt supports already, so if people really
> did want to use 64-bit logical block numbers, they could do that, at
> the cost of giving up the ability to shrink the file system (which XFS
> doesn't support anyway)

I was talking about the physical block number (offset from the start of the
filesystem -- ext4_fsblk_t on ext4), not the file logical block number (offset
in the file data -- ext4_lblk_t on ext4).  fscrypt doesn't currently support
using the physical block number.

- Eric


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/3] ext4: add support for INLINE_CRYPT_OPTIMIZED encryption policies

2019-10-22 Thread Theodore Y. Ts'o
On Mon, Oct 21, 2019 at 04:03:54PM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> INLINE_CRYPT_OPTIMIZED encryption policies have special requirements
> from the filesystem:
> 
> - Inode numbers must never change, even if the filesystem is resized
> - Inode numbers must be <= 32 bits
> - File logical block numbers must be <= 32 bits

You need to guarantee more than this; you also need to guarantee that
the logical block number may not change.  Fortunately, because the
original per-file key scheme used a logical block tweak, we've
prohibited this already, and we didn't relax this restriction for
files encrpyted using DIRECT_KEY.  So it's a requirement which we
already meet, but we should document this requirement explicitly ---
both here and also in Documentations/filesystems/fscrypt.rst.

Otherwise, looks good.  Feel free to add:

Reviewed-by: Theodore Ts'o 

- Ted


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 1/3] fscrypt: add support for inline-encryption-optimized policies

2019-10-22 Thread Theodore Y. Ts'o
On Mon, Oct 21, 2019 at 11:00:04PM -0700, Eric Biggers wrote:
> That won't work because we need consecutive file blocks to have consecutive 
> IVs
> as often as possible.  The crypto support in the UFS and EMMC standards takes
> only a single 64-bit "data unit number" (DUN) per request, which the hardware
> uses as the first 64 bits of the IV and automatically increments for each data
> unit (i.e. for each filesystem block, in this case).

It seems very likely that for systems that are using UFS and eMMC
(which are overwhelming lower-end devices --- e.g., embedded and
mobile handsets) 32-bit inode and logical block numbers will be just
fine.

If and when we actually get inline crypto support for server-class
systems, hopefully they will support 128-bit DUN's, and/or they will
have sufficiently fast key load times such that we can use per-file
keying.

> An alternative which would work nicely on ext4 and xfs (if xfs supported
> fscrypt) would be to pass the physical block number as the DUN.  However, that
> wouldn't work at all on f2fs because f2fs moves data blocks around.  And since
> most people who want to use this are using f2fs, f2fs support is essential.

And that is something fscrypt supports already, so if people really
did want to use 64-bit logical block numbers, they could do that, at
the cost of giving up the ability to shrink the file system (which XFS
doesn't support anyway)

- Ted


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [Bug 205203] ram_thresh default (DEF_RAM_THRESHOLD) is wrong (outdated) in f2fs document

2019-10-22 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205203

Chao Yu (c...@kernel.org) changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||c...@kernel.org

--- Comment #1 from Chao Yu (c...@kernel.org) ---
Thanks for the report, I've sent a patch for this issue.

https://lore.kernel.org/linux-f2fs-devel/20191022092611.58191-1-yuch...@huawei.com/T/#u

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[PATCH] f2fs: fix wrong description in document

2019-10-22 Thread Chao Yu
As reported in bugzilla, default value of DEF_RAM_THRESHOLD was fixed by
commit 29710bcf9426 ("f2fs: fix wrong percentage"), however leaving wrong
description in document, fix it.

https://bugzilla.kernel.org/show_bug.cgi?id=205203

Signed-off-by: Chao Yu 
---
 Documentation/filesystems/f2fs.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/filesystems/f2fs.txt 
b/Documentation/filesystems/f2fs.txt
index 7e1991328473..29020af0cff9 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -346,7 +346,7 @@ Files in /sys/fs/f2fs/
 
  ram_thresh   This parameter controls the memory footprint used
  by free nids and cached nat entries. By default,
- 10 is set, which indicates 10 MB / 1 GB RAM.
+ 1 is set, which indicates 10 MB / 1 GB RAM.
 
  ra_nid_pagesWhen building free nids, F2FS reads NAT blocks
  ahead for speed up. Default is 0.
-- 
2.18.0.rc1



Re: [f2fs-dev] [PATCH v5 3/8] libf2fs_zoned: Introduce f2fs_reset_zone() helper function

2019-10-22 Thread Chao Yu
On 2019/10/22 17:10, Damien Le Moal wrote:
> On 2019/10/22 17:59, Chao Yu wrote:
>> On 2019/10/18 14:37, Shin'ichiro Kawasaki wrote:
>>> To prepare for write pointer consistency fix by fsck, add
>>> f2fs_reset_zone() helper function which calls RESET ZONE command. The
>>> function is added to lib/libf2fs_zoned which gathers zoned block device
>>> related functions.
>>>
>>> Signed-off-by: Shin'ichiro Kawasaki 
>>> ---
>>>  include/f2fs_fs.h   |  1 +
>>>  lib/libf2fs_zoned.c | 26 ++
>>>  2 files changed, 27 insertions(+)
>>>
>>> diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
>>> index 1f7ef05..a36927b 100644
>>> --- a/include/f2fs_fs.h
>>> +++ b/include/f2fs_fs.h
>>> @@ -1303,6 +1303,7 @@ extern int f2fs_report_zone(int, u_int64_t, void *);
>>>  typedef int (report_zones_cb_t)(int i, void *, void *);
>>>  extern int f2fs_report_zones(int, report_zones_cb_t *, void *);
>>>  extern int f2fs_check_zones(int);
>>> +int f2fs_reset_zone(int, void *);
>>>  extern int f2fs_reset_zones(int);
>>>  
>>>  #define SIZE_ALIGN(val, size)  ((val) + (size) - 1) / (size)
>>> diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
>>> index 10d6d0b..1335038 100644
>>> --- a/lib/libf2fs_zoned.c
>>> +++ b/lib/libf2fs_zoned.c
>>> @@ -388,6 +388,26 @@ out:
>>> return ret;
>>>  }
>>>  
>>> +int f2fs_reset_zone(int i, void *blkzone)
>>> +{
>>> +   struct blk_zone *blkz = (struct blk_zone *)blkzone;
>>> +   struct device_info *dev = c.devices + i;
>>> +   struct blk_zone_range range;
>>> +   int ret;
>>> +
>>> +   if (!blk_zone_seq(blkz) || blk_zone_empty(blkz))
>>> +   return 0;
>>> +
>>> +   /* Non empty sequential zone: reset */
>>> +   range.sector = blk_zone_sector(blkz);
>>> +   range.nr_sectors = blk_zone_length(blkz);
>>> +   ret = ioctl(dev->fd, BLKRESETZONE, &range);
>>> +   if (ret != 0)
>>
>> As you did in other zoned block device code, errno would be preferred as 
>> return
>> value?
>>
>>> +   ERR_MSG("ioctl BLKRESETZONE failed\n");
>>> +
>>> +   return ret;
>>> +}
>>> +
>>>  int f2fs_reset_zones(int j)
>>>  {
>>> struct device_info *dev = c.devices + j;
>>> @@ -491,6 +511,12 @@ int f2fs_check_zones(int i)
>>> return -1;
>>>  }
>>>  
>>> +int f2fs_reset_zone(int i, void *blkzone)
>>> +{
>>> +   ERR_MSG("%d: Zoned block devices are not supported\n", i);
>>
>> Minor thing:
>>
>> "device is"?
> 
>   ERR_MSG("%d: Unsupported zoned block device\n", i);
> 
> may be better.

Looks better.

> 
> Note that we should never hit this anyway since the validity of the set
> of devices used should have been checked way before we start resetting
> zones.

Yup.

Thanks,

> 
>>
>>> +   return -1;
>>> +}
>>> +
>>>  int f2fs_reset_zones(int i)
>>>  {
>>> ERR_MSG("%d: Zoned block devices are not supported\n", i);
>>
>> "device is"?
> 
> Same as above.
> 
>>
>> Thanks,
>>
>>>
>>
> 
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v5 6/8] fsck: Check fsync data always for zoned block devices

2019-10-22 Thread Chao Yu
On 2019/10/18 14:37, Shin'ichiro Kawasaki wrote:
> Fsck checks fsync data when UMOUNT flag is not set. When the f2fs was not
> cleanly unmouted, UMOUNT flag is not recorded in meta data and fsync data
> can be left in the f2fs. The first fsck run checks fsync data to reflect
> it on quota status recovery. After that, fsck writes UMOUNT flag in the
> f2fs meta data so that second fsck run can skip fsync data check.

Oh, IMO, we need that flag to let fsck know there is still fsynced data in the
image, could we just leave it as it is?

To Jaegeuk, thoughts?

Thanks,

> 
> However, fsck for zoned block devices need to care fsync data for all
> fsck runs. The first fsck run checks fsync data, then fsck can check
> write pointer consistency with fsync data. However, since second fsck run
> does not check fsync data, fsck detects write pointer at fsync data end
> is not consistent with f2fs meta data. This results in meta data update
> by fsck and fsync data gets lost.
> 
> To have fsck check fsync data always for zoned block devices, introduce
> need_fsync_data_record() helper function which returns boolean to tell
> if fsck needs fsync data check or not. For zoned block devices, always
> return true. Otherwise, return true if UMOUNT flag is not set in CP.
> Replace UMOUNT flag check codes for fsync data with the function call.
> 
> Signed-off-by: Shin'ichiro Kawasaki 
> ---
>  fsck/fsck.h|  6 ++
>  fsck/mount.c   | 14 +++---
>  fsck/segment.c |  2 +-
>  3 files changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/fsck/fsck.h b/fsck/fsck.h
> index 8da0ebb..75052d8 100644
> --- a/fsck/fsck.h
> +++ b/fsck/fsck.h
> @@ -133,6 +133,12 @@ enum seg_type {
>  
>  struct selabel_handle;
>  
> +static inline bool need_fsync_data_record(struct f2fs_sb_info *sbi)
> +{
> + return !is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG) ||
> + c.zoned_model == F2FS_ZONED_HM;
> +}
> +
>  extern int fsck_chk_orphan_node(struct f2fs_sb_info *);
>  extern int fsck_chk_quota_node(struct f2fs_sb_info *);
>  extern int fsck_chk_quota_files(struct f2fs_sb_info *);
> diff --git a/fsck/mount.c b/fsck/mount.c
> index 915f487..2979865 100644
> --- a/fsck/mount.c
> +++ b/fsck/mount.c
> @@ -1525,7 +1525,7 @@ int build_sit_info(struct f2fs_sb_info *sbi)
>  
>   bitmap_size = TOTAL_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE;
>  
> - if (!is_set_ckpt_flags(cp, CP_UMOUNT_FLAG))
> + if (need_fsync_data_record(sbi))
>   bitmap_size += bitmap_size;
>  
>   sit_i->bitmap = calloc(bitmap_size, 1);
> @@ -1540,7 +1540,7 @@ int build_sit_info(struct f2fs_sb_info *sbi)
>   sit_i->sentries[start].cur_valid_map = bitmap;
>   bitmap += SIT_VBLOCK_MAP_SIZE;
>  
> - if (!is_set_ckpt_flags(cp, CP_UMOUNT_FLAG)) {
> + if (need_fsync_data_record(sbi)) {
>   sit_i->sentries[start].ckpt_valid_map = bitmap;
>   bitmap += SIT_VBLOCK_MAP_SIZE;
>   }
> @@ -1887,7 +1887,7 @@ void seg_info_from_raw_sit(struct f2fs_sb_info *sbi, 
> struct seg_entry *se,
>  {
>   __seg_info_from_raw_sit(se, raw_sit);
>  
> - if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
> + if (!need_fsync_data_record(sbi))
>   return;
>   se->ckpt_valid_blocks = se->valid_blocks;
>   memcpy(se->ckpt_valid_map, se->cur_valid_map, SIT_VBLOCK_MAP_SIZE);
> @@ -1903,7 +1903,7 @@ struct seg_entry *get_seg_entry(struct f2fs_sb_info 
> *sbi,
>  
>  unsigned short get_seg_vblocks(struct f2fs_sb_info *sbi, struct seg_entry 
> *se)
>  {
> - if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
> + if (!need_fsync_data_record(sbi))
>   return se->valid_blocks;
>   else
>   return se->ckpt_valid_blocks;
> @@ -1911,7 +1911,7 @@ unsigned short get_seg_vblocks(struct f2fs_sb_info 
> *sbi, struct seg_entry *se)
>  
>  unsigned char *get_seg_bitmap(struct f2fs_sb_info *sbi, struct seg_entry *se)
>  {
> - if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
> + if (!need_fsync_data_record(sbi))
>   return se->cur_valid_map;
>   else
>   return se->ckpt_valid_map;
> @@ -1919,7 +1919,7 @@ unsigned char *get_seg_bitmap(struct f2fs_sb_info *sbi, 
> struct seg_entry *se)
>  
>  unsigned char get_seg_type(struct f2fs_sb_info *sbi, struct seg_entry *se)
>  {
> - if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
> + if (!need_fsync_data_record(sbi))
>   return se->type;
>   else
>   return se->ckpt_type;
> @@ -3242,7 +3242,7 @@ static int record_fsync_data(struct f2fs_sb_info *sbi)
>   struct list_head inode_list = LIST_HEAD_INIT(inode_list);
>   int ret;
>  
> - if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
> + if (!need_fsync_data_record(sbi))
>   return 0;
>  
>   ret = find_fsync_inode(sbi, &inode_list);
> diff --git a/fsck/segment.c b/fsck/segment.c
> index ccde05f..17c42b7 100644
> 

Re: [f2fs-dev] [PATCH v5 5/8] fsck: Introduce move_one_curseg_info() function

2019-10-22 Thread Chao Yu
On 2019/10/18 14:37, Shin'ichiro Kawasaki wrote:
> When fsck updates one of the current segments, update_curseg_info() is
> called specifying a single current segment as its argument. However,
> update_curseg_info() calls move_curseg_info() function which updates all
> six current segments. Then update_curseg_info() for a single current
> segment moves all current segments.
> 
> This excessive current segment move causes an issue when a new zone is
> assigned to a current segment because of write pointer inconsistency.
> Even when a current segment has write pointer inconsistency, all other
> current segments should not be moved because they may have fsync data
> at their positions.
> 
> To avoid the excessive current segment move, introduce
> move_one_curseg_info() function which does same work as
> move_curseg_info() only for a single current segment. Call
> move_one_curseg_info() in place of move_curseg_info() from
> update_curseg_info().

Good catch!

> 
> Signed-off-by: Shin'ichiro Kawasaki 

Reviewed-by: Chao Yu 

Thanks,


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v5 3/8] libf2fs_zoned: Introduce f2fs_reset_zone() helper function

2019-10-22 Thread Damien Le Moal
On 2019/10/22 17:59, Chao Yu wrote:
> On 2019/10/18 14:37, Shin'ichiro Kawasaki wrote:
>> To prepare for write pointer consistency fix by fsck, add
>> f2fs_reset_zone() helper function which calls RESET ZONE command. The
>> function is added to lib/libf2fs_zoned which gathers zoned block device
>> related functions.
>>
>> Signed-off-by: Shin'ichiro Kawasaki 
>> ---
>>  include/f2fs_fs.h   |  1 +
>>  lib/libf2fs_zoned.c | 26 ++
>>  2 files changed, 27 insertions(+)
>>
>> diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
>> index 1f7ef05..a36927b 100644
>> --- a/include/f2fs_fs.h
>> +++ b/include/f2fs_fs.h
>> @@ -1303,6 +1303,7 @@ extern int f2fs_report_zone(int, u_int64_t, void *);
>>  typedef int (report_zones_cb_t)(int i, void *, void *);
>>  extern int f2fs_report_zones(int, report_zones_cb_t *, void *);
>>  extern int f2fs_check_zones(int);
>> +int f2fs_reset_zone(int, void *);
>>  extern int f2fs_reset_zones(int);
>>  
>>  #define SIZE_ALIGN(val, size)   ((val) + (size) - 1) / (size)
>> diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
>> index 10d6d0b..1335038 100644
>> --- a/lib/libf2fs_zoned.c
>> +++ b/lib/libf2fs_zoned.c
>> @@ -388,6 +388,26 @@ out:
>>  return ret;
>>  }
>>  
>> +int f2fs_reset_zone(int i, void *blkzone)
>> +{
>> +struct blk_zone *blkz = (struct blk_zone *)blkzone;
>> +struct device_info *dev = c.devices + i;
>> +struct blk_zone_range range;
>> +int ret;
>> +
>> +if (!blk_zone_seq(blkz) || blk_zone_empty(blkz))
>> +return 0;
>> +
>> +/* Non empty sequential zone: reset */
>> +range.sector = blk_zone_sector(blkz);
>> +range.nr_sectors = blk_zone_length(blkz);
>> +ret = ioctl(dev->fd, BLKRESETZONE, &range);
>> +if (ret != 0)
> 
> As you did in other zoned block device code, errno would be preferred as 
> return
> value?
> 
>> +ERR_MSG("ioctl BLKRESETZONE failed\n");
>> +
>> +return ret;
>> +}
>> +
>>  int f2fs_reset_zones(int j)
>>  {
>>  struct device_info *dev = c.devices + j;
>> @@ -491,6 +511,12 @@ int f2fs_check_zones(int i)
>>  return -1;
>>  }
>>  
>> +int f2fs_reset_zone(int i, void *blkzone)
>> +{
>> +ERR_MSG("%d: Zoned block devices are not supported\n", i);
> 
> Minor thing:
> 
> "device is"?

ERR_MSG("%d: Unsupported zoned block device\n", i);

may be better.

Note that we should never hit this anyway since the validity of the set
of devices used should have been checked way before we start resetting
zones.

> 
>> +return -1;
>> +}
>> +
>>  int f2fs_reset_zones(int i)
>>  {
>>  ERR_MSG("%d: Zoned block devices are not supported\n", i);
> 
> "device is"?

Same as above.

> 
> Thanks,
> 
>>
> 


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v5 4/8] fsck: Find free zones instead of blocks to assign to current segments

2019-10-22 Thread Chao Yu
On 2019/10/18 14:37, Shin'ichiro Kawasaki wrote:
> When fsck needs to assign a new area to a curreng segment, it calls
> find_next_free_block() function to find a new block to assign. For zoned
> block devices, fsck checks write pointer consistency with current
> segments' positions. In case a curseg is inconsistent with the
> write pointer of the zone it points to, fsck should assign not a new free
> block but a new free zone/section with write pointer at the zone start,
> so that next write to the current segment succeeds without error.
> 
> To extend find_next_free_block() function's capability to find not only
> a block but also a zone/section, add new_sec flag to
> find_next_free_block() function. When new_sec flag is true, skip check
> for each block's availability so that the check is done with unit of
> section. Note that it is ensured that one zone has one section for f2fs
> on zoned block devices. Then the logic to find a new free section is good
> to find a new free zone.
> 
> When fsck target devices have ZONED_HM model, set new_sec flag true to
> call find_next_free_block() from move_curseg_info(). Set curseg's
> alloc_type not SSR but LFS for the devices with ZONED_HM model, because

I think this is correct, but just to declear, SSR allocator on a freed section
will also allocate data block sequentially.

> SSR block allocation is not allowed for zoned block devices. Also skip
> relocate_curseg_offset() for the devices with ZONED_HM model for the
> same reason.
> 
> Signed-off-by: Shin'ichiro Kawasaki 

Reviewed-by: Chao Yu 

Thanks,


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v5 3/8] libf2fs_zoned: Introduce f2fs_reset_zone() helper function

2019-10-22 Thread Chao Yu
On 2019/10/18 14:37, Shin'ichiro Kawasaki wrote:
> To prepare for write pointer consistency fix by fsck, add
> f2fs_reset_zone() helper function which calls RESET ZONE command. The
> function is added to lib/libf2fs_zoned which gathers zoned block device
> related functions.
> 
> Signed-off-by: Shin'ichiro Kawasaki 
> ---
>  include/f2fs_fs.h   |  1 +
>  lib/libf2fs_zoned.c | 26 ++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
> index 1f7ef05..a36927b 100644
> --- a/include/f2fs_fs.h
> +++ b/include/f2fs_fs.h
> @@ -1303,6 +1303,7 @@ extern int f2fs_report_zone(int, u_int64_t, void *);
>  typedef int (report_zones_cb_t)(int i, void *, void *);
>  extern int f2fs_report_zones(int, report_zones_cb_t *, void *);
>  extern int f2fs_check_zones(int);
> +int f2fs_reset_zone(int, void *);
>  extern int f2fs_reset_zones(int);
>  
>  #define SIZE_ALIGN(val, size)((val) + (size) - 1) / (size)
> diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
> index 10d6d0b..1335038 100644
> --- a/lib/libf2fs_zoned.c
> +++ b/lib/libf2fs_zoned.c
> @@ -388,6 +388,26 @@ out:
>   return ret;
>  }
>  
> +int f2fs_reset_zone(int i, void *blkzone)
> +{
> + struct blk_zone *blkz = (struct blk_zone *)blkzone;
> + struct device_info *dev = c.devices + i;
> + struct blk_zone_range range;
> + int ret;
> +
> + if (!blk_zone_seq(blkz) || blk_zone_empty(blkz))
> + return 0;
> +
> + /* Non empty sequential zone: reset */
> + range.sector = blk_zone_sector(blkz);
> + range.nr_sectors = blk_zone_length(blkz);
> + ret = ioctl(dev->fd, BLKRESETZONE, &range);
> + if (ret != 0)

As you did in other zoned block device code, errno would be preferred as return
value?

> + ERR_MSG("ioctl BLKRESETZONE failed\n");
> +
> + return ret;
> +}
> +
>  int f2fs_reset_zones(int j)
>  {
>   struct device_info *dev = c.devices + j;
> @@ -491,6 +511,12 @@ int f2fs_check_zones(int i)
>   return -1;
>  }
>  
> +int f2fs_reset_zone(int i, void *blkzone)
> +{
> + ERR_MSG("%d: Zoned block devices are not supported\n", i);

Minor thing:

"device is"?

> + return -1;
> +}
> +
>  int f2fs_reset_zones(int i)
>  {
>   ERR_MSG("%d: Zoned block devices are not supported\n", i);

"device is"?

Thanks,

> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v5 2/8] libf2fs_zoned: Introduce f2fs_report_zone() helper function

2019-10-22 Thread Chao Yu
On 2019/10/18 14:37, Shin'ichiro Kawasaki wrote:
> To prepare for write pointer consistency check by fsck, add
> f2fs_report_zone() helper function which calls REPORT ZONE command to
> get write pointer status of a single zone. The function is added to
> lib/libf2fs_zoned which gathers zoned block device related functions.
> 
> Signed-off-by: Shin'ichiro Kawasaki 

Reviewed-by: Chao Yu 

Thanks,


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v5 1/8] libf2fs_zoned: Introduce f2fs_report_zones() helper function

2019-10-22 Thread Chao Yu
On 2019/10/18 14:37, Shin'ichiro Kawasaki wrote:
> To prepare for write pointer consistency check by fsck, add
> f2fs_report_zones() helper function which calls REPORT ZONE command to
> get write pointer status. The function is added to lib/libf2fs_zoned
> which gathers zoned block device related functions.
> 
> To check write pointer consistency with f2fs meta data, fsck needs to
> refer both of reported zone information and f2fs super block structure
> "f2fs_sb_info". However, libf2fs_zoned does not import f2fs_sb_info. To
> keep f2fs_sb_info structure out of libf2fs_zoned, provide a callback
> function in fsck to f2fs_report_zones() and call it for each zone.
> 
> Add SECTOR_SHIFT definition in include/f2fs_fs.h to avoid a magic number
> to convert bytes into 512B sectors.
> 
> Signed-off-by: Shin'ichiro Kawasaki 

Reviewed-by: Chao Yu 

Thanks,


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [PATCH v2] f2fs: fix to avoid memory leakage in f2fs_listxattr

2019-10-22 Thread Chao Yu
On 2019/10/18 14:56, Randall Huang wrote:
> In f2fs_listxattr, there is no boundary check before
> memcpy e_name to buffer.
> If the e_name_len is corrupted,
> unexpected memory contents may be returned to the buffer.
> 
> Signed-off-by: Randall Huang 

Reviewed-by: Chao Yu 

Thanks,


Re: [f2fs-dev] [PATCH v2 0/2] add a new fio cache for IPU.

2019-10-22 Thread Chao Yu
Hi Lihong,

On 2019/10/16 18:11, Lihong Kou wrote:
> Introdce a new fio cache for IPU.
> After commit 8648de2c581(f2fs: add bio cache for IPU)
> in the mainline, we still have the problem in SQLite. So I
> reuse the fio data structs to cache the mergeale write IO in
> wirtepages().

I have discussed with Jaegeuk about this in below thread,

https://lore.kernel.org/linux-f2fs-devel/20190219081529.5106-1-yuch...@huawei.com/T/#md5d6dcf60052201994e6df4a915a07a04869d427

IMO, we need a per data type global list to cache all bios to enhance its
scalability, as single bio cache may suffer performance regression due to race
of multiple threads, I've sent a patch [1] for that previously.

For any thoughts, we can discuss in above thread.

[1]
https://lore.kernel.org/linux-f2fs-devel/20190930105325.42870-1-yuch...@huawei.com/T/#u

Thanks,

> 
> Lihong Kou (1):
>   Revert "f2fs: add bio cache for IPU"
> 
> Lihong Kou (1):
>   f2fs: introduce a new fio cache for IPU.
> 
>  fs/f2fs/checkpoint.c |  1 +
>  fs/f2fs/data.c   | 93 
> 
>  fs/f2fs/f2fs.h   |  4 +--
>  fs/f2fs/segment.c| 22 ++---
>  fs/f2fs/super.c  |  2 ++
>  5 files changed, 29 insertions(+), 93 deletions(-)
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel