Re: [f2fs-dev] [RFC] errno.h: Provide EFSCORRUPTED for everybody

2019-10-30 Thread Darrick J. Wong
On Wed, Oct 30, 2019 at 09:07:33PM -0400, Valdis Kletnieks wrote:
> Three questions: (a) ACK/NAK on this patch,

Acked-by: Darrick J. Wong 

> (b) should it be all in one patch, or one to add to errno.h and 6
> patches for 6 filesystems?), and

I don't particularly care, but I've a slight preference for changing it
all at once so that it's obvious as a move.

> (c) if one patch, who gets to shepherd it through?

Heh. :)

I would add (d) can we do the same to EFSBADCRC, seeing as f2fs,
ext4, xfs, and jbd2 all define it the same way?

--D

> There's currently 6 filesystems that have the same #define. Move it
> into errno.h so it's defined in just one place.
> 
> Signed-off-by: Valdis Kletnieks 
> ---
>  drivers/staging/exfat/exfat.h    | 2 --
>  fs/erofs/internal.h              | 2 --
>  fs/ext4/ext4.h                   | 1 -
>  fs/f2fs/f2fs.h                   | 1 -
>  fs/xfs/xfs_linux.h               | 1 -
>  include/linux/jbd2.h             | 1 -
>  include/uapi/asm-generic/errno.h | 1 +
>  7 files changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/staging/exfat/exfat.h b/drivers/staging/exfat/exfat.h
> index 84de1123e178..3cf7e54af0b7 100644
> --- a/drivers/staging/exfat/exfat.h
> +++ b/drivers/staging/exfat/exfat.h
> @@ -30,8 +30,6 @@
>  #undef DEBUG
>  #endif
>  
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> -
>  #define DENTRY_SIZE  32  /* dir entry size */
>  #define DENTRY_SIZE_BITS 5
>  
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 544a453f3076..3980026a8882 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -425,7 +425,5 @@ static inline int z_erofs_init_zip_subsystem(void) { return 0; }
>  static inline void z_erofs_exit_zip_subsystem(void) {}
>  #endif   /* !CONFIG_EROFS_FS_ZIP */
>  
> -#define EFSCORRUPTED    EUCLEAN /* Filesystem is corrupted */
> -
>  #endif   /* __EROFS_INTERNAL_H */
>  
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 03db3e71676c..a86c2585457d 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3396,6 +3396,5 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh)
>  #endif   /* __KERNEL__ */
>  
>  #define EFSBADCRC    EBADMSG /* Bad CRC detected */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>  
>  #endif   /* _EXT4_H */
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4024790028aa..04ebe77569a3 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3752,6 +3752,5 @@ static inline bool is_journalled_quota(struct f2fs_sb_info *sbi)
>  }
>  
>  #define EFSBADCRC    EBADMSG /* Bad CRC detected */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>  
>  #endif /* _LINUX_F2FS_H */
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index ca15105681ca..3409d02a7d21 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -123,7 +123,6 @@ typedef __u32 xfs_nlink_t;
>  
>  #define ENOATTR  ENODATA /* Attribute not found */
>  #define EWRONGFS EINVAL  /* Mount with wrong filesystem type */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>  #define EFSBADCRC    EBADMSG /* Bad CRC detected */
>  
>  #define SYNCHRONIZE()    barrier()
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 564793c24d12..1ecd3859d040 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1657,6 +1657,5 @@ static inline tid_t jbd2_get_latest_transaction(journal_t *journal)
>  #endif   /* __KERNEL__ */
>  
>  #define EFSBADCRC    EBADMSG /* Bad CRC detected */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>  
>  #endif   /* _LINUX_JBD2_H */
> diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
> index cf9c51ac49f9..1d5ffdf54cb0 100644
> --- a/include/uapi/asm-generic/errno.h
> +++ b/include/uapi/asm-generic/errno.h
> @@ -98,6 +98,7 @@
>  #define  EINPROGRESS 115 /* Operation now in progress */
>  #define  ESTALE  116 /* Stale file handle */
>  #define  EUCLEAN 117 /* Structure needs cleaning */
> +#define  EFSCORRUPTED    EUCLEAN
>  #define  ENOTNAM 118 /* Not a XENIX named type file */
>  #define  ENAVAIL 119 /* No XENIX semaphores available */
>  #define  EISNAM  120 /* Is a named type file */
> -- 
> 2.24.0.rc1
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 1/2] f2fs: support aligned pinned file

2019-10-30 Thread Chao Yu
On 2019/10/31 0:09, Jaegeuk Kim wrote:
> On 10/26, Chao Yu wrote:
>> On 2019/10/26 2:18, Jaegeuk Kim wrote:
>>> On 10/24, Chao Yu wrote:
 Hi Jaegeuk,

 On 2019/10/23 1:16, Jaegeuk Kim wrote:
> This patch supports 2MB-aligned pinned file, which can guarantee no GC at all
> by allocating fully valid 2MB segment.
>
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/f2fs.h |  4 +++-
>  fs/f2fs/file.c | 39 ++-
>  fs/f2fs/recovery.c |  2 +-
>  fs/f2fs/segment.c  | 21 -
>  fs/f2fs/segment.h  |  2 ++
>  fs/f2fs/super.c|  1 +
>  fs/f2fs/sysfs.c|  2 ++
>  7 files changed, 63 insertions(+), 8 deletions(-)
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index ca342f4c7db1..c681f51e351b 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -890,6 +890,7 @@ enum {
>   CURSEG_WARM_NODE,   /* direct node blocks of normal files */
>   CURSEG_COLD_NODE,   /* indirect node blocks */
>   NO_CHECK_TYPE,
> + CURSEG_COLD_DATA_PINNED,/* cold data for pinned file */
>  };
>  
>  struct flush_cmd {
> @@ -1301,6 +1302,7 @@ struct f2fs_sb_info {
>  
>   /* threshold for gc trials on pinned files */
>   u64 gc_pin_file_threshold;
> + struct rw_semaphore pin_sem;
>  
>   /* maximum # of trials to find a victim segment for SSR and GC */
>   unsigned int max_victim_search;
> @@ -3116,7 +3118,7 @@ void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
>  int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
>  void allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type,
>   unsigned int start, unsigned int end);
> -void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi);
> +void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi, int type);
>  int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range);
>  bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi,
>   struct cp_control *cpc);
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 29bc0a542759..f6c038e8a6a7 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1545,12 +1545,41 @@ static int expand_inode_data(struct inode *inode, loff_t offset,
>   if (off_end)
>   map.m_len++;
>  
> - if (f2fs_is_pinned_file(inode))
> - map.m_seg_type = CURSEG_COLD_DATA;
> + if (!map.m_len)
> + return 0;
> +
> + if (f2fs_is_pinned_file(inode)) {
> + block_t len = (map.m_len >> sbi->log_blocks_per_seg) <<
> + sbi->log_blocks_per_seg;
> + block_t done = 0;
> +
> + if (map.m_len % sbi->blocks_per_seg)
> + len += sbi->blocks_per_seg;
>  
> - err = f2fs_map_blocks(inode, &map, 1, (f2fs_is_pinned_file(inode) ?
> - F2FS_GET_BLOCK_PRE_DIO :
> - F2FS_GET_BLOCK_PRE_AIO));
> + map.m_len = sbi->blocks_per_seg;
> +next_alloc:
> + mutex_lock(&sbi->gc_mutex);
> + err = f2fs_gc(sbi, true, false, NULL_SEGNO);
> + if (err && err != -ENODATA && err != -EAGAIN)
> + goto out_err;

 To grab enough free space?

 Shouldn't we call

if (has_not_enough_free_secs(sbi, 0, 0)) {
mutex_lock(&sbi->gc_mutex);
f2fs_gc(sbi, false, false, NULL_SEGNO);
}
>>>
>>> The above calls gc all the time. Do we need this?
>>
>> Hmmm... my concern is why we need to run foreground GC even if there is enough
>> free space..
> 
> In order to get the free segment easily?

However, I suspect that arbitrary foreground GC with the greedy algorithm will
ruin hot/cold data separation; besides, when there are sufficient free
segments, it's unnecessary to call FGGC.

Thanks,

> 
>>
>>>

> +
> + down_write(&sbi->pin_sem);
> + map.m_seg_type = CURSEG_COLD_DATA_PINNED;
> + f2fs_allocate_new_segments(sbi, CURSEG_COLD_DATA);
> + err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_DIO);
> + up_write(&sbi->pin_sem);
> +
> + done += map.m_len;
> + len -= map.m_len;
> + map.m_lblk += map.m_len;
> + if (!err && len)
> + goto next_alloc;
> +
> + map.m_len = done;
> + } else {
> + err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO);
> + }
> +out_err:
>   if (err) {
>   pgoff_t last_off;
>  
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 783773e4560d..76477f71d4ee 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -711,7 +711,7 @@ static int 

Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

2019-10-30 Thread Chao Yu
On 2019/10/31 1:02, Eric Biggers wrote:
> On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
>>  static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
>>  {
>> -/*
>> - * We use different work queues for decryption and for verity 
>> because
>> - * verity may require reading metadata pages that need 
>> decryption, and
>> - * we shouldn't recurse to the same workqueue.
>> - */
>
> Why is it okay (i.e., no deadlocks) to no longer use different work queues for
> decryption and for verity?  See the comment above which is being deleted.

 Could you explain more about how the deadlock happens? Or share a link
 if you have described that case somewhere?

>>>
>>> The verity work can read pages from the file which require decryption.  I'm
>>> concerned that it could deadlock if the work is scheduled on the same 
>>> workqueue.
>>
>> I assume you've tried one workqueue, and suffered deadlock..
>>
>>> Granted, I'm not an expert in Linux workqueues, so if you've investigated 
>>> this
>>> and determined that it's safe, can you explain why?
>>
>> I'm not familiar with workqueues... I guess it may not be safe if the work
>> is scheduled to the same CPU where verity is waiting for data? If the work
>> is scheduled to another CPU, it may be safe.
>>
>> I can check that before splitting the workqueue for verity and 
>> decrypt/decompress.
>>
> 
> Yes this is a real problem, try 'kvm-xfstests -c f2fs/encrypt generic/579'.
> The worker thread gets deadlocked in f2fs_read_merkle_tree_page() waiting for
> the Merkle tree page to be decrypted.  This is with the v2 compression patch;
> it works fine on current mainline.

Oh, alright...

Let me split them. Thanks very much for all the comments and testing.

Thanks,

> 
> INFO: task kworker/u5:0:61 blocked for more than 30 seconds.
>   Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u5:0    D    0     61      2 0x80004000
> Workqueue: f2fs_post_read_wq f2fs_post_read_work
> Call Trace:
>  context_switch kernel/sched/core.c:3384 [inline]
>  __schedule+0x299/0x6c0 kernel/sched/core.c:4069
>  schedule+0x44/0xd0 kernel/sched/core.c:4136
>  io_schedule+0x11/0x40 kernel/sched/core.c:5780
>  wait_on_page_bit_common mm/filemap.c:1174 [inline]
>  wait_on_page_bit mm/filemap.c:1223 [inline]
>  wait_on_page_locked include/linux/pagemap.h:527 [inline]
>  wait_on_page_locked include/linux/pagemap.h:524 [inline]
>  wait_on_page_read mm/filemap.c:2767 [inline]
>  do_read_cache_page+0x407/0x660 mm/filemap.c:2810
>  read_cache_page+0xd/0x10 mm/filemap.c:2894
>  f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
>  verify_page+0x110/0x560 fs/verity/verify.c:120
>  fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
>  verity_work fs/f2fs/data.c:142 [inline]
>  f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
>  process_one_work+0x225/0x550 kernel/workqueue.c:2269
>  worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
>  kthread+0x125/0x140 kernel/kthread.c:255
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> INFO: task kworker/u5:1:1140 blocked for more than 30 seconds.
>   Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u5:1    D    0   1140      2 0x80004000
> Workqueue: f2fs_post_read_wq f2fs_post_read_work
> Call Trace:
>  context_switch kernel/sched/core.c:3384 [inline]
>  __schedule+0x299/0x6c0 kernel/sched/core.c:4069
>  schedule+0x44/0xd0 kernel/sched/core.c:4136
>  io_schedule+0x11/0x40 kernel/sched/core.c:5780
>  wait_on_page_bit_common mm/filemap.c:1174 [inline]
>  wait_on_page_bit mm/filemap.c:1223 [inline]
>  wait_on_page_locked include/linux/pagemap.h:527 [inline]
>  wait_on_page_locked include/linux/pagemap.h:524 [inline]
>  wait_on_page_read mm/filemap.c:2767 [inline]
>  do_read_cache_page+0x407/0x660 mm/filemap.c:2810
>  read_cache_page+0xd/0x10 mm/filemap.c:2894
>  f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
>  verify_page+0x110/0x560 fs/verity/verify.c:120
>  fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
>  verity_work fs/f2fs/data.c:142 [inline]
>  f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
>  process_one_work+0x225/0x550 kernel/workqueue.c:2269
>  worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
>  kthread+0x125/0x140 kernel/kthread.c:255
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/21:
>  #0: 82250520 (rcu_read_lock){}, at: 
> rcu_lock_acquire.constprop.0+0x0/0x30 include/trace/events/lock.h:13
> 2 locks held by kworker/u5:0/61:
>  #0: 88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: 
> set_work_data kernel/workqueue.c:619 [inline]
>  #0: 88807b78eb28 

Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

2019-10-30 Thread Chao Yu
On 2019/10/31 0:50, Eric Biggers wrote:
> No, just use kvmalloc().  The whole point of kvmalloc() is that it tries
> kmalloc() and then falls back to vmalloc() if it fails.

Okay, it's fine to me, let me fix this in another patch.

Thanks,

> 
> - Eric
> .
> 




Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Chao Yu
On 2019/10/30 23:14, Theodore Y. Ts'o wrote:
> On Wed, Oct 30, 2019 at 06:43:45PM +0800, Gao Xiang wrote:
>>> You're right: in a low-memory scenario, allocation with a bioset will be
>>> faster. As you mentioned offline, maybe we can add/use a private bioset like
>>> btrfs did rather than using the global one; however, we'd better check first
>>> how a deadlock could happen with a bioset mempool ...
>>
>> Okay, hope to get hints from Jaegeuk and redo this patch then...
> 
> It's not at all clear to me that using a private bioset is a good idea
> for f2fs.  That just means you're allocating a separate chunk of
> memory just for f2fs, as opposed to using the global pool.  That's an
> additional chunk of non-swapable kernel memory that's not going to be
> available, in *addition* to the global mempool.  
> 
> Also, who else would you be contending for space with the global
> mempool?  It's not like an mobile handset is going to have other users
> of the global bio mempool.
> 
> On a low-end mobile handset, memory is at a premium, so wasting memory
> to no good effect isn't going to be a great idea.

You're right. It looks like the reason btrfs added a private bioset was to
avoid abusing bio internal fields (via commit 9be3395bcd4a); f2fs has no such
reason to do that now.

Thanks,

> 
> Regards,
> 
>   - Ted
> .
> 




[f2fs-dev] [RFC] errno.h: Provide EFSCORRUPTED for everybody

2019-10-30 Thread Valdis Kletnieks
Three questions: (a) ACK/NAK on this patch, (b) should it be all in one
patch, or one to add to errno.h and 6 patches for 6 filesystems?), and
(c) if one patch, who gets to shepherd it through?


There's currently 6 filesystems that have the same #define. Move it
into errno.h so it's defined in just one place.

Signed-off-by: Valdis Kletnieks 
---
 drivers/staging/exfat/exfat.h    | 2 --
 fs/erofs/internal.h              | 2 --
 fs/ext4/ext4.h                   | 1 -
 fs/f2fs/f2fs.h                   | 1 -
 fs/xfs/xfs_linux.h               | 1 -
 include/linux/jbd2.h             | 1 -
 include/uapi/asm-generic/errno.h | 1 +
 7 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/staging/exfat/exfat.h b/drivers/staging/exfat/exfat.h
index 84de1123e178..3cf7e54af0b7 100644
--- a/drivers/staging/exfat/exfat.h
+++ b/drivers/staging/exfat/exfat.h
@@ -30,8 +30,6 @@
 #undef DEBUG
 #endif
 
-#define EFSCORRUPTED   EUCLEAN /* Filesystem is corrupted */
-
 #define DENTRY_SIZE    32  /* dir entry size */
 #define DENTRY_SIZE_BITS   5
 
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 544a453f3076..3980026a8882 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -425,7 +425,5 @@ static inline int z_erofs_init_zip_subsystem(void) { return 0; }
 static inline void z_erofs_exit_zip_subsystem(void) {}
 #endif /* !CONFIG_EROFS_FS_ZIP */
 
-#define EFSCORRUPTED    EUCLEAN /* Filesystem is corrupted */
-
 #endif /* __EROFS_INTERNAL_H */
 
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 03db3e71676c..a86c2585457d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3396,6 +3396,5 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh)
 #endif /* __KERNEL__ */
 
 #define EFSBADCRC  EBADMSG /* Bad CRC detected */
-#define EFSCORRUPTED   EUCLEAN /* Filesystem is corrupted */
 
 #endif /* _EXT4_H */
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4024790028aa..04ebe77569a3 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3752,6 +3752,5 @@ static inline bool is_journalled_quota(struct f2fs_sb_info *sbi)
 }
 
 #define EFSBADCRC  EBADMSG /* Bad CRC detected */
-#define EFSCORRUPTED   EUCLEAN /* Filesystem is corrupted */
 
 #endif /* _LINUX_F2FS_H */
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index ca15105681ca..3409d02a7d21 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -123,7 +123,6 @@ typedef __u32   xfs_nlink_t;
 
 #define ENOATTR    ENODATA /* Attribute not found */
 #define EWRONGFS   EINVAL  /* Mount with wrong filesystem type */
-#define EFSCORRUPTED   EUCLEAN /* Filesystem is corrupted */
 #define EFSBADCRC  EBADMSG /* Bad CRC detected */
 
 #define SYNCHRONIZE()  barrier()
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 564793c24d12..1ecd3859d040 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1657,6 +1657,5 @@ static inline tid_t jbd2_get_latest_transaction(journal_t *journal)
 #endif /* __KERNEL__ */
 
 #define EFSBADCRC  EBADMSG /* Bad CRC detected */
-#define EFSCORRUPTED   EUCLEAN /* Filesystem is corrupted */
 
 #endif /* _LINUX_JBD2_H */
diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
index cf9c51ac49f9..1d5ffdf54cb0 100644
--- a/include/uapi/asm-generic/errno.h
+++ b/include/uapi/asm-generic/errno.h
@@ -98,6 +98,7 @@
 #define EINPROGRESS  115 /* Operation now in progress */
 #define ESTALE       116 /* Stale file handle */
 #define EUCLEAN      117 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN
 #define ENOTNAM      118 /* Not a XENIX named type file */
 #define ENAVAIL      119 /* No XENIX semaphores available */
 #define EISNAM       120 /* Is a named type file */
-- 
2.24.0.rc1





[f2fs-dev] [PATCH] docs: fs-verity: document first supported kernel version

2019-10-30 Thread Eric Biggers
From: Eric Biggers 

I had meant to replace these TODOs with the actual version when applying
the patches, but forgot to do so.  Do it now.

Signed-off-by: Eric Biggers 
---
 Documentation/filesystems/fsverity.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index 3355377a2439..a95536b6443c 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -406,7 +406,7 @@ pages have been read into the pagecache.  (See `Verifying data`_.)
 ext4
 
 
-ext4 supports fs-verity since Linux TODO and e2fsprogs v1.45.2.
+ext4 supports fs-verity since Linux v5.4 and e2fsprogs v1.45.2.
 
 To create verity files on an ext4 filesystem, the filesystem must have
 been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
@@ -442,7 +442,7 @@ also only supports extent-based files.
 f2fs
 
 
-f2fs supports fs-verity since Linux TODO and f2fs-tools v1.11.0.
+f2fs supports fs-verity since Linux v5.4 and f2fs-tools v1.11.0.
 
 To create verity files on an f2fs filesystem, the filesystem must have
 been formatted with ``-O verity``.
-- 
2.23.0





Re: [f2fs-dev] [PATCH] Revert "ext4 crypto: fix to check feature status before get policy"

2019-10-30 Thread Doug Anderson
Hi,

On Wed, Oct 30, 2019 at 1:57 PM Eric Biggers  wrote:
>
> FWIW, from reading the Chrome OS code, I think the code you linked to isn't
> where the breakage actually is.  I think it's actually at
> https://chromium.googlesource.com/chromiumos/platform2/+/refs/heads/master/chromeos-common-script/share/chromeos-common.sh#375
> ... where an init script is using the error message printed by 'e4crypt
> get_policy' to decide whether to add -O encrypt to the filesystem or not.
>
> It really should check instead:
>
> [ -e /sys/fs/ext4/features/encryption ]

OK, I filed  and CCed all the people listed
in the cryptohome "OWNERS" file.  Hopefully one of them can pick this
up as a general cleanup.  Thanks!

-Doug
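
A sketch of the init-script check Eric suggests: probe the sysfs feature flag instead of parsing e4crypt's error text. The sysfs path is the one named above; the wrapper function and messages are illustrative, not the actual chromeos-common.sh code:

```shell
#!/bin/sh
# Decide whether the running kernel supports ext4 encryption by checking
# the sysfs feature node rather than an error-message string. The
# optional argument exists only so the check can be exercised against an
# arbitrary path.
ext4_supports_encryption() {
    feature_file="${1:-/sys/fs/ext4/features/encryption}"
    [ -e "$feature_file" ]
}

if ext4_supports_encryption; then
    echo "kernel supports ext4 encryption (-O encrypt is usable)"
else
    echo "no ext4 encryption support; fall back (e.g. to ecryptfs)"
fi
```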




[f2fs-dev] [PATCH v2] Revert "ext4 crypto: fix to check feature status before get policy"

2019-10-30 Thread Eric Biggers
From: Douglas Anderson 

This reverts commit 0642ea2409f3 ("ext4 crypto: fix to check feature
status before get policy").

The commit made a clear and documented ABI change that is not backward
compatible.  There exists userspace code [1][2] that relied on the old
behavior and is now broken.

While we could entertain the idea of updating the userspace code to
handle the ABI change, it's my understanding that in general ABI
changes that break userspace are frowned upon (to put it nicely).

[1] 
https://chromium.googlesource.com/chromiumos/platform2/+/5993e5c2c2439d7a144863e9c7622736d72771d5/chromeos-common-script/share/chromeos-common.sh#375
[2] https://crbug.com/1018265

[EB: Note, this revert restores an inconsistency between ext4 and f2fs
 and restores the partially incorrect documentation.  Later we should
 try fixing the inconsistency the other way, by changing f2fs instead
 -- or if that won't work either, at least fixing the documentation.

 Also fixed link 1 above to point to the code which actually broke.]

Fixes: 0642ea2409f3 ("ext4 crypto: fix to check feature status before get policy")
Signed-off-by: Douglas Anderson 
Signed-off-by: Eric Biggers 
---

v2: improved commit message.

 Documentation/filesystems/fscrypt.rst | 3 +--
 fs/ext4/ioctl.c   | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index 8a0700af9596..4289c29d7c5a 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -562,8 +562,7 @@ FS_IOC_GET_ENCRYPTION_POLICY_EX can fail with the following errors:
   or this kernel is too old to support FS_IOC_GET_ENCRYPTION_POLICY_EX
   (try FS_IOC_GET_ENCRYPTION_POLICY instead)
 - ``EOPNOTSUPP``: the kernel was not configured with encryption
-  support for this filesystem, or the filesystem superblock has not
-  had encryption enabled on it
+  support for this filesystem
 - ``EOVERFLOW``: the file is encrypted and uses a recognized
   encryption policy version, but the policy struct does not fit into
   the provided buffer
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 0b7f316fd30f..13d97fb797b4 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1181,8 +1181,6 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 #endif
}
case EXT4_IOC_GET_ENCRYPTION_POLICY:
-   if (!ext4_has_feature_encrypt(sb))
-   return -EOPNOTSUPP;
return fscrypt_ioctl_get_policy(filp, (void __user *)arg);
 
case FS_IOC_GET_ENCRYPTION_POLICY_EX:
-- 
2.23.0





Re: [f2fs-dev] [PATCH] Revert "ext4 crypto: fix to check feature status before get policy"

2019-10-30 Thread Eric Biggers
On Wed, Oct 30, 2019 at 10:51:20AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Wed, Oct 30, 2019 at 10:38 AM Eric Biggers  wrote:
> >
> > Hi Douglas,
> >
> > On Wed, Oct 30, 2019 at 10:06:25AM -0700, Douglas Anderson wrote:
> > > This reverts commit 0642ea2409f3 ("ext4 crypto: fix to check feature
> > > status before get policy").
> > >
> > > The commit made a clear and documented ABI change that is not backward
> > > compatible.  There exists userspace code [1] that relied on the old
> > > behavior and is now broken.
> > >
> > > While we could entertain the idea of updating the userspace code to
> > > handle the ABI change, it's my understanding that in general ABI
> > > changes that break userspace are frowned upon (to put it nicely).
> > >
> > > NOTE: if we for some reason do decide to entertain the idea of
> > > allowing the ABI change and updating userspace, I'd appreciate any
> > > help on how we should make the change.  Specifically the old code
> > > relied on the different return values to differentiate between
> > > "KeyState::NO_KEY" and "KeyState::NOT_SUPPORTED".  I'm no expert on
> > > the ext4 encryption APIs (I just ended up here tracking down the
> > > regression [2]) so I'd need a bit of handholding from someone.
> > >
> > > [1] 
> > > https://chromium.googlesource.com/chromiumos/platform2/+/refs/heads/master/cryptohome/dircrypto_util.cc#73
> > > [2] https://crbug.com/1018265
> > >
> > > Fixes: 0642ea2409f3 ("ext4 crypto: fix to check feature status before get policy")
> > > Signed-off-by: Douglas Anderson 
> > > ---
> > >
> > >  Documentation/filesystems/fscrypt.rst | 3 +--
> > >  fs/ext4/ioctl.c   | 2 --
> > >  2 files changed, 1 insertion(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
> > > index 8a0700af9596..4289c29d7c5a 100644
> > > --- a/Documentation/filesystems/fscrypt.rst
> > > +++ b/Documentation/filesystems/fscrypt.rst
> > > @@ -562,8 +562,7 @@ FS_IOC_GET_ENCRYPTION_POLICY_EX can fail with the following errors:
> > >or this kernel is too old to support FS_IOC_GET_ENCRYPTION_POLICY_EX
> > >(try FS_IOC_GET_ENCRYPTION_POLICY instead)
> > >  - ``EOPNOTSUPP``: the kernel was not configured with encryption
> > > -  support for this filesystem, or the filesystem superblock has not
> > > -  had encryption enabled on it
> > > +  support for this filesystem
> > >  - ``EOVERFLOW``: the file is encrypted and uses a recognized
> > >encryption policy version, but the policy struct does not fit into
> > >the provided buffer
> > > diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> > > index 0b7f316fd30f..13d97fb797b4 100644
> > > --- a/fs/ext4/ioctl.c
> > > +++ b/fs/ext4/ioctl.c
> > > @@ -1181,8 +1181,6 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> > >  #endif
> > >   }
> > >   case EXT4_IOC_GET_ENCRYPTION_POLICY:
> > > - if (!ext4_has_feature_encrypt(sb))
> > > - return -EOPNOTSUPP;
> > >   return fscrypt_ioctl_get_policy(filp, (void __user *)arg);
> > >
> >
> > Thanks for reporting this.  Can you elaborate on exactly why returning
> > EOPNOTSUPP breaks things in the Chrome OS code?  Since encryption is indeed 
> > not
> > supported, why isn't "KeyState::NOT_SUPPORTED" correct?
> 
> I guess all I know is from the cryptohome source code I sent a link
> to, which I'm not a super expert in.  Did you get a chance to take a
> look at that?  As far as I can tell the code is doing something like
> this:
> 
> 1. If I see EOPNOTSUPP then this must be a kernel without ext4 crypto.
> Fallback to using the old-style ecryptfs.
> 
> 2. If I see ENODATA then this is a kernel with ext4 crypto but there's
> no key yet.  We should set a key and (if necessarily) enable crypto on
> the filesystem.
> 
> 3. If I see no error then we're already good.
> 
> > Note that the state after this revert will be:
> >
> > - FS_IOC_GET_ENCRYPTION_POLICY on ext4 => ENODATA
> > - FS_IOC_GET_ENCRYPTION_POLICY on f2fs => EOPNOTSUPP
> > - FS_IOC_GET_ENCRYPTION_POLICY_EX on ext4 => EOPNOTSUPP
> > - FS_IOC_GET_ENCRYPTION_POLICY_EX on f2fs => EOPNOTSUPP
> >
> > So if this code change is made, the documentation would need to be updated 
> > to
> > explain that the error code from FS_IOC_GET_ENCRYPTION_POLICY is
> > filesystem-specific (which we'd really like to avoid...), and that
> > FS_IOC_GET_ENCRYPTION_POLICY_EX handles this case differently.  Or else the
> > other three would need to be changed to ENODATA -- which for
> > FS_IOC_GET_ENCRYPTION_POLICY on f2fs would be an ABI break in its own right,
> > though it's possible that no one would notice.
> >
> > Is your proposal to keep the error filesystem-specific for now?
> 
> I guess I'd have to leave it up to the people who know this better.
> Mostly I just saw this as an ABI change breaking userspace which to me
> means revert.  I have very little background here to 

Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

2019-10-30 Thread Jaegeuk Kim
On 10/30, Eric Biggers wrote:
> On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
> > On 2019/10/30 10:55, Eric Biggers wrote:
> > > On Tue, Oct 29, 2019 at 04:33:36PM +0800, Chao Yu wrote:
> > >> On 2019/10/28 6:50, Eric Biggers wrote:
> >  +bool f2fs_is_compressed_page(struct page *page)
> >  +{
> >  +  if (!page_private(page))
> >  +  return false;
> >  +  if (IS_ATOMIC_WRITTEN_PAGE(page) || IS_DUMMY_WRITTEN_PAGE(page))
> >  +  return false;
> >  +  return *((u32 *)page_private(page)) == 
> >  F2FS_COMPRESSED_PAGE_MAGIC;
> >  +}
> > >>>
> > >>> This code implies that there can be multiple page private structures 
> > >>> each of
> > >>> which has a different magic number.  But I only see 
> > >>> F2FS_COMPRESSED_PAGE_MAGIC.
> > >>> Where in the code is the other one(s)?
> > >>
> > >> I'm not sure I understood you correctly, did you mean it needs to 
> > >> introduce
> > >> f2fs_is_atomic_written_page() and f2fs_is_dummy_written_page() like
> > >> f2fs_is_compressed_page()?
> > >>
> > > 
> > > No, I'm asking what is the case where the line
> > > 
> > >   *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC
> > > 
> > > returns false?
> > 
> > Should be this?
> > 
> > if (!page_private(page))
> > return false;
> > f2fs_bug_on(*((u32 *)page_private(page)) != F2FS_COMPRESSED_PAGE_MAGIC)
> > return true;
> 
> Yes, that makes more sense, unless there are other cases.
> 
> > 
> > > 
> > >>>
> >  +
> >  +static void f2fs_set_compressed_page(struct page *page,
> >  +  struct inode *inode, pgoff_t index, void *data, 
> >  refcount_t *r)
> >  +{
> >  +  SetPagePrivate(page);
> >  +  set_page_private(page, (unsigned long)data);
> >  +
> >  +  /* i_crypto_info and iv index */
> >  +  page->index = index;
> >  +  page->mapping = inode->i_mapping;
> >  +  if (r)
> >  +  refcount_inc(r);
> >  +}
> > >>>
> > >>> It isn't really appropriate to create fake pagecache pages like this.  
> > >>> Did you
> > >>> consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?
> > >>
> > >> We need to store i_crypto_info and iv index somewhere, in order to pass 
> > >> them to
> > >> fscrypt_decrypt_block_inplace(), where did you suggest to store them?
> > >>
> > > 
> > > The same place where the pages are stored.
> > 
> > Still we need allocate space for those fields, any strong reason to do so?
> > 
> 
> page->mapping set implies that the page is a pagecache page.  Faking it could
> cause problems with code elsewhere.

I've checked it with minchan, and it seems to be fine that filesystem uses
this page internally only, not in pagecache.

> 
> > > 
> >  +
> >  +void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
> >  +{
> >  +  kvfree(cc->rpages);
> >  +}
> > >>>
> > >>> The memory is allocated with kzalloc(), so why is it freed with kvfree() and not
> > >>> just kfree()?
> > >>
> > >> It was allocated by f2fs_*alloc() which will fallback to kvmalloc() once
> > >> kmalloc() failed.
> > > 
> > > This seems to be a bug in f2fs_kmalloc() -- it inappropriately falls back to
> > > kvmalloc().  As per its name, it should only use kmalloc().  f2fs_kvmalloc()
> > > already exists, so it can be used when the fallback is wanted.
> > 
> > We can introduce f2fs_memalloc() to wrap f2fs_kmalloc() and f2fs_kvmalloc() as
> > below:
> > 
> > f2fs_memalloc()
> > {
> > mem = f2fs_kmalloc();
> > if (mem)
> > return mem;
> > return f2fs_kvmalloc();
> > }
> > 
> > It can be used in specified places where we really need it, like the place
> > described in 5222595d093e ("f2fs: use kvmalloc, if kmalloc is failed") where
> > we introduced the original logic.
> 
> No, just use kvmalloc().  The whole point of kvmalloc() is that it tries
> kmalloc() and then falls back to vmalloc() if it fails.
> 
> - Eric


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

2019-10-30 Thread Gao Xiang via Linux-f2fs-devel
Hi Eric,

(add some mm folks...)

On Wed, Oct 30, 2019 at 09:50:56AM -0700, Eric Biggers wrote:



> > >>>
> > >>> It isn't really appropriate to create fake pagecache pages like this.  Did you
> > >>> consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?
> > >>
> > >> We need to store i_crypto_info and iv index somewhere, in order to pass them to
> > >> fscrypt_decrypt_block_inplace(), where did you suggest to store them?
> > >>
> > > 
> > > The same place where the pages are stored.
> > 
> > Still we need to allocate space for those fields; any strong reason to do so?
> > 
> 
> page->mapping set implies that the page is a pagecache page.  Faking it could
> cause problems with code elsewhere.

Not very related to this patch: faking page->mapping was used in zsmalloc before
non-LRU migration (see material [1]) and is used in erofs now (page->mapping
indicates a non-LRU, short-lifetime temporary page type; page->private is used
for per-page information). As far as I know, a non-LRU page without
PAGE_MAPPING_MOVABLE set is safe for most mm code.

On the other hand, I think a NULL page->mapping wastes that field in the precious
page structure... And we cannot determine the page type directly from a NULL
alone -- it could be a truncated file page, a just-allocated page, or some
internal temporary page...

So my proposal is to use page->mapping to indicate a specific page type for such
non-LRU pages (by some common convention, e.g. some real structure, rather than
just zeroing it out and wasting 8 bytes; it is also natural to indicate a page
type by its `mapping' name)... Since my English is not very good, I have put
this off until now...

[1] https://elixir.bootlin.com/linux/v3.18.140/source/mm/zsmalloc.c#L379

https://lore.kernel.org/linux-mm/1459321935-3655-7-git-send-email-minc...@kernel.org
and some not very related topic: https://lwn.net/Articles/752564/

Thanks,
Gao Xiang





Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Gao Xiang via Linux-f2fs-devel
On Wed, Oct 30, 2019 at 09:33:13AM -0700, Jaegeuk Kim wrote:
> On 10/30, Theodore Y. Ts'o wrote:
> > On Wed, Oct 30, 2019 at 11:50:37PM +0800, Gao Xiang wrote:
> > > 
> > > So I'm curious about the original issue in commit 740432f83560
> > > ("f2fs: handle failed bio allocation"). Since f2fs manages multiple write
> > > bios with its internal fio but it seems the commit is not helpful to
> > > resolve potential mempool deadlock (I'm confused since no calltrace,
> > > maybe I'm wrong)...
> > 
> > Two possibilities come to mind.  (a) It may be that on older kernels
> > (when f2fs is backported to older Board Support Package kernels from
> > the SOC vendors) didn't have the bio_alloc() guarantee, so it was
> > necessary on older kernels, but not on upstream, or (b) it wasn't
> > *actually* possible for bio_alloc() to fail and someone added the
> > error handling in 740432f83560 out of paranoia.
> 
> Yup, I was checking old device kernels but just stopped digging it out.
> Instead, I hesitate to apply this patch since I can't get why we need to
> get rid of this code for clean-up purpose. This may be able to bring
> some hassles when backporting to android/device kernels.

Yes, I get your concern. As I said in other patches many times, since you're the
maintainer of f2fs, it's all up to you (I'm not paranoid). However, I think
there are 2 valid reasons:

 1) As a newbie to Linux filesystems, when I study or work on f2fs and I see
such misleading code, I am likely to produce similar code in the future
(not everyone refers to the comments above bio_alloc), so such usage will
spread (since one may take existing code as sample code);

 2) Since it's upstream, I personally think an appropriate cleanup is fine
(anyway, it removes a net 20+ lines of dead code), and I don't think this
patch is harmful for backporting.

Thanks,
Gao Xiang

> 
> > 
> > (Hence my suggestion that in the ext4 version of the patch, we add a
> > code comment justifying why there was no error checking, to make it
> > clear that this was a deliberate choice.  :-)
> > 
> > - Ted




Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

2019-10-30 Thread Eric Biggers
On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
>   static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
>   {
>  -/*
>  - * We use different work queues for decryption and for verity because
>  - * verity may require reading metadata pages that need decryption, and
>  - * we shouldn't recurse to the same workqueue.
>  - */
> >>>
> >>> Why is it okay (i.e., no deadlocks) to no longer use different work queues for
> >>> decryption and for verity?  See the comment above which is being deleted.
> >>
> >> Could you explain more about how the deadlock happens? or share a link address if
> >> you have described that case somewhere?
> >>
> > 
> > The verity work can read pages from the file which require decryption.  I'm
> > concerned that it could deadlock if the work is scheduled on the same workqueue.
> 
> I assume you've tried one workqueue, and suffered deadlock..
> 
> > Granted, I'm not an expert in Linux workqueues, so if you've investigated this
> > and determined that it's safe, can you explain why?
> 
> I'm not familiar with workqueues...  I guess it may not be safe if the work is
> scheduled to the same cpu where verity is waiting for data? If the work is
> scheduled to another cpu, it may be safe.
> 
> I can check that before splitting the workqueue for verity and
> decrypt/decompress.
> 

Yes this is a real problem, try 'kvm-xfstests -c f2fs/encrypt generic/579'.
The worker thread gets deadlocked in f2fs_read_merkle_tree_page() waiting for
the Merkle tree page to be decrypted.  This is with the v2 compression patch;
it works fine on current mainline.

INFO: task kworker/u5:0:61 blocked for more than 30 seconds.
  Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u5:0    D    0    61      2 0x80004000
Workqueue: f2fs_post_read_wq f2fs_post_read_work
Call Trace:
 context_switch kernel/sched/core.c:3384 [inline]
 __schedule+0x299/0x6c0 kernel/sched/core.c:4069
 schedule+0x44/0xd0 kernel/sched/core.c:4136
 io_schedule+0x11/0x40 kernel/sched/core.c:5780
 wait_on_page_bit_common mm/filemap.c:1174 [inline]
 wait_on_page_bit mm/filemap.c:1223 [inline]
 wait_on_page_locked include/linux/pagemap.h:527 [inline]
 wait_on_page_locked include/linux/pagemap.h:524 [inline]
 wait_on_page_read mm/filemap.c:2767 [inline]
 do_read_cache_page+0x407/0x660 mm/filemap.c:2810
 read_cache_page+0xd/0x10 mm/filemap.c:2894
 f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
 verify_page+0x110/0x560 fs/verity/verify.c:120
 fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
 verity_work fs/f2fs/data.c:142 [inline]
 f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
 process_one_work+0x225/0x550 kernel/workqueue.c:2269
 worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
 kthread+0x125/0x140 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
INFO: task kworker/u5:1:1140 blocked for more than 30 seconds.
  Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u5:1    D    0  1140      2 0x80004000
Workqueue: f2fs_post_read_wq f2fs_post_read_work
Call Trace:
 context_switch kernel/sched/core.c:3384 [inline]
 __schedule+0x299/0x6c0 kernel/sched/core.c:4069
 schedule+0x44/0xd0 kernel/sched/core.c:4136
 io_schedule+0x11/0x40 kernel/sched/core.c:5780
 wait_on_page_bit_common mm/filemap.c:1174 [inline]
 wait_on_page_bit mm/filemap.c:1223 [inline]
 wait_on_page_locked include/linux/pagemap.h:527 [inline]
 wait_on_page_locked include/linux/pagemap.h:524 [inline]
 wait_on_page_read mm/filemap.c:2767 [inline]
 do_read_cache_page+0x407/0x660 mm/filemap.c:2810
 read_cache_page+0xd/0x10 mm/filemap.c:2894
 f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
 verify_page+0x110/0x560 fs/verity/verify.c:120
 fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
 verity_work fs/f2fs/data.c:142 [inline]
 f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
 process_one_work+0x225/0x550 kernel/workqueue.c:2269
 worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
 kthread+0x125/0x140 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Showing all locks held in the system:
1 lock held by khungtaskd/21:
 #0: 82250520 (rcu_read_lock){}, at: rcu_lock_acquire.constprop.0+0x0/0x30 include/trace/events/lock.h:13
2 locks held by kworker/u5:0/61:
 #0: 88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
 #0: 88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
 #0: 88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
 #1: c9253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data 

Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

2019-10-30 Thread Eric Biggers
On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
> On 2019/10/30 10:55, Eric Biggers wrote:
> > On Tue, Oct 29, 2019 at 04:33:36PM +0800, Chao Yu wrote:
> >> On 2019/10/28 6:50, Eric Biggers wrote:
>  +bool f2fs_is_compressed_page(struct page *page)
>  +{
>  +if (!page_private(page))
>  +return false;
>  +if (IS_ATOMIC_WRITTEN_PAGE(page) || IS_DUMMY_WRITTEN_PAGE(page))
>  +return false;
>  +return *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC;
>  +}
> >>>
> >>> This code implies that there can be multiple page private structures each of
> >>> which has a different magic number.  But I only see F2FS_COMPRESSED_PAGE_MAGIC.
> >>> Where in the code is the other one(s)?
> >>
> >> I'm not sure I understood you correctly, did you mean it needs to introduce
> >> f2fs_is_atomic_written_page() and f2fs_is_dummy_written_page() like
> >> f2fs_is_compressed_page()?
> >>
> > 
> > No, I'm asking what is the case where the line
> > 
> > *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC
> > 
> > returns false?
> 
> Should be this?
> 
> if (!page_private(page))
>   return false;
> f2fs_bug_on(*((u32 *)page_private(page)) != F2FS_COMPRESSED_PAGE_MAGIC)
> return true;

Yes, that makes more sense, unless there are other cases.

> 
> > 
> >>>
>  +
>  +static void f2fs_set_compressed_page(struct page *page,
>  +struct inode *inode, pgoff_t index, void *data, refcount_t *r)
>  +{
>  +SetPagePrivate(page);
>  +set_page_private(page, (unsigned long)data);
>  +
>  +/* i_crypto_info and iv index */
>  +page->index = index;
>  +page->mapping = inode->i_mapping;
>  +if (r)
>  +refcount_inc(r);
>  +}
> >>>
> >>> It isn't really appropriate to create fake pagecache pages like this.  Did you
> >>> consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?
> >>
> >> We need to store i_crypto_info and iv index somewhere, in order to pass them to
> >> fscrypt_decrypt_block_inplace(), where did you suggest to store them?
> >>
> > 
> > The same place where the pages are stored.
> 
> Still we need to allocate space for those fields; any strong reason to do so?
> 

page->mapping set implies that the page is a pagecache page.  Faking it could
cause problems with code elsewhere.

> > 
>  +
>  +void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
>  +{
>  +kvfree(cc->rpages);
>  +}
> >>>
> >>> The memory is allocated with kzalloc(), so why is it freed with kvfree() and not
> >>> just kfree()?
> >>
> >> It was allocated by f2fs_*alloc() which will fallback to kvmalloc() once
> >> kmalloc() failed.
> > 
> > This seems to be a bug in f2fs_kmalloc() -- it inappropriately falls back to
> > kvmalloc().  As per its name, it should only use kmalloc().  f2fs_kvmalloc()
> > already exists, so it can be used when the fallback is wanted.
> 
> We can introduce f2fs_memalloc() to wrap f2fs_kmalloc() and f2fs_kvmalloc() as
> below:
> 
> f2fs_memalloc()
> {
>   mem = f2fs_kmalloc();
>   if (mem)
>   return mem;
>   return f2fs_kvmalloc();
> }
> 
> It can be used in specified places where we really need it, like the place
> described in 5222595d093e ("f2fs: use kvmalloc, if kmalloc is failed") where
> we introduced the original logic.

No, just use kvmalloc().  The whole point of kvmalloc() is that it tries
kmalloc() and then falls back to vmalloc() if it fails.

- Eric




Re: [f2fs-dev] [PATCH 1/2] libf2fs_io: Add user-space cache

2019-10-30 Thread Jaegeuk Kim
Hi Robin,

Could you please address this build errors for mac build?

libf2fs_io.c:85:38: error: use of undeclared identifier 'false'
static bool dcache_exit_registered = false;
 ^
libf2fs_io.c:95:34: error: use of undeclared identifier 'false'
static bool dcache_initialized = false;
 ^
libf2fs_io.c:127:4: warning: format specifies type 'unsigned long' but the argument has type 'off64_t' (aka 'long long') [-Wformat]
dcache_config.num_cache_entry,
^
libf2fs_io.c:128:4: warning: format specifies type 'unsigned long' but the argument has type 'off64_t' (aka 'long long') [-Wformat]
useCnt,
^~
libf2fs_io.c:129:4: warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat]
dcache_raccess,
^~
libf2fs_io.c:130:4: warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat]
dcache_rhit,
^~~
libf2fs_io.c:131:4: warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat]
dcache_rmiss,
^~~~
libf2fs_io.c:132:4: warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat]
dcache_rreplace);
^~~
libf2fs_io.c:140:23: error: use of undeclared identifier 'false'
dcache_initialized = false;
 ^
libf2fs_io.c:222:23: error: use of undeclared identifier 'true'
dcache_initialized = true;
 ^
libf2fs_io.c:225:28: error: use of undeclared identifier 'true'
dcache_exit_registered = true;
 ^
libf2fs_io.c:273:24: error: use of undeclared identifier 'true'
dcache_valid[entry] = true;
  ^
libf2fs_io.c:363:50: error: use of undeclared identifier 'true'
return dcache_update_rw(fd, buf, offset, count, true);
^
libf2fs_io.c:369:50: error: use of undeclared identifier 'false'
return dcache_update_rw(fd, buf, offset, count, false);


On 10/29, Robin Hsu wrote:
> Implemented cache options in F2FS configuration 'c':
>   * use c.cache_config.num_cache_entry to set the number of
> cache entries (in block), minimum 1024, or 0 to disable cache.
>   * use c.cache_config.max_hash_collision to set maximum hash
> collision (max 16).
>   * use c.cache_config.dbg_en to enable printing of debug messages.
> 
> Signed-off-by: Robin Hsu 
> ---
>  include/f2fs_fs.h |  20 +++
>  lib/libf2fs_io.c  | 317 ++
>  2 files changed, 337 insertions(+)
> 
> diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
> index 84f3f3e..a386e61 100644
> --- a/include/f2fs_fs.h
> +++ b/include/f2fs_fs.h
> @@ -3,6 +3,8 @@
>   *
>   * Copyright (c) 2012 Samsung Electronics Co., Ltd.
>   * http://www.samsung.com/
> + * Copyright (c) 2019 Google Inc.
> + * http://www.google.com/
>   *
>   * Dual licensed under the GPL or LGPL version 2 licenses.
>   *
> @@ -329,6 +331,18 @@ struct device_info {
>   size_t zone_blocks;
>  };
>  
> > +typedef off_t off64_t;
> +
> +typedef struct {
> + /* Value 0 means no cache, minimum 1024 */
> + off64_t num_cache_entry;
> +
> + /* Value 0 means always overwrite (no collision allowed). maximum 16 */
> + unsigned max_hash_collision;
> +
> + bool dbg_en;
> +} dev_cache_config_t;
> +
>  struct f2fs_configuration {
>   u_int32_t reserved_segments;
>   u_int32_t new_reserved_segments;
> @@ -419,6 +433,9 @@ struct f2fs_configuration {
>  
>   /* precomputed fs UUID checksum for seeding other checksums */
>   u_int32_t chksum_seed;
> +
> + /* cache parameters */
> + dev_cache_config_t cache_config;
>  };
>  
>  #ifdef CONFIG_64BIT
> @@ -1185,6 +1202,9 @@ extern int f2fs_init_sparse_file(void);
>  extern int f2fs_finalize_device(void);
>  extern int f2fs_fsync_device(void);
>  
> +extern void dcache_init(void);
> +extern void dcache_release(void);
> +
>  extern int dev_read(void *, __u64, size_t);
>  #ifdef POSIX_FADV_WILLNEED
>  extern int dev_readahead(__u64, size_t);
> diff --git a/lib/libf2fs_io.c b/lib/libf2fs_io.c
> index 4d0ea0d..4888b6e 100644
> --- a/lib/libf2fs_io.c
> +++ b/lib/libf2fs_io.c
> @@ -3,6 +3,8 @@
>   *
>   * Copyright (c) 2013 Samsung Electronics Co., Ltd.
>   * http://www.samsung.com/
> + * Copyright (c) 2019 Google Inc.
> + * http://www.google.com/
>   *
>   * Dual licensed under the GPL or LGPL version 2 licenses.
>   */
> @@ -64,6 +66,309 @@ 

Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Jaegeuk Kim
On 10/30, Theodore Y. Ts'o wrote:
> On Wed, Oct 30, 2019 at 11:50:37PM +0800, Gao Xiang wrote:
> > 
> > So I'm curious about the original issue in commit 740432f83560
> > ("f2fs: handle failed bio allocation"). Since f2fs manages multiple write
> > bios with its internal fio but it seems the commit is not helpful to
> > resolve potential mempool deadlock (I'm confused since no calltrace,
> > maybe I'm wrong)...
> 
> Two possibilities come to mind.  (a) It may be that on older kernels
> (when f2fs is backported to older Board Support Package kernels from
> the SOC vendors) didn't have the bio_alloc() guarantee, so it was
> necessary on older kernels, but not on upstream, or (b) it wasn't
> *actually* possible for bio_alloc() to fail and someone added the
> error handling in 740432f83560 out of paranoia.

Yup, I was checking old device kernels but just stopped digging it out.
Instead, I hesitate to apply this patch since I can't get why we need to
get rid of this code for clean-up purpose. This may be able to bring
some hassles when backporting to android/device kernels.

> 
> (Hence my suggestion that in the ext4 version of the patch, we add a
> code comment justifying why there was no error checking, to make it
> clear that this was a deliberate choice.  :-)
> 
>   - Ted




Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Theodore Y. Ts'o
On Wed, Oct 30, 2019 at 11:50:37PM +0800, Gao Xiang wrote:
> 
> So I'm curious about the original issue in commit 740432f83560
> ("f2fs: handle failed bio allocation"). Since f2fs manages multiple write
> bios with its internal fio but it seems the commit is not helpful to
> resolve potential mempool deadlock (I'm confused since no calltrace,
> maybe I'm wrong)...

Two possibilities come to mind.  (a) It may be that on older kernels
(when f2fs is backported to older Board Support Package kernels from
the SOC vendors) didn't have the bio_alloc() guarantee, so it was
necessary on older kernels, but not on upstream, or (b) it wasn't
*actually* possible for bio_alloc() to fail and someone added the
error handling in 740432f83560 out of paranoia.

(Hence my suggestion that in the ext4 version of the patch, we add a
code comment justifying why there was no error checking, to make it
clear that this was a deliberate choice.  :-)

- Ted




Re: [f2fs-dev] [PATCH 1/2] f2fs: support aligned pinned file

2019-10-30 Thread Jaegeuk Kim
On 10/26, Chao Yu wrote:
> On 2019/10/26 2:18, Jaegeuk Kim wrote:
> > On 10/24, Chao Yu wrote:
> >> Hi Jaegeuk,
> >>
> >> On 2019/10/23 1:16, Jaegeuk Kim wrote:
> >>> This patch supports 2MB-aligned pinned file, which can guarantee no GC at all
> >>> by allocating fully valid 2MB segment.
> >>>
> >>> Signed-off-by: Jaegeuk Kim 
> >>> ---
> >>>  fs/f2fs/f2fs.h |  4 +++-
> >>>  fs/f2fs/file.c | 39 ++-
> >>>  fs/f2fs/recovery.c |  2 +-
> >>>  fs/f2fs/segment.c  | 21 -
> >>>  fs/f2fs/segment.h  |  2 ++
> >>>  fs/f2fs/super.c|  1 +
> >>>  fs/f2fs/sysfs.c|  2 ++
> >>>  7 files changed, 63 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index ca342f4c7db1..c681f51e351b 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -890,6 +890,7 @@ enum {
> >>>   CURSEG_WARM_NODE,   /* direct node blocks of normal files */
> >>>   CURSEG_COLD_NODE,   /* indirect node blocks */
> >>>   NO_CHECK_TYPE,
> >>> + CURSEG_COLD_DATA_PINNED,/* cold data for pinned file */
> >>>  };
> >>>  
> >>>  struct flush_cmd {
> >>> @@ -1301,6 +1302,7 @@ struct f2fs_sb_info {
> >>>  
> >>>   /* threshold for gc trials on pinned files */
> >>>   u64 gc_pin_file_threshold;
> >>> + struct rw_semaphore pin_sem;
> >>>  
> >>>   /* maximum # of trials to find a victim segment for SSR and GC */
> >>>   unsigned int max_victim_search;
> >>> @@ -3116,7 +3118,7 @@ void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
> >>>  int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
> >>>  void allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type,
> >>>   unsigned int start, unsigned int end);
> >>> -void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi);
> >>> +void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi, int type);
> >>>  int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range);
> >>>  bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi,
> >>>   struct cp_control *cpc);
> >>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> >>> index 29bc0a542759..f6c038e8a6a7 100644
> >>> --- a/fs/f2fs/file.c
> >>> +++ b/fs/f2fs/file.c
> >>> @@ -1545,12 +1545,41 @@ static int expand_inode_data(struct inode *inode, loff_t offset,
> >>>   if (off_end)
> >>>   map.m_len++;
> >>>  
> >>> - if (f2fs_is_pinned_file(inode))
> >>> - map.m_seg_type = CURSEG_COLD_DATA;
> >>> + if (!map.m_len)
> >>> + return 0;
> >>> +
> >>> + if (f2fs_is_pinned_file(inode)) {
> >>> + block_t len = (map.m_len >> sbi->log_blocks_per_seg) <<
> >>> + sbi->log_blocks_per_seg;
> >>> + block_t done = 0;
> >>> +
> >>> + if (map.m_len % sbi->blocks_per_seg)
> >>> + len += sbi->blocks_per_seg;
> >>>  
> >>> - err = f2fs_map_blocks(inode, , 1, (f2fs_is_pinned_file(inode) ?
> >>> - F2FS_GET_BLOCK_PRE_DIO :
> >>> - F2FS_GET_BLOCK_PRE_AIO));
> >>> + map.m_len = sbi->blocks_per_seg;
> >>> +next_alloc:
> >>> + mutex_lock(&sbi->gc_mutex);
> >>> + err = f2fs_gc(sbi, true, false, NULL_SEGNO);
> >>> + if (err && err != -ENODATA && err != -EAGAIN)
> >>> + goto out_err;
> >>
> >> To grab enough free space?
> >>
> >> Shouldn't we call
> >>
> >>if (has_not_enough_free_secs(sbi, 0, 0)) {
> >>    mutex_lock(&sbi->gc_mutex);
> >>f2fs_gc(sbi, false, false, NULL_SEGNO);
> >>}
> > 
> > The above calls gc all the time. Do we need this?
> 
> Hmmm... my concern is why we need to run foreground GC even if there is enough
> free space..

In order to get the free segment easily?

> 
> > 
> >>
> >>> +
> >>> + down_write(&sbi->pin_sem);
> >>> + map.m_seg_type = CURSEG_COLD_DATA_PINNED;
> >>> + f2fs_allocate_new_segments(sbi, CURSEG_COLD_DATA);
> >>> + err = f2fs_map_blocks(inode, , 1, F2FS_GET_BLOCK_PRE_DIO);
> >>> + up_write(&sbi->pin_sem);
> >>> +
> >>> + done += map.m_len;
> >>> + len -= map.m_len;
> >>> + map.m_lblk += map.m_len;
> >>> + if (!err && len)
> >>> + goto next_alloc;
> >>> +
> >>> + map.m_len = done;
> >>> + } else {
> >>> + err = f2fs_map_blocks(inode, , 1, F2FS_GET_BLOCK_PRE_AIO);
> >>> + }
> >>> +out_err:
> >>>   if (err) {
> >>>   pgoff_t last_off;
> >>>  
> >>> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> >>> index 783773e4560d..76477f71d4ee 100644
> >>> --- a/fs/f2fs/recovery.c
> >>> +++ b/fs/f2fs/recovery.c
> >>> @@ -711,7 +711,7 @@ static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list,
> >>>   f2fs_put_page(page, 1);
> >>>   }
> >>>   if (!err)
> >>> - f2fs_allocate_new_segments(sbi);
> >>> + 

Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Gao Xiang via Linux-f2fs-devel
Hi Ted,

On Wed, Oct 30, 2019 at 11:14:45AM -0400, Theodore Y. Ts'o wrote:
> On Wed, Oct 30, 2019 at 06:43:45PM +0800, Gao Xiang wrote:
> > > You're right, in low memory scenario, allocation with bioset will be faster, as
> > > you mentioned offline, maybe we can add/use a private bioset like btrfs did
> > > rather than using global one, however, we'd better check how deadlock happens
> > > with a bioset mempool first ...
> > 
> > Okay, hope to get hints from Jaegeuk and redo this patch then...
> 
> It's not at all clear to me that using a private bioset is a good idea
> for f2fs.  That just means you're allocating a separate chunk of
> memory just for f2fs, as opposed to using the global pool.  That's an
> additional chunk of non-swapable kernel memory that's not going to be
> available, in *addition* to the global mempool.  
> 
> Also, who else would you be contending for space with the global
> mempool?  It's not like a mobile handset is going to have other users
> of the global bio mempool.
> 
> On a low-end mobile handset, memory is at a premium, so wasting memory
> to no good effect isn't going to be a great idea.

Thanks for your reply. I agree with your idea.

Actually I think after this version patch is applied, all are the same
as the previous status (whether some deadlock or not).

So I'm curious about the original issue in commit 740432f83560
("f2fs: handle failed bio allocation"). Since f2fs manages multiple write
bios with its internal fio but it seems the commit is not helpful to
resolve potential mempool deadlock (I'm confused since no calltrace,
maybe I'm wrong)...

I think we should get that clear first and then decide how to proceed..
(I tend not to add another private bioset since, as you said, it's
 unshareable as well...)

Thanks,
Gao Xiang

> 
> Regards,
> 
>   - Ted
> 
>
 




Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Theodore Y. Ts'o
On Wed, Oct 30, 2019 at 06:43:45PM +0800, Gao Xiang wrote:
> > You're right, in low memory scenario, allocation with bioset will be faster, as
> > you mentioned offline, maybe we can add/use a private bioset like btrfs did
> > rather than using global one, however, we'd better check how deadlock happens
> > with a bioset mempool first ...
> 
> Okay, hope to get hints from Jaegeuk and redo this patch then...

It's not at all clear to me that using a private bioset is a good idea
for f2fs.  That just means you're allocating a separate chunk of
memory just for f2fs, as opposed to using the global pool.  That's an
additional chunk of non-swapable kernel memory that's not going to be
available, in *addition* to the global mempool.  

Also, who else would you be contending for space with the global
mempool?  It's not like a mobile handset is going to have other users
of the global bio mempool.

On a low-end mobile handset, memory is at a premium, so wasting memory
to no good effect isn't going to be a great idea.

Regards,

- Ted




Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Gao Xiang
On Wed, Oct 30, 2019 at 05:27:54PM +0800, Chao Yu wrote:
> Hi Xiang,
> 
> On 2019/10/30 17:15, Gao Xiang wrote:
> > Hi Chao,
> > 
> > On Wed, Oct 30, 2019 at 04:56:17PM +0800, Chao Yu wrote:
> >> On 2019/10/30 11:55, Gao Xiang wrote:
> >>> remove such useless code and related fault injection.
> >>
> >> Hi Xiang,
> >>
> >> Although there are so many 'nofail' allocations in f2fs, I think we'd better
> >> avoid such allocation as much as possible (now for read path, we may allow
> >> bio allocation to fail), so I suggest keeping the failure path and bio
> >> allocation injection.
> >>
> >> It looks like bio_alloc() will use its own mempool, which may suffer deadlock
> >> potentially. So how about changing to use bio_alloc_bioset(, , NULL) instead of
> >> bio_alloc()?
> > 
> > Yes, I noticed the original commit 740432f83560 ("f2fs: handle failed bio allocation"),
> > yet I don't find any real call trace clue about what happened before.
> > 
> > In my understanding, if we allocate bios without submit_bio (I mean the write path) with
> > default bs and gfp_flags GFP_NOIO or GFP_KERNEL, I think it will sleep inside
> > the mempool rather than return NULL to its caller... Please correct me if I'm wrong...
> 
> I'm curious too...
> 
> Jaegeuk may know the details.
> 
> > 
> > I could send another patch with bio_alloc_bioset(, , NULL); I am curious to know the
> > original issue and how it was solved though...
> > 
> > For read or flush path, since it will submit_bio and bio_alloc one by one, I think
> > mempool will get a page quicker (memory failure path could be longer). But I can
> > send a patch just using bio_alloc_bioset(, , NULL) instead, as you suggested, later.
> 
> You're right, in low memory scenario, allocation with bioset will be faster, as
> you mentioned offline, maybe we can add/use a private bioset like btrfs did
> rather than using global one, however, we'd better check how deadlock happens
> with a bioset mempool first ...

Okay, hope to get hints from Jaegeuk and redo this patch then...

Thanks,
Gao Xiang

> 
> Thanks,
> 
> > 
> > Thanks,
> > Gao Xiang
> > 
> >>
> >> Thanks,
> >>
> >>>
> >>> Signed-off-by: Gao Xiang 
> >>> ---
> >>>  Documentation/filesystems/f2fs.txt |  1 -
> >>>  fs/f2fs/data.c |  6 ++
> >>>  fs/f2fs/f2fs.h | 21 -
> >>>  fs/f2fs/segment.c  |  5 +
> >>>  fs/f2fs/super.c|  1 -
> >>>  5 files changed, 3 insertions(+), 31 deletions(-)
> >>>
> >>> diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
> >>> index 7e1991328473..3477c3e4c08b 100644
> >>> --- a/Documentation/filesystems/f2fs.txt
> >>> +++ b/Documentation/filesystems/f2fs.txt
> >>> @@ -172,7 +172,6 @@ fault_type=%d  Support configuring fault 
> >>> injection type, should be
> >>> FAULT_KVMALLOC0x2
> >>> FAULT_PAGE_ALLOC  0x4
> >>> FAULT_PAGE_GET0x8
> >>> -   FAULT_ALLOC_BIO   0x00010
> >>> FAULT_ALLOC_NID   0x00020
> >>> FAULT_ORPHAN  0x00040
> >>> FAULT_BLOCK   0x00080
> >>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> >>> index 5755e897a5f0..3b88dcb15de6 100644
> >>> --- a/fs/f2fs/data.c
> >>> +++ b/fs/f2fs/data.c
> >>> @@ -288,7 +288,7 @@ static struct bio *__bio_alloc(struct f2fs_io_info 
> >>> *fio, int npages)
> >>>   struct f2fs_sb_info *sbi = fio->sbi;
> >>>   struct bio *bio;
> >>>  
> >>> - bio = f2fs_bio_alloc(sbi, npages, true);
> >>> + bio = bio_alloc(GFP_NOIO, npages);
> >>>  
> >>>   f2fs_target_device(sbi, fio->new_blkaddr, bio);
> >>>   if (is_read_io(fio->op)) {
> >>> @@ -682,9 +682,7 @@ static struct bio *f2fs_grab_read_bio(struct inode 
> >>> *inode, block_t blkaddr,
> >>>   struct bio_post_read_ctx *ctx;
> >>>   unsigned int post_read_steps = 0;
> >>>  
> >>> - bio = f2fs_bio_alloc(sbi, min_t(int, nr_pages, BIO_MAX_PAGES), false);
> >>> - if (!bio)
> >>> - return ERR_PTR(-ENOMEM);
> >>> + bio = bio_alloc(GFP_KERNEL, min_t(int, nr_pages, BIO_MAX_PAGES));
> >>>   f2fs_target_device(sbi, blkaddr, bio);
> >>>   bio->bi_end_io = f2fs_read_end_io;
> >>>   bio_set_op_attrs(bio, REQ_OP_READ, op_flag);
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index 4024790028aa..40012f874be0 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -44,7 +44,6 @@ enum {
> >>>   FAULT_KVMALLOC,
> >>>   FAULT_PAGE_ALLOC,
> >>>   FAULT_PAGE_GET,
> >>> - FAULT_ALLOC_BIO,
> >>>   FAULT_ALLOC_NID,
> >>>   FAULT_ORPHAN,
> >>>   FAULT_BLOCK,
> >>> @@ -2210,26 +2209,6 @@ static inline void *f2fs_kmem_cache_alloc(struct 
> >>> kmem_cache *cachep,
> >>>   return entry;
> >>>  }
> >>>  
> >>> -static inline struct bio *f2fs_bio_alloc(struct 

Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Chao Yu
Hi Xiang,

On 2019/10/30 17:15, Gao Xiang wrote:
> Hi Chao,
> 
> On Wed, Oct 30, 2019 at 04:56:17PM +0800, Chao Yu wrote:
>> On 2019/10/30 11:55, Gao Xiang wrote:
>>> remove such useless code and related fault injection.
>>
>> Hi Xiang,
>>
>> Although there are so many 'nofail' allocations in f2fs, I think we'd
>> better avoid such allocations as much as possible (now, for the read
>> path, we may allow bio allocation to fail), so I suggest keeping the
>> failure path and the bio allocation injection.
>>
>> It looks like bio_alloc() will use its own mempool, which may
>> potentially deadlock. So how about changing to use
>> bio_alloc_bioset(, , NULL) instead of bio_alloc()?
> 
> Yes, I noticed the original commit 740432f83560 ("f2fs: handle failed
> bio allocation"), yet I can't find any real call-trace clue about what
> happened before.
> 
> As I understand it, if we allocate bios without submit_bio (I mean the
> write path) with the default bs and gfp_flags GFP_NOIO or GFP_KERNEL,
> it will sleep inside the mempool rather than return NULL to its
> caller... Please correct me if I'm wrong...

I'm curious too...

Jaegeuk may know the details.

> 
> I could send another patch with bio_alloc_bioset(, , NULL); I am
> curious to know the original issue and how it was solved, though...
> 
> For the read or flush path, since it will submit_bio and bio_alloc one
> by one, I think the mempool will get a page quicker (the memory failure
> path could be longer). But I can send a patch just using
> bio_alloc_bioset(, , NULL) instead, as you suggested, later.

You're right, in a low-memory scenario, allocation with a bioset will be
faster; as you mentioned offline, maybe we can add/use a private bioset
like btrfs did rather than using the global one. However, we'd better
check how the deadlock happens with a bioset mempool first ...

Thanks,

> 
> Thanks,
> Gao Xiang
> 
>>
>> Thanks,
>>
>>>
>>> Signed-off-by: Gao Xiang 
>>> ---
>>>  Documentation/filesystems/f2fs.txt |  1 -
>>>  fs/f2fs/data.c |  6 ++
>>>  fs/f2fs/f2fs.h | 21 -
>>>  fs/f2fs/segment.c  |  5 +
>>>  fs/f2fs/super.c|  1 -
>>>  5 files changed, 3 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/Documentation/filesystems/f2fs.txt 
>>> b/Documentation/filesystems/f2fs.txt
>>> index 7e1991328473..3477c3e4c08b 100644
>>> --- a/Documentation/filesystems/f2fs.txt
>>> +++ b/Documentation/filesystems/f2fs.txt
>>> @@ -172,7 +172,6 @@ fault_type=%d  Support configuring fault 
>>> injection type, should be
>>> FAULT_KVMALLOC  0x2
>>> FAULT_PAGE_ALLOC0x4
>>> FAULT_PAGE_GET  0x8
>>> -   FAULT_ALLOC_BIO 0x00010
>>> FAULT_ALLOC_NID 0x00020
>>> FAULT_ORPHAN0x00040
>>> FAULT_BLOCK 0x00080
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 5755e897a5f0..3b88dcb15de6 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -288,7 +288,7 @@ static struct bio *__bio_alloc(struct f2fs_io_info 
>>> *fio, int npages)
>>> struct f2fs_sb_info *sbi = fio->sbi;
>>> struct bio *bio;
>>>  
>>> -   bio = f2fs_bio_alloc(sbi, npages, true);
>>> +   bio = bio_alloc(GFP_NOIO, npages);
>>>  
>>> f2fs_target_device(sbi, fio->new_blkaddr, bio);
>>> if (is_read_io(fio->op)) {
>>> @@ -682,9 +682,7 @@ static struct bio *f2fs_grab_read_bio(struct inode 
>>> *inode, block_t blkaddr,
>>> struct bio_post_read_ctx *ctx;
>>> unsigned int post_read_steps = 0;
>>>  
>>> -   bio = f2fs_bio_alloc(sbi, min_t(int, nr_pages, BIO_MAX_PAGES), false);
>>> -   if (!bio)
>>> -   return ERR_PTR(-ENOMEM);
>>> +   bio = bio_alloc(GFP_KERNEL, min_t(int, nr_pages, BIO_MAX_PAGES));
>>> f2fs_target_device(sbi, blkaddr, bio);
>>> bio->bi_end_io = f2fs_read_end_io;
>>> bio_set_op_attrs(bio, REQ_OP_READ, op_flag);
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 4024790028aa..40012f874be0 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -44,7 +44,6 @@ enum {
>>> FAULT_KVMALLOC,
>>> FAULT_PAGE_ALLOC,
>>> FAULT_PAGE_GET,
>>> -   FAULT_ALLOC_BIO,
>>> FAULT_ALLOC_NID,
>>> FAULT_ORPHAN,
>>> FAULT_BLOCK,
>>> @@ -2210,26 +2209,6 @@ static inline void *f2fs_kmem_cache_alloc(struct 
>>> kmem_cache *cachep,
>>> return entry;
>>>  }
>>>  
>>> -static inline struct bio *f2fs_bio_alloc(struct f2fs_sb_info *sbi,
>>> -   int npages, bool no_fail)
>>> -{
>>> -   struct bio *bio;
>>> -
>>> -   if (no_fail) {
>>> -   /* No failure on bio allocation */
>>> -   bio = bio_alloc(GFP_NOIO, npages);
>>> -   if (!bio)
>>> -   bio = bio_alloc(GFP_NOIO | __GFP_NOFAIL, npages);
>>> -   

Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Gao Xiang
Hi Chao,

On Wed, Oct 30, 2019 at 04:56:17PM +0800, Chao Yu wrote:
> On 2019/10/30 11:55, Gao Xiang wrote:
> > remove such useless code and related fault injection.
> 
> Hi Xiang,
> 
> Although there are so many 'nofail' allocations in f2fs, I think we'd
> better avoid such allocations as much as possible (now, for the read
> path, we may allow bio allocation to fail), so I suggest keeping the
> failure path and the bio allocation injection.
> 
> It looks like bio_alloc() will use its own mempool, which may
> potentially deadlock. So how about changing to use
> bio_alloc_bioset(, , NULL) instead of bio_alloc()?

Yes, I noticed the original commit 740432f83560 ("f2fs: handle failed
bio allocation"), yet I can't find any real call-trace clue about what
happened before.

As I understand it, if we allocate bios without submit_bio (I mean the
write path) with the default bs and gfp_flags GFP_NOIO or GFP_KERNEL,
it will sleep inside the mempool rather than return NULL to its
caller... Please correct me if I'm wrong...

I could send another patch with bio_alloc_bioset(, , NULL); I am curious
to know the original issue and how it was solved, though...

For the read or flush path, since it will submit_bio and bio_alloc one
by one, I think the mempool will get a page quicker (the memory failure
path could be longer). But I can send a patch just using
bio_alloc_bioset(, , NULL) instead, as you suggested, later.

Thanks,
Gao Xiang

> 
> Thanks,
> 
> > 
> > Signed-off-by: Gao Xiang 
> > ---
> >  Documentation/filesystems/f2fs.txt |  1 -
> >  fs/f2fs/data.c |  6 ++
> >  fs/f2fs/f2fs.h | 21 -
> >  fs/f2fs/segment.c  |  5 +
> >  fs/f2fs/super.c|  1 -
> >  5 files changed, 3 insertions(+), 31 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/f2fs.txt 
> > b/Documentation/filesystems/f2fs.txt
> > index 7e1991328473..3477c3e4c08b 100644
> > --- a/Documentation/filesystems/f2fs.txt
> > +++ b/Documentation/filesystems/f2fs.txt
> > @@ -172,7 +172,6 @@ fault_type=%d  Support configuring fault 
> > injection type, should be
> > FAULT_KVMALLOC  0x2
> > FAULT_PAGE_ALLOC0x4
> > FAULT_PAGE_GET  0x8
> > -   FAULT_ALLOC_BIO 0x00010
> > FAULT_ALLOC_NID 0x00020
> > FAULT_ORPHAN0x00040
> > FAULT_BLOCK 0x00080
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 5755e897a5f0..3b88dcb15de6 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -288,7 +288,7 @@ static struct bio *__bio_alloc(struct f2fs_io_info 
> > *fio, int npages)
> > struct f2fs_sb_info *sbi = fio->sbi;
> > struct bio *bio;
> >  
> > -   bio = f2fs_bio_alloc(sbi, npages, true);
> > +   bio = bio_alloc(GFP_NOIO, npages);
> >  
> > f2fs_target_device(sbi, fio->new_blkaddr, bio);
> > if (is_read_io(fio->op)) {
> > @@ -682,9 +682,7 @@ static struct bio *f2fs_grab_read_bio(struct inode 
> > *inode, block_t blkaddr,
> > struct bio_post_read_ctx *ctx;
> > unsigned int post_read_steps = 0;
> >  
> > -   bio = f2fs_bio_alloc(sbi, min_t(int, nr_pages, BIO_MAX_PAGES), false);
> > -   if (!bio)
> > -   return ERR_PTR(-ENOMEM);
> > +   bio = bio_alloc(GFP_KERNEL, min_t(int, nr_pages, BIO_MAX_PAGES));
> > f2fs_target_device(sbi, blkaddr, bio);
> > bio->bi_end_io = f2fs_read_end_io;
> > bio_set_op_attrs(bio, REQ_OP_READ, op_flag);
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 4024790028aa..40012f874be0 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -44,7 +44,6 @@ enum {
> > FAULT_KVMALLOC,
> > FAULT_PAGE_ALLOC,
> > FAULT_PAGE_GET,
> > -   FAULT_ALLOC_BIO,
> > FAULT_ALLOC_NID,
> > FAULT_ORPHAN,
> > FAULT_BLOCK,
> > @@ -2210,26 +2209,6 @@ static inline void *f2fs_kmem_cache_alloc(struct 
> > kmem_cache *cachep,
> > return entry;
> >  }
> >  
> > -static inline struct bio *f2fs_bio_alloc(struct f2fs_sb_info *sbi,
> > -   int npages, bool no_fail)
> > -{
> > -   struct bio *bio;
> > -
> > -   if (no_fail) {
> > -   /* No failure on bio allocation */
> > -   bio = bio_alloc(GFP_NOIO, npages);
> > -   if (!bio)
> > -   bio = bio_alloc(GFP_NOIO | __GFP_NOFAIL, npages);
> > -   return bio;
> > -   }
> > -   if (time_to_inject(sbi, FAULT_ALLOC_BIO)) {
> > -   f2fs_show_injection_info(FAULT_ALLOC_BIO);
> > -   return NULL;
> > -   }
> > -
> > -   return bio_alloc(GFP_KERNEL, npages);
> > -}
> > -
> >  static inline bool is_idle(struct f2fs_sb_info *sbi, int type)
> >  {
> > if (sbi->gc_mode == GC_URGENT)
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 808709581481..28457c878d0d 

Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Chao Yu
On 2019/10/30 11:55, Gao Xiang wrote:
> remove such useless code and related fault injection.

Hi Xiang,

Although there are so many 'nofail' allocations in f2fs, I think we'd
better avoid such allocations as much as possible (now, for the read
path, we may allow bio allocation to fail), so I suggest keeping the
failure path and the bio allocation injection.

It looks like bio_alloc() will use its own mempool, which may
potentially deadlock. So how about changing to use
bio_alloc_bioset(, , NULL) instead of bio_alloc()?

Thanks,

> 
> Signed-off-by: Gao Xiang 
> ---
>  Documentation/filesystems/f2fs.txt |  1 -
>  fs/f2fs/data.c |  6 ++
>  fs/f2fs/f2fs.h | 21 -
>  fs/f2fs/segment.c  |  5 +
>  fs/f2fs/super.c|  1 -
>  5 files changed, 3 insertions(+), 31 deletions(-)
> 
> diff --git a/Documentation/filesystems/f2fs.txt 
> b/Documentation/filesystems/f2fs.txt
> index 7e1991328473..3477c3e4c08b 100644
> --- a/Documentation/filesystems/f2fs.txt
> +++ b/Documentation/filesystems/f2fs.txt
> @@ -172,7 +172,6 @@ fault_type=%d  Support configuring fault 
> injection type, should be
> FAULT_KVMALLOC0x2
> FAULT_PAGE_ALLOC  0x4
> FAULT_PAGE_GET0x8
> -   FAULT_ALLOC_BIO   0x00010
> FAULT_ALLOC_NID   0x00020
> FAULT_ORPHAN  0x00040
> FAULT_BLOCK   0x00080
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 5755e897a5f0..3b88dcb15de6 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -288,7 +288,7 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, 
> int npages)
>   struct f2fs_sb_info *sbi = fio->sbi;
>   struct bio *bio;
>  
> - bio = f2fs_bio_alloc(sbi, npages, true);
> + bio = bio_alloc(GFP_NOIO, npages);
>  
>   f2fs_target_device(sbi, fio->new_blkaddr, bio);
>   if (is_read_io(fio->op)) {
> @@ -682,9 +682,7 @@ static struct bio *f2fs_grab_read_bio(struct inode 
> *inode, block_t blkaddr,
>   struct bio_post_read_ctx *ctx;
>   unsigned int post_read_steps = 0;
>  
> - bio = f2fs_bio_alloc(sbi, min_t(int, nr_pages, BIO_MAX_PAGES), false);
> - if (!bio)
> - return ERR_PTR(-ENOMEM);
> + bio = bio_alloc(GFP_KERNEL, min_t(int, nr_pages, BIO_MAX_PAGES));
>   f2fs_target_device(sbi, blkaddr, bio);
>   bio->bi_end_io = f2fs_read_end_io;
>   bio_set_op_attrs(bio, REQ_OP_READ, op_flag);
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4024790028aa..40012f874be0 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -44,7 +44,6 @@ enum {
>   FAULT_KVMALLOC,
>   FAULT_PAGE_ALLOC,
>   FAULT_PAGE_GET,
> - FAULT_ALLOC_BIO,
>   FAULT_ALLOC_NID,
>   FAULT_ORPHAN,
>   FAULT_BLOCK,
> @@ -2210,26 +2209,6 @@ static inline void *f2fs_kmem_cache_alloc(struct 
> kmem_cache *cachep,
>   return entry;
>  }
>  
> -static inline struct bio *f2fs_bio_alloc(struct f2fs_sb_info *sbi,
> - int npages, bool no_fail)
> -{
> - struct bio *bio;
> -
> - if (no_fail) {
> - /* No failure on bio allocation */
> - bio = bio_alloc(GFP_NOIO, npages);
> - if (!bio)
> - bio = bio_alloc(GFP_NOIO | __GFP_NOFAIL, npages);
> - return bio;
> - }
> - if (time_to_inject(sbi, FAULT_ALLOC_BIO)) {
> - f2fs_show_injection_info(FAULT_ALLOC_BIO);
> - return NULL;
> - }
> -
> - return bio_alloc(GFP_KERNEL, npages);
> -}
> -
>  static inline bool is_idle(struct f2fs_sb_info *sbi, int type)
>  {
>   if (sbi->gc_mode == GC_URGENT)
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 808709581481..28457c878d0d 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -552,10 +552,7 @@ static int __submit_flush_wait(struct f2fs_sb_info *sbi,
>   struct bio *bio;
>   int ret;
>  
> - bio = f2fs_bio_alloc(sbi, 0, false);
> - if (!bio)
> - return -ENOMEM;
> -
> + bio = bio_alloc(GFP_KERNEL, 0);
>   bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
>   bio_set_dev(bio, bdev);
>   ret = submit_bio_wait(bio);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 1443cee15863..51945dd27f00 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -44,7 +44,6 @@ const char *f2fs_fault_name[FAULT_MAX] = {
>   [FAULT_KVMALLOC]= "kvmalloc",
>   [FAULT_PAGE_ALLOC]  = "page alloc",
>   [FAULT_PAGE_GET]= "page get",
> - [FAULT_ALLOC_BIO]   = "alloc bio",
>   [FAULT_ALLOC_NID]   = "alloc nid",
>   [FAULT_ORPHAN]  = "orphan",
>   [FAULT_BLOCK]   = "no more block",
> 



Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

2019-10-30 Thread Chao Yu
On 2019/10/30 10:55, Eric Biggers wrote:
> On Tue, Oct 29, 2019 at 04:33:36PM +0800, Chao Yu wrote:
>> On 2019/10/28 6:50, Eric Biggers wrote:
 +bool f2fs_is_compressed_page(struct page *page)
 +{
 +  if (!page_private(page))
 +  return false;
 +  if (IS_ATOMIC_WRITTEN_PAGE(page) || IS_DUMMY_WRITTEN_PAGE(page))
 +  return false;
 +  return *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC;
 +}
>>>
>>> This code implies that there can be multiple page private structures each of
>>> which has a different magic number.  But I only see 
>>> F2FS_COMPRESSED_PAGE_MAGIC.
>>> Where in the code is the other one(s)?
>>
>> I'm not sure I understood you correctly, did you mean it needs to introduce
>> f2fs_is_atomic_written_page() and f2fs_is_dummy_written_page() like
>> f2fs_is_compressed_page()?
>>
> 
> No, I'm asking what is the case where the line
> 
>   *((u32 *)page_private(page)) == F2FS_COMPRESSED_PAGE_MAGIC
> 
> returns false?

Should be this?

	if (!page_private(page))
		return false;
	f2fs_bug_on(F2FS_P_SB(page),
		*((u32 *)page_private(page)) != F2FS_COMPRESSED_PAGE_MAGIC);
	return true;

> 
>>>
 +
 +static void f2fs_set_compressed_page(struct page *page,
 +  struct inode *inode, pgoff_t index, void *data, refcount_t *r)
 +{
 +  SetPagePrivate(page);
 +  set_page_private(page, (unsigned long)data);
 +
 +  /* i_crypto_info and iv index */
 +  page->index = index;
 +  page->mapping = inode->i_mapping;
 +  if (r)
 +  refcount_inc(r);
 +}
>>>
>>> It isn't really appropriate to create fake pagecache pages like this.  Did 
>>> you
>>> consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?
>>
>> We need to store i_crypto_info and the IV index somewhere in order to
>> pass them to fscrypt_decrypt_block_inplace(); where do you suggest
>> storing them?
>>
> 
> The same place where the pages are stored.

Still, we need to allocate space for those fields; any strong reason to
do so?

> 
 +
 +void f2fs_destroy_compress_ctx(struct compress_ctx *cc)
 +{
 +  kvfree(cc->rpages);
 +}
>>>
>>> The memory is allocated with kzalloc(), so why is it freed with kvfree() 
>>> and not
>>> just kfree()?
>>
>> It was allocated by f2fs_*alloc(), which will fall back to kvmalloc()
>> once kmalloc() fails.
> 
> This seems to be a bug in f2fs_kmalloc() -- it inappropriately falls back to
> kvmalloc().  As per its name, it should only use kmalloc().  f2fs_kvmalloc()
> already exists, so it can be used when the fallback is wanted.

We can introduce f2fs_memalloc() to wrap f2fs_kmalloc() and f2fs_kvmalloc() as
below:

f2fs_memalloc()
{
	mem = f2fs_kmalloc();
	if (mem)
		return mem;
	return f2fs_kvmalloc();
}

It can be used in the specific places where we really need it, like the
place described in 5222595d093e ("f2fs: use kvmalloc, if kmalloc is
failed"), where we introduced the original logic.

> 
>>
 +static int lzo_compress_pages(struct compress_ctx *cc)
 +{
 +  int ret;
 +
 +  ret = lzo1x_1_compress(cc->rbuf, cc->rlen, cc->cbuf->cdata,
 +  >clen, cc->private);
 +  if (ret != LZO_E_OK) {
 +  printk_ratelimited("%sF2FS-fs: lzo compress failed, ret:%d\n",
 +  KERN_ERR, ret);
 +  return -EIO;
 +  }
 +  return 0;
 +}
>>>
>>> Why not using f2fs_err()?  Same in lots of other places.
>>
>> We use printk_ratelimited at some points where we can afford to lose
>> logs; otherwise we use f2fs_{err,warn,...} to record as much info as
>> possible for troubleshooting.
>>
> 
> It used to be the case that f2fs_msg() was ratelimited.  What stops it from
> spamming the logs now?

https://lore.kernel.org/patchwork/patch/973837/

> 
> The problem with a bare printk is that it doesn't show which filesystem 
> instance
> the message is coming from.

We can also print sbi->sb->s_id, as f2fs_printk() does.

> 
 +
 +  ret = cops->compress_pages(cc);
 +  if (ret)
 +  goto out_vunmap_cbuf;
 +
 +  max_len = PAGE_SIZE * (cc->cluster_size - 1) - COMPRESS_HEADER_SIZE;
 +
 +  if (cc->clen > max_len) {
 +  ret = -EAGAIN;
 +  goto out_vunmap_cbuf;
 +  }
>>>
>>> Since we already know the max length we're willing to compress to (the max
>>> length for any space to be saved), why is more space than that being 
>>> allocated?
>>> LZ4_compress_default() will return an error if there isn't enough space, so 
>>> that
>>> error could just be used as the indication to store the data uncompressed.
>>
>> AFAIK, there is no common error code returned by all compression
>> algorithms indicating there is no room for the limited target size;
>> however, we need that information to fall back to writing raw pages.
>> Any better idea?
>>
> 
> "Not enough room" is the only reasonable way for compression to