Re: [f2fs-dev] [PATCH v3 00/13] fiemap extension for more physical information

2024-04-03 Thread Gao Xiang

Hi,

On 2024/4/3 23:11, Sweet Tea Dorminy wrote:




I'm not sure why iomap was excluded here technically, or am I missing some
previous comments?


Could you also make iomap support the new FIEMAP physical extent
information? Compressed EROFS uses the iomap FIEMAP interface to report
compressed extents ("z_erofs_iomap_report_ops"), but there is no way to
return the correct compressed lengths, which is unexpected.



I'll add iomap support in v4, I'd skipped it since I was worried it'd be an 
expansive additional part not necessary initially. Thank you for noting it!


Thanks, I think a plain fiemap report for iomap should be straightforward.
Thanks for your work!

Thanks,
Gao Xiang



Sweet Tea





Re: [f2fs-dev] [PATCH v3 00/13] fiemap extension for more physical information

2024-04-03 Thread Gao Xiang

Hi,

On 2024/4/3 15:22, Sweet Tea Dorminy wrote:

For many years, various btrfs users have written programs to discover
the actual disk space used by files, using root-only interfaces.
However, this information is a great fit for fiemap: it is inherently
tied to extent information, all filesystems can use it, and the
capabilities required for FIEMAP make sense for this additional
information also.

Hence, this patchset adds various additional information to fiemap,
and extends filesystems (but not iomap) to return it.  This uses some of
the reserved padding in the fiemap extent structure, so programs unaware
of the changes will be unaffected.


I'm not sure why iomap was excluded here technically, or am I missing some
previous comments?



This is based on next-20240403. I've tested the btrfs part of this with
the standard btrfs testing matrix locally and manually, and done minimal
testing of the non-btrfs parts.

I'm unsure whether btrfs should be returning the entire physical extent
referenced by a particular logical range, or just the part of the
physical extent referenced by that range. The v2 thread has a discussion
of this.


Could you also make iomap support the new FIEMAP physical extent
information? Compressed EROFS uses the iomap FIEMAP interface to report
compressed extents ("z_erofs_iomap_report_ops"), but there is no way to
return the correct compressed lengths, which is unexpected.

Thanks,
Gao Xiang
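
For context, a minimal FIEMAP walker sketch. Today's struct fiemap_extent
(include/uapi/linux/fiemap.h) only carries reserved padding
(fe_reserved64[2] / fe_reserved[3]) where the extra physical information
would live; the patchset's exact new field names are not shown in this
thread, so none are used below:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int dump_extents(int fd)
{
	unsigned int i, count = 32;
	struct fiemap *fm = calloc(1, sizeof(*fm) +
				   count * sizeof(struct fiemap_extent));

	if (!fm)
		return -1;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* map the whole file */
	fm->fm_extent_count = count;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		free(fm);
		return -1;
	}
	for (i = 0; i < fm->fm_mapped_extents; i++) {
		struct fiemap_extent *fe = &fm->fm_extents[i];

		/*
		 * For compressed extents (FIEMAP_EXTENT_ENCODED),
		 * fe_length is the logical length; the on-disk
		 * (compressed) length is exactly what cannot be
		 * expressed today.
		 */
		printf("logical %llu physical %llu length %llu flags %#x\n",
		       (unsigned long long)fe->fe_logical,
		       (unsigned long long)fe->fe_physical,
		       (unsigned long long)fe->fe_length, fe->fe_flags);
	}
	free(fm);
	return 0;
}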





Re: [f2fs-dev] Weird EROFS data corruption

2023-12-05 Thread Gao Xiang

Hi Juhyung,

On 2023/12/5 22:43, Juhyung Park wrote:

On Tue, Dec 5, 2023 at 11:34 PM Gao Xiang  wrote:




...



I'm still analyzing this behavior as well as the root cause and
I will also try to get a recent cloud server with FSRM myself
to find more clues.


Down the rabbit hole we go...

Let me know if you have trouble getting an instance with FSRM. I'll
see what I can do.


I've sent out a fix to address this, please help check:
https://lore.kernel.org/r/20231206030758.3760521-1-hsiang...@linux.alibaba.com

Thanks,
Gao Xiang




Re: [f2fs-dev] Weird EROFS data corruption

2023-12-05 Thread Gao Xiang



On 2023/12/5 22:23, Juhyung Park wrote:

Hi Gao,

On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang  wrote:


Hi Juhyung,

On 2023/12/4 11:41, Juhyung Park wrote:

...




- Could you share the full message about the output of `lscpu`?


Sure:

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   8
  On-line CPU(s) list:    0-7
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Intel(R) Corporation
  Model name:             11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
    BIOS Model name:      11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU @ 3.0GHz
    BIOS CPU family:      198
    CPU family:           6
    Model:                140
    Thread(s) per core:   2
    Core(s) per socket:   4
    Socket(s):            1
    Stepping:             1
    CPU(s) scaling MHz:   60%
    CPU max MHz:          4800.0000
    CPU min MHz:          400.0000
    BogoMIPS:             5990.40
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                          cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
                          tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
                          arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
                          cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64
                          monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr
                          pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
                          tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
                          3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs
                          ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept
                          vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms
                          invpcid rdt_a avx512f avx512dq rdseed adx smap
                          avx512ifma clflushopt clwb intel_pt avx512cd sha_ni
                          avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
                          split_lock_detect dtherm ida arat pln pts hwp hwp_notify
                          hwp_act_window hwp_epp hwp_pkg_req vnmi avx512vbmi umip
                          pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni
                          avx512_bitalg tme avx512_vpopcntdq rdpid movdiri
                          movdir64b fsrm avx512_vp2i


Sigh, I've been thinking.  FSRM is the most significant difference between
our environments here; could you try only the following diff (without the
previous disable patch) to see whether it makes any difference?

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 1b60ae81ecd8..1b52a913233c 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
   #define CHECK_LEN cmp $0x20, %rdx; jb 1f
   #define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
   .Lmemmove_begin_forward:
-   ALTERNATIVE_2 __stringify(CHECK_LEN), \
- __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
- __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
+   CHECK_LEN

 /*
  * movsq instruction have many startup latency


Yup, that also seems to fix it.
Are we looking at a potential memmove issue?


I'm still analyzing this behavior as well as the root cause and
I will also try to get a recent cloud server with FSRM myself
to find more clues.

Thanks,
Gao Xiang




Re: [f2fs-dev] Weird EROFS data corruption

2023-12-04 Thread Gao Xiang

Hi Juhyung,

On 2023/12/4 11:41, Juhyung Park wrote:

...




- Could you share the full message about the output of `lscpu`?


Sure:

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   8
  On-line CPU(s) list:    0-7
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Intel(R) Corporation
  Model name:             11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
    BIOS Model name:      11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU @ 3.0GHz
    BIOS CPU family:      198
    CPU family:           6
    Model:                140
    Thread(s) per core:   2
    Core(s) per socket:   4
    Socket(s):            1
    Stepping:             1
    CPU(s) scaling MHz:   60%
    CPU max MHz:          4800.0000
    CPU min MHz:          400.0000
    BogoMIPS:             5990.40
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                          cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
                          tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
                          arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
                          cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64
                          monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr
                          pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
                          tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
                          3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs
                          ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept
                          vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms
                          invpcid rdt_a avx512f avx512dq rdseed adx smap
                          avx512ifma clflushopt clwb intel_pt avx512cd sha_ni
                          avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
                          split_lock_detect dtherm ida arat pln pts hwp hwp_notify
                          hwp_act_window hwp_epp hwp_pkg_req vnmi avx512vbmi umip
                          pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni
                          avx512_bitalg tme avx512_vpopcntdq rdpid movdiri
                          movdir64b fsrm avx512_vp2i


Sigh, I've been thinking.  FSRM is the most significant difference between
our environments here; could you try only the following diff (without the
previous disable patch) to see whether it makes any difference?

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 1b60ae81ecd8..1b52a913233c 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
 #define CHECK_LEN  cmp $0x20, %rdx; jb 1f
 #define MEMMOVE_BYTES  movq %rdx, %rcx; rep movsb; RET
 .Lmemmove_begin_forward:
-   ALTERNATIVE_2 __stringify(CHECK_LEN), \
- __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
- __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
+   CHECK_LEN
 
 	/*

 * movsq instruction have many startup latency

Thanks,
Gao Xiang
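
For readers following along: a rough C rendering (an assumption, heavily
simplified from the assembly above; the helpers are hypothetical stand-ins)
of what the three ALTERNATIVE_2 variants select at boot. The test diff
above pins the generic, non-rep-movsb variant:

static void *memmove_forward(void *dst, const void *src, size_t len)
{
	if (cpu_has(X86_FEATURE_FSRM)) {
		/* MEMMOVE_BYTES: rep movsb for every length */
		return rep_movsb(dst, src, len);
	} else if (cpu_has(X86_FEATURE_ERMS)) {
		/* CHECK_LEN; MEMMOVE_BYTES */
		if (len < 0x20)
			return byte_copy(dst, src, len);
		return rep_movsb(dst, src, len);
	}
	/* CHECK_LEN only: the variant the test diff forces */
	if (len < 0x20)
		return byte_copy(dst, src, len);
	return movsq_copy(dst, src, len);	/* generic movsq loops */
}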




Re: [f2fs-dev] Weird EROFS data corruption

2023-12-03 Thread Gao Xiang




On 2023/12/4 01:32, Juhyung Park wrote:

Hi Gao,


...





What is the difference between these two machines? Just different CPUs, or
do they have some other difference, like different compilers?


I fully and exclusively control both devices, and the setup is almost the same.
Same Ubuntu version, kernel/compiler version.

But as I said, on my laptop, the issue happens on kernels that someone
else (Canonical) built, so I don't think it matters.


The only thing I could say is that the kernel side has optimized
in-place decompression compared to fuse, so it will reuse the
same buffer for decompression but with a safe margin (according to
the current lz4 decompression implementation).  It shouldn't behave
differently just due to different CPUs.  Let me find more clues
later; also, maybe we should introduce a way for users to turn this
off if needed.


Cool :)

I'm comfortable changing and building my own custom kernel for this
specific laptop. Feel free to ask me to try out some patches.


Thanks, I need to narrow down this issue:

-  First, could you apply the following diff to test if it's still
   reproducible?

diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 021be5feb1bc..40a306628e1a 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -131,7 +131,7 @@ static void *z_erofs_lz4_handle_overlap(struct z_erofs_lz4_decompress_ctx *ctx,

if (rq->inplace_io) {
omargin = PAGE_ALIGN(ctx->oend) - ctx->oend;
-   if (rq->partial_decoding || !may_inplace ||
+   if (1 || rq->partial_decoding || !may_inplace ||
omargin < LZ4_DECOMPRESS_INPLACE_MARGIN(rq->inputsize))
goto docopy;

- Could you share the full message about the output of `lscpu`?
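
For context on the first test above: the `1 ||` unconditionally forces the
`docopy` fallback, so compressed data is always copied out to separate
pages instead of being decompressed in place. The safety margin that
normally gates the in-place path is defined in fs/erofs/decompressor.c
along the lines of the upstream LZ4 in-place recommendation (quoted from
memory, so treat the exact form as an assumption):

/* worst-case headroom so in-place LZ4 decompression cannot overwrite
 * not-yet-consumed input: a fraction of the input size plus a constant */
#define LZ4_DECOMPRESS_INPLACE_MARGIN(srcsize)	(((srcsize) >> 8) + 32)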

Thanks,
Gao Xiang




Re: [f2fs-dev] Weird EROFS data corruption

2023-12-03 Thread Gao Xiang



On 2023/12/4 01:01, Juhyung Park wrote:

Hi Gao,

On Mon, Dec 4, 2023 at 1:52 AM Gao Xiang  wrote:


Hi Juhyung,

On 2023/12/4 00:22, Juhyung Park wrote:

(Cc'ing f2fs and crypto as I've noticed something similar with f2fs a
while ago, which may mean that this is not specific to EROFS:
https://lore.kernel.org/all/cad14+f2nbztlflc6cwnjgcourrrjwzttp3d3ik4of+1eejk...@mail.gmail.com/
)

Hi.

I'm encountering a very weird EROFS data corruption.

I noticed when I build an EROFS image for AOSP development, the device
would randomly not boot from a certain build.
After inspecting the log, I noticed that a file got corrupted.


Is it observed on your laptop (i7-1185G7)? Or on some other arm64
device?


Yes, only on my laptop. The arm64 device seems fine.
The reason that it would not boot was that the host machine (my
laptop) was repacking the EROFS image wrongfully.

The workflow is something like this:
Server-built EROFS AOSP image -> Image copied to laptop -> Laptop
mounts the EROFS image -> Copies the entire content to a scratch
directory (CORRUPT!) -> Changes some files -> mkfs.erofs

So the device is not responsible for the corruption, the laptop is.


Ok.







After adding a hash check during the build flow, I noticed that EROFS
would randomly read data wrong.

I now have a reliable method of reproducing the issue, but here's the
funny/weird part: it's only happening on my laptop (i7-1185G7). This
is not happening with my 128-core build-farm machine (Threadripper
3990X).
I first suspected a hardware issue, but:
a. The laptop had its motherboard replaced recently (due to a failing
physical Type-C port).
b. The laptop passes memory test (memtest86).
c. This happens on all kernel versions from v5.4 to the latest v6.6
including my personal custom builds and Canonical's official Ubuntu
kernels.
d. This happens on different host SSDs and file-system combinations.
e. This only happens on LZ4. LZ4HC doesn't trigger the issue.
f. This only happens when mounting the image natively by the kernel.
Using fuse with erofsfuse is fine.


I think it's a weird issue with in-place decompression because you said
it depends on the hardware.  In addition, with your dataset I sadly
cannot reproduce it on my local server (Xeon(R) CPU E5-2682 v4).


As I feared. Bummer :(



What is the difference between these two machines? Just different CPUs, or
do they have some other difference, like different compilers?


I fully and exclusively control both devices, and the setup is almost the same.
Same Ubuntu version, kernel/compiler version.

But as I said, on my laptop, the issue happens on kernels that someone
else (Canonical) built, so I don't think it matters.


The only thing I could say is that the kernel side has optimized
in-place decompression compared to fuse, so it will reuse the
same buffer for decompression but with a safe margin (according to
the current lz4 decompression implementation).  It shouldn't behave
differently just due to different CPUs.  Let me find more clues
later; also, maybe we should introduce a way for users to turn this
off if needed.

Thanks,
Gao Xiang




Re: [f2fs-dev] Weird EROFS data corruption

2023-12-03 Thread Gao Xiang

Hi Juhyung,

On 2023/12/4 00:22, Juhyung Park wrote:

(Cc'ing f2fs and crypto as I've noticed something similar with f2fs a
while ago, which may mean that this is not specific to EROFS:
https://lore.kernel.org/all/cad14+f2nbztlflc6cwnjgcourrrjwzttp3d3ik4of+1eejk...@mail.gmail.com/
)

Hi.

I'm encountering a very weird EROFS data corruption.

I noticed when I build an EROFS image for AOSP development, the device
would randomly not boot from a certain build.
After inspecting the log, I noticed that a file got corrupted.


Is it observed on your laptop (i7-1185G7)? Or on some other arm64
device?



After adding a hash check during the build flow, I noticed that EROFS
would randomly read data wrong.

I now have a reliable method of reproducing the issue, but here's the
funny/weird part: it's only happening on my laptop (i7-1185G7). This
is not happening with my 128-core build-farm machine (Threadripper
3990X).
I first suspected a hardware issue, but:

a. The laptop had its motherboard replaced recently (due to a failing
physical Type-C port).
b. The laptop passes memory test (memtest86).
c. This happens on all kernel versions from v5.4 to the latest v6.6
including my personal custom builds and Canonical's official Ubuntu
kernels.
d. This happens on different host SSDs and file-system combinations.
e. This only happens on LZ4. LZ4HC doesn't trigger the issue.
f. This only happens when mounting the image natively by the kernel.
Using fuse with erofsfuse is fine.


I think it's a weird issue with in-place decompression because you said
it depends on the hardware.  In addition, with your dataset I sadly
cannot reproduce it on my local server (Xeon(R) CPU E5-2682 v4).

What is the difference between these two machines? Just different CPUs, or
do they have some other difference, like different compilers?

Thanks,
Gao Xiang




Re: [f2fs-dev] [PATCH 2/2] f2fs: handle decompress only post processing in softirq

2022-06-20 Thread Gao Xiang
On Mon, Jun 20, 2022 at 10:38:43AM -0700, Daeho Jeong wrote:
> From: Daeho Jeong 
> 
> Now decompression is being handled in a workqueue, and it makes read I/O
> latency non-deterministic because of the non-deterministic scheduling
> nature of workqueues. So, I made it be handled in softirq context when
> possible, except on low-memory devices, since this modification will
> keep decompression-related memory around a little longer.
> 

Again, I don't think this method is scalable, since you already handle
all decompression algorithms in this way.  Later, I wonder if you'd
like to handle all of:
 - decompression algorithms;
 - verity algorithms;
 - decryption algorithms;

in this way; with slow decompression algorithms, that would mean
long latency and CPU overhead in the softirq context.  These are my last
words on this; I will not comment further.

Thanks,
Gao Xiang




Re: [f2fs-dev] [PATCH] f2fs: handle decompress only post processing in softirq

2022-06-14 Thread Gao Xiang
On Tue, Jun 14, 2022 at 10:49:37AM -0700, Daeho Jeong wrote:
> > Yeah, I heard that you folks are really suffering from the scheduling
> > issues. But in my own previous experience, extra memory footprints are
> > really critical in Android low-memory scenarios (no matter low-end
> > devices or artificial workloads); it tossed me around a lot. So I finally
> > introduced a lot of in-place I/O to handle/minimize that, including
> > in-place I/O for compressed pages and temporary pages.
> >
> > But I'm not quite sure what's currently happening now, since we once
> > didn't have such non-deterministic workqueues, and I don't hear from
> > other landed vendors.  I think it'd be better to analyse what's going
> > on for these kworkers from scheduling POV and why they don't schedule
> > in time.
> >
> > I also have an idea much like what I'm doing now for sync
> > decompression: just before locking the page and calling ->read_folio(),
> > we can trigger some decompression in addition to the kworker
> > decompression, but it needs some MM modification, as below:
> >
> >!PageUptodate(page)
> >
> >some callback to decompress in addition to kworker
> >
> >lock_page()
> >->read_folio()
> >
> > If mm folks don't like it, I think an RT thread is also fine, after we
> > analyse the root cause of the kworker delay.
> >
> > Thanks,
> > Gao Xiang
> >
> > >
> > > Thanks,
> 
> I don't think this is a problem with most devices, since the
> allocated memory is not too big and it'll be kept only as long as I/O
> processing is going on. However, I still understand what you're worried
> about, so I think I can make a new mount option like "memory=low",
> which can be used to give a hint to F2FS to prioritize using as
> little memory as possible. In this mode, we will try to keep memory
> use minimal, and we can use the previous implementation for decompression.

Okay, one of our previous tests was how many applications are
still alive after many other applications boot. That matters since
most users need to keep as many apps alive as possible. I know we now
have a swap-like thing, but it was once done without swap. If you reserve
too much memory (with a page mempool or even for inflight I/O), it will
hurt the alive-app numbers compared to the uncompressed case on all
devices (even high-end devices).

BTW, most crypto algorithms have hardware instructions to speed them up;
actually we have some in-house NEON lz4 assembly as well, but it is still
somewhat slower than crypto algorithms, not to mention algorithms like
zstd or lzma. Anyway, I personally prefer the RT thread way since it's
more flexible; also, for dm-verity, at least try WQ_HIGHPRI, and I've
seen:
https://android-review.googlesource.com/c/kernel/common/+/204421

But I'm not sure why it wasn't upstreamed.

Thanks,
Gao Xiang

> 
> Thanks,




Re: [f2fs-dev] [PATCH] f2fs: handle decompress only post processing in softirq

2022-06-14 Thread Gao Xiang
Hi Daeho,

On Tue, Jun 14, 2022 at 09:46:50AM -0700, Daeho Jeong wrote:
> >
> > Some of my own previous thoughts about this strategy:
> >
> >  - If we allocate all memory and map it before the I/Os, all inflight I/Os
> >    will keep such temporary pages all the time until decompression is
> >    finished. In contrast, if we allocate or reuse such pages just before
> >    decompression, it would minimize the memory footprint.
> >
> >    I think it will impact the memory numbers at least on very
> >    low-end devices with slow storage. (I've seen f2fs has some big
> >    mempool already.)
> >
> >  - Many compression algorithms are not suitable for softirq contexts;
> >    also, I vaguely remember that if softirq processing lasts for > 2ms,
> >    it will be pushed into ksoftirqd instead, so it's actually another
> >    process context. And it may delay other important interrupt handling.
> >
> >  - Go back to the non-deterministic scheduling of workqueues. I guess it
> >    may just be scheduling punishment: a lot of CPU was consumed by earlier
> >    decompression, so the priority becomes low, but that is purely a guess.
> >    Maybe we need to use an RT scheduling policy instead.
> >
> >    At least try WQ_HIGHPRI for dm-verity, but I don't find a
> >    WQ_HIGHPRI mark for dm-verity.
> >
> > Thanks,
> > Gao Xiang
> 
> I totally understand what you are worried about. However, in the real
> world, non-determinism from workqueues is harsher than we expected.
> As you know, read I/Os in the system are on critical paths most of the
> time, and right now the I/O variation with workqueues is too bad.
> 
> I also think it's better that we have RT-scheduling-like things here.
> We could think about it more.

Yeah, I heard that you folks are really suffering from the scheduling
issues. But in my own previous experience, extra memory footprints are
really critical in Android low-memory scenarios (no matter low-end
devices or artificial workloads); it tossed me around a lot. So I finally
introduced a lot of in-place I/O to handle/minimize that, including
in-place I/O for compressed pages and temporary pages.

But I'm not quite sure what's currently happening now, since we once
didn't have such non-deterministic workqueues, and I don't hear about it
from other landed vendors.  I think it'd be better to analyse what's
going on with these kworkers from a scheduling POV and why they don't
get scheduled in time.

I also have an idea much like what I'm doing now for sync
decompression: just before locking the page and calling ->read_folio(),
we can trigger some decompression in addition to the kworker
decompression, but it needs some MM modification, as below:

   !PageUptodate(page)

   some callback to decompress in addition to kworker

   lock_page()
   ->read_folio()

If mm folks don't like it, I think an RT thread is also fine, after we
analyse the root cause of the kworker delay.

Thanks,
Gao Xiang

> 
> Thanks,




Re: [f2fs-dev] [PATCH] f2fs: handle decompress only post processing in softirq

2022-06-14 Thread Gao Xiang
Hi all,

On Mon, Jun 13, 2022 at 10:38:25PM -0700, Eric Biggers wrote:
> [+Cc Nathan Huckleberry who is looking into a similar problem in dm-verity]
> 
> On Mon, Jun 13, 2022 at 08:56:12AM -0700, Daeho Jeong wrote:
> > From: Daeho Jeong 
> > 
> > Now decompression is being handled in workqueue and it makes read I/O
> > latency non-deterministic, because of the non-deterministic scheduling
> > nature of workqueues. So, I made it handled in softirq context only if
> > possible.
> > 
> > Signed-off-by: Daeho Jeong 

...

> 
> One question: is this (the bio endio callback) actually guaranteed to be
> executed from a softirq?  If you look at dm-crypt's support for workqueue-less
> decryption, for example, it explicitly checks 'in_hardirq() || irqs_disabled()'
> and schedules a tasklet if either of those is the case.
> 
> - Eric
> 
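
For context, a minimal sketch of the dm-crypt pattern Eric refers to (see
drivers/md/dm-crypt.c); the demo_* names are illustrative assumptions, not
f2fs code:

#include <linux/bio.h>
#include <linux/interrupt.h>

struct demo_ctx {
	struct tasklet_struct tasklet;
	/* ... decompression/verity/decryption state ... */
};

static void demo_do_work(unsigned long data);	/* heavy post-read work */

static void demo_endio(struct bio *bio)
{
	struct demo_ctx *ctx = bio->bi_private;

	if (in_hardirq() || irqs_disabled()) {
		/* cannot run heavy work here; defer to a tasklet */
		tasklet_init(&ctx->tasklet, demo_do_work, (unsigned long)ctx);
		tasklet_schedule(&ctx->tasklet);
		return;
	}
	demo_do_work((unsigned long)ctx);	/* softirq/process context */
}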

Some of my own previous thoughts about this strategy:

 - If we allocate all memory and map it before the I/Os, all inflight I/Os
   will keep such temporary pages all the time until decompression is
   finished. In contrast, if we allocate or reuse such pages just before
   decompression, it would minimize the memory footprint.

   I think it will impact the memory numbers at least on very
   low-end devices with slow storage. (I've seen f2fs has some big
   mempool already.)

 - Many compression algorithms are not suitable for softirq contexts;
   also, I vaguely remember that if softirq processing lasts for > 2ms, it
   will be pushed into ksoftirqd instead, so it's actually another process
   context (see the reference right after this list). And it may delay
   other important interrupt handling.

 - Go back to the non-deterministic scheduling of workqueues. I guess it
   may just be scheduling punishment: a lot of CPU was consumed by earlier
   decompression, so the priority becomes low, but that is purely a guess.
   Maybe we need to use an RT scheduling policy instead.

   At least try WQ_HIGHPRI for dm-verity, but I don't find a
   WQ_HIGHPRI mark for dm-verity.
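
For reference on the "> 2ms" point above, softirq processing is budgeted
in kernel/softirq.c roughly as follows (paraphrased from mainline, so
treat as an approximation); overruns are handed to ksoftirqd, i.e. an
ordinary process context subject to normal scheduling:

/* __do_softirq() restarts pending softirqs at most MAX_SOFTIRQ_RESTART
 * times within MAX_SOFTIRQ_TIME; anything left after that wakes ksoftirqd */
#define MAX_SOFTIRQ_TIME	msecs_to_jiffies(2)
#define MAX_SOFTIRQ_RESTART	10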

Thanks,
Gao Xiang




[f2fs-dev] [PATCH] f2fs: fix up f2fs_lookup tracepoints

2021-09-21 Thread Gao Xiang
Fix up a misuse: the stored filename pointer isn't guaranteed to still be
valid by the time the ring buffer is read, so copy the string content
into the event instead.

Fixes: 0c5e36db17f5 ("f2fs: trace f2fs_lookup")
Signed-off-by: Gao Xiang 
---
 include/trace/events/f2fs.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 4e881d9..4cb055a 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -807,20 +807,20 @@
TP_STRUCT__entry(
__field(dev_t,  dev)
__field(ino_t,  ino)
-   __field(const char *,   name)
+   __string(name,  dentry->d_name.name)
__field(unsigned int, flags)
),
 
TP_fast_assign(
__entry->dev= dir->i_sb->s_dev;
__entry->ino= dir->i_ino;
-   __entry->name   = dentry->d_name.name;
+   __assign_str(name, dentry->d_name.name);
__entry->flags  = flags;
),
 
TP_printk("dev = (%d,%d), pino = %lu, name:%s, flags:%u",
show_dev_ino(__entry),
-   __entry->name,
+   __get_str(name),
__entry->flags)
 );
 
@@ -834,7 +834,7 @@
TP_STRUCT__entry(
__field(dev_t,  dev)
__field(ino_t,  ino)
-   __field(const char *,   name)
+   __string(name,  dentry->d_name.name)
__field(nid_t,  cino)
__field(int,err)
),
@@ -842,14 +842,14 @@
TP_fast_assign(
__entry->dev= dir->i_sb->s_dev;
__entry->ino= dir->i_ino;
-   __entry->name   = dentry->d_name.name;
+   __assign_str(name, dentry->d_name.name);
__entry->cino   = ino;
__entry->err= err;
),
 
TP_printk("dev = (%d,%d), pino = %lu, name:%s, ino:%u, err:%d",
show_dev_ino(__entry),
-   __entry->name,
+   __get_str(name),
__entry->cino,
__entry->err)
 );
-- 
1.8.3.1





Re: [f2fs-dev] [PATCH] f2fs: replace ERANGE with ENAMETOOLONG in file name length check

2021-06-14 Thread Gao Xiang


On Tue, Jun 15, 2021 at 11:19:24AM +0800, wangxiaojun (N) wrote:
> 
> On 2021/6/15 10:31, Gao Xiang wrote:
> > On Tue, Jun 15, 2021 at 09:35:09AM +0800, Wang Xiaojun wrote:
> > > ERANGE indicates that the math result is not representable. Here,
> > > ENAMETOOLONG is used to replace ERANGE.
> > > 
> > > Signed-off-by: Wang Xiaojun 
> > I don't think ENAMETOOLONG is a valid return code for {g,s}etxattr,
> > as opposed to ERANGE:
> > https://man7.org/linux/man-pages/man2/getxattr.2.html
> > https://man7.org/linux/man-pages/man2/setxattr.2.html
> > 
> > Please also see the ext4 / xfs implementations.
> > 
> > Thanks,
> > Gao Xiang
> 
> Hi Xiang, You're right. Thanks for your comments.

Hi Xiaojun,

Yeah, currently ENAMETOOLONG is strictly specific to pathnames. If we
change it like this, I'm not sure whether it could break some existing
user programs.

IOW, it would need a wider discussion at least before such a modification.

Thanks,
Gao Xiang

> 
> > 
> > > ---
> > >   fs/f2fs/xattr.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
> > > index c8f34decbf8e..eb827c10e970 100644
> > > --- a/fs/f2fs/xattr.c
> > > +++ b/fs/f2fs/xattr.c
> > > @@ -529,7 +529,7 @@ int f2fs_getxattr(struct inode *inode, int index, const char *name,
> > >   len = strlen(name);
> > >   if (len > F2FS_NAME_LEN)
> > > - return -ERANGE;
> > > + return -ENAMETOOLONG;
> > >   down_read(&F2FS_I(inode)->i_xattr_sem);
> > >   error = lookup_all_xattrs(inode, ipage, index, len, name,
> > > @@ -646,7 +646,7 @@ static int __f2fs_setxattr(struct inode *inode, int index,
> > >   len = strlen(name);
> > >   if (len > F2FS_NAME_LEN)
> > > - return -ERANGE;
> > > + return -ENAMETOOLONG;
> > >   if (size > MAX_VALUE_LEN(inode))
> > >   return -E2BIG;
> > > -- 
> > > 2.25.4
> > > 
> > > 
> > > 




Re: [f2fs-dev] [PATCH] f2fs: replace ERANGE with ENAMETOOLONG in file name length check

2021-06-14 Thread Gao Xiang
On Tue, Jun 15, 2021 at 09:35:09AM +0800, Wang Xiaojun wrote:
> ERANGE indicates that the math result is not representable. Here,
> ENAMETOOLONG is used to replace ERANGE.
> 
> Signed-off-by: Wang Xiaojun 

I don't think ENAMETOOLONG is a valid return code for {g,s}etxattr,
as opposed to ERANGE:
https://man7.org/linux/man-pages/man2/getxattr.2.html
https://man7.org/linux/man-pages/man2/setxattr.2.html

Please also see the ext4 / xfs implementations.

Thanks,
Gao Xiang


> ---
>  fs/f2fs/xattr.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
> index c8f34decbf8e..eb827c10e970 100644
> --- a/fs/f2fs/xattr.c
> +++ b/fs/f2fs/xattr.c
> @@ -529,7 +529,7 @@ int f2fs_getxattr(struct inode *inode, int index, const char *name,
>  
>   len = strlen(name);
>   if (len > F2FS_NAME_LEN)
> - return -ERANGE;
> + return -ENAMETOOLONG;
>  
>   down_read(&F2FS_I(inode)->i_xattr_sem);
>   error = lookup_all_xattrs(inode, ipage, index, len, name,
> @@ -646,7 +646,7 @@ static int __f2fs_setxattr(struct inode *inode, int index,
>   len = strlen(name);
>  
>   if (len > F2FS_NAME_LEN)
> - return -ERANGE;
> + return -ENAMETOOLONG;
>  
>   if (size > MAX_VALUE_LEN(inode))
>   return -E2BIG;
> -- 
> 2.25.4
> 
> 
> 




Re: [f2fs-dev] Re: [PATCH v4] f2fs: compress: avoid unnecessary check in f2fs_prepare_compress_overwrite

2021-05-11 Thread Gao Xiang
On Tue, May 11, 2021 at 02:50:54PM -0700, Jaegeuk Kim wrote:
> On 05/11, changfeng...@vivo.com wrote:
> > Hi Jaegeuk:
> > 
> > If there're existing clusters beyond i_size, it may cause data corruption,
> > but will this happen normally? Maybe some error can cause this; if i_size
> > is wrong, the data beyond it still can't be handled properly.  Is there a
> > normal case that can cause existing clusters beyond i_size?
> 
> We don't have a rule to sync between i_size and i_blocks.

Hmmm.. Again, it's still unclear what the on-disk format looks like
past EOF. Also, corrupted/crafted/malicious on-disk data needs to be
handled, at least to make sure it cannot crash the kernel or corrupt
the fs itself even further, especially for an optimization patch like
this that targets that specific logic and challenges its stability.

So without details, it smelled somewhat dangerous to me from the
beginning anyway. But considering the performance impact, I'll just
leave this message here.

Thanks,
Gao Xiang

> 
> > 
> > Thanks.
> > 
> > -Original Message-
> > From: Jaegeuk Kim 
> > Sent: May 10, 2021 23:44
> > To: Fengnan Chang 
> > Cc: c...@kernel.org; linux-f2fs-devel@lists.sourceforge.net
> > Subject: Re: [PATCH v4] f2fs: compress: avoid unnecessary check in
> > f2fs_prepare_compress_overwrite
> > 
> > On 05/07, Fengnan Chang wrote:
> > > When writing a compressed file with O_TRUNC, there will be a lot of
> > > unnecessary valid-block checks in f2fs_prepare_compress_overwrite,
> > > especially when writing in page-size units; remove them.
> > > 
> > > This patch will not bring significant performance improvements. I tested
> > > this on a mobile phone using androbench; the sequential write test case
> > > opened the file with O_TRUNC. With the write size set to 4KB, performance
> > > improved about 2%-3%; with the write size set to 32MB, it improved
> > > about 0.5%.
> > > 
> > > Signed-off-by: Fengnan Chang 
> > > ---
> > >  fs/f2fs/data.c | 8 
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 
> > > cf935474ffba..b9ec7b182f45 100644
> > > --- a/fs/f2fs/data.c
> > > +++ b/fs/f2fs/data.c
> > > @@ -3303,9 +3303,17 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
> > >  #ifdef CONFIG_F2FS_FS_COMPRESSION
> > >   if (f2fs_compressed_file(inode)) {
> > >   int ret;
> > > + pgoff_t end = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > 
> > >   *fsdata = NULL;
> > > 
> > > + /*
> > > +  * when the write pos is beyond the inode size,
> > > +  * f2fs_prepare_compress_overwrite always returns 0,
> > > +  * so check pos first to avoid this.
> > > +  */
> > > + if (index >= end)
> > > + goto repeat;
> > 
> > What if there're existing clusters beyond i_size? Given performance impacts,
> > do we really need this?
> > 
> > > +
> > >   ret = f2fs_prepare_compress_overwrite(inode, pagep,
> > >   index, fsdata);
> > >   if (ret < 0) {
> > > --
> > > 2.29.0
> > 
> 
> 




Re: [f2fs-dev] Re: Re: [PATCH] f2fs: compress: avoid unnecessary check in f2fs_prepare_compress_overwrite

2021-05-06 Thread Gao Xiang via Linux-f2fs-devel
On Thu, May 06, 2021 at 08:15:40PM +0800, changfeng...@vivo.com wrote:
> This patch will not bring significant performance improvements. I
> tested this on a mobile phone using androbench; the sequential write test
> case opened the file with O_TRUNC. With the write size set to 4KB,
> performance improved about 2%-3%; with the write size set to 32MB,
> performance improved about 0.5%.

Ok, so considering this, my suggestion is that it'd be better to add
your own configuration and raw test results to the commit message to
show why we need this constraint here.

Also, adding some inline comments about this sounds better.

Thanks,
Gao Xiang

> 
> 
> -邮件原件-
> 发件人: Chao Yu  
> 发送时间: 2021年5月6日 18:38
> 收件人: Gao Xiang 
> 抄送: Gao Xiang ; changfeng...@vivo.com; 
> jaeg...@kernel.org; linux-f2fs-devel@lists.sourceforge.net
> 主题: Re: [f2fs-dev] 答复: [PATCH] f2fs: compress: avoid unnecessary check in 
> f2fs_prepare_compress_overwrite
> 
> Hi Xiang,
> 
> On 2021/5/6 17:58, Gao Xiang wrote:
> > Hi Chao,
> > 
> > On Thu, May 06, 2021 at 05:15:04PM +0800, Chao Yu wrote:
> >> On 2021/4/26 17:00, Gao Xiang wrote:
> >>> On Mon, Apr 26, 2021 at 04:42:20PM +0800, changfeng...@vivo.com wrote:
> >>>> Thank you for the reminder, I hadn't thought about fallocate before.
> >>>> I have done some tests and the results are as expected.
> >>>> Here is my test method, create a compressed file, and use fallocate 
> >>>> with keep size, when write data to expand area, 
> >>>> f2fs_prepare_compress_overwrite always return 0, the behavior is same as 
> >>>> my patch , apply my patch can avoid those check.
> >>>> Is there anything else I haven't thought of?
> >>>
> >>> Nope, I didn't look into the implementation. Just a wild guess.
> >>>
> >>> (I just wondered if the cluster size is somewhat large (e.g. 64k),
> >>>but with a partial fallocate (e.g. 16k), and does it behave ok?
> >>>or some other corner cases/conditions are needed.)
> >>
> >> Xiang, sorry for late reply.
> >>
> >> Now, f2fs triggers compression only if one cluster is fully written, 
> >> e.g. cluster size is 16kb, isize is 8kb, then the first cluster is 
> >> non-compressed one, so we don't need to prepare for compressed 
> >> cluster overwrite during write_begin(). Also, blocks fallocated 
> >> beyond isize should never be compressed, so we don't need to worry 
> >> about that.
> >>
> > 
> > Yeah, that could make it unnoticeable, but my main concern is actually
> > not what the current runtime compression logic is, but what the
> > on-disk compression format actually is, or it could cause
> > compatibility issues if some later kernel makes full use of this and
> > old kernels are used
> 
> That's related; if there is a layout v2 or we update the runtime compression 
> policy later, the newly introduced logic of this patch needs to be 
> reconsidered. I guess we need to add comments here to indicate why we can 
> skip the preparation function.
> 
> > instead (also considering some corrupted compression indexes, which are 
> > not generated by the normal runtime compression logic).
> 
> Yes, that's a good concern, but that was not done by 
> f2fs_prepare_compress_overwrite(); another sanity-check logic needs to be 
> designed and implemented in a separate patch.
> 
> > 
> > My own suggestion about this is still to verify the compress indexes first 
> > rather than rely so much on the runtime logic constraint. (Unless 
> > this patch can show significant performance numbers to prove it 
> > can improve performance a lot.) Just my own premature thoughts.
> 
> Fengnan, could you please give some numbers to show how that check can impact 
> performance?
> 
> Thanks,
> 
> > 
> > Thanks,
> > Gao Xiang
> > 
> >> Thanks,
> >>
> >>>
> >>> If that is fine, I have no problem about this, yet i_size here is 
> >>> generally somewhat risky since after post-EOF behavior changes (e.g. 
> >>> supporting FALLOC_FL_ZERO_RANGE with keep size later), it may cause 
> >>> some potential regression.
> >>>
> >>>>
> >>>> -Original Message-
> >>>> From: Gao Xiang 
> >>>> Sent: April 26, 2021 11:19
> >>>> To: Fengnan Chang 
> >>>> Cc: c...@kernel.org; jaeg...@kernel.org; 
> >>>> linux-f2fs-devel@lists.sourceforge.net
> >>>> Subject: Re: [f2fs-dev] [PATCH] f2fs: compress: avoid unnecessary check

Re: [f2fs-dev] Re: [PATCH] f2fs: compress: avoid unnecessary check in f2fs_prepare_compress_overwrite

2021-05-06 Thread Gao Xiang via Linux-f2fs-devel
On Thu, May 06, 2021 at 06:37:45PM +0800, Chao Yu wrote:
> Hi Xiang,
> 
> On 2021/5/6 17:58, Gao Xiang wrote:
> > Hi Chao,
> > 
> > On Thu, May 06, 2021 at 05:15:04PM +0800, Chao Yu wrote:
> > > On 2021/4/26 17:00, Gao Xiang wrote:
> > > > On Mon, Apr 26, 2021 at 04:42:20PM +0800, changfeng...@vivo.com wrote:
> > > > > Thank you for the reminder, I hadn't thought about fallocate before.
> > > > > I have done some tests and the results are as expected.
> > > > > Here is my test method, create a compressed file, and use fallocate 
> > > > > with keep size, when write data to expand area, 
> > > > > f2fs_prepare_compress_overwrite
> > > > > always return 0, the behavior is same as my patch , apply my patch 
> > > > > can avoid those check.
> > > > > Is there anything else I haven't thought of?
> > > > 
> > > > Nope, I didn't look into the implementation. Just a wild guess.
> > > > 
> > > > (I just wondered if the cluster size is somewhat large (e.g. 64k),
> > > >but with a partial fallocate (e.g. 16k), and does it behave ok?
> > > >or some other corner cases/conditions are needed.)
> > > 
> > > Xiang, sorry for late reply.
> > > 
> > > Now, f2fs triggers compression only if one cluster is fully written,
> > > e.g. cluster size is 16kb, isize is 8kb, then the first cluster is
> > > non-compressed one, so we don't need to prepare for compressed
> > > cluster overwrite during write_begin(). Also, blocks fallocated
> > > beyond isize should never be compressed, so we don't need to worry
> > > about that.
> > > 
> > 
> > Yeah, that could make it unnoticeable, but my main concern is actually
> > not what the current runtime compression logic is, but what the on-disk
> > compression format actually is, or it could cause compatibility
> > issues if some later kernel makes full use of this and old kernels are used
> 
> That's related; if there is a layout v2 or we update the runtime compression
> policy later, the newly introduced logic of this patch needs to be
> reconsidered. I guess we need to add comments here to indicate why we can
> skip the preparation function.

Anyway, my thought is mainly to distinguish, by design, the current
runtime compression logic from the proposed on-disk format. If it's
easy to support reading partially written clusters and post-EOF cases
in practice with a few lines, then the later compression logic could
be updated as a compat feature (or at least a ro_compat feature), which
is much better than an incompat feature for older kernels.

But if it's somewhat hard to add simply, that makes no difference, so
a v2 may need to be introduced instead.

> 
> > instead (also considering some corrupted compression indexes, which
> > are not generated by the normal runtime compression logic).
> 
> Yes, that's a good concern, but that was not done by
> f2fs_prepare_compress_overwrite(); another sanity-check logic needs
> to be designed and implemented in a separate patch.
> 
> > 
> > My own suggestion about this is still to verify the compress indexes
> > first rather than rely so much on the runtime logic constraint. (Unless
> > this patch can show significant performance numbers to
> > prove it can improve performance a lot.) Just my own premature
> > thoughts.
> 
> Fengnan, could you please give some numbers to show how that check can
> impact performance?

IMO, it'd be better to show some real numbers before adding more constraints
like this; if the impact is measurable, that is another story indeed.

Thanks,
Gao Xiang

> 
> Thanks,
> 
> > 
> > Thanks,
> > Gao Xiang
> > 
> > > Thanks,
> > > 
> > > > 
> > > > If that is fine, I have no problem about this, yet i_size here
> > > > is generally somewhat risky since after post-EOF behavior
> > > > changes (e.g. supporting FALLOC_FL_ZERO_RANGE with keep size
> > > > later), it may cause some potential regression.
> > > > 
> > > > > 
> > > > > -Original Message-
> > > > > From: Gao Xiang 
> > > > > Sent: April 26, 2021 11:19
> > > > > To: Fengnan Chang 
> > > > > Cc: c...@kernel.org; jaeg...@kernel.org;
> > > > > linux-f2fs-devel@lists.sourceforge.net
> > > > > Subject: Re: [f2fs-dev] [PATCH] f2fs: compress: avoid unnecessary check in
> > > > > f2fs_prepare_compress_overwrite
> > > > > 
> > > > > On Mon, Apr 26, 2021 at 10:11:53AM +0800, Fengnan Chang wrote:
> > 

Re: [f2fs-dev] Re: [PATCH] f2fs: compress: avoid unnecessary check in f2fs_prepare_compress_overwrite

2021-05-06 Thread Gao Xiang via Linux-f2fs-devel
Hi Chao,

On Thu, May 06, 2021 at 05:15:04PM +0800, Chao Yu wrote:
> On 2021/4/26 17:00, Gao Xiang wrote:
> > On Mon, Apr 26, 2021 at 04:42:20PM +0800, changfeng...@vivo.com wrote:
> > > Thank you for the reminder, I hadn't thought about fallocate before.
> > > I have done some tests and the results are as expected.
> > > Here is my test method, create a compressed file, and use fallocate with 
> > > keep size, when write data to expand area, f2fs_prepare_compress_overwrite
> > > always return 0, the behavior is same as my patch , apply my patch can 
> > > avoid those check.
> > > Is there anything else I haven't thought of?
> > 
> > Nope, I didn't look into the implementation. Just a wild guess.
> > 
> > (I just wondered if the cluster size is somewhat large (e.g. 64k),
> >   but with a partial fallocate (e.g. 16k), and does it behave ok?
> >   or some other corner cases/conditions are needed.)
> 
> Xiang, sorry for late reply.
> 
> Now, f2fs triggers compression only if one cluster is fully written;
> e.g. if the cluster size is 16kb and isize is 8kb, then the first cluster
> is a non-compressed one, so we don't need to prepare for compressed
> cluster overwrite during write_begin(). Also, blocks fallocated
> beyond isize should never be compressed, so we don't need to worry
> about that.
> 

Yeah, that could make it unnoticeable, but my main concern is actually
not what the current runtime compression logic is, but what the on-disk
compression format actually is, or it could cause compatibility
issues if some later kernel makes full use of this and old kernels are
used instead (also considering some corrupted compression indexes, which
are not generated by the normal runtime compression logic).

My own suggestion about this is still to verify the compress indexes
first rather than rely so much on the runtime logic constraint. (Unless
this patch can show significant performance numbers to
prove it can improve performance a lot.) Just my own premature
thoughts.

Thanks,
Gao Xiang

> Thanks,
> 
> > 
> > If that is fine, I have no problem about this, yet i_size here
> > is generally somewhat risky since after post-EOF behavior
> > changes (e.g. supporting FALLOC_FL_ZERO_RANGE with keep size
> > later), it may cause some potential regression.
> > 
> > > 
> > > -邮件原件-
> > > 发件人: Gao Xiang 
> > > 发送时间: 2021年4月26日 11:19
> > > 收件人: Fengnan Chang 
> > > 抄送: c...@kernel.org; jaeg...@kernel.org;
> > > linux-f2fs-devel@lists.sourceforge.net
> > > 主题: Re: [f2fs-dev] [PATCH] f2fs: compress: avoid unnecessary check in
> > > f2fs_prepare_compress_overwrite
> > > 
> > > On Mon, Apr 26, 2021 at 10:11:53AM +0800, Fengnan Chang wrote:
> > > > when write compressed file with O_TRUNC, there will be a lot of
> > > > unnecessary check valid blocks in f2fs_prepare_compress_overwrite,
> > > > especially when written in page size, remove it.
> > > > 
> > > > Signed-off-by: Fengnan Chang 
> > > 
> > > Even though I didn't look into the whole thing, my first reaction here
> > > is roughly: how is fallocate with keep-size handled? Does it work as
> > > expected?
> > > 
> > > > ---
> > > >   fs/f2fs/data.c | 4 
> > > >   1 file changed, 4 insertions(+)
> > > > 
> > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index
> > > > cf935474ffba..9c3b0849f35e 100644
> > > > --- a/fs/f2fs/data.c
> > > > +++ b/fs/f2fs/data.c
> > > > @@ -3270,6 +3270,7 @@ static int f2fs_write_begin(struct file *file,
> > > > struct address_space *mapping,
> > > > struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > > > struct page *page = NULL;
> > > > pgoff_t index = ((unsigned long long) pos) >> PAGE_SHIFT;
> > > > +   pgoff_t end = (i_size_read(inode) + PAGE_SIZE - 1) >> 
> > > > PAGE_SHIFT;
> > > > bool need_balance = false, drop_atomic = false;
> > > > block_t blkaddr = NULL_ADDR;
> > > > int err = 0;
> > > > @@ -3306,6 +3307,9 @@ static int f2fs_write_begin(struct file *file,
> > > > struct address_space *mapping,
> > > > 
> > > > *fsdata = NULL;
> > > > 
> > > > +   if (index >= end)
> > > > +   goto repeat;
> > > > +
> > > > ret = f2fs_prepare_compress_overwrite(inode, pagep,
> > > > index, fsdata);
> > > > if (ret < 0) {
> > > > --
> > > > 2.29.0
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 




Re: [f2fs-dev] Re: [PATCH] f2fs: compress: avoid unnecessary check in f2fs_prepare_compress_overwrite

2021-04-26 Thread Gao Xiang
On Mon, Apr 26, 2021 at 04:42:20PM +0800, changfeng...@vivo.com wrote:
> Thank you for the reminder, I hadn't thought about fallocate before.
> I have done some tests and the results are as expected.
> Here is my test method: create a compressed file and use fallocate with
> keep-size; when writing data into the expanded area,
> f2fs_prepare_compress_overwrite always returns 0, so the behavior is the
> same as with my patch, and applying my patch avoids those checks.
> Is there anything else I haven't thought of?

Nope, I didn't look into the implementation; just a wild guess.

(I just wondered: if the cluster size is somewhat large (e.g. 64k)
 but with a partial fallocate (e.g. 16k), does it behave ok?
 Or are some other corner cases/conditions involved?)

If that is fine, I have no problem with this, yet i_size here
is generally somewhat risky: after post-EOF behavior
changes (e.g. supporting FALLOC_FL_ZERO_RANGE with keep-size
later), it may cause some potential regression (a sketch of the
fallocate corner case follows below).
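
A tiny userspace sketch of the corner case in question (illustrative
only): allocate blocks past EOF with keep-size, so that a later write
lands where blocks already exist beyond i_size:

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

int make_post_eof_blocks(const char *path)
{
	int fd = open(path, O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return -1;
	/* allocate 16k past EOF without changing i_size */
	if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 16 * 1024) < 0) {
		close(fd);
		return -1;
	}
	/* a subsequent write here exercises the i_size-based short cut
	 * in the proposed f2fs_write_begin() change */
	return fd;
}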

> 
> -Original Message-
> From: Gao Xiang 
> Sent: April 26, 2021 11:19
> To: Fengnan Chang 
> Cc: c...@kernel.org; jaeg...@kernel.org;
> linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] [PATCH] f2fs: compress: avoid unnecessary check in
> f2fs_prepare_compress_overwrite
> 
> On Mon, Apr 26, 2021 at 10:11:53AM +0800, Fengnan Chang wrote:
> > When writing a compressed file with O_TRUNC, there will be a lot of
> > unnecessary valid-block checks in f2fs_prepare_compress_overwrite,
> > especially when writing in page-size units; remove them.
> >
> > Signed-off-by: Fengnan Chang 
> 
> Even though I didn't look into the whole thing, my first reaction here is
> roughly: how is fallocate with keep-size handled? Does it work as expected?
> 
> > ---
> >  fs/f2fs/data.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index
> > cf935474ffba..9c3b0849f35e 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -3270,6 +3270,7 @@ static int f2fs_write_begin(struct file *file,
> > struct address_space *mapping,
> > struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > struct page *page = NULL;
> > pgoff_t index = ((unsigned long long) pos) >> PAGE_SHIFT;
> > +   pgoff_t end = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > bool need_balance = false, drop_atomic = false;
> > block_t blkaddr = NULL_ADDR;
> > int err = 0;
> > @@ -3306,6 +3307,9 @@ static int f2fs_write_begin(struct file *file,
> > struct address_space *mapping,
> >
> > *fsdata = NULL;
> >
> > +   if (index >= end)
> > +   goto repeat;
> > +
> > ret = f2fs_prepare_compress_overwrite(inode, pagep,
> > index, fsdata);
> > if (ret < 0) {
> > --
> > 2.29.0
> 
> 
> 
> 
> 
> 
> 





Re: [f2fs-dev] [PATCH] f2fs: compress: avoid unnecessary check in f2fs_prepare_compress_overwrite

2021-04-25 Thread Gao Xiang
On Mon, Apr 26, 2021 at 10:11:53AM +0800, Fengnan Chang wrote:
> When writing a compressed file with O_TRUNC, there will be a lot of
> unnecessary valid-block checks in f2fs_prepare_compress_overwrite,
> especially when writing in page-size units; remove them.
> 
> Signed-off-by: Fengnan Chang 

Even though I didn't look into the whole thing,
my first reaction here is roughly: how is fallocate with
keep-size handled? Does it work as expected?

> ---
>  fs/f2fs/data.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index cf935474ffba..9c3b0849f35e 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -3270,6 +3270,7 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
>   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>   struct page *page = NULL;
>   pgoff_t index = ((unsigned long long) pos) >> PAGE_SHIFT;
> + pgoff_t end = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
>   bool need_balance = false, drop_atomic = false;
>   block_t blkaddr = NULL_ADDR;
>   int err = 0;
> @@ -3306,6 +3307,9 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
> 
>   *fsdata = NULL;
> 
> + if (index >= end)
> + goto repeat;
> +
>   ret = f2fs_prepare_compress_overwrite(inode, pagep,
>   index, fsdata);
>   if (ret < 0) {
> --
> 2.29.0





Re: [f2fs-dev] [PATCH] f2fs: introduce gc_merge mount option

2021-03-27 Thread Gao Xiang
On Sat, Mar 27, 2021 at 05:57:06PM +0800, Chao Yu wrote:
> In this patch, we will add two new mount options: "gc_merge" and
> "nogc_merge", when background_gc is on, "gc_merge" option can be
> set to let background GC thread to handle foreground GC requests,
> it can eliminate the sluggish issue caused by slow foreground GC
> operation when GC is triggered from a process with limited I/O
> and CPU resources.
> 
> Original idea is from Xiang.
> 
> Signed-off-by: Gao Xiang 
> Signed-off-by: Chao Yu 

Ah, that was a quite old commit from many years ago, due to a priority
inversion issue ;-) I vaguely remember some potential wakeup race condition
that was addressed in the internal branch... yet I have no idea about the
details now, LOL.

Thanks for redoing this and sending it upstream... :-)

Thanks,
Gao Xiang

> ---
>  Documentation/filesystems/f2fs.rst |  6 ++
>  fs/f2fs/f2fs.h |  1 +
>  fs/f2fs/gc.c   | 26 ++
>  fs/f2fs/gc.h   |  6 ++
>  fs/f2fs/segment.c  | 15 +--
>  fs/f2fs/super.c| 19 +--
>  6 files changed, 65 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
> index 35ed01a5fbc9..63c0c49b726d 100644
> --- a/Documentation/filesystems/f2fs.rst
> +++ b/Documentation/filesystems/f2fs.rst
> @@ -110,6 +110,12 @@ background_gc=%s  Turn on/off cleaning operations, namely garbage
>    on synchronous garbage collection running in background.
>    Default value for this option is on. So garbage
>    collection is on by default.
> +gc_merge  When background_gc is on, this option can be enabled to
> +  let background GC thread to handle foreground GC requests,
> +  it can eliminate the sluggish issue caused by slow foreground
> +  GC operation when GC is triggered from a process with limited
> +  I/O and CPU resources.
> +nogc_merge  Disable GC merge feature.
>  disable_roll_forward  Disable the roll-forward recovery routine
>  norecovery  Disable the roll-forward recovery routine, mounted read-
>    only (i.e., -o ro,disable_roll_forward)
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index fe380bcf8d4d..87d734f5589d 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -97,6 +97,7 @@ extern const char *f2fs_fault_name[FAULT_MAX];
>  #define F2FS_MOUNT_NORECOVERY0x0400
>  #define F2FS_MOUNT_ATGC  0x0800
>  #define F2FS_MOUNT_MERGE_CHECKPOINT  0x1000
> +#define  F2FS_MOUNT_GC_MERGE 0x2000
>  
>  #define F2FS_OPTION(sbi) ((sbi)->mount_opt)
>  #define clear_opt(sbi, option)   (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option)
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index a2ca483f9855..5c48825fd12d 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -31,19 +31,24 @@ static int gc_thread_func(void *data)
>   struct f2fs_sb_info *sbi = data;
>   struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
>   wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
> + wait_queue_head_t *fggc_wq = &sbi->gc_thread->fggc_wq;
>   unsigned int wait_ms;
>  
>   wait_ms = gc_th->min_sleep_time;
>  
>   set_freezable();
>   do {
> - bool sync_mode;
> + bool sync_mode, foreground = false;
>  
>   wait_event_interruptible_timeout(*wq,
>   kthread_should_stop() || freezing(current) ||
> + waitqueue_active(fggc_wq) ||
>   gc_th->gc_wake,
>   msecs_to_jiffies(wait_ms));
>  
> + if (test_opt(sbi, GC_MERGE) && waitqueue_active(fggc_wq))
> + foreground = true;
> +
>   /* give it a try one time */
>   if (gc_th->gc_wake)
>   gc_th->gc_wake = 0;
> @@ -90,7 +95,10 @@ static int gc_thread_func(void *data)
>   goto do_gc;
>   }
>  
> -	if (!down_write_trylock(&sbi->gc_lock)) {
> +	if (foreground) {
> +		down_write(&sbi->gc_lock);
> +		goto do_gc;
> +	} else if (!down_write_trylock(&sbi->gc_lock)) {
>   stat_other_skip_bggc_count(sbi);
>   goto next;
>
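
(The quoted diff is truncated above. For illustration only, a minimal sketch
of the waiter side that the full patch adds in fs/f2fs/segment.c; the helper
name below is hypothetical and this is not the exact patch code:)

	#include <linux/sched.h>
	#include <linux/wait.h>

	/* Hypothetical sketch: how a throttled foreground task can hand its
	 * GC request to the background GC thread via the fggc_wq added above.
	 */
	static void f2fs_request_merged_gc(struct f2fs_sb_info *sbi)
	{
		struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
		DEFINE_WAIT(wait);

		/* register on fggc_wq so gc_thread_func() treats this round as foreground */
		prepare_to_wait(&gc_th->fggc_wq, &wait, TASK_UNINTERRUPTIBLE);
		/* kick the background thread out of its interruptible timeout sleep */
		wake_up(&gc_th->gc_wait_queue_head);
		/* sleep until the GC round completes and the thread wakes fggc_wq waiters */
		io_schedule();
		finish_wait(&gc_th->fggc_wq, &wait);
	}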

Re: [f2fs-dev] [PATCH v6] f2fs: compress: support compress level

2020-12-04 Thread Gao Xiang
On Fri, Dec 04, 2020 at 04:50:14PM +0800, Chao Yu wrote:

...

> 
> > 
> > About the speed, I think which is also limited to storage device and other
> > conditions (I mean the CPU loading during the writeback might be different
> > between lz4 and lz4hc-9 due to many other bounds, e.g. UFS 3.0 seq
> > write is somewhat higher vs VM. lz4 may have higher bandwidth on high
> 
> Yeah, I guess my VM have been limited on its storage bandwidth, and its 
> back-end
> could be low-end rotating disk...

Yeah, anyway that's in the IO writeback path (no matter whether the time was
spent on IO or on CPU calculation...)

> 
> > level devices since it seems some IO bound here... I guess but not sure,
> > since pure in-memory lz4 is fast according to lzbench / lz4 homepage.)
> > 
> > Anyway, it's up to f2fs folks if it's useful :) (the CR number is what
> > I expect though... I'm a bit of afraid the CPU runtime loading.)
> 
> I just have a glance at CPU usage numbers (my VM has 16 cores):
> lz4hc takes 11% in first half and downgrade to 6% at second half.
> lz4 takes 6% in whole process.
> 
> But that's not accruate...

There is a userspace tool, lzbench [1], to benchmark lz4/lz4hc completely
in memory. So it's expected that lzbench will consume 100% CPU at its
maximum bandwidth (though the in-kernel lz4 version is somewhat slower):

Intel Core i7-8700K
Compression Decompression   C/R
memcpy  10362 MB/s  10790 MB/s  100.00
lz4 1.9.2   737 MB/s4448 MB/s   47.60
lz4hc 1.9.2 -9  33 MB/s 4378 MB/s   36.75

So adding more IO time (due to storage device differences) could lower the
CPU loading (and could also make the whole process IO bound), but the
overall write bandwidth will be lower as well.

[1] https://github.com/inikep/lzbench

Thanks,
Gao Xiang

> 
> Thanks,
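
(For reference, a minimal userspace sketch of such an in-memory measurement;
it assumes the liblz4 development headers are installed and is built with
cc -O2 bench.c -llz4. It is not lzbench itself, just the same idea:)

	/* Minimal in-memory lz4 vs lz4hc timing sketch; not lzbench itself. */
	#include <lz4.h>
	#include <lz4hc.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>

	static double now(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec + ts.tv_nsec / 1e9;
	}

	int main(void)
	{
		const int in_sz = 16 << 20;		/* 16 MiB of test data */
		char *in = malloc(in_sz);
		char *out = malloc(LZ4_compressBound(in_sz));
		double t;
		int i, n;

		for (i = 0; i < in_sz; i++)		/* mildly compressible input */
			in[i] = i % 251;

		t = now();
		n = LZ4_compress_default(in, out, in_sz, LZ4_compressBound(in_sz));
		printf("lz4:      %d -> %d bytes, %.1f MB/s\n",
		       in_sz, n, in_sz / (now() - t) / 1e6);

		t = now();
		n = LZ4_compress_HC(in, out, in_sz, LZ4_compressBound(in_sz), 9);
		printf("lz4hc -9: %d -> %d bytes, %.1f MB/s\n",
		       in_sz, n, in_sz / (now() - t) / 1e6);
		free(in);
		free(out);
		return 0;
	}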



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6] f2fs: compress: support compress level

2020-12-03 Thread Gao Xiang
Hi Chao,

On Fri, Dec 04, 2020 at 03:09:20PM +0800, Chao Yu wrote:
> On 2020/12/4 8:31, Gao Xiang wrote:
> > could make more sense), could you leave some CR numbers about these
> > algorithms on typical datasets (enwik9, silisia.tar or else.) with 16k
> > cluster size?
> 
> Just from a quick test with enwik9 on vm:
> 
> Original blocks:  244382
> 
>   lz4 lz4hc-9
> compressed blocks 170647  163270
> compress ratio69.8%   66.8%
> speed 16.4207 s, 60.9 MB/s26.7299 s, 37.4 MB/s
> 
> compress ratio = after / before

Thanks for the confirmation. It'd be better to add this to the commit
message, if needed, when adding a new algorithm, to show the benefits.

About the speed, I think it is also limited by the storage device and other
conditions (I mean the CPU loading during writeback might be different
between lz4 and lz4hc-9 due to many other bounds, e.g. UFS 3.0 seq
write is somewhat higher vs a VM; lz4 may have higher bandwidth on high-end
devices since it seems somewhat IO bound here... I guess, but I'm not sure,
since pure in-memory lz4 is fast according to lzbench / the lz4 homepage.)

Anyway, it's up to the f2fs folks whether it's useful :) (the CR number is
what I expected though... I'm a bit afraid of the CPU runtime loading.)
Thanks for your time!

Thanks,
Gao Xiang

> 
> Thanks,
> 



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6] f2fs: compress: support compress level

2020-12-03 Thread Gao Xiang
On Fri, Dec 04, 2020 at 11:11:03AM +0800, Chao Yu wrote:
> On 2020/12/4 10:47, Gao Xiang wrote:

...

> 
> > future (and add more dependency to algorithms, you might see BWT-based bzip2
> > removal patch
> 
> Oops, is that really allowed? I don't this is a good idea...and I don't see 
> there
> are deletions from fs/ due to similar reason...

Fortunately, bzip2 is quite a slow algorithm, so removing it is not really impactful at all.

My personal opinion, based on compression algorithm principles (just for reference
as well...):
 - zlib had better be replaced with libdeflate if possible; the main point
   is that many hardware accelerators for deflate (LZ77 + huffman) are
   available;

 - lzo is not attractive given its format complexity and its CR/performance
   goal, so lz4 is generally better due to its format design;

 - lzo-rle, oops, was just introduced for zram I think; I'm not sure it's
   quite helpful for file data (since anonymous pages are generally
   RLE-effective due to much repetitive data.) But it'd be good if the lzo
   author accepts it.

Thanks,
Gao Xiang

> 
> Thanks,
> > 
> 



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6] f2fs: compress: support compress level

2020-12-03 Thread Gao Xiang
On Fri, Dec 04, 2020 at 10:38:08AM +0800, Chao Yu wrote:
> On 2020/12/4 10:06, Gao Xiang wrote:
> > On Fri, Dec 04, 2020 at 09:56:27AM +0800, Chao Yu wrote:

...

> 
> > 
> > Keep lz4hc dirty data under writeback could block writeback, make kswapd
> > busy, and direct memory reclaim path, I guess that's why rare online
> > compression chooses it. My own premature suggestion is that it'd better
> > to show the CR or performance benefits in advance, and prevent unprivileged
> > users from using high-level lz4hc algorithm (to avoid potential system 
> > attack.)
> > either from mount options or ioctl.
> 
> Yes, I guess you are worry about destop/server scenario, as for android 
> scenario,
> all compression related flow can be customized, and I don't think we will use
> online lz4hc compress; for other scenario, except the numbers, I need to add 
> the
> risk of using lz4hc algorithm in document.

Yes, I was describing the general scenario. My overall premature thought is
that before releasing some brand new algorithm, it may be better to first
evaluate whether it benefits some scenarios (either on the CR or performance
side, otherwise why add it?), or it might cause an lzo-rle-like situation in
the future (and add more dependencies on algorithms; you might see the
BWT-based bzip2 removal patch
https://lore.kernel.org/r/20201117223253.65920-1-alex_y...@yahoo.ca
since I personally don't think BWT is a good algorithm either)... Just FYI
... If I'm wrong, kindly ignore me :)

Thanks,
Gao Xiang

> 
> Thanks,



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6] f2fs: compress: support compress level

2020-12-03 Thread Gao Xiang
On Fri, Dec 04, 2020 at 09:56:27AM +0800, Chao Yu wrote:
> Hi Xiang,
> 
> On 2020/12/4 8:31, Gao Xiang wrote:
> > Hi Chao,
> > 
> > On Thu, Dec 03, 2020 at 11:32:34AM -0800, Eric Biggers wrote:
> > 
> > ...
> > 
> > > 
> > > What is the use case for storing the compression level on-disk?
> > > 
> > > Keep in mind that compression levels are an implementation detail; the 
> > > exact
> > > compressed data that is produced by a particular algorithm at a particular
> > > compression level is *not* a stable interface.  It can change when the
> > > compressor is updated, as long as the output continues to be compatible 
> > > with the
> > > decompressor.
> > > 
> > > So does compression level really belong in the on-disk format?
> > > 
> > 
> > Curious about this, since f2fs compression uses 16k f2fs compress cluster
> > by default (doesn't do sub-block compression by design as what btrfs did),
> > so is there significant CR difference between lz4 and lz4hc on 16k
> > configuration (I guess using zstd or lz4hc for 128k cluster like btrfs
> > could make more sense), could you leave some CR numbers about these
> > algorithms on typical datasets (enwik9, silisia.tar or else.) with 16k
> > cluster size?
> 
> Yup, I can figure out some numbers later. :)
> 
> > 
> > As you may noticed, lz4hc is much slower than lz4, so if it's used online,
> > it's a good way to keep all CPUs busy (under writeback) with unprivileged
> > users. I'm not sure if it does matter. (Ok, it'll give users more options
> > at least, yet I'm not sure end users are quite understand what these
> > algorithms really mean, I guess it spends more CPU time but without much
> > more storage saving by the default 16k configuration.)
> > 
> > from https://github.com/lz4/lz4, Core i7-9700K CPU @ 4.9GHz
> > Silesia Corpus
> > 
> > Compressor  Ratio   Compression Decompression
> > memcpy  1.000   13700 MB/s  13700 MB/s
> > Zstandard 1.4.0 -1  2.883   515 MB/s1380 MB/s
> > LZ4 HC -9 (v1.9.0)  2.721   41 MB/s 4900 MB/s
> 
> There is one solutions now, Daeho has submitted two patches:
> 
> f2fs: add compress_mode mount option
> f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE
> 
> Which allows to specify all files in data partition be compressible, by 
> default,
> all files are written as non-compressed one, at free time of system, we can 
> use
> ioctl to reload and compress data for specific files.
> 

Yeah, my own premature suggestion is that many compression options exist in
f2fs compression, but end users are not compression experts.
So it'd be better to leave only the advantageous options to users (or users
might be confused, select the wrong algorithm, or raise potential complaints...)

Keeping lz4hc dirty data under writeback could block writeback, make kswapd
busy, and stall the direct memory reclaim path; I guess that's why online
compression rarely chooses it. My own premature suggestion is that it'd be
better to show the CR or performance benefits in advance, and prevent
unprivileged users from using the high-level lz4hc algorithm (to avoid a
potential system attack), either from mount options or an ioctl.

> > 
> > Also a minor thing is lzo-rle, initially it was only used for in-memory
> > anonymous pages and it won't be kept on-disk so that's fine. I'm not sure
> > if lzo original author want to support it or not. It'd be better to get
> 
> 
> Hmm.. that's a problem, as there may be existed potential users who are
> using lzo-rle, remove lzo-rle support will cause compatibility issue...
> 
> IMO, the condition "f2fs may has persisted lzo-rle compress format data 
> already"
> may affect the decision of not supporting that algorithm from author.
> 
> > some opinion before keeping it on-disk.
> 
> Yes, I can try to ask... :)

Yeah, it'd be better to ask the author first, or we may have to maintain
a private lzo-rle fork...

Thanks,
Gao Xiang

> 
> Thanks,
> 
> > 
> > Thanks,
> > Gao Xiang
> > 
> > > - Eric
> > > 
> > > 
> > > ___
> > > Linux-f2fs-devel mailing list
> > > Linux-f2fs-devel@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> > 
> > .
> > 
> 
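
(For reference, a hypothetical userspace sketch of the reload/compress flow
described above. It assumes the F2FS_IOC_COMPRESS_FILE definition is visible
via the f2fs UAPI header, which may not exist on older trees where the ioctls
lived in fs/f2fs/f2fs.h, and that the file is on f2fs mounted with the
compress_mode=user option from Daeho's series:)

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/f2fs.h>	/* assumed location of the ioctl definitions */

	int main(int argc, char *argv[])
	{
		int fd;

		if (argc < 2)
			return 1;
		fd = open(argv[1], O_RDWR);
		if (fd < 0)
			return 1;
		/* ask f2fs to (re)compress the file's clusters at idle time */
		if (ioctl(fd, F2FS_IOC_COMPRESS_FILE) < 0)
			perror("F2FS_IOC_COMPRESS_FILE");
		return 0;
	}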



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6] f2fs: compress: support compress level

2020-12-03 Thread Gao Xiang
Hi Chao,

On Thu, Dec 03, 2020 at 11:32:34AM -0800, Eric Biggers wrote:

...

> 
> What is the use case for storing the compression level on-disk?
> 
> Keep in mind that compression levels are an implementation detail; the exact
> compressed data that is produced by a particular algorithm at a particular
> compression level is *not* a stable interface.  It can change when the
> compressor is updated, as long as the output continues to be compatible with 
> the
> decompressor.
> 
> So does compression level really belong in the on-disk format?
> 

Curious about this: since f2fs compression uses a 16k compress cluster
by default (it doesn't do sub-block compression by design, as btrfs does),
is there a significant CR difference between lz4 and lz4hc in the 16k
configuration (I guess using zstd or lz4hc with a 128k cluster like btrfs
could make more sense)? Could you leave some CR numbers for these
algorithms on typical datasets (enwik9, silesia.tar, or else) with 16k
cluster size?

As you may have noticed, lz4hc is much slower than lz4, so if it's used
online, it's a good way for unprivileged users to keep all CPUs busy (under
writeback). I'm not sure if it matters. (OK, it'll give users more options
at least, yet I'm not sure end users quite understand what these
algorithms really mean; I guess it spends more CPU time without much
more storage saving with the default 16k configuration.)

from https://github.com/lz4/lz4, Core i7-9700K CPU @ 4.9GHz
Silesia Corpus

Compressor  Ratio   Compression Decompression
memcpy  1.000   13700 MB/s  13700 MB/s
Zstandard 1.4.0 -1  2.883   515 MB/s1380 MB/s
LZ4 HC -9 (v1.9.0)  2.721   41 MB/s 4900 MB/s

Also, a minor thing is lzo-rle: initially it was only used for in-memory
anonymous pages and it won't be kept on-disk, so that's fine. I'm not sure
whether the original lzo author wants to support it or not. It'd be better
to get some opinions before keeping it on-disk.

Thanks,
Gao Xiang

> - Eric
> 
> 
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
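
(A small sketch of why a 16k cluster quantizes the achievable saving: space
is only reclaimed in whole 4k blocks, so the compressed payload plus the
per-cluster header must fit into at least one block less than the cluster.
The header size below is an assumed stand-in, not the real f2fs constant:)

	/* Illustrative only: how many 4k blocks a 16k f2fs cluster can save
	 * for a given compressed payload size; HDR stands in for the
	 * per-cluster metadata (COMPRESS_HEADER_SIZE in f2fs; value assumed).
	 */
	#include <stdio.h>

	int main(void)
	{
		const int BLK = 4096, CLUSTER_BLKS = 4;	/* 16k cluster */
		const int HDR = 32;			/* assumed header size */
		int clen;

		for (clen = 2000; clen <= 16000; clen += 2000) {
			int used = (clen + HDR + BLK - 1) / BLK; /* round up to blocks */
			int saved = used < CLUSTER_BLKS ? CLUSTER_BLKS - used : 0;

			printf("compressed %5d bytes -> %d blocks used, %d blocks saved\n",
			       clen, used, saved);
		}
		return 0;
	}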



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v4 2/3] fscrypt: Have filesystems handle their d_ops

2020-11-23 Thread Gao Xiang
On Mon, Nov 23, 2020 at 02:51:44PM -0800, Eric Biggers wrote:
> On Sun, Nov 22, 2020 at 01:12:18PM +0800, Gao Xiang wrote:
> > Hi all,
> > 
> > On Thu, Nov 19, 2020 at 06:09:03AM +, Daniel Rosenberg wrote:
> > > This shifts the responsibility of setting up dentry operations from
> > > fscrypt to the individual filesystems, allowing them to have their own
> > > operations while still setting fscrypt's d_revalidate as appropriate.
> > > 
> > > Most filesystems can just use generic_set_encrypted_ci_d_ops, unless
> > > they have their own specific dentry operations as well. That operation
> > > will set the minimal d_ops required under the circumstances.
> > > 
> > > Since the fscrypt d_ops are set later on, we must set all d_ops there,
> > > since we cannot adjust those later on. This should not result in any
> > > change in behavior.
> > > 
> > > Signed-off-by: Daniel Rosenberg 
> > > Acked-by: Eric Biggers 
> > > ---
> > 
> > ...
> > 
> > >  extern const struct file_operations ext4_dir_operations;
> > >  
> > > -#ifdef CONFIG_UNICODE
> > > -extern const struct dentry_operations ext4_dentry_ops;
> > > -#endif
> > > -
> > >  /* file.c */
> > >  extern const struct inode_operations ext4_file_inode_operations;
> > >  extern const struct file_operations ext4_file_operations;
> > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > > index 33509266f5a0..12a417ff5648 100644
> > > --- a/fs/ext4/namei.c
> > > +++ b/fs/ext4/namei.c
> > > @@ -1614,6 +1614,7 @@ static struct buffer_head *ext4_lookup_entry(struct 
> > > inode *dir,
> > >   struct buffer_head *bh;
> > >  
> > > 	err = ext4_fname_prepare_lookup(dir, dentry, &fname);
> > > + generic_set_encrypted_ci_d_ops(dentry);
> > 
> > One thing might be worth noticing is that currently overlayfs might
> > not work properly when dentry->d_sb->s_encoding is set even only some
> > subdirs are CI-enabled but the others not, see 
> > generic_set_encrypted_ci_d_ops(),
> > ovl_mount_dir_noesc => ovl_dentry_weird()
> > 
> > For more details, see:
> > https://android-review.googlesource.com/c/device/linaro/hikey/+/1483316/2#message-2e1f6ab0010a3e35e7d8effea73f60341f84ee4d
> > 
> > Just found it by chance (and not sure if it's vital for now), and
> > a kind reminder about this.
> > 
> 
> Yes, overlayfs doesn't work on ext4 or f2fs filesystems that have the casefold
> feature enabled, regardless of which directories are actually using 
> casefolding.
> This is an existing limitation which was previously discussed, e.g. at
> https://lkml.kernel.org/linux-ext4/CAOQ4uxgPXBazE-g2v=t_vovnr_f0zhykyz4wvn7a3epatzr...@mail.gmail.com/T/#u
> and
> https://lkml.kernel.org/linux-ext4/20191203051049.44573-1-dro...@google.com/T/#u.
> 
> Gabriel and Daniel, is one of you still looking into fixing this?  IIUC, the
> current thinking is that when the casefolding flag is set on a directory, it's
> too late to assign dentry_operations at that point.  But what if all child
> dentries (which must be negative) are invalidated first, and also the 
> filesystem
> forbids setting the casefold flag on encrypted directories that are accessed 
> via
> a no-key name (so that fscrypt_d_revalidate isn't needed -- i.e. the directory
> would only go from "no d_ops" to "generic_ci_dentry_ops", not from
> "generic_encrypted_dentry_ops" to "generic_encrypted_ci_dentry_ops")?

From my limited knowledge about VFS, I think that is practical as well, since
we don't have sub-sub-dirs: all sub-dirs are negative dentries for empty
dirs.
And if the casefold ioctl is done with the dir inode locked, I think that
would be fine (?). I didn't check the code though.

Thanks,
Gao Xiang

> 
> - Eric
> 



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v4 2/3] fscrypt: Have filesystems handle their d_ops

2020-11-21 Thread Gao Xiang
Hi all,

On Thu, Nov 19, 2020 at 06:09:03AM +, Daniel Rosenberg wrote:
> This shifts the responsibility of setting up dentry operations from
> fscrypt to the individual filesystems, allowing them to have their own
> operations while still setting fscrypt's d_revalidate as appropriate.
> 
> Most filesystems can just use generic_set_encrypted_ci_d_ops, unless
> they have their own specific dentry operations as well. That operation
> will set the minimal d_ops required under the circumstances.
> 
> Since the fscrypt d_ops are set later on, we must set all d_ops there,
> since we cannot adjust those later on. This should not result in any
> change in behavior.
> 
> Signed-off-by: Daniel Rosenberg 
> Acked-by: Eric Biggers 
> ---

...

>  extern const struct file_operations ext4_dir_operations;
>  
> -#ifdef CONFIG_UNICODE
> -extern const struct dentry_operations ext4_dentry_ops;
> -#endif
> -
>  /* file.c */
>  extern const struct inode_operations ext4_file_inode_operations;
>  extern const struct file_operations ext4_file_operations;
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 33509266f5a0..12a417ff5648 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1614,6 +1614,7 @@ static struct buffer_head *ext4_lookup_entry(struct 
> inode *dir,
>   struct buffer_head *bh;
>  
> 	err = ext4_fname_prepare_lookup(dir, dentry, &fname);
> + generic_set_encrypted_ci_d_ops(dentry);

One thing that might be worth noticing is that currently overlayfs might
not work properly when dentry->d_sb->s_encoding is set, even if only some
subdirs are CI-enabled but the others are not; see generic_set_encrypted_ci_d_ops(),
ovl_mount_dir_noesc => ovl_dentry_weird()

For more details, see:
https://android-review.googlesource.com/c/device/linaro/hikey/+/1483316/2#message-2e1f6ab0010a3e35e7d8effea73f60341f84ee4d

I just found it by chance (and I'm not sure if it's vital for now); just
a kind reminder about this.

Thanks,
Gao Xiang



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3] f2fs: change virtual mapping way for compression pages

2020-08-12 Thread Gao Xiang
On Wed, Aug 12, 2020 at 02:17:11PM +0900, Daeho Jeong wrote:
> From: Daeho Jeong 
> 
> By profiling f2fs compression works, I've found vmap() callings have
> unexpected hikes in the execution time in our test environment and
> those are bottlenecks of f2fs decompression path. Changing these with
> vm_map_ram(), we can enhance f2fs decompression speed pretty much.
> 
> [Verification]
> Android Pixel 3(ARM64, 6GB RAM, 128GB UFS)
> Turned on only 0-3 little cores(at 1.785GHz)
> 
> dd if=/dev/zero of=dummy bs=1m count=1000
> echo 3 > /proc/sys/vm/drop_caches
> dd if=dummy of=/dev/zero bs=512k
> 
> - w/o compression -
> 1048576000 bytes (0.9 G) copied, 2.082554 s, 480 M/s
> 1048576000 bytes (0.9 G) copied, 2.081634 s, 480 M/s
> 1048576000 bytes (0.9 G) copied, 2.090861 s, 478 M/s
> 
> - before patch -
> 1048576000 bytes (0.9 G) copied, 7.407527 s, 135 M/s
> 1048576000 bytes (0.9 G) copied, 7.283734 s, 137 M/s
> 1048576000 bytes (0.9 G) copied, 7.291508 s, 137 M/s
> 
> - after patch -
> 1048576000 bytes (0.9 G) copied, 1.998959 s, 500 M/s
> 1048576000 bytes (0.9 G) copied, 1.987554 s, 503 M/s
> 1048576000 bytes (0.9 G) copied, 1.986380 s, 503 M/s
>

The reason why I raised this was that I once observed vmap() vs
vm_map_ram() on an arm64 Kirin platform as well. It indeed had some
impact (~10%), but not as huge as this. Anyway, such a description
with the test environment looks OK.

Thanks,
Gao Xiang
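
(For reference, a minimal sketch of the two mapping interfaces being
compared, with signatures as of roughly v5.8; error handling elided and
the helper name is hypothetical:)

	#include <linux/mm.h>
	#include <linux/vmalloc.h>

	/* Hypothetical helper: map the same page array both ways (sketch only). */
	static void map_both_ways(struct page **pages, unsigned int nr)
	{
		void *buf;

		/* vmap(): takes the global vmalloc area paths every time; this
		 * is the slower call profiled above */
		buf = vmap(pages, nr, VM_MAP, PAGE_KERNEL);
		if (buf)
			vunmap(buf);

		/* vm_map_ram(): uses per-CPU vmap block caches for small counts,
		 * which is what the patch switches the (de)compression buffers to */
		buf = vm_map_ram(pages, nr, -1);	/* -1: any NUMA node */
		if (buf)
			vm_unmap_ram(buf, nr);
	}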

 



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: change virtual mapping way for compression pages

2020-08-11 Thread Gao Xiang
On Tue, Aug 11, 2020 at 08:21:23PM +0900, Daeho Jeong wrote:
> Sure, I'll update the test condition as you said in the commit message.
> FYI, the test is done with 16kb chunk and Pixel 3 (arm64) device.

Yeah, anyway, it'd be better to lock the frequency and offline the little
cores in your test as well (it'd make more sense). e.g. if a 16k cluster
is applied, even if all data is zeroed, the count of vmap/vm_map_ram calls
isn't huge (and as you said, "sometimes, it has a very long delay", which
looks much like another scheduling concern as well).

Anyway, I'm not against your commit, but the commit message is a bit
unclear. At least, if you think that is really the case, I'm OK
with that.

Thanks,
Gao Xiang 

> 
> Thanks,
> 
> On Tue, Aug 11, 2020 at 7:18 PM Gao Xiang wrote:
> >
> > On Tue, Aug 11, 2020 at 06:33:26PM +0900, Daeho Jeong wrote:
> > > Plus, when we use vmap(), vmap() normally executes in a short time
> > > like vm_map_ram().
> > > But, sometimes, it has a very long delay.
> > >
> > > On Tue, Aug 11, 2020 at 6:28 PM Daeho Jeong wrote:
> > > >
> > > > Actually, as you can see, I use the whole zero data blocks in the test 
> > > > file.
> > > > It can maximize the effect of changing virtual mapping.
> > > > When I use normal files which can be compressed about 70% from the
> > > > original file,
> > > > The vm_map_ram() version is about 2x faster than vmap() version.
> >
> > What f2fs does is much similar to btrfs compression. Even if these
> > blocks are all zeroed. In principle, the maximum compression ratio
> > is determined (cluster sized blocks into one compressed block, e.g
> > 16k cluster into one compressed block).
> >
> > So it'd be better to describe your configured cluster size (16k or
> > 128k) and your hardware information in the commit message as well.
> >
> > Actually, I also tried with this patch as well on my x86 laptop just
> > now with FIO (I didn't use zeroed block though), and I didn't notice
> > much difference with turbo boost off and maxfreq.
> >
> > I'm not arguing this commit, just a note about this commit message.
> > > > > >> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
> >
> > IMHO, the above number is much like decompressing in the arm64 little cores.
> >
> > Thanks,
> > Gao Xiang
> >
> >
> > > >
> > > > On Tue, Aug 11, 2020 at 4:55 PM Chao Yu wrote:
> > > > >
> > > > > On 2020/8/11 15:15, Gao Xiang wrote:
> > > > > > On Tue, Aug 11, 2020 at 12:37:53PM +0900, Daeho Jeong wrote:
> > > > > >> From: Daeho Jeong 
> > > > > >>
> > > > > >> By profiling f2fs compression works, I've found vmap() callings are
> > > > > >> bottlenecks of f2fs decompression path. Changing these with
> > > > > >> vm_map_ram(), we can enhance f2fs decompression speed pretty much.
> > > > > >>
> > > > > >> [Verification]
> > > > > >> dd if=/dev/zero of=dummy bs=1m count=1000
> > > > > >> echo 3 > /proc/sys/vm/drop_caches
> > > > > >> dd if=dummy of=/dev/zero bs=512k
> > > > > >>
> > > > > >> - w/o compression -
> > > > > >> 1048576000 bytes (0.9 G) copied, 1.999384 s, 500 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 2.035988 s, 491 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 2.039457 s, 490 M/s
> > > > > >>
> > > > > >> - before patch -
> > > > > >> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
> > > > > >>
> > > > > >> - after patch -
> > > > > >> 1048576000 bytes (0.9 G) copied, 2.253441 s, 444 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 2.739764 s, 365 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 2.185649 s, 458 M/s
> > > > > >
> > > > > > Indeed, vmap() approach has some impact on the whole
> > > > > > workflow. But I don't think the gap is such significant,
> > > > > > maybe it relates to unlocked cpufreq (and big little
> > > > > > core difference if it's on some arm64 board).
> > > > >
> > > > > Agreed,
> > > > >
> > > > > I guess there should be other reason causing the large performance
> > > > > gap, scheduling, frequency, or something else.
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > ___
> > > > > > Linux-f2fs-devel mailing list
> > > > > > Linux-f2fs-devel@lists.sourceforge.net
> > > > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> > > > > > .
> > > > > >
> > >
> >
> 



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: change virtual mapping way for compression pages

2020-08-11 Thread Gao Xiang
On Tue, Aug 11, 2020 at 06:33:26PM +0900, Daeho Jeong wrote:
> Plus, when we use vmap(), vmap() normally executes in a short time
> like vm_map_ram().
> But, sometimes, it has a very long delay.
> 
> On Tue, Aug 11, 2020 at 6:28 PM Daeho Jeong wrote:
> >
> > Actually, as you can see, I use the whole zero data blocks in the test file.
> > It can maximize the effect of changing virtual mapping.
> > When I use normal files which can be compressed about 70% from the
> > original file,
> > The vm_map_ram() version is about 2x faster than vmap() version.

What f2fs does is much like btrfs compression, even if these blocks are
all zeroed: in principle, the maximum compression ratio is bounded
(cluster-sized blocks into at least one compressed block, e.g. a 16k
cluster into one compressed block).

So it'd be better to describe your configured cluster size (16k or
128k) and your hardware information in the commit message as well.

Actually, I also tried this patch on my x86 laptop just now with FIO
(I didn't use zeroed blocks though), and I didn't notice much difference
with turbo boost off and max frequency.

I'm not arguing against this commit, just a note about the commit message.
> > > >> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> > > >> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> > > >> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s

IMHO, the above numbers look much like decompression on the arm64 little cores.

Thanks,
Gao Xiang


> >
> > On Tue, Aug 11, 2020 at 4:55 PM Chao Yu wrote:
> > >
> > > On 2020/8/11 15:15, Gao Xiang wrote:
> > > > On Tue, Aug 11, 2020 at 12:37:53PM +0900, Daeho Jeong wrote:
> > > >> From: Daeho Jeong 
> > > >>
> > > >> By profiling f2fs compression works, I've found vmap() callings are
> > > >> bottlenecks of f2fs decompression path. Changing these with
> > > >> vm_map_ram(), we can enhance f2fs decompression speed pretty much.
> > > >>
> > > >> [Verification]
> > > >> dd if=/dev/zero of=dummy bs=1m count=1000
> > > >> echo 3 > /proc/sys/vm/drop_caches
> > > >> dd if=dummy of=/dev/zero bs=512k
> > > >>
> > > >> - w/o compression -
> > > >> 1048576000 bytes (0.9 G) copied, 1.999384 s, 500 M/s
> > > >> 1048576000 bytes (0.9 G) copied, 2.035988 s, 491 M/s
> > > >> 1048576000 bytes (0.9 G) copied, 2.039457 s, 490 M/s
> > > >>
> > > >> - before patch -
> > > >> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> > > >> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> > > >> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
> > > >>
> > > >> - after patch -
> > > >> 1048576000 bytes (0.9 G) copied, 2.253441 s, 444 M/s
> > > >> 1048576000 bytes (0.9 G) copied, 2.739764 s, 365 M/s
> > > >> 1048576000 bytes (0.9 G) copied, 2.185649 s, 458 M/s
> > > >
> > > > Indeed, vmap() approach has some impact on the whole
> > > > workflow. But I don't think the gap is such significant,
> > > > maybe it relates to unlocked cpufreq (and big little
> > > > core difference if it's on some arm64 board).
> > >
> > > Agreed,
> > >
> > > I guess there should be other reason causing the large performance
> > > gap, scheduling, frequency, or something else.
> > >
> > > >
> > > >
> > > >
> > > > ___
> > > > Linux-f2fs-devel mailing list
> > > > Linux-f2fs-devel@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> > > > .
> > > >
> 



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: change virtual mapping way for compression pages

2020-08-11 Thread Gao Xiang
On Tue, Aug 11, 2020 at 12:37:53PM +0900, Daeho Jeong wrote:
> From: Daeho Jeong 
> 
> By profiling f2fs compression works, I've found vmap() callings are
> bottlenecks of f2fs decompression path. Changing these with
> vm_map_ram(), we can enhance f2fs decompression speed pretty much.
> 
> [Verification]
> dd if=/dev/zero of=dummy bs=1m count=1000
> echo 3 > /proc/sys/vm/drop_caches
> dd if=dummy of=/dev/zero bs=512k
> 
> - w/o compression -
> 1048576000 bytes (0.9 G) copied, 1.999384 s, 500 M/s
> 1048576000 bytes (0.9 G) copied, 2.035988 s, 491 M/s
> 1048576000 bytes (0.9 G) copied, 2.039457 s, 490 M/s
> 
> - before patch -
> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
> 
> - after patch -
> 1048576000 bytes (0.9 G) copied, 2.253441 s, 444 M/s
> 1048576000 bytes (0.9 G) copied, 2.739764 s, 365 M/s
> 1048576000 bytes (0.9 G) copied, 2.185649 s, 458 M/s

Indeed, the vmap() approach has some impact on the whole
workflow. But I don't think the gap is that significant;
maybe it relates to unlocked cpufreq (and the big.LITTLE
core difference if it's on some arm64 board).



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] IO hang due to f2fs checkpoint and writeback stuck

2020-07-09 Thread Gao Xiang via Linux-f2fs-devel
On Fri, Jul 10, 2020 at 10:54:13AM +0800, Chao Yu wrote:
> Hi Sahitya,
> 
> It looks block plug has already been removed by Jaegeuk with
> below commit:
> 
> commit 1f5f11a3c41e2b23288b2769435a00f74e02496b
> Author: Jaegeuk Kim 
> Date:   Fri May 8 12:25:45 2020 -0700
> 
> f2fs: remove blk_plugging in block_operations
> 
> blk_plugging doesn't seem to give any benefit.
> 
> How about backporting this patch?

Yeah, also notice that

commit bd900d4580107c899d43b262fbbd995f11097a43
Author: Jens Axboe 
Date:   Mon Apr 18 22:06:57 2011 +0200

block: kill blk_flush_plug_list() export

With all drivers and file systems converted, we only have
in-core use of this function. So remove the export.

Reporteed-by: Christoph Hellwig 
Signed-off-by: Jens Axboe 

blk_flush_plug_list() is not an exported symbol now and is only for in-core
use, as is blk_flush_plug().

Thanks,
Gao Xiang
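
(For context, a minimal sketch of the plugging API being discussed: batched
bios sit on a per-task list and blk_finish_plug() flushes them through the
in-core path mentioned above. The helper name is hypothetical:)

	#include <linux/bio.h>
	#include <linux/blkdev.h>

	/* Hypothetical helper: batch a set of bios under one plug (sketch only). */
	static void submit_batch(struct bio **bios, int nr)
	{
		struct blk_plug plug;
		int i;

		blk_start_plug(&plug);		/* queue requests on a per-task list */
		for (i = 0; i < nr; i++)
			submit_bio(bios[i]);
		blk_finish_plug(&plug);		/* flush the batched requests */
	}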



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] Fwd: Any tools of f2fs to inspect infos?

2020-07-01 Thread Gao Xiang
(cc linux-f2fs-devel, Jaegeuk, Chao.
 It'd be better to ask related people and cc the corresponding list.)

On Wed, Jul 01, 2020 at 03:29:41PM +0800, lampahome wrote:
> As title
> 
> Any tools of f2fs to inspect like allocated segments, hot/warm/cold
> ratio, or gc is running?
> 
> thx



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: compress: allow lz4 to compress data partially

2020-05-08 Thread Gao Xiang via Linux-f2fs-devel
Hi Chao,

On Fri, May 08, 2020 at 05:47:09PM +0800, Chao Yu wrote:
> For lz4 worst compress case, caller should allocate buffer with size
> of LZ4_compressBound(inputsize) for target compressed data storing.
> 
> However lz4 supports partial data compression, so we can get rid of
> output buffer size limitation now, then we can avoid 2 * 4KB size
> intermediate buffer allocation when log_cluster_size is 2, and avoid
> unnecessary compressing work of compressor if we can not save at
> least 4KB space.
> 
> Suggested-by: Daeho Jeong 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/compress.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> index 5e4947250262..23825f559bcf 100644
> --- a/fs/f2fs/compress.c
> +++ b/fs/f2fs/compress.c
> @@ -228,7 +228,12 @@ static int lz4_init_compress_ctx(struct compress_ctx *cc)
>   if (!cc->private)
>   return -ENOMEM;
>  
> - cc->clen = LZ4_compressBound(PAGE_SIZE << cc->log_cluster_size);
> + /*
> +  * we do not change cc->clen to LZ4_compressBound(inputsize) to
> +  * adapt worst compress case, because lz4 algorithm supports partial
> +  * compression.

Actually, in this case the lz4 compressed block is not valid (at least it does
not end with a valid lz4 EOF), and AFAIK the current public lz4 API cannot keep
on compressing such a block. So IMO "partial compression" for an invalid lz4
block may be confusing.

I think some words like "because the lz4 implementation can handle the output
buffer budget properly" or similar stuff could be better.

The same goes for the patch title and the commit message.

Thanks,
Gao Xiang


> +  */
> + cc->clen = cc->rlen - PAGE_SIZE - COMPRESS_HEADER_SIZE;
>   return 0;
>  }
>  
> @@ -244,11 +249,9 @@ static int lz4_compress_pages(struct compress_ctx *cc)
>  
>   len = LZ4_compress_default(cc->rbuf, cc->cbuf->cdata, cc->rlen,
>   cc->clen, cc->private);
> - if (!len) {
> - printk_ratelimited("%sF2FS-fs (%s): lz4 compress failed\n",
> - KERN_ERR, F2FS_I_SB(cc->inode)->sb->s_id);
> - return -EIO;
> - }
> + if (!len)
> + return -EAGAIN;
> +
>   cc->clen = len;
>   return 0;
>  }
> -- 
> 2.18.0.rc1
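
(A userspace sketch of the same trick, for reference: LZ4_compress_default()
returns 0 when the output cannot fit the given budget, so capping the budget
below the input size doubles as a "worth compressing?" check. The threshold
mirrors the patch; the constants and helper name are illustrative assumptions:)

	#include <lz4.h>

	#define BLKSZ	4096
	#define HDRSZ	32	/* stand-in for COMPRESS_HEADER_SIZE */

	/* returns the compressed length, or 0 to keep the data uncompressed */
	static int compress_if_worth_it(const char *src, char *dst, int rlen)
	{
		int budget = rlen - BLKSZ - HDRSZ; /* must save >= one block */

		if (budget <= 0)
			return 0;
		/* 0 means "did not fit the budget": caller stores it raw,
		 * much like the -EAGAIN path in the patch above */
		return LZ4_compress_default(src, dst, rlen, budget);
	}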


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: get parent inode when recovering pino

2020-05-07 Thread Gao Xiang
Hi Chao,

On Thu, May 07, 2020 at 02:38:39PM +0800, Chao Yu wrote:
> On 2020/5/7 6:36, Gao Xiang wrote:
> > On Wed, May 06, 2020 at 12:16:13PM -0700, Eric Biggers wrote:
> >> On Wed, May 06, 2020 at 02:47:19PM +0800, Gao Xiang wrote:
> >>> On Wed, May 06, 2020 at 09:58:22AM +0800, Gao Xiang wrote:
> >>>> On Tue, May 05, 2020 at 06:24:28PM -0700, Eric Biggers wrote:
> >>>>> On Wed, May 06, 2020 at 08:14:07AM +0800, Gao Xiang wrote:
> >>>>>>>
> >>>>>>> Actually, I think this is wrong because the fsync can be done via a 
> >>>>>>> file
> >>>>>>> descriptor that was opened to a now-deleted link to the file.
> >>>>>>
> >>>>>> I'm still confused about this...
> >>>>>>
> >>>>>> I don't know what's wrong with this version from my limited knowledge?
> >>>>>>  inode itself is locked when fsyncing, so
> >>>>>>
> >>>>>>if the fsync inode->i_nlink == 1, this inode has only one hard link
> >>>>>>(not deleted yet) and should belong to a single directory; and
> >>>>>>
> >>>>>>the only one parent directory would not go away (not deleted as 
> >>>>>> well)
> >>>>>>since there are some dirents in it (not empty).
> >>>>>>
> >>>>>> Could kindly explain more so I would learn more about this scenario?
> >>>>>> Thanks a lot!
> >>>>>
> >>>>> i_nlink == 1 just means that there is one non-deleted link.  There can 
> >>>>> be links
> >>>>> that have since been deleted, and file descriptors can still be open to 
> >>>>> them.
> >>>>
> >>>> Thanks for your inspiration. You are right, thanks.
> >>>>
> >>>> Correct my words... I didn't check f2fs code just now, it seems f2fs 
> >>>> doesn't
> >>>> take inode_lock as some other fs like __generic_file_fsync or 
> >>>> ubifs_fsync.
> >>>>
> >>>> And i_sem locks nlink / try_to_fix_pino similarly in some extent. It 
> >>>> seems
> >>>> no race by using d_find_alias here. Thanks again.
> >>>>
> >>>
> >>> (think more little bit just now...)
> >>>
> >>>  Thread 1:   Thread 2 (fsync):
> >>>   vfs_unlink  try_to_fix_pino
> >>> f2fs_unlink
> >>>f2fs_delete_entry
> >>>  f2fs_drop_nlink  (i_sem, inode->i_nlink = 1)
> >>>
> >>>   (...   but this dentry still hashed)  i_sem, check 
> >>> inode->i_nlink = 1
> >>> i_sem d_find_alias
> >>>
> >>>   d_delete
> >>>
> >>> I'm not sure if fsync could still use some wrong alias by chance..
> >>> completely untested, maybe just noise...
>
> Another race condition could be:
>
> Thread 1 (fsync)  Thread 2 (rename)
> - f2fs_sync_fs
> - try_to_fix_pino
>   - f2fs_rename
>- down_write
>- file_lost_pino
>- up_write
>  - down_write
>  - file_got_pino
>  - up_write

Yes, IMHO, I think it may not be proper to
take the dir lock in the fsync path anyway...

I would suggest the same approach as before (if it needs to be fixed).
And there seems to be no significant performance difference.

Thanks,
Gao Xiang

>
> Thanks,
>
> >>>
> >>
> >> Right, good observation.  My patch makes it better, but it's still broken.
> >>
> >> I don't know how to fix it.  If we see i_nlink == 1 and multiple hashed
> >> dentries, there doesn't appear to be a way to distingush which one 
> >> corresponds
> >> to the remaining link on-disk (if any; it may not even be in the dcache), 
> >> and
> >> which correspond to links that vfs_unlink() has deleted from disk but 
> >> hasn't yet
> >> done d_delete() on.
> >>
> >> One idea would be choose one, then take inode_lock_shared(dir) and do
> >> __f2fs_find_entry() to check if the dentry is really still on-disk.  That's
> >> heavyweight and error-prone though, and the locking could cause problems.
> >>
> >> I'm wondering though, does f2fs really need try_to_fix_pino() at all, and 
> >> did it
> >> ever really work?  It never actually updates the f2fs_inode::i_name to 
> >> match the
> >> new directory.  So independently of this bug with deleted links, I don't 
> >> see how
> >> it can possibly work as intended.
> >
> > Part of my humble opinion would be "update pino in rename/unlink/link... 
> > such ops
> > instead of in fsync" (maybe it makes better sense of locking)... But 
> > actually I'm
> > not a f2fs folk now, just curious about what the original patch resolved 
> > with
> > these new extra igrab/iput (as I said before, I could not find some clue 
> > previously).
> >
> > Thanks,
> > Gao Xiang
> >
> >>
> >> - Eric
> >
> >
> > ___
> > Linux-f2fs-devel mailing list
> > Linux-f2fs-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> > .
> >


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: get parent inode when recovering pino

2020-05-06 Thread Gao Xiang
On Wed, May 06, 2020 at 12:16:13PM -0700, Eric Biggers wrote:
> On Wed, May 06, 2020 at 02:47:19PM +0800, Gao Xiang wrote:
> > On Wed, May 06, 2020 at 09:58:22AM +0800, Gao Xiang wrote:
> > > On Tue, May 05, 2020 at 06:24:28PM -0700, Eric Biggers wrote:
> > > > On Wed, May 06, 2020 at 08:14:07AM +0800, Gao Xiang wrote:
> > > > > >
> > > > > > Actually, I think this is wrong because the fsync can be done via a 
> > > > > > file
> > > > > > descriptor that was opened to a now-deleted link to the file.
> > > > >
> > > > > I'm still confused about this...
> > > > >
> > > > > I don't know what's wrong with this version from my limited knowledge?
> > > > >  inode itself is locked when fsyncing, so
> > > > >
> > > > >if the fsync inode->i_nlink == 1, this inode has only one hard link
> > > > >(not deleted yet) and should belong to a single directory; and
> > > > >
> > > > >the only one parent directory would not go away (not deleted as 
> > > > > well)
> > > > >since there are some dirents in it (not empty).
> > > > >
> > > > > Could kindly explain more so I would learn more about this scenario?
> > > > > Thanks a lot!
> > > >
> > > > i_nlink == 1 just means that there is one non-deleted link.  There can 
> > > > be links
> > > > that have since been deleted, and file descriptors can still be open to 
> > > > them.
> > >
> > > Thanks for your inspiration. You are right, thanks.
> > >
> > > Correct my words... I didn't check f2fs code just now, it seems f2fs 
> > > doesn't
> > > take inode_lock as some other fs like __generic_file_fsync or ubifs_fsync.
> > >
> > > And i_sem locks nlink / try_to_fix_pino similarly in some extent. It seems
> > > no race by using d_find_alias here. Thanks again.
> > >
> >
> > (think more little bit just now...)
> >
> >  Thread 1:   Thread 2 (fsync):
> >   vfs_unlink  try_to_fix_pino
> > f2fs_unlink
> >f2fs_delete_entry
> >  f2fs_drop_nlink  (i_sem, inode->i_nlink = 1)
> >
> >   (...   but this dentry still hashed)  i_sem, check 
> > inode->i_nlink = 1
> > i_sem d_find_alias
> >
> >   d_delete
> >
> > I'm not sure if fsync could still use some wrong alias by chance..
> > completely untested, maybe just noise...
> >
>
> Right, good observation.  My patch makes it better, but it's still broken.
>
> I don't know how to fix it.  If we see i_nlink == 1 and multiple hashed
> dentries, there doesn't appear to be a way to distingush which one corresponds
> to the remaining link on-disk (if any; it may not even be in the dcache), and
> which correspond to links that vfs_unlink() has deleted from disk but hasn't 
> yet
> done d_delete() on.
>
> One idea would be choose one, then take inode_lock_shared(dir) and do
> __f2fs_find_entry() to check if the dentry is really still on-disk.  That's
> heavyweight and error-prone though, and the locking could cause problems.
>
> I'm wondering though, does f2fs really need try_to_fix_pino() at all, and did 
> it
> ever really work?  It never actually updates the f2fs_inode::i_name to match 
> the
> new directory.  So independently of this bug with deleted links, I don't see 
> how
> it can possibly work as intended.

Part of my humble opinion would be "update pino in rename/unlink/link and
such ops instead of in fsync" (maybe it makes better sense for locking)...
But actually I'm not an f2fs folk now, just curious about what the original
patch resolved with these new extra igrab/iput calls (as I said before, I
could not find a clue previously).

Thanks,
Gao Xiang

>
> - Eric


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: get parent inode when recovering pino

2020-05-06 Thread Gao Xiang
On Wed, May 06, 2020 at 09:58:22AM +0800, Gao Xiang wrote:
> On Tue, May 05, 2020 at 06:24:28PM -0700, Eric Biggers wrote:
> > On Wed, May 06, 2020 at 08:14:07AM +0800, Gao Xiang wrote:
> > > >
> > > > Actually, I think this is wrong because the fsync can be done via a file
> > > > descriptor that was opened to a now-deleted link to the file.
> > >
> > > I'm still confused about this...
> > >
> > > I don't know what's wrong with this version from my limited knowledge?
> > >  inode itself is locked when fsyncing, so
> > >
> > >if the fsync inode->i_nlink == 1, this inode has only one hard link
> > >(not deleted yet) and should belong to a single directory; and
> > >
> > >the only one parent directory would not go away (not deleted as well)
> > >since there are some dirents in it (not empty).
> > >
> > > Could kindly explain more so I would learn more about this scenario?
> > > Thanks a lot!
> >
> > i_nlink == 1 just means that there is one non-deleted link.  There can be 
> > links
> > that have since been deleted, and file descriptors can still be open to 
> > them.
>
> Thanks for your inspiration. You are right, thanks.
>
> Correct my words... I didn't check f2fs code just now, it seems f2fs doesn't
> take inode_lock as some other fs like __generic_file_fsync or ubifs_fsync.
>
> And i_sem locks nlink / try_to_fix_pino similarly in some extent. It seems
> no race by using d_find_alias here. Thanks again.
>

(thinking about it a little bit more just now...)

 Thread 1:   Thread 2 (fsync):
  vfs_unlink  try_to_fix_pino
f2fs_unlink
   f2fs_delete_entry
 f2fs_drop_nlink  (i_sem, inode->i_nlink = 1)

  (...   but this dentry still hashed)  i_sem, check 
inode->i_nlink = 1
i_sem d_find_alias

  d_delete

I'm not sure if fsync could still use some wrong alias by chance..
completely untested, maybe just noise...

Thanks,
Gao Xiang



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: get parent inode when recovering pino

2020-05-05 Thread Gao Xiang via Linux-f2fs-devel
On Tue, May 05, 2020 at 06:24:28PM -0700, Eric Biggers wrote:
> On Wed, May 06, 2020 at 08:14:07AM +0800, Gao Xiang wrote:
> > >
> > > Actually, I think this is wrong because the fsync can be done via a file
> > > descriptor that was opened to a now-deleted link to the file.
> > 
> > I'm still confused about this...
> > 
> > I don't know what's wrong with this version from my limited knowledge?
> >  inode itself is locked when fsyncing, so
> > 
> >if the fsync inode->i_nlink == 1, this inode has only one hard link
> >(not deleted yet) and should belong to a single directory; and
> > 
> >the only one parent directory would not go away (not deleted as well)
> >since there are some dirents in it (not empty).
> > 
> > Could kindly explain more so I would learn more about this scenario?
> > Thanks a lot!
> 
> i_nlink == 1 just means that there is one non-deleted link.  There can be 
> links
> that have since been deleted, and file descriptors can still be open to them.

Thanks for your inspiration. You are right, thanks.

Correct my words... I didn't check the f2fs code just now; it seems f2fs
doesn't take inode_lock as some other filesystems do, like
__generic_file_fsync or ubifs_fsync.

And i_sem locks nlink / try_to_fix_pino similarly to some extent. It seems
there is no race when using d_find_alias here. Thanks again.

Thanks,
Gao Xiang

> 
> > 
> > >
> > > We need to find the dentry whose parent directory is still exists, i.e. 
> > > the
> > > parent directory that is counting towards 'inode->i_nlink == 1'.
> > 
> > directory counting towards 'inode->i_nlink == 1', what's happening?
> 
> The non-deleted link is the one counted in i_nlink.
> 
> > 
> > >
> > > I think d_find_alias() is what we're looking for.
> > 
> > It may be simply dentry->d_parent (stable/positive as you said before, and 
> > it's
> > not empty). why need to d_find_alias()?
> 
> Because we need to get the dentry that hasn't been deleted yet, which isn't
> necessarily the one associated with the file descriptor being fsync()'ed.
> 
> > And what is the original problem? I could not get some clue from the 
> > original
> > patch description (I only saw some extra igrab/iput because of some unknown
> > reasons), it there some backtrace related to the problem?
> 
> The problem is that i_pino gets set incorrectly.  I just noticed this while
> reviewing the code.  It's not hard to reproduce, e.g.:
> 
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> 
> int main()
> {
> int fd;
> 
> mkdir("dir1", 0700);
> mkdir("dir2", 0700);
> mknod("dir1/file", S_IFREG|0600, 0);
> link("dir1/file", "dir2/file");
> fd = open("dir2/file", O_WRONLY);
> unlink("dir2/file");
> write(fd, "X", 1);
> fsync(fd);
> }
> 
> Then:
> 
> sync
> echo N | dump.f2fs -i $(stat -c %i dir1/file) /dev/vdb | grep 'i_pino'
> echo "dir1 (correct): $(stat -c %i dir1)"
> echo "dir2 (wrong): $(stat -c %i dir2)"
> 
> i_pino will point to dir2 rather than dir1 as expected.
> 
> - Eric
> 
> 
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: get parent inode when recovering pino

2020-05-05 Thread Gao Xiang
Hi Eric,

On Tue, May 05, 2020 at 11:19:41AM -0700, Eric Biggers wrote:
> On Tue, May 05, 2020 at 11:13:23AM -0700, Jaegeuk Kim wrote:

...

> >
> > -static int get_parent_ino(struct inode *inode, nid_t *pino)
> > -{
> > -   struct dentry *dentry;
> > -
> > -   inode = igrab(inode);
> > -   dentry = d_find_any_alias(inode);
> > -   iput(inode);
> > -   if (!dentry)
> > -   return 0;
> > -
> > -   *pino = parent_ino(dentry);
> > -   dput(dentry);
> > -   return 1;
> > -}
> > -
> >  static inline enum cp_reason_type need_do_checkpoint(struct inode *inode)
> >  {
> > struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > @@ -223,14 +208,15 @@ static bool need_inode_page_update(struct 
> > f2fs_sb_info *sbi, nid_t ino)
> > return ret;
> >  }
> >
> > -static void try_to_fix_pino(struct inode *inode)
> > +static void try_to_fix_pino(struct dentry *dentry)
> >  {
> > +   struct inode *inode = d_inode(dentry);
> > struct f2fs_inode_info *fi = F2FS_I(inode);
> > -   nid_t pino;
> >
> > 	down_write(&fi->i_sem);
> > -   if (file_wrong_pino(inode) && inode->i_nlink == 1 &&
> > -   get_parent_ino(inode, &pino)) {
> > +   if (file_wrong_pino(inode) && inode->i_nlink == 1) {
> > +   nid_t pino = parent_ino(dentry);
> > +
> > f2fs_i_pino_write(inode, pino);
> > file_got_pino(inode);
> > }
> > @@ -310,7 +296,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t 
> > start, loff_t end,
> >  * We've secured consistency through sync_fs. Following pino
> >  * will be used only for fsynced inodes after checkpoint.
> >  */
> > -   try_to_fix_pino(inode);
> > +   try_to_fix_pino(file_dentry(file));
> > clear_inode_flag(inode, FI_APPEND_WRITE);
> > clear_inode_flag(inode, FI_UPDATE_WRITE);
> > goto out;
>
> Actually, I think this is wrong because the fsync can be done via a file
> descriptor that was opened to a now-deleted link to the file.

I'm still confused about this...

From my limited knowledge, I don't know what's wrong with this version:
 the inode itself is locked when fsyncing, so

   if the fsync inode->i_nlink == 1, this inode has only one hard link
   (not deleted yet) and should belong to a single directory; and

   the only one parent directory would not go away (not deleted as well)
   since there are some dirents in it (not empty).

Could you kindly explain more so I can learn more about this scenario?
Thanks a lot!

>
> We need to find the dentry whose parent directory is still exists, i.e. the
> parent directory that is counting towards 'inode->i_nlink == 1'.

A directory counting towards 'inode->i_nlink == 1'? What's happening here?

>
> I think d_find_alias() is what we're looking for.

It may be simply dentry->d_parent (stable/positive as you said before, and it's
not empty). Why the need for d_find_alias()?


And what is the original problem? I could not get a clue from the original
patch description (I only saw some extra igrab/iput calls for some unknown
reason); is there some backtrace related to the problem?

Thanks,
Gao Xiang

>
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 6ab8f621a3c5..855f27468baa 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -165,13 +165,13 @@ static int get_parent_ino(struct inode *inode, nid_t 
> *pino)
>  {
> struct dentry *dentry;
>
> -   inode = igrab(inode);
> -   dentry = d_find_any_alias(inode);
> -   iput(inode);
> +   dentry = d_find_alias(inode);
> if (!dentry)
> return 0;
>
>
>
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v11 19/25] erofs: Convert compressed files from readpages to readahead

2020-04-21 Thread Gao Xiang via Linux-f2fs-devel
Hi Andrew,

On Mon, Apr 20, 2020 at 10:42:10PM -0700, Andrew Morton wrote:
> On Tue, 14 Apr 2020 08:02:27 -0700 Matthew Wilcox  wrote:
> 
> > 
> > Use the new readahead operation in erofs.
> > 
> 
> Well this is exciting.
> 
> fs/erofs/data.c: In function erofs_raw_access_readahead:
> fs/erofs/data.c:149:18: warning: last_block may be used uninitialized in this 
> function [-Wmaybe-uninitialized]
>   *last_block + 1 != current_block) {
> 
> It seems to be a preexisting bug, which your patch prompted gcc-7.2.0
> to notice.
> 
> erofs_read_raw_page() goes in and uses *last_block, but neither of its
> callers has initialized it.  Could the erofs maintainers please take a
> look?

Simply because last_block doesn't need to be initialized at first:
bio == NULL in the beginning anyway. I believe this is a gcc false
positive; some gcc versions have raised such warnings before (many gccs
don't, including my current gcc (Debian 8.3.0-6) 8.3.0).

in detail,

146 /* note that for readpage case, bio also equals to NULL */
147 if (bio &&
148 /* not continuous */
149 *last_block + 1 != current_block) {
150 submit_bio_retry:
151 submit_bio(bio);
152 bio = NULL;
153 }

bio will be NULL and will bypass this condition at first.
After that,

155 if (!bio) {

...

221 bio = bio_alloc(GFP_NOIO, nblocks);

...

}

...

230 err = bio_add_page(bio, page, PAGE_SIZE, 0);
231 /* out of the extent or bio is full */
232 if (err < PAGE_SIZE)
233 goto submit_bio_retry;
234
235 *last_block = current_block;

so by then bio != NULL, and last_block will have been assigned as well.

Thanks,
Gao Xiang
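
(The same control-flow pattern in miniature, for reference: the value is
read only under a guard that implies it was already assigned on an earlier
iteration, which some gcc versions cannot prove:)

	/* Minimal reproduction of the warning pattern (sketch): 'last' is
	 * read only when 'started' is set, and 'started' implies 'last' was
	 * assigned on an earlier iteration; some gcc versions cannot prove
	 * this and warn that 'last' may be used uninitialized.
	 */
	static int count_breaks(const int *blocks, int n)
	{
		int last;			/* deliberately not initialized */
		int started = 0, breaks = 0, i;

		for (i = 0; i < n; i++) {
			if (started && blocks[i] != last + 1)	/* not continuous */
				breaks++;
			last = blocks[i];
			started = 1;
		}
		return breaks;
	}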




___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6 12/19] erofs: Convert uncompressed files from readpages to readahead

2020-02-18 Thread Gao Xiang
On Mon, Feb 17, 2020 at 10:46:01AM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" 
> 
> Use the new readahead operation in erofs
> 
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---

It looks good to me, and I will test it later as well.

Acked-by: Gao Xiang 

Thanks,
Gao Xiang

>  fs/erofs/data.c  | 39 +---
>  fs/erofs/zdata.c |  2 +-
>  include/trace/events/erofs.h |  6 +++---
>  3 files changed, 18 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index fc3a8d8064f8..82ebcee9d178 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -280,47 +280,36 @@ static int erofs_raw_access_readpage(struct file *file, 
> struct page *page)
>   return 0;
>  }
>  
> -static int erofs_raw_access_readpages(struct file *filp,
> -   struct address_space *mapping,
> -   struct list_head *pages,
> -   unsigned int nr_pages)
> +static void erofs_raw_access_readahead(struct readahead_control *rac)
>  {
>   erofs_off_t last_block;
>   struct bio *bio = NULL;
> - gfp_t gfp = readahead_gfp_mask(mapping);
> - struct page *page = list_last_entry(pages, struct page, lru);
> -
> - trace_erofs_readpages(mapping->host, page, nr_pages, true);
> + struct page *page;
>  
> - for (; nr_pages; --nr_pages) {
> - page = list_entry(pages->prev, struct page, lru);
> + trace_erofs_readpages(rac->mapping->host, readahead_index(rac),
> + readahead_count(rac), true);
>  
> + readahead_for_each(rac, page) {
> 	prefetchw(&page->flags);
> -	list_del(&page->lru);
>  
> - if (!add_to_page_cache_lru(page, mapping, page->index, gfp)) {
> - bio = erofs_read_raw_page(bio, mapping, page,
> -   &last_block, nr_pages, true);
> + bio = erofs_read_raw_page(bio, rac->mapping, page, &last_block,
> + readahead_count(rac), true);
>  
> - /* all the page errors are ignored when readahead */
> - if (IS_ERR(bio)) {
> - pr_err("%s, readahead error at page %lu of nid 
> %llu\n",
> -__func__, page->index,
> -EROFS_I(mapping->host)->nid);
> + /* all the page errors are ignored when readahead */
> + if (IS_ERR(bio)) {
> + pr_err("%s, readahead error at page %lu of nid %llu\n",
> +__func__, page->index,
> +EROFS_I(rac->mapping->host)->nid);
>  
> - bio = NULL;
> - }
> + bio = NULL;
>   }
>  
> - /* pages could still be locked */
>   put_page(page);
>   }
> - DBG_BUGON(!list_empty(pages));
>  
>   /* the rare case (end in gaps) */
>   if (bio)
>   submit_bio(bio);
> - return 0;
>  }
>  
>  static int erofs_get_block(struct inode *inode, sector_t iblock,
> @@ -358,7 +347,7 @@ static sector_t erofs_bmap(struct address_space *mapping, 
> sector_t block)
>  /* for uncompressed (aligned) files and raw access for other files */
>  const struct address_space_operations erofs_raw_access_aops = {
>   .readpage = erofs_raw_access_readpage,
> - .readpages = erofs_raw_access_readpages,
> + .readahead = erofs_raw_access_readahead,
>   .bmap = erofs_bmap,
>  };
>  
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 80e47f07d946..17f45fcb8c5c 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -1315,7 +1315,7 @@ static int z_erofs_readpages(struct file *filp, struct 
> address_space *mapping,
>   struct page *head = NULL;
>   LIST_HEAD(pagepool);
>  
> - trace_erofs_readpages(mapping->host, lru_to_page(pages),
> + trace_erofs_readpages(mapping->host, lru_to_page(pages)->index,
> nr_pages, false);
>  
>   f.headoffset = (erofs_off_t)lru_to_page(pages)->index << PAGE_SHIFT;
> diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
> index 27f5caa6299a..bf9806fd1306 100644
> --- a/include/trace/events/erofs.h
> +++ b/include/trace/events/erofs.h
> @@ -113,10 +113,10 @@ TRACE_EVENT(erofs_readpage,
>  
>  TRACE_EVENT(erofs_readpages,
>  
> - TP_PROTO(struct inode *inode, struct page *page, unsigned int nrpage,
>

Re: [f2fs-dev] [PATCH v6 11/16] erofs: Convert compressed files from readpages to readahead

2020-02-18 Thread Gao Xiang
On Mon, Feb 17, 2020 at 10:46:00AM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" 
> 
> Use the new readahead operation in erofs.
> 
> Signed-off-by: Matthew Wilcox (Oracle) 

It looks good to me; although some further optimization exists,
we could make a straightforward transform first. I haven't tested
the whole series yet...
Will test it later.

Acked-by: Gao Xiang 

Thanks,
Gao Xiang

> ---
>  fs/erofs/zdata.c | 29 +
>  1 file changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 17f45fcb8c5c..7c02015d501d 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -1303,28 +1303,23 @@ static bool should_decompress_synchronously(struct 
> erofs_sb_info *sbi,
>   return nr <= sbi->max_sync_decompress_pages;
>  }
>  
> -static int z_erofs_readpages(struct file *filp, struct address_space 
> *mapping,
> -  struct list_head *pages, unsigned int nr_pages)
> +static void z_erofs_readahead(struct readahead_control *rac)
>  {
> - struct inode *const inode = mapping->host;
> + struct inode *const inode = rac->mapping->host;
>   struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
>  
> - bool sync = should_decompress_synchronously(sbi, nr_pages);
> + bool sync = should_decompress_synchronously(sbi, readahead_count(rac));
>   struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
> - gfp_t gfp = mapping_gfp_constraint(mapping, GFP_KERNEL);
> - struct page *head = NULL;
> + struct page *page, *head = NULL;
>   LIST_HEAD(pagepool);
>  
> - trace_erofs_readpages(mapping->host, lru_to_page(pages)->index,
> -   nr_pages, false);
> + trace_erofs_readpages(inode, readahead_index(rac),
> + readahead_count(rac), false);
>  
> - f.headoffset = (erofs_off_t)lru_to_page(pages)->index << PAGE_SHIFT;
> -
> - for (; nr_pages; --nr_pages) {
> - struct page *page = lru_to_page(pages);
> + f.headoffset = readahead_offset(rac);
>  
> + readahead_for_each(rac, page) {
>   prefetchw(&page->flags);
> - list_del(&page->lru);
>  
>   /*
>* A pure asynchronous readahead is indicated if
> @@ -1333,11 +1328,6 @@ static int z_erofs_readpages(struct file *filp, struct 
> address_space *mapping,
>*/
>   sync &= !(PageReadahead(page) && !head);
>  
> - if (add_to_page_cache_lru(page, mapping, page->index, gfp)) {
> - list_add(&page->lru, &pagepool);
> - continue;
> - }
> -
>   set_page_private(page, (unsigned long)head);
>   head = page;
>   }
> @@ -1366,11 +1356,10 @@ static int z_erofs_readpages(struct file *filp, 
> struct address_space *mapping,
>  
>   /* clean up the remaining free pages */
>   put_pages_list(&pagepool);
> - return 0;
>  }
>  
>  const struct address_space_operations z_erofs_aops = {
>   .readpage = z_erofs_readpage,
> - .readpages = z_erofs_readpages,
> + .readahead = z_erofs_readahead,
>  };
>  
> -- 
> 2.25.0
> 
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] ext4: fix race conditions in ->d_compare() and ->d_hash()

2020-01-23 Thread Gao Xiang via Linux-f2fs-devel
On Thu, Jan 23, 2020 at 09:42:56PM -0800, Eric Biggers wrote:
> On Fri, Jan 24, 2020 at 01:34:23PM +0800, Gao Xiang wrote:
> > On Thu, Jan 23, 2020 at 09:16:01PM -0800, Eric Biggers wrote:
> > 
> > []
> > 
> > > So we need READ_ONCE() to ensure that a consistent value is used.
> > 
> > By the way, my understanding is all pointer could be accessed
> > atomicly guaranteed by compiler. In my opinion, we generally
> > use READ_ONCE() on pointers for other uses (such as, avoid
> > accessing a variable twice due to compiler optimization and
> > it will break some logic potentially or need some data
> > dependency barrier...)
> > 
> > Thanks,
> > Gao Xiang
> 
> But that *is* why we need READ_ONCE() here.  Without it, there's no guarantee
> that the compiler doesn't load the variable twice.  Please read:
> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

After scanning the patch, it seems the parent variable (dentry->d_parent)
is only referenced once, as below:

-   struct inode *inode = dentry->d_parent->d_inode;
+   const struct dentry *parent = READ_ONCE(dentry->d_parent);
+   const struct inode *inode = READ_ONCE(parent->d_inode);

So I think it is enough to write

const struct inode *inode = READ_ONCE(dentry->d_parent->d_inode);

to access the parent inode only once, avoiding the parent inode being
accessed multiple times (and all pointer dereferences should be atomic
by the compiler), as one reason, in

if (!inode || !IS_CASEFOLDED(inode) || ...

or etc.

Thanks for the web reference, I will look into it. I think there
is no worry about dentry->d_parent here because there is only one
dereference of dentry->d_parent.

You could ignore my words anyway, just my little thought.
The other parts of the patch are ok.

Thanks,
Gao Xiang

> 
> - Eric
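
As a side note, the reload hazard is easy to demonstrate outside the kernel.
Below is a compilable userspace sketch (READ_ONCE() is approximated with a
volatile cast; is_casefolded() is a stand-in, not the real code):

#define READ_ONCE(x)	(*(const volatile typeof(x) *)&(x))

struct inode;
struct dentry {
	struct dentry *d_parent;
	struct inode *d_inode;
};

extern int is_casefolded(const struct inode *inode);

/* Racy: ->d_inode is dereferenced twice; a concurrent store may set it
 * to NULL between the check and the use, and without READ_ONCE() the
 * compiler is also free to load it twice. */
int d_compare_racy(const struct dentry *dentry)
{
	if (dentry->d_parent->d_inode)
		return is_casefolded(dentry->d_parent->d_inode);
	return 0;
}

/* Safe: one forced load each; the value tested is the value used. */
int d_compare_safe(const struct dentry *dentry)
{
	const struct dentry *parent = READ_ONCE(dentry->d_parent);
	const struct inode *inode = READ_ONCE(parent->d_inode);

	if (inode)
		return is_casefolded(inode);
	return 0;
}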


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] ext4: fix race conditions in ->d_compare() and ->d_hash()

2020-01-23 Thread Gao Xiang via Linux-f2fs-devel
On Thu, Jan 23, 2020 at 09:16:01PM -0800, Eric Biggers wrote:

[]

> So we need READ_ONCE() to ensure that a consistent value is used.

By the way, my understanding is that all pointers can be accessed
atomically, as guaranteed by the compiler. In my opinion, we generally
use READ_ONCE() on pointers for other purposes (such as avoiding
accessing a variable twice due to compiler optimization, which could
potentially break some logic, or needing some data
dependency barrier...)

Thanks,
Gao Xiang




___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] ext4: fix race conditions in ->d_compare() and ->d_hash()

2020-01-23 Thread Gao Xiang via Linux-f2fs-devel
On Thu, Jan 23, 2020 at 09:16:01PM -0800, Eric Biggers wrote:
> On Fri, Jan 24, 2020 at 01:04:25PM +0800, Gao Xiang wrote:
> > > diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> > > index 8964778aabefb..0129d14629881 100644
> > > --- a/fs/ext4/dir.c
> > > +++ b/fs/ext4/dir.c
> > > @@ -671,9 +671,11 @@ static int ext4_d_compare(const struct dentry 
> > > *dentry, unsigned int len,
> > > const char *str, const struct qstr *name)
> > >  {
> > >   struct qstr qstr = {.name = str, .len = len };
> > > - struct inode *inode = dentry->d_parent->d_inode;
> > > + const struct dentry *parent = READ_ONCE(dentry->d_parent);
> > 
> > I'm not sure if we really need READ_ONCE d_parent here (p.s. d_parent
> > won't be NULL anyway), and d_seq will guard all its validity. If I'm
> > wrong, correct me kindly...
> > 
> > Otherwise, it looks good to me...
> > Reviewed-by: Gao Xiang 
> > 
> 
> While d_parent can't be set to NULL, it can still be changed concurrently.
> So we need READ_ONCE() to ensure that a consistent value is used.

If I understand correctly, the unlazy RCU->ref-walk transition will be
guarded by the seqlock, and for ref-walk we have d_lock (and even the
parent lock) in the relevant paths. So I prematurely think there is no
race with renaming or unlinking eventually.

I'm curious about this; perhaps the experts could correct me.

Thanks,
Gao Xiang

> 
> - Eric


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] ext4: fix race conditions in ->d_compare() and ->d_hash()

2020-01-23 Thread Gao Xiang via Linux-f2fs-devel
Hi Eric,

On Thu, Jan 23, 2020 at 08:12:34PM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> Since ->d_compare() and ->d_hash() can be called in RCU-walk mode,
> ->d_parent and ->d_inode can be concurrently modified, and in
> particular, ->d_inode may be changed to NULL.  For ext4_d_hash() this
> resulted in a reproducible NULL dereference if a lookup is done in a
> directory being deleted, e.g. with:
> 
>   int main()
>   {
>   if (fork()) {
>   for (;;) {
>   mkdir("subdir", 0700);
>   rmdir("subdir");
>   }
>   } else {
>   for (;;)
>   access("subdir/file", 0);
>   }
>   }
> 
> ... or by running the 't_encrypted_d_revalidate' program from xfstests.
> Both repros work in any directory on a filesystem with the encoding
> feature, even if the directory doesn't actually have the casefold flag.
> 
> I couldn't reproduce a crash in ext4_d_compare(), but it appears that a
> similar crash is possible there.
> 
> Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and
> falling back to the case sensitive behavior if the inode is NULL.
> 
> Reported-by: Al Viro 
> Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups")
> Cc:  # v5.2+
> Signed-off-by: Eric Biggers 
> ---
>  fs/ext4/dir.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index 8964778aabefb..0129d14629881 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -671,9 +671,11 @@ static int ext4_d_compare(const struct dentry *dentry, 
> unsigned int len,
> const char *str, const struct qstr *name)
>  {
>   struct qstr qstr = {.name = str, .len = len };
> - struct inode *inode = dentry->d_parent->d_inode;
> + const struct dentry *parent = READ_ONCE(dentry->d_parent);

I'm not sure we really need READ_ONCE on d_parent here (p.s. d_parent
won't be NULL anyway), and d_seq will guard its validity. If I'm
wrong, kindly correct me...

Otherwise, it looks good to me...
Reviewed-by: Gao Xiang 

Thanks,
Gao Xiang


> + const struct inode *inode = READ_ONCE(parent->d_inode);
>  
> - if (!IS_CASEFOLDED(inode) || !EXT4_SB(inode->i_sb)->s_encoding) {
> + if (!inode || !IS_CASEFOLDED(inode) ||
> + !EXT4_SB(inode->i_sb)->s_encoding) {
>   if (len != name->len)
>   return -1;
>   return memcmp(str, name->name, len);
> @@ -686,10 +688,11 @@ static int ext4_d_hash(const struct dentry *dentry, 
> struct qstr *str)
>  {
>   const struct ext4_sb_info *sbi = EXT4_SB(dentry->d_sb);
>   const struct unicode_map *um = sbi->s_encoding;
> + const struct inode *inode = READ_ONCE(dentry->d_inode);
>   unsigned char *norm;
>   int len, ret = 0;
>  
> - if (!IS_CASEFOLDED(dentry->d_inode) || !um)
> + if (!inode || !IS_CASEFOLDED(inode) || !um)
>   return 0;
>  
>   norm = kmalloc(PATH_MAX, GFP_ATOMIC);
> -- 
> 2.25.0
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v5] fs: introduce is_dot_or_dotdot helper for cleanup

2019-12-11 Thread Gao Xiang via Linux-f2fs-devel
Hi Matthew,

On Wed, Dec 11, 2019 at 05:40:14AM -0800, Matthew Wilcox wrote:
> On Wed, Dec 11, 2019 at 03:17:11PM +0800, Gao Xiang wrote:
> > > static inline bool is_dot_or_dotdot(const unsigned char *name, size_t len)
> > > {
> > > if (len >= 1 && unlikely(name[0] == '.')) {
> > 
> > 
> > And I suggest dropping "unlikely" here, since files starting with the
> > prefix '.' (plus the special ".", "..") are not as uncommon as you expect...
> 
> They absolutely are uncommon.  Even if you just consider
> /home/willy/kernel/linux/.git/config, only one of those six path elements
> starts with a '.'.

Okay, I think it depends on the user data and access patterns.
I admit I have no statistics on all those callers.

Just considering introducing an inline helper for cleanup: except for
lookup_one_len_common() (since it's on an error path), the others were
all without unlikely() before.

Ignore my words if that seems unreasonable or if unlikely() is an
improvement in this patch; sorry for the noise.

Thanks,
Gao Xiang
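
For completeness, the semantics are easy to sanity-check in userspace; note
that ordinary dotfiles never match, so how common '.'-prefixed names are only
affects the quality of the branch hint, not correctness. A standalone copy of
the proposed helper (minus unlikely()) with a few checks:

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static inline bool is_dot_or_dotdot(const unsigned char *name, size_t len)
{
	if (len >= 1 && name[0] == '.') {
		if (len < 2 || (len == 2 && name[1] == '.'))
			return true;
	}
	return false;
}

int main(void)
{
	assert(is_dot_or_dotdot((const unsigned char *)".", 1));
	assert(is_dot_or_dotdot((const unsigned char *)"..", 2));
	/* dotfiles are not matched, only the two special entries are */
	assert(!is_dot_or_dotdot((const unsigned char *)".git", 4));
	assert(!is_dot_or_dotdot((const unsigned char *)"a.", 2));
	assert(!is_dot_or_dotdot((const unsigned char *)"", 0));
	return 0;
}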



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v5] fs: introduce is_dot_or_dotdot helper for cleanup

2019-12-10 Thread Gao Xiang
On Wed, Dec 11, 2019 at 02:38:34PM +0800, Tiezhu Yang wrote:
> On 12/11/2019 12:47 PM, Al Viro wrote:
> > On Wed, Dec 11, 2019 at 11:59:40AM +0800, Tiezhu Yang wrote:
> > 
> > > static inline bool is_dot_or_dotdot(const unsigned char *name, size_t len)
> > > {
> > >  if (len == 1 && name[0] == '.')
> > >  return true;
> > > 
> > >  if (len == 2 && name[0] == '.' && name[1] == '.')
> > >  return true;
> > > 
> > >  return false;
> > > }
> > > 
> > > Hi Matthew,
> > > 
> > > How do you think? I think the performance influence is very small
> > > due to is_dot_or_dotdot() is a such short static inline function.
> > It's a very short inline function called on a very hot codepath.
> > Often.
> > 
> > I mean it - it's done literally for every pathname component of
> > every pathname passed to a syscall.
> 
> OK, I understand. Let us not use the helper function in fs/namei.c,
> and just use the following implementation for the other callers:
> 
> static inline bool is_dot_or_dotdot(const unsigned char *name, size_t len)
> {
> if (len >= 1 && unlikely(name[0] == '.')) {


And I suggest dropping "unlikely" here, since files starting with the
prefix '.' (plus the special ".", "..") are not as uncommon as you expect...


Thanks,
Gao Xiang


> if (len < 2 || (len == 2 && name[1] == '.'))
> return true;
> }
> 
> return false;
> }
> 
> Special thanks for Matthew, Darrick, Al and Eric.
> If you have any more suggestion, please let me know.
> 
> Thanks,
> 
> Tiezhu Yang
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] Potential data corruption?

2019-12-08 Thread Gao Xiang via Linux-f2fs-devel
Hi,

On Sun, Dec 08, 2019 at 09:15:55PM +0800, Hongwei Qin wrote:
> Hi,
> 
> On Sun, Dec 8, 2019 at 12:01 PM Chao Yu  wrote:
> >
> > Hello,
> >
> > On 2019-12-7 18:10, Hongwei Qin wrote:
> > > Hi F2FS experts,
> > > The following confuses me:
> > >
> > > A typical fsync() goes like this:
> > > 1) Issue data block IOs
> > > 2) Wait for completion
> > > 3) Issue chained node block IOs
> > > 4) Wait for completion
> > > 5) Issue flush command
> > >
> > > In order to preserve data consistency under sudden power failure, it 
> > > requires that the storage device persists data blocks prior to node 
> > > blocks.
> > > Otherwise, under sudden power failure, it's possible that the persisted 
> > > node block points to NULL data blocks.
> >
> > Firstly it doesn't break POSIX semantics, right? since fsync() didn't return
> > successfully before sudden power-cut, so we can not guarantee that data is 
> > fully
> > persisted in such condition.
> >
> > However, what you want looks like atomic write semantics, which mostly 
> > database
> > want to guarantee during db file update.
> >
> > F2FS has support atomic_write via ioctl, which is used by SQLite 
> > officially, I
> > guess you can check its implementation detail.
> >
> > Thanks,
> >
> 
> Thanks for your kind reply.
> It's true that if we meet power failure before fsync() completes,
> POSIX doesn't require the FS to recover the file. However, consider the
> following situation:
> 
> 1) Data block IOs (Not persisted)
> 2) Node block IOs (All Persisted)
> 3) Power failure
> 
> Since the node blocks are all persisted before power failure, the node
> chain isn't broken. Note that this file's new data is not properly
> persisted before crash. So the recovery process should be able to
> recognize this situation and avoid recovering this file. However, since
> the node chain is not broken, perhaps the recovery process will regard
> this file as recoverable?

To my own limited understanding, I'm afraid it seems true for the
extreme case. Without a proper FLUSH command, newer nodes could be
recovered with no newer data persisted.

So if fsync() is not successful, the old data should be read back;
but in this case, unexpected data (not A or A', could be random data
C) will be considered valid since its node is ok.

It seems it should FLUSH data before the related node chain is written,
or introduce some data checksum, though.

If I am wrong, kindly correct me...

Thanks,
Gao Xiang
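
To make the window concrete, here is a stub sketch of the sequence under
discussion (illustration only, not f2fs code; all functions are empty
stand-ins):

static void submit_data_bios(void)         { /* 1) issue data block IOs */ }
static void wait_for_all_completions(void) { /* 2)/4) completion != persistence */ }
static void submit_node_bios(void)         { /* 3) issue chained node block IOs */ }
static void issue_flush(void)              { /* 5) flush the device's volatile cache */ }

static void fsync_sequence(void)
{
	submit_data_bios();
	wait_for_all_completions();
	/*
	 * The window: without a flush (or FUA) here, the device may
	 * persist the node blocks below before the data blocks above,
	 * so a power cut can leave an unbroken node chain pointing at
	 * never-persisted data.
	 */
	submit_node_bios();
	wait_for_all_completions();
	issue_flush();	/* persists everything, but cannot re-order the past */
}

int main(void)
{
	fsync_sequence();
	return 0;
}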



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 4/8] vfs: Fold casefolding into vfs

2019-12-02 Thread Gao Xiang
On Mon, Dec 02, 2019 at 09:10:45PM -0800, Daniel Rosenberg wrote:
> Ext4 and F2fs are both using casefolding, and they, along with any other
> filesystem that adds the feature, will be using identical dentry_ops.
> Additionally, those dentry ops interfere with the dentry_ops required
> for fscrypt once we add support for casefolding and encryption.
> Moving this into the vfs removes code duplication as well as the
> complication with encryption.
> 
> Currently this is pretty close to just moving the existing f2fs/ext4
> code up a level into the vfs, although there is a lot of room for
> improvement now.
> 
> Signed-off-by: Daniel Rosenberg 

I'm afraid such a vfs modification is unneeded.

From a quick glance, it seems it can just be replaced by introducing some
.d_cmp, .d_hash helpers (or with little modification), and most non-Android
emulated storage files are not casefolded (even in Android).

As for "those dentry ops interfere with the dentry_ops required for fscrypt",
I don't think it's a real difficulty, and it could be done with some
better approach instead.

Thanks,
Gao Xiang



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 10/10] errno.h: Provide EFSCORRUPTED for everybody

2019-11-03 Thread Gao Xiang
On Sun, Nov 03, 2019 at 08:45:06PM -0500, Valdis Kletnieks wrote:
> There's currently 6 filesystems that have the same #define. Move it
> into errno.h so it's defined in just one place.
> 
> Signed-off-by: Valdis Kletnieks 
> Acked-by: Darrick J. Wong 
> Reviewed-by: Jan Kara 
> Acked-by: Theodore Ts'o 

For EROFS part,

Acked-by: Gao Xiang 

> ---
>  drivers/staging/exfat/exfat.h| 2 --
>  fs/erofs/internal.h  | 2 --
>  fs/ext4/ext4.h   | 1 -
>  fs/f2fs/f2fs.h   | 1 -
>  fs/xfs/xfs_linux.h   | 1 -
>  include/linux/jbd2.h | 1 -
>  include/uapi/asm-generic/errno.h | 1 +
>  7 files changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/staging/exfat/exfat.h b/drivers/staging/exfat/exfat.h
> index 72cf40e123de..58b091a077e8 100644
> --- a/drivers/staging/exfat/exfat.h
> +++ b/drivers/staging/exfat/exfat.h
> @@ -30,8 +30,6 @@
>  #undef DEBUG
>  #endif
>  
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> -
>  #define DENTRY_SIZE  32  /* dir entry size */
>  #define DENTRY_SIZE_BITS 5
>  
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 544a453f3076..3980026a8882 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -425,7 +425,5 @@ static inline int z_erofs_init_zip_subsystem(void) { 
> return 0; }
>  static inline void z_erofs_exit_zip_subsystem(void) {}
>  #endif   /* !CONFIG_EROFS_FS_ZIP */
>  
> -#define EFSCORRUPTEDEUCLEAN /* Filesystem is corrupted */
> -
>  #endif   /* __EROFS_INTERNAL_H */
>  
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 03db3e71676c..a86c2585457d 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3396,6 +3396,5 @@ static inline int ext4_buffer_uptodate(struct 
> buffer_head *bh)
>  #endif   /* __KERNEL__ */
>  
>  #define EFSBADCRCEBADMSG /* Bad CRC detected */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>  
>  #endif   /* _EXT4_H */
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4024790028aa..04ebe77569a3 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3752,6 +3752,5 @@ static inline bool is_journalled_quota(struct 
> f2fs_sb_info *sbi)
>  }
>  
>  #define EFSBADCRCEBADMSG /* Bad CRC detected */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>  
>  #endif /* _LINUX_F2FS_H */
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index ca15105681ca..3409d02a7d21 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -123,7 +123,6 @@ typedef __u32 xfs_nlink_t;
>  
>  #define ENOATTR  ENODATA /* Attribute not found */
>  #define EWRONGFS EINVAL  /* Mount with wrong filesystem type */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>  #define EFSBADCRCEBADMSG /* Bad CRC detected */
>  
>  #define SYNCHRONIZE()barrier()
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 603fbc4e2f70..69411d7e0431 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1657,6 +1657,5 @@ static inline tid_t  
> jbd2_get_latest_transaction(journal_t *journal)
>  #endif   /* __KERNEL__ */
>  
>  #define EFSBADCRCEBADMSG /* Bad CRC detected */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>  
>  #endif   /* _LINUX_JBD2_H */
> diff --git a/include/uapi/asm-generic/errno.h 
> b/include/uapi/asm-generic/errno.h
> index cf9c51ac49f9..1d5ffdf54cb0 100644
> --- a/include/uapi/asm-generic/errno.h
> +++ b/include/uapi/asm-generic/errno.h
> @@ -98,6 +98,7 @@
>  #define  EINPROGRESS 115 /* Operation now in progress */
>  #define  ESTALE  116 /* Stale file handle */
>  #define  EUCLEAN 117 /* Structure needs cleaning */
> +#define  EFSCORRUPTEDEUCLEAN

BTW, minor: how about adding a comment right after EFSCORRUPTED,
like the other error codes, although it's now an alias...
Just my personal thought.
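
That is, something like the following (illustrative, reusing the comment the
removed per-fs copies carried):

#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */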

Thanks,
Gao Xiang

>  #define  ENOTNAM 118 /* Not a XENIX named type file */
>  #define  ENAVAIL 119 /* No XENIX semaphores available */
>  #define  EISNAM  120 /* Is a named type file */
> -- 
> 2.24.0.rc1
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/2] f2fs: support data compression

2019-10-30 Thread Gao Xiang via Linux-f2fs-devel
Hi Eric,

(add some mm folks...)

On Wed, Oct 30, 2019 at 09:50:56AM -0700, Eric Biggers wrote:



> > >>>
> > >>> It isn't really appropriate to create fake pagecache pages like this.  
> > >>> Did you
> > >>> consider changing f2fs to use fscrypt_decrypt_block_inplace() instead?
> > >>
> > >> We need to store i_crypto_info and iv index somewhere, in order to pass 
> > >> them to
> > >> fscrypt_decrypt_block_inplace(), where did you suggest to store them?
> > >>
> > > 
> > > The same place where the pages are stored.
> > 
> > Still we need allocate space for those fields, any strong reason to do so?
> > 
> 
> page->mapping set implies that the page is a pagecache page.  Faking it could
> cause problems with code elsewhere.

Not very related to this patch: faking page->mapping was used in zsmalloc
before non-LRU migration (see material [1]) and is used in erofs now
(page->mapping indicates a non-LRU, short-lifetime temporary page type;
page->private is used for per-page information). As far as I know, a
non-LRU page without PAGE_MAPPING_MOVABLE set is safe for most mm code.

On the other hand, I think a NULL page->mapping wastes that field in the
precious page structure... And we cannot tell the page type directly from
a NULL alone -- it could be a truncated file page, a just-allocated page,
or some internal temporary page...

So my proposal is to use page->mapping to indicate a specific page type for
such non-LRU pages (by some common convention, e.g. some real structure,
rather than just zeroing it out and wasting 8 bytes; it's also natural to
indicate a page type by the `mapping' name)... Since my English is not very
good, I have delayed it until now...

[1] https://elixir.bootlin.com/linux/v3.18.140/source/mm/zsmalloc.c#L379

https://lore.kernel.org/linux-mm/1459321935-3655-7-git-send-email-minc...@kernel.org
and some not very related topic: https://lwn.net/Articles/752564/

Thanks,
Gao Xiang
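
A rough sketch of what such a convention could look like (the names and the
sentinel value are invented for illustration, this is not an existing kernel
API; the sentinel keeps its low two bits clear so it cannot be mistaken for
PAGE_MAPPING_ANON/PAGE_MAPPING_MOVABLE):

#include <linux/mm.h>

#define TEMP_PAGE_MAPPING	((struct address_space *)0x5a7e11e4)

static inline void tag_temp_page(struct page *page, unsigned long priv)
{
	page->mapping = TEMP_PAGE_MAPPING;	/* page-type marker, never dereferenced */
	set_page_private(page, priv);		/* per-page information */
}

static inline bool page_is_temp(struct page *page)
{
	return page->mapping == TEMP_PAGE_MAPPING;
}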



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Gao Xiang via Linux-f2fs-devel
On Wed, Oct 30, 2019 at 09:33:13AM -0700, Jaegeuk Kim wrote:
> On 10/30, Theodore Y. Ts'o wrote:
> > On Wed, Oct 30, 2019 at 11:50:37PM +0800, Gao Xiang wrote:
> > > 
> > > So I'm curious about the original issue in commit 740432f83560
> > > ("f2fs: handle failed bio allocation"). Since f2fs manages multiple write
> > > bios with its internal fio but it seems the commit is not helpful to
> > > resolve potential mempool deadlock (I'm confused since no calltrace,
> > > maybe I'm wrong)...
> > 
> > Two possibilities come to mind.  (a) It may be that on older kernels
> > (when f2fs is backported to older Board Support Package kernels from
> > the SOC vendors) didn't have the bio_alloc() guarantee, so it was
> > necessary on older kernels, but not on upstream, or (b) it wasn't
> > *actually* possible for bio_alloc() to fail and someone added the
> > error handling in 740432f83560 out of paranoia.
> 
> Yup, I was checking old device kernels but just stopped digging into it.
> Instead, I hesitate to apply this patch since I can't see why we need to
> get rid of this code for clean-up purposes. This may bring
> some hassles when backporting to android/device kernels.

Yes, I get your concern. As I have said in other patches many times, since
you're the maintainer of f2fs, it's all up to you (I'm not paranoid).
However, I think there are 2 valid reasons:

 1) As a newbie to Linux filesystems: when I study or work on f2fs
and I see such misleading code, I think I will produce similar
code in the future (not everyone refers to the comments above bio_alloc),
so such usage will spread (since one could take sample code
from existing code);

 2) Since it's upstream, I personally think appropriate cleanup is ok (anyway,
it kills a net 20+ lines of dead code), and I think this patch isn't so
harmful for backporting.

Thanks,
Gao Xiang

> 
> > 
> > (Hence my suggestion that in the ext4 version of the patch, we add a
> > code comment justifying why there was no error checking, to make it
> > clear that this was a deliberate choice.  :-)
> > 
> > - Ted


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Gao Xiang via Linux-f2fs-devel
Hi Ted,

On Wed, Oct 30, 2019 at 11:14:45AM -0400, Theodore Y. Ts'o wrote:
> On Wed, Oct 30, 2019 at 06:43:45PM +0800, Gao Xiang wrote:
> > > You're right, in low memory scenario, allocation with bioset will be 
> > > faster, as
> > > you mentioned offline, maybe we can add/use a priviate bioset like btrfs 
> > > did
> > > rather than using global one, however, we'd better check how deadlock 
> > > happen
> > > with a bioset mempool first ...
> > 
> > Okay, hope to get hints from Jaegeuk and redo this patch then...
> 
> It's not at all clear to me that using a private bioset is a good idea
> for f2fs.  That just means you're allocating a separate chunk of
> memory just for f2fs, as opposed to using the global pool.  That's an
> additional chunk of non-swapable kernel memory that's not going to be
> available, in *addition* to the global mempool.  
> 
> Also, who else would you be contending for space with the global
> mempool?  It's not like an mobile handset is going to have other users
> of the global bio mempool.
> 
> On a low-end mobile handset, memory is at a premium, so wasting memory
> to no good effect isn't going to be a great idea.

Thanks for your reply. I agree with your idea.

Actually, I think after this version of the patch is applied, everything
is the same as the previous status (whether there is some deadlock or not).

So I'm curious about the original issue in commit 740432f83560
("f2fs: handle failed bio allocation"). Since f2fs manages multiple write
bios with its internal fio, it seems the commit is not helpful for
resolving the potential mempool deadlock (I'm confused since there is no
calltrace; maybe I'm wrong)...

I think this should be made clear first before deciding what to do next...
(I tend not to add another private bioset since it's unshareable,
 as you said as well...)

Thanks,
Gao Xiang

> 
> Regards,
> 
>   - Ted
> 
>
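
For reference, a hypothetical sketch of the private-bioset option discussed
in this thread (the names are invented and f2fs does not carry this; the
bioset_init()/bio_alloc_bioset() interfaces of this era are assumed):

#include <linux/bio.h>
#include <linux/init.h>

static struct bio_set f2fs_bioset;

static int __init f2fs_init_bioset(void)
{
	/* a small mempool is enough to guarantee forward progress */
	return bioset_init(&f2fs_bioset, 4, 0, BIOSET_NEED_BVECS);
}

static struct bio *f2fs_alloc_bio(unsigned int npages)
{
	/*
	 * With a mempool-backed bioset and a gfp mask that allows direct
	 * reclaim (GFP_NOIO here), bio_alloc_bioset() may sleep but does
	 * not return NULL.
	 */
	return bio_alloc_bioset(GFP_NOIO, npages, &f2fs_bioset);
}

As Ted notes above, the trade-off is a chunk of non-swappable memory
reserved per filesystem on top of the global pool.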
 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Gao Xiang
On Wed, Oct 30, 2019 at 05:27:54PM +0800, Chao Yu wrote:
> Hi Xiang,
> 
> On 2019/10/30 17:15, Gao Xiang wrote:
> > Hi Chao,
> > 
> > On Wed, Oct 30, 2019 at 04:56:17PM +0800, Chao Yu wrote:
> >> On 2019/10/30 11:55, Gao Xiang wrote:
> >>> remove such useless code and related fault injection.
> >>
> >> Hi Xiang,
> >>
> >> Although, there is so many 'nofail' allocation in f2fs, I think we'd better
> >> avoid such allocation as much as possible (now for read path, we may allow 
> >> to
> >> fail to allocate bio), I suggest to keep the failure path and bio 
> >> allocation
> >> injection.
> >>
> >> It looks bio_alloc() will use its own mempool, which may suffer deadlock
> >> potentially. So how about changing to use bio_alloc_bioset(, , NULL) 
> >> instead of
> >> bio_alloc()?
> > 
> > Yes, I noticed the original commit 740432f83560 ("f2fs: handle failed bio 
> > allocation"),
> > yet I don't find any real call trace clue what happened before.
> > 
> > As my understanding, if we allocate bios without submit_bio (I mean write 
> > path) with
> > default bs and gfp_flags GFP_NOIO or GFP_KERNEL, I think it will be slept 
> > inside
> > mempool rather than return NULL to its caller... Please correct me if I'm 
> > wrong...
> 
> I'm curious too...
> 
> Jaegeuk may know the details.
> 
> > 
> > I could send another patch with bio_alloc_bioset(, , NULL), I am curious to 
> > know the
> > original issue and how it solved though...
> > 
> > For read or flush path, since it will submit_bio and bio_alloc one by one, 
> > I think
> > mempool will get a page quicker (memory failure path could be longer). But 
> > I can
> > send a patch just by using bio_alloc_bioset(, , NULL) instead as you 
> > suggested later.
> 
> You're right, in low memory scenario, allocation with bioset will be faster, 
> as
> you mentioned offline, maybe we can add/use a priviate bioset like btrfs did
> rather than using global one, however, we'd better check how deadlock happen
> with a bioset mempool first ...

Okay, hope to get hints from Jaegeuk and redo this patch then...

Thanks,
Gao Xiang

> 
> Thanks,
> 
> > 
> > Thanks,
> > Gao Xiang
> > 
> >>
> >> Thanks,
> >>
> >>>
> >>> Signed-off-by: Gao Xiang 
> >>> ---
> >>>  Documentation/filesystems/f2fs.txt |  1 -
> >>>  fs/f2fs/data.c |  6 ++
> >>>  fs/f2fs/f2fs.h | 21 -
> >>>  fs/f2fs/segment.c  |  5 +
> >>>  fs/f2fs/super.c|  1 -
> >>>  5 files changed, 3 insertions(+), 31 deletions(-)
> >>>
> >>> diff --git a/Documentation/filesystems/f2fs.txt 
> >>> b/Documentation/filesystems/f2fs.txt
> >>> index 7e1991328473..3477c3e4c08b 100644
> >>> --- a/Documentation/filesystems/f2fs.txt
> >>> +++ b/Documentation/filesystems/f2fs.txt
> >>> @@ -172,7 +172,6 @@ fault_type=%d  Support configuring fault 
> >>> injection type, should be
> >>> FAULT_KVMALLOC0x2
> >>> FAULT_PAGE_ALLOC  0x4
> >>> FAULT_PAGE_GET0x8
> >>> -   FAULT_ALLOC_BIO   0x00010
> >>> FAULT_ALLOC_NID   0x00020
> >>> FAULT_ORPHAN  0x00040
> >>> FAULT_BLOCK   0x00080
> >>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> >>> index 5755e897a5f0..3b88dcb15de6 100644
> >>> --- a/fs/f2fs/data.c
> >>> +++ b/fs/f2fs/data.c
> >>> @@ -288,7 +288,7 @@ static struct bio *__bio_alloc(struct f2fs_io_info 
> >>> *fio, int npages)
> >>>   struct f2fs_sb_info *sbi = fio->sbi;
> >>>   struct bio *bio;
> >>>  
> >>> - bio = f2fs_bio_alloc(sbi, npages, true);
> >>> + bio = bio_alloc(GFP_NOIO, npages);
> >>>  
> >>>   f2fs_target_device(sbi, fio->new_blkaddr, bio);
> >>>   if (is_read_io(fio->op)) {
> >>> @@ -682,9 +682,7 @@ static struct bio *f2fs_grab_read_bio(struct inode 
> >>> *inode, block_t blkaddr,
> >>>   struct bio_post_read_ctx *ctx;
> >>>   unsigned int post_read

Re: [f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-30 Thread Gao Xiang
Hi Chao,

On Wed, Oct 30, 2019 at 04:56:17PM +0800, Chao Yu wrote:
> On 2019/10/30 11:55, Gao Xiang wrote:
> > remove such useless code and related fault injection.
> 
> Hi Xiang,
> 
> Although, there is so many 'nofail' allocation in f2fs, I think we'd better
> avoid such allocation as much as possible (now for read path, we may allow to
> fail to allocate bio), I suggest to keep the failure path and bio allocation
> injection.
> 
> It looks bio_alloc() will use its own mempool, which may suffer deadlock
> potentially. So how about changing to use bio_alloc_bioset(, , NULL) instead 
> of
> bio_alloc()?

Yes, I noticed the original commit 740432f83560 ("f2fs: handle failed bio
allocation"), yet I can't find any real call trace clue about what
happened before.

To my understanding, if we allocate bios without submit_bio (I mean the
write path) with the default bioset and gfp_flags GFP_NOIO or GFP_KERNEL,
I think it will sleep inside the mempool rather than return NULL to its
caller... Please correct me if I'm wrong...

I could send another patch with bio_alloc_bioset(, , NULL); I am curious
to know the original issue and how it was solved though...

For the read or flush path, since it will submit_bio and bio_alloc one by
one, I think the mempool will get a page quicker (the memory failure path
could be longer). But I can send a patch just using
bio_alloc_bioset(, , NULL) instead, as you suggested, later.

Thanks,
Gao Xiang

> 
> Thanks,
> 
> > 
> > Signed-off-by: Gao Xiang 
> > ---
> >  Documentation/filesystems/f2fs.txt |  1 -
> >  fs/f2fs/data.c |  6 ++
> >  fs/f2fs/f2fs.h | 21 -
> >  fs/f2fs/segment.c  |  5 +
> >  fs/f2fs/super.c|  1 -
> >  5 files changed, 3 insertions(+), 31 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/f2fs.txt 
> > b/Documentation/filesystems/f2fs.txt
> > index 7e1991328473..3477c3e4c08b 100644
> > --- a/Documentation/filesystems/f2fs.txt
> > +++ b/Documentation/filesystems/f2fs.txt
> > @@ -172,7 +172,6 @@ fault_type=%d  Support configuring fault 
> > injection type, should be
> > FAULT_KVMALLOC  0x2
> > FAULT_PAGE_ALLOC0x4
> > FAULT_PAGE_GET  0x8
> > -   FAULT_ALLOC_BIO 0x00010
> > FAULT_ALLOC_NID 0x00020
> > FAULT_ORPHAN0x00040
> > FAULT_BLOCK 0x00080
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 5755e897a5f0..3b88dcb15de6 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -288,7 +288,7 @@ static struct bio *__bio_alloc(struct f2fs_io_info 
> > *fio, int npages)
> > struct f2fs_sb_info *sbi = fio->sbi;
> > struct bio *bio;
> >  
> > -   bio = f2fs_bio_alloc(sbi, npages, true);
> > +   bio = bio_alloc(GFP_NOIO, npages);
> >  
> > f2fs_target_device(sbi, fio->new_blkaddr, bio);
> > if (is_read_io(fio->op)) {
> > @@ -682,9 +682,7 @@ static struct bio *f2fs_grab_read_bio(struct inode 
> > *inode, block_t blkaddr,
> > struct bio_post_read_ctx *ctx;
> > unsigned int post_read_steps = 0;
> >  
> > -   bio = f2fs_bio_alloc(sbi, min_t(int, nr_pages, BIO_MAX_PAGES), false);
> > -   if (!bio)
> > -   return ERR_PTR(-ENOMEM);
> > +   bio = bio_alloc(GFP_KERNEL, min_t(int, nr_pages, BIO_MAX_PAGES));
> > f2fs_target_device(sbi, blkaddr, bio);
> > bio->bi_end_io = f2fs_read_end_io;
> > bio_set_op_attrs(bio, REQ_OP_READ, op_flag);
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 4024790028aa..40012f874be0 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -44,7 +44,6 @@ enum {
> > FAULT_KVMALLOC,
> > FAULT_PAGE_ALLOC,
> > FAULT_PAGE_GET,
> > -   FAULT_ALLOC_BIO,
> > FAULT_ALLOC_NID,
> > FAULT_ORPHAN,
> > FAULT_BLOCK,
> > @@ -2210,26 +2209,6 @@ static inline void *f2fs_kmem_cache_alloc(struct 
> > kmem_cache *cachep,
> > return entry;
> >  }
> >  
> > -static inline struct bio *f2fs_bio_alloc(struct f2fs_sb_info *sbi,
> > -   int npages, bool no_fail)
> > -{
> > -   struct bio *bio;
> > -
> > -   if (no_fail) {
> > -   /* No failure on bio allocation */
> > -   bio = bio_alloc(GFP_NOIO, npages);
> > -   if (!bio)

[f2fs-dev] [PATCH] f2fs: bio_alloc should never fail

2019-10-29 Thread Gao Xiang
remove such useless code and related fault injection.

Signed-off-by: Gao Xiang 
---
 Documentation/filesystems/f2fs.txt |  1 -
 fs/f2fs/data.c |  6 ++
 fs/f2fs/f2fs.h | 21 -
 fs/f2fs/segment.c  |  5 +
 fs/f2fs/super.c|  1 -
 5 files changed, 3 insertions(+), 31 deletions(-)

diff --git a/Documentation/filesystems/f2fs.txt 
b/Documentation/filesystems/f2fs.txt
index 7e1991328473..3477c3e4c08b 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -172,7 +172,6 @@ fault_type=%d  Support configuring fault injection 
type, should be
FAULT_KVMALLOC  0x2
FAULT_PAGE_ALLOC0x4
FAULT_PAGE_GET  0x8
-   FAULT_ALLOC_BIO 0x00010
FAULT_ALLOC_NID 0x00020
FAULT_ORPHAN0x00040
FAULT_BLOCK 0x00080
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5755e897a5f0..3b88dcb15de6 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -288,7 +288,7 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, 
int npages)
struct f2fs_sb_info *sbi = fio->sbi;
struct bio *bio;
 
-   bio = f2fs_bio_alloc(sbi, npages, true);
+   bio = bio_alloc(GFP_NOIO, npages);
 
f2fs_target_device(sbi, fio->new_blkaddr, bio);
if (is_read_io(fio->op)) {
@@ -682,9 +682,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, 
block_t blkaddr,
struct bio_post_read_ctx *ctx;
unsigned int post_read_steps = 0;
 
-   bio = f2fs_bio_alloc(sbi, min_t(int, nr_pages, BIO_MAX_PAGES), false);
-   if (!bio)
-   return ERR_PTR(-ENOMEM);
+   bio = bio_alloc(GFP_KERNEL, min_t(int, nr_pages, BIO_MAX_PAGES));
f2fs_target_device(sbi, blkaddr, bio);
bio->bi_end_io = f2fs_read_end_io;
bio_set_op_attrs(bio, REQ_OP_READ, op_flag);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4024790028aa..40012f874be0 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -44,7 +44,6 @@ enum {
FAULT_KVMALLOC,
FAULT_PAGE_ALLOC,
FAULT_PAGE_GET,
-   FAULT_ALLOC_BIO,
FAULT_ALLOC_NID,
FAULT_ORPHAN,
FAULT_BLOCK,
@@ -2210,26 +2209,6 @@ static inline void *f2fs_kmem_cache_alloc(struct 
kmem_cache *cachep,
return entry;
 }
 
-static inline struct bio *f2fs_bio_alloc(struct f2fs_sb_info *sbi,
-   int npages, bool no_fail)
-{
-   struct bio *bio;
-
-   if (no_fail) {
-   /* No failure on bio allocation */
-   bio = bio_alloc(GFP_NOIO, npages);
-   if (!bio)
-   bio = bio_alloc(GFP_NOIO | __GFP_NOFAIL, npages);
-   return bio;
-   }
-   if (time_to_inject(sbi, FAULT_ALLOC_BIO)) {
-   f2fs_show_injection_info(FAULT_ALLOC_BIO);
-   return NULL;
-   }
-
-   return bio_alloc(GFP_KERNEL, npages);
-}
-
 static inline bool is_idle(struct f2fs_sb_info *sbi, int type)
 {
if (sbi->gc_mode == GC_URGENT)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 808709581481..28457c878d0d 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -552,10 +552,7 @@ static int __submit_flush_wait(struct f2fs_sb_info *sbi,
struct bio *bio;
int ret;
 
-   bio = f2fs_bio_alloc(sbi, 0, false);
-   if (!bio)
-   return -ENOMEM;
-
+   bio = bio_alloc(GFP_KERNEL, 0);
bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
bio_set_dev(bio, bdev);
ret = submit_bio_wait(bio);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 1443cee15863..51945dd27f00 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -44,7 +44,6 @@ const char *f2fs_fault_name[FAULT_MAX] = {
[FAULT_KVMALLOC]= "kvmalloc",
[FAULT_PAGE_ALLOC]  = "page alloc",
[FAULT_PAGE_GET]= "page get",
-   [FAULT_ALLOC_BIO]   = "alloc bio",
[FAULT_ALLOC_NID]   = "alloc nid",
[FAULT_ORPHAN]  = "orphan",
[FAULT_BLOCK]   = "no more block",
-- 
2.17.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: fix comment of f2fs_evict_inode

2019-09-28 Thread Gao Xiang
On Sun, Sep 29, 2019 at 08:53:05AM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> On 2019/9/28 2:31, Jaegeuk Kim wrote:
> > Hi Chao,
> > 
> > On 09/25, Chao Yu wrote:
> >> evict() should be called once i_count is zero, rather than i_nlinke
> >> is zero.
> >>
> >> Reported-by: Gao Xiang 
> >> Signed-off-by: Chao Yu 
> >> ---
> >>  fs/f2fs/inode.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> >> index db4fec30c30d..8262f4a483d3 100644
> >> --- a/fs/f2fs/inode.c
> >> +++ b/fs/f2fs/inode.c
> >> @@ -632,7 +632,7 @@ int f2fs_write_inode(struct inode *inode, struct 
> >> writeback_control *wbc)
> >>  }
> >>  
> >>  /*
> >> - * Called at the last iput() if i_nlink is zero
> > 
> > I don't think this comment is wrong. You may be able to add on top of this.
> 
> It actually misleads the developer or user.
> 
> How do you think of:
> 
> "Called at the last iput() if i_count is zero, and will release all meta/data
> blocks allocated in the inode if i_nlink is zero"

(sigh... side note: I just took some time to check the original meaning
 out of curiosity. AFAIK, the above wording was added in Linux-2.1.45 [1]
 due to the ext2_delete_inode behavior, which is called when i_nlink == 0;
 .delete_inode was gone in 2010 (commit 72edc4d0873b "merge ext2
 delete_inode and clear_inode, switch to ->evict_inode()"). It may be
 helpful to understand the story, so I write it here for later folks'
 reference. And it's also fine to just kill it.)

+
+/*
+ * Called at the last iput() if i_nlink is zero.
+ */
+void ext2_delete_inode (struct inode * inode)
+{
+   if (inode->i_ino == EXT2_ACL_IDX_INO || inode->i_ino == EXT2_ACL_DATA_INO)
return;
inode->u.ext2_i.i_dtime = CURRENT_TIME;
-   inode->i_dirt = 1;
+   mark_inode_dirty(inode);
ext2_update_inode(inode, IS_SYNC(inode));
inode->i_size = 0;
if (inode->i_blocks)
@@ -248,7 +258,7 @@
if (IS_SYNC(inode) || inode->u.ext2_i.i_osync)
ext2_sync_inode (inode);
else
-   inode->i_dirt = 1;
+   mark_inode_dirty(inode);
return result;
 }

+void iput(struct inode *inode)
 {
-   struct inode * inode = get_empty_inode();
+   if (inode) {
+   struct super_operations *op = NULL;
 
-   PIPE_BASE(*inode) = (char*)__get_free_page(GFP_USER);
-   if (!(PIPE_BASE(*inode))) {
-   iput(inode);
-   return NULL;
+   if (inode->i_sb && inode->i_sb->s_op)
+   op = inode->i_sb->s_op;
+   if (op && op->put_inode)
+   op->put_inode(inode);
+
+   spin_lock(&inode_lock);
+   if (!--inode->i_count) {
+   if (!inode->i_nlink) {
+   list_del(&inode->i_hash);
+   INIT_LIST_HEAD(&inode->i_hash);
+   if (op && op->delete_inode) {
+   void (*delete)(struct inode *) = op->delete_inode;
+   spin_unlock(&inode_lock);
+   delete(inode);
+   spin_lock(&inode_lock);
+   }
+   }
+   if (list_empty(&inode->i_hash)) {
+   list_del(&inode->i_list);
+   list_add(&inode->i_list, &inode_unused);
+   }
+   }
+   spin_unlock(&inode_lock);
}

[1] https://www.kernel.org/pub/linux/kernel/v2.1/patch-2.1.45.xz

Thanks,
Gao Xiang

> 
> Thanks,
> 
> > 
> >> + * Called at the last iput() if i_count is zero
> >>   */
> >>  void f2fs_evict_inode(struct inode *inode)
> >>  {
> >> -- 
> >> 2.18.0.rc1
> > .
> > 
> 
> 
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [PATCH] f2fs: fix comment of f2fs_evict_inode

2019-09-27 Thread Gao Xiang
Hi Jaegeuk,

On Fri, Sep 27, 2019 at 11:31:50AM -0700, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On 09/25, Chao Yu wrote:
> > evict() should be called once i_count is zero, rather than i_nlinke
> > is zero.
> > 
> > Reported-by: Gao Xiang 
> > Signed-off-by: Chao Yu 
> > ---
> >  fs/f2fs/inode.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> > index db4fec30c30d..8262f4a483d3 100644
> > --- a/fs/f2fs/inode.c
> > +++ b/fs/f2fs/inode.c
> > @@ -632,7 +632,7 @@ int f2fs_write_inode(struct inode *inode, struct 
> > writeback_control *wbc)
> >  }
> >  
> >  /*
> > - * Called at the last iput() if i_nlink is zero
> 
> I don't think this comment is wrong. You may be able to add on top of this.

Actually I don't really care what this line means, but someone actually
told me that .evict_inode() is called when an inode is finally removed,
because he saw this line.

In practice, I have no idea what the above line (especially the wording
i_nlink == 0) mainly emphasizes; I only know it from some documentation
(without even referring to the code):

Documentation/filesystems/porting.rst
326 **mandatory**
327 
328 ->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
329 be used instead.  It gets called whenever the inode is evicted, whether it 
has
330 remaining links or not. 

And it seems the same comment exists in ext2/ext4. But yes, it's up
to you. However, it misled someone and I had to explain more about this.

Thanks,
Gao Xiang

> 
> > + * Called at the last iput() if i_count is zero
> >   */
> >  void f2fs_evict_inode(struct inode *inode)
> >  {
> > -- 
> > 2.18.0.rc1


Re: [f2fs-dev] [PATCH] f2fs: fix comment of f2fs_evict_inode

2019-09-25 Thread Gao Xiang
Hi Chao,

On Wed, Sep 25, 2019 at 05:30:50PM +0800, Chao Yu wrote:
> evict() should be called once i_count is zero, rather than i_nlinke
> is zero.
> 
> Reported-by: Gao Xiang 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index db4fec30c30d..8262f4a483d3 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -632,7 +632,7 @@ int f2fs_write_inode(struct inode *inode, struct 
> writeback_control *wbc)
>  }
>  
>  /*
> - * Called at the last iput() if i_nlink is zero
> + * Called at the last iput() if i_count is zero

Yeah, I'd suggest taking some time to look at the other
inconsistent comments; they confuse other folks, who then
ask me, citing such a "strong" reason...

Thanks,
Gao Xiang

>   */
>  void f2fs_evict_inode(struct inode *inode)
>  {
> -- 
> 2.18.0.rc1
> 
> 
> 
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH V4 5/8] f2fs: Use read_callbacks for decrypting file data

2019-08-20 Thread Gao Xiang via Linux-f2fs-devel
Hi Ted,

On Tue, Aug 20, 2019 at 12:25:10PM -0400, Theodore Y. Ts'o wrote:
> On Tue, Aug 20, 2019 at 01:12:36PM +0800, Gao Xiang wrote:
> > Add a word, I have some little concern about post read procession order
> > a bit as I mentioned before, because I'd like to move common EROFS
> > decompression code out in the future as well for other fses to use
> > after we think it's mature enough.
> > 
> > It seems the current code mainly addresses eliminating duplicated code,
> > therefore I have no idea about that...
> 
> Actually, we should chat.  I was actually thinking about "borrowing"
> code from erofs to provide ext4-specific compression.  I was really
> impressed with the efficiency goals in the erofs design[1] when I
> reviewed the Usenix ATC paper, and as the saying goes, the best
> artists know how to steal from the best.  :-)
> 
> [1] https://www.usenix.org/conference/atc19/presentation/gao

I also guessed it was you who reviewed our work, from some of the written
words :) (even though it's anonymous...), and I personally think there is
some useful stuff in our EROFS effort.

> 
> My original specific thinking was to do code reuse by copy and paste,
> simply because it was simpler, and I have limited time to work on it.
> But if you are interested in making the erofs pipeline reusable by
> other file systems, and have the time to do the necessary code
> refactoring, I'd love to work with you on that.

Yes, I'm interested in making the erofs pipeline available for generic fses.
Right now I'm still investigating sequential read on very high speed NVMe
(like the SAMSUNG 970 Pro; one-thread sequential read >3GB/s); it seems
there is still some room for optimization.

And then I will do that work for generic fses as well... (but the first
thing I want to do is get erofs out of staging, as Greg said [1])

Metadata should be designed per-fs, e.g. for ext4, but it may not be as
flexible and compact as EROFS's, so there could be some extra metadata
overhead compared with EROFS.

[1] https://lore.kernel.org/lkml/20190618064523.ga6...@kroah.com/

> 
> It should be noted that the f2fs developers have been working on their
> own compression scheme that was going to be f2fs-specific, unlike the
> file system generic approach used with fsverity and fscrypt.
> 
> My expectation is that we will need to modify the read pipeling code
> to support compression.  That's true whether we are looking at the
> existing file system-specific code used by ext4 and f2fs or in some
> combined work such as what Chandan has proposed.

Either form is fine with me. :) It seems a minor point which tree we
will work on (maybe Chandan's work will be merged by then).

The first thing I need to do is tidy up the code and make it more
general, and then it will be very easy for fses to integrate :)

Thanks,
Gao Xiang


> 
> Cheers,
> 
>   - Ted


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH V4 5/8] f2fs: Use read_callbacks for decrypting file data

2019-08-19 Thread Gao Xiang
On Tue, Aug 20, 2019 at 01:12:36PM +0800, Gao Xiang wrote:
> Hi Chandan,
> 
> On Tue, Aug 20, 2019 at 10:35:29AM +0530, Chandan Rajendra wrote:
> > On Friday, August 16, 2019 11:48 AM Chandan Rajendra wrote:
> > > F2FS has a copy of "post read processing" code using which encrypted
> > > file data is decrypted. This commit replaces it to make use of the
> > > generic read_callbacks facility.
> > > 
> > > Signed-off-by: Chandan Rajendra 
> > 
> > Hi Eric and Ted,
> > 
> > Looks like F2FS requires a lot more flexiblity than what can be offered by
> > read callbacks i.e.
> > 
> > 1. F2FS wants to make use of its own workqueue for decryption, verity and
> >decompression.
> > 2. F2FS' decompression code is not an FS independent entity like fscrypt and
> >fsverity. Hence they would need Filesystem specific callback functions to
> >be invoked from "read callbacks". 
> > 
> > Hence I would suggest that we should drop F2FS changes made in this
> > patchset. Please let me know your thoughts on this.
> 
> Add a word, I have some little concern about post read procession order

FYI, just a minor concern about its flexibility, nothing big though.
https://lore.kernel.org/r/20190808042640.GA28630@138/

Thanks,
Gao Xiang

> a bit as I mentioned before, because I'd like to move common EROFS
> decompression code out in the future as well for other fses to use
> after we think it's mature enough.
> 
> It seems the current code mainly addresses eliminating duplicated code,
> therefore I have no idea about that...
> 
> Thanks,
> Gao Xiang
> 
> > 
> > -- 
> > chandan
> > 
> > 
> > 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH V4 5/8] f2fs: Use read_callbacks for decrypting file data

2019-08-19 Thread Gao Xiang
Hi Chandan,

On Tue, Aug 20, 2019 at 10:35:29AM +0530, Chandan Rajendra wrote:
> On Friday, August 16, 2019 11:48 AM Chandan Rajendra wrote:
> > F2FS has a copy of "post read processing" code using which encrypted
> > file data is decrypted. This commit replaces it to make use of the
> > generic read_callbacks facility.
> > 
> > Signed-off-by: Chandan Rajendra 
> 
> Hi Eric and Ted,
> 
> Looks like F2FS requires a lot more flexiblity than what can be offered by
> read callbacks i.e.
> 
> 1. F2FS wants to make use of its own workqueue for decryption, verity and
>decompression.
> 2. F2FS' decompression code is not an FS independent entity like fscrypt and
>fsverity. Hence they would need Filesystem specific callback functions to
>be invoked from "read callbacks". 
> 
> Hence I would suggest that we should drop F2FS changes made in this
> patchset. Please let me know your thoughts on this.

To add a word: I have a little concern about the post-read processing
order, as I mentioned before, because I'd like to move the common EROFS
decompression code out in the future as well, for other fses to use,
once we think it's mature enough.

It seems the current code mainly addresses eliminating duplicated code,
so I have no strong opinion about that yet...

Thanks,
Gao Xiang

> 
> -- 
> chandan
> 
> 
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2] f2fs: ratelimit recovery messages

2019-05-27 Thread Gao Xiang
Hi Sahitya,

On 2019/5/28 11:17, Chao Yu wrote:
> Hi Sahitya,
> 
> On 2019/5/28 11:05, Sahitya Tummala wrote:
>> Hi Chao,
>>
>> On Tue, May 28, 2019 at 09:23:15AM +0800, Chao Yu wrote:
>>> Hi Sahitya,
>>>
>>> On 2019/5/27 21:10, Sahitya Tummala wrote:
>>>> Ratelimit the recovery logs, which are expected in case
>>>> of sudden power down and which could result into too
>>>> many prints.
>>>
>>> FYI
>>>
>>> https://lore.kernel.org/patchwork/patch/973837/
>>>
>>> IMO, we need those logs to provide evidence during trouble-shooting of file 
>>> data
>>> corruption or file missing problem...
>>>
>> In one of the logs, I have noticed there were ~400 recovery prints in the
> 
> I think its order of magnitudes is not such bad, if there is redundant logs 
> such
> as the one in do_recover_data(), we can improve it.
> 
>> kernel bootup. I noticed your patch above and with that now we can always get
>> the error returned by f2fs_recover_fsync_data(), which should be good enough
>> for knowing the status of recovered files I thought. Do you think we need
>> individually each file status as well?
> 
> Yes, I think so, we need them for the detailed info. :)

I personally agree with Chao's suggestion as well.

Sometimes Huawei gets stuck on rare potential f2fs stability issues,
where it is hard to say whether it is clearly a hardware or a software issue.

These messages are used as evidence for us to guess what happened;
it'd be better to handle them carefully...

Thanks,
Gao Xiang
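
To make the batch-reporting idea concrete, a rough sketch (the extra fields
and the helper are invented for illustration; this is not an actual f2fs
patch):

#include <linux/fs.h>
#include <linux/printk.h>

struct fsync_inode_entry {
	struct list_head list;
	struct inode *inode;
	/* ... existing fields ... */
	bool keep_isize;	/* i_size: keep vs. recover  */
	int recovered;		/* recovered entry count     */
	int err;		/* per-inode recovery result */
};

static void report_recovery_results(struct list_head *inode_list)
{
	struct fsync_inode_entry *entry;

	list_for_each_entry(entry, inode_list, list)
		printk_ratelimited(KERN_NOTICE
			"recover_data: ino = %lx (i_size: %s) recovered = %d, err = %d\n",
			entry->inode->i_ino,
			entry->keep_isize ? "keep" : "recover",
			entry->recovered, entry->err);
}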

> 
> Thanks,
> 
>>
>> Thanks,
>>
>>> So I suggest we can keep log as it is in recover_dentry/recover_inode, and 
>>> for
>>> the log in do_recover_data, we can record recovery info [isize_kept,
>>> recovered_count, err ...] into struct fsync_inode_entry, and print them in
>>> batch, how do you think?
>>>
>>> Thanks,
>>>
>>>>
>>>> Signed-off-by: Sahitya Tummala 
>>>> ---
>>>> v2:
>>>>  - fix minor formatting and add new line for printk
>>>>
>>>>  fs/f2fs/recovery.c | 18 +-
>>>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
>>>> index e04f82b..60d7652 100644
>>>> --- a/fs/f2fs/recovery.c
>>>> +++ b/fs/f2fs/recovery.c
>>>> @@ -188,8 +188,8 @@ static int recover_dentry(struct inode *inode, struct 
>>>> page *ipage,
>>>>name = "";
>>>>else
>>>>name = raw_inode->i_name;
>>>> -  f2fs_msg(inode->i_sb, KERN_NOTICE,
>>>> -  "%s: ino = %x, name = %s, dir = %lx, err = %d",
>>>> +  printk_ratelimited(KERN_NOTICE
>>>> +  "%s: ino = %x, name = %s, dir = %lx, err = %d\n",
>>>>__func__, ino_of_node(ipage), name,
>>>>IS_ERR(dir) ? 0 : dir->i_ino, err);
>>>>return err;
>>>> @@ -292,8 +292,8 @@ static int recover_inode(struct inode *inode, struct 
>>>> page *page)
>>>>else
>>>>name = F2FS_INODE(page)->i_name;
>>>>  
>>>> -  f2fs_msg(inode->i_sb, KERN_NOTICE,
>>>> -  "recover_inode: ino = %x, name = %s, inline = %x",
>>>> +  printk_ratelimited(KERN_NOTICE
>>>> +  "recover_inode: ino = %x, name = %s, inline = %x\n",
>>>>ino_of_node(page), name, raw->i_inline);
>>>>return 0;
>>>>  }
>>>> @@ -642,11 +642,11 @@ static int do_recover_data(struct f2fs_sb_info *sbi, 
>>>> struct inode *inode,
>>>>  err:
>>>>  f2fs_put_dnode(&dn);
>>>>  out:
>>>> -  f2fs_msg(sbi->sb, KERN_NOTICE,
>>>> -  "recover_data: ino = %lx (i_size: %s) recovered = %d, err = %d",
>>>> -  inode->i_ino,
>>>> -  file_keep_isize(inode) ? "keep" : "recover",
>>>> -  recovered, err);
>>>> +  printk_ratelimited(KERN_NOTICE
>>>> +  "recover_data: ino = %lx (i_size: %s) recovered = %d, 
>>>> err = %d\n",
>>>> +  inode->i_ino,
>>>> +  file_keep_isize(inode) ? "keep" : "recover",
>>>> +  recovered, err);
>>>>return err;
>>>>  }
>>>>  
>>>>
>>
> 
> 




Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid deadlock of atomic file operations

2019-02-25 Thread Gao Xiang



On 2019/2/25 17:34, Chao Yu wrote:
> Hi Xiang,
> 
> On 2019/2/25 17:25, Gao Xiang wrote:
>> Hi Chao,
>>
>> On 2019/2/25 17:11, Chao Yu wrote:
>>> Thread AThread B
>>> - __fput
>>>  - f2fs_release_file
>>>   - drop_inmem_pages
>>>    - mutex_lock(&fi->inmem_lock)
>>>- __revoke_inmem_pages
>>> - lock_page(page)
>>> - open
>>> - f2fs_setattr
>>> - truncate_setsize
>>>  - truncate_inode_pages_range
>>>   - lock_page(page)
>>>   - truncate_cleanup_page
>>>- f2fs_invalidate_page
>>> - drop_inmem_page
>>> - mutex_lock(&fi->inmem_lock);
>>>
>>> We may encounter above ABBA deadlock as reported by Kyungtae Kim:
>>>
>>> I'm reporting a bug in linux-4.17.19: "INFO: task hung in
>>> drop_inmem_page" (no reproducer)
>>>
>>> I think this might be somehow related to the following:
>>> https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ
>>>
>>> =
>>> INFO: task syz-executor7:10822 blocked for more than 120 seconds.
>>>   Not tainted 4.17.19 #1
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> syz-executor7   D27024 10822   6346 0x0004
>>> Call Trace:
>>>  context_switch kernel/sched/core.c:2867 [inline]
>>>  __schedule+0x721/0x1e60 kernel/sched/core.c:3515
>>>  schedule+0x88/0x1c0 kernel/sched/core.c:3559
>>>  schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
>>>  __mutex_lock_common kernel/locking/mutex.c:833 [inline]
>>>  __mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
>>>  mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
>>>  drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
>>>  f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
>>>  do_invalidatepage mm/truncate.c:165 [inline]
>>>  truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
>>>  truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
>>>  truncate_inode_pages mm/truncate.c:478 [inline]
>>>  truncate_pagecache+0x6d/0x90 mm/truncate.c:801
>>>  truncate_setsize+0x81/0xa0 mm/truncate.c:826
>>>  f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
>>>  notify_change+0xa62/0xe80 fs/attr.c:313
>>>  do_truncate+0x12e/0x1e0 fs/open.c:63
>>>  do_last fs/namei.c:2955 [inline]
>>>  path_openat+0x2042/0x29f0 fs/namei.c:3505
>>>  do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
>>>  do_sys_open+0x35e/0x4e0 fs/open.c:1101
>>>  __do_sys_open fs/open.c:1119 [inline]
>>>  __se_sys_open fs/open.c:1114 [inline]
>>>  __x64_sys_open+0x89/0xc0 fs/open.c:1114
>>>  do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>> RIP: 0033:0x4497b9
>>> RSP: 002b:7f734e459c68 EFLAGS: 0246 ORIG_RAX: 0002
>>> RAX: ffda RBX: 7f734e45a6cc RCX: 004497b9
>>> RDX: 0104 RSI: 000a8280 RDI: 2080
>>> RBP: 0071bea0 R08:  R09: 
>>> R10:  R11: 0246 R12: 
>>> R13: 7230 R14: 006f02d0 R15: 7f734e45a700
>>> INFO: task syz-executor7:10858 blocked for more than 120 seconds.
>>>   Not tainted 4.17.19 #1
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> syz-executor7   D28880 10858   6346 0x0004
>>> Call Trace:
>>>  context_switch kernel/sched/core.c:2867 [inline]
>>>  __schedule+0x721/0x1e60 kernel/sched/core.c:3515
>>>  schedule+0x88/0x1c0 kernel/sched/core.c:3559
>>>  __rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
>>>  rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
>>>  call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
>>>  __down_write arch/x86/include/asm/rwsem.h:142 [inline]
>>>  down_write+0x58/0xa0 kernel/locking/rwsem.c:72
>>>  inode_lock include/linux/fs.h:713 [inline]
>>&

Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid deadlock of atomic file operations

2019-02-25 Thread Gao Xiang


Sorry, please ignore my reply... (if there is no
truncate and commit race...)

On 2019/2/25 17:25, Gao Xiang wrote:
> Hi Chao,
> 
> On 2019/2/25 17:11, Chao Yu wrote:
>> Thread A Thread B
>> - __fput
>>  - f2fs_release_file
>>   - drop_inmem_pages
>>    - mutex_lock(&fi->inmem_lock)
>>- __revoke_inmem_pages
>> - lock_page(page)
>>  - open
>>  - f2fs_setattr
>>  - truncate_setsize
>>   - truncate_inode_pages_range
>>- lock_page(page)
>>- truncate_cleanup_page
>> - f2fs_invalidate_page
>>  - drop_inmem_page
>>  - mutex_lock(&fi->inmem_lock);
>>
>> We may encounter above ABBA deadlock as reported by Kyungtae Kim:
>>
>> I'm reporting a bug in linux-4.17.19: "INFO: task hung in
>> drop_inmem_page" (no reproducer)
>>
>> I think this might be somehow related to the following:
>> https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ
>>
>> =
>> INFO: task syz-executor7:10822 blocked for more than 120 seconds.
>>   Not tainted 4.17.19 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> syz-executor7   D27024 10822   6346 0x0004
>> Call Trace:
>>  context_switch kernel/sched/core.c:2867 [inline]
>>  __schedule+0x721/0x1e60 kernel/sched/core.c:3515
>>  schedule+0x88/0x1c0 kernel/sched/core.c:3559
>>  schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
>>  __mutex_lock_common kernel/locking/mutex.c:833 [inline]
>>  __mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
>>  mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
>>  drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
>>  f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
>>  do_invalidatepage mm/truncate.c:165 [inline]
>>  truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
>>  truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
>>  truncate_inode_pages mm/truncate.c:478 [inline]
>>  truncate_pagecache+0x6d/0x90 mm/truncate.c:801
>>  truncate_setsize+0x81/0xa0 mm/truncate.c:826
>>  f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
>>  notify_change+0xa62/0xe80 fs/attr.c:313
>>  do_truncate+0x12e/0x1e0 fs/open.c:63
>>  do_last fs/namei.c:2955 [inline]
>>  path_openat+0x2042/0x29f0 fs/namei.c:3505
>>  do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
>>  do_sys_open+0x35e/0x4e0 fs/open.c:1101
>>  __do_sys_open fs/open.c:1119 [inline]
>>  __se_sys_open fs/open.c:1114 [inline]
>>  __x64_sys_open+0x89/0xc0 fs/open.c:1114
>>  do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> RIP: 0033:0x4497b9
>> RSP: 002b:7f734e459c68 EFLAGS: 0246 ORIG_RAX: 0002
>> RAX: ffda RBX: 7f734e45a6cc RCX: 004497b9
>> RDX: 0104 RSI: 000a8280 RDI: 2080
>> RBP: 0071bea0 R08:  R09: 
>> R10:  R11: 0246 R12: 
>> R13: 7230 R14: 006f02d0 R15: 7f734e45a700
>> INFO: task syz-executor7:10858 blocked for more than 120 seconds.
>>   Not tainted 4.17.19 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> syz-executor7   D28880 10858   6346 0x0004
>> Call Trace:
>>  context_switch kernel/sched/core.c:2867 [inline]
>>  __schedule+0x721/0x1e60 kernel/sched/core.c:3515
>>  schedule+0x88/0x1c0 kernel/sched/core.c:3559
>>  __rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
>>  rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
>>  call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
>>  __down_write arch/x86/include/asm/rwsem.h:142 [inline]
>>  down_write+0x58/0xa0 kernel/locking/rwsem.c:72
>>  inode_lock include/linux/fs.h:713 [inline]
>>  do_truncate+0x120/0x1e0 fs/open.c:61
>>  do_last fs/namei.c:2955 [inline]
>>  path_openat+0x2042/0x29f0 fs/namei.c:3505
>>  do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
>>  do_sys_open+0x35e/0x4e0 fs/open.c:1101
>>  __do_sys_open fs/open.c:1119 [inline]
>>  __se_sys_open

Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid deadlock of atomic file operations

2019-02-25 Thread Gao Xiang
000 R09: 
> R10:  R11: 0246 R12: 
> R13: 7230 R14: 006f02d0 R15: 7f734e3b5700
> INFO: task syz-executor5:10829 blocked for more than 120 seconds.
>   Not tainted 4.17.19 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor5   D28760 10829   6308 0x8002
> Call Trace:
>  context_switch kernel/sched/core.c:2867 [inline]
>  __schedule+0x721/0x1e60 kernel/sched/core.c:3515
>  schedule+0x88/0x1c0 kernel/sched/core.c:3559
>  io_schedule+0x21/0x80 kernel/sched/core.c:5179
>  wait_on_page_bit_common mm/filemap.c:1100 [inline]
>  __lock_page+0x2b5/0x390 mm/filemap.c:1273
>  lock_page include/linux/pagemap.h:483 [inline]
>  __revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231
>  drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306
>  f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556
>  __fput+0x2c7/0x780 fs/file_table.c:209
>  fput+0x1a/0x20 fs/file_table.c:243
>  task_work_run+0x151/0x1d0 kernel/task_work.c:113
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0x8ba/0x30a0 kernel/exit.c:865
>  do_group_exit+0x13b/0x3a0 kernel/exit.c:968
>  get_signal+0x6bb/0x1650 kernel/signal.c:2482
>  do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810
>  exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162
>  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>  do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4497b9
> RSP: 002b:7f1c68e74ce8 EFLAGS: 0246 ORIG_RAX: 00ca
> RAX: fe00 RBX: 0071bf80 RCX: 004497b9
> RDX:  RSI:  RDI: 0071bf80
> RBP: 0071bf80 R08:  R09: 0071bf58
> R10:  R11: 0246 R12: 
> R13:  R14: 7f1c68e759c0 R15: 7f1c68e75700
> 
> This patch tries to use trylock_page to mitigate such deadlock condition
> for fix.
> 
> Signed-off-by: Chao Yu 
> ---
> v2:
> - fix wrong mutex_unlock position.
>  fs/f2fs/segment.c | 43 +++
>  1 file changed, 31 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index fdd8cd21522f..ca786913b2c6 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -215,7 +215,8 @@ void f2fs_register_inmem_page(struct inode *inode, struct 
> page *page)
>  }
>  
>  static int __revoke_inmem_pages(struct inode *inode,
> - struct list_head *head, bool drop, bool recover)
> + struct list_head *head, bool drop, bool recover,
> + bool trylock)
>  {
>   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>   struct inmem_pages *cur, *tmp;
> @@ -227,7 +228,16 @@ static int __revoke_inmem_pages(struct inode *inode,
>   if (drop)
>   trace_f2fs_commit_inmem_page(page, INMEM_DROP);
>  
> - lock_page(page);
> + if (trylock) {
> + /*
> +  * to avoid deadlock in between page lock and
> +  * inmem_lock.
> +  */
> + if (!trylock_page(page))
> + continue;

Will it cause a memory leak, since revoke_list is a temporary local linked list...
Is there a better way than simply skipping it?

Thanks,
Gao Xiang

> + } else {
> + lock_page(page);
> + }
>  
>   f2fs_wait_on_page_writeback(page, DATA, true, true);
>  
> @@ -318,13 +328,19 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>   struct f2fs_inode_info *fi = F2FS_I(inode);
>  
> - mutex_lock(&fi->inmem_lock);
> - __revoke_inmem_pages(inode, &fi->inmem_pages, true, false);
> - spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> - if (!list_empty(&fi->inmem_ilist))
> - list_del_init(&fi->inmem_ilist);
> - spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> - mutex_unlock(&fi->inmem_lock);
> + while (!list_empty(&fi->inmem_pages)) {
> + mutex_lock(&fi->inmem_lock);
> + __revoke_inmem_pages(inode, &fi->inmem_pages,
> + true, false, true);
> +
> + if (list_empty(&fi->inmem_pages)) {
> + spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
> + if (!list_empty(&fi->inmem_ilist))
&g

[f2fs-dev] [PATCH] f2fs: no need to take page lock in readdir

2019-02-20 Thread Gao Xiang
VFS will take inode_lock for readdir, therefore no need to
take page lock in readdir at all just as the majority of
other generic filesystems.

This patch improves concurrency since .iterate_shared
was introduced to VFS years ago.

Signed-off-by: Gao Xiang 
---

 I personally tend to use read_mapping_page here, but it seems
 that f2fs has kept some customized code since it was merged
 into Linux, so use f2fs_find_data_page instead.

 fs/f2fs/dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index ecc3a4e2be96..64602bc1e092 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -873,7 +873,7 @@ static int f2fs_readdir(struct file *file, struct 
dir_context *ctx)
page_cache_sync_readahead(inode->i_mapping, ra, file, n,
min(npages - n, (pgoff_t)MAX_DIR_RA_PAGES));
 
-   dentry_page = f2fs_get_lock_data_page(inode, n, false);
+   dentry_page = f2fs_find_data_page(inode, n);
if (IS_ERR(dentry_page)) {
err = PTR_ERR(dentry_page);
if (err == -ENOENT) {
@@ -891,11 +891,11 @@ static int f2fs_readdir(struct file *file, struct 
dir_context *ctx)
		err = f2fs_fill_dentries(ctx, &d,
					n * NR_DENTRY_IN_BLOCK, &fstr);
if (err) {
-   f2fs_put_page(dentry_page, 1);
+   f2fs_put_page(dentry_page, 0);
break;
}
 
-   f2fs_put_page(dentry_page, 1);
+   f2fs_put_page(dentry_page, 0);
}
 out_free:
	fscrypt_fname_free_buffer(&fstr);
-- 
2.14.4





[f2fs-dev] [PATCH] f2fs: silence VM_WARN_ON_ONCE in mempool_alloc

2019-02-18 Thread Gao Xiang
Note that __GFP_ZERO is not supported for mempool_alloc,
which also documented in the mempool_alloc comments.

Signed-off-by: Gao Xiang 
---
 fs/f2fs/data.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index f91d8630c9a2..83c14b31aaba 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -299,9 +299,10 @@ static inline void __submit_bio(struct f2fs_sb_info *sbi,
for (; start < F2FS_IO_SIZE(sbi); start++) {
struct page *page =
mempool_alloc(sbi->write_io_dummy,
-   GFP_NOIO | __GFP_ZERO | __GFP_NOFAIL);
+ GFP_NOIO | __GFP_NOFAIL);
f2fs_bug_on(sbi, !page);
 
+   zero_user_segment(page, 0, PAGE_SIZE);
SetPagePrivate(page);
set_page_private(page, (unsigned 
long)DUMMY_WRITTEN_PAGE);
lock_page(page);
-- 
2.14.4





[f2fs-dev] [PATCH] f2fs: use xattr_prefix to wrap up

2019-01-25 Thread Gao Xiang
Let's use xattr_prefix instead of open code.
No logic changes.

Signed-off-by: Gao Xiang 
---
 fs/f2fs/xattr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
index 18d5ffbc5e8c..fa620d31ea5f 100644
--- a/fs/f2fs/xattr.c
+++ b/fs/f2fs/xattr.c
@@ -538,7 +538,7 @@ ssize_t f2fs_listxattr(struct dentry *dentry, char *buffer, 
size_t buffer_size)
if (!handler || (handler->list && !handler->list(dentry)))
continue;
 
-   prefix = handler->prefix ?: handler->name;
+   prefix = xattr_prefix(handler);
prefix_len = strlen(prefix);
size = prefix_len + entry->e_name_len + 1;
if (buffer) {
-- 
2.14.4





Re: [f2fs-dev] [RFC PATCH 02/10] fs-verity: add data verification hooks for ->readpages()

2018-08-26 Thread Gao Xiang via Linux-f2fs-devel
Hi Eric,

On 2018/8/27 1:04, Eric Biggers wrote:
> Hi Chuck,
> 
> On Sun, Aug 26, 2018 at 11:55:57AM -0400, Chuck Lever wrote:
>>> +
>>> +/**
>>> + * fsverity_verify_page - verify a data page
>>> + *
>>> + * Verify a page that has just been read from a file against that file's 
>>> Merkle
>>> + * tree.  The page is assumed to be a pagecache page.
>>> + *
>>> + * Return: true if the page is valid, else false.
>>> + */
>>> +bool fsverity_verify_page(struct page *data_page)
>>> +{
>>> +   struct inode *inode = data_page->mapping->host;
>>> +   const struct fsverity_info *vi = get_fsverity_info(inode);
>>> +   struct ahash_request *req;
>>> +   bool valid;
>>> +
>>> +   req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);

Some minor suggestions occurred to me after I looked at this part of the
code again before sleeping...

1) How about introducing an iterator callback to avoid too many
ahash_request_alloc and ahash_request_free calls? (It could be called for
many pages and end up somewhat slower than fsverity_verify_bio... a rough
sketch follows below.)

2) How about adding a gfp_t input argument, since I don't know whether
GFP_KERNEL is suitable for all use cases...

It seems there could be more fsverity_verify_page users as well as
fsverity_verify_bio ;)
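
For instance, the iterator-style helper I mean might look roughly like
this (the name and callback shape are hypothetical, not part of this
patchset; it also folds in the gfp_t point from 2) above):

/* Hypothetical sketch: allocate the hash request once and reuse it for
 * every page the caller hands us, instead of alloc/free per page. */
bool fsverity_verify_pages(struct inode *inode, gfp_t gfp_flags,
			   struct page *(*next_page)(void *ctx), void *ctx)
{
	const struct fsverity_info *vi = get_fsverity_info(inode);
	struct ahash_request *req;
	struct page *page;
	bool valid = true;

	req = ahash_request_alloc(vi->hash_alg->tfm, gfp_flags);
	if (unlikely(!req))
		return false;

	while ((page = next_page(ctx)) != NULL)
		valid &= verify_page(inode, vi, req, page);

	ahash_request_free(req);
	return valid;
}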

Sorry for interruption...

Thanks,
Gao Xiang

>>> +   if (unlikely(!req))
>>> +   return false;
>>> +
>>> +   valid = verify_page(inode, vi, req, data_page);
>>> +
>>> +   ahash_request_free(req);
>>> +
>>> +   return valid;
>>> +}
>>> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
>>> +
>>> +/**
>>> + * fsverity_verify_bio - verify a 'read' bio that has just completed
>>> + *
>>> + * Verify a set of pages that have just been read from a file against that
>>> + * file's Merkle tree.  The pages are assumed to be pagecache pages.  
>>> Pages that
>>> + * fail verification are set to the Error state.  Verification is skipped 
>>> for
>>> + * pages already in the Error state, e.g. due to fscrypt decryption 
>>> failure.
>>> + */
>>> +void fsverity_verify_bio(struct bio *bio)
>>
>> Hi Eric-
>>
>> This kind of API won't work for remote filesystems, which do not use
>> "struct bio" to do their I/O. Could a remote filesystem solely use
>> fsverity_verify_page instead?
>>
> 
> Yes, filesystems don't have to use fsverity_verify_bio().  They can call
> fsverity_verify_page() on each page instead.  I will clarify this in the next
> revision of the patchset.
> 
> - Eric
> 



Re: [f2fs-dev] [RFC PATCH 02/10] fs-verity: add data verification hooks for ->readpages()

2018-08-26 Thread Gao Xiang via Linux-f2fs-devel
Hi Ted,

Sorry for the late reply...

On 2018/8/26 1:06, Theodore Y. Ts'o wrote:
> On Sat, Aug 25, 2018 at 03:43:43PM +0800, Gao Xiang wrote:
>>> I don't know of any plan to use fs-verity on Android's system partition or 
>>> to
>>> replace dm-verity on the system partition.  The use cases so far have been
>>> verifying files on /data, like APK files.
>>>
>>> So I don't think you need to support fs-verity in EROFS.
>>
>> Thanks for your information about fs-verity, that is quite useful for us
>> Actually, I was worrying about that these months...  :)
> 
> I'll be even clearer --- I can't *imagine* any situation where it
> would make sense to use fs-verity on the Android system partition.
> Remember, for OTA to work the system image has to be bit-for-bit
> identical to the official golden image for that release.  So the
> system image has to be completely locked down from any modification
> (to data or metadata), and that means dm-verity and *NOT* fs-verity.

I think so mainly because of the security reason you said above.

In addition, I think it is mandatory that the Android system partition
should also _never_ suffer from filesystem corruption by design (except
for storage device corruption or malware). Therefore I think the
bit-for-bit read-only and identical-verity requirement is quite strong
for Android, and it will make the Android system steady and as solid as
a rock.

But I still need to think my personal thoughts on this topic through. :)

> 
> The initial use of fs-verity (as you can see if you look at AOSP) will
> be to protect a small number of privileged APK's that are stored on
> the data partition.  Previously, they were verified when they were
> downloaded, and never again.
> 
> Part of the goal which we are trying to achieve here is that even if
> the kernel gets compromised by a 0-day, a successful reboot should
> restore the system to a known state.  That is, the secure bootloader
> checks the signature of the kernel, and then in turn, dm-verity will
> verify the root Merkle hash protecting the system partition, and
> fs-verity will protect the privileged APK's.  If malware modifies any
> these components in an attempt to be persistent, the modifications
> would be detected, and the worst it could do is to cause subsequent
> reboots to fail until the phone's software could be reflashed.
> 

Yeah, I have seen the fs-verity presentation and materials from
Android bootcamp and other official channels before.


Thanks for your kind and detailed explanation. :)


Best regards,
Gao Xiang

> Cheers,
> 
>   - Ted
> 



Re: [f2fs-dev] [RFC PATCH 02/10] fs-verity: add data verification hooks for ->readpages()

2018-08-25 Thread Gao Xiang
Hi Ted,

Please ignore the following email; Eric has replied to me. :)
I need to dig into these fs-verity patches later. Best wishes to fs-verity.

Thanks,
Gao Xiang

On 2018/8/25 15:33, Gao Xiang wrote:
> Hi Ted,
> 
> Thanks for your detailed reply. Sorry about my English; my wording may
> not be logical.
> 
> Composing a page from tiny pieces in a B-tree is too far off for us too,
> and you are right: fs-verity covers >99% of cases for existing
> filesystems, and there is no need to worry about that currently.
> 
> As I mentioned in my reply to Eric, I am actually curious about the
> Google fs-verity roadmap for future Android. I need to work out whether
> it is limited to APKs on the read-write partitions and is not meant to
> replace dm-verity in the near future, since fs-verity has some conflicts
> with the EROFS work I mentioned in the email to Eric.
> 
> I think it is about more than just handling FILE_MAPPING and being
> bio-strict for the compression use case.
> 
> On 2018/8/25 13:06, Theodore Y. Ts'o wrote:
>> But I'd suggest worrying about it when such a file system
>> comes out of the woodwork, and someone is willing to do the work to
>> integrate fsverity in that file system.
>>
> Yes, we are now handling partial pages due to compression use.
> 
> A filesystem could submit bios whose pages come from different mappings
> (FILE_MAPPING [compress in-place, without caching the compressed page,
> to reduce extra memory overhead] or META_MAPPING [for caching the
> compressed page]), and they could be decompressed into many full pages
> and (possibly) a partial page (in-place or out-of-place).
> 
> So in principle, since we have the BIO_MAX_PAGES limitation, a filemap
> page could become Uptodate only after two bios have ended and been
> decompressed; other runtime limitations could also divide a bio into two
> bios for encoded cases.
> 
> Therefore, I think in that case we cannot just consider FILE_MAPPING and
> one bio, and as you said, `In that case, it could call fsverity after
> assembling the page in the page cache.' should be done in this way.
> 
>> Well, the userspace interface for instantiating a fs-verity file is
>> that it writes the file data with the fs-verity metadata (which
>> consists of the Merkle tree with a fs-verity header at the end of the
>> file).  The program (which might be a package manager such as dpkg or
>> rpm) would then call an ioctl which would cause the file system to
>> read the fs-verity header and make only the file data visible, and the
>> file system would the verify the data as it is read into the page
>> cache.
> Thanks for your reply again; I think fs-verity is good enough for now.
> However, I need to think fs-verity itself over more... :(
> 
> Thanks,
> Gao Xiang



Re: [f2fs-dev] [RFC PATCH 02/10] fs-verity: add data verification hooks for ->readpages()

2018-08-25 Thread Gao Xiang
Hi Eric,

Thanks for your detailed reply.

My English is not very good: I cannot type as logically and quickly as you
and may use some words improperly. I just want to express my personal
concern; please understand, thanks. :)

On 2018/8/25 12:16, Eric Biggers wrote:
> Hi Gao,
> 
> On Sat, Aug 25, 2018 at 10:29:26AM +0800, Gao Xiang wrote:
>> Hi,
>>
>> On 2018/8/25 0:16, Eric Biggers wrote:
>>> +/**
>>> + * fsverity_verify_page - verify a data page
>>> + *
>>> + * Verify a page that has just been read from a file against that file's 
>>> Merkle
>>> + * tree.  The page is assumed to be a pagecache page.
>>> + *
>>> + * Return: true if the page is valid, else false.
>>> + */
>>> +bool fsverity_verify_page(struct page *data_page)
>>> +{
>>> +   struct inode *inode = data_page->mapping->host;
>>> +   const struct fsverity_info *vi = get_fsverity_info(inode);
>>> +   struct ahash_request *req;
>>> +   bool valid;
>>> +
>>> +   req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
>>> +   if (unlikely(!req))
>>> +   return false;
>>> +
>>> +   valid = verify_page(inode, vi, req, data_page);
>>> +
>>> +   ahash_request_free(req);
>>> +
>>> +   return valid;
>>> +}
>>> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
>>> +
>>> +/**
>>> + * fsverity_verify_bio - verify a 'read' bio that has just completed
>>> + *
>>> + * Verify a set of pages that have just been read from a file against that
>>> + * file's Merkle tree.  The pages are assumed to be pagecache pages.  
>>> Pages that
>>> + * fail verification are set to the Error state.  Verification is skipped 
>>> for
>>> + * pages already in the Error state, e.g. due to fscrypt decryption 
>>> failure.
>>> + */
>>> +void fsverity_verify_bio(struct bio *bio)
>>> +{
>>> +   struct inode *inode = bio_first_page_all(bio)->mapping->host;
>>> +   const struct fsverity_info *vi = get_fsverity_info(inode);
>>> +   struct ahash_request *req;
>>> +   struct bio_vec *bv;
>>> +   int i;
>>> +
>>> +   req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
>>> +   if (unlikely(!req)) {
>>> +   bio_for_each_segment_all(bv, bio, i)
>>> +   SetPageError(bv->bv_page);
>>> +   return;
>>> +   }
>>> +
>>> +   bio_for_each_segment_all(bv, bio, i) {
>>> +   struct page *page = bv->bv_page;
>>> +
>>> +   if (!PageError(page) && !verify_page(inode, vi, req, page))
>>> +   SetPageError(page);
>>> +   }
>>> +
>>> +   ahash_request_free(req);
>>> +}
>>> +EXPORT_SYMBOL_GPL(fsverity_verify_bio);
>>
>> Out of curiosity, I quickly scanned the fs-verity source code and some minor 
>> question out there
>>
>> If something is wrong, please point out, thanks in advance...
>>
>> My first question is that 'Is there any way to skip to verify pages in a 
>> bio?'
>> I am thinking about
>> If metadata and data page are mixed in a filesystem of such kind, they could 
>> submit together in a bio, but metadata could be unsuitable for such kind of 
>> verification.
>>
> 
> Pages below i_size are verified, pages above are not.
> 
> With my patches, ext4 and f2fs won't actually submit pages in both areas in 
> the
> same bio, and they won't call the fs-verity verification function for bios in
> the data area.  But even if they did, there's also a check in verify_page() 
> that

I think you mean the hash area?
Yes, I understand your design. It is a wonderful job for ext4/f2fs for now as 
Ted said.

> skips the verification if the page is above i_size.
>

I think it may not be as simple as you said for all cases.

If some filesystem submits contiguous accesses with different mappings
(something like mixed FILE_MAPPING and META_MAPPING), their page->index
is actually unreliable (it could be a logical page index for
FILE_MAPPING, and a physical page index for META_MAPPING), and data is
organized by design across multiple bios for fs-specific use (such as
compression).

You couldn't do such verification `if the page is above i_size', and it
could be hard to integrate somehow.

>> The second question is related to the first question --- 'Is there any way 
>> to verify a partial page?'
>> Take scalability into consideration, some files could be totally inlined or 
>> partially inlined in m

Re: [f2fs-dev] [RFC PATCH 02/10] fs-verity: add data verification hooks for ->readpages()

2018-08-24 Thread Gao Xiang
Hi Ted,

On 2018/8/25 11:45, Theodore Y. Ts'o wrote:
> On Sat, Aug 25, 2018 at 10:29:26AM +0800, Gao Xiang wrote:
>> My first question is that 'Is there any way to skip to verify pages in a 
>> bio?'
>> I am thinking about
>> If metadata and data page are mixed in a filesystem of such kind, they could 
>> submit together in a bio, but metadata could be unsuitable for such kind of 
>> verification.
>>
>> The second question is related to the first question --- 'Is there any way 
>> to verify a partial page?'
>> Take scalability into consideration, some files could be totally inlined or 
>> partially inlined in metadata.
>> Is there any way to deal with them in per-file approach? at least --- 
>> support for the interface?
> A requirement of both fscrypt and fsverity is that is that block size
> == page size, and that all data is stored in blocks.  Inline data is
> not supported.
> 
> The files that are intended for use with fsverity are large files
> (such as APK files), so optimizing for files smaller than a block was
> not a design goal.

Thanks for your quick reply. :)

I had seen the background of why Google/Android introduced fs-verity before.

> 

But I have some considerations about the current implementation (if it
is suitable to discuss, thanks...)

1) Since it is a libfs-like library, I think being bio-strict is too
strict for its future fs users.

bios could already be organized in a filesystem-specific way, which could
include some other pages that do not need to be verified.

To give an example: some filesystem may organize its bios for
decompression, and some data may live in metadata. It could be hard for
such a filesystem to use this libfs-like fsverity interface.

2) My last question,
"At last, I hope filesystems could select the on-disk position of the
hash tree and 'struct fsverity_descriptor' rather than having them fixed
at the end of verity files... I think it would be better if fs-verity
prepared such support and interfaces.",

is also about files partially or totally encoded (e.g. compressed, or
whatever...).

I think the hash tree does not need to be compressed... so I think it
would be better if its placement could be selected by users (filesystems,
of course); a rough sketch of what I mean follows.
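
Something like the following, just to illustrate the shape (the struct
and both hooks are hypothetical; nothing like this exists in the current
patchset):

/* Let each filesystem tell fs-verity where the Merkle tree and the
 * descriptor live, instead of assuming "right after i_size". */
struct fsverity_operations {
	/* where the verity metadata of this inode is stored */
	loff_t (*get_metadata_pos)(struct inode *inode);
	/* read one page of the Merkle tree, wherever it is kept */
	struct page *(*read_merkle_tree_page)(struct inode *inode,
					      pgoff_t index);
};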

Thanks,
Gao Xiang.

>   - Ted



Re: [f2fs-dev] [RFC PATCH 02/10] fs-verity: add data verification hooks for ->readpages()

2018-08-24 Thread Gao Xiang
Hi,

On 2018/8/25 0:16, Eric Biggers wrote:
> +/**
> + * fsverity_verify_page - verify a data page
> + *
> + * Verify a page that has just been read from a file against that file's 
> Merkle
> + * tree.  The page is assumed to be a pagecache page.
> + *
> + * Return: true if the page is valid, else false.
> + */
> +bool fsverity_verify_page(struct page *data_page)
> +{
> + struct inode *inode = data_page->mapping->host;
> + const struct fsverity_info *vi = get_fsverity_info(inode);
> + struct ahash_request *req;
> + bool valid;
> +
> + req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> + if (unlikely(!req))
> + return false;
> +
> + valid = verify_page(inode, vi, req, data_page);
> +
> + ahash_request_free(req);
> +
> + return valid;
> +}
> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
> +
> +/**
> + * fsverity_verify_bio - verify a 'read' bio that has just completed
> + *
> + * Verify a set of pages that have just been read from a file against that
> + * file's Merkle tree.  The pages are assumed to be pagecache pages.  Pages 
> that
> + * fail verification are set to the Error state.  Verification is skipped for
> + * pages already in the Error state, e.g. due to fscrypt decryption failure.
> + */
> +void fsverity_verify_bio(struct bio *bio)
> +{
> + struct inode *inode = bio_first_page_all(bio)->mapping->host;
> + const struct fsverity_info *vi = get_fsverity_info(inode);
> + struct ahash_request *req;
> + struct bio_vec *bv;
> + int i;
> +
> + req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> + if (unlikely(!req)) {
> + bio_for_each_segment_all(bv, bio, i)
> + SetPageError(bv->bv_page);
> + return;
> + }
> +
> + bio_for_each_segment_all(bv, bio, i) {
> + struct page *page = bv->bv_page;
> +
> + if (!PageError(page) && !verify_page(inode, vi, req, page))
> + SetPageError(page);
> + }
> +
> + ahash_request_free(req);
> +}
> +EXPORT_SYMBOL_GPL(fsverity_verify_bio);

Out of curiosity, I quickly scanned the fs-verity source code and have
some minor questions.

If something is wrong, please point it out; thanks in advance...

My first question is: 'Is there any way to skip verifying some pages in
a bio?' I am thinking that if metadata and data pages are mixed in a
filesystem of such a kind, they could be submitted together in one bio,
but the metadata could be unsuitable for this kind of verification.

The second question is related to the first one --- 'Is there any way to
verify a partial page?' Taking scalability into consideration, some files
could be totally or partially inlined in metadata. Is there any way to
deal with them in a per-file approach? At least --- support in the
interface?

At last, I hope filesystems could select the on-disk position of the
hash tree and 'struct fsverity_descriptor' rather than having them fixed
at the end of verity files... I think it would be better if fs-verity
prepared such support and interfaces. hmmm... :(

Thanks,
Gao Xiang




Re: [f2fs-dev] [PATCH] f2fs: avoid the global name 'fault_name'

2018-06-30 Thread Gao Xiang


Oh, I just found the same patch written by Chao at:
https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=f2fs-dev

I was just fixing the erofs name conflicts, so I fixed f2fs as well.
Please ignore this one.

Thanks, 
Gao Xiang

On 2018/6/30 23:57, Gao Xiang wrote:
> Non-prefix global name 'fault_name' will pollute global
> namespace, fix it.
> 
> Refer to:
> https://lists.01.org/pipermail/kbuild-all/2018-June/049660.html
> 
> To: Jaegeuk Kim 
> To: Chao Yu 
> Cc: linux-f2fs-devel@lists.sourceforge.net
> Cc: linux-ker...@vger.kernel.org
> Reported-by: kbuild test robot 
> Signed-off-by: Gao Xiang 
> ---
>  fs/f2fs/f2fs.h  | 4 ++--
>  fs/f2fs/super.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4d8b1de..11a2e09 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -65,7 +65,7 @@ struct f2fs_fault_info {
>   unsigned int inject_type;
>  };
>  
> -extern char *fault_name[FAULT_MAX];
> +extern char *f2fs_fault_name[FAULT_MAX];
>  #define IS_FAULT_SET(fi, type) ((fi)->inject_type & (1 << (type)))
>  #endif
>  
> @@ -1279,7 +1279,7 @@ struct f2fs_sb_info {
>  #ifdef CONFIG_F2FS_FAULT_INJECTION
>  #define f2fs_show_injection_info(type)   \
>   printk("%sF2FS-fs : inject %s in %s of %pF\n",  \
> - KERN_INFO, fault_name[type],\
> + KERN_INFO, f2fs_fault_name[type],   \
>   __func__, __builtin_return_address(0))
>  static inline bool time_to_inject(struct f2fs_sb_info *sbi, int type)
>  {
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 3995e92..df12dcd 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -41,7 +41,7 @@
>  
>  #ifdef CONFIG_F2FS_FAULT_INJECTION
>  
> -char *fault_name[FAULT_MAX] = {
> +char *f2fs_fault_name[FAULT_MAX] = {
>   [FAULT_KMALLOC] = "kmalloc",
>   [FAULT_KVMALLOC]= "kvmalloc",
>   [FAULT_PAGE_ALLOC]  = "page alloc",
> 



[f2fs-dev] [PATCH] f2fs: avoid the global name 'fault_name'

2018-06-30 Thread Gao Xiang
The non-prefixed global name 'fault_name' pollutes the global
namespace; fix it.

Refer to:
https://lists.01.org/pipermail/kbuild-all/2018-June/049660.html

To: Jaegeuk Kim 
To: Chao Yu 
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-ker...@vger.kernel.org
Reported-by: kbuild test robot 
Signed-off-by: Gao Xiang 
---
 fs/f2fs/f2fs.h  | 4 ++--
 fs/f2fs/super.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4d8b1de..11a2e09 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -65,7 +65,7 @@ struct f2fs_fault_info {
unsigned int inject_type;
 };
 
-extern char *fault_name[FAULT_MAX];
+extern char *f2fs_fault_name[FAULT_MAX];
 #define IS_FAULT_SET(fi, type) ((fi)->inject_type & (1 << (type)))
 #endif
 
@@ -1279,7 +1279,7 @@ struct f2fs_sb_info {
 #ifdef CONFIG_F2FS_FAULT_INJECTION
 #define f2fs_show_injection_info(type) \
printk("%sF2FS-fs : inject %s in %s of %pF\n",  \
-   KERN_INFO, fault_name[type],\
+   KERN_INFO, f2fs_fault_name[type],   \
__func__, __builtin_return_address(0))
 static inline bool time_to_inject(struct f2fs_sb_info *sbi, int type)
 {
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 3995e92..df12dcd 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -41,7 +41,7 @@
 
 #ifdef CONFIG_F2FS_FAULT_INJECTION
 
-char *fault_name[FAULT_MAX] = {
+char *f2fs_fault_name[FAULT_MAX] = {
[FAULT_KMALLOC] = "kmalloc",
[FAULT_KVMALLOC]= "kvmalloc",
[FAULT_PAGE_ALLOC]  = "page alloc",
-- 
1.9.1




Re: [f2fs-dev] [RESEND PATCH] f2fs: no need to take the address of the array of sb->s_uuid

2018-04-08 Thread Gao Xiang
Hi Chao and Jaegeuk,

On 2018/4/8 20:16, Chao Yu wrote:
> On 2018/4/5 11:58, Gao Xiang wrote:
>> Keep in line with the common case since it is some weird
>> to take the address of an array again.
> 
> I encounter compile error after applying this patch:
> 
> super.c: In function ‘f2fs_fill_super’:
> super.c:2711:2: error: incompatible type for argument 1 of ‘__builtin_memcpy’
>   memcpy(sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
>   ^
> super.c:2711:2: note: expected ‘void *’ but argument is of type ‘uuid_t’
> 
> Anyway, we need '&' due to sb::s_uuid is a structure object instead of an 
> array.
> 
> Thanks,
> 
Sorry, I'm developing another feature related to it, but I checked the
source code for sb->s_uuid from 3.x to 4.9:
https://elixir.bootlin.com/linux/v4.9.93/source/include/linux/fs.h#L1382

It seems that the latest kernel changed it from u8 s_uuid[16] to uuid_t s_uuid.
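
In other words, roughly (a sketch only, assuming uuid_t wraps a 16-byte
array as in recent kernels):

/* old: u8 s_uuid[16];  -> sb->s_uuid decays to a pointer, '&' optional
 * new: uuid_t s_uuid;  -> only &sb->s_uuid (the struct address) works */
typedef struct { unsigned char b[16]; } uuid_sketch_t;

static void copy_uuid(uuid_sketch_t *dst, const unsigned char src[16])
{
	/* with uuid_t, memcpy(sb->s_uuid, ...) no longer compiles, so
	 * the original memcpy(&sb->s_uuid, ...) has to stay as-is */
	memcpy(dst, src, sizeof(dst->b));
}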

Sorry for the noise; please ignore this patch.

Thanks,

>>
>> Signed-off-by: Gao Xiang <gaoxian...@huawei.com>
>> ---
>> fix auto-wrapping of email client
>>
>>  fs/f2fs/super.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> index 9587ca0..4d467c7 100644
>> --- a/fs/f2fs/super.c
>> +++ b/fs/f2fs/super.c
>> @@ -2565,7 +2565,7 @@ static int f2fs_fill_super(struct super_block *sb, 
>> void *data, int silent)
>>  sb->s_time_gran = 1;
>>  sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
>>  (test_opt(sbi, POSIX_ACL) ? SB_POSIXACL : 0);
>> -memcpy(&sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
>> +memcpy(sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
>>  sb->s_iflags |= SB_I_CGROUPWB;
>>  
>>  /* init f2fs-specific super block info */
>>



Re: [f2fs-dev] [PATCH] f2fs: no need to take the address of the array of sb->s_uuid

2018-04-04 Thread Gao Xiang
Hi Jaegeuk,

On 2018/4/5 11:54, Jaegeuk Kim wrote:
> Hi Gao,
> 
> Could you please check your email settings?
> It's broken.
> 
> Thanks,

Sorry for the bother with the previous email...
My business email client was just in a mess. :(

Thanks,

> 
> On 04/05, Gao Xiang wrote:
>> Keep in line with the common case since it is some weird
>> to take the address of an array again.
>>
>> Signed-off-by: Gao Xiang <gaoxian...@huawei.com>
>> ---
>>  fs/f2fs/super.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> index 42d564c5ccd0..f864ab702fa2 100644
>> --- a/fs/f2fs/super.c
>> +++ b/fs/f2fs/super.c
>> @@ -2701,7 +2701,7 @@ static int f2fs_fill_super(struct super_block *sb,
>> void *data, int silent)
>>  sb->s_time_gran = 1;
>>  sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
>>  (test_opt(sbi, POSIX_ACL) ? SB_POSIXACL : 0);
>> -memcpy(&sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
>> +memcpy(sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
>>  sb->s_iflags |= SB_I_CGROUPWB;
>>  /* init f2fs-specific super block info */
>> -- 
>> 2.12.2



[f2fs-dev] [RESEND PATCH] f2fs: no need to take the address of the array of sb->s_uuid

2018-04-04 Thread Gao Xiang
Keep in line with the common case, since it is somewhat weird
to take the address of an array again.

Signed-off-by: Gao Xiang <gaoxian...@huawei.com>
---
fix auto-wrapping of email client

 fs/f2fs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 9587ca0..4d467c7 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2565,7 +2565,7 @@ static int f2fs_fill_super(struct super_block *sb, void 
*data, int silent)
sb->s_time_gran = 1;
sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
(test_opt(sbi, POSIX_ACL) ? SB_POSIXACL : 0);
-   memcpy(&sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
+   memcpy(sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
sb->s_iflags |= SB_I_CGROUPWB;
 
/* init f2fs-specific super block info */
-- 
2.1.4




[f2fs-dev] [PATCH] f2fs: no need to take the address of the array of sb->s_uuid

2018-04-04 Thread Gao Xiang

Keep in line with the common case, since it is somewhat weird
to take the address of an array again.

Signed-off-by: Gao Xiang <gaoxian...@huawei.com>
---
 fs/f2fs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 42d564c5ccd0..f864ab702fa2 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2701,7 +2701,7 @@ static int f2fs_fill_super(struct super_block *sb, 
void *data, int silent)

sb->s_time_gran = 1;
sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
(test_opt(sbi, POSIX_ACL) ? SB_POSIXACL : 0);
-   memcpy(&sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
+   memcpy(sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
sb->s_iflags |= SB_I_CGROUPWB;
/* init f2fs-specific super block info */
--
2.12.2




[f2fs-dev] [PATCH RFC v5] f2fs: flush cp pack except cp pack 2 page at first

2018-02-09 Thread Gao Xiang via Linux-f2fs-devel
Previously, we attempted to flush the whole cp pack in a single bio;
however, on a sudden power-off at this point, we could get into an
extreme scenario where the cp pack 1 page and cp pack 2 page are updated
and latest, but the payload or current summaries are still partially
outdated. (See reliable write in the UFS specification.)

This patch submits the whole cp pack except the cp pack 2 page first,
and then writes the cp pack 2 page with an extra independent bio with a
pre-io barrier.

Signed-off-by: Gao Xiang <gaoxian...@huawei.com>
Reviewed-by: Chao Yu <yuch...@huawei.com>
---
Change log from v4:
  - remove redundant "filemap_fdatawait_range"
  - filemap_fdatawait_range(NODE_MAPPING(sbi), 0, LLONG_MAX);
  - filemap_fdatawait_range(META_MAPPING(sbi), 0, LLONG_MAX);
  - move f2fs_flush_device_cache to a more suitable position
  - wait_on_all_pages_writeback after commit_checkpoint
  - since we remove lots of redundant code, I think it's acceptable
  - and it will ensure one checkpoint safety.
Change log from v3:
  - further review comments are applied from Jaegeuk and Chao
  - Tested on this patch (without multiple-device): mount, boot Android with 
f2fs userdata and make fragment
  - If any problem with this patch or I miss something, please kindly share 
your comments, thanks :)
Change log from v2:
  - Apply the review comments from Chao
Change log from v1:
  - Apply the review comments from Chao
  - time data from "finish block_ops" to " finish checkpoint" (tested on ARM64 
with TOSHIBA 128GB UFS):
 Before patch: 0.002273  0.001973  0.002789  0.005159  0.002050
 After patch: 0.002502  0.001624  0.002487  0.003049  0.002696

 fs/f2fs/checkpoint.c | 69 ++--
 1 file changed, 46 insertions(+), 23 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 512dca8..4e352cf 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1162,6 +1162,39 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
 spin_unlock_irqrestore(&sbi->cp_lock, flags);
 }
 
+static void commit_checkpoint(struct f2fs_sb_info *sbi,
+   void *src, block_t blk_addr)
+{
+   struct writeback_control wbc = {
+   .for_reclaim = 0,
+   };
+
+   /*
+* pagevec_lookup_tag and lock_page again will take
+* some extra time. Therefore, update_meta_pages and
+* sync_meta_pages are combined in this function.
+*/
+   struct page *page = grab_meta_page(sbi, blk_addr);
+   int err;
+
+   memcpy(page_address(page), src, PAGE_SIZE);
+   set_page_dirty(page);
+
+   f2fs_wait_on_page_writeback(page, META, true);
+   f2fs_bug_on(sbi, PageWriteback(page));
+   if (unlikely(!clear_page_dirty_for_io(page)))
+   f2fs_bug_on(sbi, 1);
+
+   /* writeout cp pack 2 page */
+   err = __f2fs_write_meta_page(page, &wbc, FS_CP_META_IO);
+   f2fs_bug_on(sbi, err);
+
+   f2fs_put_page(page, 0);
+
+   /* submit checkpoint (with barrier if NOBARRIER is not set) */
+   f2fs_submit_merged_write(sbi, META_FLUSH);
+}
+
 static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 {
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
@@ -1264,16 +1297,6 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
}
}
 
-   /* need to wait for end_io results */
-   wait_on_all_pages_writeback(sbi);
-   if (unlikely(f2fs_cp_error(sbi)))
-   return -EIO;
-
-   /* flush all device cache */
-   err = f2fs_flush_device_cache(sbi);
-   if (err)
-   return err;
-
/* write out checkpoint buffer at block 0 */
update_meta_page(sbi, ckpt, start_blk++);
 
@@ -1301,26 +1324,26 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
start_blk += NR_CURSEG_NODE_TYPE;
}
 
-   /* writeout checkpoint block */
-   update_meta_page(sbi, ckpt, start_blk);
+   /* update user_block_counts */
+   sbi->last_valid_block_count = sbi->total_valid_block_count;
+   percpu_counter_set(&sbi->alloc_valid_block_count, 0);
 
-   /* wait for previous submitted node/meta pages writeback */
+   /* Here, we have one bio having CP pack except cp pack 2 page */
+   sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
+
+   /* wait for previous submitted meta pages writeback */
wait_on_all_pages_writeback(sbi);
 
if (unlikely(f2fs_cp_error(sbi)))
return -EIO;
 
-   filemap_fdatawait_range(NODE_MAPPING(sbi), 0, LLONG_MAX);
-   filemap_fdatawait_range(META_MAPPING(sbi), 0, LLONG_MAX);
-
-   /* update user_block_counts */
-   sbi->last_valid_block_count = sbi->total_valid_block_count;
-   percpu_counter_set(&sbi->alloc_valid_block_count, 0);
-
-   /* Here, we only have on

Re: [f2fs-dev] [PATCH] f2fs: fix to handle looped node chain during recovery

2018-02-03 Thread Gao Xiang via Linux-f2fs-devel


Sorry, I have now read the related code entirely; please ignore these replies.


On 2018/2/3 18:45, Gao Xiang wrote:



On 2018/2/3 18:35, Gao Xiang wrote:

Hi Chao and YunLei,


On 2018/2/3 17:44, Chao Yu wrote:

There is no checksum in the node block now, so a bit-flip from hardware
can corrupt node_footer.next_blkaddr without any detection, resulting in
the node chain becoming a looped one.

In this condition, during recovery, in order to avoid running into a
dead loop, let's detect it and just skip out.

Signed-off-by: Yunlei He <heyun...@huawei.com>
Signed-off-by: Chao Yu <yuch...@huawei.com>
---
  fs/f2fs/recovery.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index b6d1ec620a8c..60dd0cee4820 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -243,6 +243,9 @@ static int find_fsync_dnodes(struct f2fs_sb_info 
*sbi, struct list_head *head,

  struct curseg_info *curseg;
  struct page *page = NULL;
  block_t blkaddr;
+    unsigned int loop_cnt = 0;
+    unsigned int free_blocks = sbi->user_block_count -
+    valid_user_blocks(sbi);
There exists another, faster way to detect a loop using only two
variables. The algorithm is simply: "B goes forward a step only when A
goes forward 2 (or a constant x, more than 1) steps".



For example (nodes 1..7, where node 7 links back into the chain):
1)
   1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
   |                   ^         |
  A, B                 \----<----/
2)
   B = 2, A = 3
3)
   B = 3, A = 5
4)
   B = 4, A = 7
5)
   ...




Sorry, it seems the encoded diagram was in a mess; I'll try again.
1)
   1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
   |                   ^         |
  A, B                 \----<----/
2)
   B = 2, A = 3
3)
   B = 3, A = 5
4)
   B = 4, A = 7
5)
if B catches up with A, there exists a cycle.


Thanks,

B will equal A or be beyond A if and only if there is a cycle.
It's a faster algorithm. :D

Thanks,


  int err = 0;
    /* get node pages in the current segment */
@@ -295,6 +298,17 @@ static int find_fsync_dnodes(struct 
f2fs_sb_info *sbi, struct list_head *head,

  if (IS_INODE(page) && is_dent_dnode(page))
  entry->last_dentry = blkaddr;
  next:
+    /* sanity check in order to detect looped node chain */
+    if (++loop_cnt >= free_blocks ||
+    blkaddr == next_blkaddr_of_node(page)) {
+    f2fs_msg(sbi->sb, KERN_NOTICE,
+    "%s: detect looped node chain, "
+    "blkaddr:%u, next:%u",
+    __func__, blkaddr, next_blkaddr_of_node(page));
+    err = -EINVAL;
+    break;
+    }
+
  /* check next segment */
  blkaddr = next_blkaddr_of_node(page);
  f2fs_put_page(page, 1);









Re: [f2fs-dev] [PATCH] f2fs: fix to handle looped node chain during recovery

2018-02-03 Thread Gao Xiang via Linux-f2fs-devel



On 2018/2/3 18:35, Gao Xiang wrote:

Hi Chao and YunLei,


On 2018/2/3 17:44, Chao Yu wrote:

There is no checksum in the node block now, so a bit-flip from hardware
can corrupt node_footer.next_blkaddr without any detection, resulting in
the node chain becoming a looped one.

In this condition, during recovery, in order to avoid running into a
dead loop, let's detect it and just skip out.

Signed-off-by: Yunlei He <heyun...@huawei.com>
Signed-off-by: Chao Yu <yuch...@huawei.com>
---
  fs/f2fs/recovery.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index b6d1ec620a8c..60dd0cee4820 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -243,6 +243,9 @@ static int find_fsync_dnodes(struct f2fs_sb_info 
*sbi, struct list_head *head,

  struct curseg_info *curseg;
  struct page *page = NULL;
  block_t blkaddr;
+    unsigned int loop_cnt = 0;
+    unsigned int free_blocks = sbi->user_block_count -
+    valid_user_blocks(sbi);
There exists another, faster way to detect a loop using only two
variables. The algorithm is simply: "B goes forward a step only when A
goes forward 2 (or a constant x, more than 1) steps".



For example (nodes 1..7, where node 7 links back into the chain):
1)
   1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
   |                   ^         |
  A, B                 \----<----/
2)
   B = 2, A = 3
3)
   B = 3, A = 5
4)
   B = 4, A = 7
5)
   ...




Sorry, it seems the encoded diagram was in a mess; I'll try again.
1)
   1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
   |                   ^         |
  A, B                 \----<----/
2)
   B = 2, A = 3
3)
   B = 3, A = 5
4)
   B = 4, A = 7
5)
if B catches up with A, there exists a cycle.


Thanks,

B will equal A or be beyond A if and only if there is a cycle.
It's a faster algorithm. :D

Thanks,


  int err = 0;
    /* get node pages in the current segment */
@@ -295,6 +298,17 @@ static int find_fsync_dnodes(struct f2fs_sb_info 
*sbi, struct list_head *head,

  if (IS_INODE(page) && is_dent_dnode(page))
  entry->last_dentry = blkaddr;
  next:
+    /* sanity check in order to detect looped node chain */
+    if (++loop_cnt >= free_blocks ||
+    blkaddr == next_blkaddr_of_node(page)) {
+    f2fs_msg(sbi->sb, KERN_NOTICE,
+    "%s: detect looped node chain, "
+    "blkaddr:%u, next:%u",
+    __func__, blkaddr, next_blkaddr_of_node(page));
+    err = -EINVAL;
+    break;
+    }
+
  /* check next segment */
  blkaddr = next_blkaddr_of_node(page);
  f2fs_put_page(page, 1);







Re: [f2fs-dev] [PATCH] f2fs: fix to handle looped node chain during recovery

2018-02-03 Thread Gao Xiang via Linux-f2fs-devel

Hi Chao and YunLei,


On 2018/2/3 17:44, Chao Yu wrote:

There is no checksum in the node block now, so a bit-flip from hardware
can corrupt node_footer.next_blkaddr without any detection, resulting in
the node chain becoming a looped one.

In this condition, during recovery, in order to avoid running into a
dead loop, let's detect it and just skip out.

Signed-off-by: Yunlei He 
Signed-off-by: Chao Yu 
---
  fs/f2fs/recovery.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index b6d1ec620a8c..60dd0cee4820 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -243,6 +243,9 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, 
struct list_head *head,
struct curseg_info *curseg;
struct page *page = NULL;
block_t blkaddr;
+   unsigned int loop_cnt = 0;
+   unsigned int free_blocks = sbi->user_block_count -
+   valid_user_blocks(sbi);
There exists another way to detect the loop faster, using only two
variables. The algorithm is described simply as: "B goes forward a step
only when A goes forward 2 steps".

For example (7 links back to 5, forming a cycle):
1)
   1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
   |                   ^         |
  A,B                  +----<----+
2)
   1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
        |    |         ^         |
        B    A         +----<----+
3)
   1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
             |         |         |
             B         A----<----+
4)
   1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
                  |    ^         |
                  B    +----<----A
5)
   (eventually A and B meet inside the cycle)

B will equal or go beyond A if and only if there is a cycle.
It's a faster algorithm. :D

Thanks,


int err = 0;
  
  	/* get node pages in the current segment */

@@ -295,6 +298,17 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, 
struct list_head *head,
if (IS_INODE(page) && is_dent_dnode(page))
entry->last_dentry = blkaddr;
  next:
+   /* sanity check in order to detect looped node chain */
+   if (++loop_cnt >= free_blocks ||
+   blkaddr == next_blkaddr_of_node(page)) {
+   f2fs_msg(sbi->sb, KERN_NOTICE,
+   "%s: detect looped node chain, "
+   "blkaddr:%u, next:%u",
+   __func__, blkaddr, next_blkaddr_of_node(page));
+   err = -EINVAL;
+   break;
+   }
+
/* check next segment */
blkaddr = next_blkaddr_of_node(page);
f2fs_put_page(page, 1);





Re: [f2fs-dev] [PATCH] f2fs: use crc ^ cp_ver instead of crc | cp_ver for recovery

2018-02-01 Thread Gao Xiang

Hi Chao,

On 2018/2/1 22:30, Chao Yu wrote:

We can use this calculation since cp_ver is a complete 64-bit random number now.


Sorry, I meant we can't.



Alright, I think it is an unimportant proposal for now.
We could fix it now, sometime later, or never :)
I just saw it by chance; a little suggestion to the community...

Thanks,


Thanks,





Re: [f2fs-dev] [PATCH] f2fs: use crc ^ cp_ver instead of crc | cp_ver for recovery

2018-02-01 Thread Gao Xiang



On 2018/2/1 21:57, Gao Xiang wrote:

Hi Chao,

On 2018/2/1 21:27, Chao Yu wrote:

Hi Xiang,

On 2018/2/1 0:16, Gaoxiang (OS) wrote:

This patch adds a flag CP_CRC_RECOVERY_XOR_FLAG to use the XORed crc ^ cp_ver,
since crc | cp_ver is more likely to get a collision or become
11.. | cp_ver.


FYI, we have discussed this before:

https://patchwork.kernel.org/patch/9342639/

At that time, cp_ver was always initialized with 1, so there was little
chance of using the high 32 bits, resulting in fewer collisions, so I
thought it would be OK.

But now, cp_ver is initialized with a random 64-bit value, so collisions
will increase. I agree that the XOR method is better, but I'm not sure we
should use it, since the layout change makes for complicated handling in
the code for backward compatibility.

And did you encounter any incorrect recovery, or does any upcoming
feature rely on this?






No... I just looked at node_footer for some other work and saw this part
of the code by chance.
I think the XORed calculation is much better mathematically, and the
typical method for mixing two values (e.g. XOR encryption) is also XOR
rather than OR...


Thanks,



BTW,
"The crc is already random enough, but has 32bits only.
The cp_ver is not easy to use over 32bits, so we don't need to keep the 
other

32bits untouched in most of life."

I observe that cp_ver ^ (crc << 32) == cp_ver | (crc << 32) when the high
32 bits of cp_ver are 0, but if cp_ver goes over 32 bits, ...

...hmmm...
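A quick userspace sketch of that observation (the values here are made
up for illustration):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
	uint64_t crc = 0xdeadbeef;		/* 32-bit checksum value */
	uint64_t low_ver = 1;			/* high 32 bits are 0 */
	uint64_t rnd_ver = 0x1234567811223344;	/* random 64-bit cp_ver */

	/* With a small cp_ver, OR and XOR give the same result. */
	printf("%d\n", (low_ver ^ (crc << 32)) == (low_ver | (crc << 32)));

	/* With a random 64-bit cp_ver, OR saturates high bits toward 1
	 * (losing crc information), while XOR keeps the mix reversible. */
	printf("or : %" PRIx64 "\n", rnd_ver | (crc << 32));
	printf("xor: %" PRIx64 "\n", rnd_ver ^ (crc << 32));
	return 0;
}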

Thanks,



Thanks,



Signed-off-by: Gao Xiang <gaoxian...@huawei.com>
---
  fs/f2fs/checkpoint.c    |  4 ++--
  fs/f2fs/node.h  | 16 +++-
  fs/f2fs/segment.c   |  3 ++-
  include/linux/f2fs_fs.h |  1 +
  4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 8b0945b..9e7e63b 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1157,8 +1157,8 @@ static void update_ckpt_flags(struct 
f2fs_sb_info *sbi, struct cp_control *cpc)

  if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
  __set_ckpt_flags(ckpt, CP_FSCK_FLAG);
-    /* set this flag to activate crc|cp_ver for recovery */
-    __set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG);
+    /* set this flag to activate crc^cp_ver for recovery */
+    __set_ckpt_flags(ckpt, CP_CRC_RECOVERY_XOR_FLAG);
  __clear_ckpt_flags(ckpt, CP_NOCRC_RECOVERY_FLAG);
  spin_unlock_irqrestore(>cp_lock, flags);
diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
index 081ef0d..7b9489f 100644
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -293,8 +293,11 @@ static inline void 
fill_node_footer_blkaddr(struct page *page, block_t blkaddr)

  struct f2fs_node *rn = F2FS_NODE(page);
  __u64 cp_ver = cur_cp_version(ckpt);
-    if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG))
-    cp_ver |= (cur_cp_crc(ckpt) << 32);
+    if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_XOR_FLAG))
+    cp_ver ^= cur_cp_crc(ckpt) << 32;
+    /* for backward compatibility */
+    else if (unlikely(__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG)))
+    cp_ver |= cur_cp_crc(ckpt) << 32;
  rn->footer.cp_ver = cpu_to_le64(cp_ver);
  rn->footer.next_blkaddr = cpu_to_le32(blkaddr);
@@ -307,10 +310,13 @@ static inline bool is_recoverable_dnode(struct 
page *page)

  /* Don't care crc part, if fsck.f2fs sets it. */
  if (__is_set_ckpt_flags(ckpt, CP_NOCRC_RECOVERY_FLAG))
-    return (cp_ver << 32) == (cpver_of_node(page) << 32);
+    return (__u32)cp_ver == (__u32)cpver_of_node(page);
-    if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG))
-    cp_ver |= (cur_cp_crc(ckpt) << 32);
+    if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_XOR_FLAG))
+    cp_ver ^= cur_cp_crc(ckpt) << 32;
+    /* for backward compatibility */
+    else if (unlikely(__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG)))
+    cp_ver |= cur_cp_crc(ckpt) << 32;
  return cp_ver == cpver_of_node(page);
  }
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 205b0d9..64d0c1f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2316,7 +2316,8 @@ static void allocate_segment_by_default(struct 
f2fs_sb_info *sbi,

  if (force)
  new_curseg(sbi, type, true);
-    else if (!is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_FLAG) &&
+    else if (!(is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_FLAG) ||
+    is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_XOR_FLAG)) &&
  type == CURSEG_WARM_NODE)
  new_curseg(sbi, type, false);
  else if (curseg->alloc_type == LFS && is_next_segment_free(sbi, 
type))

diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index f7f0990..07ddf4b 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -117,6 +117,7 @@ struct f2fs_super_block {
  /*
   * For checkpoint
   */
+#define CP_CRC_RECOVERY_XOR_FLAG    0x0800
  #define CP_LARGE_NAT_BITMAP_FLAG    0

Re: [f2fs-dev] [PATCH] f2fs: use crc ^ cp_ver instead of crc | cp_ver for recovery

2018-02-01 Thread Gao Xiang

Hi Chao,

On 2018/2/1 21:27, Chao Yu wrote:

Hi Xiang,

On 2018/2/1 0:16, Gaoxiang (OS) wrote:

This patch adds a flag CP_CRC_RECOVERY_XOR_FLAG to use the XORed crc ^ cp_ver,
since crc | cp_ver is more likely to get a collision or become 11.. |
cp_ver.


FYI, we have discussed about this before:

https://patchwork.kernel.org/patch/9342639/

At that time, cp_ver was always initialized with 1, so there was little
chance of using the high 32 bits, resulting in fewer collisions, so I
thought it would be OK.

But now, cp_ver is initialized with a random 64-bit value, so collisions
will increase. I agree that the XOR method is better, but I'm not sure we
should use it, since the layout change makes for complicated handling in
the code for backward compatibility.

And did you encounter any incorrect recovery, or does any upcoming feature
rely on this?


No... I just looked at node_footer for some other work and saw this part
of the code by chance.
I think the XORed calculation is much better mathematically, and the
typical method for mixing two values (e.g. XOR encryption) is also XOR
rather than OR...


Thanks,



Thanks,



Signed-off-by: Gao Xiang <gaoxian...@huawei.com>
---
  fs/f2fs/checkpoint.c|  4 ++--
  fs/f2fs/node.h  | 16 +++-
  fs/f2fs/segment.c   |  3 ++-
  include/linux/f2fs_fs.h |  1 +
  4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 8b0945b..9e7e63b 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1157,8 +1157,8 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
__set_ckpt_flags(ckpt, CP_FSCK_FLAG);
  
-	/* set this flag to activate crc|cp_ver for recovery */

-   __set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG);
+   /* set this flag to activate crc^cp_ver for recovery */
+   __set_ckpt_flags(ckpt, CP_CRC_RECOVERY_XOR_FLAG);
__clear_ckpt_flags(ckpt, CP_NOCRC_RECOVERY_FLAG);
  
  	spin_unlock_irqrestore(>cp_lock, flags);

diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
index 081ef0d..7b9489f 100644
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -293,8 +293,11 @@ static inline void fill_node_footer_blkaddr(struct page 
*page, block_t blkaddr)
struct f2fs_node *rn = F2FS_NODE(page);
__u64 cp_ver = cur_cp_version(ckpt);
  
-	if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG))

-   cp_ver |= (cur_cp_crc(ckpt) << 32);
+   if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_XOR_FLAG))
+   cp_ver ^= cur_cp_crc(ckpt) << 32;
+   /* for backward compatibility */
+   else if (unlikely(__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG)))
+   cp_ver |= cur_cp_crc(ckpt) << 32;
  
  	rn->footer.cp_ver = cpu_to_le64(cp_ver);

rn->footer.next_blkaddr = cpu_to_le32(blkaddr);
@@ -307,10 +310,13 @@ static inline bool is_recoverable_dnode(struct page *page)
  
  	/* Don't care crc part, if fsck.f2fs sets it. */

if (__is_set_ckpt_flags(ckpt, CP_NOCRC_RECOVERY_FLAG))
-   return (cp_ver << 32) == (cpver_of_node(page) << 32);
+   return (__u32)cp_ver == (__u32)cpver_of_node(page);
  
-	if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG))

-   cp_ver |= (cur_cp_crc(ckpt) << 32);
+   if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_XOR_FLAG))
+   cp_ver ^= cur_cp_crc(ckpt) << 32;
+   /* for backward compatibility */
+   else if (unlikely(__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG)))
+   cp_ver |= cur_cp_crc(ckpt) << 32;
  
  	return cp_ver == cpver_of_node(page);

  }
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 205b0d9..64d0c1f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2316,7 +2316,8 @@ static void allocate_segment_by_default(struct 
f2fs_sb_info *sbi,
  
  	if (force)

new_curseg(sbi, type, true);
-   else if (!is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_FLAG) &&
+   else if (!(is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_FLAG) ||
+   is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_XOR_FLAG)) &&
type == CURSEG_WARM_NODE)
new_curseg(sbi, type, false);
else if (curseg->alloc_type == LFS && is_next_segment_free(sbi, type))
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index f7f0990..07ddf4b 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -117,6 +117,7 @@ struct f2fs_super_block {
  /*
   * For checkpoint
   */
+#define CP_CRC_RECOVERY_XOR_FLAG   0x0800
  #define CP_LARGE_NAT_BITMAP_FLAG  0x0400
  #define CP_NOCRC_RECOVERY_FLAG0x0200
  #define CP_TRIMMED_FLAG   0x0100




Re: [f2fs-dev] [PATCH RFC v4] f2fs: flush cp pack except cp pack 2 page at first

2018-01-31 Thread Gao Xiang

Hi Jaegeuk and Chao,

On 2018/2/1 6:28, Jaegeuk Kim wrote:

On 01/31, Chao Yu wrote:

On 2018/1/31 14:39, Gaoxiang (OS) wrote:

Previously, we attempted to flush the whole cp pack in a single bio;
however, on a sudden power-off at that point, we could get into an
extreme scenario where the cp pack 1 page and cp pack 2 page are updated
and latest, but the payload or current summaries are still partially
outdated (see reliable write in the UFS specification).

This patch submits the whole cp pack except the cp pack 2 page first,
and then writes the cp pack 2 page with an extra independent bio with a
pre-IO barrier.

Signed-off-by: Gao Xiang <gaoxian...@huawei.com>
Reviewed-by: Chao Yu <yuch...@huawei.com>
---
Change log from v3:
   - further review comments are applied from Jaegeuk and Chao
   - Tested this patch (without multiple devices): mount, boot Android with
f2fs userdata, and create fragmentation
   - If there is any problem with this patch or I missed something, please
kindly share your comments, thanks :)
Change log from v2:
   - Apply the review comments from Chao
Change log from v1:
   - Apply the review comments from Chao
   - time data from "finish block_ops" to " finish checkpoint" (tested on ARM64 
with TOSHIBA 128GB UFS):
  Before patch: 0.002273  0.001973  0.002789  0.005159  0.002050
  After patch: 0.002502  0.001624  0.002487  0.003049  0.002696
  fs/f2fs/checkpoint.c | 67 
  1 file changed, 46 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 14d2fed..916dc72 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1158,6 +1158,39 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
spin_unlock_irqrestore(>cp_lock, flags);
  }
  
+static void commit_checkpoint(struct f2fs_sb_info *sbi,

+   void *src, block_t blk_addr)
+{
+   struct writeback_control wbc = {
+   .for_reclaim = 0,
+   };
+
+   /*
+* pagevec_lookup_tag and lock_page again will take
+* some extra time. Therefore, update_meta_pages and
+* sync_meta_pages are combined in this function.
+*/
+   struct page *page = grab_meta_page(sbi, blk_addr);
+   int err;
+
+   memcpy(page_address(page), src, PAGE_SIZE);
+   set_page_dirty(page);
+
+   f2fs_wait_on_page_writeback(page, META, true);
+   f2fs_bug_on(sbi, PageWriteback(page));
+   if (unlikely(!clear_page_dirty_for_io(page)))
+   f2fs_bug_on(sbi, 1);
+
+   /* writeout cp pack 2 page */
+   err = __f2fs_write_meta_page(page, &wbc, FS_CP_META_IO);
+   f2fs_bug_on(sbi, err);
+
+   f2fs_put_page(page, 0);
+
+   /* submit checkpoint (with barrier if NOBARRIER is not set) */
+   f2fs_submit_merged_write(sbi, META_FLUSH);
+}
+
  static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
  {
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
@@ -1260,16 +1293,6 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
}
}
  
-	/* need to wait for end_io results */

-   wait_on_all_pages_writeback(sbi);
-   if (unlikely(f2fs_cp_error(sbi)))
-   return -EIO;
-
-   /* flush all device cache */
-   err = f2fs_flush_device_cache(sbi);
-   if (err)
-   return err;
-
/* write out checkpoint buffer at block 0 */
update_meta_page(sbi, ckpt, start_blk++);
  
@@ -1297,15 +1320,6 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)

start_blk += NR_CURSEG_NODE_TYPE;
}
  
-	/* writeout checkpoint block */

-   update_meta_page(sbi, ckpt, start_blk);
-
-   /* wait for previous submitted node/meta pages writeback */
-   wait_on_all_pages_writeback(sbi);
-
-   if (unlikely(f2fs_cp_error(sbi)))
-   return -EIO;
-
filemap_fdatawait_range(NODE_MAPPING(sbi), 0, LLONG_MAX);
filemap_fdatawait_range(META_MAPPING(sbi), 0, LLONG_MAX);


  - remove


You mean remove
filemap_fdatawait_range(NODE_MAPPING(sbi), 0, LLONG_MAX);
and
filemap_fdatawait_range(META_MAPPING(sbi), 0, LLONG_MAX);
or remove only
filemap_fdatawait_range(META_MAPPING(sbi), 0, LLONG_MAX);?

Actually, I have no idea why these two filemap_fdatawait_range calls stay
here and what they are used for and waiting on in this place; however, I
found this code was modified recently and many times, so I guess they
have some use.


  
@@ -1313,12 +1327,23 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)

sbi->last_valid_block_count = sbi->total_valid_block_count;
percpu_counter_set(>alloc_valid_block_count, 0);
  
-	/* Here, we only have one bio having CP pack */

-   sync_meta_pages(sbi, META_FLUSH, LONG_MAX, FS_CP_META_IO);
+   /* Here, we have one bio having CP pack except cp pack 2 page */
+   sync_m

Re: [f2fs-dev] [PATCH RFC] f2fs: flush cp pack except cp page2 at first

2018-01-24 Thread Gao Xiang via Linux-f2fs-devel

Hi Chao,


On 2018/1/24 23:57, Chao Yu wrote:

On 2018/1/24 14:53, Gaoxiang (OS) wrote:

Previously, we attempted to flush the whole cp pack in a single bio;
however, on a sudden power-off at that point, we could meet an extreme
scenario where cp page 1 and cp page 2 are updated and latest, but the
payload or current summaries are still outdated
(see reliable write in the UFS spec).

This patch writes the whole cp pack except cp page 2 with FLUSH first,
and then writes cp page 2 with an extra independent bio with FLUSH.

Signed-off-by: Gao Xiang <gaoxian...@huawei.com>
---
  fs/f2fs/checkpoint.c | 48 +---
  fs/f2fs/f2fs.h   |  3 ++-
  fs/f2fs/segment.c| 11 +--
  3 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 14d2fed..e7f5e85 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -300,6 +300,35 @@ static int f2fs_write_meta_pages(struct address_space 
*mapping,
return 0;
  }
  
+static int sync_meta_page_locked(struct f2fs_sb_info *sbi,

+   struct page *page,
+   enum page_type type, enum iostat_type io_type)
+{
+   struct writeback_control wbc = {
+   .for_reclaim = 0,
+   };
+   int err;
+
+   BUG_ON(page->mapping != META_MAPPING(sbi));
+   BUG_ON(!PageDirty(page));
+
+   f2fs_wait_on_page_writeback(page, META, true);
+
+   BUG_ON(PageWriteback(page));
+   if (unlikely(!clear_page_dirty_for_io(page)))
+   BUG();
+
+   err = __f2fs_write_meta_page(page, &wbc, io_type);
+   if (err) {
+   f2fs_put_page(page, 1);
+   return err;
+   }
+   f2fs_put_page(page, 0);
+
+   f2fs_submit_merged_write(sbi, type);
+   return err;
+}
+
  long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
long nr_to_write, enum iostat_type io_type)
  {
@@ -1172,6 +1201,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct 
cp_control *cpc)
struct curseg_info *seg_i = CURSEG_I(sbi, CURSEG_HOT_NODE);
u64 kbytes_written;
int err;
+   struct page *cp_page2;
  
  	/* Flush all the NAT/SIT pages */

while (get_pages(sbi, F2FS_DIRTY_META)) {
@@ -1250,7 +1280,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct 
cp_control *cpc)
blk = start_blk + sbi->blocks_per_seg - nm_i->nat_bits_blocks;
for (i = 0; i < nm_i->nat_bits_blocks; i++)
update_meta_page(sbi, nm_i->nat_bits +
-   (i << F2FS_BLKSIZE_BITS), blk + i);
+   (i << F2FS_BLKSIZE_BITS), blk + i, 
NULL);
  
  		/* Flush all the NAT BITS pages */

while (get_pages(sbi, F2FS_DIRTY_META)) {
@@ -1271,11 +1301,11 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
return err;
  
  	/* write out checkpoint buffer at block 0 */

-   update_meta_page(sbi, ckpt, start_blk++);
+   update_meta_page(sbi, ckpt, start_blk++, NULL);
  
  	for (i = 1; i < 1 + cp_payload_blks; i++)

update_meta_page(sbi, (char *)ckpt + i * F2FS_BLKSIZE,
-   start_blk++);
+   start_blk++, NULL);
  
  	if (orphan_num) {

write_orphan_inodes(sbi, start_blk);
@@ -1297,9 +1327,6 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct 
cp_control *cpc)
start_blk += NR_CURSEG_NODE_TYPE;
}
  
-	/* writeout checkpoint block */

-   update_meta_page(sbi, ckpt, start_blk);
-
/* wait for previous submitted node/meta pages writeback */
wait_on_all_pages_writeback(sbi);
  
@@ -1313,12 +1340,19 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)

sbi->last_valid_block_count = sbi->total_valid_block_count;
percpu_counter_set(>alloc_valid_block_count, 0);
  
-	/* Here, we only have one bio having CP pack */

+   /* Here, we only have one bio having CP pack except cp page 2 */
sync_meta_pages(sbi, META_FLUSH, LONG_MAX, FS_CP_META_IO);

We don't need to use META_FLUSH here.


Hmmm... I think we need to write to the device medium rather than the
device cache, or am I missing something?
Could you give me some hints about that? PREFLUSH or what? I cannot yet
see code related to that...
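For intuition only, here is a rough userspace analogy of the ordering
being discussed, with pwrite() and fsync() standing in for bio
submission and the flush barrier (the offsets and layout are made up):

#include <unistd.h>

/*
 * Write a payload and then its commit record, with an explicit flush
 * between them so the record can never reach the medium before the data.
 */
static int commit_ordered(int fd, const void *payload, size_t len,
			  const void *record, size_t rec_len, off_t rec_off)
{
	if (pwrite(fd, payload, len, 0) != (ssize_t)len)
		return -1;
	if (fsync(fd))			/* barrier: payload hits the medium */
		return -1;
	if (pwrite(fd, record, rec_len, rec_off) != (ssize_t)rec_len)
		return -1;
	return fsync(fd);		/* flush: commit record hits the medium */
}

In the kernel, this payload-before-commit-record ordering is roughly
what preflush/FUA-style request flags express.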



  
  	/* wait for previous submitted meta pages writeback */

wait_on_all_pages_writeback(sbi);
  
+	/* write and flush checkpoint cp page 2 */

+   update_meta_page(sbi, ckpt, start_blk, &cp_page2);
+   sync_meta_page_locked(sbi, cp_page2, META_FLUSH, FS_CP_META_IO);

How about

sync_checkpoint()
{
page = grab_meta_page()
memcpy()
set_page_dirty()

...
__f2fs_write_meta_page()

Re: [f2fs-dev] [PATCH RFC] f2fs: add PRE2 to mark segments free to one checkpoint but obsolete to the other

2018-01-20 Thread Gao Xiang via Linux-f2fs-devel

Hi Weichao,


On 2018/1/21 0:02, guoweichao wrote:

Hi Xiang,

it's not related to SPOR. Just consider the case given by Chao Yu.



(It seems this email was not sent successfully; I am resending it just
for reference.)

Oh I see, I have considered the scenario Chao described before.

1. write data into segment x;
2. write checkpoint A;
3. remove data in segment x;
4. write checkpoint B;
5. issue discard or write data into segment x;
6. sudden power-cut

Since f2fs is designed around double backup, steps 5) and 6), I think,
actually belong to checkpoint C, and once we are in checkpoint C,
checkpoint A becomes unsafe because the latest checkpoint is B.
I think in that case we cannot prevent data writeback or the like from
polluting checkpoint A.

However, node segments (another kind of metadata) would be special,
and I have no idea whether introducing PRE2 would make all cases safe or not.

In addition, if some data segments change between checkpoints A and C,
weird data a file never had (new data or data from other files) could be
seen when switching back to checkpoint A.
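As a rough mental model only (the state names below are illustrative,
not the actual f2fs enums), the idea extends the per-segment lifecycle
by one checkpoint:

/* Illustrative sketch of the proposed segment lifecycle, not f2fs code. */
enum seg_state {
	SEG_IN_USE,	/* still has valid blocks */
	SEG_PREFREE,	/* all blocks invalidated; freed by the next checkpoint */
	SEG_PRE2,	/* proposed: free for the newest checkpoint, but still
			 * held back so the older on-disk pack stays consistent */
	SEG_FREE,	/* safe to reallocate under either checkpoint */
};

/*
 * Each checkpoint advances a segment by at most one state:
 * PREFREE -> PRE2 at checkpoint N, PRE2 -> FREE at checkpoint N+1,
 * so a segment is only reused once both on-disk packs agree it is free.
 */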


Thanks,


Thanks,
From: Gao Xiang
To: guoweichao
Cc: linux-f2fs-devel@lists.sourceforge.net, heyunlei
Date: 2018-01-20 11:49:22
Subject: Re: [PATCH RFC] f2fs: add PRE2 to mark segments free to one
checkpoint but obsolete to the other


Hi Weichao,


On 2018/1/19 23:47, guoweichao wrote:
> A critical case is using the free segments as data segments when they
> were previously node segments in the old checkpoint. With fault injection
> of the newer CP pack, fsck can find errors when checking the sanity of nid.

Sorry to interrupt because I'm just curious about this scenario and the
detail.

As far as I know, if all the blocks in a segment become invalid,
the segment will become PREFREE, and then, if a checkpoint follows,
we can reuse this segment or discard the whole segment safely after
that checkpoint is done (I think it makes sure that this segment is
certainly FREE and not reused within that checkpoint).

If the segment in the old checkpoint is a node segment, and the node
blocks in the segment are all invalid by the new checkpoint, it seems
there is no danger in reusing the FREE node segment as a data segment
in the next checkpoint?

Or is it something related to SPOR? In my mind, f2fs-tools ignores the
POR node chain...

Thanks,
> On 2018/1/20 7:29, Weichao Guo wrote:
>> Currently, we set prefree segments as free ones after writing
>> a checkpoint, then believe the segments could be used safely.
>> However, if a sudden power-off coming and the newer checkpoint
>> corrupted due to hardware issues at the same time, we will try
>> to use the old checkpoint and may face an inconsistent file
>> system status.
>>
>> How about add an PRE2 status for prefree segments, and make
>> sure the segments could be used safely to both checkpoints?
>> Or any better solutions? Or this is not a problem?
>>
>> Look forward to your comments!
>>
>> Signed-off-by: Weichao Guo <guoweic...@huawei.com>
>> ---
>>   fs/f2fs/gc.c  | 11 +--
>>   fs/f2fs/segment.c | 21 ++---
>>   fs/f2fs/segment.h |  6 ++
>>   3 files changed, 33 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index 33e7969..153e3ea 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -1030,7 +1030,12 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>    * threshold, we can make them free by checkpoint. 
Then, we

>>    * secure free segments which doesn't need fggc any more.
>>    */
>> -    if (prefree_segments(sbi)) {
>> +    if (prefree_segments(sbi) || prefree2_segments(sbi)) {
>> +    ret = write_checkpoint(sbi, &cpc);
>> +    if (ret)
>> +    goto stop;
>> +    }
>> +    if (has_not_enough_free_secs(sbi, 0, 0) && 
prefree2_segments(sbi)) {

>>   ret = write_checkpoint(sbi, &cpc);
>>   if (ret)
>>   goto stop;
>> @@ -1063,8 +1068,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>   goto gc_more;
>>   }
>>
>> -    if (gc_type == FG_GC)
>> +    if (gc_type == FG_GC) {
>> +    ret = write_checkpoint(sbi, &cpc);
>>   ret = write_checkpoint(sbi, &cpc);
>> +    }
>>   }
>>   stop:
>>   SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index 2e8e054d..9dec445 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -1606,7 +1606,7 @@ static void 
set_prefree_

Re: [f2fs-dev] [PATCH RFC] f2fs: add PRE2 to mark segments free to one checkpoint but obsolete to the other

2018-01-20 Thread Gao Xiang via Linux-f2fs-devel

Hi Chao and Weichao,


On 2018/1/21 10:34, Chao Yu wrote:

Hi Weichao,

On 2018/1/20 23:50, guoweichao wrote:

Hi Chao,

Yes, it is exactly what I mean.
It seems that F2FS has no responsibility to cover hardware problems.
However, file systems usually choose redundancy for superblock fault
tolerance, so I think we actually have considered some external errors
when designing a file system.
Our dual checkpoint mechanism is mainly designed to keep at least one stable
CP pack while creating a new one in case of SPO, and it has fault-tolerant
effects.
As the CP pack is also very critical to F2FS, why not make the checkpoint
more robust

I think you're trying to change the basic design from an A/B upgrade
system to an A/B/C one, which always keeps two checkpoints valid. There
will be no simple modification to implement that, since we should cover
not only the prefree case but also the SSR case.
*nod*, triple-like backups would not solve all hardware issues in an
even worse case, and in order to keep all backups available we would
need to find extra area to hold all the modifications among 3
checkpoints.


In my opinion, introducing a "snapshot" feature could be something more
useful for phones (yet it is a big feature :( ),
and if fsck found something unrecoverable,
we could roll back to a stable and reliable (via re-checking metadata)
snapshot (maybe hours or days old),
and remind users of the rollback in fsck or somewhere.

Sorry to bother all,

Thanks,


IMO, the biggest problem there is available space: since in checkpoint C we
can only use blocks that are invalid in both checkpoints A and B, in some
cases there will be almost no valid space we can use during allocation,
resulting in frequent checkpoints.

IMO, what we can do is try to keep the last valid checkpoint as intact as
possible. One way is to add a mirror or parity for the checkpoint, which
can help recovery once the checkpoint is corrupted. At least, I hope that
with it in a debug version we can help hardware staff fix their issue
instead of wasting much time troubleshooting a filesystem issue.

Thanks,


with a simple modification to the current design and little overhead except
for FG_GC.
Of course, keeping it unchanged is OK. I just want to discuss this proposal. :)

Thanks,
From: Chao Yu
To: guoweichao, jaeg...@kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net, linux-fsde...@vger.kernel.org, heyunlei
Date: 2018-01-20 15:43:23
Subject: Re: [PATCH RFC] f2fs: add PRE2 to mark segments free to one
checkpoint but obsolete to the other

Hi Weichao,

On 2018/1/20 7:29, Weichao Guo wrote:

Currently, we set prefree segments as free ones after writing
a checkpoint, then believe the segments could be used safely.
However, if a sudden power-off comes and the newer checkpoint is
corrupted due to hardware issues at the same time, we will try
to use the old checkpoint and may face an inconsistent file
system status.

IIUC, you mean:

1. write nodes into segment x;
2. write checkpoint A;
3. remove nodes in segment x;
4. write checkpoint B;
5. issue discard or write datas into segment x;
6. sudden power-cut

But after reboot, we find checkpoint B is corrupted due to hardware and
then start to use checkpoint A; however, the nodes in segment x recorded
as valid data in checkpoint A have been overwritten in step 5), so we will
encounter inconsistent metadata, right?

Thanks,


How about adding a PRE2 status for prefree segments, and making
sure the segments can be used safely by both checkpoints?
Or any better solutions? Or this is not a problem?

Look forward to your comments!

Signed-off-by: Weichao Guo 
---
  fs/f2fs/gc.c  | 11 +--
  fs/f2fs/segment.c | 21 ++---
  fs/f2fs/segment.h |  6 ++
  3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 33e7969..153e3ea 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1030,7 +1030,12 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
    * threshold, we can make them free by checkpoint. Then, we
    * secure free segments which doesn't need fggc any more.
    */
- if (prefree_segments(sbi)) {
+ if (prefree_segments(sbi) || prefree2_segments(sbi)) {
+ ret = write_checkpoint(sbi, &cpc);
+ if (ret)
+ goto stop;
+ }
+ if (has_not_enough_free_secs(sbi, 0, 0) && prefree2_segments(sbi)) {
   ret = write_checkpoint(sbi, &cpc);
   if (ret)
   goto stop;
@@ -1063,8 +1068,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
   goto gc_more;
   }
  
- if (gc_type == FG_GC)

+ if (gc_type == FG_GC) {
+ ret = write_checkpoint(sbi, &cpc);
   ret = write_checkpoint(sbi, &cpc);
+ }
   }
  stop:
   SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 2e8e054d..9dec445 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1606,7 +1606,7 @@ static void set_prefree_as_free_segments(struct 
f2fs_sb_info *sbi)
   unsigned int segno;
  
   mutex_lock(&dirty_i->seglist_lock);

- for_each_set_bit(segno, 

Re: [f2fs-dev] [PATCH RFC] f2fs: add PRE2 to mark segments free to one checkpoint but obsolete to the other

2018-01-19 Thread Gao Xiang via Linux-f2fs-devel

Hi Weichao,


On 2018/1/19 23:47, guoweichao wrote:

A critical case is using the free segments as data segments when they
were previously node segments in the old checkpoint. With fault injection
of the newer CP pack, fsck can find errors when checking the sanity of nid.
Sorry to interrupt because I'm just curious about this scenario and the 
detail.


As far as I know, if all the blocks in a segment become invalid,
the segment will become PREFREE, and then, if a checkpoint follows,
we can reuse this segment or discard the whole segment safely after
that checkpoint is done (I think it makes sure that this segment is
certainly FREE and not reused within that checkpoint).


If the segment in the old checkpoint is a node segment, and the node
blocks in the segment are all invalid by the new checkpoint, it seems
there is no danger in reusing the FREE node segment as a data segment
in the next checkpoint?


Or is it something related to SPOR? In my mind, f2fs-tools ignores the
POR node chain...


Thanks,

On 2018/1/20 7:29, Weichao Guo wrote:

Currently, we set prefree segments as free ones after writing
a checkpoint, then believe the segments could be used safely.
However, if a sudden power-off comes and the newer checkpoint is
corrupted due to hardware issues at the same time, we will try
to use the old checkpoint and may face an inconsistent file
system status.

How about adding a PRE2 status for prefree segments, and making
sure the segments can be used safely by both checkpoints?
Or any better solutions? Or this is not a problem?

Look forward to your comments!

Signed-off-by: Weichao Guo 
---
  fs/f2fs/gc.c  | 11 +--
  fs/f2fs/segment.c | 21 ++---
  fs/f2fs/segment.h |  6 ++
  3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 33e7969..153e3ea 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1030,7 +1030,12 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
 * threshold, we can make them free by checkpoint. Then, we
 * secure free segments which doesn't need fggc any more.
 */
-   if (prefree_segments(sbi)) {
+   if (prefree_segments(sbi) || prefree2_segments(sbi)) {
+   ret = write_checkpoint(sbi, );
+   if (ret)
+   goto stop;
+   }
+   if (has_not_enough_free_secs(sbi, 0, 0) && 
prefree2_segments(sbi)) {
ret = write_checkpoint(sbi, );
if (ret)
goto stop;
@@ -1063,8 +1068,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
goto gc_more;
}
  
-		if (gc_type == FG_GC)

+   if (gc_type == FG_GC) {
+   ret = write_checkpoint(sbi, &cpc);
	ret = write_checkpoint(sbi, &cpc);
+   }
}
  stop:
SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 2e8e054d..9dec445 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1606,7 +1606,7 @@ static void set_prefree_as_free_segments(struct 
f2fs_sb_info *sbi)
unsigned int segno;
  
  	mutex_lock(&dirty_i->seglist_lock);

-   for_each_set_bit(segno, dirty_i->dirty_segmap[PRE], MAIN_SEGS(sbi))
+   for_each_set_bit(segno, dirty_i->dirty_segmap[PRE2], MAIN_SEGS(sbi))
__set_test_and_free(sbi, segno);
	mutex_unlock(&dirty_i->seglist_lock);
  }
@@ -1617,13 +1617,17 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
struct list_head *head = >entry_list;
struct discard_entry *entry, *this;
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
-   unsigned long *prefree_map = dirty_i->dirty_segmap[PRE];
+   unsigned long *prefree_map;
unsigned int start = 0, end = -1;
unsigned int secno, start_segno;
bool force = (cpc->reason & CP_DISCARD);
+   int phase = 0;
+   enum dirty_type dirty_type = PRE2;
  
  	mutex_lock(_i->seglist_lock);
  
+next_step:

+   prefree_map = dirty_i->dirty_segmap[dirty_type];
while (1) {
int i;
start = find_next_bit(prefree_map, MAIN_SEGS(sbi), end + 1);
@@ -1635,7 +1639,7 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
for (i = start; i < end; i++)
clear_bit(i, prefree_map);
  
-		dirty_i->nr_dirty[PRE] -= end - start;

+   dirty_i->nr_dirty[dirty_type] -= end - start;
  
  		if (!test_opt(sbi, DISCARD))

continue;
@@ -1663,6 +1667,16 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
else
end = start - 1;
}
+   if (phase == 0) {
+   /* status change: PRE -> PRE2 */
+   for_each_set_bit(segno, 

Re: [f2fs-dev] [PATCH v2] mkfs.f2fs: expand scalability of nat bitmap

2018-01-16 Thread Gao Xiang via Linux-f2fs-devel

Hi Chao Yu,


On 2018/1/17 11:15, Chao Yu wrote:

Hi Jaegeuk,

On 2018/1/17 8:47, Jaegeuk Kim wrote:

Hi Chao,

On 01/15, Chao Yu wrote:

Previously, our total node number (nat_bitmap) and total nat segment count
would not monotonously increase along with image size, and the max nat_bitmap
size was limited by "CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1";
this scales badly when the user wants to create more inodes/nodes in a larger
image.

So this patch tries to relieve the limitation by, by default, limiting the
total nat entry number to 20% of the total block number.

Before:
image_size(GB)  nat_bitmap  sit_bitmap  nat_segment sit_segment
16              3836        64          36          2
32              3836        64          72          2
64              3772        128         116         4
128             3708        192         114         6
256             3580        320         110         10

As you can see, the nat_segment count decreases as the image size grows
starting from 64GB; that means the nat segment count will not monotonously
increase with image size, so would it be better to activate this when the
image size is larger than 32GB?

IMO, configuring a basic nid ratio to a fixed value like ext4 ("free inode" :
"free block" is about 1 : 4) would be better:
a. It will be easy for users to predict the nid count or nat segment count
for a fixed-size image;
b. If the user wants to reserve a larger nid count, we can support a -N
option in mkfs.f2fs to specify the total nid count as the user wishes.
I agree, because it is weird if the nat segment count does not monotonously
increase, especially for server users. How about modifying it like this?
32GB~xxxGB (if max_sit_bitmap_size + max_nat_bitmap_size <=
MAX_BITMAP_SIZE_IN_CKPT)  ---  use the original nat calculation version;
>xxx GB  ---  use CP_LARGE_NAT_BITMAP_FLAG and the introduced ratio.
If a user-defined -N is specified, use that nid count (or ratio?) instead
of the above calculation.
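For reference, a back-of-the-envelope sketch of the proposed sizing
(assuming 4KB blocks, 512 blocks per segment, NAT_ENTRY_PER_BLOCK = 455,
and the 20% ratio from the patch; the real mkfs works on
total_valid_blks_available after subtracting metadata areas, so exact
values differ slightly from the tables above):

#include <stdint.h>
#include <stdio.h>

#define NAT_ENTRY_PER_BLOCK	455	/* 4096 / sizeof(struct f2fs_nat_entry) */
#define BLKS_PER_SEG		512	/* 2MB segment / 4KB block */
#define NAT_ENTRY_RATIO		20	/* percent of total blocks */

int main(void)
{
	uint64_t image_gb = 64;
	uint64_t total_blocks = (image_gb << 30) >> 12;	/* 4KB blocks */

	/* Blocks needed if every block could be addressed as a node... */
	uint64_t blocks_for_nat =
		(total_blocks + NAT_ENTRY_PER_BLOCK - 1) / NAT_ENTRY_PER_BLOCK;
	uint64_t nat_segs = (blocks_for_nat + BLKS_PER_SEG - 1) / BLKS_PER_SEG;

	/* ...scaled down to the 20% ratio, then doubled for the two NAT
	 * copies, as the patch does. */
	nat_segs = nat_segs * NAT_ENTRY_RATIO / 100 * 2;

	/* nat_bitmap: one bit per block of one NAT copy */
	uint64_t nat_bitmap = nat_segs / 2 * BLKS_PER_SEG / 8;

	printf("nat_segment=%llu nat_bitmap=%llu\n",
	       (unsigned long long)nat_segs, (unsigned long long)nat_bitmap);
	return 0;
}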


Thanks,


How do you think?

Thanks,


512             3260        640         100         20
1024            2684        1216        82          38
2048            1468        2432        44          76
4096            3900        4800        120         150

After:
image_size(GB)  nat_bitmap  sit_bitmap  nat_segment sit_segment
16              256         64          8           2
32              512         64          16          2
64              960         128         30          4
128             1856        192         58          6
256             3712        320         116         10

Can we activate this, if size is larger than 256GB or something around that?

Thanks,


512             7424        640         232         20
1024            14787       1216        462         38
2048            29504       2432        922         76
4096            59008       4800        1844        150

Signed-off-by: Chao Yu 
---
v2:
- add CP_LARGE_NAT_BITMAP_FLAG flag to indicate new layout of nat/sit bitmap.
  fsck/f2fs.h| 19 +--
  fsck/resize.c  | 35 +--
  include/f2fs_fs.h  |  8 ++--
  lib/libf2fs.c  |  1 +
  mkfs/f2fs_format.c | 45 +++--
  5 files changed, 60 insertions(+), 48 deletions(-)

diff --git a/fsck/f2fs.h b/fsck/f2fs.h
index f5970d9dafc0..8a5ce365282d 100644
--- a/fsck/f2fs.h
+++ b/fsck/f2fs.h
@@ -239,6 +239,12 @@ static inline unsigned int ofs_of_node(struct f2fs_node 
*node_blk)
return flag >> OFFSET_BIT_SHIFT;
  }
  
+static inline bool is_set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f)

+{
+   unsigned int ckpt_flags = le32_to_cpu(cp->ckpt_flags);
+   return ckpt_flags & f ? 1 : 0;
+}
+
  static inline unsigned long __bitmap_size(struct f2fs_sb_info *sbi, int flag)
  {
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
@@ -256,6 +262,13 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, 
int flag)
  {
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
int offset;
+
+   if (is_set_ckpt_flags(ckpt, CP_LARGE_NAT_BITMAP_FLAG)) {
+   offset = (flag == SIT_BITMAP) ?
+   le32_to_cpu(ckpt->nat_ver_bitmap_bytesize) : 0;
+   return >sit_nat_version_bitmap + offset;
+   }
+
if (le32_to_cpu(F2FS_RAW_SUPER(sbi)->cp_payload) > 0) {
if (flag == NAT_BITMAP)
return >sit_nat_version_bitmap;
@@ -268,12 +281,6 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, 
int flag)
}
  }
  
-static inline bool is_set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f)

-{
-   unsigned int ckpt_flags = le32_to_cpu(cp->ckpt_flags);
-   return ckpt_flags 

Re: [f2fs-dev] [PATCH] mkfs.f2fs: expand scalability of nat bitmap

2018-01-13 Thread Gao Xiang via Linux-f2fs-devel

Hi Chao,


On 2018/01/12 18:24, Chao Yu wrote:


Previously, our total node number (nat_bitmap) and total nat segment count
would not monotonously increase along with image size, and the max nat_bitmap
size was limited by "CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1";
this scales badly when the user wants to create more inodes/nodes in a larger
image.

So this patch tries to relieve the limitation by, by default, limiting the
total nat entry number to 20% of the total block number.

Before:
image_size(GB)  nat_bitmap  sit_bitmap  nat_segment sit_segment
16              3836        64          36          2
32              3836        64          72          2
64              3772        128         116         4
128             3708        192         114         6
256             3580        320         110         10
512             3260        640         100         20
1024            2684        1216        82          38
2048            1468        2432        44          76
4096            3900        4800        120         150

After:
image_size(GB)  nat_bitmap  sit_bitmap  nat_segment sit_segment
16              256         64          8           2
32              512         64          16          2
64              960         128         30          4
128             1856        192         58          6
256             3712        320         116         10
512             7424        640         232         20
1024            14787       1216        462         38
2048            29504       2432        922         76
4096            59008       4800        1844        150

Signed-off-by: Chao Yu 
---
  fsck/resize.c  | 31 ++-
  include/f2fs_fs.h  |  6 --
  mkfs/f2fs_format.c | 35 ++-
  3 files changed, 32 insertions(+), 40 deletions(-)

diff --git a/fsck/resize.c b/fsck/resize.c
index 143ad5d3c0a1..7613c7df4893 100644
--- a/fsck/resize.c
+++ b/fsck/resize.c
@@ -16,7 +16,7 @@ static int get_new_sb(struct f2fs_super_block *sb)
u_int32_t sit_segments, diff, total_meta_segments;
u_int32_t total_valid_blks_available;
u_int32_t sit_bitmap_size, max_sit_bitmap_size;
-   u_int32_t max_nat_bitmap_size, max_nat_segments;
+   u_int32_t max_nat_bitmap_size;
u_int32_t segment_size_bytes = 1 << (get_sb(log_blocksize) +
get_sb(log_blocks_per_seg));
u_int32_t blks_per_seg = 1 << get_sb(log_blocks_per_seg);
@@ -47,7 +47,8 @@ static int get_new_sb(struct f2fs_super_block *sb)
get_sb(segment_count_sit))) * blks_per_seg;
blocks_for_nat = SIZE_ALIGN(total_valid_blks_available,
NAT_ENTRY_PER_BLOCK);
-   set_sb(segment_count_nat, SEG_ALIGN(blocks_for_nat));
+   set_sb(segment_count_nat, SEG_ALIGN(blocks_for_nat) *
+   DEFAULT_NAT_ENTRY_RATIO / 100);
  
  	sit_bitmap_size = ((get_sb(segment_count_sit) / 2) <<

get_sb(log_blocks_per_seg)) / 8;
@@ -56,25 +57,21 @@ static int get_new_sb(struct f2fs_super_block *sb)
else
max_sit_bitmap_size = sit_bitmap_size;
  
-	/*

-* It should be reserved minimum 1 segment for nat.
-* When sit is too large, we should expand cp area. It requires more 
pages for cp.
-*/
-   if (max_sit_bitmap_size > MAX_SIT_BITMAP_SIZE_IN_CKPT) {
-   max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct 
f2fs_checkpoint) + 1;
-   set_sb(cp_payload, F2FS_BLK_ALIGN(max_sit_bitmap_size));
+   max_nat_bitmap_size = (get_sb(segment_count_nat) <<
+   get_sb(log_blocks_per_seg)) / 8;
segment_count_nat should not exceed ((CHECKSUM_OFFSET - sizeof(struct
f2fs_checkpoint) + 1) * 8) >> get_sb(log_blocks_per_seg) (aligned down),
I think.

In my mind, the nat version bitmap is only located in the cp page...
So segment_count_nat has a fixed upper-bound limitation...


+
+   set_sb(segment_count_nat, get_sb(segment_count_nat) * 2);
+
+   /* use cp_payload if free space of f2fs_checkpoint is not enough */
+   if (max_sit_bitmap_size + max_nat_bitmap_size >
+   MAX_BITMAP_SIZE_IN_CKPT) {
+   u_int32_t diff =  max_sit_bitmap_size + max_nat_bitmap_size -
+   MAX_BITMAP_SIZE_IN_CKPT;

I looked up the f2fs source code: in __bitmap_ptr, if the payload is used,
the whole payload will be used for the sit bitmap.

Therefore, I mean, should it be u_int32_t diff = max_sit_bitmap_size?
Or should the definition in f2fs.h be modified?


+   set_sb(cp_payload, F2FS_BLK_ALIGN(diff));
 

[f2fs-dev] [PATCH RFC] f2fs: refactor get_new_segment

2017-12-16 Thread Gao Xiang via Linux-f2fs-devel
get_new_segment is too unclear to understand how it works.
This patch refactors it in a straightforward way, and
I think it is equivalent to the original one.

This patch also fixes two issues in the original get_new_segment:
1) left_start could wrap around (unsigned underflow) when hint == 0 at first:
...
} else {
go_left = 1;
*   left_start = hint - 1;
}
...
*   while (test_bit(left_start, free_i->free_secmap)) {
2) It will do find_next_zero_bit again when go_left == true and dir == ALLOC_LEFT:
...
find_other_zone:
*   secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
if (secno >= MAIN_SECS(sbi)) {
if (dir == ALLOC_RIGHT) {
...
} else {
go_left = 1;
...
}
}
if (go_left == 0)
goto skip_left;
...
if (i < NR_CURSEG_TYPE) {
/* zone is in user, try another */
*   if (go_left)
*   hint = zoneno * sbi->secs_per_zone - 1;
...
init = false;
goto find_other_zone;
}
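To see issue 1) concretely, a standalone two-liner (not the f2fs code)
showing the unsigned wrap-around:

#include <stdio.h>

int main(void)
{
	unsigned int hint = 0;
	unsigned int left_start = hint - 1;	/* wraps to UINT_MAX */

	/* test_bit(left_start, free_i->free_secmap) would then index far
	 * past the end of the section bitmap. */
	printf("%u\n", left_start);		/* prints 4294967295 */
	return 0;
}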

Signed-off-by: Gao Xiang <hsiang...@aol.com>
---
 fs/f2fs/segment.c | 137 --
 1 file changed, 81 insertions(+), 56 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c117e09..eea9d3f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2047,82 +2047,107 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
unsigned int *newseg, bool new_sec, int dir)
 {
struct free_segmap_info *free_i = FREE_I(sbi);
-   unsigned int segno, secno, zoneno;
+   unsigned int segno = *newseg, zoneno;
+   unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
unsigned int total_zones = MAIN_SECS(sbi) / sbi->secs_per_zone;
-   unsigned int hint = GET_SEC_FROM_SEG(sbi, *newseg);
-   unsigned int old_zoneno = GET_ZONE_FROM_SEG(sbi, *newseg);
-   unsigned int left_start = hint;
-   bool init = true;
-   int go_left = 0;
+   unsigned int old_zoneno = GET_ZONE_FROM_SEC(sbi, secno);
+   bool may_again = true, actually_go_left = false;
int i;
 
	spin_lock(&free_i->segmap_lock);
 
-   if (!new_sec && ((*newseg + 1) % sbi->segs_per_sec)) {
+   /* first, attempt to find a segment in the current section */
+   if (!new_sec && ((segno + 1) % sbi->segs_per_sec)) {
+   unsigned end_segno = GET_SEG_FROM_SEC(sbi, secno + 1);
+
segno = find_next_zero_bit(free_i->free_segmap,
-   GET_SEG_FROM_SEC(sbi, hint + 1), *newseg + 1);
-   if (segno < GET_SEG_FROM_SEC(sbi, hint + 1))
-   goto got_it;
+   end_segno, segno + 1);
+
+   if (segno < end_segno)
+   goto out;
}
-find_other_zone:
-   secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
-   if (secno >= MAIN_SECS(sbi)) {
-   if (dir == ALLOC_RIGHT) {
-   secno = find_next_zero_bit(free_i->free_secmap,
-   MAIN_SECS(sbi), 0);
-   f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi));
-   } else {
-   go_left = 1;
-   left_start = hint - 1;
+
+try_another_section:
+   if (likely(!actually_go_left)) {
+   /*
+* since ALLOC_LEFT takes much effort,
+* prefer to ALLOC_RIGHT first
+*/
+   unsigned int new_secno = find_next_zero_bit(
+   free_i->free_secmap, MAIN_SECS(sbi), secno);
+
+   if (new_secno < MAIN_SECS(sbi)) {
+   secno = new_secno;
+   goto check_another_zone;
}
}
-   if (go_left == 0)
-   goto skip_left;
 
-   while (test_bit(left_start, free_i->free_secmap)) {
-   if (left_start > 0) {
-   left_start--;
-   continue;
-   }
-   left_start = find_next_zero_bit(free_i->free_secmap,
-   MAIN_SECS(sbi), 0);
-   f2fs_bug_on(sbi, left_start >= MAIN_SECS(sbi));
-   break;
+   if (dir == ALLOC_LEFT) {
+   /* ALLOC_LEFT, no free sections on the right side */
+   actually_go_left = true;
+
+   while(secno)
+   if (!test_bit(--secno, free_i->free_secmap))
+   goto check_another_zone;
}
-   secno = left_start;
-skip_left:
-   segno = GET_SEG_FROM_SEC(sbi, secno);
-  
