Re: [f2fs-dev] Weird EROFS data corruption

2023-12-05 Thread Gao Xiang

Hi Juhyung,

On 2023/12/5 22:43, Juhyung Park wrote:
> On Tue, Dec 5, 2023 at 11:34 PM Gao Xiang wrote:
> >
> > ...
> >
> > I'm still analyzing this behavior as well as the root cause and
> > I will also try to get a recent cloud server with FSRM myself
> > to find more clues.
>
> Down the rabbit hole we go...
>
> Let me know if you have trouble getting an instance with FSRM. I'll
> see what I can do.


I've sent out a fix to address this, please help check:
https://lore.kernel.org/r/20231206030758.3760521-1-hsiang...@linux.alibaba.com

Thanks,
Gao Xiang


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 4/4 v2] f2fs: let's finish or reset zones all the time

2023-12-05 Thread Jaegeuk Kim
In order to limit # of open zones, let's finish or reset zones given # of
valid blocks per section and its zone condition.

Reviewed-by: Daeho Jeong 
Signed-off-by: Jaegeuk Kim 
---

 - remove unnecessary wp_block

 fs/f2fs/segment.c | 75 +++
 1 file changed, 17 insertions(+), 58 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9081c9af977a..007ebb107236 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4870,82 +4870,43 @@ static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
struct f2fs_dev_info *fdev,
struct blk_zone *zone)
 {
-   unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
-   block_t zone_block, wp_block, last_valid_block;
+   unsigned int zone_segno;
+   block_t zone_block, valid_block_cnt;
unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
-   int i, s, b, ret;
-   struct seg_entry *se;
+   int ret;
 
if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
return 0;
 
-   wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
-   wp_segno = GET_SEGNO(sbi, wp_block);
-   wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
zone_segno = GET_SEGNO(sbi, zone_block);
-   zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
-
-   if (zone_segno >= MAIN_SEGS(sbi))
-   return 0;
 
/*
 * Skip check of zones cursegs point to, since
 * fix_curseg_write_pointer() checks them.
 */
-   for (i = 0; i < NO_CHECK_TYPE; i++)
-   if (zone_secno == GET_SEC_FROM_SEG(sbi,
-  CURSEG_I(sbi, i)->segno))
-   return 0;
+   if (zone_segno >= MAIN_SEGS(sbi) ||
+   IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, zone_segno)))
+   return 0;
 
/*
-* Get last valid block of the zone.
+* Get # of valid block of the zone.
 */
-   last_valid_block = zone_block - 1;
-   for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
-   segno = zone_segno + s;
-   se = get_seg_entry(sbi, segno);
-   for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
-   if (f2fs_test_bit(b, se->cur_valid_map)) {
-   last_valid_block = START_BLOCK(sbi, segno) + b;
-   break;
-   }
-   if (last_valid_block >= zone_block)
-   break;
-   }
+   valid_block_cnt = get_valid_blocks(sbi, zone_segno, true);
 
-   /*
-* When safely unmounted in the previous mount, we can trust write
-* pointers. Otherwise, finish zones.
-*/
-   if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
-   /*
-* The write pointer matches with the valid blocks or
-* already points to the end of the zone.
-*/
-   if ((last_valid_block + 1 == wp_block) ||
-   (zone->wp == zone->start + zone->len))
-   return 0;
-   }
+   if ((!valid_block_cnt && zone->cond == BLK_ZONE_COND_EMPTY) ||
+   (valid_block_cnt && zone->cond == BLK_ZONE_COND_FULL))
+   return 0;
 
-   if (last_valid_block + 1 == zone_block) {
-   if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
-   /*
-* If there is no valid block in the zone and if write
-* pointer is not at zone start, reset the write
-* pointer.
-*/
-   f2fs_notice(sbi,
- "Zone without valid block has non-zero write "
- "pointer. Reset the write pointer: wp[0x%x,0x%x]",
- wp_segno, wp_blkoff);
-   }
+   if (!valid_block_cnt) {
+   f2fs_notice(sbi, "Zone without valid block has non-zero write "
+   "pointer. Reset the write pointer: cond[0x%x]",
+   zone->cond);
ret = __f2fs_issue_discard_zone(sbi, fdev->bdev, zone_block,
zone->len >> log_sectors_per_block);
if (ret)
f2fs_err(sbi, "Discard zone failed: %s (errno=%d)",
 fdev->path, ret);
-
return ret;
}
 
@@ -4957,10 +4918,8 @@ static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
 * selected for write operation until it get discarded.
 */
f2fs_notice(sbi, "Valid blocks are not aligned with write "
-   "pointer: valid block[0x%x,0x%x] wp[0x%x,0x%x]",
-  
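
The decision that the reworked check_zone_write_pointer() makes can be summarized in a small user-space sketch. Everything in it is illustrative: the enum names, `struct zone_state`, and `zone_fixup_action()` are hypothetical and not part of f2fs, and the "finish" outcome is inferred from the patch subject because the hunk above is cut off before that branch. The sketch also leaves out the early returns for non-sequential zones, out-of-range segments, and current sections.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the BLK_ZONE_COND_* states the patch looks at;
 * every other condition is folded into ZONE_COND_OTHER. */
enum zone_cond { ZONE_COND_EMPTY, ZONE_COND_FULL, ZONE_COND_OTHER };

/* What the sketch decides to do with a zone. */
enum zone_action { ZONE_ACTION_NONE, ZONE_ACTION_RESET, ZONE_ACTION_FINISH };

/* Hypothetical summary of one zone: valid block count plus zone condition. */
struct zone_state {
	unsigned int valid_blocks;
	enum zone_cond cond;
};

/*
 * Mirror the order of checks in the patch:
 *  - no valid blocks and already empty, or some valid blocks and already
 *    full: nothing to do;
 *  - no valid blocks otherwise: reset the zone (the patch issues a discard);
 *  - valid blocks but not full: finish the zone (assumed from the subject).
 */
static enum zone_action zone_fixup_action(const struct zone_state *z)
{
	assert(z != NULL);

	if ((!z->valid_blocks && z->cond == ZONE_COND_EMPTY) ||
	    (z->valid_blocks && z->cond == ZONE_COND_FULL))
		return ZONE_ACTION_NONE;

	if (!z->valid_blocks)
		return ZONE_ACTION_RESET;

	return ZONE_ACTION_FINISH;
}
```

Unlike the removed code, the decision no longer depends on the write pointer position or on CP_UMOUNT_FLAG, only on the valid block count and the zone condition.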

Re: [f2fs-dev] Weird EROFS data corruption

2023-12-05 Thread Juhyung Park
On Tue, Dec 5, 2023 at 11:34 PM Gao Xiang  wrote:
>
>
>
> On 2023/12/5 22:23, Juhyung Park wrote:
> > Hi Gao,
> >
> > On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang wrote:
> >>
> >> Hi Juhyung,
> >>
> >> On 2023/12/4 11:41, Juhyung Park wrote:
> >>
> >> ...
> >>>
> >>>>
> >>>> - Could you share the full message about the output of `lscpu`?
> >>>
> >>> Sure:
> >>>
> >>> Architecture:           x86_64
> >>>   CPU op-mode(s):       32-bit, 64-bit
> >>>   Address sizes:        39 bits physical, 48 bits virtual
> >>>   Byte Order:           Little Endian
> >>> CPU(s):                 8
> >>>   On-line CPU(s) list:  0-7
> >>> Vendor ID:              GenuineIntel
> >>>   BIOS Vendor ID:       Intel(R) Corporation
> >>>   Model name:           11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
> >>>     BIOS Model name:    11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU @ 3.0GHz
> >>>     BIOS CPU family:    198
> >>>     CPU family:         6
> >>>     Model:              140
> >>>     Thread(s) per core: 2
> >>>     Core(s) per socket: 4
> >>>     Socket(s):          1
> >>>     Stepping:           1
> >>>     CPU(s) scaling MHz: 60%
> >>>     CPU max MHz:        4800.
> >>>     CPU min MHz:        400.
> >>>     BogoMIPS:           5990.40
> >>>     Flags:              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> >>>                         cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
> >>>                         tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
> >>>                         arch_perfmon pebs bts rep_good nopl xtopology
> >>>                         nonstop_tsc cpuid aperfmperf tsc_known_freq pni
> >>>                         pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
> >>>                         sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe
> >>>                         popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> >>>                         lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2
> >>>                         ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow
> >>>                         flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1
> >>>                         avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq
> >>>                         rdseed adx smap avx512ifma clflushopt clwb intel_pt
> >>>                         avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec
> >>>                         xgetbv1 xsaves split_lock_detect dtherm ida arat pln
> >>>                         pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req
> >>>                         vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes
> >>>                         vpclmulqdq avx512_vnni avx512_bitalg tme
> >>>                         avx512_vpopcntdq rdpid movdiri movdir64b fsrm
> >>>                         avx512_vp2i
> >>
> >> Sigh, I've been thinking.  FSRM is the most significant difference between
> >> our environments.  Could you try only the following diff to see whether it
> >> changes anything? (Without the previous disable patch.)
> >>
> >> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> >> index 1b60ae81ecd8..1b52a913233c 100644
> >> --- a/arch/x86/lib/memmove_64.S
> >> +++ b/arch/x86/lib/memmove_64.S
> >> @@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
> >>#define CHECK_LEN cmp $0x20, %rdx; jb 1f
> >>#define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
> >>.Lmemmove_begin_forward:
> >> -   ALTERNATIVE_2 __stringify(CHECK_LEN), \
> >> - __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
> >> - __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
> >> +   CHECK_LEN
> >>
> >>  /*
> >>   * movsq instruction have many startup latency
> >
> > Yup, that also seems to fix it.
> > Are we looking at a potential memmove issue?
>
> I'm still analyzing this behavior as well as the root cause and
> I will also try to get a recent cloud server with FSRM myself
> to find more clues.

Down the rabbit hole we go...

Let me know if you have trouble getting an instance with FSRM. I'll
see what I can do.

>
> Thanks,
> Gao Xiang




Re: [f2fs-dev] Weird EROFS data corruption

2023-12-05 Thread Gao Xiang



On 2023/12/5 22:23, Juhyung Park wrote:

Hi Gao,

On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang  wrote:


Hi Juhyung,

On 2023/12/4 11:41, Juhyung Park wrote:

...




- Could you share the full message about the output of `lscpu`?


Sure:

Architecture:           x86_64
  CPU op-mode(s):       32-bit, 64-bit
  Address sizes:        39 bits physical, 48 bits virtual
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              GenuineIntel
  BIOS Vendor ID:       Intel(R) Corporation
  Model name:           11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
    BIOS Model name:    11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU @ 3.0GHz
    BIOS CPU family:    198
    CPU family:         6
    Model:              140
    Thread(s) per core: 2
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           1
    CPU(s) scaling MHz: 60%
    CPU max MHz:        4800.
    CPU min MHz:        400.
    BogoMIPS:           5990.40
    Flags:              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                        cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
                        tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
                        arch_perfmon pebs bts rep_good nopl xtopology
                        nonstop_tsc cpuid aperfmperf tsc_known_freq pni
                        pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
                        sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe
                        popcnt tsc_deadline_timer aes xsave avx f16c rdrand
                        lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2
                        ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow
                        flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1
                        avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq
                        rdseed adx smap avx512ifma clflushopt clwb intel_pt
                        avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec
                        xgetbv1 xsaves split_lock_detect dtherm ida arat pln
                        pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req
                        vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes
                        vpclmulqdq avx512_vnni avx512_bitalg tme
                        avx512_vpopcntdq rdpid movdiri movdir64b fsrm
                        avx512_vp2i
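
For anyone comparing environments the same way, a whole-word match on the flag list avoids false hits such as `avx512` matching `avx512f`. The helper below is a minimal sketch; `has_cpu_flag()` is a hypothetical name, and it expects the flag list as a single string such as the `flags` line from /proc/cpuinfo. Pasted `lscpu` output that has been wrapped in the middle of a flag name will not match reliably.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if c separates two flags (or ends the string). */
static bool is_flag_separator(char c)
{
	return c == '\0' || c == ' ' || c == '\t' || c == '\n';
}

/*
 * Return true if `flag` occurs in `flags` as a complete,
 * whitespace-delimited word. Hypothetical helper for illustration.
 */
static bool has_cpu_flag(const char *flags, const char *flag)
{
	assert(flags != NULL && flag != NULL);

	size_t n = strlen(flag);
	if (n == 0)
		return false;

	for (const char *p = strstr(flags, flag); p != NULL;
	     p = strstr(p + 1, flag)) {
		bool starts_word = (p == flags) || is_flag_separator(p[-1]);
		bool ends_word = is_flag_separator(p[n]);

		if (starts_word && ends_word)
			return true;
	}
	return false;
}
```

Calling `has_cpu_flag(line, "fsrm")` and `has_cpu_flag(line, "erms")` on the flags line covers the two features that the ALTERNATIVE_2 block in memmove_64.S keys on.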


Sigh, I've been thinking.  FSRM is the most significant difference between
our environments.  Could you try only the following diff to see whether it
changes anything? (Without the previous disable patch.)

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 1b60ae81ecd8..1b52a913233c 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
   #define CHECK_LEN cmp $0x20, %rdx; jb 1f
   #define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
   .Lmemmove_begin_forward:
-   ALTERNATIVE_2 __stringify(CHECK_LEN), \
- __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
- __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
+   CHECK_LEN

 /*
  * movsq instruction have many startup latency


Yup, that also seems to fix it.
Are we looking at a potential memmove issue?
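
One cheap way to probe a question like this is to compare a memmove implementation against a byte-wise reference on overlapping buffers. The sketch below is an assumption-laden user-space illustration: `ref_memmove()` and `memmove_matches_reference()` are hypothetical helpers, and the check exercises the C library's memmove, not the kernel's __memmove from arch/x86/lib/memmove_64.S, so a pass here says nothing about the kernel path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 512

/* Byte-wise overlap-safe copy used as the reference result. */
static void ref_memmove(unsigned char *dst, const unsigned char *src, size_t n)
{
	if (dst < src) {
		/* Destination below source: copy forward. */
		for (size_t i = 0; i < n; i++)
			dst[i] = src[i];
	} else {
		/* Destination at or above source: copy backward. */
		for (size_t i = n; i > 0; i--)
			dst[i - 1] = src[i - 1];
	}
}

/*
 * Move `len` bytes from offset `src_off` to offset `dst_off` inside one
 * buffer, once with memmove() and once with the reference, and report
 * whether the two resulting buffers are identical.
 */
static bool memmove_matches_reference(size_t src_off, size_t dst_off, size_t len)
{
	unsigned char actual[BUF_LEN], expected[BUF_LEN];

	assert(len <= BUF_LEN);
	assert(src_off <= BUF_LEN - len && dst_off <= BUF_LEN - len);

	/* Fill both buffers with the same non-trivial pattern. */
	for (size_t i = 0; i < BUF_LEN; i++)
		actual[i] = expected[i] = (unsigned char)(i * 31 + 7);

	memmove(actual + dst_off, actual + src_off, len);
	ref_memmove(expected + dst_off, expected + src_off, len);

	return memcmp(actual, expected, BUF_LEN) == 0;
}
```

Lengths on either side of 0x20 are worth including in such a sweep, since CHECK_LEN in the diff above branches on that threshold.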


I'm still analyzing this behavior as well as the root cause and
I will also try to get a recent cloud server with FSRM myself
to find more clues.

Thanks,
Gao Xiang




Re: [f2fs-dev] Weird EROFS data corruption

2023-12-05 Thread Juhyung Park
Hi Gao,

On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang  wrote:
>
> Hi Juhyung,
>
> On 2023/12/4 11:41, Juhyung Park wrote:
>
> ...
> >
> >>
> >> - Could you share the full message about the output of `lscpu`?
> >
> > Sure:
> >
> > Architecture:           x86_64
> >   CPU op-mode(s):       32-bit, 64-bit
> >   Address sizes:        39 bits physical, 48 bits virtual
> >   Byte Order:           Little Endian
> > CPU(s):                 8
> >   On-line CPU(s) list:  0-7
> > Vendor ID:              GenuineIntel
> >   BIOS Vendor ID:       Intel(R) Corporation
> >   Model name:           11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
> >     BIOS Model name:    11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU @ 3.0GHz
> >     BIOS CPU family:    198
> >     CPU family:         6
> >     Model:              140
> >     Thread(s) per core: 2
> >     Core(s) per socket: 4
> >     Socket(s):          1
> >     Stepping:           1
> >     CPU(s) scaling MHz: 60%
> >     CPU max MHz:        4800.
> >     CPU min MHz:        400.
> >     BogoMIPS:           5990.40
> >     Flags:              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> >                         cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
> >                         tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
> >                         arch_perfmon pebs bts rep_good nopl xtopology
> >                         nonstop_tsc cpuid aperfmperf tsc_known_freq pni
> >                         pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
> >                         sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe
> >                         popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> >                         lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2
> >                         ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow
> >                         flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1
> >                         avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq
> >                         rdseed adx smap avx512ifma clflushopt clwb intel_pt
> >                         avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec
> >                         xgetbv1 xsaves split_lock_detect dtherm ida arat pln
> >                         pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req
> >                         vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes
> >                         vpclmulqdq avx512_vnni avx512_bitalg tme
> >                         avx512_vpopcntdq rdpid movdiri movdir64b fsrm
> >                         avx512_vp2i
>
> Sigh, I've been thinking.  FSRM is the most significant difference between
> our environments.  Could you try only the following diff to see whether it
> changes anything? (Without the previous disable patch.)
>
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 1b60ae81ecd8..1b52a913233c 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
>   #define CHECK_LEN cmp $0x20, %rdx; jb 1f
>   #define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
>   .Lmemmove_begin_forward:
> -   ALTERNATIVE_2 __stringify(CHECK_LEN), \
> - __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
> - __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
> +   CHECK_LEN
>
> /*
>  * movsq instruction have many startup latency

Yup, that also seems to fix it.
Are we looking at a potential memmove issue?

>
> Thanks,
> Gao Xiang

