Hi Sahitya, On 2019/7/31 11:41, Sahitya Tummala wrote: > Hi Chao, > > On Tue, Jul 30, 2019 at 08:35:46PM +0800, Chao Yu wrote: >> Hi Sahitya, >> >> On 2019/7/30 12:36, Sahitya Tummala wrote: >>> Hi Chao, >>> >>> On Tue, Jul 30, 2019 at 12:00:45AM +0800, Chao Yu wrote: >>>> Hi Sahitya, >>>> >>>> On 2019-7-29 13:20, Sahitya Tummala wrote: >>>>> Policy - foreground GC, LFS mode and greedy GC mode. >>>>> >>>>> Under this policy, f2fs_gc() loops forever to GC as it doesn't have >>>>> enough free segements to proceed and thus it keeps calling gc_more >>>>> for the same victim segment. This can happen if the selected victim >>>>> segment could not be GC'd due to failed blkaddr validity check i.e. >>>>> is_alive() returns false for the blocks set in current validity map. >>>>> >>>>> Fix this by not resetting the sbi->cur_victim_sec to NULL_SEGNO, when >>>>> the segment selected could not be GC'd. This helps to select another >>>>> segment for GC and thus helps to proceed forward with GC. >>>>> >>>>> Signed-off-by: Sahitya Tummala <[email protected]> >>>>> --- >>>>> fs/f2fs/gc.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c >>>>> index 8974672..7bbcc4a 100644 >>>>> --- a/fs/f2fs/gc.c >>>>> +++ b/fs/f2fs/gc.c >>>>> @@ -1303,7 +1303,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, >>>>> round++; >>>>> } >>>>> >>>>> - if (gc_type == FG_GC) >>>>> + if (gc_type == FG_GC && seg_freed) >>>>> sbi->cur_victim_sec = NULL_SEGNO; >>>> >>>> In some cases, we may remain last victim in sbi->cur_victim_sec, and jump >>>> out of >>>> GC cycle, then SSR can skip the last victim due to sec_usage_check()... >>>> >>> >>> I see. I have a few questions on how to fix this issue. Please share your >>> comments. >>> >>> 1. Do you think the scenario described is valid? It happens rarely, not very >> >> IIRC, we suffered endless gc loop due to there is valid block belong to an >> opened atomic write file. (because we will skip directly once we hit atomic >> file) >> >> For your case, I'm not sure that would happen, did you look into is_alive(), >> why >> will it fail? block address not match? If so, it looks like summary info and >> dnode block and nat entry are inconsistent. > > Yes, from the ramdumps, I could see that block address is not matching and > hence, is_alive() could fail in the issue scenario. Have you observed any such > cases before? What could be the reason for this mismatch?
Alright, I didn't suffer such case before... I don't know, too few clues to find the root cause. I guess maybe: - random data caused by emmc/ufs firmware bugs - bit-flip or memory overflow - f2fs bugs So, for the solution, I suggest to detect such inconsistency, and tag in somewhere to just get rid of selecting the corrupted section. BTW, do you try fsck on that image? what's the result? Thanks, > > Thanks, > >> >>> easy to reproduce. From the dumps, I see that only block is set as valid in >>> the sentry->cur_valid_map for which I see that summary block check >>> is_alive() >>> could return false. As only one block is set as valid, chances are there it >>> can be always selected as the victim by get_victim_by_default() under FG_GC. >>> >>> 2. What are the possible scenarios where summary block check is_alive() >>> could >>> fail for a segment? >> >> I guess, maybe after check_valid_map(), the block is been truncated before >> is_alive(). If so the victim should be prefree directly instead of being >> selected again... >> >>> >>> 3. How does GC handle such segments? >> >> I think that's not a normal case, or I'm missing something. >> >> Thanks, >> >>> >>> Thanks, >>> >>>> Thanks, >>>> >>>>> >>>>> if (sync) >>>>> >>> >
