Re: fsck bug in replaying partial frag truncate journal on UFS SU+J?

2014-06-18 Thread takehara . mikihito
Hello,

I kept analysis of the problem which I reported previously and found how to fix 
this problem.
(patch in my previou report is wrong.).

My understanding is there is a rule that blkno of JOP_FREEBLK or JOP_NEWBLK 
must be the first 
position in UFS block for each inode, for fsck_ffs's suj.c seems to support 
only such a case. But 
there is a case that kernel does create a JOP_FREEBLK journal which is not in 
this rule. The case is 
small file's partial frag truncation.

Bellow I attached a patch to fix this problem.
Adding new argument frags_offset for newfreework() to adjust for calling 
newjfreeblk(). New argument 
frags_offset is used by only softdep_journal_freeblocks()

diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
index 4d0442b..2f2f063 100644
--- a/sys/ufs/ffs/ffs_softdep.c
+++ b/sys/ufs/ffs/ffs_softdep.c
@@ -1012,7 +1012,7 @@ staticvoid cancel_jfreeblk(struct freeblks *, 
ufs2_daddr_t);
 static struct jfreefrag *newjfreefrag(struct freefrag *, struct inode *,
ufs2_daddr_t, long, ufs_lbn_t);
 static struct freework *newfreework(struct ufsmount *, struct freeblks *,
-   struct freework *, ufs_lbn_t, ufs2_daddr_t, int, int, int);
+   struct freework *, ufs_lbn_t, ufs2_daddr_t, int, int, int, int);
 static int jwait(struct worklist *, int);
 static struct inodedep *inodedep_lookup_ip(struct inode *);
 static int bmsafemap_backgroundwrite(struct bmsafemap *, struct buf *);
@@ -3996,7 +3996,7 @@ free_freedep(freedep)
  * is visible outside of softdep_setup_freeblocks().
  */
 static struct freework *
-newfreework(ump, freeblks, parent, lbn, nb, frags, off, journal)
+newfreework(ump, freeblks, parent, lbn, nb, frags, off, journal, frags_offset)
struct ufsmount *ump;
struct freeblks *freeblks;
struct freework *parent;
@@ -4005,6 +4005,7 @@ newfreework(ump, freeblks, parent, lbn, nb, frags, off, 
journal)
int frags;
int off;
int journal;
+   int frags_offset;
 {
struct freework *freework;
 
@@ -4022,7 +4023,7 @@ newfreework(ump, freeblks, parent, lbn, nb, frags, off, 
journal)
? 0 : NINDIR(ump-um_fs) + 1;
freework-fw_start = freework-fw_off = off;
if (journal)
-   newjfreeblk(freeblks, lbn, nb, frags);
+   newjfreeblk(freeblks, lbn, nb - frags_offset, frags + 
frags_offset);
if (parent == NULL) {
ACQUIRE_LOCK(lk);
WORKLIST_INSERT(freeblks-fb_freeworkhd, freework-fw_list);
@@ -5958,7 +5959,7 @@ setup_freedirect(freeblks, ip, i, needj)
DIP_SET(ip, i_db[i], 0);
frags = sblksize(ip-i_fs, ip-i_size, i);
frags = numfrags(ip-i_fs, frags);
-   newfreework(ip-i_ump, freeblks, NULL, i, blkno, frags, 0, needj);
+   newfreework(ip-i_ump, freeblks, NULL, i, blkno, frags, 0, needj, 0);
 }
 
 static inline void
@@ -5977,7 +5978,7 @@ setup_freeext(freeblks, ip, i, needj)
ip-i_din2-di_extb[i] = 0;
frags = sblksize(ip-i_fs, ip-i_din2-di_extsize, i);
frags = numfrags(ip-i_fs, frags);
-   newfreework(ip-i_ump, freeblks, NULL, -1 - i, blkno, frags, 0, needj);
+   newfreework(ip-i_ump, freeblks, NULL, -1 - i, blkno, frags, 0, needj, 
0);
 }
 
 static inline void
@@ -5995,7 +5996,7 @@ setup_freeindir(freeblks, ip, i, lbn, needj)
return;
DIP_SET(ip, i_ib[i], 0);
newfreework(ip-i_ump, freeblks, NULL, lbn, blkno, ip-i_fs-fs_frag,
-   0, needj);
+   0, needj, 0);
 }
 
 static inline struct freeblks *
@@ -6111,7 +6112,7 @@ setup_trunc_indir(freeblks, ip, lbn, lastlbn, blkno)
if (off + 1 == NINDIR(ip-i_fs))
goto nowork;
freework = newfreework(ip-i_ump, freeblks, NULL, lbn, blkno, 0, off+1,
-   0);
+   0, 0);
/*
 * Link the freework into the indirdep.  This will prevent any new
 * allocations from proceeding until we are finished with the
@@ -6437,7 +6438,8 @@ softdep_journal_freeblocks(ip, cred, length, flags)
oldfrags = numfrags(ip-i_fs, oldfrags);
blkno += numfrags(ip-i_fs, frags);
newfreework(ip-i_ump, freeblks, NULL, lastlbn,
-   blkno, oldfrags, 0, needj);
+   blkno, oldfrags, 0, needj,
+   numfrags(ip-i_fs, frags));
} else if (blkno == 0)
allocblock = 1;
}
@@ -7737,7 +7739,7 @@ handle_workitem_freeblocks(freeblks, flags)
FREE_LOCK(lk);
freework = newfreework(ump, freeblks, NULL,
aip-ai_lbn, aip-ai_newblkno,
-   ump-um_fs-fs_frag, 0, 0);
+   ump-um_fs-fs_frag, 0, 0, 0);

fsck bug in replaying partial frag truncate journal on UFS SU+J?

2014-05-13 Thread takehara . mikihito
Hello,

I think fsck_ffs has a bug in replaying partial frag truncate journal on UFS 
SU+J.
Bellow I tested about the issue.

I tested blocksize==4KB, fragsize=512byte UFS SU+J. But I think these 
parameters are not related to 
this issue.
1) Preparing 4096byte(1block) test file.
2) Use truncate to shorten filesize to 3584byte(7frags).
3) Shutdown without unmount after journal is written and before inode size 
ufs2_dinode is written.
4) Run fsck_ffs with journal.
5) Mount again and remove test file. Then I face panic.
   panic: ffs_blkfree_cg: freeing free block

This seems to be caused by fsck_ffs to replay JOP_FREEBLK whose blkno is not 
block-aligned.
The above case 2), JOP_FREEBLK journal is like this:
   FREEBLK ino=5, blkno=1727, lbn=0, frags=1, oldfrags=0  ---(a)
fsck_ffs handles JOP_NEWBLK almost same as JOP_FREEBLK. But my understanding is 
that in case of 
JOP_NEWBLK kernel always create block-aligned blkno. So this issue is only in 
case of JOP_FREEBLK.

To analyze this issue, I also tested that using the above test file 
3584byte(7frags) and write 
512byte with append to largen to 4096byte(1block). In this case JOP_NEWBLK 
journal is like this:
   JOP_NEWBLK ino=5, blkno=1720, lbn=0, frags=8, oldfrags=7  ---(b)

I think the bug is in fsck_ffs suj.c's arround blk_check(). My understanding is 
that blk_check() 
cannot handle non-block-aligned blkno so there is a little trick in blk_build() 
bellow.

/*  

 * Rewrite the record using oldfrags to indicate the offset into

 * the block.  Leave jb_frags as the actual allocated count.

 */
blkrec-jb_blkno -= frag;
blkrec-jb_oldfrags = frag; 


By this trick, (a) is modified like:
   JOP_FREEBLK ino=5, blkno=1720, lbn=0, frags=1, oldfrags=7  ---(a')
and (b) is modified like:
   JOP_NEWBLK ino=5, blkno=1720, lbn=0, frags=8, oldfrags=0  ---(b')
But blk_check() cannot handle the case oldfrags!=0. If oldfrags!=0, 
blk_check()'s isat comes 
to be 0 even though the blk is present. So (a) should be modified same as (b')
The following is my patch to fix like this. According to my test, this patch 
works fine.
===
diff --git a/sbin/fsck_ffs/suj.c b/sbin/fsck_ffs/suj.c
index e21ffc6..2512522 100644
--- a/sbin/fsck_ffs/suj.c
+++ b/sbin/fsck_ffs/suj.c
@@ -2503,11 +2503,11 @@ blk_build(struct jblkrec *blkrec)
frag = fragnum(fs, blkrec-jb_blkno);
sblk = blk_lookup(blk, 1);
/*
-* Rewrite the record using oldfrags to indicate the offset into
-* the block.  Leave jb_frags as the actual allocated count.
+* Rewrite the record to indicate the offset into the block.
 */
blkrec-jb_blkno -= frag;
-   blkrec-jb_oldfrags = frag;
+   blkrec-jb_frags += frag;
+   blkrec-jb_oldfrags = 0;
if (blkrec-jb_oldfrags + blkrec-jb_frags  fs-fs_frag)
err_suj(Invalid fragment count %d oldfrags %d\n,
blkrec-jb_frags, frag);
===

I think we can fix this by changing kernel code not to create non-block-aligned 
JOP_FREEBLK.
But I also think it is better to change fsck by considering compatibility.


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


uninitialized journal data written in SU+J ?

2014-04-22 Thread takehara . mikihito
Hello,


I'm testing UFS with SU+J. But it seems sometimes broken journal data has 
written.

In softdep_process_journal (ffs_softdep.c), there is a while code to build 
jsegrec and each entry.
But by my test, sometimes there is no entry then break this while code without 
building jsegrec.
If this happens, bp-b_data is not initialized but this bp is written, I think.

I checked this behavior by following patch.

diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
index 585af50..2d4939c 100644
--- a/sys/ufs/ffs/ffs_softdep.c
+++ b/sys/ufs/ffs/ffs_softdep.c
@@ -3421,6 +3421,15 @@ softdep_process_journal(mp, needwk, flags)
data = bp-b_data + off;
cnt--;
}
+
+#if 1
+   if (off == 0) {
+   struct jsegrec *tmp = (struct jsegrec*)bp-b_data;
+   if (tmp-jsr_seq != jseg-js_seq) {
+   panic(test test);
+   }
+   }
+#endif
/*
 * Write this one buffer and continue.
 */


If uninitialized data is valid by fsck suj, this may result filesystem 
corruption, I think.
I think it's better to clear b_data before using it.

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org