On Thu, 2014-09-11 at 18:47 +0800, Chao Yu wrote:
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaeg...@kernel.org]
> > Sent: Thursday, September 11, 2014 1:37 PM
> > To: huang ying
> > Cc: linux-f2fs-de...@lists.sourceforge.net; LKML; Huang Ying
> > Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix recover when nid of non-inode 
> > dnode < nid of inode
> > 
> > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaeg...@kernel.org> wrote:
> > >
> > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaeg...@kernel.org> 
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > Hi Huang,
> > > > > > > >
> > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and 
> > > > > > > > > the
> > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > before
> > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause 
> > > > > > > > > the
> > > > > > > > > recovery fail.
> > > > > > > > >
> > > > > > > > > Usually, inode will be allocated before non-inode dnode, so 
> > > > > > > > > the
> > > > nid
> > > > > > of
> > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > reverse.
> > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > >
> > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode 
> > > > > > > > > in
> > > > > > > > > find_fsync_dnodes.
> > > > > > > > >
> > > > > > > > > The patch was tested via allocating nid reversely via a 
> > > > > > > > > debugging
> > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Huang, Ying <ying.hu...@intel.com>
> > > > > > > > > ---
> > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > >                                                   
> > > > > > > > > FI_INC_LINK);
> > > > > > > > > -         } else {
> > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > >
> > > > > > > > If this is not inode block, we should add this inode to recover 
> > > > > > > > its
> > > > > > data blocks.
> > > > > > >
> > > > > > > Is it possible that there is only non-inode dnode but no inode 
> > > > > > > when
> > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > changes to
> > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > that
> > > > > > > we will write inode block.  Is it right?  If so, the solution in 
> > > > > > > this
> > > > > > > patch should work too.
> > > > > >
> > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > recovery
> > > > > > fail.
> > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > directly.
> > > > > >
> > > > >
> > > > > Yes.  That is another way to fix the issue.
> > > > >
> > > > >
> > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > For exmaple,
> > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > >
> > > > > > In that case, we can get only non-inode dnode in the node chain, 
> > > > > > since
> > > > the
> > > > > > inode
> > > > > > has not fsync_mark.
> > > > > >
> > > > >
> > > > > I think your solution is better here, but does not fix all scenarios. 
> > > > >  If
> > > > > the inode is checkpointed, the file can be recovered, although the 
> > > > > inode
> > > > > information may be not up to date.  But if the inode is not 
> > > > > checkpointed,
> > > > > f2fs_iget will fail too and recover will fail.
> > > >
> > > > Ok, let me consider your scenarios.
> > > >
> > > > Term: F: fsync_mark, D: dentry_mark
> > > >
> > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > >  -> Lose the latest inode(x). Need to fix.
> > > >
> > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > >  -> Impossible, but recover latest dnode(F)
> > > >
> > > > 3. CP | inode(x) | dnode(F)
> > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > >
> > > > 4. CP | dnode(F) | inode(DF)
> > > >  -> If f2fs_iget fails, then goto next.
> > > >
> > > > 5. CP | dnode(F) | inode(x)
> > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > scenario.
> > > >     Drop this dnode(F).
> > > >
> > > > Indeed, there were some missing scenarios.
> > > >
> > > > So, how about this patch?
> > > >
> > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > From: Jaegeuk Kim <jaeg...@kernel.org>
> > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > >
> > > > We can summarize the roll forward recovery scenarios as follows.
> > > >
> > > > [Term] F: fsync_mark, D: dentry_mark
> > > >
> > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > -> Update the latest inode(x).
> > > >
> > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > -> No problem.
> > > >
> > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > -> Impossible, but recover latest dnode(F)
> > > >
> > >
> > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > writeback. f2fs_sync_file written dnode(F), then writeback written 
> > > inode(x).
> > 
> > If the inode(x) was written, f2fs_sync_file will do write the inode again 
> > with
> > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > 
> > In f2fs_sync_file,
> >     ...
> >     while (!sync_node_pages(sbi, ino, &wbc)) {
> >             if (fsync_mark_done(sbi, ino))
> >                     goto out;
> >             mark_inode_dirty_sync(inode);
> >             ret = f2fs_write_inode(inode, NULL);
> >             if (ret)
> >                     goto out;
> >     }
> >     ...
> > 
> > >
> > >
> > > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > > -> No problem.
> > > >
> > > > 5. CP | inode(x) | dnode(F)
> > > > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> > > >
> > >
> > > I think this is possible.  If inode(x) is written by writeback and then
> > > dnode(F) is written by f2fs_sync_file.
> > 
> > -> The inode(DF) was missing. Should drop this dnode(F).
> > Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
> > This is fixed by the below updated patch.
> > 
> > >
> > >
> > > > 6. CP | inode(DF) | dnode(F)
> > > > -> No problem.
> > > >
> > > > 7. CP | dnode(F) | inode(DF)
> > > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > >
> > > > 8. CP | dnode(F) | inode(x)
> > > > -> If f2fs_iget fails, then goto next. But, this is an impossible 
> > > > scenario.
> > > >    Drop this dnode(F).
> > > >
> > >
> > > I think this is possible.  as 3.
> > 
> > ditto.
> > 
> > >
> > >
> > > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > >
> > > > Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> > > > ---
> > > >  fs/f2fs/file.c     | 10 ++++++++
> > > >  fs/f2fs/recovery.c | 70
> > > > +++++++++++++++++++++++++++++++++++++++++++++---------
> > > >  2 files changed, 69 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > index 5cde363..2660af2 100644
> > > > --- a/fs/f2fs/file.c
> > > > +++ b/fs/f2fs/file.c
> > > > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > > > loff_t end, int datasync)
> > > >                         up_write(&fi->i_sem);
> > > >                 }
> > > >         } else {
> > > > +               /*
> > > > +                * CP | inode(x) | dnode(F)
> > > > +                * We need to remain inode(DF) for roll-forward 
> > > > recovery.
> > > > +                */
> > > > +               if (fsync_mark_done(sbi, inode->i_ino) &&
> > > >
> > >
> > > How can this happen?  We have not called sync_node_pages(sbi, 
> > > inode->i_ino,
> > > &wbc).  Where is the fsync mark set?
> > >
> > >
> > > > +                       !is_checkpointed_node(sbi, 
> > > > F2FS_I(inode)->i_pino))
> > > > {
> > > >
> > >
> > > if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> > > checkpoint.  Right?
> > 
> > Yes, whole conditions were wrong.
> > Please, refer the following patch.
> > Thanks,
> > 
> > >From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaeg...@kernel.org>
> > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios
> > 
> > We can summarize the roll forward recovery scenarios as follows.
> > 
> > [Term] F: fsync_mark, D: dentry_mark
> > 
> > 1. inode(x) | CP | inode(x) | dnode(F)
> > -> Update the latest inode(x).
> > 
> > 2. inode(x) | CP | inode(F) | dnode(F)
> > -> No problem.
> > 
> > 3. inode(x) | CP | dnode(F) | inode(x)
> > -> Impossible, but recover latest dnode(F).
> > 
> > 4. inode(x) | CP | dnode(F) | inode(F)
> > -> No problem.
> > 
> > 5. CP | inode(x) | dnode(F)
> > -> The inode(DF) was missing. Should drop this dnode(F).
> > 
> > 6. CP | inode(DF) | dnode(F)
> > -> No problem.
> > 
> > 7. CP | dnode(F) | inode(DF)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> > 
> > 8. CP | dnode(F) | inode(x)
> > -> If f2fs_iget fails, then goto next to find inode(DF).
> >    But it will fail due to no inode(DF).
> > 
> > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > 
> > Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> > ---
> >  fs/f2fs/file.c     | 10 ++++++++
> >  fs/f2fs/recovery.c | 70 
> > +++++++++++++++++++++++++++++++++++++++++++++---------
> >  2 files changed, 69 insertions(+), 11 deletions(-)
> > 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index e7681c3..e2a2f38 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, 
> > loff_t end, int
> > datasync)
> >                     up_write(&fi->i_sem);
> >             }
> >     } else {
> > +           /*
> > +            * CP | inode(x) | dnode(F)
> > +            * We need to remain inode(DF) for roll-forward recovery.
> > +            */
> > +           if (!fsync_mark_done(sbi, inode->i_ino) &&
> > +                   !is_checkpointed_node(sbi, inode->i_ino)) {
> > +                   mark_inode_dirty_sync(inode);
> > +                   f2fs_write_inode(inode, NULL);
> > +           }
> 
> This un-checkpointed non-fsynced inode page might be writeback by
> kswapd/bdi-flusher here. Then CP | inode(x) | dnode(F) occur.
> 
> So how about modifying as below:
> 
>               while (!sync_node_pages(sbi, inode->i_ino, &wbc) ||
>                       (!fsync_mark_done(sbi, inode->i_ino) &&
>                       !is_checkpointed_node(sbi, inode->i_ino))) {
> 
> to guarantee the inode(DF) is be written out to disk?

If sync_node_pages write dnode(F), the fsync_done flag for inode will be
set if my understanding of code were correct.  In set_node_addr:

        /* update fsync_mark if its inode nat entry is still alive */
        e = __lookup_nat_cache(nm_i, ni->ino);
        if (e)
                e->fsync_done = fsync_done;

Best Regards,
Huang, Ying

> Thanks,
> Yu
> 
> > +
> >             /* if there is no written node page, write its inode page */
> >             while (!sync_node_pages(sbi, ino, &wbc)) {
> >                     if (fsync_mark_done(sbi, ino))
> > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > index 6c5a74a..eeb3785 100644
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -14,6 +14,36 @@
> >  #include "node.h"
> >  #include "segment.h"
> > 
> > +/*
> > + * Roll forward recovery scenarios.
> > + *
> > + * [Term] F: fsync_mark, D: dentry_mark
> > + *
> > + * 1. inode(x) | CP | inode(x) | dnode(F)
> > + * -> Update the latest inode(x).
> > + *
> > + * 2. inode(x) | CP | inode(F) | dnode(F)
> > + * -> No problem.
> > + *
> > + * 3. inode(x) | CP | dnode(F) | inode(x)
> > + * -> Impossible, but recover latest dnode(F)
> > + *
> > + * 4. inode(x) | CP | dnode(F) | inode(F)
> > + * -> No problem.
> > + *
> > + * 5. CP | inode(x) | dnode(F)
> > + * -> The inode(DF) was missing. Should drop this dnode(F).
> > + *
> > + * 6. CP | inode(DF) | dnode(F)
> > + * -> No problem.
> > + *
> > + * 7. CP | dnode(F) | inode(DF)
> > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > + *
> > + * 8. CP | dnode(F) | inode(x)
> > + * -> If f2fs_iget fails, then goto next to find inode(DF).
> > + *    But it will fail due to no inode(DF).
> > + */
> >  static struct kmem_cache *fsync_entry_slab;
> > 
> >  bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> > @@ -110,27 +140,32 @@ out:
> >     return err;
> >  }
> > 
> > -static int recover_inode(struct inode *inode, struct page *node_page)
> > +static void __recover_inode(struct inode *inode, struct page *page)
> >  {
> > -   struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> > +   struct f2fs_inode *raw = F2FS_INODE(page);
> > +
> > +   inode->i_mode = le16_to_cpu(raw->i_mode);
> > +   i_size_write(inode, le64_to_cpu(raw->i_size));
> > +   inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> > +   inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> > +   inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> > +   inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > +   inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> > +   inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> > +}
> > 
> > +static int recover_inode(struct inode *inode, struct page *node_page)
> > +{
> >     if (!IS_INODE(node_page))
> >             return 0;
> > 
> > -   inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> > -   i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> > -   inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > -   inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> > -   inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> > -   inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > -   inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> > -   inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> > +   __recover_inode(inode, node_page);
> > 
> >     if (is_dent_dnode(node_page))
> >             return recover_dentry(node_page, inode);
> > 
> >     f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> > -                   ino_of_node(node_page), raw_inode->i_name);
> > +                   ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
> >     return 0;
> >  }
> > 
> > @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info 
> > *sbi, struct list_head
> > *head)
> >                             break;
> >                     }
> > 
> > +                   /*
> > +                    * CP | dnode(F) | inode(DF)
> > +                    * For this case, we should not give up now.
> > +                    */
> >                     entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
> >                     if (IS_ERR(entry->inode)) {
> >                             err = PTR_ERR(entry->inode);
> >                             kmem_cache_free(fsync_entry_slab, entry);
> > +                           if (err == -ENOENT)
> > +                                   goto next;
> >                             break;
> >                     }
> >                     list_add_tail(&entry->list, head);
> > @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
> >             entry = get_fsync_inode(head, ino_of_node(page));
> >             if (!entry)
> >                     goto next;
> > +           /*
> > +            * inode(x) | CP | inode(x) | dnode(F)
> > +            * In this case, we can lose the latest inode(x).
> > +            * So, call __recover_inode for the inode update.
> > +            */
> > +           if (IS_INODE(page))
> > +                   __recover_inode(entry->inode, page);
> > 
> >             err = do_recover_data(sbi, entry->inode, page, blkaddr);
> >             if (err)
> > --
> > 1.8.5.2 (Apple Git-48)
> > 
> > 
> > ------------------------------------------------------------------------------
> > Want excitement?
> > Manually upgrade your production database.
> > When you want reliability, choose Perforce
> > Perforce version control. Predictably reliable.
> > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Linux-f2fs-devel mailing list
> > linux-f2fs-de...@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to