On 2019/2/18 1:27 PM, Qu Wenruo wrote:
> Patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/write_time_tree_checker
> Which is based on v5.0-rc1 tag.
> Also there is no conflict rebasing the patchset to misc-next.

The github branch is now rebased to the v5.0-rc7 tag.
I ran the btrfs auto group tests, and no new regressions were found.

Git is clever enough in this rebase, so I won't bother mail bombing the
mailing list for another minor update.

Thanks,
Qu

>
> This patchset has the following 3 features:
> - Tree block validation output enhancement
>   * Output validation failure timing (write time or read time)
>   * Always output tree block level/key mismatch error message
>   This part is already submitted and reviewed.
>
> - Write time tree block validation check
>   To catch memory corruption either from hardware or kernel.
>   Example output would be:
>
>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424
>     slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416
>     169 0)
>     BTRFS error (device dm-3): block=1350630375424 write time tree block
>     corruption detected
>     BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO
>     failure (Error while writing out transaction)
>     BTRFS info (device dm-3): forced readonly
>     BTRFS warning (device dm-3): Skipping commit of aborted transaction.
>     BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO
>     failure
>     BTRFS info (device dm-3): delayed_refs has NO entry
>
> - Better error handling before calling flush_write_bio()
>   One hidden reason for calling flush_write_bio() in all cases is that
>   flush_write_bio() will trigger the endio function, and the endio
>   function of epd->bio will free the bio in all cases.
>   So we're in fact abusing flush_write_bio() as cleanup.
>
>   Since flush_write_bio() now has its own return value, we shouldn't call
>   flush_write_bio() blindly. Here we introduce a proper cleanup helper,
>   end_write_bio(). Now we call flush_write_bio() like:
>
>   New                           | Old
>   --------------------------------------------------------------
>   ret = do_some_evil(&epd);     | ret = do_some_evil(&epd);
>   if (ret < 0) {                | flush_write_bio(&epd);
>       end_write_bio(&epd, ret); | ^^^ submitting half-baked epd->bio?
>       return ret;               | return ret;
>   }                             |
>   ret = flush_write_bio(&epd);  |
>   return ret;                   |
>
>   The above code should be more streamlined for the error handling part.
>
> Changelog:
> v2:
> - Unlock locked pages in lock_extent_buffer_for_io() for error handling.
> - Added Reviewed-by tags.
>
> v3:
> - Remove duplicated error message.
> - Use IS_ENABLED() macro to replace #ifdef.
> - Added Reviewed-by tags.
>
> v4:
> - Re-organized patch split
>   Now each BUG_ON() cleanup has its own patch
> - Dig much further into the call sites to eliminate unexpected >0 return
>   May be a little paranoid and abuse some ASSERT(), but it should be
>   much safer against further code change.
> - Fix the false alert caused by balance and memory pressure
>   The fix is to skip the owner check for non-essential trees at write time,
>   since the owner root can't always be reliable, either due to a commit root
>   created in the current transaction or balance + memory pressure.
>
> v5:
> - Do proper error-out handling rather than relying on flush_write_bio()
>   to clean up.
>   This has a side effect that no Reviewed-by tags for modified patches.
> - New comment for why we don't need to do anything about ebp->bio when
>   submit_one_bio() fails.
> - Add some Reviewed-by tags.
>
> v5.1:
> - Add "block=%llu " output for write/read time error line.
> - Also output read time error message for fsid/start/level check.
>
> Qu Wenruo (12):
>   btrfs: Always output error message when key/level verification fails
>   btrfs: extent_io: Kill the forward declaration of flush_write_bio()
>   btrfs: disk-io: Show the timing of corrupted tree block explicitly
>   btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
>   btrfs: extent_io: Handle error better in extent_write_full_page()
>   btrfs: extent_io: Handle error better in btree_write_cache_pages()
>   btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
>   btrfs: extent_io: Handle error better in extent_write_locked_range()
>   btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
>   btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
>   btrfs: extent_io: Handle error better in extent_writepages()
>   btrfs: Do mandatory tree block check before submitting bio
>
>  fs/btrfs/disk-io.c      |  23 ++++--
>  fs/btrfs/extent_io.c    | 168 ++++++++++++++++++++++++++++------------
>  fs/btrfs/tree-checker.c |  24 +++++-
>  fs/btrfs/tree-checker.h |   8 ++
>  4 files changed, 164 insertions(+), 59 deletions(-)
>
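
For readers who want to see the New/Old difference from the cover letter in
isolation, here is a minimal userspace sketch of the pattern. It is not the
kernel code: struct model_bio, struct model_epd, prepare_pages() and
write_range() are hypothetical stand-ins invented for this example, and the
bodies of flush_write_bio() and end_write_bio() only model the behaviour
described above (flush submits and lets endio release the bio, end releases
it with an error without submitting).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio; not the kernel's struct bio. */
struct model_bio {
	bool submitted;	/* was handed to the "device" */
	bool ended;	/* endio ran, so the bio counts as released */
	int status;	/* 0 on success, -errno when ended with an error */
};

/* Hypothetical stand-in for the epd that carries the pending bio. */
struct model_epd {
	struct model_bio *bio;	/* NULL when nothing is queued */
};

/* Model of the submit path: submit the bio, endio then releases it. */
static int flush_write_bio(struct model_epd *epd)
{
	if (epd->bio) {
		epd->bio->submitted = true;
		epd->bio->status = 0;
		epd->bio->ended = true;
		epd->bio = NULL;
	}
	return 0;
}

/* Model of the cleanup path: end the bio with an error, never submit it. */
static void end_write_bio(struct model_epd *epd, int ret)
{
	if (epd->bio) {
		epd->bio->status = ret;
		epd->bio->ended = true;
		epd->bio = NULL;
	}
}

/* Hypothetical step that queues a bio and may fail half way through. */
static int prepare_pages(struct model_epd *epd, struct model_bio *bio, bool fail)
{
	epd->bio = bio;
	return fail ? -EIO : 0;
}

/* The "New" column: clean up on error, submit only on success. */
static int write_range(struct model_bio *bio, bool fail)
{
	struct model_epd epd = { .bio = NULL };
	int ret;

	ret = prepare_pages(&epd, bio, fail);
	assert(ret <= 0);	/* no unexpected >0 return, as in the v4 note */
	if (ret < 0) {
		end_write_bio(&epd, ret);
		return ret;
	}
	return flush_write_bio(&epd);
}
```

In this model, a failing write_range() leaves the bio ended with -EIO and
never submitted, while a successful one submits it and returns the result of
flush_write_bio(). The "Old" column would have submitted the half-built bio
in both cases.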