On 04/24/2012 01:33 AM, Josef Bacik wrote: > We can deadlock waiting for pages to end writeback because we are doing an > allocation while hold a tree lock since the ordered extent stuff will > require tree locks. A quick easy way to fix this is to end page writeback > before we do our ordered io stuff, which works fine since we don't really > need the page for this to work. Eventually we want to make this work happen > as soon as the io is completed and then push the ordered extent stuff off to > a worker thread, but at this stage we need this deadlock fixed with changing > as little as possible. Thanks, >
Hi Josef, I'm ok with the patch, but could you show us more details about the deadlock between allocation and endio? thanks, liubo > Signed-off-by: Josef Bacik <jo...@redhat.com> > --- > fs/btrfs/compression.c | 3 +-- > fs/btrfs/extent_io.c | 27 +++++++++++---------------- > fs/btrfs/ordered-data.c | 4 +++- > 3 files changed, 15 insertions(+), 19 deletions(-) > > diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c > index d11afa67..6a456d4 100644 > --- a/fs/btrfs/compression.c > +++ b/fs/btrfs/compression.c > @@ -286,13 +286,12 @@ static void end_compressed_bio_write(struct bio *bio, > int err) > inode = cb->inode; > tree = &BTRFS_I(inode)->io_tree; > cb->compressed_pages[0]->mapping = cb->inode->i_mapping; > + end_compressed_writeback(inode, cb->start, cb->len); > tree->ops->writepage_end_io_hook(cb->compressed_pages[0], > cb->start, > cb->start + cb->len - 1, > NULL, 1); > cb->compressed_pages[0]->mapping = NULL; > - > - end_compressed_writeback(inode, cb->start, cb->len); > /* note, our inode could be gone now */ > > /* > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 7c501d3..308c1c2 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -1806,16 +1806,6 @@ static void check_page_locked(struct extent_io_tree > *tree, struct page *page) > } > > /* > - * helper function to end page writeback if all the extents > - * in the tree for that page are done with writeback > - */ > -static void check_page_writeback(struct extent_io_tree *tree, > - struct page *page) > -{ > - end_page_writeback(page); > -} > - > -/* > * When IO fails, either with EIO or csum verification fails, we > * try other mirrors that might have a good copy of the data. This > * io_failure_record is used to record state as we go through all the > @@ -2206,6 +2196,13 @@ int end_extent_writepage(struct page *page, int err, > u64 start, u64 end) > struct extent_io_tree *tree; > int ret; > > + /* > + * Don't want the page disappearing out from under us after we clear > + * writeback. > + */ > + page_cache_get(page); > + end_page_writeback(page); > + > tree = &BTRFS_I(page->mapping->host)->io_tree; > > if (tree->ops && tree->ops->writepage_end_io_hook) { > @@ -2220,8 +2217,10 @@ int end_extent_writepage(struct page *page, int err, > u64 start, u64 end) > ret = tree->ops->writepage_io_failed_hook(NULL, page, > start, end, NULL); > /* Writeback already completed */ > - if (ret == 0) > + if (ret == 0) { > + page_cache_release(page); > return 1; > + } > } > > if (!uptodate) { > @@ -2229,6 +2228,7 @@ int end_extent_writepage(struct page *page, int err, > u64 start, u64 end) > ClearPageUptodate(page); > SetPageError(page); > } > + page_cache_release(page); > return 0; > } > > @@ -2267,11 +2267,6 @@ static void end_bio_extent_writepage(struct bio *bio, > int err) > > if (end_extent_writepage(page, err, start, end)) > continue; > - > - if (whole_page) > - end_page_writeback(page); > - else > - check_page_writeback(tree, page); > } while (bvec >= bio->bi_io_vec); > > bio_put(bio); > diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c > index bbf6d0d..acfb360 100644 > --- a/fs/btrfs/ordered-data.c > +++ b/fs/btrfs/ordered-data.c > @@ -196,7 +196,7 @@ static int __btrfs_add_ordered_extent(struct inode > *inode, u64 file_offset, > entry->len = len; > entry->disk_len = disk_len; > entry->bytes_left = len; > - entry->inode = inode; > + entry->inode = igrab(inode); > entry->compress_type = compress_type; > if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE) > set_bit(type, &entry->flags); > @@ -399,6 +399,8 @@ void btrfs_put_ordered_extent(struct btrfs_ordered_extent > *entry) > trace_btrfs_ordered_extent_put(entry->inode, entry); > > if (atomic_dec_and_test(&entry->refs)) { > + if (entry->inode) > + btrfs_add_delayed_iput(entry->inode); > while (!list_empty(&entry->list)) { > cur = entry->list.next; > sum = list_entry(cur, struct btrfs_ordered_sum, list); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html