On Friday 19 Jun 2015 12:45:37 Liu Bo wrote:
> On Mon, Jun 01, 2015 at 08:52:36PM +0530, Chandan Rajendra wrote:
> > For the subpagesize-blocksize scenario, a page can contain multiple
> > blocks. In such cases, this patch handles reading data from files.
> >
> > To track the status of individual blocks of a page, this patch makes use
> > of a bitmap pointed to by page->private.
>
> Start going through the patchset, it's not easy though.
>
> Several comments are following.
Thanks for the review comments, Liu.

> > +static int modify_page_blks_state(struct page *page,
> > +				unsigned long blk_states,
> > +				u64 start, u64 end, int set)
> > +{
> > +	struct inode *inode = page->mapping->host;
> > +	unsigned long *bitmap;
> > +	unsigned long state;
> > +	u64 nr_blks;
> > +	u64 blk;
> > +
> > +	BUG_ON(!PagePrivate(page));
> > +
> > +	bitmap = ((struct btrfs_page_private *)page->private)->bstate;
> > +
> > +	blk = (start & (PAGE_CACHE_SIZE - 1)) >> inode->i_blkbits;
> > +	nr_blks = (end - start + 1) >> inode->i_blkbits;
> > +
> > +	while (nr_blks--) {
> > +		state = find_next_bit(&blk_states, BLK_NR_STATE, 0);
>
> Looks like we don't need to do find_next_bit for every block.

Yes, I agree. The first find_next_bit() invocation is loop-invariant and can
be moved out of the outer loop.

> > +
> > +		while (state < BLK_NR_STATE) {
> > +			if (set)
> > +				set_bit((blk * BLK_NR_STATE) + state, bitmap);
> > +			else
> > +				clear_bit((blk * BLK_NR_STATE) + state, bitmap);
> > +
> > +			state = find_next_bit(&blk_states, BLK_NR_STATE,
> > +					state + 1);
> > +		}
> > +
> > +		++blk;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * after a readpage IO is done, we need to:
> >   * clear the uptodate bits on error
> > @@ -2548,14 +2628,16 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
> >  	struct bio_vec *bvec;
> >  	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> >  	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
> > +	struct extent_state *cached = NULL;
> > +	struct btrfs_page_private *pg_private;
> >  	struct extent_io_tree *tree;
> > +	unsigned long flags;
> >  	u64 offset = 0;
> >  	u64 start;
> >  	u64 end;
> > -	u64 len;
> > -	u64 extent_start = 0;
> > -	u64 extent_len = 0;
> > +	int nr_sectors;
> >  	int mirror;
> > +	int unlock;
> >  	int ret;
> >  	int i;
> >
> > @@ -2565,54 +2647,31 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
> >  	bio_for_each_segment_all(bvec, bio, i) {
> >  		struct page *page = bvec->bv_page;
> >  		struct inode *inode = page->mapping->host;
> > +		struct btrfs_root *root = BTRFS_I(inode)->root;
> >
> >  		pr_debug("end_bio_extent_readpage: bi_sector=%llu, err=%d, "
> >  			 "mirror=%u\n", (u64)bio->bi_iter.bi_sector, err,
> >  			 io_bio->mirror_num);
> >  		tree = &BTRFS_I(inode)->io_tree;
> >
> > -		/* We always issue full-page reads, but if some block
> > -		 * in a page fails to read, blk_update_request() will
> > -		 * advance bv_offset and adjust bv_len to compensate.
> > -		 * Print a warning for nonzero offsets, and an error
> > -		 * if they don't add up to a full page. */
> > -		if (bvec->bv_offset || bvec->bv_len != PAGE_CACHE_SIZE) {
> > -			if (bvec->bv_offset + bvec->bv_len != PAGE_CACHE_SIZE)
> > -				btrfs_err(BTRFS_I(page->mapping->host)->root->fs_info,
> > -					"partial page read in btrfs with offset %u and length %u",
> > -					bvec->bv_offset, bvec->bv_len);
> > -			else
> > -				btrfs_info(BTRFS_I(page->mapping->host)->root->fs_info,
> > -					"incomplete page read in btrfs with offset %u and "
> > -					"length %u",
> > -					bvec->bv_offset, bvec->bv_len);
> > -		}
> > -
> > -		start = page_offset(page);
> > -		end = start + bvec->bv_offset + bvec->bv_len - 1;
> > -		len = bvec->bv_len;
> > -
> > +		start = page_offset(page) + bvec->bv_offset;
> > +		end = start + bvec->bv_len - 1;
> > +		nr_sectors = bvec->bv_len >> inode->i_sb->s_blocksize_bits;
> >  		mirror = io_bio->mirror_num;
> > -		if (likely(uptodate && tree->ops &&
> > -			   tree->ops->readpage_end_io_hook)) {
> > +
> > +next_block:
> > +		if (likely(uptodate)) {
>
> Any reason of killing (tree->ops && tree->ops->readpage_end_io_hook)?

In the subpagesize-blocksize scenario, for extent buffers we need the
ability to read just a single extent buffer rather than the complete
contents of the page containing it. Similarly, in the corresponding endio
function we need to verify a single extent buffer rather than the contents
of the full page.
Hence I ended up removing the btree_readpage_end_io_hook() and
btree_io_failed_hook() functions and had the verification functions invoked
directly by the endio function. Since the data "read page" code was then the
only one left to have extent_io_tree->ops->readpage_end_io_hook set, I
removed the check for its existence. Now I realize that this is not the
right thing to do; I will restore the condition check to its original state.

> >  			ret = tree->ops->readpage_end_io_hook(io_bio, offset,
> > -							page, start, end,
> > -							mirror);
> > +							page, start,
> > +							start + root->sectorsize - 1,
> > +							mirror);
> >  			if (ret)
> >  				uptodate = 0;
> >  			else
> >  				clean_io_failure(inode, start, page, 0);
> >  		}
> >
> > -		if (likely(uptodate))
> > -			goto readpage_ok;
> > -
> > -		if (tree->ops && tree->ops->readpage_io_failed_hook) {
> > -			ret = tree->ops->readpage_io_failed_hook(page, mirror);
> > -			if (!ret && !err &&
> > -			    test_bit(BIO_UPTODATE, &bio->bi_flags))
> > -				uptodate = 1;
> > -		} else {
> > +		if (!uptodate) {
> >  			/*
> >  			 * The generic bio_readpage_error handles errors the
> >  			 * following way: If possible, new read requests are
> > @@ -2623,61 +2682,63 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
> >  			 * can't handle the error it will return -EIO and we
> >  			 * remain responsible for that page.
> >  			 */
> > -			ret = bio_readpage_error(bio, offset, page, start, end,
> > -						 mirror);
> > +			ret = bio_readpage_error(bio, offset, page,
> > +						 start, start + root->sectorsize - 1,
> > +						 mirror);
> >  			if (ret == 0) {
> > -				uptodate =
> > -					test_bit(BIO_UPTODATE, &bio->bi_flags);
> > +				uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> >  				if (err)
> >  					uptodate = 0;
> > -				offset += len;
> > -				continue;
> > +				offset += root->sectorsize;
> > +				if (--nr_sectors) {
> > +					start += root->sectorsize;
> > +					goto next_block;
> > +				} else {
> > +					continue;
> > +				}
> >  			}
> >  		}
> >
> > -readpage_ok:
> > -		if (likely(uptodate)) {
> > -			loff_t i_size = i_size_read(inode);
> > -			pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
> > -			unsigned off;
> > -
> > -			/* Zero out the end if this page straddles i_size */
> > -			off = i_size & (PAGE_CACHE_SIZE-1);
> > -			if (page->index == end_index && off)
> > -				zero_user_segment(page, off, PAGE_CACHE_SIZE);
> > -			SetPageUptodate(page);
> > +
> > +		if (uptodate) {
> > +			set_page_blks_state(page, 1 << BLK_STATE_UPTODATE, start,
> > +					start + root->sectorsize - 1);
> > +			check_page_uptodate(page);
> >  		} else {
> >  			ClearPageUptodate(page);
> >  			SetPageError(page);
> >  		}
> >
> > -		unlock_page(page);
> > -		offset += len;
> > -
> > -		if (unlikely(!uptodate)) {
> > -			if (extent_len) {
> > -				endio_readpage_release_extent(tree,
> > -							extent_start,
> > -							extent_len, 1);
> > -				extent_start = 0;
> > -				extent_len = 0;
> > -			}
> > -			endio_readpage_release_extent(tree, start,
> > -						end - start + 1, 0);
> > -		} else if (!extent_len) {
> > -			extent_start = start;
> > -			extent_len = end + 1 - start;
> > -		} else if (extent_start + extent_len == start) {
> > -			extent_len += end + 1 - start;
> > -		} else {
> > -			endio_readpage_release_extent(tree, extent_start,
> > -						extent_len, uptodate);
> > -			extent_start = start;
> > -			extent_len = end + 1 - start;
> > +
> > +		offset += root->sectorsize;
> > +
> > +		if (--nr_sectors) {
> > +			clear_page_blks_state(page, 1 << BLK_STATE_IO,
> > +					start, start + root->sectorsize - 1);
>
> private->io_lock is not acquired here but it is below.
>
> IIUC, this can be protected by EXTENT_LOCKED.

private->io_lock plays the same role as BH_Uptodate_Lock (see
end_buffer_async_read()), i.e. without the io_lock we may end up in the
following situation.

NOTE: Assume a 64k page size and a 4k block size. Also assume that the first
12 blocks of the page are contiguous while the next 4 blocks are contiguous.
When reading the page we end up submitting two "logical address space" bios,
so end_bio_extent_readpage() is invoked twice (once for each bio).

|-------------------------+-------------------------+-----------------|
| Task A                  | Task B                  | Task C          |
|-------------------------+-------------------------+-----------------|
| end_bio_extent_readpage |                         |                 |
| process block 0         |                         |                 |
| - clear BLK_STATE_IO    |                         |                 |
| - page_read_complete    |                         |                 |
| process block 1         |                         |                 |
| ...                     |                         |                 |
| ...                     |                         |                 |
| ...                     | end_bio_extent_readpage |                 |
| ...                     | process block 0         |                 |
| ...                     | - clear BLK_STATE_IO    |                 |
| ...                     | - page_read_complete    |                 |
| ...                     | process block 1         |                 |
| ...                     | ...                     |                 |
| process block 11        | process block 3         |                 |
| - clear BLK_STATE_IO    | - clear BLK_STATE_IO    |                 |
| - page_read_complete    | - page_read_complete    |                 |
| - returns true          | - returns true          |                 |
| - unlock_page()         |                         |                 |
|                         |                         | lock_page()     |
|                         | - unlock_page()         |                 |
|-------------------------+-------------------------+-----------------|

So we end up incorrectly unlocking the page twice, and "Task C" ends up
working on an unlocked page. Hence private->io_lock makes sure that only one
of the tasks gets "true" as the return value when page_read_complete() is
invoked. As an optimization, the patch takes the io_lock only when the
nr_sectors counter reaches 0 (i.e. when the last block of the bio_vec is
being processed).

Please let me know if my analysis is incorrect.
Also, I noticed that page_read_complete() and page_write_complete() can be
replaced by a single function, page_io_complete().

-- 
chandan