Re: [PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines
Dave Hansen wrote: > On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote: > > From: "Kirill A. Shutemov" > > > > For now we still write/read at most PAGE_CACHE_SIZE bytes a time. > > > > This implementation doesn't cover address spaces with backing store. > ... > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -1165,12 +1165,23 @@ find_page: > > if (unlikely(page == NULL)) > > goto no_cached_page; > > } > > + if (PageTransTail(page)) { > > + page_cache_release(page); > > + page = find_get_page(mapping, > > + index & ~HPAGE_CACHE_INDEX_MASK); > > + if (!PageTransHuge(page)) { > > + page_cache_release(page); > > + goto find_page; > > + } > > + } > > So, we're going to do a read of a file, and we pulled a tail page out of > the page cache. Why can't we just deal with the tail page directly? Good point. I'll redo it once again. First I thought to make it possible to read/write more PAGE_SIZE at once. If not take this option in account for now, it's possible to make code much cleaner. -- Kirill A. Shutemov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines
On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote: > From: "Kirill A. Shutemov" > > For now we still write/read at most PAGE_CACHE_SIZE bytes a time. > > This implementation doesn't cover address spaces with backing store. ... > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -1165,12 +1165,23 @@ find_page: > if (unlikely(page == NULL)) > goto no_cached_page; > } > + if (PageTransTail(page)) { > + page_cache_release(page); > + page = find_get_page(mapping, > + index & ~HPAGE_CACHE_INDEX_MASK); > + if (!PageTransHuge(page)) { > + page_cache_release(page); > + goto find_page; > + } > + } So, we're going to do a read of a file, and we pulled a tail page out of the page cache. Why can't we just deal with the tail page directly? What prevents this? Is there something special about THP pages that keeps the head page in the page cache after the tail has been released? I'd normally be worried that the find_get_page() here might fail. It's probably also worth a quick comment like: /* can't deal with tail pages directly, move to head page */ otherwise the reassignment of "page" starts to seem a bit odd. > if (PageReadahead(page)) { > + BUG_ON(PageTransHuge(page)); > page_cache_async_readahead(mapping, > ra, filp, page, > index, last_index - index); > } Is this because we only do readahead for fs's with backing stores? Could we have a comment to this effect? > if (!PageUptodate(page)) { > + BUG_ON(PageTransHuge(page)); > if (inode->i_blkbits == PAGE_CACHE_SHIFT || > !mapping->a_ops->is_partially_uptodate) > goto page_not_up_to_date; Same question. :) Since your two-line description covers two topics, it's not immediately obvious which one this BUG_ON() applies to. > @@ -1212,18 +1223,25 @@ page_ok: > } > nr = nr - offset; > > + /* Recalculate offset in page if we've got a huge page */ > + if (PageTransHuge(page)) { > + offset = (((loff_t)index << PAGE_CACHE_SHIFT) + offset); > + offset &= ~HPAGE_PMD_MASK; > + } Does this need to be done in cases other than the path that goes through "if(PageTransTail(page))" above? If not, I'd probably stick this code up with the other part. > /* If users can be writing to this page using arbitrary >* virtual addresses, take care about potential aliasing >* before reading the page on the kernel side. >*/ > if (mapping_writably_mapped(mapping)) > - flush_dcache_page(page); > + flush_dcache_page(page + (offset >> PAGE_CACHE_SHIFT)); This is another case where I think adding another local variable would essentially help the code self-document. The way it stands, it's fairly subtle how (offset>>PAGE_CACHE_SHIFT) works and that it's conditional on THP being enabled. int tail_page_index = (offset >> PAGE_CACHE_SHIFT) ... > + flush_dcache_page(page + tail_page_index); This makes it obvious that we're indexing off something, *and* that it's only going to be relevant when dealing with tail pages. > /* >* When a sequential read accesses a page several times, >* only mark it as accessed the first time. >*/ > - if (prev_index != index || offset != prev_offset) > + if (prev_index != index || > + (offset & ~PAGE_CACHE_MASK) != prev_offset) > mark_page_accessed(page); > prev_index = index; > > @@ -1238,8 +1256,9 @@ page_ok: >* "pos" here (the actor routine has to update the user buffer >* pointers and the remaining count). >*/ > - ret = file_read_actor(desc, page, offset, nr); > - offset += ret; > + ret = file_read_actor(desc, page + (offset >> PAGE_CACHE_SHIFT), > + offset & ~PAGE_CACHE_MASK, nr); > + offset = (offset & ~PAGE_CACHE_MASK) + ret; ^^ There's an extra space in that last line. > index += offset >> PAGE_CACHE_SHIFT; > offset &= ~PAGE_CACHE_MASK; > prev_offset = offset; > @@ -2440,8 +2459,13 @@ again: > if (mapping_writably_mapped(mapping)) > flush_dcache_page(page); > > + if (PageTransHuge(page)) > +
Re: [PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines
Hillf Danton wrote: > On Fri, Mar 15, 2013 at 1:50 AM, Kirill A. Shutemov > wrote: > > + if (PageTransTail(page)) { > > + page_cache_release(page); > > + page = find_get_page(mapping, > > + index & ~HPAGE_CACHE_INDEX_MASK); > check page is valid, please. Good catch. Fixed. > > + if (!PageTransHuge(page)) { > > + page_cache_release(page); > > + goto find_page; > > + } > > + } -- Kirill A. Shutemov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines
On Fri, Mar 15, 2013 at 1:50 AM, Kirill A. Shutemov wrote: > + if (PageTransTail(page)) { > + page_cache_release(page); > + page = find_get_page(mapping, > + index & ~HPAGE_CACHE_INDEX_MASK); check page is valid, please. > + if (!PageTransHuge(page)) { > + page_cache_release(page); > + goto find_page; > + } > + } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines
From: "Kirill A. Shutemov" For now we still write/read at most PAGE_CACHE_SIZE bytes a time. This implementation doesn't cover address spaces with backing store. Signed-off-by: Kirill A. Shutemov --- mm/filemap.c | 35 ++- 1 file changed, 30 insertions(+), 5 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index bdedb1b..79ba9cd 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1165,12 +1165,23 @@ find_page: if (unlikely(page == NULL)) goto no_cached_page; } + if (PageTransTail(page)) { + page_cache_release(page); + page = find_get_page(mapping, + index & ~HPAGE_CACHE_INDEX_MASK); + if (!PageTransHuge(page)) { + page_cache_release(page); + goto find_page; + } + } if (PageReadahead(page)) { + BUG_ON(PageTransHuge(page)); page_cache_async_readahead(mapping, ra, filp, page, index, last_index - index); } if (!PageUptodate(page)) { + BUG_ON(PageTransHuge(page)); if (inode->i_blkbits == PAGE_CACHE_SHIFT || !mapping->a_ops->is_partially_uptodate) goto page_not_up_to_date; @@ -1212,18 +1223,25 @@ page_ok: } nr = nr - offset; + /* Recalculate offset in page if we've got a huge page */ + if (PageTransHuge(page)) { + offset = (((loff_t)index << PAGE_CACHE_SHIFT) + offset); + offset &= ~HPAGE_PMD_MASK; + } + /* If users can be writing to this page using arbitrary * virtual addresses, take care about potential aliasing * before reading the page on the kernel side. */ if (mapping_writably_mapped(mapping)) - flush_dcache_page(page); + flush_dcache_page(page + (offset >> PAGE_CACHE_SHIFT)); /* * When a sequential read accesses a page several times, * only mark it as accessed the first time. */ - if (prev_index != index || offset != prev_offset) + if (prev_index != index || + (offset & ~PAGE_CACHE_MASK) != prev_offset) mark_page_accessed(page); prev_index = index; @@ -1238,8 +1256,9 @@ page_ok: * "pos" here (the actor routine has to update the user buffer * pointers and the remaining count). */ - ret = file_read_actor(desc, page, offset, nr); - offset += ret; + ret = file_read_actor(desc, page + (offset >> PAGE_CACHE_SHIFT), + offset & ~PAGE_CACHE_MASK, nr); + offset = (offset & ~PAGE_CACHE_MASK) + ret; index += offset >> PAGE_CACHE_SHIFT; offset &= ~PAGE_CACHE_MASK; prev_offset = offset; @@ -2440,8 +2459,13 @@ again: if (mapping_writably_mapped(mapping)) flush_dcache_page(page); + if (PageTransHuge(page)) + offset = pos & ~HPAGE_PMD_MASK; + pagefault_disable(); - copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes); + copied = iov_iter_copy_from_user_atomic( + page + (offset >> PAGE_CACHE_SHIFT), + i, offset & ~PAGE_CACHE_MASK, bytes); pagefault_enable(); flush_dcache_page(page); @@ -2464,6 +2488,7 @@ again: * because not all segments in the iov can be copied at * once without a pagefault. */ + offset = pos & ~PAGE_CACHE_MASK; bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset, iov_iter_single_seg_count(i)); goto again; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/