Re: [PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines

2013-03-28 Thread Kirill A. Shutemov
Dave Hansen wrote:
> On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
> > From: "Kirill A. Shutemov" 
> > 
> > For now we still write/read at most PAGE_CACHE_SIZE bytes at a time.
> > 
> > This implementation doesn't cover address spaces with backing store.
> ...
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1165,12 +1165,23 @@ find_page:
> > if (unlikely(page == NULL))
> > goto no_cached_page;
> > }
> > +   if (PageTransTail(page)) {
> > +   page_cache_release(page);
> > +   page = find_get_page(mapping,
> > +   index & ~HPAGE_CACHE_INDEX_MASK);
> > +   if (!PageTransHuge(page)) {
> > +   page_cache_release(page);
> > +   goto find_page;
> > +   }
> > +   }
> 
> So, we're going to do a read of a file, and we pulled a tail page out of
> the page cache.  Why can't we just deal with the tail page directly?

Good point. I'll redo it once again.

First I thought to make it possible to read/write more than PAGE_SIZE at once.
If we don't take that option into account for now, the code can be made much
cleaner.

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines

2013-03-22 Thread Dave Hansen
On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" 
> 
> For now we still write/read at most PAGE_CACHE_SIZE bytes at a time.
> 
> This implementation doesn't cover address spaces with backing store.
...
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1165,12 +1165,23 @@ find_page:
>   if (unlikely(page == NULL))
>   goto no_cached_page;
>   }
> + if (PageTransTail(page)) {
> + page_cache_release(page);
> + page = find_get_page(mapping,
> + index & ~HPAGE_CACHE_INDEX_MASK);
> + if (!PageTransHuge(page)) {
> + page_cache_release(page);
> + goto find_page;
> + }
> + }

So, we're going to do a read of a file, and we pulled a tail page out of
the page cache.  Why can't we just deal with the tail page directly?
What prevents this?

Is there something special about THP pages that keeps the head page in
the page cache after the tail has been released?  I'd normally be
worried that the find_get_page() here might fail.

It's probably also worth a quick comment like:

/* can't deal with tail pages directly, move to head page */

otherwise the reassignment of "page" starts to seem a bit odd.

>   if (PageReadahead(page)) {
> + BUG_ON(PageTransHuge(page));
>   page_cache_async_readahead(mapping,
>   ra, filp, page,
>   index, last_index - index);
>   }

Is this because we only do readahead for fs's with backing stores?
Could we have a comment to this effect?

>   if (!PageUptodate(page)) {
> + BUG_ON(PageTransHuge(page));
>   if (inode->i_blkbits == PAGE_CACHE_SHIFT ||
>   !mapping->a_ops->is_partially_uptodate)
>   goto page_not_up_to_date;

Same question. :)

Since your two-line description covers two topics, it's not immediately
obvious which one this BUG_ON() applies to.

> @@ -1212,18 +1223,25 @@ page_ok:
>   }
>   nr = nr - offset;
>  
> + /* Recalculate offset in page if we've got a huge page */
> + if (PageTransHuge(page)) {
> + offset = (((loff_t)index << PAGE_CACHE_SHIFT) + offset);
> + offset &= ~HPAGE_PMD_MASK;
> + }

Does this need to be done in cases other than the path that goes through
"if(PageTransTail(page))" above?  If not, I'd probably stick this code
up with the other part.

>   /* If users can be writing to this page using arbitrary
>* virtual addresses, take care about potential aliasing
>* before reading the page on the kernel side.
>*/
>   if (mapping_writably_mapped(mapping))
> - flush_dcache_page(page);
> + flush_dcache_page(page + (offset >> PAGE_CACHE_SHIFT));

This is another case where I think adding another local variable would
essentially help the code self-document.  The way it stands, it's fairly
subtle how (offset>>PAGE_CACHE_SHIFT) works and that it's conditional on
THP being enabled.

int tail_page_index = (offset >> PAGE_CACHE_SHIFT)
(with a terminating semicolon, of course)
...
> + flush_dcache_page(page + tail_page_index);

This makes it obvious that we're indexing off something, *and* that it's
only going to be relevant when dealing with tail pages.

>   /*
>* When a sequential read accesses a page several times,
>* only mark it as accessed the first time.
>*/
> - if (prev_index != index || offset != prev_offset)
> + if (prev_index != index ||
> + (offset & ~PAGE_CACHE_MASK) != prev_offset)
>   mark_page_accessed(page);
>   prev_index = index;
>  
> @@ -1238,8 +1256,9 @@ page_ok:
>* "pos" here (the actor routine has to update the user buffer
>* pointers and the remaining count).
>*/
> - ret = file_read_actor(desc, page, offset, nr);
> - offset += ret;
> + ret = file_read_actor(desc, page + (offset >> PAGE_CACHE_SHIFT),
> + offset & ~PAGE_CACHE_MASK, nr);
> + offset =  (offset & ~PAGE_CACHE_MASK) + ret;

^^ There's an extra space in that last line.

>   index += offset >> PAGE_CACHE_SHIFT;
>   offset &= ~PAGE_CACHE_MASK;
>   prev_offset = offset;
> @@ -2440,8 +2459,13 @@ again:
>   if (mapping_writably_mapped(mapping))
>   flush_dcache_page(page);
>  
> + if (PageTransHuge(page))
> + offset = pos & ~HPAGE_PMD_MASK;

Re: [PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines

2013-03-15 Thread Kirill A. Shutemov
Hillf Danton wrote:
> On Fri, Mar 15, 2013 at 1:50 AM, Kirill A. Shutemov
>  wrote:
> > +   if (PageTransTail(page)) {
> > +   page_cache_release(page);
> > +   page = find_get_page(mapping,
> > +   index & ~HPAGE_CACHE_INDEX_MASK);
> check page is valid, please.

Good catch. Fixed.

> > +   if (!PageTransHuge(page)) {
> > +   page_cache_release(page);
> > +   goto find_page;
> > +   }
> > +   }

-- 
 Kirill A. Shutemov


Re: [PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines

2013-03-14 Thread Hillf Danton
On Fri, Mar 15, 2013 at 1:50 AM, Kirill A. Shutemov
 wrote:
> +   if (PageTransTail(page)) {
> +   page_cache_release(page);
> +   page = find_get_page(mapping,
> +   index & ~HPAGE_CACHE_INDEX_MASK);
check page is valid, please.

> +   if (!PageTransHuge(page)) {
> +   page_cache_release(page);
> +   goto find_page;
> +   }
> +   }


[PATCHv2, RFC 14/30] thp, mm: naive support of thp in generic read/write routines

2013-03-14 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

For now we still write/read at most PAGE_CACHE_SIZE bytes at a time.

This implementation doesn't cover address spaces with backing store.

Signed-off-by: Kirill A. Shutemov 
---
 mm/filemap.c |   35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index bdedb1b..79ba9cd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1165,12 +1165,23 @@ find_page:
if (unlikely(page == NULL))
goto no_cached_page;
}
+   if (PageTransTail(page)) {
+   page_cache_release(page);
+   page = find_get_page(mapping,
+   index & ~HPAGE_CACHE_INDEX_MASK);
+   if (!PageTransHuge(page)) {
+   page_cache_release(page);
+   goto find_page;
+   }
+   }
if (PageReadahead(page)) {
+   BUG_ON(PageTransHuge(page));
page_cache_async_readahead(mapping,
ra, filp, page,
index, last_index - index);
}
if (!PageUptodate(page)) {
+   BUG_ON(PageTransHuge(page));
if (inode->i_blkbits == PAGE_CACHE_SHIFT ||
!mapping->a_ops->is_partially_uptodate)
goto page_not_up_to_date;
@@ -1212,18 +1223,25 @@ page_ok:
}
nr = nr - offset;
 
+   /* Recalculate offset in page if we've got a huge page */
+   if (PageTransHuge(page)) {
+   offset = (((loff_t)index << PAGE_CACHE_SHIFT) + offset);
+   offset &= ~HPAGE_PMD_MASK;
+   }
+
/* If users can be writing to this page using arbitrary
 * virtual addresses, take care about potential aliasing
 * before reading the page on the kernel side.
 */
if (mapping_writably_mapped(mapping))
-   flush_dcache_page(page);
+   flush_dcache_page(page + (offset >> PAGE_CACHE_SHIFT));
 
/*
 * When a sequential read accesses a page several times,
 * only mark it as accessed the first time.
 */
-   if (prev_index != index || offset != prev_offset)
+   if (prev_index != index ||
+   (offset & ~PAGE_CACHE_MASK) != prev_offset)
mark_page_accessed(page);
prev_index = index;
 
@@ -1238,8 +1256,9 @@ page_ok:
 * "pos" here (the actor routine has to update the user buffer
 * pointers and the remaining count).
 */
-   ret = file_read_actor(desc, page, offset, nr);
-   offset += ret;
+   ret = file_read_actor(desc, page + (offset >> PAGE_CACHE_SHIFT),
+   offset & ~PAGE_CACHE_MASK, nr);
+   offset =  (offset & ~PAGE_CACHE_MASK) + ret;
index += offset >> PAGE_CACHE_SHIFT;
offset &= ~PAGE_CACHE_MASK;
prev_offset = offset;
@@ -2440,8 +2459,13 @@ again:
if (mapping_writably_mapped(mapping))
flush_dcache_page(page);
 
+   if (PageTransHuge(page))
+   offset = pos & ~HPAGE_PMD_MASK;
+
pagefault_disable();
-   copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
+   copied = iov_iter_copy_from_user_atomic(
+   page + (offset >> PAGE_CACHE_SHIFT),
+   i, offset & ~PAGE_CACHE_MASK, bytes);
pagefault_enable();
flush_dcache_page(page);
 
@@ -2464,6 +2488,7 @@ again:
 * because not all segments in the iov can be copied at
 * once without a pagefault.
 */
+   offset = pos & ~PAGE_CACHE_MASK;
bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
iov_iter_single_seg_count(i));
goto again;
-- 
1.7.10.4
