Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-03-13 Thread Ning Qu
Hi, Andrew,

I updated the results for tmpfs last week, and Hugh already acked this
patch set on Mar 4 in "[PATCH 1/1] mm: implement ->map_pages for
shmem/tmpfs":

Acked-by: Hugh Dickins 

Please consider applying this patch so that tmpfs gets the same
fault-around feature as the other file systems. Thanks a lot!
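
For reference, the change itself is tiny. A minimal sketch of what it amounts
to in mm/shmem.c (assuming Kirill's generic filemap_map_pages() from the
fault-around series is already merged; the surrounding fields here are
illustrative, not the exact hunk):

static const struct vm_operations_struct shmem_vm_ops = {
        .fault          = shmem_fault,
        .map_pages      = filemap_map_pages,    /* map extra cached pages around the fault */
#ifdef CONFIG_NUMA
        .set_policy     = shmem_set_policy,
        .get_policy     = shmem_get_policy,
#endif
};

With that hook in place, shmem/tmpfs faults go through the same fault-around
path as the other page-cache backed file systems.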

Best wishes,
-- 
Ning Qu (曲宁) | Software Engineer | qun...@google.com | +1-408-418-6066


Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-03-03 Thread Kirill A. Shutemov
On Mon, Mar 03, 2014 at 03:37:07PM -0800, Andrew Morton wrote:
> On Mon, 3 Mar 2014 15:29:00 -0800 Linus Torvalds 
>  wrote:
> 
> > On Mon, Mar 3, 2014 at 2:38 PM, Andrew Morton  
> > wrote:
> > >
> > > When the file is uncached, results are peculiar:
> > >
> > > 0.00user 2.84system 0:50.90elapsed 5%CPU (0avgtext+0avgdata 
> > > 4198096maxresident)k
> > > 0inputs+0outputs (1major+49666minor)pagefaults 0swaps
> > >
> > > That's approximately 3x more minor faults.
> > 
> > This is not peculiar.
> > 
> > When the file is uncached, some pages will obviously be under IO due
> > to readahead etc. And the fault-around code very much on purpose will
> > *not* try to wait for those pages, so any busy pages will just simply
> > not be faulted-around.
> 
> Of course.
> 
> > So you should still have fewer minor faults than faulting on *every*
> > page (ie the non-fault-around case), but I would very much expect that
> > fault-around will not see the full "one sixteenth" reduction in minor
> > faults.
> > 
> > And the order of IO will not matter, since the read-ahead is
> > asynchronous wrt the page-faults.
> 
> When a pagefault hits a locked, not-uptodate page it is going to block.
> Once it wakes up we'd *like* to find lots of now-uptodate pages in
> that page's vicinity.  Obviously, that is happening, but not to the
> fullest possible extent.  We _could_ still achieve the 16x if readahead
> was cooperating in an ideal fashion.
> 
> I don't know what's going on in there to produce this consistent 3x
> factor.

In my VM the numbers are different (faulting in 1G):

cold cache: 2097352inputs+0outputs (2major+25048minor)pagefaults 0swaps
hot cache: 0inputs+0outputs (0major+16450minor)pagefaults 0swaps

~1.5x more page faults with a cold cache compared to a hot cache.

BTW, moving do_fault_around() below __do_fault() doesn't make it much better:

cold cache: 2097200inputs+0outputs (1major+24641minor)pagefaults 0swaps

-- 
 Kirill A. Shutemov


Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-03-03 Thread Linus Torvalds
On Mon, Mar 3, 2014 at 2:38 PM, Andrew Morton  wrote:
>
> When the file is uncached, results are peculiar:
>
> 0.00user 2.84system 0:50.90elapsed 5%CPU (0avgtext+0avgdata 
> 4198096maxresident)k
> 0inputs+0outputs (1major+49666minor)pagefaults 0swaps
>
> That's approximately 3x more minor faults.

This is not peculiar.

When the file is uncached, some pages will obviously be under IO due
to readahead etc. And the fault-around code very much on purpose will
*not* try to wait for those pages, so any busy pages will just simply
not be faulted-around.

So you should still have fewer minor faults than faulting on *every*
page (ie the non-fault-around case), but I would very much expect that
fault-around will not see the full "one sixteenth" reduction in minor
faults.

And the order of IO will not matter, since the read-ahead is
asynchronous wrt the page-faults.
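
As a rough sketch of that behaviour (schematic only, not the exact
filemap_map_pages() loop from the series): for each page-cache slot in the
window around the faulting address, a page is only mapped if it can be taken
without blocking.

        /* schematic fault-around check; 'page' came from the page cache lookup */
        if (!page || !PageUptodate(page))
                continue;       /* still under readahead I/O: skip, don't wait */
        if (!trylock_page(page))
                continue;       /* locked by someone else: skip, don't wait */
        /* ... otherwise install a pte for this page, then unlock it ... */

Anything skipped here simply takes its own minor fault later, which is where
the extra cold-cache faults come from.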

 Linus


Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-03-03 Thread Ning Qu
Actually this one just uses the generic functions provided in Kirill's patch.

And Kirill just sent another email earlier with two fixes reported by Hugh
and me.

So it doesn't change anything in this patch. I will update the log and work
on the experiment results. I have already asked Kirill for his command line,
so I can do an apples-to-apples comparison, and I will update the patch with
new results soon.

Sorry about the mess-up previously. I should have asked Kirill about the
test in the first place.


Best wishes,
-- 
Ning Qu


On Mon, Mar 3, 2014 at 2:38 PM, Andrew Morton  wrote:
> On Fri, 28 Feb 2014 22:27:04 -0800 Ning Qu  wrote:
>
>> On Fri, Feb 28, 2014 at 10:10 PM, Ning Qu  wrote:
>> > Yes, I am using the iozone -i 0 -i 1. Let me try the most simple test
>> > as you mentioned.
>> > Best wishes,
>> > --
>> > Ning Qu
>> >
>> >
>> > On Fri, Feb 28, 2014 at 5:41 PM, Andrew Morton
>> >  wrote:
>> >> On Fri, 28 Feb 2014 16:35:16 -0800 Ning Qu  wrote:
>> >>
>> >>
>> >> int main(int argc, char *argv[])
>> >> {
>> >> char *p;
>> >> int fd;
>> >> unsigned long idx;
>> >> int sum = 0;
>> >>
>> >> fd = open("foo", O_RDONLY);
>> >> if (fd < 0) {
>> >> perror("open");
>> >> exit(1);
>> >> }
>> >> p = mmap(NULL, 1 * G, PROT_READ, MAP_PRIVATE, fd, 0);
>> >> if (p == MAP_FAILED) {
>> >> perror("mmap");
>> >> exit(1);
>> >> }
>> >>
>> >> for (idx = 0; idx < 1 * G; idx += 4096)
>> >> sum += p[idx];
>> >> printf("%d\n", sum);
>> >> exit(0);
>> >> }
>> >>
>> >> z:/home/akpm> /usr/bin/time ./a.out
>> >> 0
>> >> 0.05user 0.33system 0:00.38elapsed 99%CPU (0avgtext+0avgdata 
>> >> 4195856maxresident)k
>> >> 0inputs+0outputs (0major+262264minor)pagefaults 0swaps
>> >>
>> >> z:/home/akpm> dc
>> >> 16o
>> >> 262264 4 * p
>> >> 1001E0
>> >>
>> >> That's close!
>
> OK, I'm repairing your top-posting here.  It makes it unnecessarily
> hard to conduct a conversation - please just don't do it.
>
>> Yes, the simple test does verify that the page fault numbers are
>> correct with the patch. So my previous results are from those command
>> lines, which also show some performance improvement with this change
>> in tmpfs.
>>
>> sequential access
>> /usr/bin/time -a ./iozone -B s 8g -i 0 -i 1
>>
>> random access
>> /usr/bin/time -a ./iozone -B s 8g -i 0 -i 2
>
> I don't understand your point here.
>
> Running my simple test app with and without Kirill's
> mm-introduce-vm_ops-map_pages and
> mm-implement-map_pages-for-page-cache, minor faults are reduced 16x
> when the file is cached, as expected:
>
> 0.02user 0.22system 0:00.24elapsed 97%CPU (0avgtext+0avgdata 
> 4198080maxresident)k
> 0inputs+0outputs (0major+16433minor)pagefaults 0swaps
>
>
> When the file is uncached, results are peculiar:
>
> 0.00user 2.84system 0:50.90elapsed 5%CPU (0avgtext+0avgdata 
> 4198096maxresident)k
> 0inputs+0outputs (1major+49666minor)pagefaults 0swaps
>
> That's approximately 3x more minor faults.  I thought it might be due
> to the fact that userspace pagefaults and disk IO completions are both
> working in the same order through the same pages, so the pagefaults
> keep stumbling across not-yet-completed pages.  So I attempted to
> complete the pages in reverse order:
>
> --- a/fs/mpage.c~a
> +++ a/fs/mpage.c
> @@ -41,12 +41,16 @@
>   * status of that page is hard.  See end_buffer_async_read() for the details.
>   * There is no point in duplicating all that complexity.
>   */
> +#define bio_for_each_segment_all_reverse(bvl, bio, i)  \
> +   for (i = 0, bvl = (bio)->bi_io_vec + (bio)->bi_vcnt - 1;\
> +   i < (bio)->bi_vcnt; i++, bvl--)
> +
>  static void mpage_end_io(struct bio *bio, int err)
>  {
> struct bio_vec *bv;
> int i;
>
> -   bio_for_each_segment_all(bv, bio, i) {
> +   bio_for_each_segment_all_reverse(bv, bio, i) {
> struct page *page = bv->bv_page;
>
> if (bio_data_dir(bio) == READ) {
>
> But that made no difference.  Maybe I got the wrong BIO completion
> routine, but I don't think so (it's ext3).  Probably my theory is
> wrong.
>
> Anyway, could you please resend your patch with Hugh's fix and with a
> more carefully written and more accurate changelog?


Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-02-28 Thread Ning Qu
Yes, the simple test does verify that the page fault numbers are
correct with the patch. So my previous results are from those command
lines, which also show some performance improvement with this change
in tmpfs.

sequential access
/usr/bin/time -a ./iozone -B s 8g -i 0 -i 1

random access
/usr/bin/time -a ./iozone -B s 8g -i 0 -i 2
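
(For reference, and assuming stock iozone option meanings: -i 0 selects the
write/rewrite test, -i 1 the read/re-read test, -i 2 the random read/write
test, and -B makes iozone use mmap'd files, which is what exercises the
fault path being changed here.)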
Best wishes,
-- 
Ning Qu


On Fri, Feb 28, 2014 at 10:10 PM, Ning Qu  wrote:
> Yes, I am using the iozone -i 0 -i 1. Let me try the most simple test
> as you mentioned.
> Best wishes,
> --
> Ning Qu
>
>
> On Fri, Feb 28, 2014 at 5:41 PM, Andrew Morton
>  wrote:
>> On Fri, 28 Feb 2014 16:35:16 -0800 Ning Qu  wrote:
>>
>>> Sorry about my mistake with the experiments; here is the real one.
>>>
>>> Btw, apparently there are still some questions about the results, and
>>> I will sync with Kirill about his test command line.
>>>
>>> Below are just some simple experiment numbers from this patch; let me
>>> know if you would like more:
>>>
>>> Tested on a Xeon machine with 64GiB of RAM, using the current default
>>> fault-around order of 4.
>>>
>>> Sequential access, 8GiB file
>>>                    Baseline     with-patch
>>> 1 thread
>>> minor fault        8,389,052    4,456,530
>>> time, seconds      9.55         8.31
>>
>> The numbers still seem wrong.  I'd expect to see almost exactly 2M minor
>> faults with this test.
>>
>> Looky:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <fcntl.h>
>> #include <unistd.h>
>> #include <sys/types.h>
>> #include <sys/mman.h>
>>
>> #define G (1024 * 1024 * 1024)
>>
>> int main(int argc, char *argv[])
>> {
>> char *p;
>> int fd;
>> unsigned long idx;
>> int sum = 0;
>>
>> fd = open("foo", O_RDONLY);
>> if (fd < 0) {
>> perror("open");
>> exit(1);
>> }
>> p = mmap(NULL, 1 * G, PROT_READ, MAP_PRIVATE, fd, 0);
>> if (p == MAP_FAILED) {
>> perror("mmap");
>> exit(1);
>> }
>>
>> for (idx = 0; idx < 1 * G; idx += 4096)
>> sum += p[idx];
>> printf("%d\n", sum);
>> exit(0);
>> }
>>
>> z:/home/akpm> /usr/bin/time ./a.out
>> 0
>> 0.05user 0.33system 0:00.38elapsed 99%CPU (0avgtext+0avgdata 
>> 4195856maxresident)k
>> 0inputs+0outputs (0major+262264minor)pagefaults 0swaps
>>
>> z:/home/akpm> dc
>> 16o
>> 262264 4 * p
>> 1001E0
>>
>> That's close!


Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-02-28 Thread Ning Qu
Yes, I am using the iozone -i 0 -i 1. Let me try the most simple test
as you mentioned.
Best wishes,
-- 
Ning Qu


On Fri, Feb 28, 2014 at 5:41 PM, Andrew Morton
 wrote:
> On Fri, 28 Feb 2014 16:35:16 -0800 Ning Qu  wrote:
>
>> Sorry about my mistake with the experiments; here is the real one.
>>
>> Btw, apparently there are still some questions about the results, and
>> I will sync with Kirill about his test command line.
>>
>> Below are just some simple experiment numbers from this patch; let me
>> know if you would like more:
>>
>> Tested on a Xeon machine with 64GiB of RAM, using the current default
>> fault-around order of 4.
>>
>> Sequential access, 8GiB file
>>                    Baseline     with-patch
>> 1 thread
>> minor fault        8,389,052    4,456,530
>> time, seconds      9.55         8.31
>
> The numbers still seem wrong.  I'd expect to see almost exactly 2M minor
> faults with this test.
>
> Looky:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/mman.h>
>
> #define G (1024 * 1024 * 1024)
>
> int main(int argc, char *argv[])
> {
> char *p;
> int fd;
> unsigned long idx;
> int sum = 0;
>
> fd = open("foo", O_RDONLY);
> if (fd < 0) {
> perror("open");
> exit(1);
> }
> p = mmap(NULL, 1 * G, PROT_READ, MAP_PRIVATE, fd, 0);
> if (p == MAP_FAILED) {
> perror("mmap");
> exit(1);
> }
>
> for (idx = 0; idx < 1 * G; idx += 4096)
> sum += p[idx];
> printf("%d\n", sum);
> exit(0);
> }
>
> z:/home/akpm> /usr/bin/time ./a.out
> 0
> 0.05user 0.33system 0:00.38elapsed 99%CPU (0avgtext+0avgdata 
> 4195856maxresident)k
> 0inputs+0outputs (0major+262264minor)pagefaults 0swaps
>
> z:/home/akpm> dc
> 16o
> 262264 4 * p
> 1001E0
>
> That's close!


Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-02-28 Thread Andrew Morton
On Fri, 28 Feb 2014 16:35:16 -0800 Ning Qu  wrote:

> Sorry about my mistake with the experiments; here is the real one.
>
> Btw, apparently there are still some questions about the results, and
> I will sync with Kirill about his test command line.
>
> Below are just some simple experiment numbers from this patch; let me
> know if you would like more:
>
> Tested on a Xeon machine with 64GiB of RAM, using the current default
> fault-around order of 4.
>
> Sequential access, 8GiB file
>                    Baseline     with-patch
> 1 thread
> minor fault        8,389,052    4,456,530
> time, seconds      9.55         8.31

The numbers still seem wrong.  I'd expect to see almost exactly 2M minor
faults with this test.

Looky:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

#define G (1024 * 1024 * 1024)

int main(int argc, char *argv[])
{
char *p;
int fd;
unsigned long idx;
int sum = 0;

fd = open("foo", O_RDONLY);
if (fd < 0) {
perror("open");
exit(1);
}
p = mmap(NULL, 1 * G, PROT_READ, MAP_PRIVATE, fd, 0);
if (p == MAP_FAILED) {
perror("mmap");
exit(1);
}

for (idx = 0; idx < 1 * G; idx += 4096)
sum += p[idx];
printf("%d\n", sum);
exit(0);
}

z:/home/akpm> /usr/bin/time ./a.out
0
0.05user 0.33system 0:00.38elapsed 99%CPU (0avgtext+0avgdata 
4195856maxresident)k
0inputs+0outputs (0major+262264minor)pagefaults 0swaps

z:/home/akpm> dc
16o
262264 4 * p
1001E0

That's close!
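
(That is: 262,264 faults * 4 KiB = 1,049,056 KiB = 0x1001E0 KiB, essentially
one minor fault per 4 KiB page of the 1 GiB mapping, with the small remainder
coming from the program's own startup faults.)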


Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-02-28 Thread Ning Qu
Sorry about my mistake with the experiments; here is the real one.

Btw, apparently there are still some questions about the results, and
I will sync with Kirill about his test command line.

Below are just some simple experiment numbers from this patch; let me
know if you would like more:

Tested on a Xeon machine with 64GiB of RAM, using the current default
fault-around order of 4.

Sequential access, 8GiB file
                   Baseline     with-patch
1 thread
minor fault        8,389,052    4,456,530
time, seconds      9.55         8.31

Random access, 8GiB file
                   Baseline     with-patch
1 thread
minor fault        8,389,315    6,423,386
time, seconds      11.68        10.51



Best wishes,
-- 
Ning Qu


Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-02-28 Thread Andrew Morton
On Fri, 28 Feb 2014 14:18:50 -0800 Ning Qu  wrote:

> This is a follow-up patch for "mm: map few pages around fault address if they 
> are in page cache"
> 
> We use the generic filemap_map_pages as ->map_pages in shmem/tmpfs.
> 

Please cc Hugh on shmem/tmpfs things

> 
> =
> Below are just some simple experiment numbers from this patch; let me
> know if you would like more:
> 
> Tested on a Xeon machine with 64GiB of RAM, using the current default
> fault-around order of 4.
> 
> Sequential access, 8GiB file
>                    Baseline     with-patch
> 1 thread
> minor fault        205          101

Confused.  Sequential access of an 8G file should generate 2,000,000
minor faults, not 205.  And with FAULT_AROUND_ORDER=4, that should come
down to 2,000,000/16 minor faults when using faultaround?
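
For reference, the expected arithmetic: 8 GiB / 4 KiB = 2,097,152 baseline
minor faults, and with the default FAULT_AROUND_ORDER of 4 (2^4 = 16 pages
mapped per fault) an ideal fault-around run would take roughly
2,097,152 / 16 = 131,072 minor faults.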

> time, seconds      7.94         7.82
> 
> Random access, 8GiB file
>                    Baseline     with-patch
> 1 thread
> minor fault        724          623
> time, seconds      9.75         9.84
> 
