Re: [RFC] A couple of questions about the paged I/O sub system

2015-10-23 Thread Ian Kent
On Thu, 2015-10-22 at 18:54 -0700, Hugh Dickins wrote:
> On Thu, 22 Oct 2015, Ian Kent wrote:
> > On Wed, 2015-10-21 at 12:56 -0700, Hugh Dickins wrote:
> > > On Wed, 21 Oct 2015, Ian Kent wrote:
> > 
> > Thanks for taking the time to reply Hugh.
> > 
> > > 
> > > > Hi all,
> > > > 
> > > > I've been looking through some of the page reclaim code and at
> > > > truncate_inode_pages().
> > > > 
> > > > I'm not familiar with the code and I'm struggling to understand
> > > > it.
> > > > 
> > > > One thing that is puzzling me right now is, if a file has pages
> > > > that have been modified and are swapped out when
> > > > pagevec_lookup_entries() is called will they be found?
> > > 
> > > truncate_inode_pages() is a library function which a filesystem
> > > calls at some stage in its inode truncation processing, to take
> > > all the incore pages out of pagecache (out of its radix_tree),
> > > and free them up (usually: some might be otherwise pinned in
> > > memory at the time).
> > > 
> > > A filesystem will have other work to do, very particular to that
> > > filesystem, to free up the actual disk blocks: that's definitely
> > > not part of truncate_inode_pages()'s job.
> > > 
> > > It's also called when evicting an inode no longer needed in
> > > memory, to free the associated pagecache, when not deleting the
> > > blocks on disk.
> > > 
> > > I think I don't understand your "swapped out": modifications
> > > occur to a page while it is in pagecache, and those modifications
> > > need to be written back to disk before that page can be reclaimed
> > > for other use.
> > 
> > Indeed, now I think about it, "swapped out" is a bad choice of
> > words when talking about a paged IO system.
> > 
> > What I'm trying to say is if pages allocated to a mapping are
> > modified, then under memory pressure, are they ever reclaimed by
> > writing them to swap storage or are they always reclaimed by
> > writing them back to disk?
> > 
> > Now I think about what you've said here and looking at the code I
> > suspect the answer is they are always reclaimed by writing them to
> > disk.
> 
> Yes.
> 
> > 
> > > 
> > > > 
> > > > If not then how does truncate_inode_pages(_range)() handle
> > > > waiting for these pages to be swapped back in to perform the
> > > > writeback and truncation?
> > > 
> > > Pages are never "swapped back in to perform the writeback":
> > > if writeback is needed, it's done before the page can be freed
> > > from pagecache; and if that data is needed again after the page
> > > was freed, it's read back in from disk to a fresh page.
> > 
> > That makes sense, using swap would be unnecessary double handling.
> > 
> > > 
> > > You may be worrying about what happens when a page is modified or
> > > under writeback when it is truncated: I think that's something
> > > each filesystem has to be careful of, and may deal with in
> > > different ways.
> > 
> > I'm wondering how a mapping nrpages can be non-zero (read greater
> > than one) after calling truncate_inode_pages().
> > 
> > But I'm looking at a much older kernel so it's quite different to
> > current upstream and this seemed like a question relevant to both
> > kernels to get some idea of how page reclaim works.
> > 
> > I guess what I'm really looking to work out is if it's possible,
> > with the current upstream kernel, for a mapping to have nrpages
> > greater than 1 after calling truncate_inode_pages() and hopefully
> > get some explanation of why if that's not so.
> 
> I assume you're worrying about a truncate_inode_pages(mapping, 0).
> If it's truncate_inode_pages(mapping, 1), or lstart anything greater
> than 0, then it will leave behind the incompletely truncated pages
> at the start: no mystery in that.

I am, sorry I didn't make that clear to start with.

> 
> > 
> > It's certainly possible with the older kernel I'm looking at but I
> > need some info. before I consider looking for possible changes to
> > back port.
> 
> Probably what you're looking for is Jan Kara's v3.0 commit
> 08142579b6ca
> "mm: fix assertion mapping->nrpages == 0 in end_writeback()".

I looked at that commit and the back port that went into the older
kernel I'm looking at (around 2011/2012), but I couldn't work out why
taking tree_lock in end_writeback() would always result in
nrpages == 0, given the quite granular lock/decrement/unlock in the
reclaim code.

In fact, when looking at this, I think I saw a report for that same
problem on a later kernel but I didn't look further (yet) because, in
at least one crash analysis I looked at, nrpages was described as "much
larger than 1" so this is probably a different problem.

Don't think any crash dumps remain so I can't give details, I probably
need to request they be collected, but that's going to be a hard sell
as well, ;)

> > 
> > > 
> > > I'm not sure how much to read in to your use of the word "swap".
> > > It's true that shmem/tmpfs uses swap (of the swapon/swapoff
> variety

Re: [RFC] A couple of questions about the paged I/O sub system

2015-10-22 Thread Hugh Dickins
On Thu, 22 Oct 2015, Ian Kent wrote:
> On Wed, 2015-10-21 at 12:56 -0700, Hugh Dickins wrote:
> > On Wed, 21 Oct 2015, Ian Kent wrote:
> 
> Thanks for taking the time to reply Hugh.
> 
> > 
> > > Hi all,
> > > 
> > > I've been looking through some of the page reclaim code and at
> > > truncate_inode_pages().
> > > 
> > > I'm not familiar with the code and I'm struggling to understand it.
> > > 
> > > One thing that is puzzling me right now is, if a file has pages
> > > that have been modified and are swapped out when
> > > pagevec_lookup_entries() is called will they be found?
> > 
> > truncate_inode_pages() is a library function which a filesystem calls
> > at some stage in its inode truncation processing, to take all the
> > incore
> > pages out of pagecache (out of its radix_tree), and free them up
> > (usually: some might be otherwise pinned in memory at the time).
> > 
> > A filesystem will have other work to do, very particular to that
> > filesystem, to free up the actual disk blocks: that's definitely
> > not part of truncate_inode_pages()'s job.
> > 
> > It's also called when evicting an inode no longer needed in memory,
> > to free the associated pagecache, when not deleting the blocks on
> > disk.
> > 
> > I think I don't understand your "swapped out": modifications occur to
> > a page while it is in pagecache, and those modifications need to be
> > written back to disk before that page can be reclaimed for other use.
> 
> Indeed, now I think about it, "swapped out" is a bad choice of words
> when talking about a paged IO system.
> 
> What I'm trying to say is if pages allocated to a mapping are modified,
> then under memory pressure, are they ever reclaimed by writing them to
> swap storage or are they always reclaimed by writing them back to disk?
> 
> Now I think about what you've said here and looking at the code I
> suspect the answer is they are always reclaimed by writing them to
> disk.

Yes.

> 
> > 
> > > 
> > > If not then how does truncate_inode_pages(_range)() handle waiting
> > > for these pages to be swapped back in to perform the writeback and
> > > truncation?
> > 
> > Pages are never "swapped back in to perform the writeback":
> > if writeback is needed, it's done before the page can be freed from
> > pagecache; and if that data is needed again after the page was freed,
> > it's read back in from disk to a fresh page.
> 
> That makes sense, using swap would be unnecessary double handling.
> 
> > 
> > You may be worrying about what happens when a page is modified or
> > under writeback when it is truncated: I think that's something each
> > filesystem has to be careful of, and may deal with in different ways.
> 
> I'm wondering how a mapping nrpages can be non-zero (read greater than
> one) after calling truncate_inode_pages().
> 
> But I'm looking at a much older kernel so it's quite different to
> current upstream and this seemed like a question relevant to both
> kernels to get some idea of how page reclaim works.
> 
> I guess what I'm really looking to work out is if it's possible, with
> the current upstream kernel, for a mapping to have nrpages greater than
> 1 after calling truncate_inode_pages() and hopefully get some
> explanation of why if that's not so.

I assume you're worrying about a truncate_inode_pages(mapping, 0).  If
it's truncate_inode_pages(mapping, 1), or lstart anything greater than 0,
then it will leave behind the incompletely truncated pages at the start:
no mystery in that.

> 
> It's certainly possible with the older kernel I'm looking at but I need
> some info. before I consider looking for possible changes to back port.

Probably what you're looking for is Jan Kara's v3.0 commit 08142579b6ca
"mm: fix assertion mapping->nrpages == 0 in end_writeback()".

> 
> > 
> > I'm not sure how much to read in to your use of the word "swap".
> > It's true that shmem/tmpfs uses swap (of the swapon/swapoff variety)
> > as backing for its pages when under pressure (and uses its own
> > variant
> > shmem_undo_range() to manage that, instead of
> > truncate_inode_pages()),
> > but most filesystems don't use "swap" at all.
> > 
> > I just noticed your subject "paged I/O sub system": I hope you
> > realize
> > that mm/page_io.c is solely concerned with swap (of the
> > swapon/swapoff
> > variety), and has next to nothing to do with filesystems.  (Just as,
> > conversely, mm/swap.c has next to nothing to do with swap.)
> 
> LOL, right, I'm looking at the page reclaim code which, so far, hasn't
> led me to either of those source files.
> 
> > 
> > > 
> > > Anyone, please?
> > 
> > I hope something I've said there has helped, but warn you that
> > I'm a terrible person to engage in an extended conversation with!
> > Expect long silences, pray for someone else to jump in.
> 
> As well as pointing out that swap storage shouldn't be used in this
> case you've reminded me of the difference between swapping and demand
> paging, so that's a good start.

So long as you 

Re: [RFC] A couple of questions about the paged I/O sub system

2015-10-21 Thread Ian Kent
On Wed, 2015-10-21 at 12:56 -0700, Hugh Dickins wrote:
> On Wed, 21 Oct 2015, Ian Kent wrote:

Thanks for taking the time to reply Hugh.

> 
> > Hi all,
> > 
> > I've been looking through some of the page reclaim code and at
> > truncate_inode_pages().
> > 
> > I'm not familiar with the code and I'm struggling to understand it.
> > 
> > One thing that is puzzling me right now is, if a file has pages
> > that have been modified and are swapped out when
> > pagevec_lookup_entries() is called will they be found?
> 
> truncate_inode_pages() is a library function which a filesystem calls
> at some stage in its inode truncation processing, to take all the
> incore
> pages out of pagecache (out of its radix_tree), and free them up
> (usually: some might be otherwise pinned in memory at the time).
> 
> A filesystem will have other work to do, very particular to that
> filesystem, to free up the actual disk blocks: that's definitely
> not part of truncate_inode_pages()'s job.
> 
> It's also called when evicting an inode no longer needed in memory,
> to free the associated pagecache, when not deleting the blocks on
> disk.
> 
> I think I don't understand your "swapped out": modifications occur to
> a page while it is in pagecache, and those modifications need to be
> written back to disk before that page can be reclaimed for other use.

Indeed, now I think about it, "swapped out" is a bad choice of words
when talking about a paged IO system.

What I'm trying to say is if pages allocated to a mapping are modified,
then under memory pressure, are they ever reclaimed by writing them to
swap storage or are they always reclaimed by writing them back to disk?

Now I think about what you've said here and looking at the code I
suspect the answer is they are always reclaimed by writing them to
disk.

> 
> > 
> > If not then how does truncate_inode_pages(_range)() handle waiting
> > for these pages to be swapped back in to perform the writeback and
> > truncation?
> 
> Pages are never "swapped back in to perform the writeback":
> if writeback is needed, it's done before the page can be freed from
> pagecache; and if that data is needed again after the page was freed,
> it's read back in from disk to a fresh page.

That makes sense, using swap would be unnecessary double handling.

> 
> You may be worrying about what happens when a page is modified or
> under writeback when it is truncated: I think that's something each
> filesystem has to be careful of, and may deal with in different ways.

I'm wondering how a mapping nrpages can be non-zero (read greater than
one) after calling truncate_inode_pages().

But I'm looking at a much older kernel so it's quite different to
current upstream and this seemed like a question relevant to both
kernels to get some idea of how page reclaim works.

I guess what I'm really looking to work out is if it's possible, with
the current upstream kernel, for a mapping to have nrpages greater than
1 after calling truncate_inode_pages() and hopefully get some
explanation of why if that's not so.

It's certainly possible with the older kernel I'm looking at but I need
some info. before I consider looking for possible changes to back port.

> 
> I'm not sure how much to read in to your use of the word "swap".
> It's true that shmem/tmpfs uses swap (of the swapon/swapoff variety)
> as backing for its pages when under pressure (and uses its own
> variant
> shmem_undo_range() to manage that, instead of
> truncate_inode_pages()),
> but most filesystems don't use "swap" at all.
> 
> I just noticed your subject "paged I/O sub system": I hope you
> realize
> that mm/page_io.c is solely concerned with swap (of the
> swapon/swapoff
> variety), and has next to nothing to do with filesystems.  (Just as,
> conversely, mm/swap.c has next to nothing to do with swap.)

LOL, right, I'm looking at the page reclaim code which, so far, hasn't
led me to either of those source files.

> 
> > 
> > Anyone, please?
> 
> I hope something I've said there has helped, but warn you that
> I'm a terrible person to engage in an extended conversation with!
> Expect long silences, pray for someone else to jump in.

As well as pointing out that swap storage shouldn't be used in this
case you've reminded me of the difference between swapping and demand
paging, so that's a good start.

Perhaps folks at linux-mm will have more to say.


> > Ian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] A couple of questions about the paged I/O sub system

2015-10-21 Thread Hugh Dickins
On Wed, 21 Oct 2015, Ian Kent wrote:

> Hi all,
> 
> I've been looking through some of the page reclaim code and at
> truncate_inode_pages().
> 
> I'm not familiar with the code and I'm struggling to understand it.
> 
> One thing that is puzzling me right now is, if a file has pages that
> have been modified and are swapped out when pagevec_lookup_entries() is
> called will they be found?

truncate_inode_pages() is a library function which a filesystem calls
at some stage in its inode truncation processing, to take all the incore
pages out of pagecache (out of its radix_tree), and free them up
(usually: some might be otherwise pinned in memory at the time).

A filesystem will have other work to do, very particular to that
filesystem, to free up the actual disk blocks: that's definitely
not part of truncate_inode_pages()'s job.

It's also called when evicting an inode no longer needed in memory,
to free the associated pagecache, when not deleting the blocks on disk.

I think I don't understand your "swapped out": modifications occur to
a page while it is in pagecache, and those modifications need to be
written back to disk before that page can be reclaimed for other use.

> 
> If not then how does truncate_inode_pages(_range)() handle waiting for
> these pages to be swapped back in to perform the writeback and
> truncation?

Pages are never "swapped back in to perform the writeback":
if writeback is needed, it's done before the page can be freed from
pagecache; and if that data is needed again after the page was freed,
it's read back in from disk to a fresh page.

You may be worrying about what happens when a page is modified or
under writeback when it is truncated: I think that's something each
filesystem has to be careful of, and may deal with in different ways.

I'm not sure how much to read in to your use of the word "swap".
It's true that shmem/tmpfs uses swap (of the swapon/swapoff variety)
as backing for its pages when under pressure (and uses its own variant
shmem_undo_range() to manage that, instead of truncate_inode_pages()),
but most filesystems don't use "swap" at all.

I just noticed your subject "paged I/O sub system": I hope you realize
that mm/page_io.c is solely concerned with swap (of the swapon/swapoff
variety), and has next to nothing to do with filesystems.  (Just as,
conversely, mm/swap.c has next to nothing to do with swap.)

> 
> Anyone, please?

I hope something I've said there has helped, but warn you that
I'm a terrible person to engage in an extended conversation with!
Expect long silences, pray for someone else to jump in.

Hugh

> Ian


[RFC] A couple of questions about the paged I/O sub system

2015-10-20 Thread Ian Kent
Hi all,

I've been looking through some of the page reclaim code and at
truncate_inode_pages().

I'm not familiar with the code and I'm struggling to understand it.

One thing that is puzzling me right now is, if a file has pages that
have been modified and are swapped out when pagevec_lookup_entries() is
called will they be found?

If not then how does truncate_inode_pages(_range)() handle waiting for
these pages to be swapped back in to perform the writeback and
truncation?

Anyone, please?
Ian
