Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-16 Thread Sonny Rao
On Tue, Jan 15, 2013 at 8:47 PM, Minchan Kim  wrote:
> On Tue, Jan 15, 2013 at 05:21:15PM -0800, Sonny Rao wrote:
>> On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
>>  wrote:
>> > On Tue, 15 Jan 2013 16:32:38 -0800
>> > Sonny Rao  wrote:
>> >
>> >> >> It's for saving the power to increase batter life.
>> >> >
>> >> > It might well have that effect, dunno.  That wasn't my intent.  Testing
>> >> > needed!
>> >> >
>> >>
>> >> Power saving is certainly why we had it on originally for ChromeOS,
>> >> but we turned it off due to misbehavior.
>> >>
>> >> Specifically, we saw a pathological behavior where we'd end up writing
>> >> to the disk every few seconds when laptop mode was turned on.  This
>> >> turned out to be because laptop-mode sets a timer which is used to
>> >> check for new dirty data after the initial flush and writes that out
>> >> before spinning the disk down, and on ChromeOS various chatty daemons
>> >> on the system were logging and dirtying data more or less constantly
>> >> so there was almost always something there to be written out.  So what
>> >> ended up happening was that we'd need to do a read, then wake up the
>> >> disk, and then keep writing every few seconds for a long period of
>> >> time, which had the opposite effect from what we wanted.
>> >
>> > So after the read, the disk would chatter away doing a dribble of
>> > writes?  That sounds like plain brokenness (and why did the chrome guys
>> > not tell anyone about it?!?!?).
>>
>> Yes, either read or fsync.  I ranted about it a little (here:
>> http://marc.info/?l=linux-mm=135422986220016=4), but mostly
>> assumed it was working as expected, and that ChromeOS was just
>> dirtying data at an absurd pace.  Might have been a bad assumption and
>> I could have been more explicit about reporting it, sorry about that.
>>
>> > The idea is that when the physical
>> > read occurs, we should opportunistically flush out all pending writes,
>> > while the disk is running.  Then go back into
>> > buffer-writes-for-a-long-time mode.
>> >
>>
>> See the comment in page-writeback.c above laptop_io_completion():
>>
>> /*
>>  * We've spun up the disk and we're in laptop mode: schedule writeback
>>  * of all dirty data a few seconds from now.  If the flush is already
>> scheduled
>>  * then push it back - the user is still using the disk.
>>  */
>> void laptop_io_completion(struct backing_dev_info *info)
>>
>> What ends up happening fairly often is that there's always something
>> dirty with that few seconds (or even one second) on our system.
>>
>> > I forget what we did with fsync() and friends.  Quite a lot of
>> > pestiferous applications like to do fsync quite frequently.  I had a
>> > special kernel in which fsync() consisted of "return 0;", but ISTR
>> > there being some resistance to productizing that idea.
>> >
>>
>> Yeah, we have this problem and we try to fix up users of fsync() as we
>> find them but it's a bit of a never-ending battle.  Such a feature
>> would be useful.
>>
>> >>  The issues
>> >> with zram swap just confirmed that we didn't want laptop mode.
>> >>
>> >> Most of our devices have had SSDs rather than spinning disks, so noise
>> >> wasn't an issue, although when we finally did support an official
>> >> device with a spinning disk people certainly complained when the disk
>> >> started clicking all the time
>> >
>> > hm, it's interesting that the general idea still has vailidity.  It
>> > would be a fun project for someone to sniff out all the requirements,
>> > fixup/enhance/rewrite the current implementation and generally make it
>> > all spiffy and nice.
>> >
>> >> (due to the underflow in the writeback code).
>> >
>> > To what underflow do you refer?
>> >
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754
>>
>> That particular bug caused writes to happen almost instantly after the
>> underflow ocurred, and consequently slowed write throughput to a crawl
>> because there was no chance for contiguous writes to gather.
>>
>> >> We do know that current SSDs save a significant amount of
>> >> power when they go into standby, so minimizing disk writes is still
>> >> useful on these devices.
>> >>
>> >> A very simple laptop mode which only does a single sync when we spin
>> >> up the disk, and didn't bother with the timer behavior or muck with
>> >> swap behavior might be something that is more useful for us, and I
>> >> suspect it might simplify the writeback code somewhat as well.
>> >
>> > I don't think I understand the problem with the timer.  My original RFC
>> > said
>> >
>> > : laptop_writeback_centisecs
>> > : --
>> > :
>> > : This tunable determines the maximum age of dirty data when the machine
>> > : is operating in Laptop mode.  The default value is 3 - five
>> > : minutes.  This means that if applications are generating a small amount
>> > : of write traffic, the disk will spin up once per five minutes.
>> > :
>> > : If 

Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-16 Thread Sonny Rao
On Tue, Jan 15, 2013 at 8:47 PM, Minchan Kim minc...@kernel.org wrote:
 On Tue, Jan 15, 2013 at 05:21:15PM -0800, Sonny Rao wrote:
 On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
 a...@linux-foundation.org wrote:
  On Tue, 15 Jan 2013 16:32:38 -0800
  Sonny Rao sonny...@google.com wrote:
 
   It's for saving the power to increase batter life.
  
   It might well have that effect, dunno.  That wasn't my intent.  Testing
   needed!
  
 
  Power saving is certainly why we had it on originally for ChromeOS,
  but we turned it off due to misbehavior.
 
  Specifically, we saw a pathological behavior where we'd end up writing
  to the disk every few seconds when laptop mode was turned on.  This
  turned out to be because laptop-mode sets a timer which is used to
  check for new dirty data after the initial flush and writes that out
  before spinning the disk down, and on ChromeOS various chatty daemons
  on the system were logging and dirtying data more or less constantly
  so there was almost always something there to be written out.  So what
  ended up happening was that we'd need to do a read, then wake up the
  disk, and then keep writing every few seconds for a long period of
  time, which had the opposite effect from what we wanted.
 
  So after the read, the disk would chatter away doing a dribble of
  writes?  That sounds like plain brokenness (and why did the chrome guys
  not tell anyone about it?!?!?).

 Yes, either read or fsync.  I ranted about it a little (here:
 http://marc.info/?l=linux-mmm=135422986220016w=4), but mostly
 assumed it was working as expected, and that ChromeOS was just
 dirtying data at an absurd pace.  Might have been a bad assumption and
 I could have been more explicit about reporting it, sorry about that.

  The idea is that when the physical
  read occurs, we should opportunistically flush out all pending writes,
  while the disk is running.  Then go back into
  buffer-writes-for-a-long-time mode.
 

 See the comment in page-writeback.c above laptop_io_completion():

 /*
  * We've spun up the disk and we're in laptop mode: schedule writeback
  * of all dirty data a few seconds from now.  If the flush is already
 scheduled
  * then push it back - the user is still using the disk.
  */
 void laptop_io_completion(struct backing_dev_info *info)

 What ends up happening fairly often is that there's always something
 dirty with that few seconds (or even one second) on our system.

  I forget what we did with fsync() and friends.  Quite a lot of
  pestiferous applications like to do fsync quite frequently.  I had a
  special kernel in which fsync() consisted of return 0;, but ISTR
  there being some resistance to productizing that idea.
 

 Yeah, we have this problem and we try to fix up users of fsync() as we
 find them but it's a bit of a never-ending battle.  Such a feature
 would be useful.

   The issues
  with zram swap just confirmed that we didn't want laptop mode.
 
  Most of our devices have had SSDs rather than spinning disks, so noise
  wasn't an issue, although when we finally did support an official
  device with a spinning disk people certainly complained when the disk
  started clicking all the time
 
  hm, it's interesting that the general idea still has vailidity.  It
  would be a fun project for someone to sniff out all the requirements,
  fixup/enhance/rewrite the current implementation and generally make it
  all spiffy and nice.
 
  (due to the underflow in the writeback code).
 
  To what underflow do you refer?
 
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754

 That particular bug caused writes to happen almost instantly after the
 underflow ocurred, and consequently slowed write throughput to a crawl
 because there was no chance for contiguous writes to gather.

  We do know that current SSDs save a significant amount of
  power when they go into standby, so minimizing disk writes is still
  useful on these devices.
 
  A very simple laptop mode which only does a single sync when we spin
  up the disk, and didn't bother with the timer behavior or muck with
  swap behavior might be something that is more useful for us, and I
  suspect it might simplify the writeback code somewhat as well.
 
  I don't think I understand the problem with the timer.  My original RFC
  said
 
  : laptop_writeback_centisecs
  : --
  :
  : This tunable determines the maximum age of dirty data when the machine
  : is operating in Laptop mode.  The default value is 3 - five
  : minutes.  This means that if applications are generating a small amount
  : of write traffic, the disk will spin up once per five minutes.
  :
  : If the disk is spun up for any other reason (such as for a read) then
  : all dirty data will be flushed anyway, and this timer is reset to zero.
 
  which all sounds very sensible and shouldn't exhibit the behavior you
  observed.
 

 The laptop-mode timer get re-armed 

Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Minchan Kim
On Tue, Jan 15, 2013 at 05:21:15PM -0800, Sonny Rao wrote:
> On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
>  wrote:
> > On Tue, 15 Jan 2013 16:32:38 -0800
> > Sonny Rao  wrote:
> >
> >> >> It's for saving the power to increase batter life.
> >> >
> >> > It might well have that effect, dunno.  That wasn't my intent.  Testing
> >> > needed!
> >> >
> >>
> >> Power saving is certainly why we had it on originally for ChromeOS,
> >> but we turned it off due to misbehavior.
> >>
> >> Specifically, we saw a pathological behavior where we'd end up writing
> >> to the disk every few seconds when laptop mode was turned on.  This
> >> turned out to be because laptop-mode sets a timer which is used to
> >> check for new dirty data after the initial flush and writes that out
> >> before spinning the disk down, and on ChromeOS various chatty daemons
> >> on the system were logging and dirtying data more or less constantly
> >> so there was almost always something there to be written out.  So what
> >> ended up happening was that we'd need to do a read, then wake up the
> >> disk, and then keep writing every few seconds for a long period of
> >> time, which had the opposite effect from what we wanted.
> >
> > So after the read, the disk would chatter away doing a dribble of
> > writes?  That sounds like plain brokenness (and why did the chrome guys
> > not tell anyone about it?!?!?).
> 
> Yes, either read or fsync.  I ranted about it a little (here:
> http://marc.info/?l=linux-mm=135422986220016=4), but mostly
> assumed it was working as expected, and that ChromeOS was just
> dirtying data at an absurd pace.  Might have been a bad assumption and
> I could have been more explicit about reporting it, sorry about that.
> 
> > The idea is that when the physical
> > read occurs, we should opportunistically flush out all pending writes,
> > while the disk is running.  Then go back into
> > buffer-writes-for-a-long-time mode.
> >
> 
> See the comment in page-writeback.c above laptop_io_completion():
> 
> /*
>  * We've spun up the disk and we're in laptop mode: schedule writeback
>  * of all dirty data a few seconds from now.  If the flush is already
> scheduled
>  * then push it back - the user is still using the disk.
>  */
> void laptop_io_completion(struct backing_dev_info *info)
> 
> What ends up happening fairly often is that there's always something
> dirty with that few seconds (or even one second) on our system.
> 
> > I forget what we did with fsync() and friends.  Quite a lot of
> > pestiferous applications like to do fsync quite frequently.  I had a
> > special kernel in which fsync() consisted of "return 0;", but ISTR
> > there being some resistance to productizing that idea.
> >
> 
> Yeah, we have this problem and we try to fix up users of fsync() as we
> find them but it's a bit of a never-ending battle.  Such a feature
> would be useful.
> 
> >>  The issues
> >> with zram swap just confirmed that we didn't want laptop mode.
> >>
> >> Most of our devices have had SSDs rather than spinning disks, so noise
> >> wasn't an issue, although when we finally did support an official
> >> device with a spinning disk people certainly complained when the disk
> >> started clicking all the time
> >
> > hm, it's interesting that the general idea still has vailidity.  It
> > would be a fun project for someone to sniff out all the requirements,
> > fixup/enhance/rewrite the current implementation and generally make it
> > all spiffy and nice.
> >
> >> (due to the underflow in the writeback code).
> >
> > To what underflow do you refer?
> >
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754
> 
> That particular bug caused writes to happen almost instantly after the
> underflow ocurred, and consequently slowed write throughput to a crawl
> because there was no chance for contiguous writes to gather.
> 
> >> We do know that current SSDs save a significant amount of
> >> power when they go into standby, so minimizing disk writes is still
> >> useful on these devices.
> >>
> >> A very simple laptop mode which only does a single sync when we spin
> >> up the disk, and didn't bother with the timer behavior or muck with
> >> swap behavior might be something that is more useful for us, and I
> >> suspect it might simplify the writeback code somewhat as well.
> >
> > I don't think I understand the problem with the timer.  My original RFC
> > said
> >
> > : laptop_writeback_centisecs
> > : --
> > :
> > : This tunable determines the maximum age of dirty data when the machine
> > : is operating in Laptop mode.  The default value is 3 - five
> > : minutes.  This means that if applications are generating a small amount
> > : of write traffic, the disk will spin up once per five minutes.
> > :
> > : If the disk is spun up for any other reason (such as for a read) then
> > : all dirty data will be flushed anyway, and this timer is reset to zero.
> >
> > 

Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Minchan Kim
On Tue, Jan 15, 2013 at 04:09:57PM -0800, Andrew Morton wrote:
> On Fri, 11 Jan 2013 13:43:27 +0900
> Minchan Kim  wrote:
> 
> > Hi Andrew,
> > 
> > On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
> > > On Thu, 10 Jan 2013 11:23:06 +0900
> > > Minchan Kim  wrote:
> > > 
> > > > > I have a feeling that laptop mode has bitrotted and these patches are
> > > > > kinda hacking around as-yet-not-understood failures...
> > > > 
> > > > Absolutely, this patch is last guard for unexpectable behavior.
> > > > As I mentioned in cover-letter, Luigi's problem could be solved either 
> > > > [1/2]
> > > > or [2/2] but I wanted to add this as last resort in case of unexpected
> > > > emergency. But you're right. It's not good to hide the problem like 
> > > > this path
> > > > so let's drop [2/2].
> > > > 
> > > > Also, I absolutely agree it has bitrotted so for correcting it, we need 
> > > > a
> > > > volunteer who have to inverstigate power saveing experiment with long 
> > > > time.
> > > > So [1/2] would be band-aid until that.
> > > 
> > > I'm inclined to hold off on 1/2 as well, really.
> > 
> > Then, what's your plan?
> 
> My plan is to sit here until someone gets down and fully tests and
> fixes laptop-mode.  Making it work properly, reliably and as-designed.
> 
> Or perhaps someone wants to make the case that we just don't need it
> any more (SSDs are silent!) and removes it all.
> 
> > > 
> > > The point of laptop_mode isn't to save power btw - it is to minimise
> > > the frequency with which the disk drive is spun up.  By deferring and
> > > then batching writeout operations, basically.
> > 
> > I don't get it. Why should we minimise such frequency?
> 
> Because my laptop was going clickety every minute and was keeping me
> awake.
> 
> > It's for saving the power to increase batter life.
> 
> It might well have that effect, dunno.  That wasn't my intent.  Testing
> needed!
> 
> > As I real all document about laptop_mode, they all said about the power
> > or battery life saving.
> > 
> > 1. Documentation/laptops/laptop-mode.txt
> > 2. http://linux.die.net/man/8/laptop_mode
> > 3. http://samwel.tk/laptop_mode/
> > 3. http://www.thinkwiki.org/wiki/Laptop-mode 
> 
> Documentation creep ;)
> 
> Ten years ago, gad: http://lwn.net/Articles/1652/

Odd, I grep it in linux-history.git and found this.
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=93d33a4885a483c708ccb7d24b56e0d5fef7bcab

It seem to be first commit about laptop_mode but it still said about battery 
life
, NOT clickety. But unfortunately, it had no number, measure method and even no
side-effect when the memory pressure is severe so we couldn't sure how it helped
about batter life without reclaim problem so the VM problem have been exported
since we apply f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware].

So let's apply [1/2] in mainline and even stable to fix the problem.
After that, we can add warning to laptop_mode so user who have used it will 
claim their
requirements. With it, we can know they need it for power saving, clickety or
, both so we can make requirement lists. From then, we can start to do someting.
If we are luck, we can remove it totally if any user doesn't claim.

What do you think about it?

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Sonny Rao
On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
 wrote:
> On Tue, 15 Jan 2013 16:32:38 -0800
> Sonny Rao  wrote:
>
>> >> It's for saving the power to increase batter life.
>> >
>> > It might well have that effect, dunno.  That wasn't my intent.  Testing
>> > needed!
>> >
>>
>> Power saving is certainly why we had it on originally for ChromeOS,
>> but we turned it off due to misbehavior.
>>
>> Specifically, we saw a pathological behavior where we'd end up writing
>> to the disk every few seconds when laptop mode was turned on.  This
>> turned out to be because laptop-mode sets a timer which is used to
>> check for new dirty data after the initial flush and writes that out
>> before spinning the disk down, and on ChromeOS various chatty daemons
>> on the system were logging and dirtying data more or less constantly
>> so there was almost always something there to be written out.  So what
>> ended up happening was that we'd need to do a read, then wake up the
>> disk, and then keep writing every few seconds for a long period of
>> time, which had the opposite effect from what we wanted.
>
> So after the read, the disk would chatter away doing a dribble of
> writes?  That sounds like plain brokenness (and why did the chrome guys
> not tell anyone about it?!?!?).

Yes, either read or fsync.  I ranted about it a little (here:
http://marc.info/?l=linux-mm=135422986220016=4), but mostly
assumed it was working as expected, and that ChromeOS was just
dirtying data at an absurd pace.  Might have been a bad assumption and
I could have been more explicit about reporting it, sorry about that.

> The idea is that when the physical
> read occurs, we should opportunistically flush out all pending writes,
> while the disk is running.  Then go back into
> buffer-writes-for-a-long-time mode.
>

See the comment in page-writeback.c above laptop_io_completion():

/*
 * We've spun up the disk and we're in laptop mode: schedule writeback
 * of all dirty data a few seconds from now.  If the flush is already
scheduled
 * then push it back - the user is still using the disk.
 */
void laptop_io_completion(struct backing_dev_info *info)

What ends up happening fairly often is that there's always something
dirty with that few seconds (or even one second) on our system.

> I forget what we did with fsync() and friends.  Quite a lot of
> pestiferous applications like to do fsync quite frequently.  I had a
> special kernel in which fsync() consisted of "return 0;", but ISTR
> there being some resistance to productizing that idea.
>

Yeah, we have this problem and we try to fix up users of fsync() as we
find them but it's a bit of a never-ending battle.  Such a feature
would be useful.

>>  The issues
>> with zram swap just confirmed that we didn't want laptop mode.
>>
>> Most of our devices have had SSDs rather than spinning disks, so noise
>> wasn't an issue, although when we finally did support an official
>> device with a spinning disk people certainly complained when the disk
>> started clicking all the time
>
> hm, it's interesting that the general idea still has vailidity.  It
> would be a fun project for someone to sniff out all the requirements,
> fixup/enhance/rewrite the current implementation and generally make it
> all spiffy and nice.
>
>> (due to the underflow in the writeback code).
>
> To what underflow do you refer?
>
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754

That particular bug caused writes to happen almost instantly after the
underflow ocurred, and consequently slowed write throughput to a crawl
because there was no chance for contiguous writes to gather.

>> We do know that current SSDs save a significant amount of
>> power when they go into standby, so minimizing disk writes is still
>> useful on these devices.
>>
>> A very simple laptop mode which only does a single sync when we spin
>> up the disk, and didn't bother with the timer behavior or muck with
>> swap behavior might be something that is more useful for us, and I
>> suspect it might simplify the writeback code somewhat as well.
>
> I don't think I understand the problem with the timer.  My original RFC
> said
>
> : laptop_writeback_centisecs
> : --
> :
> : This tunable determines the maximum age of dirty data when the machine
> : is operating in Laptop mode.  The default value is 3 - five
> : minutes.  This means that if applications are generating a small amount
> : of write traffic, the disk will spin up once per five minutes.
> :
> : If the disk is spun up for any other reason (such as for a read) then
> : all dirty data will be flushed anyway, and this timer is reset to zero.
>
> which all sounds very sensible and shouldn't exhibit the behavior you
> observed.
>

The laptop-mode timer get re-armed after each writeback (see above
laptop_io_completion function), even if it was caused by laptop-mode
itself.  So, if something is continually dirtying a little 

Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Andrew Morton
On Tue, 15 Jan 2013 16:32:38 -0800
Sonny Rao  wrote:

> >> It's for saving the power to increase batter life.
> >
> > It might well have that effect, dunno.  That wasn't my intent.  Testing
> > needed!
> >
> 
> Power saving is certainly why we had it on originally for ChromeOS,
> but we turned it off due to misbehavior.
> 
> Specifically, we saw a pathological behavior where we'd end up writing
> to the disk every few seconds when laptop mode was turned on.  This
> turned out to be because laptop-mode sets a timer which is used to
> check for new dirty data after the initial flush and writes that out
> before spinning the disk down, and on ChromeOS various chatty daemons
> on the system were logging and dirtying data more or less constantly
> so there was almost always something there to be written out.  So what
> ended up happening was that we'd need to do a read, then wake up the
> disk, and then keep writing every few seconds for a long period of
> time, which had the opposite effect from what we wanted.

So after the read, the disk would chatter away doing a dribble of
writes?  That sounds like plain brokenness (and why did the chrome guys
not tell anyone about it?!?!?).  The idea is that when the physical
read occurs, we should opportunistically flush out all pending writes,
while the disk is running.  Then go back into
buffer-writes-for-a-long-time mode.

I forget what we did with fsync() and friends.  Quite a lot of
pestiferous applications like to do fsync quite frequently.  I had a
special kernel in which fsync() consisted of "return 0;", but ISTR
there being some resistance to productizing that idea.

>  The issues
> with zram swap just confirmed that we didn't want laptop mode.
>
> Most of our devices have had SSDs rather than spinning disks, so noise
> wasn't an issue, although when we finally did support an official
> device with a spinning disk people certainly complained when the disk
> started clicking all the time

hm, it's interesting that the general idea still has vailidity.  It
would be a fun project for someone to sniff out all the requirements,
fixup/enhance/rewrite the current implementation and generally make it
all spiffy and nice.

> (due to the underflow in the writeback code).

To what underflow do you refer?

> We do know that current SSDs save a significant amount of
> power when they go into standby, so minimizing disk writes is still
> useful on these devices.
> 
> A very simple laptop mode which only does a single sync when we spin
> up the disk, and didn't bother with the timer behavior or muck with
> swap behavior might be something that is more useful for us, and I
> suspect it might simplify the writeback code somewhat as well.

I don't think I understand the problem with the timer.  My original RFC
said

: laptop_writeback_centisecs
: --
: 
: This tunable determines the maximum age of dirty data when the machine
: is operating in Laptop mode.  The default value is 3 - five
: minutes.  This means that if applications are generating a small amount
: of write traffic, the disk will spin up once per five minutes.
: 
: If the disk is spun up for any other reason (such as for a read) then
: all dirty data will be flushed anyway, and this timer is reset to zero.

which all sounds very sensible and shouldn't exhibit the behavior you
observed.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Sonny Rao
On Tue, Jan 15, 2013 at 4:09 PM, Andrew Morton
 wrote:
> On Fri, 11 Jan 2013 13:43:27 +0900
> Minchan Kim  wrote:
>
>> Hi Andrew,
>>
>> On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
>> > On Thu, 10 Jan 2013 11:23:06 +0900
>> > Minchan Kim  wrote:
>> >
>> > > > I have a feeling that laptop mode has bitrotted and these patches are
>> > > > kinda hacking around as-yet-not-understood failures...
>> > >
>> > > Absolutely, this patch is last guard for unexpectable behavior.
>> > > As I mentioned in cover-letter, Luigi's problem could be solved either 
>> > > [1/2]
>> > > or [2/2] but I wanted to add this as last resort in case of unexpected
>> > > emergency. But you're right. It's not good to hide the problem like this 
>> > > path
>> > > so let's drop [2/2].
>> > >
>> > > Also, I absolutely agree it has bitrotted so for correcting it, we need a
>> > > volunteer who have to inverstigate power saveing experiment with long 
>> > > time.
>> > > So [1/2] would be band-aid until that.
>> >
>> > I'm inclined to hold off on 1/2 as well, really.
>>
>> Then, what's your plan?
>
> My plan is to sit here until someone gets down and fully tests and
> fixes laptop-mode.  Making it work properly, reliably and as-designed.
>

I think we should agree on what the goals are first.

> Or perhaps someone wants to make the case that we just don't need it
> any more (SSDs are silent!) and removes it all.
>
>> >
>> > The point of laptop_mode isn't to save power btw - it is to minimise
>> > the frequency with which the disk drive is spun up.  By deferring and
>> > then batching writeout operations, basically.
>>
>> I don't get it. Why should we minimise such frequency?
>
> Because my laptop was going clickety every minute and was keeping me
> awake.
>

Very interesting, I don't know if anyone realized that (or we just forgot) :-)

>> It's for saving the power to increase batter life.
>
> It might well have that effect, dunno.  That wasn't my intent.  Testing
> needed!
>

Power saving is certainly why we had it on originally for ChromeOS,
but we turned it off due to misbehavior.

Specifically, we saw a pathological behavior where we'd end up writing
to the disk every few seconds when laptop mode was turned on.  This
turned out to be because laptop-mode sets a timer which is used to
check for new dirty data after the initial flush and writes that out
before spinning the disk down, and on ChromeOS various chatty daemons
on the system were logging and dirtying data more or less constantly
so there was almost always something there to be written out.  So what
ended up happening was that we'd need to do a read, then wake up the
disk, and then keep writing every few seconds for a long period of
time, which had the opposite effect from what we wanted.  The issues
with zram swap just confirmed that we didn't want laptop mode.

Most of our devices have had SSDs rather than spinning disks, so noise
wasn't an issue, although when we finally did support an official
device with a spinning disk people certainly complained when the disk
started clicking all the time (due to the underflow in the writeback
code).   We do know that current SSDs save a significant amount of
power when they go into standby, so minimizing disk writes is still
useful on these devices.

A very simple laptop mode which only does a single sync when we spin
up the disk, and didn't bother with the timer behavior or muck with
swap behavior might be something that is more useful for us, and I
suspect it might simplify the writeback code somewhat as well.

>> As I real all document about laptop_mode, they all said about the power
>> or battery life saving.
>>
>> 1. Documentation/laptops/laptop-mode.txt
>> 2. http://linux.die.net/man/8/laptop_mode
>> 3. http://samwel.tk/laptop_mode/
>> 3. http://www.thinkwiki.org/wiki/Laptop-mode
>
> Documentation creep ;)
>
> Ten years ago, gad: http://lwn.net/Articles/1652/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Andrew Morton
On Fri, 11 Jan 2013 13:43:27 +0900
Minchan Kim  wrote:

> Hi Andrew,
> 
> On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
> > On Thu, 10 Jan 2013 11:23:06 +0900
> > Minchan Kim  wrote:
> > 
> > > > I have a feeling that laptop mode has bitrotted and these patches are
> > > > kinda hacking around as-yet-not-understood failures...
> > > 
> > > Absolutely, this patch is last guard for unexpectable behavior.
> > > As I mentioned in cover-letter, Luigi's problem could be solved either 
> > > [1/2]
> > > or [2/2] but I wanted to add this as last resort in case of unexpected
> > > emergency. But you're right. It's not good to hide the problem like this 
> > > path
> > > so let's drop [2/2].
> > > 
> > > Also, I absolutely agree it has bitrotted so for correcting it, we need a
> > > volunteer who have to inverstigate power saveing experiment with long 
> > > time.
> > > So [1/2] would be band-aid until that.
> > 
> > I'm inclined to hold off on 1/2 as well, really.
> 
> Then, what's your plan?

My plan is to sit here until someone gets down and fully tests and
fixes laptop-mode.  Making it work properly, reliably and as-designed.

Or perhaps someone wants to make the case that we just don't need it
any more (SSDs are silent!) and removes it all.

> > 
> > The point of laptop_mode isn't to save power btw - it is to minimise
> > the frequency with which the disk drive is spun up.  By deferring and
> > then batching writeout operations, basically.
> 
> I don't get it. Why should we minimise such frequency?

Because my laptop was going clickety every minute and was keeping me
awake.

> It's for saving the power to increase batter life.

It might well have that effect, dunno.  That wasn't my intent.  Testing
needed!

> As I real all document about laptop_mode, they all said about the power
> or battery life saving.
> 
> 1. Documentation/laptops/laptop-mode.txt
> 2. http://linux.die.net/man/8/laptop_mode
> 3. http://samwel.tk/laptop_mode/
> 3. http://www.thinkwiki.org/wiki/Laptop-mode 

Documentation creep ;)

Ten years ago, gad: http://lwn.net/Articles/1652/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Andrew Morton
On Fri, 11 Jan 2013 13:43:27 +0900
Minchan Kim minc...@kernel.org wrote:

 Hi Andrew,
 
 On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
  On Thu, 10 Jan 2013 11:23:06 +0900
  Minchan Kim minc...@kernel.org wrote:
  
I have a feeling that laptop mode has bitrotted and these patches are
kinda hacking around as-yet-not-understood failures...
   
   Absolutely, this patch is last guard for unexpectable behavior.
   As I mentioned in cover-letter, Luigi's problem could be solved either 
   [1/2]
   or [2/2] but I wanted to add this as last resort in case of unexpected
   emergency. But you're right. It's not good to hide the problem like this 
   path
   so let's drop [2/2].
   
   Also, I absolutely agree it has bitrotted so for correcting it, we need a
   volunteer who have to inverstigate power saveing experiment with long 
   time.
   So [1/2] would be band-aid until that.
  
  I'm inclined to hold off on 1/2 as well, really.
 
 Then, what's your plan?

My plan is to sit here until someone gets down and fully tests and
fixes laptop-mode.  Making it work properly, reliably and as-designed.

Or perhaps someone wants to make the case that we just don't need it
any more (SSDs are silent!) and removes it all.

  
  The point of laptop_mode isn't to save power btw - it is to minimise
  the frequency with which the disk drive is spun up.  By deferring and
  then batching writeout operations, basically.
 
 I don't get it. Why should we minimise such frequency?

Because my laptop was going clickety every minute and was keeping me
awake.

 It's for saving the power to increase batter life.

It might well have that effect, dunno.  That wasn't my intent.  Testing
needed!

 As I real all document about laptop_mode, they all said about the power
 or battery life saving.
 
 1. Documentation/laptops/laptop-mode.txt
 2. http://linux.die.net/man/8/laptop_mode
 3. http://samwel.tk/laptop_mode/
 3. http://www.thinkwiki.org/wiki/Laptop-mode 

Documentation creep ;)

Ten years ago, gad: http://lwn.net/Articles/1652/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Sonny Rao
On Tue, Jan 15, 2013 at 4:09 PM, Andrew Morton
a...@linux-foundation.org wrote:
 On Fri, 11 Jan 2013 13:43:27 +0900
 Minchan Kim minc...@kernel.org wrote:

 Hi Andrew,

 On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
  On Thu, 10 Jan 2013 11:23:06 +0900
  Minchan Kim minc...@kernel.org wrote:
 
I have a feeling that laptop mode has bitrotted and these patches are
kinda hacking around as-yet-not-understood failures...
  
   Absolutely, this patch is last guard for unexpectable behavior.
   As I mentioned in cover-letter, Luigi's problem could be solved either 
   [1/2]
   or [2/2] but I wanted to add this as last resort in case of unexpected
   emergency. But you're right. It's not good to hide the problem like this 
   path
   so let's drop [2/2].
  
   Also, I absolutely agree it has bitrotted so for correcting it, we need a
   volunteer who have to inverstigate power saveing experiment with long 
   time.
   So [1/2] would be band-aid until that.
 
  I'm inclined to hold off on 1/2 as well, really.

 Then, what's your plan?

 My plan is to sit here until someone gets down and fully tests and
 fixes laptop-mode.  Making it work properly, reliably and as-designed.


I think we should agree on what the goals are first.

 Or perhaps someone wants to make the case that we just don't need it
 any more (SSDs are silent!) and removes it all.

 
  The point of laptop_mode isn't to save power btw - it is to minimise
  the frequency with which the disk drive is spun up.  By deferring and
  then batching writeout operations, basically.

 I don't get it. Why should we minimise such frequency?

 Because my laptop was going clickety every minute and was keeping me
 awake.


Very interesting, I don't know if anyone realized that (or we just forgot) :-)

 It's for saving the power to increase batter life.

 It might well have that effect, dunno.  That wasn't my intent.  Testing
 needed!


Power saving is certainly why we had it on originally for ChromeOS,
but we turned it off due to misbehavior.

Specifically, we saw a pathological behavior where we'd end up writing
to the disk every few seconds when laptop mode was turned on.  This
turned out to be because laptop-mode sets a timer which is used to
check for new dirty data after the initial flush and writes that out
before spinning the disk down, and on ChromeOS various chatty daemons
on the system were logging and dirtying data more or less constantly
so there was almost always something there to be written out.  So what
ended up happening was that we'd need to do a read, then wake up the
disk, and then keep writing every few seconds for a long period of
time, which had the opposite effect from what we wanted.  The issues
with zram swap just confirmed that we didn't want laptop mode.

Most of our devices have had SSDs rather than spinning disks, so noise
wasn't an issue, although when we finally did support an official
device with a spinning disk people certainly complained when the disk
started clicking all the time (due to the underflow in the writeback
code).   We do know that current SSDs save a significant amount of
power when they go into standby, so minimizing disk writes is still
useful on these devices.

A very simple laptop mode which only does a single sync when we spin
up the disk, and didn't bother with the timer behavior or muck with
swap behavior might be something that is more useful for us, and I
suspect it might simplify the writeback code somewhat as well.

 As I real all document about laptop_mode, they all said about the power
 or battery life saving.

 1. Documentation/laptops/laptop-mode.txt
 2. http://linux.die.net/man/8/laptop_mode
 3. http://samwel.tk/laptop_mode/
 3. http://www.thinkwiki.org/wiki/Laptop-mode

 Documentation creep ;)

 Ten years ago, gad: http://lwn.net/Articles/1652/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Andrew Morton
On Tue, 15 Jan 2013 16:32:38 -0800
Sonny Rao sonny...@google.com wrote:

  It's for saving the power to increase batter life.
 
  It might well have that effect, dunno.  That wasn't my intent.  Testing
  needed!
 
 
 Power saving is certainly why we had it on originally for ChromeOS,
 but we turned it off due to misbehavior.
 
 Specifically, we saw a pathological behavior where we'd end up writing
 to the disk every few seconds when laptop mode was turned on.  This
 turned out to be because laptop-mode sets a timer which is used to
 check for new dirty data after the initial flush and writes that out
 before spinning the disk down, and on ChromeOS various chatty daemons
 on the system were logging and dirtying data more or less constantly
 so there was almost always something there to be written out.  So what
 ended up happening was that we'd need to do a read, then wake up the
 disk, and then keep writing every few seconds for a long period of
 time, which had the opposite effect from what we wanted.

So after the read, the disk would chatter away doing a dribble of
writes?  That sounds like plain brokenness (and why did the chrome guys
not tell anyone about it?!?!?).  The idea is that when the physical
read occurs, we should opportunistically flush out all pending writes,
while the disk is running.  Then go back into
buffer-writes-for-a-long-time mode.

I forget what we did with fsync() and friends.  Quite a lot of
pestiferous applications like to do fsync quite frequently.  I had a
special kernel in which fsync() consisted of return 0;, but ISTR
there being some resistance to productizing that idea.

  The issues
 with zram swap just confirmed that we didn't want laptop mode.

 Most of our devices have had SSDs rather than spinning disks, so noise
 wasn't an issue, although when we finally did support an official
 device with a spinning disk people certainly complained when the disk
 started clicking all the time

hm, it's interesting that the general idea still has vailidity.  It
would be a fun project for someone to sniff out all the requirements,
fixup/enhance/rewrite the current implementation and generally make it
all spiffy and nice.

 (due to the underflow in the writeback code).

To what underflow do you refer?

 We do know that current SSDs save a significant amount of
 power when they go into standby, so minimizing disk writes is still
 useful on these devices.
 
 A very simple laptop mode which only does a single sync when we spin
 up the disk, and didn't bother with the timer behavior or muck with
 swap behavior might be something that is more useful for us, and I
 suspect it might simplify the writeback code somewhat as well.

I don't think I understand the problem with the timer.  My original RFC
said

: laptop_writeback_centisecs
: --
: 
: This tunable determines the maximum age of dirty data when the machine
: is operating in Laptop mode.  The default value is 3 - five
: minutes.  This means that if applications are generating a small amount
: of write traffic, the disk will spin up once per five minutes.
: 
: If the disk is spun up for any other reason (such as for a read) then
: all dirty data will be flushed anyway, and this timer is reset to zero.

which all sounds very sensible and shouldn't exhibit the behavior you
observed.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Sonny Rao
On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
a...@linux-foundation.org wrote:
 On Tue, 15 Jan 2013 16:32:38 -0800
 Sonny Rao sonny...@google.com wrote:

  It's for saving the power to increase batter life.
 
  It might well have that effect, dunno.  That wasn't my intent.  Testing
  needed!
 

 Power saving is certainly why we had it on originally for ChromeOS,
 but we turned it off due to misbehavior.

 Specifically, we saw a pathological behavior where we'd end up writing
 to the disk every few seconds when laptop mode was turned on.  This
 turned out to be because laptop-mode sets a timer which is used to
 check for new dirty data after the initial flush and writes that out
 before spinning the disk down, and on ChromeOS various chatty daemons
 on the system were logging and dirtying data more or less constantly
 so there was almost always something there to be written out.  So what
 ended up happening was that we'd need to do a read, then wake up the
 disk, and then keep writing every few seconds for a long period of
 time, which had the opposite effect from what we wanted.

 So after the read, the disk would chatter away doing a dribble of
 writes?  That sounds like plain brokenness (and why did the chrome guys
 not tell anyone about it?!?!?).

Yes, either read or fsync.  I ranted about it a little (here:
http://marc.info/?l=linux-mmm=135422986220016w=4), but mostly
assumed it was working as expected, and that ChromeOS was just
dirtying data at an absurd pace.  Might have been a bad assumption and
I could have been more explicit about reporting it, sorry about that.

 The idea is that when the physical
 read occurs, we should opportunistically flush out all pending writes,
 while the disk is running.  Then go back into
 buffer-writes-for-a-long-time mode.


See the comment in page-writeback.c above laptop_io_completion():

/*
 * We've spun up the disk and we're in laptop mode: schedule writeback
 * of all dirty data a few seconds from now.  If the flush is already
scheduled
 * then push it back - the user is still using the disk.
 */
void laptop_io_completion(struct backing_dev_info *info)

What ends up happening fairly often is that there's always something
dirty with that few seconds (or even one second) on our system.

 I forget what we did with fsync() and friends.  Quite a lot of
 pestiferous applications like to do fsync quite frequently.  I had a
 special kernel in which fsync() consisted of return 0;, but ISTR
 there being some resistance to productizing that idea.


Yeah, we have this problem and we try to fix up users of fsync() as we
find them but it's a bit of a never-ending battle.  Such a feature
would be useful.

  The issues
 with zram swap just confirmed that we didn't want laptop mode.

 Most of our devices have had SSDs rather than spinning disks, so noise
 wasn't an issue, although when we finally did support an official
 device with a spinning disk people certainly complained when the disk
 started clicking all the time

 hm, it's interesting that the general idea still has vailidity.  It
 would be a fun project for someone to sniff out all the requirements,
 fixup/enhance/rewrite the current implementation and generally make it
 all spiffy and nice.

 (due to the underflow in the writeback code).

 To what underflow do you refer?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754

That particular bug caused writes to happen almost instantly after the
underflow ocurred, and consequently slowed write throughput to a crawl
because there was no chance for contiguous writes to gather.

 We do know that current SSDs save a significant amount of
 power when they go into standby, so minimizing disk writes is still
 useful on these devices.

 A very simple laptop mode which only does a single sync when we spin
 up the disk, and didn't bother with the timer behavior or muck with
 swap behavior might be something that is more useful for us, and I
 suspect it might simplify the writeback code somewhat as well.

 I don't think I understand the problem with the timer.  My original RFC
 said

 : laptop_writeback_centisecs
 : --
 :
 : This tunable determines the maximum age of dirty data when the machine
 : is operating in Laptop mode.  The default value is 3 - five
 : minutes.  This means that if applications are generating a small amount
 : of write traffic, the disk will spin up once per five minutes.
 :
 : If the disk is spun up for any other reason (such as for a read) then
 : all dirty data will be flushed anyway, and this timer is reset to zero.

 which all sounds very sensible and shouldn't exhibit the behavior you
 observed.


The laptop-mode timer get re-armed after each writeback (see above
laptop_io_completion function), even if it was caused by laptop-mode
itself.  So, if something is continually dirtying a little bit of
data, we end up getting a chain of small writes which keeps the disk

Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Minchan Kim
On Tue, Jan 15, 2013 at 04:09:57PM -0800, Andrew Morton wrote:
 On Fri, 11 Jan 2013 13:43:27 +0900
 Minchan Kim minc...@kernel.org wrote:
 
  Hi Andrew,
  
  On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
   On Thu, 10 Jan 2013 11:23:06 +0900
   Minchan Kim minc...@kernel.org wrote:
   
 I have a feeling that laptop mode has bitrotted and these patches are
 kinda hacking around as-yet-not-understood failures...

Absolutely, this patch is last guard for unexpectable behavior.
As I mentioned in cover-letter, Luigi's problem could be solved either 
[1/2]
or [2/2] but I wanted to add this as last resort in case of unexpected
emergency. But you're right. It's not good to hide the problem like 
this path
so let's drop [2/2].

Also, I absolutely agree it has bitrotted so for correcting it, we need 
a
volunteer who have to inverstigate power saveing experiment with long 
time.
So [1/2] would be band-aid until that.
   
   I'm inclined to hold off on 1/2 as well, really.
  
  Then, what's your plan?
 
 My plan is to sit here until someone gets down and fully tests and
 fixes laptop-mode.  Making it work properly, reliably and as-designed.
 
 Or perhaps someone wants to make the case that we just don't need it
 any more (SSDs are silent!) and removes it all.
 
   
   The point of laptop_mode isn't to save power btw - it is to minimise
   the frequency with which the disk drive is spun up.  By deferring and
   then batching writeout operations, basically.
  
  I don't get it. Why should we minimise such frequency?
 
 Because my laptop was going clickety every minute and was keeping me
 awake.
 
  It's for saving the power to increase batter life.
 
 It might well have that effect, dunno.  That wasn't my intent.  Testing
 needed!
 
  As I real all document about laptop_mode, they all said about the power
  or battery life saving.
  
  1. Documentation/laptops/laptop-mode.txt
  2. http://linux.die.net/man/8/laptop_mode
  3. http://samwel.tk/laptop_mode/
  3. http://www.thinkwiki.org/wiki/Laptop-mode 
 
 Documentation creep ;)
 
 Ten years ago, gad: http://lwn.net/Articles/1652/

Odd, I grep it in linux-history.git and found this.
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=93d33a4885a483c708ccb7d24b56e0d5fef7bcab

It seem to be first commit about laptop_mode but it still said about battery 
life
, NOT clickety. But unfortunately, it had no number, measure method and even no
side-effect when the memory pressure is severe so we couldn't sure how it helped
about batter life without reclaim problem so the VM problem have been exported
since we apply f80c067[mm: zone_reclaim: make isolate_lru_page() filter-aware].

So let's apply [1/2] in mainline and even stable to fix the problem.
After that, we can add warning to laptop_mode so user who have used it will 
claim their
requirements. With it, we can know they need it for power saving, clickety or
, both so we can make requirement lists. From then, we can start to do someting.
If we are luck, we can remove it totally if any user doesn't claim.

What do you think about it?

 
 --
 To unsubscribe, send a message with 'unsubscribe linux-mm' in
 the body to majord...@kvack.org.  For more info on Linux MM,
 see: http://www.linux-mm.org/ .
 Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-15 Thread Minchan Kim
On Tue, Jan 15, 2013 at 05:21:15PM -0800, Sonny Rao wrote:
 On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
 a...@linux-foundation.org wrote:
  On Tue, 15 Jan 2013 16:32:38 -0800
  Sonny Rao sonny...@google.com wrote:
 
   It's for saving the power to increase batter life.
  
   It might well have that effect, dunno.  That wasn't my intent.  Testing
   needed!
  
 
  Power saving is certainly why we had it on originally for ChromeOS,
  but we turned it off due to misbehavior.
 
  Specifically, we saw a pathological behavior where we'd end up writing
  to the disk every few seconds when laptop mode was turned on.  This
  turned out to be because laptop-mode sets a timer which is used to
  check for new dirty data after the initial flush and writes that out
  before spinning the disk down, and on ChromeOS various chatty daemons
  on the system were logging and dirtying data more or less constantly
  so there was almost always something there to be written out.  So what
  ended up happening was that we'd need to do a read, then wake up the
  disk, and then keep writing every few seconds for a long period of
  time, which had the opposite effect from what we wanted.
 
  So after the read, the disk would chatter away doing a dribble of
  writes?  That sounds like plain brokenness (and why did the chrome guys
  not tell anyone about it?!?!?).
 
 Yes, either read or fsync.  I ranted about it a little (here:
 http://marc.info/?l=linux-mmm=135422986220016w=4), but mostly
 assumed it was working as expected, and that ChromeOS was just
 dirtying data at an absurd pace.  Might have been a bad assumption and
 I could have been more explicit about reporting it, sorry about that.
 
  The idea is that when the physical
  read occurs, we should opportunistically flush out all pending writes,
  while the disk is running.  Then go back into
  buffer-writes-for-a-long-time mode.
 
 
 See the comment in page-writeback.c above laptop_io_completion():
 
 /*
  * We've spun up the disk and we're in laptop mode: schedule writeback
  * of all dirty data a few seconds from now.  If the flush is already
 scheduled
  * then push it back - the user is still using the disk.
  */
 void laptop_io_completion(struct backing_dev_info *info)
 
 What ends up happening fairly often is that there's always something
 dirty with that few seconds (or even one second) on our system.
 
  I forget what we did with fsync() and friends.  Quite a lot of
  pestiferous applications like to do fsync quite frequently.  I had a
  special kernel in which fsync() consisted of return 0;, but ISTR
  there being some resistance to productizing that idea.
 
 
 Yeah, we have this problem and we try to fix up users of fsync() as we
 find them but it's a bit of a never-ending battle.  Such a feature
 would be useful.
 
   The issues
  with zram swap just confirmed that we didn't want laptop mode.
 
  Most of our devices have had SSDs rather than spinning disks, so noise
  wasn't an issue, although when we finally did support an official
  device with a spinning disk people certainly complained when the disk
  started clicking all the time
 
  hm, it's interesting that the general idea still has vailidity.  It
  would be a fun project for someone to sniff out all the requirements,
  fixup/enhance/rewrite the current implementation and generally make it
  all spiffy and nice.
 
  (due to the underflow in the writeback code).
 
  To what underflow do you refer?
 
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754
 
 That particular bug caused writes to happen almost instantly after the
 underflow ocurred, and consequently slowed write throughput to a crawl
 because there was no chance for contiguous writes to gather.
 
  We do know that current SSDs save a significant amount of
  power when they go into standby, so minimizing disk writes is still
  useful on these devices.
 
  A very simple laptop mode which only does a single sync when we spin
  up the disk, and didn't bother with the timer behavior or muck with
  swap behavior might be something that is more useful for us, and I
  suspect it might simplify the writeback code somewhat as well.
 
  I don't think I understand the problem with the timer.  My original RFC
  said
 
  : laptop_writeback_centisecs
  : --
  :
  : This tunable determines the maximum age of dirty data when the machine
  : is operating in Laptop mode.  The default value is 3 - five
  : minutes.  This means that if applications are generating a small amount
  : of write traffic, the disk will spin up once per five minutes.
  :
  : If the disk is spun up for any other reason (such as for a read) then
  : all dirty data will be flushed anyway, and this timer is reset to zero.
 
  which all sounds very sensible and shouldn't exhibit the behavior you
  observed.
 
 
 The laptop-mode timer get re-armed after each writeback (see above
 laptop_io_completion 

Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-10 Thread Minchan Kim
Hi Andrew,

On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
> On Thu, 10 Jan 2013 11:23:06 +0900
> Minchan Kim  wrote:
> 
> > > I have a feeling that laptop mode has bitrotted and these patches are
> > > kinda hacking around as-yet-not-understood failures...
> > 
> > Absolutely, this patch is last guard for unexpectable behavior.
> > As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
> > or [2/2] but I wanted to add this as last resort in case of unexpected
> > emergency. But you're right. It's not good to hide the problem like this 
> > path
> > so let's drop [2/2].
> > 
> > Also, I absolutely agree it has bitrotted so for correcting it, we need a
> > volunteer who have to inverstigate power saveing experiment with long time.
> > So [1/2] would be band-aid until that.
> 
> I'm inclined to hold off on 1/2 as well, really.

Then, what's your plan?

It's real bug since f80c067[mm: zone_reclaim: make isolate_lru_page() 
filter-aware]
was introduced. Some portable device could use laptop_mode to save batter power.
AFAIK, the usecase was trial of ChromeOS and Luigi reported this problem 
although
they decided to disable laptop_mode due to other reason which laptop_mode burns 
out
power for a very long time in their some workload.

Another problem of laptop_mode isn't aware of in-memory swap, like zram.
So unconditionally, prevent to swap out. :( Yeb. it's another story to be fixed.

If you hate this version, how about this?
This version does following as.

1. get_scan_count forces only file-backed pages reclaiming if may_writepage is 
false.
   It prevents unnecessary CPU consumption and LRU churing with anon pages.
2. If memory reclaim suffers(ie, below DEF_PRIORITY - 2), may_writepage would 
be true
   in only direct reclaim path.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 73b64a3..695b907 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -550,6 +550,8 @@ static inline int zone_is_oom_locked(const struct zone 
*zone)
  */
 #define DEF_PRIORITY 12
 
+#define HARD_TO_RECLAIM_PRIO (DEF_PRIORITY - 2)
+
 /* Maximum number of zones on a zonelist */
 #define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ff869d2..4c63bda 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -814,7 +814,7 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
 */
if (page_is_file_cache(page) &&
(!current_is_kswapd() ||
-sc->priority >= DEF_PRIORITY - 2)) {
+sc->priority >= HARD_TO_RECLAIM_PRIO)) 
{
/*
 * Immediately reclaim when written back.
 * Similar in principal to deactivate_page()
@@ -1683,8 +1683,11 @@ static void get_scan_count(struct lruvec *lruvec, struct 
scan_control *sc,
if (!global_reclaim(sc))
force_scan = true;
 
-   /* If we have no swap space, do not bother scanning anon pages. */
-   if (!sc->may_swap || (nr_swap_pages <= 0)) {
+   /*
+* If we have no swap space or may_writepage is false,
+* do not bother scanning anon pages.
+*/
+   if (!sc->may_swap || !sc->may_writepage || (nr_swap_pages <= 0)) {
scan_balance = SCAN_FILE;
goto out;
}
@@ -1879,7 +1882,7 @@ static bool in_reclaim_compaction(struct scan_control *sc)
 {
if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
-sc->priority < DEF_PRIORITY - 2))
+sc->priority < HARD_TO_RECLAIM_PRIO))
return true;
 
return false;
@@ -2215,9 +2218,16 @@ static unsigned long do_try_to_free_pages(struct 
zonelist *zonelist,
sc->may_writepage = 1;
}
 
+   /*
+* This is a safety belt to prevent OOM kill through reclaiming
+* pages with sacrificing the power.
+*/
+   if (sc->priority < HARD_TO_RECLAIM_PRIO)
+   sc->may_writepage = 1;
+
/* Take a nap, wait for some writeback to complete */
if (!sc->hibernation_mode && sc->nr_scanned &&
-   sc->priority < DEF_PRIORITY - 2) {
+   sc->priority < HARD_TO_RECLAIM_PRIO) {
struct zone *preferred_zone;
 
first_zones_zonelist(zonelist, gfp_zone(sc->gfp_mask),
@@ -2824,7 +2834,7 @@ loop_again:
 * OK, kswapd is getting into trouble.  Take a nap, then take
 * another pass across the zones.
 */
-   if (total_scanned && (sc.priority < DEF_PRIORITY - 2)) {
+   if (total_scanned && 

Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-10 Thread Andrew Morton
On Thu, 10 Jan 2013 11:23:06 +0900
Minchan Kim  wrote:

> > I have a feeling that laptop mode has bitrotted and these patches are
> > kinda hacking around as-yet-not-understood failures...
> 
> Absolutely, this patch is last guard for unexpectable behavior.
> As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
> or [2/2] but I wanted to add this as last resort in case of unexpected
> emergency. But you're right. It's not good to hide the problem like this path
> so let's drop [2/2].
> 
> Also, I absolutely agree it has bitrotted so for correcting it, we need a
> volunteer who have to inverstigate power saveing experiment with long time.
> So [1/2] would be band-aid until that.

I'm inclined to hold off on 1/2 as well, really.

The point of laptop_mode isn't to save power btw - it is to minimise
the frequency with which the disk drive is spun up.  By deferring and
then batching writeout operations, basically.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-10 Thread Andrew Morton
On Thu, 10 Jan 2013 11:23:06 +0900
Minchan Kim minc...@kernel.org wrote:

  I have a feeling that laptop mode has bitrotted and these patches are
  kinda hacking around as-yet-not-understood failures...
 
 Absolutely, this patch is last guard for unexpectable behavior.
 As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
 or [2/2] but I wanted to add this as last resort in case of unexpected
 emergency. But you're right. It's not good to hide the problem like this path
 so let's drop [2/2].
 
 Also, I absolutely agree it has bitrotted so for correcting it, we need a
 volunteer who have to inverstigate power saveing experiment with long time.
 So [1/2] would be band-aid until that.

I'm inclined to hold off on 1/2 as well, really.

The point of laptop_mode isn't to save power btw - it is to minimise
the frequency with which the disk drive is spun up.  By deferring and
then batching writeout operations, basically.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-10 Thread Minchan Kim
Hi Andrew,

On Thu, Jan 10, 2013 at 01:58:28PM -0800, Andrew Morton wrote:
 On Thu, 10 Jan 2013 11:23:06 +0900
 Minchan Kim minc...@kernel.org wrote:
 
   I have a feeling that laptop mode has bitrotted and these patches are
   kinda hacking around as-yet-not-understood failures...
  
  Absolutely, this patch is last guard for unexpectable behavior.
  As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
  or [2/2] but I wanted to add this as last resort in case of unexpected
  emergency. But you're right. It's not good to hide the problem like this 
  path
  so let's drop [2/2].
  
  Also, I absolutely agree it has bitrotted so for correcting it, we need a
  volunteer who have to inverstigate power saveing experiment with long time.
  So [1/2] would be band-aid until that.
 
 I'm inclined to hold off on 1/2 as well, really.

Then, what's your plan?

It's real bug since f80c067[mm: zone_reclaim: make isolate_lru_page() 
filter-aware]
was introduced. Some portable device could use laptop_mode to save batter power.
AFAIK, the usecase was trial of ChromeOS and Luigi reported this problem 
although
they decided to disable laptop_mode due to other reason which laptop_mode burns 
out
power for a very long time in their some workload.

Another problem of laptop_mode isn't aware of in-memory swap, like zram.
So unconditionally, prevent to swap out. :( Yeb. it's another story to be fixed.

If you hate this version, how about this?
This version does following as.

1. get_scan_count forces only file-backed pages reclaiming if may_writepage is 
false.
   It prevents unnecessary CPU consumption and LRU churing with anon pages.
2. If memory reclaim suffers(ie, below DEF_PRIORITY - 2), may_writepage would 
be true
   in only direct reclaim path.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 73b64a3..695b907 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -550,6 +550,8 @@ static inline int zone_is_oom_locked(const struct zone 
*zone)
  */
 #define DEF_PRIORITY 12
 
+#define HARD_TO_RECLAIM_PRIO (DEF_PRIORITY - 2)
+
 /* Maximum number of zones on a zonelist */
 #define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ff869d2..4c63bda 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -814,7 +814,7 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
 */
if (page_is_file_cache(page) 
(!current_is_kswapd() ||
-sc-priority = DEF_PRIORITY - 2)) {
+sc-priority = HARD_TO_RECLAIM_PRIO)) 
{
/*
 * Immediately reclaim when written back.
 * Similar in principal to deactivate_page()
@@ -1683,8 +1683,11 @@ static void get_scan_count(struct lruvec *lruvec, struct 
scan_control *sc,
if (!global_reclaim(sc))
force_scan = true;
 
-   /* If we have no swap space, do not bother scanning anon pages. */
-   if (!sc-may_swap || (nr_swap_pages = 0)) {
+   /*
+* If we have no swap space or may_writepage is false,
+* do not bother scanning anon pages.
+*/
+   if (!sc-may_swap || !sc-may_writepage || (nr_swap_pages = 0)) {
scan_balance = SCAN_FILE;
goto out;
}
@@ -1879,7 +1882,7 @@ static bool in_reclaim_compaction(struct scan_control *sc)
 {
if (IS_ENABLED(CONFIG_COMPACTION)  sc-order 
(sc-order  PAGE_ALLOC_COSTLY_ORDER ||
-sc-priority  DEF_PRIORITY - 2))
+sc-priority  HARD_TO_RECLAIM_PRIO))
return true;
 
return false;
@@ -2215,9 +2218,16 @@ static unsigned long do_try_to_free_pages(struct 
zonelist *zonelist,
sc-may_writepage = 1;
}
 
+   /*
+* This is a safety belt to prevent OOM kill through reclaiming
+* pages with sacrificing the power.
+*/
+   if (sc-priority  HARD_TO_RECLAIM_PRIO)
+   sc-may_writepage = 1;
+
/* Take a nap, wait for some writeback to complete */
if (!sc-hibernation_mode  sc-nr_scanned 
-   sc-priority  DEF_PRIORITY - 2) {
+   sc-priority  HARD_TO_RECLAIM_PRIO) {
struct zone *preferred_zone;
 
first_zones_zonelist(zonelist, gfp_zone(sc-gfp_mask),
@@ -2824,7 +2834,7 @@ loop_again:
 * OK, kswapd is getting into trouble.  Take a nap, then take
 * another pass across the zones.
 */
-   if (total_scanned  (sc.priority  DEF_PRIORITY - 2)) {
+   if (total_scanned  (sc.priority  HARD_TO_RECLAIM_PRIO)) {
if 

Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-09 Thread Minchan Kim
On Wed, Jan 09, 2013 at 04:26:02PM -0800, Andrew Morton wrote:
> On Wed,  9 Jan 2013 15:21:14 +0900
> Minchan Kim  wrote:
> 
> > If laptop_mode is enable, VM try to avoid I/O for saving the power.
> > But if there isn't reclaimable memory without I/O, we should do I/O
> > for preventing unnecessary OOM kill although we sacrifices power.
> > 
> > One of example is that we are out of page cache. Remained one is
> > only anonymous pages, for swapping out, we needs may_writepage = 1.
> > 
> > Reported-by: Luigi Semenzato 
> > Signed-off-by: Minchan Kim 
> > ---
> >  mm/vmscan.c |6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 439cc47..624c816 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, 
> > struct scan_control *sc,
> > free = zone_page_state(zone, NR_FREE_PAGES);
> > if (unlikely(file + free <= high_wmark_pages(zone))) {
> > scan_balance = SCAN_ANON;
> > +   /*
> > +* From now on, we have to swap out
> > +* for peventing OOM kill although
> > +* we sacrifice power consumption.
> > +*/
> > +   sc->may_writepage = 1;
> > goto out;
> > }
> > }
> 
> This is pretty ugly.  get_scan_count() is, as its name implies, an
> idempotent function which inspects the state of things and returns a
> result.  As such, it has no business going in and altering the state of
> the scan_control.
> 
> We have code in both direct reclaim and in kswapd to set may_writepage
> if vmscan is getting into trouble.  I don't see why adding another
> instance is necessary if the existing instances are working correctly.
> 
> 
> 
> (Is it correct that __zone_reclaim() ignores laptop_mode?)
> 
> 
> I have a feeling that laptop mode has bitrotted and these patches are
> kinda hacking around as-yet-not-understood failures...

Absolutely, this patch is last guard for unexpectable behavior.
As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
or [2/2] but I wanted to add this as last resort in case of unexpected
emergency. But you're right. It's not good to hide the problem like this path
so let's drop [2/2].

Also, I absolutely agree it has bitrotted so for correcting it, we need a
volunteer who have to inverstigate power saveing experiment with long time.
So [1/2] would be band-aid until that.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-09 Thread Andrew Morton
On Wed,  9 Jan 2013 15:21:14 +0900
Minchan Kim  wrote:

> If laptop_mode is enable, VM try to avoid I/O for saving the power.
> But if there isn't reclaimable memory without I/O, we should do I/O
> for preventing unnecessary OOM kill although we sacrifices power.
> 
> One of example is that we are out of page cache. Remained one is
> only anonymous pages, for swapping out, we needs may_writepage = 1.
> 
> Reported-by: Luigi Semenzato 
> Signed-off-by: Minchan Kim 
> ---
>  mm/vmscan.c |6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 439cc47..624c816 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, 
> struct scan_control *sc,
>   free = zone_page_state(zone, NR_FREE_PAGES);
>   if (unlikely(file + free <= high_wmark_pages(zone))) {
>   scan_balance = SCAN_ANON;
> + /*
> +  * From now on, we have to swap out
> +  * for peventing OOM kill although
> +  * we sacrifice power consumption.
> +  */
> + sc->may_writepage = 1;
>   goto out;
>   }
>   }

This is pretty ugly.  get_scan_count() is, as its name implies, an
idempotent function which inspects the state of things and returns a
result.  As such, it has no business going in and altering the state of
the scan_control.

We have code in both direct reclaim and in kswapd to set may_writepage
if vmscan is getting into trouble.  I don't see why adding another
instance is necessary if the existing instances are working correctly.



(Is it correct that __zone_reclaim() ignores laptop_mode?)


I have a feeling that laptop mode has bitrotted and these patches are
kinda hacking around as-yet-not-understood failures...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-09 Thread Andrew Morton
On Wed,  9 Jan 2013 15:21:14 +0900
Minchan Kim minc...@kernel.org wrote:

 If laptop_mode is enable, VM try to avoid I/O for saving the power.
 But if there isn't reclaimable memory without I/O, we should do I/O
 for preventing unnecessary OOM kill although we sacrifices power.
 
 One of example is that we are out of page cache. Remained one is
 only anonymous pages, for swapping out, we needs may_writepage = 1.
 
 Reported-by: Luigi Semenzato semenz...@google.com
 Signed-off-by: Minchan Kim minc...@kernel.org
 ---
  mm/vmscan.c |6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/mm/vmscan.c b/mm/vmscan.c
 index 439cc47..624c816 100644
 --- a/mm/vmscan.c
 +++ b/mm/vmscan.c
 @@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, 
 struct scan_control *sc,
   free = zone_page_state(zone, NR_FREE_PAGES);
   if (unlikely(file + free = high_wmark_pages(zone))) {
   scan_balance = SCAN_ANON;
 + /*
 +  * From now on, we have to swap out
 +  * for peventing OOM kill although
 +  * we sacrifice power consumption.
 +  */
 + sc-may_writepage = 1;
   goto out;
   }
   }

This is pretty ugly.  get_scan_count() is, as its name implies, an
idempotent function which inspects the state of things and returns a
result.  As such, it has no business going in and altering the state of
the scan_control.

We have code in both direct reclaim and in kswapd to set may_writepage
if vmscan is getting into trouble.  I don't see why adding another
instance is necessary if the existing instances are working correctly.



(Is it correct that __zone_reclaim() ignores laptop_mode?)


I have a feeling that laptop mode has bitrotted and these patches are
kinda hacking around as-yet-not-understood failures...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-09 Thread Minchan Kim
On Wed, Jan 09, 2013 at 04:26:02PM -0800, Andrew Morton wrote:
 On Wed,  9 Jan 2013 15:21:14 +0900
 Minchan Kim minc...@kernel.org wrote:
 
  If laptop_mode is enable, VM try to avoid I/O for saving the power.
  But if there isn't reclaimable memory without I/O, we should do I/O
  for preventing unnecessary OOM kill although we sacrifices power.
  
  One of example is that we are out of page cache. Remained one is
  only anonymous pages, for swapping out, we needs may_writepage = 1.
  
  Reported-by: Luigi Semenzato semenz...@google.com
  Signed-off-by: Minchan Kim minc...@kernel.org
  ---
   mm/vmscan.c |6 ++
   1 file changed, 6 insertions(+)
  
  diff --git a/mm/vmscan.c b/mm/vmscan.c
  index 439cc47..624c816 100644
  --- a/mm/vmscan.c
  +++ b/mm/vmscan.c
  @@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, 
  struct scan_control *sc,
  free = zone_page_state(zone, NR_FREE_PAGES);
  if (unlikely(file + free = high_wmark_pages(zone))) {
  scan_balance = SCAN_ANON;
  +   /*
  +* From now on, we have to swap out
  +* for peventing OOM kill although
  +* we sacrifice power consumption.
  +*/
  +   sc-may_writepage = 1;
  goto out;
  }
  }
 
 This is pretty ugly.  get_scan_count() is, as its name implies, an
 idempotent function which inspects the state of things and returns a
 result.  As such, it has no business going in and altering the state of
 the scan_control.
 
 We have code in both direct reclaim and in kswapd to set may_writepage
 if vmscan is getting into trouble.  I don't see why adding another
 instance is necessary if the existing instances are working correctly.
 
 
 
 (Is it correct that __zone_reclaim() ignores laptop_mode?)
 
 
 I have a feeling that laptop mode has bitrotted and these patches are
 kinda hacking around as-yet-not-understood failures...

Absolutely, this patch is last guard for unexpectable behavior.
As I mentioned in cover-letter, Luigi's problem could be solved either [1/2]
or [2/2] but I wanted to add this as last resort in case of unexpected
emergency. But you're right. It's not good to hide the problem like this path
so let's drop [2/2].

Also, I absolutely agree it has bitrotted so for correcting it, we need a
volunteer who have to inverstigate power saveing experiment with long time.
So [1/2] would be band-aid until that.

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-08 Thread Minchan Kim
If laptop_mode is enable, VM try to avoid I/O for saving the power.
But if there isn't reclaimable memory without I/O, we should do I/O
for preventing unnecessary OOM kill although we sacrifices power.

One of example is that we are out of page cache. Remained one is
only anonymous pages, for swapping out, we needs may_writepage = 1.

Reported-by: Luigi Semenzato 
Signed-off-by: Minchan Kim 
---
 mm/vmscan.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 439cc47..624c816 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, struct 
scan_control *sc,
free = zone_page_state(zone, NR_FREE_PAGES);
if (unlikely(file + free <= high_wmark_pages(zone))) {
scan_balance = SCAN_ANON;
+   /*
+* From now on, we have to swap out
+* for peventing OOM kill although
+* we sacrifice power consumption.
+*/
+   sc->may_writepage = 1;
goto out;
}
}
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-08 Thread Minchan Kim
If laptop_mode is enable, VM try to avoid I/O for saving the power.
But if there isn't reclaimable memory without I/O, we should do I/O
for preventing unnecessary OOM kill although we sacrifices power.

One of example is that we are out of page cache. Remained one is
only anonymous pages, for swapping out, we needs may_writepage = 1.

Reported-by: Luigi Semenzato semenz...@google.com
Signed-off-by: Minchan Kim minc...@kernel.org
---
 mm/vmscan.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 439cc47..624c816 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, struct 
scan_control *sc,
free = zone_page_state(zone, NR_FREE_PAGES);
if (unlikely(file + free = high_wmark_pages(zone))) {
scan_balance = SCAN_ANON;
+   /*
+* From now on, we have to swap out
+* for peventing OOM kill although
+* we sacrifice power consumption.
+*/
+   sc-may_writepage = 1;
goto out;
}
}
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/