Re: [PATCH v2 01/16] FS: Added demand paging markers to filesystem

2012-05-09 Thread Arnd Bergmann
On Wednesday 09 May 2012, Christoph Hellwig wrote:
> On Wed, May 09, 2012 at 01:59:40PM +, Arnd Bergmann wrote:
> > My feeling is that we should just treat every (REQ_SYNC | REQ_READ)
> > request the same and let them interrupt long-running writes,
> > independent of whether it's REQ_META or demand paging.
> 
> It's funny that the CFQ scheduler used to boost metadata reads that
> have REQ_META set - in fact it still does for those filesystems using
> the now split out REQ_PRIO.

That certainly sounds more sensible than the opposite.

Of course, this is somewhat unrelated to the question of prioritizing
reads over any writes that are already started. IMHO it would be
pointless to only stop the write in order to do a REQ_PRIO read but
not any other read.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 01/16] FS: Added demand paging markers to filesystem

2012-05-09 Thread Vivek Goyal
On Mon, May 07, 2012 at 10:16:30PM +0530, S, Venkatraman wrote:

[..]
> This feature doesn't fiddle with the I/O scheduler's ability to balance
> read vs write requests or handling requests from various process queues (CFQ).
> 

Does this feature work with CFQ? CFQ does not submit sync IO (for
idling queues) while async IO is pending, and vice versa (see cfq_may_dispatch()).
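
To make the concern concrete, here is a simplified user-space model of the
gating rule described above. It is only a sketch of that description, not the
code of cfq_may_dispatch(); the struct, the field names and the ex_ prefix are
all made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical scheduler state: requests currently handed to the device. */
struct ex_sched_state {
    unsigned int sync_in_flight;
    unsigned int async_in_flight;
};

/*
 * Returns true if a request from the given queue may be dispatched now.
 * Models only the two conditions mentioned in the mail.
 */
static bool ex_may_dispatch(const struct ex_sched_state *s,
                            bool queue_is_sync, bool queue_idles)
{
    /* Hold back an idling sync queue while async requests are pending. */
    if (queue_is_sync && queue_idles && s->async_in_flight > 0)
        return false;

    /* Hold back async requests while sync requests are pending. */
    if (!queue_is_sync && s->sync_in_flight > 0)
        return false;

    return true;
}
```

Under this model a sync read that arrives while an async write is in flight is
not dispatched at all, so a driver-level preemption mechanism would never see it.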

Thanks
Vivek


Re: [PATCH v2 01/16] FS: Added demand paging markers to filesystem

2012-05-09 Thread Christoph Hellwig
On Wed, May 09, 2012 at 01:59:40PM +, Arnd Bergmann wrote:
> My feeling is that we should just treat every (REQ_SYNC | REQ_READ)
> request the same and let them interrupt long-running writes,
> independent of whether it's REQ_META or demand paging.

It's funny that the CFQ scheduler used to boost metadata reads that
have REQ_META set - in fact it still does for those filesystems using
the now split out REQ_PRIO.



Re: [PATCH v2 01/16] FS: Added demand paging markers to filesystem

2012-05-09 Thread Arnd Bergmann
On Wednesday 09 May 2012, Dave Chinner wrote:
> > In low-end flash devices, some requests might take much longer than normal
> > due to background device maintenance (i.e. flash erase / reclaim procedure)
> > kicking in in the context of an ongoing write, stalling them by several
> > orders of magnitude.
> 
> And thereby stalling what might be writes critical to operation.
> Indeed, how does this affect the system when it starts swapping
> heavily? If you keep stalling writes, the system won't be able to
> swap and free memory...

The point here is that reads have a consistent latency, e.g. 500
microseconds for a small access, while writes have a latency
that can easily become 1000x the read latency (e.g. 500 ms of
blocking the device) depending on the state of the device. Most
of the time, writes are fast as well, but sometimes (when garbage
collection happens in the device), they are extremely slow and
block everything else.
This is the only time we ever want to interrupt a write: keeping
the system running interactively while eventually getting to do
the writeback. There is a small penalty for interrupting the garbage
collection, but the device should be able to pick up its work
at the point where we interrupt it, so we can still make forward
progress.
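
As a worked example of the numbers above, the following user-space sketch
compares how long a read waits behind a slow write with and without
interruption. The helper name and the interrupt_cost_us parameter are
assumptions for illustration; the thread gives no figure for the cost of
interrupting a write.

```c
#include <stdint.h>

/* Figures quoted in the mail, in microseconds. */
#define EX_READ_LATENCY_US 500u     /* small read access */
#define EX_SLOW_WRITE_US   500000u  /* write hit by garbage collection */

/*
 * Total time until a queued read completes, given how much of the
 * running write is left. interrupt_cost_us is a hypothetical penalty
 * for stopping the write; it is only used when can_interrupt is set.
 */
static uint32_t ex_read_wait_us(uint32_t write_remaining_us,
                                int can_interrupt,
                                uint32_t interrupt_cost_us)
{
    if (can_interrupt)
        return interrupt_cost_us + EX_READ_LATENCY_US;
    return write_remaining_us + EX_READ_LATENCY_US;
}
```

With the quoted figures, EX_SLOW_WRITE_US / EX_READ_LATENCY_US is 1000, and a
read stuck behind a full 500 ms write completes after 500500 us unless the
write can be interrupted.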

> > > This really seems like functionality that belongs in an IO
> > > scheduler so that write starvation can be avoided, not in high-level
> > > data read paths where we have no clue about anything else going on
> > > in the IO subsystem
> > 
> > Indeed, the feature is built mostly in the low level device driver and
> > minor changes in the elevator. Changes above the block layer are only
> > about setting
> > attributes and transparent to their operation.
> 
> The problem is that the attribute you are setting covers every
> single data read that is done by all users. If that's what you want
> to have happen, then why do you even need a new flag at this layer?
> Just treat every non-REQ_META read request as a demand paged IO and
> you've got exactly the same behaviour without needing to tag at the
> higher layer

My feeling is that we should just treat every (REQ_SYNC | REQ_READ)
request the same and let them interrupt long-running writes,
independent of whether it's REQ_META or demand paging.
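
A minimal user-space sketch of that rule, for illustration only. The flag
bits and the helper name are hypothetical and do not match the kernel's
values; a read is modeled here as the absence of a write bit.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this example only. */
#define EX_REQ_WRITE (1UL << 0)
#define EX_REQ_SYNC  (1UL << 1)
#define EX_REQ_META  (1UL << 2)

/*
 * Any synchronous read may interrupt a long-running write, whether it
 * is metadata, demand paging or anything else.
 */
static bool ex_may_interrupt_write(unsigned long flags)
{
    bool is_read = !(flags & EX_REQ_WRITE);
    bool is_sync = (flags & EX_REQ_SYNC) != 0;

    return is_read && is_sync;
}
```

Note that EX_REQ_META does not appear in the decision at all, which is the
point of treating all such reads the same.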

Arnd


Re: [PATCH v2 01/16] FS: Added demand paging markers to filesystem

2012-05-08 Thread Dave Chinner
On Mon, May 07, 2012 at 10:16:30PM +0530, S, Venkatraman wrote:
> On Mon, May 7, 2012 at 5:01 AM, Dave Chinner wrote:
> > On Thu, May 03, 2012 at 07:53:00PM +0530, Venkatraman S wrote:
> >> From: Ilan Smith 
> >>
> >> Add attribute to identify demand paging requests.
> >> Mark readpages with demand paging attribute.
> >>
> >> Signed-off-by: Ilan Smith 
> >> Signed-off-by: Alex Lemberg 
> >> Signed-off-by: Venkatraman S 
> >> ---
> >>  fs/mpage.c                |    2 ++
> >>  include/linux/bio.h       |    7 +++
> >>  include/linux/blk_types.h |    2 ++
> >>  3 files changed, 11 insertions(+)
> >>
> >> diff --git a/fs/mpage.c b/fs/mpage.c
> >> index 0face1c..8b144f5 100644
> >> --- a/fs/mpage.c
> >> +++ b/fs/mpage.c
> >> @@ -386,6 +386,8 @@ mpage_readpages(struct address_space *mapping, struct 
> >> list_head *pages,
> >>                                       &last_block_in_bio, &map_bh,
> >>                                       &first_logical_block,
> >>                                       get_block);
> >> +                     if (bio)
> >> +                             bio->bi_rw |= REQ_RW_DMPG;
> >
> > Have you thought about the potential for DOSing a machine
> > with this? That is, user data reads can now preempt writes of any
> > kind, effectively stalling writeback and memory reclaim which will
> > lead to OOM situations. Or, alternatively, journal flushing will get
> > stalled and no new modifications can take place until the read
> > stream stops.
> 
> This feature doesn't fiddle with the I/O scheduler's ability to balance
> read vs write requests or handling requests from various process queues (CFQ).

And for schedulers like no-op that don't do any read/write balancing?
Also, I thought the code was queuing such demand paged requests at
the front of the queues, too, so bypassing most of the read/write
balancing logic of the elevators...
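
To illustrate why head insertion sidesteps any ordering the queue would
otherwise provide, here is a toy user-space FIFO. It is a sketch under the
assumption that demand-paged reads are inserted at the head; the queue type
and all ex_ names are invented for the example and are not block layer code.

```c
#include <stddef.h>

#define EX_QUEUE_CAP 8

/* Toy dispatch queue holding request ids; index 0 is served next. */
struct ex_queue {
    int    ids[EX_QUEUE_CAP];
    size_t len;
};

/* Normal FIFO insertion at the tail. Returns -1 if the queue is full. */
static int ex_push_back(struct ex_queue *q, int id)
{
    if (q->len == EX_QUEUE_CAP)
        return -1;
    q->ids[q->len++] = id;
    return 0;
}

/* Privileged insertion at the head, ahead of everything queued. */
static int ex_push_front(struct ex_queue *q, int id)
{
    if (q->len == EX_QUEUE_CAP)
        return -1;
    for (size_t i = q->len; i > 0; i--)
        q->ids[i] = q->ids[i - 1];
    q->ids[0] = id;
    q->len++;
    return 0;
}

/* Removes and returns the next request id, or -1 if the queue is empty. */
static int ex_pop_front(struct ex_queue *q)
{
    int id;

    if (q->len == 0)
        return -1;
    id = q->ids[0];
    for (size_t i = 1; i < q->len; i++)
        q->ids[i - 1] = q->ids[i];
    q->len--;
    return id;
}
```

If a write is queued with ex_push_back() and reads keep arriving through
ex_push_front(), the write is only served once the reads stop.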

> Also, for block devices which don't implement the ability to preempt (and even
> for older versions of MMC devices which don't implement this feature),
> the behaviour
> falls back to waiting for write requests to complete before issuing the read.

Sure, but my point is that you are adding a flag that will be set
for all user data read IO, and then making it privileged in the
lower layers.

> In low-end flash devices, some requests might take much longer than normal
> due to background device maintenance (i.e. flash erase / reclaim procedure)
> kicking in in the context of an ongoing write, stalling them by several
> orders of magnitude.

And thereby stalling what might be writes critical to operation.
Indeed, how does this affect the system when it starts swapping
heavily? If you keep stalling writes, the system won't be able to
swap and free memory...

> This implementation (See 14/16) does have several checks and
> timers to see that it's not triggered very often.  In my tests,
> where I usually have a generous preemption time window, the abort
> happens < 0.1% of the time.

Yes, but seeing as the user has direct control of the pre-emption
vector, it's not hard to imagine someone using it for a timing
attack...

> > This really seems like functionality that belongs in an IO
> > scheduler so that write starvation can be avoided, not in high-level
> > data read paths where we have no clue about anything else going on
> > in the IO subsystem
> 
> Indeed, the feature is built mostly in the low level device driver and
> minor changes in the elevator. Changes above the block layer are only
> about setting
> attributes and transparent to their operation.

The problem is that the attribute you are setting covers every
single data read that is done by all users. If that's what you want
to have happen, then why do you even need a new flag at this layer?
Just treat every non-REQ_META read request as a demand paged IO and
you've got exactly the same behaviour without needing to tag at the
higher layer
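
A sketch of what deriving the property in the lower layer could look like,
in plain user-space C. The flag bits and the helper name are hypothetical
and chosen only for this example.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this example only. */
#define EX_REQ_WRITE (1UL << 0)
#define EX_REQ_META  (1UL << 1)

/*
 * Classify a request as demand paged from flags it already carries:
 * it is a read and it is not metadata. No extra tag is needed.
 */
static bool ex_is_demand_paged(unsigned long flags)
{
    return !(flags & EX_REQ_WRITE) && !(flags & EX_REQ_META);
}
```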

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v2 01/16] FS: Added demand paging markers to filesystem

2012-05-08 Thread S, Venkatraman
On Tue, May 8, 2012 at 11:58 AM, mani wrote:
> How about adding the AS_DMPG flag in the file -> address_space when getting
> a filemap_fault()
> so that we can treat the page fault pages as the high priority pages over
> normal read requests.
> How about changing below lines for the support of the pages those are
> requested for the page fault ?
>
>
> --- a/fs/mpage.c 2012-05-04 12:59:12.0 +0530
> +++ b/fs/mpage.c 2012-05-07 13:13:49.0 +0530
> @@ -408,6 +408,8 @@ mpage_readpages(struct address_space *ma
>     &last_block_in_bio, &map_bh,
>     &first_logical_block,
>     get_block);
> +   if (test_bit(AS_DMPG, &mapping->flags) && bio)
> +       bio->bi_rw |= REQ_RW_DMPG;
>     }
>     page_cache_release(page);
>     }
> --- a/include/linux/pagemap.h    2012-05-04 12:57:35.0 +0530
> +++ b/include/linux/pagemap.h    2012-05-07 13:15:24.0 +0530
> @@ -27,6 +27,7 @@ enum mapping_flags {
>  #if defined (CONFIG_BD_CACHE_ENABLED)
>     AS_DIRECT  =   __GFP_BITS_SHIFT + 4,  /* DIRECT_IO specified on file op */
>  #endif
> +   AS_DMPG  =   __GFP_BITS_SHIFT + 5,  /* DEMAND PAGE specified on file op */
>  };
>
>  static inline void mapping_set_error(struct address_space *mapping, int error)
>
> --- a/mm/filemap.c   2012-05-04 12:58:49.0 +0530
> +++ b/mm/filemap.c   2012-05-07 13:15:03.0 +0530
> @@ -1646,6 +1646,7 @@ int filemap_fault(struct vm_area_struct
>     if (offset >= size)
>     return VM_FAULT_SIGBUS;
>
> +   set_bit(AS_DMPG, &file->f_mapping->flags);
>     /*
>  * Do we have something in the page cache already?
>  */
>
> Will these changes have any adverse effect ?
>

Thanks for the example, but I can't judge which of the two is more
elegant or acceptable to maintainers.
I can test with your change and report whether it works.
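
For reference, the flow of the quoted proposal can be modeled in user space
as below. This is a sketch only: the struct, the bit number and the ex_
function names are hypothetical stand-ins for the address_space flags,
filemap_fault() and mpage_readpages() touched by the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical bit number; the quoted diff uses __GFP_BITS_SHIFT + 5. */
#define EX_AS_DMPG_BIT 5

/* Stand-in for the flags word of an address_space. */
struct ex_mapping {
    unsigned long flags;
};

/* Models the set_bit() added to the fault path. */
static void ex_filemap_fault(struct ex_mapping *m)
{
    m->flags |= 1UL << EX_AS_DMPG_BIT;
}

/* Models the test_bit() check added before tagging the bio. */
static bool ex_readpages_tags_bio(const struct ex_mapping *m, bool have_bio)
{
    return have_bio && (m->flags & (1UL << EX_AS_DMPG_BIT)) != 0;
}
```

In this model, as in the quoted diff, nothing clears the bit again, so every
later read on the same mapping is tagged once a single fault has occurred.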

> Thanks & Regards
> Manish
>
> On Mon, May 7, 2012 at 5:01 AM, Dave Chinner  wrote:
>>
>> On Thu, May 03, 2012 at 07:53:00PM +0530, Venkatraman S wrote:
>> > From: Ilan Smith 
>> >
>> > Add attribute to identify demand paging requests.
>> > Mark readpages with demand paging attribute.
>> >
>> > Signed-off-by: Ilan Smith 
>> > Signed-off-by: Alex Lemberg 
>> > Signed-off-by: Venkatraman S 
>> > ---
>> >  fs/mpage.c                |    2 ++
>> >  include/linux/bio.h       |    7 +++
>> >  include/linux/blk_types.h |    2 ++
>> >  3 files changed, 11 insertions(+)
>> >
>> > diff --git a/fs/mpage.c b/fs/mpage.c
>> > index 0face1c..8b144f5 100644
>> > --- a/fs/mpage.c
>> > +++ b/fs/mpage.c
>> > @@ -386,6 +386,8 @@ mpage_readpages(struct address_space *mapping,
>> > struct list_head *pages,
>> >                                       &last_block_in_bio, &map_bh,
>> >                                       &first_logical_block,
>> >                                       get_block);
>> > +                     if (bio)
>> > +                             bio->bi_rw |= REQ_RW_DMPG;
>>
>> Have you thought about the potential for DOSing a machine
>> with this? That is, user data reads can now preempt writes of any
>> kind, effectively stalling writeback and memory reclaim which will
>> lead to OOM situations. Or, alternatively, journal flushing will get
>> stalled and no new modifications can take place until the read
>> stream stops.
>>
>> This really seems like functionality that belongs in an IO
>> scheduler so that write starvation can be avoided, not in high-level
>> data read paths where we have no clue about anything else going on
>> in the IO subsystem
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> da...@fromorbit.com
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


Re: [PATCH v2 01/16] FS: Added demand paging markers to filesystem

2012-05-07 Thread S, Venkatraman
On Mon, May 7, 2012 at 5:01 AM, Dave Chinner wrote:
> On Thu, May 03, 2012 at 07:53:00PM +0530, Venkatraman S wrote:
>> From: Ilan Smith 
>>
>> Add attribute to identify demand paging requests.
>> Mark readpages with demand paging attribute.
>>
>> Signed-off-by: Ilan Smith 
>> Signed-off-by: Alex Lemberg 
>> Signed-off-by: Venkatraman S 
>> ---
>>  fs/mpage.c                |    2 ++
>>  include/linux/bio.h       |    7 +++
>>  include/linux/blk_types.h |    2 ++
>>  3 files changed, 11 insertions(+)
>>
>> diff --git a/fs/mpage.c b/fs/mpage.c
>> index 0face1c..8b144f5 100644
>> --- a/fs/mpage.c
>> +++ b/fs/mpage.c
>> @@ -386,6 +386,8 @@ mpage_readpages(struct address_space *mapping, struct 
>> list_head *pages,
>>                                       &last_block_in_bio, &map_bh,
>>                                       &first_logical_block,
>>                                       get_block);
>> +                     if (bio)
>> +                             bio->bi_rw |= REQ_RW_DMPG;
>
> Have you thought about the potential for DOSing a machine
> with this? That is, user data reads can now preempt writes of any
> kind, effectively stalling writeback and memory reclaim which will
> lead to OOM situations. Or, alternatively, journal flushing will get
> stalled and no new modifications can take place until the read
> stream stops.

This feature doesn't fiddle with the I/O scheduler's ability to balance
read vs write requests or handling requests from various process queues (CFQ).

Also, for block devices which don't implement the ability to preempt (and even
for older versions of MMC devices which don't implement this feature), the
behaviour falls back to waiting for write requests to complete before issuing
the read.
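
The fallback can be pictured with the following user-space sketch. The enum,
the struct and the function name are invented for illustration and are not
taken from the MMC driver.

```c
#include <stdbool.h>

/* What happens to an urgent read when it arrives. */
enum ex_read_path {
    EX_ISSUE_NOW,          /* device is idle, just issue the read */
    EX_PREEMPT_THEN_READ,  /* stop the ongoing write first */
    EX_WAIT_THEN_READ,     /* let the write finish, then read */
};

/* Hypothetical device capability record. */
struct ex_device {
    bool can_preempt;
};

static enum ex_read_path ex_plan_read(const struct ex_device *dev,
                                      bool write_in_flight)
{
    if (!write_in_flight)
        return EX_ISSUE_NOW;
    if (dev->can_preempt)
        return EX_PREEMPT_THEN_READ;
    /* Fallback for devices without preemption support. */
    return EX_WAIT_THEN_READ;
}
```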

In low-end flash devices, some requests might take much longer than normal
due to background device maintenance (i.e. flash erase / reclaim procedure)
kicking in in the context of an ongoing write, stalling them by several
orders of magnitude.

This implementation (see 14/16) does have several checks and timers to
ensure that it's not triggered very often. In my tests, where I usually have
a generous preemption time window, the abort happens < 0.1% of the time.
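
Patch 14/16 is not part of this mail, so the following is not its code. It is
only a user-space sketch of the kind of time-based guard meant here, assuming
two hypothetical thresholds: a minimum age of the running write and a minimum
gap between two preemptions. All names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard state; times are in microseconds. */
struct ex_preempt_guard {
    uint64_t min_write_age_us;  /* write must have run at least this long */
    uint64_t min_gap_us;        /* minimum time between two preemptions */
    uint64_t last_preempt_us;   /* valid only if has_preempted is set */
    bool     has_preempted;
};

/*
 * Decide whether the running write may be preempted at now_us.
 * Assumes now_us >= write_start_us. Records the preemption on success.
 */
static bool ex_try_preempt(struct ex_preempt_guard *g,
                           uint64_t now_us, uint64_t write_start_us)
{
    /* Short writes are left alone; they will finish soon anyway. */
    if (now_us - write_start_us < g->min_write_age_us)
        return false;

    /* Do not preempt again too soon after the previous preemption. */
    if (g->has_preempted && now_us - g->last_preempt_us < g->min_gap_us)
        return false;

    g->last_preempt_us = now_us;
    g->has_preempted = true;
    return true;
}
```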


>
> This really seems like functionality that belongs in an IO
> scheduler so that write starvation can be avoided, not in high-level
> data read paths where we have no clue about anything else going on
> in the IO subsystem

Indeed, the feature is built mostly in the low-level device driver, with
minor changes in the elevator. Changes above the block layer only set
attributes and are transparent to the operation of those layers.

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com


Re: [PATCH v2 01/16] FS: Added demand paging markers to filesystem

2012-05-06 Thread Dave Chinner
On Thu, May 03, 2012 at 07:53:00PM +0530, Venkatraman S wrote:
> From: Ilan Smith 
> 
> Add attribute to identify demand paging requests.
> Mark readpages with demand paging attribute.
> 
> Signed-off-by: Ilan Smith 
> Signed-off-by: Alex Lemberg 
> Signed-off-by: Venkatraman S 
> ---
>  fs/mpage.c                |    2 ++
>  include/linux/bio.h       |    7 +++
>  include/linux/blk_types.h |    2 ++
>  3 files changed, 11 insertions(+)
> 
> diff --git a/fs/mpage.c b/fs/mpage.c
> index 0face1c..8b144f5 100644
> --- a/fs/mpage.c
> +++ b/fs/mpage.c
> @@ -386,6 +386,8 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages,
>                                       &last_block_in_bio, &map_bh,
>                                       &first_logical_block,
>                                       get_block);
> +                     if (bio)
> +                             bio->bi_rw |= REQ_RW_DMPG;

Have you thought about the potential for DOSing a machine
with this? That is, user data reads can now preempt writes of any
kind, effectively stalling writeback and memory reclaim which will
lead to OOM situations. Or, alternatively, journal flushing will get
stalled and no new modifications can take place until the read
stream stops.

This really seems like functionality that belongs in an IO
scheduler so that write starvation can be avoided, not in high-level
data read paths where we have no clue about anything else going on
in the IO subsystem

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH v2 01/16] FS: Added demand paging markers to filesystem

2012-05-03 Thread Venkatraman S
From: Ilan Smith 

Add attribute to identify demand paging requests.
Mark readpages with demand paging attribute.

Signed-off-by: Ilan Smith 
Signed-off-by: Alex Lemberg 
Signed-off-by: Venkatraman S 
---
 fs/mpage.c                |    2 ++
 include/linux/bio.h       |    7 +++
 include/linux/blk_types.h |    2 ++
 3 files changed, 11 insertions(+)

diff --git a/fs/mpage.c b/fs/mpage.c
index 0face1c..8b144f5 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -386,6 +386,8 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages,
                                        &last_block_in_bio, &map_bh,
                                        &first_logical_block,
                                        get_block);
+                       if (bio)
+                               bio->bi_rw |= REQ_RW_DMPG;
                }
                page_cache_release(page);
        }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 4d94eb8..264e0ef 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -57,6 +57,13 @@
        (bio)->bi_rw |= ((unsigned long) (prio) << BIO_PRIO_SHIFT); \
 } while (0)
 
+static inline bool bio_rw_flagged(struct bio *bio, unsigned long flag)
+{
+   return ((bio->bi_rw & flag)  != 0);
+}
+
+#define bio_dmpg(bio)  bio_rw_flagged(bio, REQ_RW_DMPG)
+
 /*
  * various member access, note that bio_data should of course not be used
  * on highmem page vectors
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 4053cbd..87feb80 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -150,6 +150,7 @@ enum rq_flag_bits {
        __REQ_FLUSH_SEQ,        /* request for flush sequence */
        __REQ_IO_STAT,          /* account I/O stat */
        __REQ_MIXED_MERGE,      /* merge of different types, fail separately */
+       __REQ_RW_DMPG,
        __REQ_NR_BITS,          /* stops here */
 };
 
@@ -191,5 +192,6 @@ enum rq_flag_bits {
 #define REQ_IO_STAT             (1 << __REQ_IO_STAT)
 #define REQ_MIXED_MERGE         (1 << __REQ_MIXED_MERGE)
 #define REQ_SECURE              (1 << __REQ_SECURE)
+#define REQ_RW_DMPG             (1 << __REQ_RW_DMPG)
 
 #endif /* __LINUX_BLK_TYPES_H */
-- 
1.7.10.rc2
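
For readers who want to see how the new helper behaves, here is a user-space
mock. The struct bio below is a one-field stand-in, not the kernel structure,
and the bit position chosen for __REQ_RW_DMPG is arbitrary; only
bio_rw_flagged() and bio_dmpg() follow the patch above.

```c
#include <stdbool.h>

/* Stand-in for the kernel's struct bio; only the flags word is modeled. */
struct bio {
    unsigned long bi_rw;
};

/* Arbitrary bit position for this mock; the real value comes from the enum. */
#define __REQ_RW_DMPG 23
#define REQ_RW_DMPG   (1UL << __REQ_RW_DMPG)

/* Same logic as the helper added in include/linux/bio.h. */
static inline bool bio_rw_flagged(struct bio *bio, unsigned long flag)
{
    return ((bio->bi_rw & flag) != 0);
}

#define bio_dmpg(bio) bio_rw_flagged(bio, REQ_RW_DMPG)

/* Mirrors the two lines added to mpage_readpages(): tag only a valid bio. */
static void ex_mark_demand_paged(struct bio *bio)
{
    if (bio)
        bio->bi_rw |= REQ_RW_DMPG;
}
```

A lower layer would then test bio_dmpg(bio) to recognise such requests;
ex_mark_demand_paged() is a name invented for this mock.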
