Re: [00/17] Large Blocksize Support V3

Mel Gorman Thu, 26 Apr 2007 03:49:34 -0700

On (26/04/07 17:42), Nick Piggin didst pronounce:
> Christoph Lameter wrote:
> >On Thu, 26 Apr 2007, Nick Piggin wrote:
> >
> >
> >>>mapping through the radix tree. You just need to change the way the
> >>>filesystem looks up pages.
> >>
> >>You didn't think any of the criticisms of higher order page cache size
> >>were valid?
> >
> >
> >They are all known points that have been discussed to death.
> 
> I missed the part where you showed that it was a better solution than
> the alternatives.
> 
> 
> >>>What are the exact requirement you are trying to address?
> >>
> >>Block size > page cache size.
> >
> >
> >But what do you mean with it? A block is no longer a contiguous section of 
> >memory. So you have redefined the term.
> 
> I don't understand what you mean at all. A block has always been a
> contiguous area of disk.
>


Yes, but what you seem to be proposing is that lower layers be
able to treat non-contiguous pages as one IO artifact - like
non-contiguous-compound-pages. Ultimately both are probably needed. i.e.
Use contiguous pages for large blocks where possible but be able to deal
with the compound page consisting of multiple smaller pages with a
regression in performance when necessary.

I don't think what Christoph and Nick are proposing are mutually
exclusive. The arguement is really "which do we deal with first".

> 
> >>You guys have a couple of problems, firstly you need to have ia64
> >>filesystems accessable to x86_64. And secondly you have these controllers
> >>without enough sg entries for nice sized IOs.
> >
> >
> >This is not sgi specific sorry.
> >
> >
> >>I sympathise, and higher order pagecache might solve these in a way, but
> >>I don't think it is the right way to go, mainly because of the 
> >>fragmentation
> >>issues.
> >
> >
> >And you dont care about Mel's work on that level?
> 
> I actually don't like it too much because it can't provide a robust
> solution. What do you do on systems with small memories, or those that
> eventually do get fragmented?
> 

They won't be creating filesystems with large blocks for a start but
even if they did, they would need to your proposal.

Again, I don't think they are mutually exclusive as such.

> Actually, I don't know why people are so excited about being able to
> use higher order allocations (I would rather be more excited about
> never having to use them). But for those few places that really need
> it, I'd rather see them use a virtually mapped kernel with proper
> defragmentation rather than putting hacks all through the core code.
> 

That involves creating a vmalloc-like area or breaking the 1:1 physical:virtual
mapping in the kernel address space. Of those two, I think a vmalloc area
would be easier, have less performance impact and might even be a starting
point for non-contiguous-compound-pages.

hmm.

> >>Increasing PAGE_SIZE, support for block size > page cache size, and 
> >>getting
> >>io controllers matched to a 4K page size IMO would be some good ways to
> >>solve these problems. I know they are probably harder...
> >
> >
> >No this has been tried before and does not work. Why should we loose the 
> >capability to work with 4k pages just because there is some data that 
> >has to be thrown around in quantity? I'd like to have flexibility here.
> 
> Is that a big problem? Really? You use 16K pages on your IPF systems,
> don't you?
> 
> 
> >The fragmentation problem is solvable and we already have a solution in 
> >mm. So I do not really see a problem there?
> 
> I don't think that it is solved, and I think the heuristics that are
> there would be put under more stress if they become widely used. And
> it isn't only about whether we can get the page or not, but also about
> the cost. Look up Linus's arguments about page colouring, which are
> similar and  I also think are pretty valid.
> 

Page coloring has come up a lot in the past. Can you point me at an
example you have in mind and I'll see if the same arguments really
apply.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [00/17] Large Blocksize Support V3

Reply via email to