Christoph Hellwig <[EMAIL PROTECTED]> writes: > On Thu, Apr 26, 2007 at 04:50:06PM +1000, Nick Piggin wrote: >> Improving the buffer layer would be a good way. Of course, that is >> a long and difficult task, so nobody wants to do it. > > It's also a stupid idea. We got rid of the buffer layer because it's > a complete pain in the ass, and now you want to reintroduce one that's > even more complex, and most likely even slower than the elegant solution?
No. I'm really suggesting improving the translation from BIO's to the page cache. A set of helper functions. This patch is suggesting we move to a BSD like buffer cache, except that everything is physically mapped. My most practical suggestion is to have support code so that you can do all of the locking (that I/O cares about) on the first page of a page group in the page cache. You don't need larger physical pages to do that. >> Well, for those architectures (and this would solve your large block >> size and 16TB pagecache size without any core kernel changes), you >> can manage 1<<order hardware ptes as a single Linux pte. There is >> nothing that says you must implement PAGE_SIZE as a single TLB sized >> page. > > Well, ppc64 can do that. And guess what, it really painfull for a lot > of workloads. Think of a poor ps3 with 256 from which the broken hypervisor > already takes a lot away and now every file in the pagecache takes > 64k, every thread stack takes 64k, etc? It's good to have variable > sized objects in places where it makes sense, and the pagecache is > definitively one of them. Agreed the page cache is all about variable sized objects known as files! You don't need to do anything extra. The problem is only with building I/O requests from what is there. Iff we really the larger physical page size to support the hardware then it makes sense to go down a path of larger pages. But it doesn't. There is also a more fundamental reasons this patch is silly. It assumes that there is a trivial mapping between filesystems (the primary target of the page cache and blocks on disk). Now I admit this is common but there is no reason to supposed it is true, and this patch appears to exacerbate things. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/