On Tue, 24 Apr 2007, Hugh Dickins wrote: > I was certainly ignorant of that; but I'm not convinced it eliminates > the potential issue. For a start, sys_move_pages seems not to involve > mempolicies at all - I don't see what prevents it migrating blockdev > pages away from the only node which has NORMAL memory.
There is no need to. If the user does something stupid (and 32bit NUMA system are so stupid by default) as you say then this means that the pages would have to use bounce buffers to be written back. > > Metadata is not movable nor subject to memory policies. > > It will never be mapped into a process space. > > Not as metadata, no. But someone (let's hope only root, though I may > be wrong on that) can map any part of the block device into userspace. Concurrent access to a block device by a filesystem and the user? That cannot go over well. If one just reads then I would expect that a copy of the metadata becomes available to the user. Also you cannot migrate pages that have multiple references (which is the case here if the filesystem uses the page cache for the metadata) unless the user has special priviledges and uses special command options. A page that has references that cannot be accounted for by page migration is never migrated. I would assume that the filesystem at minimum takes a refcount on the page used for metadata. If the filesystem would not take a refcount then it would already be in trouble because the page may then be evicted at any time. > > Yes but before we get there we will bounce pagecache pages into an area > > where we do not need kmap. > > Again, I'm not convinced: bouncing gets done for the I/O, > but where is it done to meet the filesystem's expectations? Bouncing gets done for the block device and the block device determines the zones from which it can do I/O. Thus the block device determines the allocation restrictions. > On the other hand, as I said, I've seen no problem myself in practice. > However, if there is no problem, why do block devices demand GFP_USER? GFP_USER requires allocation in non highmem areas. These are typically reachable by device DMA. I think there is no problem if one would open a block device with GFP_HIGHMEM. The pages may be allocated in highmem which would require bounce buffers but otherwise we will be fine. A ramdisk device may just work fine with highmem. If the system has both high memory and normal memory then only allocations to highmemory are subject to memory policies etc etc. The block device allocations would be in zone normal/dma and thus be exempt from NUMA placement. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/