Hi Rusty,

On Thu, Oct 18, 2012 at 03:19:06AM +0100, Rusty Russell wrote:
> Will Deacon <will.dea...@arm.com> writes:
> > When using a virtio transport, the 9p net device allocates pages to back
> > the descriptors inserted into the virtqueue. These allocations may be
> > performed from atomic context (under the channel lock) and can therefore
> > return high mappings which aren't suitable for virt_to_phys.
>
> I had not appreciated that subtlety about GFP_ATOMIC :(

Yeah, it's unfortunate for poor old userspace.

> This isn't just 9p, the console, block, scsi and net devices also use
> GFP_ATOMIC.

Ok, I'll split this patch in two since I think that only 9p has the
zero-copy stuff, which is why an extra fix is needed there for creating
the scatterlist correctly.

> > @@ -165,7 +166,8 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
> >  	/* Use a single buffer which doesn't continue */
> >  	head = vq->free_head;
> >  	vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
> > -	vq->vring.desc[head].addr = virt_to_phys(desc);
> > +	vq->vring.desc[head].addr = page_to_phys(kmap_to_page(desc)) +
> > +				    ((unsigned long)desc & ~PAGE_MASK);
> >  	vq->vring.desc[head].len = i * sizeof(struct vring_desc);
>
> Gah, virt_to_phys_harder()?

Tell me about it...

> What's the performance effect?  If it's negligible, why doesn't
> virt_to_phys() just do this for us?

I've not measured it, but even when you don't have CONFIG_HIGHMEM, there's
going to be an overhead here because we go around the houses to get the
page and then add the offset on afterwards. I doubt it's something we want
to plumb directly into virt_to_phys (also, kmap_to_page may call
virt_to_phys via the __pa macro so we'd get stuck).

> We do have an alternate solution: masking out __GFP_HIGHMEM from the
> kmalloc of desc.  If it fails, we will fall back to laying out the
> virtio request directly inside the ring; if it doesn't fit, we'll wait
> for the device to consume more buffers.

Hmm, that will probably work for the vring but the zero-copy code for 9p
may just give us an address from userspace if I'm understanding it
correctly. In that case, we really have to do the translation as below
(which is actually much cleaner because everything is page-aligned).

> > @@ -325,7 +326,7 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
> >  		int count = nr_pages;
> >  		while (nr_pages) {
> >  			s = rest_of_page(data);
> > -			pages[index++] = virt_to_page(data);
> > +			pages[index++] = kmap_to_page(data);
> >  			data += s;
> >  			nr_pages--;
> >  		}

So what do you reckon? How about I leave this hunk as a separate patch and
have a play masking out __GFP_HIGHMEM for the vring descriptor?

Cheers,

Will
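
For anyone following the address arithmetic in the two hunks quoted above,
here is a rough userspace model of it. This is only a sketch and not kernel
code: the sketch_* names are made up for illustration, a 4 KiB page size is
assumed, and the caller supplies the page's physical base directly where the
patch would get it from page_to_phys(kmap_to_page(desc)). The last helper
assumes rest_of_page() means "bytes left until the next page boundary", which
is how the loop in p9_get_mapped_pages appears to use it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel's real PAGE_SHIFT is per-architecture. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Offset of vaddr within its page: the "(unsigned long)desc & ~PAGE_MASK" term. */
static inline uint64_t sketch_page_offset(uint64_t vaddr)
{
	return vaddr & ~SKETCH_PAGE_MASK;
}

/*
 * Physical address from a page-aligned physical base plus the in-page offset.
 * page_phys stands in for page_to_phys(kmap_to_page(desc)) and is simply
 * handed in by the caller in this model.
 */
static inline uint64_t sketch_phys_addr(uint64_t page_phys, uint64_t vaddr)
{
	assert((page_phys & ~SKETCH_PAGE_MASK) == 0); /* base must be page-aligned */
	return page_phys + sketch_page_offset(vaddr);
}

/* Bytes from vaddr to the end of its page (assumed meaning of rest_of_page). */
static inline uint64_t sketch_rest_of_page(uint64_t vaddr)
{
	return SKETCH_PAGE_SIZE - sketch_page_offset(vaddr);
}
```

With these assumptions, a buffer at virtual address 0xC0801234 whose page
sits at physical 0x8F000000 would translate to 0x8F000234, and the next
iteration of a page-walking loop would advance by 0xDCC bytes to reach the
following page boundary.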