On Mon, Oct 15, 2012 at 12:36:08PM +0100, Chris Webb wrote:
> We're planning to implement shared filesystems for guests on our virtualized
> hosting platform, stored on a central fileserver separate from the hosts.
>
> Whilst we can mount the shares on each host and then use qemu's 9p
> passthrough/proxy support to access the mountpoint, going via the host
> kernel and vfs like this feels quite inefficient. We would be converting
> back and forth between vfs and 9p models several times needlessly.
>
> Instead, I'm wondering about the feasibility of connecting the 9p stream
> directly from qemu's virtio-9p-pci device to a socket opened on a
> 9p-over-TCP export from the fileserver. Am I right in thinking that qemu's
> -fsdev proxy gives me access to a file descriptor attached to the 9p stream
> to/from the guest, or is the protocol between virtfs-proxy-helper and qemu
> re-encoded within qemu first?
>
> Secondly, assuming I can somehow get at the 9p streams directly (either with
> an existing option or by adding a new one), I'd like to restrict guests to
> the relevant user's subdirectory on the fileserver, and have been thinking
> about doing this by filtering the 9p stream to restrict 'attach' operations.
>
> Fortunately, 9p uses client-chosen fids rather than server filesystem inode
> numbers which would immediately scupper any simple attempts to implement a
> secure chroot proxy of this kind. Looking at the 9p2000.L protocol, it
> doesn't look obviously difficult, but I've not really worked with 9p before,
> and could well be missing security complications. (I'm not sure whether
> there's a risk of symlinks being interpreted server side rather than client
> side, for example.)
>
> I'd also be interested in any more general thoughts on this kind of thing.
> If we're going to work on it, it would be nice for us to write something
> that would be more widely useful to others rather than just create an
> in-house hack.
>
> Cheers,
>
> Chris.
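As a concrete illustration of the attach-filtering idea above: a chroot-style 9p proxy would mainly need to rewrite the aname field in each Tattach message before forwarding it to the fileserver. A minimal, untested sketch in Python — the function name is mine, and it assumes the 9p2000.L attach layout (size[4] type[1] tag[2] fid[4] afid[4] uname[s] aname[s] n_uname[4], little-endian, strings as len[2]+bytes):

```python
import struct

TATTACH = 104  # 9P Tattach message type

def rewrite_attach(msg: bytes, subdir: str) -> bytes:
    """Prefix the aname of a Tattach message with the user's subdirectory.

    Assumed 9p2000.L attach layout (all little-endian):
      size[4] type[1] tag[2] fid[4] afid[4] uname[s] aname[s] n_uname[4]
    where s is len[2] followed by UTF-8 bytes. Messages that are not a
    Tattach pass through untouched.
    """
    _, mtype, _ = struct.unpack_from('<IBH', msg, 0)
    if mtype != TATTACH:
        return msg
    off = 7 + 4 + 4                        # header, fid, afid
    ulen, = struct.unpack_from('<H', msg, off)
    off += 2 + ulen                        # skip uname
    alen, = struct.unpack_from('<H', msg, off)
    aname = msg[off + 2:off + 2 + alen].decode()
    new = (subdir.rstrip('/') + '/' + aname.lstrip('/')).rstrip('/') or '/'
    # Rebuild the message with the rewritten aname and a corrected size.
    body = (msg[4:off]
            + struct.pack('<H', len(new)) + new.encode()
            + msg[off + 2 + alen:])
    return struct.pack('<I', 4 + len(body)) + body
```

Of course this only confines the initial attach; a real proxy would also have to worry about the symlink and walk-related issues mentioned above, since any path resolution done server-side could still escape the subtree.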
If scalability and security are long-term goals, I'd suggest you take a look at OpenAFS (openafs.org). There are other complications, in that you start getting into stuff like how to authenticate your users to the filesystem, and imho it's a PITA to switch and get used to this model, but once you do, you don't have to worry about whether you've got some complicated (and one-off) security filtering configured right.

I've been playing with booting debian kernels & initrds directly from AFS as the root filesystem ( http://bitspjoule.org/hg/initramfs-tools ). What's nice (and also a PITA) is that normally the VM client cannot modify the root filesystem, so I know there's no magic configuration on some VM disk image; but if I wanted to make a change to multiple VMs, I could authenticate to AFS as administrator on one of the nodes, make the change, and then restart the daemons on all the other VMs to load the new change.

The other (potential) advantage of AFS in a virtualized environment is client-side caching: instead of keeping a VM disk image for the OS and worrying about backing it up, you just use that disk image as the client-side cache local to the VM host machine. The actual authoritative data is stored on the AFS server, so if you have it cached locally you never have to hit the network, and if the disk the cache is on dies, you just restart the VM on a different disk. (If you wanted to go overboard, you could modify the client-caching code to go back to the server if the cache gets a read error.)

I can't say that AFS has really *solved* all the hard problems here, but there is at least some history on how to deal with them effectively. With what you are describing with 9p in a production environment, I think you'll end up re-discovering all the hard problems, having to invent new 9p-specific ways of dealing with them, and ending up with the same complexity as AFS.

- Troy