Stefan Hajnoczi <stefa...@gmail.com> wrote on 26/02/2013 06:45:30 PM:
> > But is this significantly different than any other security bug in the
> > host, qemu, kvm....? If you perform the I/O virtualization in a separate
> > (not qemu) process, you have a significantly smaller, self-contained and
> > bounded trusted computing base (TCB) from source code perspective as
> > opposed to a single huge user-space process where it's very difficult to
> > define boundaries and find potential security holes.
>
> I disagree here.
>
> The QEMU process is no more privileged than guest ring 0. It can only
> mess with resources that the guest itself has access to (CPU, disk,
> network).
>
> The QEMU process cannot access other guests. SELinux locks it down so
> it cannot access host files or other resources.

I see your point, but the shared process only needs access to the virtio
rings/buffers (not necessarily the entire memory of all the guests), the
network sockets and the image files opened by all the qemu user-space
processes. So, if there is a security hole, an attacker can gain access only
to these resources. With the traditional model (not shared thread), if there
is a security hole in qemu then an attacker will be able to exploit exactly
the same hole to gain access to the resources "all the qemu instances" have
access to. I don't see why a security hole in qemu would work only for VM1
and not for VM2... they are hosted using exactly the same qemu code.

If you move the virtio back-end from qemu to a different user-space process,
it will be easier to analyze and maintain the code and to detect security
bugs. Maybe you can also use this model to improve security: you can give
access to the network/disk only to the shared virtio back-end process and
not to the qemu processes...

> > Sounds interesting... however, once the userspace thread runs the driver
> > loses control (assuming you don't have spare cores).
> > I mean, a userspace I/O thread will probably consume all its time slice
> > while the driver may prefer to assign less (or more) cycles to a
> > specific I/O thread based on the ongoing activity of all the VMs.
> >
> > Using a shared-thread, you can optimize the linux scheduler to handle
> > virtual/emulated I/O while you actually don't modify the kernel
> > scheduler code.
>
> Can you explain details on fine-grained I/O scheduling or post some
> code?

OK, I'll try to illustrate the idea with simple pseudo-code.

If you use a thread per device, the tx/request path of each thread (one per
qemu process) will probably look like:

    while (!stop) {
        wait_for_queue_data(); /* for a specific virtual device of a VM */
        while (queue_has_data() && !stop) {
            request = dequeue_request();
            process(request);
        }
    }

Now, if you use a shared thread with fine-grained I/O scheduling, the code
will look like:

    while (!stop) {
        /* pick among the virtual devices of all the VMs */
        queue = select_queue_to_process();
        while (queue_has_data(queue) && !should_change_queue()) {
            request = dequeue_request(queue);
            process(request);
        }
    }

should_change_queue() will return true based on:
(1) the number of requests the thread has already handled from the queue it
    is currently processing
(2) the number of requests pending in all the other queues
(3) the age of the oldest/newest request in each of the other queues
(4) priorities between queues

select_queue_to_process() will pick a queue based on:
(1) the age of the oldest/newest request in each queue
(2) priorities between queues
(3) the average throughput/latency per queue
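To make this a bit more concrete, here is a rough sketch in C of how these
two policy functions could look. It is only an illustration of the idea, not
code from the actual patches: the io_queue structure, the counters and the
thresholds (MAX_BATCH, the ageing limit) are placeholders I made up, and a
real policy would also account for the per-queue throughput/latency:

    /* Sketch only: each virtqueue of each VM is represented by a struct
     * io_queue, and the shared I/O thread keeps all of them in one array. */

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_BATCH    32          /* made-up budget per queue visit */
    #define MAX_WAIT_NS  1000000     /* made-up starvation threshold (1ms) */

    struct io_queue {
        int          prio;           /* higher = more important */
        unsigned int pending;        /* requests currently queued */
        uint64_t     oldest_ns;      /* arrival time of the oldest request */
        /* ... ring pointers, eventfd, per-VM state ... */
    };

    static struct io_queue *queues;  /* all the queues of all the VMs */
    static size_t nr_queues;

    /* (1) batch budget used up, (2)+(3) another queue is getting starved,
     * (4) a higher-priority queue has pending work */
    static bool should_change_queue(const struct io_queue *cur,
                                    unsigned int handled, uint64_t now_ns)
    {
        if (handled >= MAX_BATCH)
            return true;

        for (size_t i = 0; i < nr_queues; i++) {
            const struct io_queue *q = &queues[i];

            if (q == cur || q->pending == 0)
                continue;
            if (q->prio > cur->prio || now_ns - q->oldest_ns > MAX_WAIT_NS)
                return true;
        }
        return false;
    }

    /* pick the non-empty queue with the highest priority, breaking ties by
     * the age of its oldest request ("most starved first") */
    static struct io_queue *select_queue_to_process(void)
    {
        struct io_queue *best = NULL;

        for (size_t i = 0; i < nr_queues; i++) {
            struct io_queue *q = &queues[i];

            if (q->pending == 0)
                continue;
            if (!best || q->prio > best->prio ||
                (q->prio == best->prio && q->oldest_ns < best->oldest_ns))
                best = q;
        }
        return best;    /* NULL means all queues are empty: go to sleep */
    }

Because all of this runs in the shared thread itself, the policy can be
tuned freely in user space without touching the kernel scheduler.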
The logic that decides which queue should be processed, and how many requests
should be processed from that specific queue, is implemented in user space
and depends on the ongoing I/O activity of all the queues. Note that with
this model you can process many queues in less than one scheduler time slice.

With the traditional thread-per-device model, it is actually the Linux
scheduler that decides which queue will be processed and how many requests
will be processed from that specific queue (how many cycles the thread runs).
Note that Linux has no information about the ongoing activity and status of
the queues. The scheduler only knows whether a thread is waiting (empty
queue) or ready to run (queue has data = event signaled).

Finally, with the shared-thread model you have significantly fewer
thread/process context switches compared to one I/O thread per qemu process.
You also make the Linux scheduler's life easier ;)
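BTW, regarding the context switches: below is one possible way (again just a
sketch for illustration, not the actual code) for the shared thread to block
only when all the queues are empty. It assumes each virtqueue kick is
delivered through an eventfd (e.g. ioeventfd) and that all of them are
registered in a single epoll instance, so a guest kicking its queue while the
thread is already busy does not cause any wake-up at all:

    #include <stdint.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    static int epfd;                       /* shared by all the queues */

    void io_thread_init(void)
    {
        epfd = epoll_create1(0);
    }

    /* called whenever a virtual device/queue is registered */
    void watch_queue_kick(int kick_fd)
    {
        struct epoll_event ev;

        ev.events = EPOLLIN;
        ev.data.fd = kick_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, kick_fd, &ev);
    }

    /* called by the main loop only when select_queue_to_process()
     * finds nothing to do */
    void wait_for_any_queue(void)
    {
        struct epoll_event events[64];
        int n = epoll_wait(epfd, events, 64, -1);

        for (int i = 0; i < n; i++) {
            uint64_t cnt;
            /* just drain the eventfd counter; which queue to serve next
             * is decided by select_queue_to_process(), not by epoll */
            ssize_t r = read(events[i].data.fd, &cnt, sizeof(cnt));
            (void)r;
        }
    }

With one thread per device, each of those kicks instead wakes up a different
thread, and the scheduler has to switch between them without knowing
anything about the state of the rings.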