Stefan Hajnoczi <stefa...@gmail.com> wrote on 26/02/2013 06:45:30 PM:

> > But is this significantly different than any other security bug in the
> > host, qemu, kvm....? If you perform the I/O virtualization in a separate
> > (not qemu) process, you have a significantly smaller, self-contained and
> > bounded trusted computing base (TCB) from source code perspective as
> > opposed to a single huge user-space process where it's very difficult to
> > define boundaries and find potential security holes.
>
> I disagree here.
>
> The QEMU process is no more privileged than guest ring 0.  It can only
> mess with resources that the guest itself has access to (CPU, disk,
> network).
>
> The QEMU process cannot access other guests.  SELinux locks it down so
> it cannot access host files or other resources.

I see your point, but the shared process only needs access to
the virtio rings/buffers (not necessarily the entire memory of
all the guests), plus the network sockets and image files opened by
all the qemu user-space processes. So, if there is a security hole,
an attacker can only gain access to these resources.
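
Just to make "only the rings/buffers" a bit more concrete, here is a
minimal sketch (not code from any existing prototype; guest_mem_fd,
vring_offset and vring_size are assumptions, and how qemu hands them to
the back-end is out of scope here) of the shared back-end mapping just a
window of a guest's memory:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Map only the window of one guest's memory that holds the vring and its
 * buffers; the rest of the guest's memory is never mapped by the back-end.
 * vring_offset must be page aligned. */
void *map_vring_window(int guest_mem_fd, off_t vring_offset, size_t vring_size)
{
    void *win = mmap(NULL, vring_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, guest_mem_fd, vring_offset);
    if (win == MAP_FAILED) {
        perror("mmap vring window");
        return NULL;
    }
    return win;
}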

With the traditional model (no shared thread), if there is a security
hole in qemu then an attacker will be able to exploit exactly the same
security hole to obtain access to the resources that "all the qemu
instances" have access to. I don't see why a security hole in qemu would
work only for VM1 and not for VM2... they are hosted using exactly the
same qemu code.

If you move the virtio back-end from qemu to a different user-space
process, it will be easier to analyze and maintain the code and to detect
security bugs.
Maybe you can also use this model to improve security:
you can give access to the network/disk only to the shared virtio back-end
process and not to the qemu processes...

> > Sounds interesting... however, once the userspace thread runs the
> > driver loses control (assuming you don't have spare cores).
> > I mean, a userspace I/O thread will probably consume all
> > its time slice while the driver may prefer to assign less (or more)
> > cycles to a specific I/O thread based on the ongoing activity of all
> > the VMs.
> >
> > Using a shared-thread, you can optimize the linux scheduler to handle
> > virtual/emulated I/O while you actually don't modify the kernel
> > scheduler code.
>
> Can you explain details on fine-grained I/O scheduling or post some
> code?

Ok, I'll try to exemplify the idea with simple pseudo-code.
If you use a thread per device, the tx/request path for each
thread (a different qemu process) will probably look like:

while (!stop) {
    wait_for_queue_data()  /* for a specific virtual device of a VM */
    while (queue_has_data() && !stop) {
        request = dequeue_request()
        process(request)
    }
}


Now, if you use a shared thread with fine-grained I/O scheduling, the code
will look like:

while (!stop) {
    queue = select_queue_to_process()  /* across all the virtual devices
                                          of all the VMs */
    while (queue_has_data(queue) && !should_change_queue(queue)) {
        request = dequeue(queue)
        process(request)
    }
}

The should_change_queue() function returns true based on:
(1) the number of requests the thread has handled for the queue it is
    currently processing
(2) the number of requests pending in all the other queues
(3) how old the oldest/newest requests of each of the other queues are
(4) priorities between the queues

The select_queue_to_process() function selects a queue based on:
(1) how old the oldest/newest requests of each queue are
(2) priorities between the queues
(3) the average throughput/latency per queue
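
To make these heuristics a bit more concrete, here is a minimal C sketch
(this is not code from any prototype we have posted; the bookkeeping
fields, the fixed queue array and the budget/age thresholds are all
arbitrary assumptions for illustration, and the throughput/latency
statistics are left out for brevity):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue bookkeeping kept by the shared I/O thread. */
struct io_queue {
    unsigned int pending;        /* requests currently queued              */
    unsigned int handled_last;   /* requests handled in the current round  */
    uint64_t oldest_req_ns;      /* arrival time of the oldest request     */
    int priority;                /* higher value = more important          */
};

#define NR_QUEUES     16                  /* one queue per virtual device  */
#define QUEUE_BUDGET  32                  /* max requests per round        */
#define MAX_AGE_NS    (2 * 1000 * 1000)   /* 2ms: switch if others starve  */

static struct io_queue queues[NR_QUEUES];

/* (1) budget for the current queue exhausted, (2)+(3) another queue has
 * pending work that is getting old, (4) a higher-priority queue has
 * pending work. */
static bool should_change_queue(int cur, uint64_t now_ns)
{
    if (queues[cur].handled_last >= QUEUE_BUDGET)
        return true;

    for (int i = 0; i < NR_QUEUES; i++) {
        if (i == cur || queues[i].pending == 0)
            continue;
        if (now_ns - queues[i].oldest_req_ns > MAX_AGE_NS)
            return true;
        if (queues[i].priority > queues[cur].priority)
            return true;
    }
    return false;
}

/* Pick the non-empty queue whose oldest request, weighted by priority,
 * has waited longest; returns -1 if every queue is empty. */
static int select_queue_to_process(uint64_t now_ns)
{
    int best = -1;
    uint64_t best_score = 0;

    for (int i = 0; i < NR_QUEUES; i++) {
        if (queues[i].pending == 0)
            continue;
        uint64_t age = now_ns - queues[i].oldest_req_ns;
        uint64_t score = age * (uint64_t)(queues[i].priority + 1);
        if (best < 0 || score > best_score) {
            best = i;
            best_score = score;
        }
    }
    return best;
}

The thresholds and the scoring above are just placeholders; the point is
that all of this policy lives in user space and can look at the state of
every queue at once.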

The logic that selects which queue should be processed, and how many
requests should be processed from that specific queue, is implemented in
user space and depends on the ongoing I/O activity in all the queues. Note
that with this model you can process many queues in less than one
scheduler time slice.
With the traditional thread-per-device model, it is actually the Linux
scheduler that decides which queue will be processed and how many requests
will be processed for that specific queue (i.e. for how many cycles the
thread runs). Note that Linux has no information about the ongoing
activity and status of the queues. The scheduler only knows whether a
thread is waiting (empty queue) or ready to run (queue has data = event
signaled).
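
For completeness, when all the queues are empty the shared thread still
has to block somewhere. One possible way to do that (again just a sketch,
not code we have posted; NR_QUEUES and the per-queue eventfds are
assumptions) is to register one eventfd per queue in a single epoll set,
so the one shared thread sleeps until any device signals new data:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

#define NR_QUEUES 16

int main(void)
{
    int epfd = epoll_create1(0);
    int queue_efd[NR_QUEUES];

    /* One eventfd per virtio queue, all in one epoll instance owned by
     * the shared I/O thread. */
    for (int i = 0; i < NR_QUEUES; i++) {
        queue_efd[i] = eventfd(0, EFD_NONBLOCK);
        struct epoll_event ev = { .events = EPOLLIN, .data.u32 = i };
        epoll_ctl(epfd, EPOLL_CTL_ADD, queue_efd[i], &ev);
    }

    /* Pretend a device signaled queue 3 so this example terminates. */
    uint64_t one = 1;
    write(queue_efd[3], &one, sizeof(one));

    /* The shared thread sleeps here only when every queue is empty; after
     * one wakeup it can process many queues before blocking again. */
    struct epoll_event events[NR_QUEUES];
    int n = epoll_wait(epfd, events, NR_QUEUES, -1);
    for (int i = 0; i < n; i++) {
        uint64_t cnt;
        read(queue_efd[events[i].data.u32], &cnt, sizeof(cnt));
        printf("queue %u has data\n", events[i].data.u32);
        /* ...run the select_queue_to_process()/process loop here... */
    }
    return 0;
}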

Finally, with the shared-thread model you have significantly fewer
thread/process context switches compared to one I/O thread per qemu
process. You also make the Linux scheduler's life easier ;)



