On Tue, Jun 2, 2020, at 5:55 AM, Stefan Hajnoczi wrote:
>
> Ping Colin. It would be great if you have time to share your thoughts on
> this discussion and explain how you are using this patch.
Yeah, sorry about not replying in this thread earlier; this was just a quick
Friday side project for me, and the thread obviously exploded =)
Thinking about this more, what would probably be good enough for now is an
option to simply disable the internal containerization/sandboxing. As mentioned
in the discussion, our production pipeline runs inside OpenShift 4, and because
Kubernetes doesn't support user namespaces yet, it also doesn't support
recursive containerization. So we need an option to turn off the internal
containerization.
Our use case is somewhat specialized: for what we're doing, we generally trust
the guest. We use VMs for operating system testing and for developing content
we trust, as opposed to something like Kata.
It's fine for us to run virtiofsd in the same user/security context as qemu.
So... something like this? (Only compile-tested.)
diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index 1240828208..603773c505 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -51,6 +51,7 @@ struct fuse_session {
     int fd;
     int debug;
     int deny_others;
+    int no_namespaces;
     struct fuse_lowlevel_ops op;
     int got_init;
     struct cuse_data *cuse_data;
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 2dd36ec03b..263134f792 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2522,6 +2522,7 @@ static const struct fuse_opt fuse_ll_opts[] = {
     LL_OPTION("-d", debug, 1),
     LL_OPTION("--debug", debug, 1),
     LL_OPTION("allow_root", deny_others, 1),
+    LL_OPTION("--no-namespaces", no_namespaces, 1),
     LL_OPTION("--socket-path=%s", vu_socket_path, 0),
     LL_OPTION("--fd=%d", vu_listen_fd, 0),
     LL_OPTION("--thread-pool-size=%d", thread_pool_size, 0),
@@ -2542,6 +2543,7 @@ void fuse_lowlevel_help(void)
      */
     printf(
         " -o allow_root allow access by root\n"
+        " --no-namespaces Disable the internal sandbox (namespaces, mounts, seccomp)\n"
         " --socket-path=PATH path for the vhost-user socket\n"
         " --fd=FDNUM fd number of vhost-user socket\n"
         " --thread-pool-size=NUM thread pool size limit (default %d)\n",
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 3ba1d90984..7c54a9cde3 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2775,6 +2775,9 @@ static void setup_capabilities(void)
 static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
                           bool enable_syslog)
 {
+    if (se->no_namespaces) {
+        return;
+    }
     setup_namespaces(lo, se);
     setup_mounts(lo->source);
     setup_seccomp(enable_syslog);