On Wed, Jul 06, 2016 at 10:33:07AM -0700, Maxim Patlasov wrote:
> On 07/06/2016 02:26 AM, Vladimir Davydov wrote:
>
> >On Tue, Jul 05, 2016 at 04:45:10PM -0700, Maxim Patlasov wrote:
> >>Vova,
> >>
> >>
> >>On 07/04/2016 11:03 AM, Maxim Patlasov wrote:
> >>>On 07/04/2016 08:53 AM, Vladimir Davydov wrote:
> >>>
> >>>>On Tue, Jun 28, 2016 at 03:48:54PM -0700, Maxim Patlasov wrote:
> >>>>...
> >>>>>@@ -643,6 +643,7 @@ static struct cgroup_subsys_state
> >>>>>*ve_create(struct cgroup *cg)
> >>>>>      ve->odirect_enable = 2;
> >>>>>      ve->fsync_enable = 2;
> >>>>>+     ve->experimental_fs_enable = 2;
> >>>>For odirect_enable and fsync_enable, 2 means follow the host's config, 1
> >>>>means enable unconditionally, and 0 means disable unconditionally. But
> >>>>we don't want to allow a user inside a CT to enable this feature, right?
> >>>I thought it's OK to allow user inside CT to enable it if host sysadmin is
> >>>OK about it. The same logic as for odirect: by default
> >>>ve0->experimental_fs_enable = 0, so whatever user inside CT writes to this
> >>>knob, the feature is disabled. If sysadmin writes '1' to ve0->..., the
> >>>feature becomes enabled. If an user wants to voluntarily disable it inside
> >>>CT, that's OK too.
> >>>
> >>>>This is confusing. May be, we'd better add a new VE_FEATURE for the
> >>>>purpose?
> >>>Not sure right now. I'll look at it and let you know later.
> >>Technically, it is very easy to implement new VE_FEATURE for overlayfs. But
> >>this approach is less flexible because we return EPERM from ve_write_64 if
> >>CT is running, and we'll need to involve userspace team to make the feature
> >>configurable and (possibly) persistent. Do you think it's worthy for
> >>something we'll get rid of soon anyway (I mean as soon as PSBM-47981
> >>resolved)?
> >Fair enough, not much point in introducing yet another feature for the
> >purpose, at least right now, sysctl should do for the beginning.
> >
> >Come to think of it, do we really need this sysctl inside containers? I
> >mean, by enabling this sysctl on the host we open a possible system-wide
> >security hole, which a CT admin won't be able to mitigate by disabling
> >overlayfs inside her CT. So why would she need it for? To prevent
> >non-privileged CT users from mounting overlayfs inside a user ns? But
> >overlayfs is not permitted to be mounted by a userns root anyway AFAICS.
> >May be, just drop in-CT sysctl then?
>
> Currently, anyone who can login into CT as root may mount overlayfs, then
> try to exploit its weak sides. This is a problem.
>
> Until we ensure that overlayfs is production-ready (at least does not have
> obvious breaches), let's disable it by default (of course, if ve != ve0).
> Those who want to play with overlayfs at their own risk will enable it by
> turning on some knob on host system (ve == ve0).
>
> I don't think that mixing trusted (overlayfs-enabled) CTs and not trusted
> (overlayfs-disabled) CTs on the same physical node is important use-case for
> now. So, any simple system-wide knob must work.

<nod>

> Essentially, the same scheme
> with odirect: by default it is '0' in ve0 and the root inside CT cannot turn
> it on; and if it is manually set to '1' in ve0, the behavior will depend on
> per-CT root willing.

No, that's not how it works. AFAICS (see may_use_odirect),

  ve0 sysctl    ve sysctl    odirect allowed in ve?
      x             0                  0
      x             1                  1
      x             2                  x

i.e. system-wide sysctl can't be used to disallow odirect inside a VE,
while you want a different behavior AFAIU - you want to enable overlayfs
if both ve0 sysctl and ve sysctl are set. That's why the patch looks
confusing to me.

Let's only leave system-wide sysctl for permitting overlayfs. VE sysctl
doesn't make any sense - only root user is allowed to mount overlayfs
inside a CT and she can set this sysctl anyway.
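
To make the difference between the semantics discussed in this thread concrete, here is a minimal sketch in C. It is only a model of the odirect table and of the two overlayfs policies under discussion; it is not the actual may_use_odirect() code. All function names are hypothetical, and ve0's value is assumed to be 0 or 1.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the odirect table (NOT the kernel's
 * may_use_odirect()).  Assumes ve0_sysctl is 0 or 1 and
 * ve_sysctl is 0, 1 or 2.
 */
static bool odirect_allowed(int ve0_sysctl, int ve_sysctl)
{
	if (ve_sysctl == 2)		/* 2: follow the host (ve0) setting */
		return ve0_sysctl != 0;
	return ve_sysctl != 0;		/* 0 or 1: the VE decides on its own */
}

/*
 * Hypothetical model of "enabled only if both knobs are set",
 * i.e. the behavior the patch appears to aim for.
 */
static bool overlayfs_allowed_both(int ve0_sysctl, int ve_sysctl)
{
	return ve0_sysctl != 0 && ve_sysctl != 0;
}

/* Hypothetical model of the system-wide-only knob. */
static bool overlayfs_allowed_host_only(int ve0_sysctl)
{
	return ve0_sysctl != 0;
}

/* Walks the rows of the odirect table for both host values. */
static void policy_self_check(void)
{
	for (int ve0 = 0; ve0 <= 1; ve0++) {
		assert(odirect_allowed(ve0, 0) == false);
		assert(odirect_allowed(ve0, 1) == true);
		assert(odirect_allowed(ve0, 2) == (ve0 != 0));
	}
}
```

The case that separates the models is ve0 = 0 with ve = 1: the odirect-style function allows the feature, while both overlayfs variants refuse it.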