On 07/07/2016 04:10 AM, Vladimir Davydov wrote:

On Wed, Jul 06, 2016 at 10:33:07AM -0700, Maxim Patlasov wrote:
On 07/06/2016 02:26 AM, Vladimir Davydov wrote:

On Tue, Jul 05, 2016 at 04:45:10PM -0700, Maxim Patlasov wrote:
Vova,


On 07/04/2016 11:03 AM, Maxim Patlasov wrote:
On 07/04/2016 08:53 AM, Vladimir Davydov wrote:

On Tue, Jun 28, 2016 at 03:48:54PM -0700, Maxim Patlasov wrote:
...
@@ -643,6 +643,7 @@ static struct cgroup_subsys_state
*ve_create(struct cgroup *cg)
        ve->odirect_enable = 2;
      ve->fsync_enable = 2;
+    ve->experimental_fs_enable = 2;
For odirect_enable and fsync_enable, 2 means follow the host's config, 1
means enable unconditionally, and 0 means disable unconditionally. But
we don't want to allow a user inside a CT to enable this feature, right?
I thought it's OK to allow user inside CT to enable it if host sysadmin is
OK about it. The same logic as for odirect: by default
ve0->experimental_fs_enable = 0, so whatever user inside CT writes to this
knob, the feature is disabled. If sysadmin writes '1' to ve0->..., the
feature becomes enabled. If an user wants to voluntarily disable it inside
CT, that's OK too.

This is confusing. May be, we'd better add a new VE_FEATURE for the
purpose?
Not sure right now. I'll look at it and let you know later.
Technically, it is very easy to implement new VE_FEATURE for overlayfs. But
this approach is less flexible because we return EPERM from ve_write_64 if
CT is running, and we'll need to involve userspace team to make the feature
configurable and (possibly) persistent. Do you think it's worthy for
something we'll get rid of soon anyway (I mean as soon as PSBM-47981
resolved)?
Fair enough, not much point in introducing yet another feature for the
purpose, at least right now, sysctl should do for the beginning.

Come to think of it, do we really need this sysctl inside containers? I
mean, by enabling this sysctl on the host we open a possible system-wide
security hole, which a CT admin won't be able to mitigate by disabling
overlayfs inside her CT. So why would she need it for? To prevent
non-privileged CT users from mounting overlayfs inside a user ns? But
overlayfs is not permitted to be mounted by a userns root anyway AFAICS.
May be, just drop in-CT sysctl then?
Currently, anyone who can login into CT as root may mount overlayfs, then
try to exploit its weak sides. This is a problem.

Until we ensure that overlayfs is production-ready (at least does not have
obvious breaches), let's disable it by default (of course, if ve != ve0).
Those who want to play with overlayfs at their own risk will enable it by
turning on some knob on host system (ve == ve0).

I don't think that mixing trusted (overlayfs-enabled) CTs and not trusted
(overlayfs-disabled) CTs on the same physical node is important use-case for
now. So, any simple system-wide knob must work.
<nod>

Essentially, the same scheme
with odirect: by default it is '0' in ve0 and the root inside CT cannot turn
it on; and if it is manually set to '1' in ve0, the behavior will depend on
per-CT root willing.
No, that's not how it works. AFAICS (see may_use_odirect),

   ve0 sysctl   ve sysctl       odirect allowed in ve?
   x            0               0
   x            1               1
   x            2               x

i.e. system-wide sysctl can't be used to disallow odirect inside a VE,
while you want a different behavior AFAIU - you want to enable overlayfs
if both ve0 sysctl and ve sysctl are set. That's why the patch looks
confusing to me.

Oh, yeah, it's my fault -- I didn't read may_use_odirect() carefully enough. Now I see it checks ve0 only if per-CT sysctl is '2'.


Let's only leave system-wide sysctl for permitting overlayfs. VE sysctl
doesn't make any sense - only root user is allowed to mount overlayfs
inside a CT and she can set this sysctl anyway.

OK, I agree.
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel

Reply via email to