Re: [libvirt] [PATCH] [RFC] virSetUIDGID: Don't leak supplementary groups

2015-11-17 Thread Richard Weinberger
On Wed, Jun 24, 2015 at 11:19 AM, Martin Kletzander  wrote:
> On Tue, Jun 23, 2015 at 01:48:42PM +0200, Richard Weinberger wrote:
>>
>> The LXC driver uses virSetUIDGID() to become UID/GID 0.
>> It passes an empty groups list to virSetUIDGID()
>> to get rid of all supplementary groups from the host side.
>> But virSetUIDGID() calls setgroups() only if the supplied list
>> is larger than 0.
>> This leads to a container root with unrelated supplementary groups.
>> In most cases this issue goes unnoticed as libvirtd runs as UID/GID 0
>> without any supplementary groups.
>>
>> Signed-off-by: Richard Weinberger 
>> ---
>> I've marked that patch as RFC as I'm not sure if all users of
>> virSetUIDGID() expect this behavior too.
>>
>
> I went through the callers and I see no reason why setgroups should
> not be called.  ACK.  I also can't think of a use case in which we'd
> like to keep the supplemental groups.

Ping?

-- 
Thanks,
//richard

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] lxc: Don't make container's TTY a controlling TTY

2015-11-17 Thread Richard Weinberger
On Tue, Jun 23, 2015 at 3:18 PM, Richard Weinberger  wrote:
> Userspace does not expect that the initial console
> is a controlling TTY. systemd can deal with that, others cannot.
> On sysv init distros getty will fail to get a controlling TTY on
> /dev/console or /dev/tty1, which will cause the whole container
> to reboot upon ctrl-c.
>
> This patch changes the behavior of libvirt to match the kernel
> behavior where the initial TTY is also not controlling.
>
> The only user visible change should be that a container with
> bash as PID 1 would complain. But this matches exactly the kernel
> behavior with init=/bin/bash.
> To get a controlling TTY for bash just run "setsid /bin/bash".
>
> Signed-off-by: Richard Weinberger 
> ---
>  src/lxc/lxc_container.c | 14 +-
>  1 file changed, 1 insertion(+), 13 deletions(-)
>
> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> index 11e9514..7d531e2 100644
> --- a/src/lxc/lxc_container.c
> +++ b/src/lxc/lxc_container.c
> @@ -278,18 +278,6 @@ static int lxcContainerSetupFDs(int *ttyfd,
>"as the FDs are about to be closed for exec of "
>"the container init process");
>
> -if (setsid() < 0) {
> -virReportSystemError(errno, "%s",
> - _("setsid failed"));
> -goto cleanup;
> -}
> -
> -if (ioctl(*ttyfd, TIOCSCTTY, NULL) < 0) {
> -virReportSystemError(errno, "%s",
> - _("ioctl(TIOCSCTTY) failed"));
> -goto cleanup;
> -}
> -
>  if (dup2(*ttyfd, STDIN_FILENO) < 0) {
>  virReportSystemError(errno, "%s",
>   _("dup2(stdin) failed"));
> @@ -2210,7 +2198,7 @@ static int lxcContainerChild(void *data)
>
>  VIR_DEBUG("Container TTY path: %s", ttyPath);
>
> -ttyfd = open(ttyPath, O_RDWR|O_NOCTTY);
> +ttyfd = open(ttyPath, O_RDWR);
>  if (ttyfd < 0) {
>  virReportSystemError(errno,
>   _("Failed to open tty %s"),

Ping?

-- 
Thanks,
//richard



Re: [libvirt] [PATCH] lxc: Bind mount container TTYs

2015-11-17 Thread Richard Weinberger
On Fri, Jul 3, 2015 at 1:55 PM, Martin Kletzander  wrote:
> On Tue, Jun 23, 2015 at 04:38:57PM +0200, Richard Weinberger wrote:
>>
>> Instead of creating symlinks, bind mount the devices to
>> /dev/pts/XY.
>> Using bind mounts it is no longer needed to add pts devices
>> to files like /dev/securetty.
>>
>> Signed-off-by: Richard Weinberger 
>> ---
>> src/lxc/lxc_container.c | 38 +-
>> 1 file changed, 21 insertions(+), 17 deletions(-)
>>
>
> I spent a ridiculously excessive amount of time on this not working for
> me, but I just figured out that a simple check of whether that file is a
> symlink or not (inside the container) is enough.  And it works.
>
> ACK then, sorry for wasting your time with this as well.
>
>
>> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
>> index 7d531e2..ea76370 100644
>> --- a/src/lxc/lxc_container.c
>> +++ b/src/lxc/lxc_container.c
>> @@ -1141,6 +1141,20 @@ static int lxcContainerMountFSDevPTS(virDomainDefPtr def,
>> return ret;
>> }
>>
>> +static int lxcContainerBindMountDevice(const char *src, const char *dst)
>> +{
>> +if (virFileTouch(dst, 0666) < 0)
>> +return -1;
>> +
>> +if (mount(src, dst, "none", MS_BIND, NULL) < 0) {
>> +virReportSystemError(errno, _("Failed to bind %s on to %s"), src,
>> + dst);
>> +return -1;
>> +}
>> +
>> +return 0;
>> +}
>> +
>> static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths)
>> {
>> size_t i;
>> @@ -1164,34 +1178,24 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths)
>> }
>>
>> /* We have private devpts capability, so bind that */
>> -if (virFileTouch("/dev/ptmx", 0666) < 0)
>> +if (lxcContainerBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0)
>> return -1;
>>
>> -if (mount("/dev/pts/ptmx", "/dev/ptmx", "ptmx", MS_BIND, NULL) < 0) {
>> -virReportSystemError(errno, "%s",
>> - _("Failed to bind /dev/pts/ptmx on to /dev/ptmx"));
>> -return -1;
>> -}
>> -
>> for (i = 0; i < nttyPaths; i++) {
>> char *tty;
>> if (virAsprintf(&tty, "/dev/tty%zu", i+1) < 0)
>> return -1;
>> -if (symlink(ttyPaths[i], tty) < 0) {
>> -virReportSystemError(errno,
>> - _("Failed to symlink %s to %s"),
>> - ttyPaths[i], tty);
>> -VIR_FREE(tty);
>> +
>> +if (lxcContainerBindMountDevice(ttyPaths[i], tty) < 0) {
>> +VIR_FREE(tty);
>> return -1;
>> }
>> +
>> VIR_FREE(tty);
>> +
>> if (i == 0 &&
>> -symlink(ttyPaths[i], "/dev/console") < 0) {
>> -virReportSystemError(errno,
>> - _("Failed to symlink %s to /dev/console"),
>> - ttyPaths[i]);
>> +lxcContainerBindMountDevice(ttyPaths[i], "/dev/console") < 0)
>> return -1;
>> -}
>> }
>> return 0;
>> }
>> --
>> 2.4.2
>>

Ping?

-- 
Thanks,
//richard



Re: [libvirt] [CFT][PATCH 00/10] Making new mounts of proc and sysfs as safe as bind mounts (take 2)

2015-05-29 Thread Richard Weinberger
[CC'ing libvirt-lxc folks]

On 28.05.2015 at 23:32, Eric W. Biederman wrote:
> Richard Weinberger  writes:
> 
>> On 28.05.2015 at 21:57, Eric W. Biederman wrote:
>>>> FWIW, it breaks also libvirt-lxc:
>>>> Error: internal error: guest failed to start: Failed to re-mount /proc/sys 
>>>> on /proc/sys flags=1021: Operation not permitted
>>>
>>> Interesting.  I had not anticipated a failure there?  And it is failing
>>> in remount?  Oh that is interesting.
>>>
>>> That implies that there is some flag of the original mount of /proc that
>>> the remount of /proc/sys is clearing, and that previously 
>>>
>>> The flags specified are currently rdonly,remount,bind so I expect there
>>> are some other flags on proc that libvirt-lxc is clearing by accident
>>> and we did not fail before because the kernel was not enforcing things.
>>
>> Please see:
>> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l933
>> lxcContainerMountBasicFS()
>>
>> and:
>> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l850
>> lxcBasicMounts
>>
>>> What are the mount flags in a working libvirt-lxc?
>>
>> See:
>> test1:~ # cat /proc/self/mountinfo
>> 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>> 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw
> 
>> If you need more info, please let me know. :-)
> 
> Oh interesting I had not realized libvirt-lxc had grown an unprivileged
> mode using user namespaces.
> 
> This does appear to be a classic remount bug, where you are not
> preserving the permissions.  It appears the fact that the code
> failed to enforce locked permissions on the fresh mount of proc
> was hiding this bug until now.
> 
> I expect what you actually want is the code below:
> 
> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> index 9a9ae5c2aaf0..f008a7484bfe 100644
> --- a/src/lxc/lxc_container.c
> +++ b/src/lxc/lxc_container.c
> @@ -850,7 +850,7 @@ typedef struct {
>  
>  static const virLXCBasicMountInfo lxcBasicMounts[] = {
>  { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
> -{ "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
> +{ "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
>  { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
>  { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
>  { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
> 
> Or possibly just:
> 
> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> index 9a9ae5c2aaf0..a60ccbd12bfc 100644
> --- a/src/lxc/lxc_container.c
> +++ b/src/lxc/lxc_container.c
> @@ -850,7 +850,7 @@ typedef struct {
>  
>  static const virLXCBasicMountInfo lxcBasicMounts[] = {
>  { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
> -{ "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
> +{ "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false },
>  { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
>  { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
>  { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
> 
> There is little point in making /proc/sys read-only in a
> user-namespace, as the permission checks are uid based and no-one should
> have the global uid 0 in your container, which makes mounting /proc/sys
> read-only rather pointless.

Eric, using the patch below I was able to spawn a user-namespace enabled
container using libvirt-lxc. :-)

I had to:
1. Disable the read-only mount of /proc/sys, which is useless anyway in the
   user-namespace case.
2. Disable the /proc/sys/net/ipv{4,6} bind mounts; this ugly hack is only
   needed for the non user-namespace case.
3. Remove MS_RDONLY from the sysfs mount (for the non user-namespace case we'd
   have to keep this, though).

Daniel, I'd ta

Re: [libvirt] [CFT][PATCH 00/10] Making new mounts of proc and sysfs as safe as bind mounts (take 2)

2015-06-16 Thread Richard Weinberger
On 16.06.2015 at 14:31, Daniel P. Berrange wrote:
> Thanks Richard / Eric for the suggested patches. I'll apply Eric's
> simplified patch to libvirt now, and backport it to our stable
> libvirt branches.

Thank you Daniel!



[libvirt] lxc: setsid() usage

2015-06-22 Thread Richard Weinberger
Hi!

Why is libvirt-lxc issuing a setsid() in lxcContainerSetupFDs()?
To me it seems like a hack to have a controlling TTY if PID 1 is /bin/bash.

If one runs a sysv init style distro (like Debian) in libvirt-lxc, the setsid()
has a major downside: when getty spawns a login shell on /dev/tty1, it cannot
become the controlling tty. Hence, if one presses ctrl-c in such a session, the
container will reboot.

Interestingly it does not happen when a systemd distro is used.
Maybe because systemd completely closes and reopens the TTY?

Thanks,
//richard
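
A small sketch that may help when debugging the getty case described above: it
checks whether the calling process has a controlling TTY at all. It relies on
/dev/tty only being openable when a controlling terminal exists; the helper
name is hypothetical and this is not libvirt code.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the calling process has a controlling terminal, 0 if it
 * has none, and -1 on any other error (for example a missing /dev/tty). */
static int has_controlling_tty(void)
{
    int fd = open("/dev/tty", O_RDWR | O_NOCTTY);

    if (fd >= 0) {
        close(fd);
        return 1;
    }

    /* Linux reports ENXIO when there is no controlling terminal. */
    return errno == ENXIO ? 0 : -1;
}
```

Running such a check from the login shell inside the container would show
whether getty managed to set one up.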



[libvirt] [PATCH] [RFC] virSetUIDGID: Don't leak supplementary groups

2015-06-23 Thread Richard Weinberger
The LXC driver uses virSetUIDGID() to become UID/GID 0.
It passes an empty groups list to virSetUIDGID()
to get rid of all supplementary groups from the host side.
But virSetUIDGID() calls setgroups() only if the supplied list
is larger than 0.
This leads to a container root with unrelated supplementary groups.
In most cases this issue goes unnoticed as libvirtd runs as UID/GID 0
without any supplementary groups.

Signed-off-by: Richard Weinberger 
---
I've marked that patch as RFC as I'm not sure if all users of virSetUIDGID()
expect this behavior too.

Thanks,
//richard
---
 src/util/virutil.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/virutil.c b/src/util/virutil.c
index cddc78a..ea697a3 100644
--- a/src/util/virutil.c
+++ b/src/util/virutil.c
@@ -1103,7 +1103,7 @@ virSetUIDGID(uid_t uid, gid_t gid, gid_t *groups ATTRIBUTE_UNUSED,
 }
 
 # if HAVE_SETGROUPS
-if (ngroups && setgroups(ngroups, groups) < 0) {
+if (setgroups(ngroups, groups) < 0) {
 virReportSystemError(errno, "%s",
  _("cannot set supplemental groups"));
 return -1;
-- 
2.4.2
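
As a stand-alone illustration of the behavior this patch is after: calling
setgroups() with a size of 0 clears the supplementary group list, so an empty
list has to be passed on rather than skipped. This is only a sketch; the helper
name is hypothetical, and the call needs CAP_SETGID (it fails with EPERM
otherwise).

```c
#include <errno.h>
#include <grp.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not libvirt code): drop every supplementary group.
 * Returns 0 on success, -1 with errno set by setgroups() on failure. */
static int drop_supplementary_groups(void)
{
    /* A size of 0 clears the list; the pointer is not dereferenced. */
    if (setgroups(0, NULL) < 0) {
        perror("setgroups");
        return -1;
    }
    return 0;
}
```

On Linux, getgroups(0, NULL) reports 0 afterwards, which is an easy way to
verify the result.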



Re: [libvirt] lxc: setsid() usage

2015-06-23 Thread Richard Weinberger
On 22.06.2015 at 16:51, Daniel P. Berrange wrote:
> Also note systemd uses the device via /dev/console, not /dev/tty1
> and with 'container_ttys' we've told it not to use /dev/tty1 for
> gettys.  So maybe it deals with /dev/console in a different way
> than it would if it were /dev/tty1

BTW: Why are /dev/console and /dev/tty1 symlinks?
IMHO we could make them bind mounts to /dev/pts/XY.

That way one does not have to insert "pts/0" and such to /etc/securetty.

Thanks,
//richard



Re: [libvirt] lxc: setsid() usage

2015-06-23 Thread Richard Weinberger
On 22.06.2015 at 16:51, Daniel P. Berrange wrote:
> On Mon, Jun 22, 2015 at 04:40:37PM +0200, Richard Weinberger wrote:
>> Hi!
>>
>> Why is libvirt-lxc issuing a setsid() in lxcContainerSetupFDs()?
>> To me it seems like a hack to have a controlling TTY if PID 1 is /bin/bash.
> 
> I honestly can't remember the reason. It might have been to ensure we have
> separation from the libvirt_lxc session.

Hm, can be.

>> If one runs a sysv init style distro (like Debian) in libvirt-lxc the setsid()
>> has a major downside, when getty spawns a login shell on /dev/tty1 it cannot
>> become the controlling tty. Hence, if one presses ctrl-c in such a session, the
>> container will reboot.
> 
> Is that problem due to the fact we call setsid(), or due to us calling
> ioctl(TIOCSCTTY) ?

If I remove the TIOCSCTTY nothing changes.
Without setsid() libvirt is unable to start the container at all.
So, I fear you're right that it has something to do with the libvirt session.

>> Interestingly it does not happen when a systemd distro is used.
>> Maybe because systemd completely closes and reopens the TTY?
> 
> I have a feeling it does close & reopen the tty, but i dunno if
> that has an impact on the ability to set the controlling tty ?

My TTY-fu is not strong enough to answer that question.

> Also note systemd uses the device via /dev/console, not /dev/tty1
> and with 'container_ttys' we've told it not to use /dev/tty1 for
> gettys.  So maybe it deals with /dev/console in a different way
> than it would if it were /dev/tty1

This can also be. If I change Debian's getty to use /dev/console instead
of /dev/tty1 it is still unable to spawn a controlling tty.

Thanks,
//richard



Re: [libvirt] lxc: setsid() usage

2015-06-23 Thread Richard Weinberger
On 23.06.2015 at 14:18, Richard Weinberger wrote:
> On 22.06.2015 at 16:51, Daniel P. Berrange wrote:
>> On Mon, Jun 22, 2015 at 04:40:37PM +0200, Richard Weinberger wrote:
>>> Hi!
>>>
>>> Why is libvirt-lxc issuing a setsid() in lxcContainerSetupFDs()?
>>> To me it seems like a hack to have a controlling TTY if PID 1 is /bin/bash.
>>
>> I honestly can't remember the reason. It might have been to ensure we have
>> separation from the libvirt_lxc session.
> 
> Hm, can be.
> 
>>> If one runs a sysv init style distro (like Debian) in libvirt-lxc the
>>> setsid() has a major downside, when getty spawns a login shell on /dev/tty1
>>> it cannot become the controlling tty. Hence, if one presses ctrl-c in such a
>>> session, the container will reboot.
>>
>> Is that problem due to the fact we call setsid(), or due to us calling
>> ioctl(TIOCSCTTY) ?
> 
> If I remove the TIOCSCTTY nothing changes.
> Without setsid() libvirt is unable to start the container at all.
> So, I fear you're right that it has something to do with the libvirt session.

Found a way to deal with that. Patch is on the way.
The setsid() really only seems to be there to have a controlling TTY if PID 1
is bash.

Thanks,
//richard



[libvirt] [PATCH] lxc: Don't make container's TTY a controlling TTY

2015-06-23 Thread Richard Weinberger
Userspace does not expect that the initial console
is a controlling TTY. systemd can deal with that, others cannot.
On sysv init distros getty will fail to get a controlling TTY on
/dev/console or /dev/tty1, which will cause the whole container
to reboot upon ctrl-c.

This patch changes the behavior of libvirt to match the kernel
behavior where the initial TTY is also not controlling.

The only user visible change should be that a container with
bash as PID 1 would complain. But this matches exactly the kernel
behavior with init=/bin/bash.
To get a controlling TTY for bash just run "setsid /bin/bash".

Signed-off-by: Richard Weinberger 
---
 src/lxc/lxc_container.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 11e9514..7d531e2 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -278,18 +278,6 @@ static int lxcContainerSetupFDs(int *ttyfd,
   "as the FDs are about to be closed for exec of "
   "the container init process");
 
-if (setsid() < 0) {
-virReportSystemError(errno, "%s",
- _("setsid failed"));
-goto cleanup;
-}
-
-if (ioctl(*ttyfd, TIOCSCTTY, NULL) < 0) {
-virReportSystemError(errno, "%s",
- _("ioctl(TIOCSCTTY) failed"));
-goto cleanup;
-}
-
 if (dup2(*ttyfd, STDIN_FILENO) < 0) {
 virReportSystemError(errno, "%s",
  _("dup2(stdin) failed"));
@@ -2210,7 +2198,7 @@ static int lxcContainerChild(void *data)
 
 VIR_DEBUG("Container TTY path: %s", ttyPath);
 
-ttyfd = open(ttyPath, O_RDWR|O_NOCTTY);
+ttyfd = open(ttyPath, O_RDWR);
 if (ttyfd < 0) {
 virReportSystemError(errno,
  _("Failed to open tty %s"),
-- 
2.4.2
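
For reference, the pair of calls removed above is the usual way for a process
to claim a TTY as its controlling terminal: become a session leader, then issue
TIOCSCTTY on the TTY. A stand-alone sketch, with a hypothetical helper name and
without libvirt's error reporting:

```c
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: make ttyfd the controlling terminal of a fresh
 * session. Returns 0 on success, -1 with errno set on failure. */
static int acquire_controlling_tty(int ttyfd)
{
    /* Start a new session; this fails if the caller already leads a
     * process group. */
    if (setsid() < 0)
        return -1;

    /* Attach the TTY to the new session as its controlling terminal. */
    if (ioctl(ttyfd, TIOCSCTTY, 0) < 0)
        return -1;

    return 0;
}
```

With the patch applied, the container's init no longer does this on behalf of
whatever it spawns later, so getty is free to do it itself.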



[libvirt] [PATCH] lxc: Bind mount container TTYs

2015-06-23 Thread Richard Weinberger
Instead of creating symlinks, bind mount the devices to
/dev/pts/XY.
Using bind mounts it is no longer needed to add pts devices
to files like /dev/securetty.

Signed-off-by: Richard Weinberger 
---
 src/lxc/lxc_container.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 7d531e2..ea76370 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -1141,6 +1141,20 @@ static int lxcContainerMountFSDevPTS(virDomainDefPtr def,
 return ret;
 }
 
+static int lxcContainerBindMountDevice(const char *src, const char *dst)
+{
+if (virFileTouch(dst, 0666) < 0)
+return -1;
+
+if (mount(src, dst, "none", MS_BIND, NULL) < 0) {
+virReportSystemError(errno, _("Failed to bind %s on to %s"), src,
+ dst);
+return -1;
+}
+
+return 0;
+}
+
 static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths)
 {
 size_t i;
@@ -1164,34 +1178,24 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths)
 }
 
 /* We have private devpts capability, so bind that */
-if (virFileTouch("/dev/ptmx", 0666) < 0)
+if (lxcContainerBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0)
 return -1;
 
-if (mount("/dev/pts/ptmx", "/dev/ptmx", "ptmx", MS_BIND, NULL) < 0) {
-virReportSystemError(errno, "%s",
- _("Failed to bind /dev/pts/ptmx on to /dev/ptmx"));
-return -1;
-}
-
 for (i = 0; i < nttyPaths; i++) {
 char *tty;
 if (virAsprintf(&tty, "/dev/tty%zu", i+1) < 0)
 return -1;
-if (symlink(ttyPaths[i], tty) < 0) {
-virReportSystemError(errno,
- _("Failed to symlink %s to %s"),
- ttyPaths[i], tty);
-VIR_FREE(tty);
+
+if (lxcContainerBindMountDevice(ttyPaths[i], tty) < 0) {
+VIR_FREE(tty);
 return -1;
 }
+
 VIR_FREE(tty);
+
 if (i == 0 &&
-symlink(ttyPaths[i], "/dev/console") < 0) {
-virReportSystemError(errno,
- _("Failed to symlink %s to /dev/console"),
- ttyPaths[i]);
+lxcContainerBindMountDevice(ttyPaths[i], "/dev/console") < 0)
 return -1;
-}
 }
 return 0;
 }
-- 
2.4.2
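
A stand-alone sketch of what the new helper does, for readers without the
libvirt tree at hand: a bind mount of a single file needs an existing target,
so the target is created first and the source is then mounted on top of it.
Here open(2) with O_CREAT stands in for virFileTouch() and perror() for
virReportSystemError(); the function name is hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical stand-alone variant of the helper: make sure dst exists,
 * then bind mount src on top of it. Returns 0 on success, -1 on error. */
static int bind_mount_device(const char *src, const char *dst)
{
    /* O_NOCTTY avoids picking up a controlling TTY if dst is a TTY. */
    int fd = open(dst, O_WRONLY | O_CREAT | O_NOCTTY, 0666);

    if (fd < 0) {
        perror(dst);
        return -1;
    }
    close(fd);

    /* For MS_BIND the filesystem type argument is ignored. */
    if (mount(src, dst, NULL, MS_BIND, NULL) < 0) {
        perror("mount");
        return -1;
    }

    return 0;
}
```

Unlike a symlink, the bind mounted node keeps its own name, which is why tools
that look up the TTY name should see /dev/tty1 rather than the pts path.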



Re: [libvirt] [PATCH] lxc: Bind mount container TTYs

2015-06-28 Thread Richard Weinberger
On 26.06.2015 at 15:09, Martin Kletzander wrote:
> On Tue, Jun 23, 2015 at 04:38:57PM +0200, Richard Weinberger wrote:
>> Instead of creating symlinks, bind mount the devices to
>> /dev/pts/XY.
>> Using bind mounts it is no longer needed to add pts devices
>> to files like /dev/securetty.
>>
> 
> I guess you meant /etc/securetty.

Lol, yes. :-)

> This patch makes sense, but if I start a container that I couldn't
> login as a root into (because of securetty), it still doesn't help, I
> still can't login.  Moreover, if I stop it and start it few times and
> restart the daemon (I'm not sure whether that's needed, it's just that
> I had to switch between gdb and non-gdb daemons and it happened only
> sometimes), I get this:
> 
>  error: internal error: guest failed to start: unexpected exit status 125
> 
> The error in log is:
> 
>  libvirt:  error : failed to setup stdout file handle: Bad file descriptor
> 
> I briefly looked at it and *cmd->outfdptr has the value of 247083264
> which is nowhere in the output of lsof for that process.  I know that
> it doesn't sound even remotely related, but without this patch that
> doesn't happen.  Maybe it just uncovers some error rotting there for a
> long time...

Hmm, very strange. What guest container are you using?
I tried with a Debian jessie and had user namespace enabled.

Thanks,
//richard




Re: [libvirt] [PATCH] lxc: Bind mount container TTYs

2015-06-30 Thread Richard Weinberger
On 30.06.2015 at 19:12, Martin Kletzander wrote:
>> Hmm, very strange. What guest container are you using?
>> I tried with a Debian jessie and had user namespace enabled.
>>
> 
> Sorry for the late reply.  I used a simple one.  Only gentoo's stage 3
> unpacked into a directory, no special settings used for it.  Removing
> /etc/securetty works for me.  I'll give it another try, but probably
> after the freeze.  If anyone else wants to review this, don't get
> stopped by the problems I'm having!

Hmm, just gave gentoo a try, worked perfectly fine.
Can you share your xml?

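This is mine:

<domain type='lxc'>
  <name>gentoo</name>
  <memory>524288</memory>
  <os>
    <type>exe</type>
    <init>/sbin/init</init>
  </os>
  [remaining elements were stripped by the list archive]
</domain>
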
Thanks,
//richard




Re: [libvirt] [PATCH] lxc: Bind mount container TTYs

2015-07-01 Thread Richard Weinberger
On 01.07.2015 at 11:40, Martin Kletzander wrote:
> On Tue, Jun 30, 2015 at 07:54:25PM +0200, Richard Weinberger wrote:
>> On 30.06.2015 at 19:12, Martin Kletzander wrote:
>>>> Hmm, very strange. What guest container are you using?
>>>> I tried with a Debian jessie and had user namespace enabled.
>>>>
>>>
>>> Sorry for the late reply.  I used a simple one.  Only gentoo's stage 3
>>> unpacked into a directory, no special settings used for it.  Removing
>>> /etc/securetty works for me.  I'll give it another try, but probably
>>> after the freeze.  If anyone else wants to review this, don't get
>>> stopped by the problems I'm having!
>>
>> Hmm, just gave gentoo a try, worked perfectly fine.
> 
> I tried with latest master with and without your patch.  With your
> patch I got to the problem exactly once even though I tried multiple
> times.  And even though it didn't happen to me at all without your
> patch, I'm thinking it's just some weird rare race and it's not
> related to what you've sent.  That just wouldn't make sense to me.
> 
> I also suspected the problem being me starting with --console
> parameter, but trying with and without that didn't help isolate the
> problem either.

--console works fine here.

> Anyway, that patch still doesn't help me get rid of /etc/securetty.
> The output of 'tty' is still /dev/pts/0 and unless I remove
> /etc/securetty it doesn't start.  What is the output of 'tty' and what
> ttys do you have in /etc/securetty in your container?

tty prints as expected /dev/tty1. (instead of /dev/pts/xy)
/etc/securetty is from gentoo, I did not add anything.

Thanks,
//richard
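
To tell the two setups apart from inside the container, it may be enough to
check whether /dev/tty1 is still a symlink (old behavior) or a regular node
with a bind mount on top (with the patch). A minimal sketch using lstat(); the
helper name is hypothetical:

```c
#include <sys/stat.h>

/* Returns 1 if path is a symbolic link, 0 if it is anything else,
 * and -1 if it cannot be examined (for example if it does not exist). */
static int is_symlink(const char *path)
{
    struct stat st;

    /* lstat() examines the link itself instead of following it. */
    if (lstat(path, &st) < 0)
        return -1;

    return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

Calling is_symlink("/dev/tty1") in the guest would be expected to return 0 once
the bind mount variant is in effect.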




[libvirt] LXC broken on Linux >= 3.15

2014-07-28 Thread Richard Weinberger
Hi!

Kernel commit 23adbe12 ("fs,userns: Change inode_capable to
capable_wrt_inode_uidgid") uncovered a libvirt-lxc issue.
Starting with that commit the kernel correctly checks also the gid of an inode.

Sadly this change breaks libvirt-lxc in a way such that openpty() will always
fail with -EPERM within a container. Therefore ssh and other programs are no
longer usable.

Libvirt's virLXCControllerSetupDevPTS() has a hardcoded mount
string for mounting devpts, namely "newinstance,ptmxmode=0666,mode=0620,gid=5",
devpts correctly translates the uid and gid while mounting but libvirt
mounts devpts _before_ setting up the uid/gid mappings.
Therefore the internal gid for the new devpts instance is still 5 instead of
the mapped gid and the new check in the kernel will always fail.

We have two options to fix that:
a) virLXCControllerSetupDevPTS() translates the gid (5) by hand and passes the
correct value to devpts. (IMHO hacky)

b) We set up devpts, and therefore also the consoles, after installing the
mappings. This may need a bit of work.
First I thought a trivial patch like the appended one would do it, but then
libvirt fails to start a guest with no further explanation. Maybe I will have
time later to investigate further.

What do you think?

Thanks,
//richard

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 2d220eb..3435f42 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -2157,9 +2157,6 @@ virLXCControllerRun(virLXCControllerPtr ctrl)
 if (virLXCControllerSetupResourceLimits(ctrl) < 0)
 goto cleanup;

-if (virLXCControllerSetupDevPTS(ctrl) < 0)
-goto cleanup;
-
 if (virLXCControllerPopulateDevices(ctrl) < 0)
 goto cleanup;

@@ -2172,9 +2169,6 @@ virLXCControllerRun(virLXCControllerPtr ctrl)
 if (virLXCControllerSetupFuse(ctrl) < 0)
 goto cleanup;

-if (virLXCControllerSetupConsoles(ctrl, containerTTYPaths) < 0)
-goto cleanup;
-
 if (lxcSetPersonality(ctrl->def) < 0)
 goto cleanup;

@@ -2198,6 +2192,12 @@ virLXCControllerRun(virLXCControllerPtr ctrl)
 if (virLXCControllerSetupUserns(ctrl) < 0)
 goto cleanup;

+if (virLXCControllerSetupDevPTS(ctrl) < 0)
+goto cleanup;
+
+if (virLXCControllerSetupConsoles(ctrl, containerTTYPaths) < 0)
+goto cleanup;
+
 if (virLXCControllerMoveInterfaces(ctrl) < 0)
 goto cleanup;
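
A quick way to reproduce the symptom from inside a container is a tiny PTY
allocation probe. The sketch below uses posix_openpt(), grantpt() and
unlockpt() rather than openpty() itself, purely to avoid the extra library
dependency; the helper name is hypothetical and which step fails on an affected
kernel is not something this sketch asserts.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* posix_openpt(), grantpt(), unlockpt() */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical probe: try to allocate a pseudo terminal.
 * Returns the master fd on success, -1 on failure (reason on stderr). */
static int try_alloc_pty(void)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);

    if (master < 0) {
        perror("posix_openpt");
        return -1;
    }

    /* Fix up ownership/mode of the slave and unlock it for opening. */
    if (grantpt(master) < 0 || unlockpt(master) < 0) {
        perror("grantpt/unlockpt");
        close(master);
        return -1;
    }

    return master;
}
```

The caller is expected to close() the returned descriptor when done.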





Re: [libvirt] LXC broken on Linux >= 3.15

2014-07-28 Thread Richard Weinberger
On 28.07.2014 at 16:37, Daniel P. Berrange wrote:
> On Mon, Jul 28, 2014 at 04:25:56PM +0200, Richard Weinberger wrote:
>> Hi!
>>
>> Kernel commit 23adbe12 ("fs,userns: Change inode_capable to 
>> capable_wrt_inode_uidgid")
>> uncovered a libvirt-lxc issue.
>> Starting with that commit the kernel correctly checks also the gid of an 
>> inode.
>>
>> Sadly this change breaks libvirt-lxc in a way such that openpty() will 
>> always fail
>> with -EPERM within a container. Therefore ssh and other programs are no 
>> longer usable.
>>
>> Libvirt's virLXCControllerSetupDevPTS() has a hardcoded mount
>> string for mounting devpts, namely 
>> "newinstance,ptmxmode=0666,mode=0620,gid=5",
>> devpts correctly translates the uid and gid while mounting but libvirt
>> mounts devpts _before_ setting up the uid/gid mappings.
>> Therefore the internal gid for the new devpts instance is still 5 instead of
>> the mapped gid
>> and the new check in the kernel will always fail.
>>
>> We have two options to fix that:
>> a) virLXCControllerSetupDevPTS() translates the gid (5) by hand and passes 
>> the correct
>> value to devpts. (IMHO hacky)
> 
> You mean that instead of passing the value '5', if the guest
> GIDs had been remapped to start at 1000, we would pass in
> '1005' to mount ?  I don't think that's hacky - it seems like
> a perfectly sensible fix to do.

Correct.
If you're fine with that I'll happily submit a patch.

Thanks,
//richard



[libvirt] [PATCH] LXC: Fix virLXCControllerSetupDevPTS() wrt user namespaces

2014-07-28 Thread Richard Weinberger
The gid value passed to devpts has to be translated by hand as
virLXCControllerSetupDevPTS() is called before setting up the user
and group mappings.
Otherwise devpts will use an unmapped gid and openpty()
will fail within containers.
Linux commit 23adbe12
("fs,userns: Change inode_capable to capable_wrt_inode_uidgid")
uncovered that issue.

Signed-off-by: Richard Weinberger 
---
 src/lxc/lxc_controller.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 2d220eb..82ecf12 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -1164,6 +1164,19 @@ static int virLXCControllerMain(virLXCControllerPtr ctrl)
 return rc;
 }
 
+static uint32_t
+virLXCControllerLookupUsernsMap(virDomainIdMapEntryPtr map, int num,
+uint32_t src)
+{
+int i;
+
+for (i = 0; i < num; i++) {
+if (src >= map[i].start && src < map[i].start + map[i].count)
+return map[i].target + (src - map[i].start);
+}
+
+return src;
+}
 
 static int
 virLXCControllerSetupUsernsMap(virDomainIdMapEntryPtr map,
@@ -1930,6 +1943,7 @@ virLXCControllerSetupDevPTS(virLXCControllerPtr ctrl)
 char *opts = NULL;
 char *devpts = NULL;
 int ret = -1;
+gid_t ptsgid = 5;
 
 VIR_DEBUG("Setting up private /dev/pts");
 
@@ -1949,10 +1963,17 @@ virLXCControllerSetupDevPTS(virLXCControllerPtr ctrl)
 goto cleanup;
 }
 
+if (ctrl->def->idmap.ngidmap)
+ptsgid =
+virLXCControllerLookupUsernsMap(ctrl->def->idmap.gidmap,
+ctrl->def->idmap.ngidmap,
+ptsgid);
+
 /* XXX should we support gid=X for X!=5 for distros which use
  * a different gid for tty?  */
-if (virAsprintf(&opts, "newinstance,ptmxmode=0666,mode=0620,gid=5%s",
-(mount_options ? mount_options : "")) < 0)
+if (virAsprintf
+(&opts, "newinstance,ptmxmode=0666,mode=0620,gid=%u%s", ptsgid,
+ (mount_options ? mount_options : "")) < 0)
 goto cleanup;
 
 VIR_DEBUG("Mount devpts on %s type=tmpfs flags=%x, opts=%s",
-- 
2.0.1
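
To make the translation concrete, here is a stand-alone sketch of the lookup
and of the resulting mount option string. The struct is a hypothetical stand-in
for libvirt's idmap entry, the function names are made up, and the range check
uses an inclusive lower bound so that an ID equal to start is translated as
well.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one idmap entry: container IDs in
 * [start, start + count) map to host IDs beginning at target. */
struct id_map_entry {
    uint32_t start;
    uint32_t target;
    uint32_t count;
};

/* Translate a container-side ID to the host-side ID. */
static uint32_t lookup_host_id(const struct id_map_entry *map, size_t num,
                               uint32_t id)
{
    for (size_t i = 0; i < num; i++) {
        if (id >= map[i].start && id - map[i].start < map[i].count)
            return map[i].target + (id - map[i].start);
    }
    return id;   /* IDs outside every range are passed through unchanged */
}

/* Build the devpts option string with the translated gid. */
static int format_devpts_opts(char *buf, size_t len, uint32_t host_gid)
{
    return snprintf(buf, len, "newinstance,ptmxmode=0666,mode=0620,gid=%u",
                    (unsigned int)host_gid);
}
```

With a mapping that starts at host ID 1000, the tty gid 5 becomes 1005, which
is the value Daniel describes in his reply.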



Re: [libvirt] [RFC] Re: [PATCH 2/9] LXC: set IP addresses to veth devices in the container

2014-08-01 Thread Richard Weinberger
On Wed, Jul 30, 2014 at 8:14 PM, Cedric Bosdonnat  wrote:
> Hi all,
>
> On Fri, 2014-07-25 at 17:03 +0200, Cédric Bosdonnat wrote:
>> Uses the new virDomainNetDef ips to set the IP addresses on the network
>> interfaces in the container.
>> ---
>>  src/lxc/lxc_container.c | 20 +++-
>>  1 file changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
>> index 1cf2c8f..62e9d76 100644
>> --- a/src/lxc/lxc_container.c
>> +++ b/src/lxc/lxc_container.c
>> @@ -495,7 +495,7 @@ static int 
>> lxcContainerRenameAndEnableInterfaces(virDomainDefPtr vmDef,
>>   char **veths)
>>  {
>>  int rc = 0;
>> -size_t i;
>> +size_t i, j;
>>  char *newname = NULL;
>>  virDomainNetDefPtr netDef;
>>  bool privNet = vmDef->features[VIR_DOMAIN_FEATURE_PRIVNET] ==
>> @@ -516,6 +516,24 @@ static int 
>> lxcContainerRenameAndEnableInterfaces(virDomainDefPtr vmDef,
>>  if (rc < 0)
>>  goto error_out;
>>
>> +for (j = 0; j < netDef->nips; j++) {
>> +virDomainNetIpDefPtr ip = netDef->ips[j];
>> +unsigned int prefix = (ip->prefix > 0) ? ip->prefix : 24;
>> +virSocketAddr address;
>> +
>> +if (virSocketAddrParse(&address, ip->address, AF_UNSPEC) < 0)
>> +goto error_out;
>> +
>> +VIR_DEBUG("Adding IP address '%s/%u' to '%s'",
>> +  ip->address, ip->prefix, newname);
>> +if (virNetDevSetIPv4Address(newname, &address, prefix) < 0) {
>
> I'm just thinking that this requires to have either ip-route or ifconfig
> installed in the container... which is pretty unlikely. Should I go for
> an implementation using the kernel functions directly?

I wouldn't call it unlikely, but it is a use case to consider.

Implementing ip/ifconfig directly in libvirtd on top of the raw kernel
interface seems cumbersome to me.
The problem with virNetDevSetIPv4Address() is that you call it after
entering all namespaces, and hence you need ip/ifconfig installed in the
container.

Enter only the network namespace and then call it.
This way you can configure the container's network easily using the
host tools, just like "ip netns exec ..." does.
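
A minimal sketch of that idea, assuming a Linux host and a namespace
handle such as /proc/<pid>/ns/net. The helper name run_in_netns() is made
up for illustration and is not libvirt API. The child joins only the
network namespace, so the mount namespace (and with it the host's ip
binary) is still the host's:

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* for setns() and CLONE_NEWNET */
#endif
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of libvirt: run argv[0] from the host
 * filesystem inside the network namespace referenced by nspath.
 * Returns the child's exit status, or -1 on error.
 */
int
run_in_netns(const char *nspath, char *const argv[])
{
    int status;
    pid_t pid;
    int fd = open(nspath, O_RDONLY | O_CLOEXEC);

    if (fd < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }

    if (pid == 0) {
        /* Child: switch the network namespace only. Mounts are untouched,
         * so argv[0] is still looked up among the host's files. */
        if (setns(fd, CLONE_NEWNET) < 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);
    }

    close(fd);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called with the ns/net path of the container's init process and an argv
like { "ip", "addr", "add", "192.0.2.10/24", "dev", "eth0", NULL }, this
would do roughly what "ip netns exec" does; it needs the privileges that
setns() requires.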

-- 
Thanks,
//richard


Re: [libvirt] [PATCH] LXC: Fix virLXCControllerSetupDevPTS() wrt user namespaces

2014-08-05 Thread Richard Weinberger
Am 29.07.2014 05:45, schrieb chenhanx...@cn.fujitsu.com:
> 
> 
>> -Original Message-
>> From: libvir-list-boun...@redhat.com [mailto:libvir-list-boun...@redhat.com]
>> On Behalf Of Richard Weinberger
>> Sent: Tuesday, July 29, 2014 4:59 AM
>> To: libvir-list@redhat.com
>> Cc: Richard Weinberger; da...@sigma-star.at
>> Subject: [libvirt] [PATCH] LXC: Fix virLXCControllerSetupDevPTS() wrt user
>> namespaces
>>
>> The gid value passed to devpts has to be translated by hand as
>> virLXCControllerSetupDevPTS() is called before setting up the user
>> and group mappings.
>> Otherwise devpts will use an unmapped gid and openpty()
>> will fail within containers.
>> Linux commit commit 23adbe12
>> ("fs,userns: Change inode_capable to capable_wrt_inode_uidgid")
>> uncovered that issue.
>>
>> Signed-off-by: Richard Weinberger 
> 
> Reviewed-by: Chen Hanxiao 
> 

ping

Thanks,
//richard



Re: [libvirt] [PATCH RFC] LXC: add HOME environment variable

2014-08-05 Thread Richard Weinberger
On Fri, Jul 25, 2014 at 8:39 AM, Chen Hanxiao
 wrote:
> The HOME environment variable was missing;
> set 'HOME=/' as the default.
>
> Signed-off-by: Chen Hanxiao 
> ---
>  src/lxc/lxc_container.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> index 1cf2c8f..9df9c04 100644
> --- a/src/lxc/lxc_container.c
> +++ b/src/lxc/lxc_container.c
> @@ -236,6 +236,7 @@ static virCommandPtr 
> lxcContainerBuildInitCmd(virDomainDefPtr vmDef,
>  virCommandAddEnvString(cmd, "PATH=/bin:/sbin");
>  virCommandAddEnvString(cmd, "TERM=linux");
>  virCommandAddEnvString(cmd, "container=lxc-libvirt");
> +virCommandAddEnvString(cmd, "HOME=/");
>  virCommandAddEnvPair(cmd, "container_uuid", uuidstr);
>  if (nttyPaths > 1)
>      virCommandAddEnvPair(cmd, "container_ttys", 
> virBufferCurrentContent(&buf));

Looks sane to me.
Reviewed-by: Richard Weinberger 

-- 
Thanks,
//richard



[libvirt] Verifying libvirt release tarballs

2014-08-11 Thread Richard Weinberger
Hi!

How can I cryptographically verify libvirt releases?
There are no signature/hash files in http://libvirt.org/sources/.

All I see is that your git release tags are PGP signed.
So, anyone who cares has to ignore everything in http://libvirt.org/sources/
and needs to regenerate the tarball from git.
Or am I missing something?

-- 
Thanks,
//richard



Re: [libvirt] [PATCH RFC] LXC: add HOME environment variable

2014-08-12 Thread Richard Weinberger
On Mon, Aug 11, 2014 at 11:13 AM, Daniel P. Berrange
 wrote:
> On Tue, Aug 05, 2014 at 02:40:53AM +, chenhanx...@cn.fujitsu.com wrote:
>> ping
>>
>> > -Original Message-
>> > From: libvir-list-boun...@redhat.com 
>> > [mailto:libvir-list-boun...@redhat.com]
>> > On Behalf Of Chen Hanxiao
>> > Sent: Friday, July 25, 2014 2:40 PM
>> > To: libvir-list@redhat.com
>> > Subject: [libvirt] [PATCH RFC] LXC: add HOME environment variable
>> >
>> > The HOME environment variable was missing;
>> > set 'HOME=/' as the default.
>> >
>> > Signed-off-by: Chen Hanxiao 
>> > ---
>> >  src/lxc/lxc_container.c | 1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
>> > index 1cf2c8f..9df9c04 100644
>> > --- a/src/lxc/lxc_container.c
>> > +++ b/src/lxc/lxc_container.c
>> > @@ -236,6 +236,7 @@ static virCommandPtr
>> > lxcContainerBuildInitCmd(virDomainDefPtr vmDef,
>> >  virCommandAddEnvString(cmd, "PATH=/bin:/sbin");
>> >  virCommandAddEnvString(cmd, "TERM=linux");
>> >  virCommandAddEnvString(cmd, "container=lxc-libvirt");
>> > +virCommandAddEnvString(cmd, "HOME=/");
>> >  virCommandAddEnvPair(cmd, "container_uuid", uuidstr);
>> >  if (nttyPaths > 1)
>> >  virCommandAddEnvPair(cmd, "container_ttys",
>> > virBufferCurrentContent(&buf));
>
> I'm curious what expects to have a $HOME env var set. I'd tend to view
> the setting of $HOME to be something that the software in the container
> should take care of.

The kernel sets up $HOME for the init process.
Therefore any init can assume that $HOME is set.
libvirt currently violates that implicit rule.
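
For reference, in the kernel this default environment is the envp_init
array in init/main.c, which contains HOME=/ and TERM=linux. A rough
sketch of a launcher that hands the same baseline to a container's init;
the array and function names below are illustrative only, not libvirt's:

```c
#include <stddef.h>
#include <unistd.h>

/* Mirrors the kernel's defaults for init (HOME=/ and TERM=linux) and adds
 * the container marker that libvirt's LXC driver sets. */
char *const container_init_env[] = {
    "HOME=/",
    "TERM=linux",
    "container=lxc-libvirt",
    NULL,
};

/* Replace the current process with the container's init, passing the
 * environment above. Only returns (with -1) if execve() fails. */
int
exec_container_init(const char *path)
{
    char *const argv[] = { (char *)path, NULL };

    return execve(path, argv, container_init_env);
}
```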

> Setting HOME=/ in libvirt isn't a problem, I'm just curious why we need
> it.
>
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
>



-- 
Thanks,
//richard



Re: [libvirt] [PATCH] LXC: Fix virLXCControllerSetupDevPTS() wrt user namespaces

2014-08-14 Thread Richard Weinberger
Am 14.08.2014 14:35, schrieb Ján Tomko:
> On 07/28/2014 10:59 PM, Richard Weinberger wrote:
>> The gid value passed to devpts has to be translated by hand as
>> virLXCControllerSetupDevPTS() is called before setting up the user
>> and group mappings.
>> Otherwise devpts will use an unmapped gid and openpty()
>> will fail within containers.
>> Linux commit commit 23adbe12
> 
> s/commit commit/kernel commit/
> 
>> ("fs,userns: Change inode_capable to capable_wrt_inode_uidgid")
>> uncovered that issue.
>>
>> Signed-off-by: Richard Weinberger 
>> ---
>>  src/lxc/lxc_controller.c | 25 +++--
>>  1 file changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
>> index 2d220eb..82ecf12 100644
>> --- a/src/lxc/lxc_controller.c
>> +++ b/src/lxc/lxc_controller.c
>> @@ -1164,6 +1164,19 @@ static int virLXCControllerMain(virLXCControllerPtr 
>> ctrl)
>>  return rc;
>>  }
>>  
>> +static uint32_t
> 
> I've changed this to 'unsigned int' to match the type used by 
> virDomainIdMapEntry.

Why is uint32_t wrong? :)

>> +virLXCControllerLookupUsernsMap(virDomainIdMapEntryPtr map, int num,
>> +uint32_t src)
>> +{
>> +int i;
> 
> This should be size_t to pass 'make syntax-check'.

/me pushes 'make syntax-check' to TODO list.
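
For reference, the lookup under review can be sketched standalone with
both review changes applied (unsigned int, size_t). The struct is a
simplified stand-in for virDomainIdMapEntry and the names are
illustrative. One deliberate difference: the posted patch compares with
"src > map[i].start", which leaves the first ID of a range untranslated;
the sketch assumes an inclusive lower bound is what is wanted:

```c
#include <stddef.h>

/* Simplified stand-in for libvirt's virDomainIdMapEntry (assumption:
 * three unsigned int fields). */
struct idmap_entry {
    unsigned int start;   /* first ID inside the container */
    unsigned int target;  /* corresponding first ID on the host */
    unsigned int count;   /* number of consecutive IDs mapped */
};

/* Translate a container ID to a host ID. IDs that fall outside every
 * range are returned unchanged. */
unsigned int
lookup_userns_map(const struct idmap_entry *map, size_t num, unsigned int src)
{
    size_t i;

    for (i = 0; i < num; i++) {
        /* Inclusive lower bound: map[i].start itself is in the range.
         * Written as a subtraction so start + count cannot overflow. */
        if (src >= map[i].start && src - map[i].start < map[i].count)
            return map[i].target + (src - map[i].start);
    }

    return src;
}
```

With a single entry { 0, 100000, 65536 }, the tty gid 5 becomes 100005,
which is the value that would then go into the devpts "gid=" option.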

Thanks,
//richard



Re: [libvirt] [GIT PULL] namespace updates for v3.17-rc1

2014-08-20 Thread Richard Weinberger
On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman  wrote:
>
> Linus,
>
> Please pull the for-linus branch from the git tree:
>
>git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
> for-linus
>
>HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at 
> /proc/thread-self/mounts instead of /proc/self/mounts
>
> This is a bunch of small changes built against 3.16-rc6.  The most
> significant change for users is the first patch which makes setns
> dramatically faster by removing unneeded rcu handling.
>
> The next chunk of changes are so that "mount -o remount,.." will not
> allow the user namespace root to drop flags on a mount set by the system
> wide root.  Aka this forces read-only mounts to stay read-only, no-dev
> mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
> stay no-exec and it prevents unprivileged users from messing with a
> mount's atime settings.  I have included my test case as the last patch
> in this series so people performing backports can verify this change
> works correctly.
>
> The next change fixes a bug in NFS that was discovered while auditing
> nsproxy users for the first optimization.  Today you can oops the kernel
> by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
> namespaces.  I rebased and fixed the build of the !CONFIG_NFS_FS case
> yesterday when a build bot caught my typo.  Given that no one to my
> knowledge bases anything on my tree, fixing the typo in place seems more
> responsible than requiring a typo-fix to be backported as well.
>
> The last change is a small semantic cleanup introducing
> /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
> prevents several kinds of problematic corner cases.  It is a
> user-visible change so it has a minute chance of causing regressions so
> the change to /proc/mounts and /proc/net are individual one line commits
> that can be trivially reverted.  Unfortunately I lost and could not find
> the email of the original reporter so he is not credited.  From at least
> one perspective this change to /proc/net is a regression fix to allow
> pthread /proc/net uses that were broken by the introduction of the network
> namespace.
>
> Eric
>
> Eric W. Biederman (11):
>   namespaces: Use task_lock and not rcu to protect nsproxy
>   mnt: Only change user settable mount flags in remount
>   mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into 
> do_remount
>   mnt: Correct permission checks in do_remount

This commit breaks libvirt-lxc.
libvirt does in lxcContainerMountBasicFS():

/*
 * We can't immediately set the MS_RDONLY flag when mounting filesystems
 * because (in at least some kernel versions) this will propagate back
 * to the original mount in the host OS, turning it readonly too. Thus
 * we mount the filesystem in read-write mode initially, and then do a
 * separate read-only bind mount on top of that.
 */
bindOverReadonly = !!(mnt_mflags & MS_RDONLY);

VIR_DEBUG("Mount %s on %s type=%s flags=%x",
  mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
~MS_RDONLY, NULL) < 0) {

Here it fails for sysfs because with user namespaces we bind the
existing /sys into the container and would have to read out all existing
mount flags from the current /sys mount.
Otherwise mount() fails with EPERM.
On my test system /sys is mounted with
"rw,nosuid,nodev,noexec,relatime" and libvirt misses the relatime flag...

virReportSystemError(errno,
 _("Failed to mount %s on %s type %s flags=%x"),
 mnt_src, mnt->dst, NULLSTR(mnt->type),
 mnt_mflags & ~MS_RDONLY);
goto cleanup;
}

if (bindOverReadonly &&
mount(mnt_src, mnt->dst, NULL,
  MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {

^^^ Here it fails because now we'd have to specify all flags used for the
first mount. For the procfs case that is MS_NOSUID|MS_NOEXEC|MS_NODEV.
See lxcBasicMounts[].
In this case the fix is easy: add mnt_mflags to the mount flags.

virReportSystemError(errno,
 _("Failed to re-mount %s on %s flags=%x"),
 mnt_src, mnt->dst,
 MS_BIND|MS_REMOUNT|MS_RDONLY);
goto cleanup;
}
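
The easy fix for the second mount() can be sketched as follows. This is
only an illustration of the flag arithmetic, and the helper name is
hypothetical:

```c
#include <sys/mount.h>

/* Flags for the read-only bind remount: everything the first mount used
 * (e.g. MS_NOSUID|MS_NOEXEC|MS_NODEV for procfs) plus the remount bits,
 * so that no flag set by the first mount gets dropped. */
unsigned long
readonly_remount_flags(unsigned long mnt_mflags)
{
    return mnt_mflags | MS_BIND | MS_REMOUNT | MS_RDONLY;
}
```

The remount call would then be
mount(mnt_src, mnt->dst, NULL, readonly_remount_flags(mnt_mflags), NULL).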


-- 
Thanks,
//richard



Re: [libvirt] [GIT PULL] namespace updates for v3.17-rc1

2014-08-20 Thread Richard Weinberger
Am 21.08.2014 06:53, schrieb Eric W. Biederman:
> The bugs fixed are security issues, so if we have to break a small
> number of userspace applications we will.  Anything that we can
> reasonably do to avoid regressions will be done.
> 
> Could you please look at my user-namespace.git#for-next branch? I have a
> fix for at least one regression-causing issue in there.  I think it may
> fix your issues but I am not fully certain; more comments below.

I'll run this on my LXC testbed today.

>> /*
>>  * We can't immediately set the MS_RDONLY flag when mounting filesystems
>>  * because (in at least some kernel versions) this will propagate back
>>  * to the original mount in the host OS, turning it readonly too. Thus
>>  * we mount the filesystem in read-write mode initially, and then do a
>>  * separate read-only bind mount on top of that.
>>  */
>> bindOverReadonly = !!(mnt_mflags & MS_RDONLY);
>>
>> VIR_DEBUG("Mount %s on %s type=%s flags=%x",
>>   mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
>> if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
>> ~MS_RDONLY, NULL) < 0) {
>>
>> Here it fails for sysfs because with user namespaces we bind the
>> existing /sys into the container and would have to read out all existing
>> mount flags from the current /sys mount.
>> Otherwise mount() fails with EPERM.
>> On my test system /sys is mounted with
>> "rw,nosuid,nodev,noexec,relatime" and libvirt misses the relatime flag...
> 
> Not specifying any atime flags to mount should be safe, as that asks for
> the default atime flags, and for remount I have made the default atime
> flags the existing atime flags.  So I am scratching my head a little on
> this one.

Okay, let me find out why exactly libvirt gets an EPERM here.
Maybe there are more oddities hidden.

>>
>> virReportSystemError(errno,
>>  _("Failed to mount %s on %s type %s flags=%x"),
>>  mnt_src, mnt->dst, NULLSTR(mnt->type),
>>  mnt_mflags & ~MS_RDONLY);
>> goto cleanup;
>> }
>>
>> if (bindOverReadonly &&
>> mount(mnt_src, mnt->dst, NULL,
>>   MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
>>
>> ^^^ Here it fails because now we'd have to specify all flags used for the
>> first mount. For the procfs case that is MS_NOSUID|MS_NOEXEC|MS_NODEV.
>> See lxcBasicMounts[].
>> In this case the fix is easy: add mnt_mflags to the mount flags.
> 
> That has always been a bug in general because remount has always
> required specifying the complete set of mount flags you want to have.
> 
> The fact that flags such as nosuid are now properly locked so you
> cannot change them if you are not the global root user just makes this
> obvious.
> 
> Andy Lutomirski has observed that statvfs will return the mount flags
> making reading them simple.

Thanks for the clarification, I'll create a fix for libvirt.
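
A rough sketch of the statvfs() approach, so that a later bind or remount
can pass the flags the existing mount already carries. The function names
are hypothetical and only the flags discussed in this thread are
translated; the fallback values are the kernel's ST_* constants, which
glibc exposes only with _GNU_SOURCE:

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* for ST_NODEV, ST_NOEXEC, ST_RELATIME */
#endif
#include <sys/mount.h>
#include <sys/statvfs.h>

#ifndef ST_NODEV
# define ST_NODEV 4
#endif
#ifndef ST_NOEXEC
# define ST_NOEXEC 8
#endif
#ifndef ST_RELATIME
# define ST_RELATIME 4096
#endif

/* Convert the ST_* bits reported by statvfs() into the MS_* bits that
 * mount() expects. */
unsigned long
statvfs_to_mount_flags(unsigned long st_flags)
{
    unsigned long ms = 0;

    if (st_flags & ST_RDONLY)
        ms |= MS_RDONLY;
    if (st_flags & ST_NOSUID)
        ms |= MS_NOSUID;
    if (st_flags & ST_NODEV)
        ms |= MS_NODEV;
    if (st_flags & ST_NOEXEC)
        ms |= MS_NOEXEC;
    if (st_flags & ST_RELATIME)
        ms |= MS_RELATIME;

    return ms;
}

/* Read the mount flags of the filesystem containing path.
 * Returns 0 on success, -1 if statvfs() fails. */
int
get_mount_flags(const char *path, unsigned long *flags)
{
    struct statvfs buf;

    if (statvfs(path, &buf) < 0)
        return -1;

    *flags = statvfs_to_mount_flags(buf.f_flag);
    return 0;
}
```

For the /sys case above, get_mount_flags("/sys", &flags) would report the
nosuid, nodev, noexec and relatime bits, which can then be OR'ed into the
flags of the bind mount.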

Thanks,
//richard



Re: [libvirt] [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Richard Weinberger
Am 21.08.2014 08:29, schrieb Richard Weinberger:
> Am 21.08.2014 06:53, schrieb Eric W. Biederman:
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
>>
>> Could you please look at my user-namespace.git#for-next branch? I have a
>> fix for at least one regression-causing issue in there.  I think it may
>> fix your issues but I am not fully certain; more comments below.
> 
> I'll run this on my LXC testbed today.

Looks good. With these patches applied libvirt works again. :)

Thanks,
//richard



Re: [libvirt] [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Richard Weinberger
Am 21.08.2014 15:12, schrieb Christoph Hellwig:
> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>> Richard Weinberger  writes:
>>
>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman wrote:
>>>
>>> This commit breaks libvirt-lxc.
>>> libvirt does in lxcContainerMountBasicFS():
>>
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
> 
> Can you explain the security issues in detail?  Breaking common
> userspace like libvirt-lxc with just a little bit of handwaving is
> entirely unacceptable.

It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
unbreaks libvirt-lxc.
I hope it hits Linus' tree and -stable before the offending commit hits users.

Thanks,
//richard



Re: [libvirt] [PATCHv2 00/16] LXC network configuration support

2014-08-26 Thread Richard Weinberger
On Tue, Aug 26, 2014 at 3:20 PM, Cédric Bosdonnat  wrote:
> Hi all,
>
> Here is the whole series resent with a major addition: the functions
> used to set the IP and add a route now use libnl when possible. The idea
> behind this is to avoid requiring iproute2 or ifconfig installed in the
> container rootfs.

What about my comments on v1?

> Otherwise nothing changed since v1.
>
> Cédric Bosdonnat (16):
>   Forgot to cleanup ifname_guest* in domain network def parsing
>   Domain conf: allow more than one IP address for net devices
>   LXC: set IP addresses to veth devices in the container
>   lxc conf2xml: convert IP addresses
>   Allow network capabilities hostdev to configure IP addresses
>   lxc conf2xml: convert ip addresses for hostdev NICs
>   Domain network devices can now have a <gateway> element
>   lxc conf2xml: convert lxc.network.ipv[46].gateway
>   LXC: use the new net devices gateway definition
>   LXC: honour network devices link state
>   Wrong place for virDomainNetIpsFormat
>   virNetDevSetIPv4Address: libnl implementation
>   Renamed virNetDevSetIPv4Address to virNetDevSetIPAddress
>   virNetDevAddRoute: implementation using netlink
>   virNetDevClearIPv4Address: netlink implementation
>   Renamed virNetDevClearIPv4Address to virNetDevClearIPAddress
>
>  docs/formatdomain.html.in  |  39 +++
>  docs/schemas/domaincommon.rng  |  55 +++-
>  src/conf/domain_conf.c | 214 +--
>  src/conf/domain_conf.h |  22 +-
>  src/libvirt_private.syms   |   7 +-
>  src/lxc/lxc_container.c|  74 -
>  src/lxc/lxc_native.c   | 173 
>  src/network/bridge_driver.c|   4 +-
>  src/openvz/openvz_conf.c   |   2 +-
>  src/openvz/openvz_driver.c |   6 +-
>  src/qemu/qemu_driver.c |  25 +-
>  src/qemu/qemu_hotplug.c|   6 +-
>  src/uml/uml_conf.c |   2 +-
>  src/util/virnetdev.c   | 305 ++---
>  src/util/virnetdev.h   |  12 +-
>  src/util/virnetlink.c  |  38 +++
>  src/util/virnetlink.h  |   2 +
>  src/vbox/vbox_common.c |   3 +-
>  src/xenconfig/xen_common.c |  15 +-
>  src/xenconfig/xen_sxpr.c   |  12 +-
>  .../lxcconf2xmldata/lxcconf2xml-physnetwork.config |   4 +
>  tests/lxcconf2xmldata/lxcconf2xml-physnetwork.xml  |   3 +
>  tests/lxcconf2xmldata/lxcconf2xml-simple.config|   4 +
>  tests/lxcconf2xmldata/lxcconf2xml-simple.xml   |   3 +
>  tests/lxcxml2xmldata/lxc-hostdev.xml   |   3 +
>  tests/lxcxml2xmldata/lxc-idmap.xml |   3 +
>  26 files changed, 880 insertions(+), 156 deletions(-)
>
> --
> 1.8.4.5
>



-- 
Thanks,
//richard


Re: [libvirt] [PATCHv2 00/16] LXC network configuration support

2014-08-27 Thread Richard Weinberger
Cedric,

Am 27.08.2014 09:33, schrieb Cedric Bosdonnat:
> Hi Richard,
> 
> On Tue, 2014-08-26 at 22:32 +0200, Richard Weinberger wrote:
>> On Tue, Aug 26, 2014 at 3:20 PM, Cédric Bosdonnat wrote:
>>> Hi all,
>>>
>>> Here is the whole series resent with a major addition: the functions
>>> used to set the IP and add a route now use libnl when possible. The idea
>>> behind this is to avoid requiring iproute2 or ifconfig installed in the
>>> container rootfs.
>>
>> What about my comments on v1?
> 
> Entering only the network NS would have a larger impact on the container
> initialization code and we would still need to have iproute2/ifconfig
> installed in the container... and nothing guarantees that will be true.

No, you can use the tools from the _host_ side, because you're only entering
the network namespace, not the mount namespace.

> OTOH, we are pretty sure we'll have rtnetlink support in the kernel.
> 
> And if for some reason we don't have libnl at build time, the old code
> using iproute2/ifconfig will still be used.
> 
> I went with a libnl implementation based on Laine's advice on IRC.

Okay. I'm not a huge fan of reimplementing iproute2 in libvirt but

Thanks,
//richard


Re: [libvirt] Entering freeze for libvirt-1.2.8

2014-08-27 Thread Richard Weinberger
On Wed, Aug 27, 2014 at 9:18 AM, Daniel Veillard  wrote:
>   So I tagged 1.2.8-rc1 in git and made tarball and signed rpms

Can you please sign the tarball too?

-- 
Thanks,
//richard



Re: [libvirt] Entering freeze for libvirt-1.2.8

2014-08-28 Thread Richard Weinberger
Am 28.08.2014 09:14, schrieb Daniel Veillard:
> On Wed, Aug 27, 2014 at 08:45:29PM +0200, Richard Weinberger wrote:
>> On Wed, Aug 27, 2014 at 9:18 AM, Daniel Veillard  wrote:
>>>   So I tagged 1.2.8-rc1 in git and made tarball and signed rpms
>>
>> Can you please sign the tarball too?
> 
>   Well, the source rpm is signed, you can check it and it contains the
> tarball, so technically there is already a signed source out there.
> Signing a tarball means putting out an additional file and keeping
> it forever, I could do that but hum 

So everyone who wants to build libvirt from source and cares about data
integrity has to unpack/verify the rpm?
Come on... :-)

Signing tarballs is neither new nor rocket science.
In times where the NSA tries to f*ck everyone at least some basic
cryptographic arrangements should be applied.

I know other projects are sloppy regarding signed releases too, this does
not mean that libvirt should follow their bad example.

Thanks,
//richard



Re: [libvirt] Entering freeze for libvirt-1.2.8

2014-09-02 Thread Richard Weinberger
Am 29.08.2014 12:03, schrieb Daniel Veillard:
> On Wed, Aug 27, 2014 at 08:45:29PM +0200, Richard Weinberger wrote:
>> On Wed, Aug 27, 2014 at 9:18 AM, Daniel Veillard  wrote:
>>>   So I tagged 1.2.8-rc1 in git and made tarball and signed rpms
>>
>> Can you please sign the tarball too?
> 
>   Okay, I went the simplest route of creating an asc for the tarball,
> my key is on the mit server:
> 
> user: "Daniel Veillard (Red Hat work email) "
> 1024-bit DSA key, ID DE95BC1F, created 2000-05-31
> 
>   I also added asc for the latest 1.2.x releases along the tarballs,

Sorry for the late response.
Thanks a lot for doing so, I really appreciate that. :)

Thanks,
//richard



Re: [libvirt] [GIT PULL] namespace updates for v3.17-rc1

2014-09-03 Thread Richard Weinberger
On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
 wrote:
>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>> unbreaks libvirt-lxc.
>> I hope it hits Linus tree and -stable before the offending commit hits users.
>
> I plan to send the pull request to Linus as soon as I have caught my
> breath (from all of the conferences this week) that I can be certain I
> am thinking clearly and not rushing things.

*kind reminder* :-)

-- 
Thanks,
//richard



[libvirt] CreateMachine: Input/output error

2014-09-26 Thread Richard Weinberger
Hi!

Sometimes libvirt (1.2.7) becomes unable to start any container.
Logs show only:
error : virDBusCall:1429 : error from service: CreateMachine: Input/output error
It looks like dbus_connection_send_with_reply_and_block() returns EIO.

Has anyone else seen this kind of issue?
I'm currently a bit puzzled where to look for the root cause.
Maybe it is related to dbus.

-- 
Thanks,
//richard



Re: [libvirt] CreateMachine: Input/output error

2014-09-26 Thread Richard Weinberger
Chen,

Am 26.09.2014 10:23, schrieb Chen, Hanxiao:
>> Has anyone else seen this kind of issue?
>> I'm currently a bit puzzled where to look for the root cause.
>> Maybe it is related to dbus.
> 
> Could you share your XML config?
> Guess it's something with systemd.

There you go:

[domain XML; the markup was lost in the archive, only these values remain]
c_secret_name
524288
exe
/sbin/init

Nothing special. My host is openSUSE 13.1.

Thanks,
//richard



Re: [libvirt] CreateMachine: Input/output error

2014-09-26 Thread Richard Weinberger
Chen,

Am 26.09.2014 11:49, schrieb Chen, Hanxiao:
> Hi Richard,
> 
>> -Original Message-----
>> From: Richard Weinberger [mailto:rich...@nod.at]
>> Sent: Friday, September 26, 2014 4:59 PM
>> To: Chen, Hanxiao/陈 晗霄; Richard Weinberger; libvir-list@redhat.com
>> Subject: Re: [libvirt] CreateMachine: Input/output error
>>
>> Chen,
>>
>> Am 26.09.2014 10:23, schrieb Chen, Hanxiao:
>>>> Has anyone else seen this kind of issue?
>>>> I'm currently a bit puzzled where to look for the root cause.
>>>> Maybe it is related to dbus.
>>>
>>> Could you share your XML config?
>>> Guess it's something with systemd.
>>
>> There you go:
>> [domain XML; the markup was lost in the archive, only these values remain]
>> c_secret_name
>> 524288
>> exe
>> /sbin/init
>>
>> Nothing special. My host is openSUSE 13.1.
> 
> On fedora20 with systemd 208, upstream libvirt,
> I could reproduce it.

We're also on systemd 208.

> It does not happen every time,

Here it happened only twice within months.
Always in production, never on my testbed. :(

> but once it happened, the container could not be started anymore.
> One workaround is to undefine it and switch to another name.
> At a quick look, I did not find an explanation.

Hmm, maybe systemd-machined did not clean up everything upon
container exit.

Thanks,
//richard


Re: [libvirt] CreateMachine: Input/output error

2014-09-26 Thread Richard Weinberger
Chen,

Am 26.09.2014 11:54, schrieb Richard Weinberger:
>> On fedora20 with systemd 208, upstream libvirt,
>> I could reproduce it.
> 
> We're also on systemd 208.

I have an idea, maybe we need this commit in our systemd:
http://lists.freedesktop.org/archives/systemd-commits/2014-July/006543.html
It has been in systemd since v215.

Thanks,
//richard




Re: [libvirt] CreateMachine: Input/output error

2014-09-26 Thread Richard Weinberger
Am 26.09.2014 19:40, schrieb Guido Günther:
> On Fri, Sep 26, 2014 at 10:06:39AM +0200, Richard Weinberger wrote:
>> Hi!
>>
>> Sometimes libvirt (1.2.7) becomes unable to start any container.
>> Logs show only:
>> error : virDBusCall:1429 : error from service: CreateMachine: Input/output error
>> It looks like dbus_connection_send_with_reply_and_block() returns EIO.
>>
>> Has anyone else seen this kind of issue?
>> I'm currently a bit puzzled where to look for the root cause.
>> Maybe it is related to dbus.
> 
> I've seen this while cooking up
> 
>   https://www.redhat.com/archives/libvir-list/2014-September/msg01549.html
> 
> once. The machine didn't get listed anymore with machinectl, there
> were no cgroups left but using systemctl I could still see a scope
> named after that machine like
> 
> machine-qemu\x2.scope  failed failed
> 
> I didn't manage to get rid of that one other than by rebooting.
> Cheers,

What systemd version did you use?

Thanks,
//richard



Re: [libvirt] CreateMachine: Input/output error

2014-09-29 Thread Richard Weinberger
Am 29.09.2014 11:13, schrieb Chen, Hanxiao:
> I'm not sure this commit will help,
> because reproducing this issue seems so unpredictable.

Yeah, maybe.

> I did some tests over the last weekend;
> unfortunately, I could not reproduce it again with either 208 or 215...

Same here. So far I was unable to reproduce it on my testbed. :-\

Thanks,
//richard



Re: [libvirt] [PATCHv3 00/16] Network configuration for lxc containers

2014-10-10 Thread Richard Weinberger
On Fri, Oct 10, 2014 at 2:03 PM, Cédric Bosdonnat  wrote:
> Hi all,
>
> Here is a rebased version of v2. Nothing changed except the 'since' version
> number in the added doc, which has been updated.
>
> --
> Cedric
>
> Cédric Bosdonnat (16):
>   Forgot to cleanup ifname_guest* in domain network def parsing
>   Domain conf: allow more than one IP address for net devices
>   LXC: set IP addresses to veth devices in the container
>   lxc conf2xml: convert IP addresses
>   Allow network capabilities hostdev to configure IP addresses
>   lxc conf2xml: convert ip addresses for hostdev NICs
>   Domain network devices can now have a <gateway> element
>   lxc conf2xml: convert lxc.network.ipv[46].gateway
>   LXC: use the new net devices gateway definition
>   LXC: honour network devices link state
>   Wrong place for virDomainNetIpsFormat
>   virNetDevSetIPv4Address: libnl implementation
>   Renamed virNetDevSetIPv4Address to virNetDevSetIPAddress
>   virNetDevAddRoute: implementation using netlink
>   virNetDevClearIPv4Address: netlink implementation
>   Renamed virNetDevClearIPv4Address to virNetDevClearIPAddress

I still think that going down the netlink path is not optimal.
As stated before for v2, you can just enter the network namespace and use
the host tools to set up networking.
This way no tools have to be installed within the container and we would
not depend on netlink and reinvent the iproute2 tools.

>
>  docs/formatdomain.html.in  |  39 +++
>  docs/schemas/domaincommon.rng  |  55 +++-
>  src/conf/domain_conf.c | 214 +--
>  src/conf/domain_conf.h |  22 +-
>  src/libvirt_private.syms   |   7 +-
>  src/lxc/lxc_container.c|  74 -
>  src/lxc/lxc_native.c   | 173 
>  src/network/bridge_driver.c|   4 +-
>  src/openvz/openvz_conf.c   |   2 +-
>  src/openvz/openvz_driver.c |   6 +-
>  src/qemu/qemu_driver.c |  25 +-
>  src/qemu/qemu_hotplug.c|   6 +-
>  src/uml/uml_conf.c |   2 +-
>  src/util/virnetdev.c   | 305 ++---
>  src/util/virnetdev.h   |  12 +-
>  src/util/virnetlink.c  |  38 +++
>  src/util/virnetlink.h  |   2 +
>  src/vbox/vbox_common.c |   3 +-
>  src/xenconfig/xen_common.c |  15 +-
>  src/xenconfig/xen_sxpr.c   |  12 +-
>  .../lxcconf2xmldata/lxcconf2xml-physnetwork.config |   4 +
>  tests/lxcconf2xmldata/lxcconf2xml-physnetwork.xml  |   3 +
>  tests/lxcconf2xmldata/lxcconf2xml-simple.config|   4 +
>  tests/lxcconf2xmldata/lxcconf2xml-simple.xml   |   3 +
>  tests/lxcxml2xmldata/lxc-hostdev.xml   |   3 +
>  tests/lxcxml2xmldata/lxc-idmap.xml |   3 +
>  26 files changed, 880 insertions(+), 156 deletions(-)
>
> --
> 1.8.4.5
>



-- 
Thanks,
//richard


Re: [libvirt] [PATCH 3/5] ip link needs 'name' in 3.16 to create the veth pair

2014-11-25 Thread Richard Weinberger
On Tue, Nov 25, 2014 at 9:21 AM, Cedric Bosdonnat  wrote:
> On Tue, 2014-11-25 at 08:42 +0100, Martin Kletzander wrote:
>> On Mon, Nov 24, 2014 at 09:54:44PM +0100, Cédric Bosdonnat wrote:
>> >Due to a change (or bug?) in ip link implementation, the command
>> >'ip link add vnet0...'
>> >is forced into
>> >'ip link add name vnet0...'
>> >The changed command also works on older versions of iproute2, just the
>> >'name' parameter has been made mandatory.
>> >---
>> > src/util/virnetdevveth.c | 4 ++--
>> > 1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/src/util/virnetdevveth.c b/src/util/virnetdevveth.c
>> >index e9d6f9c..ad30e1d 100644
>> >--- a/src/util/virnetdevveth.c
>> >+++ b/src/util/virnetdevveth.c
>> >@@ -89,7 +89,7 @@ static int virNetDevVethGetFreeNum(int startDev)
>> >  * @veth2: pointer to return name for container end of veth pair
>> >  *
>> >  * Creates a veth device pair using the ip command:
>> >- * ip link add veth1 type veth peer name veth2
>> >+ * ip link add name veth1 type veth peer name veth2
>> >  * If veth1 points to NULL on entry, it will be a valid interface on
>> >  * return.  veth2 should point to NULL on entry.
>> >  *
>> >@@ -146,7 +146,7 @@ int virNetDevVethCreate(char** veth1, char** veth2)
>> > }
>> >
>> > cmd = virCommandNew("ip");
>> >-virCommandAddArgList(cmd, "link", "add",
>> >+virCommandAddArgList(cmd, "link", "add", "name",
>> >  *veth1 ? *veth1 : veth1auto,
>> >  "type", "veth", "peer", "name",
>> >  *veth2 ? *veth2 : veth2auto,
>> >--
>> >2.1.2
>> >
>>
>> I agree, the 'name' was always there, just optional.  But what version
>> of iproute2 do you have that requires it?  I checked the current HEAD
>> and it's still optional.  This must be a bug in that particular
>> implementation.
>>
>> ACK if you can argue with the version or platform this is required
>> on.
>
> At least the 3.16 shipped on openSUSE 13.2 has that problem... though I
> think it's just a side effect of another change in iproute2. It worked
> fine with version 3.12.

Instead of papering over the issue in libvirt, better ship a non-broken iproute2
in openSUSE 13.2.
The real fix:
https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=f1b66ff
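
For reference, the two command lines under discussion differ only in the
optional "name" keyword. A small sketch that builds both forms; the helper
build_veth_cmd() is hypothetical and is not libvirt's virNetDevVethCreate():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill argv with the command that creates a veth
 * pair. With explicit_name set, the optional "name" keyword is spelled
 * out before the first device name. argv needs room for 11 entries.
 * Returns the number of arguments written (without the trailing NULL). */
size_t build_veth_cmd(const char **argv, const char *veth1,
                      const char *veth2, int explicit_name)
{
    size_t n = 0;

    assert(argv && veth1 && veth2);
    argv[n++] = "ip";
    argv[n++] = "link";
    argv[n++] = "add";
    if (explicit_name)
        argv[n++] = "name";
    argv[n++] = veth1;
    argv[n++] = "type";
    argv[n++] = "veth";
    argv[n++] = "peer";
    argv[n++] = "name";
    argv[n++] = veth2;
    argv[n] = NULL;
    return n;
}
```

With explicit_name set to 0 this yields the original
"ip link add veth1 type veth peer name veth2" form, which a fixed iproute2
is expected to accept again.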

-- 
Thanks,
//richard


Re: [libvirt] [GIT PULL] namespace updates for v3.17-rc1

2014-11-25 Thread Richard Weinberger
Eric,

On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
 wrote:
> Richard Weinberger  writes:
>
>> On 21.08.2014 15:12, Christoph Hellwig wrote:
>>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>>>> Richard Weinberger  writes:
>>>>
>>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman wrote:
>>>>>
>>>>> This commit breaks libvirt-lxc.
>>>>> libvirt does in lxcContainerMountBasicFS():
>>>>
>>>> The bugs fixed are security issues, so if we have to break a small
>>>> number of userspace applications we will.  Anything that we can
>>>> reasonably do to avoid regressions will be done.
>>>
>>> Can you explain the security issues in detail?  Breaking common
>>> userspace like libvirt-lxc with just a little bit of handwaving is
>>> entirely unacceptable.
>>
>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>> unbreaks libvirt-lxc.
>> I hope it hits Linus tree and -stable before the offending commit hits users.
>
> I plan to send the pull request to Linus as soon as I have caught my
> breath (from all of the conferences this week) that I can be certain I
> am thinking clearly and not rushing things.

Today I've upgraded my LXC testbed to the most recent kernel and found
libvirt-lxc broken again (sic!).
Remounting /proc/sys/ is failing.
Investigating the issue showed that commit "mnt: Implicitly add
MNT_NODEV on remount as we do on mount"
is not in mainline.
Why did you leave out this patch? In my previous mails I explicitly
stated that exactly this commit unbreaks libvirt-lxc.

Now the userspace-breaking changes are in mainline and hit users hard. :-(
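
One conceivable userspace-side way to cope with a kernel that refuses a
remount which would drop nodev is to repeat the flags the mount already
carries. The sketch below is an assumption about such a workaround, not
what libvirt does; the helper names are made up:

```c
#include <assert.h>
#include <string.h>
#include <sys/mount.h>

/* Translate a comma-separated option string as found in /proc/self/mounts
 * (e.g. "rw,nosuid,nodev,noexec,relatime") into the MS_* flags that a
 * later remount would have to repeat. Unknown options are ignored. */
unsigned long mount_opts_to_flags(const char *opts)
{
    static const struct {
        const char *name;
        unsigned long flag;
    } map[] = {
        { "ro",     MS_RDONLY },
        { "nosuid", MS_NOSUID },
        { "nodev",  MS_NODEV  },
        { "noexec", MS_NOEXEC },
    };
    unsigned long flags = 0;
    const char *p = opts;

    assert(opts);
    while (*p) {
        size_t len = strcspn(p, ",");
        size_t i;

        for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
            if (strlen(map[i].name) == len &&
                strncmp(p, map[i].name, len) == 0)
                flags |= map[i].flag;
        }
        p += len;
        if (*p == ',')
            p++;
    }
    return flags;
}

/* Flags for a read-only bind remount that keeps whatever
 * nosuid/nodev/noexec settings the mount already has. */
unsigned long ro_remount_flags(const char *current_opts)
{
    return MS_BIND | MS_REMOUNT | MS_RDONLY |
           mount_opts_to_flags(current_opts);
}
```

The result would be passed as the mountflags argument of mount(2) when
remounting /proc/sys read-only.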

-- 
Thanks,
//richard



Re: [libvirt] [PATCH 3/5] ip link needs 'name' in 3.16 to create the veth pair

2014-11-26 Thread Richard Weinberger
On 26.11.2014 at 05:51, Martin Kletzander wrote:
> On Tue, Nov 25, 2014 at 04:19:48PM +0100, Richard Weinberger wrote:
>> On Tue, Nov 25, 2014 at 9:21 AM, Cedric Bosdonnat wrote:
>>> On Tue, 2014-11-25 at 08:42 +0100, Martin Kletzander wrote:
>>>> On Mon, Nov 24, 2014 at 09:54:44PM +0100, Cédric Bosdonnat wrote:
>>>> >Due to a change (or bug?) in ip link implementation, the command
>>>> >'ip link add vnet0...'
>>>> >is forced into
>>>> >'ip link add name vnet0...'
>>>> >The changed command also works on older versions of iproute2, just the
>>>> >'name' parameter has been made mandatory.
>>>> >---
>>>> > src/util/virnetdevveth.c | 4 ++--
>>>> > 1 file changed, 2 insertions(+), 2 deletions(-)
>>>> >
>>>> >diff --git a/src/util/virnetdevveth.c b/src/util/virnetdevveth.c
>>>> >index e9d6f9c..ad30e1d 100644
>>>> >--- a/src/util/virnetdevveth.c
>>>> >+++ b/src/util/virnetdevveth.c
>>>> >@@ -89,7 +89,7 @@ static int virNetDevVethGetFreeNum(int startDev)
>>>> >  * @veth2: pointer to return name for container end of veth pair
>>>> >  *
>>>> >  * Creates a veth device pair using the ip command:
>>>> >- * ip link add veth1 type veth peer name veth2
>>>> >+ * ip link add name veth1 type veth peer name veth2
>>>> >  * If veth1 points to NULL on entry, it will be a valid interface on
>>>> >  * return.  veth2 should point to NULL on entry.
>>>> >  *
>>>> >@@ -146,7 +146,7 @@ int virNetDevVethCreate(char** veth1, char** veth2)
>>>> > }
>>>> >
>>>> > cmd = virCommandNew("ip");
>>>> >-virCommandAddArgList(cmd, "link", "add",
>>>> >+virCommandAddArgList(cmd, "link", "add", "name",
>>>> >  *veth1 ? *veth1 : veth1auto,
>>>> >  "type", "veth", "peer", "name",
>>>> >  *veth2 ? *veth2 : veth2auto,
>>>> >--
>>>> >2.1.2
>>>> >
>>>>
>>>> I agree, the 'name' was always there, just optional.  But what version
>>>> of iproute2 do you have that requires it?  I checked the current HEAD
>>>> and it's still optional.  This must be a bug in that particular
>>>> implementation.
>>>>
>>>> ACK if you can argue with the version or platform this is required
>>>> on.
>>>
>>> At least the 3.16 shipped on openSUSE 13.2 has that problem... though I
>>> think it's just a side effect of another change in iproute2. It worked
>>> fine with version 3.12.
>>
>> Instead of papering over the issue in libvirt better ship a non-broken iproute2
>> in openSUSE 13.2.
>> real fix: 
>> https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=f1b66ff
>>
> 
> Oh, thank you for finding that, I should've done my homework!  Since
> it really is just a bug on iproute2 side in openSUSE, I'd rather keep
> it in its original state.  And since the patch is already pushed, I'm
> inclining to reverting it.

Yes, please revert the libvirt change.
openSUSE is working on the issue: 
https://bugzilla.novell.com/show_bug.cgi?id=907093

Thanks,
//richard



Re: [libvirt] [PATCH 3/5] ip link needs 'name' in 3.16 to create the veth pair

2014-11-26 Thread Richard Weinberger
On 26.11.2014 at 09:25, Cedric Bosdonnat wrote:
> Hi Martin,
> 
> On Wed, 2014-11-26 at 05:51 +0100, Martin Kletzander wrote:
>>> Instead of papering over the issue in libvirt better ship a non-broken iproute2
>>> in openSUSE 13.2.
>>> real fix: 
>>> https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=f1b66ff
>>>
>>
>> Oh, thank you for finding that, I should've done my homework!  Since
>> it really is just a bug on iproute2 side in openSUSE, I'd rather keep
>> it in its original state.  And since the patch is already pushed, I'm
>> inclining to reverting it.
>>
>> Other opinions?
> 
> Quoting a colleague of mine working on the network stack:
> 
> [this is] a regression in (upstream) iproute2 3.16, fixed in 3.17.  
> bnc#907093 has been created for it. (Factory is also affected but a
> submit request is already on its way.)
> 
> So I think we should keep that for those running the buggy 3.16.

openSUSE has to fix their package and to serve a bugfix update, full stop.

Thanks,
//richard



Re: [libvirt] [PATCH 3/5] ip link needs 'name' in 3.16 to create the veth pair

2014-11-26 Thread Richard Weinberger
On 26.11.2014 at 10:16, Cedric Bosdonnat wrote:
> On Wed, 2014-11-26 at 09:34 +0100, Richard Weinberger wrote:
>> On 26.11.2014 at 09:25, Cedric Bosdonnat wrote:
>>> Hi Martin,
>>>
>>> On Wed, 2014-11-26 at 05:51 +0100, Martin Kletzander wrote:
>>>>> Instead of papering over the issue in libvirt better ship a non-broken iproute2
>>>>> in openSUSE 13.2.
>>>>> real fix: 
>>>>> https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=f1b66ff
>>>>>
>>>>
>>>> Oh, thank you for finding that, I should've done my homework!  Since
>>>> it really is just a bug on iproute2 side in openSUSE, I'd rather keep
>>>> it in its original state.  And since the patch is already pushed, I'm
>>>> inclining to reverting it.
>>>>
>>>> Other opinions?
>>>
>>> Quoting a colleague of mine working on the network stack:
>>>
>>> [this is] a regression in (upstream) iproute2 3.16, fixed in 3.17.  
>>> bnc#907093 has been created for it. (Factory is also affected but a
>>> submit request is already on its way.)
>>>
>>> So I think we should keep that for those running the buggy 3.16.
>>
>> openSUSE has to fix their package and to serve a bugfix update, full stop.
> 
> Thought that may not happen only to openSUSE... and that fix didn't harm
> at all.

Yes, but "fixing" this issue in libvirt is not correct.
It needs to be fixed in the right place, and that place is iproute2.

libvirt is not the only user of this iproute2 feature.

Thanks,
//richard



Re: [libvirt] [PATCH 3/5] ip link needs 'name' in 3.16 to create the veth pair

2014-11-26 Thread Richard Weinberger
On 26.11.2014 at 14:23, Eric Blake wrote:
> On 11/26/2014 02:25 AM, Richard Weinberger wrote:
> 
>>>>> So I think we should keep that for those running the buggy 3.16.
>>>>
>>>> openSUSE has to fix their package and to serve a bugfix update, full stop.
>>>
>>> Thought that may not happen only to openSUSE... and that fix didn't harm
>>> at all.
>>
>> Yes, but "fixing" this issue in libvirt is not correct.
>> It needs to be fixed in the right place, and that place is iproute2.
>>
>> libvirt is not the only user of this iproute2 feature.
> 
> I agree that making all clients work around a bug is annoying; on the
> other hand, once a client has worked around it, I see no reason to force
> the client to have to revert, since the workaround is arguably more legible.

I agree that we don't have to force the revert.
But we have to be very sure that *every* iproute2 version understands the
new command line. I'm not an iproute2 expert, so I can't tell for sure.
I doubt I'd revert it. Distros which ship the broken iproute2 package have
to fix it anyways as a lot of other tools/scripts break too.

Thanks,
//richard



[libvirt] systemd-cgroups-agent not working in containers

2014-11-26 Thread Richard Weinberger
Hi!

I run a Linux container setup with openSUSE 13.1/2 as guest distro.
After some time containers slow down.
An investigation showed that the containers slow down because a lot of stale
user sessions slow down almost all systemd tools, mostly systemctl.
loginctl reports many thousand sessions.
All in state "closing".

The vast majority of these sessions are from crond and ssh logins.
It turned out that sessions are never closed and stay around.
The control group of such a session contains zero tasks.
So I started to explore why systemd keeps it.
After another few hours of debugging I realized that systemd never
issues the release signal from cgroups.
Also calling the release agent by hand did not help, i.e.
/usr/lib/systemd/systemd-cgroups-agent /user.slice/user-0.slice/session-c324.scope

Therefore systemd never recognizes that a service/session has no more tasks
and so never closes it.
First I thought it was an issue in libvirt combined with user namespaces.
But I can trigger this also without user namespaces and also with
systemd-nspawn.
Tested with systemd 208 and 210 from openSUSE; their packages have all known
bugfixes.

Any idea where to look further?
How do you run the most current systemd on your distro?

Thanks,
//richard



Re: [libvirt] systemd-cgroups-agent not working in containers

2014-11-27 Thread Richard Weinberger
On 26.11.2014 at 22:29, Richard Weinberger wrote:
> Hi!
> 
> I run a Linux container setup with openSUSE 13.1/2 as guest distro.
> After some time containers slow down.
> An investigation showed that the containers slow down because a lot of stale
> user sessions slow down almost all systemd tools, mostly systemctl.
> loginctl reports many thousand sessions.
> All in state "closing".
> 
> The vast majority of these sessions are from crond and ssh logins.
> It turned out that sessions are never closed and stay around.
> The control group of such a session contains zero tasks.
> So I started to explore why systemd keeps it.
> After another few hours of debugging I realized that systemd never
> issues the release signal from cgroups.
> Also calling the release agent by hand did not help, i.e.
> /usr/lib/systemd/systemd-cgroups-agent /user.slice/user-0.slice/session-c324.scope
> 
> Therefore systemd never recognizes that a service/session has no more tasks
> and so never closes it.
> First I thought it was an issue in libvirt combined with user namespaces.
> But I can trigger this also without user namespaces and also with
> systemd-nspawn.
> Tested with systemd 208 and 210 from openSUSE; their packages have all known
> bugfixes.
> 
> Any idea where to look further?
> How do you run the most current systemd on your distro?

Btw: I face exactly the same issue also on fc21 (guest is fc20).

Thanks,
//richard



Re: [libvirt] [systemd-devel] systemd-cgroups-agent not working in containers

2014-11-28 Thread Richard Weinberger
On 28.11.2014 at 06:33, Martin Pitt wrote:
> Hello all,
> 
> Cameron Norman [2014-11-27 12:26 -0800]:
>> On Wed, Nov 26, 2014 at 1:29 PM, Richard Weinberger  wrote:
>>> Hi!
>>>
>>> I run a Linux container setup with openSUSE 13.1/2 as guest distro.
>>> After some time containers slow down.
>>> An investigation showed that the containers slow down because a lot of stale
>>> user sessions slow down almost all systemd tools, mostly systemctl.
>>> loginctl reports many thousand sessions.
>>> All in state "closing".
>>
>> This sounds similar to an issue that systemd-shim in Debian had.
>> Martin Pitt (helps to maintain systemd in Debian) fixed that issue; he
>> may have some ideas here. I CC'd him.
> 
> The problem with systemd-shim under sysvinit or upstart was that shim
> didn't set a cgroup release agent like systemd itself does. Thus the
> cgroups were never cleaned up after all the session processes died.
> (See 1.4 on https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
> for details)
> 
> I don't think that SUSE uses systemd-shim, I take it in that setup you
> are running systemd proper on both the host and the guest? Then I
> suggest checking the cgroups that correspond to the "closing" sessions
> in the container, i. e. /sys/fs/cgroup/systemd/.../session-XX.scope/tasks.
> If there are still processes in it, logind is merely waiting for them
> to exit (or set KillUserProcesses in logind.conf). If they are empty,
> check that /sys/fs/cgroup/systemd/.../session-XX.scope/notify_on_release is 1
> and that /sys/fs/cgroup/systemd/release_agent is set?

The problem is that within the container the release agent is not executed.
It is executed on the host side.

Lennart, how is this supposed to work?
Is the theory of operation that the host systemd sends
org.freedesktop.systemd1.Agent Released
via dbus into the guest?
The guest's systemd definitely does not receive such a signal.
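
The checks Martin suggests can be scripted roughly as follows. This is a
sketch assuming the cgroup v1 layout with per-directory "tasks" and
"notify_on_release" files; the function names are made up:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count the PIDs in the contents of a cgroup v1 "tasks" file, which
 * lists one PID per line. */
int count_pids(const char *tasks)
{
    int count = 0, in_pid = 0;

    assert(tasks);
    for (; *tasks; tasks++) {
        if (*tasks == '\n') {
            in_pid = 0;
        } else if (!in_pid) {
            in_pid = 1;
            count++;
        }
    }
    return count;
}

/* Read <dir>/<name> into buf as a NUL-terminated string.
 * Returns 0 on success, -1 if the file cannot be opened. */
int read_cgroup_file(const char *dir, const char *name,
                     char *buf, size_t len)
{
    char path[4096];
    size_t n;
    FILE *fp;

    assert(dir && name && buf && len > 0);
    snprintf(path, sizeof(path), "%s/%s", dir, name);
    fp = fopen(path, "r");
    if (!fp)
        return -1;
    n = fread(buf, 1, len - 1, fp);
    buf[n] = '\0';
    fclose(fp);
    return 0;
}

/* Returns 1 if the scope directory still exists with no tasks left
 * although notify_on_release is set, i.e. the release notification
 * apparently never arrived. Returns 0 otherwise, including when the
 * files cannot be read. */
int cgroup_scope_is_stuck(const char *scope_dir)
{
    char tasks[65536], notify[16];

    if (read_cgroup_file(scope_dir, "tasks", tasks, sizeof(tasks)) < 0 ||
        read_cgroup_file(scope_dir, "notify_on_release",
                         notify, sizeof(notify)) < 0)
        return 0;
    return count_pids(tasks) == 0 && atoi(notify) == 1;
}
```

Running cgroup_scope_is_stuck() on each session-XX.scope directory inside
the container would list the sessions that are affected.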

Thanks,
//richard



Re: [libvirt] [GIT PULL] namespace updates for v3.17-rc1

2014-11-29 Thread Richard Weinberger
On 26.11.2014 at 00:15, Richard Weinberger wrote:
> Eric,
> 
> On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
>  wrote:
>> Richard Weinberger  writes:
>>
>>> On 21.08.2014 15:12, Christoph Hellwig wrote:
>>>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>>>>> Richard Weinberger  writes:
>>>>>
>>>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman wrote:
>>>>>>
>>>>>> This commit breaks libvirt-lxc.
>>>>>> libvirt does in lxcContainerMountBasicFS():
>>>>>
>>>>> The bugs fixed are security issues, so if we have to break a small
>>>>> number of userspace applications we will.  Anything that we can
>>>>> reasonably do to avoid regressions will be done.
>>>>
>>>> Can you explain the security issues in detail?  Breaking common
>>>> userspace like libvirt-lxc with just a little bit of handwaving is
>>>> entirely unacceptable.
>>>
>>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>>> unbreaks libvirt-lxc.
>>> I hope it hits Linus tree and -stable before the offending commit hits 
>>> users.
>>
>> I plan to send the pull request to Linus as soon as I have caught my
>> breath (from all of the conferences this week) that I can be certain I
>> am thinking clearly and not rushing things.
> 
> Today I've upgraded my LXC testbed to the most recent kernel and found
> libvirt-lxc broken again (sic!).
> Remounting /proc/sys/ is failing.
> Investigating the issue showed that commit "mnt: Implicitly add
> MNT_NODEV on remount as we do on mount"
> is not in mainline.
> Why did you leave out this patch? In my previous mails I explicitly
> stated that exactly this commit unbreaks libvirt-lxc.
> 
> Now the userspace-breaking changes are in mainline and hit users hard. :-(

*kind ping*
...to make sure that this issue doesn't get lost.

Thanks,
//richard



Re: [libvirt] [PATCHv2] lxc: give RW access to /proc/sys/net/ipv[46] to containers

2014-12-11 Thread Richard Weinberger
On Tue, Dec 9, 2014 at 10:47 AM, Cédric Bosdonnat  wrote:
> Some programs want to change some values for the network interfaces
> configuration in /proc/sys/net/ipv[46] folders. Giving RW access on them
> allows wicked to work on openSUSE 13.2+.
>
> In order to mount those folders RW but keep the rest of /proc/sys RO,
> we add temporary mounts for these folders before bind-mounting
> /proc/sys. Those mounts will be skipped if the container doesn't have
> its own network namespace.
>
> It may happen that one of the temporary mounts in /proc/ filesystem
> isn't available due to a missing kernel feature. We need not to fail
> in that case.

IMHO we should drop the read-only /proc mount completely.
The idea behind having a read-only /proc was to make a container less insecure
because user namespaces did not exist yet.

Now that user namespaces are mainline and considered stable, we should
start dropping such hacks
instead of adding more of them.

As a consequence, libvirt has to decide what kind of container it
wants to support.
IMHO the only sane way is to enforce user namespaces to provide
reasonable isolation.
If a user can do bad things with a read-write /proc, it needs to be
fixed in the kernel
and not in libvirt.

Containers without user namespaces and a root within are insecure and
broken by design.

Just my two cents.

-- 
Thanks,
//richard


Re: [libvirt] [PATCHv2] lxc: give RW access to /proc/sys/net/ipv[46] to containers

2014-12-13 Thread Richard Weinberger
On 12.12.2014 at 10:33, Daniel P. Berrange wrote:
> On Thu, Dec 11, 2014 at 10:06:40PM +0100, Richard Weinberger wrote:
>> On Tue, Dec 9, 2014 at 10:47 AM, Cédric Bosdonnat wrote:
>>> Some programs want to change some values for the network interfaces
>>> configuration in /proc/sys/net/ipv[46] folders. Giving RW access on them
>>> allows wicked to work on openSUSE 13.2+.
>>>
>>> In order to mount those folders RW but keep the rest of /proc/sys RO,
>>> we add temporary mounts for these folders before bind-mounting
>>> /proc/sys. Those mounts will be skipped if the container doesn't have
>>> its own network namespace.
>>>
>>> It may happen that one of the temporary mounts in /proc/ filesystem
>>> isn't available due to a missing kernel feature. We need not to fail
>>> in that case.
>>
>> IMHO we should drop the read-only /proc mount completely.
>> The idea behind having a read-only /proc was to make a container less
>> insecure because user namespaces did not exist yet.
> 
> Yep, read-only /proc was a (failed) attempt to predict the future - we
> originally expected we'd need that even when user namespaces arrived,
> but of course in the end it was a waste of time.

Correct. Let's reduce this waste of time and not add more code. :-)

>> Now as user namespaces are mainline and considered stable we should
>> start dropping such hacks
>> instead of adding more of them.
> 
> I'm trying to think if there are any backwards compatibility problems
> if we got rid of read-only /proc but I can't imagine any app out there
> is actively checked for a read-only /proc, so we'd probably be safe
> to just switch it read-write.

Same here.
I'd be astonished if an application will break if you make /proc rw.
BTW: While we are here, let's make /sys/ also rw.
Again, if an application can do bad things, this is a plain kernel bug.

>> As a consequence, libvirt has to decide what kind of container it
>> wants to support.
>> IMHO the only sane way is to enforce user namespaces to provide
>> reasonable isolation.
>> If a user can do bad things with a read-write /proc, it needs to be
>> fixed in the kernel
>> and not in libvirt.
>>
>> Containers without user namespaces and a root within are insecure and
>> broken by design.
> 
> Well addition of MAC can make them secure, but of course if you have
> MAC, there's again no need to make /proc mount read-only.

The MAC policy has to be *perfect* and has to use whitelisting.
Also, if you make your MAC too restrictive you'll break certain programs.
You need more than just denying access to some magic files in /sys and /proc.
If you deny, for example, mount(2), many applications will break, most notably
systemd.

I propose the following:
a) Make /sys and /proc read-write
b) If one creates a container without any uid/gid mapping, print a big fat warning
that such a container is not suitable for hostile guests.
If the user has a specific use case where he can trust all guests, fine. But we
have to document it clearly.
Maybe a new config flag a la  would help too. ;-)

Thanks,
//richard


Re: [libvirt] [PATCHv3] lxc: give RW access to /proc/sys/net/ipv[46] to containers

2014-12-23 Thread Richard Weinberger
On Wed, Dec 10, 2014 at 10:40 AM, Cédric Bosdonnat  wrote:
> Some programs want to change some values for the network interfaces
> configuration in /proc/sys/net/ipv[46] folders. Giving RW access on them
> allows wicked to work on openSUSE 13.2+.
>
> Reusing the lxcNeedNetworkNamespace function to tell
> lxcContainerMountBasicFS if the netns is disabled. When no netns is
> set up, then we don't mount the /proc/sys/net/ipv[46] folder RW as
> these would provide full access to the host NICs config.
> ---
>  Diff to v2:
>* mount from /.oldroot as suggested by Dan... removed the whole temporary
>  mount related code as it turned out useless.
>
>  src/lxc/lxc_container.c | 64 +++--
>  1 file changed, 41 insertions(+), 23 deletions(-)

So you continue ignoring my comments.
Now this kludge is in git and I see the next hack in the pipeline.
"[PATCH RFC] LXC: don't RO mount /proc, /sys when user namespce enabled"
Great software design that is...

Enough moaning, can we please just drop the RO /sys and /proc mounts?
I'll happily submit a patch but I really want a clear signal from
maintainers whether we want
to continue with pseudo security or not.

BTW: Why do we set up all these mounts in lxc_container.c anyway?
Wouldn't it make sense to define them
in the XML definition?

-- 
Thanks,
//richard


Re: [libvirt] [PATCH RFC] LXC: don't RO mount /proc, /sys when user namespce enabled

2014-12-23 Thread Richard Weinberger
On Mon, Dec 22, 2014 at 4:12 PM, Eric Blake  wrote:
> On 12/21/2014 08:57 PM, Chen Hanxiao wrote:
>
> s/namespce/namespace/ in the subject line
>
>> If we enabled user ns and provided a uid/gid map,
>> we do not need to mount /proc, /sys as readonly.
>> Leave it to kernel for protection.
>>
>> Signed-off-by: Chen Hanxiao 
>> ---
>>  src/lxc/lxc_container.c | 6 ++
>>  1 file changed, 6 insertions(+)
>
> I'll leave the actual patch review to someone more familiar with LXC
> namespace setups

This change will still mount some useless stuff like:
{ "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL,
MS_BIND, false, false, true },
{ "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL,
MS_BIND, false, false, true },

You can set skipUserNS for these.
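
To illustrate what setting that flag would mean, here is a simplified
sketch of such a mount table. The struct and field names are assumptions
modelled on the three booleans in the entries quoted above, not the actual
libvirt definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical mirror of one table entry. */
struct basic_mount {
    const char *src;
    const char *dst;
    const char *type;
    unsigned long mflags;
    bool skip_userns;    /* skip when a uid/gid map is configured */
    bool skip_unmounted; /* skip when src is not mounted (not modelled here) */
    bool skip_no_netns;  /* skip when the container shares the host netns */
};

/* Decide whether an entry should be mounted for a given container setup. */
bool should_mount(const struct basic_mount *m,
                  bool userns_enabled, bool netns_enabled)
{
    assert(m);
    if (m->skip_userns && userns_enabled)
        return false;
    if (m->skip_no_netns && !netns_enabled)
        return false;
    return true;
}
```

With skip_userns set on the two /proc/sys/net entries, a container with a
uid/gid map would no longer get these bind mounts.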

But I *really* would like to see /proc and /sys mounted RW as default.
Please see my comment to:
[libvirt] [PATCHv3] lxc: give RW access to /proc/sys/net/ipv[46] to containers

-- 
Thanks,
//richard



Re: [libvirt] [PATCH RFC] LXC: don't RO mount /proc, /sys when user namespce enabled

2014-12-24 Thread Richard Weinberger
On 24.12.2014 at 03:23, Chen, Hanxiao wrote:
> 
> 
>> -----Original Message-----
>> From: Richard Weinberger [mailto:richard.weinber...@gmail.com]
>> Sent: Wednesday, December 24, 2014 5:36 AM
>> To: Eric Blake
>> Cc: Chen, Hanxiao/陈 晗霄; libvir-list@redhat.com
>> Subject: Re: [libvirt] [PATCH RFC] LXC: don't RO mount /proc, /sys when user
>> namespce enabled
>>
>> On Mon, Dec 22, 2014 at 4:12 PM, Eric Blake  wrote:
>>> On 12/21/2014 08:57 PM, Chen Hanxiao wrote:
>>>
>>> s/namespce/namespace/ in the subject line
>>>
>>>> If we enabled user ns and provided a uid/gid map,
>>>> we do not need to mount /proc, /sys as readonly.
>>>> Leave it to kernel for protection.
>>>>
>>>> Signed-off-by: Chen Hanxiao 
>>>> ---
>>>>  src/lxc/lxc_container.c | 6 ++
>>>>  1 file changed, 6 insertions(+)
>>>
>>> I'll leave the actual patch review to someone more familiar with LXC
>>> namespace setups
>>
>> This change will still mount some useless stuff like:
>> { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL,
>> MS_BIND, false, false, true },
>> { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL,
>> MS_BIND, false, false, true },
>>
>> You can set skipUserNS for these.
> 
> Thanks, I didn't notice that.
> 
>>
>> But I *really* would like to see /proc and /sys mounted RW as default.
>> Please see my comment to:
>> [libvirt] [PATCHv3] lxc: give RW access to /proc/sys/net/ipv[46] to 
>> containers
> 
> I see your new comments in that thread.
> If libvirt enables userns (a uid/gid map is provided in the XML),
> it's safe to drop the RO mounts completely;
> if not, I'm not sure whether it will cause backward compatibility issues.
> 
> So let's wait for more comments from maintainers.

I agree.

Thanks,
//richard


Re: [libvirt] [PATCH] lxc: Cleaning up mount setup

2015-01-08 Thread Richard Weinberger
On 08.01.2015 at 14:02, Daniel P. Berrange wrote:
> We have historically done a number of things with LXC that are
> somewhat questionable in retrospect
> 
>  1. Mounted /proc/sys read-only, but then mounted
> /proc/sys/net/ipv* read-write again
>  2. Mounted /sys read only
>  3. Mount /sys/fs/cgroup/NNN/the/guest/dir to /sys/fs/cgroup/NNN
>  4. FUSE mount on /proc/meminfo
> 
> Items 1 & 2 are pointless as they offer no security benefit either
> with or without user namespaces. Without userns it is always insecure,
> with userns it is always secure, no matter what the mount state is.

I agree. Thanks a lot for addressing this, Daniel!

> Item 3 is some what dubious, since /proc/self/cgroup paths for
> processes are now not visible at /sys/fs/cgroup. This really
> confuses systemd inside the container making it create a broken
> layout

The question is, how to support systemd in containers?

As of now I'm not aware of a working concept.
With current libvirt it kind of works but recently I found a very nasty issue:
See: https://www.redhat.com/archives/libvir-list/2014-November/msg01090.html

Maybe it works with cgroup namespaces, i.e. such that systemd can mount cgroupfs
within the container in a secure way.
The current discussion can be found here: https://lkml.org/lkml/2015/1/7/150

As of now I have to drop all my systemd lxc guests and will replace them with
a non-systemd distro, which is very sad. :-(

> Item 4 is some what dubious, since we're only changing some of the
> fields in /proc/meminfo. It helps apps which blindly parse
> /proc/meminfo to determine free system resources they can consume.
> Those apps are broken even without containers being involved though,
> since any application must expect to be placed inside a cgroup with
> limited resources. Faking /proc/meminfo is a pretty limited workaround
> that just delays the inevitable fixing of such apps..

You mean that tools like free(1) have to be patched to query also
memory limits from cgroupfs?
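
A rough sketch of what such a tool would have to do, assuming the cgroup v1
memory controller, where the limit lives in a file called
memory.limit_in_bytes. The function names are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Parse one /proc/meminfo line. If it is the MemTotal line, store the
 * value in bytes and return 1; otherwise return 0. */
int parse_memtotal_line(const char *line, unsigned long long *bytes)
{
    unsigned long long kb;

    assert(line && bytes);
    if (sscanf(line, "MemTotal: %llu kB", &kb) != 1)
        return 0;
    *bytes = kb * 1024ULL;
    return 1;
}

/* Read MemTotal from /proc/meminfo; returns 0 if it cannot be found. */
unsigned long long read_host_mem_total(void)
{
    char line[256];
    unsigned long long bytes = 0;
    FILE *fp = fopen("/proc/meminfo", "r");

    if (!fp)
        return 0;
    while (fgets(line, sizeof(line), fp)) {
        if (parse_memtotal_line(line, &bytes))
            break;
    }
    fclose(fp);
    return bytes;
}

/* The memory an application may really use is the smaller of the host
 * total and the cgroup limit. An unlimited cgroup reports a huge value,
 * so the host total wins in that case. */
unsigned long long effective_mem_total(unsigned long long host_total,
                                       unsigned long long cgroup_limit)
{
    return cgroup_limit < host_total ? cgroup_limit : host_total;
}
```

The cgroup limit itself would be read from the memory controller directory
of the process, whose path depends on where the cgroup filesystem is
mounted.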

> The patch that follows just removes the items 1 & 2, but I'm thinking
> we should go further and remove items 3 & 4 too.
> 
> Changing 4 in particular is certainly classed as a guest ABI
> change, though, so it is not something distros may wish to see when
> upgrading libvirt. There is scope to argue that 1-3 are guest ABI
> changes too.
> 
> In full machine virt world, we deal with this using machine types.
> eg each new KVM version introduces a new machine type which models
> the guest ABI in a stable fashion. Guest machine types are fixed at
> time of first deployment. So when libvirt / KVM is upgraded, existing
> guests will not see any changes, but new guests will automatically
> get the new machine type.
> 
> I'm thinking we might want to make use of this in LXC before making
> these changes. eg introduce a new machine 'libvirt-lxc-1' to
> represent the current guest mount setup and make sure all existing
> guests get that machine type. Then introduce a new machine type
> libvirt-lxc-2 that removes all this cruft, which new guests will
> get by default.
> 
> Alternatively we could call them 'libvirt-lxc-compat-1' and
> 'libvirt-lxc-bare-1' to give a clearer indication of their
> functional difference and version them separately in the future ?

Can we have a new machine type which enforces user namespaces?

> Regards,
> Daniel
> 
> Daniel P. Berrange (1):
>   lxc: Stop mouning /proc and /sys read only
> 
>  src/lxc/lxc_container.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)

Acked-by: Richard Weinberger 

Thanks,
//richard



Re: [libvirt] [PATCH] lxc: Cleaning up mount setup

2015-01-08 Thread Richard Weinberger
On 08.01.2015 at 14:45, Daniel P. Berrange wrote:
> On Thu, Jan 08, 2015 at 02:36:36PM +0100, Richard Weinberger wrote:
>> On 08.01.2015 at 14:02, Daniel P. Berrange wrote:
>>> We have historically done a number of things with LXC that are
>>> somewhat questionable in retrospect
>>>
>>>  1. Mounted /proc/sys read-only, but then mounted
>>> /proc/sys/net/ipv* read-write again
>>>  2. Mounted /sys read only
>>>  3. Mount /sys/fs/cgroup/NNN/the/guest/dir to /sys/fs/cgroup/NNN
>>>  4. FUSE mount on /proc/meminfo
>>>
>>> Items 1 & 2 are pointless as they offer no security benefit either
>>> with or without user namespaces. Without userns it is always insecure,
>>> with userns it is always secure, no matter what the mount state is.
>>
>> I agree. Thanks a lot for addressing this, Daniel!
>>
>>> Item 3 is somewhat dubious, since /proc/self/cgroup paths for
>>> processes are now not visible at /sys/fs/cgroup. This really
>>> confuses systemd inside the container making it create a broken
>>> layout
>>
>> The question is, how to support systemd in containers?
>>
>> As of now I'm not aware of a working concept.
>> With current libvirt it kind of works but recently I found a very nasty 
>> issue:
>> See: https://www.redhat.com/archives/libvir-list/2014-November/msg01090.html
> 
> That reply from Lennart suggests systemd should pretty much work,
> albeit in a hacky way.

What hack do you mean?
*confused*

> I've not done much in anger with systemd in containers, but I have
> found it sufficient for application containers - ie not full OS
> containers with interactive sessions.

My use case is different. I need most of the time at least an init.
And if the distro is systemd based

>>
>> Maybe with cgroup namespaces it works. i.e. such that systemd can mount 
>> cgroupfs
>> within the container in a secure way.
>> The current discussion can be found here: https://lkml.org/lkml/2015/1/7/150
>>
>> As of now I have to drop all my systemd lxc guests and will replace them by
>> a non-systemd distro, which is very sad. :-(
>>
>>> Item 4 is somewhat dubious, since we're only changing some of the
>>> fields in /proc/meminfo. It helps apps which blindly parse
>>> /proc/meminfo to determine free system resources they can consume.
>>> Those apps are broken even without containers being involved though,
>>> since any application must expect to be placed inside a cgroup with
>>> limited resources. Faking /proc/meminfo is a pretty limited workaround
>>> that just delays the inevitable fixing of such apps.
>>
>> You mean that tools like free(1) have to be patched to query also
>> memory limits from cgroupfs?
> 
> Not necessarily. The 'free' tool is said to
> 
>"Display amount of free and used memory in the system"
> 
> so it is arguably correct that it reports /proc/meminfo of the host
> as a whole.
> 
> What is broken are applications that are invoking 'free' and then
> believing that the values it reports correspond to what the
> application is able to use. ie the applications are not taking
> account that they might not have ability to use the entire system
> resources due to cgroups or containers or both.
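
To make the point above concrete, here is a minimal sketch in C of how an
application could cap its view of memory by a cgroup limit instead of
trusting /proc/meminfo alone. This is only an illustration, not code from
libvirt or procps. The helper names are made up for the example, and the
caller is assumed to have read the text of /proc/meminfo and of the cgroup
limit file itself (for instance memory.limit_in_bytes on cgroup v1 or
memory.max on cgroup v2; which file exists depends on the host setup).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract MemTotal from the text of /proc/meminfo.
 * Returns the value in bytes, or 0 if the field cannot be parsed. */
static unsigned long long meminfo_total_bytes(const char *text)
{
    const char *p = strstr(text, "MemTotal:");
    unsigned long long kb;

    if (!p || sscanf(p, "MemTotal: %llu kB", &kb) != 1)
        return 0;
    return kb * 1024ULL;
}

/* Hypothetical helper: parse the content of a cgroup memory limit file.
 * The content is either a byte count or the word "max" (no limit).
 * Returns 0 when there is no limit or the text cannot be parsed. */
static unsigned long long cgroup_limit_bytes(const char *text)
{
    unsigned long long v;

    if (strncmp(text, "max", 3) == 0)
        return 0;
    if (sscanf(text, "%llu", &v) != 1)
        return 0;
    return v;
}

/* Memory the application may plan for: the smaller of the host total
 * and the cgroup limit. A limit of 0 means "no limit known". */
static unsigned long long usable_memory(unsigned long long host_total,
                                        unsigned long long limit)
{
    if (limit == 0 || limit > host_total)
        return host_total;
    return limit;
}
```

With this split the parsing can be exercised on plain strings, and a very
large "unlimited" value as reported by some setups simply falls back to the
host total.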
> 
>>> The patch that follows just removes the items 1 & 2, but I'm thinking
>>> we should go further and remove items 3 & 4 too.
>>>
>>> Changing 4 in particular is certainly classed as a guest ABI
>>> change, though, so it is not something distros may wish to see when
>>> upgrading libvirt. There is scope to argue that 1-3 are guest ABI
>>> changes too.
>>>
>>> In full machine virt world, we deal with this using machine types.
>>> eg each new KVM version introduces a new machine type which models
>>> the guest ABI in a stable fashion. Guest machine types are fixed at
>>> time of first deployment. So when libvirt / KVM is upgraded, existing
>>> guests will not see any changes, but new guests will automatically
>>> get the new machine type.
>>>
>>> I'm thinking we might want to make use of this in LXC before making
>>> these changes. eg introduce a new machine 'libvirt-lxc-1' to
>>> represent the current guest mount setup and make sure all existing
>>> guests get that machine type. Then introduce a new machine type
>>> libvirt-lxc-2 that removes all this cruft, which new guests will
>>> get by default.
>>>
>>> Alternatively we

Re: [libvirt] [PATCH] lxc: Cleaning up mount setup

2015-01-08 Thread Richard Weinberger
On 08.01.2015 at 15:06, Daniel P. Berrange wrote:
> On Thu, Jan 08, 2015 at 03:02:59PM +0100, Richard Weinberger wrote:
>> On 08.01.2015 at 14:45, Daniel P. Berrange wrote:
>>> On Thu, Jan 08, 2015 at 02:36:36PM +0100, Richard Weinberger wrote:
>>>> On 08.01.2015 at 14:02, Daniel P. Berrange wrote:
>>>>> We have historically done a number of things with LXC that are
>>>>> somewhat questionable in retrospect
>>>>>
>>>>>  1. Mounted /proc/sys read-only, but then mounted
>>>>> /proc/sys/net/ipv* read-write again
>>>>>  2. Mounted /sys read only
>>>>>  3. Mount /sys/fs/cgroup/NNN/the/guest/dir to /sys/fs/cgroup/NNN
>>>>>  4. FUSE mount on /proc/meminfo
>>>>>
>>>>> Items 1 & 2 are pointless as they offer no security benefit either
>>>>> with or without user namespaces. Without userns it is always insecure,
>>>>> with userns it is always secure, no matter what the mount state is.
>>>>
>>>> I agree. Thanks a lot for addressing this, Daniel!
>>>>
>>>>> Item 3 is somewhat dubious, since /proc/self/cgroup paths for
>>>>> processes are now not visible at /sys/fs/cgroup. This really
>>>>> confuses systemd inside the container making it create a broken
>>>>> layout
>>>>
>>>> The question is, how to support systemd in containers?
>>>>
>>>> As of now I'm not aware of a working concept.
>>>> With current libvirt it kind of works but recently I found a very nasty 
>>>> issue:
>>>> See: 
>>>> https://www.redhat.com/archives/libvir-list/2014-November/msg01090.html
>>>
>>> That reply from Lennart suggests systemd should pretty much work,
>>> albeit in a hacky way.
>>
>> What hack do you mean?
> 
> Lennart's reply detailing their workaround hacks:

Oh yes. But these do not work as I've stated in the mail.
My containers show thousands of orphaned login sessions
and render the container unusable after some time.

>> My use case is different. I need most of the time at least an init.
>> And if the distro is systemd based
>> Yeah but we have to warn the user that she is doing something insecure
>> if no mappings are set up.
> 
> Ultimately I think that's a docs problem, or something that a higher level
> app needs to deal with. eg OpenStack should setup LXC such that user
> namespaces are unconditionally enabled all the time, even if that's not
> the case in libvirt itself. OpenStack manages the whole machine, so it
> has enough context to do the setup that libvirt cannot do.

I don't run OpenStack but I tend to agree. :-)

Thanks,
//richard



[libvirt] [PATCH] LXC: fix order in virProcessGetNamespaces

2013-06-05 Thread Richard Weinberger
virProcessGetNamespaces() opens files in /proc/XXX/ns/ which will
later be passed to setns().
We have to make sure that the file descriptors in the array are in the correct
order. Otherwise setns() may fail.
The order has been taken from util-linux's sys-utils/nsenter.c

Signed-off-by: Richard Weinberger 
---
 src/util/virprocess.c | 33 ++---
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index bc028d7..fce0d46 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -513,11 +513,11 @@ int virProcessGetNamespaces(pid_t pid,
 int **fdlist)
 {
 int ret = -1;
-DIR *dh = NULL;
 struct dirent *de;
 char *nsdir = NULL;
 char *nsfile = NULL;
-size_t i;
+char *ns_files[] = { "user", "ipc", "uts", "net", "pid", "mnt", NULL };
+size_t i = 0;
 
 *nfdlist = 0;
 *fdlist = NULL;
@@ -528,45 +528,32 @@ int virProcessGetNamespaces(pid_t pid,
 goto cleanup;
 }
 
-if (!(dh = opendir(nsdir))) {
-virReportSystemError(errno,
- _("Cannot read directory %s"),
- nsdir);
-goto cleanup;
-}
-
-while ((de = readdir(dh))) {
+while (ns_files[i]) {
 int fd;
-if (de->d_name[0] == '.')
-continue;
-
-if (VIR_EXPAND_N(*fdlist, *nfdlist, 1) < 0) {
+if (virAsprintf(&nsfile, "%s/%s", nsdir, ns_files[i]) < 0) {
 virReportOOMError();
 goto cleanup;
 }
 
-if (virAsprintf(&nsfile, "%s/%s", nsdir, de->d_name) < 0) {
-virReportOOMError();
-goto cleanup;
+if ((fd = open(nsfile, O_RDWR)) < 0) {
+goto next;
 }
 
-if ((fd = open(nsfile, O_RDWR)) < 0) {
-virReportSystemError(errno,
- _("Unable to open %s"),
- nsfile);
+if (VIR_EXPAND_N(*fdlist, *nfdlist, 1) < 0) {
+virReportOOMError();
 goto cleanup;
 }
 
 (*fdlist)[(*nfdlist)-1] = fd;
 
+next:
 VIR_FREE(nsfile);
+i++;
 }
 
 ret = 0;
 
 cleanup:
-if (dh)
-closedir(dh);
 VIR_FREE(nsdir);
 VIR_FREE(nsfile);
 if (ret < 0) {
-- 
1.8.3



[libvirt] How does virsh lxc-enter-namespace work? Does it?

2013-06-06 Thread Richard Weinberger

Hi!

I'm facing the issue that "virsh lxc-enter-namespace ..." does not work for me.
setns() always fails with EINVAL.

Reading the code confused me a bit, maybe you can help me. :D

virsh itself calls:
cmdLxcEnterNamespace()
 virDomainLxcOpenNamespace()
  conn->driver->domainLxcOpenNamespace()

Here comes the first thing that is not clear to me.
conn->driver seems to be the remote driver and therefore
->domainLxcOpenNamespace is remoteDomainLxcOpenNamespace()
Why is lxc:/// a remote connection?

remoteDomainLxcOpenNamespace() does a rpc call to libvirtd.

On the remote side libvirtd does:

lxcDispatchDomainOpenNamespace(), which opens the namespace fds,
and sends them back as result.
How can this work? Does it do some magic file descriptor passing
over AF_UNIX somewhere?

virsh then receives the fds (pure numbers) and setns() fails badly.

Wouldn't it make much more sense to do the open(/proc/XXX/ns/{mnt, user, ...}) 
and setns()
calls directly on the local side? IOW directly in virsh?
driver->domainLxcOpenNamespace() should only report the process id of the 
container's
init process.

Thanks,
//richard



Re: [libvirt] [PATCH] LXC: fix order in virProcessGetNamespaces

2013-06-06 Thread Richard Weinberger

On 06.06.2013 09:53, Daniel P. Berrange wrote:

On Wed, Jun 05, 2013 at 11:23:07PM +0200, Richard Weinberger wrote:

virProcessGetNamespaces() opens files in /proc/XXX/ns/ which will
later be passed to setns().
We have to make sure that the file descriptors in the array are in the correct
order. Otherwise setns() may fail.


What is the scenario / cause of the failure ?


You cannot attach to namespaces in random order.
For example with user namespaces an unprivileged user can enter other namespaces.
But to do so you have to enter the user namespace first and then
the other ones.
Same for mnt and pid, if you enter the mnt namespace before pid
your procfs will go nuts.
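
To illustrate the ordering point, here is a minimal sketch in C. It is not
the libvirt code; it only walks the same name list the patch uses and calls
setns() for each entry in turn. The helper names ns_rank() and
enter_namespaces() are made up for this example, and skipping namespace
files that cannot be opened is an assumption about what a caller may want.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Order taken from the patch: user namespace first, mount namespace last. */
static const char *const ns_order[] = {
    "user", "ipc", "uts", "net", "pid", "mnt", NULL
};

/* Hypothetical helper: position of a namespace name in ns_order, or -1. */
static int ns_rank(const char *name)
{
    for (int i = 0; ns_order[i]; i++) {
        if (strcmp(ns_order[i], name) == 0)
            return i;
    }
    return -1;
}

/* Hypothetical helper: join the namespaces of 'pid' in the fixed order.
 * Returns 0 on success, -1 (errno set) on the first setns() failure. */
static int enter_namespaces(pid_t pid)
{
    char path[64];

    for (size_t i = 0; ns_order[i]; i++) {
        int fd, rc, saved;

        snprintf(path, sizeof(path), "/proc/%ld/ns/%s",
                 (long)pid, ns_order[i]);

        /* Assumption: a file that cannot be opened is skipped, e.g.
         * because the kernel lacks that namespace type. */
        if ((fd = open(path, O_RDONLY)) < 0)
            continue;

        rc = setns(fd, 0);      /* 0: accept any namespace type */
        saved = errno;
        close(fd);
        if (rc < 0) {
            errno = saved;
            return -1;
        }
    }
    return 0;
}
```

A privileged caller would invoke enter_namespaces() with the PID of the
container's init process and then fork and exec the command to run inside.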

Thanks,
//richard



Re: [libvirt] How does virsh lxc-enter-namespace work? Does it?

2013-06-06 Thread Richard Weinberger

On 06.06.2013 09:56, Daniel P. Berrange wrote:

On Thu, Jun 06, 2013 at 08:57:21AM +0200, Richard Weinberger wrote:

Hi!

I'm facing the issue that "virsh lxc-enter-namespace ..." does not work for me.
setns() always fails with EINVAL.

Reading the code confused me a bit, maybe you can help me. :D

virsh itself calls:
cmdLxcEnterNamespace()
  virDomainLxcOpenNamespace()
   conn->driver->domainLxcOpenNamespace()

Here comes the first thing that is not clear to me.
conn->driver seems to be the remote driver and therefore
->domainLxcOpenNamespace is remoteDomainLxcOpenNamespace()
Why is lxc:/// a remote connection?

remoteDomainLxcOpenNamespace() does a rpc call to libvirtd.

On the remote side libvirtd does:

lxcDispatchDomainOpenNamespace(), which opens the namespace fds,
and sends them back as result.
How can this work? Does it do some magic file descriptor passing
over AF_UNIX somewhere?


Yes, we use SCM_RIGHTS to pass FDs.
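
For readers unfamiliar with SCM_RIGHTS, here is a minimal sketch in C of
passing a single file descriptor over a connected AF_UNIX socket. It is not
libvirt's RPC code; send_fd() and recv_fd() are names made up for this
example. The receiver gets a new descriptor number that refers to the same
open file, which is why the "pure numbers" do not have to match on both
sides.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';            /* one byte of ordinary data goes along */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;   /* forces correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor sent with send_fd().
 * Returns the new descriptor, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}
```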


virsh then receives the fds (pure numbers) and setns() fails badly.

Wouldn't it make much more sense to do the open(/proc/XXX/ns/{mnt, user, ...}) 
and setns()
calls directly on the local side? IOW directly in virsh?
driver->domainLxcOpenNamespace() should only report the process id of the 
container's
init process.


The reason for doing it server side is to get privilege separation.
eg libvirtd runs privileged to open the fds, and virsh can run
unprivileged with setns().  Unfortunately it seems the kernel
doesn't allow for the thing calling setns() to be unprivileged
at this time, but the design allows for this enhancement in the
future.


setns() needs CAP_SYS_ADMIN and the manpage also says:

ERRORS:
...
EINVAL fd refers to a namespace whose type does not match that specified in 
nstype, or there is a problem with reassociating the thread with the 
specified namespace.


I'm sure in my case setns() fails because the calling thread did not open() the 
ns files itself.

What is the plan to make lxc-enter-namespace work?
Privilege separation is nice but as of now the kernel interface (setns()) seems 
not to allow this.
Are you forcing the kernel guys to change the interface?

In the meantime I'll use util-linux's nsenter, which works fine.

Thanks,
//richard



Re: [libvirt] How does virsh lxc-enter-namespace work? Does it?

2013-06-06 Thread Richard Weinberger

On 06.06.2013 10:13, Daniel P. Berrange wrote:

On Thu, Jun 06, 2013 at 10:07:26AM +0200, Richard Weinberger wrote:

On 06.06.2013 09:56, Daniel P. Berrange wrote:

On Thu, Jun 06, 2013 at 08:57:21AM +0200, Richard Weinberger wrote:

Hi!

I'm facing the issue that "virsh lxc-enter-namespace ..." does not work for me.
setns() always fails with EINVAL.

Reading the code confused me a bit, maybe you can help me. :D

virsh itself calls:
cmdLxcEnterNamespace()
  virDomainLxcOpenNamespace()
   conn->driver->domainLxcOpenNamespace()

Here comes the first thing that is not clear to me.
conn->driver seems to be the remote driver and therefore
->domainLxcOpenNamespace is remoteDomainLxcOpenNamespace()
Why is lxc:/// a remote connection?

remoteDomainLxcOpenNamespace() does a rpc call to libvirtd.

On the remote side libvirtd does:

lxcDispatchDomainOpenNamespace(), which opens the namespace fds,
and sends them back as result.
How can this work? Does it do some magic file descriptor passing
over AF_UNIX somewhere?


Yes, we use SCM_RIGHTS to pass FDs.


virsh then receives the fds (pure numbers) and setns() fails badly.

Wouldn't it make much more sense to do the open(/proc/XXX/ns/{mnt, user, ...}) 
and setns()
calls directly on the local side? IOW directly in virsh?
driver->domainLxcOpenNamespace() should only report the process id of the 
container's
init process.


The reason for doing it server side is to get privilege separation.
eg libvirtd runs privileged to open the fds, and virsh can run
unprivileged with setns().  Unfortunately it seems the kernel
doesn't allow for the thing calling setns() to be unprivileged
at this time, but the design allows for this enhancement in the
future.


setns() needs CAP_SYS_ADMIN and the manpage also says:


The hope is that this can be relaxed - it ought to be sufficient to
just restrict access to the /proc/$PID/ns/ files to enforce permissions,
or require CAP_SYS_ADMIN when opening the files only. I can't see any
compelling reason why you should require CAP_SYS_ADMIN on setns() itself
once you have the FDs open.


ERRORS:
...
EINVAL fd refers to a namespace whose type does not match that specified in 
nstype, or there is a problem with reassociating the thread with the 
specified namespace.


I'm sure in my case setns() fails because the calling thread did not open() the 
ns files itself.


Do you have user namespaces enabled by chance ?


Yeah, but only within the kernel.
Maybe this is why the setns() is failing for me.


What is the plan to make lxc-enter-namespace work?
Privilege separation is nice but as of now the kernel interface (setns()) seems 
not to allow this.
Are you forcing the kernel guys to change the interface?


It has long worked fine on Fedora, though we do not have user namespaces
enabled since parts of the kernel are yet to be ported to that (XFS in
particular). My best guess is that user namespaces may have caused a
regression in this ability to call setns() from a separate process.


Sounds sane.
/me looks. :-)

Thanks,
//richard



Re: [libvirt] [PATCH] LXC: fix order in virProcessGetNamespaces

2013-06-06 Thread Richard Weinberger

On 06.06.2013 10:08, Daniel P. Berrange wrote:

On Thu, Jun 06, 2013 at 09:58:28AM +0200, Richard Weinberger wrote:

On 06.06.2013 09:53, Daniel P. Berrange wrote:

On Wed, Jun 05, 2013 at 11:23:07PM +0200, Richard Weinberger wrote:

virProcessGetNamespaces() opens files in /proc/XXX/ns/ which will
later be passed to setns().
We have to make sure that the file descriptors in the array are in the correct
order. Otherwise setns() may fail.


What is the scenario / cause of the failure ?


You cannot attach to namespaces in random order.
For example with user namespaces an unprivileged user can enter other namespaces.
But to do so you have to enter the user namespace first and then
the other ones.


Ok, that kind of makes sense, ACK to the patch. I'll update the commit
message with this information.


Thanks. :)

FYI: util-linux's nsenter.c says:

/* Careful the order is significant in this array.
 *
 * The user namespace comes first, so that it is entered
 * first. This gives an unprivileged user the potential to
 * enter the other namespaces.
 */
 { .nstype = CLONE_NEWUSER, .name = "ns/user", .fd = -1 },
 { .nstype = CLONE_NEWIPC, .name = "ns/ipc", .fd = -1 },
 { .nstype = CLONE_NEWUTS, .name = "ns/uts", .fd = -1 },
 { .nstype = CLONE_NEWNET, .name = "ns/net", .fd = -1 },
 { .nstype = CLONE_NEWPID, .name = "ns/pid", .fd = -1 },
 { .nstype = CLONE_NEWNS, .name = "ns/mnt", .fd = -1 },
 { .nstype = 0, .name = NULL, .fd = -1 }



Same for mnt and pid, if you enter the mnt namespace before pid
your procfs will go nuts.


That shouldn't affect us since we don't need to access procfs at all
during the loop where we call setns().


Yeah, but there are some setns() combinations which may still fail.
Some weeks ago I wrote my own ns attach tool that also worked with
user namespaces.
After debugging a few very strange issues where setns() failed all of
a sudden, I realized that the setns() order matters.

Thanks,
//richard



Re: [libvirt] How does virsh lxc-enter-namespace work? Does it?

2013-06-06 Thread Richard Weinberger

On 06.06.2013 10:13, Daniel P. Berrange wrote:

On Thu, Jun 06, 2013 at 10:07:26AM +0200, Richard Weinberger wrote:

On 06.06.2013 09:56, Daniel P. Berrange wrote:

On Thu, Jun 06, 2013 at 08:57:21AM +0200, Richard Weinberger wrote:

Hi!

I'm facing the issue that "virsh lxc-enter-namespace ..." does not work for me.
setns() always fails with EINVAL.

Reading the code confused me a bit, maybe you can help me. :D

virsh itself calls:
cmdLxcEnterNamespace()
  virDomainLxcOpenNamespace()
   conn->driver->domainLxcOpenNamespace()

Here comes the first thing that is not clear to me.
conn->driver seems to be the remote driver and therefore
->domainLxcOpenNamespace is remoteDomainLxcOpenNamespace()
Why is lxc:/// a remote connection?

remoteDomainLxcOpenNamespace() does a rpc call to libvirtd.

On the remote side libvirtd does:

lxcDispatchDomainOpenNamespace(), which opens the namespace fds,
and sends them back as result.
How can this work? Does it do some magic file descriptor passing
over AF_UNIX somewhere?


Yes, we use SCM_RIGHTS to pass FDs.


virsh then receives the fds (pure numbers) and setns() fails badly.

Wouldn't it make much more sense to do the open(/proc/XXX/ns/{mnt, user, ...}) 
and setns()
calls directly on the local side? IOW directly in virsh?
driver->domainLxcOpenNamespace() should only report the process id of the 
container's
init process.


The reason for doing it server side is to get privilege separation.
eg libvirtd runs privileged to open the fds, and virsh can run
unprivileged with setns().  Unfortunately it seems the kernel
doesn't allow for the thing calling setns() to be unprivileged
at this time, but the design allows for this enhancement in the
future.


setns() needs CAP_SYS_ADMIN and the manpage also says:


The hope is that this can be relaxed - it ought to be sufficient to
just restrict access to the /proc/$PID/ns/ files to enforce permissions,
or require CAP_SYS_ADMIN when opening the files only. I can't see any
compelling reason why you should require CAP_SYS_ADMIN on setns() itself
once you have the FDs open.


ERRORS:
...
EINVAL fd refers to a namespace whose type does not match that specified in 
nstype, or there is a problem with reassociating the thread with the 
specified namespace.


I'm sure in my case setns() fails because the calling thread did not open() the 
ns files itself.


Do you have user namespaces enabled by chance ?


What is the plan to make lxc-enter-namespace work?
Privilege separation is nice but as of now the kernel interface (setns()) seems 
not to allow this.
Are you forcing the kernel guys to change the interface?


It has long worked fine on Fedora, though we do not have user namespaces
enabled since parts of the kernel are yet to be ported to that (XFS in
particular). My best guess is that user namespaces may have caused a
regression in this ability to call setns() from a separate process.


I can confirm that lxc-enter-namespace works fine when I disable CONFIG_USER_NS
in my kernel.

Currently I'm moving my old LXC setup over to libvirt and later I'll enable 
user namespaces
too.
Let's see what else breaks. ;-)

Stay tuned!

Thanks,
//richard



Re: [libvirt] How does virsh lxc-enter-namespace work? Does it?

2013-06-07 Thread Richard Weinberger

On 07.06.2013 17:34, Daniel P. Berrange wrote:

On Thu, Jun 06, 2013 at 09:13:27AM +0100, Daniel P. Berrange wrote:

On Thu, Jun 06, 2013 at 10:07:26AM +0200, Richard Weinberger wrote:

I'm sure in my case setns() fails because the calling thread did not open() the 
ns files itself.


Do you have user namespaces enabled by chance ?


What is the plan to make lxc-enter-namespace work?
Privilege separation is nice but as of now the kernel interface (setns()) seems 
not to allow this.
Are you forcing the kernel guys to change the interface?


It has long worked fine on Fedora, though we do not have user namespaces
enabled since parts of the kernel are yet to be ported to that (XFS in
particular). My best guess is that user namespaces may have caused a
regression in this ability to call setns() from a separate process.


The problem is actually that you're not allowed to call setns(fd) for a
fd which refers to your current namespace. The fd must refer to a different
namespace. Of course the code is opening the '/proc/$PID/ns/user' file
even though libvirt doesn't give the container a new user namespace. The
simplest fix is to just ignore EINVAL from setns(), since we can't easily
figure out if the calling apps' namespace matches the namespace of the
container.
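
As a side note to the explanation above, on kernels where the
/proc/PID/ns/ entries are symbolic links (Linux 3.8 and later, according to
namespaces(7)), two processes share a namespace when the device and inode
numbers of the corresponding entries match. The following is a minimal
sketch of such a check in C, under that assumption. It is not what libvirt
does, and same_namespace() is a name made up for this example. A caller
could use it to skip setns() for a namespace it already lives in rather
than ignoring EINVAL.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: compare one namespace of two processes.
 * 'ns' is a name such as "user", "pid" or "mnt".
 * Returns 1 if both processes are in the same namespace, 0 if they are
 * not, and -1 if either /proc entry cannot be examined. */
static int same_namespace(pid_t a, pid_t b, const char *ns)
{
    char path_a[64], path_b[64];
    struct stat sa, sb;

    snprintf(path_a, sizeof(path_a), "/proc/%ld/ns/%s", (long)a, ns);
    snprintf(path_b, sizeof(path_b), "/proc/%ld/ns/%s", (long)b, ns);

    /* stat() follows the symlink, so the result describes the
     * namespace object itself, not the /proc entry. */
    if (stat(path_a, &sa) < 0 || stat(path_b, &sb) < 0)
        return -1;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, same_namespace(getpid(), container_init_pid, "user") returning
1 would mean the user namespace can be left out of the setns() loop; here
container_init_pid stands for whatever PID the caller has for the
container's init.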


Thanks a ton for figuring that out!

Thanks,
//richard



Re: [libvirt] [PATCH v3 00/12] Add user namespace support for libvirt lxc

2013-06-10 Thread Richard Weinberger

Hi!

On 04.06.2013 13:03, Daniel P. Berrange wrote:

It's still under review. needs some ACK.
If you can help to test or ACK this patchset, it will be very helpful. :)

Actually, I just want to ping...


I've been away on holiday for 2 weeks, so not had a chance to review
it yet. I'll get to it this week. I hope we'll get this in the 1.0.6
release this month.


Finally I've found some time to test version 4 of the userns patch set.
But I'm unable to create a container.

---cut---
linux:~ # LANG=C /opt/libvirt/bin/virsh -c lxc:/// create c1.conf
error: Failed to create domain from c1.conf
error: Interner Fehler guest failed to start: PATH=/bin:/sbin TERM=linux container=lxc-libvirt container_uuid=3f86c48b-b027-4838-ba17-6202a1d7398b 
LIBVIRT_LXC_UUID=3f86c48b-b027-4838-ba17-6202a1d7398b LIBVIRT_LXC_NAME=c1 /bin/bash

error receiving signal from container: Input/output error
---cut---

lxcContainerWaitForContinue() in src/lxc/lxc_controller.c fails with EIO.
Maybe because the clone()'ed child dies and the file descriptor used for 
synchronization becomes invalid.
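
To show how a dying child can surface as EIO on the waiting side, here is a
minimal sketch in C of a one-byte handshake. It is an illustration under
the assumption that the waiter treats end-of-file as an I/O error; it is
not a copy of lxcContainerWaitForContinue(), and wait_for_continue() is a
name made up for this example.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: block until the peer sends a one-byte token.
 * Returns 0 when the token arrives. Returns -1 with errno set to EIO
 * when the peer closed its end without sending anything, which is what
 * happens if the child process dies before signalling. */
static int wait_for_continue(int fd)
{
    char token;
    ssize_t n;

    do {
        n = read(fd, &token, 1);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted */

    if (n == 1)
        return 0;
    if (n == 0)
        errno = EIO;    /* EOF: nothing will ever arrive on this fd */
    return -1;
}
```

Under this reading the descriptor itself stays valid; the read simply hits
end-of-file, so the interesting question is why the child exited.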

Here is my container config:
---cut---

  c1
  102400
  
exe
/bin/bash
  
  


  
  


  
  

  

---cut---

Any ideas how to debug this further?
This is Linux 3.9.0 with all namespaces enabled.

Thanks,
//richard



Re: [libvirt] [PATCH v3 00/12] Add user namespace support for libvirt lxc

2013-06-10 Thread Richard Weinberger

On 10.06.2013 21:17, Richard Weinberger wrote:

Hi!

On 04.06.2013 13:03, Daniel P. Berrange wrote:

It's still under review. needs some ACK.
If you can help to test or ACK this patchset, it will be very helpful. :)

Actually, I just want to ping...


I've been away on holiday for 2 weeks, so not had a chance to review
it yet. I'll get to it this week. I hope we'll get this in the 1.0.6
release this month.


Finally I've found some time to test version 4 of the userns patch set.
But I'm unable to create a container.

---cut---
linux:~ # LANG=C /opt/libvirt/bin/virsh -c lxc:/// create c1.conf
error: Failed to create domain from c1.conf
error: Interner Fehler guest failed to start: PATH=/bin:/sbin TERM=linux 
container=lxc-libvirt container_uuid=3f86c48b-b027-4838-ba17-6202a1d7398b
LIBVIRT_LXC_UUID=3f86c48b-b027-4838-ba17-6202a1d7398b LIBVIRT_LXC_NAME=c1 
/bin/bash
error receiving signal from container: Input/output error
---cut---

lxcContainerWaitForContinue() in src/lxc/lxc_controller.c fails with EIO.
Maybe because the clone()'ed child dies and the file descriptor used for 
synchronization becomes invalid.

Here is my container config:
---cut---

   c1
   102400
   
 exe
 /bin/bash
   
   
 
 
   
   
 
 
   
   
 
   

---cut---

Any ideas how to debug this further?
This is Linux 3.9.0 with all namespaces enabled.


Whoops, forgot to add the libvirtd debug output:

---cut---
2013-06-10 19:41:24.661+: 29211: debug : virCommandRunAsync:2241 : About to run 
PATH=/usr/lib64/mpi/gcc/openmpi/bin:/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games LIBVIRT_DEBUG=1 LIBVIRT_LOG_OUTPUTS=1:stderr 
/opt/libvirt/lib/libvirt_lxc --name c1 --console 20 --security=none --handshake 23 --background

2013-06-10 19:41:24.663+: 29211: debug : virFileClose:90 : Closed fd 24
2013-06-10 19:41:24.663+: 29211: debug : virCommandRunAsync:2246 : Command 
result 0, with PID 29303
2013-06-10 19:41:24.664+: 29303: debug : virFileClose:90 : Closed fd 3
2013-06-10 19:41:24.665+: 29303: debug : virFileClose:90 : Closed fd 4
2013-06-10 19:41:24.666+: 29303: debug : virFileClose:90 : Closed fd 5
2013-06-10 19:41:24.666+: 29303: debug : virFileClose:90 : Closed fd 6
2013-06-10 19:41:24.667+: 29303: debug : virFileClose:90 : Closed fd 7
2013-06-10 19:41:24.667+: 29303: debug : virFileClose:90 : Closed fd 8
2013-06-10 19:41:24.668+: 29303: debug : virFileClose:90 : Closed fd 9
2013-06-10 19:41:24.668+: 29303: debug : virFileClose:90 : Closed fd 10
2013-06-10 19:41:24.668+: 29303: debug : virFileClose:90 : Closed fd 11
2013-06-10 19:41:24.668+: 29303: debug : virFileClose:90 : Closed fd 12
2013-06-10 19:41:24.668+: 29303: debug : virFileClose:90 : Closed fd 13
2013-06-10 19:41:24.669+: 29303: debug : virFileClose:90 : Closed fd 14
2013-06-10 19:41:24.669+: 29303: debug : virFileClose:90 : Closed fd 15
2013-06-10 19:41:24.670+: 29303: debug : virFileClose:90 : Closed fd 16
2013-06-10 19:41:24.670+: 29303: debug : virFileClose:90 : Closed fd 17
2013-06-10 19:41:24.670+: 29303: debug : virFileClose:90 : Closed fd 18
2013-06-10 19:41:24.671+: 29303: debug : virFileClose:90 : Closed fd 19
2013-06-10 19:41:24.671+: 29303: debug : virFileClose:90 : Closed fd 22
2013-06-10 19:41:24.790+: 29211: debug : virCommandRun:2115 : Result status 
0, stdout: '(null)' stderr: '(null)'
---cut---

Looks like libvirt_lxc was executed and died silently.

Thanks,
//richard



Re: [libvirt] [PATCH v3 00/12] Add user namespace support for libvirt lxc

2013-06-10 Thread Richard Weinberger

On 10.06.2013 21:53, Richard Weinberger wrote:

On 10.06.2013 21:17, Richard Weinberger wrote:

Hi!

On 04.06.2013 13:03, Daniel P. Berrange wrote:

It's still under review. needs some ACK.
If you can help to test or ACK this patchset, it will be very helpful. :)

Actually, I just want to ping...


I've been away on holiday for 2 weeks, so not had a chance to review
it yet. I'll get to it this week. I hope we'll get this in the 1.0.6
release this month.


Finally I've found some time to test version 4 of the userns patch set.
But I'm unable to create a container.

---cut---
linux:~ # LANG=C /opt/libvirt/bin/virsh -c lxc:/// create c1.conf
error: Failed to create domain from c1.conf
error: Interner Fehler guest failed to start: PATH=/bin:/sbin TERM=linux 
container=lxc-libvirt container_uuid=3f86c48b-b027-4838-ba17-6202a1d7398b
LIBVIRT_LXC_UUID=3f86c48b-b027-4838-ba17-6202a1d7398b LIBVIRT_LXC_NAME=c1 
/bin/bash
error receiving signal from container: Input/output error
---cut---

lxcContainerWaitForContinue() in src/lxc/lxc_controller.c fails with EIO.
Maybe because the clone()'ed child dies and the file descriptor used for 
synchronization becomes invalid.
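
A minimal sketch of how such a handshake read can end in EIO, assuming the synchronization is a one-byte token read from a pipe or socket. The names wait_for_continue and CONTINUE_MSG are made up for illustration and are not taken from the libvirt sources. If the child exits before writing anything, read() returns 0 (end of file), and the helper reports that as EIO:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical one-byte "continue" token; not taken from the libvirt sources. */
#define CONTINUE_MSG 'c'

/*
 * Wait for the peer to send the continue token on fd.
 * Returns 0 on success, -1 on failure with errno set.
 * A clean EOF (peer exited without writing) is reported as EIO.
 */
int wait_for_continue(int fd)
{
    char msg = 0;
    ssize_t n;

    /* Retry if a signal interrupts the read. */
    do {
        n = read(fd, &msg, sizeof(msg));
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -1;              /* read() already set errno */
    if (n == 0) {
        errno = EIO;            /* peer closed its end: it died early */
        return -1;
    }
    if (msg != CONTINUE_MSG) {
        errno = EINVAL;         /* unexpected token */
        return -1;
    }
    return 0;
}
```

With a helper like this, "Input/output error" only says that the other side went away; the actual reason has to come from the child's own log.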

Here is my container config:
---cut---

   c1
   102400
   
 exe
 /bin/bash
   
   
 
 
   
   
 
 
   
   
 
   

---cut---

Any ideas how to debug this further?
This is Linux 3.9.0 with all namespaces enabled.


Whoops, forgot to add the libvirtd debug output:

---cut---
2013-06-10 19:41:24.661+0000: 29211: debug : virCommandRunAsync:2241 : About to 
run
PATH=/usr/lib64/mpi/gcc/openmpi/bin:/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games
 LIBVIRT_DEBUG=1 LIBVIRT_LOG_OUTPUTS=1:stderr
/opt/libvirt/lib/libvirt_lxc --name c1 --console 20 --security=none --handshake 
23 --background
2013-06-10 19:41:24.663+0000: 29211: debug : virFileClose:90 : Closed fd 24
2013-06-10 19:41:24.663+0000: 29211: debug : virCommandRunAsync:2246 : Command 
result 0, with PID 29303
2013-06-10 19:41:24.664+0000: 29303: debug : virFileClose:90 : Closed fd 3
2013-06-10 19:41:24.665+0000: 29303: debug : virFileClose:90 : Closed fd 4
2013-06-10 19:41:24.666+0000: 29303: debug : virFileClose:90 : Closed fd 5
2013-06-10 19:41:24.666+0000: 29303: debug : virFileClose:90 : Closed fd 6
2013-06-10 19:41:24.667+0000: 29303: debug : virFileClose:90 : Closed fd 7
2013-06-10 19:41:24.667+0000: 29303: debug : virFileClose:90 : Closed fd 8
2013-06-10 19:41:24.668+0000: 29303: debug : virFileClose:90 : Closed fd 9
2013-06-10 19:41:24.668+0000: 29303: debug : virFileClose:90 : Closed fd 10
2013-06-10 19:41:24.668+0000: 29303: debug : virFileClose:90 : Closed fd 11
2013-06-10 19:41:24.668+0000: 29303: debug : virFileClose:90 : Closed fd 12
2013-06-10 19:41:24.668+0000: 29303: debug : virFileClose:90 : Closed fd 13
2013-06-10 19:41:24.669+0000: 29303: debug : virFileClose:90 : Closed fd 14
2013-06-10 19:41:24.669+0000: 29303: debug : virFileClose:90 : Closed fd 15
2013-06-10 19:41:24.670+0000: 29303: debug : virFileClose:90 : Closed fd 16
2013-06-10 19:41:24.670+0000: 29303: debug : virFileClose:90 : Closed fd 17
2013-06-10 19:41:24.670+0000: 29303: debug : virFileClose:90 : Closed fd 18
2013-06-10 19:41:24.671+0000: 29303: debug : virFileClose:90 : Closed fd 19
2013-06-10 19:41:24.671+0000: 29303: debug : virFileClose:90 : Closed fd 22
2013-06-10 19:41:24.790+0000: 29211: debug : virCommandRun:2115 : Result status 
0, stdout: '(null)' stderr: '(null)'
---cut---

Looks like libvirt_lxc was executed and died silently.


Found the problem. /opt/libvirt/var/log/libvirt/lxc/c1.log contained the info I 
needed.
Search permissions for /root were missing. m(
Would be nice if virsh would be able to tell one this...
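
A rough sketch of the kind of pre-flight check that could surface this earlier, assuming the failure is a parent directory of the container root that the process cannot search. The helper name first_unsearchable_dir is hypothetical and nothing like it is claimed to exist in libvirt. Note that access() checks against the real UID/GID of the caller:

```c
#include <string.h>
#include <unistd.h>

/*
 * Walk the parent directories of an absolute path and report the first
 * one the calling process cannot search (missing x permission) or that
 * does not exist. The last path component itself is not checked.
 * Returns 0 if every parent directory is searchable, 1 if a blocking
 * directory was found (copied into out), -1 on bad arguments.
 */
int first_unsearchable_dir(const char *path, char *out, size_t outlen)
{
    size_t len, i;

    if (!path || path[0] != '/' || !out || outlen == 0)
        return -1;
    len = strlen(path);
    if (len >= outlen)
        return -1;

    for (i = 1; i < len; i++) {
        if (path[i] != '/')
            continue;
        /* Copy the prefix up to (not including) this slash. */
        memcpy(out, path, i);
        out[i] = '\0';
        if (access(out, X_OK) < 0)
            return 1;
    }
    out[0] = '\0';
    return 0;
}
```

For a root filesystem below /root, a call such as first_unsearchable_dir("/root/c1/rootfs", buf, sizeof(buf)) would name /root as the blocking directory when search permission is missing there.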

Thanks,
//richard



Re: [libvirt] [PATCH v3 00/12] Add user namespace support for libvirt lxc

2013-06-10 Thread Richard Weinberger

On 11.06.2013 05:12, Gao feng wrote:

On 06/11/2013 04:51 AM, Richard Weinberger wrote:


Found the problem. /opt/libvirt/var/log/libvirt/lxc/c1.log contained the info I 
needed.
Search permissions for /root were missing. m(
Would be nice if virsh would be able to tell one this...



:)
have fun with user namespace & libvirt.
And thanks for your test.


Yeah. So far it looks very good.
I was able to convert my containers from my custom lxc/userns setup to 
libvirt+userns.

One more question: is it by design that virsh lxc-enter-namespace does not set up
uid/gid mappings?

Thanks,
//richard



Re: [libvirt] [PATCH v3 00/12] Add user namespace support for libvirt lxc

2013-06-11 Thread Richard Weinberger

On 11.06.2013 08:17, Gao feng wrote:

On 06/11/2013 02:02 PM, Richard Weinberger wrote:


Yeah. So far it looks very good.
I was able to convert my containers from my custom lxc/userns setup to 
libvirt+userns.

One more question: is it by design that virsh lxc-enter-namespace does not set up
uid/gid mappings?



lxc-enter-namespace does not need to set up uid/gid mappings. Since 
lxc-enter-namespace runs on the host side, the uid/gid mappings already exist.
But we should call setid for the child task of lxc-enter-namespace, because
this child task runs in the container.

I will improve lxc-enter-namespace once this patchset has been accepted.


This makes sense.
As of now I'm using su to become uid 0.

Thanks,
//richard



[libvirt] [PATCH] lxc: Create /dev/tty within a container

2013-06-12 Thread Richard Weinberger
Many applications use /dev/tty to read from stdin.
E.g. zypper on openSUSE.

Let's create this device node to unbreak those applications.
As /dev/tty is a synonym for the current controlling terminal
it cannot harm the host or any other containers.

Signed-off-by: Richard Weinberger 
---
 src/lxc/lxc_container.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 181f6c8..9ab64a1 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -837,6 +837,7 @@ static int lxcContainerPopulateDevices(char **ttyPaths, size_t nttyPaths)
         { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_FULL, 0666, "/dev/full" },
         { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_RANDOM, 0666, "/dev/random" },
         { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_URANDOM, 0666, "/dev/urandom" },
+        { LXC_DEV_MAJ_TTY, LXC_DEV_MIN_TTY, 0666, "/dev/tty" },
     };
     const struct {
         const char *src;
-- 
1.8.3



Re: [libvirt] [PATCH v3 00/12] Add user namespace support for libvirt lxc

2013-06-13 Thread Richard Weinberger

On 11.06.2013 08:17, Gao feng wrote:

:)
have fun with user namespace & libvirt.
And thanks for your test.


Found a nasty issue.
It looks like libvirt execs the lxc init within the wrong rootfs context.

My container's rootfs contains a script named /xxx.
If I try to use it as init, libvirt fails.

2013-06-13 13:18:04.499+0000: 1: error : lxcContainerChild:1941 : cannot find 
init path '/xxx' relative to container root: No such file or directory

It fails because it looks in the rootfs of the host.
If I create /xxx within my hostfs it works.

Nobody noticed so far because in 99.9% of all cases you have /bin/bash, 
/sbin/init and friends in both filesystems.

---cut---

  c_test1
  102400
  
exe
/xxx
  
  


  
  


  
  


  
  


 
 
   
  

---cut---

Thanks,
//richard



[libvirt] [RFC PATCH 1/2] LXC: Drop capabilities only if we're not within a user namespace

2013-06-13 Thread Richard Weinberger
Dropping capabilities within a user namespace makes no sense
because any uid 0 process will regain all caps upon execve().

Signed-off-by: Richard Weinberger 
---
 src/lxc/lxc_container.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 958e20d..4f00420 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -1896,6 +1896,15 @@ static int lxcContainerDropCapabilities(bool keepReboot ATTRIBUTE_UNUSED)
     return 0;
 }
 
+static int userns_supported(void)
+{
+    return lxcContainerAvailable(LXC_CONTAINER_FEATURE_USER) == 0;
+}
+
+static int userns_required(virDomainDefPtr def)
+{
+    return def->idmap.uidmap && def->idmap.gidmap;
+}
 
 /**
  * lxcContainerChild:
@@ -1992,7 +2001,7 @@ static int lxcContainerChild(void *data)
     }
 
     /* drop a set of root capabilities */
-    if (lxcContainerDropCapabilities(!!hasReboot) < 0)
+    if (!userns_required(vmDef) && lxcContainerDropCapabilities(!!hasReboot) < 0)
         goto cleanup;
 
     if (lxcContainerSendContinue(argv->handshakefd) < 0) {
@@ -2025,16 +2034,6 @@ cleanup:
     return ret;
 }
 
-static int userns_supported(void)
-{
-    return lxcContainerAvailable(LXC_CONTAINER_FEATURE_USER) == 0;
-}
-
-static int userns_required(virDomainDefPtr def)
-{
-    return def->idmap.uidmap && def->idmap.gidmap;
-}
-
 virArch lxcContainerGetAlt32bitArch(virArch arch)
 {
     /* Any Linux 64bit arch which has a 32bit
-- 
1.8.1.4



[libvirt] [RFC PATCH 2/2] LXC: Create ro overlay mounts only if we're not within a user namespace

2013-06-13 Thread Richard Weinberger
Within a user namespace, root can remount these filesystems rw at any
time.
Create these mounts only if we're not playing with user namespaces.

Signed-off-by: Richard Weinberger 
---
 src/lxc/lxc_container.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 4f00420..a003ec8 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -682,8 +682,17 @@ err:
     return ret;
 }
 
+static int userns_supported(void)
+{
+    return lxcContainerAvailable(LXC_CONTAINER_FEATURE_USER) == 0;
+}
 
-static int lxcContainerMountBasicFS(void)
+static int userns_required(virDomainDefPtr def)
+{
+    return def->idmap.uidmap && def->idmap.gidmap;
+}
+
+static int lxcContainerMountBasicFS(virDomainDefPtr vmDef)
 {
     const struct {
         const char *src;
@@ -691,6 +700,7 @@ static int lxcContainerMountBasicFS(void)
         const char *type;
         const char *opts;
         int mflags;
+        bool paranoia;
     } mnts[] = {
         /* When we want to make a bind mount readonly, for unknown reasons,
          * it is currently necessary to bind it once, and then remount the
@@ -698,14 +708,14 @@ static int lxcContainerMountBasicFS(void)
          * mount point in the main OS becomes readonly too which is not what
          * we want. Hence some things have two entries here.
          */
-        { "proc", "/proc", "proc", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV },
-        { "/proc/sys", "/proc/sys", NULL, NULL, MS_BIND },
-        { "/proc/sys", "/proc/sys", NULL, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY },
-        { "sysfs", "/sys", "sysfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV },
-        { "sysfs", "/sys", "sysfs", NULL, MS_BIND|MS_REMOUNT|MS_RDONLY },
+        { "proc", "/proc", "proc", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV, false },
+        { "/proc/sys", "/proc/sys", NULL, NULL, MS_BIND, true },
+        { "/proc/sys", "/proc/sys", NULL, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY, true },
+        { "sysfs", "/sys", "sysfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV, false },
+        { "sysfs", "/sys", "sysfs", NULL, MS_BIND|MS_REMOUNT|MS_RDONLY, true },
 #if WITH_SELINUX
-        { SELINUX_MOUNT, SELINUX_MOUNT, "selinuxfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV },
-        { SELINUX_MOUNT, SELINUX_MOUNT, NULL, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY },
+        { SELINUX_MOUNT, SELINUX_MOUNT, "selinuxfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV, false },
+        { SELINUX_MOUNT, SELINUX_MOUNT, NULL, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY, true },
 #endif
     };
     int i, rc = -1;
@@ -720,6 +730,10 @@ static int lxcContainerMountBasicFS(void)
 
         srcpath = mnts[i].src;
 
+        /* Skip ro overlay mounts if we build a userns as root can remount it rw at any time */
+        if (userns_required(vmDef) && mnts[i].paranoia)
+            continue;
+
         /* Skip if mount doesn't exist in source */
         if ((srcpath[0] == '/') &&
             (access(srcpath, R_OK) < 0))
@@ -1780,7 +1794,7 @@ static int lxcContainerSetupPivotRoot(virDomainDefPtr vmDef,
         goto cleanup;
 
     /* Mounts the core /proc, /sys, etc filesystems */
-    if (lxcContainerMountBasicFS() < 0)
+    if (lxcContainerMountBasicFS(vmDef) < 0)
         goto cleanup;
 
     /* Mounts /proc/meminfo etc sysinfo */
@@ -1896,16 +1910,6 @@ static int lxcContainerDropCapabilities(bool keepReboot ATTRIBUTE_UNUSED)
     return 0;
 }
 
-static int userns_supported(void)
-{
-    return lxcContainerAvailable(LXC_CONTAINER_FEATURE_USER) == 0;
-}
-
-static int userns_required(virDomainDefPtr def)
-{
-    return def->idmap.uidmap && def->idmap.gidmap;
-}
-
 /**
  * lxcContainerChild:
  * @data: pointer to container arguments
-- 
1.8.1.4
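
To make the effect of the new "paranoia" flag easier to see, here is a reduced, stand-alone model of the selection logic in the patch above. It only counts which table entries would be mounted; the struct and function names are simplified stand-ins, and the SELinux entries as well as the type, options and mount flags are left out:

```c
#include <stdbool.h>
#include <stddef.h>

struct basic_mount {
    const char *src;
    const char *dst;
    bool paranoia;   /* true: entry exists only to make something read-only */
};

/* Simplified copy of the table in the patch, without type/flags/SELinux. */
static const struct basic_mount basic_mounts[] = {
    { "proc",      "/proc",     false },
    { "/proc/sys", "/proc/sys", true  },
    { "/proc/sys", "/proc/sys", true  },
    { "sysfs",     "/sys",      false },
    { "sysfs",     "/sys",      true  },
};

#define N_BASIC_MOUNTS (sizeof(basic_mounts) / sizeof(basic_mounts[0]))

/* Count the entries that would actually be mounted. */
size_t count_mounts_to_perform(bool userns)
{
    size_t i, n = 0;

    for (i = 0; i < N_BASIC_MOUNTS; i++) {
        if (userns && basic_mounts[i].paranoia)
            continue;   /* skip read-only overlays inside a user namespace */
        n++;
    }
    return n;
}
```

In this model all five entries are used without a user namespace, while only the plain proc and sysfs mounts remain when one is requested.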



Re: [libvirt] [RFC PATCH 1/2] LXC: Drop capabilities only if we're not within a user namespace

2013-06-13 Thread Richard Weinberger

On 13.06.2013 20:02, Richard Weinberger wrote:

Dropping capabilities within a user namespace makes no sense
because any uid 0 process will regain all caps upon execve().

Signed-off-by: Richard Weinberger 


BTW: This one also solves a funny systemd issue.
systemd reads from /proc/1/environ to detect whether it
runs within LXC or not.
If we change the capability set (it does not matter which cap we drop),
uid 0/pid 1 is no longer allowed to read from that file.
I have to admit that I don't fully understand what kind of user
namespace/capability horror is going on. (Currently reading kernel sources
to find out.)
But if pid 1 execve's anything else, it regains a fresh capability set and is
allowed to read /proc/1/environ.

This is why /sbin/init did not work for me.
If I use a simple bash wrapper as init which execve's systemd, it works fine...

Thanks,
//richard



Re: [libvirt] [PATCH] LXC: Ensure the init task of container comes from container

2013-06-13 Thread Richard Weinberger

On 14.06.2013 07:54, Gao feng wrote:

Richard found that libvirt_lxc execs the lxc init program within
the wrong rootfs context; we should run this init task from
the rootfs of the container.

So chroot to the root directory of the container to make sure
libvirt_lxc execs the right lxc init program.

Signed-off-by: Gao feng 
---
  src/lxc/lxc_container.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 181f6c8..4edff15 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -645,8 +645,9 @@ static int lxcContainerPivotRoot(virDomainFSDefPtr root)
         goto err;
     }
 
-    /* CWD is undefined after pivot_root, so go to / */
-    if (chdir("/") < 0)
+    /* CWD is undefined after pivot_root, so go to /,
+     * and chroot to the new root directory */
+    if (chdir("/") < 0 || chroot(".") < 0)
         goto err;


Hmm, that looks fishy to me.
We never have to do a chroot(".") after pivot_root().

Thanks,
//richard



[libvirt] [PATCH] LXC: Check container init path after pivot_root()

2013-06-13 Thread Richard Weinberger
Currently we check the path before changing the root directory.
This cannot work. Do the check after pivot_root() such that
we check for the path within the correct root.

Signed-off-by: Richard Weinberger 
---
 src/lxc/lxc_container.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index a003ec8..7531fea 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -1948,13 +1948,6 @@ static int lxcContainerChild(void *data)
     if (lxcContainerResolveSymlinks(vmDef) < 0)
         goto cleanup;
 
-    if (!virFileExists(vmDef->os.init)) {
-        virReportSystemError(errno,
-                    _("cannot find init path '%s' relative to container root"),
-                    vmDef->os.init);
-        goto cleanup;
-    }
-
     /* Wait for interface devices to show up */
     if (lxcContainerWaitForContinue(argv->monitor) < 0) {
         virReportSystemError(errno, "%s",
@@ -1996,6 +1989,13 @@ static int lxcContainerChild(void *data)
                                    argv->securityDriver) < 0)
         goto cleanup;
 
+    if (!virFileExists(vmDef->os.init)) {
+        virReportSystemError(errno,
+                    _("cannot find init path '%s' relative to container root"),
+                    vmDef->os.init);
+        goto cleanup;
+    }
+
     /* rename and enable interfaces */
     if (lxcContainerRenameAndEnableInterfaces(!!(vmDef->features &
                                                  (1 << VIR_DOMAIN_FEATURE_PRIVNET)),
-- 
1.8.1.4
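
The ordering this patch is after can be sketched outside libvirt as follows. This is only an illustration under stated assumptions: the helper name is made up, newroot is assumed to be a mount point that contains an ".oldroot" directory, and the caller is assumed to run in its own mount namespace with CAP_SYS_ADMIN. The point is simply that the existence check runs after the root has been switched:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Switch the root filesystem to newroot and only then check that init
 * exists and is executable, so the lookup happens inside the new root.
 * Assumes newroot is a mount point containing an ".oldroot" directory.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int enter_root_and_check_init(const char *newroot, const char *init)
{
    if (chdir(newroot) < 0)
        return -1;
    /* glibc has no pivot_root() wrapper, so use syscall(2) directly. */
    if (syscall(SYS_pivot_root, ".", ".oldroot") < 0)
        return -1;
    /* CWD is undefined after pivot_root, so go to the new "/". */
    if (chdir("/") < 0)
        return -1;
    return access(init, X_OK);
}
```

Running the same access() call before the pivot would resolve the path against the host filesystem, which matches the '/xxx' failure reported earlier in the thread.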



Re: [libvirt] [PATCH] LXC: Ensure the init task of container comes from container

2013-06-13 Thread Richard Weinberger

On 14.06.2013 07:54, Gao feng wrote:

Richard found that libvirt_lxc execs the lxc init program within
the wrong rootfs context; we should run this init task from
the rootfs of the container.

So chroot to the root directory of the container to make sure
libvirt_lxc execs the right lxc init program.

Signed-off-by: Gao feng 


Found the real issue.
See "[PATCH] LXC: Check container init path after pivot_root()".
Sorry, I forgot to CC you.

Thanks,
//richard



[libvirt] [PATCH] LXC: s/chroot/chdir in lxcContainerPivotRoot()

2013-06-14 Thread Richard Weinberger
...fixes a trivial copy&paste error.

Signed-off-by: Richard Weinberger 
---
 src/lxc/lxc_container.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 7531fea..c4056c3 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -653,11 +653,11 @@ static int lxcContainerPivotRoot(virDomainFSDefPtr root)
         }
     }
 
-    /* Now we chroot into the tmpfs, then pivot into the
+    /* Now we chdir into the tmpfs, then pivot into the
      * root->src bind-mounted onto '/new' */
     if (chdir(newroot) < 0) {
         virReportSystemError(errno,
-                             _("Failed to chroot into %s"), newroot);
+                             _("Failed to chdir into %s"), newroot);
         goto err;
     }
 
-- 
1.8.1.4



Re: [libvirt] [RFC PATCH 1/2] LXC: Drop capabilities only if we're not within a user namespace

2013-06-25 Thread Richard Weinberger
On 13.06.2013 20:02, Richard Weinberger wrote:
> Dropping capabilities within a user namespace makes no sense
> because any uid 0 process will regain all caps upon execve().
> 
> Signed-off-by: Richard Weinberger 
> ---
>  src/lxc/lxc_container.c | 21 ++++++++++-----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> index 958e20d..4f00420 100644
> --- a/src/lxc/lxc_container.c
> +++ b/src/lxc/lxc_container.c
> @@ -1896,6 +1896,15 @@ static int lxcContainerDropCapabilities(bool keepReboot ATTRIBUTE_UNUSED)
>      return 0;
>  }
>  
> +static int userns_supported(void)
> +{
> +    return lxcContainerAvailable(LXC_CONTAINER_FEATURE_USER) == 0;
> +}
> +
> +static int userns_required(virDomainDefPtr def)
> +{
> +    return def->idmap.uidmap && def->idmap.gidmap;
> +}
>  
>  /**
>   * lxcContainerChild:
> @@ -1992,7 +2001,7 @@ static int lxcContainerChild(void *data)
>      }
>  
>      /* drop a set of root capabilities */
> -    if (lxcContainerDropCapabilities(!!hasReboot) < 0)
> +    if (!userns_required(vmDef) && lxcContainerDropCapabilities(!!hasReboot) < 0)
>          goto cleanup;
>  
>      if (lxcContainerSendContinue(argv->handshakefd) < 0) {
> @@ -2025,16 +2034,6 @@ cleanup:
>      return ret;
>  }
>  
> -static int userns_supported(void)
> -{
> -    return lxcContainerAvailable(LXC_CONTAINER_FEATURE_USER) == 0;
> -}
> -
> -static int userns_required(virDomainDefPtr def)
> -{
> -    return def->idmap.uidmap && def->idmap.gidmap;
> -}
> -
>  virArch lxcContainerGetAlt32bitArch(virArch arch)
>  {
>      /* Any Linux 64bit arch which has a 32bit
> 

Any feedback on that one?

Thanks,
//richard



Re: [libvirt] [RFC PATCH 1/2] LXC: Drop capabilities only if we're not within a user namespace

2013-06-25 Thread Richard Weinberger
On 25.06.2013 22:36, Daniel P. Berrange wrote:
> On Thu, Jun 13, 2013 at 08:02:17PM +0200, Richard Weinberger wrote:
>> Dropping capabilities within a user namespace makes no sense
>> because any uid 0 process will regain all caps upon execve().
> 
> That is true, except for the fact that libvirt has removed the
> capabilities from the bounding set too. This prevents them being
> regained upon execve.

Are you sure that this also applies to user namespaces?

Thanks,
//richard
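
One way to look at the bounding set directly, instead of guessing from the effective set, is prctl(2). The sketch below assumes Linux and uses illustrative wrapper names; it does not settle how the bounding set interacts with user namespaces, it only shows how to query and drop entries so the behaviour can be observed from inside a container:

```c
#include <linux/capability.h>
#include <sys/prctl.h>

/*
 * Returns 1 if cap is still in the calling thread's capability bounding
 * set, 0 if it has been dropped, -1 (errno EINVAL) for an invalid cap.
 */
int cap_in_bounding_set(int cap)
{
    return prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);
}

/* Drop cap from the bounding set; requires CAP_SETPCAP. */
int drop_from_bounding_set(int cap)
{
    return prctl(PR_CAPBSET_DROP, (unsigned long)cap, 0UL, 0UL, 0UL);
}

/* Example query: is CAP_SYS_BOOT still available to this thread? */
int reboot_cap_retained(void)
{
    return cap_in_bounding_set(CAP_SYS_BOOT);
}
```

Calling cap_in_bounding_set() before and after an execve() inside the container would show whether a dropped capability stays dropped.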



Re: [libvirt] [RFC PATCH 2/2] LXC: Create ro overlay mounts only if we're not within a user namespace

2013-06-30 Thread Richard Weinberger
On 01.07.2013 04:26, Gao feng wrote:
>> Well, given that we're at rc2 now & I'm still unclear about how some
>> aspects of the userns setup are working, I'm afraid we'll have to wait
>> until 1.1.1 for the userns LXC code to merge.  I'll aim to do it next
>> week, so that we have plenty of time for further testing before the
>> 1.1.1 release.
>>
> 
> Ok, I think Richard had tested the userns support.
> Hi Richard, can you give me your ack or tested-by?

I'm still facing one userns related issue.

Create a container like this one:
---cut---

  testi
  102400
  
exe
/bin/bash
  
  


  
  


  
  

 
  


  

---cut---

After creating it, attach to its console; you'll find bash as pid 1.
And you'll find that /proc/1/ is not fully uid/gid-mapped:
---cut---
# ls -la /proc/1/
total 0
dr-xr-xr-x  8 root   root    0 Jul  1 06:06 .
dr-xr-xr-x 74 nobody nogroup 0 Jul  1 06:06 ..
dr-xr-xr-x  2 root   root    0 Jul  1 06:06 attr
-r--------  1 nobody nogroup 0 Jul  1 06:06 auxv
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 cgroup
--w-------  1 nobody nogroup 0 Jul  1 06:06 clear_refs
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 cmdline
-rw-r--r--  1 nobody nogroup 0 Jul  1 06:06 comm
-rw-r--r--  1 nobody nogroup 0 Jul  1 06:06 coredump_filter
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 cpuset
lrwxrwxrwx  1 nobody nogroup 0 Jul  1 06:06 cwd -> /
-r--------  1 nobody nogroup 0 Jul  1 06:06 environ
lrwxrwxrwx  1 nobody nogroup 0 Jul  1 06:06 exe -> /bin/bash
dr-x------  2 nobody nogroup 0 Jul  1 06:06 fd
dr-x------  2 nobody nogroup 0 Jul  1 06:06 fdinfo
-rw-r--r--  1 nobody nogroup 0 Jul  1 06:06 gid_map
-r--------  1 nobody nogroup 0 Jul  1 06:06 io
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 limits
-rw-r--r--  1 nobody nogroup 0 Jul  1 06:06 loginuid
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 maps
-rw-------  1 nobody nogroup 0 Jul  1 06:06 mem
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 mountinfo
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 mounts
-r--------  1 nobody nogroup 0 Jul  1 06:06 mountstats
dr-xr-xr-x 10 root   root    0 Jul  1 06:06 net
dr-x--x--x  2 nobody nogroup 0 Jul  1 06:06 ns
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 numa_maps
-rw-r--r--  1 nobody nogroup 0 Jul  1 06:06 oom_adj
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 oom_score
-rw-r--r--  1 nobody nogroup 0 Jul  1 06:06 oom_score_adj
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 pagemap
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 personality
-rw-r--r--  1 nobody nogroup 0 Jul  1 06:06 projid_map
lrwxrwxrwx  1 nobody nogroup 0 Jul  1 06:06 root -> /
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 schedstat
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 sessionid
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 smaps
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 stack
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 stat
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 statm
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 status
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 syscall
dr-xr-xr-x  3 root   root    0 Jul  1 06:06 task
-rw-r--r--  1 nobody nogroup 0 Jul  1 06:06 uid_map
-r--r--r--  1 nobody nogroup 0 Jul  1 06:06 wchan
---cut---

Systemd suffers from this issue because it needs to read from /proc/1/environ.
After one exec /proc seems to be fixed:

---cut---
# cat /proc/1/environ
cat: /proc/1/environ: Permission denied
# exec /bin/bash
# cat /proc/1/environ
TERM=linuxPATH=/bin:/sbinPWD=/container_uuid=fabc42f8-cdee-461c-9a21-93902ab52b40SHLVL=0LIBVIRT_LXC_UUID=fabc42f8-cdee-461c-9a21-93902ab52b40LIBVIRT_LXC_NAME=testicontainer=lxc-libvirt

---cut---
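
The cat output above looks fused because /proc/<pid>/environ separates its entries with NUL bytes rather than newlines. A small sketch of how a reader of that file might look up a single variable such as "container" follows; the function name is hypothetical and this is not systemd's actual detection code:

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up name in a buffer laid out like /proc/<pid>/environ:
 * consecutive "NAME=value" strings, each terminated by a NUL byte.
 * Returns a pointer to the value inside buf, or NULL if not found.
 */
const char *environ_lookup(const char *buf, size_t len, const char *name)
{
    size_t namelen = strlen(name);
    size_t pos = 0;

    while (pos < len) {
        const char *entry = buf + pos;
        size_t entlen = strnlen(entry, len - pos);

        /* Match "name=" exactly and require a terminating NUL in range. */
        if (entlen > namelen && entry[namelen] == '=' &&
            memcmp(entry, name, namelen) == 0 && pos + entlen < len)
            return entry + namelen + 1;
        pos += entlen + 1;
    }
    return NULL;
}
```

Applied to the contents shown above, a lookup of "container" would yield "lxc-libvirt", but only if pid 1 is allowed to open the file in the first place.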

If I turn lxcContainerDropCapabilities() into a NOP the permissions in /proc 
are no longer clobbered.

Another (maybe related) issue:
no capabilities seem to get dropped.
(Of course tested where lxcContainerDropCapabilities() is not a NOP :) )

---cut---
# /usr/bin/pscap -a
ppid  pid   name        command           capabilities
0     1     root        bash              full
---cut---

Any ideas what's going on here?

Thanks,
//richard



[libvirt] LXC: autostart feature does not set all interfaces to state up.

2013-07-01 Thread Richard Weinberger
Hi!

If you have multiple LXC containers with networking and the autostart feature 
enabled, libvirtd fails to
bring up some veth interfaces on the host side.

Most of the time only the first veth device is in state up, all others are down.

Reproducing is easy.
1. Define a few containers (5 in my case)
2. Run "virsh autostart ..." on each one.
3. stop/start libvirtd

You'll observe that all containers are running, but "ip a" will report on the 
host
side that not all veth devices are up, so they are not usable within the containers.

This is not userns related; I just retested with today's libvirt.

Thanks,
//richard



Re: [libvirt] [RFC PATCH 2/2] LXC: Create ro overlay mounts only if we're not within a user namespace

2013-07-01 Thread Richard Weinberger
Am 01.07.2013 12:33, schrieb Daniel P. Berrange:
> On Mon, Jul 01, 2013 at 08:29:14AM +0200, Richard Weinberger wrote:
>> Am 01.07.2013 04:26, schrieb Gao feng:
>>>> Well, given that we're at rc2 now & I'm still unclear about how some
>>>> aspects of the userns setup is working, I'm afraid we'll have to wait
>>>> until 1.1.1 for the userns LXC code to merge.  I'll aim todo it next
>>>> week, so that we have plenty of time for further testing before the
>>>> 1.1.1 release.
>>>>
>>>
>>> Ok, I think Richard had tested the userns support.
>>> Hi Richard, can you give me your ack or tested-by?
>>
>> I'm still facing one userns related issue.
> 
> [snip]
> 
>> After creating it attach to it's console, you'll find bash as pid 1.
>> And you'll find that /proc/1/ is not fully uid/gid-mapped:
>> ---cut---
>> # ls -la /proc/1/
>> total 0
>> dr-xr-xr-x  8 root   root    0 Jul  1 06:06 .
>> dr-xr-xr-x 74 nobody nogroup 0 Jul  1 06:06 ..
>> dr-xr-xr-x  2 root   root    0 Jul  1 06:06 attr
> 
> [snip]
> 
>> Any ideas what's going on here?
> 
> No, it is very odd. It smells like a kernel issue to me. What
> version are you running ?

I see this issue on all kernels.
Currently I'm using vanilla v3.9.x and v3.10.

> I've also tried running the demo programs shown on the LWN.net
> article
> 
>https://lwn.net/Articles/532593/
> 
> and they don't operate in the way described by the article - the demo
> programs continue to ru as 'nfsnobody' even after the mappings are
> setup.
> 
> I'm just using the Fedora 3.9.4-303 kernel, rebuilt with userns enabled
> in KConfig.  I'm wondering if there is still stuff missing in 3.9.x
> that prevents this from working properly, or if the kernel behaviour
> changed after those LWN articles were written.

To me it looks like the capability system behaves oddly.
The mappings in /proc are fine as long as I do not call capng_updatev().
Calling capng_updatev() with parameters that do not change the current cap set
triggers the odd behavior too.

So we see two (related?) issues:
1. If we try updating the capabilities of pid 1, /proc/1/ has unmapped files
   until we exec().
2. Dropping capabilities does not work; we always gain a fresh and full
   capability set.

BTW: I'm sure the issues are not caused by Gao feng's userns patches.
Feel free to add:
Acked-by: Richard Weinberger 
Tested-by: Richard Weinberger 

Thanks,
//richard



Re: [libvirt] [RFC PATCH 2/2] LXC: Create ro overlay mounts only if we're not within a user namespace

2013-07-01 Thread Richard Weinberger
Am 01.07.2013 13:22, schrieb Daniel P. Berrange:
> On Mon, Jul 01, 2013 at 01:05:23PM +0200, Richard Weinberger wrote:
>> Am 01.07.2013 12:33, schrieb Daniel P. Berrange:
>>> On Mon, Jul 01, 2013 at 08:29:14AM +0200, Richard Weinberger wrote:
>>>> Any ideas what's going on here?
>>>
>>> No, it is very odd. It smells like a kernel issue to me. What
>>> version are you running ?
>>
>> I see this issue on all kernels.
>> Currently I'm using vanilla v3.9.x and v3.10.
>>
>>> I've also tried running the demo programs shown on the LWN.net
>>> article
>>>
>>>https://lwn.net/Articles/532593/
>>>
>>> and they don't operate in the way described by the article - the demo
>>> programs continue to ru as 'nfsnobody' even after the mappings are
>>> setup.
>>>
>>> I'm just using the Fedora 3.9.4-303 kernel, rebuilt with userns enabled
>>> in KConfig.  I'm wondering if there is still stuff missing in 3.9.x
>>> that prevents this from working properly, or if the kernel behaviour
>>> changed after those LWN articles were written.
>>
>> To me it looks like the capability system behaves odd.
>> The mappings in /proc are fine as long I do not call capng_updatev().
>> Also calling capng_updatev() with parameters that do not change the current 
>> cap set
>> triggers the odd behavior too.
>>
>> So we see two (related?) issues:
>> 1. If we try updating the capabilities of pid1 /proc/1/ has unmapped files 
>> till we exec().
>> 2. Dropping  capabilities does not work we always gain a fresh and full 
>> capability set.
>>
>> BTW: I'm sure the issues are not caused by Gau Feng's userns patches.
> 
> Yeah, I've reproduced this problem with standalone code outside of
> libvirt. 
> 
> Take the attached code and run

-ENOATTACHMENT :-(

Thanks,
//richard



Re: [libvirt] [RFC PATCH 2/2] LXC: Create ro overlay mounts only if we're not within a user namespace

2013-07-01 Thread Richard Weinberger
Am 01.07.2013 13:35, schrieb Daniel P. Berrange:
> On Mon, Jul 01, 2013 at 01:25:28PM +0200, Richard Weinberger wrote:
>> Am 01.07.2013 13:22, schrieb Daniel P. Berrange:
>>> On Mon, Jul 01, 2013 at 01:05:23PM +0200, Richard Weinberger wrote:
>>>> Am 01.07.2013 12:33, schrieb Daniel P. Berrange:
>>>>> On Mon, Jul 01, 2013 at 08:29:14AM +0200, Richard Weinberger wrote:
>>>>>> Any ideas what's going on here?
>>>>>
>>>>> No, it is very odd. It smells like a kernel issue to me. What
>>>>> version are you running ?
>>>>
>>>> I see this issue on all kernels.
>>>> Currently I'm using vanilla v3.9.x and v3.10.
>>>>
>>>>> I've also tried running the demo programs shown on the LWN.net
>>>>> article
>>>>>
>>>>>https://lwn.net/Articles/532593/
>>>>>
>>>>> and they don't operate in the way described by the article - the demo
>>>>> programs continue to ru as 'nfsnobody' even after the mappings are
>>>>> setup.
>>>>>
>>>>> I'm just using the Fedora 3.9.4-303 kernel, rebuilt with userns enabled
>>>>> in KConfig.  I'm wondering if there is still stuff missing in 3.9.x
>>>>> that prevents this from working properly, or if the kernel behaviour
>>>>> changed after those LWN articles were written.
>>>>
>>>> To me it looks like the capability system behaves odd.
>>>> The mappings in /proc are fine as long I do not call capng_updatev().
>>>> Also calling capng_updatev() with parameters that do not change the 
>>>> current cap set
>>>> triggers the odd behavior too.
>>>>
>>>> So we see two (related?) issues:
>>>> 1. If we try updating the capabilities of pid1 /proc/1/ has unmapped files 
>>>> till we exec().
>>>> 2. Dropping  capabilities does not work we always gain a fresh and full 
>>>> capability set.
>>>>
>>>> BTW: I'm sure the issues are not caused by Gau Feng's userns patches.
>>>
>>> Yeah, I've reproduced this problem with standalone code outside of
>>> libvirt. 
>>>
>>> Take the attached code and run
>>
>> -ENOATTACHMENT :-(
> 
> Now really attached.
> 
> I think I might know what is happening now though.  When you start a new
> namespace, you must mount a new instance of 'proc' filesystem. We are
> not synchronizing this wrt setup of the uid/gid mappings though, so we
> are racy. So I have a feeling we're creating the proc filesystem before
> the mappings are setup. I'm going to add some synchronization in to see
> if it makes a difference in this respect.

So you mount /proc and write the uid/gid mappings in parallel?
Both have to be done on the host side. Why is this done in parallel?

Thanks,
//richard



Re: [libvirt] [RFC PATCH 2/2] LXC: Create ro overlay mounts only if we're not within a user namespace

2013-07-01 Thread Richard Weinberger
Am 01.07.2013 13:44, schrieb Richard Weinberger:
> Am 01.07.2013 13:35, schrieb Daniel P. Berrange:
>> On Mon, Jul 01, 2013 at 01:25:28PM +0200, Richard Weinberger wrote:
>>> Am 01.07.2013 13:22, schrieb Daniel P. Berrange:
>>>> On Mon, Jul 01, 2013 at 01:05:23PM +0200, Richard Weinberger wrote:
>>>>> Am 01.07.2013 12:33, schrieb Daniel P. Berrange:
>>>>>> On Mon, Jul 01, 2013 at 08:29:14AM +0200, Richard Weinberger wrote:
>>>>>>> Any ideas what's going on here?
>>>>>>
>>>>>> No, it is very odd. It smells like a kernel issue to me. What
>>>>>> version are you running ?
>>>>>
>>>>> I see this issue on all kernels.
>>>>> Currently I'm using vanilla v3.9.x and v3.10.
>>>>>
>>>>>> I've also tried running the demo programs shown on the LWN.net
>>>>>> article
>>>>>>
>>>>>>https://lwn.net/Articles/532593/
>>>>>>
>>>>>> and they don't operate in the way described by the article - the demo
>>>>>> programs continue to ru as 'nfsnobody' even after the mappings are
>>>>>> setup.
>>>>>>
>>>>>> I'm just using the Fedora 3.9.4-303 kernel, rebuilt with userns enabled
>>>>>> in KConfig.  I'm wondering if there is still stuff missing in 3.9.x
>>>>>> that prevents this from working properly, or if the kernel behaviour
>>>>>> changed after those LWN articles were written.
>>>>>
>>>>> To me it looks like the capability system behaves odd.
>>>>> The mappings in /proc are fine as long I do not call capng_updatev().
>>>>> Also calling capng_updatev() with parameters that do not change the 
>>>>> current cap set
>>>>> triggers the odd behavior too.
>>>>>
>>>>> So we see two (related?) issues:
>>>>> 1. If we try updating the capabilities of pid1 /proc/1/ has unmapped 
>>>>> files till we exec().
>>>>> 2. Dropping  capabilities does not work we always gain a fresh and full 
>>>>> capability set.
>>>>>
>>>>> BTW: I'm sure the issues are not caused by Gau Feng's userns patches.
>>>>
>>>> Yeah, I've reproduced this problem with standalone code outside of
>>>> libvirt. 
>>>>
>>>> Take the attached code and run
>>>
>>> -ENOATTACHMENT :-(
>>
>> Now really attached.
>>
>> I think I might know what is happening now though.  When you start a new
>> namespace, you must mount a new instance of 'proc' filesystem. We are
>> not synchronizing this wrt setup of the uid/gid mappings though, so we
>> are racy. So I have a feeling we're creating the proc filesystem before
>> the mappings are setup. I'm going to add some synchronization in to see
>> if it makes a difference in this respect.
> 
> So you mount /proc and write the uid/gid mappings in parallel?
> Both has to be done on the host side. Why is this parallel?

Forget this one... :D

Thanks,
//richard



Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-04 Thread Richard Weinberger
Hi,

Am 03.07.2013 12:04, schrieb Gao feng:
> Hi,
> On 07/01/2013 03:45 PM, Richard Weinberger wrote:
>> Hi!
>>
>> If you have multiple LXC containers with networking and the autostart 
>> feature enabled libvirtd fails to
>> up some veth interfaces on the host side.
>>
>> Most of the time only the first veth device is in state up, all others are 
>> down.
>>
>> Reproducing is easy.
>> 1. Define a few containers (5 in my case)
>> 2. Run "virsh autostart ..." on each one.
>> 3. stop/start libvirtd
>>
>> You'll observe that all containers are running, but "ip a" will report on 
>> the host
>> side that not all veth devices are up and are not usable within the 
>> containers.
>>
>> This is not userns related, just retested with libvirt of today.
> 
> I can not reproduce this problem on my test bed...

Strange.

> maybe you should wait some seconds for the starting of these containers.

Please see the attached shell script. Using it, I'm able to trigger the issue
on all of my test machines.
run.sh creates six very minimal containers and enables autostart. Then it
kills and restarts libvirtd.
After the script is done you'll see that only one or two veth devices are up.

On the other hand, if I start them manually using a command like this one:
for cfg in a b c d e f ; do /opt/libvirt/bin/virsh -c lxc:/// start test-$cfg ; done
all veths are always up.

Thanks,
//richard


run.sh
Description: Bourne shell script

Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-05 Thread Richard Weinberger
Am 05.07.2013 03:36, schrieb Gao feng:
> On 07/05/2013 04:45 AM, Richard Weinberger wrote:
>> Hi,
>>
>> Am 03.07.2013 12:04, schrieb Gao feng:
>>> Hi,
>>> On 07/01/2013 03:45 PM, Richard Weinberger wrote:
>>>> Hi!
>>>>
>>>> If you have multiple LXC containers with networking and the autostart 
>>>> feature enabled libvirtd fails to
>>>> up some veth interfaces on the host side.
>>>>
>>>> Most of the time only the first veth device is in state up, all others are 
>>>> down.
>>>>
>>>> Reproducing is easy.
>>>> 1. Define a few containers (5 in my case)
>>>> 2. Run "virsh autostart ..." on each one.
>>>> 3. stop/start libvirtd
>>>>
>>>> You'll observe that all containers are running, but "ip a" will report on 
>>>> the host
>>>> side that not all veth devices are up and are not usable within the 
>>>> containers.
>>>>
>>>> This is not userns related, just retested with libvirt of today.
>>>
>>> I can not reproduce this problem on my test bed...
>>
>> Strange.
>>
>>> maybe you should wait some seconds for the starting of these containers.
>>
>> Please see the attached shell script. Using it I'm able to trigger the issue 
>> on all of
>> my test machines.
>> run.sh creates six very minimal containers and enables autostart. Then it 
>> kills and restarts libvirtd.
>> After the script is done you'll see that only one or two veth devices are up.
>>
>> On the over hand, if I start them manually using a command like this one:
>> for cfg in a b c d e f ; do /opt/libvirt/bin/virsh -c lxc:/// start 
>> test-$cfg ; done
>> All veths are always up.
>>
> 
> 
> I still can not reproduce even use your script.
> 
> [root@Donkey-I5 Desktop]# ./run.sh
> Domain test-a defined from container_a.conf
> 
> Domain test-a marked as autostarted
> 
> Domain test-b defined from container_b.conf
> 
> Domain test-b marked as autostarted
> 
> Domain test-c defined from container_c.conf
> 
> Domain test-c marked as autostarted
> 
> Domain test-d defined from container_d.conf
> 
> Domain test-d marked as autostarted
> 
> Domain test-e defined from container_e.conf
> 
> Domain test-e marked as autostarted
> 
> Domain test-f defined from container_f.conf
> 
> Domain test-f marked as autostarted
> 
> 2013-07-05 01:26:47.155+: 27163: info : libvirt version: 1.1.0
> 2013-07-05 01:26:47.155+: 27163: debug : virLogParseOutputs:1334 : 
> outputs=1:file:/home/gaofeng/libvirtd.log
> waiting a bit
> 167: veth0:  mtu 1500 qdisc pfifo_fast 
> master virbr0 state UP qlen 1000
> 169: veth1:  mtu 1500 qdisc pfifo_fast 
> master virbr0 state UP qlen 1000
> 171: veth2:  mtu 1500 qdisc pfifo_fast 
> master virbr0 state UP qlen 1000
> 173: veth3:  mtu 1500 qdisc pfifo_fast 
> master virbr0 state UP qlen 1000
> 175: veth4:  mtu 1500 qdisc pfifo_fast 
> master virbr0 state UP qlen 1000
> 177: veth5:  mtu 1500 qdisc pfifo_fast 
> master virbr0 state UP qlen 1000
> 
> 
> Can you post your libvirt debug log?

Please see attached file.

43: veth0:  mtu 1500 qdisc pfifo_fast master virbr0 state UP qlen 1000
45: veth1:  mtu 1500 qdisc pfifo_fast master virbr0 state UP qlen 1000
47: veth2:  mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 1000
49: veth3:  mtu 1500 qdisc pfifo_fast master virbr0 state UP qlen 1000
51: veth4:  mtu 1500 qdisc pfifo_fast master virbr0 state UP qlen 1000
53: veth5:  mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 100

Thanks,
//richard



debug.log.bz2
Description: application/bzip

Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-09 Thread Richard Weinberger
Am 08.07.2013 05:54, schrieb Gao feng:
> On 07/05/2013 06:22 PM, Richard Weinberger wrote:
>> Am 05.07.2013 03:36, schrieb Gao feng:
>>> On 07/05/2013 04:45 AM, Richard Weinberger wrote:
>>>> Hi,
>>>>
>>>> Am 03.07.2013 12:04, schrieb Gao feng:
>>>>> Hi,
>>>>> On 07/01/2013 03:45 PM, Richard Weinberger wrote:
>>>>>> Hi!
>>>>>>
>>>>>> If you have multiple LXC containers with networking and the autostart 
>>>>>> feature enabled libvirtd fails to
>>>>>> up some veth interfaces on the host side.
>>>>>>
>>>>>> Most of the time only the first veth device is in state up, all others 
>>>>>> are down.
>>>>>>
>>>>>> Reproducing is easy.
>>>>>> 1. Define a few containers (5 in my case)
>>>>>> 2. Run "virsh autostart ..." on each one.
>>>>>> 3. stop/start libvirtd
>>>>>>
>>>>>> You'll observe that all containers are running, but "ip a" will report 
>>>>>> on the host
>>>>>> side that not all veth devices are up and are not usable within the 
>>>>>> containers.
>>>>>>
>>>>>> This is not userns related, just retested with libvirt of today.
>>>>>
>>>>> I can not reproduce this problem on my test bed...
>>>>
>>>> Strange.
>>>>
>>>>> maybe you should wait some seconds for the starting of these containers.
>>>>
>>>> Please see the attached shell script. Using it I'm able to trigger the 
>>>> issue on all of
>>>> my test machines.
>>>> run.sh creates six very minimal containers and enables autostart. Then it 
>>>> kills and restarts libvirtd.
>>>> After the script is done you'll see that only one or two veth devices are 
>>>> up.
>>>>
>>>> On the over hand, if I start them manually using a command like this one:
>>>> for cfg in a b c d e f ; do /opt/libvirt/bin/virsh -c lxc:/// start 
>>>> test-$cfg ; done
>>>> All veths are always up.
>>>>
>>>
>>>
>>> I still can not reproduce even use your script.
>>>
>>> [root@Donkey-I5 Desktop]# ./run.sh
>>> Domain test-a defined from container_a.conf
>>>
>>> Domain test-a marked as autostarted
>>>
>>> Domain test-b defined from container_b.conf
>>>
>>> Domain test-b marked as autostarted
>>>
>>> Domain test-c defined from container_c.conf
>>>
>>> Domain test-c marked as autostarted
>>>
>>> Domain test-d defined from container_d.conf
>>>
>>> Domain test-d marked as autostarted
>>>
>>> Domain test-e defined from container_e.conf
>>>
>>> Domain test-e marked as autostarted
>>>
>>> Domain test-f defined from container_f.conf
>>>
>>> Domain test-f marked as autostarted
>>>
>>> 2013-07-05 01:26:47.155+: 27163: info : libvirt version: 1.1.0
>>> 2013-07-05 01:26:47.155+: 27163: debug : virLogParseOutputs:1334 : 
>>> outputs=1:file:/home/gaofeng/libvirtd.log
>>> waiting a bit
>>> 167: veth0:  mtu 1500 qdisc pfifo_fast 
>>> master virbr0 state UP qlen 1000
>>> 169: veth1:  mtu 1500 qdisc pfifo_fast 
>>> master virbr0 state UP qlen 1000
>>> 171: veth2:  mtu 1500 qdisc pfifo_fast 
>>> master virbr0 state UP qlen 1000
>>> 173: veth3:  mtu 1500 qdisc pfifo_fast 
>>> master virbr0 state UP qlen 1000
>>> 175: veth4:  mtu 1500 qdisc pfifo_fast 
>>> master virbr0 state UP qlen 1000
>>> 177: veth5:  mtu 1500 qdisc pfifo_fast 
>>> master virbr0 state UP qlen 1000
>>>
>>>
>>> Can you post your libvirt debug log?
>>
>> Please see attached file.
>>
>> 43: veth0:  mtu 1500 qdisc pfifo_fast 
>> master virbr0 state UP qlen 1000
>> 45: veth1:  mtu 1500 qdisc pfifo_fast 
>> master virbr0 state UP qlen 1000
>> 47: veth2:  mtu 1500 qdisc pfifo_fast master virbr0 
>> state DOWN qlen 1000
>> 49: veth3:  mtu 1500 qdisc pfifo_fast 
>> master virbr0 state UP qlen 1000
>> 51: veth4:  mtu 1500 qdisc pfifo_fast 
>> master virbr0 state UP qlen 1000
>> 53: veth5:  mtu 1500 qdisc pfifo_fast master virbr0 
>> state DOWN qlen 100
>>
> 
> strange, I can not see veth related error message from your log.
> 
> seems like all of the veth devices of host had been up but for some reasons 
> they become down.

I think libvirt has to do "ip link set dev vethX up".
Otherwise the device state is undefined.
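
Until the real cause is known, a host-side workaround could look like the sketch below. It only prints the commands and does not run them. down_veth_up_cmds is a hypothetical helper name, and the parsing assumes the one-line format of "ip -o link show":

```shell
# down_veth_up_cmds: read "ip -o link show" output on stdin and print one
# "ip link set dev <name> up" command per veth device reported as DOWN.
# Hypothetical helper; it prints the commands but does not execute them.
down_veth_up_cmds() {
    awk '$2 ~ /^veth/ {
        name = $2
        sub(/[@:].*$/, "", name)      # strip trailing ":" and any "@peer" suffix
        for (i = 3; i < NF; i++)
            if ($i == "state" && $(i + 1) == "DOWN")
                print "ip link set dev " name " up"
    }'
}
```

Usage: "ip -o link show | down_veth_up_cmds" to review the commands, and pipe the result into "sh" as root to apply them. This would only paper over the problem, it does not explain why the devices go down.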

> can you also show me the bridge information on your host?
> brctl show virbr0

bridge name     bridge id               STP enabled     interfaces
virbr0          8000.2e63ddca07fa       yes             veth0
                                                        veth1
                                                        veth2
                                                        veth3
                                                        veth4
                                                        veth5

> and your default net configuration
> virsh -c lxc:/// net-dumpxml default


<network>
  <name>default</name>
  <uuid>a81b9f40-d279-4e96-8cec-6a906323f162</uuid>
  [...]
</network>


Thanks,
//richard



Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-09 Thread Richard Weinberger
Am 10.07.2013 03:20, schrieb Gao feng:
> On 07/09/2013 09:11 PM, Richard Weinberger wrote:
>> Am 08.07.2013 05:54, schrieb Gao feng:
>>> On 07/05/2013 06:22 PM, Richard Weinberger wrote:
>>>> Am 05.07.2013 03:36, schrieb Gao feng:
>>>>> On 07/05/2013 04:45 AM, Richard Weinberger wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Am 03.07.2013 12:04, schrieb Gao feng:
>>>>>>> Hi,
>>>>>>> On 07/01/2013 03:45 PM, Richard Weinberger wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> If you have multiple LXC containers with networking and the autostart 
>>>>>>>> feature enabled libvirtd fails to
>>>>>>>> up some veth interfaces on the host side.
>>>>>>>>
>>>>>>>> Most of the time only the first veth device is in state up, all others 
>>>>>>>> are down.
>>>>>>>>
>>>>>>>> Reproducing is easy.
>>>>>>>> 1. Define a few containers (5 in my case)
>>>>>>>> 2. Run "virsh autostart ..." on each one.
>>>>>>>> 3. stop/start libvirtd
>>>>>>>>
>>>>>>>> You'll observe that all containers are running, but "ip a" will report 
>>>>>>>> on the host
>>>>>>>> side that not all veth devices are up and are not usable within the 
>>>>>>>> containers.
>>>>>>>>
>>>>>>>> This is not userns related, just retested with libvirt of today.
>>>>>>>
>>>>>>> I can not reproduce this problem on my test bed...
>>>>>>
>>>>>> Strange.
>>>>>>
>>>>>>> maybe you should wait some seconds for the starting of these containers.
>>>>>>
>>>>>> Please see the attached shell script. Using it I'm able to trigger the 
>>>>>> issue on all of
>>>>>> my test machines.
>>>>>> run.sh creates six very minimal containers and enables autostart. Then 
>>>>>> it kills and restarts libvirtd.
>>>>>> After the script is done you'll see that only one or two veth devices 
>>>>>> are up.
>>>>>>
>>>>>> On the over hand, if I start them manually using a command like this one:
>>>>>> for cfg in a b c d e f ; do /opt/libvirt/bin/virsh -c lxc:/// start 
>>>>>> test-$cfg ; done
>>>>>> All veths are always up.
>>>>>>
>>>>>
>>>>>
>>>>> I still can not reproduce even use your script.
>>>>>
>>>>> [root@Donkey-I5 Desktop]# ./run.sh
>>>>> Domain test-a defined from container_a.conf
>>>>>
>>>>> Domain test-a marked as autostarted
>>>>>
>>>>> Domain test-b defined from container_b.conf
>>>>>
>>>>> Domain test-b marked as autostarted
>>>>>
>>>>> Domain test-c defined from container_c.conf
>>>>>
>>>>> Domain test-c marked as autostarted
>>>>>
>>>>> Domain test-d defined from container_d.conf
>>>>>
>>>>> Domain test-d marked as autostarted
>>>>>
>>>>> Domain test-e defined from container_e.conf
>>>>>
>>>>> Domain test-e marked as autostarted
>>>>>
>>>>> Domain test-f defined from container_f.conf
>>>>>
>>>>> Domain test-f marked as autostarted
>>>>>
>>>>> 2013-07-05 01:26:47.155+: 27163: info : libvirt version: 1.1.0
>>>>> 2013-07-05 01:26:47.155+: 27163: debug : virLogParseOutputs:1334 : 
>>>>> outputs=1:file:/home/gaofeng/libvirtd.log
>>>>> waiting a bit
>>>>> 167: veth0:  mtu 1500 qdisc pfifo_fast 
>>>>> master virbr0 state UP qlen 1000
>>>>> 169: veth1:  mtu 1500 qdisc pfifo_fast 
>>>>> master virbr0 state UP qlen 1000
>>>>> 171: veth2:  mtu 1500 qdisc pfifo_fast 
>>>>> master virbr0 state UP qlen 1000
>>>>> 173: veth3:  mtu 1500 qdisc pfifo_fast 
>>>>> master virbr0 state UP qlen 1000
>>>>> 175: veth4:  mtu 1500 qdisc pfifo_fast 
>>>>> master virbr0 state UP qlen 100

Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-10 Thread Richard Weinberger
Am 10.07.2013 09:03, schrieb Gao feng:
> On 07/10/2013 02:00 PM, Richard Weinberger wrote:
> 
>>>
>>> Yes,actually libvirt did up the veth devices, that's why only veth2& veth5 
>>> are down.
>>
>> Where does libvirt up the devices? The debug log does not contain any "ip 
>> link set dev XXX up" commands.
>> Also in src/util/virnetdevveth.c I'm unable to find such a ip command.
>>
> 
> virLXCProcessSetupInterfaceBridged calls virNetDevSetOnline.

Ah, it's using an ioctl().
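
The flag such an ioctl() toggles is presumably IFF_UP (0x1) in the interface flags, so the result can be checked from the shell without any ip command in the log. A small sketch, assuming sysfs is mounted at /sys; flags_up is a hypothetical helper name:

```shell
# flags_up HEX: succeed if the IFF_UP bit (0x1) is set in an interface
# flags value such as the one found in /sys/class/net/<dev>/flags.
# Hypothetical helper, for illustration only.
flags_up() {
    [ $(( $1 & 0x1 )) -ne 0 ]
}
```

Usage: flags_up "$(cat /sys/class/net/veth4/flags)" && echo "veth4 is administratively up"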

>>> I need to know why these two devices are down, I believe they were up, your 
>>> bridge and default-net
>>> looks good. So please show me your kernel message (dmesg), maybe it can 
>>> give us some useful information.
>>
>> This time veth4 and 5 are down.
>>
>> ---cut---
> 
>> [   44.158209] IPv6: ADDRCONF(NETDEV_UP): veth4: link is not ready
>> [   44.473317] IPv6: ADDRCONF(NETDEV_CHANGE): veth4: link becomes ready
>> [   44.473400] virbr0: topology change detected, propagating
>> [   44.473407] virbr0: port 5(veth4) entered forwarding state
>> [   44.473423] virbr0: port 5(veth4) entered forwarding state
> 
> veth4 were up here
> 
>> [   44.566186] device veth5 entered promiscuous mode
>> [   44.571234] IPv6: ADDRCONF(NETDEV_UP): veth5: link is not ready
>> [   44.571243] virbr0: topology change detected, propagating
>> [   44.571250] virbr0: port 6(veth5) entered forwarding state
>> [   44.571261] virbr0: port 6(veth5) entered forwarding state
>> [   44.902308] IPv6: ADDRCONF(NETDEV_CHANGE): veth5: link becomes ready
>> [   45.000580] virbr0: port 5(veth4) entered disabled state
> 
> and then it became down.
> 
>> [   45.348548] virbr0: port 6(veth5) entered disabled state
> 
> So, Some places disable the veth4 and veth5.
> I don't know in which case these two devices will be disabled.
> 
> I still can't reproduce this problem in my test bed :(
> I need more information to analyse why these two device being disabled.
> 
> So, can you run kernel with the below debug patch?
> 
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index d45e760..aed319b 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -103,7 +103,7 @@ void br_stp_disable_port(struct net_bridge_port *p)
>          p->state = BR_STATE_DISABLED;
>          p->topology_change_ack = 0;
>          p->config_pending = 0;
> -
> +        dump_stack();
>          br_log_state(p);
>          br_ifinfo_notify(RTM_NEWLINK, p);
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index faebb39..9b1617b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1368,6 +1368,7 @@ static int dev_close_many(struct list_head *head)
> 
>          list_for_each_entry(dev, head, unreg_list) {
>                  rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
> +                dump_stack();
>                  call_netdevice_notifiers(NETDEV_DOWN, dev);
>          }
> 
> @@ -4729,8 +4730,10 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags)
>          if (changes & IFF_UP) {
>                  if (dev->flags & IFF_UP)
>                          call_netdevice_notifiers(NETDEV_UP, dev);
> -                else
> +                else {
> +                        dump_stack();
>                          call_netdevice_notifiers(NETDEV_DOWN, dev);
> +                }
>          }
> 
>          if (dev->flags & IFF_UP &&
> 
> 
> Thanks!
> 

There you go:

[    2.517243] systemd-journald[729]: Received SIGUSR1
[    3.427222] Adding 690172k swap on /dev/vda1.  Priority:-1 extents:1 across:690172k
[    5.703781] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[    5.703796] 8021q: adding VLAN 0 to HW filter on device eth0
[    7.708579] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[    7.709232] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  126.002595] IPv6: ADDRCONF(NETDEV_UP): virbr0: link is not ready
[  126.450081] cgroup: libvirtd (2763) created nested cgroup for controller "memory" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[  126.450085] cgroup: "memory" requires setting use_hierarchy to 1 on the root.
[  126.450365] cgroup: libvirtd (2763) created nested cgroup for controller "blkio" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[  126.463191] device veth0 entered promiscuous mode
[  126.468207] IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
[  126.468216] virbr0: topology change detected, propagating
[  126.468222] virb

Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-10 Thread Richard Weinberger
Am 10.07.2013 11:42, schrieb Gao feng:
> On 07/10/2013 03:23 PM, Richard Weinberger wrote:
>> Am 10.07.2013 09:03, schrieb Gao feng:
>>> On 07/10/2013 02:00 PM, Richard Weinberger wrote:
>>>
>>>>>
>>>>> Yes,actually libvirt did up the veth devices, that's why only veth2& 
>>>>> veth5 are down.
>>>>
>>>> Where does libvirt up the devices? The debug log does not contain any "ip 
>>>> link set dev XXX up" commands.
>>>> Also in src/util/virnetdevveth.c I'm unable to find such a ip command.
>>>>
>>>
>>> virLXCProcessSetupInterfaceBridged calls virNetDevSetOnline.
>>
>> A, it's using an ioctl().
>>
>>>>> I need to know why these two devices are down, I believe they were up, 
>>>>> your bridge and default-net
>>>>> looks good. So please show me your kernel message (dmesg), maybe it can 
>>>>> give us some useful information.
>>>>
>>>> This time veth4 and 5 are down.
>>>>
>>>> ---cut---
>>>
>>>> [   44.158209] IPv6: ADDRCONF(NETDEV_UP): veth4: link is not ready
>>>> [   44.473317] IPv6: ADDRCONF(NETDEV_CHANGE): veth4: link becomes ready
>>>> [   44.473400] virbr0: topology change detected, propagating
>>>> [   44.473407] virbr0: port 5(veth4) entered forwarding state
>>>> [   44.473423] virbr0: port 5(veth4) entered forwarding state
>>>
>>> veth4 were up here
>>>
>>>> [   44.566186] device veth5 entered promiscuous mode
>>>> [   44.571234] IPv6: ADDRCONF(NETDEV_UP): veth5: link is not ready
>>>> [   44.571243] virbr0: topology change detected, propagating
>>>> [   44.571250] virbr0: port 6(veth5) entered forwarding state
>>>> [   44.571261] virbr0: port 6(veth5) entered forwarding state
>>>> [   44.902308] IPv6: ADDRCONF(NETDEV_CHANGE): veth5: link becomes ready
>>>> [   45.000580] virbr0: port 5(veth4) entered disabled state
>>>
>>> and then it became down.
>>>
>>>> [   45.348548] virbr0: port 6(veth5) entered disabled state
>>>
>>> So, Some places disable the veth4 and veth5.
>>> I don't know in which case these two devices will be disabled.
>>>
>>> I still can't reproduce this problem in my test bed :(
>>> I need more information to analyse why these two device being disabled.
>>>
>>> So, can you run kernel with the below debug patch?
>>>
>>> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
>>> index d45e760..aed319b 100644
>>> --- a/net/bridge/br_stp_if.c
>>> +++ b/net/bridge/br_stp_if.c
>>> @@ -103,7 +103,7 @@ void br_stp_disable_port(struct net_bridge_port *p)
>>>          p->state = BR_STATE_DISABLED;
>>>          p->topology_change_ack = 0;
>>>          p->config_pending = 0;
>>> -
>>> +        dump_stack();
>>>          br_log_state(p);
>>>          br_ifinfo_notify(RTM_NEWLINK, p);
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index faebb39..9b1617b 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -1368,6 +1368,7 @@ static int dev_close_many(struct list_head *head)
>>>
>>>          list_for_each_entry(dev, head, unreg_list) {
>>>                  rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
>>> +                dump_stack();
>>>                  call_netdevice_notifiers(NETDEV_DOWN, dev);
>>>          }
>>>
>>> @@ -4729,8 +4730,10 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags)
>>>          if (changes & IFF_UP) {
>>>                  if (dev->flags & IFF_UP)
>>>                          call_netdevice_notifiers(NETDEV_UP, dev);
>>> -                else
>>> +                else {
>>> +                        dump_stack();
>>>                          call_netdevice_notifiers(NETDEV_DOWN, dev);
>>> +                }
>>>          }
>>>
>>>          if (dev->flags & IFF_UP &&
>>>
>>>
>>> Thanks!
>>>
>>
>> There you go:
>>
> 
> Thank you very much.
> 
>> [  129.084408] CPU: 1 PID: 4473 Comm: ip Not tainted 3.10.0+ #20
>> [  129.084412] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [  129.084415]  88003760d000 88003ce7f798 8172b2a6 
>> 88003ce7f7b8

Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-11 Thread Richard Weinberger
Am 10.07.2013 11:42, schrieb Gao feng:
> On 07/10/2013 03:23 PM, Richard Weinberger wrote:
>> Am 10.07.2013 09:03, schrieb Gao feng:
>>> On 07/10/2013 02:00 PM, Richard Weinberger wrote:
>>>
>>>>>
>>>>> Yes, actually libvirt did bring up the veth devices, that's why only
>>>>> veth2 & veth5 are down.
>>>>
>>>> Where does libvirt bring up the devices? The debug log does not contain
>>>> any "ip link set dev XXX up" commands.
>>>> Also in src/util/virnetdevveth.c I'm unable to find such an ip command.
>>>>
>>>
>>> virLXCProcessSetupInterfaceBridged calls virNetDevSetOnline.
>>
>> Ah, it's using an ioctl().
>>
>>>>> I need to know why these two devices are down. I believe they were up;
>>>>> your bridge and default-net look good. So please show me your kernel
>>>>> messages (dmesg), maybe they can give us some useful information.
>>>>
>>>> This time veth4 and 5 are down.
>>>>
>>>> ---cut---
>>>
>>>> [   44.158209] IPv6: ADDRCONF(NETDEV_UP): veth4: link is not ready
>>>> [   44.473317] IPv6: ADDRCONF(NETDEV_CHANGE): veth4: link becomes ready
>>>> [   44.473400] virbr0: topology change detected, propagating
>>>> [   44.473407] virbr0: port 5(veth4) entered forwarding state
>>>> [   44.473423] virbr0: port 5(veth4) entered forwarding state
>>>
>>> veth4 was up here
>>>
>>>> [   44.566186] device veth5 entered promiscuous mode
>>>> [   44.571234] IPv6: ADDRCONF(NETDEV_UP): veth5: link is not ready
>>>> [   44.571243] virbr0: topology change detected, propagating
>>>> [   44.571250] virbr0: port 6(veth5) entered forwarding state
>>>> [   44.571261] virbr0: port 6(veth5) entered forwarding state
>>>> [   44.902308] IPv6: ADDRCONF(NETDEV_CHANGE): veth5: link becomes ready
>>>> [   45.000580] virbr0: port 5(veth4) entered disabled state
>>>
>>> and then it went down.
>>>
>>>> [   45.348548] virbr0: port 6(veth5) entered disabled state
>>>
>>> So something disables veth4 and veth5.
>>> I don't know in which case these two devices will be disabled.
>>>
>>> I still can't reproduce this problem in my test bed :(
>>> I need more information to analyse why these two devices are being disabled.
>>>
>>> So, can you run a kernel with the debug patch below?
>>>
>>> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
>>> index d45e760..aed319b 100644
>>> --- a/net/bridge/br_stp_if.c
>>> +++ b/net/bridge/br_stp_if.c
>>> @@ -103,7 +103,7 @@ void br_stp_disable_port(struct net_bridge_port *p)
>>> p->state = BR_STATE_DISABLED;
>>> p->topology_change_ack = 0;
>>> p->config_pending = 0;
>>> -
>>> +   dump_stack();
>>> br_log_state(p);
>>> br_ifinfo_notify(RTM_NEWLINK, p);
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index faebb39..9b1617b 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -1368,6 +1368,7 @@ static int dev_close_many(struct list_head *head)
>>>
>>> list_for_each_entry(dev, head, unreg_list) {
>>> rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
>>> +   dump_stack();
>>> call_netdevice_notifiers(NETDEV_DOWN, dev);
>>> }
>>>
>>> @@ -4729,8 +4730,10 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags)
>>> if (changes & IFF_UP) {
>>> if (dev->flags & IFF_UP)
>>> call_netdevice_notifiers(NETDEV_UP, dev);
>>> -   else
>>> +   else {
>>> +   dump_stack();
>>> call_netdevice_notifiers(NETDEV_DOWN, dev);
>>> +   }
>>> }
>>>
>>> if (dev->flags & IFF_UP &&
>>>
>>>
>>> Thanks!
>>>
>>
>> There you go:
>>
> 
> Thank you very much.
> 
>> [  129.084408] CPU: 1 PID: 4473 Comm: ip Not tainted 3.10.0+ #20
>> [  129.084412] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [  129.084415]  88003760d000 88003ce7f798 8172b2a6 88003ce7f7b8
>

Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-11 Thread Richard Weinberger
On 11.07.2013 11:42, Gao feng wrote:
> On 07/11/2013 03:18 PM, Richard Weinberger wrote:
>> On 10.07.2013 11:42, Gao feng wrote:
>>> On 07/10/2013 03:23 PM, Richard Weinberger wrote:
>>>> On 10.07.2013 09:03, Gao feng wrote:
>>>>> On 07/10/2013 02:00 PM, Richard Weinberger wrote:
>>>>>
>>>>>>>
>>>>>>> Yes, actually libvirt did bring up the veth devices, that's why only
>>>>>>> veth2 & veth5 are down.
>>>>>>
>>>>>> Where does libvirt bring up the devices? The debug log does not contain
>>>>>> any "ip link set dev XXX up" commands.
>>>>>> Also in src/util/virnetdevveth.c I'm unable to find such an ip command.
>>>>>>
>>>>>
>>>>> virLXCProcessSetupInterfaceBridged calls virNetDevSetOnline.
>>>>
>>>> Ah, it's using an ioctl().
>>>>
>>>>>>> I need to know why these two devices are down. I believe they were up;
>>>>>>> your bridge and default-net look good. So please show me your kernel
>>>>>>> messages (dmesg), maybe they can give us some useful information.
>>>>>>
>>>>>> This time veth4 and 5 are down.
>>>>>>
>>>>>> ---cut---
>>>>>
>>>>>> [   44.158209] IPv6: ADDRCONF(NETDEV_UP): veth4: link is not ready
>>>>>> [   44.473317] IPv6: ADDRCONF(NETDEV_CHANGE): veth4: link becomes ready
>>>>>> [   44.473400] virbr0: topology change detected, propagating
>>>>>> [   44.473407] virbr0: port 5(veth4) entered forwarding state
>>>>>> [   44.473423] virbr0: port 5(veth4) entered forwarding state
>>>>>
>>>>> veth4 was up here
>>>>>
>>>>>> [   44.566186] device veth5 entered promiscuous mode
>>>>>> [   44.571234] IPv6: ADDRCONF(NETDEV_UP): veth5: link is not ready
>>>>>> [   44.571243] virbr0: topology change detected, propagating
>>>>>> [   44.571250] virbr0: port 6(veth5) entered forwarding state
>>>>>> [   44.571261] virbr0: port 6(veth5) entered forwarding state
>>>>>> [   44.902308] IPv6: ADDRCONF(NETDEV_CHANGE): veth5: link becomes ready
>>>>>> [   45.000580] virbr0: port 5(veth4) entered disabled state
>>>>>
>>>>> and then it went down.
>>>>>
>>>>>> [   45.348548] virbr0: port 6(veth5) entered disabled state
>>>>>
>>>>> So something disables veth4 and veth5.
>>>>> I don't know in which case these two devices will be disabled.
>>>>>
>>>>> I still can't reproduce this problem in my test bed :(
>>>>> I need more information to analyse why these two devices are being disabled.
>>>>>
>>>>> So, can you run a kernel with the debug patch below?
>>>>>
>>>>> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
>>>>> index d45e760..aed319b 100644
>>>>> --- a/net/bridge/br_stp_if.c
>>>>> +++ b/net/bridge/br_stp_if.c
>>>>> @@ -103,7 +103,7 @@ void br_stp_disable_port(struct net_bridge_port *p)
>>>>> p->state = BR_STATE_DISABLED;
>>>>> p->topology_change_ack = 0;
>>>>> p->config_pending = 0;
>>>>> -
>>>>> +   dump_stack();
>>>>> br_log_state(p);
>>>>> br_ifinfo_notify(RTM_NEWLINK, p);
>>>>>
>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>> index faebb39..9b1617b 100644
>>>>> --- a/net/core/dev.c
>>>>> +++ b/net/core/dev.c
>>>>> @@ -1368,6 +1368,7 @@ static int dev_close_many(struct list_head *head)
>>>>>
>>>>> list_for_each_entry(dev, head, unreg_list) {
>>>>> rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
>>>>> +   dump_stack();
>>>>> call_netdevice_notifiers(NETDEV_DOWN, dev);
>>>>> }
>>>>>
>>>>> @@ -4729,8 +4730,10 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags)
>>>>> if (changes & IFF_UP) {
>>>>> if (dev->flags & IFF_UP)
>>>>> c

Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-11 Thread Richard Weinberger
On 11.07.2013 11:49, Daniel P. Berrange wrote:
> On Thu, Jul 11, 2013 at 11:44:48AM +0200, Richard Weinberger wrote:
>> On 11.07.2013 11:42, Gao feng wrote:
>>> On 07/11/2013 03:18 PM, Richard Weinberger wrote:
>>>> This morning I installed a wrapper around ip to show me the process
>>>> tree whenever "ip link ... down" is used.
>>>> The log showed this:
>>>>
>>>>   769 ?        Ss     0:00 /usr/lib/systemd/systemd-udevd
>>>> 17759 ?        S      0:00  \_ /usr/lib/systemd/systemd-udevd
>>>> 17764 ?        S      0:00  \_ /usr/lib/systemd/systemd-udevd
>>>> 17772 ?        S      0:00  \_ /usr/lib/systemd/systemd-udevd
>>>> 19477 ?        S      0:00  |   \_ /bin/bash /sbin/ifdown veth5 -o hotplug
>>>> 19910 ?        S      0:00  |   \_ /sbin/ip link set dev veth5 down
>>>>
>>>> Now I have the urge to use a "Kantholz". ;-)
>>>>
>>>
>>> hmmm...
>>>
>>> it's systemd... I have no idea now... :(
>>
>> TBH it is not systemd's fault.
>> OpenSUSE's /usr/lib/udev/rules.d/77-network.rules did not whitelist veth*
>> devices.
>> Therefore systemd-udevd called ifup/ifdown and other hotplug magic.
> 
> Ah ha, that's a nice issue :-) I assume you've filed a bug against opensuse
> to fix this? Can you post a link to the bug here for the sake of archive
> records?

Sure:
https://bugzilla.novell.com/show_bug.cgi?id=829033

Thanks,
//richard
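
The ps output quoted above came from a wrapper around ip. A rough sketch of
the same idea in C, walking the parent chain through /proc/<pid>/stat, is
below. It is not the wrapper used in this thread; the function names
(parse_ppid, read_ppid, print_ancestry) are invented for illustration, and a
mounted Linux procfs is assumed.

```c
#include <stdio.h>
#include <string.h>

/* Extract the parent PID from one /proc/<pid>/stat line.
 * The comm field may contain spaces and ')', so scan from the LAST ')'.
 * Returns -1 if the line cannot be parsed. */
static long parse_ppid(const char *stat_line)
{
    const char *p = strrchr(stat_line, ')');
    char state;
    long ppid;

    if (p == NULL || sscanf(p + 1, " %c %ld", &state, &ppid) != 2)
        return -1;
    return ppid;
}

/* Read /proc/<pid>/stat and return the parent PID, or -1 on error. */
static long read_ppid(long pid)
{
    char path[64], line[1024];
    FILE *f;
    long ppid = -1;

    snprintf(path, sizeof(path), "/proc/%ld/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof(line), f) != NULL)
        ppid = parse_ppid(line);
    fclose(f);
    return ppid;
}

/* Print pid and each of its ancestors, one PID per line, stopping at
 * PID 1 or as soon as a parent cannot be determined. */
static void print_ancestry(FILE *out, long pid)
{
    while (pid > 0) {
        fprintf(out, "%ld\n", pid);
        if (pid == 1)
            break;
        pid = read_ppid(pid);
    }
}
```

A wrapper could call print_ancestry() with its own PID and a log file before
exec'ing the real ip binary, which is enough to see that a caller such as
ifdown sits underneath systemd-udevd.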



Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-17 Thread Richard Weinberger
On 12.07.2013 03:36, Gao feng wrote:
> On 07/11/2013 07:58 PM, Richard Weinberger wrote:
>> On 11.07.2013 11:49, Daniel P. Berrange wrote:
>>> On Thu, Jul 11, 2013 at 11:44:48AM +0200, Richard Weinberger wrote:
>>>> On 11.07.2013 11:42, Gao feng wrote:
>>>>> On 07/11/2013 03:18 PM, Richard Weinberger wrote:
>>>>>> This morning I installed a wrapper around ip to show me the process
>>>>>> tree whenever "ip link ... down" is used.
>>>>>> The log showed this:
>>>>>>
>>>>>>   769 ?        Ss     0:00 /usr/lib/systemd/systemd-udevd
>>>>>> 17759 ?        S      0:00  \_ /usr/lib/systemd/systemd-udevd
>>>>>> 17764 ?        S      0:00  \_ /usr/lib/systemd/systemd-udevd
>>>>>> 17772 ?        S      0:00  \_ /usr/lib/systemd/systemd-udevd
>>>>>> 19477 ?        S      0:00  |   \_ /bin/bash /sbin/ifdown veth5 -o hotplug
>>>>>> 19910 ?        S      0:00  |   \_ /sbin/ip link set dev veth5 down
>>>>>>
>>>>>> Now I have the urge to use a "Kantholz". ;-)
>>>>>>
>>>>>
>>>>> hmmm...
>>>>>
>>>>> it's systemd... I have no idea now... :(
>>>>
>>>> TBH it is not systemd's fault.
>>>> OpenSUSE's /usr/lib/udev/rules.d/77-network.rules did not whitelist veth*
>>>> devices.
>>>> Therefore systemd-udevd called ifup/ifdown and other hotplug magic.
>>>
>>> Ah ha, that's a nice issue :-) I assume you've filed a bug against opensuse
>>> to fix this? Can you post a link to the bug here for the sake of archive
>>> records?
>>
>> Sure:
>> https://bugzilla.novell.com/show_bug.cgi?id=829033
>>
> 
> It's good news that we know what causes the veth devices to go down. :)

How does Fedora deal with veth devices?
SUSE folks think that this is more likely a libvirt issue and closed my bug
report as invalid...

Thanks,
//richard



Re: [libvirt] LXC: autostart feature does set all interfaces to state up.

2013-07-18 Thread Richard Weinberger
On 18.07.2013 16:50, Jim Fehlig wrote:
> Richard,
> 
> I think you should have cc'd the bug assignee when discussing this issue
> upstream.  Adding him now...

Oh, sorry for that!
I thought I did so after pointing Marius to the thread in the mailing list 
archive,
but obviously I forgot.

Thanks,
//richard

> Regards,
> Jim
> 
> 
> Daniel P. Berrange wrote:
>> On Wed, Jul 17, 2013 at 11:33:22PM +0200, Richard Weinberger wrote:
>>   
>>> On 12.07.2013 03:36, Gao feng wrote:
>>>
>>>> On 07/11/2013 07:58 PM, Richard Weinberger wrote:
>>>>
>>>>> On 11.07.2013 11:49, Daniel P. Berrange wrote:
>>>>>
>>>>>> On Thu, Jul 11, 2013 at 11:44:48AM +0200, Richard Weinberger wrote:
>>>>>>
>>>>>>> On 11.07.2013 11:42, Gao feng wrote:
>>>>>>>
>>>>>>>> On 07/11/2013 03:18 PM, Richard Weinberger wrote:
>>>>>>>>
>>>>>>>>> This morning I installed a wrapper around ip to show me the
>>>>>>>>> process tree whenever "ip link ... down" is used.
>>>>>>>>> The log showed this:
>>>>>>>>>
>>>>>>>>>   769 ?        Ss     0:00 /usr/lib/systemd/systemd-udevd
>>>>>>>>> 17759 ?        S      0:00  \_ /usr/lib/systemd/systemd-udevd
>>>>>>>>> 17764 ?        S      0:00  \_ /usr/lib/systemd/systemd-udevd
>>>>>>>>> 17772 ?        S      0:00  \_ /usr/lib/systemd/systemd-udevd
>>>>>>>>> 19477 ?        S      0:00  |   \_ /bin/bash /sbin/ifdown veth5 -o hotplug
>>>>>>>>> 19910 ?        S      0:00  |   \_ /sbin/ip link set dev veth5 down
>>>>>>>>>
>>>>>>>>> Now I have the urge to use a "Kantholz". ;-)
>>>>>>>>>
>>>>>>>>> 
>>>>>>>> hmmm...
>>>>>>>>
>>>>>>>> it's systemd... I have no idea now... :(
>>>>>>>>   
>>>>>>> TBH it is not systemd's fault.
>>>>>>> OpenSUSE's /usr/lib/udev/rules.d/77-network.rules did not whitelist
>>>>>>> veth* devices.
>>>>>>> Therefore systemd-udevd called ifup/ifdown and other hotplug magic.
>>>>>>>
>>>>>> Ah ha, that's a nice issue :-) I assume you've filed a bug against opensuse
>>>>>> to fix this? Can you post a link to the bug here for the sake of archive
>>>>>> records?
>>>>>>   
>>>>> Sure:
>>>>> https://bugzilla.novell.com/show_bug.cgi?id=829033
>>>>>
>>>>> 
>>>> It's good news that we know what causes the veth devices to go down. :)
>>>>   
>>> How does Fedora deal with veth devices?
>>> 
>>
>> Well the udev script you mention above does not exist on Fedora and AFAIK
>> there's no other udev script which runs 'ifconfig down' on NICs.
>>
>>   
>>> SUSE folks think that this is more likely a libvirt issue and closed my
>>> bug report as invalid...
>>> 
>>
>> If you remove or modify the 77-network.rules file, does it fix the problem?
>> If so, then it is obviously not a libvirt issue.  IMHO it is completely
>> bogus for udev to be arbitrarily ifdown'ing any interface.
>>
>> Daniel
>>   


