To get the environment variables in snapping

2017-03-13 Thread Jin Hsieh
Hello guys,

We are making a workable mail solution snap,
the selected back-end services are Postfix(SMTP) and Dovecot(POP3/IMAP).

For those services as you known they have some dependencies with classic
paths,
for example the /etc/postfix/main.cf and /etc/dovecot/dovecot.conf so we
made some changes on the upstream code,
to get the environment variables such as SNAP, SNAP_DATA/SNAP_COMMON by
getenv(),
then replace the classic one into $SNAP_DATA/etc/postfix/* to make it works
properly.

Also, for a bigger snap like this,
the component(one of the parts) will execute external command to trigger
another one,
in such the case Postfix forks its child process: pipe to call dovecot-lda
(which is a sub-process of Dovecot),
to complete the mail retrieving function, now I would say *it is lucky to
have a useful configuration parameter* "export_environment" in Postfix,
so that we can make sure the commands could be executed correctly in our
snap world with specific $SNAP exported.
http://www.postfix.org/postconf.5.html


But you could recognize the modification we made *is hard to upstream guys
to accept/merge the changes into their trunk*,
so many getenv("SNAP") in the code or in where the code flow uses
clean_env() we also need to cache the SNAP-related variables,
I am wondering if we have a better way to do this, for example an interface
or environment-sharing could make the life easier?
1. then have the chances to make the upstream code up without
getenv("SNAP") to make their code snap-adaptive
2. then have a much regular way to export environment when one
component calls another one externally without "export_environment" support

Any feedback would be welcomed,
many thanks.

BR.
Jin
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Request for help / ideas to debug issue

2017-03-13 Thread John Lenton
Thank you Alfonso for that link (I didn't find that when I looked, so
kudos), and thank you Michael and Zyga for all your work on this!

On 12 March 2017 at 08:38, Michael Hudson-Doyle
 wrote:
> PS: I guess I should back port that Go fix to all supported Go releases?

that would be very nice. Does that include trusty? And how far can it
reach in Debian?

Zyga, what are our chances of patching Go on other distros?

Michael, WRT the non-cgo case, can you report it upstream? Would you
rather I did?

I'm off to try to find a reproducer for the "oops i lost the suid bit" bug :-D

-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Part 2! Request for help / ideas to debug issue

2017-03-13 Thread John Lenton
This one is slightly more interesting.

You need 1.8 (or patched <1.8 as per the previous thread) for this one
to make sense; without it you're just going to get drowned in warning
messages and not see the real issue.

This one is the real issue :-)

In go, when calling syscall.Exec to a setuid root binary, sometimes
(about 4% of the times, on my machine, but it's hardware- and
load-dependent), the exec'ed process will find itself running with
effective uid different to zero. That is, a setuid root binary will
find itself running as non-root. As the process that sets up
confinement is setuid root (in distros where setuid is favoured over
capabilities), this means the snap app falls on its face.

TODO: check if something similar happens when using caps

This is *probably* a bug in Go, but it only seems to arise when using
syscall.Exec, which as far as I can tell is unsupported (the whole
syscall package is unsupported -- not covered by the go1 compatibility
promise -- and its replacement, golang.org/x/sys/unix, is ominously
missing Exec).

Having said that, it might be a bug in the kernel ;-)
And I say this because if you pin the process to a single cpu, the
issue doesn't arise.

Anyway, code to repro this is at
https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813

on my machine,

$ for i in `seq -w `; do ./a_c; done | wc -l
0
$ for i in `seq -w `; do ./a_go; done | wc -l
394

And,

$ for i in `seq -w `; do taskset 2 ./a_go; done | wc -l
0

Gnarly!

-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Part 2! Request for help / ideas to debug issue

2017-03-13 Thread Gustavo Niemeyer
Thanks for finding and debugging this John.

I can't see how this one would be in Go. I can't think of any related
action that would disable the setuid bit on the exec syscall.

Perhaps someone from the kernel team can help here?



On Mon, Mar 13, 2017 at 9:59 AM, John Lenton 
wrote:

> This one is slightly more interesting.
>
> You need 1.8 (or patched <1.8 as per the previous thread) for this one
> to make sense; without it you're just going to get drowned in warning
> messages and not see the real issue.
>
> This one is the real issue :-)
>
> In go, when calling syscall.Exec to a setuid root binary, sometimes
> (about 4% of the times, on my machine, but it's hardware- and
> load-dependent), the exec'ed process will find itself running with
> effective uid different to zero. That is, a setuid root binary will
> find itself running as non-root. As the process that sets up
> confinement is setuid root (in distros where setuid is favoured over
> capabilities), this means the snap app falls on its face.
>
> TODO: check if something similar happens when using caps
>
> This is *probably* a bug in Go, but it only seems to arise when using
> syscall.Exec, which as far as I can tell is unsupported (the whole
> syscall package is unsupported -- not covered by the go1 compatibility
> promise -- and its replacement, golang.org/x/sys/unix, is ominously
> missing Exec).
>
> Having said that, it might be a bug in the kernel ;-)
> And I say this because if you pin the process to a single cpu, the
> issue doesn't arise.
>
> Anyway, code to repro this is at
> https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813
>
> on my machine,
>
> $ for i in `seq -w `; do ./a_c; done | wc -l
> 0
> $ for i in `seq -w `; do ./a_go; done | wc -l
> 394
>
> And,
>
> $ for i in `seq -w `; do taskset 2 ./a_go; done | wc -l
> 0
>
> Gnarly!
>
> --
> Snapcraft mailing list
> Snapcraft@lists.snapcraft.io
> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailm
> an/listinfo/snapcraft
>



-- 
gustavo @ http://niemeyer.net
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Request for help / ideas to debug issue

2017-03-13 Thread Gustavo Niemeyer
Thanks for your help on this, Michael.

Yes, the fix seems pretty simple and porting to older Go releases will
ensure people are not getting those super awkward messages randomly.

I have created a Trello card with a checklist so we can track all the
distributions that we need to fix, and included you, John, and Zygmunt.



On Sun, Mar 12, 2017 at 5:38 AM, Michael Hudson-Doyle <
michael.hud...@canonical.com> wrote:

> PS: I guess I should back port that Go fix to all supported Go releases?
>
> On 12 March 2017 at 21:37, Michael Hudson-Doyle <
> michael.hud...@canonical.com> wrote:
>
> > Before we get into this, what is the actual problem here? Just the ugly
> > messages?
> >
> > On 11 March 2017 at 02:58, Alfonso Sanchez-Beato  > canonical.com> wrote:
> >
> >> On Fri, Mar 10, 2017 at 10:22 AM, John Lenton <
> john.len...@canonical.com>
> >> wrote:
> >>
> >> > Hello!
> >> >
> >> > We're seeing a weird issue with either go, pthreads, or the kernel. If
> >> > you're knowledgeable about one or more of those things, could you take
> >> > a look? Thank you.
> >> >
> >> > The issue manifests as nasty warnings from the "snap run" command,
> >> > which is also the first step into a snapped app or service. It looks
> >> > like
> >> >
> >> > runtime/cgo: pthread_create failed: Resource temporarily unavailable
> >> >
> >> > a very stripped-down reproducer is http://pastebin.ubuntu.com/
> 24150663/
> >> >
> >> > build that, run it in a loop, and you'll see a bunch of those messages
> >> > (and some weirder ones, but let's take it one step at a time)
> >>
> >
> > Turns out this was fixed in Go 1.8: https://go-review.
> > googlesource.com/c/33894/
> >
> >
> >> > if you comment out the 'import "C"' line the message will change but
> >> > still happen, which makes me think that at least in part this is a Go
> >> > issue (or that we're holding it wrong).
> >>
> >
> > ... but only in the non-cgo case, you can (occasionally) still get
> > messages like:
> >
> > runtime: failed to create new OS thread (have 5 already; errno=11)
> > runtime: may need to increase max user processes (ulimit -u)
> > fatal error: newosproc
> >
> > if you comment out the import "C". I guess we should report that
> upstream.
> >
> >
> >> > Note that the exec does work; the warning seems to come from a
> >> > different thread than the one doing the Exec (the other clue that
> >> > points in this direction is that sometimes the message is truncated).
> >> > You can verify the fact that it does run by changing the /bin/true to
> >> > /bin/echo os.Args[1], but because this issue is obviously a race
> >> > somewhere, this change makes it less likely to happen (from ~10% down
> >> > to ~.5% of runs, in my machines).
> >> >
> >> > One thing that makes this harder to debug is that strace'ing the
> >> > process hangs (hard, kill -9 of strace to get out) before reproducing
> >> > the issue. This probably means we need to trace it at a lower level,
> >> > and I don't know enough about tracing a process group from inside the
> >> > kernel to be able to do that; what I can find about kernel-level
> >> > tracing is around syscalls or devices.
> >> >
> >> > Ideas?
> >> >
> >>
> >> I found this related thread:
> >>
> >> https://groups.google.com/forum/#!msg/golang-nuts/8gszDBRZh_
> >> 4/lhROTfN9TxIJ
> >>
> >> <<
> >> I believe this can happen on GNU/Linux if your program uses cgo and if
> >> thread A is in the Go runtime starting up a new thread B while thread
> >> C is execing a program.  The underlying cause is that while one thread
> >> is calling exec the Linux kernel will fail attempts by other threads
> >> to call clone by returning EAGAIN.  (Look for uses of the in_exec
> >> field in the kernel sources.)
> >> >>
> >>
> >
> > Yeah, this seems to be very accurate. It's also why it seems this is a
> > cosmetic problem only, some thread not calling exec fails, but well, the
> > thread is about to die anyway.
> >
> >
> >> Something like adding a little sleep removes the traces, for instance:
> >>
> >> http://paste.ubuntu.com/24151637/
> >>
> >> where the program run sleep for 1ms before calling Exec. For smaller
> units
> >> (say, 20 us) the issue still happens.
> >>
> >> It looks to me that right before running main(), go creates some
> threads,
> >> calling clone() and probably getting the race described in the thread.
> As
> >> anyway you are running Exec I guess the traces are harmless, you do not
> >> need the go threads. Nonetheless, I think that the go run time should
> >> retry
> >> instead of printing that trace.
> >
> >
> > Cheers,
> > mwh
> >
> --
> Snapcraft mailing list
> Snapcraft@lists.snapcraft.io
> Modify settings or unsubscribe at: https://lists.ubuntu.com/
> mailman/listinfo/snapcraft
>



-- 
gustavo @ http://niemeyer.net
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Removing an unreleased package revision

2017-03-13 Thread Bret A. Barker
On Sun, Mar 12, 2017 at 06:15:21PM +0100, Joseph Rushton Wakeling wrote:
> Hello folks,
> 
> Is there a mechanism for removing an unreleased package revision from the
> store?  I'd like to get rid of revision 7 of the ldc2 snap, which has some
> issues on 16.10 that make it unsuitable for release (even to the Edge
> channel).
> 

We don't generally remove revisions, but it shouldn't impact your ability to 
push and release newer revisions.

-bret

-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Part 2! Request for help / ideas to debug issue

2017-03-13 Thread Michael Hudson-Doyle
On 14 March 2017 at 01:59, John Lenton  wrote:

> This one is slightly more interesting.
>
> You need 1.8 (or patched <1.8 as per the previous thread) for this one
> to make sense; without it you're just going to get drowned in warning
> messages and not see the real issue.
>
> This one is the real issue :-)
>

Ah _hah_!


> In go, when calling syscall.Exec to a setuid root binary, sometimes
> (about 4% of the times, on my machine, but it's hardware- and
> load-dependent), the exec'ed process will find itself running with
> effective uid different to zero. That is, a setuid root binary will
> find itself running as non-root. As the process that sets up
> confinement is setuid root (in distros where setuid is favoured over
> capabilities), this means the snap app falls on its face.
>
> TODO: check if something similar happens when using caps
>
> This is *probably* a bug in Go, but it only seems to arise when using
> syscall.Exec, which as far as I can tell is unsupported (the whole
> syscall package is unsupported -- not covered by the go1 compatibility
> promise -- and its replacement, golang.org/x/sys/unix, is ominously
> missing Exec).
>
> Having said that, it might be a bug in the kernel ;-)
> And I say this because if you pin the process to a single cpu, the
> issue doesn't arise.
>
> Anyway, code to repro this is at
> https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813
>
> on my machine,
>
> $ for i in `seq -w `; do ./a_c; done | wc -l
> 0
> $ for i in `seq -w `; do ./a_go; done | wc -l
> 394
>
> And,
>
> $ for i in `seq -w `; do taskset 2 ./a_go; done | wc -l
> 0
>
> Gnarly!
>

That's pretty exciting.  I bet this is going to have the same underlying
cause as the other bug: something some other thread in the go process is
doing is causing the kernel to ignore the setuid bit. If I add a
time.Sleep(1*time.Millisecond) to a_go.go before the exec, the setuid bit
is respected every time. It doesn't help that setuid is ignored when
tracing or that strace likes to hang when you trace a_go.

I spent a while staring at the kernel source but I don't really have any
idea how this might be happening. It might be this code
https://github.com/torvalds/linux/blob/master/security/commoncap.c#L549-L561,
but I don't know how to be sure (well, without building kernels to do
debugging-via-kprint or whatever).

Cheers,
mwh
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Part 2! Request for help / ideas to debug issue

2017-03-13 Thread Michael Hudson-Doyle
On 14 March 2017 at 10:05, Michael Hudson-Doyle <
michael.hud...@canonical.com> wrote:

>
>
> On 14 March 2017 at 01:59, John Lenton  wrote:
>
>> This one is slightly more interesting.
>>
>> You need 1.8 (or patched <1.8 as per the previous thread) for this one
>> to make sense; without it you're just going to get drowned in warning
>> messages and not see the real issue.
>>
>> This one is the real issue :-)
>>
>
> Ah _hah_!
>
>
>> In go, when calling syscall.Exec to a setuid root binary, sometimes
>> (about 4% of the times, on my machine, but it's hardware- and
>> load-dependent), the exec'ed process will find itself running with
>> effective uid different to zero. That is, a setuid root binary will
>> find itself running as non-root. As the process that sets up
>> confinement is setuid root (in distros where setuid is favoured over
>> capabilities), this means the snap app falls on its face.
>>
>> TODO: check if something similar happens when using caps
>>
>> This is *probably* a bug in Go, but it only seems to arise when using
>> syscall.Exec, which as far as I can tell is unsupported (the whole
>> syscall package is unsupported -- not covered by the go1 compatibility
>> promise -- and its replacement, golang.org/x/sys/unix, is ominously
>> missing Exec).
>>
>> Having said that, it might be a bug in the kernel ;-)
>> And I say this because if you pin the process to a single cpu, the
>> issue doesn't arise.
>>
>> Anyway, code to repro this is at
>> https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813
>>
>> on my machine,
>>
>> $ for i in `seq -w `; do ./a_c; done | wc -l
>> 0
>> $ for i in `seq -w `; do ./a_go; done | wc -l
>> 394
>>
>> And,
>>
>> $ for i in `seq -w `; do taskset 2 ./a_go; done | wc -l
>> 0
>>
>> Gnarly!
>>
>
> That's pretty exciting.  I bet this is going to have the same underlying
> cause as the other bug: something some other thread in the go process is
> doing is causing the kernel to ignore the setuid bit. If I add a
> time.Sleep(1*time.Millisecond) to a_go.go before the exec, the setuid bit
> is respected every time. It doesn't help that setuid is ignored when
> tracing or that strace likes to hang when you trace a_go.
>
> I spent a while staring at the kernel source but I don't really have any
> idea how this might be happening. It might be this code
> https://github.com/torvalds/linux/blob/master/
> security/commoncap.c#L549-L561, but I don't know how to be sure (well,
> without building kernels to do debugging-via-kprint or whatever).
>
>
Perhaps unsurprisingly, I think what the other go threads are doing that is
causing the problem is -- as before -- calling clone(). I can reproduce
with a C program I wrote to call it as much as possible:

mwhudson@aeglos:/opt/opensource/go-test-cases/setuid-lossage$ cat a_c2.c
#include 
#include 
#include 
#include 

void* nothing(void *p) {
return NULL;
}

void run_do_nothing_thread() {
pthread_t t;
if (pthread_create(&t, NULL, nothing, NULL) == 0) {
pthread_join(t, NULL);
}
}

void* target(void *p) {
while (1) {
run_do_nothing_thread();
}
return NULL;
}

void run_spinner_thread() {
pthread_t t;
pthread_create(&t, NULL, target, NULL);
}

int main() {
struct timespec tv;
for (int i = 0; i < 10; i++) {
run_spinner_thread();
}
tv.tv_sec = 0;
tv.tv_nsec = 10;
nanosleep(&tv, NULL);
int rv = execl("./b", "./b", NULL);
printf("WHAAT! %d\n", rv);
return 0;
}
mwhudson@aeglos:/opt/opensource/go-test-cases/setuid-lossage$ cc -Wall
 a_c2.c   -o a_c2 -lpthread
mwhudson@aeglos:/opt/opensource/go-test-cases/setuid-lossage$ for i in `seq
1000`; do ./a_c2; done | wc -l
23

Looking at the kernel again, I think it might be some kind of race between
all the copying of kernel structures that happens in the clone syscall and
the checking that happens around exec? Pretty hairy though.

Cheers,
mwh
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Part 2! Request for help / ideas to debug issue

2017-03-13 Thread John Lenton
On 13 March 2017 at 21:05, Michael Hudson-Doyle
 wrote:
> If I add a
> time.Sleep(1*time.Millisecond) to a_go.go before the exec, the setuid bit
> is respected every time.

on my way to bed, I'll give your response a proper read in the
morning, but note that my reproducer causes the issue a lot more
frequently than in "the real world" (snap run calling snap-confine
calling snap-exec), where delays are happening naturally (because the
programs aren't just calling exec, they actually have work to do :-p).
I don't have numbers for how often it happens in snappy; it's a lot
less frequent, but often enough to be annoying when working
interactively (and there are bug reports with these warnings/errors in
lp already).

-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Part 2! Request for help / ideas to debug issue

2017-03-13 Thread Michael Hudson-Doyle
On 14 March 2017 at 12:21, John Lenton  wrote:

> On 13 March 2017 at 21:05, Michael Hudson-Doyle
>  wrote:
> > If I add a
> > time.Sleep(1*time.Millisecond) to a_go.go before the exec, the setuid bit
> > is respected every time.
>
> on my way to bed, I'll give your response a proper read in the
> morning, but note that my reproducer causes the issue a lot more
> frequently than in "the real world" (snap run calling snap-confine
> calling snap-exec), where delays are happening naturally (because the
> programs aren't just calling exec, they actually have work to do :-p).
> I don't have numbers for how often it happens in snappy; it's a lot
> less frequent, but often enough to be annoying when working
> interactively (and there are bug reports with these warnings/errors in
> lp already).


Oh yes, the sleep was just to allow the other threads to settle in the test
case program. It's not a solution for the real world at all.

Cheers,
mwh
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: To get the environment variables in snapping

2017-03-13 Thread Manik Taneja
jin,

i wonder if this would be better confined as classic?

/manik

On Mon, Mar 13, 2017 at 1:29 AM, Jin Hsieh  wrote:

> Hello guys,
>
> We are making a workable mail solution snap,
> the selected back-end services are Postfix(SMTP) and Dovecot(POP3/IMAP).
>
> For those services as you known they have some dependencies with classic
> paths,
> for example the /etc/postfix/main.cf and /etc/dovecot/dovecot.conf so we
> made some changes on the upstream code,
> to get the environment variables such as SNAP, SNAP_DATA/SNAP_COMMON by
> getenv(),
> then replace the classic one into $SNAP_DATA/etc/postfix/* to make it works
> properly.
>
> Also, for a bigger snap like this,
> the component(one of the parts) will execute external command to trigger
> another one,
> in such the case Postfix forks its child process: pipe to call dovecot-lda
> (which is a sub-process of Dovecot),
> to complete the mail retrieving function, now I would say *it is lucky to
> have a useful configuration parameter* "export_environment" in Postfix,
> so that we can make sure the commands could be executed correctly in our
> snap world with specific $SNAP exported.
> http://www.postfix.org/postconf.5.html
>
>
> But you could recognize the modification we made *is hard to upstream guys
> to accept/merge the changes into their trunk*,
> so many getenv("SNAP") in the code or in where the code flow uses
> clean_env() we also need to cache the SNAP-related variables,
> I am wondering if we have a better way to do this, for example an interface
> or environment-sharing could make the life easier?
> 1. then have the chances to make the upstream code up without
> getenv("SNAP") to make their code snap-adaptive
> 2. then have a much regular way to export environment when one
> component calls another one externally without "export_environment" support
>
> Any feedback would be welcomed,
> many thanks.
>
> BR.
> Jin
> --
> Snapcraft mailing list
> Snapcraft@lists.snapcraft.io
> Modify settings or unsubscribe at: https://lists.ubuntu.com/
> mailman/listinfo/snapcraft
>
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft