Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Andy Lutomirski
On Jul 11, 2014 3:48 PM, "Kees Cook"  wrote:
>
> On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore  wrote:
> > Anyway, getting back to the idea I mentioned earlier ... as many of you may
> > know, Kees (added to the CC line) is working on some seccomp filter
> > improvements which will result in a new seccomp syscall.  Perhaps one way
> > forward is to preserve everything as it is currently with the prctl()
> > interface, but with the new seccomp() based interface we fixup x32 and use 
> > the
> > new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of 
> > the
> > kernel, but I don't think it would be too bad, and I think it would address
> > both our concerns.
>
> Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* Both
> x32 and x86_64 are considered "64-bit". */" should be changed...)
>

I admit I'm not convinced that the current situation is really wrong.
The "audit arch" uniquely defines a mapping between the actual syscall
and nr and the regs.  The natural way to convert between seccomp_data
and the syscall entry and regs is completely correct.

For things like ARM OABI, the situation is different: OABI and EABI
really do work differently wrt the interpretation of the syscall regs.
This isn't the case for x32.

Of course, the audit code screws this up completely, but audit is
disabled for x32.  My upcoming seccomp patchset will clean it up a
little.  IMO there's no actual audit ABI to preserve for x32, because
that code has never been anything other than terminally fvcked.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Kees Cook
On Fri, Jul 11, 2014 at 3:55 PM, H. Peter Anvin  wrote:
> It includes the X32 bit.

If the uapi for __NR_* includes the x32 bit, then that's what seccomp
filters must be seeing. Building seccomp filters is documented to use
the __NR_* values.

-Kees

>
> On July 11, 2014 3:52:42 PM PDT, Kees Cook  wrote:
>>On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook 
>>wrote:
>>> On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore 
>>wrote:
 Anyway, getting back to the idea I mentioned earlier ... as many of
>>you may
 know, Kees (added to the CC line) is working on some seccomp filter
 improvements which will result in a new seccomp syscall.  Perhaps
>>one way
 forward is to preserve everything as it is currently with the
>>prctl()
 interface, but with the new seccomp() based interface we fixup x32
>>and use the
 new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in
>>some of the
 kernel, but I don't think it would be too bad, and I think it would
>>address
 both our concerns.
>>>
>>> Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/*
>>Both
>>> x32 and x86_64 are considered "64-bit". */" should be changed...)
>>>
>>> Just so I understand: currently x86_64 and x32 both present as
>>> AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
>>> (due to the set high bit).
>>>
>>> The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
>>> whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
>>> without setting __X32_SYSCALL_BIT would be totally fine (it would
>>> catch the arch instead of the syscall). This sounds similar to how
>>> libseccomp is doing things, so these should be fine.
>>
>>I should clarify: seccomp expects to find whatever is sent as the
>>syscall nr... as in the __NR_read used like this:
>>
>>BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
>>offsetof(struct seccomp_data, nr)),
>>BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1),
>>BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
>>BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
>>
>>Are there native x32 users yet? What does __NR_read resolve to via the
>>uapi on a native x32 userspace?
>>
>>-Kees
>>
>>> The only project I know of doing blacklisting is lxc, and Eric's
>>> example looks a lot like a discussion I saw with lxc and init_module.
>>> :) So it sounds like we can get this right there.
>>>
>>> I'd like to avoid carrying a delta on filter logic based on the prctl
>>> vs syscall entry. Can we find any userspace filters being used that a
>>> "correct" fix would break? (If so, then yes, we'll need to do this
>>> proposed "via prctl or via syscall?" change.)
>>>
>>> -Kees
>>>
>>> --
>>> Kees Cook
>>> Chrome OS Security
>
> --
> Sent from my mobile phone.  Please pardon brevity and lack of formatting.



-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread H. Peter Anvin
It includes the X32 bit.

On July 11, 2014 3:52:42 PM PDT, Kees Cook  wrote:
>On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook 
>wrote:
>> On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore 
>wrote:
>>> Anyway, getting back to the idea I mentioned earlier ... as many of
>you may
>>> know, Kees (added to the CC line) is working on some seccomp filter
>>> improvements which will result in a new seccomp syscall.  Perhaps
>one way
>>> forward is to preserve everything as it is currently with the
>prctl()
>>> interface, but with the new seccomp() based interface we fixup x32
>and use the
>>> new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in
>some of the
>>> kernel, but I don't think it would be too bad, and I think it would
>address
>>> both our concerns.
>>
>> Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/*
>Both
>> x32 and x86_64 are considered "64-bit". */" should be changed...)
>>
>> Just so I understand: currently x86_64 and x32 both present as
>> AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
>> (due to the set high bit).
>>
>> The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
>> whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
>> without setting __X32_SYSCALL_BIT would be totally fine (it would
>> catch the arch instead of the syscall). This sounds similar to how
>> libseccomp is doing things, so these should be fine.
>
>I should clarify: seccomp expects to find whatever is sent as the
>syscall nr... as in the __NR_read used like this:
>
>BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
>offsetof(struct seccomp_data, nr)),
>BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1),
>BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
>BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
>
>Are there native x32 users yet? What does __NR_read resolve to via the
>uapi on a native x32 userspace?
>
>-Kees
>
>> The only project I know of doing blacklisting is lxc, and Eric's
>> example looks a lot like a discussion I saw with lxc and init_module.
>> :) So it sounds like we can get this right there.
>>
>> I'd like to avoid carrying a delta on filter logic based on the prctl
>> vs syscall entry. Can we find any userspace filters being used that a
>> "correct" fix would break? (If so, then yes, we'll need to do this
>> proposed "via prctl or via syscall?" change.)
>>
>> -Kees
>>
>> --
>> Kees Cook
>> Chrome OS Security

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Kees Cook
On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook  wrote:
> On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore  wrote:
>> Anyway, getting back to the idea I mentioned earlier ... as many of you may
>> know, Kees (added to the CC line) is working on some seccomp filter
>> improvements which will result in a new seccomp syscall.  Perhaps one way
>> forward is to preserve everything as it is currently with the prctl()
>> interface, but with the new seccomp() based interface we fixup x32 and use 
>> the
>> new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of 
>> the
>> kernel, but I don't think it would be too bad, and I think it would address
>> both our concerns.
>
> Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* Both
> x32 and x86_64 are considered "64-bit". */" should be changed...)
>
> Just so I understand: currently x86_64 and x32 both present as
> AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
> (due to the set high bit).
>
> The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
> whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
> without setting __X32_SYSCALL_BIT would be totally fine (it would
> catch the arch instead of the syscall). This sounds similar to how
> libseccomp is doing things, so these should be fine.

I should clarify: seccomp expects to find whatever is sent as the
syscall nr... as in the __NR_read used like this:

BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
offsetof(struct seccomp_data, nr)),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),

Are there native x32 users yet? What does __NR_read resolve to via the
uapi on a native x32 userspace?

-Kees

> The only project I know of doing blacklisting is lxc, and Eric's
> example looks a lot like a discussion I saw with lxc and init_module.
> :) So it sounds like we can get this right there.
>
> I'd like to avoid carrying a delta on filter logic based on the prctl
> vs syscall entry. Can we find any userspace filters being used that a
> "correct" fix would break? (If so, then yes, we'll need to do this
> proposed "via prctl or via syscall?" change.)
>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security



-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Kees Cook
On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore  wrote:
> Anyway, getting back to the idea I mentioned earlier ... as many of you may
> know, Kees (added to the CC line) is working on some seccomp filter
> improvements which will result in a new seccomp syscall.  Perhaps one way
> forward is to preserve everything as it is currently with the prctl()
> interface, but with the new seccomp() based interface we fixup x32 and use the
> new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of the
> kernel, but I don't think it would be too bad, and I think it would address
> both our concerns.

Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* Both
x32 and x86_64 are considered "64-bit". */" should be changed...)

Just so I understand: currently x86_64 and x32 both present as
AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
(due to the set high bit).

The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
without setting __X32_SYSCALL_BIT would be totally fine (it would
catch the arch instead of the syscall). This sounds similar to how
libseccomp is doing things, so these should be fine.

The only project I know of doing blacklisting is lxc, and Eric's
example looks a lot like a discussion I saw with lxc and init_module.
:) So it sounds like we can get this right there.

I'd like to avoid carrying a delta on filter logic based on the prctl
vs syscall entry. Can we find any userspace filters being used that a
"correct" fix would break? (If so, then yes, we'll need to do this
proposed "via prctl or via syscall?" change.)

-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Paul Moore
On Friday, July 11, 2014 02:31:06 PM Eric Paris wrote:
> On Fri, 2014-07-11 at 12:32 -0400, Paul Moore wrote:
> > On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote:

...

> > > So we have a security interface that is damn near impossible to get
> > > right.  Perfect.
> > 
> > What?  Having to do two comparisons instead of one is "damn near
> > impossible"? I think that might be a bit of an overreaction don't you
> > think?
> 
> Actually no.  How can a normal userspace application coder POSSIBLY know
> this?  Find this thread on an e-mail list, by accident?

I suppose some clarification of perspective is in order.  Personally, I don't 
think a "normal" (which brings up the age old question: what is normal 
anyway?) userspace developer is going to write their own seccomp BPF filter, 
there is just too much of a barrier to entry and it is way too easy to get it 
wrong.  This is why I started the libseccomp project.

However, I think there *may* be a solution to satisfy us both, see below.

> > > I think this explains exactly why I support this idea.  Make X32 look
> > > like everyone else ...
> > 
> > You do realize that this patch set makes x32 the odd man out by having
> > syscall_get_nr() return a different syscall number than what was used to
> > make the syscall?  I don't understand how that makes "x32 look like
> > everyone else".
>
> Ok, I buy the __X32_SYSCALL_BIT argument.  It can be dealt with in
> audit.  No problem.  We don't need to strip it in syscall_get_nr().
> I'll gladly concede that part of the patch series.
> 
> But given an x86_64 kernel a seccomp filter writer has to know about X32
> and how to write rules to block the X32 ABI.  And I stick with my
> assessment that x32 + seccomp is darn near impossible for a normal
> developer to handle.

No argument here; like I said, I think seccomp BPF filters are too much for 
most folks regardless of the arch.
 
> Heck, even chromium took months to realize that x32 was a weird beast.
> And they got it wrong on their first try.  Their original implementation
> didn't handle __X32_SYSCALL_BIT quite right.  Looking at their code I'm
> still not sure it does the right thing.  And they are the EXPERTS.  They
> wrote seccomp!

I think you're helping prove my point.  Or I'm providing yours.  Or something.

However, my point about breaking userspace still stands.  Assuming the kernel 
interface is functional, regardless of warts, I stand fast in that anything 
that breaks the kernel/userspace interface is bad.  

> > > Honestly, how many people are using seccomp on X32 and would be horribly
> > > pissed if we just fixed it?
> > 
> > Okay, please stop suggesting we break the x32 kernel/user interface to
> > workaround a flaw in audit.  I get that it sucks for audit, I really do,
> > but this is audit's problem.
> 
> No one is asking to break X32 to fix audit.  Audit can handle itself.  I
> don't want anything in the kernel to pretend that X32 is X86_64.  It
> isn't.  It has its own syscall table.  Its own syscalls.  Its own ABI.
> I'm suggesting to fix how seccomp exposes X32 information because it is
> a HORRIBLE interface that even the experts have gotten wrong, over and
> over and over.

See my comments above.
 
> I suggest we accept it as breakage and just return AUDIT_ARCH_X32.
> (Leaving the _X32_SYSCALL_BIT exposed as it is today)
> 
> But I'd love to hear some thoughts on how that is a bad thing.  If no
> one is using the x32 seccomp abi, lets fix it.  If someone is, lets see
> what the fallout from fixing it will be.

This would break the seccomp filter API and any application that uses it 
properly.  I'm not excited about the idea of "let's just break the interface 
and see who notices philosophy"; that's a very bad practice IMHO.

Anyway, getting back to the idea I mentioned earlier ... as many of you may 
know, Kees (added to the CC line) is working on some seccomp filter 
improvements which will result in a new seccomp syscall.  Perhaps one way 
forward is to preserve everything as it is currently with the prctl() 
interface, but with the new seccomp() based interface we fixup x32 and use the 
new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of the 
kernel, but I don't think it would be too bad, and I think it would address 
both our concerns.

Thoughts?

-- 
paul moore
security and virtualization @ redhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Eric Paris
On Fri, 2014-07-11 at 12:32 -0400, Paul Moore wrote:
> On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote:
> > On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
> > > On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
> > > > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> > > > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > > > > > Incidentally: do seccomp users know that on an x86-64 system you can
> > > > > > recevie system calls from any of the x86 architectures, regardless
> > > > > > of
> > > > > > how the program is invoked?  (This is unusual, so normally denying
> > > > > > those
> > > > > > "alien" calls is the right thing to do.)
> > > > > 
> > > > > I obviously can't speak for all seccomp users, but libseccomp handles
> > > > > this
> > > > > by checking the seccomp_data->arch value at the start of the filter
> > > > > and
> > > > > killing (by default) any non-native architectures.  If you want, you
> > > > > can
> > > > > change this default behavior or add support for other architectures
> > > > > (e.g.
> > > > > create a filter that allows both x86-64 and x32 but disallows x86, or
> > > > > any
> > > > > combination of the three for that matter).
> > > > 
> > > > Maybe libseccomp does some HORRIFIC contortions under the hood, but the
> > > > interface is crap...  Since seccomp_data->arch can't distinguish between
> > > > X32 and X86_64.  If I write a seccomp filter which says
> > > > 
> > > > KILL arch != x86_64
> > > > KILL init_module
> > > > ALLOW everything else
> > > > 
> > > > I can still call init_module, I just have to use the X32 variant.
> > > > 
> > > > If libseccomp is translating:
> > > > 
> > > > KILL arch != x86_64 into:
> > > > 
> > > > KILL arch != x86_64
> > > > KILL syscall_nr >= 2000
> > > > 
> > > > That's just showing how dumb the kernel interface is...   Good for you
> > > > guys, but the kernel is just being dumb   :)
> > > 
> > > You're not going to hear me ever say that I like how the x32 ABI was done,
> > > it is a real mess from a seccomp filter point of view and we have to do
> > > some nasty stuff in libseccomp to make it all work correctly (see my
> > > comments on the libseccomp-devel list regarding my severe displeasure
> > > over x32), but what's done is done.
> > > 
> > > I think it's too late to change the x32 seccomp filter ABI.
> > 
> > So we have a security interface that is damn near impossible to get
> > right.  Perfect.
> 
> What?  Having to do two comparisons instead of one is "damn near impossible"? 
>  
> I think that might be a bit of an overreaction don't you think?

Actually no.  How can a normal userspace application coder POSSIBLY know
this?  Find this thread on an e-mail list, by accident?  
> 
> > I think this explains exactly why I support this idea.  Make X32 look
> > like everyone else ...
> 
> You do realize that this patch set makes x32 the odd man out by having 
> syscall_get_nr() return a different syscall number than what was used to make 
> the syscall?  I don't understand how that makes "x32 look like everyone else".

Ok, I buy the __X32_SYSCALL_BIT argument.  It can be dealt with in
audit.  No problem.  We don't need to strip it in syscall_get_nr().
I'll gladly concede that part of the patch series.

But given an x86_64 kernel a seccomp filter writer has to know about X32
and how to write rules to block the X32 ABI.  And I stick with my
assessment that x32 + seccomp is darn near impossible for a normal
developer to handle.  

Heck, even chromium took months to realize that x32 was a weird beast.
And they got it wrong on their first try.  Their original implementation
didn't handle __X32_SYSCALL_BIT quite right.  Looking at their code I'm
still not sure it does the right thing.  And they are the EXPERTS.  They
wrote seccomp!

> > Honestly, how many people are using seccomp on X32 and would be horribly
> > pissed if we just fixed it?
> 
> Okay, please stop suggesting we break the x32 kernel/user interface to 
> workaround a flaw in audit.  I get that it sucks for audit, I really do, but 
> this is audit's problem.

No one is asking to break X32 to fix audit.  Audit can handle itself.  I
don't want anything in the kernel to pretend that X32 is X86_64.  It
isn't.  It has its own syscall table.  Its own syscalls.  Its own ABI.
I'm suggesting to fix how seccomp exposes X32 information because it is
a HORRIBLE interface that even the experts have gotten wrong, over and
over and over.

I suggest we accept it as breakage and just return AUDIT_ARCH_X32.
(Leaving the _X32_SYSCALL_BIT exposed as it is today)

But I'd love to hear some thoughts on how that is a bad thing.  If no
one is using the x32 seccomp abi, lets fix it.  If someone is, lets see
what the fallout from fixing it will be.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the 

Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread H. Peter Anvin
On 07/11/2014 09:36 AM, Paul Moore wrote:
> 
> Arguably audit is broken anyway by not correctly treating syscall numbers as 
> 32 bit integers like everyone else.
> 

That is really the root cause of the problem.  x86 is not the only
architecture with a sparse syscall numbering scheme (in fact the x32
method was based on the MIPS syscall numbering scheme.)

What syscall_get_nr() returns becomes a matter of definition at that point.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Paul Moore
On Thursday, July 10, 2014 11:38:13 PM Richard Guy Briggs wrote:
> Commit
>   fca460f h...@zytor.com 2012-02-19 07:56:26 -0800
>   x32: Handle the x32 system call flag
> 
> provided a method to multiplex architecture with the syscall number for X32
> calls.
> 
> Commit
>   8b4b9f2 pmo...@redhat.com 2013-02-15 12:21:43 -0500
>   x86: remove the x32 syscall bitmask from syscall_get_nr()
> 
> broke audit and potentially other users of syscall_get_nr() which depend on
> that call as named.

Arguably audit is broken anyway by not correctly treating syscall numbers as 
32 bit integers like everyone else.

The commit above, 8b4b9f2, changed syscall_get_nr() so that it returned the 
same syscall number that is used by the architecture's ABI; just like every* 
other architecture in the kernel.

* Admittedly I didn't check every architecture's implementation, but after a 
half dozen I stopped checking as there was a definite trend.

{snip}

> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> index d6a756a..d58b6be 100644
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -236,6 +236,10 @@ static inline int syscall_get_arch(void)
>   return AUDIT_ARCH_I386;
>  #endif
>   /* Both x32 and x86_64 are considered "64-bit". */
> +#ifdef CONFIG_X86_X32_ABI
> + if (task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT)
> + return AUDIT_ARCH_X86_X32;
> +#endif

No.  See my comments above and in other parts of this thread.

>   return AUDIT_ARCH_X86_64;
>  }
>  #endif   /* CONFIG_X86_32 */
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index b35c215..bc18214 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -73,6 +73,12 @@ static void populate_seccomp_data(struct seccomp_data
> *sd)
> 
>   sd->nr = syscall_get_nr(task, regs);
>   sd->arch = syscall_get_arch();
> +#ifdef CONFIG_X86_X32_ABI
> + if (sd->arch == AUDIT_ARCH_X86_X32) {
> + sd->arch = AUDIT_ARCH_X86_64;
> + sd->nr |= __X32_SYSCALL_BIT;
> + }
> +#endif

Once again, I'm not really sure I need to comment further here, but don't 
change syscall_get_nr(), it should return the same syscall number as was used 
by userspace to initiate the syscall.  If you really want to use the new 
AUDIT_ARCH_X86_X32 macro/define, go ahead, but make sure you rewrite it to the 
x86-64 value here so as to not break compatibility with existing seccomp 
filter users.

-- 
paul moore
security and virtualization @ redhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Paul Moore
On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote:
> On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
> > On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
> > > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> > > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > > > > Incidentally: do seccomp users know that on an x86-64 system you can
> > > > > recevie system calls from any of the x86 architectures, regardless
> > > > > of
> > > > > how the program is invoked?  (This is unusual, so normally denying
> > > > > those
> > > > > "alien" calls is the right thing to do.)
> > > > 
> > > > I obviously can't speak for all seccomp users, but libseccomp handles
> > > > this
> > > > by checking the seccomp_data->arch value at the start of the filter
> > > > and
> > > > killing (by default) any non-native architectures.  If you want, you
> > > > can
> > > > change this default behavior or add support for other architectures
> > > > (e.g.
> > > > create a filter that allows both x86-64 and x32 but disallows x86, or
> > > > any
> > > > combination of the three for that matter).
> > > 
> > > Maybe libseccomp does some HORRIFIC contortions under the hood, but the
> > > interface is crap...  Since seccomp_data->arch can't distinguish between
> > > X32 and X86_64.  If I write a seccomp filter which says
> > > 
> > > KILL arch != x86_64
> > > KILL init_module
> > > ALLOW everything else
> > > 
> > > I can still call init_module, I just have to use the X32 variant.
> > > 
> > > If libseccomp is translating:
> > > 
> > > KILL arch != x86_64 into:
> > > 
> > > KILL arch != x86_64
> > > KILL syscall_nr >= 2000
> > > 
> > > That's just showing how dumb the kernel interface is...   Good for you
> > > guys, but the kernel is just being dumb   :)
> > 
> > You're not going to hear me ever say that I like how the x32 ABI was done,
> > it is a real mess from a seccomp filter point of view and we have to do
> > some nasty stuff in libseccomp to make it all work correctly (see my
> > comments on the libseccomp-devel list regarding my severe displeasure
> > over x32), but what's done is done.
> > 
> > I think it's too late to change the x32 seccomp filter ABI.
> 
> So we have a security interface that is damn near impossible to get
> right.  Perfect.

What?  Having to do two comparisons instead of one is "damn near impossible"?  
I think that might be a bit of an overreaction don't you think?

> I think this explains exactly why I support this idea.  Make X32 look
> like everyone else ...

You do realize that this patch set makes x32 the odd man out by having 
syscall_get_nr() return a different syscall number than what was used to make 
the syscall?  I don't understand how that makes "x32 look like everyone else".

> ... and put these custom horrific hacks in seccomp if we are unwilling to
> 'do it right'

If you want to add the new x32 audit arch #define, go for it, like I said that 
was something that I feel should have been in there from the beginning.  As 
far as I'm concerned you can even put a hack in kernel/seccomp.c to rewrite 
the arch token value if it makes your life easier.

> Honestly, how many people are using seccomp on X32 and would be horribly
> pissed if we just fixed it?

Okay, please stop suggesting we break the x32 kernel/user interface to 
workaround a flaw in audit.  I get that it sucks for audit, I really do, but 
this is audit's problem.

-- 
paul moore
security and virtualization @ redhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread H. Peter Anvin
On 07/11/2014 09:23 AM, Eric Paris wrote:
>>
>> You're not going to hear me ever say that I like how the x32 ABI was done, 
>> it 
>> is a real mess from a seccomp filter point of view and we have to do some 
>> nasty stuff in libseccomp to make it all work correctly (see my comments on 
>> the libseccomp-devel list regarding my severe displeasure over x32), but 
>> what's done is done.
>>
>> I think it's too late to change the x32 seccomp filter ABI.
> 
> So we have a security interface that is damn near impossible to get
> right.  Perfect.
> 
> I think this explains exactly why I support this idea.  Make X32 look
> like everyone else and put these custom horrific hacks in seccomp if we
> are unwilling to 'do it right'
> 
> Honestly, how many people are using seccomp on X32 and would be horribly
> pissed if we just fixed it?
> 

The bigger issue is probably if we will open a problem with the older
kernels.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Eric Paris
On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
> On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
> > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > > > Incidentally: do seccomp users know that on an x86-64 system you can
> > > > recevie system calls from any of the x86 architectures, regardless of
> > > > how the program is invoked?  (This is unusual, so normally denying those
> > > > "alien" calls is the right thing to do.)
> > > 
> > > I obviously can't speak for all seccomp users, but libseccomp handles this
> > > by checking the seccomp_data->arch value at the start of the filter and
> > > killing (by default) any non-native architectures.  If you want, you can
> > > change this default behavior or add support for other architectures (e.g.
> > > create a filter that allows both x86-64 and x32 but disallows x86, or any
> > > combination of the three for that matter).
> > 
> > Maybe libseccomp does some HORRIFIC contortions under the hood, but the
> > interface is crap...  Since seccomp_data->arch can't distinguish between
> > X32 and X86_64.  If I write a seccomp filter which says
> > 
> > KILL arch != x86_64
> > KILL init_module
> > ALLOW everything else
> > 
> > I can still call init_module, I just have to use the X32 variant.
> > 
> > If libseccomp is translating:
> > 
> > KILL arch != x86_64 into:
> > 
> > KILL arch != x86_64
> > KILL syscall_nr >= 2000
> > 
> > That's just showing how dumb the kernel interface is...   Good for you
> > guys, but the kernel is just being dumb   :)
> 
> You're not going to hear me ever say that I like how the x32 ABI was done, it 
> is a real mess from a seccomp filter point of view and we have to do some 
> nasty stuff in libseccomp to make it all work correctly (see my comments on 
> the libseccomp-devel list regarding my severe displeasure over x32), but 
> what's done is done.
> 
> I think it's too late to change the x32 seccomp filter ABI.

So we have a security interface that is damn near impossible to get
right.  Perfect.

I think this explains exactly why I support this idea.  Make X32 look
like everyone else and put these custom horrific hacks in seccomp if we
are unwilling to 'do it right'

Honestly, how many people are using seccomp on X32 and would be horribly
pissed if we just fixed it?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Paul Moore
On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
> On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > > Incidentally: do seccomp users know that on an x86-64 system you can
> > > recevie system calls from any of the x86 architectures, regardless of
> > > how the program is invoked?  (This is unusual, so normally denying those
> > > "alien" calls is the right thing to do.)
> > 
> > I obviously can't speak for all seccomp users, but libseccomp handles this
> > by checking the seccomp_data->arch value at the start of the filter and
> > killing (by default) any non-native architectures.  If you want, you can
> > change this default behavior or add support for other architectures (e.g.
> > create a filter that allows both x86-64 and x32 but disallows x86, or any
> > combination of the three for that matter).
> 
> Maybe libseccomp does some HORRIFIC contortions under the hood, but the
> interface is crap...  Since seccomp_data->arch can't distinguish between
> X32 and X86_64.  If I write a seccomp filter which says
> 
> KILL arch != x86_64
> KILL init_module
> ALLOW everything else
> 
> I can still call init_module, I just have to use the X32 variant.
> 
> If libseccomp is translating:
> 
> KILL arch != x86_64 into:
> 
> KILL arch != x86_64
> KILL syscall_nr >= 2000
> 
> That's just showing how dumb the kernel interface is...   Good for you
> guys, but the kernel is just being dumb   :)

You're not going to hear me ever say that I like how the x32 ABI was done, it 
is a real mess from a seccomp filter point of view and we have to do some 
nasty stuff in libseccomp to make it all work correctly (see my comments on 
the libseccomp-devel list regarding my severe displeasure over x32), but 
what's done is done.

I think it's too late to change the x32 seccomp filter ABI.

-- 
paul moore
security and virtualization @ redhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Eric Paris
On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > Incidentally: do seccomp users know that on an x86-64 system you can
> > recevie system calls from any of the x86 architectures, regardless of
> > how the program is invoked?  (This is unusual, so normally denying those
> > "alien" calls is the right thing to do.)
> 
> I obviously can't speak for all seccomp users, but libseccomp handles this by 
> checking the seccomp_data->arch value at the start of the filter and killing 
> (by default) any non-native architectures.  If you want, you can change this 
> default behavior or add support for other architectures (e.g. create a filter 
> that allows both x86-64 and x32 but disallows x86, or any combination of the 
> three for that matter).

Maybe libseccomp does some HORRIFIC contortions under the hood, but the
interface is crap...  Since seccomp_data->arch can't distinguish between
X32 and X86_64.  If I write a seccomp filter which says

KILL arch != x86_64
KILL init_module
ALLOW everything else

I can still call init_module, I just have to use the X32 variant.

If libseccomp is translating:

KILL arch != x86_64 into:

KILL arch != x86_64
KILL syscall_nr >= 2000

That's just showing how dumb the kernel interface is...   Good for you
guys, but the kernel is just being dumb   :)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread H. Peter Anvin
On 07/11/2014 09:11 AM, Paul Moore wrote:
> On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
>> Incidentally: do seccomp users know that on an x86-64 system you can
>> recevie system calls from any of the x86 architectures, regardless of
>> how the program is invoked?  (This is unusual, so normally denying those
>> "alien" calls is the right thing to do.)
> 
> I obviously can't speak for all seccomp users, but libseccomp handles this by 
> checking the seccomp_data->arch value at the start of the filter and killing 
> (by default) any non-native architectures.  If you want, you can change this 
> default behavior or add support for other architectures (e.g. create a filter 
> that allows both x86-64 and x32 but disallows x86, or any combination of the 
> three for that matter).
> 

OK, that seems reasonable.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Paul Moore
On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> Incidentally: do seccomp users know that on an x86-64 system you can
> recevie system calls from any of the x86 architectures, regardless of
> how the program is invoked?  (This is unusual, so normally denying those
> "alien" calls is the right thing to do.)

I obviously can't speak for all seccomp users, but libseccomp handles this by 
checking the seccomp_data->arch value at the start of the filter and killing 
(by default) any non-native architectures.  If you want, you can change this 
default behavior or add support for other architectures (e.g. create a filter 
that allows both x86-64 and x32 but disallows x86, or any combination of the 
three for that matter).

-- 
paul moore
security and virtualization @ redhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Paul Moore
On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
 Incidentally: do seccomp users know that on an x86-64 system you can
 recevie system calls from any of the x86 architectures, regardless of
 how the program is invoked?  (This is unusual, so normally denying those
 alien calls is the right thing to do.)

I obviously can't speak for all seccomp users, but libseccomp handles this by 
checking the seccomp_data-arch value at the start of the filter and killing 
(by default) any non-native architectures.  If you want, you can change this 
default behavior or add support for other architectures (e.g. create a filter 
that allows both x86-64 and x32 but disallows x86, or any combination of the 
three for that matter).

-- 
paul moore
security and virtualization @ redhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread H. Peter Anvin
On 07/11/2014 09:11 AM, Paul Moore wrote:
 On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
 Incidentally: do seccomp users know that on an x86-64 system you can
 recevie system calls from any of the x86 architectures, regardless of
 how the program is invoked?  (This is unusual, so normally denying those
 alien calls is the right thing to do.)
 
 I obviously can't speak for all seccomp users, but libseccomp handles this by 
 checking the seccomp_data-arch value at the start of the filter and killing 
 (by default) any non-native architectures.  If you want, you can change this 
 default behavior or add support for other architectures (e.g. create a filter 
 that allows both x86-64 and x32 but disallows x86, or any combination of the 
 three for that matter).
 

OK, that seems reasonable.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Eric Paris
On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
 On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
  Incidentally: do seccomp users know that on an x86-64 system you can
  recevie system calls from any of the x86 architectures, regardless of
  how the program is invoked?  (This is unusual, so normally denying those
  alien calls is the right thing to do.)
 
 I obviously can't speak for all seccomp users, but libseccomp handles this by 
 checking the seccomp_data-arch value at the start of the filter and killing 
 (by default) any non-native architectures.  If you want, you can change this 
 default behavior or add support for other architectures (e.g. create a filter 
 that allows both x86-64 and x32 but disallows x86, or any combination of the 
 three for that matter).

Maybe libseccomp does some HORRIFIC contortions under the hood, but the
interface is crap...  Since seccomp_data-arch can't distinguish between
X32 and X86_64.  If I write a seccomp filter which says

KILL arch != x86_64
KILL init_module
ALLOW everything else

I can still call init_module, I just have to use the X32 variant.

If libseccomp is translating:

KILL arch != x86_64 into:

KILL arch != x86_64
KILL syscall_nr = 2000

That's just showing how dumb the kernel interface is...   Good for you
guys, but the kernel is just being dumb   :)


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Paul Moore
On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
 On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
  On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
   Incidentally: do seccomp users know that on an x86-64 system you can
   recevie system calls from any of the x86 architectures, regardless of
   how the program is invoked?  (This is unusual, so normally denying those
   alien calls is the right thing to do.)
  
  I obviously can't speak for all seccomp users, but libseccomp handles this
  by checking the seccomp_data-arch value at the start of the filter and
  killing (by default) any non-native architectures.  If you want, you can
  change this default behavior or add support for other architectures (e.g.
  create a filter that allows both x86-64 and x32 but disallows x86, or any
  combination of the three for that matter).
 
 Maybe libseccomp does some HORRIFIC contortions under the hood, but the
 interface is crap...  Since seccomp_data-arch can't distinguish between
 X32 and X86_64.  If I write a seccomp filter which says
 
 KILL arch != x86_64
 KILL init_module
 ALLOW everything else
 
 I can still call init_module, I just have to use the X32 variant.
 
 If libseccomp is translating:
 
 KILL arch != x86_64 into:
 
 KILL arch != x86_64
 KILL syscall_nr = 2000
 
 That's just showing how dumb the kernel interface is...   Good for you
 guys, but the kernel is just being dumb   :)

You're not going to hear me ever say that I like how the x32 ABI was done, it 
is a real mess from a seccomp filter point of view and we have to do some 
nasty stuff in libseccomp to make it all work correctly (see my comments on 
the libseccomp-devel list regarding my severe displeasure over x32), but 
what's done is done.

I think it's too late to change the x32 seccomp filter ABI.

-- 
paul moore
security and virtualization @ redhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Eric Paris
On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
 On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
  On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
   On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
Incidentally: do seccomp users know that on an x86-64 system you can
recevie system calls from any of the x86 architectures, regardless of
how the program is invoked?  (This is unusual, so normally denying those
alien calls is the right thing to do.)
   
   I obviously can't speak for all seccomp users, but libseccomp handles this
   by checking the seccomp_data-arch value at the start of the filter and
   killing (by default) any non-native architectures.  If you want, you can
   change this default behavior or add support for other architectures (e.g.
   create a filter that allows both x86-64 and x32 but disallows x86, or any
   combination of the three for that matter).
  
  Maybe libseccomp does some HORRIFIC contortions under the hood, but the
  interface is crap...  Since seccomp_data-arch can't distinguish between
  X32 and X86_64.  If I write a seccomp filter which says
  
  KILL arch != x86_64
  KILL init_module
  ALLOW everything else
  
  I can still call init_module, I just have to use the X32 variant.
  
  If libseccomp is translating:
  
  KILL arch != x86_64 into:
  
  KILL arch != x86_64
  KILL syscall_nr = 2000
  
  That's just showing how dumb the kernel interface is...   Good for you
  guys, but the kernel is just being dumb   :)
 
 You're not going to hear me ever say that I like how the x32 ABI was done, it 
 is a real mess from a seccomp filter point of view and we have to do some 
 nasty stuff in libseccomp to make it all work correctly (see my comments on 
 the libseccomp-devel list regarding my severe displeasure over x32), but 
 what's done is done.
 
 I think it's too late to change the x32 seccomp filter ABI.

So we have a security interface that is damn near impossible to get
right.  Perfect.

I think this explains exactly why I support this idea.  Make X32 look
like everyone else and put these custom horrific hacks in seccomp if we
are unwilling to 'do it right'

Honestly, how many people are using seccomp on X32 and would be horribly
pissed if we just fixed it?

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread H. Peter Anvin
On 07/11/2014 09:23 AM, Eric Paris wrote:

 You're not going to hear me ever say that I like how the x32 ABI was done, 
 it 
 is a real mess from a seccomp filter point of view and we have to do some 
 nasty stuff in libseccomp to make it all work correctly (see my comments on 
 the libseccomp-devel list regarding my severe displeasure over x32), but 
 what's done is done.

 I think it's too late to change the x32 seccomp filter ABI.
 
 So we have a security interface that is damn near impossible to get
 right.  Perfect.
 
 I think this explains exactly why I support this idea.  Make X32 look
 like everyone else and put these custom horrific hacks in seccomp if we
 are unwilling to 'do it right'
 
 Honestly, how many people are using seccomp on X32 and would be horribly
 pissed if we just fixed it?
 

The bigger issue is probably if we will open a problem with the older
kernels.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Paul Moore
On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote:
 On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
  On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
   On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
 Incidentally: do seccomp users know that on an x86-64 system you can
 recevie system calls from any of the x86 architectures, regardless
 of
 how the program is invoked?  (This is unusual, so normally denying
 those
 alien calls is the right thing to do.)

I obviously can't speak for all seccomp users, but libseccomp handles
this
by checking the seccomp_data-arch value at the start of the filter
and
killing (by default) any non-native architectures.  If you want, you
can
change this default behavior or add support for other architectures
(e.g.
create a filter that allows both x86-64 and x32 but disallows x86, or
any
combination of the three for that matter).
   
   Maybe libseccomp does some HORRIFIC contortions under the hood, but the
   interface is crap...  Since seccomp_data-arch can't distinguish between
   X32 and X86_64.  If I write a seccomp filter which says
   
   KILL arch != x86_64
   KILL init_module
   ALLOW everything else
   
   I can still call init_module, I just have to use the X32 variant.
   
   If libseccomp is translating:
   
   KILL arch != x86_64 into:
   
   KILL arch != x86_64
   KILL syscall_nr = 2000
   
   That's just showing how dumb the kernel interface is...   Good for you
   guys, but the kernel is just being dumb   :)
  
  You're not going to hear me ever say that I like how the x32 ABI was done,
  it is a real mess from a seccomp filter point of view and we have to do
  some nasty stuff in libseccomp to make it all work correctly (see my
  comments on the libseccomp-devel list regarding my severe displeasure
  over x32), but what's done is done.
  
  I think it's too late to change the x32 seccomp filter ABI.
 
 So we have a security interface that is damn near impossible to get
 right.  Perfect.

What?  Having to do two comparisons instead of one is damn near impossible?  
I think that might be a bit of an overreaction don't you think?

 I think this explains exactly why I support this idea.  Make X32 look
 like everyone else ...

You do realize that this patch set makes x32 the odd man out by having 
syscall_get_nr() return a different syscall number than what was used to make 
the syscall?  I don't understand how that makes x32 look like everyone else.

 ... and put these custom horrific hacks in seccomp if we are unwilling to
 'do it right'

If you want to add the new x32 audit arch #define, go for it, like I said that 
was something that I feel should have been in there from the beginning.  As 
far as I'm concerned you can even put a hack in kernel/seccomp.c to rewrite 
the arch token value if it makes your life easier.

 Honestly, how many people are using seccomp on X32 and would be horribly
 pissed if we just fixed it?

Okay, please stop suggesting we break the x32 kernel/user interface to 
workaround a flaw in audit.  I get that it sucks for audit, I really do, but 
this is audit's problem.

-- 
paul moore
security and virtualization @ redhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Paul Moore
On Thursday, July 10, 2014 11:38:13 PM Richard Guy Briggs wrote:
 Commit
   fca460f h...@zytor.com 2012-02-19 07:56:26 -0800
   x32: Handle the x32 system call flag
 
 provided a method to multiplex architecture with the syscall number for X32
 calls.
 
 Commit
   8b4b9f2 pmo...@redhat.com 2013-02-15 12:21:43 -0500
   x86: remove the x32 syscall bitmask from syscall_get_nr()
 
 broke audit and potentially other users of syscall_get_nr() which depend on
 that call as named.

Arguably audit is broken anyway by not correctly treating syscall numbers as 
32 bit integers like everyone else.

The commit above, 8b4b9f2, changed syscall_get_nr() so that it returned the 
same syscall number that is used by the architecture's ABI; just like every* 
other architecture in the kernel.

* Admittedly I didn't check every architecture's implementation, but after a 
half dozen I stopped checking as there was a definite trend.

{snip}

 diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
 index d6a756a..d58b6be 100644
 --- a/arch/x86/include/asm/syscall.h
 +++ b/arch/x86/include/asm/syscall.h
 @@ -236,6 +236,10 @@ static inline int syscall_get_arch(void)
   return AUDIT_ARCH_I386;
  #endif
   /* Both x32 and x86_64 are considered 64-bit. */
 +#ifdef CONFIG_X86_X32_ABI
 + if (task_pt_regs(current)-orig_ax  __X32_SYSCALL_BIT)
 + return AUDIT_ARCH_X86_X32;
 +#endif

No.  See my comments above and in other parts of this thread.

   return AUDIT_ARCH_X86_64;
  }
  #endif   /* CONFIG_X86_32 */
 diff --git a/kernel/seccomp.c b/kernel/seccomp.c
 index b35c215..bc18214 100644
 --- a/kernel/seccomp.c
 +++ b/kernel/seccomp.c
 @@ -73,6 +73,12 @@ static void populate_seccomp_data(struct seccomp_data
 *sd)
 
   sd-nr = syscall_get_nr(task, regs);
   sd-arch = syscall_get_arch();
 +#ifdef CONFIG_X86_X32_ABI
 + if (sd-arch == AUDIT_ARCH_X86_X32) {
 + sd-arch = AUDIT_ARCH_X86_64;
 + sd-nr |= __X32_SYSCALL_BIT;
 + }
 +#endif

Once again, I'm not really sure I need to comment further here, but don't 
change syscall_get_nr(), it should return the same syscall number as was used 
by userspace to initiate the syscall.  If you really want to use the new 
AUDIT_ARCH_X86_X32 macro/define, go ahead, but make sure you rewrite it to the 
x86-64 value here so as to not break compatibility with existing seccomp 
filter users.

-- 
paul moore
security and virtualization @ redhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread H. Peter Anvin
On 07/11/2014 09:36 AM, Paul Moore wrote:
 
 Arguably audit is broken anyway by not correctly treating syscall numbers as 
 32 bit integers like everyone else.
 

That is really the root cause of the problem.  x86 is not the only
architecture with a sparse syscall numbering scheme (in fact the x32
method was based on the MIPS syscall numbering scheme.)

What syscall_get_nr() returns becomes a matter of definition at that point.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Eric Paris
On Fri, 2014-07-11 at 12:32 -0400, Paul Moore wrote:
 On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote:
  On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
   On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
 On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
  Incidentally: do seccomp users know that on an x86-64 system you can
  recevie system calls from any of the x86 architectures, regardless
  of
  how the program is invoked?  (This is unusual, so normally denying
  those
  alien calls is the right thing to do.)
 
 I obviously can't speak for all seccomp users, but libseccomp handles
 this
 by checking the seccomp_data-arch value at the start of the filter
 and
 killing (by default) any non-native architectures.  If you want, you
 can
 change this default behavior or add support for other architectures
 (e.g.
 create a filter that allows both x86-64 and x32 but disallows x86, or
 any
 combination of the three for that matter).

Maybe libseccomp does some HORRIFIC contortions under the hood, but the
interface is crap...  Since seccomp_data-arch can't distinguish between
X32 and X86_64.  If I write a seccomp filter which says

KILL arch != x86_64
KILL init_module
ALLOW everything else

I can still call init_module, I just have to use the X32 variant.

If libseccomp is translating:

KILL arch != x86_64 into:

KILL arch != x86_64
KILL syscall_nr = 2000

That's just showing how dumb the kernel interface is...   Good for you
guys, but the kernel is just being dumb   :)
   
   You're not going to hear me ever say that I like how the x32 ABI was done,
   it is a real mess from a seccomp filter point of view and we have to do
   some nasty stuff in libseccomp to make it all work correctly (see my
   comments on the libseccomp-devel list regarding my severe displeasure
   over x32), but what's done is done.
   
   I think it's too late to change the x32 seccomp filter ABI.
  
  So we have a security interface that is damn near impossible to get
  right.  Perfect.
 
 What?  Having to do two comparisons instead of one is damn near impossible? 
  
 I think that might be a bit of an overreaction don't you think?

Actually no.  How can a normal userspace application coder POSSIBLY know
this?  Find this thread on an e-mail list, by accident?  
 
  I think this explains exactly why I support this idea.  Make X32 look
  like everyone else ...
 
 You do realize that this patch set makes x32 the odd man out by having 
 syscall_get_nr() return a different syscall number than what was used to make 
 the syscall?  I don't understand how that makes x32 look like everyone else.

Ok, I buy the __X32_SYSCALL_BIT argument.  It can be dealt with in
audit.  No problem.  We don't need to strip it in syscall_get_nr().
I'll gladly concede that part of the patch series.

But given an x86_64 kernel a seccomp filter writer has to know about X32
and how to write rules to block the X32 ABI.  And I stick with my
assessment that x32 + seccomp is darn near impossible for a normal
developer to handle.  

Heck, even chromium took months to realize that x32 was a weird beast.
And they got it wrong on their first try.  Their original implementation
didn't handle __X32_SYSCALL_BIT quite right.  Looking at their code I'm
still not sure it does the right thing.  And they are the EXPERTS.  They
wrote seccomp!

  Honestly, how many people are using seccomp on X32 and would be horribly
  pissed if we just fixed it?
 
 Okay, please stop suggesting we break the x32 kernel/user interface to 
 workaround a flaw in audit.  I get that it sucks for audit, I really do, but 
 this is audit's problem.

No one is asking to break X32 to fix audit.  Audit can handle itself.  I
don't want anything in the kernel to pretend that X32 is X86_64.  It
isn't.  It has its own syscall table.  Its own syscalls.  Its own ABI.
I'm suggesting to fix how seccomp exposes X32 information because it is
a HORRIBLE interface that even the experts have gotten wrong, over and
over and over.

I suggest we accept it as breakage and just return AUDIT_ARCH_X32.
(Leaving the _X32_SYSCALL_BIT exposed as it is today)

But I'd love to hear some thoughts on how that is a bad thing.  If no
one is using the x32 seccomp abi, lets fix it.  If someone is, lets see
what the fallout from fixing it will be.

-Eric

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Paul Moore
On Friday, July 11, 2014 02:31:06 PM Eric Paris wrote:
 On Fri, 2014-07-11 at 12:32 -0400, Paul Moore wrote:
  On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote:

...

   So we have a security interface that is damn near impossible to get
   right.  Perfect.
  
  What?  Having to do two comparisons instead of one is damn near
  impossible? I think that might be a bit of an overreaction don't you
  think?
 
 Actually no.  How can a normal userspace application coder POSSIBLY know
 this?  Find this thread on an e-mail list, by accident?

I suppose some clarification of perspective is in order.  Personally, I don't 
think a normal (which brings up the age old question: what is normal 
anyway?) userspace developer is going to write their own seccomp BPF filter, 
there is just too much of a barrier to entry and it is way too easy to get it 
wrong.  This is why I started the libseccomp project.

However, I think there *may* be a solution to satisfy us both, see below.

   I think this explains exactly why I support this idea.  Make X32 look
   like everyone else ...
  
  You do realize that this patch set makes x32 the odd man out by having
  syscall_get_nr() return a different syscall number than what was used to
  make the syscall?  I don't understand how that makes x32 look like
  everyone else.

 Ok, I buy the __X32_SYSCALL_BIT argument.  It can be dealt with in
 audit.  No problem.  We don't need to strip it in syscall_get_nr().
 I'll gladly concede that part of the patch series.
 
 But given an x86_64 kernel a seccomp filter writer has to know about X32
 and how to write rules to block the X32 ABI.  And I stick with my
 assessment that x32 + seccomp is darn near impossible for a normal
 developer to handle.

No argument here; like I said, I think seccomp BPF filters are too much for 
most folks regardless of the arch.
 
 Heck, even chromium took months to realize that x32 was a weird beast.
 And they got it wrong on their first try.  Their original implementation
 didn't handle __X32_SYSCALL_BIT quite right.  Looking at their code I'm
 still not sure it does the right thing.  And they are the EXPERTS.  They
 wrote seccomp!

I think you're helping prove my point.  Or I'm providing yours.  Or something.

However, my point about breaking userspace still stands.  Assuming the kernel 
interface is functional, regardless of warts, I stand fast in that anything 
that breaks the kernel/userspace interface is bad.  

   Honestly, how many people are using seccomp on X32 and would be horribly
   pissed if we just fixed it?
  
  Okay, please stop suggesting we break the x32 kernel/user interface to
  workaround a flaw in audit.  I get that it sucks for audit, I really do,
  but this is audit's problem.
 
 No one is asking to break X32 to fix audit.  Audit can handle itself.  I
 don't want anything in the kernel to pretend that X32 is X86_64.  It
 isn't.  It has its own syscall table.  Its own syscalls.  Its own ABI.
 I'm suggesting to fix how seccomp exposes X32 information because it is
 a HORRIBLE interface that even the experts have gotten wrong, over and
 over and over.

See my comments above.
 
 I suggest we accept it as breakage and just return AUDIT_ARCH_X32.
 (Leaving the _X32_SYSCALL_BIT exposed as it is today)
 
 But I'd love to hear some thoughts on how that is a bad thing.  If no
 one is using the x32 seccomp abi, lets fix it.  If someone is, lets see
 what the fallout from fixing it will be.

This would break the seccomp filter API and any application that uses it 
properly.  I'm not excited about the idea of let's just break the interface 
and see who notices philosophy; that's a very bad practice IMHO.

Anyway, getting back to the idea I mentioned earlier ... as many of you may 
know, Kees (added to the CC line) is working on some seccomp filter 
improvements which will result in a new seccomp syscall.  Perhaps one way 
forward is to preserve everything as it is currently with the prctl() 
interface, but with the new seccomp() based interface we fixup x32 and use the 
new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of the 
kernel, but I don't think it would be too bad, and I think it would address 
both our concerns.

Thoughts?

-- 
paul moore
security and virtualization @ redhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Kees Cook
On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore pmo...@redhat.com wrote:
 Anyway, getting back to the idea I mentioned earlier ... as many of you may
 know, Kees (added to the CC line) is working on some seccomp filter
 improvements which will result in a new seccomp syscall.  Perhaps one way
 forward is to preserve everything as it is currently with the prctl()
 interface, but with the new seccomp() based interface we fixup x32 and use the
 new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of the
 kernel, but I don't think it would be too bad, and I think it would address
 both our concerns.

Adding AUDIT_ARCH_X32: yes please. (On that note, the comment /* Both
x32 and x86_64 are considered 64-bit. */ should be changed...)

Just so I understand: currently x86_64 and x32 both present as
AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
(due to the set high bit).

The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
without setting __X32_SYSCALL_BIT would be totally fine (it would
catch the arch instead of the syscall). This sounds similar to how
libseccomp is doing things, so these should be fine.

The only project I know of doing blacklisting is lxc, and Eric's
example looks a lot like a discussion I saw with lxc and init_module.
:) So it sounds like we can get this right there.

I'd like to avoid carrying a delta on filter logic based on the prctl
vs syscall entry. Can we find any userspace filters being used that a
correct fix would break? (If so, then yes, we'll need to do this
proposed via prctl or via syscall? change.)

-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Kees Cook
On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook keesc...@chromium.org wrote:
 On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore pmo...@redhat.com wrote:
 Anyway, getting back to the idea I mentioned earlier ... as many of you may
 know, Kees (added to the CC line) is working on some seccomp filter
 improvements which will result in a new seccomp syscall.  Perhaps one way
 forward is to preserve everything as it is currently with the prctl()
 interface, but with the new seccomp() based interface we fixup x32 and use 
 the
 new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of 
 the
 kernel, but I don't think it would be too bad, and I think it would address
 both our concerns.

 Adding AUDIT_ARCH_X32: yes please. (On that note, the comment /* Both
 x32 and x86_64 are considered 64-bit. */ should be changed...)

 Just so I understand: currently x86_64 and x32 both present as
 AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
 (due to the set high bit).

 The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
 whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
 without setting __X32_SYSCALL_BIT would be totally fine (it would
 catch the arch instead of the syscall). This sounds similar to how
 libseccomp is doing things, so these should be fine.

I should clarify: seccomp expects to find whatever is sent as the
syscall nr... as in the __NR_read used like this:

BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
offsetof(struct seccomp_data, nr)),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),

Are there native x32 users yet? What does __NR_read resolve to via the
uapi on a native x32 userspace?

-Kees

 The only project I know of doing blacklisting is lxc, and Eric's
 example looks a lot like a discussion I saw with lxc and init_module.
 :) So it sounds like we can get this right there.

 I'd like to avoid carrying a delta on filter logic based on the prctl
 vs syscall entry. Can we find any userspace filters being used that a
 correct fix would break? (If so, then yes, we'll need to do this
 proposed via prctl or via syscall? change.)

 -Kees

 --
 Kees Cook
 Chrome OS Security



-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread H. Peter Anvin
It includes the X32 bit.

On July 11, 2014 3:52:42 PM PDT, Kees Cook keesc...@chromium.org wrote:
On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook keesc...@chromium.org
wrote:
 On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore pmo...@redhat.com
wrote:
 Anyway, getting back to the idea I mentioned earlier ... as many of
you may
 know, Kees (added to the CC line) is working on some seccomp filter
 improvements which will result in a new seccomp syscall.  Perhaps
one way
 forward is to preserve everything as it is currently with the
prctl()
 interface, but with the new seccomp() based interface we fixup x32
and use the
 new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in
some of the
 kernel, but I don't think it would be too bad, and I think it would
address
 both our concerns.

 Adding AUDIT_ARCH_X32: yes please. (On that note, the comment /*
Both
 x32 and x86_64 are considered 64-bit. */ should be changed...)

 Just so I understand: currently x86_64 and x32 both present as
 AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
 (due to the set high bit).

 The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
 whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
 without setting __X32_SYSCALL_BIT would be totally fine (it would
 catch the arch instead of the syscall). This sounds similar to how
 libseccomp is doing things, so these should be fine.

I should clarify: seccomp expects to find whatever is sent as the
syscall nr... as in the __NR_read used like this:

BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
offsetof(struct seccomp_data, nr)),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),

Are there native x32 users yet? What does __NR_read resolve to via the
uapi on a native x32 userspace?

-Kees

 The only project I know of doing blacklisting is lxc, and Eric's
 example looks a lot like a discussion I saw with lxc and init_module.
 :) So it sounds like we can get this right there.

 I'd like to avoid carrying a delta on filter logic based on the prctl
 vs syscall entry. Can we find any userspace filters being used that a
 correct fix would break? (If so, then yes, we'll need to do this
 proposed via prctl or via syscall? change.)

 -Kees

 --
 Kees Cook
 Chrome OS Security

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Kees Cook
On Fri, Jul 11, 2014 at 3:55 PM, H. Peter Anvin h...@zytor.com wrote:
 It includes the X32 bit.

If the uapi for __NR_* includes the x32 bit, then that's what seccomp
filters must be seeing. Building seccomp filters is documented to use
the __NR_* values.

-Kees


 On July 11, 2014 3:52:42 PM PDT, Kees Cook keesc...@chromium.org wrote:
On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook keesc...@chromium.org
wrote:
 On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore pmo...@redhat.com
wrote:
 Anyway, getting back to the idea I mentioned earlier ... as many of
you may
 know, Kees (added to the CC line) is working on some seccomp filter
 improvements which will result in a new seccomp syscall.  Perhaps
one way
 forward is to preserve everything as it is currently with the
prctl()
 interface, but with the new seccomp() based interface we fixup x32
and use the
 new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in
some of the
 kernel, but I don't think it would be too bad, and I think it would
address
 both our concerns.

 Adding AUDIT_ARCH_X32: yes please. (On that note, the comment /*
Both
 x32 and x86_64 are considered 64-bit. */ should be changed...)

 Just so I understand: currently x86_64 and x32 both present as
 AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
 (due to the set high bit).

 The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
 whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
 without setting __X32_SYSCALL_BIT would be totally fine (it would
 catch the arch instead of the syscall). This sounds similar to how
 libseccomp is doing things, so these should be fine.

I should clarify: seccomp expects to find whatever is sent as the
syscall nr... as in the __NR_read used like this:

BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
offsetof(struct seccomp_data, nr)),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),

Are there native x32 users yet? What does __NR_read resolve to via the
uapi on a native x32 userspace?

-Kees

 The only project I know of doing blacklisting is lxc, and Eric's
 example looks a lot like a discussion I saw with lxc and init_module.
 :) So it sounds like we can get this right there.

 I'd like to avoid carrying a delta on filter logic based on the prctl
 vs syscall entry. Can we find any userspace filters being used that a
 correct fix would break? (If so, then yes, we'll need to do this
 proposed via prctl or via syscall? change.)

 -Kees

 --
 Kees Cook
 Chrome OS Security

 --
 Sent from my mobile phone.  Please pardon brevity and lack of formatting.



-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Andy Lutomirski
On Jul 11, 2014 3:48 PM, Kees Cook keesc...@chromium.org wrote:

 On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore pmo...@redhat.com wrote:
  Anyway, getting back to the idea I mentioned earlier ... as many of you may
  know, Kees (added to the CC line) is working on some seccomp filter
  improvements which will result in a new seccomp syscall.  Perhaps one way
  forward is to preserve everything as it is currently with the prctl()
  interface, but with the new seccomp() based interface we fixup x32 and use 
  the
  new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of 
  the
  kernel, but I don't think it would be too bad, and I think it would address
  both our concerns.

 Adding AUDIT_ARCH_X32: yes please. (On that note, the comment /* Both
 x32 and x86_64 are considered 64-bit. */ should be changed...)


I admit I'm not convinced that the current situation is really wrong.
The audit arch uniquely defines a mapping between the actual syscall
and nr and the regs.  The natural way to convert between seccomp_data
and the syscall entry and regs is completely correct.

For things like ARM OABI, the situation is different: OABI and EABI
really do work differently wrt the interpretation of the syscall regs.
This isn't the case for x32.

Of course, the audit code screws this up completely, but audit is
disabled for x32.  My upcoming seccomp patchset will clean it up a
little.  IMO there's no actual audit ABI to preserve for x32, because
that code has never been anything other than terminally fvcked.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-10 Thread H. Peter Anvin
On 07/10/2014 08:38 PM, Richard Guy Briggs wrote:
> Commit
>   fca460f h...@zytor.com 2012-02-19 07:56:26 -0800
>   x32: Handle the x32 system call flag
> 
> provided a method to multiplex architecture with the syscall number for X32
> calls.
> 
> Commit
>   8b4b9f2 pmo...@redhat.com 2013-02-15 12:21:43 -0500
>   x86: remove the x32 syscall bitmask from syscall_get_nr()
> 
> broke audit and potentially other users of syscall_get_nr() which depend on
> that call as named.
> 
> Commit
>   audit: add AUDIT_ARCH_X86_X32 arch definition
> 
> is required to provide the new ARCH definition AUDIT_ARCH_X86_X32 for
> syscall_get_arch().
> 
> This patch along with reverting 8b4b9f2 should satisfy other regular users of
> syscall_get_nr() without changing the seccomp interface to BPF.
> 

Incidentally: do seccomp users know that on an x86-64 system you can
recevie system calls from any of the x86 architectures, regardless of
how the program is invoked?  (This is unusual, so normally denying those
"alien" calls is the right thing to do.)

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-10 Thread H. Peter Anvin
On 07/10/2014 08:38 PM, Richard Guy Briggs wrote:
 Commit
   fca460f h...@zytor.com 2012-02-19 07:56:26 -0800
   x32: Handle the x32 system call flag
 
 provided a method to multiplex architecture with the syscall number for X32
 calls.
 
 Commit
   8b4b9f2 pmo...@redhat.com 2013-02-15 12:21:43 -0500
   x86: remove the x32 syscall bitmask from syscall_get_nr()
 
 broke audit and potentially other users of syscall_get_nr() which depend on
 that call as named.
 
 Commit
   audit: add AUDIT_ARCH_X86_X32 arch definition
 
 is required to provide the new ARCH definition AUDIT_ARCH_X86_X32 for
 syscall_get_arch().
 
 This patch along with reverting 8b4b9f2 should satisfy other regular users of
 syscall_get_nr() without changing the seccomp interface to BPF.
 

Incidentally: do seccomp users know that on an x86-64 system you can
recevie system calls from any of the x86 architectures, regardless of
how the program is invoked?  (This is unusual, so normally denying those
alien calls is the right thing to do.)

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/