Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-26 Thread Steven Rostedt
On Thu, 26 Feb 2015 12:47:03 +0100
Ingo Molnar  wrote:


> The bona fide removal of a real instruction from a true hot 
> path is generally always worth doing, you don't even have 
> to 'prove' that it improves things: unless the claim is 
> that for some really robust reason the instruction was zero 
> cost to begin with.

I agree that removing instructions from hot paths are for the most part
worth doing without any proof, as long as it doesn't change behavior or
increase the memory footprint. But that's not the case here. If the
change does affect behavior or increases memory footprint, then it
should prove itself as worth doing.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-26 Thread Steven Rostedt
On Thu, 26 Feb 2015 12:47:03 +0100
Ingo Molnar  wrote:

> So the main question here is not whether it's worth doing 
> it, the question is the cost of the removal:
> 
>   - the change in syscall number overflow handling
> behavior. (We might not want the new behavior)
> 
>   - the increase in the syscall table size.

Yep, totally agree.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-26 Thread Ingo Molnar

* Steven Rostedt  wrote:

> > That would require a branch insn. The whole idea of 
> > masking was merely to avoid that branch. If you need a 
> > branch, then you can as well just retain current code.
> 
> I'm just curious, do all these micro optimizations have 
> any real impact on real use cases?

The bona fide removal of a real instruction from a true hot 
path is generally always worth doing, you don't even have 
to 'prove' that it improves things: unless the claim is 
that for some really robust reason the instruction was zero 
cost to begin with.

So the main question here is not whether it's worth doing 
it, the question is the cost of the removal:

  - the change in syscall number overflow handling
behavior. (We might not want the new behavior)

  - the increase in the syscall table size.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-26 Thread Steven Rostedt
On Thu, 26 Feb 2015 12:47:03 +0100
Ingo Molnar mi...@kernel.org wrote:

 So the main question here is not whether it's worth doing 
 it, the question is the cost of the removal:
 
   - the change in syscall number overflow handling
 behavior. (We might not want the new behavior)
 
   - the increase in the syscall table size.

Yep, totally agree.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-26 Thread Steven Rostedt
On Thu, 26 Feb 2015 12:47:03 +0100
Ingo Molnar mi...@kernel.org wrote:


 The bona fide removal of a real instruction from a true hot 
 path is generally always worth doing, you don't even have 
 to 'prove' that it improves things: unless the claim is 
 that for some really robust reason the instruction was zero 
 cost to begin with.

I agree that removing instructions from hot paths are for the most part
worth doing without any proof, as long as it doesn't change behavior or
increase the memory footprint. But that's not the case here. If the
change does affect behavior or increases memory footprint, then it
should prove itself as worth doing.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-26 Thread Ingo Molnar

* Steven Rostedt rost...@goodmis.org wrote:

  That would require a branch insn. The whole idea of 
  masking was merely to avoid that branch. If you need a 
  branch, then you can as well just retain current code.
 
 I'm just curious, do all these micro optimizations have 
 any real impact on real use cases?

The bona fide removal of a real instruction from a true hot 
path is generally always worth doing, you don't even have 
to 'prove' that it improves things: unless the claim is 
that for some really robust reason the instruction was zero 
cost to begin with.

So the main question here is not whether it's worth doing 
it, the question is the cost of the removal:

  - the change in syscall number overflow handling
behavior. (We might not want the new behavior)

  - the increase in the syscall table size.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Borislav Petkov
On Wed, Feb 25, 2015 at 11:01:29AM -0500, Steven Rostedt wrote:
> I'm just curious, do all these micro optimizations have any real impact
> on real use cases?
> 
> That is, if we are going to make the system less robust, shouldn't we
> show that it has real benefit?

I'm wondering the same thing this whole time. Is it even worth
the bother if those "improvements" don't show on any reasonable
benchmark...? And maybe we should benchmark stuff first and then
apply.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Steven Rostedt
On Wed, 25 Feb 2015 16:40:49 +0100
Denys Vlasenko  wrote:

> >> The downside would be that if we ever grow past 1024
> >> syscall entries we'll be in trouble if new userspace calls
> >> syscall 513 on an old kernel and gets syscall 1.
> >
> > What if we test against ~0x3ff and jump to sys_ni if anything is set.
> > This would still work with new userspace calls on older kernels.
> 
> That would require a branch insn. The whole idea of masking
> was merely to avoid that branch. If you need a branch,
> then you can as well just retain current code.

I'm just curious, do all these micro optimizations have any real impact
on real use cases?

That is, if we are going to make the system less robust, shouldn't we
show that it has real benefit?

And yes, I consider user space passing in a system call of 0x401 and
having it work the same as 0x1 as being less robust.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Denys Vlasenko
On Wed, Feb 25, 2015 at 3:43 PM, Steven Rostedt  wrote:
> On Wed, 25 Feb 2015 10:20:43 +0100 Ingo Molnar  wrote:
>> But, if we do that, we can do even better, and also do an
>> optimization of the 64-bit entry path as well: we could
>> simply mask RAX with 0x3ff and not do a compare. Pad the
>> syscall table up to 0x400 (1024) entries and fill in the
>> table with sys_ni syscall entries.
>>
>> This is valid on 64-bit and 32-bit kernels as well, and it
>> allows the removal of a compare from the syscall entry
>> path, at the cost of a couple of kilobytes of unused
>> syscall table.
>>
>> The downside would be that if we ever grow past 1024
>> syscall entries we'll be in trouble if new userspace calls
>> syscall 513 on an old kernel and gets syscall 1.
>
> What if we test against ~0x3ff and jump to sys_ni if anything is set.
> This would still work with new userspace calls on older kernels.

That would require a branch insn. The whole idea of masking
was merely to avoid that branch. If you need a branch,
then you can as well just retain current code.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Steven Rostedt
On Wed, 25 Feb 2015 10:20:43 +0100
Ingo Molnar  wrote:


> But, if we do that, we can do even better, and also do an 
> optimization of the 64-bit entry path as well: we could 
> simply mask RAX with 0x3ff and not do a compare. Pad the 
> syscall table up to 0x400 (1024) entries and fill in the 
> table with sys_ni syscall entries.
> 
> This is valid on 64-bit and 32-bit kernels as well, and it 
> allows the removal of a compare from the syscall entry 
> path, at the cost of a couple of kilobytes of unused 
> syscall table.
> 
> The downside would be that if we ever grow past 1024 
> syscall entries we'll be in trouble if new userspace calls 
> syscall 513 on an old kernel and gets syscall 1.

What if we test against ~0x3ff and jump to sys_ni if anything is set.
This would still work with new userspace calls on older kernels.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Ingo Molnar

* H. Peter Anvin  wrote:

> > So could we just zap the high 32 bits of RAX early in 
> > the entry code, and then from that point on we could 
> > both use 32-bit ops and won't have to remember the 
> > possibility either?
> 
> We do that, [...]

Ok, indeed, so in ia32_sysenter_target() we have:

movl%eax, %eax

> [...] but people keep "optimizing" the zero extend away. 
> [...]

Possibly because there's not a single comment near that 
code explaining the importance of that line. But nobody 
will get a change past me with such a warning next to the 
line.

> [...]  We have had this cause a wide-open security hole 
> twice already.  So the extra REX prefix is a cheap cost 
> to avoid this happen again.

But since we already zap the high bits, there's no point in 
doing 64-bit compares...

Just make sure the high zero bit clearing is there and is 
never removed.

So in that sense the changes are correct, even in the 
security robustness sense.

Furthermore, with the masking suggestion I made in the 
previous mail it's moot as we can solve both problems: 
64-bit uses of RAX will become correct as well, and it
will be a bit faster as well.

Hm?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread H. Peter Anvin

On 02/25/2015 01:20 AM, Ingo Molnar wrote:


I think the fundamental fragility is that we allow the high
32 bits to be nonzero.

So could we just zap the high 32 bits of RAX early in the
entry code, and then from that point on we could both use
32-bit ops and won't have to remember the possibility
either?



We do that, but people keep "optimizing" the zero extend away.  We have 
had this cause a wide-open security hole twice already.  So the extra 
REX prefix is a cheap cost to avoid this happen again.


-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Ingo Molnar

* H. Peter Anvin  wrote:

> On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
> > On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko  
> > wrote:
> >>
> >> In all three 32-bit entry points, %eax is 
> >> zero-extended to %rax. It is safe to do 32-bit compare 
> >> when checking that syscall# is not too large.
> > 
> > Applied.  Thanks!
> > 
> 
> NAK NAK NAK NAK NAK
> 
> We have already had this turn into a security issue not 
> just once but TWICE, because someone decided to 
> "optimize" the path by taking out the zero extend.
> 
> The use of a 64-bit compare here is an intentional "belts 
> and suspenders" safety issue.

I think the fundamental fragility is that we allow the high 
32 bits to be nonzero.

So could we just zap the high 32 bits of RAX early in the 
entry code, and then from that point on we could both use 
32-bit ops and won't have to remember the possibility 
either?

That's arguably one more (cheap) instruction in the 32-bit 
entry paths but then we could use the shorter 32-bit 
instructions for compares and other uses and could always 
be certain that we get what we want.


But, if we do that, we can do even better, and also do an 
optimization of the 64-bit entry path as well: we could 
simply mask RAX with 0x3ff and not do a compare. Pad the 
syscall table up to 0x400 (1024) entries and fill in the 
table with sys_ni syscall entries.

This is valid on 64-bit and 32-bit kernels as well, and it 
allows the removal of a compare from the syscall entry 
path, at the cost of a couple of kilobytes of unused 
syscall table.

The downside would be that if we ever grow past 1024 
syscall entries we'll be in trouble if new userspace calls 
syscall 513 on an old kernel and gets syscall 1.

I doubt we'll ever get so many syscalls, and user-space 
will be able to be smart in any case, so it's not a 
showstopper.

This is the safest and quickest implementation as well.

Thoughts?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Steven Rostedt
On Wed, 25 Feb 2015 10:20:43 +0100
Ingo Molnar mi...@kernel.org wrote:


 But, if we do that, we can do even better, and also do an 
 optimization of the 64-bit entry path as well: we could 
 simply mask RAX with 0x3ff and not do a compare. Pad the 
 syscall table up to 0x400 (1024) entries and fill in the 
 table with sys_ni syscall entries.
 
 This is valid on 64-bit and 32-bit kernels as well, and it 
 allows the removal of a compare from the syscall entry 
 path, at the cost of a couple of kilobytes of unused 
 syscall table.
 
 The downside would be that if we ever grow past 1024 
 syscall entries we'll be in trouble if new userspace calls 
 syscall 513 on an old kernel and gets syscall 1.

What if we test against ~0x3ff and jump to sys_ni if anything is set.
This would still work with new userspace calls on older kernels.

-- Steve

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Denys Vlasenko
On Wed, Feb 25, 2015 at 3:43 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Wed, 25 Feb 2015 10:20:43 +0100 Ingo Molnar mi...@kernel.org wrote:
 But, if we do that, we can do even better, and also do an
 optimization of the 64-bit entry path as well: we could
 simply mask RAX with 0x3ff and not do a compare. Pad the
 syscall table up to 0x400 (1024) entries and fill in the
 table with sys_ni syscall entries.

 This is valid on 64-bit and 32-bit kernels as well, and it
 allows the removal of a compare from the syscall entry
 path, at the cost of a couple of kilobytes of unused
 syscall table.

 The downside would be that if we ever grow past 1024
 syscall entries we'll be in trouble if new userspace calls
 syscall 513 on an old kernel and gets syscall 1.

 What if we test against ~0x3ff and jump to sys_ni if anything is set.
 This would still work with new userspace calls on older kernels.

That would require a branch insn. The whole idea of masking
was merely to avoid that branch. If you need a branch,
then you can as well just retain current code.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Borislav Petkov
On Wed, Feb 25, 2015 at 11:01:29AM -0500, Steven Rostedt wrote:
 I'm just curious, do all these micro optimizations have any real impact
 on real use cases?
 
 That is, if we are going to make the system less robust, shouldn't we
 show that it has real benefit?

I'm wondering the same thing this whole time. Is it even worth
the bother if those improvements don't show on any reasonable
benchmark...? And maybe we should benchmark stuff first and then
apply.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Steven Rostedt
On Wed, 25 Feb 2015 16:40:49 +0100
Denys Vlasenko vda.li...@googlemail.com wrote:

  The downside would be that if we ever grow past 1024
  syscall entries we'll be in trouble if new userspace calls
  syscall 513 on an old kernel and gets syscall 1.
 
  What if we test against ~0x3ff and jump to sys_ni if anything is set.
  This would still work with new userspace calls on older kernels.
 
 That would require a branch insn. The whole idea of masking
 was merely to avoid that branch. If you need a branch,
 then you can as well just retain current code.

I'm just curious, do all these micro optimizations have any real impact
on real use cases?

That is, if we are going to make the system less robust, shouldn't we
show that it has real benefit?

And yes, I consider user space passing in a system call of 0x401 and
having it work the same as 0x1 as being less robust.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Ingo Molnar

* H. Peter Anvin h...@zytor.com wrote:

  So could we just zap the high 32 bits of RAX early in 
  the entry code, and then from that point on we could 
  both use 32-bit ops and won't have to remember the 
  possibility either?
 
 We do that, [...]

Ok, indeed, so in ia32_sysenter_target() we have:

movl%eax, %eax

 [...] but people keep optimizing the zero extend away. 
 [...]

Possibly because there's not a single comment near that 
code explaining the importance of that line. But nobody 
will get a change past me with such a warning next to the 
line.

 [...]  We have had this cause a wide-open security hole 
 twice already.  So the extra REX prefix is a cheap cost 
 to avoid this happen again.

But since we already zap the high bits, there's no point in 
doing 64-bit compares...

Just make sure the high zero bit clearing is there and is 
never removed.

So in that sense the changes are correct, even in the 
security robustness sense.

Furthermore, with the masking suggestion I made in the 
previous mail it's moot as we can solve both problems: 
64-bit uses of RAX will become correct as well, and it
will be a bit faster as well.

Hm?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread Ingo Molnar

* H. Peter Anvin h...@zytor.com wrote:

 On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
  On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko dvlas...@redhat.com 
  wrote:
 
  In all three 32-bit entry points, %eax is 
  zero-extended to %rax. It is safe to do 32-bit compare 
  when checking that syscall# is not too large.
  
  Applied.  Thanks!
  
 
 NAK NAK NAK NAK NAK
 
 We have already had this turn into a security issue not 
 just once but TWICE, because someone decided to 
 optimize the path by taking out the zero extend.
 
 The use of a 64-bit compare here is an intentional belts 
 and suspenders safety issue.

I think the fundamental fragility is that we allow the high 
32 bits to be nonzero.

So could we just zap the high 32 bits of RAX early in the 
entry code, and then from that point on we could both use 
32-bit ops and won't have to remember the possibility 
either?

That's arguably one more (cheap) instruction in the 32-bit 
entry paths but then we could use the shorter 32-bit 
instructions for compares and other uses and could always 
be certain that we get what we want.


But, if we do that, we can do even better, and also do an 
optimization of the 64-bit entry path as well: we could 
simply mask RAX with 0x3ff and not do a compare. Pad the 
syscall table up to 0x400 (1024) entries and fill in the 
table with sys_ni syscall entries.

This is valid on 64-bit and 32-bit kernels as well, and it 
allows the removal of a compare from the syscall entry 
path, at the cost of a couple of kilobytes of unused 
syscall table.

The downside would be that if we ever grow past 1024 
syscall entries we'll be in trouble if new userspace calls 
syscall 513 on an old kernel and gets syscall 1.

I doubt we'll ever get so many syscalls, and user-space 
will be able to be smart in any case, so it's not a 
showstopper.

This is the safest and quickest implementation as well.

Thoughts?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-25 Thread H. Peter Anvin

On 02/25/2015 01:20 AM, Ingo Molnar wrote:


I think the fundamental fragility is that we allow the high
32 bits to be nonzero.

So could we just zap the high 32 bits of RAX early in the
entry code, and then from that point on we could both use
32-bit ops and won't have to remember the possibility
either?



We do that, but people keep optimizing the zero extend away.  We have 
had this cause a wide-open security hole twice already.  So the extra 
REX prefix is a cheap cost to avoid this happen again.


-hpa
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Denys Vlasenko
On Wed, Feb 25, 2015 at 12:03 AM, Andy Lutomirski  wrote:
>> Actually this part should have been broken up.  The word "several" in
>> the patch description is by itself a cause to NAK the patch.
>
> Point taken.
>
> Denys, can you fix this and send a v2 of the entire series with the
> traps.c fix as well?

Ok, working on it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Andy Lutomirski
On Tue, Feb 24, 2015 at 3:01 PM, H. Peter Anvin  wrote:
> On 02/24/2015 02:56 PM, Andy Lutomirski wrote:
>> On Tue, Feb 24, 2015 at 2:52 PM, H. Peter Anvin  wrote:
>>> On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
 On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko  
 wrote:
> In all three 32-bit entry points, %eax is zero-extended to %rax.
> It is safe to do 32-bit compare when checking that syscall#
> is not too large.

 Applied.  Thanks!

>>>
>>> NAK NAK NAK NAK NAK
>>>
>>> We have already had this turn into a security issue not just once but
>>> TWICE, because someone decided to "optimize" the path by taking out the
>>> zero extend.
>>>
>>> The use of a 64-bit compare here is an intentional "belts and
>>> suspenders" safety issue.
>>
>> Fair enough.  OK if I just undo that part of this patch?
>>
>
> Actually this part should have been broken up.  The word "several" in
> the patch description is by itself a cause to NAK the patch.

Point taken.

Denys, can you fix this and send a v2 of the entire series with the
traps.c fix as well?

Thanks,
Andy

>
> -hpa
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread H. Peter Anvin
On 02/24/2015 02:56 PM, Andy Lutomirski wrote:
> On Tue, Feb 24, 2015 at 2:52 PM, H. Peter Anvin  wrote:
>> On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
>>> On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko  
>>> wrote:
 In all three 32-bit entry points, %eax is zero-extended to %rax.
 It is safe to do 32-bit compare when checking that syscall#
 is not too large.
>>>
>>> Applied.  Thanks!
>>>
>>
>> NAK NAK NAK NAK NAK
>>
>> We have already had this turn into a security issue not just once but
>> TWICE, because someone decided to "optimize" the path by taking out the
>> zero extend.
>>
>> The use of a 64-bit compare here is an intentional "belts and
>> suspenders" safety issue.
> 
> Fair enough.  OK if I just undo that part of this patch?
> 

Actually this part should have been broken up.  The word "several" in
the patch description is by itself a cause to NAK the patch.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Andy Lutomirski
On Tue, Feb 24, 2015 at 2:52 PM, H. Peter Anvin  wrote:
> On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
>> On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko  wrote:
>>> In all three 32-bit entry points, %eax is zero-extended to %rax.
>>> It is safe to do 32-bit compare when checking that syscall#
>>> is not too large.
>>
>> Applied.  Thanks!
>>
>
> NAK NAK NAK NAK NAK
>
> We have already had this turn into a security issue not just once but
> TWICE, because someone decided to "optimize" the path by taking out the
> zero extend.
>
> The use of a 64-bit compare here is an intentional "belts and
> suspenders" safety issue.

Fair enough.  OK if I just undo that part of this patch?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread H. Peter Anvin
On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
> On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko  wrote:
>> In all three 32-bit entry points, %eax is zero-extended to %rax.
>> It is safe to do 32-bit compare when checking that syscall#
>> is not too large.
> 
> Applied.  Thanks!
> 

NAK NAK NAK NAK NAK

We have already had this turn into a security issue not just once but
TWICE, because someone decided to "optimize" the path by taking out the
zero extend.

The use of a 64-bit compare here is an intentional "belts and
suspenders" safety issue.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread H. Peter Anvin
On 02/24/2015 12:36 PM, Borislav Petkov wrote:
> On Tue, Feb 24, 2015 at 09:13:03PM +0100, Denys Vlasenko wrote:
>> They aren't equal.  $1 and $2 in two lowest bits will also
>> be interpreted as "userspace" here. "Equal to $3" sends
>> a wrong message here to a human reading the code,
>> the code doesn't test for CPL=3, it tests for any nonzero CPL.
> 
> Doh, of course.
> 
> I was going to propose to make it even more explicit:
> 
>   andb $3, CS(%rsp)...
> 
> but that's destructive.
> 
> So yours makes it less misleading, actually. To me at least and
> obviously.
> 

Yes, please use JZ/JNZ with TEST.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Andy Lutomirski
On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko  wrote:
> In all three 32-bit entry points, %eax is zero-extended to %rax.
> It is safe to do 32-bit compare when checking that syscall#
> is not too large.

Applied.  Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Borislav Petkov
On Tue, Feb 24, 2015 at 09:13:03PM +0100, Denys Vlasenko wrote:
> They aren't equal.  $1 and $2 in two lowest bits will also
> be interpreted as "userspace" here. "Equal to $3" sends
> a wrong message here to a human reading the code,
> the code doesn't test for CPL=3, it tests for any nonzero CPL.

Doh, of course.

I was going to propose to make it even more explicit:

andb $3, CS(%rsp)...

but that's destructive.

So yours makes it less misleading, actually. To me at least and
obviously.

:-)

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Denys Vlasenko
On Tue, Feb 24, 2015 at 8:58 PM, Borislav Petkov  wrote:
> On Tue, Feb 24, 2015 at 07:51:32PM +0100, Denys Vlasenko wrote:
>> In all three 32-bit entry points, %eax is zero-extended to %rax.
>> It is safe to do 32-bit compare when checking that syscall#
>> is not too large.
>>
>> The last instance of "mysterious" SS+8 constant is replaced by SIZEOF_PTREGS.
>>
>> The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
>> is a 32-bit constant, loading it with 64-bit MOV produces 10-byte insn
>> instead of 5-byte one.
>>
>> After TEST insn, JE anctually means "jump of zero",
>> let's use JZ mnemonic instead.
>
> Actually, JE == LZ as that's the same opcode for testing ZF=1.

Yes, I know that :)

> And I have to object:
>
> testl $3,CS(%rsp)
> je retint_kernel
>
> is much more understandable than
>
> testl $3,CS(%rsp)
> jz retint_kernel
>
> It basically says are $3 and CS(%rsp) equal.

They aren't equal.  $1 and $2 in two lowest bits will also
be interpreted as "userspace" here. "Equal to $3" sends
a wrong message here to a human reading the code,
the code doesn't test for CPL=3, it tests for any nonzero CPL.

> JZ, on the other hand, not so clear: the TEST ANDed the two operands and
> set flags accordingly, so JZ is testing the ZF. This means, you actually
> know what TEST does and you haven't forgotten.

JZ says "jump if zero", in this case, "jump if CPL is zero".
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Borislav Petkov
On Tue, Feb 24, 2015 at 07:51:32PM +0100, Denys Vlasenko wrote:
> In all three 32-bit entry points, %eax is zero-extended to %rax.
> It is safe to do 32-bit compare when checking that syscall#
> is not too large.
> 
> The last instance of "mysterious" SS+8 constant is replaced by SIZEOF_PTREGS.
> 
> The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
> is a 32-bit constant, loading it with 64-bit MOV produces 10-byte insn
> instead of 5-byte one.
> 
> After TEST insn, JE anctually means "jump of zero",
> let's use JZ mnemonic instead.

Actually, JE == LZ as that's the same opcode for testing ZF=1.

And I have to object:

testl $3,CS(%rsp)
je retint_kernel

is much more understandable than

testl $3,CS(%rsp)
jz retint_kernel

It basically says are $3 and CS(%rsp) equal.

JZ, on the other hand, not so clear: the TEST ANDed the two operands and
set flags accordingly, so JZ is testing the ZF. This means, you actually
know what TEST does and you haven't forgotten.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Denys Vlasenko
In all three 32-bit entry points, %eax is zero-extended to %rax.
It is safe to do 32-bit compare when checking that syscall#
is not too large.

The last instance of "mysterious" SS+8 constant is replaced by SIZEOF_PTREGS.

The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
is a 32-bit constant, loading it with 64-bit MOV produces 10-byte insn
instead of 5-byte one.

After TEST insn, JE anctually means "jump of zero",
let's use JZ mnemonic instead.

At irq_return_via_sysret:
* avoid redundant load of %r11 (it is already loaded a few instructions before).
* do not needlessly increment %rsp - we are going to return to userspace
  via SYSRET, this insn doesn't use stack for return.

Signed-off-by: Denys Vlasenko 
CC: Linus Torvalds 
CC: Steven Rostedt 
CC: Ingo Molnar 
CC: Borislav Petkov 
CC: "H. Peter Anvin" 
CC: Andy Lutomirski 
CC: Oleg Nesterov 
CC: Frederic Weisbecker 
CC: Alexei Starovoitov 
CC: Will Drewry 
CC: Kees Cook 
CC: x...@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/ia32/ia32entry.S  | 14 +++---
 arch/x86/include/asm/calling.h |  3 +++
 arch/x86/kernel/entry_64.S | 13 ++---
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 6dcd372..01eca80 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -163,8 +163,8 @@ sysenter_flags_fixed:
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
CFI_REMEMBER_STATE
-   jnz  sysenter_tracesys
-   cmpq$(IA32_NR_syscalls-1),%rax
+   jnz sysenter_tracesys
+   cmpl$(IA32_NR_syscalls-1),%eax
ja  ia32_badsys
 sysenter_do_call:
/* 32bit syscall -> 64bit C ABI argument conversion */
@@ -350,9 +350,9 @@ ENTRY(ia32_cstar_target)
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
CFI_REMEMBER_STATE
-   jnz   cstar_tracesys
-   cmpq $IA32_NR_syscalls-1,%rax
-   ja  ia32_badsys
+   jnz cstar_tracesys
+   cmpl$(IA32_NR_syscalls-1),%eax
+   ja  ia32_badsys
 cstar_do_call:
/* 32bit syscall -> 64bit C ABI argument conversion */
movl%edi,%r8d   /* arg5 */
@@ -473,7 +473,7 @@ ENTRY(ia32_syscall)
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
jnz ia32_tracesys
-   cmpq $(IA32_NR_syscalls-1),%rax
+   cmpl $(IA32_NR_syscalls-1),%eax
ja ia32_badsys
 ia32_do_call:
/* 32bit syscall -> 64bit C ABI argument conversion */
@@ -536,7 +536,7 @@ ia32_ptregs_common:
CFI_ENDPROC
CFI_STARTPROC32 simple
CFI_SIGNAL_FRAME
-   CFI_DEF_CFA rsp,SS+8
+   CFI_DEF_CFA rsp,SIZEOF_PTREGS
CFI_REL_OFFSET  rax,RAX
CFI_REL_OFFSET  rcx,RCX
CFI_REL_OFFSET  rdx,RDX
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index 3374235..f1a962f 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -176,6 +176,9 @@ For 32-bit we have the following conventions - kernel is 
built with
.macro RESTORE_C_REGS_EXCEPT_RCX
RESTORE_C_REGS_HELPER 1,0,1,1,1
.endm
+   .macro RESTORE_C_REGS_EXCEPT_R11
+   RESTORE_C_REGS_HELPER 1,1,0,1,1
+   .endm
.macro RESTORE_RSI_RDI
RESTORE_C_REGS_HELPER 0,0,0,0,0
.endm
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 64f2fd3..b93f3a2 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -312,7 +312,7 @@ int_ret_from_sys_call_fixup:
/* Do syscall tracing */
 tracesys:
movq %rsp, %rdi
-   movq $AUDIT_ARCH_X86_64, %rsi
+   movl $AUDIT_ARCH_X86_64, %esi
call syscall_trace_enter_phase1
test %rax, %rax
jnz tracesys_phase2 /* if needed, run the slow path */
@@ -323,7 +323,7 @@ tracesys_phase2:
SAVE_EXTRA_REGS
FIXUP_TOP_OF_STACK %rdi
movq %rsp, %rdi
-   movq $AUDIT_ARCH_X86_64, %rsi
+   movl $AUDIT_ARCH_X86_64, %esi
movq %rax,%rdx
call syscall_trace_enter_phase2
 
@@ -686,7 +686,7 @@ ret_from_intr:
 exit_intr:
GET_THREAD_INFO(%rcx)
testl $3,CS(%rsp)
-   je retint_kernel
+   jz retint_kernel
 
/* Interrupt came from user space */
/*
@@ -742,7 +742,7 @@ retint_swapgs:  /* return to user-space */
cmpq %r11,EFLAGS(%rsp)  /* R11 == RFLAGS */
jne opportunistic_sysret_failed
 
-   testq $X86_EFLAGS_RF,%r11   /* sysret can't restore RF */
+   testl $X86_EFLAGS_RF,%r11d  /* sysret can't restore RF */
jnz opportunistic_sysret_failed
 
/* nothing to check for RSP */
@@ -756,9 +756,8 @@ retint_swapgs:  /* return to user-space */

Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Borislav Petkov
On Tue, Feb 24, 2015 at 07:51:32PM +0100, Denys Vlasenko wrote:
 In all three 32-bit entry points, %eax is zero-extended to %rax.
 It is safe to do 32-bit compare when checking that syscall#
 is not too large.
 
 The last instance of mysterious SS+8 constant is replaced by SIZEOF_PTREGS.
 
 The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
 is a 32-bit constant, loading it with 64-bit MOV produces 10-byte insn
 instead of 5-byte one.
 
 After TEST insn, JE anctually means jump of zero,
 let's use JZ mnemonic instead.

Actually, JE == LZ as that's the same opcode for testing ZF=1.

And I have to object:

testl $3,CS(%rsp)
je retint_kernel

is much more understandable than

testl $3,CS(%rsp)
jz retint_kernel

It basically says are $3 and CS(%rsp) equal.

JZ, on the other hand, not so clear: the TEST ANDed the two operands and
set flags accordingly, so JZ is testing the ZF. This means, you actually
know what TEST does and you haven't forgotten.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Borislav Petkov
On Tue, Feb 24, 2015 at 09:13:03PM +0100, Denys Vlasenko wrote:
 They aren't equal.  $1 and $2 in two lowest bits will also
 be interpreted as userspace here. Equal to $3 sends
 a wrong message here to a human reading the code,
 the code doesn't test for CPL=3, it tests for any nonzero CPL.

Doh, of course.

I was going to propose to make it even more explicit:

andb $3, CS(%rsp)...

but that's destructive.

So yours makes it less misleading, actually. To me at least and
obviously.

:-)

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Denys Vlasenko
On Tue, Feb 24, 2015 at 8:58 PM, Borislav Petkov b...@alien8.de wrote:
 On Tue, Feb 24, 2015 at 07:51:32PM +0100, Denys Vlasenko wrote:
 In all three 32-bit entry points, %eax is zero-extended to %rax.
 It is safe to do 32-bit compare when checking that syscall#
 is not too large.

 The last instance of mysterious SS+8 constant is replaced by SIZEOF_PTREGS.

 The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
 is a 32-bit constant, loading it with 64-bit MOV produces 10-byte insn
 instead of 5-byte one.

 After TEST insn, JE anctually means jump of zero,
 let's use JZ mnemonic instead.

 Actually, JE == LZ as that's the same opcode for testing ZF=1.

Yes, I know that :)

 And I have to object:

 testl $3,CS(%rsp)
 je retint_kernel

 is much more understandable than

 testl $3,CS(%rsp)
 jz retint_kernel

 It basically says are $3 and CS(%rsp) equal.

They aren't equal.  $1 and $2 in two lowest bits will also
be interpreted as userspace here. Equal to $3 sends
a wrong message here to a human reading the code,
the code doesn't test for CPL=3, it tests for any nonzero CPL.

 JZ, on the other hand, not so clear: the TEST ANDed the two operands and
 set flags accordingly, so JZ is testing the ZF. This means, you actually
 know what TEST does and you haven't forgotten.

JZ says jump if zero, in this case, jump if CPL is zero.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Denys Vlasenko
In all three 32-bit entry points, %eax is zero-extended to %rax.
It is safe to do 32-bit compare when checking that syscall#
is not too large.

The last instance of mysterious SS+8 constant is replaced by SIZEOF_PTREGS.

The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
is a 32-bit constant, loading it with 64-bit MOV produces 10-byte insn
instead of 5-byte one.

After TEST insn, JE anctually means jump of zero,
let's use JZ mnemonic instead.

At irq_return_via_sysret:
* avoid redundant load of %r11 (it is already loaded a few instructions before).
* do not needlessly increment %rsp - we are going to return to userspace
  via SYSRET, this insn doesn't use stack for return.

Signed-off-by: Denys Vlasenko dvlas...@redhat.com
CC: Linus Torvalds torva...@linux-foundation.org
CC: Steven Rostedt rost...@goodmis.org
CC: Ingo Molnar mi...@kernel.org
CC: Borislav Petkov b...@alien8.de
CC: H. Peter Anvin h...@zytor.com
CC: Andy Lutomirski l...@amacapital.net
CC: Oleg Nesterov o...@redhat.com
CC: Frederic Weisbecker fweis...@gmail.com
CC: Alexei Starovoitov a...@plumgrid.com
CC: Will Drewry w...@chromium.org
CC: Kees Cook keesc...@chromium.org
CC: x...@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/ia32/ia32entry.S  | 14 +++---
 arch/x86/include/asm/calling.h |  3 +++
 arch/x86/kernel/entry_64.S | 13 ++---
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 6dcd372..01eca80 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -163,8 +163,8 @@ sysenter_flags_fixed:
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
CFI_REMEMBER_STATE
-   jnz  sysenter_tracesys
-   cmpq$(IA32_NR_syscalls-1),%rax
+   jnz sysenter_tracesys
+   cmpl$(IA32_NR_syscalls-1),%eax
ja  ia32_badsys
 sysenter_do_call:
/* 32bit syscall - 64bit C ABI argument conversion */
@@ -350,9 +350,9 @@ ENTRY(ia32_cstar_target)
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
CFI_REMEMBER_STATE
-   jnz   cstar_tracesys
-   cmpq $IA32_NR_syscalls-1,%rax
-   ja  ia32_badsys
+   jnz cstar_tracesys
+   cmpl$(IA32_NR_syscalls-1),%eax
+   ja  ia32_badsys
 cstar_do_call:
/* 32bit syscall - 64bit C ABI argument conversion */
movl%edi,%r8d   /* arg5 */
@@ -473,7 +473,7 @@ ENTRY(ia32_syscall)
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
jnz ia32_tracesys
-   cmpq $(IA32_NR_syscalls-1),%rax
+   cmpl $(IA32_NR_syscalls-1),%eax
ja ia32_badsys
 ia32_do_call:
/* 32bit syscall - 64bit C ABI argument conversion */
@@ -536,7 +536,7 @@ ia32_ptregs_common:
CFI_ENDPROC
CFI_STARTPROC32 simple
CFI_SIGNAL_FRAME
-   CFI_DEF_CFA rsp,SS+8
+   CFI_DEF_CFA rsp,SIZEOF_PTREGS
CFI_REL_OFFSET  rax,RAX
CFI_REL_OFFSET  rcx,RCX
CFI_REL_OFFSET  rdx,RDX
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index 3374235..f1a962f 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -176,6 +176,9 @@ For 32-bit we have the following conventions - kernel is 
built with
.macro RESTORE_C_REGS_EXCEPT_RCX
RESTORE_C_REGS_HELPER 1,0,1,1,1
.endm
+   .macro RESTORE_C_REGS_EXCEPT_R11
+   RESTORE_C_REGS_HELPER 1,1,0,1,1
+   .endm
.macro RESTORE_RSI_RDI
RESTORE_C_REGS_HELPER 0,0,0,0,0
.endm
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 64f2fd3..b93f3a2 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -312,7 +312,7 @@ int_ret_from_sys_call_fixup:
/* Do syscall tracing */
 tracesys:
movq %rsp, %rdi
-   movq $AUDIT_ARCH_X86_64, %rsi
+   movl $AUDIT_ARCH_X86_64, %esi
call syscall_trace_enter_phase1
test %rax, %rax
jnz tracesys_phase2 /* if needed, run the slow path */
@@ -323,7 +323,7 @@ tracesys_phase2:
SAVE_EXTRA_REGS
FIXUP_TOP_OF_STACK %rdi
movq %rsp, %rdi
-   movq $AUDIT_ARCH_X86_64, %rsi
+   movl $AUDIT_ARCH_X86_64, %esi
movq %rax,%rdx
call syscall_trace_enter_phase2
 
@@ -686,7 +686,7 @@ ret_from_intr:
 exit_intr:
GET_THREAD_INFO(%rcx)
testl $3,CS(%rsp)
-   je retint_kernel
+   jz retint_kernel
 
/* Interrupt came from user space */
/*
@@ -742,7 +742,7 @@ retint_swapgs:  /* return to user-space */
cmpq %r11,EFLAGS(%rsp)  /* R11 == RFLAGS */
jne opportunistic_sysret_failed
 
-   testq $X86_EFLAGS_RF,%r11   /* sysret can't restore RF */
+   testl 

Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread H. Peter Anvin
On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
 On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko dvlas...@redhat.com wrote:
 In all three 32-bit entry points, %eax is zero-extended to %rax.
 It is safe to do 32-bit compare when checking that syscall#
 is not too large.
 
 Applied.  Thanks!
 

NAK NAK NAK NAK NAK

We have already had this turn into a security issue not just once but
TWICE, because someone decided to optimize the path by taking out the
zero extend.

The use of a 64-bit compare here is an intentional belts and
suspenders safety issue.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Andy Lutomirski
On Tue, Feb 24, 2015 at 3:01 PM, H. Peter Anvin h...@zytor.com wrote:
 On 02/24/2015 02:56 PM, Andy Lutomirski wrote:
 On Tue, Feb 24, 2015 at 2:52 PM, H. Peter Anvin h...@zytor.com wrote:
 On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
 On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko dvlas...@redhat.com 
 wrote:
 In all three 32-bit entry points, %eax is zero-extended to %rax.
 It is safe to do 32-bit compare when checking that syscall#
 is not too large.

 Applied.  Thanks!


 NAK NAK NAK NAK NAK

 We have already had this turn into a security issue not just once but
 TWICE, because someone decided to optimize the path by taking out the
 zero extend.

 The use of a 64-bit compare here is an intentional belts and
 suspenders safety issue.

 Fair enough.  OK if I just undo that part of this patch?


 Actually this part should have been broken up.  The word several in
 the patch description is by itself a cause to NAK the patch.

Point taken.

Denys, can you fix this and send a v2 of the entire series with the
traps.c fix as well?

Thanks,
Andy


 -hpa





-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread H. Peter Anvin
On 02/24/2015 12:36 PM, Borislav Petkov wrote:
 On Tue, Feb 24, 2015 at 09:13:03PM +0100, Denys Vlasenko wrote:
 They aren't equal.  $1 and $2 in two lowest bits will also
 be interpreted as userspace here. Equal to $3 sends
 a wrong message here to a human reading the code,
 the code doesn't test for CPL=3, it tests for any nonzero CPL.
 
 Doh, of course.
 
 I was going to propose to make it even more explicit:
 
   andb $3, CS(%rsp)...
 
 but that's destructive.
 
 So yours makes it less misleading, actually. To me at least and
 obviously.
 

Yes, please use JZ/JNZ with TEST.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Andy Lutomirski
On Tue, Feb 24, 2015 at 2:52 PM, H. Peter Anvin h...@zytor.com wrote:
 On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
 On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko dvlas...@redhat.com wrote:
 In all three 32-bit entry points, %eax is zero-extended to %rax.
 It is safe to do 32-bit compare when checking that syscall#
 is not too large.

 Applied.  Thanks!


 NAK NAK NAK NAK NAK

 We have already had this turn into a security issue not just once but
 TWICE, because someone decided to optimize the path by taking out the
 zero extend.

 The use of a 64-bit compare here is an intentional belts and
 suspenders safety issue.

Fair enough.  OK if I just undo that part of this patch?

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Denys Vlasenko
On Wed, Feb 25, 2015 at 12:03 AM, Andy Lutomirski l...@amacapital.net wrote:
 Actually this part should have been broken up.  The word several in
 the patch description is by itself a cause to NAK the patch.

 Point taken.

 Denys, can you fix this and send a v2 of the entire series with the
 traps.c fix as well?

Ok, working on it.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread Andy Lutomirski
On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko dvlas...@redhat.com wrote:
 In all three 32-bit entry points, %eax is zero-extended to %rax.
 It is safe to do 32-bit compare when checking that syscall#
 is not too large.

Applied.  Thanks!
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns

2015-02-24 Thread H. Peter Anvin
On 02/24/2015 02:56 PM, Andy Lutomirski wrote:
 On Tue, Feb 24, 2015 at 2:52 PM, H. Peter Anvin h...@zytor.com wrote:
 On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
 On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko dvlas...@redhat.com 
 wrote:
 In all three 32-bit entry points, %eax is zero-extended to %rax.
 It is safe to do 32-bit compare when checking that syscall#
 is not too large.

 Applied.  Thanks!


 NAK NAK NAK NAK NAK

 We have already had this turn into a security issue not just once but
 TWICE, because someone decided to optimize the path by taking out the
 zero extend.

 The use of a 64-bit compare here is an intentional belts and
 suspenders safety issue.
 
 Fair enough.  OK if I just undo that part of this patch?
 

Actually this part should have been broken up.  The word several in
the patch description is by itself a cause to NAK the patch.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/