Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support

2017-05-29 Thread Boris Lukashev
If i understand the current direction for smalloc, its to implement it
without the ability to "unseal," which has implications on how LSM
implementations and other users of these dynamic allocations handle
things. If its implemented without a writeable interface for modules
which need it, then switched to a CoW (or other
controlled/isolated/hardened) mechanism for creating writeable
segments in dynamic allocations, that would create another set of
breaking changes for the consumers adapting to the first set. In the
spirit of adopting things faster/smoother, maybe it makes sense to
permit the smalloc re-writing method suggested in the first
implementation; with an API which consumers can later use for a more
secure approach that reduces the attack surface without breaking
consumer interfaces (or internal semantics). A sysctl affecting the
behavior as ro-only or ro-and-sometimes-rw would give users the
flexibility to tune their environment per their needs, which should
also reduce potential conflict/complaints.

On Sun, May 28, 2017 at 5:32 PM, Kees Cook  wrote:
> On Sun, May 28, 2017 at 11:56 AM, Boris Lukashev
>  wrote:
>> So what about a middle ground where CoW semantics are used to enforce
>> the state of these allocations as RO, but provide a strictly
>> controlled pathway to read the RO data, copy and modify it, then write
>> and seal into a new allocation. Successful return from this process
>> should permit the page table to change the pointer to where the object
>> now resides, and initiate freeing of the original memory so long as a
>> refcount is kept for accesses. That way, sealable memory is sealed,
>> and any consumers reading it will be using the original ptr to the
>> original smalloc region. Attackers who do manage to change the
>
> This could be another way to do it, yeah, and it helps that smalloc()
> is built on vmalloc(). It'd require some careful design, but it could
> be a way forward after this initial sealed-after-init version goes in.
>
>> Lastly, my meager understanding is that PAX set the entire kernel as
>> RO, and implemented writeable access via pax_open/close. How were they
>> fighting against race conditions, and what is the benefit of specific
>> regions being allocated this way as opposed to the RO-all-the-things
>> approach which makes writes a specialized set of operations?
>
> My understanding is that PaX's KERNEXEC with the constification plugin
> moves a substantial portion of the kernel's .data section
> (effectively) into the .rodata section. It's not the "entire" kernel.
> (Well, depending on how you count. The .text section is already
> read-only upstream.) PaX, as far as I know, provided no dynamic memory
> allocation protections, like smalloc() would provide.
>
> -Kees
>
> --
> Kees Cook
> Pixel Security



-- 
Boris Lukashev
Systems Architect
Semper Victus


Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support

2017-05-29 Thread Boris Lukashev
If i understand the current direction for smalloc, its to implement it
without the ability to "unseal," which has implications on how LSM
implementations and other users of these dynamic allocations handle
things. If its implemented without a writeable interface for modules
which need it, then switched to a CoW (or other
controlled/isolated/hardened) mechanism for creating writeable
segments in dynamic allocations, that would create another set of
breaking changes for the consumers adapting to the first set. In the
spirit of adopting things faster/smoother, maybe it makes sense to
permit the smalloc re-writing method suggested in the first
implementation; with an API which consumers can later use for a more
secure approach that reduces the attack surface without breaking
consumer interfaces (or internal semantics). A sysctl affecting the
behavior as ro-only or ro-and-sometimes-rw would give users the
flexibility to tune their environment per their needs, which should
also reduce potential conflict/complaints.

On Sun, May 28, 2017 at 5:32 PM, Kees Cook  wrote:
> On Sun, May 28, 2017 at 11:56 AM, Boris Lukashev
>  wrote:
>> So what about a middle ground where CoW semantics are used to enforce
>> the state of these allocations as RO, but provide a strictly
>> controlled pathway to read the RO data, copy and modify it, then write
>> and seal into a new allocation. Successful return from this process
>> should permit the page table to change the pointer to where the object
>> now resides, and initiate freeing of the original memory so long as a
>> refcount is kept for accesses. That way, sealable memory is sealed,
>> and any consumers reading it will be using the original ptr to the
>> original smalloc region. Attackers who do manage to change the
>
> This could be another way to do it, yeah, and it helps that smalloc()
> is built on vmalloc(). It'd require some careful design, but it could
> be a way forward after this initial sealed-after-init version goes in.
>
>> Lastly, my meager understanding is that PAX set the entire kernel as
>> RO, and implemented writeable access via pax_open/close. How were they
>> fighting against race conditions, and what is the benefit of specific
>> regions being allocated this way as opposed to the RO-all-the-things
>> approach which makes writes a specialized set of operations?
>
> My understanding is that PaX's KERNEXEC with the constification plugin
> moves a substantial portion of the kernel's .data section
> (effectively) into the .rodata section. It's not the "entire" kernel.
> (Well, depending on how you count. The .text section is already
> read-only upstream.) PaX, as far as I know, provided no dynamic memory
> allocation protections, like smalloc() would provide.
>
> -Kees
>
> --
> Kees Cook
> Pixel Security



-- 
Boris Lukashev
Systems Architect
Semper Victus


Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support

2017-05-28 Thread Kees Cook
On Sun, May 28, 2017 at 11:56 AM, Boris Lukashev
 wrote:
> So what about a middle ground where CoW semantics are used to enforce
> the state of these allocations as RO, but provide a strictly
> controlled pathway to read the RO data, copy and modify it, then write
> and seal into a new allocation. Successful return from this process
> should permit the page table to change the pointer to where the object
> now resides, and initiate freeing of the original memory so long as a
> refcount is kept for accesses. That way, sealable memory is sealed,
> and any consumers reading it will be using the original ptr to the
> original smalloc region. Attackers who do manage to change the

This could be another way to do it, yeah, and it helps that smalloc()
is built on vmalloc(). It'd require some careful design, but it could
be a way forward after this initial sealed-after-init version goes in.

> Lastly, my meager understanding is that PAX set the entire kernel as
> RO, and implemented writeable access via pax_open/close. How were they
> fighting against race conditions, and what is the benefit of specific
> regions being allocated this way as opposed to the RO-all-the-things
> approach which makes writes a specialized set of operations?

My understanding is that PaX's KERNEXEC with the constification plugin
moves a substantial portion of the kernel's .data section
(effectively) into the .rodata section. It's not the "entire" kernel.
(Well, depending on how you count. The .text section is already
read-only upstream.) PaX, as far as I know, provided no dynamic memory
allocation protections, like smalloc() would provide.

-Kees

-- 
Kees Cook
Pixel Security


Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support

2017-05-28 Thread Kees Cook
On Sun, May 28, 2017 at 11:56 AM, Boris Lukashev
 wrote:
> So what about a middle ground where CoW semantics are used to enforce
> the state of these allocations as RO, but provide a strictly
> controlled pathway to read the RO data, copy and modify it, then write
> and seal into a new allocation. Successful return from this process
> should permit the page table to change the pointer to where the object
> now resides, and initiate freeing of the original memory so long as a
> refcount is kept for accesses. That way, sealable memory is sealed,
> and any consumers reading it will be using the original ptr to the
> original smalloc region. Attackers who do manage to change the

This could be another way to do it, yeah, and it helps that smalloc()
is built on vmalloc(). It'd require some careful design, but it could
be a way forward after this initial sealed-after-init version goes in.

> Lastly, my meager understanding is that PAX set the entire kernel as
> RO, and implemented writeable access via pax_open/close. How were they
> fighting against race conditions, and what is the benefit of specific
> regions being allocated this way as opposed to the RO-all-the-things
> approach which makes writes a specialized set of operations?

My understanding is that PaX's KERNEXEC with the constification plugin
moves a substantial portion of the kernel's .data section
(effectively) into the .rodata section. It's not the "entire" kernel.
(Well, depending on how you count. The .text section is already
read-only upstream.) PaX, as far as I know, provided no dynamic memory
allocation protections, like smalloc() would provide.

-Kees

-- 
Kees Cook
Pixel Security


Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support

2017-05-28 Thread Boris Lukashev
One-time sealable memory makes the most sense from a defensive
perspective - red team reads this stuff, the races mentioned will be
implemented as described to win the day, and probably in other
innovative ways. If a gap is left in the implementation, without
explicit coverage by an adjacent function, it will be used no matter
how small the chances of it occurring in the real world are - grooming
systems to create unlikely conditions is fair play (look at
eternalblue's SMB pool machinations).
However, out of tree modules will likely not appreciate this - third
party LSMs, tpe module, SCST, etc. I dont want to get into the "NIH"
debate, they're real functional components, used by real companies,
often enough that they need to be considered a member of the
ecosystem, even if not a first-order member.
So what about a middle ground where CoW semantics are used to enforce
the state of these allocations as RO, but provide a strictly
controlled pathway to read the RO data, copy and modify it, then write
and seal into a new allocation. Successful return from this process
should permit the page table to change the pointer to where the object
now resides, and initiate freeing of the original memory so long as a
refcount is kept for accesses. That way, sealable memory is sealed,
and any consumers reading it will be using the original ptr to the
original smalloc region. Attackers who do manage to change the
allocation by writing a new one still have to figure out how to change
the region used by an existing consumer. New consumers will get hit by
whatever they changed, but the change will be tracked and require them
to gain execution control over the allocator itself in order to affect
the change (or ROP chain something else into doing it, but thats's a
discussion on RAP/CFI).
CPU-local ops are great, if they dont halt the other cores. Stopping
all other CPUs is going to be DoA in HPC and other CPU intensive
workloads - think what ZFS would do if its pipelines kept getting
halted by something running a lot of smallocs (they get
non-preemptible often enough, requiring waits on both sides of the
op), how how LIO would behave - iSCSI waits for no man or woman, or
their allocator strategies. I'm all for "security should be more of a
concern than performance - its easier to build a faster car later if
you're still alive to do it," but keeping in mind who the consumers
are, i can easily see this functionality staying disabled in most
distributions and thus receiving much less testing and beatdown than
it should.
Lastly, my meager understanding is that PAX set the entire kernel as
RO, and implemented writeable access via pax_open/close. How were they
fighting against race conditions, and what is the benefit of specific
regions being allocated this way as opposed to the RO-all-the-things
approach which makes writes a specialized set of operations?

On Sun, May 28, 2017 at 2:23 PM, Kees Cook  wrote:
> On Wed, May 24, 2017 at 10:45 AM, Igor Stoppa  wrote:
>> On 23/05/17 23:11, Kees Cook wrote:
>>> On Tue, May 23, 2017 at 2:43 AM, Igor Stoppa  wrote:
>>> I meant this:
>>>
>>> CPU 1 CPU 2
>>> create
>>> alloc
>>> write
>>> seal
>>> ...
>>> unseal
>>> write
>>> write
>>> seal
>>>
>>> The CPU 2 write would be, for example, an attacker using a
>>> vulnerability to attempt to write to memory in the sealed area. All it
>>> would need to do to succeed would be to trigger an action in the
>>> kernel that would do a "legitimate" write (which requires the unseal),
>>> and race it. Unsealing should be CPU-local, if the API is going to
>>> support this kind of access.
>>
>> I see.
>> If the CPU1 were to forcibly halt anything that can race with it, then
>> it would be sure that there was no interference.
>
> Correct. This is actually what ARM does for doing kernel memory
> writing when poking stuff for kprobes, etc. It's rather dramatic,
> though. :)
>
>> A reactive approach could be, instead, to re-validate the content after
>> the sealing, assuming that it is possible.
>
> I would prefer to avoid this, as that allows an attacker to still have
> made the changes (which could even result in them then disabling the
> re-validation during the attack).
>
>>> I am more concerned about _any_ unseal after initial seal. And even
>>> then, it'd be nice to keep things CPU-local. My concerns are related
>>> to the write-rarely proposal (https://lkml.org/lkml/2017/3/29/704)
>>> which is kind of like this, but focused on the .data section, not
>>> dynamic memory. It has similar concerns about CPU-locality.
>>> Additionally, even writing to memory and then making it read-only
>>> later runs risks (see threads about BPF JIT races vs making things
>>> read-only: https://patchwork.kernel.org/patch/9662653/ Alexei's NAK
>>> doesn't change the risk this series is fixing: races with attacker
>>> writes during assignment but before read-only marking).
>>
>> If you are 

Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support

2017-05-28 Thread Boris Lukashev
One-time sealable memory makes the most sense from a defensive
perspective - red team reads this stuff, the races mentioned will be
implemented as described to win the day, and probably in other
innovative ways. If a gap is left in the implementation, without
explicit coverage by an adjacent function, it will be used no matter
how small the chances of it occurring in the real world are - grooming
systems to create unlikely conditions is fair play (look at
eternalblue's SMB pool machinations).
However, out of tree modules will likely not appreciate this - third
party LSMs, tpe module, SCST, etc. I dont want to get into the "NIH"
debate, they're real functional components, used by real companies,
often enough that they need to be considered a member of the
ecosystem, even if not a first-order member.
So what about a middle ground where CoW semantics are used to enforce
the state of these allocations as RO, but provide a strictly
controlled pathway to read the RO data, copy and modify it, then write
and seal into a new allocation. Successful return from this process
should permit the page table to change the pointer to where the object
now resides, and initiate freeing of the original memory so long as a
refcount is kept for accesses. That way, sealable memory is sealed,
and any consumers reading it will be using the original ptr to the
original smalloc region. Attackers who do manage to change the
allocation by writing a new one still have to figure out how to change
the region used by an existing consumer. New consumers will get hit by
whatever they changed, but the change will be tracked and require them
to gain execution control over the allocator itself in order to affect
the change (or ROP chain something else into doing it, but thats's a
discussion on RAP/CFI).
CPU-local ops are great, if they dont halt the other cores. Stopping
all other CPUs is going to be DoA in HPC and other CPU intensive
workloads - think what ZFS would do if its pipelines kept getting
halted by something running a lot of smallocs (they get
non-preemptible often enough, requiring waits on both sides of the
op), how how LIO would behave - iSCSI waits for no man or woman, or
their allocator strategies. I'm all for "security should be more of a
concern than performance - its easier to build a faster car later if
you're still alive to do it," but keeping in mind who the consumers
are, i can easily see this functionality staying disabled in most
distributions and thus receiving much less testing and beatdown than
it should.
Lastly, my meager understanding is that PAX set the entire kernel as
RO, and implemented writeable access via pax_open/close. How were they
fighting against race conditions, and what is the benefit of specific
regions being allocated this way as opposed to the RO-all-the-things
approach which makes writes a specialized set of operations?

On Sun, May 28, 2017 at 2:23 PM, Kees Cook  wrote:
> On Wed, May 24, 2017 at 10:45 AM, Igor Stoppa  wrote:
>> On 23/05/17 23:11, Kees Cook wrote:
>>> On Tue, May 23, 2017 at 2:43 AM, Igor Stoppa  wrote:
>>> I meant this:
>>>
>>> CPU 1 CPU 2
>>> create
>>> alloc
>>> write
>>> seal
>>> ...
>>> unseal
>>> write
>>> write
>>> seal
>>>
>>> The CPU 2 write would be, for example, an attacker using a
>>> vulnerability to attempt to write to memory in the sealed area. All it
>>> would need to do to succeed would be to trigger an action in the
>>> kernel that would do a "legitimate" write (which requires the unseal),
>>> and race it. Unsealing should be CPU-local, if the API is going to
>>> support this kind of access.
>>
>> I see.
>> If the CPU1 were to forcibly halt anything that can race with it, then
>> it would be sure that there was no interference.
>
> Correct. This is actually what ARM does for doing kernel memory
> writing when poking stuff for kprobes, etc. It's rather dramatic,
> though. :)
>
>> A reactive approach could be, instead, to re-validate the content after
>> the sealing, assuming that it is possible.
>
> I would prefer to avoid this, as that allows an attacker to still have
> made the changes (which could even result in them then disabling the
> re-validation during the attack).
>
>>> I am more concerned about _any_ unseal after initial seal. And even
>>> then, it'd be nice to keep things CPU-local. My concerns are related
>>> to the write-rarely proposal (https://lkml.org/lkml/2017/3/29/704)
>>> which is kind of like this, but focused on the .data section, not
>>> dynamic memory. It has similar concerns about CPU-locality.
>>> Additionally, even writing to memory and then making it read-only
>>> later runs risks (see threads about BPF JIT races vs making things
>>> read-only: https://patchwork.kernel.org/patch/9662653/ Alexei's NAK
>>> doesn't change the risk this series is fixing: races with attacker
>>> writes during assignment but before read-only marking).
>>
>> If you are talking about an attacker, rather than protection against
>> accidental