Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support
If I understand the current direction for smalloc, it's to implement it without the ability to "unseal," which has implications for how LSM implementations and other users of these dynamic allocations handle things. If it's implemented without a writeable interface for the modules that need one, and later switched to a CoW (or other controlled/isolated/hardened) mechanism for creating writeable segments in dynamic allocations, that would create a second set of breaking changes for consumers who have only just adapted to the first.

In the spirit of faster, smoother adoption, maybe it makes sense to permit the smalloc re-writing method suggested in the first implementation, behind an API that consumers can keep using when a more secure approach that reduces the attack surface is introduced later, without breaking consumer interfaces (or internal semantics). A sysctl selecting ro-only or ro-and-sometimes-rw behavior would give users the flexibility to tune their environment to their needs, which should also reduce potential conflict/complaints.

On Sun, May 28, 2017 at 5:32 PM, Kees Cook wrote:
> On Sun, May 28, 2017 at 11:56 AM, Boris Lukashev wrote:
>> So what about a middle ground where CoW semantics are used to enforce
>> the state of these allocations as RO, but provide a strictly
>> controlled pathway to read the RO data, copy and modify it, then write
>> and seal into a new allocation. Successful return from this process
>> should permit the page table to change the pointer to where the object
>> now resides, and initiate freeing of the original memory so long as a
>> refcount is kept for accesses. That way, sealable memory is sealed,
>> and any consumers reading it will be using the original ptr to the
>> original smalloc region. Attackers who do manage to change the
>
> This could be another way to do it, yeah, and it helps that smalloc()
> is built on vmalloc(). It'd require some careful design, but it could
> be a way forward after this initial sealed-after-init version goes in.
>
>> Lastly, my meager understanding is that PAX set the entire kernel as
>> RO, and implemented writeable access via pax_open/close. How were they
>> fighting against race conditions, and what is the benefit of specific
>> regions being allocated this way as opposed to the RO-all-the-things
>> approach which makes writes a specialized set of operations?
>
> My understanding is that PaX's KERNEXEC with the constification plugin
> moves a substantial portion of the kernel's .data section
> (effectively) into the .rodata section. It's not the "entire" kernel.
> (Well, depending on how you count. The .text section is already
> read-only upstream.) PaX, as far as I know, provided no dynamic memory
> allocation protections, like smalloc() would provide.
>
> -Kees
>
> --
> Kees Cook
> Pixel Security

--
Boris Lukashev
Systems Architect
Semper Victus
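A rough userspace analog of the "ro-only vs ro-and-sometimes-rw" switch, for illustration only: consumers always call one update function, and a global policy (a hypothetical stand-in for the proposed sysctl) decides whether an unseal is permitted. POSIX `mmap()`/`mprotect()` stand in for smalloc's pages, and the names `seal_policy`, `region_seal()`, `region_unseal()` and `region_update()` are assumptions, not part of the posted patch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for the proposed sysctl. */
enum seal_mode { SEAL_RO_ONLY, SEAL_RO_SOMETIMES_RW };
static enum seal_mode seal_policy = SEAL_RO_ONLY;

struct sealed_region {
    unsigned char *base;
    size_t len;      /* rounded up to whole pages */
    int sealed;
};

static int region_create(struct sealed_region *r, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p;

    r->len = (len + page - 1) / page * page;
    p = mmap(NULL, r->len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -errno;
    r->base = p;
    r->sealed = 0;
    return 0;
}

static int region_seal(struct sealed_region *r)
{
    if (mprotect(r->base, r->len, PROT_READ) != 0)
        return -errno;
    r->sealed = 1;
    return 0;
}

static int region_unseal(struct sealed_region *r)
{
    if (seal_policy == SEAL_RO_ONLY)
        return -EPERM;           /* sealing is one-way in this mode */
    if (mprotect(r->base, r->len, PROT_READ | PROT_WRITE) != 0)
        return -errno;
    r->sealed = 0;
    return 0;
}

/* The one entry point consumers use, whichever mode is active. */
static int region_update(struct sealed_region *r, size_t off,
                         const void *src, size_t n)
{
    int was_sealed = r->sealed;
    int err;

    if (off > r->len || n > r->len - off)
        return -EINVAL;
    if (was_sealed && (err = region_unseal(r)) != 0)
        return err;
    memcpy(r->base + off, src, n);   /* the race window is open here */
    return was_sealed ? region_seal(r) : 0;
}
```

The consumer-facing call stays the same in both modes, which is the point of the proposal; a later, safer backend could replace the body of `region_update()` without touching callers. Note that in the permissive mode the unseal reopens exactly the window Kees objects to elsewhere in the thread, since `mprotect()` changes the mapping for every thread in the process, not only the caller.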
Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support
On Sun, May 28, 2017 at 11:56 AM, Boris Lukashev wrote:
> So what about a middle ground where CoW semantics are used to enforce
> the state of these allocations as RO, but provide a strictly
> controlled pathway to read the RO data, copy and modify it, then write
> and seal into a new allocation. Successful return from this process
> should permit the page table to change the pointer to where the object
> now resides, and initiate freeing of the original memory so long as a
> refcount is kept for accesses. That way, sealable memory is sealed,
> and any consumers reading it will be using the original ptr to the
> original smalloc region. Attackers who do manage to change the

This could be another way to do it, yeah, and it helps that smalloc()
is built on vmalloc(). It'd require some careful design, but it could
be a way forward after this initial sealed-after-init version goes in.

> Lastly, my meager understanding is that PAX set the entire kernel as
> RO, and implemented writeable access via pax_open/close. How were they
> fighting against race conditions, and what is the benefit of specific
> regions being allocated this way as opposed to the RO-all-the-things
> approach which makes writes a specialized set of operations?

My understanding is that PaX's KERNEXEC with the constification plugin
moves a substantial portion of the kernel's .data section
(effectively) into the .rodata section. It's not the "entire" kernel.
(Well, depending on how you count. The .text section is already
read-only upstream.) PaX, as far as I know, provided no dynamic memory
allocation protections, like smalloc() would provide.

-Kees

--
Kees Cook
Pixel Security
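To make the CoW idea concrete, here is a minimal userspace sketch of the read / copy / modify / seal / swap sequence, assuming POSIX `mmap()`/`mprotect()` as a stand-in for vmalloc-backed smalloc pages. All names (`cow_version`, `cow_get()`, `cow_put()`, `cow_update()`) are hypothetical. One deliberate departure from the proposal as written: instead of the page table changing the pointer, it swaps a pointer in a small writable descriptor under a spinlock, and it keeps each refcount outside the sealed pages, since a refcount inside them could not change without unsealing.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writable descriptor for one sealed version; the payload is PROT_READ. */
struct cow_version {
    const unsigned char *data;
    size_t len, maplen;
    atomic_int refs;
};

struct cow_object {
    atomic_int lock;             /* tiny spinlock guarding 'cur' */
    struct cow_version *cur;     /* the currently published version */
};

static void obj_lock(struct cow_object *o)
{
    while (atomic_exchange_explicit(&o->lock, 1, memory_order_acquire)) {}
}

static void obj_unlock(struct cow_object *o)
{
    atomic_store_explicit(&o->lock, 0, memory_order_release);
}

/* Copy src into a fresh mapping and seal it before anyone can see it. */
static struct cow_version *version_new(const void *src, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t maplen = (len + page - 1) / page * page;
    struct cow_version *v = malloc(sizeof(*v));
    if (!v)
        return NULL;
    void *p = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        free(v);
        return NULL;
    }
    memcpy(p, src, len);
    if (mprotect(p, maplen, PROT_READ) != 0) {
        munmap(p, maplen);
        free(v);
        return NULL;
    }
    v->data = p;
    v->len = len;
    v->maplen = maplen;
    atomic_init(&v->refs, 1);    /* reference held by "being published" */
    return v;
}

static int cow_init(struct cow_object *o, const void *src, size_t len)
{
    atomic_init(&o->lock, 0);
    o->cur = version_new(src, len);
    return o->cur ? 0 : -1;
}

/* Readers pin a version; it stays mapped until they cow_put() it. */
static struct cow_version *cow_get(struct cow_object *o)
{
    obj_lock(o);
    struct cow_version *v = o->cur;
    atomic_fetch_add(&v->refs, 1);
    obj_unlock(o);
    return v;
}

static void cow_put(struct cow_version *v)
{
    if (atomic_fetch_sub(&v->refs, 1) == 1) {   /* last reference gone */
        munmap((void *)v->data, v->maplen);
        free(v);
    }
}

/* Read, copy, modify, seal, publish. Assumes updaters are serialized. */
static int cow_update(struct cow_object *o, size_t off,
                      const void *patch, size_t n)
{
    struct cow_version *old = cow_get(o), *nv = NULL;
    unsigned char *tmp = malloc(old->len);
    if (tmp && off <= old->len && n <= old->len - off) {
        memcpy(tmp, old->data, old->len);   /* read + copy */
        memcpy(tmp + off, patch, n);        /* modify the private copy */
        nv = version_new(tmp, old->len);    /* write + seal */
    }
    free(tmp);
    cow_put(old);
    if (!nv)
        return -1;
    obj_lock(o);
    old = o->cur;                           /* swap the published pointer */
    o->cur = nv;
    obj_unlock(o);
    cow_put(old);            /* existing readers still hold their refs */
    return 0;
}
```

A reader that pinned the old version keeps seeing the old, still-sealed bytes until it drops its reference, while new readers get the patched copy. The writable heap copy `tmp` is still a window of the "write before read-only marking" kind Kees mentions for the BPF JIT, so this moves the race rather than removing it.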
Re: [kernel-hardening] Re: [PATCH 1/1] Sealable memory support
One-time sealable memory makes the most sense from a defensive perspective - red team reads this stuff, and the races mentioned will be implemented as described to win the day, and probably in other innovative ways too. If a gap is left in the implementation, without explicit coverage by an adjacent function, it will be used no matter how small the chances of it occurring in the real world are - grooming systems to create unlikely conditions is fair play (look at EternalBlue's SMB pool machinations). However, out-of-tree modules will likely not appreciate this - third-party LSMs, the tpe module, SCST, etc. I don't want to get into the "NIH" debate; they're real, functional components, used by real companies often enough that they need to be considered members of the ecosystem, even if not first-order members.

So what about a middle ground where CoW semantics are used to enforce the state of these allocations as RO, but provide a strictly controlled pathway to read the RO data, copy and modify it, then write and seal into a new allocation. Successful return from this process should permit the page table to change the pointer to where the object now resides, and initiate freeing of the original memory so long as a refcount is kept for accesses. That way, sealable memory is sealed, and any consumers reading it will be using the original ptr to the original smalloc region. Attackers who do manage to change the allocation by writing a new one still have to figure out how to change the region used by an existing consumer. New consumers will get hit by whatever they changed, but the change will be tracked, and effecting it will require them to gain execution control over the allocator itself (or to ROP-chain something else into doing it, but that's a discussion on RAP/CFI).

CPU-local ops are great, as long as they don't halt the other cores. Stopping all other CPUs is going to be DoA in HPC and other CPU-intensive workloads - think what ZFS would do if its pipelines kept getting halted by something running a lot of smallocs (they go non-preemptible often enough, requiring waits on both sides of the op), or how LIO would behave (iSCSI waits for no man or woman), or how their allocator strategies would fare. I'm all for "security should be more of a concern than performance - it's easier to build a faster car later if you're still alive to do it," but keeping in mind who the consumers are, I can easily see this functionality staying disabled in most distributions, and thus receiving much less testing and beatdown than it should.

Lastly, my meager understanding is that PAX set the entire kernel as RO, and implemented writeable access via pax_open/close. How were they fighting against race conditions, and what is the benefit of specific regions being allocated this way as opposed to the RO-all-the-things approach which makes writes a specialized set of operations?

On Sun, May 28, 2017 at 2:23 PM, Kees Cook wrote:
> On Wed, May 24, 2017 at 10:45 AM, Igor Stoppa wrote:
>> On 23/05/17 23:11, Kees Cook wrote:
>>> On Tue, May 23, 2017 at 2:43 AM, Igor Stoppa wrote:
>>> I meant this:
>>>
>>>     CPU 1     CPU 2
>>>     create
>>>     alloc
>>>     write
>>>     seal
>>>     ...
>>>     unseal
>>>               write
>>>     write
>>>     seal
>>>
>>> The CPU 2 write would be, for example, an attacker using a
>>> vulnerability to attempt to write to memory in the sealed area. All it
>>> would need to do to succeed would be to trigger an action in the
>>> kernel that would do a "legitimate" write (which requires the unseal),
>>> and race it. Unsealing should be CPU-local, if the API is going to
>>> support this kind of access.
>>
>> I see.
>> If the CPU1 were to forcibly halt anything that can race with it, then
>> it would be sure that there was no interference.
>
> Correct. This is actually what ARM does for doing kernel memory
> writing when poking stuff for kprobes, etc. It's rather dramatic,
> though. :)
>
>> A reactive approach could be, instead, to re-validate the content after
>> the sealing, assuming that it is possible.
>
> I would prefer to avoid this, as that allows an attacker to still have
> made the changes (which could even result in them then disabling the
> re-validation during the attack).
>
>>> I am more concerned about _any_ unseal after initial seal. And even
>>> then, it'd be nice to keep things CPU-local. My concerns are related
>>> to the write-rarely proposal (https://lkml.org/lkml/2017/3/29/704)
>>> which is kind of like this, but focused on the .data section, not
>>> dynamic memory. It has similar concerns about CPU-locality.
>>> Additionally, even writing to memory and then making it read-only
>>> later runs risks (see threads about BPF JIT races vs making things
>>> read-only: https://patchwork.kernel.org/patch/9662653/ Alexei's NAK
>>> doesn't change the risk this series is fixing: races with attacker
>>> writes during assignment but before read-only marking).
>>
>> If you are talking about an attacker, rather than protection against
>> accidental
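The race in Kees's CPU 1 / CPU 2 diagram can be reproduced in userspace with a thread playing the second CPU. This is a sketch under stated assumptions: `mprotect()` on one anonymous page stands in for sealing a smalloc pool, the interleaving is forced with `pthread_join()` rather than won by timing, and `pool`, `stray_writer()` and `legit_update_with_race()` are illustrative names only. Older toolchains need `-pthread` to build it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* One page standing in for a sealed smalloc pool. */
static char *pool;
static size_t pool_len;

/* create, alloc, write, seal */
static int pool_setup(void)
{
    pool_len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, pool_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    pool = p;
    memcpy(pool, "ab", 3);
    return mprotect(pool, pool_len, PROT_READ);
}

/* "CPU 2": an attacker with a plain write primitive. It never calls
 * the unseal path; it only needs its write to land inside the window. */
static void *stray_writer(void *arg)
{
    (void)arg;
    pool[0] = 'X';
    return NULL;
}

/* "CPU 1": a legitimate unseal / write / seal sequence. */
static int legit_update_with_race(void)
{
    pthread_t cpu2;

    if (mprotect(pool, pool_len, PROT_READ | PROT_WRITE) != 0)  /* unseal */
        return -1;
    /* The protection change applies to the whole address space, so the
     * window is open for every thread, not just this one. */
    if (pthread_create(&cpu2, NULL, stray_writer, NULL) == 0)
        pthread_join(cpu2, NULL);                   /* CPU 2: write */
    pool[1] = 'L';                                  /* CPU 1: write */
    return mprotect(pool, pool_len, PROT_READ);     /* seal */
}
```

After the reseal, the page holds both the legitimate `'L'` and the stray `'X'`. A CPU-local unseal, as Kees asks for, would mean the writable view exists only for the writing CPU, which this userspace sketch has no way to model. Re-validating the contents after the reseal, as Igor suggests, could at best notice the `'X'` after it had already been written, which is the substance of Kees's objection.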