On Fri, Apr 29, 2016 at 12:18 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Kees Cook <keesc...@chromium.org> wrote:
>
>> From: Yinghai Lu <ying...@kernel.org>
>>
>> This change makes later calculations about where the kernel is located
>> easier to reason about. To better understand this change, we must first
>> clarify what VO and ZO are. They were introduced in commits by hpa:
>>
>> 77d1a49 x86, boot: make symbols from the main vmlinux available
>> 37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields
>>
>> Specifically:
>>
>> VO:
>> - uncompressed kernel image
>> - size: VO__end - VO__text ("VO_INIT_SIZE" define)
>>
>> ZO:
>> - bootable compressed kernel image (boot/compressed/vmlinux)
>> - head text + compressed kernel (VO and relocs table) + decompressor code
>> - size: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
>>
>> The INIT_SIZE definition is used to find the larger of the two image sizes:
>>
>>  #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>>  #define VO_INIT_SIZE    (VO__end - VO__text)
>>  #if ZO_INIT_SIZE > VO_INIT_SIZE
>>  #define INIT_SIZE ZO_INIT_SIZE
>>  #else
>>  #define INIT_SIZE VO_INIT_SIZE
>>  #endif
>>
>> The current code uses extract_offset to decide where to position the
>> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
>> currently includes the extract_offset.)
>
> Yeah, so I rewrote the above to:
>
> =================>
> This change makes later calculations about where the kernel is located
> easier to reason about. To better understand this change, we must first
> clarify what 'VO' and 'ZO' are. These values were introduced in commits
> by hpa:
>
>   77d1a4999502 ("x86, boot: make symbols from the main vmlinux available")
>   37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
>
> Specifically:
>
> All names prefixed with 'VO_':
>
>  - relate to the uncompressed kernel image
>
>  - the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)
>
> All names prefixed with 'ZO_':
>
>  - relate to the bootable compressed kernel image (boot/compressed/vmlinux),
>    which is composed of the following memory areas:
>      - head text
>      - compressed kernel (VO image and relocs table)
>      - decompressor code
>
>  - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE"
>    define, though see below)
>
> The 'INIT_SIZE' value is used to find the larger of the two image sizes:
>
>  #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>  #define VO_INIT_SIZE    (VO__end - VO__text)
>
>  #if ZO_INIT_SIZE > VO_INIT_SIZE
>  # define INIT_SIZE ZO_INIT_SIZE
>  #else
>  # define INIT_SIZE VO_INIT_SIZE
>  #endif
>
> The current code uses extract_offset to decide where to position the
> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
> currently includes the extract_offset.)
> <=================
>
> Assuming the edits I made are correct, this is the point where the changelog
> lost me. It does not explain why ZO_z_extract_offset exists. Why isn't the ZO
> copied to offset 0?
>
> I had to go into arch/x86/boot/compressed/mkpiggy.c, where
> ZO_z_extract_offset is generated, to find the answer: it's needed because we
> are trying to minimize the amount of RAM used for the whole act of creating an
> uncompressed, executable, properly relocation-linked kernel image in system
> memory. We do this so that kernels can be booted on even very small systems.
>
> To achieve the goal of minimal memory consumption we have implemented an
> in-place decompression strategy: instead of cleanly separating the VO and ZO
> images and also allocating some memory for the decompression code's runtime
> needs, we instead create this elaborate layout of memory buffers where the
> output (decompressed) stream, as it progresses, overlaps with and destroys the
> input (compressed) stream. This can only be done safely if the ZO image is
> placed at the end of the VO range, plus a certain amount of safety distance to
> make sure that when the last bytes of the VO range are decompressed, the
> compressed stream pointer is safely beyond the end of the VO range. Correct?
>
> This is a very essential central concept to the whole code, but nowhere is it
> described clearly!

That would certainly be worth calling out in the description, true.

> But more importantly, especially in view of address space randomization, we
> should realize that the days of 8 MB i386-DX systems are gone, and we should
> get rid of all this crazy obfuscation that is hindering development in this
> area. I also suspect that the actual temporary allocation size reduction
> savings from this trick are relatively small, compared to the resulting total
> memory size.
>
> So my suggestion: let's just cleanly separate all the data areas and not try
> to do any clever overlapping: the benefit will be minimal, and any system that
> has main RAM less than twice the VO+ZO image sizes is fundamentally
> unbootable and unusable anyway.
>
> I.e. have a really clean size calculation of:
>
>         ZO + VO + decompressor-stacks-size + decompressor-data-size
>
> and decompress accordingly without tricks, without overlaps, without any
> chance for corruption - and, most importantly, without this metric ton of
> obfuscation that very few people have managed to fight their way through in
> the last couple of years, and which hinders essential features ...
>
> Agreed?

I don't agree. We do still have embedded systems running x86 kernels,
and we have cases where we're running multiple kernels in memory (like
kdump). I think the memory savings are worth the complexity, especially
since the complexity is being reduced by this patch. But that's not
all:

If we moved the compressed kernel after the buffer, the only thing
we'd do would be taking up more memory. We'd still have the head_*.S
complexity of handling the relocation and handling the copy, we'd
still have the extraction, etc, etc. The only thing would be literally
changing extract_offset to INIT_SIZE. Everything else would be the
same.

If we moved the decompressed kernel after the compressed kernel,
(ignoring KASLR for a moment) then we'd end up in a confusing
situation where the kernel would be running somewhere other than where
the boot loader asked it to load. I don't even want to think about the
weird bug reports we might get from a change like that from old or
weird loaders.

This patch gets us a more reasonable layout with less complexity, no
change to the memory footprint, and no change to the boot loader's
expectations. I really think this should stand.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security
