On Fri, Apr 29, 2016 at 12:18 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Kees Cook <keesc...@chromium.org> wrote:
>
>> From: Yinghai Lu <ying...@kernel.org>
>>
>> This change makes later calculations about where the kernel is located
>> easier to reason about. To better understand this change, we must first
>> clarify what VO and ZO are. They were introduced in commits by hpa:
>>
>>   77d1a49 x86, boot: make symbols from the main vmlinux available
>>   37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields
>>
>> Specifically:
>>
>> VO:
>> - uncompressed kernel image
>> - size: VO__end - VO__text ("VO_INIT_SIZE" define)
>>
>> ZO:
>> - bootable compressed kernel image (boot/compressed/vmlinux)
>> - head text + compressed kernel (VO and relocs table) + decompressor code
>> - size: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
>>
>> The INIT_SIZE definition is used to find the larger of the two image sizes:
>>
>> #define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>> #define VO_INIT_SIZE (VO__end - VO__text)
>> #if ZO_INIT_SIZE > VO_INIT_SIZE
>> #define INIT_SIZE ZO_INIT_SIZE
>> #else
>> #define INIT_SIZE VO_INIT_SIZE
>> #endif
>>
>> The current code uses extract_offset to decide where to position the
>> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
>> currently includes the extract_offset.)
>
> Yeah, so I rewrote the above to:
>
> =================>
> This change makes later calculations about where the kernel is located
> easier to reason about. To better understand this change, we must first
> clarify what 'VO' and 'ZO' are.
> These values were introduced in commits by hpa:
>
>   77d1a4999502 ("x86, boot: make symbols from the main vmlinux available")
>   37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
>
> Specifically:
>
> All names prefixed with 'VO_':
>
>  - relate to the uncompressed kernel image
>
>  - the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)
>
> All names prefixed with 'ZO_':
>
>  - relate to the bootable compressed kernel image (boot/compressed/vmlinux),
>    which is composed of the following memory areas:
>      - head text
>      - compressed kernel (VO image and relocs table)
>      - decompressor code
>
>  - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE"
>    define, though see below)
>
> The 'INIT_SIZE' value is used to find the larger of the two image sizes:
>
>  #define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>  #define VO_INIT_SIZE (VO__end - VO__text)
>
>  #if ZO_INIT_SIZE > VO_INIT_SIZE
>  # define INIT_SIZE ZO_INIT_SIZE
>  #else
>  # define INIT_SIZE VO_INIT_SIZE
>  #endif
>
> The current code uses extract_offset to decide where to position the
> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
> currently includes the extract_offset.)
> <=================
>
> Assuming the edits I made are correct, this is the point where the changelog
> lost me. It does not explain why ZO_z_extract_offset exists. Why isn't the ZO
> copied to offset 0?
>
> I had to go into arch/x86/boot/compressed/mkpiggy.c, where
> ZO_z_extract_offset is generated, to find the answer: it's needed because we
> are trying to minimize the amount of RAM used for the whole act of creating
> an uncompressed, executable, properly relocation-linked kernel image in
> system memory. We do this so that kernels can be booted on even very small
> systems.
>
> To achieve the goal of minimal memory consumption we have implemented an
> in-place decompression strategy: instead of cleanly separating the VO and ZO
> images and also allocating some memory for the decompression code's runtime
> needs, we instead create this elaborate layout of memory buffers where the
> output (decompressed) stream, as it progresses, overlaps with and destroys
> the input (compressed) stream. This can only be done safely if the ZO image
> is placed to the end of the VO range, plus a certain amount of safety
> distance to make sure that when the last bytes of the VO range are
> decompressed, the compressed stream pointer is safely beyond the end of the
> VO range. Correct?
>
> This is a very essential central concept to the whole code, but nowhere is it
> described clearly!
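
To make the overlap condition described above concrete, here is a minimal
sketch in C. It is a simplified model of the description, not the formula
mkpiggy.c actually uses: the function names and the safety_margin parameter
are hypothetical, and the sizes are plain byte counts supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers modelling the layout described in the thread.
 * zo_size: size of the compressed image (ZO)
 * vo_size: size of the uncompressed image (VO)
 * safety_margin: assumed slack needed by the decompressor (illustrative)
 */

/* Same selection as the INIT_SIZE macro: the larger of the two extents. */
size_t init_size(size_t zo_size, size_t vo_size, size_t extract_offset)
{
    size_t zo_init_size = zo_size + extract_offset;

    return zo_init_size > vo_size ? zo_init_size : vo_size;
}

/*
 * Smallest offset for the ZO copy such that its end lies at least
 * safety_margin bytes beyond the end of the VO range.
 */
size_t min_extract_offset(size_t zo_size, size_t vo_size, size_t safety_margin)
{
    assert(vo_size <= SIZE_MAX - safety_margin); /* guard against overflow */

    size_t needed_end = vo_size + safety_margin;

    return needed_end > zo_size ? needed_end - zo_size : 0;
}

/* Nonzero if the ZO copy ends far enough past the VO range. */
int overlap_is_safe(size_t extract_offset, size_t zo_size,
                    size_t vo_size, size_t safety_margin)
{
    return extract_offset + zo_size >= vo_size + safety_margin;
}
```

With illustrative sizes of 100 bytes for ZO, 400 for VO and a 50-byte margin,
the ZO copy has to start at offset 350, and the buffer size that results from
the INIT_SIZE rule is 450 bytes rather than the 400 of VO alone.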

That would certainly be worth calling out in the description, true.

> But more importantly, especially in view of address space randomization, we
> should realize that the days of 8 MB i386-DX systems are gone, and we should
> get rid of all this crazy obfuscation that is hindering development in this
> area. I also suspect that the actual temporary allocation size reduction
> savings from this trick are relatively small, compared to the resulting
> total memory size.
>
> So my suggestion: let's just cleanly separate all the data areas and not try
> to do any clever overlapping: the benefit will be minimal, and any system
> that has main RAM less than twice of the VO+ZO image sizes is fundamentally
> unbootable and unusable anyway.
>
> I.e. have a really clean size calculation of:
>
>    ZO + VO + decompressor-stacks-size + decompressor-data-size
>
> and decompress accordingly without tricks, without overlaps, without any
> chance for corruption - and, most importantly, without this metric ton of
> obfuscation that very few people have managed to fight their way through in
> the last couple of years, and which hinders essential features ...
>
> Agreed?

I don't agree. We do still have embedded systems running x86 kernels,
and we have cases where we're running multiple kernels in memory (like
kdump). I think the memory savings are worth the complexity, especially
since the complexity is being reduced by this patch.

But that's not all:

If we moved the compressed kernel after the buffer, the only thing
we'd be doing is taking up more memory. We'd still have the head_*.S
complexity of handling the relocation and handling the copy, we'd
still have the extraction, etc, etc. The only difference would be
literally changing extract_offset to INIT_SIZE. Everything else would
be the same.
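
As a rough illustration of the memory cost being argued about, the sketch
below compares the two layouts. Everything in it is an assumption made for
illustration: the struct and function names are hypothetical, the separated
layout is modelled as placing the ZO copy directly after the VO buffer, and
decompressor stack and data sizes are left out of both totals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the sizes involved; not a kernel structure. */
struct boot_sizes {
    size_t zo;             /* compressed image (ZO) size */
    size_t vo;             /* uncompressed image (VO) size */
    size_t extract_offset; /* start of the ZO copy in the overlapped layout */
};

/* Overlapped layout: one buffer sized like INIT_SIZE. */
size_t footprint_overlapped(const struct boot_sizes *s)
{
    assert(s != NULL);

    size_t zo_end = s->extract_offset + s->zo;

    return zo_end > s->vo ? zo_end : s->vo;
}

/* Separated layout (assumed): the ZO copy sits after the whole VO buffer. */
size_t footprint_separated(const struct boot_sizes *s)
{
    assert(s != NULL);

    return s->vo + s->zo;
}

/* Extra bytes the separated layout needs; 0 if it needs no more. */
size_t extra_memory(const struct boot_sizes *s)
{
    size_t sep = footprint_separated(s);
    size_t ovl = footprint_overlapped(s);

    return sep > ovl ? sep - ovl : 0;
}
```

For example, with zo = 100, vo = 400 and extract_offset = 350, the overlapped
layout needs 450 bytes and the separated one 500. In this model the copy and
extraction steps are the same either way, and only the offset differs.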

If we moved the decompressed kernel after the compressed kernel
(ignoring KASLR for a moment), then we'd end up in a confusing
situation where the kernel would be running somewhere other than where
the boot loader asked it to load. I don't even want to think about the
weird bug reports we might get from a change like that from old or
weird loaders.

This patch gets us a more reasonable layout with less complexity and
no change to the memory footprint, without changing the expectations
of the boot loader. I really think this should stand.

-Kees

--
Kees Cook
Chrome OS & Brillo Security