On Fri, Jun 23, 2017 at 10:42:10AM +0200, Ingo Molnar wrote:
> 
> * Chen Yu <yu.c.c...@intel.com> wrote:
> 
> > Hi Ingo,
> > On Thu, Jun 22, 2017 at 11:40:30AM +0200, Ingo Molnar wrote:
> > > 
> > > * Chen Yu <yu.c.c...@intel.com> wrote:
> > > 
> > > > Currently we try to have e820_table_firmware represent the
> > > > original firmware memory layout passed to us by the bootloader.
> > > > However, this is not the case: e820_table_firmware might still
> > > > be modified by Linux:
> > > > 1. During bootup, the EFI boot stub might allocate memory via
> > > >    EFI services for the PCI device information structures; later,
> > > >    e820__reserve_setup_data() reserves these dynamically
> > > >    allocated structures (a.k.a. setup_data) in e820_table_firmware
> > > >    accordingly.
> > > > 2. kexec might also modify e820_table_firmware.
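For illustration only, a minimal C model of point 1. The struct layouts, enum values and the helpers range_update() and reserve_setup_data() are simplified assumptions, not the kernel's actual definitions. The sketch shows how retyping a setup_data range from E820_TYPE_RAM to E820_TYPE_RESERVED_KERN splits a RAM entry, and that once the same update is applied to e820_table_firmware, that table no longer matches what the bootloader passed in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the kernel's e820 definitions (not the real ones). */
enum e820_type { E820_TYPE_RAM = 1, E820_TYPE_RESERVED_KERN = 128 };
#define MAX_ENTRIES 16
struct e820_entry { uint64_t addr, size; enum e820_type type; };
struct e820_table { unsigned nr_entries; struct e820_entry entries[MAX_ENTRIES]; };

static struct e820_table e820_table, e820_table_firmware;

/*
 * Retype [start, start + size) from old_type to new_type, assuming the range
 * lies entirely inside one entry of old_type. That entry is split into up to
 * three pieces. Returns 0 on success, -1 if no entry matches or the table is full.
 */
static int range_update(struct e820_table *t, uint64_t start, uint64_t size,
                        enum e820_type old_type, enum e820_type new_type)
{
    uint64_t end = start + size;

    for (unsigned i = 0; i < t->nr_entries; i++) {
        struct e820_entry e = t->entries[i];
        uint64_t e_end = e.addr + e.size;

        if (e.type != old_type || start < e.addr || end > e_end)
            continue;

        /* Extra entries needed: a head piece and/or a tail piece. */
        unsigned extra = (start > e.addr) + (end < e_end);
        if (t->nr_entries + extra > MAX_ENTRIES)
            return -1;

        /* Shift the following entries so the table stays sorted. */
        memmove(&t->entries[i + 1 + extra], &t->entries[i + 1],
                (t->nr_entries - i - 1) * sizeof(t->entries[0]));

        unsigned k = i;
        if (start > e.addr)
            t->entries[k++] = (struct e820_entry){ e.addr, start - e.addr, old_type };
        t->entries[k++] = (struct e820_entry){ start, size, new_type };
        if (end < e_end)
            t->entries[k++] = (struct e820_entry){ end, e_end - end, old_type };
        t->nr_entries += extra;
        return 0;
    }
    return -1;
}

/* Hypothetical model of the setup_data reservation: the same retyping is
 * applied to both tables, so the "firmware" table stops being pristine. */
static void reserve_setup_data(uint64_t pa, uint64_t len)
{
    int rc = range_update(&e820_table, pa, len,
                          E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
    assert(rc == 0);
    rc = range_update(&e820_table_firmware, pa, len,
                      E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
    assert(rc == 0);
    (void)rc;
}
```

Starting from a single RAM entry, reserving a 4 KiB setup_data block in the middle of it leaves both tables with three entries (RAM, RESERVED_KERN, RAM).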
> > > 
> > > Hm, so why does the EFI code modify e820_table_firmware - why doesn't
> > > it modify e820_table?
> > > 
> > Both e820_table and e820_table_firmware are updated in
> > e820__reserve_setup_data(), which changes the PCI device
> > information structures from E820_TYPE_RAM to
> > E820_TYPE_RESERVED_KERN.
> > > I.e. what is the point of having 3 different versions of the
> > > memory layout table?
> > My original thought was that we should simply not record the
> > modification made by the EFI boot stub in e820_table_firmware,
> > and we would be done. But after checking the code, I realized
> > that if we did so, kexec might have a potential problem.
> > 
> > e820_table_firmware was introduced mainly for kexec, and is
> > used to pass the original memory layout to the second
> > kernel:
> > 
> >    commit 5dfcf14d5b28174f94cbe9b4fb35d415db61c64a
> >    Author: Bernhard Walle <bwa...@suse.de>
> >    Date:   Fri Jun 27 13:12:55 2008 +0200
> > 
> >        x86: use FIRMWARE_MEMMAP on x86/E820
> > 
> > Besides, the second kernel will not re-enter the EFI boot stub
> > code; it reuses the PCI device information structures created
> > by the first kernel, which are stored in E820_TYPE_RESERVED_KERN
> > regions. These structures will therefore not be modified by the
> > second kernel, because kexec only passes the E820_TYPE_RAM regions
> > to the second kernel as usable memory, and the latter can use
> > ioremap() to access the PCI information.
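A toy model of the kexec side, assuming (as stated in the mail) that only E820_TYPE_RAM regions are handed to the second kernel as usable memory. usable_ranges() and overlaps_usable() are hypothetical helpers, and the types are simplified stand-ins: the first keeps only the RAM entries of a table, the second reports whether a given range, such as a setup_data block, would be treated as allocatable by the second kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's e820 definitions (not the real ones). */
enum e820_type { E820_TYPE_RAM = 1, E820_TYPE_RESERVED_KERN = 128 };
struct e820_entry { uint64_t addr, size; enum e820_type type; };

/* Keep only the E820_TYPE_RAM entries: a stand-in for the usable memory that
 * kexec hands to the second kernel. Returns the number of entries written. */
static unsigned usable_ranges(const struct e820_entry *in, unsigned n,
                              struct e820_entry *out)
{
    unsigned k = 0;

    for (unsigned i = 0; i < n; i++)
        if (in[i].type == E820_TYPE_RAM)
            out[k++] = in[i];
    return k;
}

/* Nonzero if [pa, pa + len) overlaps any usable range, i.e. the second
 * kernel would be free to allocate over it. */
static int overlaps_usable(const struct e820_entry *ram, unsigned n,
                           uint64_t pa, uint64_t len)
{
    for (unsigned i = 0; i < n; i++)
        if (pa < ram[i].addr + ram[i].size && ram[i].addr < pa + len)
            return 1;
    return 0;
}
```

If the setup_data range was retyped to E820_TYPE_RESERVED_KERN in the table kexec uses, it is excluded from the usable set; if it stayed E820_TYPE_RAM, it overlaps usable memory and the second kernel may allocate over it.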
> > 
> > So the problem is: if we do not record the PCI information in
> > e820_table_firmware, its memory will stay typed E820_TYPE_RAM.
> > All E820_TYPE_RAM regions are passed to the second kernel and
> > might be allocated for ordinary use there, so the second kernel
> > might not get valid PCI information (it might be overwritten by
> > other users). That is why we are now trying to introduce a new
> > e820_table_ori to represent the original table provided by the
> > BIOS (mainly for the hibernation memory layout MD5 check).
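The hibernation check mentioned here is, roughly, a comparison between a digest of the memory layout recorded in the hibernation image and one computed by the kernel that restores it. The sketch below uses 64-bit FNV-1a purely as a short, self-contained stand-in for MD5 (it is not what the kernel uses), with simplified stand-in types, and assumes the EFI stub may place setup_data at a different address on each boot. Under that assumption, a table carrying the kernel's own reservations can produce a different digest between boots even though the firmware layout itself did not change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's e820 definitions (not the real ones). */
enum e820_type { E820_TYPE_RAM = 1, E820_TYPE_RESERVED_KERN = 128 };
struct e820_entry { uint64_t addr, size; enum e820_type type; };

#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

/* Feed one 64-bit value into an FNV-1a hash, least significant byte first. */
static uint64_t fnv1a_u64(uint64_t h, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= FNV_PRIME;
    }
    return h;
}

/* Digest of a memory layout, hashed field by field so struct padding is
 * never read. Stand-in for the MD5 digest used by the hibernation check. */
static uint64_t layout_digest(const struct e820_entry *t, size_t n)
{
    uint64_t h = FNV_OFFSET;

    for (size_t i = 0; i < n; i++) {
        h = fnv1a_u64(h, t[i].addr);
        h = fnv1a_u64(h, t[i].size);
        h = fnv1a_u64(h, (uint64_t)t[i].type);
    }
    return h;
}
```

Hashing the untouched firmware layout gives the same digest on every boot, whereas two boots that reserved setup_data at different addresses give different digests, which would make the resume-time comparison fail spuriously.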
> 
> So there are 3 versions we need:
> 
>  - the original 'firmware' table as-is - for MD5 check and other potential 
>    purposes
> 
>  - some intermediate version of the table for kexec: what is the exact definition
>    of that table, what changes from the real table does it _not_ want?
>
Some boot options such as 'mem=' are not wanted by kexec, because kexec
wants to let the second kernel see the whole memory layout passed by
the bootloader. I think this is why e820_table_firmware was introduced.
>  - the 'real' table
> 
> all the naming should reflect that. I.e. instead of some nonsensical "_ori"
> postfix, that is really the _firmware table. If kexec needs a separate one then
> name it _kexec and copy it at the right stage.
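One possible reading of the three-table scheme, written as a sketch. The names e820_table_firmware and e820_table_kexec follow the naming suggested in the thread; setup_memory_map(), reserve_kern() and apply_mem_limit() are hypothetical helpers, the types are simplified stand-ins, and the ordering of the stages is an assumption, not the actual patch. The firmware copy is never touched, the kexec copy is snapshotted after kernel reservations such as setup_data but before 'mem=' is applied, and only the real table sees the 'mem=' limit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the kernel's e820 definitions (not the real ones). */
enum e820_type { E820_TYPE_RAM = 1, E820_TYPE_RESERVED_KERN = 128 };
#define MAX_ENTRIES 16
struct e820_entry { uint64_t addr, size; enum e820_type type; };
struct e820_table { unsigned nr_entries; struct e820_entry entries[MAX_ENTRIES]; };

static struct e820_table e820_table;          /* the 'real' table used by this kernel */
static struct e820_table e820_table_kexec;    /* what kexec hands to the second kernel */
static struct e820_table e820_table_firmware; /* as passed by the bootloader, never modified */

/* Retype [start, start + size) to E820_TYPE_RESERVED_KERN, assuming it lies
 * inside a single RAM entry; that entry is split into up to three pieces. */
static void reserve_kern(struct e820_table *t, uint64_t start, uint64_t size)
{
    uint64_t end = start + size;

    for (unsigned i = 0; i < t->nr_entries; i++) {
        struct e820_entry e = t->entries[i];
        uint64_t e_end = e.addr + e.size;
        struct e820_entry pieces[3];
        unsigned n = 0;

        if (e.type != E820_TYPE_RAM || start < e.addr || end > e_end)
            continue;
        if (start > e.addr)
            pieces[n++] = (struct e820_entry){ e.addr, start - e.addr, E820_TYPE_RAM };
        pieces[n++] = (struct e820_entry){ start, size, E820_TYPE_RESERVED_KERN };
        if (end < e_end)
            pieces[n++] = (struct e820_entry){ end, e_end - end, E820_TYPE_RAM };

        assert(t->nr_entries + n - 1 <= MAX_ENTRIES);
        /* Shift the later entries to make room, then drop the pieces in place. */
        memmove(&t->entries[i + n], &t->entries[i + 1],
                (t->nr_entries - i - 1) * sizeof(e));
        memcpy(&t->entries[i], pieces, n * sizeof(e));
        t->nr_entries += n - 1;
        return;
    }
}

/* 'mem=<limit>': drop or truncate RAM above the limit. */
static void apply_mem_limit(struct e820_table *t, uint64_t limit)
{
    unsigned k = 0;

    for (unsigned i = 0; i < t->nr_entries; i++) {
        struct e820_entry e = t->entries[i];

        if (e.type == E820_TYPE_RAM && e.addr + e.size > limit) {
            if (e.addr >= limit)
                continue;            /* entirely above the limit: drop it */
            e.size = limit - e.addr; /* straddles the limit: truncate it */
        }
        t->entries[k++] = e;
    }
    t->nr_entries = k;
}

/* Hypothetical boot-time ordering for the three tables. */
static void setup_memory_map(const struct e820_table *bios,
                             uint64_t setup_data_pa, uint64_t setup_data_len,
                             uint64_t mem_limit)
{
    e820_table_firmware = *bios;                 /* 1. keep the firmware view as-is */
    e820_table = *bios;
    reserve_kern(&e820_table, setup_data_pa, setup_data_len); /* 2. kernel reservations */
    e820_table_kexec = e820_table;               /* 3. snapshot for kexec before 'mem=' */
    if (mem_limit)
        apply_mem_limit(&e820_table, mem_limit); /* 4. affects this kernel only */
}
```

Taking the kexec snapshot before 'mem=' keeps the whole layout visible to the second kernel while still marking setup_data as reserved, and the firmware copy stays byte-for-byte what the bootloader provided.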
> 
> Ok?
>
Ok. I'm sending V2 of this patch. I tried not to break the old behavior and
split the patch into three, so the logic should be clearer.
> Thanks,
> 
>       Ingo
Thanks,
        Yu
