Re: [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout

2017-07-02 Thread Chen Yu
On Fri, Jun 23, 2017 at 10:42:10AM +0200, Ingo Molnar wrote:
> 
> * Chen Yu  wrote:
> 
> > Hi Ingo,
> > On Thu, Jun 22, 2017 at 11:40:30AM +0200, Ingo Molnar wrote:
> > > 
> > > * Chen Yu  wrote:
> > > 
> > > > Currently we try to have e820_table_firmware to represent the
> > > > original firmware memory layout passed to us by the bootloader,
> > > > however it is not the case, the e820_table_firmware might still
> > > > be modified by linux:
> > > > 1. During bootup, the efi boot stub might allocate memory via
> > > > >    efi service for the PCI device information structure, then
> > > > >    later e820_reserve_setup_data() reserved these dynamically
> > > > >    allocated structures(AKA, setup_data) in e820_table_firmware
> > > > >    accordingly.
> > > > 2. The kexec might also modify the e820_table_firmware.
> > > 
> > > > Hm, so why does the EFI code modify e820_table_firmware - why doesn't
> > > it modify e820_table?
> > > 
> > Both the e820_table and e820_table_firmware will be updated in
> > e820__reserve_setup_data():
> > Changing the PCI device information structures from E820_TYPE_RAM
> > to E820_TYPE_RESERVED_KERN.
> > > I.e. what is the point of having 3 different versions of the
> > > memory layout table?
> > My original thought was that, we should not record the modification
> > > from the efi boot stub into the e820_table_firmware and we are done.
> > > But after checking the code, I realized that if we do so the
> > > kexec might have a potential problem.
> > 
> > The e820_table_firmware was introduced mainly for kexec and
> > was used to pass the original memory layout to the second
> > kernel:
> > 
> > >    commit 5dfcf14d5b28174f94cbe9b4fb35d415db61c64a
> > >    Author: Bernhard Walle
> > >    Date:   Fri Jun 27 13:12:55 2008 +0200
> > > 
> > >    x86: use FIRMWARE_MEMMAP on x86/E820
> > 
> > Besides, the second kernel will not re-enter the efi boot stub
> > code and it will reuse the PCI device information structure created
> > by the first kernel, which is stored in the E820_TYPE_RESERVED_KERN
> > region. So these PCI device information structures will not be
> > modified by the second kernel, as kexec will only pass the E820_TYPE_RAM
> > to the second kernel, thus the latter could leverage ioremap to access
> > the PCI information.
> > 
> > So the problem is, if we do not record the PCI information in
> > the e820_table_firmware, the PCI information will be kept as
> > type E820_TYPE_RAM, and all the E820_TYPE_RAM type regions will
> > be passed to the second kernel and might be allocated for ordinary
> > use in the second kernel, as a result the second kernel might not
> > get valid PCI information(might be overwritten by others). So
> > currently we try to introduce a new e820_table_ori to represent
> > the original one provided by the BIOS(mainly for hibernation
> > memory layout md5 checking).
> 
> So there's 3 versions we need:
> 
>  - the original 'firmware' table as-is - for MD5 check and other potential
>    purposes
> 
>  - some intermediate version of the table for kexec: what is the exact
>    definition of that table, what changes from the real table does it
>    _not_ want?
>
Some boot options such as 'mem=' are not wanted by kexec, because the kexec
wants to let the second kernel see the whole memory layout passed by
the bootloader. I think this is why e820_table_firmware was introduced.
>  - the 'real' table
> 
> all the naming should reflect that. I.e. instead of some nonsensical "_ori"
> postfix, that is really the _firmware table. If kexec needs a separate one
> then name it _kexec and copy it at the right stage.
> 
> Ok?
>
Ok. I'm sending V2 of this patch. I tried not to break the old behavior and
split the patch into three, so that the logic is clearer.
> Thanks,
> 
>   Ingo
Thanks,
Yu


Re: [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout

2017-06-23 Thread Ingo Molnar

* Chen Yu  wrote:

> Hi Ingo,
> On Thu, Jun 22, 2017 at 11:40:30AM +0200, Ingo Molnar wrote:
> > 
> > * Chen Yu  wrote:
> > 
> > > Currently we try to have e820_table_firmware to represent the
> > > original firmware memory layout passed to us by the bootloader,
> > > however it is not the case, the e820_table_firmware might still
> > > be modified by linux:
> > > 1. During bootup, the efi boot stub might allocate memory via
> > > >    efi service for the PCI device information structure, then
> > > >    later e820_reserve_setup_data() reserved these dynamically
> > > >    allocated structures(AKA, setup_data) in e820_table_firmware
> > > >    accordingly.
> > > 2. The kexec might also modify the e820_table_firmware.
> > 
> > > Hm, so why does the EFI code modify e820_table_firmware - why doesn't
> > it modify e820_table?
> > 
> Both the e820_table and e820_table_firmware will be updated in
> e820__reserve_setup_data():
> Changing the PCI device information structures from E820_TYPE_RAM
> to E820_TYPE_RESERVED_KERN.
> > I.e. what is the point of having 3 different versions of the
> > memory layout table?
> My original thought was that, we should not record the modification
> from the efi boot stub into the e820_table_firmware and we are done.
> But after checking the code, I realized that if we do so the
> kexec might have a potential problem.
> 
> The e820_table_firmware was introduced mainly for kexec and
> was used to pass the original memory layout to the second
> kernel:
> 
>    commit 5dfcf14d5b28174f94cbe9b4fb35d415db61c64a
>    Author: Bernhard Walle
>    Date:   Fri Jun 27 13:12:55 2008 +0200
> 
>    x86: use FIRMWARE_MEMMAP on x86/E820
> 
> Besides, the second kernel will not re-enter the efi boot stub
> code and it will reuse the PCI device information structure created
> by the first kernel, which is stored in the E820_TYPE_RESERVED_KERN
> region. So these PCI device information structures will not be
> modified by the second kernel, as kexec will only pass the E820_TYPE_RAM
> to the second kernel, thus the latter could leverage ioremap to access
> the PCI information.
> 
> So the problem is, if we do not record the PCI information in
> the e820_table_firmware, the PCI information will be kept as
> type E820_TYPE_RAM, and all the E820_TYPE_RAM type regions will
> be passed to the second kernel and might be allocated for ordinary
> use in the second kernel, as a result the second kernel might not
> get valid PCI information(might be overwritten by others). So
> currently we try to introduce a new e820_table_ori to represent
> the original one provided by the BIOS(mainly for hibernation
> memory layout md5 checking).

So there's 3 versions we need:

 - the original 'firmware' table as-is - for MD5 check and other potential 
   purposes

 - some intermediate version of the table for kexec: what is the exact
   definition of that table, what changes from the real table does it
   _not_ want?

 - the 'real' table

all the naming should reflect that. I.e. instead of some nonsensical "_ori" 
postfix, that is really the _firmware table. If kexec needs a separate one then 
name it _kexec and copy it at the right stage.

Ok?

Thanks,

Ingo
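
The three-table arrangement proposed above can be sketched as a small user-space C model. This is only an illustration under stated assumptions: the `sketch_*` names, the one-flag entry layout and the staging order are hypothetical and are not taken from the kernel sources. The firmware copy is kept as-is, the kexec copy additionally carries the setup_data reservation, and the real table also has a 'mem='-style limit applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified memory map entry. */
struct sketch_entry { uint64_t addr, size; int is_ram; };
struct sketch_table { size_t nr; struct sketch_entry e[8]; };

static struct sketch_table sketch_table_firmware; /* untouched, e.g. for a digest check */
static struct sketch_table sketch_table_kexec;    /* what a second kernel would be given */
static struct sketch_table sketch_table_real;     /* what the running kernel uses */

/* Mark one whole entry as non-RAM, standing in for a setup_data reservation. */
static void sketch_reserve_entry(struct sketch_table *t, size_t idx)
{
    assert(idx < t->nr);
    t->e[idx].is_ram = 0;
}

/* Drop or trim RAM entries above 'limit', standing in for a 'mem=' option. */
static void sketch_clamp_ram(struct sketch_table *t, uint64_t limit)
{
    size_t out = 0;

    for (size_t i = 0; i < t->nr; i++) {
        struct sketch_entry en = t->e[i];

        if (en.is_ram) {
            if (en.addr >= limit)
                continue;                  /* entirely above the limit */
            if (en.addr + en.size > limit)
                en.size = limit - en.addr; /* straddles the limit */
        }
        t->e[out++] = en;
    }
    t->nr = out;
}

/* Copy each table "at the right stage" of a simulated boot. */
static void sketch_boot(const struct sketch_table *from_bootloader,
                        size_t setup_data_idx, uint64_t mem_limit)
{
    sketch_table_firmware = *from_bootloader;      /* stage 1: as-is */

    sketch_table_kexec = *from_bootloader;         /* stage 2: plus reservation */
    sketch_reserve_entry(&sketch_table_kexec, setup_data_idx);

    sketch_table_real = sketch_table_kexec;        /* stage 3: plus 'mem=' */
    sketch_clamp_ram(&sketch_table_real, mem_limit);
}

/* Total bytes still tagged as RAM in a table. */
static uint64_t sketch_ram_bytes(const struct sketch_table *t)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < t->nr; i++)
        if (t->e[i].is_ram)
            sum += t->e[i].size;
    return sum;
}
```

Calling `sketch_boot()` once and comparing `sketch_ram_bytes()` across the three tables shows how each copy diverges from the bootloader-provided layout by exactly one more modification.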


Re: [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout

2017-06-22 Thread Chen Yu
Hi Ingo,
On Thu, Jun 22, 2017 at 11:40:30AM +0200, Ingo Molnar wrote:
> 
> * Chen Yu  wrote:
> 
> > Currently we try to have e820_table_firmware to represent the
> > original firmware memory layout passed to us by the bootloader,
> > however it is not the case, the e820_table_firmware might still
> > be modified by linux:
> > 1. During bootup, the efi boot stub might allocate memory via
> >    efi service for the PCI device information structure, then
> >    later e820_reserve_setup_data() reserved these dynamically
> >    allocated structures(AKA, setup_data) in e820_table_firmware
> >    accordingly.
> > 2. The kexec might also modify the e820_table_firmware.
> 
> Hm, so why does the EFI code modify e820_table_firmware - why doesn't
> it modify e820_table?
> 
Both the e820_table and e820_table_firmware will be updated in
e820__reserve_setup_data():
Changing the PCI device information structures from E820_TYPE_RAM
to E820_TYPE_RESERVED_KERN.
> I.e. what is the point of having 3 different versions of the
> memory layout table?
My original thought was that, we should not record the modification
from the efi boot stub into the e820_table_firmware and we are done.
But after checking the code, I realized that if we do so the
kexec might have a potential problem.

The e820_table_firmware was introduced mainly for kexec and
was used to pass the original memory layout to the second
kernel:

   commit 5dfcf14d5b28174f94cbe9b4fb35d415db61c64a
   Author: Bernhard Walle 
   Date:   Fri Jun 27 13:12:55 2008 +0200

   x86: use FIRMWARE_MEMMAP on x86/E820

Besides, the second kernel will not re-enter the efi boot stub
code and it will reuse the PCI device information structure created
by the first kernel, which is stored in the E820_TYPE_RESERVED_KERN
region. So these PCI device information structures will not be
modified by the second kernel, as kexec will only pass the E820_TYPE_RAM
to the second kernel, thus the latter could leverage ioremap to access
the PCI information.

So the problem is, if we do not record the PCI information in
the e820_table_firmware, the PCI information will be kept as
type E820_TYPE_RAM, and all the E820_TYPE_RAM type regions will
be passed to the second kernel and might be allocated for ordinary
use in the second kernel, as a result the second kernel might not
get valid PCI information(might be overwritten by others). So
currently we try to introduce a new e820_table_ori to represent
the original one provided by the BIOS(mainly for hibernation
memory layout md5 checking).

Thanks,
Yu
> 
> Thanks,
> 
>   Ingo
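
The failure mode described in this message can be modelled in a few lines of user-space C. This is a minimal sketch under stated assumptions: `model_table`, `model_reserve()` and `model_is_ram()` are hypothetical names rather than kernel APIs, the type values are illustrative, and the reserved range is assumed to lie inside a single RAM entry. It shows that once the setup_data range is retagged, a consumer that hands out only RAM no longer treats it as allocatable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the e820 entry types (values are illustrative). */
enum model_type { MODEL_RAM = 1, MODEL_RESERVED_KERN = 128 };

struct model_entry { uint64_t addr, size; enum model_type type; };

#define MODEL_MAX 16
struct model_table { size_t nr; struct model_entry e[MODEL_MAX]; };

/* Append one entry to the table. */
static void model_add(struct model_table *t, uint64_t addr, uint64_t size,
                      enum model_type type)
{
    assert(t->nr < MODEL_MAX);
    t->e[t->nr++] = (struct model_entry){ addr, size, type };
}

/*
 * Retag [start, start + size) from RAM to RESERVED_KERN. For brevity this
 * assumes the range lies fully inside one RAM entry and splits that entry.
 * Returns 0 on success, -1 if no suitable RAM entry was found.
 */
static int model_reserve(struct model_table *t, uint64_t start, uint64_t size)
{
    uint64_t end = start + size;

    for (size_t i = 0; i < t->nr; i++) {
        struct model_entry old = t->e[i];
        uint64_t old_end = old.addr + old.size;

        if (old.type != MODEL_RAM || start < old.addr || end > old_end)
            continue;

        /* Reuse the slot for the reserved piece, then append RAM leftovers. */
        t->e[i] = (struct model_entry){ start, size, MODEL_RESERVED_KERN };
        if (start > old.addr)
            model_add(t, old.addr, start - old.addr, MODEL_RAM);
        if (end < old_end)
            model_add(t, end, old_end - end, MODEL_RAM);
        return 0;
    }
    return -1;
}

/* Would a consumer of this table be free to use 'addr' as ordinary RAM? */
static int model_is_ram(const struct model_table *t, uint64_t addr)
{
    for (size_t i = 0; i < t->nr; i++)
        if (addr >= t->e[i].addr && addr - t->e[i].addr < t->e[i].size)
            return t->e[i].type == MODEL_RAM;
    return 0;
}
```

If the `model_reserve()` step is skipped for the table that kexec consumes, `model_is_ram()` keeps returning 1 for the PCI information range, which is the overwrite risk discussed above.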



Re: [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout

2017-06-22 Thread Ingo Molnar

* Chen Yu  wrote:

> Currently we try to have e820_table_firmware to represent the
> original firmware memory layout passed to us by the bootloader,
> however it is not the case, the e820_table_firmware might still
> be modified by linux:
> 1. During bootup, the efi boot stub might allocate memory via
>    efi service for the PCI device information structure, then
>    later e820_reserve_setup_data() reserved these dynamically
>    allocated structures(AKA, setup_data) in e820_table_firmware
>    accordingly.
> 2. The kexec might also modify the e820_table_firmware.

Hm, so why does the EFI code modify e820_table_firmware - why doesn't
it modify e820_table?

I.e. what is the point of having 3 different versions of the
memory layout table?

Thanks,

Ingo

