Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Russ Anderson
On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin  wrote:
> >
> > BRK makes sense as long as you can set a sane O(1) size limit.
> >
> >>
> >>put the acpi override table in BRK, we still need ok from HPA.
> >>I have impression that he did not like it, so want to confirm from him.
> 
> on 8 sockets system:
> -rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
> -rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
> -rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
> -rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
> -rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
> -rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
> -rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
> -rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
> -rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
> -rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
> -rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
> -rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
> -rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
> -rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
> -rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
> -rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
> -rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
> -rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
> -rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat
> 
> assume for 32sockets will have four times bigger with DSDT and SSDT.
> (with more pci and cpus)
> 
> So we can not have O(1) the size.
> 
> Russ, What is ACPI table size on your big machine?

This is from a 255 socket, 4080 cpu, 15TB system.

---
-rw-r--r-- 1 root root   65392 Aug 23 21:23 apic.dat
-rw-r--r-- 1 root root     316 Aug 23 21:23 dmar.dat
-rw-r--r-- 1 root root 8309249 Aug 23 21:23 dsdt.dat
-rw-r--r-- 1 root root     244 Aug 23 21:23 facp.dat
-rw-r--r-- 1 root root      64 Aug 23 21:23 facs.dat
-rw-r--r-- 1 root root      56 Aug 23 21:23 hpet.dat
-rw-r--r-- 1 root root    4172 Aug 23 21:23 mcfg.dat
-rw-r--r-- 1 root root      36 Aug 23 21:23 rsdp.dat
-rw-r--r-- 1 root root      80 Aug 23 21:23 rsdt.dat
-rw-r--r-- 1 root root   65069 Aug 23 21:23 slit.dat
-rw-r--r-- 1 root root      80 Aug 23 21:23 spcr.dat
-rw-r--r-- 1 root root  108168 Aug 23 21:23 srat.dat
-rw-r--r-- 1 root root   21330 Aug 23 21:23 ssdt.dat
-rw-r--r-- 1 root root      92 Aug 23 21:23 uefi1.dat
-rw-r--r-- 1 root root     298 Aug 23 21:23 uefi.dat
-rw-r--r-- 1 root root     124 Aug 23 21:23 xsdt.dat
---

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc  r...@sgi.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Yinghai Lu
> So we need to allocate memory. That is why you suggested to use BRK, right ?
> And the size seems to be a problem.
>
> So I suggest to use early_ioremap().
>
> 1. After paging is enabled, before direct mapping page tables are
> setup, we map the
> initrd with early_ioremap(). And we are able to access it with va,
> even on 32bit.
> Then we can find all tables.
> 2. We still use memblock to allocate memory. Maybe it will be
> hotpluggable memory,
> but this memory can be freed when all the acpi tables are parsed, right ?
>
> So I want to try early_ioremap(). All these should be done in setup_arch().

No.
The cpio search needs a virtual address range that covers the whole
archive, and early_ioremap() has a size limitation.
You would have to update the cpio search to take a mapping function,
which could be too messy.
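
To make the concern concrete, here is a minimal user-space sketch in Python of what a cpio search that takes a mapping function might look like. It is not kernel code: `map_window`, `WINDOW`, `make_entry` and the `tables/...` file names are hypothetical, the archive is assumed to be in "newc" format, and the tiny window size only stands in for a size-limited mapping primitive.

```python
HEADER_LEN = 110   # fixed size of a "newc" cpio header
WINDOW = 64        # hypothetical mapping-size limit, tiny on purpose

def align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3

def make_entry(name, data):
    """Build one newc entry: header, NUL-terminated name, data, 4-byte aligned."""
    raw_name = name.encode() + b"\0"
    # ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor,
    # rdevmajor, rdevminor, namesize, check
    fields = [0, 0o100644, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(raw_name), 0]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + raw_name
    out += b"\0" * (align4(len(out)) - len(out))
    out += data
    out += b"\0" * (align4(len(out)) - len(out))
    return out

def find_file(map_window, wanted):
    """Walk the archive through bounded mappings; return a copy of the data."""
    def read(offset, length):
        chunks = []
        while length > 0:
            n = min(length, WINDOW)
            chunks.append(map_window(offset, n))  # one mapping per window
            offset += n
            length -= n
        return b"".join(chunks)

    off = 0
    while True:
        hdr = read(off, HEADER_LEN)
        if hdr[:6] != b"070701":
            return None                      # not a newc header, give up
        filesize = int(hdr[54:62], 16)
        namesize = int(hdr[94:102], 16)
        name = read(off + HEADER_LEN, namesize)[:-1]
        data_off = align4(off + HEADER_LEN + namesize)
        if name == b"TRAILER!!!":
            return None                      # end-of-archive marker
        if name == wanted:
            return read(data_off, filesize)
        off = align4(data_off + filesize)

# Tiny in-memory archive with hypothetical file names.
archive = (make_entry("tables/APIC.dat", b"A" * 10)
           + make_entry("tables/DSDT.dat", b"D" * 200)
           + make_entry("TRAILER!!!", b""))

def map_window(offset, length):
    assert length <= WINDOW                  # the mapping primitive refuses more
    return archive[offset:offset + length]

found = find_file(map_window, b"tables/DSDT.dat")
print(len(found))
```

Even in this toy form, every header, name and data read has to be split across windows, and the search can only hand back a copy of the file rather than a pointer into one contiguous mapping. That is roughly the messiness being objected to.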

Yinghai


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread chen tang
Hi Yinghai,

2013/8/24 Yinghai Lu :
> On Fri, Aug 23, 2013 at 2:50 PM, chen tang  wrote:
>>>
>>> so the DSDT is 7F493E, and total is more than 8M.
>>>
>>> that will need BRK to be extended 16M?
>>>
>>
>> Then how about use early_ioremap(), and don't do it that early in
>> head_32 and head64 ?
>
> why could early_ioremap() help?
>
> when to use early_ioremap()? what for?
>

In my understanding, the ACPICA framework needs users to copy the override
tables somewhere in memory, and ACPICA picks up these user-specified tables
when installing the firmware tables. This is the ACPICA logic, which cannot
be changed, I think.

So we need to allocate memory. That is why you suggested using BRK, right?
And the size seems to be a problem.

So I suggest using early_ioremap().

1. After paging is enabled, and before the direct mapping page tables are
   set up, we map the initrd with early_ioremap(). Then we are able to
   access it with a va, even on 32-bit, and we can find all the tables.
2. We still use memblock to allocate memory. It may be hotpluggable memory,
   but this memory can be freed once all the ACPI tables are parsed, right?

So I want to try early_ioremap(). All of this should be done in setup_arch().

Thanks.


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Yinghai Lu
On Fri, Aug 23, 2013 at 2:50 PM, chen tang  wrote:
>>
>> so the DSDT is 7F493E, and total is more than 8M.
>>
>> that will need BRK to be extended 16M?
>>
>
> Then how about use early_ioremap(), and don't do it that early in
> head_32 and head64 ?

How would early_ioremap() help?

When would you use early_ioremap(), and what for?

Yinghai


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Yinghai Lu
On Fri, Aug 23, 2013 at 2:52 PM, Moore, Robert  wrote:
> While we're at it:
>
> Can someone send me the acpidump for this machine? We very much would like to 
> test all of ACPICA with such a large DSDT.

That would be Russ; it is his machine.


RE: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Moore, Robert
While we're at it:

Can someone send me the acpidump for this machine? We very much would like to 
test all of ACPICA with such a large DSDT.

Thanks,
Bob


> -----Original Message-----
> From: chen tang [mailto:imtangc...@gmail.com]
> Sent: Friday, August 23, 2013 2:51 PM
> To: Yinghai Lu
> Cc: Russ Anderson; H. Peter Anvin; Zhang Yanfei; Toshi Kani; Tejun Heo;
> Tang Chen; Konrad Rzeszutek Wilk; Moore, Robert; Zheng, Lv; Rafael J.
> Wysocki; Ingo Molnar; Andrew Morton; Thomas Renninger; Yasuaki Ishimatsu;
> Mel Gorman; Linux Kernel Mailing List
> Subject: Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
> 
> Hi Yinghai,
> 
> 2013/8/24 Yinghai Lu :
> > On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson  wrote:
> >> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> >>> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin wrote:
> >
> >>> Russ, What is ACPI table size on your big machine?
> >>
> >> This is from a 256 socket 32TB system.
> >>
> >>  Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
> >>  ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
> >>  ACPI: XSDT 7ef3d120 0007C (v01 INTEL  TIANO  0113)
> >>  ACPI: FACP 7ef3a000 000F4 (v04 INTEL  TIANO MSFT 0113)
> >>  ACPI: DSDT 7e6c3000 7F493E (v02   SGI2  UVX 0002 MSFT 0113)
> >>  ACPI: FACS 7d147000 00040
> >>  ACPI: UEFI 7ef3c000 0012A (v01  INTEL  RstScuO   )
> >>  ACPI: UEFI 7ef3b000 0005C (v01  INTEL  RstScuV   )
> >>  ACPI: HPET 7ef39000 00038 (v01 INTEL  TIANO0001 MSFT 0113)
> >>  ACPI: SSDT 7ef33000 05352 (v02  INTEL ROSECITY 0003 INTL 20070508)
> >>  ACPI: SLIT 7ef1 1002C (v01   SGI2  UVX 0002 MSFT 0001)
> >>  ACPI: APIC 7000 10070 (v03   SGI2  UVX 0002 MSFT 0001)
> >>  ACPI: SRAT 7eeb8000 1A830 (v03   SGI2  UVX 0002 MSFT 0001)
> >>  ACPI: MCFG 7d6d4000 0105C (v01   SGI2  UVX 0002 MSFT 0001)
> >>  ACPI: SPCR 7e6c2000 00050 (v01   )
> >>  ACPI: DMAR 7d6d3000 0013C (v01 INTEL  TIANO0001 MSFT 0113)
> >>
> >
> > so the DSDT is 7F493E, and total is more than 8M.
> >
> > that will need BRK to be extended 16M?
> >
> 
> Then how about use early_ioremap(), and don't do it that early in
> head_32 and head64 ?
> 
> Thanks.


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread chen tang
Hi Yinghai,

2013/8/24 Yinghai Lu :
> On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson  wrote:
>> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
>>> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin  wrote:
>
>>> Russ, What is ACPI table size on your big machine?
>>
>> This is from a 256 socket 32TB system.
>>
>>  Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
>>  ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
>>  ACPI: XSDT 7ef3d120 0007C (v01 INTEL  TIANO  0113)
>>  ACPI: FACP 7ef3a000 000F4 (v04 INTEL  TIANO MSFT 0113)
>>  ACPI: DSDT 7e6c3000 7F493E (v02   SGI2  UVX 0002 MSFT 0113)
>>  ACPI: FACS 7d147000 00040
>>  ACPI: UEFI 7ef3c000 0012A (v01  INTEL  RstScuO   )
>>  ACPI: UEFI 7ef3b000 0005C (v01  INTEL  RstScuV   )
>>  ACPI: HPET 7ef39000 00038 (v01 INTEL  TIANO0001 MSFT 0113)
>>  ACPI: SSDT 7ef33000 05352 (v02  INTEL ROSECITY 0003 INTL 20070508)
>>  ACPI: SLIT 7ef1 1002C (v01   SGI2  UVX 0002 MSFT 0001)
>>  ACPI: APIC 7000 10070 (v03   SGI2  UVX 0002 MSFT 0001)
>>  ACPI: SRAT 7eeb8000 1A830 (v03   SGI2  UVX 0002 MSFT 0001)
>>  ACPI: MCFG 7d6d4000 0105C (v01   SGI2  UVX 0002 MSFT 0001)
>>  ACPI: SPCR 7e6c2000 00050 (v01   )
>>  ACPI: DMAR 7d6d3000 0013C (v01 INTEL  TIANO0001 MSFT 0113)
>>
>
> so the DSDT is 7F493E, and total is more than 8M.
>
> that will need BRK to be extended 16M?
>

Then how about using early_ioremap(), and not doing it that early in
head_32 and head64?

Thanks.


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Yinghai Lu
On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson  wrote:
> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
>> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin  wrote:

>> Russ, What is ACPI table size on your big machine?
>
> This is from a 256 socket 32TB system.
>
>  Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
>  ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
>  ACPI: XSDT 7ef3d120 0007C (v01 INTEL  TIANO  0113)
>  ACPI: FACP 7ef3a000 000F4 (v04 INTEL  TIANO MSFT 0113)
>  ACPI: DSDT 7e6c3000 7F493E (v02   SGI2  UVX 0002 MSFT 0113)
>  ACPI: FACS 7d147000 00040
>  ACPI: UEFI 7ef3c000 0012A (v01  INTEL  RstScuO   )
>  ACPI: UEFI 7ef3b000 0005C (v01  INTEL  RstScuV   )
>  ACPI: HPET 7ef39000 00038 (v01 INTEL  TIANO0001 MSFT 0113)
>  ACPI: SSDT 7ef33000 05352 (v02  INTEL ROSECITY 0003 INTL 20070508)
>  ACPI: SLIT 7ef1 1002C (v01   SGI2  UVX 0002 MSFT 0001)
>  ACPI: APIC 7000 10070 (v03   SGI2  UVX 0002 MSFT 0001)
>  ACPI: SRAT 7eeb8000 1A830 (v03   SGI2  UVX 0002 MSFT 0001)
>  ACPI: MCFG 7d6d4000 0105C (v01   SGI2  UVX 0002 MSFT 0001)
>  ACPI: SPCR 7e6c2000 00050 (v01   )
>  ACPI: DMAR 7d6d3000 0013C (v01 INTEL  TIANO0001 MSFT 0113)
>

So the DSDT is 0x7F493E bytes, and the total is more than 8M.

Would that need BRK to be extended to 16M?
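
As a quick check of that arithmetic, the Python sketch below adds up the hex table lengths from the boot log quoted above. Rounding the total up to the next power of two is only an assumption about how a reservation might be sized, and `next_power_of_two` is a hypothetical helper, not anything from the kernel.

```python
# (table, length in hex) pairs copied from the boot log quoted above.
TABLE_LENGTHS = [
    ("RSDP", "00024"), ("XSDT", "0007C"), ("FACP", "000F4"),
    ("DSDT", "7F493E"), ("FACS", "00040"), ("UEFI", "0012A"),
    ("UEFI", "0005C"), ("HPET", "00038"), ("SSDT", "05352"),
    ("SLIT", "1002C"), ("APIC", "10070"), ("SRAT", "1A830"),
    ("MCFG", "0105C"), ("SPCR", "00050"), ("DMAR", "0013C"),
]

def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

MIB = 1 << 20
dsdt = int("7F493E", 16)
total = sum(int(length, 16) for _, length in TABLE_LENGTHS)
reserve = next_power_of_two(total)   # assumed sizing policy, for illustration

print(f"DSDT:    {dsdt} bytes ({dsdt / MIB:.2f} MiB)")
print(f"total:   {total} bytes ({total / MIB:.2f} MiB)")
print(f"reserve: {reserve // MIB} MiB")
```

Under these assumptions the tables add up to about 8.2 MiB, so a power-of-two reservation would indeed land on 16 MiB.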

Thanks

Yinghai


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Russ Anderson
On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin  wrote:
> >
> > BRK makes sense as long as you can set a sane O(1) size limit.
> >
> >>
> >>put the acpi override table in BRK, we still need ok from HPA.
> >>I have impression that he did not like it, so want to confirm from him.
> 
> on 8 sockets system:
> -rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
> -rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
> -rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
> -rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
> -rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
> -rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
> -rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
> -rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
> -rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
> -rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
> -rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
> -rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
> -rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
> -rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
> -rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
> -rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
> -rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
> -rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
> -rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat
> 
> assume for 32sockets will have four times bigger with DSDT and SSDT.
> (with more pci and cpus)
> 
> So we can not have O(1) the size.
> 
> Russ, What is ACPI table size on your big machine?

This is from a 256 socket 32TB system.

 Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
 ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
 ACPI: XSDT 7ef3d120 0007C (v01 INTEL  TIANO  0113)
 ACPI: FACP 7ef3a000 000F4 (v04 INTEL  TIANO MSFT 0113)
 ACPI: DSDT 7e6c3000 7F493E (v02   SGI2  UVX 0002 MSFT 0113)
 ACPI: FACS 7d147000 00040
 ACPI: UEFI 7ef3c000 0012A (v01  INTEL  RstScuO   )
 ACPI: UEFI 7ef3b000 0005C (v01  INTEL  RstScuV   )
 ACPI: HPET 7ef39000 00038 (v01 INTEL  TIANO0001 MSFT 0113)
 ACPI: SSDT 7ef33000 05352 (v02  INTEL ROSECITY 0003 INTL 20070508)
 ACPI: SLIT 7ef1 1002C (v01   SGI2  UVX 0002 MSFT 0001)
 ACPI: APIC 7000 10070 (v03   SGI2  UVX 0002 MSFT 0001)
 ACPI: SRAT 7eeb8000 1A830 (v03   SGI2  UVX 0002 MSFT 0001)
 ACPI: MCFG 7d6d4000 0105C (v01   SGI2  UVX 0002 MSFT 0001)
 ACPI: SPCR 7e6c2000 00050 (v01   )
 ACPI: DMAR 7d6d3000 0013C (v01 INTEL  TIANO0001 MSFT 0113)

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc  r...@sgi.com


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Yinghai Lu
On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin  wrote:
>
> BRK makes sense as long as you can set a sane O(1) size limit.
>
>>
>>put the acpi override table in BRK, we still need ok from HPA.
>>I have impression that he did not like it, so want to confirm from him.

On an 8-socket system:
-rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
-rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
-rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
-rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
-rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
-rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
-rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
-rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
-rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
-rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
-rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
-rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
-rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
-rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
-rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
-rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
-rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
-rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
-rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
-rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
-rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
-rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
-rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat

Assume that on a 32-socket system the DSDT and SSDT will be about four
times bigger (with more PCI devices and CPUs).

So we cannot put an O(1) limit on the size.
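
A rough Python sketch of that estimate, using the DSDT and SSDT sizes from the 8-socket listing above. The linear scaling with socket count and the helper name `estimate_scaled_size` are assumptions for illustration, not measurements.

```python
# Sizes in bytes, copied from the 8-socket listing above.
TABLES_8_SOCKETS = {
    "DSDT": 83509,
    "SSDT1": 100,
    "SSDT2": 283527,
}

def estimate_scaled_size(sizes, base_sockets, target_sockets):
    """Assume (hypothetically) that table size grows linearly with sockets."""
    factor = target_sockets / base_sockets
    return int(sum(sizes.values()) * factor)

base_total = sum(TABLES_8_SOCKETS.values())
est_32 = estimate_scaled_size(TABLES_8_SOCKETS, 8, 32)

print(f"8 sockets:  {base_total} bytes")
print(f"32 sockets: ~{est_32} bytes ({est_32 / 2**20:.2f} MiB)")
```

The point is only that the total grows with the machine, so any fixed limit picked today can be outgrown by a larger system.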

Russ, what is the ACPI table size on your big machine?

Thanks

Yinghai


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Toshi Kani
Hello Zhang,

On Sat, 2013-08-24 at 00:54 +0800, Zhang Yanfei wrote:
> > Tang, what do you think?  Are you OK to try Tejun's suggestion as well? 
> > 
> 
> By saying TJ's suggestion, you mean, we will let memblock to control the
> behaviour, that said, we will do early allocations near the kernel image
> range before we get the SRAT info?

Right.

> If so, yeah, we have been working on this direction. 

Great!

> By doing this, we may
> have two main changes:
> 
> 1. change some of memblock's APIs to make it have the ability to allocate
>memory from low address.
> 2. setup kernel page table down-top. Concretely, we first map the memory
>just after the kernel image to the top, then, we map 0 - kernel image end.
> 
> Do you guys think this is reasonable and acceptable?

Have you also looked at Yinghai's comments below?

http://www.spinics.net/lists/linux-mm/msg61362.html

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Zhang Yanfei
Hi Toshi,

On 08/24/2013 01:13 AM, Toshi Kani wrote:
> Hello,
> 
> On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
>> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
>>> I still think acpi table info should be available earlier, but I do not
>>> think I can convince you on this.  This can be religious debate.
>>
>> I'm curious.  If there aren't substantial enough benefits, why would
>> you still want to pull it earlier when it brings in things like initrd
>> override and crafting the code carefully so that it's safe to execute
>> it from different address modes and so on?  Please note that x86 is
>> not ia64.  The early environment is completely different not only
>> technically but also in its diversity and suckiness.  It wasn't too
>> long ago that vendors were screwing up ACPI left and right.  It has
>> been getting better but there's a reason why, for example, we still
>> consider e820 to be the authoritative information over ACPI.
> 
> Firmware generates tables, and provides them via some interface.  Memory
> map table can be provided via e820 or EFI memory map.  Memory topology
> table is provided via ACPI.  I agree to prioritize one table over the
> other when there is overlap.  But in the end, it is the firmware that
> generates the tables.  Because it is provided via ACPI does not make it
> suddenly unreliable.  I think table info from e820/EFI/ACPI should be
> available at the same time.  To me, it makes more sense to use the
> hotplug info to initialize memblock than try to find a way to workaround
> without it.  

Yeah, agreed. But, sigh, on x86 we have the ACPI initrd override, so we
still cannot convince TJ.

> I think we will continue to be in that way to find a
> workaround in this direction. 
> 
> I came from ia64 background, and am not very familiar with x86.  So, you
> may be very right about that x86 is different.  I also agree that initrd
> is making it unnecessarily complicated.  We may see some initial issues,
> but my hope is that the code gets matured over the time.
> 
> Thanks,
> -Toshi
> 


-- 
Thanks.
Zhang Yanfei


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Toshi Kani
Hello,

On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> > I still think acpi table info should be available earlier, but I do not
> > think I can convince you on this.  This can be religious debate.
> 
> I'm curious.  If there aren't substantial enough benefits, why would
> you still want to pull it earlier when it brings in things like initrd
> override and crafting the code carefully so that it's safe to execute
> it from different address modes and so on?  Please note that x86 is
> not ia64.  The early environment is completely different not only
> technically but also in its diversity and suckiness.  It wasn't too
> long ago that vendors were screwing up ACPI left and right.  It has
> been getting better but there's a reason why, for example, we still
> consider e820 to be the authoritative information over ACPI.

Firmware generates tables, and provides them via some interface.  Memory
map table can be provided via e820 or EFI memory map.  Memory topology
table is provided via ACPI.  I agree to prioritize one table over the
other when there is overlap.  But in the end, it is the firmware that
generates the tables.  Because it is provided via ACPI does not make it
suddenly unreliable.  I think table info from e820/EFI/ACPI should be
available at the same time.  To me, it makes more sense to use the
hotplug info to initialize memblock than try to find a way to workaround
without it.  I think we will continue to be in that way to find a
workaround in this direction. 

I came from ia64 background, and am not very familiar with x86.  So, you
may be very right about that x86 is different.  I also agree that initrd
is making it unnecessarily complicated.  We may see some initial issues,
but my hope is that the code gets matured over the time.

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Zhang Yanfei
Hello

On 08/24/2013 12:14 AM, Toshi Kani wrote:
> Hello,
> 
> On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
>> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
>>> I am relatively new to Linux, so I am not a good person to elaborate
>>> this.  From my experience on other OS, huge pages helped for the kernel,
>>> but did not necessarily help user applications.  It depended on
>>> applications, which were not niche cases.  But Linux may be different,
>>> so I asked since you seemed confident.  I'd appreciate if you can point
>>> us some data that endorses your statement.
>>
>> We are talking about the kernel linear mapping which is created during
>> early boot, so if it's available and useable there's no reason not to
>> use it.  Exceptions would be earlier processors which didn't do 1G
>> mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
>> configurations, the former has been history for a bit now.  Can't be
>> sure about the latter but it'd be surprising for that to affect large
>> amount of memory in the systems that are of interest here.  Ooh, that
>> reminds me that we probably wanna go back to 1G + MTRR mapping under
>> 4G.  We're currently creating a lot of mapping holes.
> 
> Thanks for the explanation.
> 
>>> My worry is that the code is unlikely tested with the special logic when
>>> someone makes code changes to the page tables.  Such code can easily be
>>> broken in future.
>>
>> Well, I wouldn't consider flipping the direction of allocation to be
>> particularly difficult to get right especially when compared to
>> bringing in ACPI tables into the mix.
>>
>>> To answer your other question/email, I believe Tang's next step is to
>>> support local page tables.  This is why we think pursing SRAT earlier is
>>> the right direction.
>>
>> Given 1G mappings, is that even a worthwhile effort?  I'm getting even
>> more more skeptical.
> 
> With 1G mappings, I agree that it won't make much difference.
> 
> I still think acpi table info should be available earlier, but I do not
> think I can convince you on this.  This can be religious debate.
> 
> Tang, what do you think?  Are you OK to try Tejun's suggestion as well? 
> 

By TJ's suggestion, you mean that we let memblock control the behaviour,
that is, we do early allocations near the kernel image range before we get
the SRAT info?

If so, yeah, we have been working in this direction. Doing this requires
two main changes:

1. Change some of memblock's APIs so that it can allocate memory from low
   addresses.
2. Set up the kernel page tables bottom-up. Concretely, we first map the
   memory from just after the kernel image to the top, and then map 0 to
   the end of the kernel image.

Do you guys think this is reasonable and acceptable?
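
To illustrate the first change, here is a toy model in Python of allocation direction. It is not the memblock API: `alloc`, the `floor` parameter, the free ranges and the kernel image end address are all made up for the example, and the allocator does not track what it has handed out.

```python
def alloc(free_ranges, size, bottom_up, floor=0):
    """Pick an address for `size` bytes from free (start, end) ranges.

    bottom_up=True  -> lowest fitting address at or above `floor`
    bottom_up=False -> highest fitting address (top-down)
    Returns None if nothing fits.
    """
    candidates = []
    for start, end in free_ranges:
        lo = max(start, floor) if bottom_up else start
        if end - lo >= size:
            candidates.append(lo if bottom_up else end - size)
    if not candidates:
        return None
    return min(candidates) if bottom_up else max(candidates)

# Hypothetical layout: kernel image occupies 16 MiB..32 MiB.
KERNEL_END = 0x02000000
FREE = [
    (0x00100000, 0x01000000),     # free memory below the kernel image
    (KERNEL_END, 0x100000000),    # free memory above it, up to 4 GiB
]

top_down = alloc(FREE, 0x1000, bottom_up=False)
near_kernel = alloc(FREE, 0x1000, bottom_up=True, floor=KERNEL_END)

print(hex(top_down))      # lands at the very top of memory
print(hex(near_kernel))   # lands right after the kernel image
```

With top-down allocation the early page lands wherever the highest free memory is, which may turn out to be hotpluggable once SRAT is parsed. Bottom-up allocation from the end of the kernel image keeps it next to memory that the kernel already occupies.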

-- 
Thanks.
Zhang Yanfei


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Tejun Heo
Hello,

On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> I still think acpi table info should be available earlier, but I do not
> think I can convince you on this.  This can be religious debate.

I'm curious.  If there aren't substantial enough benefits, why would
you still want to pull it earlier when it brings in things like initrd
override and crafting the code carefully so that it's safe to execute
it from different address modes and so on?  Please note that x86 is
not ia64.  The early environment is completely different not only
technically but also in its diversity and suckiness.  It wasn't too
long ago that vendors were screwing up ACPI left and right.  It has
been getting better but there's a reason why, for example, we still
consider e820 to be the authoritative information over ACPI.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Toshi Kani
Hello,

On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> > I am relatively new to Linux, so I am not a good person to elaborate
> > this.  From my experience on other OS, huge pages helped for the kernel,
> > but did not necessarily help user applications.  It depended on
> > applications, which were not niche cases.  But Linux may be different,
> > so I asked since you seemed confident.  I'd appreciate if you can point
> > us some data that endorses your statement.
> 
> We are talking about the kernel linear mapping which is created during
> early boot, so if it's available and useable there's no reason not to
> use it.  Exceptions would be earlier processors which didn't do 1G
> mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
> configurations, the former has been history for a bit now.  Can't be
> sure about the latter but it'd be surprising for that to affect large
> amount of memory in the systems that are of interest here.  Ooh, that
> reminds me that we probably wanna go back to 1G + MTRR mapping under
> 4G.  We're currently creating a lot of mapping holes.

Thanks for the explanation.

> > My worry is that the code is unlikely tested with the special logic when
> > someone makes code changes to the page tables.  Such code can easily be
> > broken in future.
> 
> Well, I wouldn't consider flipping the direction of allocation to be
> particularly difficult to get right especially when compared to
> bringing in ACPI tables into the mix.
> 
> > To answer your other question/email, I believe Tang's next step is to
> > support local page tables.  This is why we think pursing SRAT earlier is
> > the right direction.
> 
> Given 1G mappings, is that even a worthwhile effort?  I'm getting even
> more more skeptical.

With 1G mappings, I agree that it won't make much difference.

I still think acpi table info should be available earlier, but I do not
think I can convince you on this.  This can be religious debate.

Tang, what do you think?  Are you OK to try Tejun's suggestion as well? 

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Tejun Heo
On Fri, Aug 23, 2013 at 10:35:07AM -0400, Tejun Heo wrote:
> Yeah, it's true that MTRRs are nasty.  On the other hand, we've been
> doing that for over a decade and are still doing it anyway if I'm not
> mistaken.  It probably isn't a big difference but it's still a bit sad
> that this is likely causing small performance regression out in the
> wild.

Just went over the processor manual and it doesn't seem like doing the
above would be a good idea.


  System Programming Guide, Part 1

  11.11.9 Large Page Size Considerations

 ... 
 Because the memory type for a large page is cached in the TLB, the
 processor can behave in an undefined manner if a large page is mapped
 to a region of memory that MTRRs have mapped with multiple memory
 types.
 ...
 If a large page maps to a region of memory containing different
 MTRR-defined memory types, the PCD and PWT flags in the page-table
 entry should be set for the most conservative memory type for that
 range. For example, a large page used for memory mapped I/O and
 regular memory
 ...

 The Pentium 4, Intel Xeon, and P6 family processors provide special
 support for the physical memory range from 0 to 4 MBytes,
 ...
 Here, the processor maps the memory range as multiple 4-KByte pages
 within the TLB. This operation insures correct behavior at the cost
 of performance. To avoid this performance penalty, operating-system
 software should reserve the large page option for regions of memory
 at addresses greater than or equal to 4 MBytes.

So, yeah, the current behavior seems like the right thing to do.
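
The rule quoted from the manual can be read as a check to make before using a large page: the page must not straddle regions with different MTRR memory types. Below is a minimal sketch of such a check. The names (struct mt_range, large_page_type_uniform) and the flat range table are assumptions made for illustration only; this is not the kernel's MTRR code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-type codes; only equality matters for this check. */
enum mem_type { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB };

/* Hypothetical description of one MTRR-like range: [start, end). */
struct mt_range {
    uint64_t start;
    uint64_t end;
    enum mem_type type;
};

/*
 * Return true if every byte of [page_start, page_start + page_size)
 * has the same memory type.  Bytes not covered by any range use
 * def_type.  The ranges are assumed not to overlap each other.
 */
bool large_page_type_uniform(const struct mt_range *r, size_t n,
                             enum mem_type def_type,
                             uint64_t page_start, uint64_t page_size)
{
    uint64_t page_end = page_start + page_size;
    uint64_t covered = 0;
    bool have_type = false;
    enum mem_type type = def_type;

    assert(page_size > 0);

    for (size_t i = 0; i < n; i++) {
        uint64_t lo = r[i].start > page_start ? r[i].start : page_start;
        uint64_t hi = r[i].end < page_end ? r[i].end : page_end;

        if (lo >= hi)
            continue;           /* range does not touch this page */
        if (have_type && r[i].type != type)
            return false;       /* two explicit types inside one page */
        type = r[i].type;
        have_type = true;
        covered += hi - lo;
    }

    /* Uncovered bytes fall back to the default type. */
    if (have_type && covered < page_size && type != def_type)
        return false;
    return true;
}
```

With a table that marks 0xA0000-0x100000 as UC inside otherwise WB memory, a 2M page at address 0 fails the check while a 2M page at 0x200000 passes, which matches the point that low memory is where large pages cause trouble.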

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Tejun Heo
Hello,

On Fri, Aug 23, 2013 at 04:24:06PM +0200, H. Peter Anvin wrote:
> Well... relying on MTRRs is a big cost in complexity and failure modes.

Yeah, it's true that MTRRs are nasty.  On the other hand, we've been
doing that for over a decade and are still doing it anyway if I'm not
mistaken.  It probably isn't a big difference but it's still a bit sad
that this is likely causing small performance regression out in the
wild.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread H. Peter Anvin
Well... relying on MTRRs is a big cost in complexity and failure modes.

Tejun Heo  wrote:
>Hello,
>
>On Fri, Aug 23, 2013 at 03:08:55PM +0200, H. Peter Anvin wrote:
>> What is the point of 1G+MTRR?  If there are caching differences the
>> TLB will fracture the pages anyway.
>
>Ah, right.  Consuming less memory / cachelines would still be a small
>advantage tho unless creating split TLB from larger mapping is
>noticeably less efficient.  If the extra logic to do that is small,
>which I think it'd be, it'd be a gain at almost no cost.
>
>Thanks.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Tejun Heo
Hello,

On Fri, Aug 23, 2013 at 03:08:55PM +0200, H. Peter Anvin wrote:
> What is the point of 1G+MTRR?  If there are caching differences the
> TLB will fracture the pages anyway.

Ah, right.  Consuming less memory / cachelines would still be a small
advantage tho unless creating split TLB from larger mapping is
noticeably less efficient.  If the extra logic to do that is small,
which I think it'd be, it'd be a gain at almost no cost.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread H. Peter Anvin
What is the point of 1G+MTRR?  If there are caching differences the TLB will 
fracture the pages anyway.

Tejun Heo  wrote:
>Hello, Toshi.
>
>On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
>> I am relatively new to Linux, so I am not a good person to elaborate
>> this.  From my experience on other OS, huge pages helped for the kernel,
>> but did not necessarily help user applications.  It depended on
>> applications, which were not niche cases.  But Linux may be different,
>> so I asked since you seemed confident.  I'd appreciate if you can point
>> us some data that endorses your statement.
>
>We are talking about the kernel linear mapping which is created during
>early boot, so if it's available and useable there's no reason not to
>use it.  Exceptions would be earlier processors which didn't do 1G
>mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
>configurations, the former has been history for a bit now.  Can't be
>sure about the latter but it'd be surprising for that to affect large
>amount of memory in the systems that are of interest here.  Ooh, that
>reminds me that we probably wanna go back to 1G + MTRR mapping under
>4G.  We're currently creating a lot of mapping holes.
>
>> My worry is that the code is unlikely tested with the special logic when
>> someone makes code changes to the page tables.  Such code can easily be
>> broken in future.
>
>Well, I wouldn't consider flipping the direction of allocation to be
>particularly difficult to get right especially when compared to
>bringing in ACPI tables into the mix.
>
>> To answer your other question/email, I believe Tang's next step is to
>> support local page tables.  This is why we think pursuing SRAT earlier is
>> the right direction.
>
>Given 1G mappings, is that even a worthwhile effort?  I'm getting even
>more skeptical.
>
>Thanks.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Tejun Heo
Hello, Toshi.

On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> I am relatively new to Linux, so I am not a good person to elaborate
> this.  From my experience on other OS, huge pages helped for the kernel,
> but did not necessarily help user applications.  It depended on
> applications, which were not niche cases.  But Linux may be different,
> so I asked since you seemed confident.  I'd appreciate if you can point
> us some data that endorses your statement.

We are talking about the kernel linear mapping which is created during
early boot, so if it's available and useable there's no reason not to
use it.  Exceptions would be earlier processors which didn't do 1G
mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
configurations, the former has been history for a bit now.  Can't be
sure about the latter but it'd be surprising for that to affect large
amount of memory in the systems that are of interest here.  Ooh, that
reminds me that we probably wanna go back to 1G + MTRR mapping under
4G.  We're currently creating a lot of mapping holes.
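
To make the point about holes concrete, here is a rough sketch of how much of a RAM range can be covered by naturally aligned 1G pages. The helper names and the flat range list are made up for illustration; the real linear-mapping setup also falls back to 2M and 4K pages for the remainder, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1G (UINT64_C(1) << 30)

/* Hypothetical RAM range [start, end), e.g. one usable e820 entry. */
struct ram_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Bytes of [start, end) that can be mapped with naturally aligned
 * 1G pages.  Assumes start + SZ_1G does not overflow.
 */
uint64_t bytes_mappable_1g(uint64_t start, uint64_t end)
{
    uint64_t lo, hi;

    assert(start <= end);
    lo = (start + SZ_1G - 1) & ~(SZ_1G - 1);   /* round start up */
    hi = end & ~(SZ_1G - 1);                   /* round end down */
    return hi > lo ? hi - lo : 0;
}

/* Sum of the 1G-mappable bytes over several separate ranges. */
uint64_t total_mappable_1g(const struct ram_range *r, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += bytes_mappable_1g(r[i].start, r[i].end);
    return sum;
}
```

A contiguous 0-4G range is fully coverable, but a 1M hole just below 2G already costs a whole 1G page worth of coverage, which is why a map with many holes is the exception case.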

> My worry is that the code is unlikely tested with the special logic when
> someone makes code changes to the page tables.  Such code can easily be
> broken in future.

Well, I wouldn't consider flipping the direction of allocation to be
particularly difficult to get right especially when compared to
bringing in ACPI tables into the mix.

> To answer your other question/email, I believe Tang's next step is to
> support local page tables.  This is why we think pursuing SRAT earlier is
> the right direction.

Given 1G mappings, is that even a worthwhile effort?  I'm getting even
more skeptical.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Tejun Heo
Hello,

On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> I still think acpi table info should be available earlier, but I do not
> think I can convince you on this.  This can be religious debate.

I'm curious.  If there aren't substantial enough benefits, why would
you still want to pull it earlier when it brings in things like initrd
override and crafting the code carefully so that it's safe to execute
it from different address modes and so on?  Please note that x86 is
not ia64.  The early environment is completely different not only
technically but also in its diversity and suckiness.  It wasn't too
long ago that vendors were screwing up ACPI left and right.  It has
been getting better but there's a reason why, for example, we still
consider e820 to be the authoritative information over ACPI.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Zhang Yanfei
Hello

On 08/24/2013 12:14 AM, Toshi Kani wrote:
> Hello,
>
> On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
>> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
>>> I am relatively new to Linux, so I am not a good person to elaborate
>>> this.  From my experience on other OS, huge pages helped for the kernel,
>>> but did not necessarily help user applications.  It depended on
>>> applications, which were not niche cases.  But Linux may be different,
>>> so I asked since you seemed confident.  I'd appreciate if you can point
>>> us some data that endorses your statement.
>>
>> We are talking about the kernel linear mapping which is created during
>> early boot, so if it's available and useable there's no reason not to
>> use it.  Exceptions would be earlier processors which didn't do 1G
>> mappings or e820 maps with a lot of holes.  For CPUs used in NUMA
>> configurations, the former has been history for a bit now.  Can't be
>> sure about the latter but it'd be surprising for that to affect large
>> amount of memory in the systems that are of interest here.  Ooh, that
>> reminds me that we probably wanna go back to 1G + MTRR mapping under
>> 4G.  We're currently creating a lot of mapping holes.
>
> Thanks for the explanation.
>
>>> My worry is that the code is unlikely tested with the special logic when
>>> someone makes code changes to the page tables.  Such code can easily be
>>> broken in future.
>>
>> Well, I wouldn't consider flipping the direction of allocation to be
>> particularly difficult to get right especially when compared to
>> bringing in ACPI tables into the mix.
>>
>>> To answer your other question/email, I believe Tang's next step is to
>>> support local page tables.  This is why we think pursuing SRAT earlier is
>>> the right direction.
>>
>> Given 1G mappings, is that even a worthwhile effort?  I'm getting even
>> more skeptical.
>
> With 1G mappings, I agree that it won't make much difference.
>
> I still think acpi table info should be available earlier, but I do not
> think I can convince you on this.  This can be religious debate.
>
> Tang, what do you think?  Are you OK to try Tejun's suggestion as well?
>

By TJ's suggestion, you mean that we let memblock control the behaviour,
that is, we do early allocations near the kernel image range before we
get the SRAT info?

If so, yeah, we have been working in this direction. Doing this involves
two main changes:

1. Change some of memblock's APIs so that it can allocate memory from
   low addresses.
2. Set up the kernel page table bottom-up. Concretely, we first map the
   memory from just after the kernel image to the top, then map the range
   from 0 to the end of the kernel image.

Do you guys think this is reasonable and acceptable?
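
As a rough illustration of change 1, the difference between the two allocation directions can be sketched over a sorted list of free ranges. Everything here (struct free_range, find_bottom_up, find_top_down, ALLOC_FAIL) is a made-up toy and not memblock's actual interface; it only shows that a bottom-up search starting at the end of the kernel image returns memory next to the image, while a top-down search returns the highest address that fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free range [start, end); the array is sorted ascending. */
struct free_range {
    uint64_t start;
    uint64_t end;
};

#define ALLOC_FAIL UINT64_MAX

static uint64_t align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

static uint64_t align_down(uint64_t x, uint64_t a)
{
    return x & ~(a - 1);
}

/* Lowest aligned base at or above min_addr where size bytes fit. */
uint64_t find_bottom_up(const struct free_range *r, size_t n,
                        uint64_t min_addr, uint64_t size, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0);    /* power of two */

    for (size_t i = 0; i < n; i++) {
        uint64_t lo = r[i].start > min_addr ? r[i].start : min_addr;
        uint64_t base = align_up(lo, align);

        if (base < r[i].end && r[i].end - base >= size)
            return base;
    }
    return ALLOC_FAIL;
}

/* Highest aligned base where size bytes fit (the top-down direction). */
uint64_t find_top_down(const struct free_range *r, size_t n,
                       uint64_t size, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0);    /* power of two */

    for (size_t i = n; i-- > 0; ) {
        uint64_t base;

        if (r[i].end - r[i].start < size)
            continue;
        base = align_down(r[i].end - size, align);
        if (base >= r[i].start)
            return base;
    }
    return ALLOC_FAIL;
}
```

Passing the end of the kernel image as min_addr keeps early allocations close to the image, which is the behaviour wanted before SRAT has been parsed.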

-- 
Thanks.
Zhang Yanfei


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Toshi Kani
Hello,

On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> > I still think acpi table info should be available earlier, but I do not
> > think I can convince you on this.  This can be religious debate.
> 
> I'm curious.  If there aren't substantial enough benefits, why would
> you still want to pull it earlier when it brings in things like initrd
> override and crafting the code carefully so that it's safe to execute
> it from different address modes and so on?  Please note that x86 is
> not ia64.  The early environment is completely different not only
> technically but also in its diversity and suckiness.  It wasn't too
> long ago that vendors were screwing up ACPI left and right.  It has
> been getting better but there's a reason why, for example, we still
> consider e820 to be the authoritative information over ACPI.

Firmware generates tables and provides them via some interface.  The
memory map table can be provided via e820 or the EFI memory map.  The
memory topology table is provided via ACPI.  I agree with prioritizing
one table over the other when they overlap.  But in the end, it is the
firmware that generates the tables.  Being provided via ACPI does not
suddenly make a table unreliable.  I think table info from e820/EFI/ACPI
should be available at the same time.  To me, it makes more sense to use
the hotplug info to initialize memblock than to try to find a way to work
around not having it.  I think we will keep having to find workarounds if
we continue in this direction.

I come from an ia64 background, and am not very familiar with x86.  So you
may well be right that x86 is different.  I also agree that initrd is
making it unnecessarily complicated.  We may see some initial issues, but
my hope is that the code matures over time.

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Zhang Yanfei
Hi Toshi,

On 08/24/2013 01:13 AM, Toshi Kani wrote:
> Hello,
>
> On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
>> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
>>> I still think acpi table info should be available earlier, but I do not
>>> think I can convince you on this.  This can be religious debate.
>>
>> I'm curious.  If there aren't substantial enough benefits, why would
>> you still want to pull it earlier when it brings in things like initrd
>> override and crafting the code carefully so that it's safe to execute
>> it from different address modes and so on?  Please note that x86 is
>> not ia64.  The early environment is completely different not only
>> technically but also in its diversity and suckiness.  It wasn't too
>> long ago that vendors were screwing up ACPI left and right.  It has
>> been getting better but there's a reason why, for example, we still
>> consider e820 to be the authoritative information over ACPI.
>
> Firmware generates tables and provides them via some interface.  The
> memory map table can be provided via e820 or the EFI memory map.  The
> memory topology table is provided via ACPI.  I agree with prioritizing
> one table over the other when they overlap.  But in the end, it is the
> firmware that generates the tables.  Being provided via ACPI does not
> suddenly make a table unreliable.  I think table info from e820/EFI/ACPI
> should be available at the same time.  To me, it makes more sense to use
> the hotplug info to initialize memblock than to try to find a way to work
> around not having it.

Yeah, agreed. But, sigh, on x86 we have the ACPI initrd override, so we
still cannot convince TJ.

> I think we will keep having to find workarounds if
> we continue in this direction.
>
> I come from an ia64 background, and am not very familiar with x86.  So you
> may well be right that x86 is different.  I also agree that initrd is
> making it unnecessarily complicated.  We may see some initial issues, but
> my hope is that the code matures over time.
>
> Thanks,
> -Toshi
>


-- 
Thanks.
Zhang Yanfei


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Toshi Kani
Hello Zhang,

On Sat, 2013-08-24 at 00:54 +0800, Zhang Yanfei wrote:
> > Tang, what do you think?  Are you OK to try Tejun's suggestion as well?
> >
>
> By TJ's suggestion, you mean that we let memblock control the behaviour,
> that is, we do early allocations near the kernel image range before we
> get the SRAT info?

Right.

> If so, yeah, we have been working in this direction.

Great!

> Doing this involves
> two main changes:
>
> 1. Change some of memblock's APIs so that it can allocate memory from
>    low addresses.
> 2. Set up the kernel page table bottom-up. Concretely, we first map the
>    memory from just after the kernel image to the top, then map the range
>    from 0 to the end of the kernel image.
>
> Do you guys think this is reasonable and acceptable?

Have you also looked at Yinghai's comments below?

http://www.spinics.net/lists/linux-mm/msg61362.html

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Yinghai Lu
On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
>
> BRK makes sense as long as you can set a sane O(1) size limit.
>
>>
>>put the acpi override table in BRK, we still need ok from HPA.
>>I have impression that he did not like it, so want to confirm from him.

on 8 sockets system:
-rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
-rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
-rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
-rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
-rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
-rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
-rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
-rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
-rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
-rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
-rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
-rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
-rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
-rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
-rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
-rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
-rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
-rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
-rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
-rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
-rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
-rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
-rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat

Assume a 32-socket system will have a DSDT and SSDT four times bigger
(with more PCI devices and CPUs).

So we cannot put an O(1) limit on the size.
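
The estimate can be sanity-checked with a few lines of C. This is only a sketch: the sizes are copied from the 8-socket listing, the helper name scaled_total is made up for illustration, and scaling only the DSDT/SSDT entries by four follows the assumption stated above rather than any measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct acpi_file {
    const char *name;
    uint64_t bytes;
};

/* Sizes in bytes, copied from the 8-socket listing. */
static const struct acpi_file tables_8s[] = {
    { "APIC", 3532 },  { "BDAT", 48 },    { "DMAR", 824 },
    { "DSDT", 83509 }, { "FACP", 244 },   { "FACS", 64 },
    { "FPDT", 68 },    { "HPET", 56 },    { "MCEJ", 304 },
    { "MCFG", 60 },    { "MPST", 6712 },  { "MSCT", 232 },
    { "PCCT", 172 },   { "PMCT", 96 },    { "RASF", 48 },
    { "SLIT", 108 },   { "SPCR", 80 },    { "SPMI", 65 },
    { "SRAT", 6448 },  { "SSDT1", 100 },  { "SSDT2", 283527 },
    { "UEFI", 66 },    { "WDDT", 64 },
};

#define N_TABLES_8S (sizeof(tables_8s) / sizeof(tables_8s[0]))

/*
 * Total size when DSDT and SSDT* entries grow by `factor` and all
 * other tables stay the same (an assumption, see above).
 */
uint64_t scaled_total(const struct acpi_file *t, size_t n, uint64_t factor)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        int scales = !strncmp(t[i].name, "DSDT", 4) ||
                     !strncmp(t[i].name, "SSDT", 4);

        sum += t[i].bytes * (scales ? factor : 1);
    }
    return sum;
}
```

With factor 1 this gives 386427 bytes for the 8-socket box; with factor 4 it gives 1487835 bytes, roughly 1.4 MiB, and the figure keeps growing with the socket count.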

Russ, What is ACPI table size on your big machine?

Thanks

Yinghai


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Russ Anderson
On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
> >
> > BRK makes sense as long as you can set a sane O(1) size limit.
> >
> >>
> >>put the acpi override table in BRK, we still need ok from HPA.
> >>I have impression that he did not like it, so want to confirm from him.
> 
> on 8 sockets system:
> -rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
> -rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
> -rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
> -rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
> -rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
> -rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
> -rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
> -rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
> -rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
> -rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
> -rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
> -rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
> -rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
> -rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
> -rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
> -rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
> -rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
> -rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
> -rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat
> 
> Assume a 32-socket system will have a DSDT and SSDT four times bigger
> (with more PCI devices and CPUs).
> 
> So we cannot put an O(1) limit on the size.
> 
> Russ, What is ACPI table size on your big machine?

This is from a 256 socket 32TB system.

 Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
 ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
 ACPI: XSDT 7ef3d120 0007C (v01 INTEL  TIANO  0113)
 ACPI: FACP 7ef3a000 000F4 (v04 INTEL  TIANO MSFT 0113)
 ACPI: DSDT 7e6c3000 7F493E (v02   SGI2  UVX 0002 MSFT 0113)
 ACPI: FACS 7d147000 00040
 ACPI: UEFI 7ef3c000 0012A (v01  INTEL  RstScuO   )
 ACPI: UEFI 7ef3b000 0005C (v01  INTEL  RstScuV   )
 ACPI: HPET 7ef39000 00038 (v01 INTEL  TIANO0001 MSFT 0113)
 ACPI: SSDT 7ef33000 05352 (v02  INTEL ROSECITY 0003 INTL 20070508)
 ACPI: SLIT 7ef1 1002C (v01   SGI2  UVX 0002 MSFT 0001)
 ACPI: APIC 7000 10070 (v03   SGI2  UVX 0002 MSFT 0001)
 ACPI: SRAT 7eeb8000 1A830 (v03   SGI2  UVX 0002 MSFT 0001)
 ACPI: MCFG 7d6d4000 0105C (v01   SGI2  UVX 0002 MSFT 0001)
 ACPI: SPCR 7e6c2000 00050 (v01   )
 ACPI: DMAR 7d6d3000 0013C (v01 INTEL  TIANO0001 MSFT 0113)

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc  r...@sgi.com


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Yinghai Lu
On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson r...@sgi.com wrote:
> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
>> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
>>
>> Russ, what is the ACPI table size on your big machine?
>
> This is from a 256 socket 32TB system.
>
>  Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
>  ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
>  ACPI: XSDT 7ef3d120 0007C (v01 INTEL  TIANO  0113)
>  ACPI: FACP 7ef3a000 000F4 (v04 INTEL  TIANO MSFT 0113)
>  ACPI: DSDT 7e6c3000 7F493E (v02   SGI2  UVX 0002 MSFT 0113)
>  ACPI: FACS 7d147000 00040
>  ACPI: UEFI 7ef3c000 0012A (v01  INTEL  RstScuO   )
>  ACPI: UEFI 7ef3b000 0005C (v01  INTEL  RstScuV   )
>  ACPI: HPET 7ef39000 00038 (v01 INTEL  TIANO0001 MSFT 0113)
>  ACPI: SSDT 7ef33000 05352 (v02  INTEL ROSECITY 0003 INTL 20070508)
>  ACPI: SLIT 7ef1 1002C (v01   SGI2  UVX 0002 MSFT 0001)
>  ACPI: APIC 7000 10070 (v03   SGI2  UVX 0002 MSFT 0001)
>  ACPI: SRAT 7eeb8000 1A830 (v03   SGI2  UVX 0002 MSFT 0001)
>  ACPI: MCFG 7d6d4000 0105C (v01   SGI2  UVX 0002 MSFT 0001)
>  ACPI: SPCR 7e6c2000 00050 (v01   )
>  ACPI: DMAR 7d6d3000 0013C (v01 INTEL  TIANO0001 MSFT 0113)


So the DSDT is 0x7F493E bytes, and the total is more than 8M.

Would that need a 16M BRK extension?
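
The size estimate can be checked directly from the length column of the quoted boot log. A minimal sketch, assuming the second hex field on each "ACPI:" line is the table length in bytes; the `LENGTHS` dictionary and the `MiB` constant are names introduced here for illustration, and the two UEFI tables are keyed `UEFI1`/`UEFI2` only to keep them distinct:

```python
# Table lengths (hex, bytes) copied from the "ACPI:" lines of the boot log.
LENGTHS = {
    "RSDP": "00024", "XSDT": "0007C", "FACP": "000F4", "DSDT": "7F493E",
    "FACS": "00040", "UEFI1": "0012A", "UEFI2": "0005C", "HPET": "00038",
    "SSDT": "05352", "SLIT": "1002C", "APIC": "10070", "SRAT": "1A830",
    "MCFG": "0105C", "SPCR": "00050", "DMAR": "0013C",
}
MiB = 1024 * 1024

# Convert every hex length to an integer byte count.
sizes = {name: int(length, 16) for name, length in LENGTHS.items()}
dsdt = sizes["DSDT"]
total = sum(sizes.values())

print(f"DSDT : {dsdt} bytes ({dsdt / MiB:.2f} MiB)")
print(f"total: {total} bytes ({total / MiB:.2f} MiB)")
print("fits in 8 MiB:", total <= 8 * MiB)
```

The DSDT alone comes to 8341822 bytes, just under 8 MiB (8388608 bytes); adding the other tables gives 8608470 bytes, which is over that mark.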

Thanks

Yinghai


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread chen tang
Hi Yinghai,

2013/8/24 Yinghai Lu ying...@kernel.org:
> On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson r...@sgi.com wrote:
>> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
>>> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
>>>
>>> Russ, what is the ACPI table size on your big machine?
>>
>> This is from a 256 socket 32TB system.
>>
>>  Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
>>  ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
>>  ACPI: XSDT 7ef3d120 0007C (v01 INTEL  TIANO  0113)
>>  ACPI: FACP 7ef3a000 000F4 (v04 INTEL  TIANO MSFT 0113)
>>  ACPI: DSDT 7e6c3000 7F493E (v02   SGI2  UVX 0002 MSFT 0113)
>>  ACPI: FACS 7d147000 00040
>>  ACPI: UEFI 7ef3c000 0012A (v01  INTEL  RstScuO   )
>>  ACPI: UEFI 7ef3b000 0005C (v01  INTEL  RstScuV   )
>>  ACPI: HPET 7ef39000 00038 (v01 INTEL  TIANO0001 MSFT 0113)
>>  ACPI: SSDT 7ef33000 05352 (v02  INTEL ROSECITY 0003 INTL 20070508)
>>  ACPI: SLIT 7ef1 1002C (v01   SGI2  UVX 0002 MSFT 0001)
>>  ACPI: APIC 7000 10070 (v03   SGI2  UVX 0002 MSFT 0001)
>>  ACPI: SRAT 7eeb8000 1A830 (v03   SGI2  UVX 0002 MSFT 0001)
>>  ACPI: MCFG 7d6d4000 0105C (v01   SGI2  UVX 0002 MSFT 0001)
>>  ACPI: SPCR 7e6c2000 00050 (v01   )
>>  ACPI: DMAR 7d6d3000 0013C (v01 INTEL  TIANO0001 MSFT 0113)
>
>
> So the DSDT is 0x7F493E bytes, and the total is more than 8M.
>
> Would that need a 16M BRK extension?


Then how about using early_ioremap(), and not doing it as early as
head_32 and head64?

Thanks.


RE: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Moore, Robert
While we're at it:

Can someone send me the acpidump for this machine? We very much would like to 
test all of ACPICA with such a large DSDT.

Thanks,
Bob


> -----Original Message-----
> From: chen tang [mailto:imtangc...@gmail.com]
> Sent: Friday, August 23, 2013 2:51 PM
> To: Yinghai Lu
> Cc: Russ Anderson; H. Peter Anvin; Zhang Yanfei; Toshi Kani; Tejun Heo;
> Tang Chen; Konrad Rzeszutek Wilk; Moore, Robert; Zheng, Lv; Rafael J.
> Wysocki; Ingo Molnar; Andrew Morton; Thomas Renninger; Yasuaki Ishimatsu;
> Mel Gorman; Linux Kernel Mailing List
> Subject: Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
>
> Hi Yinghai,
>
> 2013/8/24 Yinghai Lu ying...@kernel.org:
> > On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson r...@sgi.com wrote:
> >> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> >>> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
> >>>
> >>> Russ, what is the ACPI table size on your big machine?
> >>
> >> This is from a 256 socket 32TB system.
> >>
> >>  Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
> >>  ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
> >>  ACPI: XSDT 7ef3d120 0007C (v01 INTEL  TIANO  0113)
> >>  ACPI: FACP 7ef3a000 000F4 (v04 INTEL  TIANO MSFT 0113)
> >>  ACPI: DSDT 7e6c3000 7F493E (v02   SGI2  UVX 0002 MSFT 0113)
> >>  ACPI: FACS 7d147000 00040
> >>  ACPI: UEFI 7ef3c000 0012A (v01  INTEL  RstScuO   )
> >>  ACPI: UEFI 7ef3b000 0005C (v01  INTEL  RstScuV   )
> >>  ACPI: HPET 7ef39000 00038 (v01 INTEL  TIANO0001 MSFT 0113)
> >>  ACPI: SSDT 7ef33000 05352 (v02  INTEL ROSECITY 0003 INTL 20070508)
> >>  ACPI: SLIT 7ef1 1002C (v01   SGI2  UVX 0002 MSFT 0001)
> >>  ACPI: APIC 7000 10070 (v03   SGI2  UVX 0002 MSFT 0001)
> >>  ACPI: SRAT 7eeb8000 1A830 (v03   SGI2  UVX 0002 MSFT 0001)
> >>  ACPI: MCFG 7d6d4000 0105C (v01   SGI2  UVX 0002 MSFT 0001)
> >>  ACPI: SPCR 7e6c2000 00050 (v01   )
> >>  ACPI: DMAR 7d6d3000 0013C (v01 INTEL  TIANO0001 MSFT 0113)
> >
> >
> > So the DSDT is 0x7F493E bytes, and the total is more than 8M.
> >
> > Would that need a 16M BRK extension?
>
>
> Then how about using early_ioremap(), and not doing it as early as
> head_32 and head64?
>
> Thanks.


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Yinghai Lu
On Fri, Aug 23, 2013 at 2:52 PM, Moore, Robert robert.mo...@intel.com wrote:
> While we're at it:
>
> Can someone send me the acpidump for this machine? We very much would like to
> test all of ACPICA with such a large DSDT.

That is Russ.


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Yinghai Lu
On Fri, Aug 23, 2013 at 2:50 PM, chen tang imtangc...@gmail.com wrote:

>> So the DSDT is 0x7F493E bytes, and the total is more than 8M.
>>
>> Would that need a 16M BRK extension?
>
>
> Then how about using early_ioremap(), and not doing it as early as
> head_32 and head64?

Why would early_ioremap() help?

When would you use early_ioremap(), and what for?

Yinghai


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread chen tang
Hi Yinghai,

2013/8/24 Yinghai Lu ying...@kernel.org:
> On Fri, Aug 23, 2013 at 2:50 PM, chen tang imtangc...@gmail.com wrote:
>
>>> So the DSDT is 0x7F493E bytes, and the total is more than 8M.
>>>
>>> Would that need a 16M BRK extension?
>>
>>
>> Then how about using early_ioremap(), and not doing it as early as
>> head_32 and head64?
>
> Why would early_ioremap() help?
>
> When would you use early_ioremap(), and what for?


In my understanding, the ACPICA framework needs users to copy the override
tables somewhere in memory, and ACPICA picks up these user-specified tables
when installing the firmware tables. This is the ACPICA logic, which cannot
be changed, I think.

So we need to allocate memory. That is why you suggested using BRK, right?
And the size seems to be a problem.

So I suggest using early_ioremap().

1. After paging is enabled, but before the direct mapping page tables are
   set up, we map the initrd with early_ioremap(). We can then access it
   through a virtual address, even on 32-bit, and find all the tables.
2. We still use memblock to allocate memory. It may be hotpluggable memory,
   but this memory can be freed once all the ACPI tables are parsed, right?

So I want to try early_ioremap(). All of this should be done in setup_arch().

Thanks.


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Yinghai Lu
> So we need to allocate memory. That is why you suggested using BRK, right?
> And the size seems to be a problem.
>
> So I suggest using early_ioremap().
>
> 1. After paging is enabled, but before the direct mapping page tables are
>    set up, we map the initrd with early_ioremap(). We can then access it
>    through a virtual address, even on 32-bit, and find all the tables.
> 2. We still use memblock to allocate memory. It may be hotpluggable memory,
>    but this memory can be freed once all the ACPI tables are parsed, right?
>
> So I want to try early_ioremap(). All of this should be done in setup_arch().

No.
The cpio search needs a virtual address that covers the whole range,
and early_ioremap() has a size limitation.
You would have to update the cpio search to take a mapping function,
which could be too messy.
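
The objection can be made concrete with a small model. The sketch below is plain Python, not kernel code: `map_window` is a hypothetical stand-in for a size-limited mapping call such as early_ioremap(), `WINDOW` is an arbitrary limit chosen for illustration, and the archive layout assumed is the cpio "newc" format (110-byte ASCII header, 4-byte alignment). It shows the extra plumbing a cpio search needs once it can no longer assume the whole archive is addressable at once.

```python
HEADER_LEN = 110          # fixed size of a "newc" cpio header
WINDOW = 4096             # hypothetical per-mapping size limit


def map_window(image: bytes, offset: int, size: int) -> bytes:
    """Stand-in for a size-limited mapping call; refuses oversized requests."""
    if size > WINDOW:
        raise ValueError("mapping larger than the window limit")
    return image[offset:offset + size]


def read_range(image: bytes, offset: int, size: int) -> bytes:
    """Copy an arbitrary range by stitching together window-sized mappings."""
    out = bytearray()
    while size > 0:
        step = min(size, WINDOW)
        out += map_window(image, offset, step)
        offset += step
        size -= step
    return bytes(out)


def align4(n: int) -> int:
    """Round up to the next multiple of 4, as newc requires."""
    return (n + 3) & ~3


def find_files(image: bytes, prefix: str) -> dict:
    """Walk a newc archive and return {name: data} for names under prefix."""
    found = {}
    pos = 0
    while pos + HEADER_LEN <= len(image):
        header = map_window(image, pos, HEADER_LEN)
        if header[:6] != b"070701":
            break
        # Thirteen 8-digit hex fields follow the 6-byte magic.
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        file_size, name_size = fields[6], fields[11]
        name = read_range(image, pos + HEADER_LEN, name_size)[:-1].decode()
        data_pos = align4(pos + HEADER_LEN + name_size)
        if name == "TRAILER!!!":
            break
        if name.startswith(prefix):
            found[name] = read_range(image, data_pos, file_size)
        pos = align4(data_pos + file_size)
    return found


def make_entry(name: str, data: bytes) -> bytes:
    """Build one newc entry (demo helper, not part of the search)."""
    raw_name = name.encode() + b"\0"
    fields = [0] * 13
    fields[6], fields[11] = len(data), len(raw_name)
    body = b"070701" + b"".join(b"%08X" % f for f in fields) + raw_name
    body += b"\0" * (align4(len(body)) - len(body))
    body += data
    body += b"\0" * (align4(len(body)) - len(body))
    return body


# Example prefix and file names are illustrative only.
archive = (make_entry("kernel/firmware/acpi/dsdt.aml", b"D" * 10000)
           + make_entry("etc/other", b"x")
           + make_entry("TRAILER!!!", b""))
tables = find_files(archive, "kernel/firmware/acpi/")
for name, data in tables.items():
    print(name, len(data))
```

Every access in the search has to go through `map_window` or `read_range`, which is the kind of change to the cpio search that the reply above describes as messy.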

Yinghai


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-23 Thread Russ Anderson
On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
> >
> > BRK makes sense as long as you can set a sane O(1) size limit.
> >
> >>
> >> put the acpi override table in BRK, we still need ok from HPA.
> >> I have impression that he did not like it, so want to confirm from him.
>
> on 8 sockets system:
> -rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
> -rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
> -rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
> -rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
> -rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
> -rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
> -rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
> -rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
> -rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
> -rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
> -rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
> -rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
> -rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
> -rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
> -rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
> -rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
> -rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
> -rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
> -rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat
>
> Assume a 32-socket system will have a DSDT and SSDT four times bigger
> (with more PCI devices and CPUs).
>
> So we cannot put an O(1) limit on the size.
>
> Russ, what is the ACPI table size on your big machine?

This is from a 255 socket, 4080 cpu, 15TB system.

---
-rw-r--r-- 1 root root   65392 Aug 23 21:23 apic.dat
-rw-r--r-- 1 root root     316 Aug 23 21:23 dmar.dat
-rw-r--r-- 1 root root 8309249 Aug 23 21:23 dsdt.dat
-rw-r--r-- 1 root root     244 Aug 23 21:23 facp.dat
-rw-r--r-- 1 root root      64 Aug 23 21:23 facs.dat
-rw-r--r-- 1 root root      56 Aug 23 21:23 hpet.dat
-rw-r--r-- 1 root root    4172 Aug 23 21:23 mcfg.dat
-rw-r--r-- 1 root root      36 Aug 23 21:23 rsdp.dat
-rw-r--r-- 1 root root      80 Aug 23 21:23 rsdt.dat
-rw-r--r-- 1 root root   65069 Aug 23 21:23 slit.dat
-rw-r--r-- 1 root root      80 Aug 23 21:23 spcr.dat
-rw-r--r-- 1 root root  108168 Aug 23 21:23 srat.dat
-rw-r--r-- 1 root root   21330 Aug 23 21:23 ssdt.dat
-rw-r--r-- 1 root root      92 Aug 23 21:23 uefi1.dat
-rw-r--r-- 1 root root     298 Aug 23 21:23 uefi.dat
-rw-r--r-- 1 root root     124 Aug 23 21:23 xsdt.dat
---

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc  r...@sgi.com


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Toshi Kani
Hello Tejun,

On Thu, 2013-08-22 at 17:21 -0400, Tejun Heo wrote:
 :
> > Local page table and memory hotplug are two separate things.  That is,
> > local page tables can be supported on all NUMA platforms without hotplug
> > support.  Are you sure huge mapping will solve everything for all types
> > of applications, and therefore local page tables won't be needed at all?
> 
> When you throw around terms like "all" and "at all", you can't reach
> rational discussion about engineering trade-offs.  I was asking you
> whether it was reasonable to do per-node page table when most machines
> support huge page mappings which makes the whole thing rather
> pointless.  Of course there will be some niche cases where this might
> not be optimal but do you think that would be enough to justify the
> added complexity and churn?  If you think so, can you please
> elaborate?

I am relatively new to Linux, so I am not a good person to elaborate on
this.  From my experience on other OSes, huge pages helped the kernel,
but did not necessarily help user applications.  It depended on the
applications, and those were not niche cases.  But Linux may be different,
so I asked since you seemed confident.  I'd appreciate it if you could
point us to some data that supports your statement.

> > When someone changes the page table init code, who will test it with the
> > special allocation code?
> 
> What are you worrying about?  Are you saying that allocating page
> table towards top or bottom of memory would be more disruptive and
> difficult to debug than pulling in ACPI init and SRAT information into
> the process?  Am I missing something here?

My worry is that the code is unlikely to be tested with the special logic
when someone makes changes to the page table code.  Such code can easily be
broken in the future.

To answer your other question/email, I believe Tang's next step is to
support local page tables.  This is why we think parsing SRAT earlier is
the right direction.

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Tejun Heo
Hello, Toshi.

On Thu, Aug 22, 2013 at 03:06:38PM -0600, Toshi Kani wrote:
> Since some node(s) won't be ejectable, this solution is reasonable as
> the first step.  I do not think it is a distraction.  I view your

But does this contribute to reaching the next step?  If so, how?
I can't see how and that's why I said this was a distraction.

> suggestion as a distraction of supporting local page tables, though.

Hmmm...

> Local page table and memory hotplug are two separate things.  That is,
> local page tables can be supported on all NUMA platforms without hotplug
> support.  Are you sure huge mapping will solve everything for all types
> of applications, and therefore local page tables won't be needed at all?

When you throw around terms like "all" and "at all", you can't reach
rational discussion about engineering trade-offs.  I was asking you
whether it was reasonable to do per-node page table when most machines
support huge page mappings which makes the whole thing rather
pointless.  Of course there will be some niche cases where this might
not be optimal but do you think that would be enough to justify the
added complexity and churn?  If you think so, can you please
elaborate?

> When someone changes the page table init code, who will test it with the
> special allocation code?

What are you worrying about?  Are you saying that allocating page
table towards top or bottom of memory would be more disruptive and
difficult to debug than pulling in ACPI init and SRAT information into
the process?  Am I missing something here?

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Toshi Kani
On Thu, 2013-08-22 at 16:21 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> > It's too late for the kernel image itself, but it prevents allocating
> > kernel memory from movable ranges after that.  I'd say it solves a half
> > of the issue this time.
> 
> That works if such half solution eventually leads to the full
> solution.  This is just a distraction.  You are already too late in
> the boot sequence.  It doesn't even qualify as a half solution.  It's
> like obsessing about a speck on your shirt without your trousers on.
> If you want to solve this, do that from a place where it actually is
> solvable.

Since some node(s) won't be ejectable, this solution is reasonable as
the first step.  I do not think it is a distraction.  I view your
suggestion as a distraction of supporting local page tables, though.

> > > > Also, how do you support local page tables without parsing SRAT early?
> > > 
> > > Does it even matter with huge mappings?  It's gonna be contained in a
> > > single page anyway, right?
> > 
> > Are the huge mappings always used?  We cannot force user programs to use
> > huge pages, can we?
> 
> Everything is a trade-off.  Should we do all this just to support the
> off chance someone tries to use memory hotplug on a machine which
> doesn't support huge mapping when virtually all CPUs on the market
> support it?

Local page table and memory hotplug are two separate things.  That is,
local page tables can be supported on all NUMA platforms without hotplug
support.  Are you sure huge mapping will solve everything for all types
of applications, and therefore local page tables won't be needed at all?

> > As for the maintainability, I am far more concerned with your suggestion
> > of having a separate page table init code when SRAT is used.  This kind
> > of divergence is a recipe of breakage.
> 
> I don't buy that.  The only thing which needs to change is the
> directionality of allocation and we probably don't even need to do
> that if huge mapping is in use.

When someone changes the page table init code, who will test it with the
special allocation code?

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Tejun Heo
A bit of addition.

On Thu, Aug 22, 2013 at 04:21:58PM -0400, Tejun Heo wrote:
> That works if such half solution eventually leads to the full
> solution.  This is just a distraction.  You are already too late in
> the boot sequence.  It doesn't even qualify as a half solution.  It's
> like obsessing about a speck on your shirt without your trousers on.
> If you want to solve this, do that from a place where it actually is
> solvable.

Seriously, what's the end game here?  How do you guys see this
eventually reaching full solution?  If you don't see that and this
kinda-sorta-working solution is fine, then that's fine too but we
aren't gonna make a lot of invasive changes for that.  If you can at
least envision the full solution, please try to fit this effort into
the bigger picture.

In all possible solutions that I can think of, there needs to be
earlier handling of SRAT information before the kernel proper starts
executing, be that in the actual bootloader or in an earlier kernel
serving as kexec host.  If a proper solution needs such processing
earlier anyway, it can either set things up so that the default
booting behavior doesn't harm hotpluggability or feed the necessary
information to the kernel.  In both cases, doing ACPI super early in
the booting kernel doesn't buy us anything.

So, then, what the hell are we doing here with all these relocations,
careful double execution of the same code from different execution
contexts, worrying about initrd firmware override even before the
kernel page table is set up?  If we're doing all those to just make
the temporary half-assed-anyway solution minutely better, that's just
plain stupid.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Tejun Heo
Hello,

On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> It's too late for the kernel image itself, but it prevents allocating
> kernel memory from movable ranges after that.  I'd say it solves a half
> of the issue this time.

That works if such half solution eventually leads to the full
solution.  This is just a distraction.  You are already too late in
the boot sequence.  It doesn't even qualify as a half solution.  It's
like obsessing about a speck on your shirt without your trousers on.
If you want to solve this, do that from a place where it actually is
solvable.

> > > Also, how do you support local page tables without parsing SRAT early?
> > 
> > Does it even matter with huge mappings?  It's gonna be contained in a
> > single page anyway, right?
> 
> Are the huge mappings always used?  We cannot force user programs to use
> huge pages, can we?

Everything is a trade-off.  Should we do all this just to support the
off chance someone tries to use memory hotplug on a machine which
doesn't support huge mapping when virtually all CPUs on the market
support it?
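
The trade-off in this exchange turns on how much page-table memory a kernel direct mapping needs at each page size. A rough sketch, assuming 4-level x86-64 paging with 512 eight-byte entries per 4 KiB table page and the 32TB machine mentioned elsewhere in the thread; the function and constant names are made up for illustration, and this models only the kernel direct mapping, not user page tables:

```python
ENTRIES_PER_TABLE = 512      # 8-byte entries in one 4 KiB table page
TABLE_BYTES = 4096
KiB, MiB, GiB, TiB = 1 << 10, 1 << 20, 1 << 30, 1 << 40

# Number of table levels walked for each leaf page size (4-level paging).
LEVELS = {4 * KiB: 4, 2 * MiB: 3, 1 * GiB: 2}


def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def page_table_bytes(mapped: int, leaf: int) -> int:
    """Memory used by all table pages needed to map `mapped` bytes."""
    entries = ceil_div(mapped, leaf)
    pages = 0
    for _ in range(LEVELS[leaf]):
        tables = ceil_div(entries, ENTRIES_PER_TABLE)
        pages += tables
        entries = tables          # each table needs one entry a level up
    return pages * TABLE_BYTES


for leaf, label in ((4 * KiB, "4 KiB"), (2 * MiB, "2 MiB"), (1 * GiB, "1 GiB")):
    size = page_table_bytes(32 * TiB, leaf)
    print(f"{label} pages: {size / MiB:.2f} MiB of page tables")
```

With 1 GiB mappings a single 4 KiB table page covers 512 GiB, which is the sense in which a node's tables can fit in one page; with 4 KiB mappings the same 32 TiB range needs tens of gigabytes of tables under these assumptions.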

> As for the maintainability, I am far more concerned with your suggestion
> of having a separate page table init code when SRAT is used.  This kind
> of divergence is a recipe of breakage.

I don't buy that.  The only thing which needs to change is the
directionality of allocation and we probably don't even need to do
that if huge mapping is in use.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Toshi Kani
Hello Tejun,

On Thu, 2013-08-22 at 14:31 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
> > I understand that you are concerned about stability of the ACPI stuff,
> > which I think is a valid point, but most of (if not all) of the
> > ACPI-related issues come from ACPI namespace/methods, which is a very
> > different thing.  Please do not mix up those two.  The ACPI
> 
> I have no objection to implementing self-contained earlyprintk
> support.  If that's all you want to do, please go ahead but do not
> pull in initrd override or ACPICA into it.

If you are referring to ACPICA as the AML interpreter, right, we are not
moving it up, as I explained before.  We are trying to move up the ACPI
table init code (which is part of ACPICA, but has nothing to do with
AML).

Note that ia64 also uses ACPI, and calls acpi_table_init() in
setup_arch() before initializing the bootmap in find_memory().

> > namespace/methods stuff remains the same and continues to be initialized
> > at very late in the boot sequence.
> > 
> > What's making the patchset complicated is acpi_initrd_override(), which
> > is intended for developers and allows overwriting ACPI bits at their own
> > risk.  This feature won't be used by regular users. 
> 
> Yeah, please forget about that in earlyboot.  It doesn't make any
> sense to fiddle with initrd that early during boot.

I think the reason why Tang is working on this stuff again is that his
previous change (which was once accepted) had broken initrd.  So, he'd
have to support it this time...

> > If you are referring the issue of kernel image location, it is a
> > limitation in the current implementation, not a technical limitation.  I
> > know other OS that supports movable memory and puts the kernel image
> > into a movable memory with SRAT by changing the bootloader.
> 
> I'm not saying that problem shouldn't be solved.  I'm saying what you
> guys are pushing doesn't help solving it at all.  It's too late in the
> boot process.  It needs to be handled either by bootloader or earlier
> kernel kexecing the actual one and super-early SRAT doesn't help at
> all in either case, so what's the point of pulling ACPI code in when
> it doesn't contribute to solving the problem properly?

It's too late for the kernel image itself, but it prevents allocating
kernel memory from movable ranges after that.  I'd say it solves a half
of the issue this time.

> > Also, how do you support local page tables without parsing SRAT early?
> 
> Does it even matter with huge mappings?  It's gonna be contained in a
> single page anyway, right?

Are the huge mappings always used?  We cannot force user programs to use
huge pages, can we?

> > Initializing page tables on large systems may take a long time, and I do
> > think that earlyprintk needs to be available before that point.
> 
> Yeah, sure, implement it in *minimal* way which doesn't affect
> anything if not explicitly enabled by kernel param like other
> earlyprintks.  It doesn't make any sense to add dependency to acpi
> from early boot for that.

It makes sense because it needs to obtain the config info from ACPI
tables.

As for the maintainability, I am far more concerned with your suggestion
of having a separate page table init code when SRAT is used.  This kind
of divergence is a recipe of breakage.

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Tejun Heo
Hello,

On Fri, Aug 23, 2013 at 03:39:53AM +0800, Zhang Yanfei wrote:
> What do you mean by "earlyboot"? And also in your previous mail, I am also
> a little confused by what you said "the very first stage of boot". Does
> this mean the stage we are in head_32 or head64.c?

Mostly referring to the state where we don't have basic environment
set up yet including page tables.

> If so, could we just do something just as Yinghai did before, that is, Split
> acpi_override into 2 parts: find and copy. And in "earlyboot", we just do
> the find, and I think that is less of risk. Or we can just do ACPI override
> earlier in setup_arch(), not pulling this process that early during boot?

But *WHY*?  It doesn't really buy us anything substantial.  What are
you trying to achieve here?  "Making ACPI info available early" can't
be a goal in itself and the two benefits cited in this thread seem
pretty dubious to me.  Why are you guys trying to push this
convolution when it doesn't bring any substantial gain?

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Zhang Yanfei
Hello tejun,

On 08/23/2013 02:31 AM, Tejun Heo wrote:
> Hello,
> 
> On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
>> I understand that you are concerned about stability of the ACPI stuff,
>> which I think is a valid point, but most of (if not all) of the
>> ACPI-related issues come from ACPI namespace/methods, which is a very
>> different thing.  Please do not mix up those two.  The ACPI
> 
> I have no objection to implementing self-contained earlyprintk
> support.  If that's all you want to do, please go ahead but do not
> pull in initrd override or ACPICA into it.
> 
>> namespace/methods stuff remains the same and continues to be initialized
>> at very late in the boot sequence.
>>
>> What's making the patchset complicated is acpi_initrd_override(), which
>> is intended for developers and allows overwriting ACPI bits at their own
>> risk.  This feature won't be used by regular users. 
> 
> Yeah, please forget about that in earlyboot.  It doesn't make any
> sense to fiddle with initrd that early during boot.

What do you mean by "earlyboot"? And also in your previous mail, I am also
a little confused by what you said "the very first stage of boot". Does
this mean the stage we are in head_32 or head64.c?

If so, could we just do something just as Yinghai did before, that is, Split
acpi_override into 2 parts: find and copy. And in "earlyboot", we just do
the find, and I think that is less of risk. Or we can just do ACPI override
earlier in setup_arch(), not pulling this process that early during boot?

Thanks

-- 
Thanks.
Zhang Yanfei


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Tejun Heo
Hello,

On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
> I understand that you are concerned about stability of the ACPI stuff,
> which I think is a valid point, but most of (if not all) of the
> ACPI-related issues come from ACPI namespace/methods, which is a very
> different thing.  Please do not mix up those two.  The ACPI

I have no objection to implementing self-contained earlyprintk
support.  If that's all you want to do, please go ahead but do not
pull in initrd override or ACPICA into it.

> namespace/methods stuff remains the same and continues to be initialized
> at very late in the boot sequence.
> 
> What's making the patchset complicated is acpi_initrd_override(), which
> is intended for developers and allows overwriting ACPI bits at their own
> risk.  This feature won't be used by regular users. 

Yeah, please forget about that in earlyboot.  It doesn't make any
sense to fiddle with initrd that early during boot.

> If you are referring the issue of kernel image location, it is a
> limitation in the current implementation, not a technical limitation.  I
> know other OS that supports movable memory and puts the kernel image
> into a movable memory with SRAT by changing the bootloader.

I'm not saying that problem shouldn't be solved.  I'm saying what you
guys are pushing doesn't help solving it at all.  It's too late in the
boot process.  It needs to be handled either by bootloader or earlier
kernel kexecing the actual one and super-early SRAT doesn't help at
all in either case, so what's the point of pulling ACPI code in when
it doesn't contribute to solving the problem properly?

> Also, how do you support local page tables without parsing SRAT early?

Does it even matter with huge mappings?  It's gonna be contained in a
single page anyway, right?

> Initializing page tables on large systems may take a long time, and I do
> think that earlyprintk needs to be available before that point.

Yeah, sure, implement it in *minimal* way which doesn't affect
anything if not explicitly enabled by kernel param like other
earlyprintks.  It doesn't make any sense to add dependency to acpi
from early boot for that.

Thanks.

-- 
tejun
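
A back-of-the-envelope sketch of the huge-mapping argument above, in Python
for illustration only.  It assumes 4-level x86-64 paging (512 entries per
table, one 4 KiB page per table) and a single contiguous range starting at
address 0 mapped with one leaf size throughout; the function name
page_table_pages is hypothetical.

```python
ENTRIES_PER_TABLE = 512      # x86-64: 512 eight-byte entries per table
TABLE_BYTES = 4096           # each table occupies one 4 KiB page
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40


def page_table_pages(mem_bytes, leaf_bytes):
    """Count the table pages needed to map [0, mem_bytes) with one leaf size."""
    top_span = 4 * KiB * ENTRIES_PER_TABLE ** 4   # 256 TiB covered by the top-level table
    span = leaf_bytes * ENTRIES_PER_TABLE         # memory covered by one lowest-level table
    pages = 0
    while span < top_span:
        pages += -(-mem_bytes // span)            # ceiling division
        span *= ENTRIES_PER_TABLE
    return pages + 1                              # plus the single top-level table


if __name__ == "__main__":
    for leaf in (4 * KiB, 2 * MiB, 1 * GiB):
        n = page_table_pages(512 * GiB, leaf)
        print(f"leaf={leaf:>10} bytes: {n} table pages, {n * TABLE_BYTES // KiB} KiB")
```

Under these assumptions, mapping 512 GiB takes 262658 table pages (about
1 GiB) with 4 KiB leaves, 514 pages with 2 MiB leaves, and 2 pages with
1 GiB leaves, since one page of 1 GiB entries already covers 512 GiB.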


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Toshi Kani
Hello Tejun,

On Wed, 2013-08-21 at 23:32 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 04:36:35PM -0600, Toshi Kani wrote:
> > I agree that ACPI is rather complicated stuff.  But in my experience,
> > the majority complication comes from ACPI namespace and methods, not
> > from ACPI tables.  Do you really think ACPI table init is that risky?  I
> > consider ACPI tables are part of the minimum config info, esp. for
> > legacy-free platforms.
> 
> It's just that we're talking about the very first stage of boot.  We
> really don't do much there and pulling in ACPI code into that stage is
> a lot by comparison.  If that's gonna happen, it needs pretty strong
> justification.

It moves up the ACPI table init code, which itself is simple.  And ACPI
tables are defined to be parsed at early boot-time, which is why they
exist in addition to ACPI namespace/methods.  They are similar to EFI
memory table.  Firmware publishes tables in one way or the other.

I understand that you are concerned about stability of the ACPI stuff,
which I think is a valid point, but most of (if not all) of the
ACPI-related issues come from ACPI namespace/methods, which is a very
different thing.  Please do not mix up those two.  The ACPI
namespace/methods stuff remains the same and continues to be initialized
at very late in the boot sequence.

What's making the patchset complicated is acpi_initrd_override(), which
is intended for developers and allows overwriting ACPI bits at their own
risk.  This feature won't be used by regular users. 

> > earlyprintk is just another example to this SRAT issue.  The local page
> > table is yet another example.  My hope here is for us to be able to
> > utilize ACPI tables properly without hitting this kind of ordering
> > issues again and again, which requires considerable time & effort to
> > address.
> 
> So, the two things brought up at this point are early parsing of SRAT,
> which can't really solve the problem at hand anyway, 

If you are referring the issue of kernel image location, it is a
limitation in the current implementation, not a technical limitation.  I
know other OS that supports movable memory and puts the kernel image
into a movable memory with SRAT by changing the bootloader.

Also, how do you support local page tables without parsing SRAT early?

> and earlyprintk
> which should be implemented in minimal way which is not activated
> unless specifically enabled with earlyprintk boot param.  Neither
> seems to justify pulling in full ACPI into early boot, right?

Initializing page tables on large systems may take a long time, and I do
think that earlyprintk needs to be available before that point.

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Tejun Heo
Hello,

On Fri, Aug 23, 2013 at 03:39:53AM +0800, Zhang Yanfei wrote:
> What do you mean by "earlyboot"? And also in your previous mail, I am also
> a little confused by what you said "the very first stage of boot". Does
> this mean the stage we are in head_32 or head64.c?

Mostly referring to the state where we don't have basic environment
set up yet including page tables.

> If so, could we just do something just as Yinghai did before, that is, Split
> acpi_override into 2 parts: find and copy. And in "earlyboot", we just do
> the find, and I think that is less of risk. Or we can just do ACPI override
> earlier in setup_arch(), not pulling this process that early during boot?

But *WHY*?  It doesn't really buy us anything substantial.  What are
you trying to achieve here?  Making ACPI info available early can't
be a goal in itself and the two benefits cited in this thread seem
pretty dubious to me.  Why are you guys trying to push this
convolution when it doesn't bring any substantial gain?

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Toshi Kani
Hello Tejun,

On Thu, 2013-08-22 at 14:31 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
> > I understand that you are concerned about stability of the ACPI stuff,
> > which I think is a valid point, but most of (if not all) of the
> > ACPI-related issues come from ACPI namespace/methods, which is a very
> > different thing.  Please do not mix up those two.  The ACPI
>
> I have no objection to implementing self-contained earlyprintk
> support.  If that's all you want to do, please go ahead but do not
> pull in initrd override or ACPICA into it.

If you are referring ACPICA as the AML interpreter, right, we do not
move it up as I explained before.  We are trying to move up the ACPI
table init code (which is part of ACPICA, but has nothing to do with
AML.)

Note that ia64 also uses ACPI, and calls acpi_table_init() in
setup_arch() before initializing the bootmap in find_memory().

> > namespace/methods stuff remains the same and continues to be initialized
> > at very late in the boot sequence.
> >
> > What's making the patchset complicated is acpi_initrd_override(), which
> > is intended for developers and allows overwriting ACPI bits at their own
> > risk.  This feature won't be used by regular users.
>
> Yeah, please forget about that in earlyboot.  It doesn't make any
> sense to fiddle with initrd that early during boot.

I think the reason why Tang is working on this stuff again is that his
previous change (which was once accepted) had broken initrd.  So, he'd
have to support it this time...

> > If you are referring the issue of kernel image location, it is a
> > limitation in the current implementation, not a technical limitation.  I
> > know other OS that supports movable memory and puts the kernel image
> > into a movable memory with SRAT by changing the bootloader.
>
> I'm not saying that problem shouldn't be solved.  I'm saying what you
> guys are pushing doesn't help solving it at all.  It's too late in the
> boot process.  It needs to be handled either by bootloader or earlier
> kernel kexecing the actual one and super-early SRAT doesn't help at
> all in either case, so what's the point of pulling ACPI code in when
> it doesn't contribute to solving the problem properly?

It's too late for the kernel image itself, but it prevents allocating
kernel memory from movable ranges after that.  I'd say it solves a half
of the issue this time.

> > Also, how do you support local page tables without parsing SRAT early?
>
> Does it even matter with huge mappings?  It's gonna be contained in a
> single page anyway, right?

Are the huge mappings always used?  We cannot force user programs to use
huge pages, can we?

> > Initializing page tables on large systems may take a long time, and I do
> > think that earlyprintk needs to be available before that point.
>
> Yeah, sure, implement it in *minimal* way which doesn't affect
> anything if not explicitly enabled by kernel param like other
> earlyprintks.  It doesn't make any sense to add dependency to acpi
> from early boot for that.

It makes sense because it needs to obtain the config info from ACPI
tables.

As for the maintainability, I am far more concerned with your suggestion
of having a separate page table init code when SRAT is used.  This kind
of divergence is a recipe of breakage.

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Tejun Heo
Hello,

On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> It's too late for the kernel image itself, but it prevents allocating
> kernel memory from movable ranges after that.  I'd say it solves a half
> of the issue this time.

That works if such half solution eventually leads to the full
solution.  This is just a distraction.  You are already too late in
the boot sequence.  It doesn't even qualify as a half solution.  It's
like obsessing about a speck on your shirt without your trousers on.
If you want to solve this, do that from a place where it actually is
solvable.

> > > Also, how do you support local page tables without parsing SRAT early?
> >
> > Does it even matter with huge mappings?  It's gonna be contained in a
> > single page anyway, right?
>
> Are the huge mappings always used?  We cannot force user programs to use
> huge pages, can we?

Everything is a trade-off.  Should we do all this just to support the
off chance someone tries to use memory hotplug on a machine which
doesn't support huge mapping when virtually all CPUs on market
supports it?

> As for the maintainability, I am far more concerned with your suggestion
> of having a separate page table init code when SRAT is used.  This kind
> of divergence is a recipe of breakage.

I don't buy that.  The only thing which needs to change is the
directionality of allocation and we probably don't even need to do
that if huge mapping is in use.

Thanks.

-- 
tejun
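
The "directionality of allocation" point above can be illustrated with a toy
first-fit allocator over free ranges, in Python for illustration only.  This
is a sketch under stated assumptions, not the kernel's early allocator: the
function name alloc_range and the list-of-tuples range layout are
hypothetical.

```python
def alloc_range(free, size, top_down):
    """First-fit allocation from a sorted list of (start, end) free ranges.

    Scans from the highest range when top_down is true, otherwise from the
    lowest, and shrinks the chosen range in place.  Returns the start address
    of the allocation, or None if nothing fits.
    """
    indices = range(len(free) - 1, -1, -1) if top_down else range(len(free))
    for i in indices:
        start, end = free[i]
        if end - start < size:
            continue
        if top_down:
            free[i] = (start, end - size)   # carve from the top of the range
            return end - size
        free[i] = (start + size, end)       # carve from the bottom of the range
        return start
    return None
```

In this toy, the two policies differ only in scan order and in which end of
the chosen range is trimmed; everything else in the caller stays the same.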


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Tejun Heo
A bit of addition.

On Thu, Aug 22, 2013 at 04:21:58PM -0400, Tejun Heo wrote:
> That works if such half solution eventually leads to the full
> solution.  This is just a distraction.  You are already too late in
> the boot sequence.  It doesn't even qualify as a half solution.  It's
> like obsessing about a speck on your shirt without your trousers on.
> If you want to solve this, do that from a place where it actually is
> solvable.

Seriously, what's the end game here?  How do you guys see this
eventually reaching full solution?  If you don't see that and this
kinda-sorta-working solution is fine, then that's fine too but we
aren't gonna make a lot of invasive changes for that.  If you can at
least envision the full solution, please try to fit this effort into
the bigger picture.

In all possible solutions that I can think of, there needs to be
earlier handling of SRAT information before the kernel proper starts
executing be that either the actual bootloader or earlier kernel
serving as kexec host.  If a proper solution needs such processing
earlier anyway, it can set up things so that either the default
booting behavior doesn't harm hotpluggability or feed the necessary
information to the kernel.  In both cases, doing ACPI super early in
the booting kernel doesn't buy us anything.

So, then, what the hell are we doing here with all these relocations,
careful double execution of the same code from different execution
contexts, worrying about initrd firmware override even before the
kernel page table is set up?  If we're doing all those to just make
the temporary half-assed-anyway solution minutely better, that's just
plain stupid.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Toshi Kani
On Thu, 2013-08-22 at 16:21 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> > It's too late for the kernel image itself, but it prevents allocating
> > kernel memory from movable ranges after that.  I'd say it solves a half
> > of the issue this time.
>
> That works if such half solution eventually leads to the full
> solution.  This is just a distraction.  You are already too late in
> the boot sequence.  It doesn't even qualify as a half solution.  It's
> like obsessing about a speck on your shirt without your trousers on.
> If you want to solve this, do that from a place where it actually is
> solvable.

Since some node(s) won't be ejectable, this solution is reasonable as
the first step.  I do not think it is a distraction.  I view your
suggestion as a distraction of supporting local page tables, though.

> > > > Also, how do you support local page tables without parsing SRAT early?
> > >
> > > Does it even matter with huge mappings?  It's gonna be contained in a
> > > single page anyway, right?
> >
> > Are the huge mappings always used?  We cannot force user programs to use
> > huge pages, can we?
>
> Everything is a trade-off.  Should we do all this just to support the
> off chance someone tries to use memory hotplug on a machine which
> doesn't support huge mapping when virtually all CPUs on market
> supports it?

Local page table and memory hotplug are two separate things.  That is,
local page tables can be supported on all NUMA platforms without hotplug
support.  Are you sure huge mapping will solve everything for all types
of applications, and therefore local page tables won't be needed at all?

> > As for the maintainability, I am far more concerned with your suggestion
> > of having a separate page table init code when SRAT is used.  This kind
> > of divergence is a recipe of breakage.
>
> I don't buy that.  The only thing which needs to change is the
> directionality of allocation and we probably don't even need to do
> that if huge mapping is in use.

When someone changes the page table init code, who will test it with the
special allocation code?

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Tejun Heo
Hello, Toshi.

On Thu, Aug 22, 2013 at 03:06:38PM -0600, Toshi Kani wrote:
> Since some node(s) won't be ejectable, this solution is reasonable as
> the first step.  I do not think it is a distraction.  I view your

But does this contribute to reaching the next step?  If so, how?
I can't see how and that's why I said this was a distraction.

> suggestion as a distraction of supporting local page tables, though.

Hmmm...

> Local page table and memory hotplug are two separate things.  That is,
> local page tables can be supported on all NUMA platforms without hotplug
> support.  Are you sure huge mapping will solve everything for all types
> of applications, and therefore local page tables won't be needed at all?

When you throw around terms like "all" and "at all", you can't reach
rational discussion about engineering trade-offs.  I was asking you
whether it was reasonable to do per-node page table when most machines
support huge page mappings which makes the whole thing rather
pointless.  Of course there will be some niche cases where this might
not be optimal but do you think that would be enough to justify the
added complexity and churn?  If you think so, can you please
elaborate?

> When someone changes the page table init code, who will test it with the
> special allocation code?

What are you worrying about?  Are you saying that allocating page
table towards top or bottom of memory would be more disruptive and
difficult to debug than pulling in ACPI init and SRAT information into
the process?  Am I missing something here?

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-22 Thread Toshi Kani
Hello Tejun,

On Thu, 2013-08-22 at 17:21 -0400, Tejun Heo wrote:
> :
> > Local page table and memory hotplug are two separate things.  That is,
> > local page tables can be supported on all NUMA platforms without hotplug
> > support.  Are you sure huge mapping will solve everything for all types
> > of applications, and therefore local page tables won't be needed at all?
>
> When you throw around terms like "all" and "at all", you can't reach
> rational discussion about engineering trade-offs.  I was asking you
> whether it was reasonable to do per-node page table when most machines
> support huge page mappings which makes the whole thing rather
> pointless.  Of course there will be some niche cases where this might
> not be optimal but do you think that would be enough to justify the
> added complexity and churn?  If you think so, can you please
> elaborate?

I am relatively new to Linux, so I am not a good person to elaborate
this.  From my experience on other OS, huge pages helped for the kernel,
but did not necessarily help user applications.  It depended on
applications, which were not niche cases.  But Linux may be different,
so I asked since you seemed confident.  I'd appreciate if you can point
us some data that endorses your statement.

> > When someone changes the page table init code, who will test it with the
> > special allocation code?
>
> What are you worrying about?  Are you saying that allocating page
> table towards top or bottom of memory would be more disruptive and
> difficult to debug than pulling in ACPI init and SRAT information into
> the process?  Am I missing something here?

My worry is that the code is unlikely tested with the special logic when
someone makes code changes to the page tables.  Such code can easily be
broken in future.

To answer your other question/email, I believe Tang's next step is to
support local page tables.  This is why we think parsing SRAT earlier is
the right direction.

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Tejun Heo
Hello,

On Wed, Aug 21, 2013 at 04:36:35PM -0600, Toshi Kani wrote:
> I agree that ACPI is rather complicated stuff.  But in my experience,
> the majority complication comes from ACPI namespace and methods, not
> from ACPI tables.  Do you really think ACPI table init is that risky?  I
> consider ACPI tables are part of the minimum config info, esp. for
> legacy-free platforms.

It's just that we're talking about the very first stage of boot.  We
really don't do much there and pulling in ACPI code into that stage is
a lot by comparison.  If that's gonna happen, it needs pretty strong
justification.

> earlyprintk is just another example to this SRAT issue.  The local page
> table is yet another example.  My hope here is for us to be able to
> utilize ACPI tables properly without hitting this kind of ordering
> issues again and again, which requires considerable time & effort to
> address.

So, the two things brought up at this point are early parsing of SRAT,
which can't really solve the problem at hand anyway, and earlyprintk
which should be implemented in minimal way which is not activated
unless specifically enabled with earlyprintk boot param.  Neither
seems to justify pulling in full ACPI into early boot, right?

Thanks.

-- 
tejun
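
A small sketch of the "not activated unless specifically enabled" pattern
described above, in Python for illustration only.  It assumes a
space-separated kernel command line carrying an earlyprintk=<target>[,...]
option; the function names earlyprintk_target and early_setup, and the
init_console callback, are hypothetical.

```python
def earlyprintk_target(cmdline):
    """Return the requested early console name, or None if not requested."""
    for arg in cmdline.split():
        if arg.startswith("earlyprintk="):
            value = arg.split("=", 1)[1]
            return value.split(",")[0]   # first field names the console
    return None


def early_setup(cmdline, init_console):
    """Initialize an early console only when the option is present."""
    target = earlyprintk_target(cmdline)
    if target is None:
        return False          # not requested: touch nothing
    init_console(target)
    return True
```

With this shape, a boot without the option never reaches the console setup
code at all, so a bug there cannot affect a default boot.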


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Toshi Kani
Hello Tejun,

On Wed, 2013-08-21 at 16:40 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
> > Platforms vendors (which care Linux) need to support the existing Linux
> > features.  This means that they have to implement legacy interfaces on
> > x86 until the kernel supports an alternative method.  For instance, some
> > platforms are legacy-free and do not have legacy COM ports.  These ACPI
> > tables were defined so that non-legacy COM ports can be described and
> > informed to the OS.  Without this support, such platforms may have to
> > emulate the legacy COM ports for Linux, or drop Linux support.
> 
> Are you seriously saying that vendors are gonna drop linux support for
> lacking ACPI earlyprintk support?  Please...

earlyprintk is an example of the issues.  The point is that vendors are
required to support legacy stuff for Linux.

> Please take a look at the existing earlyprintk code and how compact
> and self-contained they are.  If you want to add ACPI earlyprintk, do
> similar stuff.  Forget about firmware blob override from initrd or
> ACPICA.  Just implement the bare minimum to get the thing working.  Do
> not add dependency to large body of code from earlyboot.  It's a bad
> idea through and through.

I am not saying that ACPI earlyprintk must be available at exactly the
same point.  How early it can reasonably be is a subject of discussion.

> > I think the kernel boot-up sequence should be designed in such a way
> > that can support legacy-free and/or NUMA platforms properly.
> 
> Blanket statements like the above don't mean much.  There are many
> separate stages of boot and you're talking about one of the very first
> stages where we traditionally have always depended upon only the very
> bare minimum of the platform both in hardware itself and configuration
> information.  We've been doing that for *very* good reasons.  If you
> screw up there, it's mighty tricky to figure out what went wrong
> especially on the machines that you can't physically kick.  You're now
> suggesting to add whole ACPI parsing including overloading from initrd
> into that stage with pretty weak rationale.

I agree that ACPI is rather complicated stuff.  But in my experience,
most of the complication comes from the ACPI namespace and methods, not
from ACPI tables.  Do you really think ACPI table init is that risky?  I
consider ACPI tables to be part of the minimum config info, esp. for
legacy-free platforms.

> Seriously, if you want ACPI based earlyprintk, implement it in a
> discrete minimal code which is easy to verify and won't get affected
> when the rest of ACPI machinery is updated.  We really don't want
> earlyboot to fail because someone screwed up ACPI or initrd handling.

earlyprintk is just another example besides this SRAT issue.  The local page
table is yet another example.  My hope here is for us to be able to
utilize ACPI tables properly without hitting this kind of ordering
issue again and again, each of which requires considerable time & effort
to address.

Thanks,
-Toshi




Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Tejun Heo
Hello, Toshi.

On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
> Platforms vendors (which care Linux) need to support the existing Linux
> features.  This means that they have to implement legacy interfaces on
> x86 until the kernel supports an alternative method.  For instance, some
> platforms are legacy-free and do not have legacy COM ports.  These ACPI
> tables were defined so that non-legacy COM ports can be described and
> informed to the OS.  Without this support, such platforms may have to
> emulate the legacy COM ports for Linux, or drop Linux support.

Are you seriously saying that vendors are gonna drop linux support for
lacking ACPI earlyprintk support?  Please...

Please take a look at the existing earlyprintk code and how compact
and self-contained it is.  If you want to add ACPI earlyprintk, do
similar stuff.  Forget about firmware blob override from initrd or
ACPICA.  Just implement the bare minimum to get the thing working.  Do
not add a dependency on a large body of code from earlyboot.  It's a bad
idea through and through.

> I think the kernel boot-up sequence should be designed in such a way
> that can support legacy-free and/or NUMA platforms properly.

Blanket statements like the above don't mean much.  There are many
separate stages of boot and you're talking about one of the very first
stages where we traditionally have always depended upon only the very
bare minimum of the platform both in hardware itself and configuration
information.  We've been doing that for *very* good reasons.  If you
screw up there, it's mighty tricky to figure out what went wrong
especially on the machines that you can't physically kick.  You're now
suggesting to add whole ACPI parsing including overloading from initrd
into that stage with pretty weak rationale.

Seriously, if you want ACPI based earlyprintk, implement it in a
discrete minimal code which is easy to verify and won't get affected
when the rest of ACPI machinery is updated.  We really don't want
earlyboot to fail because someone screwed up ACPI or initrd handling.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Toshi Kani
Hello Tejun,

On Wed, 2013-08-21 at 15:54 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
> > Well, there is reason why we have earlyprintk feature today.  So, let's
> > not debate on this feature now.  There was previous attempt to support
> 
> Are you saying the existing earlyprintk automatically justifies
> addition of more complex mechanism?  The added complex of course
> should be traded off against the benefits of gaining ACPI based early
> boot.  You aren't gonna suggest implementing netconsole based
> earlyprintk, right?

Platform vendors (who care about Linux) need to support the existing Linux
features.  This means that they have to implement legacy interfaces on
x86 until the kernel supports an alternative method.  For instance, some
platforms are legacy-free and do not have legacy COM ports.  These ACPI
tables were defined so that non-legacy COM ports can be described and
reported to the OS.  Without this support, such platforms may have to
emulate the legacy COM ports for Linux, or drop Linux support.

> > this feature with ACPI tables below.  As described, it had the same
> > ordering issue.
> > 
> > https://lkml.org/lkml/2012/10/8/498
> > 
> > There is a basic problem that when we try to use ACPI tables that
> > extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
> > this ordering issue because ACPI is not available as early as the legacy
> > interfaces.
> 
> Do we even want ACPI parsing and all that that early?  Parsing SRAT
> early doesn't buy us much and I'm not sure whether adding ACPI
> earlyprintk would increase or decrease debuggability during earlyboot.
> It adds whole lot more code paths where things can go wrong while the
> basic execution environment is unstable.  Why do that?

I think the kernel boot-up sequence should be designed in such a way
that it can properly support legacy-free and/or NUMA platforms.

Thanks,
-Toshi




Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Tejun Heo
Hello, Toshi.

On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
> Well, there is reason why we have earlyprintk feature today.  So, let's
> not debate on this feature now.  There was previous attempt to support

Are you saying the existing earlyprintk automatically justifies the
addition of a more complex mechanism?  The added complexity of course
should be traded off against the benefits of gaining ACPI based early
boot.  You aren't gonna suggest implementing netconsole based
earlyprintk, right?

> this feature with ACPI tables below.  As described, it had the same
> ordering issue.
> 
> https://lkml.org/lkml/2012/10/8/498
> 
> There is a basic problem that when we try to use ACPI tables that
> extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
> this ordering issue because ACPI is not available as early as the legacy
> interfaces.

Do we even want ACPI parsing and all that that early?  Parsing SRAT
early doesn't buy us much and I'm not sure whether adding ACPI
earlyprintk would increase or decrease debuggability during earlyboot.
It adds whole lot more code paths where things can go wrong while the
basic execution environment is unstable.  Why do that?

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Toshi Kani
On Wed, 2013-08-21 at 11:36 -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
> > In current boot order, before we get the SRAT, we have a big consumer of early
> > allocations: we are setting up the page table in top-down (The idea was proposed by HPA,
> > Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
> > setup will make the page tables as high as possible in memory, since memory at low
> > addresses is precious (for stupid DMA devices, for things like kexec/kdump, and so on.)
> 
> With huge mappings, they are fairly small, right?  And this whole
> thing needs a kernel param anyway at this point, so the allocation
> direction can be made dependent on that or huge mapping availability
> and, even with 4k mappings, we aren't talking about gigabytes of
> memory, are we?
> 
> > So if we are trying to make early allocations close to kernel image, we should
> > rewrite the way we are setting up page table totally. That is not a easy thing
> > to do.
> 
> It has been a while since I looked at the code so can you please
> elaborate why that is not easy?  It's pretty simple conceptually.
> 
> > * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
> >   ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable
> >   nodes.
> > 
> > This one is the current requirement of us but may be very helpful for future change:
> > 
> > * As suggested by Yinghai, we should allocate page tables in local node. This also
> >   needs SRAT before direct mapping page tables are setup.
> 
> Does this even matter for huge mappings?
> 
> > * As mentioned by Toshi Kani, ACPI SPCR/DBGP/DBG2 tables
> >   allow the OS to initialize serial console/debug ports at early boot time. The
> >   earlier it can be initialized, the better this feature will be.  These tables
> >   are not currently used by Linux due to a licensing issue, but it could be
> >   addressed some time soon.
> > 
> > So we decided to firstly make ACPI override earlier and use BRK (this is obviously
> > near the kernel image range) to store the found ACPI tables.
> 
> I don't know.  The whole effort seems way overcomplicated compared to
> the benefits it would bring.  For NUMA memory hotunplug, what's the
> point of doing all this when the kernel doesn't have any control over
> where its image is gonna be?  Some megabytes at the tail aren't gonna
> make a huge difference and if you wanna do this properly, you need to
> determine the load address of the kernel considering the node
> boundaries and hotpluggability of each node, which has to happen
> before the early kernel boot code executes.  And if there's a code
> piece which does that, that might as well place the kernel image such
> that extra allocation afterwards doesn't interfere with memory
> hotunplugging.
> 
> It looks like a lot of code changes for a mechanism which doesn't seem
> all that useful.  This code is already too late in boot sequence to be
> a proper solution so I don't see the point in pushing the coverage to
> the maximum from here.  It's kinda silly.
> 
> The last point - early init of debug facility - makes some sense but
> again how extra coverage are we talking about?  The code path between
> the two points is fairly short and the change doesn't come free.  It
> means we add more fragile firmware-specific code path before the
> execution environment is stable and get to do things like traveling
> the same code paths multiple times in different environments.  Doesn't
> seem like a win.  We want to reach stable execution environment as
> soon as possible.  Shoving whole more logic before that in the name of
> "earlier debugging" doesn't make a lot of sense.

Well, there is a reason why we have the earlyprintk feature today.  So, let's
not debate this feature now.  There was a previous attempt to support
this feature with ACPI tables, linked below.  As described there, it had
the same ordering issue.

https://lkml.org/lkml/2012/10/8/498

There is a basic problem: when we try to use ACPI tables that
extend or replace legacy interfaces (e.g. SRAT extending e820), we hit
this ordering issue because ACPI is not available as early as the legacy
interfaces.

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Tejun Heo
Hello,

On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
> In current boot order, before we get the SRAT, we have a big consumer of early
> allocations: we are setting up the page table in top-down (The idea was proposed by HPA,
> Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
> setup will make the page tables as high as possible in memory, since memory at low
> addresses is precious (for stupid DMA devices, for things like kexec/kdump, and so on.)

With huge mappings, they are fairly small, right?  And this whole
thing needs a kernel param anyway at this point, so the allocation
direction can be made dependent on that or huge mapping availability
and, even with 4k mappings, we aren't talking about gigabytes of
memory, are we?
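
To put rough numbers on the question above, here is a back-of-the-envelope sketch in C. It assumes x86-64 4-level paging (every table is one 4 KiB page of 512 eight-byte entries) and a direct mapping built from a single page size. direct_map_table_bytes() is a hypothetical helper written for this illustration, not a kernel function, and the real kernel mixes page sizes, so treat the result as an estimate only.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 paging: each table is one 4 KiB page holding 512 8-byte entries. */
#define ENTRIES_PER_TABLE 512ULL
#define TABLE_BYTES       4096ULL
/* Address range covered by the single top-level table (4-level paging). */
#define TOP_LEVEL_SPAN    (1ULL << 48)

/* Integer division rounding up. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/*
 * Estimate the bytes of page tables needed to direct-map mem_bytes of RAM
 * using leaf mappings of page_bytes (4 KiB, 2 MiB or 1 GiB).
 * Hypothetical helper for illustration only.
 */
uint64_t direct_map_table_bytes(uint64_t mem_bytes, uint64_t page_bytes)
{
    uint64_t tables = 0;
    uint64_t span;

    assert(mem_bytes > 0 && mem_bytes <= TOP_LEVEL_SPAN);
    assert(page_bytes == (1ULL << 12) || page_bytes == (1ULL << 21) ||
           page_bytes == (1ULL << 30));

    /* Start with the tables that hold the leaf entries, then walk upward. */
    for (span = page_bytes * ENTRIES_PER_TABLE; ; span *= ENTRIES_PER_TABLE) {
        tables += div_round_up(mem_bytes, span);
        if (span >= TOP_LEVEL_SPAN)
            break;
    }
    return tables * TABLE_BYTES;
}
```

Under these assumptions, direct-mapping 64 GiB costs 134,488,064 bytes (about 128 MiB) with 4 KiB pages, 270,336 bytes with 2 MiB pages, and 8,192 bytes with 1 GiB pages. With 4 KiB mappings the cost is roughly 1/512 of the mapped memory; with huge mappings it is small enough that placement matters far less.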

> So if we are trying to make early allocations close to kernel image, we should
> rewrite the way we are setting up page table totally. That is not a easy thing
> to do.

It has been a while since I looked at the code so can you please
elaborate why that is not easy?  It's pretty simple conceptually.

> * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
>   ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable
>   nodes.
> 
> This one is the current requirement of us but may be very helpful for future change:
> 
> * As suggested by Yinghai, we should allocate page tables in local node. This also
>   needs SRAT before direct mapping page tables are setup.

Does this even matter for huge mappings?

> * As mentioned by Toshi Kani, ACPI SPCR/DBGP/DBG2 tables
>   allow the OS to initialize serial console/debug ports at early boot time. The
>   earlier it can be initialized, the better this feature will be.  These tables
>   are not currently used by Linux due to a licensing issue, but it could be
>   addressed some time soon.
> 
> So we decided to firstly make ACPI override earlier and use BRK (this is obviously
> near the kernel image range) to store the found ACPI tables.

I don't know.  The whole effort seems way overcomplicated compared to
the benefits it would bring.  For NUMA memory hotunplug, what's the
point of doing all this when the kernel doesn't have any control over
where its image is gonna be?  Some megabytes at the tail aren't gonna
make a huge difference and if you wanna do this properly, you need to
determine the load address of the kernel considering the node
boundaries and hotpluggability of each node, which has to happen
before the early kernel boot code executes.  And if there's a code
piece which does that, that might as well place the kernel image such
that extra allocation afterwards doesn't interfere with memory
hotunplugging.

It looks like a lot of code changes for a mechanism which doesn't seem
all that useful.  This code is already too late in boot sequence to be
a proper solution so I don't see the point in pushing the coverage to
the maximum from here.  It's kinda silly.

The last point - early init of debug facility - makes some sense but
again how much extra coverage are we talking about?  The code path between
the two points is fairly short and the change doesn't come free.  It
means we add a more fragile firmware-specific code path before the
execution environment is stable and get to do things like traveling
the same code paths multiple times in different environments.  Doesn't
seem like a win.  We want to reach a stable execution environment as
soon as possible.  Shoving a whole lot more logic before that in the name
of "earlier debugging" doesn't make a lot of sense.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Zhang Yanfei
Hi tejun,

On 08/21/2013 09:06 PM, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote:
>> [What are we doing]
>>
>> We are trying to initialize ACPI tables as early as possible. But Linux kernel
>> allows users to override acpi tables by specifying their own tables in initrd.
>> So we have to do acpi_initrd_override() earlier first.
> 
> So, are we now back to making SRAT info as early as possible?  What
> happened to just co-locating early allocations close to kernel image?
> What'd be the benefit of doing this over that?

We know you are trying to steer this change in a more natural and robust
direction, and we are very thankful for your comments. We have taken your
suggestion about co-locating early allocations close to the kernel image into
consideration, but we still found it not that easy.

In the current boot order, before we get the SRAT, we have a big consumer of early
allocations: the page tables, which are set up top-down (the idea was proposed by HPA,
Link: https://lkml.org/lkml/2012/10/4/701). This kind of page table setup places
the page tables as high as possible in memory, since memory at low addresses is
precious (for stupid DMA devices, for things like kexec/kdump, and so on).

So if we want to make early allocations close to the kernel image, we would have to
completely rewrite the way we set up the page tables. That is not an easy thing
to do.

As for the benefits of the patch-set, just as Tang said in the cover letter:

* For memory hotplug, we need the ACPI SRAT early so that we know which memory
  ranges are hotpluggable, and can tell the kernel to try to stay away from
  hotpluggable nodes.

That is our current requirement. The following may also be very helpful for
future changes:

* As suggested by Yinghai, we should allocate page tables in the local node. This
  also needs SRAT before the direct mapping page tables are set up.

* As mentioned by Toshi Kani, ACPI SPCR/DBGP/DBG2 tables
  allow the OS to initialize serial console/debug ports at early boot time. The
  earlier it can be initialized, the better this feature will be.  These tables
  are not currently used by Linux due to a licensing issue, but it could be
  addressed some time soon.

So we decided to first make the ACPI override earlier and to use BRK (which is
obviously near the kernel image range) to store the ACPI tables that are found.

-- 
Thanks.
Zhang Yanfei


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Tejun Heo
Hello,

On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote:
> [What are we doing]
> 
> We are trying to initialize ACPI tables as early as possible. But Linux kernel
> allows users to override acpi tables by specifying their own tables in initrd.
> So we have to do acpi_initrd_override() earlier first.

So, are we now back to making SRAT info as early as possible?  What
happened to just co-locating early allocations close to kernel image?
What'd be the benefit of doing this over that?

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Tang Chen

Hi all,

This patch-set has not been fully tested. I sent them first for you
to review. Please comment if we can agree on this solution.

Thanks.:)

On 08/21/2013 06:15 PM, Tang Chen wrote:

This patch-set aims to move acpi_initrd_override() earlier on x86.
Some of the patches are from Yinghai's patch-set:
https://lkml.org/lkml/2013/6/14/561

The differences between this patch-set and Yinghai's original patch-set are:
1. This patch-set doesn't split acpi_initrd_override(), but calls it as a
whole operation at an early time.
2. It allocates memory from BRK to store override tables.
(This idea is also from Yinghai.)


[Current state]

The current Linux kernel initializes ACPI tables as follows:

1. Find all ACPI override tables provided by users in initrd.
(Linux allows users to override ACPI tables in firmware, by specifying
their own tables in initrd.)

2. Use ACPICA code to initialize the ACPI global root table list and install all
tables into it. If an override table exists, use it to override the one
provided by firmware.

Then others can parse these tables and get useful info.

Both steps happen after the direct mapping page tables are set up.
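
The two steps above can be sketched in plain C. This is a simplified userspace illustration, not the kernel or ACPICA code: the 36-byte header layout and the "all bytes sum to zero" checksum rule come from the ACPI specification, while acpi_blob_valid(), pick_acpi_table() and struct override_table are hypothetical names. The sketch matches on the 4-byte signature only and assumes a little-endian host; the real code applies further checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard ACPI system description table header (36 bytes). */
struct acpi_header {
    char     signature[4];          /* e.g. "SRAT", "DSDT" */
    uint32_t length;                /* whole table, header included */
    uint8_t  revision;
    uint8_t  checksum;              /* makes all bytes sum to 0 mod 256 */
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    char     asl_compiler_id[4];
    uint32_t asl_compiler_revision;
};

static_assert(sizeof(struct acpi_header) == 36, "unexpected header padding");

/* A user-supplied table found in the initrd (hypothetical container). */
struct override_table {
    const uint8_t *data;
    size_t         len;
};

/* Return 1 if the blob has a sane header length and a valid checksum. */
int acpi_blob_valid(const uint8_t *blob, size_t len)
{
    struct acpi_header h;
    uint8_t sum = 0;

    if (len < sizeof(h))
        return 0;
    memcpy(&h, blob, sizeof(h));    /* avoid unaligned access */
    if (h.length != len)
        return 0;
    for (size_t i = 0; i < len; i++)
        sum += blob[i];             /* wraps modulo 256 */
    return sum == 0;
}

/*
 * Step 2 in miniature: return a valid override table with the same
 * signature as the firmware table if there is one, else the firmware table.
 */
const uint8_t *pick_acpi_table(const uint8_t *fw,
                               const struct override_table *ov, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (acpi_blob_valid(ov[i].data, ov[i].len) &&
            memcmp(ov[i].data, fw, 4) == 0)
            return ov[i].data;
    }
    return fw;
}
```

Nothing in this logic needs a page allocator, only read access to the initrd image and somewhere to keep the tables that are selected.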

[Issues]

In the current Linux kernel, ACPI tables are initialized too late for
new functionality.

This causes the following issues:

* For memory hotplug, we need the ACPI SRAT early so that we know which memory
   ranges are hotpluggable, and can prevent the bootmem allocator from allocating
   memory there for the kernel. (Kernel pages cannot be hotplugged because )

* As suggested by Yinghai Lu, we should allocate page tables
   in the local node. This also needs SRAT before the direct mapping page tables
   are set up.

* As mentioned by Toshi Kani, ACPI SPCR/DBGP/DBG2 tables
   allow the OS to initialize serial console/debug ports at early boot time. The
   earlier it can be initialized, the better this feature will be.  These tables
   are not currently used by Linux due to a licensing issue, but it could be
   addressed some time soon.
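
As an illustration of the first issue, the check an early allocator would want to make once SRAT is available can be sketched as follows. struct mem_range is a simplified, hypothetical view of an SRAT memory affinity entry, and early_alloc_allowed() is not a kernel function; only the two flag bits (bit 0 "enabled", bit 1 "hot pluggable") are taken from the ACPI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits of an SRAT memory affinity entry (ACPI specification). */
#define SRAT_MEM_ENABLED       (1u << 0)
#define SRAT_MEM_HOT_PLUGGABLE (1u << 1)

/* Simplified, hypothetical view of one SRAT memory affinity entry. */
struct mem_range {
    uint64_t base;      /* physical start address */
    uint64_t length;    /* size in bytes */
    uint32_t node;      /* proximity domain */
    uint32_t flags;     /* SRAT_MEM_* bits */
};

/*
 * Return 1 if [start, start + size) overlaps no enabled hotpluggable range,
 * i.e. an early allocation placed there cannot pin memory that may later
 * be removed. Return 0 otherwise.
 */
int early_alloc_allowed(const struct mem_range *r, size_t n,
                        uint64_t start, uint64_t size)
{
    const uint32_t want = SRAT_MEM_ENABLED | SRAT_MEM_HOT_PLUGGABLE;

    assert(size > 0);
    for (size_t i = 0; i < n; i++) {
        if ((r[i].flags & want) != want)
            continue;               /* not a live hotpluggable range */
        if (start < r[i].base + r[i].length && r[i].base < start + size)
            return 0;               /* the two intervals overlap */
    }
    return 1;
}
```

The point of the ordering problem is that this kind of check is only possible after SRAT has been parsed, whereas the page tables for the direct mapping are currently allocated before that.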


[What are we doing]

We are trying to initialize ACPI tables as early as possible. But the Linux kernel
allows users to override ACPI tables by specifying their own tables in initrd.
So we have to do acpi_initrd_override() earlier first.


[About this patch-set]

This patch-set aims to move acpi_initrd_override() as early as possible on x86.
As suggested by Yinghai, we are trying to do it like this:

On 32bit: do it in head_32.S, before paging is enabled. In this case, we can
   access initrd by physical address without page tables.

On 64bit: do it in head64.c, after paging is enabled but before direct mapping
   is set up.

Also, acpi_initrd_override() needs to allocate memory for the override tables.
But at such an early time, no memory allocator is available. So the basic idea
from Yinghai is to use BRK. We will extend BRK by 256KB in this patch-set.
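
For readers unfamiliar with BRK, the idea is a fixed region reserved next to the kernel image that is handed out by bumping a pointer. The following is a userspace sketch of that technique only, loosely modeled on what extend_brk() does. brk_area, brk_used and brk_alloc() are hypothetical names, and unlike the kernel this version returns NULL on exhaustion instead of treating it as a fatal bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BRK_RESERVE (256 * 1024)    /* the 256KB from the cover letter */

/* Fixed reservation; page-aligned so offsets up to 4096 align correctly. */
static _Alignas(4096) unsigned char brk_area[BRK_RESERVE];
static size_t brk_used;             /* bytes handed out so far */

/*
 * Bump allocator: round the current end up to 'align' (a power of two,
 * at most 4096 here), hand out 'size' zeroed bytes, never free anything.
 */
void *brk_alloc(size_t size, size_t align)
{
    size_t mask = align - 1;
    size_t start;

    assert(align != 0 && (align & mask) == 0 && align <= 4096);

    start = (brk_used + mask) & ~mask;
    if (start > BRK_RESERVE || size > BRK_RESERVE - start)
        return NULL;                /* reservation exhausted */

    brk_used = start + size;
    memset(brk_area + start, 0, size);
    return brk_area + start;
}
```

The design trade-off is visible in the size check: a fixed reservation is trivially available before any allocator exists, but it only works if the override tables fit, which is the size-limit concern raised elsewhere in this thread.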


Tang Chen (6):
   x86, acpi: Move table_sigs[] to stack.
   x86, acpi, brk: Extend BRK 256KB to store acpi override tables.
   x86, brk: Make extend_brk() available with va/pa.
   x86, acpi: Make acpi_initrd_override() available with va or pa.
   x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available
 with va/pa.
   x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c.

Yinghai Lu (2):
   x86: Make get_ramdisk_{image|size}() global.
   x86, microcode: Use get_ramdisk_{image|size}() in microcode handling.

  arch/x86/include/asm/dmi.h  |2 +-
  arch/x86/include/asm/setup.h|   11 +++-
  arch/x86/kernel/head64.c|4 +
  arch/x86/kernel/head_32.S   |4 +
  arch/x86/kernel/microcode_intel_early.c |8 +-
  arch/x86/kernel/setup.c |   93 --
  arch/x86/mm/init.c  |2 +-
  arch/x86/xen/enlighten.c|2 +-
  arch/x86/xen/mmu.c  |6 +-
  arch/x86/xen/p2m.c  |   27 ---
  drivers/acpi/osl.c  |  130 ---
  include/linux/acpi.h|5 +-
  12 files changed, 196 insertions(+), 98 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: em...@kvack.org




Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Zhang Yanfei
Hi tejun,

On 08/21/2013 09:06 PM, Tejun Heo wrote:
 Hello,
 
 On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote:
 [What are we doing]

 We are trying to initialize acip tables as early as possible. But Linux 
 kernel
 allows users to override acpi tables by specifying their own tables in 
 initrd.
 So we have to do acpi_initrd_override() earlier first.
 
 So, are we now back to making SRAT info as early as possible?  What
 happened to just co-locating early allocations close to kernel image?
 What'd be the benefit of doing this over that?

We know you are trying to give the direction to make the change more natural and
robust and very thankful for your comments. We have taken your comments and 
suggestions
about co-locating early allocations close to kernel image into consideration, 
but
still we found that not that easy.

In current boot order, before we get the SRAT, we have a big consumer of early
allocations: we are setting up the page table in top-down (The idea was 
proposed by HPA,
Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
setup will make the page tables as high as possible in memory, since memory at 
low 
addresses is precious (for stupid DMA devices, for things like  kexec/kdump, 
and so on.)

So if we are trying to make early allocations close to kernel image, we should
rewrite the way we are setting up page table totally. That is not a easy thing
to do.

As for the benefits of the patchset, just as Tang said in this patch,

* For memory hotplug, we need ACPI SRAT at early time to be aware of which 
memory
  ranges are hotpluggable, and tell the kernel to try to stay away from 
hotpluggable
  nodes.

This one is the current requirement of us but may be very helpful for future 
change:

* As suggested by Yinghai, we should allocate page tables in local node. This 
also
  needs SRAT before direct mapping page tables are setup.

* As mentioned by Toshi Kani toshi.k...@hp.com, ACPI SCPR/DBGP/DBG2 tables
  allow the OS to initialize serial console/debug ports at early boot time. The
  earlier it can be initialized, the better this feature will be.  These tables
  are not currently used by Linux due to a licensing issue, but it could be
  addressed some time soon.

So we decided to first make the ACPI override earlier and use BRK (which is
obviously near the kernel image range) to store the ACPI tables that are found.

-- 
Thanks.
Zhang Yanfei
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Tejun Heo
Hello,

On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
 In current boot order, before we get the SRAT, we have a big consumer of early
 allocations: we are setting up the page table in top-down (The idea was
 proposed by HPA, Link: https://lkml.org/lkml/2012/10/4/701). That said, this
 kind of page table setup will make the page tables as high as possible in
 memory, since memory at low addresses is precious (for stupid DMA devices,
 for things like kexec/kdump, and so on.)

With huge mappings, they are fairly small, right?  And this whole
thing needs a kernel param anyway at this point, so the allocation
direction can be made dependent on that or huge mapping availability
and, even with 4k mappings, we aren't talking about gigabytes of
memory, are we?
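
For a rough sense of scale, the following back-of-the-envelope sketch
estimates the page-table memory needed to direct-map a given amount of RAM on
x86-64 with 4-level paging (512 eight-byte entries per 4 KiB table). The
function name and the 1 TiB example are assumptions for illustration, and real
kernels mix page sizes, so treat the numbers as estimates rather than
measurements:

```python
ENTRIES_PER_TABLE = 512
TABLE_BYTES = 4096
TOP_LEVEL_SPAN = 1 << 48      # address space covered by one top-level table


def page_table_bytes(mem_bytes, page_size):
    """Approximate bytes of page tables needed to map mem_bytes linearly."""
    total = 0
    covered = page_size * ENTRIES_PER_TABLE   # memory covered by one leaf table
    while True:
        tables = -(-mem_bytes // covered)     # ceiling division
        total += tables * TABLE_BYTES
        if covered >= TOP_LEVEL_SPAN:
            break                             # reached the top-level table
        covered *= ENTRIES_PER_TABLE          # move one level up
    return total


ONE_TIB = 1 << 40
for name, size in (("4K", 1 << 12), ("2M", 1 << 21), ("1G", 1 << 30)):
    print(name, page_table_bytes(ONE_TIB, size))
```

Under these assumptions 1 TiB costs about 2 GiB of page tables with 4 KiB
pages, about 4 MiB with 2 MiB pages, and 12 KiB with 1 GiB pages, so the
answer depends heavily on whether huge mappings are available.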

 So if we are trying to make early allocations close to kernel image, we should
 rewrite the way we are setting up page table totally. That is not an easy thing
 to do.

It has been a while since I looked at the code so can you please
elaborate why that is not easy?  It's pretty simple conceptually.

 * For memory hotplug, we need ACPI SRAT at early time to be aware of which
   memory ranges are hotpluggable, and tell the kernel to try to stay away
   from hotpluggable nodes.
 
 This one is the current requirement of us but may be very helpful for future
 change:
 
 * As suggested by Yinghai, we should allocate page tables in local node. This
   also needs SRAT before direct mapping page tables are setup.

Does this even matter for huge mappings?

 * As mentioned by Toshi Kani toshi.k...@hp.com, ACPI SPCR/DBGP/DBG2 tables
   allow the OS to initialize serial console/debug ports at early boot time.
   The earlier it can be initialized, the better this feature will be.  These
   tables are not currently used by Linux due to a licensing issue, but it
   could be addressed some time soon.
 
 So we decided to firstly make ACPI override earlier and use BRK (this is
 obviously near the kernel image range) to store the found ACPI tables.

I don't know.  The whole effort seems way overcomplicated compared to
the benefits it would bring.  For NUMA memory hotunplug, what's the
point of doing all this when the kernel doesn't have any control over
where its image is gonna be?  Some megabytes at the tail aren't gonna
make a huge difference and if you wanna do this properly, you need to
determine the load address of the kernel considering the node
boundaries and hotpluggability of each node, which has to happen
before the early kernel boot code executes.  And if there's a code
piece which does that, that might as well place the kernel image such
that extra allocation afterwards doesn't interfere with memory
hotunplugging.

It looks like a lot of code changes for a mechanism which doesn't seem
all that useful.  This code is already too late in boot sequence to be
a proper solution so I don't see the point in pushing the coverage to
the maximum from here.  It's kinda silly.

The last point - early init of debug facility - makes some sense but
again how much extra coverage are we talking about?  The code path
between the two points is fairly short and the change doesn't come free.
It means we add more fragile firmware-specific code paths before the
execution environment is stable and get to do things like traveling
the same code paths multiple times in different environments.  Doesn't
seem like a win.  We want to reach a stable execution environment as
soon as possible.  Shoving a whole lot more logic before that in the
name of earlier debugging doesn't make a lot of sense.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Toshi Kani
On Wed, 2013-08-21 at 11:36 -0400, Tejun Heo wrote:
 Hello,
 
 On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
  In current boot order, before we get the SRAT, we have a big consumer of
  early allocations: we are setting up the page table in top-down (The idea
  was proposed by HPA, Link: https://lkml.org/lkml/2012/10/4/701). That said,
  this kind of page table setup will make the page tables as high as possible
  in memory, since memory at low addresses is precious (for stupid DMA
  devices, for things like kexec/kdump, and so on.)
 
 With huge mappings, they are fairly small, right?  And this whole
 thing needs a kernel param anyway at this point, so the allocation
 direction can be made dependent on that or huge mapping availability
 and, even with 4k mappings, we aren't talking about gigabytes of
 memory, are we?
 
  So if we are trying to make early allocations close to kernel image, we
  should rewrite the way we are setting up page table totally. That is not an
  easy thing to do.
 
 It has been a while since I looked at the code so can you please
 elaborate why that is not easy?  It's pretty simple conceptually.
 
  * For memory hotplug, we need ACPI SRAT at early time to be aware of which
    memory ranges are hotpluggable, and tell the kernel to try to stay away
    from hotpluggable nodes.
  
  This one is the current requirement of us but may be very helpful for
  future change:
  
  * As suggested by Yinghai, we should allocate page tables in local node.
    This also needs SRAT before direct mapping page tables are setup.
 
 Does this even matter for huge mappings?
 
  * As mentioned by Toshi Kani toshi.k...@hp.com, ACPI SPCR/DBGP/DBG2 tables
    allow the OS to initialize serial console/debug ports at early boot time.
    The earlier it can be initialized, the better this feature will be.  These
    tables are not currently used by Linux due to a licensing issue, but it
    could be addressed some time soon.
  
  So we decided to firstly make ACPI override earlier and use BRK (this is
  obviously near the kernel image range) to store the found ACPI tables.
 
 I don't know.  The whole effort seems way overcomplicated compared to
 the benefits it would bring.  For NUMA memory hotunplug, what's the
 point of doing all this when the kernel doesn't have any control over
 where its image is gonna be?  Some megabytes at the tail aren't gonna
 make a huge difference and if you wanna do this properly, you need to
 determine the load address of the kernel considering the node
 boundaries and hotpluggability of each node, which has to happen
 before the early kernel boot code executes.  And if there's a code
 piece which does that, that might as well place the kernel image such
 that extra allocation afterwards doesn't interfere with memory
 hotunplugging.
 
 It looks like a lot of code changes for a mechanism which doesn't seem
 all that useful.  This code is already too late in boot sequence to be
 a proper solution so I don't see the point in pushing the coverage to
 the maximum from here.  It's kinda silly.
 
 The last point - early init of debug facility - makes some sense but
 again how extra coverage are we talking about?  The code path between
 the two points is fairly short and the change doesn't come free.  It
 means we add more fragile firmware-specific code path before the
 execution environment is stable and get to do things like traveling
 the same code paths multiple times in different environments.  Doesn't
 seem like a win.  We want to reach stable execution environment as
 soon as possible.  Shoving whole more logic before that in the name of
 earlier debugging doesn't make a lot of sense.

Well, there is a reason why we have the earlyprintk feature today.  So, let's
not debate this feature now.  There was a previous attempt to support this
feature with ACPI tables, linked below.  As described there, it had the same
ordering issue.

https://lkml.org/lkml/2012/10/8/498

There is a basic problem: when we try to use ACPI tables that extend or
replace legacy interfaces (e.g. SRAT extending e820), we hit this ordering
issue because ACPI is not available as early as the legacy interfaces.

Thanks,
-Toshi



Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Tejun Heo
Hello, Toshi.

On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
 Well, there is reason why we have earlyprintk feature today.  So, let's
 not debate on this feature now.  There was previous attempt to support

Are you saying the existing earlyprintk automatically justifies the
addition of a more complex mechanism?  The added complexity of course
should be traded off against the benefits of gaining ACPI based early
boot.  You aren't gonna suggest implementing netconsole based
earlyprintk, right?

 this feature with ACPI tables below.  As described, it had the same
 ordering issue.
 
 https://lkml.org/lkml/2012/10/8/498
 
 There is a basic problem that when we try to use ACPI tables that
 extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
 this ordering issue because ACPI is not available as early as the legacy
 interfaces.

Do we even want ACPI parsing and all that that early?  Parsing SRAT
early doesn't buy us much and I'm not sure whether adding ACPI
earlyprintk would increase or decrease debuggability during earlyboot.
It adds a whole lot more code paths where things can go wrong while the
basic execution environment is unstable.  Why do that?

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Toshi Kani
Hello Tejun,

On Wed, 2013-08-21 at 15:54 -0400, Tejun Heo wrote:
 On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
  Well, there is reason why we have earlyprintk feature today.  So, let's
  not debate on this feature now.  There was previous attempt to support
 
 Are you saying the existing earlyprintk automatically justifies
 addition of more complex mechanism?  The added complex of course
 should be traded off against the benefits of gaining ACPI based early
 boot.  You aren't gonna suggest implementing netconsole based
 earlyprintk, right?

Platform vendors (those who care about Linux) need to support the existing
Linux features.  This means that they have to implement legacy interfaces on
x86 until the kernel supports an alternative method.  For instance, some
platforms are legacy-free and do not have legacy COM ports.  These ACPI
tables were defined so that non-legacy COM ports can be described and
reported to the OS.  Without this support, such platforms may have to
emulate the legacy COM ports for Linux, or drop Linux support.

  this feature with ACPI tables below.  As described, it had the same
  ordering issue.
  
  https://lkml.org/lkml/2012/10/8/498
  
  There is a basic problem that when we try to use ACPI tables that
  extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
  this ordering issue because ACPI is not available as early as the legacy
  interfaces.
 
 Do we even want ACPI parsing and all that that early?  Parsing SRAT
 early doesn't buy us much and I'm not sure whether adding ACPI
 earlyprintk would increase or decrease debuggability during earlyboot.
 It adds whole lot more code paths where things can go wrong while the
 basic execution environment is unstable.  Why do that?

I think the kernel boot-up sequence should be designed in such a way
that it can support legacy-free and/or NUMA platforms properly.

Thanks,
-Toshi




Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Tejun Heo
Hello, Toshi.

On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
 Platforms vendors (which care Linux) need to support the existing Linux
 features.  This means that they have to implement legacy interfaces on
 x86 until the kernel supports an alternative method.  For instance, some
 platforms are legacy-free and do not have legacy COM ports.  These ACPI
 tables were defined so that non-legacy COM ports can be described and
 informed to the OS.  Without this support, such platforms may have to
 emulate the legacy COM ports for Linux, or drop Linux support.

Are you seriously saying that vendors are gonna drop linux support for
lacking ACPI earlyprintk support?  Please...

Please take a look at the existing earlyprintk code and how compact
and self-contained it is.  If you want to add ACPI earlyprintk, do
similar stuff.  Forget about firmware blob override from initrd or
ACPICA.  Just implement the bare minimum to get the thing working.  Do
not add a dependency on a large body of code from earlyboot.  It's a bad
idea through and through.

 I think the kernel boot-up sequence should be designed in such a way
 that can support legacy-free and/or NUMA platforms properly.

Blanket statements like the above don't mean much.  There are many
separate stages of boot and you're talking about one of the very first
stages where we traditionally have always depended upon only the very
bare minimum of the platform both in hardware itself and configuration
information.  We've been doing that for *very* good reasons.  If you
screw up there, it's mighty tricky to figure out what went wrong
especially on the machines that you can't physically kick.  You're now
suggesting to add the whole of ACPI parsing, including overloading from
initrd, into that stage with pretty weak rationale.

Seriously, if you want ACPI based earlyprintk, implement it as
discrete, minimal code which is easy to verify and won't get affected
when the rest of the ACPI machinery is updated.  We really don't want
earlyboot to fail because someone screwed up ACPI or initrd handling.

Thanks.

-- 
tejun


Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Toshi Kani
Hello Tejun,

On Wed, 2013-08-21 at 16:40 -0400, Tejun Heo wrote:
 On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
  Platforms vendors (which care Linux) need to support the existing Linux
  features.  This means that they have to implement legacy interfaces on
  x86 until the kernel supports an alternative method.  For instance, some
  platforms are legacy-free and do not have legacy COM ports.  These ACPI
  tables were defined so that non-legacy COM ports can be described and
  informed to the OS.  Without this support, such platforms may have to
  emulate the legacy COM ports for Linux, or drop Linux support.
 
 Are you seriously saying that vendors are gonna drop linux support for
 lacking ACPI earlyprintk support?  Please...

earlyprintk is one example of the issue.  The point is that vendors are
required to support legacy stuff for Linux.

 Please take a look at the existing earlyprintk code and how compact
 and self-contained they are.  If you want to add ACPI earlyprintk, do
 similar stuff.  Forget about firmware blob override from initrd or
 ACPICA.  Just implement the bare minimum to get the thing working.  Do
 not add dependency to large body of code from earlyboot.  It's a bad
 idea through and through.

I am not saying that ACPI earlyprintk must be available at exactly the
same point.  How early it can reasonably be is a subject of discussion.

  I think the kernel boot-up sequence should be designed in such a way
  that can support legacy-free and/or NUMA platforms properly.
 
 Blanket statements like the above don't mean much.  There are many
 separate stages of boot and you're talking about one of the very first
 stages where we traditionally have always depended upon only the very
 bare minimum of the platform both in hardware itself and configuration
 information.  We've been doing that for *very* good reasons.  If you
 screw up there, it's mighty tricky to figure out what went wrong
 especially on the machines that you can't physically kick.  You're now
 suggesting to add whole ACPI parsing including overloading from initrd
 into that stage with pretty weak rationale.

I agree that ACPI is rather complicated stuff.  But in my experience,
the majority of the complication comes from the ACPI namespace and
methods, not from ACPI tables.  Do you really think ACPI table init is
that risky?  I consider ACPI tables to be part of the minimum config
info, esp. for legacy-free platforms.

 Seriously, if you want ACPI based earlyprintk, implement it in a
 discrete minimal code which is easy to verify and won't get affected
 when the rest of ACPI machinery is updated.  We really don't want
 earlyboot to fail because someone screwed up ACPI or initrd handling.

earlyprintk is just another example of this SRAT issue.  The local page
table is yet another example.  My hope here is for us to be able to
utilize ACPI tables properly without hitting this kind of ordering
issue again and again, which requires considerable time and effort to
address.

Thanks,
-Toshi




Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

2013-08-21 Thread Tejun Heo
Hello,

On Wed, Aug 21, 2013 at 04:36:35PM -0600, Toshi Kani wrote:
 I agree that ACPI is rather complicated stuff.  But in my experience,
 the majority complication comes from ACPI namespace and methods, not
 from ACPI tables.  Do you really think ACPI table init is that risky?  I
 consider ACPI tables are part of the minimum config info, esp. for
 legacy-free platforms.

It's just that we're talking about the very first stage of boot.  We
really don't do much there and pulling in ACPI code into that stage is
a lot by comparison.  If that's gonna happen, it needs pretty strong
justification.

 earlyprintk is just another example to this SRAT issue.  The local page
 table is yet another example.  My hope here is for us to be able to
 utilize ACPI tables properly without hitting this kind of ordering
 issues again and again, which requires considerable time and effort to
 address.

So, the two things brought up at this point are early parsing of SRAT,
which can't really solve the problem at hand anyway, and earlyprintk,
which should be implemented in a minimal way that is not activated
unless specifically enabled with the earlyprintk boot param.  Neither
seems to justify pulling in full ACPI into early boot, right?

Thanks.

-- 
tejun