Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin wrote:
> >
> > BRK makes sense as long as you can set a sane O(1) size limit.
> >
> >> put the acpi override table in BRK, we still need ok from HPA.
> >> I have impression that he did not like it, so want to confirm from him.
>
> on 8 sockets system:
> -rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
> -rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
> -rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
> -rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
> -rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
> -rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
> -rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
> -rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
> -rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
> -rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
> -rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
> -rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
> -rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
> -rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
> -rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
> -rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
> -rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
> -rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
> -rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat
>
> assume for 32sockets will have four times bigger with DSDT and SSDT.
> (with more pci and cpus)
>
> So we can not have O(1) the size.
>
> Russ, What is ACPI table size on your big machine?

This is from a 255 socket, 4080 cpu, 15TB system.

---
-rw-r--r-- 1 root root   65392 Aug 23 21:23 apic.dat
-rw-r--r-- 1 root root     316 Aug 23 21:23 dmar.dat
-rw-r--r-- 1 root root 8309249 Aug 23 21:23 dsdt.dat
-rw-r--r-- 1 root root     244 Aug 23 21:23 facp.dat
-rw-r--r-- 1 root root      64 Aug 23 21:23 facs.dat
-rw-r--r-- 1 root root      56 Aug 23 21:23 hpet.dat
-rw-r--r-- 1 root root    4172 Aug 23 21:23 mcfg.dat
-rw-r--r-- 1 root root      36 Aug 23 21:23 rsdp.dat
-rw-r--r-- 1 root root      80 Aug 23 21:23 rsdt.dat
-rw-r--r-- 1 root root   65069 Aug 23 21:23 slit.dat
-rw-r--r-- 1 root root      80 Aug 23 21:23 spcr.dat
-rw-r--r-- 1 root root  108168 Aug 23 21:23 srat.dat
-rw-r--r-- 1 root root   21330 Aug 23 21:23 ssdt.dat
-rw-r--r-- 1 root root      92 Aug 23 21:23 uefi1.dat
-rw-r--r-- 1 root root     298 Aug 23 21:23 uefi.dat
-rw-r--r-- 1 root root     124 Aug 23 21:23 xsdt.dat
---

--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc          r...@sgi.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
> So we need to allocate memory. That is why you suggested to use BRK, right ?
> And the size seems to be a problem.
>
> So I suggest to use early_ioremap().
>
> 1. After paging is enabled, before direct mapping page tables are setup,
>    we map the initrd with early_ioremap(). And we are able to access it
>    with va, even on 32bit. Then we can find all tables.
> 2. We still use memblock to allocate memory. Maybe it will be hotpluggable
>    memory, but this memory can be freed when all the acpi tables are
>    parsed, right ?
>
> So I want to try early_ioremap(). All these should be done in setup_arch().

No.

The cpio search needs a virtual address range covering the whole initrd,
and early_ioremap() has a size limitation.

You would have to update the cpio search to take a mapping function,
which could be too messy.

Yinghai
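To make the size limitation concrete, here is a minimal user-space sketch of the map/process/unmap loop that a size-limited mapping forces on its caller. It is not kernel code and not part of the patch set: WINDOW_BYTES, toy_map() and toy_unmap() are hypothetical stand-ins, and the "physical" source is ordinary memory. A copy is shown because it is the simplest case; a cpio search would need the same loop plus handling for records that straddle a window boundary, which is presumably the messiness referred to above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-mapping limit; the real early_ioremap() limit differs. */
#define WINDOW_BYTES 4096u

/* Hypothetical stand-in for a size-limited mapping call. */
static const unsigned char *toy_map(const unsigned char *phys, size_t len)
{
    assert(len <= WINDOW_BYTES);    /* the window cannot be exceeded */
    return phys;                    /* toy: "physical" == virtual */
}

/* Hypothetical stand-in for the matching unmap call. */
static void toy_unmap(const unsigned char *va)
{
    (void)va;
}

/*
 * Copy len bytes from src to dst without ever mapping more than
 * WINDOW_BYTES at once. Returns the number of mappings that were needed.
 */
size_t copy_through_window(unsigned char *dst, const unsigned char *src,
                           size_t len)
{
    size_t maps = 0;

    assert(dst != NULL && src != NULL);
    while (len > 0) {
        size_t chunk = len < WINDOW_BYTES ? len : WINDOW_BYTES;
        const unsigned char *va = toy_map(src, chunk);

        memcpy(dst, va, chunk);
        toy_unmap(va);

        dst += chunk;
        src += chunk;
        len -= chunk;
        maps++;
    }
    return maps;
}
```

With this toy window, copying a 10000-byte buffer takes three mappings (4096 + 4096 + 1808 bytes).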
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hi Yinghai,

2013/8/24 Yinghai Lu :
> On Fri, Aug 23, 2013 at 2:50 PM, chen tang wrote:
>>>
>>> so the DSDT is 7F493E, and total is more than 8M.
>>>
>>> that will need BRK to be extended 16M?
>>>
>>
>> Then how about use early_ioremap(), and don't do it that early in
>> head_32 and head64 ?
>
> why could early_ioremap() help?
>
> when to use early_ioremap()? what for?
>

In my understanding, the ACPICA framework needs users to copy the override
tables somewhere in memory, and ACPICA picks up these user-specified
tables when it installs the firmware tables. This is the ACPICA logic,
which cannot be changed, I think.

So we need to allocate memory. That is why you suggested using BRK, right?
And the size seems to be a problem.

So I suggest using early_ioremap().

1. After paging is enabled, before the direct mapping page tables are set
   up, we map the initrd with early_ioremap(). Then we are able to access
   it through a virtual address, even on 32bit, and we can find all the
   tables.
2. We still use memblock to allocate memory. It may be hotpluggable
   memory, but this memory can be freed once all the ACPI tables are
   parsed, right?

So I want to try early_ioremap(). All of this should be done in
setup_arch().

Thanks.
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 2:50 PM, chen tang wrote:
>>
>> so the DSDT is 7F493E, and total is more than 8M.
>>
>> that will need BRK to be extended 16M?
>>
>
> Then how about use early_ioremap(), and don't do it that early in
> head_32 and head64 ?

why could early_ioremap() help?

when to use early_ioremap()? what for?

Yinghai
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 2:52 PM, Moore, Robert wrote:
> While we're at it:
>
> Can someone send me the acpidump for this machine? We very much would
> like to test all of ACPICA with such a large DSDT.

That is Russ.
RE: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
While we're at it:

Can someone send me the acpidump for this machine? We very much would like
to test all of ACPICA with such a large DSDT.

Thanks,
Bob

> -----Original Message-----
> From: chen tang [mailto:imtangc...@gmail.com]
> Sent: Friday, August 23, 2013 2:51 PM
> To: Yinghai Lu
> Cc: Russ Anderson; H. Peter Anvin; Zhang Yanfei; Toshi Kani; Tejun Heo;
> Tang Chen; Konrad Rzeszutek Wilk; Moore, Robert; Zheng, Lv; Rafael J.
> Wysocki; Ingo Molnar; Andrew Morton; Thomas Renninger; Yasuaki Ishimatsu;
> Mel Gorman; Linux Kernel Mailing List
> Subject: Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
>
> Hi Yinghai,
>
> 2013/8/24 Yinghai Lu :
> > On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson wrote:
> >> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> >>> Russ, What is ACPI table size on your big machine?
> >>
> >> This is from a 256 socket 32TB system.
> >>
> >> [...]
> >> ACPI: DSDT 7e6c3000 7F493E (v02 SGI2 UVX 0002 MSFT 0113)
> >> [...]
> >
> > so the DSDT is 7F493E, and total is more than 8M.
> >
> > that will need BRK to be extended 16M?
> >
>
> Then how about use early_ioremap(), and don't do it that early in
> head_32 and head64 ?
>
> Thanks.
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hi Yinghai,

2013/8/24 Yinghai Lu :
> On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson wrote:
>> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
>>> Russ, What is ACPI table size on your big machine?
>>
>> This is from a 256 socket 32TB system.
>>
>> [...]
>> ACPI: DSDT 7e6c3000 7F493E (v02 SGI2 UVX 0002 MSFT 0113)
>> [...]
>
> so the DSDT is 7F493E, and total is more than 8M.
>
> that will need BRK to be extended 16M?
>

Then how about use early_ioremap(), and don't do it that early in
head_32 and head64 ?

Thanks.
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson wrote:
> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
>> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin wrote:
>> Russ, What is ACPI table size on your big machine?
>
> This is from a 256 socket 32TB system.
>
> Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
> ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
> ACPI: XSDT 7ef3d120 0007C (v01 INTEL TIANO 0113)
> ACPI: FACP 7ef3a000 000F4 (v04 INTEL TIANO MSFT 0113)
> ACPI: DSDT 7e6c3000 7F493E (v02 SGI2 UVX 0002 MSFT 0113)
> ACPI: FACS 7d147000 00040
> ACPI: UEFI 7ef3c000 0012A (v01 INTEL RstScuO )
> ACPI: UEFI 7ef3b000 0005C (v01 INTEL RstScuV )
> ACPI: HPET 7ef39000 00038 (v01 INTEL TIANO0001 MSFT 0113)
> ACPI: SSDT 7ef33000 05352 (v02 INTEL ROSECITY 0003 INTL 20070508)
> ACPI: SLIT 7ef1 1002C (v01 SGI2 UVX 0002 MSFT 0001)
> ACPI: APIC 7000 10070 (v03 SGI2 UVX 0002 MSFT 0001)
> ACPI: SRAT 7eeb8000 1A830 (v03 SGI2 UVX 0002 MSFT 0001)
> ACPI: MCFG 7d6d4000 0105C (v01 SGI2 UVX 0002 MSFT 0001)
> ACPI: SPCR 7e6c2000 00050 (v01 )
> ACPI: DMAR 7d6d3000 0013C (v01 INTEL TIANO0001 MSFT 0113)
>

so the DSDT is 7F493E, and total is more than 8M.

that will need BRK to be extended 16M?

Thanks

Yinghai
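The arithmetic behind "more than 8M" and "16M" can be checked with a small stand-alone sketch. The lengths below are the hex length fields copied from the "ACPI:" lines quoted in this message. Rounding up to the next power of two is only an assumption about how a fixed reservation might be sized; nothing in the thread says the BRK area has to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length fields (hex) from the quoted "ACPI:" boot log lines. */
static const uint32_t acpi_table_len[] = {
    0x00024,   /* RSDP */
    0x0007C,   /* XSDT */
    0x000F4,   /* FACP */
    0x7F493E,  /* DSDT */
    0x00040,   /* FACS */
    0x0012A,   /* UEFI */
    0x0005C,   /* UEFI */
    0x00038,   /* HPET */
    0x05352,   /* SSDT */
    0x1002C,   /* SLIT */
    0x10070,   /* APIC */
    0x1A830,   /* SRAT */
    0x0105C,   /* MCFG */
    0x00050,   /* SPCR */
    0x0013C,   /* DMAR */
};
#define ACPI_NTABLES (sizeof(acpi_table_len) / sizeof(acpi_table_len[0]))

/* Sum of all logged table lengths, in bytes. */
uint64_t acpi_total_bytes(const uint32_t *len, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += len[i];
    return total;
}

/* Smallest power of two >= x (assumed sizing rule, see lead-in). */
uint64_t round_up_pow2(uint64_t x)
{
    uint64_t p = 1;

    assert(x > 0 && x <= (UINT64_C(1) << 63));
    while (p < x)
        p <<= 1;
    return p;
}
```

The DSDT alone is 0x7F493E = 8341822 bytes, just under 8 MiB (8388608). Adding the other tables gives 8608470 bytes, which is over 8 MiB, so the next power of two is 16 MiB.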
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin wrote:
> >
> > BRK makes sense as long as you can set a sane O(1) size limit.
> >
> >> put the acpi override table in BRK, we still need ok from HPA.
> >> I have impression that he did not like it, so want to confirm from him.
>
> on 8 sockets system:
> [...]
> -rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
> [...]
> -rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
> [...]
>
> assume for 32sockets will have four times bigger with DSDT and SSDT.
> (with more pci and cpus)
>
> So we can not have O(1) the size.
>
> Russ, What is ACPI table size on your big machine?

This is from a 256 socket 32TB system.

Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
ACPI: XSDT 7ef3d120 0007C (v01 INTEL TIANO 0113)
ACPI: FACP 7ef3a000 000F4 (v04 INTEL TIANO MSFT 0113)
ACPI: DSDT 7e6c3000 7F493E (v02 SGI2 UVX 0002 MSFT 0113)
ACPI: FACS 7d147000 00040
ACPI: UEFI 7ef3c000 0012A (v01 INTEL RstScuO )
ACPI: UEFI 7ef3b000 0005C (v01 INTEL RstScuV )
ACPI: HPET 7ef39000 00038 (v01 INTEL TIANO0001 MSFT 0113)
ACPI: SSDT 7ef33000 05352 (v02 INTEL ROSECITY 0003 INTL 20070508)
ACPI: SLIT 7ef1 1002C (v01 SGI2 UVX 0002 MSFT 0001)
ACPI: APIC 7000 10070 (v03 SGI2 UVX 0002 MSFT 0001)
ACPI: SRAT 7eeb8000 1A830 (v03 SGI2 UVX 0002 MSFT 0001)
ACPI: MCFG 7d6d4000 0105C (v01 SGI2 UVX 0002 MSFT 0001)
ACPI: SPCR 7e6c2000 00050 (v01 )
ACPI: DMAR 7d6d3000 0013C (v01 INTEL TIANO0001 MSFT 0113)

--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc          r...@sgi.com
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin wrote:
>
> BRK makes sense as long as you can set a sane O(1) size limit.
>
>> put the acpi override table in BRK, we still need ok from HPA.
>> I have impression that he did not like it, so want to confirm from him.

on 8 sockets system:
-rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
-rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
-rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
-rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
-rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
-rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
-rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
-rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
-rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
-rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
-rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
-rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
-rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
-rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
-rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
-rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
-rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
-rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
-rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
-rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
-rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
-rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
-rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat

assume for 32sockets will have four times bigger with DSDT and SSDT.
(with more pci and cpus)

So we can not have O(1) the size.

Russ, What is ACPI table size on your big machine?

Thanks

Yinghai
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Zhang,

On Sat, 2013-08-24 at 00:54 +0800, Zhang Yanfei wrote:
> > Tang, what do you think? Are you OK to try Tejun's suggestion as well?
> >
>
> By saying TJ's suggestion, you mean, we will let memblock to control the
> behaviour, that said, we will do early allocations near the kernel image
> range before we get the SRAT info?

Right.

> If so, yeah, we have been working on this direction.

Great!

> By doing this, we may have two main changes:
>
> 1. change some of memblock's APIs to make it have the ability to allocate
>    memory from low address.
> 2. setup kernel page table down-top. Concretely, we first map the memory
>    just after the kernel image to the top, then, we map 0 - kernel image end.
>
> Do you guys think this is reasonable and acceptable?

Have you also looked at Yinghai's comments below?

http://www.spinics.net/lists/linux-mm/msg61362.html

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hi Toshi,

On 08/24/2013 01:13 AM, Toshi Kani wrote:
> Hello,
>
> On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
>> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
>>> I still think acpi table info should be available earlier, but I do not
>>> think I can convince you on this. This can be religious debate.
>>
>> I'm curious. If there aren't substantial enough benefits, why would
>> you still want to pull it earlier when it brings in things like initrd
>> override and crafting the code carefully so that it's safe to execute
>> it from different address modes and so on? Please note that x86 is
>> not ia64. The early environment is completely different not only
>> technically but also in its diversity and suckiness. It wasn't too
>> long ago that vendors were screwing up ACPI left and right. It has
>> been getting better but there's a reason why, for example, we still
>> consider e820 to be the authoritative information over ACPI.
>
> Firmware generates tables, and provides them via some interface. Memory
> map table can be provided via e820 or EFI memory map. Memory topology
> table is provided via ACPI. I agree to prioritize one table over the
> other when there is overlap. But in the end, it is the firmware that
> generates the tables. Because it is provided via ACPI does not make it
> suddenly unreliable. I think table info from e820/EFI/ACPI should be
> available at the same time. To me, it makes more sense to use the
> hotplug info to initialize memblock than try to find a way to workaround
> without it.

Yeah, agreed. But sigh, on x86 we have ACPI initrd override, so we still
cannot convince TJ. I think we will continue to be in that way to find a
workaround in this direction.

>
> I came from ia64 background, and am not very familiar with x86. So, you
> may be very right about that x86 is different. I also agree that initrd
> is making it unnecessarily complicated. We may see some initial issues,
> but my hope is that the code gets matured over the time.
>
> Thanks,
> -Toshi
>

--
Thanks.
Zhang Yanfei
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> > I still think acpi table info should be available earlier, but I do not
> > think I can convince you on this. This can be religious debate.
>
> I'm curious. If there aren't substantial enough benefits, why would
> you still want to pull it earlier when it brings in things like initrd
> override and crafting the code carefully so that it's safe to execute
> it from different address modes and so on? Please note that x86 is
> not ia64. The early environment is completely different not only
> technically but also in its diversity and suckiness. It wasn't too
> long ago that vendors were screwing up ACPI left and right. It has
> been getting better but there's a reason why, for example, we still
> consider e820 to be the authoritative information over ACPI.

Firmware generates tables, and provides them via some interface. Memory
map table can be provided via e820 or EFI memory map. Memory topology
table is provided via ACPI. I agree to prioritize one table over the
other when there is overlap. But in the end, it is the firmware that
generates the tables. Because it is provided via ACPI does not make it
suddenly unreliable. I think table info from e820/EFI/ACPI should be
available at the same time. To me, it makes more sense to use the
hotplug info to initialize memblock than try to find a way to workaround
without it. I think we will continue to be in that way to find a
workaround in this direction.

I came from ia64 background, and am not very familiar with x86. So, you
may be very right about that x86 is different. I also agree that initrd
is making it unnecessarily complicated. We may see some initial issues,
but my hope is that the code gets matured over the time.

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello

On 08/24/2013 12:14 AM, Toshi Kani wrote:
> Hello,
>
> On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
>> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
>> [...]
>> Given 1G mappings, is that even a worthwhile effort? I'm getting even
>> more skeptical.
>
> With 1G mappings, I agree that it won't make much difference.
>
> I still think acpi table info should be available earlier, but I do not
> think I can convince you on this. This can be religious debate.
>
> Tang, what do you think? Are you OK to try Tejun's suggestion as well?
>

By saying TJ's suggestion, you mean, we will let memblock to control the
behaviour, that said, we will do early allocations near the kernel image
range before we get the SRAT info?

If so, yeah, we have been working on this direction. By doing this, we may
have two main changes:

1. change some of memblock's APIs to make it have the ability to allocate
   memory from low address.
2. setup kernel page table down-top. Concretely, we first map the memory
   just after the kernel image to the top, then, we map 0 - kernel image end.

Do you guys think this is reasonable and acceptable?

--
Thanks.
Zhang Yanfei
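The first of the two changes (allocating from low addresses, close to the kernel image) can be illustrated with a toy first-fit search. This is a user-space sketch only and does not use the real memblock API: struct toy_range and toy_find_bottom_up() are hypothetical names, the free ranges are assumed to be sorted by ascending base and non-overlapping, and the function only finds a candidate address without reserving it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-memory range: [base, base + size). */
struct toy_range {
    uint64_t base;
    uint64_t size;
};

/*
 * Bottom-up first fit: return the lowest align-aligned address at or
 * above floor (for example, the end of the kernel image) where size
 * bytes fit inside one free range. Returns 0 if nothing fits.
 * Assumes ranges are sorted by ascending base and do not overlap.
 */
uint64_t toy_find_bottom_up(const struct toy_range *free_r, size_t n,
                            uint64_t floor, uint64_t size, uint64_t align)
{
    /* align must be a non-zero power of two */
    assert(align != 0 && (align & (align - 1)) == 0);

    for (size_t i = 0; i < n; i++) {
        uint64_t start = free_r[i].base > floor ? free_r[i].base : floor;
        uint64_t end = free_r[i].base + free_r[i].size;

        start = (start + align - 1) & ~(align - 1);
        if (start < end && end - start >= size)
            return start;
    }
    return 0;
}
```

Passing the end of the kernel image as floor makes early allocations land just above the image, on whichever node the kernel itself occupies, which is the property this direction relies on before SRAT has been parsed.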
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> I still think acpi table info should be available earlier, but I do not
> think I can convince you on this. This can be religious debate.

I'm curious. If there aren't substantial enough benefits, why would
you still want to pull it earlier when it brings in things like initrd
override and crafting the code carefully so that it's safe to execute
it from different address modes and so on? Please note that x86 is
not ia64. The early environment is completely different not only
technically but also in its diversity and suckiness. It wasn't too
long ago that vendors were screwing up ACPI left and right. It has
been getting better but there's a reason why, for example, we still
consider e820 to be the authoritative information over ACPI.

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> > I am relatively new to Linux, so I am not a good person to elaborate
> > this. From my experience on other OS, huge pages helped for the kernel,
> > but did not necessarily help user applications. It depended on
> > applications, which were not niche cases. But Linux may be different,
> > so I asked since you seemed confident. I'd appreciate if you can point
> > us some data that endorses your statement.
>
> We are talking about the kernel linear mapping which is created during
> early boot, so if it's available and useable there's no reason not to
> use it. Exceptions would be earlier processors which didn't do 1G
> mappings or e820 maps with a lot of holes. For CPUs used in NUMA
> configurations, the former has been history for a bit now. Can't be
> sure about the latter but it'd be surprising for that to affect large
> amount of memory in the systems that are of interest here. Ooh, that
> reminds me that we probably wanna go back to 1G + MTRR mapping under
> 4G. We're currently creating a lot of mapping holes.

Thanks for the explanation.

> > My worry is that the code is unlikely tested with the special logic when
> > someone makes code changes to the page tables. Such code can easily be
> > broken in future.
>
> Well, I wouldn't consider flipping the direction of allocation to be
> particularly difficult to get right especially when compared to
> bringing in ACPI tables into the mix.
>
> > To answer your other question/email, I believe Tang's next step is to
> > support local page tables. This is why we think pursuing SRAT earlier is
> > the right direction.
>
> Given 1G mappings, is that even a worthwhile effort? I'm getting even
> more skeptical.

With 1G mappings, I agree that it won't make much difference.

I still think acpi table info should be available earlier, but I do not
think I can convince you on this. This can be religious debate.

Tang, what do you think? Are you OK to try Tejun's suggestion as well?

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 10:35:07AM -0400, Tejun Heo wrote:
> Yeah, it's true that MTRRs are nasty. On the other hand, we've been
> doing that for over a decade and are still doing it anyway if I'm not
> mistaken. It probably isn't a big difference but it's still a bit sad
> that this is likely causing small performance regression out in the
> wild.

Just went over the processor manual and it doesn't seem like doing the
above would be a good idea.

  System Programming Guide, Part 1

  11.11.9 Large Page Size Considerations
  ...
  Because the memory type for a large page is cached in the TLB, the
  processor can behave in an undefined manner if a large page is mapped
  to a region of memory that MTRRs have mapped with multiple memory
  types.
  ...
  If a large page maps to a region of memory containing different
  MTRR-defined memory types, the PCD and PWT flags in the page-table
  entry should be set for the most conservative memory type for that
  range. For example, a large page used for memory mapped I/O and
  regular memory

  11-48 Vol. 3A MEMORY CACHE CONTROL
  ...
  The Pentium 4, Intel Xeon, and P6 family processors provide special
  support for the physical memory range from 0 to 4 MBytes,
  ...
  Here, the processor maps the memory range as multiple 4-KByte pages
  within the TLB. This operation insures correct behavior at the cost
  of performance. To avoid this performance penalty, operating-system
  software should reserve the large page option for regions of memory
  at addresses greater than or equal to 4 MBytes.

So, yeah, the current behavior seems like the right thing to do.

Thanks.

--
tejun
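The constraint quoted from the manual can be expressed as a check: a large page is only a safe candidate if every byte it covers has the same memory type. The sketch below is a simplified user-space model, not the kernel's MTRR code. It assumes a hypothetical flat list of non-overlapping typed ranges with a default type for uncovered addresses, and it ignores the precedence rules that apply when real variable-range MTRRs overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified memory types for the toy model. */
enum toy_memtype { TOY_UC, TOY_WB };

/* Hypothetical typed range: [base, base + size), non-overlapping. */
struct toy_typed_range {
    uint64_t base;
    uint64_t size;
    enum toy_memtype type;
};

/*
 * Return true if the page [page_base, page_base + page_size) has a single
 * memory type. Addresses not covered by any range have type def.
 */
bool toy_large_page_uniform(const struct toy_typed_range *r, size_t n,
                            enum toy_memtype def,
                            uint64_t page_base, uint64_t page_size)
{
    uint64_t page_end = page_base + page_size;
    uint64_t covered = 0;       /* bytes of the page covered by ranges */
    enum toy_memtype type = def;
    bool seen = false;

    assert(page_size > 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t r_end = r[i].base + r[i].size;
        uint64_t lo = r[i].base > page_base ? r[i].base : page_base;
        uint64_t hi = r_end < page_end ? r_end : page_end;

        if (lo >= hi)
            continue;           /* range does not overlap the page */
        if (seen && r[i].type != type)
            return false;       /* two different explicit types */
        type = r[i].type;
        seen = true;
        covered += hi - lo;
    }
    /* Any uncovered part of the page has the default type. */
    if (seen && covered < page_size && type != def)
        return false;
    return true;
}
```

In a layout where 0 to 640 KiB is write-back, 640 KiB to 1 MiB is uncached and 1 MiB to 1 GiB is write-back, a 2 MiB page at address 0 fails the check while a 2 MiB page at 2 MiB passes, which matches the manual's advice to keep large pages away from the low, mixed-type region.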
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Fri, Aug 23, 2013 at 04:24:06PM +0200, H. Peter Anvin wrote:
> Well... relying on MTRRs is a big cost in complexity and failure modes.

Yeah, it's true that MTRRs are nasty. On the other hand, we've been
doing that for over a decade and are still doing it anyway if I'm not
mistaken. It probably isn't a big difference but it's still a bit sad
that this is likely causing small performance regression out in the
wild.

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Well... relying on MTRRs is a big cost in complexity and failure modes.

Tejun Heo wrote:
> Hello,
>
> On Fri, Aug 23, 2013 at 03:08:55PM +0200, H. Peter Anvin wrote:
> > What is the point of 1G+MTRR? If there are caching differences the TLB will fracture the pages anyway.
>
> Ah, right. Consuming less memory / cachelines would still be a small advantage tho unless creating split TLB from larger mapping is noticeably less efficient. If the extra logic to do that is small, which I think it'd be, it'd be a gain at almost no cost.
>
> Thanks.

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Fri, Aug 23, 2013 at 03:08:55PM +0200, H. Peter Anvin wrote:
> What is the point of 1G+MTRR? If there are caching differences the TLB will fracture the pages anyway.

Ah, right. Consuming less memory / cachelines would still be a small advantage tho unless creating split TLB from larger mapping is noticeably less efficient. If the extra logic to do that is small, which I think it'd be, it'd be a gain at almost no cost.

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
What is the point of 1G+MTRR? If there are caching differences the TLB will fracture the pages anyway.

Tejun Heo wrote:
> Hello, Toshi.
>
> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> > I am relatively new to Linux, so I am not a good person to elaborate this. From my experience on other OS, huge pages helped for the kernel, but did not necessarily help user applications. It depended on applications, which were not niche cases. But Linux may be different, so I asked since you seemed confident. I'd appreciate if you can point us some data that endorses your statement.
>
> We are talking about the kernel linear mapping which is created during early boot, so if it's available and useable there's no reason not to use it. Exceptions would be earlier processors which didn't do 1G mappings or e820 maps with a lot of holes. For CPUs used in NUMA configurations, the former has been history for a bit now. Can't be sure about the latter but it'd be surprising for that to affect large amount of memory in the systems that are of interest here. Ooh, that reminds me that we probably wanna go back to 1G + MTRR mapping under 4G. We're currently creating a lot of mapping holes.
>
> > My worry is that the code is unlikely tested with the special logic when someone makes code changes to the page tables. Such code can easily be broken in future.
>
> Well, I wouldn't consider flipping the direction of allocation to be particularly difficult to get right especially when compared to bringing in ACPI tables into the mix.
>
> > To answer your other question/email, I believe Tang's next step is to support local page tables. This is why we think pursing SRAT earlier is the right direction.
>
> Given 1G mappings, is that even a worthwhile effort? I'm getting even more more skeptical.
>
> Thanks.

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, Toshi.

On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> I am relatively new to Linux, so I am not a good person to elaborate this. From my experience on other OS, huge pages helped for the kernel, but did not necessarily help user applications. It depended on applications, which were not niche cases. But Linux may be different, so I asked since you seemed confident. I'd appreciate if you can point us some data that endorses your statement.

We are talking about the kernel linear mapping which is created during early boot, so if it's available and useable there's no reason not to use it. Exceptions would be earlier processors which didn't do 1G mappings or e820 maps with a lot of holes. For CPUs used in NUMA configurations, the former has been history for a bit now. Can't be sure about the latter but it'd be surprising for that to affect large amount of memory in the systems that are of interest here. Ooh, that reminds me that we probably wanna go back to 1G + MTRR mapping under 4G. We're currently creating a lot of mapping holes.

> My worry is that the code is unlikely tested with the special logic when someone makes code changes to the page tables. Such code can easily be broken in future.

Well, I wouldn't consider flipping the direction of allocation to be particularly difficult to get right especially when compared to bringing in ACPI tables into the mix.

> To answer your other question/email, I believe Tang's next step is to support local page tables. This is why we think pursing SRAT earlier is the right direction.

Given 1G mappings, is that even a worthwhile effort? I'm getting even more more skeptical.

Thanks.

--
tejun
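To put rough numbers on why 1G mappings make the placement of page tables matter so little, here is a back-of-the-envelope sketch. It assumes x86-64 4-level paging with 4 KiB table pages holding 512 entries each, and it uses 1 TiB of RAM purely as an illustrative size that does not come from the thread. The `table_pages` helper is hypothetical.

```python
PAGE = 4096       # size of one page-table page in bytes
ENTRIES = 512     # entries per page-table page (x86-64, 4-level paging)

# Number of table levels that must be allocated for each leaf page size.
LEVELS = {4096: 4, 2 << 20: 3, 1 << 30: 2}

def table_pages(mem_bytes: int, leaf_size: int) -> int:
    """Count page-table pages needed to map mem_bytes linearly with one leaf size."""
    entries = -(-mem_bytes // leaf_size)     # leaf entries, rounded up
    total = 0
    for _ in range(LEVELS[leaf_size]):
        tables = -(-entries // ENTRIES)      # tables needed to hold those entries
        total += tables
        entries = tables                     # each table needs one entry one level up
    return total

ONE_TIB = 1 << 40  # assumed memory size, for illustration only
for leaf, name in ((4096, "4K"), (2 << 20, "2M"), (1 << 30, "1G")):
    pages = table_pages(ONE_TIB, leaf)
    print(f"{name}: {pages} table pages, {pages * PAGE / (1 << 20):.3f} MiB")
```

Under these assumptions a 1 TiB linear mapping needs only 3 table pages with 1G leaves, against 1027 with 2M leaves and over half a million with 4K leaves, which is the scale behind the question of whether node-local page tables are worth the effort.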
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> > I am relatively new to Linux, so I am not a good person to elaborate this. From my experience on other OS, huge pages helped for the kernel, but did not necessarily help user applications. It depended on applications, which were not niche cases. But Linux may be different, so I asked since you seemed confident. I'd appreciate if you can point us some data that endorses your statement.
>
> We are talking about the kernel linear mapping which is created during early boot, so if it's available and useable there's no reason not to use it. Exceptions would be earlier processors which didn't do 1G mappings or e820 maps with a lot of holes. For CPUs used in NUMA configurations, the former has been history for a bit now. Can't be sure about the latter but it'd be surprising for that to affect large amount of memory in the systems that are of interest here. Ooh, that reminds me that we probably wanna go back to 1G + MTRR mapping under 4G. We're currently creating a lot of mapping holes.

Thanks for the explanation.

> > My worry is that the code is unlikely tested with the special logic when someone makes code changes to the page tables. Such code can easily be broken in future.
>
> Well, I wouldn't consider flipping the direction of allocation to be particularly difficult to get right especially when compared to bringing in ACPI tables into the mix.
>
> > To answer your other question/email, I believe Tang's next step is to support local page tables. This is why we think pursing SRAT earlier is the right direction.
>
> Given 1G mappings, is that even a worthwhile effort? I'm getting even more more skeptical.

With 1G mappings, I agree that it won't make much difference. I still think acpi table info should be available earlier, but I do not think I can convince you on this. This can be religious debate.

Tang, what do you think? Are you OK to try Tejun's suggestion as well?

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> I still think acpi table info should be available earlier, but I do not think I can convince you on this. This can be religious debate.

I'm curious. If there aren't substantial enough benefits, why would you still want to pull it earlier when it brings in things like initrd override and crafting the code carefully so that it's safe to execute it from different address modes and so on?

Please note that x86 is not ia64. The early environment is completely different not only technically but also in its diversity and suckiness. It wasn't too long ago that vendors were screwing up ACPI left and right. It has been getting better but there's a reason why, for example, we still consider e820 to be the authoritative information over ACPI.

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello

On 08/24/2013 12:14 AM, Toshi Kani wrote:
> Hello,
>
> On Fri, 2013-08-23 at 09:04 -0400, Tejun Heo wrote:
> > On Thu, Aug 22, 2013 at 04:17:41PM -0600, Toshi Kani wrote:
> > > I am relatively new to Linux, so I am not a good person to elaborate this. From my experience on other OS, huge pages helped for the kernel, but did not necessarily help user applications. It depended on applications, which were not niche cases. But Linux may be different, so I asked since you seemed confident. I'd appreciate if you can point us some data that endorses your statement.
> >
> > We are talking about the kernel linear mapping which is created during early boot, so if it's available and useable there's no reason not to use it. Exceptions would be earlier processors which didn't do 1G mappings or e820 maps with a lot of holes. For CPUs used in NUMA configurations, the former has been history for a bit now. Can't be sure about the latter but it'd be surprising for that to affect large amount of memory in the systems that are of interest here. Ooh, that reminds me that we probably wanna go back to 1G + MTRR mapping under 4G. We're currently creating a lot of mapping holes.
>
> Thanks for the explanation.
>
> > > My worry is that the code is unlikely tested with the special logic when someone makes code changes to the page tables. Such code can easily be broken in future.
> >
> > Well, I wouldn't consider flipping the direction of allocation to be particularly difficult to get right especially when compared to bringing in ACPI tables into the mix.
> >
> > > To answer your other question/email, I believe Tang's next step is to support local page tables. This is why we think pursing SRAT earlier is the right direction.
> >
> > Given 1G mappings, is that even a worthwhile effort? I'm getting even more more skeptical.
>
> With 1G mappings, I agree that it won't make much difference. I still think acpi table info should be available earlier, but I do not think I can convince you on this. This can be religious debate.
>
> Tang, what do you think? Are you OK to try Tejun's suggestion as well?

By saying TJ's suggestion, you mean, we will let memblock to control the behaviour, that said, we will do early allocations near the kernel image range before we get the SRAT info?

If so, yeah, we have been working on this direction. By doing this, we may have two main changes:

1. change some of memblock's APIs to make it have the ability to allocate memory from low address.
2. setup kernel page table down-top. Concretely, we first map the memory just after the kernel image to the top, then, we map 0 - kernel image end.

Do you guys think this is reasonable and acceptable?

--
Thanks.
Zhang Yanfei
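The mapping order in item 2 of the plan above can be sketched as a toy model. This is not the memblock or page-table API; `mapping_order`, its parameters, and the chunk size are hypothetical names chosen only to show the sequence: map upward from the end of the kernel image to the top of memory first, then map the range from 0 to the end of the kernel image last.

```python
from typing import List, Tuple

def mapping_order(kernel_end: int, mem_top: int, step: int) -> List[Tuple[int, int]]:
    """Return (start, end) ranges in the order the bottom-up plan would map them."""
    ranges: List[Tuple[int, int]] = []
    addr = kernel_end
    # Phase 1: walk upward from just after the kernel image to the top of memory.
    while addr < mem_top:
        nxt = min(addr + step, mem_top)
        ranges.append((addr, nxt))
        addr = nxt
    # Phase 2: finally map everything below the end of the kernel image.
    ranges.append((0, kernel_end))
    return ranges

# Illustrative units only: kernel image ends at 16, memory top at 64, chunks of 16.
print(mapping_order(16, 64, 16))
```

The point of this ordering, as described in the message, is that early page-table allocations can come from memory near the kernel image, before any SRAT information is available.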
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
> On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> > I still think acpi table info should be available earlier, but I do not think I can convince you on this. This can be religious debate.
>
> I'm curious. If there aren't substantial enough benefits, why would you still want to pull it earlier when it brings in things like initrd override and crafting the code carefully so that it's safe to execute it from different address modes and so on?
>
> Please note that x86 is not ia64. The early environment is completely different not only technically but also in its diversity and suckiness. It wasn't too long ago that vendors were screwing up ACPI left and right. It has been getting better but there's a reason why, for example, we still consider e820 to be the authoritative information over ACPI.

Firmware generates tables, and provides them via some interface. Memory map table can be provided via e820 or EFI memory map. Memory topology table is provided via ACPI. I agree to prioritize one table over the other when there is overlap. But in the end, it is the firmware that generates the tables. Because it is provided via ACPI does not make it suddenly unreliable.

I think table info from e820/EFI/ACPI should be available at the same time. To me, it makes more sense to use the hotplug info to initialize memblock than try to find a way to workaround without it.

I think we will continue to be in that way to find a workaround in this direction.

I came from ia64 background, and am not very familiar with x86. So, you may be very right about that x86 is different. I also agree that initrd is making it unnecessarily complicated. We may see some initial issues, but my hope is that the code gets matured over the time.

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hi Toshi,

On 08/24/2013 01:13 AM, Toshi Kani wrote:
> Hello,
>
> On Fri, 2013-08-23 at 12:24 -0400, Tejun Heo wrote:
> > On Fri, Aug 23, 2013 at 10:14:08AM -0600, Toshi Kani wrote:
> > > I still think acpi table info should be available earlier, but I do not think I can convince you on this. This can be religious debate.
> >
> > I'm curious. If there aren't substantial enough benefits, why would you still want to pull it earlier when it brings in things like initrd override and crafting the code carefully so that it's safe to execute it from different address modes and so on?
> >
> > Please note that x86 is not ia64. The early environment is completely different not only technically but also in its diversity and suckiness. It wasn't too long ago that vendors were screwing up ACPI left and right. It has been getting better but there's a reason why, for example, we still consider e820 to be the authoritative information over ACPI.
>
> Firmware generates tables, and provides them via some interface. Memory map table can be provided via e820 or EFI memory map. Memory topology table is provided via ACPI. I agree to prioritize one table over the other when there is overlap. But in the end, it is the firmware that generates the tables. Because it is provided via ACPI does not make it suddenly unreliable.
>
> I think table info from e820/EFI/ACPI should be available at the same time. To me, it makes more sense to use the hotplug info to initialize memblock than try to find a way to workaround without it.

Yeah, agreed. But sigh on x86, we have ACPI initrd override, so we still cannot convince Tj

> I think we will continue to be in that way to find a workaround in this direction.
>
> I came from ia64 background, and am not very familiar with x86. So, you may be very right about that x86 is different. I also agree that initrd is making it unnecessarily complicated. We may see some initial issues, but my hope is that the code gets matured over the time.
>
> Thanks,
> -Toshi

--
Thanks.
Zhang Yanfei
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Zhang,

On Sat, 2013-08-24 at 00:54 +0800, Zhang Yanfei wrote:
> > Tang, what do you think? Are you OK to try Tejun's suggestion as well?
>
> By saying TJ's suggestion, you mean, we will let memblock to control the behaviour, that said, we will do early allocations near the kernel image range before we get the SRAT info?

Right.

> If so, yeah, we have been working on this direction.

Great!

> By doing this, we may have two main changes:
>
> 1. change some of memblock's APIs to make it have the ability to allocate memory from low address.
> 2. setup kernel page table down-top. Concretely, we first map the memory just after the kernel image to the top, then, we map 0 - kernel image end.
>
> Do you guys think this is reasonable and acceptable?

Have you also looked at Yinghai's comments below?

http://www.spinics.net/lists/linux-mm/msg61362.html

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
> BRK makes sense as long as you can set a sane O(1) size limit.
>
> > put the acpi override table in BRK, we still need ok from HPA.
> > I have impression that he did not like it, so want to confirm from him.

on 8 sockets system:

-rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
-rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
-rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
-rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
-rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
-rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
-rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
-rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
-rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
-rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
-rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
-rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
-rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
-rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
-rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
-rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
-rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
-rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
-rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
-rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
-rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
-rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
-rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat

assume for 32sockets will have four times bigger with DSDT and SSDT. (with more pci and cpus)

So we can not have O(1) the size.

Russ, What is ACPI table size on your big machine?

Thanks

Yinghai
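The estimate in the message above can be worked through numerically. The sketch below uses the file sizes from the 8-socket listing and applies the stated assumption that DSDT and SSDT grow four times for 32 sockets; keeping every other table at its 8-socket size is an extra simplification made here, not something the message claims.

```python
# Table sizes in bytes, copied from the 8-socket listing above.
SIZES = {
    "APIC": 3532, "BDAT": 48, "DMAR": 824, "DSDT": 83509, "FACP": 244,
    "FACS": 64, "FPDT": 68, "HPET": 56, "MCEJ": 304, "MCFG": 60,
    "MPST": 6712, "MSCT": 232, "PCCT": 172, "PMCT": 96, "RASF": 48,
    "SLIT": 108, "SPCR": 80, "SPMI": 65, "SRAT": 6448, "SSDT1": 100,
    "SSDT2": 283527, "UEFI": 66, "WDDT": 64,
}

SCALING = ("DSDT", "SSDT1", "SSDT2")   # tables assumed to grow with socket count
FACTOR = 4                             # 8 sockets -> 32 sockets, per the message

total_8 = sum(SIZES.values())
scaled = sum(SIZES[name] for name in SCALING)
fixed = total_8 - scaled               # simplification: other tables stay the same
projected_32 = fixed + FACTOR * scaled

print(f"8 sockets:  {total_8} bytes")
print(f"32 sockets: {projected_32} bytes (about {projected_32 / (1 << 20):.2f} MiB)")
```

With these inputs the 8-socket tables total 386427 bytes and the projection for 32 sockets is 1487835 bytes, so the required space tracks machine size rather than staying under a fixed bound.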
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
> > BRK makes sense as long as you can set a sane O(1) size limit.
> >
> > > put the acpi override table in BRK, we still need ok from HPA.
> > > I have impression that he did not like it, so want to confirm from him.
>
> on 8 sockets system:
>
> -rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
> -rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
> -rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
> -rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
> -rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
> -rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
> -rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
> -rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
> -rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
> -rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
> -rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
> -rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
> -rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
> -rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
> -rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
> -rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
> -rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
> -rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
> -rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat
>
> assume for 32sockets will have four times bigger with DSDT and SSDT. (with more pci and cpus)
>
> So we can not have O(1) the size.
>
> Russ, What is ACPI table size on your big machine?

This is from a 256 socket 32TB system.

Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
ACPI: XSDT 7ef3d120 0007C (v01 INTEL TIANO 0113)
ACPI: FACP 7ef3a000 000F4 (v04 INTEL TIANO MSFT 0113)
ACPI: DSDT 7e6c3000 7F493E (v02 SGI2 UVX 0002 MSFT 0113)
ACPI: FACS 7d147000 00040
ACPI: UEFI 7ef3c000 0012A (v01 INTEL RstScuO )
ACPI: UEFI 7ef3b000 0005C (v01 INTEL RstScuV )
ACPI: HPET 7ef39000 00038 (v01 INTEL TIANO0001 MSFT 0113)
ACPI: SSDT 7ef33000 05352 (v02 INTEL ROSECITY 0003 INTL 20070508)
ACPI: SLIT 7ef1 1002C (v01 SGI2 UVX 0002 MSFT 0001)
ACPI: APIC 7000 10070 (v03 SGI2 UVX 0002 MSFT 0001)
ACPI: SRAT 7eeb8000 1A830 (v03 SGI2 UVX 0002 MSFT 0001)
ACPI: MCFG 7d6d4000 0105C (v01 SGI2 UVX 0002 MSFT 0001)
ACPI: SPCR 7e6c2000 00050 (v01 )
ACPI: DMAR 7d6d3000 0013C (v01 INTEL TIANO0001 MSFT 0113)

--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc r...@sgi.com
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson r...@sgi.com wrote:
> On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> > On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
> > Russ, What is ACPI table size on your big machine?
>
> This is from a 256 socket 32TB system.
>
> Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
> ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
> ACPI: XSDT 7ef3d120 0007C (v01 INTEL TIANO 0113)
> ACPI: FACP 7ef3a000 000F4 (v04 INTEL TIANO MSFT 0113)
> ACPI: DSDT 7e6c3000 7F493E (v02 SGI2 UVX 0002 MSFT 0113)
> ACPI: FACS 7d147000 00040
> ACPI: UEFI 7ef3c000 0012A (v01 INTEL RstScuO )
> ACPI: UEFI 7ef3b000 0005C (v01 INTEL RstScuV )
> ACPI: HPET 7ef39000 00038 (v01 INTEL TIANO0001 MSFT 0113)
> ACPI: SSDT 7ef33000 05352 (v02 INTEL ROSECITY 0003 INTL 20070508)
> ACPI: SLIT 7ef1 1002C (v01 SGI2 UVX 0002 MSFT 0001)
> ACPI: APIC 7000 10070 (v03 SGI2 UVX 0002 MSFT 0001)
> ACPI: SRAT 7eeb8000 1A830 (v03 SGI2 UVX 0002 MSFT 0001)
> ACPI: MCFG 7d6d4000 0105C (v01 SGI2 UVX 0002 MSFT 0001)
> ACPI: SPCR 7e6c2000 00050 (v01 )
> ACPI: DMAR 7d6d3000 0013C (v01 INTEL TIANO0001 MSFT 0113)

so the DSDT is 7F493E, and total is more than 8M.

that will need BRK to be extended 16M?

Thanks

Yinghai
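The "more than 8M" figure can be checked by summing the length column of the quoted ACPI log. The sketch below copies the hex lengths from that log as they appear in this archive. Rounding the total up to the next power of two is an assumption made here to show where a 16M figure could come from; the message does not say how that number was chosen.

```python
# Length fields (hex) copied from the quoted ACPI log, in log order.
LENGTHS_HEX = {
    "RSDP": "00024", "XSDT": "0007C", "FACP": "000F4", "DSDT": "7F493E",
    "FACS": "00040", "UEFI#1": "0012A", "UEFI#2": "0005C", "HPET": "00038",
    "SSDT": "05352", "SLIT": "1002C", "APIC": "10070", "SRAT": "1A830",
    "MCFG": "0105C", "SPCR": "00050", "DMAR": "0013C",
}

MIB = 1 << 20

def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

dsdt = int(LENGTHS_HEX["DSDT"], 16)
total = sum(int(h, 16) for h in LENGTHS_HEX.values())

print(f"DSDT alone: {dsdt} bytes ({dsdt / MIB:.2f} MiB)")
print(f"all tables: {total} bytes ({total / MIB:.2f} MiB)")
print(f"next power of two: {next_pow2(total) // MIB} MiB")
```

The DSDT by itself is 8341822 bytes, just under 8 MiB, and the other tables push the sum to 8608470 bytes, which no longer fits in an 8 MiB reservation.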
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hi Yinghai,

2013/8/24 Yinghai Lu ying...@kernel.org:
> On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson r...@sgi.com wrote:
> > On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> > > On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
> > > Russ, What is ACPI table size on your big machine?
> >
> > This is from a 256 socket 32TB system.
> >
> > Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
> > ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
> > ACPI: XSDT 7ef3d120 0007C (v01 INTEL TIANO 0113)
> > ACPI: FACP 7ef3a000 000F4 (v04 INTEL TIANO MSFT 0113)
> > ACPI: DSDT 7e6c3000 7F493E (v02 SGI2 UVX 0002 MSFT 0113)
> > ACPI: FACS 7d147000 00040
> > ACPI: UEFI 7ef3c000 0012A (v01 INTEL RstScuO )
> > ACPI: UEFI 7ef3b000 0005C (v01 INTEL RstScuV )
> > ACPI: HPET 7ef39000 00038 (v01 INTEL TIANO0001 MSFT 0113)
> > ACPI: SSDT 7ef33000 05352 (v02 INTEL ROSECITY 0003 INTL 20070508)
> > ACPI: SLIT 7ef1 1002C (v01 SGI2 UVX 0002 MSFT 0001)
> > ACPI: APIC 7000 10070 (v03 SGI2 UVX 0002 MSFT 0001)
> > ACPI: SRAT 7eeb8000 1A830 (v03 SGI2 UVX 0002 MSFT 0001)
> > ACPI: MCFG 7d6d4000 0105C (v01 SGI2 UVX 0002 MSFT 0001)
> > ACPI: SPCR 7e6c2000 00050 (v01 )
> > ACPI: DMAR 7d6d3000 0013C (v01 INTEL TIANO0001 MSFT 0113)
>
> so the DSDT is 7F493E, and total is more than 8M.
>
> that will need BRK to be extended 16M?

Then how about use early_ioremap(), and don't do it that early in head_32 and head64?

Thanks.
RE: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
While we're at it: Can someone send me the acpidump for this machine? We very much would like to test all of ACPICA with such a large DSDT.

Thanks,
Bob

-----Original Message-----
From: chen tang [mailto:imtangc...@gmail.com]
Sent: Friday, August 23, 2013 2:51 PM
To: Yinghai Lu
Cc: Russ Anderson; H. Peter Anvin; Zhang Yanfei; Toshi Kani; Tejun Heo; Tang Chen; Konrad Rzeszutek Wilk; Moore, Robert; Zheng, Lv; Rafael J. Wysocki; Ingo Molnar; Andrew Morton; Thomas Renninger; Yasuaki Ishimatsu; Mel Gorman; Linux Kernel Mailing List
Subject: Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.

Hi Yinghai,

2013/8/24 Yinghai Lu ying...@kernel.org:
> On Fri, Aug 23, 2013 at 1:30 PM, Russ Anderson r...@sgi.com wrote:
> > On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> > > On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
> > > Russ, What is ACPI table size on your big machine?
> >
> > This is from a 256 socket 32TB system.
> >
> > Reserving 256MB of memory at 66973408MB for crashkernel (System RAM: 32501719MB)
> > ACPI: RSDP 7ef3d014 00024 (v02 INTEL )
> > ACPI: XSDT 7ef3d120 0007C (v01 INTEL TIANO 0113)
> > ACPI: FACP 7ef3a000 000F4 (v04 INTEL TIANO MSFT 0113)
> > ACPI: DSDT 7e6c3000 7F493E (v02 SGI2 UVX 0002 MSFT 0113)
> > ACPI: FACS 7d147000 00040
> > ACPI: UEFI 7ef3c000 0012A (v01 INTEL RstScuO )
> > ACPI: UEFI 7ef3b000 0005C (v01 INTEL RstScuV )
> > ACPI: HPET 7ef39000 00038 (v01 INTEL TIANO0001 MSFT 0113)
> > ACPI: SSDT 7ef33000 05352 (v02 INTEL ROSECITY 0003 INTL 20070508)
> > ACPI: SLIT 7ef1 1002C (v01 SGI2 UVX 0002 MSFT 0001)
> > ACPI: APIC 7000 10070 (v03 SGI2 UVX 0002 MSFT 0001)
> > ACPI: SRAT 7eeb8000 1A830 (v03 SGI2 UVX 0002 MSFT 0001)
> > ACPI: MCFG 7d6d4000 0105C (v01 SGI2 UVX 0002 MSFT 0001)
> > ACPI: SPCR 7e6c2000 00050 (v01 )
> > ACPI: DMAR 7d6d3000 0013C (v01 INTEL TIANO0001 MSFT 0113)
>
> so the DSDT is 7F493E, and total is more than 8M.
>
> that will need BRK to be extended 16M?

Then how about use early_ioremap(), and don't do it that early in head_32 and head64?

Thanks.
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 2:52 PM, Moore, Robert robert.mo...@intel.com wrote:
> While we're at it: Can someone send me the acpidump for this machine? We very much would like to test all of ACPICA with such a large DSDT.

That is Russ.
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 2:50 PM, chen tang imtangc...@gmail.com wrote:
> > so the DSDT is 7F493E, and total is more than 8M.
> >
> > that will need BRK to be extended 16M?
>
> Then how about use early_ioremap(), and don't do it that early in head_32 and head64?

why could early_ioremap() help?

when to use early_ioremap()? what for?

Yinghai
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hi Yinghai,

2013/8/24 Yinghai Lu ying...@kernel.org:
> On Fri, Aug 23, 2013 at 2:50 PM, chen tang imtangc...@gmail.com wrote:
> > > so the DSDT is 7F493E, and total is more than 8M.
> > >
> > > that will need BRK to be extended 16M?
> >
> > Then how about use early_ioremap(), and don't do it that early in head_32 and head64?
>
> why could early_ioremap() help?
>
> when to use early_ioremap()? what for?

In my understanding, acpica framework needs users to copy the override tables somewhere in the memory. And acpica will get these user specified tables when installing firmware tables.

This is the acpica logic, which cannot be changed, I think. So we need to allocate memory. That is why you suggested to use BRK, right? And the size seems to be a problem.

So I suggest to use early_ioremap().

1. After paging is enabled, before direct mapping page tables are setup, we map the initrd with early_ioremap(). And we are able to access it with va, even on 32bit. Then we can find all tables.
2. We still use memblock to allocate memory. Maybe it will be hotpluggable memory, but this memory can be freed when all the acpi tables are parsed, right?

So I want to try early_ioremap(). All these should be done in setup_arch().

Thanks.
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
> So we need to allocate memory. That is why you suggested to use BRK, right? And the size seems to be a problem.
>
> So I suggest to use early_ioremap().
>
> 1. After paging is enabled, before direct mapping page tables are setup, we map the initrd with early_ioremap(). And we are able to access it with va, even on 32bit. Then we can find all tables.
> 2. We still use memblock to allocate memory. Maybe it will be hotpluggable memory, but this memory can be freed when all the acpi tables are parsed, right?
>
> So I want to try early_ioremap(). All these should be done in setup_arch().

no. cpio search need to take whole range virtual address, and early_ioremap has size limitation.

you will have to update cpio search to take mapping function. could be too messy.

Yinghai
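To make the objection above concrete, here is a small userspace simulation in Python of what "cpio search taking a mapping function" might look like. It is only a sketch under stated assumptions, not kernel code: `WINDOW_LIMIT` is a hypothetical stand-in for the size limit of a single early_ioremap() mapping, every helper name (`make_newc`, `windowed_reader`, `find_entry`, `copy_in_chunks`) is invented for the example, the archive path is illustrative, and the archive is assumed to be in cpio "newc" format. The DSDT payload size is the 7F493E figure from the thread.

```python
WINDOW_LIMIT = 256 * 1024  # hypothetical cap on one temporary mapping, in bytes
HDR_LEN = 110              # "070701" magic plus 13 fields of 8 hex digits


def make_newc(entries):
    """Build a minimal cpio 'newc' archive from (name, data) pairs."""
    out = bytearray()
    for name, data in list(entries) + [("TRAILER!!!", b"")]:
        name_b = name.encode() + b"\0"
        fields = [0, 0o100644, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_b), 0]
        out += b"070701" + b"".join(b"%08X" % f for f in fields)
        out += name_b
        out += b"\0" * (-len(out) % 4)  # pad header + name to 4 bytes
        out += data
        out += b"\0" * (-len(out) % 4)  # pad file data to 4 bytes
    return bytes(out)


def windowed_reader(blob, limit=WINDOW_LIMIT):
    """Return read(offset, length) that refuses windows larger than limit."""
    def read(offset, length):
        if length > limit:
            raise ValueError("request exceeds one mapping window")
        return blob[offset:offset + length]
    return read


def find_entry(read, total, wanted):
    """Walk newc headers through read(); return (data_offset, size) or None."""
    off = 0
    while off + HDR_LEN <= total:
        hdr = read(off, HDR_LEN)
        if hdr[:6] != b"070701":
            return None
        filesize = int(hdr[54:62], 16)
        namesize = int(hdr[94:102], 16)
        name = read(off + HDR_LEN, namesize)[:-1].decode()
        data_off = (off + HDR_LEN + namesize + 3) & ~3
        if name == "TRAILER!!!":
            return None
        if name == wanted:
            return data_off, filesize
        off = (data_off + filesize + 3) & ~3
    return None


def copy_in_chunks(read, offset, size, limit=WINDOW_LIMIT):
    """Copy a file out of the archive one window at a time."""
    out = bytearray()
    for pos in range(0, size, limit):
        out += read(offset + pos, min(limit, size - pos))
    return bytes(out)


DSDT_NAME = "kernel/firmware/acpi/DSDT.aml"  # illustrative path
archive = make_newc([("README", b"hello"), (DSDT_NAME, bytes(0x7F493E))])
read = windowed_reader(archive)
data_off, size = find_entry(read, len(archive), DSDT_NAME)
table = copy_in_chunks(read, data_off, size)
print(hex(size), len(table) == size)
```

In this model the header walk fits comfortably inside small windows, but the table body does not: asking `read` for the whole DSDT in one call raises `ValueError`, so both the search and the copy have to be written around the mapping callback. That restructuring is the extra plumbing the reply calls potentially "too messy".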
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Fri, Aug 23, 2013 at 01:08:56PM -0700, Yinghai Lu wrote:
> On Fri, Aug 23, 2013 at 11:25 AM, H. Peter Anvin h...@zytor.com wrote:
> >
> > BRK makes sense as long as you can set a sane O(1) size limit.
> >
> >>
> >>put the acpi override table in BRK, we still need ok from HPA.
> >>I have impression that he did not like it, so want to confirm from him.
>
> on 8 sockets system:
> -rw-r--r-- 1 root root   3532 Aug 22 10:26 APIC.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 BDAT.dat
> -rw-r--r-- 1 root root    824 Aug 22 10:26 DMAR.dat
> -rw-r--r-- 1 root root  83509 Aug 22 10:26 DSDT.dat
> -rw-r--r-- 1 root root    244 Aug 22 10:26 FACP.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 FACS.dat
> -rw-r--r-- 1 root root     68 Aug 22 10:26 FPDT.dat
> -rw-r--r-- 1 root root     56 Aug 22 10:26 HPET.dat
> -rw-r--r-- 1 root root    304 Aug 22 10:26 MCEJ.dat
> -rw-r--r-- 1 root root     60 Aug 22 10:26 MCFG.dat
> -rw-r--r-- 1 root root   6712 Aug 22 10:26 MPST.dat
> -rw-r--r-- 1 root root    232 Aug 22 10:26 MSCT.dat
> -rw-r--r-- 1 root root    172 Aug 22 10:26 PCCT.dat
> -rw-r--r-- 1 root root     96 Aug 22 10:26 PMCT.dat
> -rw-r--r-- 1 root root     48 Aug 22 10:26 RASF.dat
> -rw-r--r-- 1 root root    108 Aug 22 10:26 SLIT.dat
> -rw-r--r-- 1 root root     80 Aug 22 10:26 SPCR.dat
> -rw-r--r-- 1 root root     65 Aug 22 10:26 SPMI.dat
> -rw-r--r-- 1 root root   6448 Aug 22 10:26 SRAT.dat
> -rw-r--r-- 1 root root    100 Aug 22 10:26 SSDT1.dat
> -rw-r--r-- 1 root root 283527 Aug 22 10:26 SSDT2.dat
> -rw-r--r-- 1 root root     66 Aug 22 10:26 UEFI.dat
> -rw-r--r-- 1 root root     64 Aug 22 10:26 WDDT.dat
>
> assume for 32sockets will have four times bigger with DSDT and SSDT.
> (with more pci and cpus)
>
> So we can not have O(1) the size.
>
> Russ, What is ACPI table size on your big machine?

This is from a 255 socket, 4080 cpu, 15TB system.

---
-rw-r--r-- 1 root root   65392 Aug 23 21:23 apic.dat
-rw-r--r-- 1 root root     316 Aug 23 21:23 dmar.dat
-rw-r--r-- 1 root root 8309249 Aug 23 21:23 dsdt.dat
-rw-r--r-- 1 root root     244 Aug 23 21:23 facp.dat
-rw-r--r-- 1 root root      64 Aug 23 21:23 facs.dat
-rw-r--r-- 1 root root      56 Aug 23 21:23 hpet.dat
-rw-r--r-- 1 root root    4172 Aug 23 21:23 mcfg.dat
-rw-r--r-- 1 root root      36 Aug 23 21:23 rsdp.dat
-rw-r--r-- 1 root root      80 Aug 23 21:23 rsdt.dat
-rw-r--r-- 1 root root   65069 Aug 23 21:23 slit.dat
-rw-r--r-- 1 root root      80 Aug 23 21:23 spcr.dat
-rw-r--r-- 1 root root  108168 Aug 23 21:23 srat.dat
-rw-r--r-- 1 root root   21330 Aug 23 21:23 ssdt.dat
-rw-r--r-- 1 root root      92 Aug 23 21:23 uefi1.dat
-rw-r--r-- 1 root root     298 Aug 23 21:23 uefi.dat
-rw-r--r-- 1 root root     124 Aug 23 21:23 xsdt.dat
---

--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc r...@sgi.com
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Tejun,

On Thu, 2013-08-22 at 17:21 -0400, Tejun Heo wrote:
 :
> > Local page table and memory hotplug are two separate things. That is,
> > local page tables can be supported on all NUMA platforms without hotplug
> > support. Are you sure huge mapping will solve everything for all types
> > of applications, and therefore local page tables won't be needed at all?
>
> When you throw around terms like "all" and "at all", you can't reach
> rational discussion about engineering trade-offs. I was asking you
> whether it was reasonable to do per-node page table when most machines
> support huge page mappings which makes the whole thing rather
> pointless. Of course there will be some niche cases where this might
> not be optimal but do you think that would be enough to justify the
> added complexity and churn? If you think so, can you please
> elaborate?

I am relatively new to Linux, so I am not a good person to elaborate on this. From my experience on other OSes, huge pages helped for the kernel, but did not necessarily help user applications. It depended on the applications, which were not niche cases. But Linux may be different, so I asked since you seemed confident. I'd appreciate it if you can point us to some data that endorses your statement.

> > When someone changes the page table init code, who will test it with the
> > special allocation code?
>
> What are you worrying about? Are you saying that allocating page
> table towards top or bottom of memory would be more disruptive and
> difficult to debug than pulling in ACPI init and SRAT information into
> the process? Am I missing something here?

My worry is that the code is unlikely to be tested with the special logic when someone makes code changes to the page tables. Such code can easily be broken in the future.

To answer your other question/email, I believe Tang's next step is to support local page tables. This is why we think parsing SRAT earlier is the right direction.

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, Toshi. On Thu, Aug 22, 2013 at 03:06:38PM -0600, Toshi Kani wrote: > Since some node(s) won't be ejectable, this solution is reasonable as > the first step. I do not think it is a distraction. I view your But does this contribute to reaching the next step? If so, how? I can't see how and that's why I said this was a distraction. > suggestion as a distraction of supporting local page tables, though. Hmmm... > Local page table and memory hotplug are two separate things. That is, > local page tables can be supported on all NUMA platforms without hotplug > support. Are you sure huge mapping will solve everything for all types > of applications, and therefore local page tables won't be needed at all? When you throw around terms like "all" and "at all", you can't reach rational discussion about engineering trade-offs. I was asking you whether it was reasonable to do per-node page table when most machines support huge page mappings which makes the whole thing rather pointless. Of course there will be some niche cases where this might not be optimal but do you think that would be enough to justify the added complexity and churn? If you think so, can you please elaborate? > When someone changes the page table init code, who will test it with the > special allocation code? What are you worrying about? Are you saying that allocating page table towards top or bottom of memory would be more disruptive and difficult to debug than pulling in ACPI init and SRAT information into the process? Am I missing something here? Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Thu, 2013-08-22 at 16:21 -0400, Tejun Heo wrote: > On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote: > > It's too late for the kernel image itself, but it prevents allocating > > kernel memory from movable ranges after that. I'd say it solves a half > > of the issue this time. > > That works if such half solution eventually leads to the full > solution. This is just a distraction. You are already too late in > the boot sequence. It doesn't even qualify as a half solution. It's > like obsessing about a speck on your shirt without your trousers on. > If you want to solve this, do that from a place where it actually is > solvable. Since some node(s) won't be ejectable, this solution is reasonable as the first step. I do not think it is a distraction. I view your suggestion as a distraction of supporting local page tables, though. > > > > Also, how do you support local page tables without pursing SRAT early? > > > > > > Does it even matter with huge mappings? It's gonna be contained in a > > > single page anyway, right? > > > > Are the huge mappings always used? We cannot force user programs to use > > huge pages, can we? > > Everything is a trade-off. Should we do all this just to support the > off chance someone tries to use memory hotplug on a machine which > doesn't support huge mapping when virtually all CPUs on market > supports it? Local page table and memory hotplug are two separate things. That is, local page tables can be supported on all NUMA platforms without hotplug support. Are you sure huge mapping will solve everything for all types of applications, and therefore local page tables won't be needed at all? > > As for the maintainability, I am far more concerned with your suggestion > > of having a separate page table init code when SRAT is used. This kind > > of divergence is a recipe of breakage. > > I don't buy that. 
The only thing which needs to change is the > directionality of allocation and we probably don't even need to do > that if huge mapping is in use. When someone changes the page table init code, who will test it with the special allocation code? Thanks, -Toshi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
A bit of addition.

On Thu, Aug 22, 2013 at 04:21:58PM -0400, Tejun Heo wrote:
> That works if such half solution eventually leads to the full
> solution. This is just a distraction. You are already too late in
> the boot sequence. It doesn't even qualify as a half solution. It's
> like obsessing about a speck on your shirt without your trousers on.
> If you want to solve this, do that from a place where it actually is
> solvable.

Seriously, what's the end game here? How do you guys see this eventually reaching full solution? If you don't see that and this kinda-sorta-working solution is fine, then that's fine too but we aren't gonna make a lot of invasive changes for that. If you can at least envision the full solution, please try to fit this effort into the bigger picture.

In all possible solutions that I can think of, there needs to be earlier handling of SRAT information before the kernel proper starts executing, be that either the actual bootloader or an earlier kernel serving as kexec host. If a proper solution needs such processing earlier anyway, it can set up things so that either the default booting behavior doesn't harm hotpluggability or feed the necessary information to the kernel. In both cases, doing ACPI super early in the booting kernel doesn't buy us anything.

So, then, what the hell are we doing here with all these relocations, careful double execution of the same code from different execution contexts, worrying about initrd firmware override even before the kernel page table is set up? If we're doing all those to just make the temporary half-assed-anyway solution minutely better, that's just plain stupid.

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> It's too late for the kernel image itself, but it prevents allocating
> kernel memory from movable ranges after that. I'd say it solves a half
> of the issue this time.

That works if such half solution eventually leads to the full solution. This is just a distraction. You are already too late in the boot sequence. It doesn't even qualify as a half solution. It's like obsessing about a speck on your shirt without your trousers on. If you want to solve this, do that from a place where it actually is solvable.

> > > Also, how do you support local page tables without pursing SRAT early?
> >
> > Does it even matter with huge mappings? It's gonna be contained in a
> > single page anyway, right?
>
> Are the huge mappings always used? We cannot force user programs to use
> huge pages, can we?

Everything is a trade-off. Should we do all this just to support the off chance someone tries to use memory hotplug on a machine which doesn't support huge mapping when virtually all CPUs on market supports it?

> As for the maintainability, I am far more concerned with your suggestion
> of having a separate page table init code when SRAT is used. This kind
> of divergence is a recipe of breakage.

I don't buy that. The only thing which needs to change is the directionality of allocation and we probably don't even need to do that if huge mapping is in use.

Thanks.

--
tejun
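For a rough sense of scale behind the huge-mapping argument above, the sketch below counts how many 4 KiB page-table pages a kernel direct map would need at different leaf page sizes. It is a simplified Python model, not a description of what the kernel actually allocates: it assumes x86-64 4-level paging with 512 entries per table, one uniform leaf size for the whole range, and a 32 TiB machine (borrowed from the "256 socket 32TB system" mentioned elsewhere in the thread). The function name `table_pages` is invented for the example, and the model covers only the kernel direct map, not user page tables.

```python
ENTRIES = 512                   # entries per table page on x86-64
PAGE = 4096                     # size of one table page in bytes
TOP_SPAN = ENTRIES ** 4 * PAGE  # 256 TiB covered by one top-level table


def table_pages(mem_bytes, leaf_size):
    """Count table pages needed to map mem_bytes with one uniform leaf size."""
    pages = 0
    span = leaf_size * ENTRIES  # memory covered by one lowest-level table
    while True:
        pages += -(-mem_bytes // span)  # ceiling division
        if span >= TOP_SPAN:
            return pages
        span *= ENTRIES


MEM = 32 << 40  # 32 TiB
for label, leaf in (("4 KiB", 4 << 10), ("2 MiB", 2 << 20), ("1 GiB", 1 << 30)):
    n = table_pages(MEM, leaf)
    print(f"{label} leaves: {n} table pages, {n * PAGE / (1 << 20):.2f} MiB")
```

Under these assumptions the table footprint shrinks from tens of GiB with 4 KiB leaves to a few hundred KiB with 1 GiB leaves, which is the trade-off both sides are weighing.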
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Tejun, On Thu, 2013-08-22 at 14:31 -0400, Tejun Heo wrote: > On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote: > > I understand that you are concerned about stability of the ACPI stuff, > > which I think is a valid point, but most of (if not all) of the > > ACPI-related issues come from ACPI namespace/methods, which is a very > > different thing. Please do not mix up those two. The ACPI > > I have no objection to implementing self-conftained earlyprintk > support. If that's all you want to do, please go ahead but do not > pull in initrd override or ACPICA into it. If you are referring ACPICA as the AML interpreter, right, we do not move it up as I explained before. We are trying to move up the ACPI table init code (which is part of ACPICA, but has nothing to do with AML.) Note that ia64 also uses ACPI, and calls acpi_table_init() in setup_arch() before initializing the bootmap in find_memory(). > > namespace/methods stuff remains the same and continues to be initialized > > at very late in the boot sequence. > > > > What's making the patchset complicated is acpi_initrd_override(), which > > is intended for developers and allows overwriting ACPI bits at their own > > risk. This feature won't be used by regular users. > > Yeah, please forget about that in earlyboot. It doesn't make any > sense to fiddle with initrd that early during boot. I think the reason why Tang is working on this stuff again is that his previous change (which was once accepted) had broken initrd. So, he'd have to support it this time... > > If you are referring the issue of kernel image location, it is a > > limitation in the current implementation, not a technical limitation. I > > know other OS that supports movable memory and puts the kernel image > > into a movable memory with SRAT by changing the bootloader. > > I'm not saying that problem shouldn't be solved. I'm saying what you > guys are pushing doesn't help solving it at all. It's too late in the > boot process. 
It needs to be handled either by bootloader or earlier > kernel kexecing the actual one and super-early SRAT doens't help at > all in either case, so what's the point of pulling ACPI code in when > it doesn't contribute to solving the problem properly? It's too late for the kernel image itself, but it prevents allocating kernel memory from movable ranges after that. I'd say it solves a half of the issue this time. > > Also, how do you support local page tables without pursing SRAT early? > > Does it even matter with huge mappings? It's gonna be contained in a > single page anyway, right? Are the huge mappings always used? We cannot force user programs to use huge pages, can we? > > Initializing page tables on large systems may take a long time, and I do > > think that earlyprink needs to be available before that point. > > Yeah, sure, implement it in *minimal* way which doesn't affect > anything if not explicitly enabled by kernel param like other > earlyprintks. It doens't make any sense to add dependency to acpi > from early boot for that. It makes sense because it needs to obtain the config info from ACPI tables. As for the maintainability, I am far more concerned with your suggestion of having a separate page table init code when SRAT is used. This kind of divergence is a recipe of breakage. Thanks, -Toshi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, On Fri, Aug 23, 2013 at 03:39:53AM +0800, Zhang Yanfei wrote: > What do you mean by "earlyboot"? And also in your previous mail, I am also > a little confused by what you said "the very first stage of boot". Does > this mean the stage we are in head_32 or head64.c? Mostly referring to the state where we don't have basic environment set up yet including page tables. > If so, could we just do something just as Yinghai did before, that is, Split > acpi_override into 2 parts: find and copy. And in "earlyboot", we just do > the find, and I think that is less of risk. Or we can just do ACPI override > earlier in setup_arch(), not pulling this process that early during boot? But *WHY*? It doesn't really buy us anything substantial. What are you trying to achieve here? "Making ACPI info available early" can't be a goal in itself and the two benefits cited in this thread seem pretty dubious to me. Why are you guys trying to push this convolution when it doesn't bring any substantial gain? Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello tejun, On 08/23/2013 02:31 AM, Tejun Heo wrote: > Hello, > > On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote: >> I understand that you are concerned about stability of the ACPI stuff, >> which I think is a valid point, but most of (if not all) of the >> ACPI-related issues come from ACPI namespace/methods, which is a very >> different thing. Please do not mix up those two. The ACPI > > I have no objection to implementing self-conftained earlyprintk > support. If that's all you want to do, please go ahead but do not > pull in initrd override or ACPICA into it. > >> namespace/methods stuff remains the same and continues to be initialized >> at very late in the boot sequence. >> >> What's making the patchset complicated is acpi_initrd_override(), which >> is intended for developers and allows overwriting ACPI bits at their own >> risk. This feature won't be used by regular users. > > Yeah, please forget about that in earlyboot. It doesn't make any > sense to fiddle with initrd that early during boot. What do you mean by "earlyboot"? And also in your previous mail, I am also a little confused by what you said "the very first stage of boot". Does this mean the stage we are in head_32 or head64.c? If so, could we just do something just as Yinghai did before, that is, Split acpi_override into 2 parts: find and copy. And in "earlyboot", we just do the find, and I think that is less of risk. Or we can just do ACPI override earlier in setup_arch(), not pulling this process that early during boot? Thanks -- Thanks. Zhang Yanfei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote: > I understand that you are concerned about stability of the ACPI stuff, > which I think is a valid point, but most of (if not all) of the > ACPI-related issues come from ACPI namespace/methods, which is a very > different thing. Please do not mix up those two. The ACPI I have no objection to implementing self-conftained earlyprintk support. If that's all you want to do, please go ahead but do not pull in initrd override or ACPICA into it. > namespace/methods stuff remains the same and continues to be initialized > at very late in the boot sequence. > > What's making the patchset complicated is acpi_initrd_override(), which > is intended for developers and allows overwriting ACPI bits at their own > risk. This feature won't be used by regular users. Yeah, please forget about that in earlyboot. It doesn't make any sense to fiddle with initrd that early during boot. > If you are referring the issue of kernel image location, it is a > limitation in the current implementation, not a technical limitation. I > know other OS that supports movable memory and puts the kernel image > into a movable memory with SRAT by changing the bootloader. I'm not saying that problem shouldn't be solved. I'm saying what you guys are pushing doesn't help solving it at all. It's too late in the boot process. It needs to be handled either by bootloader or earlier kernel kexecing the actual one and super-early SRAT doens't help at all in either case, so what's the point of pulling ACPI code in when it doesn't contribute to solving the problem properly? > Also, how do you support local page tables without pursing SRAT early? Does it even matter with huge mappings? It's gonna be contained in a single page anyway, right? > Initializing page tables on large systems may take a long time, and I do > think that earlyprink needs to be available before that point. 
Yeah, sure, implement it in *minimal* way which doesn't affect anything if not explicitly enabled by kernel param like other earlyprintks. It doens't make any sense to add dependency to acpi from early boot for that. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Tejun,

On Wed, 2013-08-21 at 23:32 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 04:36:35PM -0600, Toshi Kani wrote:
> > I agree that ACPI is rather complicated stuff. But in my experience, the majority complication comes from ACPI namespace and methods, not from ACPI tables. Do you really think ACPI table init is that risky? I consider ACPI tables are part of the minimum config info, esp. for legacy-free platforms.
>
> It's just that we're talking about the very first stage of boot. We really don't do much there and pulling in ACPI code into that stage is a lot by comparison. If that's gonna happen, it needs pretty strong justification.

It moves up the ACPI table init code, which itself is simple. And ACPI tables are defined to be parsed at early boot-time, which is why they exist in addition to ACPI namespace/methods. They are similar to EFI memory table. Firmware publishes tables in one way or the other.

I understand that you are concerned about stability of the ACPI stuff, which I think is a valid point, but most of (if not all) of the ACPI-related issues come from ACPI namespace/methods, which is a very different thing. Please do not mix up those two. The ACPI namespace/methods stuff remains the same and continues to be initialized at very late in the boot sequence.

What's making the patchset complicated is acpi_initrd_override(), which is intended for developers and allows overwriting ACPI bits at their own risk. This feature won't be used by regular users.

> > earlyprintk is just another example to this SRAT issue. The local page table is yet another example. My hope here is for us to be able to utilize ACPI tables properly without hitting this kind of ordering issues again and again, which requires considerable time & effort to address.
>
> So, the two things brought up at this point are early parsing of SRAT, which can't really solve the problem at hand anyway,

If you are referring to the issue of kernel image location, it is a limitation in the current implementation, not a technical limitation. I know other OS that supports movable memory and puts the kernel image into a movable memory with SRAT by changing the bootloader.

Also, how do you support local page tables without parsing SRAT early?

> and earlyprintk which should be implemented in minimal way which is not activated unless specifically enabled with earlyprintk boot param. Neither seems to justify pulling in full ACPI into early boot, right?

Initializing page tables on large systems may take a long time, and I do think that earlyprintk needs to be available before that point.

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
> I understand that you are concerned about stability of the ACPI stuff, which I think is a valid point, but most of (if not all) of the ACPI-related issues come from ACPI namespace/methods, which is a very different thing. Please do not mix up those two. The ACPI

I have no objection to implementing self-contained earlyprintk support. If that's all you want to do, please go ahead but do not pull in initrd override or ACPICA into it.

> namespace/methods stuff remains the same and continues to be initialized at very late in the boot sequence. What's making the patchset complicated is acpi_initrd_override(), which is intended for developers and allows overwriting ACPI bits at their own risk. This feature won't be used by regular users.

Yeah, please forget about that in earlyboot. It doesn't make any sense to fiddle with initrd that early during boot.

> If you are referring to the issue of kernel image location, it is a limitation in the current implementation, not a technical limitation. I know other OS that supports movable memory and puts the kernel image into a movable memory with SRAT by changing the bootloader.

I'm not saying that problem shouldn't be solved. I'm saying what you guys are pushing doesn't help solving it at all. It's too late in the boot process. It needs to be handled either by bootloader or earlier kernel kexecing the actual one and super-early SRAT doesn't help at all in either case, so what's the point of pulling ACPI code in when it doesn't contribute to solving the problem properly?

> Also, how do you support local page tables without parsing SRAT early?

Does it even matter with huge mappings? It's gonna be contained in a single page anyway, right?

> Initializing page tables on large systems may take a long time, and I do think that earlyprintk needs to be available before that point.

Yeah, sure, implement it in *minimal* way which doesn't affect anything if not explicitly enabled by kernel param like other earlyprintks. It doesn't make any sense to add dependency to acpi from early boot for that.

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello tejun,

On 08/23/2013 02:31 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
> > I understand that you are concerned about stability of the ACPI stuff, which I think is a valid point, but most of (if not all) of the ACPI-related issues come from ACPI namespace/methods, which is a very different thing. Please do not mix up those two. The ACPI
>
> I have no objection to implementing self-contained earlyprintk support. If that's all you want to do, please go ahead but do not pull in initrd override or ACPICA into it.
>
> > namespace/methods stuff remains the same and continues to be initialized at very late in the boot sequence. What's making the patchset complicated is acpi_initrd_override(), which is intended for developers and allows overwriting ACPI bits at their own risk. This feature won't be used by regular users.
>
> Yeah, please forget about that in earlyboot. It doesn't make any sense to fiddle with initrd that early during boot.

What do you mean by earlyboot? And also in your previous mail, I am also a little confused by what you said "the very first stage of boot". Does this mean the stage we are in head_32 or head64.c?

If so, could we just do something just as Yinghai did before, that is, split acpi_override into 2 parts: find and copy. And in earlyboot, we just do the find, and I think that is less of risk.

Or we can just do ACPI override earlier in setup_arch(), not pulling this process that early during boot?

Thanks

--
Thanks.
Zhang Yanfei
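To make the "find and copy" split mentioned above concrete, here is a toy sketch in Python. It is not the kernel code and not Yinghai's patch: the function names `find_tables`, `copy_tables` and `make_table` are hypothetical, and the input is assumed to be a plain concatenation of tables rather than an initrd archive. The only real-world detail it relies on is that every ACPI table starts with a 36-byte header whose first 4 bytes are the signature and whose next 4 bytes are the little-endian table length. The point is that the "find" phase only records where things are and allocates nothing, while the "copy" phase can run later, once memory allocation is safe.

```python
import struct

HEADER_LEN = 36  # size of the standard ACPI table header


def find_tables(blob):
    """Phase 1 ("find"): record (signature, offset, length) without copying."""
    found = []
    offset = 0
    while offset + HEADER_LEN <= len(blob):
        signature = blob[offset:offset + 4].decode("ascii", errors="replace")
        (length,) = struct.unpack_from("<I", blob, offset + 4)
        if length < HEADER_LEN or offset + length > len(blob):
            break  # malformed entry: stop rather than guess
        found.append((signature, offset, length))
        offset += length
    return found


def copy_tables(blob, found):
    """Phase 2 ("copy"): duplicate each located table into its own buffer."""
    return {sig: bytes(blob[off:off + length]) for sig, off, length in found}


def make_table(signature, payload):
    """Build a toy table: signature, 32-bit length, zero padding, payload."""
    length = HEADER_LEN + len(payload)
    header = signature.encode("ascii") + struct.pack("<I", length)
    return header.ljust(HEADER_LEN, b"\0") + payload


# Hypothetical input: two fake tables back to back.
blob = make_table("SRAT", b"\x01" * 8) + make_table("SPCR", b"\x02" * 4)
located = find_tables(blob)          # cheap, allocation-free bookkeeping
copies = copy_tables(blob, located)  # deferred until allocation is possible
print(located)
```

In this model the early stage would only need the small list returned by `find_tables`; whether that is enough to address the objections raised in the thread is exactly what is being debated.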
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Fri, Aug 23, 2013 at 03:39:53AM +0800, Zhang Yanfei wrote:
> What do you mean by earlyboot? And also in your previous mail, I am also a little confused by what you said "the very first stage of boot". Does this mean the stage we are in head_32 or head64.c?

Mostly referring to the state where we don't have basic environment set up yet including page tables.

> If so, could we just do something just as Yinghai did before, that is, split acpi_override into 2 parts: find and copy. And in earlyboot, we just do the find, and I think that is less of risk.
>
> Or we can just do ACPI override earlier in setup_arch(), not pulling this process that early during boot?

But *WHY*? It doesn't really buy us anything substantial. What are you trying to achieve here? Making ACPI info available early can't be a goal in itself and the two benefits cited in this thread seem pretty dubious to me. Why are you guys trying to push this convolution when it doesn't bring any substantial gain?

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Tejun,

On Thu, 2013-08-22 at 14:31 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 09:52:09AM -0600, Toshi Kani wrote:
> > I understand that you are concerned about stability of the ACPI stuff, which I think is a valid point, but most of (if not all) of the ACPI-related issues come from ACPI namespace/methods, which is a very different thing. Please do not mix up those two. The ACPI
>
> I have no objection to implementing self-contained earlyprintk support. If that's all you want to do, please go ahead but do not pull in initrd override or ACPICA into it.

If you are referring to ACPICA as the AML interpreter, right, we do not move it up as I explained before. We are trying to move up the ACPI table init code (which is part of ACPICA, but has nothing to do with AML.)

Note that ia64 also uses ACPI, and calls acpi_table_init() in setup_arch() before initializing the bootmap in find_memory().

> > namespace/methods stuff remains the same and continues to be initialized at very late in the boot sequence. What's making the patchset complicated is acpi_initrd_override(), which is intended for developers and allows overwriting ACPI bits at their own risk. This feature won't be used by regular users.
>
> Yeah, please forget about that in earlyboot. It doesn't make any sense to fiddle with initrd that early during boot.

I think the reason why Tang is working on this stuff again is that his previous change (which was once accepted) had broken initrd. So, he'd have to support it this time...

> > If you are referring to the issue of kernel image location, it is a limitation in the current implementation, not a technical limitation. I know other OS that supports movable memory and puts the kernel image into a movable memory with SRAT by changing the bootloader.
>
> I'm not saying that problem shouldn't be solved. I'm saying what you guys are pushing doesn't help solving it at all. It's too late in the boot process. It needs to be handled either by bootloader or earlier kernel kexecing the actual one and super-early SRAT doesn't help at all in either case, so what's the point of pulling ACPI code in when it doesn't contribute to solving the problem properly?

It's too late for the kernel image itself, but it prevents allocating kernel memory from movable ranges after that. I'd say it solves a half of the issue this time.

> > Also, how do you support local page tables without parsing SRAT early?
>
> Does it even matter with huge mappings? It's gonna be contained in a single page anyway, right?

Are the huge mappings always used? We cannot force user programs to use huge pages, can we?

> > Initializing page tables on large systems may take a long time, and I do think that earlyprintk needs to be available before that point.
>
> Yeah, sure, implement it in *minimal* way which doesn't affect anything if not explicitly enabled by kernel param like other earlyprintks. It doesn't make any sense to add dependency to acpi from early boot for that.

It makes sense because it needs to obtain the config info from ACPI tables.

As for the maintainability, I am far more concerned with your suggestion of having a separate page table init code when SRAT is used. This kind of divergence is a recipe for breakage.

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> It's too late for the kernel image itself, but it prevents allocating kernel memory from movable ranges after that. I'd say it solves a half of the issue this time.

That works if such half solution eventually leads to the full solution. This is just a distraction. You are already too late in the boot sequence. It doesn't even qualify as a half solution. It's like obsessing about a speck on your shirt without your trousers on. If you want to solve this, do that from a place where it actually is solvable.

> > > Also, how do you support local page tables without parsing SRAT early?
> >
> > Does it even matter with huge mappings? It's gonna be contained in a single page anyway, right?
>
> Are the huge mappings always used? We cannot force user programs to use huge pages, can we?

Everything is a trade-off. Should we do all this just to support the off chance someone tries to use memory hotplug on a machine which doesn't support huge mapping when virtually all CPUs on market supports it?

> As for the maintainability, I am far more concerned with your suggestion of having a separate page table init code when SRAT is used. This kind of divergence is a recipe for breakage.

I don't buy that. The only thing which needs to change is the directionality of allocation and we probably don't even need to do that if huge mapping is in use.

Thanks.

--
tejun
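The "directionality of allocation" point can be illustrated with a toy model. This is only a sketch under stated assumptions: the `alloc` and `overlaps` helpers and the two-node memory layout are hypothetical, and real early allocation in the kernel is far more involved. The sketch assumes the kernel image sits at the bottom of a node that is not hotpluggable. Under that assumption, allocating bottom-up (just above the image) stays out of the hotpluggable range without consulting SRAT, while top-down allocation lands in it.

```python
MB = 1 << 20
GB = 1 << 30


def alloc(free_ranges, size, top_down):
    """Carve `size` bytes from the first fitting range, from the top or bottom."""
    for start, end in sorted(free_ranges, reverse=top_down):
        if end - start >= size:
            return (end - size, end) if top_down else (start, start + size)
    return None


def overlaps(a, b):
    """True if half-open ranges a and b share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical layout: node 0 (kernel image below 16 MB) is not hotpluggable,
# node 1 covers 4 GB..8 GB and is hotpluggable.
free_ranges = [(16 * MB, 4 * GB), (4 * GB, 8 * GB)]
hotpluggable = (4 * GB, 8 * GB)

top = alloc(free_ranges, 2 * MB, top_down=True)
bottom = alloc(free_ranges, 2 * MB, top_down=False)
print(overlaps(top, hotpluggable), overlaps(bottom, hotpluggable))
```

The model says nothing about the other side of the argument, namely that low memory is scarce and that per-node (local) page tables would still need node information from SRAT.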
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
A bit of addition.

On Thu, Aug 22, 2013 at 04:21:58PM -0400, Tejun Heo wrote:
> That works if such half solution eventually leads to the full solution. This is just a distraction. You are already too late in the boot sequence. It doesn't even qualify as a half solution. It's like obsessing about a speck on your shirt without your trousers on. If you want to solve this, do that from a place where it actually is solvable.

Seriously, what's the end game here? How do you guys see this eventually reaching full solution? If you don't see that and this kinda-sorta-working solution is fine, then that's fine too but we aren't gonna make a lot of invasive changes for that. If you can at least envision the full solution, please try to fit this effort into the bigger picture.

In all possible solutions that I can think of, there needs to be earlier handling of SRAT information before the kernel proper starts executing be that either the actual bootloader or earlier kernel serving as kexec host. If a proper solution needs such processing earlier anyway, it can set up things so that either the default booting behavior doesn't harm hotpluggability or feed the necessary information to the kernel. In both cases, doing ACPI super early in the booting kernel doesn't buy us anything.

So, then, what the hell are we doing here with all these relocations, careful double execution of the same code from different execution contexts, worrying about initrd firmware override even before the kernel page table is set up? If we're doing all those to just make the temporary half-assed-anyway solution minutely better, that's just plain stupid.

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Thu, 2013-08-22 at 16:21 -0400, Tejun Heo wrote:
> On Thu, Aug 22, 2013 at 02:11:32PM -0600, Toshi Kani wrote:
> > It's too late for the kernel image itself, but it prevents allocating kernel memory from movable ranges after that. I'd say it solves a half of the issue this time.
>
> That works if such half solution eventually leads to the full solution. This is just a distraction. You are already too late in the boot sequence. It doesn't even qualify as a half solution. It's like obsessing about a speck on your shirt without your trousers on. If you want to solve this, do that from a place where it actually is solvable.

Since some node(s) won't be ejectable, this solution is reasonable as the first step. I do not think it is a distraction. I view your suggestion as a distraction of supporting local page tables, though.

> > > > Also, how do you support local page tables without parsing SRAT early?
> > >
> > > Does it even matter with huge mappings? It's gonna be contained in a single page anyway, right?
> >
> > Are the huge mappings always used? We cannot force user programs to use huge pages, can we?
>
> Everything is a trade-off. Should we do all this just to support the off chance someone tries to use memory hotplug on a machine which doesn't support huge mapping when virtually all CPUs on market supports it?

Local page table and memory hotplug are two separate things. That is, local page tables can be supported on all NUMA platforms without hotplug support. Are you sure huge mapping will solve everything for all types of applications, and therefore local page tables won't be needed at all?

> > As for the maintainability, I am far more concerned with your suggestion of having a separate page table init code when SRAT is used. This kind of divergence is a recipe for breakage.
>
> I don't buy that. The only thing which needs to change is the directionality of allocation and we probably don't even need to do that if huge mapping is in use.

When someone changes the page table init code, who will test it with the special allocation code?

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, Toshi.

On Thu, Aug 22, 2013 at 03:06:38PM -0600, Toshi Kani wrote:
> Since some node(s) won't be ejectable, this solution is reasonable as the first step. I do not think it is a distraction. I view your

But does this contribute to reaching the next step? If so, how? I can't see how and that's why I said this was a distraction.

> suggestion as a distraction of supporting local page tables, though.

Hmmm...

> Local page table and memory hotplug are two separate things. That is, local page tables can be supported on all NUMA platforms without hotplug support. Are you sure huge mapping will solve everything for all types of applications, and therefore local page tables won't be needed at all?

When you throw around terms like "all" and "at all", you can't reach rational discussion about engineering trade-offs. I was asking you whether it was reasonable to do per-node page table when most machines support huge page mappings which makes the whole thing rather pointless. Of course there will be some niche cases where this might not be optimal but do you think that would be enough to justify the added complexity and churn? If you think so, can you please elaborate?

> When someone changes the page table init code, who will test it with the special allocation code?

What are you worrying about? Are you saying that allocating page table towards top or bottom of memory would be more disruptive and difficult to debug than pulling in ACPI init and SRAT information into the process? Am I missing something here?

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Tejun,

On Thu, 2013-08-22 at 17:21 -0400, Tejun Heo wrote:
 :
> > Local page table and memory hotplug are two separate things. That is, local page tables can be supported on all NUMA platforms without hotplug support. Are you sure huge mapping will solve everything for all types of applications, and therefore local page tables won't be needed at all?
>
> When you throw around terms like "all" and "at all", you can't reach rational discussion about engineering trade-offs. I was asking you whether it was reasonable to do per-node page table when most machines support huge page mappings which makes the whole thing rather pointless. Of course there will be some niche cases where this might not be optimal but do you think that would be enough to justify the added complexity and churn? If you think so, can you please elaborate?

I am relatively new to Linux, so I am not a good person to elaborate this. From my experience on other OS, huge pages helped for the kernel, but did not necessarily help user applications. It depended on applications, which were not niche cases. But Linux may be different, so I asked since you seemed confident. I'd appreciate if you can point us some data that endorses your statement.

> > When someone changes the page table init code, who will test it with the special allocation code?
>
> What are you worrying about? Are you saying that allocating page table towards top or bottom of memory would be more disruptive and difficult to debug than pulling in ACPI init and SRAT information into the process? Am I missing something here?

My worry is that the code is unlikely tested with the special logic when someone makes code changes to the page tables. Such code can easily be broken in future.

To answer your other question/email, I believe Tang's next step is to support local page tables. This is why we think parsing SRAT earlier is the right direction.

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Wed, Aug 21, 2013 at 04:36:35PM -0600, Toshi Kani wrote:
> I agree that ACPI is rather complicated stuff. But in my experience, the majority complication comes from ACPI namespace and methods, not from ACPI tables. Do you really think ACPI table init is that risky? I consider ACPI tables are part of the minimum config info, esp. for legacy-free platforms.

It's just that we're talking about the very first stage of boot. We really don't do much there and pulling in ACPI code into that stage is a lot by comparison. If that's gonna happen, it needs pretty strong justification.

> earlyprintk is just another example to this SRAT issue. The local page table is yet another example. My hope here is for us to be able to utilize ACPI tables properly without hitting this kind of ordering issues again and again, which requires considerable time & effort to address.

So, the two things brought up at this point are early parsing of SRAT, which can't really solve the problem at hand anyway, and earlyprintk which should be implemented in minimal way which is not activated unless specifically enabled with earlyprintk boot param. Neither seems to justify pulling in full ACPI into early boot, right?

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Tejun,

On Wed, 2013-08-21 at 16:40 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
> > Platform vendors (which care about Linux) need to support the existing Linux features. This means that they have to implement legacy interfaces on x86 until the kernel supports an alternative method. For instance, some platforms are legacy-free and do not have legacy COM ports. These ACPI tables were defined so that non-legacy COM ports can be described and informed to the OS. Without this support, such platforms may have to emulate the legacy COM ports for Linux, or drop Linux support.
>
> Are you seriously saying that vendors are gonna drop linux support for lacking ACPI earlyprintk support? Please...

earlyprintk is an example of the issues. The point is that vendors are required to support legacy stuff for Linux.

> Please take a look at the existing earlyprintk code and how compact and self-contained they are. If you want to add ACPI earlyprintk, do similar stuff. Forget about firmware blob override from initrd or ACPICA. Just implement the bare minimum to get the thing working. Do not add dependency to large body of code from earlyboot. It's a bad idea through and through.

I am not saying that ACPI earlyprintk must be available at exactly the same point. How early it can reasonably be is a subject of discussion.

> > I think the kernel boot-up sequence should be designed in such a way that can support legacy-free and/or NUMA platforms properly.
>
> Blanket statements like the above don't mean much. There are many separate stages of boot and you're talking about one of the very first stages where we traditionally have always depended upon only the very bare minimum of the platform both in hardware itself and configuration information. We've been doing that for *very* good reasons. If you screw up there, it's mighty tricky to figure out what went wrong especially on the machines that you can't physically kick. You're now suggesting to add whole ACPI parsing including overloading from initrd into that stage with pretty weak rationale.

I agree that ACPI is rather complicated stuff. But in my experience, the majority complication comes from ACPI namespace and methods, not from ACPI tables. Do you really think ACPI table init is that risky? I consider ACPI tables are part of the minimum config info, esp. for legacy-free platforms.

> Seriously, if you want ACPI based earlyprintk, implement it in a discrete minimal code which is easy to verify and won't get affected when the rest of ACPI machinery is updated. We really don't want earlyboot to fail because someone screwed up ACPI or initrd handling.

earlyprintk is just another example to this SRAT issue. The local page table is yet another example. My hope here is for us to be able to utilize ACPI tables properly without hitting this kind of ordering issues again and again, which requires considerable time & effort to address.

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, Toshi.

On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
> Platform vendors (which care about Linux) need to support the existing Linux features. This means that they have to implement legacy interfaces on x86 until the kernel supports an alternative method. For instance, some platforms are legacy-free and do not have legacy COM ports. These ACPI tables were defined so that non-legacy COM ports can be described and informed to the OS. Without this support, such platforms may have to emulate the legacy COM ports for Linux, or drop Linux support.

Are you seriously saying that vendors are gonna drop linux support for lacking ACPI earlyprintk support? Please...

Please take a look at the existing earlyprintk code and how compact and self-contained they are. If you want to add ACPI earlyprintk, do similar stuff. Forget about firmware blob override from initrd or ACPICA. Just implement the bare minimum to get the thing working. Do not add dependency to large body of code from earlyboot. It's a bad idea through and through.

> I think the kernel boot-up sequence should be designed in such a way that can support legacy-free and/or NUMA platforms properly.

Blanket statements like the above don't mean much. There are many separate stages of boot and you're talking about one of the very first stages where we traditionally have always depended upon only the very bare minimum of the platform both in hardware itself and configuration information. We've been doing that for *very* good reasons. If you screw up there, it's mighty tricky to figure out what went wrong especially on the machines that you can't physically kick. You're now suggesting to add whole ACPI parsing including overloading from initrd into that stage with pretty weak rationale.

Seriously, if you want ACPI based earlyprintk, implement it in a discrete minimal code which is easy to verify and won't get affected when the rest of ACPI machinery is updated. We really don't want earlyboot to fail because someone screwed up ACPI or initrd handling.

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Tejun,

On Wed, 2013-08-21 at 15:54 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
> > Well, there is reason why we have earlyprintk feature today. So, let's not debate on this feature now. There was previous attempt to support
>
> Are you saying the existing earlyprintk automatically justifies addition of more complex mechanism? The added complexity of course should be traded off against the benefits of gaining ACPI based early boot. You aren't gonna suggest implementing netconsole based earlyprintk, right?

Platform vendors (which care about Linux) need to support the existing Linux features. This means that they have to implement legacy interfaces on x86 until the kernel supports an alternative method. For instance, some platforms are legacy-free and do not have legacy COM ports. These ACPI tables were defined so that non-legacy COM ports can be described and informed to the OS. Without this support, such platforms may have to emulate the legacy COM ports for Linux, or drop Linux support.

> > this feature with ACPI tables below. As described, it had the same ordering issue.
> >
> > https://lkml.org/lkml/2012/10/8/498
> >
> > There is a basic problem that when we try to use ACPI tables that extends or replaces legacy interfaces (ex. SRAT extending e820), we hit this ordering issue because ACPI is not available as early as the legacy interfaces.
>
> Do we even want ACPI parsing and all that that early? Parsing SRAT early doesn't buy us much and I'm not sure whether adding ACPI earlyprintk would increase or decrease debuggability during earlyboot. It adds whole lot more code paths where things can go wrong while the basic execution environment is unstable. Why do that?

I think the kernel boot-up sequence should be designed in such a way that can support legacy-free and/or NUMA platforms properly.

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, Toshi.

On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
> Well, there is a reason why we have the earlyprintk feature today. So,
> let's not debate on this feature now. There was a previous attempt to
> support

Are you saying the existing earlyprintk automatically justifies
addition of a more complex mechanism? The added complexity of course
should be traded off against the benefits of gaining ACPI based early
boot. You aren't gonna suggest implementing netconsole based
earlyprintk, right?

> this feature with ACPI tables below. As described, it had the same
> ordering issue.
>
> https://lkml.org/lkml/2012/10/8/498
>
> There is a basic problem that when we try to use ACPI tables that
> extend or replace legacy interfaces (ex. SRAT extending e820), we hit
> this ordering issue because ACPI is not available as early as the
> legacy interfaces.

Do we even want ACPI parsing and all that that early? Parsing SRAT
early doesn't buy us much and I'm not sure whether adding ACPI
earlyprintk would increase or decrease debuggability during earlyboot.
It adds a whole lot more code paths where things can go wrong while the
basic execution environment is unstable. Why do that?

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Wed, 2013-08-21 at 11:36 -0400, Tejun Heo wrote:
> Hello,
>
> On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
> > In current boot order, before we get the SRAT, we have a big consumer
> > of early allocations: we are setting up the page table in top-down
> > (The idea was proposed by HPA,
> > Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of
> > page table setup will make the page tables as high as possible in
> > memory, since memory at low addresses is precious (for stupid DMA
> > devices, for things like kexec/kdump, and so on.)
>
> With huge mappings, they are fairly small, right? And this whole
> thing needs a kernel param anyway at this point, so the allocation
> direction can be made dependent on that or huge mapping availability
> and, even with 4k mappings, we aren't talking about gigabytes of
> memory, are we?
>
> > So if we are trying to make early allocations close to kernel image,
> > we should rewrite the way we are setting up page table totally. That
> > is not an easy thing to do.
>
> It has been a while since I looked at the code so can you please
> elaborate why that is not easy? It's pretty simple conceptually.
>
> > * For memory hotplug, we need ACPI SRAT at early time to be aware of
> >   which memory ranges are hotpluggable, and tell the kernel to try to
> >   stay away from hotpluggable nodes.
> >
> > This one is the current requirement of us but may be very helpful for
> > future change:
> >
> > * As suggested by Yinghai, we should allocate page tables in local
> >   node. This also needs SRAT before direct mapping page tables are
> >   setup.
>
> Does this even matter for huge mappings?
>
> > * As mentioned by Toshi Kani, ACPI SPCR/DBGP/DBG2 tables allow the OS
> >   to initialize serial console/debug ports at early boot time. The
> >   earlier it can be initialized, the better this feature will be.
> >   These tables are not currently used by Linux due to a licensing
> >   issue, but it could be addressed some time soon.
> >
> > So we decided to firstly make ACPI override earlier and use BRK (this
> > is obviously near the kernel image range) to store the found ACPI
> > tables.
>
> I don't know. The whole effort seems way overcomplicated compared to
> the benefits it would bring. For NUMA memory hotunplug, what's the
> point of doing all this when the kernel doesn't have any control over
> where its image is gonna be? Some megabytes at the tail aren't gonna
> make a huge difference and if you wanna do this properly, you need to
> determine the load address of the kernel considering the node
> boundaries and hotpluggability of each node, which has to happen
> before the early kernel boot code executes. And if there's a code
> piece which does that, that might as well place the kernel image such
> that extra allocation afterwards doesn't interfere with memory
> hotunplugging.
>
> It looks like a lot of code changes for a mechanism which doesn't seem
> all that useful. This code is already too late in boot sequence to be
> a proper solution so I don't see the point in pushing the coverage to
> the maximum from here. It's kinda silly.
>
> The last point - early init of debug facility - makes some sense but
> again how much extra coverage are we talking about? The code path
> between the two points is fairly short and the change doesn't come
> free. It means we add more fragile firmware-specific code path before
> the execution environment is stable and get to do things like
> traveling the same code paths multiple times in different
> environments. Doesn't seem like a win. We want to reach stable
> execution environment as soon as possible. Shoving a whole lot more
> logic before that in the name of "earlier debugging" doesn't make a
> lot of sense.

Well, there is a reason why we have the earlyprintk feature today. So,
let's not debate on this feature now. There was a previous attempt to
support this feature with ACPI tables below. As described, it had the
same ordering issue.

https://lkml.org/lkml/2012/10/8/498

There is a basic problem that when we try to use ACPI tables that
extend or replace legacy interfaces (ex. SRAT extending e820), we hit
this ordering issue because ACPI is not available as early as the
legacy interfaces.

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
> In current boot order, before we get the SRAT, we have a big consumer
> of early allocations: we are setting up the page table in top-down
> (The idea was proposed by HPA,
> Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of
> page table setup will make the page tables as high as possible in
> memory, since memory at low addresses is precious (for stupid DMA
> devices, for things like kexec/kdump, and so on.)

With huge mappings, they are fairly small, right? And this whole
thing needs a kernel param anyway at this point, so the allocation
direction can be made dependent on that or huge mapping availability
and, even with 4k mappings, we aren't talking about gigabytes of
memory, are we?

> So if we are trying to make early allocations close to kernel image,
> we should rewrite the way we are setting up page table totally. That
> is not an easy thing to do.

It has been a while since I looked at the code so can you please
elaborate why that is not easy? It's pretty simple conceptually.

> * For memory hotplug, we need ACPI SRAT at early time to be aware of
>   which memory ranges are hotpluggable, and tell the kernel to try to
>   stay away from hotpluggable nodes.
>
> This one is the current requirement of us but may be very helpful for
> future change:
>
> * As suggested by Yinghai, we should allocate page tables in local
>   node. This also needs SRAT before direct mapping page tables are
>   setup.

Does this even matter for huge mappings?

> * As mentioned by Toshi Kani, ACPI SPCR/DBGP/DBG2 tables allow the OS
>   to initialize serial console/debug ports at early boot time. The
>   earlier it can be initialized, the better this feature will be.
>   These tables are not currently used by Linux due to a licensing
>   issue, but it could be addressed some time soon.
>
> So we decided to firstly make ACPI override earlier and use BRK (this
> is obviously near the kernel image range) to store the found ACPI
> tables.

I don't know. The whole effort seems way overcomplicated compared to
the benefits it would bring. For NUMA memory hotunplug, what's the
point of doing all this when the kernel doesn't have any control over
where its image is gonna be? Some megabytes at the tail aren't gonna
make a huge difference and if you wanna do this properly, you need to
determine the load address of the kernel considering the node
boundaries and hotpluggability of each node, which has to happen
before the early kernel boot code executes. And if there's a code
piece which does that, that might as well place the kernel image such
that extra allocation afterwards doesn't interfere with memory
hotunplugging.

It looks like a lot of code changes for a mechanism which doesn't seem
all that useful. This code is already too late in boot sequence to be
a proper solution so I don't see the point in pushing the coverage to
the maximum from here. It's kinda silly.

The last point - early init of debug facility - makes some sense but
again how much extra coverage are we talking about? The code path
between the two points is fairly short and the change doesn't come
free. It means we add more fragile firmware-specific code path before
the execution environment is stable and get to do things like
traveling the same code paths multiple times in different
environments. Doesn't seem like a win. We want to reach stable
execution environment as soon as possible. Shoving a whole lot more
logic before that in the name of "earlier debugging" doesn't make a
lot of sense.

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hi tejun,

On 08/21/2013 09:06 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote:
>> [What are we doing]
>>
>> We are trying to initialize acpi tables as early as possible. But
>> Linux kernel allows users to override acpi tables by specifying their
>> own tables in initrd. So we have to do acpi_initrd_override() earlier
>> first.
>
> So, are we now back to making SRAT info as early as possible? What
> happened to just co-locating early allocations close to kernel image?
> What'd be the benefit of doing this over that?

We know you are trying to give direction to make the change more natural
and robust, and we are very thankful for your comments. We have taken
your comments and suggestions about co-locating early allocations close
to the kernel image into consideration, but we still found that not that
easy.

In the current boot order, before we get the SRAT, we have a big
consumer of early allocations: we are setting up the page tables
top-down (the idea was proposed by HPA,
Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page
table setup will make the page tables as high as possible in memory,
since memory at low addresses is precious (for stupid DMA devices, for
things like kexec/kdump, and so on.)

So if we are trying to make early allocations close to the kernel image,
we should totally rewrite the way we are setting up the page tables.
That is not an easy thing to do.

As for the benefits of the patchset, just as Tang said in this patch:

* For memory hotplug, we need ACPI SRAT at early time to be aware of
  which memory ranges are hotpluggable, and tell the kernel to try to
  stay away from hotpluggable nodes.

That one is our current requirement; the following may be very helpful
for future changes:

* As suggested by Yinghai, we should allocate page tables in local node.
  This also needs SRAT before direct mapping page tables are setup.

* As mentioned by Toshi Kani, ACPI SPCR/DBGP/DBG2 tables allow the OS to
  initialize serial console/debug ports at early boot time. The earlier
  it can be initialized, the better this feature will be. These tables
  are not currently used by Linux due to a licensing issue, but it could
  be addressed some time soon.

So we decided to first make the ACPI override earlier and use BRK (this
is obviously near the kernel image range) to store the found ACPI
tables.

--
Thanks.
Zhang Yanfei
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote:
> [What are we doing]
>
> We are trying to initialize acpi tables as early as possible. But
> Linux kernel allows users to override acpi tables by specifying their
> own tables in initrd. So we have to do acpi_initrd_override() earlier
> first.

So, are we now back to making SRAT info as early as possible? What
happened to just co-locating early allocations close to kernel image?
What'd be the benefit of doing this over that?

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hi all,

This patch-set has not been fully tested. I sent them first for you to
review. Please comment if we can agree on this solution.

Thanks. :)

On 08/21/2013 06:15 PM, Tang Chen wrote:

This patch-set aims to move acpi_initrd_override() earlier on x86.
Some of the patches are from Yinghai's patch-set:
https://lkml.org/lkml/2013/6/14/561

The differences between this patch-set and Yinghai's original patch-set
are:

1. This patch-set doesn't split acpi_initrd_override(), but calls it as
   a whole operation at early time.
2. Allocate memory from BRK to store override tables. (This idea is
   also from Yinghai.)

[Current state]

The current Linux kernel will initialize acpi tables like the following:

1. Find all acpi override tables provided by users in initrd. (Linux
   allows users to override acpi tables in firmware, by specifying
   their own tables in initrd.)
2. Use acpica code to initialize the acpi global root table list and
   install all tables into it. If any override table exists, use it to
   override the one provided by firmware.

Then others can parse these tables and get useful info.

Both of the two steps happen after direct mapping page tables are setup.

[Issues]

In the current Linux kernel, the initialization of acpi tables is too
late for new functionalities. We have some issues about this:

* For memory hotplug, we need ACPI SRAT at early time to be aware of
  which memory ranges are hotpluggable, and prevent bootmem allocator
  from allocating memory for the kernel. (Kernel pages cannot be
  hotplugged because )

* As suggested by Yinghai Lu, we should allocate page tables in local
  node. This also needs SRAT before direct mapping page tables are
  setup.

* As mentioned by Toshi Kani, ACPI SPCR/DBGP/DBG2 tables allow the OS
  to initialize serial console/debug ports at early boot time. The
  earlier it can be initialized, the better this feature will be. These
  tables are not currently used by Linux due to a licensing issue, but
  it could be addressed some time soon.

[What are we doing]

We are trying to initialize acpi tables as early as possible. But Linux
kernel allows users to override acpi tables by specifying their own
tables in initrd. So we have to do acpi_initrd_override() earlier first.

[About this patch-set]

This patch-set aims to move acpi_initrd_override() as early as possible
on x86. As suggested by Yinghai, we are trying to do it like this:

On 32bit: do it in head_32.S, before paging is enabled. In this case,
          we can access initrd with physical address without page
          tables.

On 64bit: do it in head64.c, after paging is enabled but before direct
          mapping is setup.

And also, acpi_initrd_override() needs to allocate memory for override
tables. But at such an early time, no memory allocator works yet. So the
basic idea from Yinghai is to use BRK. We will extend BRK 256KB in this
patch-set.

Tang Chen (6):
  x86, acpi: Move table_sigs[] to stack.
  x86, acpi, brk: Extend BRK 256KB to store acpi override tables.
  x86, brk: Make extend_brk() available with va/pa.
  x86, acpi: Make acpi_initrd_override() available with va or pa.
  x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available
    with va/pa.
  x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c.

Yinghai Lu (2):
  x86: Make get_ramdisk_{image|size}() global.
  x86, microcode: Use get_ramdisk_{image|size}() in microcode handling.

 arch/x86/include/asm/dmi.h              |   2 +-
 arch/x86/include/asm/setup.h            |  11 +++-
 arch/x86/kernel/head64.c                |   4 +
 arch/x86/kernel/head_32.S               |   4 +
 arch/x86/kernel/microcode_intel_early.c |   8 +-
 arch/x86/kernel/setup.c                 |  93 --
 arch/x86/mm/init.c                      |   2 +-
 arch/x86/xen/enlighten.c                |   2 +-
 arch/x86/xen/mmu.c                      |   6 +-
 arch/x86/xen/p2m.c                      |  27 ---
 drivers/acpi/osl.c                      | 130 ---
 include/linux/acpi.h                    |   5 +-
 12 files changed, 196 insertions(+), 98 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: em...@kvack.org
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hi all, This patch-set has not been fully tested. I sent them first for you to review. Please comment if we can agree on this solution. Thanks.:) On 08/21/2013 06:15 PM, Tang Chen wrote: This patch-set aims to move acpi_initrd_override() earlier on x86. Some of the patches are from Yinghai's patch-set: https://lkml.org/lkml/2013/6/14/561 The difference between this patch-set and Yinghai's original patch-set are: 1. This patch-set doesn't split acpi_initrd_override(), but call it as a whole operation at early time. 2. Allocate memory from BRK to store override tables. (This idea is also from Yinghai.) [Current state] The current Linux kernel will initialize acpi tables like the following: 1. Find all acpi override table provided by users in initrd. (Linux allows users to override acpi tables in firmware, by specifying their own tables in initrd.) 2. Use acpica code to initialize acpi global root table list and install all tables into it. If any override tables exists, use it to override the one provided by firmware. Then others can parse these tables and get useful info. Both of the two steps happen after direct mapping page tables are setup. [Issues] In the current Linux kernel, the initialization of acpi tables is too late for new functionalities. We have some issues about this: * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory ranges are hotpluggable, and prevent bootmem allocator from allocating memory for the kernel. (Kernel pages cannot be hotplugged because ) * As suggested by Yinghai Luying...@kernel.org, we should allocate page tables in local node. This also needs SRAT before direct mapping page tables are setup. * As mentioned by Toshi Kanitoshi.k...@hp.com, ACPI SCPR/DBGP/DBG2 tables allow the OS to initialize serial console/debug ports at early boot time. The earlier it can be initialized, the better this feature will be. 
These tables are not currently used by Linux due to a licensing issue, but it could be addressed some time soon. [What are we doing] We are trying to initialize acip tables as early as possible. But Linux kernel allows users to override acpi tables by specifying their own tables in initrd. So we have to do acpi_initrd_override() earlier first. [About this patch-set] This patch-set aims to move acpi_initrd_override() as early as possible on x86. As suggested by Yinghai, we are trying to do it like this: On 32bit: do it in head_32.S, before paging is enabled. In this case, we can access initrd with physical address without page tables. On 64bit: do it in head_64.c, after paging is enabled but before direct mapping is setup. And also, acpi_initrd_override() needs to allocate memory for override tables. But at such an early time, there is no memory allocator works. So the basic idea from Yinghai is to use BRK. We will extend BRK 256KB in this patch-set. Tang Chen (6): x86, acpi: Move table_sigs[] to stack. x86, acpi, brk: Extend BRK 256KB to store acpi override tables. x86, brk: Make extend_brk() available with va/pa. x86, acpi: Make acpi_initrd_override() available with va or pa. x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available with va/pa. x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c. Yinghai Lu (2): x86: Make get_ramdisk_{image|size}() global. x86, microcode: Use get_ramdisk_{image|size}() in microcode handling. arch/x86/include/asm/dmi.h |2 +- arch/x86/include/asm/setup.h| 11 +++- arch/x86/kernel/head64.c|4 + arch/x86/kernel/head_32.S |4 + arch/x86/kernel/microcode_intel_early.c |8 +- arch/x86/kernel/setup.c | 93 -- arch/x86/mm/init.c |2 +- arch/x86/xen/enlighten.c|2 +- arch/x86/xen/mmu.c |6 +- arch/x86/xen/p2m.c | 27 --- drivers/acpi/osl.c | 130 --- include/linux/acpi.h|5 +- 12 files changed, 196 insertions(+), 98 deletions(-) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email:a href=mailto:d...@kvack.org; em...@kvack.org/a -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote: [What are we doing] We are trying to initialize acip tables as early as possible. But Linux kernel allows users to override acpi tables by specifying their own tables in initrd. So we have to do acpi_initrd_override() earlier first. So, are we now back to making SRAT info as early as possible? What happened to just co-locating early allocations close to kernel image? What'd be the benefit of doing this over that? Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hi tejun, On 08/21/2013 09:06 PM, Tejun Heo wrote: Hello, On Wed, Aug 21, 2013 at 06:15:35PM +0800, Tang Chen wrote: [What are we doing] We are trying to initialize acip tables as early as possible. But Linux kernel allows users to override acpi tables by specifying their own tables in initrd. So we have to do acpi_initrd_override() earlier first. So, are we now back to making SRAT info as early as possible? What happened to just co-locating early allocations close to kernel image? What'd be the benefit of doing this over that? We know you are trying to give the direction to make the change more natural and robust and very thankful for your comments. We have taken your comments and suggestions about co-locating early allocations close to kernel image into consideration, but still we found that not that easy. In current boot order, before we get the SRAT, we have a big consumer of early allocations: we are setting up the page table in top-down (The idea was proposed by HPA, Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table setup will make the page tables as high as possible in memory, since memory at low addresses is precious (for stupid DMA devices, for things like kexec/kdump, and so on.) So if we are trying to make early allocations close to kernel image, we should rewrite the way we are setting up page table totally. That is not a easy thing to do. As for the benefits of the patchset, just as Tang said in this patch, * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable nodes. This one is the current requirement of us but may be very helpful for future change: * As suggested by Yinghai, we should allocate page tables in local node. This also needs SRAT before direct mapping page tables are setup. 
* As mentioned by Toshi Kani toshi.k...@hp.com, ACPI SCPR/DBGP/DBG2 tables allow the OS to initialize serial console/debug ports at early boot time. The earlier it can be initialized, the better this feature will be. These tables are not currently used by Linux due to a licensing issue, but it could be addressed some time soon. So we decided to firstly make ACPI override earlier and use BRK (this is obviously near the kernel image range) to store the found ACPI tables. -- Thanks. Zhang Yanfei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote: In current boot order, before we get the SRAT, we have a big consumer of early allocations: we are setting up the page table in top-down (The idea was proposed by HPA, Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table setup will make the page tables as high as possible in memory, since memory at low addresses is precious (for stupid DMA devices, for things like kexec/kdump, and so on.) With huge mappings, they are fairly small, right? And this whole thing needs a kernel param anyway at this point, so the allocation direction can be made dependent on that or huge mapping availability and, even with 4k mappings, we aren't talking about gigabytes of memory, are we? So if we are trying to make early allocations close to kernel image, we should rewrite the way we are setting up page table totally. That is not a easy thing to do. It has been a while since I looked at the code so can you please elaborate why that is not easy? It's pretty simple conceptually. * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable nodes. This one is the current requirement of us but may be very helpful for future change: * As suggested by Yinghai, we should allocate page tables in local node. This also needs SRAT before direct mapping page tables are setup. Does this even matter for huge mappings? * As mentioned by Toshi Kani toshi.k...@hp.com, ACPI SCPR/DBGP/DBG2 tables allow the OS to initialize serial console/debug ports at early boot time. The earlier it can be initialized, the better this feature will be. These tables are not currently used by Linux due to a licensing issue, but it could be addressed some time soon. So we decided to firstly make ACPI override earlier and use BRK (this is obviously near the kernel image range) to store the found ACPI tables. I don't know. 
The whole effort seems way overcomplicated compared to the benefits it would bring. For NUMA memory hotunplug, what's the point of doing all this when the kernel doesn't have any control over where its image is gonna be? Some megabytes at the tail aren't gonna make a huge difference and if you wanna do this properly, you need to determine the load address of the kernel considering the node boundaries and hotpluggability of each node, which has to happen before the early kernel boot code executes. And if there's a code piece which does that, that might as well place the kernel image such that extra allocation afterwards doesn't interfere with memory hotunplugging. It looks like a lot of code changes for a mechanism which doesn't seem all that useful. This code is already too late in boot sequence to be a proper solution so I don't see the point in pushing the coverage to the maximum from here. It's kinda silly. The last point - early init of debug facility - makes some sense but again how extra coverage are we talking about? The code path between the two points is fairly short and the change doesn't come free. It means we add more fragile firmware-specific code path before the execution environment is stable and get to do things like traveling the same code paths multiple times in different environments. Doesn't seem like a win. We want to reach stable execution environment as soon as possible. Shoving whole more logic before that in the name of earlier debugging doesn't make a lot of sense. Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
On Wed, 2013-08-21 at 11:36 -0400, Tejun Heo wrote: Hello, On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote: In current boot order, before we get the SRAT, we have a big consumer of early allocations: we are setting up the page table in top-down (The idea was proposed by HPA, Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table setup will make the page tables as high as possible in memory, since memory at low addresses is precious (for stupid DMA devices, for things like kexec/kdump, and so on.) With huge mappings, they are fairly small, right? And this whole thing needs a kernel param anyway at this point, so the allocation direction can be made dependent on that or huge mapping availability and, even with 4k mappings, we aren't talking about gigabytes of memory, are we? So if we are trying to make early allocations close to kernel image, we should rewrite the way we are setting up page table totally. That is not a easy thing to do. It has been a while since I looked at the code so can you please elaborate why that is not easy? It's pretty simple conceptually. * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable nodes. This one is the current requirement of us but may be very helpful for future change: * As suggested by Yinghai, we should allocate page tables in local node. This also needs SRAT before direct mapping page tables are setup. Does this even matter for huge mappings? * As mentioned by Toshi Kani toshi.k...@hp.com, ACPI SCPR/DBGP/DBG2 tables allow the OS to initialize serial console/debug ports at early boot time. The earlier it can be initialized, the better this feature will be. These tables are not currently used by Linux due to a licensing issue, but it could be addressed some time soon. 
So we decided to firstly make ACPI override earlier and use BRK (this is obviously near the kernel image range) to store the found ACPI tables. I don't know. The whole effort seems way overcomplicated compared to the benefits it would bring. For NUMA memory hotunplug, what's the point of doing all this when the kernel doesn't have any control over where its image is gonna be? Some megabytes at the tail aren't gonna make a huge difference and if you wanna do this properly, you need to determine the load address of the kernel considering the node boundaries and hotpluggability of each node, which has to happen before the early kernel boot code executes. And if there's a code piece which does that, that might as well place the kernel image such that extra allocation afterwards doesn't interfere with memory hotunplugging. It looks like a lot of code changes for a mechanism which doesn't seem all that useful. This code is already too late in boot sequence to be a proper solution so I don't see the point in pushing the coverage to the maximum from here. It's kinda silly. The last point - early init of debug facility - makes some sense but again how extra coverage are we talking about? The code path between the two points is fairly short and the change doesn't come free. It means we add more fragile firmware-specific code path before the execution environment is stable and get to do things like traveling the same code paths multiple times in different environments. Doesn't seem like a win. We want to reach stable execution environment as soon as possible. Shoving whole more logic before that in the name of earlier debugging doesn't make a lot of sense. Well, there is reason why we have earlyprintk feature today. So, let's not debate on this feature now. There was previous attempt to support this feature with ACPI tables below. As described, it had the same ordering issue. 
https://lkml.org/lkml/2012/10/8/498

There is a basic problem that when we try to use ACPI tables that
extends or replaces legacy interfaces (ex. SRAT extending e820), we hit
this ordering issue because ACPI is not available as early as the
legacy interfaces.

Thanks,
-Toshi
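
For illustration, the hot-pluggable ranges that the SRAT adds on top of
e820 can be pulled out of a raw table image (for example a dump such as
srat.dat) with a small table walk. This is only a sketch: it assumes the
ACPI-specified layout (48-byte SRAT header, Memory Affinity structures
of type 1 and length 40, hot-pluggable flag in bit 1 of the flags
field). The names srat_hotplug_ranges and struct hotplug_range are
hypothetical and are not taken from the patch series or from the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SRAT_HEADER_LEN   48u        /* 36-byte ACPI header + 12 reserved bytes */
#define SRAT_TYPE_MEMORY  1u         /* Memory Affinity structure type */
#define SRAT_MEM_LEN      40u        /* length of a Memory Affinity structure */
#define SRAT_MEM_ENABLED  (1u << 0)
#define SRAT_MEM_HOTPLUG  (1u << 1)

/* Read little-endian integers without assuming alignment. */
static uint32_t srat_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t srat_le64(const uint8_t *p)
{
    return (uint64_t)srat_le32(p) | ((uint64_t)srat_le32(p + 4) << 32);
}

/* Hypothetical result record: one enabled, hot-pluggable memory range. */
struct hotplug_range {
    uint32_t node;   /* proximity domain */
    uint64_t base;   /* physical base address */
    uint64_t len;    /* length in bytes */
};

/*
 * Walk a raw SRAT image and store up to max hot-pluggable ranges in out[].
 * Returns the number of ranges stored, or -1 if the table is malformed.
 */
int srat_hotplug_ranges(const uint8_t *tbl, size_t size,
                        struct hotplug_range *out, int max)
{
    size_t off = SRAT_HEADER_LEN;
    uint32_t total;
    int n = 0;

    if (size < SRAT_HEADER_LEN || memcmp(tbl, "SRAT", 4) != 0)
        return -1;
    total = srat_le32(tbl + 4);              /* Length field of the header */
    if (total < SRAT_HEADER_LEN || total > size)
        return -1;

    while (off + 2 <= total) {
        const uint8_t *e = tbl + off;
        uint8_t type = e[0], len = e[1];

        if (len < 2 || off + len > total)    /* never loop or overrun */
            return -1;
        if (type == SRAT_TYPE_MEMORY && len >= SRAT_MEM_LEN) {
            uint32_t flags = srat_le32(e + 28);

            if ((flags & SRAT_MEM_ENABLED) &&
                (flags & SRAT_MEM_HOTPLUG) && n < max) {
                out[n].node = srat_le32(e + 2);
                out[n].base = srat_le64(e + 8);
                out[n].len  = srat_le64(e + 16);
                n++;
            }
        }
        off += len;
    }
    return n;
}
```

The walk touches only the table bytes themselves, so it does not depend
on the ACPI namespace or on AML; whether that is early enough to help is
exactly the ordering question raised above.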
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, Toshi.

On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
> Well, there is reason why we have earlyprintk feature today. So,
> let's not debate on this feature now. There was previous attempt to
> support

Are you saying the existing earlyprintk automatically justifies
addition of more complex mechanism? The added complex of course should
be traded off against the benefits of gaining ACPI based early boot.
You aren't gonna suggest implementing netconsole based earlyprintk,
right?

> this feature with ACPI tables below. As described, it had the same
> ordering issue.
>
> https://lkml.org/lkml/2012/10/8/498
>
> There is a basic problem that when we try to use ACPI tables that
> extends or replaces legacy interfaces (ex. SRAT extending e820), we
> hit this ordering issue because ACPI is not available as early as the
> legacy interfaces.

Do we even want ACPI parsing and all that that early? Parsing SRAT
early doesn't buy us much and I'm not sure whether adding ACPI
earlyprintk would increase or decrease debuggability during earlyboot.
It adds whole lot more code paths where things can go wrong while the
basic execution environment is unstable. Why do that?

Thanks.

--
tejun
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Tejun,

On Wed, 2013-08-21 at 15:54 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 01:31:43PM -0600, Toshi Kani wrote:
> > Well, there is reason why we have earlyprintk feature today. So,
> > let's not debate on this feature now. There was previous attempt to
> > support
>
> Are you saying the existing earlyprintk automatically justifies
> addition of more complex mechanism? The added complex of course
> should be traded off against the benefits of gaining ACPI based early
> boot. You aren't gonna suggest implementing netconsole based
> earlyprintk, right?

Platforms vendors (which care Linux) need to support the existing Linux
features. This means that they have to implement legacy interfaces on
x86 until the kernel supports an alternative method. For instance, some
platforms are legacy-free and do not have legacy COM ports. These ACPI
tables were defined so that non-legacy COM ports can be described and
informed to the OS. Without this support, such platforms may have to
emulate the legacy COM ports for Linux, or drop Linux support.

> > this feature with ACPI tables below. As described, it had the same
> > ordering issue.
> >
> > https://lkml.org/lkml/2012/10/8/498
> >
> > There is a basic problem that when we try to use ACPI tables that
> > extends or replaces legacy interfaces (ex. SRAT extending e820), we
> > hit this ordering issue because ACPI is not available as early as
> > the legacy interfaces.
>
> Do we even want ACPI parsing and all that that early? Parsing SRAT
> early doesn't buy us much and I'm not sure whether adding ACPI
> earlyprintk would increase or decrease debuggability during
> earlyboot. It adds whole lot more code paths where things can go
> wrong while the basic execution environment is unstable. Why do that?

I think the kernel boot-up sequence should be designed in such a way
that can support legacy-free and/or NUMA platforms properly.
Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello, Toshi.

On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
> Platforms vendors (which care Linux) need to support the existing
> Linux features. This means that they have to implement legacy
> interfaces on x86 until the kernel supports an alternative method.
> For instance, some platforms are legacy-free and do not have legacy
> COM ports. These ACPI tables were defined so that non-legacy COM
> ports can be described and informed to the OS. Without this support,
> such platforms may have to emulate the legacy COM ports for Linux, or
> drop Linux support.

Are you seriously saying that vendors are gonna drop linux support for
lacking ACPI earlyprintk support? Please...

Please take a look at the existing earlyprintk code and how compact and
self-contained they are. If you want to add ACPI earlyprintk, do
similar stuff. Forget about firmware blob override from initrd or
ACPICA. Just implement the bare minimum to get the thing working. Do
not add dependency to large body of code from earlyboot. It's a bad
idea through and through.

> I think the kernel boot-up sequence should be designed in such a way
> that can support legacy-free and/or NUMA platforms properly.

Blanket statements like the above don't mean much. There are many
separate stages of boot and you're talking about one of the very first
stages where we traditionally have always depended upon only the very
bare minimum of the platform both in hardware itself and configuration
information. We've been doing that for *very* good reasons. If you
screw up there, it's mighty tricky to figure out what went wrong
especially on the machines that you can't physically kick. You're now
suggesting to add whole ACPI parsing including overloading from initrd
into that stage with pretty weak rationale.

Seriously, if you want ACPI based earlyprintk, implement it in a
discrete minimal code which is easy to verify and won't get affected
when the rest of ACPI machinery is updated.
We really don't want earlyboot to fail because someone screwed up ACPI
or initrd handling.

Thanks.

--
tejun
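
As a rough idea of what "discrete minimal code" for an ACPI-described
early console could look like, the sketch below reads only the fields
an early UART driver would need from a raw SPCR image. It is not the
existing earlyprintk code and not a proposed patch. It assumes the
published SPCR layout (80-byte minimum length, base address as a
Generic Address Structure at offset 40, baud-rate code at offset 58)
and the standard ACPI byte checksum. The names spcr_early_uart and
struct early_uart are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPCR_MIN_LEN  80u   /* smallest SPCR size assumed by this sketch */

/* Hypothetical result record: just enough to program an early UART. */
struct early_uart {
    uint8_t  space_id;  /* 0 = system memory (MMIO), 1 = system I/O port */
    uint64_t address;   /* register base address */
    uint32_t baud;      /* 0 if the table's baud-rate code is not recognized */
};

static uint32_t spcr_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t spcr_le64(const uint8_t *p)
{
    return (uint64_t)spcr_le32(p) | ((uint64_t)spcr_le32(p + 4) << 32);
}

/*
 * Validate a raw SPCR image and extract the console location.
 * Returns 0 on success, -1 if the signature, length or checksum is bad.
 */
int spcr_early_uart(const uint8_t *tbl, size_t size, struct early_uart *out)
{
    uint32_t len, i;
    uint8_t sum = 0;

    if (size < SPCR_MIN_LEN || memcmp(tbl, "SPCR", 4) != 0)
        return -1;
    len = spcr_le32(tbl + 4);                /* Length field of the header */
    if (len < SPCR_MIN_LEN || len > size)
        return -1;
    for (i = 0; i < len; i++)                /* all bytes must sum to 0 */
        sum = (uint8_t)(sum + tbl[i]);
    if (sum != 0)
        return -1;

    out->space_id = tbl[40];                 /* GAS: address space ID */
    out->address  = spcr_le64(tbl + 44);     /* GAS: 64-bit address */
    switch (tbl[58]) {                       /* baud-rate code */
    case 3:  out->baud = 9600;   break;
    case 4:  out->baud = 19200;  break;
    case 6:  out->baud = 57600;  break;
    case 7:  out->baud = 115200; break;
    default: out->baud = 0;      break;
    }
    return 0;
}
```

Locating the table in the first place (RSDP, then RSDT/XSDT) is left
out here, and that lookup is where the ordering question from this
thread comes back in.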
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello Tejun,

On Wed, 2013-08-21 at 16:40 -0400, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 02:29:28PM -0600, Toshi Kani wrote:
> > Platforms vendors (which care Linux) need to support the existing
> > Linux features. This means that they have to implement legacy
> > interfaces on x86 until the kernel supports an alternative method.
> > For instance, some platforms are legacy-free and do not have legacy
> > COM ports. These ACPI tables were defined so that non-legacy COM
> > ports can be described and informed to the OS. Without this
> > support, such platforms may have to emulate the legacy COM ports
> > for Linux, or drop Linux support.
>
> Are you seriously saying that vendors are gonna drop linux support
> for lacking ACPI earlyprintk support? Please...

earlyprintk is an example of the issues. The point is that vendors are
required to support legacy stuff for Linux.

> Please take a look at the existing earlyprintk code and how compact
> and self-contained they are. If you want to add ACPI earlyprintk, do
> similar stuff. Forget about firmware blob override from initrd or
> ACPICA. Just implement the bare minimum to get the thing working. Do
> not add dependency to large body of code from earlyboot. It's a bad
> idea through and through.

I am not saying that ACPI earlyprintk must be available at exactly the
same point. How early it can reasonably be is a subject of discussion.

> > I think the kernel boot-up sequence should be designed in such a
> > way that can support legacy-free and/or NUMA platforms properly.
>
> Blanket statements like the above don't mean much. There are many
> separate stages of boot and you're talking about one of the very
> first stages where we traditionally have always depended upon only
> the very bare minimum of the platform both in hardware itself and
> configuration information. We've been doing that for *very* good
> reasons. If you screw up there, it's mighty tricky to figure out what
> went wrong especially on the machines that you can't physically kick.
> You're now suggesting to add whole ACPI parsing including overloading
> from initrd into that stage with pretty weak rationale.

I agree that ACPI is rather complicated stuff. But in my experience,
the majority complication comes from ACPI namespace and methods, not
from ACPI tables. Do you really think ACPI table init is that risky? I
consider ACPI tables are part of the minimum config info, esp. for
legacy-free platforms.

> Seriously, if you want ACPI based earlyprintk, implement it in a
> discrete minimal code which is easy to verify and won't get affected
> when the rest of ACPI machinery is updated. We really don't want
> earlyboot to fail because someone screwed up ACPI or initrd handling.

earlyprintk is just another example to this SRAT issue. The local page
table is yet another example. My hope here is for us to be able to
utilize ACPI tables properly without hitting this kind of ordering
issues again and again, which requires considerable time effort to
address.

Thanks,
-Toshi
Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Hello,

On Wed, Aug 21, 2013 at 04:36:35PM -0600, Toshi Kani wrote:
> I agree that ACPI is rather complicated stuff. But in my experience,
> the majority complication comes from ACPI namespace and methods, not
> from ACPI tables. Do you really think ACPI table init is that risky?
> I consider ACPI tables are part of the minimum config info, esp. for
> legacy-free platforms.

It's just that we're talking about the very first stage of boot. We
really don't do much there and pulling in ACPI code into that stage is
a lot by comparison. If that's gonna happen, it needs pretty strong
justification.

> earlyprintk is just another example to this SRAT issue. The local
> page table is yet another example. My hope here is for us to be able
> to utilize ACPI tables properly without hitting this kind of ordering
> issues again and again, which requires considerable time effort to
> address.

So, the two things brought up at this point are early parsing of SRAT,
which can't really solve the problem at hand anyway, and earlyprintk
which should be implemented in minimal way which is not activated
unless specifically enabled with earlyprintk boot param. Neither seems
to justify pulling in full ACPI into early boot, right?

Thanks.

--
tejun