Guy,
Someone forwarded your email to me since I'm not on the
[email protected] email alias. I am the person who added
support to Solaris to use the ACPI System Resource Affinity Table (SRAT).
Your workaround is mostly OK except for a couple of small issues:
- You seem to imply that the boot property must be set to "y" to stop
Solaris from using the SRAT, but your code only tests the length of the
property and doesn't seem to check whether its value is "y" or not.
- There is already a kernel variable for enabling or disabling use of
the SRAT. It is called "lgrp_plat_srat_enable" and lives in
usr/src/uts/i86pc/os/lgrpplat.c
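The value check from the first point could look roughly like the sketch below. It is a user-space illustration, not kernel code: the struct boot_prop table and the helpers prop_lookup() and srat_disabled_by_prop() are hypothetical names invented for this example. In the kernel the length and value would presumably be fetched through the bootops interface (BOP_GETPROPLEN, as in the patch, plus a matching value lookup).

```c
#include <string.h>

#define BP_DISABLE_NUMA_SRAT "disable-numa-srat"

/*
 * Hypothetical stand-in for the boot property store. This table and the
 * helpers below are assumptions for illustration only; they are not part
 * of lgrpplat.c or of the bootops interface.
 */
struct boot_prop {
	const char *name;
	const char *value;
};

/* Return the value of the named property, or NULL if it is not set. */
static const char *
prop_lookup(const struct boot_prop *props, size_t nprops, const char *name)
{
	size_t i;

	for (i = 0; i < nprops; i++) {
		if (strcmp(props[i].name, name) == 0)
			return (props[i].value);
	}
	return (NULL);
}

/*
 * Return 1 only when the property is present AND its value is exactly "y".
 * Testing the length alone would also treat "disable-numa-srat=n" as a
 * request to skip the SRAT.
 */
static int
srat_disabled_by_prop(const struct boot_prop *props, size_t nprops)
{
	const char *val = prop_lookup(props, nprops, BP_DISABLE_NUMA_SRAT);

	return (val != NULL && strcmp(val, "y") == 0);
}
```

With a check of this shape, booting with -B disable-numa-srat=n (or without the property at all) leaves SRAT processing enabled, and only an explicit "y" bypasses it.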
More importantly, I am interested in finding out more about *why* you
are doing this, but I am addressing that in a separate email thread with
you and those who are involved in CR#6745357.
Jonathan
-------- Original Message --------
Subject: [osol-code] workaround proposal for bug 6745357
Date: Mon, 23 Feb 2009 06:32:17 -0800 (PST)
From: Guy <[email protected]>
To: [email protected]
Hello
I have recently been working on bug 6745357 (kernel crash during startup
in page_ctr_add_internal).
I have observed the following:
The bug was introduced in nv_88.
The changeset that breaks the boot is this one:
Issues Resolved:
BUG/RFE: 6594519 Need support for ACPI System Resource Affinity Table (SRAT)
BUG/RFE: 6621201 Need support to read ACPI System Locality Information Table (SLIT)
BUG/RFE: 6688471 x86/x64 lgroup platform support code needs cleaning
Files Changed:
update:usr/src/uts/i86pc/os/acpi_fw.h
update:usr/src/uts/i86pc/os/cpuid.c
update:usr/src/uts/i86pc/os/fakebop.c
update:usr/src/uts/i86pc/os/lgrpplat.c
update:usr/src/uts/i86pc/os/mlsetup.c
update:usr/src/uts/intel/sys/x86_archext.h
I haven't found a fix for this issue yet, but I have tested a
workaround:
I modified lgrpplat.c to add a new boot option (-B
disable-numa-srat=y) that, when detected, bypasses the call to
lgrp_plat_process_srat.
<code>
# diff -c lgrpplat.c_orig usr/src/uts/i86pc/os/lgrpplat.c
*** lgrpplat.c_orig Sun Feb 22 15:36:42 2009
--- usr/src/uts/i86pc/os/lgrpplat.c Mon Feb 23 14:13:04 2009
***************
*** 150,155 ****
--- 150,157 ----
#define MAX_NODES 8
#define NLGRP (MAX_NODES * (MAX_NODES - 1) + 1)
+ #define BP_DISABLE_NUMA_SRAT "disable-numa-srat"
+
/*
* Constants for configuring probing
*/
***************
*** 684,690 ****
--- 686,695 ----
lgrp_plat_node_cnt = max_mem_nodes = 1;
#else /* __xpv */
uint_t probe_op;
+ int boot_prop_len;
+ char *boot_prop_name = BP_DISABLE_NUMA_SRAT;
+
/*
* Initialize as a UMA machine
*/
***************
*** 700,710 ****
lgrp_plat_apic_ncpus =
lgrp_plat_process_cpu_apicids(lgrp_plat_cpu_node);
/*
* Determine which CPUs and memory are local to each other and number
* of NUMA nodes by reading ACPI System Resource Affinity Table (SRAT)
*/
! if (lgrp_plat_apic_ncpus > 0) {
int retval;
retval = lgrp_plat_process_srat(srat_ptr,
--- 705,720 ----
lgrp_plat_apic_ncpus =
lgrp_plat_process_cpu_apicids(lgrp_plat_cpu_node);
+ boot_prop_len = BOP_GETPROPLEN(bootops, boot_prop_name);
+ if (boot_prop_len > 0) {
+ lgrp_plat_srat_error=-1;
+ }
+
/*
* Determine which CPUs and memory are local to each other and number
* of NUMA nodes by reading ACPI System Resource Affinity Table (SRAT)
*/
! if ((boot_prop_len <= 0) && lgrp_plat_apic_ncpus > 0) {
int retval;
retval = lgrp_plat_process_srat(srat_ptr,
</code>
This change has low impact, since it does nothing when no additional
boot option is added.
It allows Nevada to boot on the affected platforms (boot has been
broken since build 88!).
What do you think about it?
Can this be included in the source?
Regards
Guy