Guy,

Someone forwarded your email to me since I'm not on the [email protected] email alias. I am the person who added support to Solaris to use the ACPI System Resource Affinity Table (SRAT).

Your workaround is mostly ok except for a couple of little issues:

- You seem to imply that the boot property must be set to "y" to stop Solaris from using the SRAT, but your code only checks that the property is present (its length is greater than zero), not whether its value is "y".

- There is already an existing kernel variable for enabling/disabling the use of the SRAT called "lgrp_plat_srat_enable" in usr/src/uts/i86pc/os/lgrpplat.c
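
To illustrate the first point, here is a minimal user-space sketch of what a value check could look like. It is not the kernel code: struct boot_prop, fake_getproplen() and fake_getprop() are hypothetical stand-ins for the boot property store and for the BOP_GETPROPLEN/BOP_GETPROP macros, and the convention that the reported length includes the terminating NUL is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	BP_DISABLE_NUMA_SRAT	"disable-numa-srat"

/* Hypothetical stand-in for one boot property (name/value pair). */
struct boot_prop {
	const char *name;
	const char *value;
};

/*
 * Hypothetical stand-in for BOP_GETPROPLEN: returns the length of the
 * value including the terminating NUL (an assumption of this sketch),
 * or -1 if the property is not set.
 */
static int
fake_getproplen(const struct boot_prop *props, size_t nprops,
    const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < nprops; i++) {
		if (strcmp(props[i].name, name) == 0)
			return ((int)strlen(props[i].value) + 1);
	}
	return (-1);
}

/*
 * Hypothetical stand-in for BOP_GETPROP: copies the value into buf,
 * which the caller must have sized using fake_getproplen().
 * Returns 0 on success, -1 if the property is not set.
 */
static int
fake_getprop(const struct boot_prop *props, size_t nprops,
    const char *name, char *buf)
{
	size_t i;

	assert(name != NULL && buf != NULL);
	for (i = 0; i < nprops; i++) {
		if (strcmp(props[i].name, name) == 0) {
			(void) strcpy(buf, props[i].value);
			return (0);
		}
	}
	return (-1);
}

/*
 * Returns 1 only when the property is present AND its value is "y".
 * A property that is merely present (for example "=n") returns 0.
 */
static int
srat_disabled_by_boot_prop(const struct boot_prop *props, size_t nprops)
{
	char value[8];
	int len;

	len = fake_getproplen(props, nprops, BP_DISABLE_NUMA_SRAT);
	if (len <= 0 || (size_t)len > sizeof (value))
		return (0);	/* absent, empty, or too long to be "y" */
	if (fake_getprop(props, nprops, BP_DISABLE_NUMA_SRAT, value) != 0)
		return (0);
	return (strcmp(value, "y") == 0);
}
```

With a table containing disable-numa-srat=y the function returns 1. With disable-numa-srat=n, or with no such property, it returns 0, whereas a length-only check would also treat "n" as a request to disable the SRAT.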


More importantly, I am interested in finding out more about *why* you are doing this, but I am addressing that in a separate email thread with you and those who are involved in CR#6745357.



Jonathan


-------- Original Message --------
Subject: [osol-code] workaround proposal for bug 6745357
Date: Mon, 23 Feb 2009 06:32:17 -0800 (PST)
From: Guy <[email protected]>
To: [email protected]

Hello

I have recently been working on bug 6745357 (kernel crash during startup at page_ctr_add_internal).

I have observed the following:

The bug was introduced in nv_88.

The changeset that breaks the boot is this one:

Issues Resolved:
BUG/RFE:6594519 Need support for ACPI System Resource Affinity Table (SRAT)
BUG/RFE:6621201 Need support to read ACPI System Locality Information Table (SLIT)
BUG/RFE:6688471 x86/x64 lgroup platform support code needs cleaning
Files Changed:
update:usr/src/uts/i86pc/os/acpi_fw.h
update:usr/src/uts/i86pc/os/cpuid.c
update:usr/src/uts/i86pc/os/fakebop.c
update:usr/src/uts/i86pc/os/lgrpplat.c
update:usr/src/uts/i86pc/os/mlsetup.c
update:usr/src/uts/intel/sys/x86_archext.h



I haven't found a fix for this issue yet, but I have tested a workaround: I modified lgrpplat.c to add a new boot option (-B disable-numa-srat=y) which, when detected, bypasses the call to lgrp_plat_process_srat.

<code>
# diff -c lgrpplat.c_orig usr/src/uts/i86pc/os/lgrpplat.c
*** lgrpplat.c_orig     Sun Feb 22 15:36:42 2009
--- usr/src/uts/i86pc/os/lgrpplat.c     Mon Feb 23 14:13:04 2009
***************
*** 150,155 ****
--- 150,157 ----
  #define       MAX_NODES               8
  #define       NLGRP                   (MAX_NODES * (MAX_NODES - 1) + 1)

+ #define BP_DISABLE_NUMA_SRAT  "disable-numa-srat"
+
  /*
   * Constants for configuring probing
   */
***************
*** 684,690 ****
--- 686,695 ----
        lgrp_plat_node_cnt = max_mem_nodes = 1;
  #else /* __xpv */
        uint_t  probe_op;
+       int     boot_prop_len;
+       char    *boot_prop_name = BP_DISABLE_NUMA_SRAT;

+
        /*
         * Initialize as a UMA machine
         */
***************
*** 700,710 ****
        lgrp_plat_apic_ncpus =
            lgrp_plat_process_cpu_apicids(lgrp_plat_cpu_node);

        /*
         * Determine which CPUs and memory are local to each other and number
         * of NUMA nodes by reading ACPI System Resource Affinity Table (SRAT)
         */
!       if (lgrp_plat_apic_ncpus > 0) {
                int     retval;

                retval = lgrp_plat_process_srat(srat_ptr,
--- 705,720 ----
        lgrp_plat_apic_ncpus =
            lgrp_plat_process_cpu_apicids(lgrp_plat_cpu_node);

+       boot_prop_len = BOP_GETPROPLEN(bootops, boot_prop_name);
+       if (boot_prop_len > 0) {
+               lgrp_plat_srat_error=-1;
+               }
+
        /*
         * Determine which CPUs and memory are local to each other and number
         * of NUMA nodes by reading ACPI System Resource Affinity Table (SRAT)
         */
!       if ((boot_prop_len <= 0) && lgrp_plat_apic_ncpus > 0) {
                int     retval;

                retval = lgrp_plat_process_srat(srat_ptr,
</code>


This change has low impact, since it does nothing unless the new boot option is set.

It allows Nevada to boot on the affected platforms (boot has been broken since build 88!).

What do you think about it?
Can this be included in the source?

Regards

Guy

_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
