Hello

I have recently been working on bug 6745357 (kernel crash during startup in
page_ctr_add_internal).

I have observed the following:

The bug was introduced in nv_88.

The changeset that breaks the boot is this one:

Issues Resolved:
BUG/RFE:6594519 Need support for ACPI System Resource Affinity Table (SRAT)
BUG/RFE:6621201 Need support to read ACPI System Locality Information Table (SLIT)
BUG/RFE:6688471 x86/x64 lgroup platform support code needs cleaning
Files Changed:
update:usr/src/uts/i86pc/os/acpi_fw.h
update:usr/src/uts/i86pc/os/cpuid.c
update:usr/src/uts/i86pc/os/fakebop.c
update:usr/src/uts/i86pc/os/lgrpplat.c
update:usr/src/uts/i86pc/os/mlsetup.c
update:usr/src/uts/intel/sys/x86_archext.h



I haven't found a fix for this issue yet, but I have tested a workaround:
I modified lgrpplat.c to add a new boot option (-B disable-numa-srat=y)
which, when present, bypasses the call to lgrp_plat_process_srat().

<code>
# diff -c lgrpplat.c_orig usr/src/uts/i86pc/os/lgrpplat.c 
*** lgrpplat.c_orig     Sun Feb 22 15:36:42 2009
--- usr/src/uts/i86pc/os/lgrpplat.c     Mon Feb 23 14:13:04 2009
***************
*** 150,155 ****
--- 150,157 ----
  #define       MAX_NODES               8
  #define       NLGRP                   (MAX_NODES * (MAX_NODES - 1) + 1)
  
+ #define BP_DISABLE_NUMA_SRAT  "disable-numa-srat"
+ 
  /*
   * Constants for configuring probing
   */
***************
*** 684,690 ****
--- 686,695 ----
        lgrp_plat_node_cnt = max_mem_nodes = 1;
  #else /* __xpv */
        uint_t  probe_op;
+       int     boot_prop_len;
+       char    *boot_prop_name = BP_DISABLE_NUMA_SRAT;
  
+ 
        /*
         * Initialize as a UMA machine
         */
***************
*** 700,710 ****
        lgrp_plat_apic_ncpus =
            lgrp_plat_process_cpu_apicids(lgrp_plat_cpu_node);
  
        /*
         * Determine which CPUs and memory are local to each other and number
         * of NUMA nodes by reading ACPI System Resource Affinity Table (SRAT)
         */
!       if (lgrp_plat_apic_ncpus > 0) {
                int     retval;
  
                retval = lgrp_plat_process_srat(srat_ptr,
--- 705,720 ----
        lgrp_plat_apic_ncpus =
            lgrp_plat_process_cpu_apicids(lgrp_plat_cpu_node);
  
+       boot_prop_len = BOP_GETPROPLEN(bootops, boot_prop_name);
+       if (boot_prop_len > 0) {
+               lgrp_plat_srat_error = -1;
+       }
+ 
        /*
         * Determine which CPUs and memory are local to each other and number
         * of NUMA nodes by reading ACPI System Resource Affinity Table (SRAT)
         */
!       if ((boot_prop_len <= 0) && lgrp_plat_apic_ncpus > 0) {
                int     retval;
  
                retval = lgrp_plat_process_srat(srat_ptr,
</code>


This change has low impact: it does nothing unless the new boot option is
specified.

It makes it possible to boot Nevada on the affected platforms (booting on them
has been broken since build 88!).

What do you think about it?
Could this be included in the source?

Regards

Guy
-- 
This message posted from opensolaris.org
_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
