Re: Kernel panic from recent build

2016-05-03 Thread Eric van Gyzen
I intend it as a workaround to be committed, not as debugging.

Eric

On 05/02/2016 18:51, Ryan Stone wrote:
> Do we need this debug output?  It's quite clear from the acpidump output
> that there is an entry for APIC ID 0 in memory domain 0 and memory
> domain 1.  Not sure if that's legal by the spec.
> 
> On Mon, May 2, 2016 at 6:17 PM, Eric van Gyzen wrote:
> 
> On 05/02/2016 16:14, Bill O'Hanlon wrote:
> > On Mon, May 2, 2016 at 3:55 PM, John Baldwin wrote:
> >
> >> On Monday, May 02, 2016 01:35:54 PM Bill O'Hanlon wrote:
> >>> IMG_20160502_130335.jpg
> >>> <https://drive.google.com/file/d/1dtJxTwWXfhXVUUtn1Vvpzh3laJt7AILyCg/view?usp=drive_web>
> >>> I'm getting the following panic from a recent (May 2, 2016) build.
> >>> panic: Duplicate local APIC ID 0
> >>>
> >>> The system is a Dell Precision T5500 with generic factory BIOS settings.
> >>> It has run previous builds without event for several years.
> >>>
> >>> I'm attaching a link to a photo of the screen for added details.
> >> Try setting 'hint.srat.0.disabled=1' at the loader prompt and then grab
> >> the output of 'acpidump -t' on your next boot.  The SRAT table used by
> >> the NUMA code appears to be corrupted by your BIOS.
> >>
> >> --
> >> John Baldwin
> >>
> >
> > That allowed me to boot.  I'm attaching the output of 'acpidump -t'.
> > Thanks!
> 
> Bill,
> 
> Do you have the time and interest to test this patch?  If so, remove the
> line that you added to /boot/loader.conf so the patch actually gets
> exercised.
> 
> Eric
> 
> 
> diff --git a/sys/x86/acpica/srat.c b/sys/x86/acpica/srat.c
> index 85f1922..1d0f73d 100644
> --- a/sys/x86/acpica/srat.c
> +++ b/sys/x86/acpica/srat.c
> @@ -201,8 +201,12 @@ srat_parse_entry(ACPI_SUBTABLE_HEADER *entry, void *arg)
>  			    "enabled" : "disabled");
>  		if (!(cpu->Flags & ACPI_SRAT_CPU_ENABLED))
>  			break;
> -		KASSERT(!cpus[cpu->ApicId].enabled,
> -		    ("Duplicate local APIC ID %u", cpu->ApicId));
> +		if (cpus[cpu->ApicId].enabled) {
> +			printf("SRAT: Duplicate local APIC ID %u\n",
> +			    cpu->ApicId);
> +			*(int *)arg = ENXIO;
> +			break;
> +		}
>  		cpus[cpu->ApicId].domain = domain;
>  		cpus[cpu->ApicId].enabled = 1;
>  		break;
> 
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Kernel panic from recent build

2016-05-02 Thread John Baldwin
On Monday, May 02, 2016 05:17:49 PM Eric van Gyzen wrote:
> On 05/02/2016 16:14, Bill O'Hanlon wrote:
> > On Mon, May 2, 2016 at 3:55 PM, John Baldwin  wrote:
> >
> >> On Monday, May 02, 2016 01:35:54 PM Bill O'Hanlon wrote:
> >>> IMG_20160502_130335.jpg
> >>> <https://drive.google.com/file/d/1dtJxTwWXfhXVUUtn1Vvpzh3laJt7AILyCg/view?usp=drive_web>
> >>> I'm getting the following panic from a recent (May 2, 2016) build.
> >>> panic: Duplicate local APIC ID 0
> >>>
> >>> The system is a Dell Precision T5500 with generic factory BIOS settings.
> >>> It has run previous builds without event for several years.
> >>>
> >>> I'm attaching a link to a photo of the screen for added details.
> >> Try setting 'hint.srat.0.disabled=1' at the loader prompt and then grab
> >> the output of 'acpidump -t' on your next boot.  The SRAT table used by
> >> the NUMA code appears to be corrupted by your BIOS.
> >>
> >> --
> >> John Baldwin
> >>
> >
> > That allowed me to boot.  I'm attaching the output of 'acpidump -t'.
> > Thanks!
> 
> Bill,
> 
> Do you have the time and interest to test this patch?  If so, remove the
> line that you added to /boot/loader.conf so the patch actually gets
> exercised.

This patch looks fine to me.  A shame since the SRAT is mostly correct, but
there are spurious 'enabled' entries in the table.  If we wanted to be more
forgiving, we could perhaps just warn about duplicate entries if they all
agree.  That might work for this system for APIC ID 0, but there are also
some enabled CPUs with APIC ID 1 which I bet don't exist on this system, so
that would probably result in a panic anyway.
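To make the "more forgiving" idea concrete, here is a minimal user-space sketch of that policy. It is not the kernel code and not part of the proposed patch: struct cpu_entry, record_cpu() and MAX_APIC_ID are hypothetical names invented for illustration. The assumption is that a repeated entry is only tolerated when it maps the APIC ID to the same domain as before; a conflicting domain is still reported as an error.

```c
#include <errno.h>
#include <stdio.h>

#define MAX_APIC_ID 255	/* assumed table size for this sketch */

/* Hypothetical stand-in for the kernel's per-CPU SRAT bookkeeping. */
struct cpu_entry {
	int enabled;
	int domain;
};

/*
 * Record one enabled CPU affinity entry.
 * Returns 0 if the entry is new or repeats an identical mapping,
 * ENXIO if the same APIC ID is claimed by two different domains,
 * EINVAL if the APIC ID does not fit in the table.
 */
static int
record_cpu(struct cpu_entry *cpus, unsigned int apic_id, int domain)
{
	if (apic_id > MAX_APIC_ID)
		return (EINVAL);
	if (cpus[apic_id].enabled) {
		if (cpus[apic_id].domain == domain) {
			/* Same mapping seen twice: warn and carry on. */
			fprintf(stderr, "SRAT: repeated entry for APIC ID %u "
			    "in domain %d, ignoring\n", apic_id, domain);
			return (0);
		}
		/* Conflicting mapping: the table cannot be trusted. */
		fprintf(stderr, "SRAT: APIC ID %u listed in domains %d and %d\n",
		    apic_id, cpus[apic_id].domain, domain);
		return (ENXIO);
	}
	cpus[apic_id].domain = domain;
	cpus[apic_id].enabled = 1;
	return (0);
}
```

If the table really lists APIC ID 0 in both domain 0 and domain 1, as Ryan reads the acpidump output, this sketch would still return ENXIO for the second entry, so the relaxed check alone would not rescue such a table.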

You might try updating your BIOS if you are brave.

> Eric
> 
> 
> diff --git a/sys/x86/acpica/srat.c b/sys/x86/acpica/srat.c
> index 85f1922..1d0f73d 100644
> --- a/sys/x86/acpica/srat.c
> +++ b/sys/x86/acpica/srat.c
> @@ -201,8 +201,12 @@ srat_parse_entry(ACPI_SUBTABLE_HEADER *entry, void *arg)
>  			    "enabled" : "disabled");
>  		if (!(cpu->Flags & ACPI_SRAT_CPU_ENABLED))
>  			break;
> -		KASSERT(!cpus[cpu->ApicId].enabled,
> -		    ("Duplicate local APIC ID %u", cpu->ApicId));
> +		if (cpus[cpu->ApicId].enabled) {
> +			printf("SRAT: Duplicate local APIC ID %u\n",
> +			    cpu->ApicId);
> +			*(int *)arg = ENXIO;
> +			break;
> +		}
>  		cpus[cpu->ApicId].domain = domain;
>  		cpus[cpu->ApicId].enabled = 1;
>  		break;
> 


-- 
John Baldwin

Re: Kernel panic from recent build

2016-05-02 Thread Ryan Stone
Do we need this debug output?  It's quite clear from the acpidump output
that there is an entry for APIC ID 0 in memory domain 0 and memory domain
1.  Not sure if that's legal by the spec.

On Mon, May 2, 2016 at 6:17 PM, Eric van Gyzen  wrote:

> On 05/02/2016 16:14, Bill O'Hanlon wrote:
> > On Mon, May 2, 2016 at 3:55 PM, John Baldwin  wrote:
> >
> >> On Monday, May 02, 2016 01:35:54 PM Bill O'Hanlon wrote:
> >>> IMG_20160502_130335.jpg
> >>> <https://drive.google.com/file/d/1dtJxTwWXfhXVUUtn1Vvpzh3laJt7AILyCg/view?usp=drive_web>
> >>> I'm getting the following panic from a recent (May 2, 2016) build.
> >>> panic: Duplicate local APIC ID 0
> >>>
> >>> The system is a Dell Precision T5500 with generic factory BIOS settings.
> >>> It has run previous builds without event for several years.
> >>>
> >>> I'm attaching a link to a photo of the screen for added details.
> >> Try setting 'hint.srat.0.disabled=1' at the loader prompt and then grab
> >> the output of 'acpidump -t' on your next boot.  The SRAT table used by
> >> the NUMA code appears to be corrupted by your BIOS.
> >>
> >> --
> >> John Baldwin
> >>
> >
> > That allowed me to boot.  I'm attaching the output of 'acpidump -t'.
> > Thanks!
>
> Bill,
>
> Do you have the time and interest to test this patch?  If so, remove the
> line that you added to /boot/loader.conf so the patch actually gets
> exercised.
>
> Eric
>
>
> diff --git a/sys/x86/acpica/srat.c b/sys/x86/acpica/srat.c
> index 85f1922..1d0f73d 100644
> --- a/sys/x86/acpica/srat.c
> +++ b/sys/x86/acpica/srat.c
> @@ -201,8 +201,12 @@ srat_parse_entry(ACPI_SUBTABLE_HEADER *entry, void *arg)
>  			    "enabled" : "disabled");
>  		if (!(cpu->Flags & ACPI_SRAT_CPU_ENABLED))
>  			break;
> -		KASSERT(!cpus[cpu->ApicId].enabled,
> -		    ("Duplicate local APIC ID %u", cpu->ApicId));
> +		if (cpus[cpu->ApicId].enabled) {
> +			printf("SRAT: Duplicate local APIC ID %u\n",
> +			    cpu->ApicId);
> +			*(int *)arg = ENXIO;
> +			break;
> +		}
>  		cpus[cpu->ApicId].domain = domain;
>  		cpus[cpu->ApicId].enabled = 1;
>  		break;
>

Re: Kernel panic from recent build

2016-05-02 Thread Eric van Gyzen
On 05/02/2016 16:14, Bill O'Hanlon wrote:
> On Mon, May 2, 2016 at 3:55 PM, John Baldwin  wrote:
>
>> On Monday, May 02, 2016 01:35:54 PM Bill O'Hanlon wrote:
>>> IMG_20160502_130335.jpg
>>> <https://drive.google.com/file/d/1dtJxTwWXfhXVUUtn1Vvpzh3laJt7AILyCg/view?usp=drive_web>
>>> I'm getting the following panic from a recent (May 2, 2016) build.
>>> panic: Duplicate local APIC ID 0
>>>
>>> The system is a Dell Precision T5500 with generic factory BIOS settings.
>>> It has run previous builds without event for several years.
>>>
>>> I'm attaching a link to a photo of the screen for added details.
>> Try setting 'hint.srat.0.disabled=1' at the loader prompt and then grab
>> the output of 'acpidump -t' on your next boot.  The SRAT table used by
>> the NUMA code appears to be corrupted by your BIOS.
>>
>> --
>> John Baldwin
>>
>
> That allowed me to boot.  I'm attaching the output of 'acpidump -t'.
> Thanks!

Bill,

Do you have the time and interest to test this patch?  If so, remove the
line that you added to /boot/loader.conf so the patch actually gets
exercised.

Eric


diff --git a/sys/x86/acpica/srat.c b/sys/x86/acpica/srat.c
index 85f1922..1d0f73d 100644
--- a/sys/x86/acpica/srat.c
+++ b/sys/x86/acpica/srat.c
@@ -201,8 +201,12 @@ srat_parse_entry(ACPI_SUBTABLE_HEADER *entry, void *arg)
 			    "enabled" : "disabled");
 		if (!(cpu->Flags & ACPI_SRAT_CPU_ENABLED))
 			break;
-		KASSERT(!cpus[cpu->ApicId].enabled,
-		    ("Duplicate local APIC ID %u", cpu->ApicId));
+		if (cpus[cpu->ApicId].enabled) {
+			printf("SRAT: Duplicate local APIC ID %u\n",
+			    cpu->ApicId);
+			*(int *)arg = ENXIO;
+			break;
+		}
 		cpus[cpu->ApicId].domain = domain;
 		cpus[cpu->ApicId].enabled = 1;
 		break;


Re: Kernel panic from recent build

2016-05-02 Thread Bill O'Hanlon
On Mon, May 2, 2016 at 3:55 PM, John Baldwin  wrote:

> On Monday, May 02, 2016 01:35:54 PM Bill O'Hanlon wrote:
> > IMG_20160502_130335.jpg
> > <https://drive.google.com/file/d/1dtJxTwWXfhXVUUtn1Vvpzh3laJt7AILyCg/view?usp=drive_web>
> >
> > I'm getting the following panic from a recent (May 2, 2016) build.
> > panic: Duplicate local APIC ID 0
> >
> > The system is a Dell Precision T5500 with generic factory BIOS settings.
> > It has run previous builds without event for several years.
> >
> > I'm attaching a link to a photo of the screen for added details.
>
> Try setting 'hint.srat.0.disabled=1' at the loader prompt and then grab
> the output of 'acpidump -t' on your next boot.  The SRAT table used by
> the NUMA code appears to be corrupted by your BIOS.
>
> --
> John Baldwin
>


That allowed me to boot.  I'm attaching the output of 'acpidump -t'.
Thanks!


-- 
Bill O'Hanlon
612-205-9643


ACPIDUMP2
Description: Binary data

Re: Kernel panic from recent build

2016-05-02 Thread John Baldwin
On Monday, May 02, 2016 01:35:54 PM Bill O'Hanlon wrote:
> IMG_20160502_130335.jpg
> <https://drive.google.com/file/d/1dtJxTwWXfhXVUUtn1Vvpzh3laJt7AILyCg/view?usp=drive_web>
> 
> I'm getting the following panic from a recent (May 2, 2016) build.
> panic: Duplicate local APIC ID 0
> 
> The system is a Dell Precision T5500 with generic factory BIOS settings.
> It has run previous builds without event for several years.
> 
> I'm attaching a link to a photo of the screen for added details.

Try setting 'hint.srat.0.disabled=1' at the loader prompt and then grab
the output of 'acpidump -t' on your next boot.  The SRAT table used by
the NUMA code appears to be corrupted by your BIOS.

-- 
John Baldwin

Kernel panic from recent build

2016-05-02 Thread Bill O'Hanlon
​
IMG_20160502_130335.jpg
<https://drive.google.com/file/d/1dtJxTwWXfhXVUUtn1Vvpzh3laJt7AILyCg/view?usp=drive_web>

I'm getting the following panic from a recent (May 2, 2016) build.
panic: Duplicate local APIC ID 0

The system is a Dell Precision T5500 with generic factory BIOS settings.
It has run previous builds without event for several years.

I'm attaching a link to a photo of the screen for added details.
-- 
Bill O'Hanlon
612-205-9643