Re: kernel 8.2 and 9.1 crashes

2020-11-12 Thread Martin Husemann
On Fri, Nov 13, 2020 at 07:35:24AM +0100, tlaro...@polynum.com wrote:
> I tried to recompile a kernel, with 8.2 and with 9.1 and both
> crash, 9.1 with:
> 
> unable to execute instruction 0x18 (SMEP)
> 
> (from memory)

This is (I guess) the kernel jumping through a NULL function pointer.
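
To illustrate what I mean (this is only a sketch, not the code that
actually faults, and the names demo_ops, op_probe and call_probe are made
up): with SMEP the CPU refuses to execute from user-mapped addresses in
supervisor mode, and a very low address like 0x18 lies in that range. A
call through an unset hook ends up there unless it is guarded like this:

```c
#include <stddef.h>

/* Hypothetical ops table; not taken from any NetBSD driver. */
struct demo_ops {
	int	(*op_probe)(int unit);
};

/* Example hook implementation, used only to show the non-NULL case. */
static int
demo_probe(int unit)
{
	return unit + 1;
}

/*
 * Call the hook only if it is set. An unset hook yields -1 instead of
 * a jump to a bogus low address.
 */
static int
call_probe(const struct demo_ops *ops, int unit)
{
	if (ops == NULL || ops->op_probe == NULL)
		return -1;
	return ops->op_probe(unit);
}
```

The backtrace is what will tell us which pointer is really involved.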

> The kernel enters the debugger, but since the keyboard is unusable (no
> key does anything) I have to hard reboot.

Does boot -c work for you?
If it works, try disabling all USB host controllers, like:

disable ehci
disable xhci
q

and then see if the keyboard works in the debugger. We need a backtrace
(bt command) from the crash.

If that does not help, rebuild a local kernel with:

options DDB_COMMANDONENTER="bt"
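
For example, a small config file that pulls in your existing one and adds
the option might look like this (a sketch only: I am assuming your config
is sys/arch/amd64/conf/CONFIG, and you can just as well add the line to
that file directly):

```
# Hypothetical debug config: start from the existing config ...
include "arch/amd64/conf/CONFIG"

# ... and have ddb print a backtrace as soon as it is entered.
options DDB_COMMANDONENTER="bt"
```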

Martin


kernel 8.2 and 9.1 crashes

2020-11-12 Thread tlaronde
I tried to recompile a kernel, with 8.2 and with 9.1 and both
crash, 9.1 with:

unable to execute instruction 0x18 (SMEP)

(from memory)

The kernel enters the debugger, but since the keyboard is unusable (no
key does anything) I have to hard reboot.

The last message (via dmesg) from 9.1 is:

[   1.7964198] ahcisata0 port 3: device present, speed: 6.0Gb/s

It works with 8.0.

Here is the dmesg from 8.0:

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018 The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 8.0 (CONFIG) #0: Thu Apr 16 18:47:07 CEST 2020

tlaronde@cauchy.polynum.local:/usr/obj/polynum.NODECONF-cauchy.polynum.local_netbsd-8.0-amd64_netbsd-amd64/obj/sys/arch/amd64/compile/CONFIG
total memory = 8120 MB
avail memory = 7868 MB
cpu_rng: RDRAND
rnd: seeded with 128 bits
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
MSI MS-7823 (1.0)
mainbus0 (root)
ACPI: RSDP 0x000F04A0 24 (v02 ALASKA)
ACPI: XSDT 0xDDF9A078 74 (v01 ALASKA A M I01072009 AMI  00010013)
ACPI: FACP 0xDDFA7AC8 00010C (v05 ALASKA A M I01072009 AMI  00010013)
ACPI: DSDT 0xDDF9A188 00D940 (v02 ALASKA A M I0034 INTL 20120711)
ACPI: FACS 0xDDFC7F80 40
ACPI: APIC 0xDDFA7BD8 62 (v03 ALASKA A M I01072009 AMI  00010013)
ACPI: FPDT 0xDDFA7C40 44 (v01 ALASKA A M I01072009 AMI  00010013)
ACPI: SSDT 0xDDFA7C88 000539 (v01 PmRef  Cpu0Ist  3000 INTL 20120711)
ACPI: SSDT 0xDDFA81C8 000AD8 (v01 PmRef  CpuPm3000 INTL 20120711)
ACPI: MCFG 0xDDFA8CA0 3C (v01 ALASKA A M I01072009 MSFT 0097)
ACPI: HPET 0xDDFA8CE0 38 (v01 ALASKA A M I01072009 AMI. 0005)
ACPI: SSDT 0xDDFA8D18 00036D (v01 SataRe SataTabl 1000 INTL 20120711)
ACPI: SSDT 0xDDFA9088 0034E1 (v01 SaSsdt SaSsdt   3000 INTL 20091112)
ACPI: ASF! 0xDDFAC570 A5 (v32 INTEL   HCG 0001 TFSM 000F4240)
ACPI: Executed 1 blocks of module-level executable AML code
ACPI: 5 ACPI AML tables successfully acquired and loaded
ioapic0 at mainbus0 apid 8: pa 0xfec0, version 0x20, 24 pins
cpu0 at mainbus0 apid 0
cpu0: Intel(R) Pentium(R) CPU G3220 @ 3.00GHz, id 0x306c3
cpu0: package 0, core 0, smt 0
cpu1 at mainbus0 apid 2
cpu1: Intel(R) Pentium(R) CPU G3220 @ 3.00GHz, id 0x306c3
cpu1: package 0, core 1, smt 0
acpi0 at mainbus0: Intel ACPICA 20170303
acpi0: X/RSDT: OemId , AslId 
acpi0: MCFG: segment 0, bus 0-63, address 0xf800
ACPI: Dynamic OEM Table Load:
ACPI: SSDT 0xFE821BD9E010 0003D3 (v01 PmRef  Cpu0Cst  3001 INTL 20120711)
ACPI: Dynamic OEM Table Load:
ACPI: SSDT 0xFE810E813810 0005AA (v01 PmRef  ApIst3000 INTL 20120711)
ACPI: Dynamic OEM Table Load:
ACPI: SSDT 0xFE821BCFB1D0 000119 (v01 PmRef  ApCst3000 INTL 20120711)
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed0-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
acpiec0 at acpi0 (H_EC, PNP0C09-1)acpiec0: unable to evaluate _GPE: AE_NOT_FOUND
TPMX (PNP0C01) at acpi0 not configured
FWHD (INT0800) at acpi0 not configured
LDRC (PNP0C02) at acpi0 not configured
attimer1 at acpi0 (TIMR, PNP0100): io 0x40-0x43,0x50-0x53 irq 0
CWDT (INT3F0D) at acpi0 not configured
SIO1 (PNP0C02) at acpi0 not configured
com2 at acpi0 (UAR1, PNP0501-1): io 0x3f8-0x3ff irq 4
com2: ns16550a, working fifo
lpt2 at acpi0 (LPTE, PNP0400): io 0x378-0x37f irq 5
RMSC (PNP0C02) at acpi0 not configured
acpiwmi0 at acpi0 (WMI1, PNP0C14-MXM2): ACPI WMI Interface
acpiwmibus at acpiwmi0 not configured
PDRC (PNP0C02) at acpi0 not configured
acpibut0 at acpi0 (PWRB, PNP0C0C-170): ACPI Power Button
acpiwmi1 at acpi0 (WMIO, PNP0C14-0): ACPI WMI Interface
acpiwmibus at acpiwmi1 not configured
PTMD (INT3394) at acpi0 not configured
acpifan0 at acpi0 (FAN0, PNP0C0B-0): ACPI Fan
acpifan1 at acpi0 (FAN1, PNP0C0B-1): ACPI Fan
acpifan2 at acpi0 (FAN2, PNP0C0B-2): ACPI Fan
acpifan3 at acpi0 (FAN3, PNP0C0B-3): ACPI Fan
acpifan4 at acpi0 (FAN4, PNP0C0B-4): ACPI Fan
acpitz0 at acpi0 (TZ00)
acpitz0: active cooling level 0: 80.0C
acpitz0: active cooling level 1: 55.0C
acpitz0: active cooling level 2: 0.0C
acpitz0: active cooling level 3: 0.0C
acpitz0: active cooling level 4: 0.0C
acpitz0: levels: critical 105.0 C
acpitz1 at acpi0 (TZ01): cpu0 cpu1
acpitz1: levels: critical 105.0 C, passive 108.0 C, passive cooling
ACPI: Enabled 6 GPEs in block 00 to 3F
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: Intel Haswell Host 

Re: Funded project(s) to improve Linux emulation

2020-11-12 Thread David Holland
On Thu, Nov 12, 2020 at 12:49:13PM +, nia wrote:
 > > Many years ago, I made it so that mixer operations
 > > would fall through and complete successfully (can't find the commit now).
 > > 
 > > [...]
 > 
 > Thanks. Sounds pretty similar to some problems I observed in native
 > libossaudio applications in the past.

I hope this is fully documented in the code so someone doesn't
accidentally "fix" it sometime.

-- 
David A. Holland
dholl...@netbsd.org


Re: boot -d

2020-11-12 Thread Edgar Fuß
> It's probably easier to revert src/sys/arch/x86/x86/db_memrw.c 1.6.
As far as I understood (which may well be wrong) the fixes fixed a real 
problem that only surfaced on that change by chance and might have other 
consequences?


Re: boot -d

2020-11-12 Thread Andreas Gustafsson
Edgar Fuß wrote:
> I had a look at the relevant commits
>   src/sys/arch/x86/include/pmap.h 1.100
>   src/sys/arch/x86/x86/pmap.c 1.330
>   src/sys/arch/xen/x86/xen_pmap.c 1.31
> but unfortunately am unable to back-port the second one to -8.
> I know nothing about pmap, and the -current version uses PTE_P and PTE_PS 
> while the -8 version uses PG_V/nothing.

It's probably easier to revert src/sys/arch/x86/x86/db_memrw.c 1.6.
-- 
Andreas Gustafsson, g...@gson.org


Re: boot -d

2020-11-12 Thread Edgar Fuß
> This looks like PR 53311.
Ah, thanks!

> The commit where that problem started (src/sys/arch/x86/x86/db_memrw.c 1.6)
> was pulled up to the -8 branch, and apparently the commits that fixed it
> were not.
I currently seem to attract pull-ups that mess up things.

I had a look at the relevant commits
src/sys/arch/x86/include/pmap.h 1.100
src/sys/arch/x86/x86/pmap.c 1.330
src/sys/arch/xen/x86/xen_pmap.c 1.31
but unfortunately am unable to back-port the second one to -8.
I know nothing about pmap, and the -current version uses PTE_P and PTE_PS 
while the -8 version uses PG_V/nothing.

Could someone in the know port these fixes to -8, please? Or guide me?


Re: boot -d

2020-11-12 Thread Andreas Gustafsson
Edgar Fuß wrote:
> Real hardware (AMD64), 8.2_STABLE from yesterday, custom config.

This looks like PR 53311.  The commit where that problem started
(src/sys/arch/x86/x86/db_memrw.c 1.6) was pulled up to the -8
branch, and apparently the commits that fixed it were not.
-- 
Andreas Gustafsson, g...@gson.org


Re: boot -d

2020-11-12 Thread Paul Goyette

On Thu, 12 Nov 2020, Edgar Fuß wrote:

> Hello again.
>
> In about the third nesting level of what I wanted to do in the first place,
> I tried "boot netbsd -d" in the secondary boot. It loads the kernel, then
> complains about the ffs module missing (I don't use modules and don't have
> an 8.2 directory on that machine), clears the screen, displays
> "fatal breakpoint in supervisor mode" and re-boots.
>
> The problem is that the interesting messages are displayed only for a
> fraction of a second. In one out of three tries, I was able to catch them
> (partly) using the "slomo" (i.e. high speed) video recording mode of my
> iPhone, but of the line after the "fatal breakpoint" message, only the
> top half is displayed before it is cleared, so it's very hard to read the
> interesting parts.

Any chance of getting the messages via a serial console connection?

Can the problem be reproduced in qemu?

(I'm not sure what the first two nesting levels were, but it would
help to describe what you're trying to boot - kernel config, version,
etc. - and in what environment - hardware, emulator, etc.)


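+--------------------+--------------------------+-----------------------+
| Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
| (Retired)          | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com     |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
+--------------------+--------------------------+-----------------------+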

boot -d

2020-11-12 Thread Edgar Fuß
Hello again.

In about the third nesting level of what I wanted to do in the first place, 
I tried "boot netbsd -d" in the secondary boot. It loads the kernel, then 
complains about the ffs module missing (I don't use modules and don't have 
an 8.2 directory on that machine), clears the screen, displays 
"fatal breakpoint in supervisor mode" and re-boots.

The problem is that the interesting messages are displayed only for a 
fraction of a second. In one out of three tries, I was able to catch them 
(partly) using the "slomo" (i.e. high speed) video recording mode of my 
iPhone, but of the line after the "fatal breakpoint" message, only the 
top half is displayed before it is cleared, so it's very hard to read the 
interesting parts.

Any hints?


Re: Funded project(s) to improve Linux emulation

2020-11-12 Thread nia
On Thu, Nov 12, 2020 at 11:02:34AM +, Stephen Borrill wrote:
> On Tue, 10 Nov 2020, nia wrote:
> > On Tue, Nov 10, 2020 at 06:36:06PM +0100, Martin Husemann wrote:
> > >  - Improve linux audio support
> > 
> > Does anyone know what's needed yet?
> 
> What springs to mind is that when running the existing net/citrix_ica
> package (or any update of it that will run under current levels of Linux
> emulation), it negotiates a very low bit (8k) rate even when told to use the
> best it can. I assume that it is relying on undefined behaviour from some of
> the ioctls it is using. For instance, when you perform mixer operations from
> within a remote desktop session it does them on the audio devices, not the
> mixer. This used to return an error which then killed audio for the
> remainder of the session. Many years ago, I made it so that mixer operations
> would fall through and complete successfully (can't find the commit now).
> 
> So probably just requires tracing the syscalls and seeing what is requested
> and returned.
> 
> -- 
> Stephen
> 

Thanks. Sounds pretty similar to some problems I observed in native
libossaudio applications in the past.


Re: Funded project(s) to improve Linux emulation

2020-11-12 Thread Stephen Borrill

On Tue, 10 Nov 2020, nia wrote:
> On Tue, Nov 10, 2020 at 06:36:06PM +0100, Martin Husemann wrote:
> >  - Improve linux audio support
>
> Does anyone know what's needed yet?

What springs to mind is that when running the existing net/citrix_ica
package (or any update of it that will run under current levels of Linux
emulation), it negotiates a very low bit (8k) rate even when told to use
the best it can. I assume that it is relying on undefined behaviour from
some of the ioctls it is using. For instance, when you perform mixer
operations from within a remote desktop session it does them on the audio
devices, not the mixer. This used to return an error which then killed audio
for the remainder of the session. Many years ago, I made it so that mixer
operations would fall through and complete successfully (can't find the
commit now).

So probably just requires tracing the syscalls and seeing what is
requested and returned.


--
Stephen



Re: Temporary memory allocation from interrupt context

2020-11-12 Thread Martin Husemann
On Thu, Nov 12, 2020 at 10:51:58AM +0100, Lars Reichardt wrote:
> I was/am under the impression that the long-term goal is to move all
> allocations out of interrupt paths and that kmem_intr_alloc is there
> only for the transition; I have doubts this will happen.

That was my impression as well, so I am very reluctant to introduce
new uses of kmem_intr_alloc. I am currently investigating options to
rearrange the code at hand so that all allocations move completely to
thread context.
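
Roughly, the shape I have in mind is something like the following
user-space model. It is only a sketch under assumptions: demo_softc,
demo_prepare and demo_intr_take are made-up names, malloc/free stand in
for the kernel allocator, and the locking a real driver would need
between the two paths is left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-device state; names are illustrative only. */
struct demo_softc {
	void	*sc_spare;	/* buffer preallocated in thread context */
	size_t	 sc_spare_len;	/* size of sc_spare, 0 if none */
};

/*
 * Thread context: allowed to block, so this is where memory is
 * allocated. Returns 0 when a big enough spare buffer is in place.
 */
static int
demo_prepare(struct demo_softc *sc, size_t len)
{
	if (sc->sc_spare != NULL && sc->sc_spare_len >= len)
		return 0;
	free(sc->sc_spare);
	sc->sc_spare = malloc(len);
	sc->sc_spare_len = (sc->sc_spare != NULL) ? len : 0;
	return (sc->sc_spare != NULL) ? 0 : -1;
}

/*
 * "Interrupt" path: never allocates. It only takes over the spare
 * buffer if one of sufficient size was prepared beforehand.
 */
static void *
demo_intr_take(struct demo_softc *sc, size_t len)
{
	void *p;

	if (sc->sc_spare == NULL || sc->sc_spare_len < len)
		return NULL;
	p = sc->sc_spare;
	sc->sc_spare = NULL;
	sc->sc_spare_len = 0;
	return p;
}
```

The cost is that the thread side has to know the needed size before the
interrupt arrives, which is exactly the part I still have to sort out.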

Martin


Re: Temporary memory allocation from interrupt context

2020-11-12 Thread Lars Reichardt
On Wed, 11 Nov 2020 07:32:56 -0800
Jason Thorpe  wrote:

> > On Nov 11, 2020, at 5:38 AM, Martin Husemann 
> > wrote:
> > 
> > Yes, and of course the real code has that (and works). It's just
> > that 
> > - memoryallocators(9) does not cover this case
> > - kmem_intr_alloc(9) is kinda deprecated - quoting the man page:
> > 
> > These routines are for the special cases.  Normally,
> > pool_cache(9) should be used for memory allocation from
> > interrupt context.
> > 
> >   but how would I use pool_cache(9) here?  
> 
> It's not "deprecated" per se.  Heck, kmem_intr_alloc() was added
> *after* the pool cache API was added :-).  Sounds to me like
> memoryallocators(9) needs to be combed through and updated.
> 
> Anyway, I think what the documentation is trying to convey is that
> "pool_cache is better if you are allocating and freeing fixed size
> objects in a hot code path".  However, you're not allocating
> fixed-size objects, so using pool_cache directly is not appropriate.
> Using kmem_intr_alloc() is preferable to rolling your own logic here,
> and gets you the optimal behavior for this use case.
> 
> -- thorpej
> 

kmem_intr_alloc is a replacement for the old malloc.

I think a dedicated pool_cache is useful if there is either a custom
constructor/destructor or a lot of allocations, where the better fit
saves some memory, but those savings have to outweigh the cost of the
pool_cache structures themselves.
That's why we have kmem_intr_alloc: a pool_cache just to have an
interrupt-safe allocator for a single use case is, simply put, an
enormous overhead.

I was/am under the impression that the long-term goal is to move all
allocations out of interrupt paths and that kmem_intr_alloc is there
only for the transition; I have doubts this will happen.
Don't get me wrong, moving allocations out of interrupt paths is a good
thing.

Technically kmem_alloc and kmem_intr_alloc are backed by the same pools,
so the distinction is only API-wise.
(I have a patch that uses different pool_caches for kmem_alloc and
kmem_intr_alloc; it enforces that distinction and allows some paths in
kmem_alloc's pool_caches to skip interrupt disabling... I have not
measured how much of a difference this makes.)

Most systems I've looked at just have the equivalent of kmem_intr_alloc.

Having to block on deallocation is even worse, but luckily we don't.
If I haven't missed anything, kmem_intr_free technically needs no
allocation to free resources: there is one allocation, but if it
fails we skip caching the chunk of memory and return it directly.
(Even that could be tackled by having a bounded number of
pool_cache_groups per pool_cache per CPU.)
No allocation on the free path is a nice property of vmem.
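
To make the free-path behaviour concrete, here is a toy user-space model
of it. This is a sketch only, not the real pool_cache code: demo_group,
demo_cache and demo_cache_free are made-up names, malloc/free stand in
for the backing allocator, and a full group simply falls back to a
direct free instead of being swapped for a new one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define GROUP_SLOTS	4

/* Toy stand-in for a cache group; not the real pool_cache layout. */
struct demo_group {
	void	*slots[GROUP_SLOTS];
	size_t	 used;
};

struct demo_cache {
	struct demo_group *cur;		/* NULL until a group exists */
	bool	 fail_group_alloc;	/* knob: simulate allocation failure */
	size_t	 direct_frees;		/* chunks handed straight back */
};

/* The single allocation on the free path; allowed to fail. */
static struct demo_group *
demo_group_alloc(struct demo_cache *dc)
{
	if (dc->fail_group_alloc)
		return NULL;
	return calloc(1, sizeof(struct demo_group));
}

/*
 * Free path: try to cache the chunk for reuse. If no group can be had
 * (or it is full), skip caching and release the chunk directly, so the
 * caller never has to wait or handle an error.
 */
static void
demo_cache_free(struct demo_cache *dc, void *chunk)
{
	if (dc->cur == NULL)
		dc->cur = demo_group_alloc(dc);
	if (dc->cur == NULL || dc->cur->used == GROUP_SLOTS) {
		free(chunk);
		dc->direct_frees++;
		return;
	}
	dc->cur->slots[dc->cur->used++] = chunk;
}
```

The point is only that a failed group allocation costs us the caching,
never the ability to free.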

para

-- 
You will continue to suffer
if you have an emotional reaction to everything that is said to you.
True power is sitting back and observing everything with logic.
If words control you that means everyone else can control you.
Breathe and allow things to pass.

--- Bruce Lee