[unveil] kernel panic for mfs/read-only filesystems during reboot

2018-08-25 Thread Jiri B .
I got a kernel panic on OpenBSD 6.4-beta (GENERIC.MP) #260: Sat Aug 25 02:10:42 MDT 2018
while experimenting with mfs and read-only filesystems.

# shutdown -r now
shutdown: unveil: Read-only file system
# mount   
/dev/sd0a on / type ffs (local, read-only)
mfs:35132 on /dev type mfs (asynchronous, local, noexec, nosuid, size=102400 512-blocks)
mfs:74539 on /etc type mfs (asynchronous, local, nodev, nosuid, size=102400 512-blocks)
mfs:26844 on /tmp type mfs (asynchronous, local, nodev, noexec, nosuid, size=204800 512-blocks)
mfs:54540 on /var type mfs (asynchronous, local, nodev, noexec, nosuid, size=102400 512-blocks)
mfs:55695 on /var/log type mfs (asynchronous, local, nodev, noexec, nosuid, size=262144 512-blocks)
/dev/sd0d on /usr type ffs (local, nodev, read-only)
/dev/sd0e on /usr/local type ffs (local, nodev, nosuid, read-only)
/dev/sd0f on /data type ffs (local, nodev, nosuid, read-only)
# reboot
panic: kernel diagnostic assertion "vp->v_uvcount == 0" failed: file "/usr/src/sys/kern/kern_unveil.c", line 748
Stopped at      db_enter+0x12:  popq    %r11
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*469083  35132  0   0  01K mount_mfs
 240741  31688  0 0x14000  0x2000  reaper
db_enter() at db_enter+0x12
panic() at panic+0x120
__assert(816afd14,800021147ec0,0,ff006fc21338) at __assert+0x24
unveil_removevnode(b3fdf0663036209b) at unveil_removevnode+0xf2
dounmount_leaf(36a320cd3cded2ab,80096c00,0) at dounmount_leaf+0x69
dounmount(bf305e5e97ceba2d,80096c00,80008008) at dounmount+0xfa
mfs_start(3167c81713baac45,80096c00,ff007f63e000) at mfs_start+0xf9
sys_mount(a36477f38b506169,150,80008008) at sys_mount+0x5b5
syscall(bc035fa0eb20afa4) at syscall+0x32a
Xsyscall(6,15,7f7da610,15,7f7daaac,7f7daf8d) at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7dae00, count: 5
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{1}> show panic
kernel diagnostic assertion "vp->v_uvcount == 0" failed: file "/usr/src/sys/kern/kern_unveil.c", line 748
ddb{1}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  504236 VM pages: 10856 active, 2913 inactive, 0 wired, 460030 free (57336 zero)
  min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
  freemin=16807, free-target=22409, inactive-target=0, wired-max=168078
  faults=131805, traps=130457, intrs=12942, ctxswitch=38035 fpuswitch=0
  softint=15328, syscalls=340659, kmapent=16
  fault counts:
    noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
    ok relocks(total)=7550(7550), anget(retries)=49584(0), amapcopy=56876
    neighbor anon/obj pg=3853/52486, gets(lock/unlock)=24354/7550
    cases: anon=40560, anoncow=9024, obj=20857, prcopy=3497, przero=57861
  daemon and swap counts:
    woke=0, revs=0, scans=0, obscans=0, anscans=0
    busy=0, freed=0, reactivate=0, deactivate=0
    pageouts=0, pending=0, nswget=0
    nswapdev=1
    swpages=526127, swpginuse=0, swpgonly=0 paging=0
  kernel pointers:
    objs(kern)=0x81ca11a0
ddb{1}> show bcstats
Current Buffer Cache status:
numbufs 5869 busymapped 0, delwri 12
kvaslots 6302 avail kva slots 6302
bufpages 22694, dmapages 22694, dirtypages 21
pendingreads 0, pendingwrites 0
highflips 0, highflops 0, dmaflips 0
ddb{1}> trace
db_enter() at db_enter+0x12
panic() at panic+0x120
__assert(816afd14,800021147ec0,0,ff006fc21338) at __assert+0x24
unveil_removevnode(b3fdf0663036209b) at unveil_removevnode+0xf2
dounmount_leaf(36a320cd3cded2ab,80096c00,0) at dounmount_leaf+0x69
dounmount(bf305e5e97ceba2d,80096c00,80008008) at dounmount+0xfa
mfs_start(3167c81713baac45,80096c00,ff007f63e000) at mfs_start+0xf9
sys_mount(a36477f38b506169,150,80008008) at sys_mount+0x5b5
syscall(bc035fa0eb20afa4) at syscall+0x32a
Xsyscall(6,15,7f7da610,15,7f7daaac,7f7daf8d) at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7dae00, count: -10
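The assertion that fires is `vp->v_uvcount == 0` in `unveil_removevnode()`, reached from `dounmount_leaf()` while `mount_mfs` is unmounting. Judging only from those names, it looks like a reference-count invariant: once a vnode is detached from unveil state, no unveil entry should still point at it. The C sketch below is a hypothetical toy model of that kind of invariant, not the code in kern_unveil.c; `struct toy_vnode`, `struct toy_proc`, `toy_unveil_ref()` and `toy_removevnode()` are invented for illustration. It shows how a reference held by a process that the cleanup walk never visits leaves the counter non-zero.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vnode: only the unveil reference count. */
struct toy_vnode {
	int v_uvcount;
};

/* Hypothetical process holding at most one unveil reference. */
struct toy_proc {
	struct toy_vnode *uv_vp;	/* vnode referenced by this unveil, or NULL */
	int on_walk_list;		/* 1 if the cleanup walk visits this process */
};

/* Take an unveil reference on vp for process p. */
static void
toy_unveil_ref(struct toy_proc *p, struct toy_vnode *vp)
{
	p->uv_vp = vp;
	vp->v_uvcount++;
}

/*
 * Drop every unveil reference to vp held by the processes the walk
 * visits and return the count that remains.  A non-zero result is the
 * state in which an assertion like "vp->v_uvcount == 0" would fail.
 */
static int
toy_removevnode(struct toy_proc *procs, size_t nprocs, struct toy_vnode *vp)
{
	size_t i;

	for (i = 0; i < nprocs; i++) {
		if (!procs[i].on_walk_list || procs[i].uv_vp != vp)
			continue;
		procs[i].uv_vp = NULL;
		vp->v_uvcount--;
	}
	return vp->v_uvcount;
}
```

With two processes referencing the same vnode and only one of them on the walked list, `toy_removevnode()` returns 1, which is exactly the condition the real assertion rejects.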

- steps to reproduce:
* create the fs setup shown in the mount output below
* shutdown -r
* reboot

# mount   
/dev/sd0a on / type ffs (local, read-only)
mfs:35132 on /dev type mfs (asynchronous, local, noexec, nosuid, size=102400 512-blocks)
mfs:74539 on /etc type mfs (asynchronous, local, nodev, nosuid, size=102400 512-blocks)
mfs:26844 on /tmp type mfs (asynchronous, local, nodev, noexec, nosuid, size=204800 512-blocks)
mfs:54540 on /var type mfs (asynchronous, local, nodev, noexec, nosuid, size=102400 512-blocks)
mfs:55695 on /var/log type mfs (asynchronous, local, nodev, noexec, nosuid, size=262144 512-blocks)
/dev/sd0d on /usr type ffs (local, nodev, read-only)
/dev/sd0e on /usr/local type ffs (local, nodev, nosuid, read-only)
/dev/sd0f on /data type 

Re: axen Ethernet device errors on both USB3.0 and USB2.0 ports

2018-08-25 Thread Remi Locherer
On Fri, Aug 24, 2018 at 06:02:20AM +, sc.dy...@gmail.com wrote:
> On 2018/08/19 09:40, Stefan Sperling wrote:
> > On Sun, Aug 19, 2018 at 11:05:04AM +0200, Stefan Sperling wrote:
> >> On Sun, Aug 19, 2018 at 09:56:33AM +0200, Remi Locherer wrote:
> >>> It would help if you could send a clean version that applies to -current.
> >>
> >> One of the attachments was in fact clean but yes, this
> >> thread has been much too noisy to follow easily.
> >>
> >> Try this.
> > 
> > Unfortunately, while this diff does indeed work on xhci(4), I've just
> > found that this diff breaks axen(4) attached to ehci(4) completely.
> > 
> > I see several "axen0: rxeof: too short transfer" in dmesg and
> > almost all packets are lost. Even my Ethernet switch gives up
> > eventually and disables the port.
> > 
> > So this diff is not ready to be committed.
> 
> I didn't check if axen works on ehci.
> On my ehci (intel PCH) that bug is reproduced, and
> I found that it works on ehci with 16kB RX buffer.
> I preserve the original bufsz decision code.
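The quoted message reports that axen(4) works behind ehci(4), i.e. on a high-speed USB 2.0 bus, with a 16 kB RX buffer, and that the diff keeps the driver's original bufsz decision code. A minimal C sketch of such a per-speed selection follows. The enum, the function name `toy_axen_bufsz()` and the full-speed and super-speed sizes are assumptions made for illustration; only the 16 kB high-speed value comes from the message. It is not the code in axen5.diff.

```c
#include <stddef.h>

/* Hypothetical subset of USB bus speeds relevant to axen(4). */
enum toy_usb_speed {
	TOY_SPEED_FULL,		/* USB 1.1, 12 Mbit/s */
	TOY_SPEED_HIGH,		/* USB 2.0, 480 Mbit/s, e.g. behind ehci(4) */
	TOY_SPEED_SUPER		/* USB 3.0, 5 Gbit/s, requires xhci(4) */
};

/*
 * Pick the bulk RX buffer size from the negotiated bus speed.
 * Only the 16 kB high-speed value is taken from the report; the
 * other two sizes are placeholders.
 */
static size_t
toy_axen_bufsz(enum toy_usb_speed speed)
{
	switch (speed) {
	case TOY_SPEED_HIGH:
		return 16 * 1024;
	case TOY_SPEED_SUPER:
		return 24 * 1024;	/* assumed value */
	case TOY_SPEED_FULL:
	default:
		return 8 * 1024;	/* assumed value */
	}
}
```

Keying the size off the bus speed rather than the host controller driver means that, in this sketch, a high-speed device gets the same buffer whether it is attached to ehci(4) or to a USB 2.0 port on xhci(4).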

I applied axen5.diff and xhci.diff and tested the resulting kernel on
an old Samsung notebook that has ehci and xhci (dmesg and usbdevs below).

When the axen dongle is attached via xhci, it gets link but dhclient
never gets a lease. This works when it is attached via ehci, but after some
light traffic (browsing with netsurf) the system panics. Here is the output
from ddb (copied by hand):


kernel: page fault trap, code=0
Stopped at      memcpy+0x15:    repe movsq      (%rsi),%es:(%rdi)
ddb{1}> show panic
kernel page fault
uvm_fault(0xffdef19438, 0x0, 0, 1) -> e
memcpy(79e3..) at memcpy+0x15
end trace frame: 0x800032e06cd0, count: 0
ddb{1}> trace
memcpy(79e...) at memcpy+0x15
ptcread(5b11cd.) at ptcread+0x1eb
spec_read(70e.) at spec_read+0xab
VOP_READ(4b037..) at VOP_READ+0x49
vn_read(af8b.) at dofilereadv+0xe0
sys_read(9862) at sys_read+0x5c
syscall(822b.) at syscall+0x32a
Xsyscall(0,3,0,3,f,1954e...) at Xsyscall+0x128
end of kernel
end trace frame 0x7f7d3430, count: -9
ddb{1}> mach ddb 0
Stopped at      x86_ipi_db+0x12:        popq    %r11
ddb{0}> trace
x86_ipi_db(5d...) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi(9,ff.) at Xresume_lapic_ipi+0x23
___mp_lock(58ifaff) at ___mp_lock+0x68
intr_handler(a26f9) at intr_handler+0x40
Xintr_ioapic_edge12_untramp(6,fff...) at Xintr_ioapic_edge12_untramp+0x19f
___mp_lock(58faff...) at ___mp_lock+0x68
intr_handler(a26f9) at intr_handler+0x40
Xintr_ioapic_edge25_untramp(0,3,..) at Xintr_ioapic_edge25_untramp+0x19f
acpicpu_idle() at acpicpu_idle+0x166
sched_idle(0) at sched_idle+0x245
end trace frame: 0x0, count: -11
ddb{0}


This does not happen when running a snapshot kernel.

dmesg + usbdevs -vvv

OpenBSD 6.4-beta (GENERIC.MP) #0: Sat Aug 25 19:45:29 CEST 2018
r...@530u.relo.ch:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8462659584 (8070MB)
avail mem = 8196993024 (7817MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xe0840 (63 entries)
bios0: vendor Phoenix Technologies Ltd. version "05XK" date 02/10/2012
bios0: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S3 S4 S5
acpi0: tables DSDT FACP SLIC SSDT ASF! HPET APIC MCFG SSDT SSDT UEFI UEFI UEFI
acpi0: wakeup devices P0P1(S4) GLAN(S4) HDEF(S4) PXSX(S4) RP01(S4) PXSX(S4) 
RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4) RP06(S4) PXSX(S4) 
RP07(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz, 1597.58 MHz, 06-2a-07
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz, 1895.69 MHz, 06-2a-07
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 14 pa 0xfec0, version 20, 24 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf800, bus 0-63

Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-25 Thread Tom Murphy
On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote:
>  I've narrowed it down. 
>
>Last kernel where adb works:  June 24 09:59:46 MDT 2018
>1st Kernel where adb panics:  June 25 13:10:32 MDT 2018
>
>  I did notice when my phone's booted into LineageOS and I have ADB
>turned on, when I connect the phone via USB I get:
>
>ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 8
>
>  However, I'm not able to actually connect to it with adb shell or
>anything else. It says: Error device offline or something.
>
>  When I boot the phone into recovery mode, the phone shows up like
>this when I plug it in:
>
>  ugen1 at uhub0 port 7 "Motorola Moto G LTE" rev 2.00/2.28 addr 4
>
>  (different name!) and I am able to use adb shell, adb push/pull, etc.
>
>  I think there's some issue with LineageOS's ADB mode, but that's not
>really relevant here (it's probably a separate issue outside of OpenBSD,
>though I'll have to test with Linux or some other OS.)
>
>  I'm going to look at the commits next.
>
>-Tom

I can verify that this commit is what makes the kernel panic when adb is
run and an Android device is connected to the machine with ADB enabled:

https://marc.info/?l=openbsd-cvs&m=152996258723362&w=2

CVSROOT:        /cvs
Module name:    src
Changes by:     v...@cvs.openbsd.org    2018/06/25 10:06:27

Modified files:
sys/kern   : vfs_syscalls.c 
lib/libc/sys   : dup.2 

Log message:
During open(2), release the fdp lock before calling vn_open(9).
This lets other threads of the process modify the file descriptor
table even if the vn_open(9) call blocks.

The change has an effect on dup2(2) and dup3(2). If the new descriptor
is the same as the one reserved by an unfinished open(2), the system
call will fail with error EBUSY. The accept(2) system call already
behaves like this.

Issue pointed out by art@ via mpi@

Tested in a bulk build by ajacoutot@
OK mpi@
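The log message describes a reservation rule: open(2) claims a descriptor slot, releases the fdp lock while vn_open(9) runs, and a dup2(2) or dup3(2) that targets the still-reserved slot fails with EBUSY instead of waiting. The single-threaded C sketch below models only that rule. `struct toy_fdtable`, the slot states and the `toy_*` functions are hypothetical and are not OpenBSD's file descriptor table code; locking is left out entirely, and errors are returned as negated errno values for brevity.

```c
#include <errno.h>

#define TOY_NFILES 8

/* Hypothetical per-slot state of a descriptor table. */
enum toy_slot { TOY_FREE, TOY_RESERVED, TOY_OPEN };

struct toy_fdtable {
	enum toy_slot slot[TOY_NFILES];
};

/* Step 1 of open(2): reserve the lowest free slot, or -EMFILE if none. */
static int
toy_fdalloc(struct toy_fdtable *fdt)
{
	int fd;

	for (fd = 0; fd < TOY_NFILES; fd++) {
		if (fdt->slot[fd] == TOY_FREE) {
			fdt->slot[fd] = TOY_RESERVED;
			return fd;
		}
	}
	return -EMFILE;
}

/* Step 2 of open(2), after vn_open(9) succeeded: publish the file. */
static void
toy_fdfinish(struct toy_fdtable *fdt, int fd)
{
	fdt->slot[fd] = TOY_OPEN;
}

/*
 * dup2(2): newfd may be free or open (an open one is replaced), but a
 * slot reserved by an unfinished open(2) yields EBUSY.
 * Returns newfd on success or a negated errno value.
 */
static int
toy_dup2(struct toy_fdtable *fdt, int oldfd, int newfd)
{
	if (oldfd < 0 || oldfd >= TOY_NFILES || fdt->slot[oldfd] != TOY_OPEN)
		return -EBADF;
	if (newfd < 0 || newfd >= TOY_NFILES)
		return -EBADF;
	if (fdt->slot[newfd] == TOY_RESERVED)
		return -EBUSY;
	fdt->slot[newfd] = TOY_OPEN;
	return newfd;
}
```

Failing with EBUSY keeps dup2(2) from silently replacing a descriptor that another thread is about to receive from open(2); as the log message notes, accept(2) already behaved this way.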

* * *

I tested kernels compiled just before and right after that commit, and that
commit is what makes the kernel panic.
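Narrowing a regression down to one commit, as done here by testing kernels built on either side of it, amounts to a binary search over an ordered list of candidate builds. A minimal C sketch follows, assuming build 0 is known good, the last build is known bad, and every build after the first bad one is also bad; `toy_is_bad_fn` is a hypothetical callback standing in for "build this kernel, boot it, run adb", and `toy_bad_from()` is a made-up predicate included only so the search can be exercised.

```c
#include <stddef.h>

/* Hypothetical predicate: non-zero if build i shows the bug. */
typedef int (*toy_is_bad_fn)(size_t i, void *ctx);

/*
 * Binary search for the first bad build in [0, n).  Requires n >= 2,
 * build 0 good, build n - 1 bad, and all builds after the first bad
 * one also bad.  Returns the index of the first bad build.
 */
static size_t
toy_bisect(size_t n, toy_is_bad_fn is_bad, void *ctx)
{
	size_t good = 0, bad = n - 1;

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Example predicate for testing: build i is bad iff i >= *(size_t *)ctx. */
static int
toy_bad_from(size_t i, void *ctx)
{
	return i >= *(size_t *)ctx;
}
```

Each step halves the remaining range, so n candidate builds need only about log2(n) test boots.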

-Tom



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-25 Thread Tom Murphy
On Thu, Aug 23, 2018 at 08:45:54PM +0900, Bryan Linton wrote:
> So I found some time to try to bisect this, but was hampered by my
> phone being somewhat temperamental.
> 
> Everything up to July 3rd was fine.  No crashes occurred.
> 
> On a July 15th checkout, my system panicked when trying to run adb
> with my phone connected.
> 
> Unfortunately when I tried to bisect this further, my phone began
> refusing to connect to my computer.  I get a generic
>   "uhub0: device problem, disabling port 2"
> error and cannot get my phone to attach to my computer even if I
> reboot it, plug/unplug it, etc.
> 
> I'll see if I can try to bisect this further once I figure out
> what the problem is with my phone, but in the meantime, I wanted
> to at least update the bugs@ list with my findings so far.
> 
> I see a few potential commits in that time-frame that could be
> responsible, so I'm going to see if I can manage to narrow this
> down even further.
> 
> -- 
> Bryan

Hi Bryan,

  I've narrowed it down. 

Last kernel where adb works:  June 24 09:59:46 MDT 2018
1st Kernel where adb panics:  June 25 13:10:32 MDT 2018

  I did notice when my phone's booted into LineageOS and I have ADB
turned on, when I connect the phone via USB I get:

ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 8

  However, I'm not able to actually connect to it with adb shell or
anything else. It says: Error device offline or something.

  When I boot the phone into recovery mode, the phone shows up like
this when I plug it in:

  ugen1 at uhub0 port 7 "Motorola Moto G LTE" rev 2.00/2.28 addr 4

  (different name!) and I am able to use adb shell, adb push/pull, etc.

  I think there's some issue with LineageOS's ADB mode, but that's not
really relevant here (it's probably a separate issue outside of OpenBSD,
though I'll have to test with Linux or some other OS.)

  I'm going to look at the commits next.

-Tom