Re: CHANGE_PV_LIST_LOCK_TO_PHYS is not correct when !NUMA ?

2020-05-09 Thread Konstantin Belousov
On Sun, May 10, 2020 at 12:02:19AM +0300, Andriy Gapon wrote:
> On 09/05/2020 23:47, Konstantin Belousov wrote:
> > Maybe not, maybe it would help because of pmap_delayed_invl_genp().
> > But I would worry more about this 'already started' issue, because
> > this must not happen.  Can you remove the assert from the macro and
> > provide a backtrace of the 'DI already started' panic?
> 
> Oh, now that you asked for it, I see that it was a secondary panic (through
> vt, fb, drm code path).
> The first panic was still the same "address %lx beyond the last segment".
> I'll test your suggestion tomorrow.
Yes, the backtrace makes sense: the VM code was re-entered by the panic while
already inside a DI (delayed invalidation) section.  So pmap_remove() called
from the panic handler indeed triggered the right assert.

> 
> 
> #10 0x8080340e in vpanic (fmt=, ap=) at
> /usr/devel/git/motil/sys/kern/kern_shutdown.c:902
> #11 0x808031a3 in panic (fmt=0x8119a998 
> "\265\001ʀ\377\377\377\377") at 
> /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
> #12 0x80bb4c05 in pmap_delayed_invl_start_u () at
> /usr/devel/git/motil/sys/amd64/amd64/pmap.c:783
> #13 0x80bb8ede in pmap_remove (pmap=0x812ee930, sva=18446741877558251520,
> eva=) at
> /usr/devel/git/motil/sys/amd64/amd64/pmap.c:5418
> #14 0x80b2b6ad in _kmem_unback (object=,
> addr=18446741877558251520, size=102400) at 
> /usr/devel/git/motil/sys/vm/vm_kern.c:574
> #15 0x80b2b7dd in kmem_free (addr=18446741877558251520, size=102400) at
> /usr/devel/git/motil/sys/vm/vm_kern.c:614
> #16 0x807db77b in free_large (addr=0xfe00ab2e9000, size=102400) at
> /usr/devel/git/motil/sys/kern/kern_malloc.c:599
> #17 free (addr=0xfe00ab2e9000, mtp=0x825f90c0 ) at
> /usr/devel/git/motil/sys/kern/kern_malloc.c:818
> #18 0x82444922 in dc_gamma_release (gamma=) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc_surface.c:162
> #19 destruct (plane_state=0xf800080ef800) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc_surface.c:53
> #20 dc_plane_state_free (kref=) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc_surface.c:140
> #21 kref_put (kref=, rel=) at
> /usr/devel/git/motil/sys/compat/linuxkpi/common/include/linux/kref.h:74
> #22 dc_plane_state_release (plane_state=) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc_surface.c:146
> #23 0x82442de9 in dc_resource_state_destruct
> (context=0xfe00a2af) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc_resource.c:2295
> #24 0x824355d2 in dc_state_free (kref=) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc.c:1152
> #25 kref_put (kref=, rel=) at
> /usr/devel/git/motil/sys/compat/linuxkpi/common/include/linux/kref.h:74
> #26 dc_release_state (context=) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc.c:1158
> #27 0x8241f6cc in dm_atomic_destroy_state (obj=,
> state=0xf80020465550) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:1667
> #28 0x82569734 in drm_atomic_state_default_clear
> (state=0xf80008274a00) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/drm_atomic.c:202
> #29 0x82569827 in drm_atomic_state_clear (state=) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/drm_atomic.c:240
> #30 __drm_atomic_state_free (ref=0xf80008274a00) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/drm_atomic.c:256
> #31 0x825998d8 in kref_put (kref=0xf80008274a00, rel=<optimized out>) at
> /usr/devel/git/motil/sys/compat/linuxkpi/common/include/linux/kref.h:74
> #32 drm_atomic_state_put (state=0xf80008274a00) at
> /usr/home/avg/devel/kms-drm/include/drm/drm_atomic.h:385
> #33 restore_fbdev_mode_atomic (fb_helper=, active=true) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/drm_fb_helper.c:461
> #34 0x8259567a in drm_fb_helper_restore_fbdev_mode_unlocked
> (fb_helper=0xf8002096d800) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/drm_fb_helper.c:549
> #35 0x825bcc8a in vt_kms_postswitch (arg=0xf800027c52c0) at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/linux_fb.c:97
> #36 0x806b04b2 in vt_window_switch (vw=0x80e999a8) at
> /usr/devel/git/motil/sys/dev/vt/vt_core.c:603
> #37 0x806ada0f in vtterm_cngrab (tm=) at
> /usr/devel/git/motil/sys/dev/vt/vt_core.c:1612
> #38 0x8079f776 in cngrab () at 
> /usr/devel/git/motil/sys/kern/kern_cons.c:397
> #39 0x8080335c in vpanic (fmt=0x80cc257f "address %lx beyond the last segment",
> ap=0xfe009e18c890) at
> /usr/devel/git/motil/sys/kern/kern_shutdown.c:887
> #40 0x808031a3 in panic (fmt=0x8119a998 
> "\265\001ʀ\377\377\377\377") at 
> /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
> #41 0x80bc2ac3 in pmap_remove_pte (pmap=0xfe00a4cdbb08,
> ptq=0xf800cd2b4000, va=345

Re: CHANGE_PV_LIST_LOCK_TO_PHYS is not correct when !NUMA ?

2020-05-09 Thread Andriy Gapon
On 09/05/2020 23:47, Konstantin Belousov wrote:
> Maybe not, maybe it would help because of pmap_delayed_invl_genp().
> But I would worry more about this 'already started' issue, because
> this must not happen.  Can you remove the assert from the macro and
> provide a backtrace of the 'DI already started' panic?

Oh, now that you asked for it, I see that it was a secondary panic (through vt,
fb, drm code path).
The first panic was still the same "address %lx beyond the last segment".
I'll test your suggestion tomorrow.


#10 0x8080340e in vpanic (fmt=, ap=) at
/usr/devel/git/motil/sys/kern/kern_shutdown.c:902
#11 0x808031a3 in panic (fmt=0x8119a998 
"\265\001ʀ\377\377\377\377") at 
/usr/devel/git/motil/sys/kern/kern_shutdown.c:839
#12 0x80bb4c05 in pmap_delayed_invl_start_u () at
/usr/devel/git/motil/sys/amd64/amd64/pmap.c:783
#13 0x80bb8ede in pmap_remove (pmap=0x812ee930, sva=18446741877558251520,
eva=) at
/usr/devel/git/motil/sys/amd64/amd64/pmap.c:5418
#14 0x80b2b6ad in _kmem_unback (object=,
addr=18446741877558251520, size=102400) at 
/usr/devel/git/motil/sys/vm/vm_kern.c:574
#15 0x80b2b7dd in kmem_free (addr=18446741877558251520, size=102400) at
/usr/devel/git/motil/sys/vm/vm_kern.c:614
#16 0x807db77b in free_large (addr=0xfe00ab2e9000, size=102400) at
/usr/devel/git/motil/sys/kern/kern_malloc.c:599
#17 free (addr=0xfe00ab2e9000, mtp=0x825f90c0 ) at
/usr/devel/git/motil/sys/kern/kern_malloc.c:818
#18 0x82444922 in dc_gamma_release (gamma=) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc_surface.c:162
#19 destruct (plane_state=0xf800080ef800) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc_surface.c:53
#20 dc_plane_state_free (kref=) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc_surface.c:140
#21 kref_put (kref=, rel=) at
/usr/devel/git/motil/sys/compat/linuxkpi/common/include/linux/kref.h:74
#22 dc_plane_state_release (plane_state=) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc_surface.c:146
#23 0x82442de9 in dc_resource_state_destruct
(context=0xfe00a2af) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc_resource.c:2295
#24 0x824355d2 in dc_state_free (kref=) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc.c:1152
#25 kref_put (kref=, rel=) at
/usr/devel/git/motil/sys/compat/linuxkpi/common/include/linux/kref.h:74
#26 dc_release_state (context=) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/core/dc.c:1158
#27 0x8241f6cc in dm_atomic_destroy_state (obj=,
state=0xf80020465550) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:1667
#28 0x82569734 in drm_atomic_state_default_clear
(state=0xf80008274a00) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/drm_atomic.c:202
#29 0x82569827 in drm_atomic_state_clear (state=) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/drm_atomic.c:240
#30 __drm_atomic_state_free (ref=0xf80008274a00) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/drm_atomic.c:256
#31 0x825998d8 in kref_put (kref=0xf80008274a00, rel=) at /usr/devel/git/motil/sys/compat/linuxkpi/common/include/linux/kref.h:74
#32 drm_atomic_state_put (state=0xf80008274a00) at
/usr/home/avg/devel/kms-drm/include/drm/drm_atomic.h:385
#33 restore_fbdev_mode_atomic (fb_helper=, active=true) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/drm_fb_helper.c:461
#34 0x8259567a in drm_fb_helper_restore_fbdev_mode_unlocked
(fb_helper=0xf8002096d800) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/drm_fb_helper.c:549
#35 0x825bcc8a in vt_kms_postswitch (arg=0xf800027c52c0) at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/linux_fb.c:97
#36 0x806b04b2 in vt_window_switch (vw=0x80e999a8) at
/usr/devel/git/motil/sys/dev/vt/vt_core.c:603
#37 0x806ada0f in vtterm_cngrab (tm=) at
/usr/devel/git/motil/sys/dev/vt/vt_core.c:1612
#38 0x8079f776 in cngrab () at 
/usr/devel/git/motil/sys/kern/kern_cons.c:397
#39 0x8080335c in vpanic (fmt=0x80cc257f "address %lx beyond the last segment",
ap=0xfe009e18c890) at
/usr/devel/git/motil/sys/kern/kern_shutdown.c:887
#40 0x808031a3 in panic (fmt=0x8119a998 
"\265\001ʀ\377\377\377\377") at 
/usr/devel/git/motil/sys/kern/kern_shutdown.c:839
#41 0x80bc2ac3 in pmap_remove_pte (pmap=0xfe00a4cdbb08,
ptq=0xf800cd2b4000, va=34523316224, ptepde=3442163815,
free=0xfe009e18c9a0, lockp=0xfe009e18c9b8) at
/usr/devel/git/motil/sys/amd64/amd64/pmap.c:3599
#42 0x80bba98c in pmap_remove_ptes (pmap=0xfe00a4cdbb08,
sva=34523316224, eva=34525413376, pde=0xf800b2515270,
free=0xfe009e18c9a0, lockp=0xfe009e18c9b8) at
/usr/devel/git/motil/sys/amd64/amd64/pmap.c:5378
#43 0x80bb921c in pmap_remove (pmap=, sva=34523316224,
eva=) at /u

Re: CHANGE_PV_LIST_LOCK_TO_PHYS is not correct when !NUMA ?

2020-05-09 Thread Konstantin Belousov
On Sat, May 09, 2020 at 11:33:40PM +0300, Andriy Gapon wrote:
> On 09/05/2020 19:50, Konstantin Belousov wrote:
> > On Sat, May 09, 2020 at 07:16:27PM +0300, Andriy Gapon wrote:
> >> On 09/05/2020 19:13, Konstantin Belousov wrote:
> >>> On Sat, May 09, 2020 at 06:52:24PM +0300, Andriy Gapon wrote:
>  On 08/05/2020 19:15, Konstantin Belousov wrote:
> > On Fri, May 08, 2020 at 06:53:24PM +0300, Andriy Gapon wrote:
> >>
> >> I have a reproducible panic with a custom kernel without option NUMA 
> >> while using
> >> amdgpu driver from linuxkpi-based drm:
> >>
> >> panic: address 41ec0 beyond the last segment
> >>
> >> I did some quick debugging and the panic happens when Xorg server 
> >> tries to
> >> access a frame buffer (or something like that).  There is a page fault 
> >> that gets
> >> satisfied by ttm with a fictitious page.
> >>
> >> The stack trace is:
> >> #11 0x808031a3 in panic (fmt=0x8119a998 
> >> "5\003ʀ\377\377\377\377") at 
> >> /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
> >> #12 0x80bbc552 in pmap_enter (pmap=, va=34504441856,
> >> m=, prot=, flags=, psind=<optimized out>) at
> >> /usr/devel/git/motil/sys/amd64/amd64/pmap.c:6035
> >> #13 0x80b288be in vm_fault_populate (fs=) at
> >> /usr/devel/git/motil/sys/vm/vm_fault.c:519
> >> #14 vm_fault_allocate (fs=) at
> >> /usr/devel/git/motil/sys/vm/vm_fault.c:1032
> >> #15 vm_fault (map=, vaddr=, fault_type=<optimized out>,
> >> fault_flags=, m_hold=) at
> >> /usr/devel/git/motil/sys/vm/vm_fault.c:1342
> >> #16 0x80b26e7e in vm_fault_trap (map=0xfe0017cd39e8,
> >> vaddr=, fault_type=, fault_flags=0,
> >> signo=0xfe00a810dbc4, ucode=0xfe00a810dbc0) at
> >> /usr/devel/git/motil/sys/vm/vm_fault.c:589
> >> #17 0x80bcf89c in trap_pfault (frame=0xfe00a810dc00,
> >> usermode=, signo=, ucode=0x80853250) at
> >> /usr/devel/git/motil/sys/amd64/amd64/trap.c:821
> >> #18 0x80bceeec in trap (frame=0xfe00a810dc00) at
> >> /usr/devel/git/motil/sys/amd64/amd64/trap.c:34
> >>
> >>
> >> The line number in pmap_enter() is incorrect, I guess because of 
> >> optimizations.
> >> The assert seems to be reached via pmap_enter -> 
> >> CHANGE_PV_LIST_LOCK_TO_PHYS ->
> >> PHYS_TO_PV_LIST_LOCK -> pa_index().
> >>
> >> The panic is correct in that the page is fictitious and its physical
> >> address is beyond the end of real physical memory.
> >> It seems that NUMA PHYS_TO_PV_LIST_LOCK() is aware of such pages, but 
> >> !NUMA one
> >> is not.
> >
> > I think you can remove this assert.  pa_index() is always taken
> > % NPV_LIST_LOCKS, because fictitious mappings are not promoted.
> >
> > Try that and commit if it works for you.
> 
>  I tried this change:
>  diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
>  index 4deed86a76d1a..b834b7f0388b7 100644
>  --- a/sys/amd64/amd64/pmap.c
>  +++ b/sys/amd64/amd64/pmap.c
>  @@ -345,7 +345,7 @@ pmap_pku_mask_bit(pmap_t pmap)
>   #define NPV_LIST_LOCKS  MAXCPU
> 
>   #define PHYS_TO_PV_LIST_LOCK(pa)\
>  -(&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS])
>  +(&pv_list_locks[((pa) >> PDRSHIFT) % NPV_LIST_LOCKS])
>   #endif
> 
>   #define CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa)  do {\
> 
>  It fixed the original problem, but I got a new panic.
>  "DI already started" in pmap_remove() -> pmap_delayed_invl_start_u().
>  I guess that !NUMA variant does not get much testing, so I'll probably just
>  stick with the default.
> >>> Why didn't you just remove the KASSERT from pa_index()?
> >>
> >> Well, I thought it might be useful in the NUMA case.
> >> pa_index() definition is shared between both cases.
> > Maybe define the macro twice, for NUMA and non-NUMA.  The non-NUMA case
> > does not need the assert, because users take it mod NPV_LIST_LOCKS.
> > 
> 
> I still don't see how that could help with "DI already started" panic.

Maybe not, maybe it would help because of pmap_delayed_invl_genp().
But I would worry more about this 'already started' issue, because
this must not happen.  Can you remove the assert from the macro and
provide a backtrace of the 'DI already started' panic?
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: CHANGE_PV_LIST_LOCK_TO_PHYS is not correct when !NUMA ?

2020-05-09 Thread Andriy Gapon
On 09/05/2020 19:50, Konstantin Belousov wrote:
> On Sat, May 09, 2020 at 07:16:27PM +0300, Andriy Gapon wrote:
>> On 09/05/2020 19:13, Konstantin Belousov wrote:
>>> On Sat, May 09, 2020 at 06:52:24PM +0300, Andriy Gapon wrote:
 On 08/05/2020 19:15, Konstantin Belousov wrote:
> On Fri, May 08, 2020 at 06:53:24PM +0300, Andriy Gapon wrote:
>>
>> I have a reproducible panic with a custom kernel without option NUMA 
>> while using
>> amdgpu driver from linuxkpi-based drm:
>>
>> panic: address 41ec0 beyond the last segment
>>
>> I did some quick debugging and the panic happens when Xorg server tries 
>> to
>> access a frame buffer (or something like that).  There is a page fault 
>> that gets
>> satisfied by ttm with a fictitious page.
>>
>> The stack trace is:
>> #11 0x808031a3 in panic (fmt=0x8119a998 
>> "5\003ʀ\377\377\377\377") at 
>> /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
>> #12 0x80bbc552 in pmap_enter (pmap=, va=34504441856,
>> m=, prot=, flags=, psind=<optimized out>) at
>> /usr/devel/git/motil/sys/amd64/amd64/pmap.c:6035
>> #13 0x80b288be in vm_fault_populate (fs=) at
>> /usr/devel/git/motil/sys/vm/vm_fault.c:519
>> #14 vm_fault_allocate (fs=) at
>> /usr/devel/git/motil/sys/vm/vm_fault.c:1032
>> #15 vm_fault (map=, vaddr=, fault_type=<optimized out>,
>> fault_flags=, m_hold=) at
>> /usr/devel/git/motil/sys/vm/vm_fault.c:1342
>> #16 0x80b26e7e in vm_fault_trap (map=0xfe0017cd39e8,
>> vaddr=, fault_type=, fault_flags=0,
>> signo=0xfe00a810dbc4, ucode=0xfe00a810dbc0) at
>> /usr/devel/git/motil/sys/vm/vm_fault.c:589
>> #17 0x80bcf89c in trap_pfault (frame=0xfe00a810dc00,
>> usermode=, signo=, ucode=0x80853250) at
>> /usr/devel/git/motil/sys/amd64/amd64/trap.c:821
>> #18 0x80bceeec in trap (frame=0xfe00a810dc00) at
>> /usr/devel/git/motil/sys/amd64/amd64/trap.c:34
>>
>>
>> The line number in pmap_enter() is incorrect, I guess because of 
>> optimizations.
>> The assert seems to be reached via pmap_enter -> 
>> CHANGE_PV_LIST_LOCK_TO_PHYS ->
>> PHYS_TO_PV_LIST_LOCK -> pa_index().
>>
>> The panic is correct in that the page is fictitious and its physical
>> address is beyond the end of real physical memory.
>> It seems that NUMA PHYS_TO_PV_LIST_LOCK() is aware of such pages, but 
>> !NUMA one
>> is not.
>
> I think you can remove this assert.  pa_index() is always taken
> % NPV_LIST_LOCKS, because fictitious mappings are not promoted.
>
> Try that and commit if it works for you.

 I tried this change:
 diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
 index 4deed86a76d1a..b834b7f0388b7 100644
 --- a/sys/amd64/amd64/pmap.c
 +++ b/sys/amd64/amd64/pmap.c
 @@ -345,7 +345,7 @@ pmap_pku_mask_bit(pmap_t pmap)
  #define   NPV_LIST_LOCKS  MAXCPU

  #define   PHYS_TO_PV_LIST_LOCK(pa)\
 -  (&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS])
 +  (&pv_list_locks[((pa) >> PDRSHIFT) % NPV_LIST_LOCKS])
  #endif

  #define   CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa)  do {\

 It fixed the original problem, but I got a new panic.
 "DI already started" in pmap_remove() -> pmap_delayed_invl_start_u().
 I guess that !NUMA variant does not get much testing, so I'll probably just
 stick with the default.
>>> Why didn't you just remove the KASSERT from pa_index()?
>>
>> Well, I thought it might be useful in the NUMA case.
>> pa_index() definition is shared between both cases.
> Maybe define the macro twice, for NUMA and non-NUMA.  The non-NUMA case
> does not need the assert, because users take it mod NPV_LIST_LOCKS.
> 

I still don't see how that could help with the "DI already started" panic.

-- 
Andriy Gapon


Re: Error loading tcp_bbr kernel module

2020-05-09 Thread Gordon Bergling
Hi Michael,

On Sat, May 09, 2020 at 05:42:55PM +0200, Michael Tuexen wrote:
> > On 9. May 2020, at 16:25, Gordon Bergling  wrote:
> > I tried tcp_rack and tcp_bbr, since both are separate TCP stacks. I just
> > posted the wrong error message. Neither TCP stack was loadable as a
> > kernel module with only the aforementioned build option.
> > 
> > I currently have a build running with both kernel options you mentioned.
> > 
> > If the build is successful and I can change the default TCP stack to RACK
> > and BBR, I'll let you know.
> That would be great. I have them running on my machines, but I might have 
> missed something.
> > 
> > Furthermore, I didn’t find any documentation in tcp(4) regarding RACK and
> > BBR. Since I am about to enhance the man pages, I’ll extend tcp(4) with
> > information about RACK and BBR, but this is a different topic.
> > 
> Yes, it is. And I would suggest using separate man pages, one for
> each stack.
> The generic man page might refer to them...

My first thought was to extend tcp(4) and create links to
tcp_rack(4) and tcp_bbr(4), but separate man pages may be the way to go. I just
have to investigate the respective details. I was once very deep into TCP/IP,
while building perimeter firewalls with FreeBSD, but that was 20 years ago.

I'll add you as a reviewer for the differential once I have a rough cut
of the man pages ready.

Best regards,

Gordon

> >> On 09.05.2020, at 14:37, Michael Tuexen wrote:
> >>> On 9. May 2020, at 14:18, Gordon Bergling  
> >>> wrote:
> >>> 
> >>> Greetings,
> >>> 
> >>> I built -CURRENT with WITH_EXTRA_TCP_STACKS=1, but I get the following error
> >>> when I try to load, for example, tcp_bbr.ko.
> >>> kldload: an error occurred while loading module tcp_rack.ko. Please check 
> >>> dmesg(8) for more details.
> >> This indicates that you want to load the RACK stack.
> >> 
> >> Please note that both BBR and RACK need
> >> options TCPHPTS
> >> in the kernel config, and RACK additionally needs
> >> options RATELIMIT
> >> 
> >>> dmesg shows:
> >>> 
> >>> KLD tcp_bbr.ko: depends on tcphpts - not available or version mismatch
> >>> linker_load_file: /boot/kernel/tcp_bbr.ko - unsupported file type
> >>> 
> >>> Any hints on solving the problem?
> >>> 
> >>> The kernel config is GENERIC.
> >>> 
> >>> Best regards,
> >>> 
> >>> Gordon


Re: Error loading tcp_bbr kernel module

2020-05-09 Thread Gordon Bergling
Hi Michael,

with a kernel config which includes

include GENERIC
options RATELIMIT
options TCPHPTS

applied, I could successfully use
net.inet.tcp.functions_default=bbr
to switch the TCP stack.

Thanks for the fast help,

Gordon

> On 09.05.2020, at 16:25, Gordon Bergling wrote:
> 
> Hi Michael,
> 
> thanks for your reply.
> 
> I tried tcp_rack and tcp_bbr, since both are separate TCP stacks. I just
> posted the wrong error message. Neither TCP stack was loadable as a kernel
> module with only the aforementioned build option.
> 
> I currently have a build running with both kernel options you mentioned.
> 
> If the build is successful and I can change the default TCP stack to RACK and
> BBR, I'll let you know.
> 
> Furthermore, I didn’t find any documentation in tcp(4) regarding RACK and BBR.
> Since I am about to enhance the man pages, I’ll extend tcp(4) with
> information about RACK and BBR, but this is a different topic.
> 
> Best regards,
> 
> Gordon
> 
>> On 09.05.2020, at 14:37, Michael Tuexen wrote:
>> 
>>> On 9. May 2020, at 14:18, Gordon Bergling  wrote:
>>> 
>>> Greetings,
>>> 
>>> I built -CURRENT with WITH_EXTRA_TCP_STACKS=1, but I get the following error
>>> when I try to load, for example, tcp_bbr.ko.
>>> kldload: an error occurred while loading module tcp_rack.ko. Please check 
>>> dmesg(8) for more details.
>> This indicates that you want to load the RACK stack.
>> 
>> Please note that both BBR and RACK need
>> options  TCPHPTS
>> in the kernel config, and RACK additionally needs
>> options  RATELIMIT
>> 
>> Best regards
>> Michael
>>> 
>>> dmesg shows:
>>> 
>>> KLD tcp_bbr.ko: depends on tcphpts - not available or version mismatch
>>> linker_load_file: /boot/kernel/tcp_bbr.ko - unsupported file type
>>> 
>>> Any hints on solving the problem?
>>> 
>>> The kernel config is GENERIC.
>>> 
>>> Best regards,
>>> 
>>> Gordon


Re: Error loading tcp_bbr kernel module

2020-05-09 Thread Michael Tuexen
> On 9. May 2020, at 18:07, Gordon Bergling  wrote:
> 
> Hi Michael,
> 
> On Sat, May 09, 2020 at 05:42:55PM +0200, Michael Tuexen wrote:
>>> On 9. May 2020, at 16:25, Gordon Bergling  wrote:
>>> I tried tcp_rack and tcp_bbr, since both are separate TCP stacks. I just
>>> posted the wrong error message. Neither TCP stack was loadable as a
>>> kernel module with only the aforementioned build option.
>>> 
>>> I currently have a build running with both kernel options you mentioned.
>>> 
>>> If the build is successful and I can change the default TCP stack to RACK
>>> and BBR, I'll let you know.
>> That would be great. I have them running on my machines, but I might have 
>> missed something.
>>> 
>>> Furthermore, I didn’t find any documentation in tcp(4) regarding RACK and
>>> BBR. Since I am about to enhance the man pages, I’ll extend tcp(4) with
>>> information about RACK and BBR, but this is a different topic.
>>> 
>> Yes, it is. And I would suggest using separate man pages, one for
>> each stack.
>> The generic man page might refer to them...
> 
> My first thought was to extend tcp(4) and create links to
> tcp_rack(4) and tcp_bbr(4), but separate man pages may be the way to go. I just
> have to investigate the respective details. I was once very deep into TCP/IP,
> while building perimeter firewalls with FreeBSD, but that was 20 years ago.
> 
> I'll add you as a reviewer for the differential once I have a rough cut
> of the man pages ready.
Hi Gordon,

please do so. Don't forget to add rrs@, since he wrote both stacks.

Best regards
Michael
> 
> Best regards,
> 
> Gordon
> 
 On 09.05.2020, at 14:37, Michael Tuexen wrote:
> On 9. May 2020, at 14:18, Gordon Bergling  
> wrote:
> 
> Greetings,
> 
> I built -CURRENT with WITH_EXTRA_TCP_STACKS=1, but I get the following error
> when I try to load, for example, tcp_bbr.ko.
> kldload: an error occurred while loading module tcp_rack.ko. Please check 
> dmesg(8) for more details.
 This indicates that you want to load the RACK stack.
 
 Please note that both BBR and RACK need
 options TCPHPTS
 in the kernel config, and RACK additionally needs
 options RATELIMIT
 
> dmesg shows:
> 
> KLD tcp_bbr.ko: depends on tcphpts - not available or version mismatch
> linker_load_file: /boot/kernel/tcp_bbr.ko - unsupported file type
> 
> Any hints on solving the problem?
> 
> The kernel config is GENERIC.
> 
> Best regards,
> 
> Gordon



Re: Error loading tcp_bbr kernel module

2020-05-09 Thread Gordon Bergling
Hi Michael,

thanks for your reply.

I tried tcp_rack and tcp_bbr, since both are separate TCP stacks. I just posted
the wrong error message. Neither TCP stack was loadable as a kernel module
with only the aforementioned build option.

I currently have a build running with both kernel options you mentioned.

If the build is successful and I can change the default TCP stack to RACK and
BBR, I'll let you know.

Furthermore, I didn’t find any documentation in tcp(4) regarding RACK and BBR.
Since I am about to enhance the man pages, I’ll extend tcp(4) with information
about RACK and BBR, but this is a different topic.

Best regards,

Gordon

> On 09.05.2020, at 14:37, Michael Tuexen wrote:
> 
>> On 9. May 2020, at 14:18, Gordon Bergling  wrote:
>> 
>> Greetings,
>> 
>> I built -CURRENT with WITH_EXTRA_TCP_STACKS=1, but I get the following error
>> when I try to load, for example, tcp_bbr.ko.
>> kldload: an error occurred while loading module tcp_rack.ko. Please check 
>> dmesg(8) for more details.
> This indicates that you want to load the RACK stack.
> 
> Please note that both BBR and RACK need
> options   TCPHPTS
> in the kernel config, and RACK additionally needs
> options   RATELIMIT
> 
> Best regards
> Michael
>> 
>> dmesg shows:
>> 
>> KLD tcp_bbr.ko: depends on tcphpts - not available or version mismatch
>> linker_load_file: /boot/kernel/tcp_bbr.ko - unsupported file type
>> 
>> Any hints on solving the problem?
>> 
>> The kernel config is GENERIC.
>> 
>> Best regards,
>> 
>> Gordon


Re: CHANGE_PV_LIST_LOCK_TO_PHYS is not correct when !NUMA ?

2020-05-09 Thread Konstantin Belousov
On Sat, May 09, 2020 at 07:16:27PM +0300, Andriy Gapon wrote:
> On 09/05/2020 19:13, Konstantin Belousov wrote:
> > On Sat, May 09, 2020 at 06:52:24PM +0300, Andriy Gapon wrote:
> >> On 08/05/2020 19:15, Konstantin Belousov wrote:
> >>> On Fri, May 08, 2020 at 06:53:24PM +0300, Andriy Gapon wrote:
> 
>  I have a reproducible panic with a custom kernel without option NUMA 
>  while using
>  amdgpu driver from linuxkpi-based drm:
> 
>  panic: address 41ec0 beyond the last segment
> 
>  I did some quick debugging and the panic happens when Xorg server tries 
>  to
>  access a frame buffer (or something like that).  There is a page fault 
>  that gets
>  satisfied by ttm with a fictitious page.
> 
>  The stack trace is:
>  #11 0x808031a3 in panic (fmt=0x8119a998 
>  "5\003ʀ\377\377\377\377") at 
>  /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
>  #12 0x80bbc552 in pmap_enter (pmap=, va=34504441856,
>  m=, prot=, flags=, psind=<optimized out>) at
>  /usr/devel/git/motil/sys/amd64/amd64/pmap.c:6035
>  #13 0x80b288be in vm_fault_populate (fs=) at
>  /usr/devel/git/motil/sys/vm/vm_fault.c:519
>  #14 vm_fault_allocate (fs=) at
>  /usr/devel/git/motil/sys/vm/vm_fault.c:1032
>  #15 vm_fault (map=, vaddr=, fault_type=<optimized out>,
>  fault_flags=, m_hold=) at
>  /usr/devel/git/motil/sys/vm/vm_fault.c:1342
>  #16 0x80b26e7e in vm_fault_trap (map=0xfe0017cd39e8,
>  vaddr=, fault_type=, fault_flags=0,
>  signo=0xfe00a810dbc4, ucode=0xfe00a810dbc0) at
>  /usr/devel/git/motil/sys/vm/vm_fault.c:589
>  #17 0x80bcf89c in trap_pfault (frame=0xfe00a810dc00,
>  usermode=, signo=, ucode=0x80853250) at
>  /usr/devel/git/motil/sys/amd64/amd64/trap.c:821
>  #18 0x80bceeec in trap (frame=0xfe00a810dc00) at
>  /usr/devel/git/motil/sys/amd64/amd64/trap.c:34
> 
> 
>  The line number in pmap_enter() is incorrect, I guess because of 
>  optimizations.
>  The assert seems to be reached via pmap_enter -> 
>  CHANGE_PV_LIST_LOCK_TO_PHYS ->
>  PHYS_TO_PV_LIST_LOCK -> pa_index().
> 
>  The panic is correct in that the page is fictitious and its physical
>  address is beyond the end of real physical memory.
>  It seems that NUMA PHYS_TO_PV_LIST_LOCK() is aware of such pages, but 
>  !NUMA one
>  is not.
> >>>
> >>> I think you can remove this assert.  pa_index() is always taken
> >>> % NPV_LIST_LOCKS, because fictitious mappings are not promoted.
> >>>
> >>> Try that and commit if it works for you.
> >>
> >> I tried this change:
> >> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> >> index 4deed86a76d1a..b834b7f0388b7 100644
> >> --- a/sys/amd64/amd64/pmap.c
> >> +++ b/sys/amd64/amd64/pmap.c
> >> @@ -345,7 +345,7 @@ pmap_pku_mask_bit(pmap_t pmap)
> >>  #define   NPV_LIST_LOCKS  MAXCPU
> >>
> >>  #define   PHYS_TO_PV_LIST_LOCK(pa)\
> >> -  (&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS])
> >> +  (&pv_list_locks[((pa) >> PDRSHIFT) % NPV_LIST_LOCKS])
> >>  #endif
> >>
> >>  #define   CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa)  do {\
> >>
> >> It fixed the original problem, but I got a new panic.
> >> "DI already started" in pmap_remove() -> pmap_delayed_invl_start_u().
> >> I guess that !NUMA variant does not get much testing, so I'll probably just
> >> stick with the default.
> > Why didn't you just remove the KASSERT from pa_index()?
> 
> Well, I thought it might be useful in the NUMA case.
> pa_index() definition is shared between both cases.
Maybe define the macro twice, for NUMA and non-NUMA.  The non-NUMA case
does not need the assert, because users take it mod NPV_LIST_LOCKS.


Error loading tcp_bbr kernel module

2020-05-09 Thread Gordon Bergling
Greetings,

I built -CURRENT with WITH_EXTRA_TCP_STACKS=1, but I get the following error
when I try to load, for example, tcp_bbr.ko.

kldload: an error occurred while loading module tcp_rack.ko. Please check 
dmesg(8) for more details.

dmesg shows:

KLD tcp_bbr.ko: depends on tcphpts - not available or version mismatch
linker_load_file: /boot/kernel/tcp_bbr.ko - unsupported file type

Any hints on solving the problem?

The kernel config is GENERIC.

Best regards,

Gordon


Re: CHANGE_PV_LIST_LOCK_TO_PHYS is not correct when !NUMA ?

2020-05-09 Thread Andriy Gapon
On 09/05/2020 19:13, Konstantin Belousov wrote:
> On Sat, May 09, 2020 at 06:52:24PM +0300, Andriy Gapon wrote:
>> On 08/05/2020 19:15, Konstantin Belousov wrote:
>>> On Fri, May 08, 2020 at 06:53:24PM +0300, Andriy Gapon wrote:

>>>> I have a reproducible panic with a custom kernel without option NUMA while
>>>> using amdgpu driver from linuxkpi-based drm:
>>>>
>>>> panic: address 41ec0 beyond the last segment
>>>>
>>>> I did some quick debugging and the panic happens when Xorg server tries to
>>>> access a frame buffer (or something like that).  There is a page fault that
>>>> gets satisfied by ttm with a fictitious page.
>>>>
>>>> The stack trace is:
>>>> #11 0x808031a3 in panic (fmt=0x8119a998
>>>> "5\003ʀ\377\377\377\377") at
>>>> /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
>>>> #12 0x80bbc552 in pmap_enter (pmap=, va=34504441856,
>>>> m=, prot=, flags=,
>>>> psind=<optimized out>) at /usr/devel/git/motil/sys/amd64/amd64/pmap.c:6035
>>>> #13 0x80b288be in vm_fault_populate (fs=) at
>>>> /usr/devel/git/motil/sys/vm/vm_fault.c:519
>>>> #14 vm_fault_allocate (fs=) at
>>>> /usr/devel/git/motil/sys/vm/vm_fault.c:1032
>>>> #15 vm_fault (map=, vaddr=,
>>>> fault_type=<optimized out>, fault_flags=, m_hold=) at
>>>> /usr/devel/git/motil/sys/vm/vm_fault.c:1342
>>>> #16 0x80b26e7e in vm_fault_trap (map=0xfe0017cd39e8,
>>>> vaddr=, fault_type=, fault_flags=0,
>>>> signo=0xfe00a810dbc4, ucode=0xfe00a810dbc0) at
>>>> /usr/devel/git/motil/sys/vm/vm_fault.c:589
>>>> #17 0x80bcf89c in trap_pfault (frame=0xfe00a810dc00,
>>>> usermode=, signo=, ucode=0x80853250
>>>> ) at /usr/devel/git/motil/sys/amd64/amd64/trap.c:821
>>>> #18 0x80bceeec in trap (frame=0xfe00a810dc00) at
>>>> /usr/devel/git/motil/sys/amd64/amd64/trap.c:34
>>>>
>>>>
>>>> The line number in pmap_enter() is incorrect, I guess because of
>>>> optimizations.
>>>> The assert seems to be reached via pmap_enter ->
>>>> CHANGE_PV_LIST_LOCK_TO_PHYS -> PHYS_TO_PV_LIST_LOCK -> pa_index().
>>>>
>>>> The panic is correct in that the page is fictitious and its physical
>>>> address is beyond the end of real physical memory.
>>>> It seems that the NUMA PHYS_TO_PV_LIST_LOCK() is aware of such pages, but
>>>> the !NUMA one is not.
>>>
>>> I think you can remove this assert.  pa_index() is always taken by
>>> % NPV_LIST_LOCKS, because fictitious mappings are not promoted.
>>>
>>> Try that and commit if it works for you.
>>
>> I tried this change:
>> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
>> index 4deed86a76d1a..b834b7f0388b7 100644
>> --- a/sys/amd64/amd64/pmap.c
>> +++ b/sys/amd64/amd64/pmap.c
>> @@ -345,7 +345,7 @@ pmap_pku_mask_bit(pmap_t pmap)
>>  #define NPV_LIST_LOCKS  MAXCPU
>>
>>  #define PHYS_TO_PV_LIST_LOCK(pa)\
>> -(&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS])
>> +(&pv_list_locks[((pa) >> PDRSHIFT) % NPV_LIST_LOCKS])
>>  #endif
>>
>>  #define CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa)  do {\
>>
>> It fixed the original problem, but I got a new panic.
>> "DI already started" in pmap_remove() -> pmap_delayed_invl_start_u().
>> I guess that the !NUMA variant does not get much testing, so I'll probably just
>> stick with the default.
> Why didn't you just remove the KASSERT from pa_index()?

Well, I thought it might be useful in the NUMA case.
pa_index() definition is shared between both cases.


-- 
Andriy Gapon


Re: CHANGE_PV_LIST_LOCK_TO_PHYS is not correct when !NUMA ?

2020-05-09 Thread Konstantin Belousov
On Sat, May 09, 2020 at 06:52:24PM +0300, Andriy Gapon wrote:
> On 08/05/2020 19:15, Konstantin Belousov wrote:
> > On Fri, May 08, 2020 at 06:53:24PM +0300, Andriy Gapon wrote:
> >>
> >> I have a reproducible panic with a custom kernel without option NUMA while
> >> using amdgpu driver from linuxkpi-based drm:
> >>
> >> panic: address 41ec0 beyond the last segment
> >>
> >> I did some quick debugging and the panic happens when Xorg server tries to
> >> access a frame buffer (or something like that).  There is a page fault that
> >> gets satisfied by ttm with a fictitious page.
> >>
> >> The stack trace is:
> >> #11 0x808031a3 in panic (fmt=0x8119a998
> >> "5\003ʀ\377\377\377\377") at
> >> /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
> >> #12 0x80bbc552 in pmap_enter (pmap=, va=34504441856,
> >> m=, prot=, flags=,
> >> psind=<optimized out>) at /usr/devel/git/motil/sys/amd64/amd64/pmap.c:6035
> >> #13 0x80b288be in vm_fault_populate (fs=) at
> >> /usr/devel/git/motil/sys/vm/vm_fault.c:519
> >> #14 vm_fault_allocate (fs=) at
> >> /usr/devel/git/motil/sys/vm/vm_fault.c:1032
> >> #15 vm_fault (map=, vaddr=,
> >> fault_type=<optimized out>, fault_flags=, m_hold=) at
> >> /usr/devel/git/motil/sys/vm/vm_fault.c:1342
> >> #16 0x80b26e7e in vm_fault_trap (map=0xfe0017cd39e8,
> >> vaddr=, fault_type=, fault_flags=0,
> >> signo=0xfe00a810dbc4, ucode=0xfe00a810dbc0) at
> >> /usr/devel/git/motil/sys/vm/vm_fault.c:589
> >> #17 0x80bcf89c in trap_pfault (frame=0xfe00a810dc00,
> >> usermode=, signo=, ucode=0x80853250
> >> ) at /usr/devel/git/motil/sys/amd64/amd64/trap.c:821
> >> #18 0x80bceeec in trap (frame=0xfe00a810dc00) at
> >> /usr/devel/git/motil/sys/amd64/amd64/trap.c:34
> >>
> >>
> >> The line number in pmap_enter() is incorrect, I guess because of
> >> optimizations.
> >> The assert seems to be reached via pmap_enter ->
> >> CHANGE_PV_LIST_LOCK_TO_PHYS -> PHYS_TO_PV_LIST_LOCK -> pa_index().
> >>
> >> The panic is correct in that the page is fictitious and its physical
> >> address is beyond the end of real physical memory.
> >> It seems that the NUMA PHYS_TO_PV_LIST_LOCK() is aware of such pages, but
> >> the !NUMA one is not.
> > 
> > I think you can remove this assert.  pa_index() is always taken by
> > % NPV_LIST_LOCKS, because fictitious mappings are not promoted.
> > 
> > Try that and commit if it works for you.
> 
> I tried this change:
> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> index 4deed86a76d1a..b834b7f0388b7 100644
> --- a/sys/amd64/amd64/pmap.c
> +++ b/sys/amd64/amd64/pmap.c
> @@ -345,7 +345,7 @@ pmap_pku_mask_bit(pmap_t pmap)
>  #define  NPV_LIST_LOCKS  MAXCPU
> 
>  #define  PHYS_TO_PV_LIST_LOCK(pa)\
> - (&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS])
> + (&pv_list_locks[((pa) >> PDRSHIFT) % NPV_LIST_LOCKS])
>  #endif
> 
>  #define  CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa)  do {\
> 
> It fixed the original problem, but I got a new panic.
> "DI already started" in pmap_remove() -> pmap_delayed_invl_start_u().
> I guess that the !NUMA variant does not get much testing, so I'll probably just
> stick with the default.
Why didn't you just remove the KASSERT from pa_index()?


Re: CHANGE_PV_LIST_LOCK_TO_PHYS is not correct when !NUMA ?

2020-05-09 Thread Andriy Gapon
On 08/05/2020 19:15, Konstantin Belousov wrote:
> On Fri, May 08, 2020 at 06:53:24PM +0300, Andriy Gapon wrote:
>>
>> I have a reproducible panic with a custom kernel without option NUMA while
>> using amdgpu driver from linuxkpi-based drm:
>>
>> panic: address 41ec0 beyond the last segment
>>
>> I did some quick debugging and the panic happens when Xorg server tries to
>> access a frame buffer (or something like that).  There is a page fault that
>> gets satisfied by ttm with a fictitious page.
>>
>> The stack trace is:
>> #11 0x808031a3 in panic (fmt=0x8119a998
>> "5\003ʀ\377\377\377\377") at
>> /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
>> #12 0x80bbc552 in pmap_enter (pmap=, va=34504441856,
>> m=, prot=, flags=,
>> psind=<optimized out>) at /usr/devel/git/motil/sys/amd64/amd64/pmap.c:6035
>> #13 0x80b288be in vm_fault_populate (fs=) at
>> /usr/devel/git/motil/sys/vm/vm_fault.c:519
>> #14 vm_fault_allocate (fs=) at
>> /usr/devel/git/motil/sys/vm/vm_fault.c:1032
>> #15 vm_fault (map=, vaddr=,
>> fault_type=<optimized out>, fault_flags=, m_hold=) at
>> /usr/devel/git/motil/sys/vm/vm_fault.c:1342
>> #16 0x80b26e7e in vm_fault_trap (map=0xfe0017cd39e8,
>> vaddr=, fault_type=, fault_flags=0,
>> signo=0xfe00a810dbc4, ucode=0xfe00a810dbc0) at
>> /usr/devel/git/motil/sys/vm/vm_fault.c:589
>> #17 0x80bcf89c in trap_pfault (frame=0xfe00a810dc00,
>> usermode=, signo=, ucode=0x80853250
>> ) at /usr/devel/git/motil/sys/amd64/amd64/trap.c:821
>> #18 0x80bceeec in trap (frame=0xfe00a810dc00) at
>> /usr/devel/git/motil/sys/amd64/amd64/trap.c:34
>>
>>
>> The line number in pmap_enter() is incorrect, I guess because of
>> optimizations.
>> The assert seems to be reached via pmap_enter -> CHANGE_PV_LIST_LOCK_TO_PHYS ->
>> PHYS_TO_PV_LIST_LOCK -> pa_index().
>>
>> The panic is correct in that the page is fictitious and its physical address
>> is beyond the end of real physical memory.
>> It seems that the NUMA PHYS_TO_PV_LIST_LOCK() is aware of such pages, but the
>> !NUMA one is not.
> 
> I think you can remove this assert.  pa_index() is always taken by
> % NPV_LIST_LOCKS, because fictitious mappings are not promoted.
> 
> Try that and commit if it works for you.

I tried this change:
diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 4deed86a76d1a..b834b7f0388b7 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -345,7 +345,7 @@ pmap_pku_mask_bit(pmap_t pmap)
 #define NPV_LIST_LOCKS  MAXCPU
 
 #define PHYS_TO_PV_LIST_LOCK(pa) \
-   (&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS])
+   (&pv_list_locks[((pa) >> PDRSHIFT) % NPV_LIST_LOCKS])
 #endif
 
 #define CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa)  do { \

It fixed the original problem, but I got a new panic.
"DI already started" in pmap_remove() -> pmap_delayed_invl_start_u().
I guess that the !NUMA variant does not get much testing, so I'll probably just
stick with the default.


-- 
Andriy Gapon


Re: Error loading tcp_bbr kernel module

2020-05-09 Thread Michael Tuexen
> On 9. May 2020, at 16:25, Gordon Bergling  wrote:
> 
> Hi Michael,
> 
> thanks for your reply.
> 
> I tried tcp_rack and tcp_bbr, since both are separate TCP stacks. I just
> posted the wrong error message. Neither TCP stack was loadable as a kernel
> module with just the aforementioned build option.
> 
> I currently have a build running with both kernel options you mentioned.
> 
> If the build is successful and I can change the default TCP stack to RACK and
> BBR, I'll let you know.
That would be great. I have them running on my machines, but I might have 
missed something.
> 
> Furthermore, I didn't find any documentation in tcp(4) regarding RACK and BBR.
> Since I am about to enhance the manpages, I'll extend tcp(4) with
> information about RACK and BBR, but this is a different topic.
> 
Yes, it is. And I would suggest using separate man pages, one for each
stack. Then the generic man page might refer to them...

Best regards
Michael
> Best regards,
> 
> Gordon
> 
>> On 09.05.2020 at 14:37, Michael Tuexen wrote:
>> 
>>> On 9. May 2020, at 14:18, Gordon Bergling  wrote:
>>> 
>>> Greetings,
>>> 
>>> I built -CURRENT with WITH_EXTRA_TCP_STACKS=1, but I get the following error
>>> when I try to load, for example, tcp_bbr.ko.
>>>
>>> kldload: an error occurred while loading module tcp_rack.ko. Please check 
>>> dmesg(8) for more details.
>> This indicates that you want to load the RACK stack.
>> 
>> Please note that both BBR and RACK need
>> options  TCPHPTS
>> in the kernel config, and RACK additionally needs
>> options  RATELIMIT
>> 
>> Best regards
>> Michael
>>> 
>>> dmesg shows:
>>> 
>>> KLD tcp_bbr.ko: depends on tcphpts - not available or version mismatch
>>> linker_load_file: /boot/kernel/tcp_bbr.ko - unsupported file type
>>> 
>>> Any hints on solving the problem?
>>> 
>>> The kernel config is GENERIC.
>>> 
>>> Best regards,
>>> 
>>> Gordon
>> 
> 



Xorg question

2020-05-09 Thread Filippo Moretti
I run the latest -CURRENT and I have the following packages installed:

xf86-input-keyboard-1.9.0_4
xf86-input-libinput-0.28.2_1
xf86-input-mouse-1.9.3_3

Should I keep all of them, or may I keep only xf86-input-libinput?

Thank you,
Filippo


Re: Error loading tcp_bbr kernel module

2020-05-09 Thread Michael Tuexen
> On 9. May 2020, at 14:18, Gordon Bergling  wrote:
> 
> Greetings,
> 
> I built -CURRENT with WITH_EXTRA_TCP_STACKS=1, but I get the following error
> when I try to load, for example, tcp_bbr.ko.
>
> kldload: an error occurred while loading module tcp_rack.ko. Please check 
> dmesg(8) for more details.
This indicates that you want to load the RACK stack.

Please note that both BBR and RACK need
options TCPHPTS
in the kernel config, and RACK additionally needs
options RATELIMIT

Best regards
Michael
> 
> dmesg shows:
> 
> KLD tcp_bbr.ko: depends on tcphpts - not available or version mismatch
> linker_load_file: /boot/kernel/tcp_bbr.ko - unsupported file type
> 
> Any hints on solving the problem?
> 
> The kernel config is GENERIC.
> 
> Best regards,
> 
> Gordon



Re: Error loading tcp_bbr kernel module

2020-05-09 Thread David Wolfskill
On Sat, May 09, 2020 at 02:18:51PM +0200, Gordon Bergling wrote:
> Greetings,
> 
> I built -CURRENT with WITH_EXTRA_TCP_STACKS=1, but I get the following error
> when I try to load, for example, tcp_bbr.ko.
> 
> kldload: an error occurred while loading module tcp_rack.ko. Please check 
> dmesg(8) for more details.
> 
> dmesg shows:
> 
> KLD tcp_bbr.ko: depends on tcphpts - not available or version mismatch
> linker_load_file: /boot/kernel/tcp_bbr.ko - unsupported file type
> 
> Any hints on solving the problem?
> 
> The kernel config is GENERIC.
> 
> Best regards,
> 
> Gordon
> 

It looks as if option TCPHPTS isn't in GENERIC, and it's a prerequisite
for BBR.

I'd probably create a custom kernel config that amounted to:

include GENERIC
options TCPHPTS

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Donald Trump had 3 years to replenish the US stockpile of PPE -- and failed.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: "make buildworld" fails for r360785?

2020-05-09 Thread Robert Huff


Dimitry Andric writes:
>  /usr/src/contrib/llvm-project/clang/lib/Basic/SourceManager.cpp:1228:10:
>  fatal error: 'emmintrin.h' file not found
>  #include <emmintrin.h>
>           ^
>  ...
>  > In file included from /usr/src/contrib/llvm-project/clang/lib/Basic/SourceManager.cpp:1228:
>  > In file included from /usr/include/emmintrin.h:13:
>  > /usr/include/xmmintrin.h:27:10: fatal error: 'mm_malloc.h' file not found
>  > #include <mm_malloc.h>
>  >          ^
>  > 1 error generated.
>  > *** Error code 1
>  
>  During which stage of buildworld is this? If it is during the
>  cross-tools stage, your host environment is busted somehow.

That is the latest stage listed by the complete build log.
(Which I can make available if it's useful.)
The only complaints I see are two lines at the top of the log:

make[1]: "/usr/src/Makefile.inc1" line 325: SYSTEM_COMPILER: libclang will be built for bootstrapping a cross-compiler.
make[1]: "/usr/src/Makefile.inc1" line 330: SYSTEM_LINKER: libclang will be built for bootstrapping a cross-linker.

Makefile.inc1 was downloaded with the fresh source tree, and I
have already posted make.conf and src.conf.
I am not sure what else could be broken, nor how to diagnose it.

>  This appears to happen to some people on this list, for unknown reasons.
>  My guess is they either run "make delete-old" before running buildworld
>  (which is the wrong order!),

I did recently run "delete-old" ... but only as directed by the
official documentation.
But ... let's say that somehow happened.  How do I recover?  Is
there a bootstrap process?  

>  or have done an earlier buildworld where
>  they explicitly disabled clang, so the intrinsics headers never get
>  installed.

Other than a custom kernel config file, I don't touch the system
build process.  



Respectfully,


Robert Huff


Re: "make buildworld" fails for r360785?

2020-05-09 Thread Dimitry Andric
On 9 May 2020, at 05:10, Robert Huff  wrote:
> 
> Chris writes:
>>> "make buildworld" fails with:
...
 /usr/src/contrib/llvm-project/clang/lib/Basic/SourceManager.cpp:1228:10:
 fatal error: 'emmintrin.h' file not found
 #include <emmintrin.h>
  ^
...
> In file included from 
> /usr/src/contrib/llvm-project/clang/lib/Basic/SourceManager.cpp:1228:
> In file included from /usr/include/emmintrin.h:13:
> /usr/include/xmmintrin.h:27:10: fatal error: 'mm_malloc.h' file not found
> #include <mm_malloc.h>
> ^
> 1 error generated.
> *** Error code 1

During which stage of buildworld is this? If it is during the
cross-tools stage, your host environment is busted somehow.

This appears to happen to some people on this list, for unknown reasons.
My guess is they either run "make delete-old" before running buildworld
(which is the wrong order!), or have done an earlier buildworld where
they explicitly disabled clang, so the intrinsics headers never get
installed.

-Dimitry





Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

2020-05-09 Thread Mark Millard
[I caused nfsd to have things shifted in memory somewhat, to
see if it tracked content vs. page boundary for where the
zeros stop. Non-nfsd examples omitted.]

> . . .
>> nfsd hit an assert, failing ret == sz_size2index_compute(size)
> 
> [Correction: That should have referenced sz_index2size_lookup(index).]
> 
>> (also, but a different caller of sz_size2index):
> 
> [Correction: The "also" comment should be ignored:
> sz_index2size_lookup(index) is referenced below.]
> 
>> 
>> (gdb) bt
>> #0  thr_kill () at thr_kill.S:4
>> #1  0x502b2170 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
>> #2  0x50211cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
>> #3  0x50206104 in sz_index2size_lookup (index=) at 
>> /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
>> #4  sz_index2size (index=) at 
>> /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207
>> #5  ifree (tsd=0x50094018, ptr=0x50041028, tcache=0x50094138, 
>> slow_path=) at jemalloc_jemalloc.c:2583
>> #6  0x50205cac in __je_free_default (ptr=0x50041028) at 
>> jemalloc_jemalloc.c:2784
>> #7  0x50206294 in __free (ptr=0x50041028) at jemalloc_jemalloc.c:2852
>> #8  0x50287ec8 in ns_src_free (src=0x50329004, srclistsize=) 
>> at /usr/src/lib/libc/net/nsdispatch.c:452
>> #9  ns_dbt_free (dbt=0x50329000) at /usr/src/lib/libc/net/nsdispatch.c:436
>> #10 vector_free (vec=0x50329000, count=, esize=12, 
>> free_elem=) at /usr/src/lib/libc/net/nsdispatch.c:253
>> #11 nss_atexit () at /usr/src/lib/libc/net/nsdispatch.c:578
>> #12 0x5028d958 in __cxa_finalize (dso=0x0) at 
>> /usr/src/lib/libc/stdlib/atexit.c:240
>> #13 0x502117f8 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:74
>> #14 0x10013f9c in child_cleanup (signo=) at 
>> /usr/src/usr.sbin/nfsd/nfsd.c:969
>> #15 
>> #16 0x in ?? ()
>> 
>> (gdb) up 3
>> #3  0x50206104 in sz_index2size_lookup (index=) at 
>> /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
>> 200  assert(ret == sz_index2size_compute(index));
>> 
>> (ret is optimized out.)
>> 
>> 197  JEMALLOC_ALWAYS_INLINE size_t
>> 198  sz_index2size_lookup(szind_t index) {
>> 199  size_t ret = (size_t)sz_index2size_tab[index];
>> 200  assert(ret == sz_index2size_compute(index));
>> 201  return ret;
>> 202  }
> 
> (gdb) print/x __je_sz_index2size_tab
> $3 = {0x0 }
> 
> Also:
> 
> (gdb) x/4x __je_arenas+16368/4
> 0x5030cab0 <__je_arenas+16368>:   0x  0x  0x  0x
> (gdb) print/x __je_arenas_lock
>  
> $8 = {{{prof_data = {tot_wait_time = {ns = 0x0}, max_wait_time = {ns = 0x0}, 
> n_wait_times = 0x0, n_spin_acquired = 0x0, max_n_thds = 0x0, n_waiting_thds = 
> {repr = 0x0}, n_owner_switches = 0x0, 
>   prev_owner = 0x0, n_lock_ops = 0x0}, lock = 0x0, postponed_next = 0x0, 
> locked = {repr = 0x0}}}, witness = {name = 0x0, rank = 0x0, comp = 0x0, 
> opaque = 0x0, link = {qre_next = 0x0, 
> qre_prev = 0x0}}, lock_order = 0x0}
> (gdb) print/x __je_narenas_auto
> $9 = 0x0
> (gdb) print/x malloc_conf  
> $10 = 0x0
> (gdb) print/x __je_ncpus 
> $11 = 0x0
> (gdb) print/x __je_manual_arena_base
> $12 = 0x0
> (gdb) print/x __je_sz_pind2sz_tab   
> $13 = {0x0 }
> (gdb) print/x __je_sz_size2index_tab
> $1 = {0x0 , 0x1a, 0x1b , 0x1c <repeats 64 times>}
> 
>> Booting and immediately trying something like:
>> 
>> service nfsd stop
>> 
>> did not lead to a failure. But may be after
>> a while it would and be less drastic than a
>> reboot or power down.
> 
> More detail:
> 
> So, for rpcbind and nfsd at some point a large part of
> __je_sz_size2index_tab is being stomped on, as is all of
> __je_sz_index2size_tab and more.
> 
> . . .
> 
> For nfsd, it is similar (again showing the partially
> non-zero live process context instead of the all-zeros
> from the .core file):
> 
> 0x5030cab0 <__je_arenas+16368>:   0x  0x  0x  0x0009
> 0x5030cac0 <__je_arenas_lock>:    0x  0x  0x  0x
> 0x5030cad0 <__je_arenas_lock+16>: 0x  0x  0x  0x
> 0x5030cae0 <__je_arenas_lock+32>: 0x  0x  0x  0x
> 0x5030caf0 <__je_arenas_lock+48>: 0x  0x  0x  0x
> 0x5030cb00 <__je_arenas_lock+64>: 0x  0x502ff070  0x  0x
> 0x5030cb10 <__je_arenas_lock+80>: 0x500ebb04  0x0003  0x  0x
> 0x5030cb20 <__je_arenas_lock+96>: 0x5030cb10  0x5030cb10  0x  0x
> 
> Then the memory in the crash continues to be zero until:
> 
> 0x5030d000 <__je_sz_size2index_tab+384>:  0x1a1b1b1b  0x1b1b1b1b  0x1b1b1b1b  0x1b1b1b1b
> 
> Notice the interesting page boundary for where non-zero
> is first available again!
>