Re: Traffic "corruption" in 12-stable

2020-07-26 Thread Eugene Grosbein
27.07.2020 5:16, Joe Clarke wrote:

> About two weeks ago, I upgraded from the latest 11-stable to the latest 
> 12-stable.  After that, I periodically see the network throughput come to a 
> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  It 
> acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It runs 
> ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a tap0 L2 
> VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses 
> the default 1500.
> 
> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN ping 
> times), I know the problem has occurred because my lldpd reports:
> 
> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on 
> bridge0
> 
> And if I turn on ipfw verbose messages, I see tons of:
> 
> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
> 
> This leads me to believe packets are being corrupted on ingress.  I’ve 
> applied all the recent iflib changes, but the problem persists. What causes 
> it, I don’t know.
> 
> The only thing that changed (and yes, it’s a big one) is I upgraded to 
> 12-stable.  Meaning, the rest of the network infra and topology has remained 
> the same.  This did not happen at all in 11-stable.
> 
> I’m open to suggestions.

First, try: ifconfig $ifname -rxcsum -txcsum
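
A minimal sketch of applying that to both interfaces named in the report
(vmx0 and vmx1). The helper name print_csum_off is hypothetical, and the
script only prints the commands so they can be reviewed before being run:

```shell
#!/bin/sh
# Dry run: print one ifconfig command per interface instead of executing it.
# vmx0 and vmx1 are the interface names from the original report;
# print_csum_off is a hypothetical helper name, not an existing tool.
print_csum_off() {
    for ifname in "$@"; do
        printf 'ifconfig %s -rxcsum -txcsum\n' "$ifname"
    done
}

print_csum_off vmx0 vmx1
```

Piping the output to sh as root would apply the change. The setting lasts
only until the next reboot unless the same flags are added to the
interface's ifconfig line in rc.conf.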

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Traffic "corruption" in 12-stable

2020-07-26 Thread Joe Clarke
About two weeks ago, I upgraded from the latest 11-stable to the latest 
12-stable.  After that, I periodically see the network throughput come to a 
near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  It 
acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It runs 
ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a tap0 L2 
VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses 
the default 1500.

Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN ping 
times), I know the problem has occurred because my lldpd reports:

Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on bridge0

And if I turn on ipfw verbose messages, I see tons of:

Jul 26 16:02:23 namale kernel: ipfw: pullup failed

This leads me to believe packets are being corrupted on ingress.  I’ve 
applied all the recent iflib changes, but the problem persists. What causes it, 
I don’t know.

The only thing that changed (and yes, it’s a big one) is I upgraded to 
12-stable.  Meaning, the rest of the network infra and topology has remained 
the same.  This did not happen at all in 11-stable.

I’m open to suggestions.

Thanks.

Joe

---
PGP Key : http://www.marcuscom.com/pgp.asc






Re: Laundry

2020-07-26 Thread Doug Hardie
> On 26 July 2020, at 13:31, Konstantin Belousov wrote:
> 
> On Sun, Jul 26, 2020 at 01:11:33PM -0700, Doug Hardie wrote:
>> I have a production system (12.1-RELEASE-p6) that is showing around 1 GB of 
>> Laundry pages.  There are over 6 GB Inact and 1 GB free.  I can understand 
>> why the system would want to not prioritize laundering those pages as there 
>> are plenty of available pages.  However, does that mean that I have about 1 
>> GB of updated files that have not been written back to disk?  If so, then 
>> there is a significant issue with power failures and loss of data.
>> 
> Laundry keeps both file-backed (named) pages and swap-backed (anonymous)
> pages. Most likely it means that you have 1G of anonymous dirty
> mappings, for instance program data/bss and malloc'ed memory.

I don't believe there are very many anonymous pages, but there are lots of named 
pages.  If those are dirty, does that mean they have not yet been written back 
to disk?  The loss of those would be quite detrimental if not written back to 
disk.

-- Doug



Re: Laundry

2020-07-26 Thread Konstantin Belousov
On Sun, Jul 26, 2020 at 01:11:33PM -0700, Doug Hardie wrote:
> I have a production system (12.1-RELEASE-p6) that is showing around 1 GB of 
> Laundry pages.  There are over 6 GB Inact and 1 GB free.  I can understand 
> why the system would want to not prioritize laundering those pages as there 
> are plenty of available pages.  However, does that mean that I have about 1 GB 
> of updated files that have not been written back to disk?  If so, then there 
> is a significant issue with power failures and loss of data.
> 
Laundry keeps both file-backed (named) pages and swap-backed (anonymous)
pages. Most likely it means that you have 1G of anonymous dirty
mappings, for instance program data/bss and malloc'ed memory.
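
For cross-checking the Laundry figure itself, a rough sketch follows. It
assumes the vm.stats.vm.v_laundry_count and hw.pagesize sysctls are present
on the running kernel (worth confirming with `sysctl vm.stats.vm`), and
pages_to_mib is a hypothetical helper name:

```shell
#!/bin/sh
# pages_to_mib (hypothetical helper): page count * page size, in whole MiB.
pages_to_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

# Only query the kernel on FreeBSD. The sysctl names are an assumption
# about what the running kernel exposes; verify them before relying on this.
if [ "$(uname -s)" = "FreeBSD" ]; then
    pages=$(sysctl -n vm.stats.vm.v_laundry_count)
    pagesize=$(sysctl -n hw.pagesize)
    echo "laundry: $(pages_to_mib "$pages" "$pagesize") MiB"
fi
```

This reproduces only the total. It does not show how much of the laundry
queue is named versus anonymous.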


Laundry

2020-07-26 Thread Doug Hardie
I have a production system (12.1-RELEASE-p6) that is showing around 1 GB of 
Laundry pages.  There are over 6 GB Inact and 1 GB free.  I can understand why 
the system would want to not prioritize laundering those pages as there are 
plenty of available pages.  However, does that mean that I have about 1 GB of 
updated files that have not been written back to disk?  If so, then there is a 
significant issue with power failures and loss of data.

-- Doug



Crash in stable 363430 and higher

2020-07-26 Thread peter . blok
Hi,

I’m getting the following crash during startup. It seems strongswan is setting 
a reqid.

Commit r363430 is on if_bridge. The IPSec interfaces are not bridged at all, so 
I have no idea why this crash would relate to this commit. The only commonality 
is that both the crash and the commit are epoch-related.

(kgdb) list
418      * Propagate our priority to any other waiters to prevent us
419      * from starving them. They will have their original priority
420      * restore on exit from epoch_wait().
421      */
422     curwaittd = tdwait->et_td;
423     if (!TD_IS_INHIBITED(curwaittd) && curwaittd->td_priority > td->td_priority) {
424         critical_enter();
425         thread_unlock(td);
426         thread_lock(curwaittd);
427         sched_prio(curwaittd, td->td_priority);
(kgdb) p/x tdwait
$3 = 0xfe0075dca778
(kgdb) p/x tdwait->et_td
$4 = 0x806
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0x8064d335 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3  0x8064d773 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
#4  0x8064d593 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:807
#5  0x809cc3d1 in trap_fatal (frame=0xfe00c8a0e6f0, eva=3094) at /usr/src/sys/amd64/amd64/trap.c:925
#6  0x809cc42f in trap_pfault (frame=0xfe00c8a0e6f0, usermode=, signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:743
#7  0x809cba76 in trap (frame=0xfe00c8a0e6f0) at /usr/src/sys/amd64/amd64/trap.c:407
#8  
#9  epoch_block_handler_preempt (global=, cr=, arg=) at /usr/src/sys/kern/subr_epoch.c:423
#10 0x803677fd in epoch_block (global=0xf800020be600, cr=0xfe0075db9a00, cb=0x80692320 , ct=0x0) at /usr/src/sys/contrib/ck/src/ck_epoch.c:416
#11 ck_epoch_synchronize_wait (global=0xf800020be600, cb=, ct=) at /usr/src/sys/contrib/ck/src/ck_epoch.c:465
#12 0x806921da in epoch_wait_preempt (epoch=0xf800020be600) at /usr/src/sys/kern/subr_epoch.c:513
#13 0x80761687 in ipsec_set_reqid (sc=0xf8004261e200, reqid=103) at /usr/src/sys/net/if_ipsec.c:964
#14 ipsec_ioctl (ifp=, cmd=, data=) at /usr/src/sys/net/if_ipsec.c:764
#15 0x807527ef in ifioctl (so=0xf8011d766000, cmd=2149607841, data=0xfe00c8a0ea10 "btcd", td=) at /usr/src/sys/net/if.c:3147
#16 0x806b5f47 in fo_ioctl (fp=0xf800194846e0, com=2149607841, data=0x0, active_cred=0x0, td=0xf80122379740) at /usr/src/sys/sys/file.h:337
#17 kern_ioctl (td=0x80692320 , fd=, com=2149607841, data=0x0) at /usr/src/sys/kern/sys_generic.c:805
#18 0x806b5bea in sys_ioctl (td=0xf80122379740, uap=0xf80122379b00) at /usr/src/sys/kern/sys_generic.c:713
#19 0x809ccf87 in syscallenter (td=0xf80122379740) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#20 amd64_syscall (td=0xf80122379740, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1167
#21 
#22 0x00080044e0da in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffe1a8

Any pointers?


11-STABLE build failure in gnu/usr.bin/binutils/ld on recent -CURRENT

2020-07-26 Thread Don Lewis
I ran into another problem updating my 11-STABLE poudriere jails on my
package build machine, which runs a fairly recent version of -CURRENT.

If I try to cross build:

  -O2 -pipe -DBFD_DEFAULT_TARGET_SIZE=64 -I.
  -I/tmp/src11/gnu/usr.bin/binutils/ld
  -I/tmp/src11/gnu/usr.bin/binutils/ld/../libbfd
  -I/usr/obj/tmp/src11/gnu/usr.bin/binutils/ld/../libbfd
  -I/tmp/src11/gnu/usr.bin/binutils/ld/../../../../contrib/binutils/include
  -DTARGET=\"x86_64-unknown-freebsd\" -DDEFAULT_EMULATION=\"elf_x86_64_fbsd\"
  -DSCRIPTDIR=\"/usr/libdata\"
  -DBFD_VERSION_STRING=\""2.17.50 [FreeBSD] 2007-07-03"\"
  -DBINDIR=\"/usr/bin\" -DTARGET_SYSTEM_ROOT=\"/\"
  -DTOOLBINDIR=\"//usr/bin/libexec\" -D_GNU_SOURCE
  -I/tmp/src11/gnu/usr.bin/binutils/ld/../../../../contrib/binutils/ld
  -I/tmp/src11/gnu/usr.bin/binutils/ld/../../../../contrib/binutils/bfd
  -g -MD  -MF.depend.ldlex.o -MTldlex.o -std=gnu99 -fstack-protector-strong
  -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter
  -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized
  -Wno-pointer-sign -Wno-empty-body -Wno-string-plus-int
  -Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value
  -Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion
  -Wno-unused-local-typedef -Wno-address-of-packed-member
  -Qunused-arguments  -c ldlex.c -o ldlex.o
ldlex.c:3216:3: error: incompatible pointer types passing 'int *' to parameter
  of type 'yy_size_t *' (aka 'unsigned long *')
  [-Werror,-Wincompatible-pointer-types]
  ...YY_INPUT( (_CURRENT_BUFFER_LVALUE->yy_ch_buf[number_to_move]),
 ^
/tmp/src11/gnu/usr.bin/binutils/ld/../../../../contrib/binutils/ld/ldlex.l:64:54: note:
  expanded from macro 'YY_INPUT'
#define YY_INPUT(buf,result,max_size) yy_input (buf, &result, max_size)
 ^~~
/tmp/src11/gnu/usr.bin/binutils/ld/../../../../contrib/binutils/ld/ldlex.l:73:42: note:
  passing argument to parameter here
static void yy_input (char *, yy_size_t *, yy_size_t);
 ^
1 error generated.
*** Error code 1


The problem is that the skeleton defines yy_n_chars as type 'int'
instead of type 'yy_size_t'.  That's a bit of a puzzle because it is
defined as 'yy_size_t' in usr.bin/lex/initskel.c.

If I force lex to always be built as a bootstrap tool, then I get a
successful build, so it looks like the host version of lex is getting
used by default.

I think this is a new problem when the build host is -CURRENT. This
commit:
  
  r362333 | jkim | 2020-06-18 11:09:16 -0700 (Thu, 18 Jun 2020) | 4 lines
  
  MFV:  r362286
  
  Merge flex 2.6.4.
  
  
changes the type of yy_n_chars from 'yy_size_t' to 'int'.

I'm not sure what the best fix for this is.