[Bug 213370] Kernel panic with ng_tcpmss

2022-03-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213370

--- Comment #2 from Andrey  ---
Hello.
This may still be relevant for FreeBSD 10.3.
I stopped using FreeBSD after that.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: Erratic ping behavior, was Re: Pi3 answers ssh only if outbound ping is running on -current

2022-03-13 Thread Mark Millard
[The USB I/O problem.]

On 2022-Mar-13, at 13:46, Mark Millard  wrote:

>> FreeBSD pelorus.zefox.org 13.1-STABLE FreeBSD 13.1-STABLE #24 
>> stable/13-n249989-b85d0d603c5: Sat Mar 12 17:47:19 PST 2022
>> b...@pelorus.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64
> 
>> . . .
>> 
>> Unfortunately, new console errors have appeared:
>> 
>> bob@pelorus:~ % (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 1f ea a3 c0 00 
>> 00 40 00 
>> (da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
>> (da0:umass-sim0:0:0:0): SCSI status: Check Condition
>> (da0:umass-sim0:0:0:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information 
>> unit iuCRC error detected)
>> (da0:umass-sim0:0:0:0): Retrying command (per sense data)

Quoting part of a forum reply:

QUOTE
That error translates to "INFORMATION UNIT iuCRC ERROR DETECTED" (cut and 
pasted from the SCSI standard), which means an error in data transmission 
between the disk and the controller, not an error on the disk.
END QUOTE
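
The additional sense code and its qualifier can be pulled out of such console lines mechanically, which helps when counting how often the error recurs. This is only a sketch: the helper name `sense_codes` is made up here, and it assumes the `asc:XX,Y` form shown in the log above.

```shell
#!/bin/sh
# sense_codes (hypothetical helper): read CAM console lines on stdin and
# print "ASC ASCQ" for every line that carries an "asc:XX,Y" field.
sense_codes() {
  sed -n 's/.*asc:\([0-9a-fA-F]*\),\([0-9a-fA-F]*\).*/\1 \2/p'
}
```

Something like `dmesg | sense_codes | sort | uniq -c` would then give a count per code pair; for the error quoted above the pair is `47 3`.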

You report: stable/13-n249989-b85d0d603c5 shows the problem and
later that 13-n249985-dd6c1475a63 does not. That is odd because
the only stable/13 activity after dd6c1475a63 is (newer to older):

git: b85d0d603c57 - stable/13 - tslog.4: Document TSLOG Mateusz Piotrowski 
git: 85379a47c4cb - stable/13 - time.3: Update ERRORS section Mateusz 
Piotrowski 
git: 991c0e3ddb93 - stable/13 - ctime.3: Add a cross-reference to 
clock_gettime(2) Mateusz Piotrowski 
git: bd2e56ef47d5 - stable/13 - zfs: merge openzfs/zfs_at_ef83e07db 
(zfs-2.1-release) into stable/13 Martin Matuska 

In other words: updates to man pages and a zfs update.
As I understand, you do not use zfs.
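
As a cross-check on the size of that range, the two version strings can be compared directly. The sketch below assumes that the `nNNNNNN` field is a commit count stamped in at build time and that both builds come from the same branch; the helper names are made up for illustration.

```shell
#!/bin/sh
# commit_count (hypothetical helper): extract the number after "-n" from a
# version string such as "stable/13-n249989-b85d0d603c5".
commit_count() {
  echo "$1" | sed -n 's/.*-n\([0-9][0-9]*\)-.*/\1/p'
}

# commits_between GOOD BAD: how many commits separate the two builds,
# assuming both strings were produced on the same branch.
commits_between() {
  echo $(( $(commit_count "$2") - $(commit_count "$1") ))
}

commits_between 13-n249985-dd6c1475a63 stable/13-n249989-b85d0d603c5
```

The difference is 4, which agrees with the four commits listed above.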

The next commit back is the dd6c1475a63 that you report as working:

git: dd6c1475a63a - stable/13 - Add support for getting early entropy from UEFI 
Colin Percival 

A possibility might be a poor connection that was later
reconnected (with better contact)? Brownout power
conditions?

It seems odd for zfs to be involved but man page changes
have no chance of being involved. It tends to suggest a
non-software issue at the time.

> . . .
> 
>> The error message repeated at intervals, seemingly linked to disk activity,
>> with the machine running out of retries after starting buildworld. 
>> That seems like progress in reverse 8-)
> 
> So you have bounds for a bisect to find where the USB
> failures start happening on your RPi3, if you want to
> investigate that.

Looks like this idea was a bust, given the narrow range
and the content involved in that range.

>> Reverting to 13-n249985-dd6c1475a63: Fri Mar 11 18:06:31 PST 2022
>> permitted recovery, with the old problems attendant.
>> 
>> -current now has some usb testing tools:
>> root@www:/usr/src # ls tools/tools/usbtest
>> Makefile         usb_control_ep_test.c   usb_msc_test.c  usbtest.c
>> Makefile.depend  usb_modem_test.c        usb_msc_test.h  usbtest.h
>> 
>> Is there any guidance on what they do? I couldn't find a man page.
>> 
> 
> 



===
Mark Millard
marklmi at yahoo.com




Problem reports for n...@freebsd.org that need special attention

2022-03-13 Thread bugzilla-noreply
To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      | Bug Id | Description
------------+--------+---------------------------------------------------
In Progress | 221146 | [ixgbe] Problem with second laggport
New         | 204438 | setsockopt() handling of kern.ipc.maxsockbuf limi
New         | 213410 | [carp] service netif restart causes hang only whe
Open        |   7556 | ppp: sl_compress_init() will fail if called anyth
Open        | 166724 | if_re(4): watchdog timeout
Open        | 193452 | Dell PowerEdge 210 II -- Kernel panic bce (broadc
Open        | 202510 | [CARP] advertisements sourced from CARP IP cause
Open        | 207261 | netmap: Doesn't do TX sync with kqueue
Open        |     73 | igb(4): Kernel panic (fatal trap 12) due to netwo
Open        | 225438 | panic in6_unlink_ifa() due to race
Open        | 227720 | Kernel panic in ppp server
Open        | 230807 | if_alc(4): Driver not working for Killer Networki
Open        | 236888 | ppp daemon: Allow MTU to be overridden for PPPoE
Open        | 236983 | bnxt(4): VLAN not operational unless explicit "if
Open        | 237072 | netgraph(4): performance issue [on HardenedBSD]?
Open        | 237840 | Removed dummynet dependency on ipfw
Open        | 238324 | Add XG-C100C/AQtion AQC107 10GbE NIC driver
Open        | 238707 | Lock order reversal: rtentry vs "nd6 list"
Open        | 240944 | em(4): Crash with Intel 82571EB NIC with AMD Pile
Open        | 241106 | tun/ppp: panic: vm_fault: fault on nofault entry
Open        | 241162 | Panic in closefp() triggered by nginx (uwsgi with
Open        | 241191 | route flush panic with RADIX_MPATH
Open        | 243463 | ix0: Watchdog timeout
Open        | 247111 | pxeboot very slow with i219LM
Open        | 257709 | netinet6: Set net.inet6.icmp6.nodeinfo default to
Open        | 118111 | rc: network.subr Add MAC address based interface

26 problems total for which you should take action.


Re: Erratic ping behavior, was Re: Pi3 answers ssh only if outbound ping is running on -current

2022-03-13 Thread Mark Millard
On 2022-Mar-13, at 12:34, bob prohaska  wrote:

> It looks as if things are somewhat altered, but not really
> improved,  after updating to 
> 
> FreeBSD pelorus.zefox.org 13.1-STABLE FreeBSD 13.1-STABLE #24 
> stable/13-n249989-b85d0d603c5: Sat Mar 12 17:47:19 PST 2022 
> b...@pelorus.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64

That is still your build and your configuration adjustments
for your general use -- on your usual media.

Are you ever going to test the networking behavior of a snapshot
image and/or, now, BETA image, installed to a fresh, independent
microsd card that is used to boot, no other media attached? The
point being: not your build, not your configuration, minimal
adjustment to allow the basic testing. (The system does not need
to be able to support your general use, just the network testing.)

If you want the problem worked on, folks will need an identified
configuration that they can reproduce the problem with. Your
builds and full set of configuration adjustments likely can not
be involved in that.

> on the Pi3 with USB root disk (hdd, not ssd).
> 
> After reboot the machine answered around 30 % of incoming pings and
> sometimes responded to ssh login attempts. At least once the login
> attempt was successful, even with no outgoing ping process running.
> That seems like progress.
> 
> With an outgoing ping running it's possible to ssh into the machine,
> with a highly variable delay for the password prompt.
> 
> Unfortunately, new console errors have appeared:
> 
> bob@pelorus:~ % (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 1f ea a3 c0 00 
> 00 40 00 
> (da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
> (da0:umass-sim0:0:0:0): SCSI status: Check Condition
> (da0:umass-sim0:0:0:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information 
> unit iuCRC error detected)
> (da0:umass-sim0:0:0:0): Retrying command (per sense data)

Note that no USB drive would be involved in the type of
testing that I've asked for. This sort of issue would not
interfere with the proposed testing.

> The error message repeated at intervals, seemingly linked to disk activity,
> with the machine running out of retries after starting buildworld. 
> That seems like progress in reverse 8-)

So you have bounds for a bisect to find where the USB
failures start happening on your RPi3, if you want to
investigate that.

> Reverting to 13-n249985-dd6c1475a63: Fri Mar 11 18:06:31 PST 2022
> permitted recovery, with the old problems attendant.
> 
> -current now has some usb testing tools:
> root@www:/usr/src # ls tools/tools/usbtest
> Makefile         usb_control_ep_test.c   usb_msc_test.c  usbtest.c
> Makefile.depend  usb_modem_test.c        usb_msc_test.h  usbtest.h
> 
> Is there any guidance on what they do? I couldn't find a man page.
> 




===
Mark Millard
marklmi at yahoo.com




Re: Erratic ping behavior, was Re: Pi3 answers ssh only if outbound ping is running on -current

2022-03-13 Thread bob prohaska
It looks as if things are somewhat altered, but not really
improved,  after updating to 

FreeBSD pelorus.zefox.org 13.1-STABLE FreeBSD 13.1-STABLE #24 
stable/13-n249989-b85d0d603c5: Sat Mar 12 17:47:19 PST 2022 
b...@pelorus.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64

on the Pi3 with USB root disk (hdd, not ssd).

After reboot the machine answered around 30 % of incoming pings and
sometimes responded to ssh login attempts. At least once the login
attempt was successful, even with no outgoing ping process running.
That seems like progress.

With an outgoing ping running it's possible to ssh into the machine,
with a highly variable delay for the password prompt.

Unfortunately, new console errors have appeared:

bob@pelorus:~ % (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 1f ea a3 c0 00 00 40 00
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
(da0:umass-sim0:0:0:0): Retrying command (per sense data)
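
For anyone who wants to see which part of the disk the failing write was aimed at, the CDB bytes can be decoded by hand. The sketch below assumes the usual 10-byte READ/WRITE layout (opcode, flags, four big-endian LBA bytes, group, two big-endian length bytes, control); the helper name is made up for illustration.

```shell
#!/bin/sh
# decode_rw10 (hypothetical helper): take the ten CDB bytes of a
# READ(10)/WRITE(10) command as arguments and print the starting logical
# block address and the transfer length in blocks.
decode_rw10() {
  # Arguments 3-6 are CDB bytes 2-5 (LBA), arguments 8-9 are bytes 7-8 (length).
  lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
  len=$(( (0x$8 << 8) | 0x$9 ))
  echo "lba=$lba blocks=$len"
}

decode_rw10 2a 00 1f ea a3 c0 00 00 40 00
```

For the CDB above this gives block 535471040 and a length of 64 blocks, which would be 32 KiB if the disk reports 512-byte sectors.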

The error message repeated at intervals, seemingly linked to disk activity,
with the machine running out of retries after starting buildworld. 
That seems like progress in reverse 8-)

Reverting to 13-n249985-dd6c1475a63: Fri Mar 11 18:06:31 PST 2022
permitted recovery, with the old problems attendant.
 
-current now has some usb testing tools:
root@www:/usr/src # ls tools/tools/usbtest
Makefile         usb_control_ep_test.c   usb_msc_test.c  usbtest.c
Makefile.depend  usb_modem_test.c        usb_msc_test.h  usbtest.h

Is there any guidance on what they do? I couldn't find a man page.

Thanks for reading,

bob prohaska




Re: epair and vnet jail loose connection.

2022-03-13 Thread Bjoern A. Zeeb

On 13 Mar 2022, at 17:45, Michael Gmelin wrote:

>> On 13. Mar 2022, at 18:16, Bjoern A. Zeeb wrote:
>>
>> On 13 Mar 2022, at 16:33, Michael Gmelin wrote:
>>> It's important to point out that this only happens with kern.ncpu>1.
>>> With kern.ncpu==1 nothing gets stuck.
>>>
>>> This perfectly fits into the picture, since, as pointed out by Johan,
>>> the first commit that is affected[0] is about multicore support.
>>
>> Ignore my ignorance, what is the default of net.isr.maxthreads and
>> net.isr.bindthreads (in stable/13) these days?
>
> My tests were on CURRENT and I’m afk, but according to cgit[0][1],
> max is 1 and bind is 0.
>
> Would it make sense to repeat the test with max=-1?

I’d say yes, I’d also bind, but that’s just me.

I would almost assume Kristof running with -1 by default (but he can
chime in on that).

> Best
> Michael
>
> [0] https://cgit.freebsd.org/src/tree/sys/net/netisr.c#n280
> [1] https://cgit.freebsd.org/src/tree/sys/net/netisr.c?h=stable%2F13#n280




Re: epair and vnet jail loose connection.

2022-03-13 Thread Michael Gmelin



> On 13. Mar 2022, at 18:16, Bjoern A. Zeeb  
> wrote:
> 
> On 13 Mar 2022, at 16:33, Michael Gmelin wrote:
>> It's important to point out that this only happens with kern.ncpu>1.
>> With kern.ncpu==1 nothing gets stuck.
>> 
>> This perfectly fits into the picture, since, as pointed out by Johan,
>> the first commit that is affected[0] is about multicore support.
> 
> Ignore my ignorance, what is the default of net.isr.maxthreads and 
> net.isr.bindthreads (in stable/13) these days?
> 

My tests were on CURRENT and I’m afk, but according to cgit[0][1], max is 1 and 
bind is 0.

Would it make sense to repeat the test with max=-1?

Best
Michael

[0] https://cgit.freebsd.org/src/tree/sys/net/netisr.c#n280
[1] https://cgit.freebsd.org/src/tree/sys/net/netisr.c?h=stable%2F13#n280
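
For repeating the test with other netisr settings, something like the following could be used. It is a sketch, not tested here: it assumes both knobs are boot-time tunables that belong in /boot/loader.conf, that -1 asks for one netisr thread per CPU, and the helper name is made up.

```shell
#!/bin/sh
# netisr_tunables MAX BIND (hypothetical helper): print loader.conf lines
# for the two netisr knobs discussed above.
netisr_tunables() {
  printf 'net.isr.maxthreads="%s"\n' "$1"
  printf 'net.isr.bindthreads="%s"\n' "$2"
}

# Lines for "max=-1, bound"; add them to /boot/loader.conf by hand and reboot.
netisr_tunables -1 1
```

After the reboot, `sysctl net.isr.maxthreads net.isr.bindthreads` shows the values that are actually in effect.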



Re: epair and vnet jail loose connection.

2022-03-13 Thread Bjoern A. Zeeb

On 13 Mar 2022, at 16:33, Michael Gmelin wrote:

> It's important to point out that this only happens with kern.ncpu>1.
> With kern.ncpu==1 nothing gets stuck.
>
> This perfectly fits into the picture, since, as pointed out by Johan,
> the first commit that is affected[0] is about multicore support.

Ignore my ignorance, what is the default of net.isr.maxthreads and
net.isr.bindthreads (in stable/13) these days?


/bz



Re: epair and vnet jail loose connection.

2022-03-13 Thread Michael Gmelin



On Sun, 13 Mar 2022 14:32:50 +0100
Johan Hendriks  wrote:

> On 13/03/2022 14:06, Patrick M. Hausen wrote:
> > Hi all,
> >
> > i was a bit puzzled by Michael using bhyve trying to reproduce.
> > Up until now I thought bhyve uses tap and not epair?
> >
> > Anyway ...
> >  
> >> On 13.03.2022 at 14:01, Johan Hendriks wrote:
> >> I have no idea why it does not work on
> >> my setup, which is nothing out of the ordinary i think, basic full
> >> jails connected to a bridge interface and one of them exposed to
> >> the world wide web using pf binat.  
> > What we do is full exposed VNET jails connected to the bridge
> > on the external interface of the host. ipfw kernel module loaded
> > but not used in this case, i.e. only the "default to accept" rule
> > active in the jails.
> >
> > I will probably downgrade the production host from 13.1-PRERELEASE
> > to 13.0-pX tomorrow and see if that changes anything.
> >
> > Kind regards,
> > Patrick  
> Downgrading to 13.0-p7 worked for me, it even works on 13.0-STABLE
> till this commit 18 days ago.
> https://freshbsd.org/freebsd/src/commit/2e0bee4c7f8176e0f8396c9389275745bac1e263
> 
> After that commit my setup stops working.
> 

@all Johan gave me access to a test system where I could see the
problem in action. There's nothing wrong with his config in respect to
the issue at hand.

I tried a few times more on my smaller test setup and I could reproduce
the issue there now as well (with ncpu=2).

I created a reduced test case that triggers the issue every time. It's
assumed to be run on a dedicated vm or host. It doesn't require pf,
bridges, tuning sysctl.conf, or any other special considerations.

/etc/rc.conf is very basic/vanilla:

hostname="johan"
ifconfig_vtnet0="10.1.1.16/24"
defaultrouter="10.1.1.1"
gateway_enable="YES"

sshd_enable="YES"
dumpdev="NO"
zfs_enable="YES"
sendmail_enable="NONE"

Script to test/reproduce:

#!/bin/sh

export PATH=/usr/local/bin:"$PATH"

jname="tj"
ename="epair_$jname"

set -e

echo "> Install packages"
pkg install -y haproxy hey

echo "> Remove some leftovers"
(
  killall hey || true
  jail -r "$jname" || true
  ifconfig "$ename" destroy || true
) 2>/dev/null

sleep 1

echo "> Create interfaces"
intf=$(ifconfig epair create)
jintf=$(echo "$intf" | sed "s|a$|b|")

ifconfig "$intf" name "$ename"
ifconfig "$ename" 10.233.185.1/24

echo "> Create and start jail"
jail -c vnet name="$jname" persist path=/ \
  host.hostname="$jname" vnet.interface="$jintf" 

jexec "$jname" ifconfig lo0 127.0.0.1/8
jexec "$jname" ifconfig "$jintf" 10.233.185.2/24 up
jexec "$jname" route add default 10.233.185.1

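cat >/tmp/haproxy.conf<
# [... the rest of this command and the lines up to the next echo were lost in the archive ...]

echo "> Start hey instances"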
hey -h2 -n 10 -c 10 -z 300s http://10.233.185.2;
hey -h2 -n 10 -c 10 -z 300s http://10.233.185.2;
hey -h2 -n 10 -c 10 -z 300s http://10.233.185.2;

echo "> Ping jail"
ping 10.233.185.2

# EOF

This script can be called multiple times in a row (it tears down what
it created in previous runs).
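
The teardown step relies on a common idiom: under `set -e` a failing command would end the script, so every cleanup command is followed by `|| true`. A minimal sketch of the same idea as a reusable function (the name `best_effort` is made up for illustration):

```shell
#!/bin/sh
set -e

# best_effort (hypothetical helper): run a command, discard its error
# output, and never let a failure abort a "set -e" script.
best_effort() {
  "$@" 2>/dev/null || true
}

best_effort false             # fails, but the script keeps going
best_effort ls /nonexistent   # the error message is suppressed
echo "cleanup done"
```

On a first run nothing exists yet, so all cleanup commands fail and the script still proceeds to the setup part.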

Now, testing with this script, I get:

> Install packages
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The most recent versions of packages are already installed
> Remove some leftovers
tj: removed
> Create interfaces
epair_tj
> Create and start jail
add net default: gateway 10.233.185.1
> Start hey instances
> Ping jail
PING 10.233.185.2 (10.233.185.2): 56 data bytes
64 bytes from 10.233.185.2: icmp_seq=0 ttl=64 time=0.076 ms
64 bytes from 10.233.185.2: icmp_seq=1 ttl=64 time=0.138 ms
64 bytes from 10.233.185.2: icmp_seq=2 ttl=64 time=0.086 ms
64 bytes from 10.233.185.2: icmp_seq=3 ttl=64 time=0.158 ms
64 bytes from 10.233.185.2: icmp_seq=4 ttl=64 time=0.081 ms
64 bytes from 10.233.185.2: icmp_seq=5 ttl=64 time=0.093 ms

At which point it gets stuck. The exact moment when this happens
differs between runs, but it happens every time on my test host and
always within a couple of seconds.
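
To compare runs it can help to record when the replies stop. The sketch below summarises ping output read from stdin; the helper name is made up, and it assumes the `icmp_seq=N` format shown above.

```shell
#!/bin/sh
# ping_summary (hypothetical helper): read ping(8) output on stdin and
# report how many replies arrived and the last icmp_seq that was answered.
ping_summary() {
  awk '
    /icmp_seq=/ {
      n++
      sub(/.*icmp_seq=/, "")   # drop everything up to the sequence number
      last = $1 + 0            # first field is now the sequence number
    }
    END { printf "replies=%d last_seq=%d\n", n, last }
  '
}
```

For example, `ping -c 30 10.233.185.2 | ping_summary` in place of the open-ended ping prints one line per run instead of the full reply list.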

It's important to point out that this only happens with kern.ncpu>1.
With kern.ncpu==1 nothing gets stuck.

This perfectly fits into the picture, since, as pointed out by Johan,
the first commit that is affected[0] is about multicore support.

Cheers
Michael


[0]
https://cgit.freebsd.org/src/commit/?id=24f0bfbad57b9c3cb9b543a60b2ba00e4812c286

-- 
Michael Gmelin



Re: epair and vnet jail loose connection.

2022-03-13 Thread Johan Hendriks



On 13/03/2022 14:06, Patrick M. Hausen wrote:
> Hi all,
>
> i was a bit puzzled by Michael using bhyve trying to reproduce.
> Up until now I thought bhyve uses tap and not epair?
>
> Anyway ...
>
>> On 13.03.2022 at 14:01, Johan Hendriks wrote:
>> I have no idea why it does not work on my setup, which is nothing out of the
>> ordinary i think, basic full jails connected to a bridge interface and one of
>> them exposed to the world wide web using pf binat.
>
> What we do is full exposed VNET jails connected to the bridge
> on the external interface of the host. ipfw kernel module loaded
> but not used in this case, i.e. only the "default to accept" rule active
> in the jails.
>
> I will probably downgrade the production host from 13.1-PRERELEASE
> to 13.0-pX tomorrow and see if that changes anything.
>
> Kind regards,
> Patrick

Downgrading to 13.0-p7 worked for me, it even works on 13.0-STABLE till
this commit 18 days ago.

https://freshbsd.org/freebsd/src/commit/2e0bee4c7f8176e0f8396c9389275745bac1e263

After that commit my setup stops working.

regards
Johan Hendriks




Re: epair and vnet jail loose connection.

2022-03-13 Thread Michael Gmelin



> On 13. Mar 2022, at 14:07, Patrick M. Hausen  wrote:
> 
> Hi all,
> 
> i was a bit puzzled by Michael using bhyve trying to reproduce.
> Up until now I thought bhyve uses tap and not epair?
> 

In my setup, FreeBSD 14 runs on a bhyve vm, hosting the jails, which use vnet.

Bare metal -> FreeBSD 13.0 -> bhyve -> FreeBSD Current -> vnet jails 
haproxy/web01

Replace bhyve with VMware, AWS, or a bare metal server to understand the setup.

The reason I’m doing this is:
1. I don’t want to update the bare metal host to a non-release version
2. Johan is running his setup inside a vm as well.

Best
Michael

> Anyway ...
> 
>> On 13.03.2022 at 14:01, Johan Hendriks wrote:
>> I have no idea why it does not work on my setup, which is nothing out of the 
>> ordinary i think, basic full jails connected to a bridge interface and one 
>> of them exposed to the world wide web using pf binat.
> 
> What we do is full exposed VNET jails connected to the bridge
> on the external interface of the host. ipfw kernel module loaded
> but not used in this case, i.e. only the "default to accept" rule active
> in the jails.
> 
> I will probably downgrade the production host from 13.1-PRERELEASE
> to 13.0-pX tomorrow and see if that changes anything.
> 
> Kind regards,
> Patrick
> -- 
> punkt.de GmbH
> Patrick M. Hausen
> .infrastructure
> 
> Kaiserallee 13a
> 76133 Karlsruhe
> 
> Tel. +49 721 9109500
> 
> https://infrastructure.punkt.de
> i...@punkt.de
> 
> AG Mannheim 108285
> Geschäftsführer: Jürgen Egeling, Daniel Lienert, Fabian Stein




Re: epair and vnet jail loose connection.

2022-03-13 Thread Kristof Provost



> On 13 Mar 2022, at 08:01, Johan Hendriks  wrote:
> 
> 
>> On 13/03/2022 13:37, Kristof Provost wrote:
>>> On 13 Mar 2022, at 5:26, Johan Hendriks wrote:
>>> Copied my haproxy and web01 jails to this machine and have the same problem.
>>> 
>> Do you mean you can or cannot reproduce it on the second machine?
> I have the same problem.
So it also fails on the second machine?

Simplify this setup. Attempt to reproduce with iperf3, then without pf, then 
without the bridge. 
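
A rough idea of what the iperf3 variant could look like is sketched below. It assumes iperf3 is installed on the host and is reachable inside the jail; the helper names are made up, and the jail name and address in the usage note are taken from Johan's earlier report. Setting DRY_RUN only prints the commands.

```shell
#!/bin/sh
# run: execute a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# iperf_repro JAIL ADDR [SECONDS] (hypothetical helper): start an iperf3
# server inside the jail, then push parallel TCP streams at it from the host.
iperf_repro() {
  run jexec "$1" iperf3 -s -D
  run iperf3 -c "$2" -t "${3:-300}" -P 10
}
```

For example, `iperf_repro haproxy 10.233.185.20 60` with a ping to the jail running in another terminal.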

Kristof
> 



Re: epair and vnet jail loose connection.

2022-03-13 Thread Patrick M. Hausen
Hi all,

i was a bit puzzled by Michael using bhyve trying to reproduce.
Up until now I thought bhyve uses tap and not epair?

Anyway ...

> On 13.03.2022 at 14:01, Johan Hendriks wrote:
> I have no idea why it does not work on my setup, which is nothing out of the 
> ordinary i think, basic full jails connected to a bridge interface and one of 
> them exposed to the world wide web using pf binat.

What we do is full exposed VNET jails connected to the bridge
on the external interface of the host. ipfw kernel module loaded
but not used in this case, i.e. only the "default to accept" rule active
in the jails.

I will probably downgrade the production host from 13.1-PRERELEASE
to 13.0-pX tomorrow and see if that changes anything.

Kind regards,
Patrick
-- 
punkt.de GmbH
Patrick M. Hausen
.infrastructure

Kaiserallee 13a
76133 Karlsruhe

Tel. +49 721 9109500

https://infrastructure.punkt.de
i...@punkt.de

AG Mannheim 108285
Geschäftsführer: Jürgen Egeling, Daniel Lienert, Fabian Stein




Re: epair and vnet jail loose connection.

2022-03-13 Thread Johan Hendriks



On 13/03/2022 13:37, Kristof Provost wrote:
> On 13 Mar 2022, at 5:26, Johan Hendriks wrote:
>> Copied my haproxy and web01 jails to this machine and have the same problem.
>
> Do you mean you can or cannot reproduce it on the second machine?

I have the same problem.

>> Could it be a sysctl i use? or boot/loader.conf setting.
>
> None of those settings strike me as likely to cause this problem.

It looked ok to me also.

>> If you want, I can give you full root access on this machine.
>
> I really need to be able to reproduce this locally, because I’ll need to run
> test code and be able to observe kernel output (and deal with panics).
>
> Kristof

I have no idea why it does not work on my setup, which is nothing out of
the ordinary i think, basic full jails connected to a bridge interface
and one of them exposed to the world wide web using pf binat.

I gave Michael root access on the new machine, so maybe he can see what
i am doing wrong.

I really hope I am not doing something really stupid.
The jails are full jails set up following the tutorial, FreeBSD Jails the
Hard Way.

https://clinta.github.io/freebsd-jails-the-hard-way/

regards
Johan



Re: epair and vnet jail loose connection.

2022-03-13 Thread Kristof Provost
On 13 Mar 2022, at 5:26, Johan Hendriks wrote:
> Copied my haproxy and web01 jails to this machine and have the same problem.
>
Do you mean you can or cannot reproduce it on the second machine?

> Could it be a sysctl i use? or boot/loader.conf setting.
>
None of those settings strike me as likely to cause this problem.

> If you want, I can give you full root access on this machine.
>
I really need to be able to reproduce this locally, because I’ll need to run 
test code and be able to observe kernel output (and deal with panics).

Kristof



Re: epair and vnet jail loose connection.

2022-03-13 Thread Michael Gmelin


> On 13. Mar 2022, at 11:27, Johan Hendriks  wrote:
> 
> 
> 
> On Sun, 13 Mar 2022 at 01:17, Michael Gmelin wrote:
>> I also gave it another go (this time with multiple CPUs assigned to the vm), 
>> still works just fine - so I think we would need more details about the 
>> setup.
>> 
>> Would it make sense to share our test setups, so Johan can try to reproduce 
>> with them?
>> 
>> -m
>> 
>>> On 13. Mar 2022, at 00:48, Kristof Provost  wrote:
>>> 
>>> I’m still failing to reproduce.
>>> 
>>> Is pf absolutely required to trigger the issue? Is haproxy (i.e. can you 
>>> trigger it with iperf)? 
>>> Is the bridge strictly required?
>>> 
>>> Kristof
>>> 
>>> On 12 Mar 2022, at 8:18, Johan Hendriks wrote: 
>>> For me this minimal setup let me see the drop off of the network from the 
>>> haproxy server.
>>> 
>>> 2 jails, one with haproxy, one with nginx which is using the following html 
>>> file to be served.
>>> 
>>> 
>>> 
>>> 
>>> Page Title
>>> 
>>> 
>>> 
>>> My First Heading
>>> My first paragraph.
>>> 
>>> 
>>> 
>>> 
>>> From a remote machine i do a  hey -h2 -n 10 -c 10 -z 300s https://wp.test.nl
>>> Then a ping on the jailhost to the haproxy shows the following
>>> 
>>> [ /] > ping 10.233.185.20
>>> PING 10.233.185.20 (10.233.185.20): 56 data bytes
>>> 64 bytes from 10.233.185.20: icmp_seq=0 ttl=64 time=0.054 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=1 ttl=64 time=0.050 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=2 ttl=64 time=0.041 ms
>>> 
>>> 64 bytes from 10.233.185.20: icmp_seq=169 ttl=64 time=0.050 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=170 ttl=64 time=0.154 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=171 ttl=64 time=0.054 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=172 ttl=64 time=0.039 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=173 ttl=64 time=0.160 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=174 ttl=64 time=0.045 ms
>>> ^C
>>> --- 10.233.185.20 ping statistics ---
>>> 335 packets transmitted, 175 packets received, 47.8% packet loss
>>> round-trip min/avg/max/stddev = 0.037/0.070/0.251/0.040 ms
>>> 
>>> 
>>> ifconfig
>>> vtnet0: flags=8963 metric 0 
>>> mtu 1500
>>> options=4c00bb
>>> ether 56:16:e9:80:5e:41
>>> inet 87.233.191.146 netmask 0xfffffff0 broadcast 87.233.191.159
>>> inet 87.233.191.156 netmask 0xffffffff broadcast 87.233.191.156
>>> inet 87.233.191.155 netmask 0xffffffff broadcast 87.233.191.155
>>> inet 87.233.191.154 netmask 0xffffffff broadcast 87.233.191.154
>>> media: Ethernet autoselect (10Gbase-T )
>>> status: active
>>> nd6 options=29
>>> vtnet1: flags=8863 metric 0 mtu 1500
>>> options=4c07bb
>>> ether 56:16:2c:64:32:35
>>> media: Ethernet autoselect (10Gbase-T )
>>> status: active
>>> nd6 options=29
>>> lo0: flags=8049 metric 0 mtu 16384
>>> options=680003
>>> inet6 ::1 prefixlen 128
>>> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
>>> inet 127.0.0.1 netmask 0xff00
>>> groups: lo
>>> nd6 options=21
>>> bridge0: flags=8843 metric 0 mtu 
>>> 1500
>>> ether 58:9c:fc:10:ff:82
>>> inet 10.233.185.1 netmask 0xff00 broadcast 10.233.185.255
>>> id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
>>> maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
>>> root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
>>> member: epair20a flags=143
>>>ifmaxaddr 0 port 7 priority 128 path cost 2000
>>> member: epair18a flags=143
>>>ifmaxaddr 0 port 15 priority 128 path cost 2000
>>> groups: bridge
>>> nd6 options=9
>>> bridge1: flags=8843 metric 0 mtu 
>>> 1500
>>> ether 58:9c:fc:10:d9:1a
>>> id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
>>> maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
>>> root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
>>> member: vtnet0 flags=143
>>>ifmaxaddr 0 port 1 priority 128 path cost 2000
>>> groups: bridge
>>> nd6 options=9
>>> pflog0: flags=141 metric 0 mtu 33160
>>> groups: pflog
>>> epair18a: flags=8963 metric 
>>> 0 mtu 1500
>>> description: jail_web01
>>> options=8
>>> ether 02:77:ea:19:c7:0a
>>> groups: epair
>>> media: Ethernet 10Gbase-T (10Gbase-T )
>>> status: active
>>> nd6 options=29
>>> epair20a: flags=8963 metric 
>>> 0 mtu 1500
>>> description: jail_haproxy
>>> options=8
>>> ether 02:9b:93:8c:59:0a
>>> groups: epair
>>> media: Ethernet 10Gbase-T (10Gbase-T )
>>> status: active
>>> nd6 options=29
>>> 
>>> jail.conf
>>> 
>>> # Global settings applied to all jails.
>>> $domain = "test.nl";
>>> 
>>> exec.start = "/bin/sh /etc/rc";
>>> exec.stop = "/bin/sh /etc/rc.shutdown";
>>> exec.clean;
>>> 
>>> mount.fstab = "/storage/jails/$name.fstab";
>>> 
>>> exec.system_user  = "root";
>>> exec.jail_user= "root";
>>> mount.devfs;
>>> sysvshm="new";
>>> sysvsem="new";
>>> allow.raw_sockets;
>>> allow.set_hostname = 0;
>>> allow.sysvipc;
>>> enforce_statfs = "2";
>>> devfs_ruleset = "11";
>>> 
>>> path = "/storage/jails/${name}";
>>> host.hostname = "${name}.${domain}";
>>> 
>>> 
>>> # Networking
>>> vnet;
>>> vnet.interface= "vnet0";
>>> 
>>>   # Commands 

Re: epair and vnet jail loose connection.

2022-03-13 Thread Johan Hendriks
On Sun, 13 Mar 2022 at 01:17, Michael Gmelin wrote:

> I also gave it another go (this time with multiple CPUs assigned to the
> vm), still works just fine - so I think we would need more details about
> the setup.
>
> Would it make sense to share our test setups, so Johan can try to
> reproduce with them?
>
> -m
>
> On 13. Mar 2022, at 00:48, Kristof Provost  wrote:
>
> 
>
> I’m still failing to reproduce.
>
> Is pf absolutely required to trigger the issue? Is haproxy (i.e. can you
> trigger it with iperf)?
> Is the bridge strictly required?
>
> Kristof
>
> On 12 Mar 2022, at 8:18, Johan Hendriks wrote:
>
> For me this minimal setup let me see the drop off of the network from the
> haproxy server.
>
> 2 jails, one with haproxy, one with nginx which is using the following
> html file to be served.
>
> 
> 
> 
> Page Title
> 
> 
>
> My First Heading
> My first paragraph.
>
> 
> 
>
> From a remote machine i do a  hey -h2 -n 10 -c 10 -z 300s
> https://wp.test.nl
> Then a ping on the jailhost to the haproxy shows the following
>
> [ /] > ping 10.233.185.20
> PING 10.233.185.20 (10.233.185.20): 56 data bytes
> 64 bytes from 10.233.185.20: icmp_seq=0 ttl=64 time=0.054 ms
> 64 bytes from 10.233.185.20: icmp_seq=1 ttl=64 time=0.050 ms
> 64 bytes from 10.233.185.20: icmp_seq=2 ttl=64 time=0.041 ms
> 
> 64 bytes from 10.233.185.20: icmp_seq=169 ttl=64 time=0.050 ms
> 64 bytes from 10.233.185.20: icmp_seq=170 ttl=64 time=0.154 ms
> 64 bytes from 10.233.185.20: icmp_seq=171 ttl=64 time=0.054 ms
> 64 bytes from 10.233.185.20: icmp_seq=172 ttl=64 time=0.039 ms
> 64 bytes from 10.233.185.20: icmp_seq=173 ttl=64 time=0.160 ms
> 64 bytes from 10.233.185.20: icmp_seq=174 ttl=64 time=0.045 ms
> ^C
> --- 10.233.185.20 ping statistics ---
> 335 packets transmitted, 175 packets received, 47.8% packet loss
> round-trip min/avg/max/stddev = 0.037/0.070/0.251/0.040 ms
>
>
> ifconfig
> vtnet0: flags=8963 metric
> 0 mtu 1500
>
> options=4c00bb
> ether 56:16:e9:80:5e:41
> inet 87.233.191.146 netmask 0xfffffff0 broadcast 87.233.191.159
> inet 87.233.191.156 netmask 0xffffffff broadcast 87.233.191.156
> inet 87.233.191.155 netmask 0xffffffff broadcast 87.233.191.155
> inet 87.233.191.154 netmask 0xffffffff broadcast 87.233.191.154
> media: Ethernet autoselect (10Gbase-T )
> status: active
> nd6 options=29
> vtnet1: flags=8863 metric 0 mtu
> 1500
>
> options=4c07bb
> ether 56:16:2c:64:32:35
> media: Ethernet autoselect (10Gbase-T )
> status: active
> nd6 options=29
> lo0: flags=8049 metric 0 mtu 16384
> options=680003
> inet6 ::1 prefixlen 128
> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
> inet 127.0.0.1 netmask 0xff00
> groups: lo
> nd6 options=21
> bridge0: flags=8843 metric 0 mtu
> 1500
> ether 58:9c:fc:10:ff:82
> inet 10.233.185.1 netmask 0xff00 broadcast 10.233.185.255
> id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
> maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
> root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
> member: epair20a flags=143
>ifmaxaddr 0 port 7 priority 128 path cost 2000
> member: epair18a flags=143
>ifmaxaddr 0 port 15 priority 128 path cost 2000
> groups: bridge
> nd6 options=9
> bridge1: flags=8843 metric 0 mtu
> 1500
> ether 58:9c:fc:10:d9:1a
> id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
> maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
> root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
> member: vtnet0 flags=143
>ifmaxaddr 0 port 1 priority 128 path cost 2000
> groups: bridge
> nd6 options=9
> pflog0: flags=141 metric 0 mtu 33160
> groups: pflog
> epair18a: flags=8963
> metric 0 mtu 1500
> description: jail_web01
> options=8
> ether 02:77:ea:19:c7:0a
> groups: epair
> media: Ethernet 10Gbase-T (10Gbase-T )
> status: active
> nd6 options=29
> epair20a: flags=8963
> metric 0 mtu 1500
> description: jail_haproxy
> options=8
> ether 02:9b:93:8c:59:0a
> groups: epair
> media: Ethernet 10Gbase-T (10Gbase-T )
> status: active
> nd6 options=29
>
> jail.conf
>
> # Global settings applied to all jails.
> $domain = "test.nl";
>
> exec.start = "/bin/sh /etc/rc";
> exec.stop = "/bin/sh /etc/rc.shutdown";
> exec.clean;
>
> mount.fstab = "/storage/jails/$name.fstab";
>
> exec.system_user  = "root";
> exec.jail_user= "root";
> mount.devfs;
> sysvshm="new";
> sysvsem="new";
> allow.raw_sockets;
> allow.set_hostname = 0;
> allow.sysvipc;
> enforce_statfs = "2";
> devfs_ruleset = "11";
>
> path = "/storage/jails/${name}";
> host.hostname = "${name}.${domain}";
>
>
> # Networking
> vnet;
> vnet.interface= "vnet0";
>
>   # Commands to run on host before jail is created
>   exec.prestart  = "ifconfig epair${ip} create up description
> jail_${name}";
>   exec.prestart  += "ifconfig epair${ip}a up";
>   exec.prestart  += "ifconfig bridge0 addm epair${ip}a up";
>   exec.created   = "ifconfig epair${ip}b name vnet0";
>
>   # Commands to run in jail after it is created
>   exec.start  += "/bin/sh /etc/rc";
>
>   # commands to