Re: Howto: ipsec tunnel routing both IPv4 and IPv6? Possible?

2024-01-15 Thread Andrey V. Elsukov

On 15.01.2024 16:09, Michael Grimm wrote:

Hi,

I have been using an ipsec tunnel to route local IPv4 traffic for years now
(/etc/rc.conf):

  cloned_interfaces="ipsec0"
  static_routes="tunnel0"
  create_args_ipsec0="reqid 104"
  ifconfig_ipsec0="inet 10.2.2.250 10.1.1.254 tunnel 1.2.3.4 10.20.30.40"
  route_tunnel0="10.1.1.0/24 10.1.1.254"

ifconfig ipsec0 (relevant info only):
  ipsec0: flags=1008051 metric 0 mtu 1400
  tunnel inet 1.2.3.4 --> 10.20.30.40
  inet 10.2.2.250 --> 10.1.1.254 netmask 0xff00
  reqid: 104


pf firewall entries are set to allow esp over that tunnel.

Now I would also like to route local IPv6 traffic, *if* that is possible at all.


Hi,

try something like this:

ifconfig_ipsec0_ipv6="inet6 fd00:b:b:b::250 fd00:a:a:a::254 prefixlen 128"
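
A minimal sketch of how this might sit in /etc/rc.conf next to the existing IPv4 settings. The `ipv6_static_routes` and `ipv6_route_*` variables are the rc.conf(5) counterparts of `static_routes` and `route_*`; the remote prefix fd00:a:a:a::/64 is only an assumption for illustration and has to be replaced with the real remote IPv6 network:

```sh
# /etc/rc.conf fragment (sketch; the /64 route below is an assumed prefix)
ifconfig_ipsec0_ipv6="inet6 fd00:b:b:b::250 fd00:a:a:a::254 prefixlen 128"
ipv6_static_routes="tunnel0"
ipv6_route_tunnel0="fd00:a:a:a::/64 fd00:a:a:a::254"
```

The firewall rules would presumably also need to allow the IPv6 traffic on ipsec0, in the same way as for IPv4.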

--
WBR, Andrey V. Elsukov




Re: Restarting IPv6

2023-10-04 Thread Andrey V. Elsukov

On 04.10.2023 03:02, Kevin Oberman wrote:


I had a problem with one network and had to restart (service
wpa_supplicant restart wlan0). My IPv4 came up fine, but IPv6 does not
come up. The interface has only a link-local address. When I boot to the
same AP, it comes up fine.


Any idea what I'm missing?


Hi,

probably you need to use rtsold(8).
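
One way to follow this advice, sketched as an /etc/rc.conf fragment. It assumes the interface is wlan0, as in the report, and that router advertisements should be accepted on it; the one-off commands in the comments are only a suggestion:

```sh
# /etc/rc.conf fragment (sketch)
ifconfig_wlan0_ipv6="inet6 accept_rtadv"   # accept router advertisements
rtsold_enable="YES"                        # solicit routers when the link comes up

# To trigger a solicitation by hand after restarting wpa_supplicant:
#   service rtsold onestart     (or: rtsol wlan0)
```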

--
WBR, Andrey V. Elsukov




Re: em0: No buffer space available for IPv6 traffic but IPv4 is OK

2023-08-21 Thread Andrey V. Elsukov

On 19.08.2023 00:54, Kevin Oberman wrote:

Interestingly, there is incoming IPv6 local broadcast traffic as sniffed by
# tcpdump -n -i em0 ip6
(ICMP6, neighbor solicitation, UDP from LAN link local addresses).

Has anyone seen this before and can suggest a fix?

A reboot did not solve it; no software updates or config changes were made.
It just stopped working from one day to the next.

Oddly, ENOBUFS is the error I get when my firewall is blocking transmit 
traffic. There may well be other causes.


Yes, the network stack usually returns this error when it is unable to send
data, and this can have many different causes.


First of all, check that prefixes and masks are correctly configured. Then
check that your IPv6 default gateway is directly reachable and its layer
2 address is known. You can also check `netstat -s` to find which counter
grows when you try to use IPv6. Make sure your firewall doesn't block the
ICMPv6 types needed for IPv6 to work. Check that multicast functions
correctly.


  # ifconfig
  # ndp -an
  # netstat -s
  # ifmcstat
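
To see which counter grows, one option is to save two snapshots and compare them. This is only a sketch: the helper name `changed_counters` and the file paths are made up for illustration, and the gateway address is a placeholder:

```sh
# Print only the lines that differ between two saved `netstat -s` snapshots.
changed_counters() {
    diff "$1" "$2" | grep '^[<>]'
}

# Intended use on the affected host:
#   netstat -s -p ip6 > /tmp/ip6.before
#   ping6 -c 3 <IPv6 default gateway>
#   netstat -s -p ip6 > /tmp/ip6.after
#   changed_counters /tmp/ip6.before /tmp/ip6.after
```

Lines starting with `<` are the old values, lines starting with `>` the new ones.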

--
WBR, Andrey V. Elsukov




Re: Is there a FreeBSD equivalent of 'tcpdump -i any' from Linux?

2023-08-03 Thread Andrey V. Elsukov

On 02.08.2023 06:49, Zane C B-H wrote:
A replacement for daemonlogger, given that it is dead upstream and no one
else has picked up development. On Linux the same can easily be accomplished
via tcpdump and the pcap rotation options, and then just removing
old files based on age/disk usage. Unfortunately FreeBSD lacks support
for '-i any'. In many ways I settled upon tcpdump as it is not likely to
just stop being developed.


Netgraph looks semi-workable via one2many and setting the interfaces on
the many side to promisc, but this also creates the issue that the
listening interface can also transmit. That said, it looks like putting the
connected ng_iface in monitor mode at creation should solve that. I have been
looking at that on and off today, trying to wrap my head around netgraph.


You can also implement the DLT_PKTAP or DLT_LINUX_SLL link types through some
pseudo network driver, then modify the ETHER_BPF_MTAP() macro, probably make
some tweaks to tcpdump, and you will get what you need. It does not seem so
hard.


--
WBR, Andrey V. Elsukov




Re: IPFW: IPv6 and NPTv6 issues: multiple IPv6 addresses confuses IPFW

2023-02-19 Thread Andrey V. Elsukov

18.02.2023 18:42, FreeBSD User wrote:

Every 24 hours, the ISP changes the IPv4 and IPv6 addresses on the WAN
interface. We use NPTv6 to translate ULA addresses for the inner
IPv6 networks. We use IPv6 privacy on the tun0 interface. After a
reboot or a restart of mpd5, the router/firewall operates correctly
and both IPv6 and IPv4 networks have a connection to the internet.
When the ISP rotates its IPs, the IPv6 address is configured using
SLAAC and mpd5 seems to act weird:

- the IPv4 address is always set correctly, and IPFW and in-kernel NAT
  route/filter traffic correctly
- sometimes the old IPv6 address is dumped and only a new IPv6 address
  is set; in such a case the old IPv6 address is gone, and the new pair
  (temporary and MACified address) are the only IPv6 addresses attached
  to the interface
- sometimes the old IPv6 address set (= temporary) is marked
  "deprecated" and/or "detached" and a new set is attached to the
  interface tun0; on some rare occasions an IPv6 address WITHOUT its
  "temporary" sibling is also attached

In any of the cases above, IPFW's NPTv6 gets confused and routing
stops working properly.

Whenever the IPv6 address changes, IPFW has to be restarted!


Hi,

I assume you are using the ext_if option in your NPTv6 instance configuration.

I think there might be several problems that lead to your situation:

1. NPTv6 tracks the deletion of IPv6 addresses, but since the old IPv6
address that was used as the external prefix is kept on the interface,
it ignores the appearance of a new IPv6 address.


2. Then, even if you delete the old IPv6 address by hand, NPTv6 won't try
to pick another one until a new address appears.


3. There should be some logic that takes into account the presence of
temporary and deprecated addresses on the interface.


--
WBR, Andrey V. Elsukov



OpenPGP_signature
Description: OpenPGP digital signature


Re: NPTv6: prefix doesn't change in IPFW when prefix changes on dynamic interface

2022-11-24 Thread Andrey V. Elsukov

24.11.2022 18:27, FreeBSD User wrote:

Hello,

running a small routing/firewall appliance based on 13-STABLE and IPFW, I face
a problem with NPTv6. The external IPv6 address changes dynamically. While the
ipfw in-kernel NAT catches up with dynamic changes of the IPv4 address, NPTv6
doesn't seem to.

I'm neither an expert in networking nor in IPFW.

After a couple of days tun0 (the exterior PPP interface, uplink connection
managed via mpd5) has a lot of IPv6 addresses, all but one of which are marked
"deprecated".



If mpd5 is not restarted, or the exterior interface has been assigned
several IPv6 addresses of which all but one are marked deprecated, pinging
the outside world via IPv6 will use the wrong IPv6 address; IPFW doesn't
seem to catch up with the changes.

How to fix this?


Hi,

probably the easiest way to solve your problem is to periodically run
a script that finds and deletes deprecated addresses from the
interface.


Then the NPTv6 module will use the first global prefix on the interface.
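
A rough sketch of such a script, assuming the uplink is tun0 as in the report and that ifconfig(8) marks the addresses with the word "deprecated". The function names are made up for illustration, and link-local addresses are skipped on purpose:

```sh
#!/bin/sh
# Sketch: remove deprecated, non-link-local IPv6 addresses from an interface.

# Read `ifconfig <ifname> inet6` output on stdin, print deprecated addresses.
list_deprecated() {
    awk '$1 == "inet6" && $2 !~ /^fe80:/ && /deprecated/ { print $2 }'
}

# Delete every deprecated address from the given interface (needs root).
purge_deprecated() {
    ifconfig "$1" inet6 | list_deprecated | while read -r addr; do
        ifconfig "$1" inet6 "$addr" -alias
    done
}

# Example use from cron(8), assuming the uplink is tun0:
#   purge_deprecated tun0
```

Running `ifconfig tun0 inet6 | list_deprecated` first shows what would be deleted without changing anything.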

--
WBR, Andrey V. Elsukov



Re: ICMPv6 over lo0

2022-11-15 Thread Andrey V. Elsukov

16.11.2022 00:14, tue...@freebsd.org wrote:

when using the master branch of today (or 13.1) I get when running

tuexen@ampere128:~ % ping6 -c 1 -b 3 -s 2 ::1
PING6(20048=40+8+2 bytes) ::1 --> ::1
20008 bytes from ::1, icmp_seq=0 hlim=64 time=0.709 ms

--- ::1 ping6 statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.709/0.709/0.709/0.000 ms

which is expected. What I don't expect is:

tuexen@ampere128:~ % tcpdump -i lo0 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 262144 bytes
22:06:38.835630 IP6 ::1 > ::1: frag (0|1232) ICMP6, echo request, seq 0, length 1232
22:06:38.835639 IP6 ::1 > ::1: frag (1232|1232)
22:06:38.835641 IP6 ::1 > ::1: frag (2464|1232)

Why is an MTU of 1280 used for the Echo Request, whereas an MTU of 16384 is
used for the response?

Is this intended? At least for me, it is not expected...


Hi Michael,

I believe it is the default behavior of ping6:
```
-u  By default, ping asks the kernel to fragment packets to fit into
the minimum IPv6 MTU.  The -u option will suppress the behavior
in the following two levels: when the option is specified once,
the behavior will be disabled for unicast packets.  When the
option is more than once, it will be disabled for both unicast
and multicast packets.
```

```
% ktrace ping6 -c 1 -b 3 -s 2 ::1
% kdump | grep -A1 MIN_MTU
 14793 ping6    CALL  setsockopt(0x3,IPPROTO_IPV6,IPV6_USE_MIN_MTU,0x7fffe614,0x4)
 14793 ping6    RET   setsockopt 0
```

```
        if (mflag != 1) {
                optval = mflag > 1 ? 0 : 1;

                if (setsockopt(ssend, IPPROTO_IPV6, IPV6_USE_MIN_MTU,
                    &optval, sizeof(optval)) == -1)
                        err(1, "setsockopt(IPV6_USE_MIN_MTU)");
        }
```
--
WBR, Andrey V. Elsukov



Re: Poor performance with stable/13 and Mellanox ConnectX-6 (mlx5)

2022-06-14 Thread Andrey V. Elsukov

13.06.2022 21:25, Mike Jakubik wrote:

Hello,

I have two new servers with a Mellanox ConnectX-6 card linked at 25Gb/s;
however, I am unable to get much more than 6Gb/s when testing with iperf3.


The servers are Lenovo SR665 (2 x AMD EPYC 7443 24-Core Processor, 256 
GB RAM, Mellanox ConnectX-6 Lx 10/25GbE SFP28 2-port OCP Ethernet Adapter)


They are connected to a Dell N3224PX-ON switch. Both servers are idle 
and not in use, with a fresh install of stable/13-ebea872f8, nothing 
running on them except ssh, sendmail, etc.


The same exact servers tested on Linux (Fedora 34) produced nearly 3x
the performance (see attached screenshots). I was able to get a steady
14.6Gb/s rate with nearly 0 retries shown in iperf; the performance on
FreeBSD seems to average around 6Gb/s, but it is very sporadic during
the iperf run.


# ifconfig mce0
mce0: flags=8863 metric 0 mtu 1500
options=ffed07bb
     ether b8:ce:f6:81:df:6a
     inet 192.168.10.31 netmask 0xff00 broadcast 192.168.10.255
     media: Ethernet 25GBase-CR 
     status: active
     nd6 options=29


Hi,

Do you have the same MTU size on the Linux machine?

--
WBR, Andrey V. Elsukov



Re: if_enc(4) and net.inet.ipcomp.ipcomp_enable

2022-03-01 Thread Andrey V. Elsukov

28.02.2022 02:54, Matteo Riondato wrote:

Hello net@,

I am trying to use pf to filter packets in ipsec tunnels by filtering
on enc0 from if_enc(4).

I have the following values for the net.enc sysctl subtree:
net.enc.out.ipsec_bpf_mask: 1
net.enc.out.ipsec_filter_mask: 1
net.enc.in.ipsec_bpf_mask: 2
net.enc.in.ipsec_filter_mask: 2


and I have

net.inet.ipsec.filtertunnel: 1

Everything works well when the tunnel does not use ipcomp, but when
it does, the incoming packets seem to ignore the value of the
net.enc.in.ipsec_filter_mask sysctl, thus they show up in pf “twice”:
once with both external and internal headers, and once only with
internal (the value of 2 for this sysctl should make these packets
show up only with internal headers). The same can be observed with
tcpdump on enc0. This behavior makes it hard to do filtering.

Is this behavior expected?


Hi,

are you sure that it is not just ingress and egress? You can use the -Q
flag of tcpdump to make sure.


The first time you see the IPcomp packet in PF is when it arrives in the
IP stack on a physical interface (em, igb, ix, etc.). The second time is
after decompression, on the if_enc interface, where PF is called from the
IPsec stack.
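
For example, something along these lines would separate the two directions. It is only a sketch and assumes the interface is enc0, as in the question:

```sh
# Compare what enc0 sees per direction (run each in its own terminal).
tcpdump -n -i enc0 -Q in
tcpdump -n -i enc0 -Q out
```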


--
WBR, Andrey V. Elsukov



Re: Porting OpenBSD MPLS to FreeBSD

2021-12-10 Thread Andrey V. Elsukov

08.12.2021 21:01, Lutz Donnerhacke wrote:

On Wed, Dec 08, 2021 at 11:08:38AM +0300, Lev Serebryakov wrote:

On 07.12.2021 17:28, Lutz Donnerhacke wrote:

I do use netgraph for carrier-grade stuff.
Yes, ng_bridge was limited, but this is fixed.

  Doesn't it take a separate lock for each packet passed through a hook? I'm
sure it was true some time ago...


It's a read lock for each packet on each, which can be shared by CPUs.
I'm not aware of a major performance penalty, so I'd assume the
lock is lightweight.


Hi,

from previous experience: when we had an rwlock on the fast path in the
network stack, throughput was limited to somewhere between 500 Kpps and
2-4 Mpps, depending on the number of CPU cores and NIC queues.


Using an rwlock at high pps rates leads to high lock contention. So what
we call carrier-grade is a matter of personal perception and experience.


I also think it is better to make the MPLS implementation independent of
netgraph, at least until it becomes lockless.


--
WBR, Andrey V. Elsukov



Re: dtrace to trace incoming connection not suceeding ?

2021-11-14 Thread Andrey V. Elsukov

12.11.2021 20:31, Kurt Jaeger wrote:

That's why I provided two outputs.

There's one small diff between the two that I do not understand:

-   18040 times no signature provided by segment
+   18045 times no signature provided by segment



Hello,

This means that the received TCP segment has no TCP-MD5 signature, but the
listening socket expects one. Such a SYN segment will be dropped by the
syncache code. Probably your BGP daemon is configured to use TCP-MD5 for
the connection, but the remote side is not.
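
To confirm that it is the peer's connection attempts that bump this counter, it can be sampled while the peer retries. A small sketch, where the helper name `nosig_count` is made up for illustration:

```sh
# Extract the "no signature provided by segment" counter from
# `netstat -sp tcp` output read on stdin.
nosig_count() {
    awk '/no signature provided by segment/ { print $1 }'
}

# Intended use: run twice, a few seconds apart, and compare the numbers:
#   netstat -sp tcp | nosig_count
```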


--
WBR, Andrey V. Elsukov



Re: IPSEC problems with pf

2021-09-25 Thread Andrey V. Elsukov
25.09.2021 03:31, Eugene Grosbein wrote:
> I know three main reasons that may prevent firewall+IPSec from working as
> expected:
>
> 1) for incoming packets: the kernel could drop an incoming packet within
> the ipsec code, incrementing one of the counters shown by the
> "netstat -sp ipsec" command, so you should check that first;
>
> 2) for both outgoing and incoming packets there could be a processing order
> problem: packets are processed first by the pfil(9) framework (so pf/ipfw
> have a chance to do NAT etc.) and only then sent to ipsec(4) to be
> transformed (in FreeBSD 11 at least), not vice versa.

AFAIK, pf does not send packets to IPsec processing after NAT. You need
to make translation after IPsec processing using the if_enc interface.

> 
> 3) also read the if_enc(4) manual page to become familiar with the
> net.enc.out.* and net.enc.in.* sysctl family, as it may have an effect,
> too. If you do not use the enc(4) pseudo-interface, make sure you changed
> the defaults to:
>
> net.enc.in.ipsec_filter_mask=0
> net.enc.out.ipsec_filter_mask=0

Another important variable that needs attention is
net.inet.ipsec.filtertunnel

-- 
WBR, Andrey V. Elsukov




Re: TCP6 regression for MTU path on stable/13

2021-09-13 Thread Andrey V. Elsukov
12.09.2021 14:12, Harry Schmalzbauer wrote:
> Will try to further track it down, but in case anybody has an idea what
> change during the last few months in stable/13 could have caused this
> real-world problem regarding resulting TCP6 throughput, I'm happy to
> start testing at that point.

Hi,

Take a look at:

  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255749
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248005

Is the problem described in these PRs the same as yours?

-- 
WBR, Andrey V. Elsukov




Re: Wired Memory Increasing about 500MBytes per day

2021-08-03 Thread Andrey V. Elsukov
03.08.2021 17:30, Mark Johnston wrote:
>>> So if there is some wired page leak, the pgcache zones are probably not
>>> directly responsible.
>>
>> We don't see any leaks, but our monitoring shows that "free" memory
>> migrates to "wired" and only these zones grow.
> 
> How are you measuring this?  USED or USED+FREE?

AFAIK, monitoring uses sysctl variables:

vm.stats.vm.v_page_size
vm.stats.vm.v_free_count
vm.stats.vm.v_wire_count
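
Since these counters are in pages, the monitoring presumably multiplies them by the page size. A minimal sketch of that conversion, where the helper name `pages_to_mib` is made up for illustration:

```sh
# Convert a page count to MiB for a given page size in bytes.
pages_to_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

# On FreeBSD the inputs would come from the sysctl variables, e.g.:
#   pagesize=$(sysctl -n vm.stats.vm.v_page_size)
#   pages_to_mib "$(sysctl -n vm.stats.vm.v_wire_count)" "$pagesize"
#   pages_to_mib "$(sysctl -n vm.stats.vm.v_free_count)" "$pagesize"
```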

-- 
WBR, Andrey V. Elsukov




Re: Wired Memory Increasing about 500MBytes per day

2021-08-03 Thread Andrey V. Elsukov
03.08.2021 16:47, Mark Johnston wrote:
>> We noticed the same problem; I'm not sure about the exact version, but
>> you can check the output of:
>> # vmstat -z | egrep "ITEM|pgcache"
>>
>> The page cache grows until lowmem is reached. Then it is automatically
>> cleaned and begins to grow again.
> 
> The pgcache zones simply provide a per-CPU cache and allocator for
> physical page frames.  The sizes of the caches are bounded.  The numbers
> of "used" items from the pgcache zones do not really tell you anything
> since those pages may be allocated for any number of purposes, including
> for other UMA zones.  For instance, if ZFS allocates a buffer page from
> its ABD UMA zone, and that zone's caches are empty, UMA may allocate a
> new slab using uma_small_alloc() -> vm_page_alloc() -> pgcache zone.
> 
> So if there is some wired page leak, the pgcache zones are probably not
> directly responsible.

We don't see any leaks, but our monitoring shows that "free" memory
migrates to "wired" and only these zones grow. So, on the graphs we see
wired memory growing linearly over 7 days. When free memory reaches ~4%,
everything returns to normal and the pgcache zones reset their number of
USED items to a low value; then wired memory grows linearly again for 7
days. This is on a server with 256G RAM.

E.g., this is when 9% of free memory is left:

$ vmstat -z | egrep "ITEM|pgcache"
ITEM          SIZE  LIMIT      USED   FREE         REQ  FAIL SLEEP XDOMAIN
vm pgcache:   4096,     0,     5225,   139,     412976,    0,    0,      0
vm pgcache:   4096,     0, 28381269,    77,  190108006,   24,    0,      0
vm pgcache:   4096,     0,   166358, 11523, 1684567513, 3054,    0,      0
vm pgcache:   4096,     0, 29548679,   576,  780034183, 1730,    0,      0
$ bc
>>> 5225+28381269+166358+29548679
58101531
>>> 58101531*4096/1024/1024/1024
221
>>>

This is when lowmem was triggered:
% vmstat -z | egrep "ITEM|pgcache"
ITEM          SIZE  LIMIT      USED   FREE         REQ  FAIL SLEEP XDOMAIN
vm pgcache:   4096,     0,     5336,   337,     410052,    0,    0,      0
vm pgcache:   4096,     0,  3126129,   117,   56689945,   24,    0,      0
vm pgcache:   4096,     0,    49771,  3910,  413657845, 1828,    0,      0
vm pgcache:   4096,     0,  4249924,   706,  224519238,  562,    0,      0
% bc
>>> 5336+3126129+49771+4249924
7431160
>>> 7431160*4096/1024/1024/1024
28
>>>

Look at the graph:
https://imgur.com/yhqK1p8.png

-- 
WBR, Andrey V. Elsukov




Re: Wired Memory Increasing about 500MBytes per day

2021-08-03 Thread Andrey V. Elsukov
03.08.2021 11:40, Özkan KIRIK wrote:
> Thank you Andrey,
> 
> There is no line that contains the expression "pgcache".

Probably, it is only on 13+.

> I wonder what the unit of the USED column in the vmstat -z output is.
> Is the size of allocated memory USED * SIZE bytes or USED bytes?

USED is the number of entries of SIZE bytes each, so the allocated memory
is USED * SIZE bytes.
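
The USED * SIZE sum over the pgcache zones can be computed directly from the vmstat output. A small sketch, where the helper name `pgcache_gib` is made up for illustration and the result is truncated to whole GiB:

```sh
# Sum USED * SIZE over all "pgcache" zones in `vmstat -z` output read
# on stdin and print the total in whole GiB.
pgcache_gib() {
    awk -F '[:,]' '/pgcache/ { bytes += $2 * $4 }
        END { printf "%d\n", int(bytes / 1024 / 1024 / 1024) }'
}

# Intended use:
#   vmstat -z | pgcache_gib
```

With `:` and `,` as field separators, field 2 is SIZE and field 4 is USED.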

-- 
WBR, Andrey V. Elsukov




Re: Wired Memory Increasing about 500MBytes per day

2021-08-03 Thread Andrey V. Elsukov
02.08.2021 08:00, Özkan KIRIK wrote:
> Hello,
> 
> I'm using FreeBSD stable/12 0f97f2a1857a96563792f0d873b11a16ff9f818c (built
> Jul 25).
> The pf, ipfw and ipsec options are built into the kernel. The server is used
> as a firewall running squid and snort3 (daq - netmap).
> 
> I saw that wired memory is increasing every day, by about 500MBytes per
> day. I'm checking vmstat and top (sorted by res), but I couldn't find what
> is consuming the wired memory.
> 
> How can I find which process or which part of the kernel is consuming the
> wired memory?

Hi,

We noticed the same problem; I'm not sure about the exact version, but you
can check the output of:
# vmstat -z | egrep "ITEM|pgcache"

The page cache grows until lowmem is reached. Then it is automatically
cleaned and begins to grow again.

-- 
WBR, Andrey V. Elsukov




Re: IPsec performace - netisr hits %100

2021-05-02 Thread Andrey V. Elsukov
30.04.2021 23:11, Özkan KIRIK wrote:
> Hello,
> 
> I'm using FreeBSD stable/12, world built on 12 April 2021.
> My setup is:
> [freebsd host cc0] <> [cc1 - same freebsd, but jail]
> 
> Without IPsec, I can easily achieve 20Gbps. (The test was run with different
> source IPs using multiple iperf instances to scale across multiple queues.)
> My hardware is a Xeon D-2146NT (8 core + SoC Qat); cc0 and cc1 are Chelsio
> T62100-LP-CR.

I suspect you are using a 9k MTU on the cc(4) interfaces.
If you set a bigger MTU on the if_ipsec(4) interfaces, this can increase
throughput.

-- 
WBR, Andrey V. Elsukov




Re: IPsec performace - netisr hits %100

2021-05-02 Thread Andrey V. Elsukov
30.04.2021 23:32, Mark Johnston wrote:
> Second, netipsec unconditionally hands rx processing off to netisr
> threads for some reason, that's why changing the dispatch policy doesn't
> help.  Maybe it's to help avoid running out of kernel stack space or to
> somehow avoid packet reordering in some case that is not clear to me.  I
> tried a patch (see below) which eliminates this and it helped somewhat.
> If anyone can provide an explanation for the current behaviour I'd
> appreciate it.

We previously had reports of kernel stack overflows during IPsec
processing. In your example only one IPsec transform is configured,
but it is possible to configure several in a bundle; AFAIR, the limit
is 4 transforms. E.g., if you configure ESP+AH, that is a bundle of two
transforms, and this increases kernel stack requirements.

-- 
WBR, Andrey V. Elsukov




Re: Src IP 0.0.0.0 for outgoing off-net ping & SSH packets

2021-04-23 Thread Andrey V. Elsukov
21.04.2021 19:35, gfoster...@charter.net wrote:
> We are running FreeBSD 10.4 with multipath routing enabled (RADIX_MPATH)
> 
> and are using just a single static route (10.18.91.0/255.255.255.0
> 10.17.118.3)
> 
> when we infrequently run into the problem described below.
> Do you have any tips or specific areas of the routing code I should be
> looking into ?

Hi,

the routing subsystem was significantly reworked in FreeBSD 13.0. You
need to try to reproduce the problem on the latest release and report back
if it is reproducible. There is very little chance that someone will try
to debug the problem in such outdated code.

-- 
WBR, Andrey V. Elsukov




BFD failures with bird on FreeBSD (was: LACP BPDU packets priority?)

2021-02-09 Thread Andrey V. Elsukov
On 04.02.2021 23:56, Johannes Lundberg wrote:
> Hi
> 
> We're experiencing an unstable lacp lagg and not seeing BPDU packets coming
> to the switch when we expect them to. Can anyone answer what the
> priority of those packets is? Could it be that they are not being sent from
> the FreeBSD host because they are stuck in the outgoing queue?
> 
> Please cc me since I'm not subscribed.

Hi,

recently we faced a somewhat similar problem, but with BGP and BFD
packets.
In our case BFD packets from a neighbor did not reach the bird daemon in
the expected time window, and this led to BFD failure. We did some
research and found that the BFD packets were received by the adapter, but
then they seem to have stalled in the adapter queues for at least 2-3
seconds; then all packets were passed to the bird daemon within several
microseconds, but the failure had already happened.

First, we configured port mirroring on the switch to capture all
packets on another host. In the mirrored traffic we could see that all
packets were sent and received in the correct time windows.

We use Mellanox network adapters; they are able to attach hardware
timestamps to mbufs, so we can know the time when packets were received.

I made a patch to libpcap and bpf(4) to be able to use these timestamps
in tcpdump.
  https://people.freebsd.org/~ae/bpf_adapter.diff

Then we started two instances of tcpdump, one with the host's
timestamps (-j host), the second with the adapter's timestamps (-j adapter).
When the BFD failure happened, we saw equal ordering in both packet
captures. But the timestamps differ: packets that arrived too
late have a correct receive timestamp, so if they had not stalled in
the queues, the BFD failure would not have happened.

I know that this story will not help you, but it might be useful for the
mail archives :)

-- 
WBR, Andrey V. Elsukov
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: FreeBSD does not reply to IPv6 Neighbor Solicitations

2021-01-13 Thread Andrey V. Elsukov
On 13.01.2021 00:37, John-Mark Gurney wrote:
>> when this happens again, it would be nice to make sure that the NS
>> packets hit the IP stack, e.g. with the attached dtrace script.
> 
> Ok, I ran the dtrace script when I reproduced the problem, and it did
> not produce any output.
> 
> These are effectively what the script does:
> 9) configure inet6 addresses on ure and bge (duplicating the addresses
>already configured)

Does it mean that you just reconfigure the address without removing it? It
looks like the problem that was discussed here:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233535


> bge0:
> inet6 fe80::12e7:c6ff:fexx:%bge0 scopeid 0x2
> mldv2 flags=2 rv 2 qi 125 qri 10 uri 3
> group ff01::1%bge0 scopeid 0x2 mode exclude
> mcast-macaddr 33:33:00:00:00:01
> group ff02::1%bge0 scopeid 0x2 mode exclude
> mcast-macaddr 33:33:00:00:00:01
> group ff02::1:ffxx:%bge0 scopeid 0x2 mode exclude
> mcast-macaddr 33:33:ff:xx:xx:xx
> 
> so, I made things work, and ran ifmcstat again, and this time it has
> an additional group in the output:
> [...]
> bge0:
> inet6 fe80::12e7:c6ff:fexx:%bge0 scopeid 0x2
> mldv2 flags=2 rv 2 qi 125 qri 10 uri 3
> group ff02::1:ff00:c43c%bge0 scopeid 0x2 mode exclude
> mcast-macaddr 33:33:ff:00:c4:3c
> group ff01::1%bge0 scopeid 0x2 mode exclude
> mcast-macaddr 33:33:00:00:00:01
> group ff02::1%bge0 scopeid 0x2 mode exclude
> mcast-macaddr 33:33:00:00:00:01
> group ff02::1:ffxx:%bge0 scopeid 0x2 mode exclude
> mcast-macaddr 33:33:ff:xx:xx:xx
> 
> and the tcpdump output:
> 21:10:53.938655 IP6 fc00:b5d:41c:7e37::7e37 > ff02::1:ff00:c43c: ICMP6, neighbor solicitation, who has fc00:b5d:41c:7e37::c43c, length 32
> 21:10:55.001428 IP6 fc00:b5d:41c:7e37::7e37 > ff02::1:ff00:c43c: ICMP6, neighbor solicitation, who has fc00:b5d:41c:7e37::c43c, length 32

Since ff02::1:ff00:c43c%bge0 is not joined in the first case, the IP stack
just ignores the NS messages and they don't reach the ND6 code.

In PR 233535 the problem was reproducible with MLDv1, so does it work if
you disable MLDv2 (to reduce the possible scope of problematic code)?

net.inet6.mld.v2enable=0
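
The group that has to be joined for a given address is its solicited-node multicast address, ff02::1:ffXX:XXXX, built from the low 24 bits of the address. A small sketch to derive it, so it can be compared with the ifmcstat output; the helper name `solicited_node` is made up for illustration, and the address is assumed to be given without a zone suffix such as %bge0:

```sh
# Derive the solicited-node multicast group (ff02::1:ffXX:XXXX, built from
# the low 24 bits) for an IPv6 address given without a zone suffix.
solicited_node() {
    last=${1##*:}                 # final 16-bit group, may be empty after "::"
    rest=${1%:*}                  # everything before the final group
    case $rest in
        *:) prev=0 ;;             # "::" directly precedes the final group
        *)  prev=${rest##*:} ;;   # second-to-last group
    esac
    printf 'ff02::1:ff%02x:%x\n' $(( 0x${prev:-0} & 0xff )) $(( 0x${last:-0} ))
}

# Intended use: check that the group is listed for the interface, e.g.
#   solicited_node fc00:b5d:41c:7e37::c43c
#   ifmcstat -i bge0 -f inet6
```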

-- 
WBR, Andrey V. Elsukov


Re: FreeBSD does not reply to IPv6 Neighbor Solicitations

2021-01-12 Thread Andrey V. Elsukov
On 12.01.2021 05:25, John-Mark Gurney wrote:
>> The device where the capture was taken does not respond to the NS packet.
>> This might be caused by:
>>  a) the device has a different configured IP address, than requested
>>  b) the network card does not listen to the multicast group, which is
>> used by the request (you see it only due to the promisc mode of the
>> capture). But this is unlikely (due to the promisc mode)
>>  c) your system is broken
> 
> I have some test scripts where something similar to this happens.
> 
> tcpdump shows the request coming into the FreeBSD box (in this case,
> 13-current main-c255640-gc38e59ce1b0), addressed to the IPv6 of the
> box, and FreeBSD failing to respond w/ an answer for its own IP...
> 
> This is inconsistent and hard to reproduce, but it does happen with
> some regularity.

Hi,

when this happens again, it would be nice to make sure that the NS
packets hit the IP stack, e.g. with the attached dtrace script.

Also, the net.inet6.icmp6.nd6_debug variable should be set to see error
messages from the ND code.

If the script doesn't show the expected info, this means that the packets
don't hit the IP stack. It is probably some multicast-related problem. In
this case it could be useful to obtain the output of ifmcstat(8).

-- 
WBR, Andrey V. Elsukov
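
#!/usr/sbin/dtrace -s

/* Print every Neighbor Solicitation that reaches the ND6 input code. */
fbt::nd6_ns_input:entry
{
        ip = (struct ip6_hdr *)args[0]->m_data;
        /* args[1] is the byte offset of the ND header inside the mbuf data */
        nd = (struct nd_neighbor_solicit *)(args[0]->m_data + args[1]);

        printf("%s: NS from %s to %s, target %s",
            stringof(args[0]->m_pkthdr.rcvif->if_xname),
            inet_ntoa6(&ip->ip6_src), inet_ntoa6(&ip->ip6_dst),
            inet_ntoa6(&nd->nd_ns_target));
}

/* Print every Neighbor Advertisement the stack sends in response. */
fbt::nd6_na_output_fib:entry
{
        printf("%s: NA to %s, target %s",
            stringof(args[0]->if_xname), inet_ntoa6(args[1]),
            inet_ntoa6(args[2]));
}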


[Differential] D26757: Fix to join AllHost mcast group again when adding an existing IP address

2020-10-13 Thread ae (Andrey V. Elsukov)
ae accepted this revision.
ae added a comment.
This revision is now accepted and ready to land.


  Looks correct to me.

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST ACTION
  https://reviews.freebsd.org/D26757/new/

REVISION DETAIL
  https://reviews.freebsd.org/D26757

To: yannis.planus_alstomgroup.com, #network, mw, ae
Cc: ae, imp, freebsd-net-list, melifaro, rscheff


Re: IP reassembly

2020-09-22 Thread Andrey V. Elsukov
On 22.09.2020 15:40, Eugene Grosbein wrote:
> Hi!
> 
> Is our IP reassembly facility supposed to handle incoming out-of-order
> fragments?
> For example, an IPIP packet created with a gif(4) interface is fragmented
> into two parts, and the parts are delivered out of order, last fragment
> first.
> 
> In fact, I see this results in broken reassembly.

Hi,

IP reassembly is done in ip_input(); it doesn't matter what upper-layer
protocol is inside. Do you have some traces? You can use dtrace fbt probes
to track your datagrams.
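
As a starting point, something like the following one-liner could show whether fragments reach the reassembly code and whether reassembly completes. It is a sketch under the assumption that the kernel's ip_reass() function has not been inlined, so that fbt probes exist for it (`dtrace -l -n 'fbt::ip_reass:'` would show that); the aggregation names are arbitrary:

```sh
# Count calls to ip_reass() and whether each call returned a complete
# datagram (non-NULL mbuf) or not (NULL: still incomplete, or dropped).
dtrace -n '
fbt::ip_reass:entry  { @calls["ip_reass calls"] = count(); }
fbt::ip_reass:return { @done[arg1 ? "reassembled" : "incomplete or dropped"] = count(); }
'
```

Stop it with Ctrl-C after sending a few fragmented packets through the tunnel to print the counts.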

-- 
WBR, Andrey V. Elsukov




Re: Ipv6 neighbor limit

2020-09-03 Thread Andrey V. Elsukov
On 03.09.2020 16:02, Cristian Cardoso wrote:
> Hi
> I don't know if that is it. I am trying to find out if there are any
> limits for ipv6 neighbors in the kernel, as soon I will go over 4000
> servers below my IPv6 router.
> In Juniper (which is a FreeBSD) I can set ndp6-max-cache, for example,
> to support more ipv6 neighbors.

Hi,

there is no such limit. When your system approaches its memory limits,
creation of new ND entries will fail.

-- 
WBR, Andrey V. Elsukov




[Differential] D24989: netinet: Generate a random RSS key on boot.

2020-06-01 Thread ae (Andrey V. Elsukov)
ae added a comment.


  In D24989#552576, @avg wrote:
  
  > I have a vague memory, maybe wrong, that commonly used fixed RSS keys were
  > selected because they had some property (-ies).
  > So, maybe just being random is not good enough?
  > I think that hypothetical `rss_isbadkey` was mentioned for a reason?
  
  I also have that feeling. For example, you may have a server that handles a
  serious workload, but after a reboot, due to the new key, it will not be
  able to handle the same workload.

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST ACTION
  https://reviews.freebsd.org/D24989/new/

REVISION DETAIL
  https://reviews.freebsd.org/D24989

To: neel_neelc.org, #csprng, markm
Cc: avg, markm, cem, #csprng, kevans, debdrup, rwatson, imp, ae, melifaro, 
#contributor_reviews_base, freebsd-net-list, mmacy, kpraveen.lkml_gmail.com, 
marcnarc_gmail.com, simonvella_gmail.com, novice_techie.com, 
tommi.pernila_iki.fi, krzysztof.galazka_intel.com


Re: RUNNING flag remains unset upon reinserting a gre into VNET jail

2020-05-07 Thread Andrey V. Elsukov
On 06.05.2020 10:00, Andrey V. Elsukov wrote:
>> # create a gre outside the jail, configure its tunnel endpoints
>>
>> ifconfig gre0 create tunnel 10.1.1.1 10.2.2.2
>> ifconfig gre0  # not RUNNING (OK)
>>
>> # place the gre into the jail, it should be running now
>>
>> ifconfig gre0 vnet demo
>> jexec demo ifconfig gre0  # not RUNNING (not OK)
> 
> Hi,
> 
> I'm not an advanced jail user, so this is my conclusion from a quick
> look at the code. It looks to me that all IPv4/IPv6 addresses should be
> purged from an interface that is moved from one vnet to another. The
> tunnel's config is still there because it is stored in the interface's
> private softc. Thus when you move the ifnet from one vnet to another,
> ifaddr_event_ext is not handled properly and the interface doesn't
> change its state.
> 
> If my conclusion is correct, I see two ways to fix this:
>   1. Add an if_reassign() method to all tunneling interfaces and clear
> the tunnel config when the ifnet is moved to a new jail. This will force
> you to reconfigure the interface after moving it. This is probably a POLA
> violation.

Hi,

I think this patch should help:
https://people.freebsd.org/~ae/gre.diff

It is untested; if you have time, please test it and report back.
The patch clears the tunnel config after the interface is moved from one
vnet to another, so you will need to reconfigure all addresses.

>   2. Add an if_reassign() method to all tunneling interfaces that will
> invoke the ifaddr_event_ext handler. This requires more code and looks
> hackish to me. :)


-- 
WBR, Andrey V. Elsukov





[Differential] D24061: Hyper-V socket implementation for FreeBSD guest

2020-04-23 Thread ae (Andrey V. Elsukov)
ae added a comment.


  Do you have performance test results for the already existing Linux
  implementation?
  From a quick look, it seems to me there will be a locking bottleneck, which
  could probably be reduced using CK and epoch. But that can be done later,
  if you plan to support this code.

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST ACTION
  https://reviews.freebsd.org/D24061/new/

REVISION DETAIL
  https://reviews.freebsd.org/D24061

To: whu, decui_microsoft.com, freebsd-net-list
Cc: ae, greg_unrelenting.technology, imp


[Differential] D23737: nat64: Get the IPv4 address from a NAT64 address when comparing addresses in a ICMP translate

2020-02-19 Thread ae (Andrey V. Elsukov)
ae added a comment.


  Also, how did you test your changes? :)
  NAT64 is currently not widely used, so changes here can break something,
  and you will learn about the breakage only when it is no longer easy to
  fix, e.g. after a release.

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST ACTION
  https://reviews.freebsd.org/D23737/new/

REVISION DETAIL
  https://reviews.freebsd.org/D23737

To: neel_neelc.org, ae
Cc: imp, ae, melifaro, #contributor_reviews_base, freebsd-net-list, mmacy, 
kpraveen.lkml_gmail.com, marcnarc_gmail.com, simonvella_gmail.com, 
novice_techie.com, tommi.pernila_iki.fi


[Differential] D23737: nat64: Get the IPv4 address from a NAT64 address when comparing addresses in a ICMP translate

2020-02-19 Thread ae (Andrey V. Elsukov)
ae added a comment.


  In D23737#521593, @neel_neelc.org wrote:
  
  > Here, I also compare the destination addresses. Is this what you want?
  
  No, take a look at RFC 6052, section 2.2.

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST ACTION
  https://reviews.freebsd.org/D23737/new/

REVISION DETAIL
  https://reviews.freebsd.org/D23737

To: neel_neelc.org, ae
Cc: imp, ae, melifaro, #contributor_reviews_base, freebsd-net-list, mmacy, 
kpraveen.lkml_gmail.com, marcnarc_gmail.com, simonvella_gmail.com, 
novice_techie.com, tommi.pernila_iki.fi


[Differential] D23737: nat64: Get the IPv4 address from a NAT64 address when comparing addresses in a ICMP translate

2020-02-18 Thread ae (Andrey V. Elsukov)
ae requested changes to this revision.
ae added a comment.
This revision now requires changes to proceed.


  The patch is not correct. The IPv4 address can be embedded in different
  places, depending on the configuration.
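
  To illustrate why a fixed offset is not enough, here is a minimal Python
  sketch of the RFC 6052 section 2.2 layout. It is not the nat64 kernel code,
  and the helper names `embed_ipv4` and `extract_ipv4` are hypothetical. The
  position of the IPv4 bytes depends on the prefix length, and the reserved
  octet at bits 64..71 is always skipped.

```python
import ipaddress

VALID_PREFIX_LENGTHS = (32, 40, 48, 56, 64, 96)


def embed_ipv4(prefix: str, v4: str) -> ipaddress.IPv6Address:
    """Build an IPv4-embedded IPv6 address for the given NAT64 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen not in VALID_PREFIX_LENGTHS:
        raise ValueError("prefix length must be one of %r" % (VALID_PREFIX_LENGTHS,))
    out = bytearray(net.network_address.packed)
    pos = net.prefixlen // 8
    for octet in ipaddress.IPv4Address(v4).packed:
        if pos == 8:          # bits 64..71 are reserved and must stay zero
            pos += 1
        out[pos] = octet
        pos += 1
    return ipaddress.IPv6Address(bytes(out))


def extract_ipv4(v6: str, prefixlen: int) -> ipaddress.IPv4Address:
    """Recover the embedded IPv4 address; the prefix length must be known."""
    if prefixlen not in VALID_PREFIX_LENGTHS:
        raise ValueError("prefix length must be one of %r" % (VALID_PREFIX_LENGTHS,))
    raw = ipaddress.IPv6Address(v6).packed
    pos = prefixlen // 8
    octets = bytearray()
    while len(octets) < 4:
        if pos == 8:          # skip the reserved octet
            pos += 1
        octets.append(raw[pos])
        pos += 1
    return ipaddress.IPv4Address(bytes(octets))


for pfx in ("2001:db8::/32", "2001:db8:100::/40", "2001:db8:122::/48",
            "2001:db8:122:300::/56", "2001:db8:122:344::/64",
            "2001:db8:122:344::/96"):
    print(pfx, "->", embed_ipv4(pfx, "192.0.2.33"))
```

  The prefixes and the IPv4 address are the documentation examples from the
  RFC. Reading the last 32 bits works only for the /96 case; with any other
  prefix length the same bytes are suffix or reserved bits.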

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST ACTION
  https://reviews.freebsd.org/D23737/new/

REVISION DETAIL
  https://reviews.freebsd.org/D23737

To: neel_neelc.org, ae
Cc: imp, ae, melifaro, #contributor_reviews_base, freebsd-net-list, mmacy, 
kpraveen.lkml_gmail.com, marcnarc_gmail.com, simonvella_gmail.com, 
novice_techie.com, tommi.pernila_iki.fi


Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1

2020-02-14 Thread Andrey V. Elsukov
On 13.02.2020 06:21, Rudy wrote:
> 
> 
> I'm having issues with a box that is acting as a BGP router for my
> network.  3 Chelsio cards, two T5 and one T6.  It was working great
> until I turned up our first port on the T6.  It seems like traffic
> passing in from a T5 card and out the T6 causes a really high load (and
> high interrupts).
> 
> Traffic (not that much, right?)
> 
> Dev      RX bps   TX bps   RX PPS   TX PPS   Error
> cc0           0        0        0        0       0
> cc1      2212 M      7 M    250 k      6 k       0  (100Gbps uplink, filtering inbound routes to keep TX low)
> cxl0      287 k   2015 M      353    244 k       0  (our network)
> cxl1      940 M   3115 M    176 k    360 k       0  (our network)
> cxl2      634 M   1014 M    103 k    128 k       0  (our network)
> cxl3        1 k     16 M        1      4 k       0
> cxl4          0        0        0        0       0
> cxl5          0        0        0        0       0
> cxl6     2343 M    791 M    275 k    137 k       0  (IX, part of lagg0)
> cxl7     1675 M    762 M    215 k    133 k       0  (IX, part of lagg0)
> ixl0      913 k     18 M        0        0       0
> ixl1        1 M     30 M        0        0       0
> lagg0    4019 M   1554 M    491 k    271 k       0
> lagg1       1 M     48 M        0        0       0
> FreeBSD 12.1-STABLE orange 976 Bytes/Packet avg
>  1:42PM  up 13:25, 5 users, load averages: 9.38, 10.43, 9.827

Hi,

did you try to use pmcstat to determine what is the heaviest task for
your system?

# kldload hwpmc
# pmcstat -S inst_retired.any -Tw1

Then capture several first lines from the output and quit using 'q'.

Do you use some firewall? Also, can you show the snapshot from the `top
-HPSIzts1` output.

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2020-01-17 Thread Andrey V. Elsukov
On 16.01.2020 19:36, Andrey V. Elsukov wrote:
> For transport mode inner and outer headers will be the same.
> I guess the problem can be reproduced in the lab using the following config:
> 
> [Host A] <--> [Router] <--> [Host B]
> 
> IPsec should be configured between hosts A and B. Then you need to
> reduce MTU on the router. This should lead to ICMP NEEDFRAG messages
> from the router, and then host should correctly handle them.

I have tested this scenario, and it doesn't work. So, I will report back
when there is a working solution.

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2020-01-17 Thread Andrey V. Elsukov
On 17.01.2020 12:36, Victor Sudakov wrote:
> Back to the point. I've figured out that both encrypted (in transport
> mode) and unencrypted TCP segments have the same MSS=1460. Then I'm
> completely at a loss how the encrypted packets avoid being fragmented.
> TCP has no way to know in advance that encryption overhead will be
> added.

For IPsec endpoints (i.e. when you encrypt your own sessions), TCP invokes
the IPSEC_HDRSIZE() method for each outgoing packet. It returns the
approximate size required for IPsec, and TCP uses this information to
calculate the MSS. I think this is how it should work.
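
A rough sketch of that arithmetic, in Python for illustration only. The
function name is hypothetical, and the overhead value is merely inferred from
the ping sizes reported earlier in this thread (1472 without IPsec, 1414 with
it). The real overhead depends on the SA's algorithms, and what
IPSEC_HDRSIZE() actually returns is not shown here.

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
TCP_HEADER = 20    # bytes, assuming no TCP options


def tcp_mss(mtu: int, ipsec_overhead: int = 0) -> int:
    """Largest TCP payload that still fits into one unfragmented packet."""
    return mtu - IPV4_HEADER - TCP_HEADER - ipsec_overhead


# Overhead inferred from the largest ping payloads that got through.
ESP_OVERHEAD = 1472 - 1414

print(tcp_mss(1500))                # 1460, the MSS seen in the packet dumps
print(tcp_mss(1500, ESP_OVERHEAD))  # 1402, what would fit once ESP is added
```

If TCP announces 1460 while the ESP overhead is not accounted for, full-size
segments exceed the MTU after encryption, which matches the stalls described
in this thread.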

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2020-01-16 Thread Andrey V. Elsukov
On 16.01.2020 19:07, Victor Sudakov wrote:
> Eugene Grosbein wrote:
>>
>>> What beats me is that I cannot reproduce this problem in bhyve. In this
>>> packet dump: http://admin.sibptus.ru/~vas/ipsec1.pcap.gz I'm scp-ing a
>>> 50M file from 192.168.246.10 (bhyve guest) to 192.168.246.1 (bhyve
>>> host), and I see no fragments, and the largets packet is 1466 bytes, and
>>> the scp never stalls nor fails.
>>>
>>> Why is it NOT broken this time?
>>>
>>> Both hosts are 12.1-RELEASE-p1.
>>
>> I could not reproduce the problem with unpatched recent stable/11, either :-)
> 
> Is there a way to view the MSS in the TCP segments before encryption or
> after decryption? I want to compare them in situations with IPSec
> enabled and disabled.
> 
> I've never been able to see anything in "tcpdump -i enc0", probably it
> cannot do transport mode IPSec because the man page talks about "outer
> and inner header."

For transport mode inner and outer headers will be the same.
I guess the problem can be reproduced in the lab using the following config:

[Host A] <--> [Router] <--> [Host B]

IPsec should be configured between hosts A and B. Then you need to
reduce MTU on the router. This should lead to ICMP NEEDFRAG messages
from the router, and then host should correctly handle them.

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2020-01-16 Thread Andrey V. Elsukov
On 16.01.2020 17:24, Eugene Grosbein wrote:
> 16.01.2020 20:39, Andrey V. Elsukov wrote:
> 
>> I prepared the PoC patch that should fix the problem with TCP and
>> transport mode IPsec. But I have not free time currently to properly
>> test and debug it. It is only compile-tested. But If you want, you can
>> try :)
>> Currently only IPv4 support is implemented.
>>
>> https://people.freebsd.org/~ae/ipsec_transport_mode_ctlinput.diff
> 
> In fact, I've faced this problem long time ago too and I work around it with 
> different approaches
> like "ipfw tcp-setmss" (MSS adjust) or by using IPSec transport mode
> with gif(4) interface removing DF bit out of encapsulated packets.
> 
> I was going to test your patch with my home router but the patch does not 
> apply to stable/11, at all.
> Do you have time to adjust it to stable/11 ?

I tried to apply the patch with `svn patch` and it applies cleanly. The
only change needed is moving `#include ipsec_support.h` to the top of
the file.

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2020-01-16 Thread Andrey V. Elsukov
On 23.12.2019 15:00, Andrey V. Elsukov wrote:
> On 20.12.2019 18:23, Victor Sudakov wrote:
>> Dear Colleagues,
>>
>> I've set up IPSec in transport mode between two regular FreeBSD hosts,
>> for testing. Now TCP sessions between those hosts don't work normally
>> any more. For example, scp is stalled almost immediately after starting
>> a file transfer, and so is interactive ssh eventually.
>>
>> I feel that the problem is somehow related to MTU, MSS and fragmentation
>> of ESP packets, because:
>>
>> 1. When IPSec is disabled, I can "ping -s1472 -D" the remote host all
>> right. 
>>
>> 2. When IPSec is enabled, the maximum packet size I've been able to send
>> through is "ping -s1414 -D". ("ping -s1415 -D host-b" already disappears
>> in the void).
> 
> I think the silence from ping is due to IPsec works asynchronously.
> I.e. when application sends data to the stack, it receives good feedback
> and thinks that data was send successful then it waits for reply.
> But IPsec consumes the data and then encrypted data will be send from
> crypto thread via callback. And now they can not be fragmented due to
> IP_DF bit, but there are no app waiting for this error code.
> 
> Similar problem is with TCP. Probably we can try to send PRC_MSGSIZE
> notify when EMSGSIZE is returned from ip_output(). At least for TCP.

Hi,

I prepared a PoC patch that should fix the problem with TCP and
transport mode IPsec, but I currently have no free time to properly
test and debug it. It is only compile-tested. If you want, you can
try it :)
Currently only IPv4 support is implemented.

https://people.freebsd.org/~ae/ipsec_transport_mode_ctlinput.diff

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2019-12-23 Thread Andrey V. Elsukov
On 23.12.2019 15:12, Eugene Grosbein wrote:
> 23.12.2019 19:00, Andrey V. Elsukov wrote:
> 
>> I think the silence from ping is due to IPsec works asynchronously.
>> I.e. when application sends data to the stack, it receives good feedback
>> and thinks that data was send successful then it waits for reply.
>> But IPsec consumes the data and then encrypted data will be send from
>> crypto thread via callback. And now they can not be fragmented due to
>> IP_DF bit, but there are no app waiting for this error code.
>>
>> Similar problem is with TCP. Probably we can try to send PRC_MSGSIZE
>> notify when EMSGSIZE is returned from ip_output(). At least for TCP.
> 
> What is "an application" in this case? Userland app dealing with sockets?
> Another part of the kernel? Some system daemon similar to natd?

TCP tries to automatically adjust the MSS to avoid segment loss. It can
interoperate with ICMP to handle ICMP UNREACH messages. AFAIR, it works
via the host cache. I need some time to remember how it works.

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2019-12-23 Thread Andrey V. Elsukov
On 20.12.2019 18:23, Victor Sudakov wrote:
> Dear Colleagues,
> 
> I've set up IPSec in transport mode between two regular FreeBSD hosts,
> for testing. Now TCP sessions between those hosts don't work normally
> any more. For example, scp is stalled almost immediately after starting
> a file transfer, and so is interactive ssh eventually.
> 
> I feel that the problem is somehow related to MTU, MSS and fragmentation
> of ESP packets, because:
> 
> 1. When IPSec is disabled, I can "ping -s1472 -D" the remote host all
> right. 
> 
> 2. When IPSec is enabled, the maximum packet size I've been able to send
> through is "ping -s1414 -D". ("ping -s1415 -D host-b" already disappears
> in the void).

I think the silence from ping is because IPsec works asynchronously.
I.e. when an application sends data to the stack, it receives good
feedback, thinks that the data was sent successfully, and then waits for
a reply. But IPsec consumes the data, and the encrypted data is then sent
from a crypto thread via a callback. At that point the packets cannot be
fragmented due to the IP_DF bit, but there is no application waiting for
this error code.

A similar problem exists with TCP. Probably we can try to send a
PRC_MSGSIZE notification when EMSGSIZE is returned from ip_output(). At
least for TCP.

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2019-12-23 Thread Andrey V. Elsukov
On 23.12.2019 14:08, Eugene Grosbein wrote:
>>> Sample patch creates another sysctl but we should do it unconditionally, 
>>> don't we?
>>
>> As I said I didn't find that other OSes do this. Linux has enabled by
>> PMTUD by default, strongswan doesn't set SADB_SAFLAGS_NOPMTUDISC flag,
>> OpenBSD hasn't such quirk. Why should we add this instead of try to fix
>> PMTUD?
> 
> RFC 2401 Appendix B https://tools.ietf.org/html/rfc2401#page-1-48 states
> that packets generated by IPSec transport mode must be "fragmentable" over 
> the path
> and this is incompatible with DF=1.

I don't see such requirements here, I think you read this somewhere
between lines :-)

"If required, IP fragmentation occurs after IPsec processing within an
  IPsec implementation. Thus, transport mode AH or ESP is applied only
 to whole IP datagrams (not to IP fragments)."

This is exactly how it works now. IPsec does encryption and passes ESP
packet to IP stack, then it can be fragmented if it is allowed (i.e. no
DF bit set).

"An IP packet to which AH or ESP has been applied may itself be
fragmented by routers en route, and such fragments MUST be reassembled
prior to IPsec processing at a receiver."

If fragmentation was allowed at previous step, the receiver will have
several fragments that will be reassembled into single ESP packet, and
then it will be decrypted and passed to IP stack. I.e. IPsec will not
try to decrypt each fragment before reassembly.
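
The ordering described above can be sketched as follows. This is a toy model
in Python, not kernel code: the function names are hypothetical, the "ESP
packet" is just an opaque byte string, and 1480 assumes a 1500-byte MTU minus
a 20-byte IPv4 header.

```python
from typing import List, Tuple

# (offset in bytes, more-fragments flag, data)
Fragment = Tuple[int, bool, bytes]


def fragment(payload: bytes, max_data: int) -> List[Fragment]:
    """Split an already encrypted packet; offsets must be multiples of 8."""
    step = max_data - max_data % 8
    if step <= 0:
        raise ValueError("max_data must be at least 8")
    return [(off, off + step < len(payload), payload[off:off + step])
            for off in range(0, len(payload), step)]


def reassemble(frags: List[Fragment]) -> bytes:
    """Rebuild the whole packet; only this result is handed to IPsec."""
    ordered = sorted(frags)            # fragments may arrive in any order
    out = b""
    for off, _more, data in ordered:
        if off != len(out):
            raise ValueError("gap or overlap in fragments")
        out += data
    if ordered and ordered[-1][1]:
        raise ValueError("last fragment is missing")
    return out


esp_packet = bytes(range(256)) * 6            # stand-in for an ESP packet
frags = fragment(esp_packet, 1480)
print([(off, more, len(data)) for off, more, data in frags])
print(reassemble(list(reversed(frags))) == esp_packet)
```

Decryption would be applied to the value returned by `reassemble()`, never
to an individual fragment.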

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2019-12-23 Thread Andrey V. Elsukov
On 23.12.2019 13:55, Eugene Grosbein wrote:
>> I think the real problem is that PMTUD doesn't work correctly with
>> IPsec. Linux has special sysctl variabl ip_no_pmtu_disc and flag
>> SADB_SAFLAGS_NOPMTUDISC for SA that can disable PMTUD for IPv4 and IP_DF
>> flag will not be set. We can add some similar quirks, but it would be
>> better to fix PMTUD. We already have hundreds sysctl in our system and
>> remembering all them is a problem too.
> 
> It's true that PMTUD does not work with IPSec transport mode.
> 
> I think we could just clear DF bit off encapsulated transport mode packets 
> unconditionally,
> please take a look at last chunk of sample patch in the PR 242744:
> https://bz-attachments.freebsd.org/attachment.cgi?id=210122
> 
> Sample patch creates another sysctl but we should do it unconditionally, 
> don't we?

As I said, I didn't find that other OSes do this. Linux has PMTUD enabled
by default, strongswan doesn't set the SADB_SAFLAGS_NOPMTUDISC flag, and
OpenBSD has no such quirk. Why should we add this instead of trying to fix
PMTUD?

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2019-12-23 Thread Andrey V. Elsukov
On 23.12.2019 13:06, Victor Sudakov wrote:
>> ESP xform for transport mode just replaces protocol in IP header and
>> adds some info to the end of a packet.
> 
> It is rather easy to verify your theory. If you are right, then
> disabling net.inet.tcp.path_mtu_discovery globally should remove the DF
> flags from the ESP packets too, right?
> 
> Of course, net.inet.tcp.path_mtu_discovery=0 is not a solution, it's just
> a way to check the origin of the DF flag.
> 
> And if you are right, what does it mean to us? Did you see
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242744 already ?
> 
> My ultimate wish is to make transport mode work out of the box, without
> any workarounds like additional host routes or firewall rules.

I think the real problem is that PMTUD doesn't work correctly with
IPsec. Linux has a special sysctl variable, ip_no_pmtu_disc, and an SA
flag, SADB_SAFLAGS_NOPMTUDISC, that can disable PMTUD for IPv4 so that the
IP_DF flag will not be set. We can add some similar quirks, but it would be
better to fix PMTUD. We already have hundreds of sysctls in our system, and
remembering all of them is a problem too.

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2019-12-23 Thread Andrey V. Elsukov
On 23.12.2019 12:39, Andrey V. Elsukov wrote:
> On 20.12.2019 19:22, Victor Sudakov wrote:
>>> What's the root of the problem? ESP packets cannot get fragmented or
>>> what? 
>>
>> Wireshark has shown that the "Don't Fragment" flag is set on all ESP
>> (protocol 50) packets. Who does this, why, and how can I switch it off
>> globally?
> 
> Hi,
> 
> I think this DF flag is originally from TCP packet.
> ESP xform for transport mode just replaces protocol in IP header and
> adds some info to the end of a packet.

This is controlled by net.inet.tcp.path_mtu_discovery variable.
TCP won't set IP_DF flag if you disable this feature.
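
To check in a capture whether the flag really disappears, one can look at the
flags field of the outer IPv4 header. A minimal Python sketch follows; the
function name and the sample header are hypothetical, and the header is built
by hand rather than taken from a real dump.

```python
import struct

IP_DF = 0x4000   # "Don't Fragment" bit in the flags/fragment-offset field
IP_MF = 0x2000   # "More Fragments" bit


def ipv4_flags(header: bytes) -> dict:
    """Decode DF/MF, fragment offset and protocol from a raw IPv4 header."""
    if len(header) < 20 or header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    (flags_frag,) = struct.unpack_from("!H", header, 6)
    return {
        "df": bool(flags_frag & IP_DF),
        "mf": bool(flags_frag & IP_MF),
        "offset": (flags_frag & 0x1FFF) * 8,
        "proto": header[9],
    }


# Hand-made header: version 4, IHL 5, DF set, protocol 50 (ESP).
# The checksum is left at zero because it is not verified here.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, IP_DF, 64, 50, 0,
                     bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
print(ipv4_flags(sample))
```

With path MTU discovery disabled for TCP, the expectation stated above is
that "df" would be false for ESP packets carrying TCP.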

-- 
WBR, Andrey V. Elsukov





Re: IPSec transport mode, mtu, fragmentation...

2019-12-23 Thread Andrey V. Elsukov
On 20.12.2019 19:22, Victor Sudakov wrote:
>> What's the root of the problem? ESP packets cannot get fragmented or
>> what? 
> 
> Wireshark has shown that the "Don't Fragment" flag is set on all ESP
> (protocol 50) packets. Who does this, why, and how can I switch it off
> globally?

Hi,

I think this DF flag originally comes from the TCP packet.
The ESP xform for transport mode just replaces the protocol in the IP
header and adds some info to the end of the packet.

-- 
WBR, Andrey V. Elsukov





Re: NAT64 return traffic vanishes after successful de-alias

2019-12-15 Thread Andrey V. Elsukov
On 15.12.2019 19:15, John W. O'Brien wrote:
> Yes, this is exactly the problem. Thank you very much!
> 
> The reason it was working in the EC2 case is because the FreeBSD AMIs
> set ipv6_activate_all_interfaces="YES".
> 
> It helps me quite a lot to learn the concept of "reschedules a packet
> again on the same interface". That fills in a gap that I am sure will
> come in handy when trying to reason about behavior in the future.
> 
> Incidentally, where are those drops counted? I did start looking at
> "netstat -i" and "netstat -s" for clues, and even now that I know what
> to look for, I'm not sure I know what I'm seeing. Is it "ip6: output
> packets discarded due to no route"?

I think you can see such drops in the `netstat -isp ip6` output for each
specific interface in the `input datagram discarded` row.

-- 
WBR, Andrey V. Elsukov





Re: NAT64 return traffic vanishes after successful de-alias

2019-12-15 Thread Andrey V. Elsukov
On 14.12.2019 22:54, John W. O'Brien wrote:
> Hello FreeBSD Networking,
> 
> As the subject summarizes, I have a mostly-working NAT64 rig, but return
> traffic is disappearing, and I haven't been able to figure out why. I
> observe the post-translation (4-to-6) packets via ipfwlog0, but a simple
> ipfw counter rule ipfw matches nothing.

I suspect you have disabled IPv6 on the interface where the IPv4 address
is configured. Check that the IFDISABLED flag is not set on the IPv4-side
interface.

When NAT64 does translation, by default it reschedules the packet on
the same interface, but in the other address family, so if you have
disabled IPv6, the packet will just be dropped by ip6_input.
You can enable IPv6 with the following command:

 # ifconfig igb0 inet6 -ifdisabled
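
If you want to check the flag from a script, the `nd6 options` line of the
ifconfig output can be parsed. A minimal Python sketch; the function names are
hypothetical and SAMPLE is a made-up excerpt, not output from the system
discussed here.

```python
import re

# Made-up ifconfig excerpt for illustration only.
SAMPLE = (
    "igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\tinet 192.0.2.1 netmask 0xffffff00 broadcast 192.0.2.255\n"
    "\tnd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>\n"
)


def nd6_flags(ifconfig_output: str) -> set:
    """Return the flag names listed on the 'nd6 options' line, if any."""
    match = re.search(r"nd6 options=[0-9a-f]+<([^>]*)>", ifconfig_output)
    return set(match.group(1).split(",")) if match else set()


def ipv6_disabled(ifconfig_output: str) -> bool:
    """True when the interface has the IFDISABLED flag set."""
    return "IFDISABLED" in nd6_flags(ifconfig_output)


print(ipv6_disabled(SAMPLE))
```

In practice the text would come from running `ifconfig` for the interface in
question instead of the sample string.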

-- 
WBR, Andrey V. Elsukov





Re: icmp v4 redirect timeout

2019-10-25 Thread Andrey V. Elsukov
On 22.10.2019 17:30, Victor Gamov wrote:
> Hi All
> 
> I discover the following problem: FreeBSD host install route recived by
> ICMP-redirect from default GW and this route is permanents.
> 
> 
> In my case FreeBSD 192.168.1.10/24 has 192.168.1.254 as default gateway.
> This network has interconnection with remote network 192.168.88.0/24 via
> some other gateways -- 192.168.1.199 + 192.168.1.195 for example.  All
> gateways  using OSPF to exchange routing info.
> 
> 192.168.1.10 send packet destined to remote network (192.168.88.0/24) to
> default gateway 192.168.1.254, receive ICMP-redirect and install route
> to 192.168.88.0/24 via 192.168.1.199.  Then 192.168.1.199 off for some
> reason but 192.168.1.10 never know about it because route installed via
> 192.168.1.199 is permanent.
> 
> I see net.inet6.icmp6.redirtimeout in my FreeBSD 11.2-STABLE #0 r339734
> and I think this sysctl set timeout for routes installed via
> ICMP-redirects (route deletes after this timeout?).
> 
> Is it possible to get such sysctl for ipv4 ?

I think expiring doesn't work for IPv6 either. At least, I didn't find
the related code from a quick look.

-- 
WBR, Andrey V. Elsukov





Re: How to disable tryforward ?

2019-10-25 Thread Andrey V. Elsukov
On 22.10.2019 08:38, k simon wrote:
> Hi,
> Tryforward was merged 3 years ago, and it doesn't have a sysctl to disable
> it, so ECMP has been broken for the past 3 years. Olivier has filed a bug:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225792 , but it seems that
> few people care about it.
> Andrey said maybe some IPsec policy can disable tryforward
> ( https://lists.freebsd.org/pipermail/freebsd-net/2017-February/047203.html ).
> I have tried a lot of configurations, but failed. Can someone point it out?
> Thanks!

AFAIR, tryforward was disabled by default later.
You need to disable ICMP redirects to enable tryforward. So, if you don't
need tryforward, just enable ICMP redirects.

-- 
WBR, Andrey V. Elsukov





Re: FRR on FreeBSD 12 - problems with OSPFv3

2019-10-11 Thread Andrey V. Elsukov
On 11.10.2019 12:09, Rudy wrote:
> I just upgraded from FreeBSD 11 to 12 and upgrade from quagga to FRR at
> the same time. I've tried frr6 and frr7 and get the same errors.
> 
> *** CRASH ***
> If I run on the command line and don't background, it bombs after 7
> seconds:
> # ospf6d
> Illegal instruction
> 
> 
> Here is the end of truss:
> # truss ospf6d
> ...
> mmap(0x0,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
> 34863378432 (0x81e04f000)
> mmap(0x0,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
> 34863382528 (0x81e050000)
> mmap(0x0,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
> 34863386624 (0x81e051000)
> mmap(0x0,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
> 34863390720 (0x81e052000)
> SIGNAL 11 (SIGSEGV) code=SEGV_ACCERR trapno=12 addr=0x7fffdff8
> process killed, signal = 4
>

SIGILL usually means that a binary/library was built for a specific CPU
and you need to rebuild it on the local host. If it was installed from
the official packages, this means that the port should be fixed so that
it does not use such CPU-specific optimization flags.

-- 
WBR, Andrey V. Elsukov





Re: dummynet: bandwidth is limited to 2 Gbit/s ?

2019-09-25 Thread Andrey V. Elsukov
On 25.09.2019 16:51, Eugene Grosbein wrote:
>> Will this break upgrades with freebsd-update? On a major upgrade,
>> it will first install the new kernel and require a reboot before
>> you run freebsd-update again to install the rest.
> 
> So it will run without dummynet pipes (traffic shaping) configured
> meantime. Is it big deal?

Note that if you have an ipfw rule with a pipe that does not exist, all
matched traffic will be dropped. :-)

-- 
WBR, Andrey V. Elsukov





Re: dummynet: bandwidth is limited to 2 Gbit/s ?

2019-09-25 Thread Andrey V. Elsukov
On 24.09.2019 08:42, Andriy Gapon wrote:
> 
> It seems that the userland component of ipfw/dummynet uses int for the 
> bandwidth
> represented in bit/s.  Also, int is used for passing that value from the
> userland to the kernel.
> 
> What would be the best way to extend this?
> Just use a larger type?
> Or maybe add another field to try to preserve KBI backward compatibility?

There is yet another problem that you need to keep in mind.
Some people may use old ipfw(8) binaries inside jails that run on a
modern host system. So, if you break the KBI, such jails will stop
working.
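
For reference, the 2 Gbit/s ceiling in the subject follows directly from
storing bit/s in a signed 32-bit value. A small Python sketch of the
arithmetic, assuming `int` is 32 bits wide as on the usual FreeBSD platforms;
the helper name is hypothetical and this is not the ipfw code.

```python
import ctypes

INT_MAX = 2**31 - 1   # 2147483647 bit/s, i.e. about 2.147 Gbit/s


def as_c_int(bits_per_second: int) -> int:
    """Value that survives when the bandwidth is stored in a 32-bit int."""
    return ctypes.c_int32(bits_per_second).value


for gbit in (1, 2, 3, 4):
    wanted = gbit * 10**9
    stored = as_c_int(wanted)
    print(f"{gbit} Gbit/s -> stored {stored} ({'ok' if stored == wanted else 'wrapped'})")
```

Widening the type fixes the arithmetic, but it also changes the layout that
old ipfw(8) binaries pass to the kernel, which is the KBI concern above.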

-- 
WBR, Andrey V. Elsukov





Re: finding optimal ipfw strategy

2019-08-27 Thread Andrey V. Elsukov
On 26.08.2019 19:25, Victor Gamov wrote:
> More general question about my current config.  I have about 200Mbit
> input multicasts which bridged and filtered later (about 380 Mbit
> bridged if trafshow does not lie me :-) )
> 
> My FreeBSD box (12.0-STABLE r348449 GENERIC  amd64)  has one "Intel(R)
> Xeon(R) CPU E31270 @ 3.40GHz"  and 4-ports  "Intel(R) PRO/1000
> PCI-Express Network Driver".  HT disabled and traffic mainly income via
> igb0 and out both via igb0 and igb2.  About 30 VLANs now active some at
> igb0 and some at igb2.
> 
> 
> And I have following `top` stat:
> =
> CPU 0:  0.0% user,  0.0% nice, 80.5% system,  0.0% interrupt, 19.5% idle
> CPU 1:  0.0% user,  0.0% nice, 34.1% system,  0.0% interrupt, 65.9% idle
> CPU 2:  0.0% user,  0.0% nice, 17.1% system,  0.0% interrupt, 82.9% idle
> CPU 3:  0.0% user,  0.0% nice, 46.3% system,  0.0% interrupt, 53.7% idle
> =

This doesn't look like heavy ipfw load.
E.g. this is top output from slightly loaded firewall (300Mbytes/s
~500kpps):

last pid: 58184;  load averages:  9.07,  8.98,  8.83    up 72+07:45:55  21:01:36
821 processes: 36 running, 680 sleeping, 105 waiting
CPU 0:   0.0% user,  0.0% nice,  0.0% system, 28.1% interrupt, 71.9% idle
CPU 1:   0.0% user,  0.0% nice,  0.0% system, 33.6% interrupt, 66.4% idle
CPU 2:   0.0% user,  0.0% nice,  0.0% system, 28.9% interrupt, 71.1% idle
CPU 3:   0.0% user,  0.0% nice,  0.0% system, 27.3% interrupt, 72.7% idle
CPU 4:   0.0% user,  0.0% nice,  0.0% system, 21.1% interrupt, 78.9% idle
CPU 5:   0.0% user,  0.0% nice,  0.0% system, 26.6% interrupt, 73.4% idle
CPU 6:   0.0% user,  0.0% nice,  0.0% system, 28.1% interrupt, 71.9% idle
CPU 7:   0.0% user,  0.0% nice,  0.0% system, 21.1% interrupt, 78.9% idle
CPU 8:   0.0% user,  0.0% nice,  0.0% system, 35.2% interrupt, 64.8% idle
CPU 9:   0.0% user,  0.0% nice,  0.0% system, 29.7% interrupt, 70.3% idle
CPU 10:  0.0% user,  0.0% nice,  0.0% system, 27.3% interrupt, 72.7% idle
CPU 11:  0.0% user,  0.0% nice,  0.0% system, 19.5% interrupt, 80.5% idle
CPU 12:  0.0% user,  0.0% nice,  0.0% system, 32.8% interrupt, 67.2% idle
CPU 13:  0.0% user,  0.0% nice,  0.0% system, 34.4% interrupt, 65.6% idle
CPU 14:  0.0% user,  0.0% nice,  0.0% system, 29.7% interrupt, 70.3% idle
CPU 15:  0.0% user,  0.0% nice,  0.0% system, 26.6% interrupt, 73.4% idle
CPU 16:  0.0% user,  0.0% nice,  0.0% system, 28.9% interrupt, 71.1% idle
CPU 17:  0.0% user,  0.0% nice,  0.0% system, 34.4% interrupt, 65.6% idle
CPU 18:  0.0% user,  0.0% nice,  0.0% system, 36.7% interrupt, 63.3% idle
CPU 19:  0.0% user,  0.0% nice,  0.0% system, 21.9% interrupt, 78.1% idle
CPU 20:  0.0% user,  0.0% nice,  0.0% system, 21.1% interrupt, 78.9% idle
CPU 21:  0.0% user,  0.0% nice,  0.0% system, 32.0% interrupt, 68.0% idle
CPU 22:  0.0% user,  0.0% nice,  0.0% system, 33.6% interrupt, 66.4% idle
CPU 23:  0.0% user,  0.0% nice,  0.0% system, 26.6% interrupt, 73.4% idle
CPU 24:  0.0% user,  0.0% nice,  0.0% system, 21.9% interrupt, 78.1% idle
CPU 25:  0.0% user,  0.0% nice,  0.0% system, 21.1% interrupt, 78.9% idle
CPU 26:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 27:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle

# pmcstat -S instructions -Tw1
PMC: [INSTR_RETIRED_ANY] Samples: 443074 (100.0%) , 0 unresolved
Key: q => exiting...
%SAMP IMAGE    FUNCTION           CALLERS
 39.2 kernel   sched_idletd       fork_exit
 10.9 ipfw.ko  ipfw_chk           ipfw_check_packet
  3.6 kernel   cpu_search_lowest  cpu_search_lowest
  2.8 kernel   lock_delay         _mtx_lock_spin_cookie
  2.5 kernel   _rm_rlock          in6_localip:1.3 pfil_run_hooks:0.6
  2.2 kernel   rn_match           ta_lookup_radix:1.5 fib6_lookup_nh_basic:0.6

As you can see, when ipfw produces high load, the interrupt column is
higher than the system column.


-- 
WBR, Andrey V. Elsukov





Re: finding optimal ipfw strategy

2019-08-26 Thread Andrey V. Elsukov
On 26.08.2019 03:30, Eugene Grosbein wrote:
>>> Also, you should use old table numbers instead of new symbolic table names
>>> when you have many rules checking for interface names and much traffic
>>> because checks for numbered tables are slightly more efficient.
>>> You may use symbolic names still at source level:
>>
>> There aren't any old tables; all tables have symbolic names. Even when
>> you create "table(1)", its name is converted into a symbolic name.
> 
> Yes, and this code path is slightly more efficient. A bit.

I don't have any performance measurements, but this code path exists for
compatibility and has extra checks to implement that compatibility.
So, I doubt it is more efficient :)
Internally, all symbolic names are mapped to indexes, so there should
not be any performance impact on packet processing.
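
A small sketch of the point above, assuming a hypothetical table named "trusted" and documentation prefixes (neither is from the thread). Both the numeric and the symbolic spelling end up as named tables, which the final listing should show:

```shell
# Hypothetical example: table names, prefixes and rule numbers are made up.
# A "numbered" table is just a table whose name happens to be "1".
ipfw table 1 create type addr
ipfw table 1 add 192.0.2.0/24

ipfw table trusted create type addr
ipfw table trusted add 198.51.100.0/24

# Both kinds of table are referenced the same way in rules.
ipfw add 1000 allow ip from 'table(1)' to any
ipfw add 1010 allow ip from 'table(trusted)' to any

# List every table, numeric-looking names included.
ipfw table all list
```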

-- 
WBR, Andrey V. Elsukov





Re: finding optimal ipfw strategy

2019-08-25 Thread Andrey V. Elsukov
On 24.08.2019 22:34, Eugene Grosbein wrote:
> If you are concerned of performance, general rule applies: less checks, 
> better performance.
> 
> First, use 'out xmit' instead of 'out via'. They are semantically equal and 
> this is micro-optimization
> but it still saves extra check unneeded when combined with "out" keyword.
> 
> Also, you should use old table numbers instead of new symbolic table names
> when you have many rules checking for interface names and much traffic
> because checks for numbered tables are slightly more efficient.
> You may use symbolic names still at source level:

There aren't any old tables; all tables have symbolic names. Even when
you create "table(1)", its name is converted into a symbolic name.

-- 
WBR, Andrey V. Elsukov





Re: pf (rules and nat) + (ipfw + dummynet)

2019-08-19 Thread Andrey V. Elsukov
On 18.08.2019 00:25, Andrew White wrote:
> I also see some work underway to separate dummynet from ipfw, is there any
> docs for the goals or timelines, will this allow dummynet anchors and use
> of dnctl to use pf with dummynet like in macos ?

JFYI,

dummynet uses a single exclusive mutex, and this kills performance on
modern hardware. Unless you already have patches that are ready to
commit, note that I expect to significantly rewrite this code within
several months, and your WIP patches will become stale.

-- 
WBR, Andrey V. Elsukov





Re: igb netstat input counters 2x?

2019-08-15 Thread Andrey V. Elsukov
On 14.08.2019 03:27, John-Mark Gurney wrote:
> I'm doing some perf testing on an APU4 board, and I noticed that
> it looks like the input netstat counters are 2x than what they should
> be.
> 
> I was seeing 60MiB/sec via netstat -w 1 -I igb1:
>  40034 0 0   60760352   2538 0 177909 0
>  40700 0 0   61776228   2574 0 180300 0
> 
> But the program was only reading 27MB/sec.  I decided to read the mac
> stats directly via:
> bytes=$(sysctl -n dev.igb.1.mac_stats.good_octets_recvd); while sleep 1; do
>   nbytes=$(sysctl -n dev.igb.1.mac_stats.good_octets_recvd)
>   echo $(($nbytes - $bytes)); bytes=$nbytes
> done
> 
> and saw much more reasonable numbers:
> 31099740
> 30512488
> 30675974
> 
> Which is more in line w/ the 27MB/sec that the program reports.
> 
> I haven't looked at the code to see what could be causing the double
> counting.  Also, the output numbers appear to be accurate.
> 
> This is with 13.0-CURRENT from the July 25th snapshot, which is r350322.

Does this doubling happen only with the IBYTES counter? What about
IPACKETS? I'd also check the L2/L3 addresses to be sure that they are
not accidentally broadcast/multicast.
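
One way to run both checks, assuming the interface is igb1 as in the quoted report (the packet count of 20 is arbitrary):

```shell
# Per-second input/output packet and byte counters for igb1;
# compare the packet columns with the byte columns.
netstat -w 1 -I igb1

# -e prints the link-level header, so the destination MAC of each
# received frame is visible next to the L3 addresses.
tcpdump -eni igb1 -c 20
```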

-- 
WBR, Andrey V. Elsukov





Re: Preferring internal IPv6 source address over gif tunnel IP?

2019-07-31 Thread Andrey V. Elsukov
On 31.07.2019 15:50, Viktor Dukhovni wrote:
> After further manpage reading, it seems to work with:
> 
> ifconfig_gif0_ipv6="inet6 ::2 ::1 prefixlen 128 no_prefer_iface"
> ifconfig_igb1_ipv6="inet6 ::1 prefixlen 64 prefer_source"
> ip6addrctl_policy="AUTO"

Yes, in general this should help.

"no_prefer_iface" disables "Rule 5: Prefer outgoing interface", so the
address with the "prefer_source" flag is then chosen by
"Rule 10: prefer address with `prefer_source' flag" before
"Rule 14: Use longest matching prefix" is reached.

-- 
WBR, Andrey V. Elsukov





Re: How to set up ipfw(8) NAT between an alias and the main IP address, when the alias is in another network?

2019-07-08 Thread Andrey V. Elsukov
On 06.07.2019 11:01, Yuri wrote:
> My network interface looks like this:
> $fw nat 1 config redirect_addr 192.168.100.2 192.168.1.2 redirect_addr
> 192.168.1.2 192.168.100.2 if sk0 unreg_only reset
> 
> $fw add 1001 nat 1 tcp from 192.168.100.2/32 to any via sk0 keep-state
> 
> $fw add 1002 check-state
> 
> 
> The rule 1001 has keep-state, therefore it should process both outgoing
> tcp and incoming response packets. But the outbound packets are NATted,
> but the inbound ones are not.
> 
> What is wrong, and how to fix this script?

'keep-state' creates the state for the TCP connection before it is
translated, so the state won't match the reply packet, which carries the
translated address/port.
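
One common way around this, sketched here under the assumption that the instance on sk0 is the only NAT involved, is to drop keep-state from the nat rule and pass both directions through the NAT instance, which keeps its own translation table. Rule numbers are arbitrary, and the redirect_addr entries from the question are left out; whether they are still needed depends on the setup.

```shell
# Sketch only: sk0 and 192.168.100.2 come from the quoted question,
# everything else is an assumption.
fw="/sbin/ipfw -q"

$fw nat 1 config if sk0 unreg_only reset

# No keep-state: outbound packets are translated on the way out ...
$fw add 1001 nat 1 ip from 192.168.100.2 to any out xmit sk0
# ... and replies are translated back when they arrive on sk0.
$fw add 1002 nat 1 ip from any to any in recv sk0
```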

-- 
WBR, Andrey V. Elsukov





Re: IPFW NAT64 changed 11.2 --> 11.3?

2019-06-26 Thread Andrey V. Elsukov
On 26.06.2019 14:23, Patrick M. Hausen wrote:
> Hi all,
> 
>> On 26.06.2019 at 12:28, Andrey V. Elsukov wrote:
>>
>> On 26.06.2019 13:10, Patrick M. Hausen wrote:
>>> tcpdump will take some more time, currently we do not have /dev/bpf in 
>>> these jails.
>>
>> So, nat64_direct_output didn't help?
>> Does `ipfw nat64lsn NAT64 list states` show the correct addresses?
> 
> No, it didn’t. Yes, the IPv4 addresses shown are the external addresses
> of these „gate64“ jails.
> 
> See:
> 
> 13:06:28.205602 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 16) 
> 2a00:b580:8000:12:40f9:d4:cd11:d68c > 64:ff9b::9765:7085: [icmp6 sum ok] 
> ICMP6, echo request, seq 0
> 13:06:28.205611 IP (tos 0x0, ttl 63, id 25804, offset 0, flags [DF], proto 
> ICMP (1), length 36)
> 217.29.40.145 > 151.101.112.133: ICMP echo request, id 1024, seq 0, 
> length 16
> 13:06:28.207853 IP (tos 0x0, ttl 58, id 53557, offset 0, flags [none], proto 
> ICMP (1), length 36)
> 151.101.112.133 > 217.29.40.145: ICMP echo reply, id 1024, seq 0, length 
> 16
> 13:06:28.207861 IP6 (hlim 57, next-header ICMPv6 (58) payload length: 16) 
> d91d:2891::9765:7085 > 2a00:b580:8000:12:40f9:d4:cd11:d68c: [icmp6 sum ok] 
> ICMP6, echo reply, seq 0
> 13:06:29.268095 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 16) 
> 2a00:b580:8000:12:40f9:d4:cd11:d68c > 64:ff9b::9765:7085: [icmp6 sum ok] 
> ICMP6, echo request, seq 1
> 13:06:29.268106 IP (tos 0x0, ttl 63, id 18866, offset 0, flags [DF], proto 
> ICMP (1), length 36)
> 217.29.40.145 > 151.101.112.133: ICMP echo request, id 1024, seq 1, 
> length 16
> 13:06:29.270335 IP (tos 0x0, ttl 58, id 53653, offset 0, flags [none], proto 
> ICMP (1), length 36)
> 151.101.112.133 > 217.29.40.145: ICMP echo reply, id 1024, seq 1, length 
> 16
> 13:06:29.270340 IP6 (hlim 57, next-header ICMPv6 (58) payload length: 16) 
> d91d:2891::9765:7085 > 2a00:b580:8000:12:40f9:d4:cd11:d68c: [icmp6 sum ok] 
> ICMP6, echo reply, seq 1
> 
> So the IPv4 echo and reply exchange looks good. Then the packet is
> forwarded to IPv6 with an entirely bogus (AFAIK) IPv6 source address.
> 
> Interestingly the host portion of the address that should be nat64 is 
> identical,
> but the prefix - where does it get that idea?

Thanks, this is very useful. Because of the code differences between
head/ and stable/11 there were some partial MFCs, and r334836 seems to
be missing part of the code. Can you try this patch?

-- 
WBR, Andrey V. Elsukov
Index: sys/netpfil/ipfw/nat64/nat64lsn.c
===
--- sys/netpfil/ipfw/nat64/nat64lsn.c	(revision 349408)
+++ sys/netpfil/ipfw/nat64/nat64lsn.c	(working copy)
@@ -408,6 +408,7 @@ nat64lsn_translate4(struct nat64lsn_cfg *cfg, cons
 	} else
 		logdata = NULL;
 
+	src6 = cfg->base.plat_prefix;
 	nat64_embed_ip4(, cfg->base.plat_plen, htonl(f_id->src_ip));
 	ret = nat64_do_handle_ip4(*pm, , >addr, lport,
 	>base, logdata);
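
For illustration, the address arithmetic behind this can be sketched in plain sh. With a /96 NAT64 prefix, the IPv4 address simply becomes the last 32 bits of the IPv6 address. The helper name embed_ip4 is made up for this example and is unrelated to the kernel's nat64_embed_ip4(); the addresses come from the capture quoted above.

```shell
# embed_ip4 PREFIX IPV4: print PREFIX followed by the IPv4 address
# written as two hexadecimal 16-bit groups (hypothetical helper).
embed_ip4() {
    prefix=$1
    old_ifs=$IFS
    IFS=.
    set -- $2            # split the dotted quad into $1..$4
    IFS=$old_ifs
    printf '%s%x:%x\n' "$prefix" $(( $1 * 256 + $2 )) $(( $3 * 256 + $4 ))
}

# Expected translation with the well-known prefix 64:ff9b::/96
embed_ip4 64:ff9b:: 151.101.112.133    # prints 64:ff9b::9765:7085

# The jail's external IPv4 address, written in hex
embed_ip4 "" 217.29.40.145             # prints d91d:2891
```

The second call shows that the bogus prefix d91d:2891 in the capture is just 217.29.40.145 in hex, which would be consistent with the prefix part of src6 not being taken from plat_prefix before the patch.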




Re: IPFW NAT64 changed 11.2 --> 11.3?

2019-06-26 Thread Andrey V. Elsukov
On 26.06.2019 13:10, Patrick M. Hausen wrote:
> tcpdump will take some more time, currently we do not have /dev/bpf in these 
> jails.

So, nat64_direct_output didn't help?
Does `ipfw nat64lsn NAT64 list states` show the correct addresses?

-- 
WBR, Andrey V. Elsukov





Re: IPFW NAT64 changed 11.2 --> 11.3?

2019-06-26 Thread Andrey V. Elsukov
On 26.06.2019 11:05, Patrick M. Hausen wrote:
> Hi all,
> 
> we have a bit of a problem with some new servers that
> use NAT64 to access certain services that offer only
> legacy IP - like github.
> 
> As far as I found the respective NAT64 gateways (in jails
> with VNET) are configured identically except for the
> particular addresses, of course.
> 
> Yet, 11.2 works, 11.3-RC1 doesn’t.
> Any hints welcome.

Check the output of the following commands on both translators:

# sysctl net.inet.ip.fw | grep nat64
# ipfw nat64lsn all list
# ipfw nat64lsn NAT64 stats

# ipfw nat64lsn NAT64 config log
# ifconfig ipfwlog0 create
# tcpdump -nvi ipfwlog0

Check the counters of the rules with the nat64lsn action. You probably
use netisr output (the default mode) and have traffic loops, i.e. a
packet is captured by the NAT64 instance several times.
Judging by your rules, direct output looks preferable for you (try
setting net.inet.ip.fw.nat64_direct_output=1).
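
A minimal sketch of that suggestion, assuming it is run inside the jail that hosts the NAT64 instance:

```shell
# Switch NAT64 from netisr output to direct output at runtime.
sysctl net.inet.ip.fw.nat64_direct_output=1

# Keep the setting across reboots.
echo 'net.inet.ip.fw.nat64_direct_output=1' >> /etc/sysctl.conf

# Show rules with their packet/byte counters and pick the nat64lsn ones,
# to see whether the same traffic is still counted several times.
ipfw -a list | grep nat64lsn
```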

-- 
WBR, Andrey V. Elsukov





Re: ng_snd_item: Panic?

2019-06-25 Thread Andrey V. Elsukov
On 25.06.2019 15:59, Larry Rosenman wrote:
> On 06/25/2019 4:18 am, Andrey V. Elsukov wrote:
>> On 24.06.2019 23:10, Larry Rosenman wrote:
>>>>> #5  0x828ee5b7 in ng_snd_item (item=0xf8021e3b4d80,
>>>>> flags=0)
>>>>>     at /usr/src/sys/netgraph/ng_base.c:2252
>>>>
>>>> It looks like you use some netgraph-based Ethernet interface.
>>>> The system received an ARP request and is going to send the reply,
>>>> but somehow the mbuf with this ARP request has an initialized m_next
>>>> pointer, so it is treated as a chain of mbufs.
>>>>
>>>> in_arpinput() reuses the received mbuf to construct the reply, but it
>>>> doesn't check whether the mbuf is a chain. It just sets m_len and sends
>>>> it. Then, since you have INVARIANTS in your kernel, the netgraph code
>>>> checks the actual length of the chain, and it doesn't match m_len, so
>>>> it panics.
>>>
>>>
>>> so, is this a bug?  Timing race? Other?
>>
>> I think we should determine that my assumption is correct :)
>> Can you show the output of the following commands from the kgdb for this
>> core?
>>
>> (kgdb) f 7
>> (kgdb) p *m
>> (kgdb) p *m->m_next
> 
> 
> (kgdb) fr 7
> #7  0x805b1e43 in ether_output (ifp=,
> m=0xf81f59eefb00, dst=0xfe012628d740, ro=) at
> /usr/src/sys/net/if_ethersubr.c:430
> 430    if ((error = (*ng_ether_output_p)(ifp, )) != 0) {

I failed to find a possible way to get into this state.
Please show the output of the following commands:
(kgdb) f 7
(kgdb) p/x (u_char[42])m->m_data
(kgdb) p/x (u_char[1372])m->m_next->m_data

Have you used this configuration for a long time, and are these panics
the first ones you have seen?

-- 
WBR, Andrey V. Elsukov





Re: ng_snd_item: Panic?

2019-06-25 Thread Andrey V. Elsukov
On 24.06.2019 23:10, Larry Rosenman wrote:
>>> #5  0x828ee5b7 in ng_snd_item (item=0xf8021e3b4d80, flags=0)
>>>     at /usr/src/sys/netgraph/ng_base.c:2252
>>
>> It looks like you use some netgraph-based Ethernet interface.
>> The system received an ARP request and is going to send the reply,
>> but somehow the mbuf with this ARP request has an initialized m_next
>> pointer, so it is treated as a chain of mbufs.
>>
>> in_arpinput() reuses the received mbuf to construct the reply, but it
>> doesn't check whether the mbuf is a chain. It just sets m_len and sends it.
>> Then, since you have INVARIANTS in your kernel, the netgraph code checks
>> the actual length of the chain, and it doesn't match m_len, so it panics.
> 
> 
> so, is this a bug?  Timing race? Other?

I think we should determine that my assumption is correct :)
Can you show the output of the following commands from the kgdb for this
core?

(kgdb) f 7
(kgdb) p *m
(kgdb) p *m->m_next

-- 
WBR, Andrey V. Elsukov





Re: ng_snd_item: Panic?

2019-06-24 Thread Andrey V. Elsukov
On 24.06.2019 21:32, Larry Rosenman wrote:
> Got 2 of these today, and I have cores
> Ideas?
> r349200.
> 
> Unread portion of the kernel message buffer:
> panic: ng_snd_item: 42 != 1414
> cpuid = 10
> time = 1561382494
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe012628d400
> vpanic() at vpanic+0x19d/frame 0xfe012628d450
> panic() at panic+0x43/frame 0xfe012628d4b0
> ng_snd_item() at ng_snd_item+0x477/frame 0xfe012628d4f0
> ng_ether_output() at ng_ether_output+0x5e/frame 0xfe012628d520
> ether_output() at ether_output+0x473/frame 0xfe012628d5c0
> arpintr() at arpintr+0xfe3/frame 0xfe012628d780
> netisr_dispatch_src() at netisr_dispatch_src+0x89/frame 0xfe012628d7f0
> ether_demux() at ether_demux+0x137/frame 0xfe012628d820
> ng_ether_rcv_upper() at ng_ether_rcv_upper+0x95/frame 0xfe012628d840
> ng_apply_item() at ng_apply_item+0xf1/frame 0xfe012628d8c0
> ng_snd_item() at ng_snd_item+0x2ab/frame 0xfe012628d900
> ng_apply_item() at ng_apply_item+0xf1/frame 0xfe012628d980
> ng_snd_item() at ng_snd_item+0x2ab/frame 0xfe012628d9c0
> ng_ether_input() at ng_ether_input+0x4c/frame 0xfe012628d9f0
> ether_nh_input() at ether_nh_input+0x2cd/frame 0xfe012628da40
> netisr_dispatch_src() at netisr_dispatch_src+0x89/frame 0xfe012628dab0
> ether_input() at ether_input+0x48/frame 0xfe012628dad0
> bce_intr() at bce_intr+0x697/frame 0xfe012628db50
> ithread_loop() at ithread_loop+0x187/frame 0xfe012628dbb0
> fork_exit() at fork_exit+0x84/frame 0xfe012628dbf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe012628dbf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> Uptime: 4d18h45m34s
> Dumping 24921 out of 131026 
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> #5  0x828ee5b7 in ng_snd_item (item=0xf8021e3b4d80, flags=0)
> at /usr/src/sys/netgraph/ng_base.c:2252

It looks like you use some netgraph-based Ethernet interface.
The system received an ARP request and is going to send the reply,
but somehow the mbuf with this ARP request has an initialized m_next
pointer, so it is treated as a chain of mbufs.

in_arpinput() reuses the received mbuf to construct the reply, but it
doesn't check whether the mbuf is a chain. It just sets m_len and sends it.
Then, since you have INVARIANTS in your kernel, the netgraph code checks
the actual length of the chain, and it doesn't match m_len, so it panics.

> #6  0x82900c2e in ng_ether_output (ifp=, 
> mp=0xfe012628d578) at /usr/src/sys/netgraph/ng_ether.c:294
> #7  0x805b1e43 in ether_output (ifp=, 
> m=0xf81f59eefb00, dst=0xfe012628d740, ro=)
> at /usr/src/sys/net/if_ethersubr.c:430
> #8  0x805cb3e3 in in_arpinput (m=)
> at /usr/src/sys/netinet/if_ether.c:1152
> #9  arpintr (m=0xf81f59eefb00) at /usr/src/sys/netinet/if_ether.c:749
> #10 0x805bcf89 in netisr_dispatch_src (proto=4, 
> source=, m=) at /usr/src/sys/net/netisr.c:1123
> #11 0x805b22d7 in ether_demux (ifp=0xf8012c902000, 
> m=) at /usr/src/sys/net/if_ethersubr.c:913
> #12 0x82901045 in ng_ether_rcv_upper (hook=, 
> item=) at /usr/src/sys/netgraph/ng_ether.c:741
> #13 0x828ee6e1 in ng_apply_item (node=0xf81054f43400, 
> item=0xf8021e3b4d80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
> #14 0x828ee3eb in ng_snd_item (item=0xf8021e3b4d80, flags=0)
> at /usr/src/sys/netgraph/ng_base.c:2320
> #15 0x828ee6e1 in ng_apply_item (node=0xf8012c2d3e00, 
> item=0xf8021e3b4d80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
> #16 0x828ee3eb in ng_snd_item (item=0xf8021e3b4d80, flags=0)
> at /usr/src/sys/netgraph/ng_base.c:2320
> #17 0x82900cbc in ng_ether_input (ifp=, 
> mp=0xfe012628da18) at /usr/src/sys/netgraph/ng_ether.c:255
> #18 0x805b34fd in ether_input_internal (ifp=0xf8012c902000, 
> m=0xf81f59eefb00) at /usr/src/sys/net/if_ethersubr.c:654
> #19 ether_nh_input (m=) at /usr/src/sys/net/if_ethersubr.c:735
> #20 0x805bcf89 in netisr_dispatch_src (proto=5, 
> source=, m=) at /usr/src/sys/net/netisr.c:1123
> #21 0x805b26f8 in ether_input (ifp=0xf8012c902000, m=0x0)
> at /usr/src/sys/net/if_ethersubr.c:823
> #22 0x8273c7f7 in bce_rx_intr (sc=)
> at /usr/src/sys/dev/bce/if_bce.c:6848
> #23 bce_intr (xsc=0xfe01665c2000) at /usr/src/sys/dev/bce/if_bce.c:8017
> #24 0x8047e0e7 in intr_event_execute_handlers (p=, 
> ie=) at /usr/src/sys/kern/kern_intr.c:1148
> #25 ithread_execute_handlers (p=, ie=)
> at /usr/src/sys/kern/kern_intr.c:1161
> #26 ithread_loop (arg=) at /usr/src/sys/kern/kern_intr.c:1241

Re: IPSec with if_ipsec strongswan and dynamic roadwarriors

2019-04-28 Thread Andrey V. Elsukov
On 28.04.2019 14:50, driesm.michi...@gmail.com wrote:
> Was wondering if it's possible to set-up a route based IPSec VPN with
> Strongswan with if_ipsec in FreeBSD?

We use if_ipsec(4) with Strongswan between offices, but our
configuration is specific. All if_ipsec(4) interfaces are preconfigured
via rc.conf, i.e. all interfaces have their IP addresses and tunnel
endpoints configured. Strongswan is used to install the security
associations. For each if_ipsec(4) interface we have a corresponding
entry in ipsec.conf:

conn some-name-ipsec18
    installpolicy=no
    auto=route
    left=Local-Tunnel-IP-address
    right=Remote-Tunnel-IP-address
    rightid=@some-name-id
    reqid=18

Each interface has a unique reqid.
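
The matching rc.conf side might look roughly like this. It is a sketch only: the interface name ipsec18 and all addresses are placeholders, and the one thing that has to agree with the "conn" entry is the reqid.

```shell
# /etc/rc.conf fragment (placeholder addresses, hypothetical interface name)
cloned_interfaces="ipsec18"
# reqid must be the same as reqid= in the ipsec.conf conn entry
create_args_ipsec18="reqid 18"
# inner addresses first, then the outer tunnel endpoints
ifconfig_ipsec18="inet 10.0.18.1 10.0.18.2 tunnel 192.0.2.1 198.51.100.1"
```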

> The caveat that I have are dynamic IP addresses (server (I have DDNS) +
> clients (roadwarriors; mobile, tablet, etc)).
> 
> How should one configure the if_ipsec interface? The Strongswan part is
> relatively straightforward as it takes variables that indicate "%any".
> 
> I found some guides for road warriors with Ubuntu VTI; they configure it as
> such:
> 
> * ip tunnel add ipsec0 local 192.168.0.1 remote 0.0.0.0 mode vti key
> 42
> * Reference:
> https://wiki.strongswan.org/projects/strongswan/wiki/RouteBasedVPN
> 
> So the first address I assume is the left side of the external header (so
> NAT-T is needed) and the remote is a match all policy for the right side.
> 
> Can this be copy pasted on FreeBSD? In other words, is the Ubuntu command
> equivalent to "ifconfig ipsec0 inet tunnel 192.168.0.1 0.0.0.0" for FreeBSD?

This won't work. I think you need to write an updown script that
creates the corresponding if_ipsec(4) interface on demand and configures
it, i.e. sets the tunnel addresses and, if needed, some internal ones.
Note that you need to use the same reqid for if_ipsec(4) and for the
"conn" option.
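
A rough sketch of such an updown helper follows. PLUTO_VERB, PLUTO_ME, PLUTO_PEER and PLUTO_REQID are environment variables that strongSwan passes to updown scripts; the ipsec<reqid> naming scheme, the run wrapper and DRY_RUN are assumptions of this example, and inner addresses are not configured here.

```shell
#!/bin/sh
# Hypothetical updown helper: creates or destroys an if_ipsec(4)
# interface whose reqid matches the one strongSwan reports.

run() {
    # Print the command instead of executing it when DRY_RUN is set.
    if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}

ipsec_updown() {
    ifname="ipsec${PLUTO_REQID:-}"
    case "${PLUTO_VERB:-}" in
    up-client|up-host)
        run ifconfig "$ifname" create reqid "${PLUTO_REQID:-}"
        run ifconfig "$ifname" inet tunnel "${PLUTO_ME:-}" "${PLUTO_PEER:-}"
        run ifconfig "$ifname" up
        ;;
    down-client|down-host)
        run ifconfig "$ifname" destroy
        ;;
    esac
}

ipsec_updown
```

Running it with DRY_RUN=1 and the PLUTO_* variables set by hand prints the ifconfig commands without touching the system, which is a cheap way to check the logic before wiring it into a "conn" section.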

-- 
WBR, Andrey V. Elsukov





Re: unicast vxlan - unable to tcp connect to ipv6 ip's on endpoint host

2019-04-19 Thread Andrey V. Elsukov
On 19.04.2019 13:46, Marco van Tol wrote:
> There is one exception to this: Host B can ping Host A on any of its
> IPv6 addresses, but it cannot make any tcp connection to any of the
> IPv6 addresses on Host A.  Is this expected?

Hi,

this looks like a problem with checksum offloading. When such
offloading is enabled on the interface, protocols like TCP and UDP
defer checksum calculation to the interface hardware. ICMPv6 calculates
its checksum in software, so it is usually not affected by this
problem. Sometimes the NIC hardware or driver has bugs and offloading
does not work correctly. You can try to disable checksum offloading on
your interfaces and test again. You can also use tcpdump to try to
determine what the problem with TCP is.
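
A minimal sketch of both steps, assuming the physical interface is called em0 (a placeholder, the thread does not name it):

```shell
# Show the current options; look for RXCSUM, TXCSUM, RXCSUM_IPV6, TXCSUM_IPV6.
ifconfig em0

# Disable IPv4 and IPv6 checksum offloading on that interface.
ifconfig em0 -rxcsum -txcsum -rxcsum6 -txcsum6

# -vv makes tcpdump verify and print TCP checksums. Run this on the
# receiving host: on the sender, offloaded packets show bad checksums
# in the capture even when offloading works.
tcpdump -nvvi em0 ip6 and tcp
```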

-- 
WBR, Andrey V. Elsukov





Re: bnxt(4) and VLANs - supposed to work?

2019-03-20 Thread Andrey V. Elsukov
On 20.03.2019 14:31, Patrick M. Hausen wrote:
> Hi all,
> 
> FreeBSD 11.2-STABLE (FreeNAS):
> 
> bnxt0: flags=8842 metric 0 mtu 1500 
> options=e527bb
> ether 00:25:90:5f:9a:82

Did you try to run `ifconfig bnxt0 up`?

> No traffic shows when I use tcpdump - either on vlan1 or bnxt0 - simply zero.
> There must be some broadcast frames flying past even if the switch on the
> other end should be misconfigured which I doubt ;-)
-- 
WBR, Andrey V. Elsukov





Re: UDP broadcast

2019-03-06 Thread Andrey V. Elsukov
On 06.03.2019 15:53, Goran Mekić wrote:
> I checked, and 02:c9:63:0b:f4:00 is my router MAC, so I suppose the
> ff:ff:ff:ff:ff:ff should be what I see.
> 
> Of course, I tried to write a simple C program that does the same, but
> just using 255.255.255.255 as IP address gives the same result as nc. Do
> I need to use raw sockets, perhaps?

Take a look at the ip(4) manual page and read about SO_BROADCAST and
IP_ONESBCAST.

-- 
WBR, Andrey V. Elsukov





Re: UDP broadcast

2019-03-06 Thread Andrey V. Elsukov
On 06.03.2019 01:46, Goran Mekić wrote:
> Hello,
> 
> I have an audio mixer which is controllable over the network via an Android app.
> The discovery is done by sending broadcast UDP message "/info\0\0\0" to
> 255.255.255.255 (checked by tcpdump on the router). I thought I can do
> the same with:
> 
> printf "/info\0\0\0" | nc -4u -w 0 255.255.255.255 10024
> 
> But I never get the reply. This is what tcpdump sees:
> 
> tcpdump -nnSX -v 'src 192.168.5.80 or dst 255.255.255.255'
> tcpdump: listening on re0, link-type EN10MB (Ethernet), capture size

I think it is because netcat does not send a real broadcast. You can add
the -e flag to tcpdump and compare the Ethernet destination addresses.
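
For example, with the interface and port from the quoted message (a sketch, adjust the filter as needed):

```shell
# -e prints the Ethernet header; a real broadcast has
# ff:ff:ff:ff:ff:ff as the destination MAC address.
tcpdump -eni re0 udp port 10024
```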

-- 
WBR, Andrey V. Elsukov





Re: RFC 5549?

2018-12-18 Thread Andrey V. Elsukov
On 18.12.2018 00:43, Donald Sharp wrote:
> On the other hand, ping doesn't appear to be working( but I think you
> probably knew that):
> sharpd@Janelle ~/frr> sudo tcpdump -i em1 icmp
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on em1, link-type EN10MB (Ethernet), capture size 262144 bytes
> 11:41:59.559457 IP 0.0.0.0 > 192.168.230.1: ICMP echo request, id
> 2119, seq 27, length 64

For now, I think specifying the source address explicitly should make
ping(8) work.
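
A sketch of what that could look like, assuming 10.50.12.121 (the router's own em0 address in the netstat output quoted elsewhere in this thread) is an acceptable source:

```shell
# -S selects the source address of the outgoing echo requests.
ping -S 10.50.12.121 192.168.230.1
```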

-- 
WBR, Andrey V. Elsukov





Re: RFC 5549?

2018-12-17 Thread Andrey V. Elsukov
On 18.12.2018 00:43, Donald Sharp wrote:
> I took the code, compiled it and got FRR working with the v6 nexthop.
> https://github.com/FRRouting/frr/pull/3502.  Route installation from
> FRR appears to be working for me now:
> 
> Janelle# show ip route
> Codes: K - kernel route, C - connected, S - static, R - RIP,
>O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
>T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
>F - PBR, f - OpenFabric,
>> - selected route, * - FIB route
> 
> K>* 0.0.0.0/0 [0/0] via 10.50.12.1, em0, 00:13:13
> B>* 10.50.11.0/24 [20/0] via fe80::a00:27ff:fe28:3b50, em1, 00:13:09
> C>* 10.50.12.0/24 is directly connected, em0, 00:13:13
> B>* 10.232.0.16/32 [20/0] via fe80::a00:27ff:fe28:3b50, em1, 00:13:09
> B>* 192.168.209.0/24 [20/0] via fe80::a00:27ff:fe28:3b50, em1, 00:13:09
> B>* 192.168.230.0/24 [20/0] via fe80::a00:27ff:fe28:3b50, em1, 00:13:09
> B>* 192.168.231.0/24 [20/0] via fe80::a00:27ff:fe28:3b50, em1, 00:13:09
> Janelle# exit
> sharpd@Janelle ~/frr> netstat -rn
> Routing tables
> 
> Internet:
> Destination        Gateway                       Flags  Netif Expire
> default            10.50.12.1                    UGS    em0
> 10.50.11.0/24      fe80::a00:27ff:fe28:3b50%em1  UG1    em1
> 10.50.12.0/24      link#1                        U      em0
> 10.50.12.121       link#1                        UHS    lo0
> 10.232.0.16        fe80::a00:27ff:fe28:3b50%em1  UGH1   em1
> 127.0.0.1          link#4                        UH     lo0
> 192.168.209.0/24   fe80::a00:27ff:fe28:3b50%em1  UG1    em1
> 192.168.230.0/24   fe80::a00:27ff:fe28:3b50%em1  UG1    em1
> 192.168.231.0/24   fe80::a00:27ff:fe28:3b50%em1  UG1    em1
> 
> On the other hand, ping doesn't appear to be working( but I think you
> probably knew that):
> sharpd@Janelle ~/frr> sudo tcpdump -i em1 icmp
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on em1, link-type EN10MB (Ethernet), capture size 262144 bytes
> 11:41:59.559457 IP 0.0.0.0 > 192.168.230.1: ICMP echo request, id
> 2119, seq 27, length 64
> 11:42:00.626966 IP 0.0.0.0 > 192.168.230.1: ICMP echo request, id
> 2119, seq 28, length 64
> 11:42:01.659515 IP 0.0.0.0 > 192.168.230.1: ICMP echo request, id
> 2119, seq 29, length 64
> 11:42:02.732074 IP 0.0.0.0 > 192.168.230.1: ICMP echo request, id
> 2119, seq 30, length 64
> 11:42:03.759432 IP 0.0.0.0 > 192.168.230.1: ICMP echo request, id
> 2119, seq 31, length 64
> 11:42:04.833242 IP 0.0.0.0 > 192.168.230.1: ICMP echo request, id
> 2119, seq 32, length 64
> 11:42:05.859559 IP 0.0.0.0 > 192.168.230.1: ICMP echo request, id
> 2119, seq 33, length 64
> ^C
> 7 packets captured
> 21 packets received by filter
> 0 packets dropped by kernel
> 
> This is pretty awesome progress though

Hi,

I think this happens when you try to ping from the router via an
interface without an IPv4 address. In this case rip_output() fills in
only the destination address, in the hope that ip_output() will fill in
the source address. But since there are no IPv4 addresses on the
interface, in the line
https://svnweb.freebsd.org/base/head/sys/netinet/ip_output.c?annotate=339219#l452

it uses some zero-filled word as the source address.
Probably we can just drop the packet when gw->af_family == AF_INET6
and ip_src == INADDR_ANY. We could also do some sort of source address
selection, but this variant needs more code :)

I think generic forwarding should work when you use the router only as
a transit point.

-- 
WBR, Andrey V. Elsukov





Re: RFC 5549?

2018-12-17 Thread Andrey V. Elsukov
On 11.12.2018 15:07, Andrey V. Elsukov wrote:
>> The FRRouting project has some basic support for rfc 5549 and I've
>> been asked to see if it is possible to get this bit of code working
>> with the FRRouting freebsd kernel interface.  What is RFC 5549 you
>> ask?  The tl;dr of it is that you have v4 prefixes w/ a v6 gateway.
>> For some more background the linux implementation cheats ( and I would
>> like to emphatically point out that I'm not suggesting this solution,
>> I'm giving the linux solution to the problem as a data point to how it
>> was solved in one instance ) by installing a neighbor entry for
>> `169.254.0.1  ` and
>> when installing the v4 prefix we see the v6 nexthop and replace it
>> with `169.254.0.1 ` in the netlink message to the
>> kernel.  Is support of RFC 5549 possible in Freebsd?
> 
> I have thought a bit about this and have some ideas on how to implement it.
> In general, we can install routes into the kernel that have an IPv6 address
> as the gateway for IPv4 (currently this is not allowed by default, but it is
> easy to allow). So, as a routing daemon developer, you can use the generic
> API to install routes where RTAX_GATEWAY is an IPv6 address.

Hi,

I have implemented basic support, so it can be tested now:

https://reviews.freebsd.org/D18581

-- 
WBR, Andrey V. Elsukov





Re: RFC 5549?

2018-12-11 Thread Andrey V. Elsukov
On 24.10.2018 23:10, Donald Sharp wrote:
> All -
> 
> The FRRouting project has some basic support for rfc 5549 and I've
> been asked to see if it is possible to get this bit of code working
> with the FRRouting freebsd kernel interface.  What is RFC 5549 you
> ask?  The tl;dr of it is that you have v4 prefixes w/ a v6 gateway.
> For some more background the linux implementation cheats ( and I would
> like to emphatically point out that I'm not suggesting this solution,
> I'm giving the linux solution to the problem as a data point to how it
> was solved in one instance ) by installing a neighbor entry for
> `169.254.0.1  ` and
> when installing the v4 prefix we see the v6 nexthop and replace it
> with `169.254.0.1 ` in the netlink message to the
> kernel.  Is support of RFC 5549 possible in Freebsd?

Hi,

I have thought a bit about this and have some ideas on how to implement
it. In general, we can install routes into the kernel that have an IPv6
address as the gateway for IPv4 (currently this is not allowed by
default, but it is easy to allow). So, as a routing daemon developer,
you can use the generic API to install routes where RTAX_GATEWAY is an
IPv6 address.

Then we need to modify ip_forward, ip_output and ip_tryforward to handle
such routes correctly. The layer 2 output routines should already
correctly handle IPv4 packets that are going through an IPv6 gateway;
they will use the ND6 lookup code to obtain the layer 2 addresses.

The most complex part seems to be the modification of the ip_tryforward
code, since it is optimized for IPv4 and doesn't have much room for
extension. With such changes an IPv6-only router should be able to do
IPv4 forwarding.

The problems that come to mind are the inability to correctly send ICMP
messages, since there are no IPv4 addresses that can be used as the IPv4
source, and how existing programs will handle such routes when they
appear in a routing socket.

-- 
WBR, Andrey V. Elsukov





Re: IPsec: is it possible to encrypt transit traffic in transport mode?

2018-11-30 Thread Andrey V. Elsukov
On 30.11.2018 18:43, Lev Serebryakov wrote:
> Hello Olivier,
> 
> Friday, November 30, 2018, 3:34:50 PM, you wrote:
> 
>>>   I'm benchmarking different possible "native" VPN configurations and I have
>>>   gif(4) and gre(4) with and without IPsec in my battery. I have tunnel mode
>>>   IPsec too. The problem with gif(4) and gre(4) is that they are tremendously
>>>   expensive, and could be more expensive than IPsec itself on CPUs with 
>>> AES-NI.
>>>   So, this configuration impossible, I understand. Nothing to benchmark :-)
>> And what about using IPSec VTI (virtual tunneling interface)mode:  
>> if_ipsec(4)
>   And this one too. It gives slightly more PPS than "setkey-based" tunnel
>  mode, which is surprise for me.

If your goal is to increase PPS throughput, there are several ways to
achieve it. For example, it is possible to do direct output from the
IPsec code, i.e. do a route lookup and call if_output() directly from
ipsec_process_done(). This removes many checks that ip_output() does,
as well as an extra call to pfil(9).
Another idea is to implement some ipfw_ipsec(4) module that can take
packets and do IPsec processing. This module could then be attached to
the Ethernet pfil hook and, together with the first idea, I think this
could give a measurable improvement in PPS rate.

-- 
WBR, Andrey V. Elsukov





Re: IPsec: is it possible to encrypt transit traffic in transport mode?

2018-11-30 Thread Andrey V. Elsukov
On 30.11.2018 04:06, Eugene Grosbein wrote:
>>   Is it possible to encrypt this traffic with IPsec in *transport* mode?
>>  I've tried to create SAs for 10.2.0.1 and 10.2.0.2 and SPDs for 10.1.0.0/24
>>  and 10.10.10.0/24 on A and B (not on endpoint devices) but looks like it
>>  doesn't work, traffic stops. It is not as encrypted traffic is sent but
>>  dropped on other end, no, interfaces between Host A and Host B becomes
>>  silent according to "tcpdump" and all forwarded/dropped/error counters in
>>  "netstat -s" don't change anymore, only "input packets" in "netstat -s -p ip"
>>  is still counting.
>>
> It is possible and it is the way I use extensively for long time since very 
> old
> FreeBSD versions having KAME IPSEC and it works with 11.2-STABLE, too.
> 
> You need to read setkey(8) manual page, section ALGORITHMS and make sure
> you use proper sized keys or it won't work, though.
> 
> And example of transport mode IPSEC with low-powered device having on-board
> Geode LX Security Block crypto accelerator with AES-128-CBC support:
> 
> add 1.1.1.1 2.2.2.2 esp 1081 -m transport -E rijndael-cbc "1234567890123456" 
> -A hmac-md5 "0123456789123456";
> add 2.2.2.2 1.1.1.1 esp 2081 -m transport -E rijndael-cbc "9876543210987654" 
> -A hmac-md5 "6543219876543210";
> 
> spdadd 1.1.1.1/32 2.2.2.2/32 any -P out ipsec esp/transport//require;
> spdadd 2.2.2.2/32 1.1.1.1/32 any -P in ipsec  esp/transport//require;
> 
> You have to use bigger keys if you use another -A algorithm like sha*, each 
> character counts for 8 bits.
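
As a quick sanity check of the "each character counts for 8 bits" rule, here is a small sh sketch that prints the bit length of a quoted ASCII key before it goes into setkey.conf. The helper name key_bits is made up for this illustration; the sizes mentioned in the comments (128 bits for hmac-md5, 160 bits for hmac-sha1) are the ones listed in the ALGORITHMS section of setkey(8), so check your own manual page for other algorithms.

```sh
# key_bits (hypothetical helper): bit length of a key written as a quoted
# ASCII string, where every character contributes 8 bits.
key_bits() {
    echo $(( ${#1} * 8 ))
}

key_bits "1234567890123456"        # 16 characters, prints 128 (hmac-md5 size)
key_bits "01234567890123456789"    # 20 characters, prints 160 (hmac-sha1 size)
```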

There is one problem. IPsec won't handle inbound packets that are not
destined to your own IP address. Inbound packets are matched by
destination address, protocol and SPI value, so if ip_input() doesn't
decide that an ESP packet is for your host, it will not invoke
IPSEC_INPUT() and the encrypted packet will be routed as is.

-- 
WBR, Andrey V. Elsukov





Re: ASUS PCE-AC88 AC3100 Supported?

2018-10-31 Thread Andrey V. Elsukov
On 27.10.2018 14:08, Mark Raynsford via freebsd-net wrote:
>>   "The bwn(4) driver supports Broadcom BCM43xx based wireless
>>   devices..."
>>
>> But the actual card itself isn't listed.
> 
> To follow up on this, because I actually bought one, the card doesn't
> appear to be supported on 11.2-RELEASE.
> 
> There's nothing in dmesg. I see the following:
> 
> # pciconf -lv
> none2@pci0:7:0:0: class=0x028000 card=0x86fb1043 chip=0x43c314e4 rev=0x04 
> hdr=0x00
> vendor = 'Broadcom Limited'
> class  = network
> 
> Nothing appears in ifconfig.

OpenBSD and NetBSD have the bwfm driver, which seems to support this card.

-- 
WBR, Andrey V. Elsukov





Re: Configuring IPv6 on jails

2018-10-30 Thread Andrey V. Elsukov
On 29.10.2018 20:46, Dries Michiels wrote:
>>> Isn't there a way to let IPFW determine what interface to use (and
>>> thus IPv6
>>> prefix) for external translation? (for IPv4 NAT there is no need to
>>> specify the external IPv4 address)
>>
>> Hi,
>>
>> I think I can add this feature to ipfw_nptv6 module, but I need some spare
>> time to implement it. If you are interested, I'll send the patch to you 
>> later.
>> What version do you use? I suspect the patch will use some features, that are
>> present only in head/ yet.
> 
> Would be nice! I’m on 12-STABLE.

Hi,

I published the patch:
https://reviews.freebsd.org/D17765

For stable/12 you need to apply the patch from r339537:
https://reviews.freebsd.org/D17100

-- 
WBR, Andrey V. Elsukov





Re: Configuring IPv6 on jails

2018-10-29 Thread Andrey V. Elsukov
On 29.10.2018 17:56, Dries Michiels wrote:
> * Use IPFW IPv6 prefix translation for the jail /64 prefix; translate
> between global routable /64 prefix and fd00::1/64 (as example). The latter
> can be statically configured in jail.conf.
> 
> My problem here is that the IPFW rule needs the external prefix as an
> argument. My prefix is dynamic so this might be tricky and indicates
> scripting to me.
> 
> Isn't there a way to let IPFW determine what interface to use (and thus IPv6
> prefix) for external translation? (for IPv4 NAT there is no need to specify
> the external IPv4 address)

Hi,

I think I can add this feature to the ipfw_nptv6 module, but I need some
spare time to implement it. If you are interested, I'll send the patch
to you later. Which version do you use? I suspect the patch will use some
features that are so far present only in head/.

-- 
WBR, Andrey V. Elsukov





Re: get rid of the eui64 address

2018-10-19 Thread Andrey V. Elsukov
On 19.10.2018 15:41, Victor Sudakov wrote:
> Andrey V. Elsukov wrote:
>>>
>>> BTW do you know the difference between the "accept_rtadv" and
>>> "autoconf" flags in ifconfig?
>>
>> The accept_rtadv is interface's attribute and it is shown in the "nd6
>> options" line, but the autoconf is address's attribute.
> 
> ifconfig(8) says that 
> 
> 
>  autoconf
>  Set a flag to accept router advertisements on an interface.
> 
>  -autoconf
>  Disable autoconfiguration.
> 
> This is unclear.

It seems your ifconfig(8) manual page is a bit out of date.
My manual says:
 autoconf
 Set the IPv6 autoconfigured address bit.

 -autoconf
 Clear the IPv6 autoconfigured address bit.

-- 
WBR, Andrey V. Elsukov





Re: get rid of the eui64 address

2018-10-19 Thread Andrey V. Elsukov
On 19.10.2018 12:35, Victor Sudakov wrote:
> Andrey V. Elsukov wrote:
>> On 18.10.2018 18:56, Victor Sudakov wrote:
>>> Thank you Andrey, you made my day! I'm beginning to love IPv6 more and
>>> more.
>>>
>>> How would the prefer_source flag look like in rc.conf? Is the following 
>>> approach correct:
>>>
>>> ifconfig_fxp0_ipv6="inet6 accept_rtadv"
>>> ifconfig_fxp0_alias0="inet6 2001:19f0:8001:1219::10 prefer_source"
>>
>> Hi,
>>
>> I think you can just use all these options in one line:
>>
>> ifconfig_fxp0_ipv6="2001:19f0:8001:1219::10/64 prefer_source accept_rtadv"
> 
> Looks good.
> 
> BTW do you know the difference between the "accept_rtadv" and
> "autoconf" flags in ifconfig?

accept_rtadv is an attribute of the interface and is shown in the "nd6
options" line, whereas autoconf is an attribute of an address.

-- 
WBR, Andrey V. Elsukov





Re: get rid of the eui64 address

2018-10-19 Thread Andrey V. Elsukov
On 18.10.2018 18:56, Victor Sudakov wrote:
> Thank you Andrey, you made my day! I'm beginning to love IPv6 more and
> more.
> 
> How would the prefer_source flag look like in rc.conf? Is the following 
> approach correct:
> 
> ifconfig_fxp0_ipv6="inet6 accept_rtadv"
> ifconfig_fxp0_alias0="inet6 2001:19f0:8001:1219::10 prefer_source"

Hi,

I think you can just use all these options in one line:

ifconfig_fxp0_ipv6="2001:19f0:8001:1219::10/64 prefer_source accept_rtadv"

-- 
WBR, Andrey V. Elsukov





Re: get rid of the eui64 address

2018-10-18 Thread Andrey V. Elsukov
On 17.10.2018 12:45, Victor Sudakov wrote:
> Dear Colleagues,
> 
> I have a static IPv6 address configured on a host. However, an
> autoconfigured eui64 address keeps popping up even if I delete it
> manually.  It's marked by the "autoconf" flag in the ifconfig output.
> 
> Can I get rid of the autoconf'ed address for good, but still keep the
> rtsold running (for the sake of the default gateway)?
> 
> Or should I leave it alone? I just want to make sure that the static
> IPv6 address is used for outgoing connections.

You can set the prefer_source flag on your static address; in most
cases it will then be chosen by the IPv6 source address selection (SAS)
algorithm (if it is from the same prefix as the autoconfigured one).

-- 
WBR, Andrey V. Elsukov





Re: Patching ng_iface to allow setting the MTU via netgraph API

2018-10-11 Thread Andrey V. Elsukov
On 10.10.2018 22:43, Andreas Kempe wrote:
> Hello!
> 
> I am working on a meshnet concept and am using netgraph to put it right
> on top of the MAC layer. I am using ng_iface to do some tunneling of IP
> over my protocol and I thought it would be nice to be able to set the
> MTU of the created interface using the netgraph interface.
> 
> I created a patch to do just that and thought I could share it if
> someone would find it worthwhile using for something, but I couldn't
> really figure out where to share it. I couldn't really get clarity from
> the contribution section in the handbook.
> 
> If someone could point me in the right direction, it would be
> appreciated. I'll attach the patch to this mail as well since it is a
> quite small one.

Hi,

take a look at this review: https://reviews.freebsd.org/D17180
You can submit your patch the same way.

-- 
WBR, Andrey V. Elsukov
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: IPv6 fragment reassembly regression following FreeBSD-SA-18:10.ip

2018-09-24 Thread Andrey V. Elsukov
On 23.09.2018 16:43, John W. O'Brien wrote:
> I'd like to check my understanding and then ask a procedural question.
> 
> FreeBSD-SA-18:10.ip [0], released on 08/14, was resolved by r337828 [1].
> That changeset, resulting in 11.1R-p13 and 11.2R-p2, included a patch to
> the way IPv6 fragment reassembly is handled [2] that was part of the
> merge to releng. In an ensuing thread [3] two weeks later, an
> implementation defect was identified, but not before that defect had
> shipped. The defect is now being tracked as a bug [4], as of 09/03 has
> been fixed in head and stable/11, and is registered as a blocker for 12.0.
> 
> I believe this defect is the cause of a problem I detected recently
> where postfix would query BIND on ::1 for the DNSSEC-signed  of an
> MX, and never receive a response. I'm a little puzzled that lo0 is
> affected in spite of having a 16k MTU, but the other signs are there:
> the symptoms appeared after upgrading from 11.2R-p1 to -p3, and I can
> perform that query successfully on UDPv4 or TCPv6.
> 
> What I have been unable so far to determine is, will another 11.2R patch
> be forthcoming to resolve this regression, and if so, when? I can limp
> along without UDPv6 for a little while, but not until 11.3. The only
> clear alternative is to downgrade to -p1.
> 
> [0] https://www.freebsd.org/security/advisories/FreeBSD-SA-18:10.ip.asc
> [1] https://svnweb.freebsd.org/changeset/base/337828
> [2] https://svnweb.freebsd.org/changeset/base/337776
> [3] https://lists.freebsd.org/pipermail/svn-src-head/2018-August/117514.html
> [4] https://bugs.freebsd.org/231045

Your analysis looks correct to me. r338406 was not merged to releng/11.2.

-- 
WBR, Andrey V. Elsukov





Re: NFS poor performance in ipfw_nat

2018-09-18 Thread Andrey V. Elsukov
On 18.09.2018 01:53, KIRIYAMA Kazuhiko wrote:
> Hi, all
> 
> I'm working on ipfw_nat box with port redirect for sunrpc
> (111) and nfsd (2049):
> 
> # ifconfig 
> em0: flags=8843 metric 0 mtu 1500
> 
> options=85259b
> igb0: flags=8843 metric 0 mtu 1500
> 
> options=e505bb
> # ipfw nat show config
> ipfw nat 123 config if em0 log deny_in same_ports unreg_only reset 
> redirect_port tcp 192.168.1.253:22 22253 redirect_port tcp 192.168.1.252:22 
> 22252 redirect_port tcp 192.168.1.251:22 22251 redirect_port tcp 
> 192.168.1.250:22 22250 redirect_port tcp 192.168.1.249:22 22249 redirect_port 
> tcp 192.168.1.248:22 22248 redirect_port tcp 192.168.1.247:22 22247 
> redirect_port tcp 192.168.1.246:22 22246 redirect_port tcp 192.168.1.245:22 
> 22245 redirect_port tcp 192.168.1.244:22 22244 redirect_port tcp 
> 192.168.1.243:22 22243 > 
> Is there any suggestions ?
> 

Hi,

try disabling TSO on your NICs.
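
A minimal sketch of how that could be done at run time, assuming em0 and igb0 (the two interfaces in the quoted ifconfig output) are the NICs involved. disable_tso is just an illustrative wrapper name; "-tso" is the ifconfig(8) option that turns off TCP segmentation offloading on drivers that support it.

```sh
# disable_tso (illustrative helper): turn off TSO on every interface given.
disable_tso() {
    for nic in "$@"; do
        ifconfig "$nic" -tso || return 1
    done
}

# Usage, as root:
#   disable_tso em0 igb0
```

To keep the setting across reboots, "-tso" can be appended to the existing ifconfig_em0 and ifconfig_igb0 lines in /etc/rc.conf.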

-- 
WBR, Andrey V. Elsukov





Re: Is if_ipsec/ipsec - AESNI accelerated ?

2018-08-09 Thread Andrey V. Elsukov
On 09.08.2018 23:11, David P. Discher wrote:
> The documentation for using IPSec (especially if_ipsec) is really thin
> for freebsd, so I pieced some of this together from various posts and
> mailing lists threads.
>  
> Is there no need for racoon ?  How in this example is the IKE/ISAKMP
> setup done ? Is setkey doing this ?

> This is 11.2-stable, shortly after release … I don’t have this sysctl.

This is a manually configured tunnel between two FreeBSD 12.0-CURRENT
hosts. I suggest trying the patch and config from this post:

https://lists.freebsd.org/pipermail/freebsd-net/2018-May/050509.html

>> Need to see your setkey.conf, or at least the output of setkey -D..
> 
> 
> setkey.conf is :
> 
>         flush;
>         spdflush;
> 
>         spdadd -4n 172.30.1.12/30 172.30.1.12/30 any -P out ipsec
> esp/tunnel/10.245.0.201-10.245.0.202/unique:12;
>         spdadd -4n 172.30.1.12/30 172.30.1.12/30 any -P in  ipsec
> esp/tunnel/10.245.0.202-10.245.0.201/unique:12;
>         spdadd -4n 172.30.1.4/30 172.30.1.4/30 any -P out ipsec
> esp/tunnel/10.245.0.201-10.245.0.203/unique:4;
>         spdadd -4n 172.30.1.4/30 172.30.1.4/30 any -P in  ipsec
> esp/tunnel/10.245.0.203-10.245.0.201/unique:4;

You don't need to create security policies for if_ipsec interfaces. The
interface creates them automatically.

-- 
WBR, Andrey V. Elsukov





Re: Is if_ipsec/ipsec - AESNI accelerated ?

2018-08-09 Thread Andrey V. Elsukov
On 09.08.2018 10:00, David P. Discher wrote:
>   [ pts/0 sjc2 util201:~ ]
>   [ dpd ] > iperf3 -c 10.245.0.202 -i 8 -t 16
>   Connecting to host 10.245.0.202, port 5201
>   [  5] local 10.245.0.201 port 55165 connected to 10.245.0.202 port 5201
>   [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
>   [  5]   0.00-8.00   sec   887 MBytes   930 Mbits/sec    0    419 KBytes
>   [  5]   8.00-16.00  sec   898 MBytes   941 Mbits/sec    0    419 KBytes
>   - - - - - - - - - - - - - - - - - - - - - - - - -
>   [ ID] Interval           Transfer     Bitrate         Retr
>   [  5]   0.00-16.00  sec  1.74 GBytes   936 Mbits/sec    0             sender
>   [  5]   0.00-16.01  sec  1.74 GBytes   935 Mbits/sec                  receiver
> 
>   iperf Done.
> 
>   [ pts/0 sjc2 util201:~ ]
>   [ dpd ] > iperf3 -c 172.30.1.14 -i 8 -t 16
>   Connecting to host 172.30.1.14, port 5201
>   [  5] local 172.30.1.13 port 41671 connected to 172.30.1.14 port 5201
>   [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
>   [  5]   0.00-8.00   sec   166 MBytes   174 Mbits/sec    0   64.3 KBytes
>   [  5]   8.00-16.00  sec   168 MBytes   176 Mbits/sec    0   64.3 KBytes
>   - - - - - - - - - - - - - - - - - - - - - - - - -
>   [ ID] Interval           Transfer     Bitrate         Retr
>   [  5]   0.00-16.00  sec   334 MBytes   175 Mbits/sec    0             sender
>   [  5]   0.00-16.01  sec   334 MBytes   175 Mbits/sec                  receiver
I did some tests and here are my results:

# ifconfig ipsec0
ipsec0: flags=8051 metric 0 mtu 1400
tunnel inet 10.0.0.15 --> 10.0.0.25
inet 192.168.0.15 --> 192.168.0.25 netmask 0xff00
inet6 fe80::225:90ff:fef9:3c92%ipsec0 prefixlen 64 scopeid 0x8
nd6 options=23
reqid: 16385
groups: ipsec

# iperf -c 10.0.0.25 -i 8 -t 16

Client connecting to 10.0.0.25, TCP port 5001
TCP window size: 35.0 KByte (default)

[  3] local 10.0.0.15 port 21371 connected with 10.0.0.25 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0- 8.0 sec  9.09 GBytes  9.77 Gbits/sec
[  3]  8.0-16.0 sec  9.22 GBytes  9.90 Gbits/sec
[  3]  0.0-16.0 sec  18.3 GBytes  9.83 Gbits/sec

# iperf -c 192.168.0.25 -i 8 -t 16

Client connecting to 192.168.0.25, TCP port 5001
TCP window size: 33.2 KByte (default)

[  3] local 192.168.0.15 port 30394 connected with 192.168.0.25 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0- 8.0 sec   607 MBytes   636 Mbits/sec
[  3]  8.0-16.0 sec   606 MBytes   636 Mbits/sec
[  3]  0.0-16.0 sec  1.19 GBytes   636 Mbits/sec


# sysctl net.inet.ipsec.async_crypto=1
net.inet.ipsec.async_crypto: 0 -> 1

# iperf -c 192.168.0.25 -i 8 -t 16

Client connecting to 192.168.0.25, TCP port 5001
TCP window size: 33.2 KByte (default)

[  3] local 192.168.0.15 port 17716 connected with 192.168.0.25 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0- 8.0 sec  1.38 GBytes  1.48 Gbits/sec
[  3]  8.0-16.0 sec  1.40 GBytes  1.51 Gbits/sec
[  3]  0.0-16.0 sec  2.78 GBytes  1.50 Gbits/sec


# kldload aesni
# setkey -DF
# setkey -c
add 10.0.0.25 10.0.0.15 esp 1 -m tunnel -u 16385 -E rijndael-cbc
"0123456789123456";
add 10.0.0.15 10.0.0.25 esp 2 -m tunnel -u 16385 -E rijndael-cbc
"0123456789123456";

# sysctl net.inet.ipsec.async_crypto=0
net.inet.ipsec.async_crypto: 1 -> 0

# iperf -c 192.168.0.25 -i 8 -t 16

Client connecting to 192.168.0.25, TCP port 5001
TCP window size: 33.2 KByte (default)

[  3] local 192.168.0.15 port 57206 connected with 192.168.0.25 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0- 8.0 sec  1.08 GBytes  1.16 Gbits/sec
[  3]  8.0-16.0 sec  1.11 GBytes  1.19 Gbits/sec
[  3]  0.0-16.0 sec  2.19 GBytes  1.18 Gbits/sec

# sysctl net.inet.ipsec.async_crypto=1
net.inet.ipsec.async_crypto: 0 -> 1

# ifconfig ipsec0 mtu 8000 down up

# iperf -c 192.168.0.25 -i 8 -t 16

Client connecting to 192.168.0.25, TCP port 5001
TCP window size: 38.9 KByte (default)

[  3] local 192.168.0.15 port 37641 connected with 192.168.0.25 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0- 8.0 sec  5.64 GBytes  6.06 Gbits/sec
[  3]  8.0-16.0 sec  5.76 G

Re: Is if_ipsec/ipsec - AESNI accelerated ?

2018-08-08 Thread Andrey V. Elsukov
On 09.08.2018 06:57, David P. Discher wrote:
> I’m suspecting that IPSec in FreeBSD is not leveraging AESNI on Intel.  Is 
> this correct ?

IPsec uses the crypto(9) framework, which by default works without any
acceleration. You need to load the aesni(4) kernel module to enable
acceleration. You also need to recreate the security associations after
loading the module for it to take effect.
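
A rough sketch of that sequence for manually keyed SAs, assuming they are kept in a setkey(8) file (the path is passed as an argument; /etc/ipsec.conf is only a default chosen for this example). reload_sas_with_aesni is an illustrative name. With an IKE daemon you would restart the daemon instead of re-reading a file.

```sh
# reload_sas_with_aesni (illustrative helper): load aesni(4), then flush and
# re-create the SAs so that new crypto sessions can pick up the driver.
reload_sas_with_aesni() {
    conf=${1:-/etc/ipsec.conf}
    kldload -n aesni || return 1   # -n: do nothing if it is already loaded
    setkey -F || return 1          # flush the existing SAD entries
    setkey -f "$conf"              # re-add the SAs from the file
}

# Usage, as root:
#   reload_sas_with_aesni /etc/ipsec.conf
```

To load the module at every boot, aesni_load="YES" can be added to /boot/loader.conf.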

-- 
WBR, Andrey V. Elsukov





Re: IPv6 scope handling, was Re: svn commit: r335806 - projects/pnfs-planb-server/usr.sbin/nfsd

2018-07-01 Thread Andrey V. Elsukov
On 01.07.2018 03:30, Rick Macklem via freebsd-net wrote:
>> [neighbor1 fe80::100]<-->[fe80::1%igb0 | fe80::1%igb1]<-->[fe80::100
>> neighbor2]
>>
>> neighbor1 can not reach neighbor2, since these addresses belongs to
>> different scope zones. On the host with two interfaces you as user can
>> use link-local addresses and can specify such addresses in application.
>> To disambiguate them you must specify scope zone identifier, "%igb0" or
>> "%igb1". E.g. if you want to connect with neighbor1, you can use "telnet
>> fe80::100%igb0 someport" and the kernel will initiate connection with
>> neighbor1 through igb0. inet_ntop() call doesn't support this.
> Ok, I think I follow that. I didn't explain what this case is...
> 
> The code for this patch runs on HostA.
> If looks up an address for HostB, but not so it can connect to HostB.
> It sends the address to HostC, so that HostC can connect to HostB.
> (So the address needed is the address that HostC would use to connect to
>  HostB.)
>
> I think what you are saying above is that a Link-local address won't work
> and that the address must be a global one?
> Should the code check for "fe8" at the start and skip over those ones?

It is possible that all hosts are in the same scope zone, e.g. they are
connected to one broadcast domain through a switch.
In this case it is possible to use link-local addresses and they will
all be reachable.

> The "on-the-wire" address sent to HostC is specified in standard string form
> (can't remember the RFC#, but it is referenced by RFC5661), so I can't send
> any more than that to HostC.

So if I understand correctly, after formatting you are sending this
address string to some foreign host?
The scope zone id specifier only matters on the host where it is
used. I.e. there is no sense in sending "%ifname" to the foreign host,
because it can have a different ifname for the link, and that address
specification won't work.
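
To illustrate the point about "%ifname" being meaningful only locally, here is a small sh sketch that splits a scoped literal into the part that could go on the wire and the local zone id. The function names strip_zone and zone_of are invented for this example and are not part of any existing tool.

```sh
# strip_zone: print an IPv6 literal without its "%zone" suffix, if any.
strip_zone() {
    printf '%s\n' "${1%%\%*}"
}

# zone_of: print the zone id of an IPv6 literal, or nothing if it has none.
zone_of() {
    case $1 in
        *%*) printf '%s\n' "${1#*\%}" ;;
    esac
}

strip_zone "fe80::100%igb0"   # prints fe80::100
zone_of    "fe80::100%igb0"   # prints igb0
```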

I think for now we can leave the code as is (put an XXX comment
there), and in the future, if it is needed, add better handling
for that :)

>> In the kernel when you operate with IPv6 link-local addresses you need
>> to properly prepare them in the IPv6 header, i.e. embed scope zone
>> identifier. Otherwise the kernel will fail to send such packets.
> How would HostA know what HostC should use?
> (I don't think it can know?)
> [stuff snipped]

A possible solution could be:
* on the sending host, use the scope zone id to determine the proper
interface to send through;
* on the receiving host, track the receiving interface and, if the given
address is link-local, recover the scope zone id from the receiving
interface.

-- 
WBR, Andrey V. Elsukov





Re: IPv6 scope handling, was Re: svn commit: r335806 - projects/pnfs-planb-server/usr.sbin/nfsd

2018-06-30 Thread Andrey V. Elsukov
On 30.06.2018 21:33, Rick Macklem wrote:
>> I'm unaware of applicability of IPv6 addresses with restricted scope in
>> this area, but when you use inet_ntop() to get IPv6 address text
>> representation, you can lost IPv6 scope zone id. getaddrinfo() can
>> return sockaddr structure with properly filled sin6_scope_id field. It
>> is better to use getnameinfo() with NI_NUMERICHOST flag. Also the size
>> of ip6 buffer should be enough to keep scope specifier.
> Thanks for mentioning this. First off, you could write what I know about IPv6
> addresses on a very small postage stamp...
> 
> Are you referring to the 4bits in the second octet of the address or the 
> stuff that
> can end up as a suffix starting with"%"?

RFC 4007 defines the scopes of IPv6 addresses. The important part is
that a host can have several interfaces (links), and each interface can
have the same unicast IPv6 address, but the scope of these addresses
is restricted to their links. I.e. an IPv6 link-local address on the
igb0 interface can be used only within igb0, and the same address
on the igb1 interface can be used only within igb1. Neighbor hosts
on these links can also have identical addresses, but they will not be
able to reach each other:

[neighbor1 fe80::100]<-->[fe80::1%igb0 | fe80::1%igb1]<-->[fe80::100
neighbor2]

neighbor1 cannot reach neighbor2, since these addresses belong to
different scope zones. On the host with two interfaces you, as a user, can
use link-local addresses and specify them in applications.
To disambiguate them you must specify the scope zone identifier, "%igb0" or
"%igb1". E.g. if you want to connect to neighbor1, you can use "telnet
fe80::100%igb0 someport" and the kernel will initiate the connection to
neighbor1 through igb0. The inet_ntop() call doesn't support this.

The KAME-based IPv6 stack uses the second 16-bit word of the address to
store the scope zone id. This is a hack; user applications should not use
it or rely on it. struct sockaddr_in6 has a dedicated field,
sin6_scope_id, to specify the scope zone id. Note that this field is
32 bits wide. If you want to support such addresses, you need
to use the scope-zone-aware API, getaddrinfo()/getnameinfo(). Together
with that API you will need to use struct sockaddr_in6, or keep the
zone id separately.

In the kernel, when you operate on IPv6 link-local addresses you need
to prepare them properly in the IPv6 header, i.e. embed the scope zone
identifier. Otherwise the kernel will fail to send such packets.

> In this case, the address string is put "on the wire" for the client to use 
> to connect
> to a data server (DS). I'm not sure if the "%..." stuff is useful in this 
> case and,
> when it gets to the client, it will be translated to an address via the kernel
> version of inet_pton(), which does not parse "%..." as far as I can see.
> 
> So maybe others can clarify if it would be better to use getnameinfo() for 
> this
> use case?


-- 
WBR, Andrey V. Elsukov





Re: [PATCH]: The 6to4 stf0 interface flapping in/out of tentative in FreeBSD 11

2018-06-22 Thread Andrey V. Elsukov
On 22.06.2018 21:08, Viktor Dukhovni wrote:
>> Your change looks reasonable due to IPv6 DAD procedure does check for
>> presence of IFF_DRV_RUNNING flag. But actually it seems the right
>> solution should be disabling DAD for if_stf(4) interface.
>> IPv6 DAD requires that given interface should be multicast capable, but
>> for if_stf(4) it is not true.
>> Will it help if you use `ifconfig stf0 inet6 no_dad` before assigning
>> IPv6 address?
> 
> stf_up() in /etc/rc.d/stf has:
> 
> ifconfig stf0 create >/dev/null 2>&1
> ifconfig stf0 inet6 
> 2002:${ipv4_in_hexformat}:${stf_interface_ipv6_slaid:-0}:${stf_interface_ipv6_ifid}
>  \
> prefixlen ${stf_prefixlen}
> 
> Are you suggesting to add the:
> 
>   ifconfig stf0 inet6 no_data

Yes, but "no_dad", not "no_data".

> right under "ifconfig stf0 create"?  I'd have to find a convenient time to
> reboot to the stock kernel, so this will take O(12 hours) before I can 
> re-test.
> 
> Perhaps the fix should be belt-and-suspenders?  Both set IFF_DRV_RUNNING
> and disable DAD automatically for lack of multicast support?  Setting
> the flag bit might avoid other future issues.  Avoiding needless DAD
> polling sounds sensible.

We already have one tweak for if_stf(4) in in6_ifattach() that disables
automatic LLA creation. I think we can also disable DAD there.
Something like:

Index: in6_ifattach.c
===================================================================
--- in6_ifattach.c      (revision 335361)
+++ in6_ifattach.c      (working copy)
@@ -683,6 +683,7 @@ in6_ifattach(struct ifnet *ifp, struct ifnet *alti
                 * it is rather harmful to have one.
                 */
                ND_IFINFO(ifp)->flags &= ~ND6_IFF_AUTO_LINKLOCAL;
+               ND_IFINFO(ifp)->flags |= ND6_IFF_NO_DAD;
                break;
        default:
                break;


-- 
WBR, Andrey V. Elsukov





Re: [PATCH]: The 6to4 stf0 interface flapping in/out of tentative in FreeBSD 11

2018-06-22 Thread Andrey V. Elsukov
On 22.06.2018 19:38, Viktor Dukhovni wrote:
> I just upgraded to 11.1-p10, forgetting I had patched my kernel,
> and the stf0 interface flapping was back, with IPv6 connectivity
> disappearing every other second or so (interface shows as "tentative"
> and outgoing connections fail with "can't assign requested address").
> 
> Applied the same patch and rebooted, and the problem is gone.  Here's
> the patch again:
> 
> Index: sys/net/if_stf.c
> --- sys/net/if_stf.c  (revision 75)
> +++ sys/net/if_stf.c  (working copy)
> @@ -722,6 +722,7 @@
>   }
>  
>   ifp->if_flags |= IFF_UP;
> + ifp->if_drv_flags |= IFF_DRV_RUNNING;
>   break;
>  
>   case SIOCADDMULTI:
> 
Hi,

Your change looks reasonable, because the IPv6 DAD procedure does check for
the presence of the IFF_DRV_RUNNING flag. But it seems the right
solution would be to disable DAD for the if_stf(4) interface.
IPv6 DAD requires the interface to be multicast capable, which
is not true for if_stf(4).
Does it help if you run `ifconfig stf0 inet6 no_dad` before assigning the
IPv6 address?
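
A sketch of the ordering being suggested, written as sh functions so it can be tried by hand. The names sixtofour_prefix and stf_up_no_dad are invented for this example, and the ":0:0:0:0:1" suffix (SLA id 0, interface id ::1) and prefixlen 16 are assumptions; the real values come from the stf_interface_* settings used by /etc/rc.d/stf. Whether no_dad actually cures the flapping is still the open question above.

```sh
# sixtofour_prefix (illustrative helper): 6to4 /48 prefix for an IPv4 address,
# e.g. 192.0.2.1 -> 2002:c000:0201
sixtofour_prefix() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    printf '2002:%02x%02x:%02x%02x\n' "$1" "$2" "$3" "$4"
}

# stf_up_no_dad: create stf0, disable DAD on it, then assign the 6to4 address.
# $1 is the public IPv4 address of the host.
stf_up_no_dad() {
    prefix=$(sixtofour_prefix "$1")
    ifconfig stf0 create >/dev/null 2>&1
    ifconfig stf0 inet6 no_dad || return 1
    ifconfig stf0 inet6 "${prefix}:0:0:0:0:1" prefixlen 16
}

# Usage, as root:
#   stf_up_no_dad 192.0.2.1
```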

-- 
WBR, Andrey V. Elsukov





Re: In-kernel NAT [ipfw] dropping large UDP return packets

2018-06-13 Thread Andrey V. Elsukov
On 13.06.2018 23:04, Jeff Kletsky wrote:
>> The kernel version of libalias uses m_megapullup() function to make
>> single contiguous buffer. m_megapullup() uses m_get2() function to
>> allocate mbuf of appropriate size. If size of packet greater than 4k it
>> will fail. So, if you use MTU greater than 4k or if after fragments
>> reassembly you get a packet with length greater than 4k, ipfw_nat()
>> function will drop this packet.
>>
> Thanks!!
> 
> Mystery solved...
> 
> /usr/src/sys/netinet/libalias/alias.c
> 
> #ifdef _KERNEL
> /*
>  * m_megapullup() - this function is a big hack.
>  * Thankfully, it's only used in ng_nat and ipfw+nat.
> 
> suggests that the "old school" approach of natd might resolve this. I'll
> give it a try when I'm close enough to the box to resolve it when I make
> a configuration error.

I didn't look at the rest of libalias, but you can probably improve
this hack to use 9k or 16k mbufs. You can replace the m_get2() call in
m_megapullup() with the following code:

        if (len <= MJUMPAGESIZE)
                mcl = m_get2(len, M_NOWAIT, MT_DATA, M_PKTHDR);
        else if (len <= MJUM9BYTES)
                mcl = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM9BYTES);
        else if (len <= MJUM16BYTES)
                mcl = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM16BYTES);
        else
                goto bad;

-- 
WBR, Andrey V. Elsukov





Re: In-kernel NAT [ipfw] dropping large UDP return packets

2018-06-13 Thread Andrey V. Elsukov
On 13.06.2018 20:16, Jeff Kletsky wrote:
> When a T-Mobile "femto-cell" is trying to establish its IPv4, IPSEC
> tunnel to the T-Mobile provisioning servers, the reassembled, 4640-byte
> return packet is silently dropped by the in-kernel NAT, even though it
> "matches" the outbound packet from less than 100 ms prior.
> Are there known causes and/or resolutions for this behavior?
> 
> Is there a way to be able to "monitor" the NAT table?
> 
> (I didn't see anything obvious in the ipfw, natd, or libalias man pages.)

The kernel version of libalias uses the m_megapullup() function to make a
single contiguous buffer. m_megapullup() uses the m_get2() function to
allocate an mbuf of the appropriate size. If the packet is larger than 4k,
this will fail. So, if you use an MTU greater than 4k, or if fragment
reassembly yields a packet longer than 4k, the ipfw_nat()
function will drop this packet.

-- 
WBR, Andrey V. Elsukov





Re: 11.2-RC1 setkey invalid spi ?

2018-06-13 Thread Andrey V. Elsukov
On 12.06.2018 17:02, Patrick Lamaiziere wrote:
> # setkey -f /etc/ipsec.conf
> # setkey -D
> 129.20.128.149 129.20.128.78
>   tcp mode=any spi=106079004(0x0652a31c) reqid=0(0x)
>   A: tcp-md5  73656372 6574
>   seq=0x replay=0 flags=0x0040 state=mature 
>   created: Jun 12 15:57:28 2018   current: Jun 12 15:57:36
> 2018
>   diff: 8(s)  hard: 0(s)  soft: 0(s)
>   last:   hard: 0(s)  soft: 0(s)
>   current: 0(bytes)   hard: 0(bytes)  soft: 0(bytes)
>   allocated: 0hard: 0 soft: 0
>   sadb_seq=1 pid=5405 refcnt=1
> 129.20.128.78 129.20.128.149
>   tcp mode=any spi=4096(0x1000) reqid=0(0x)
>   A: tcp-md5  73656372 6574
>   seq=0x replay=0 flags=0x0040 state=mature 
>   created: Jun 12 15:57:28 2018   current: Jun 12 15:57:36
> 2018
>   diff: 8(s)  hard: 0(s)  soft: 0(s)
>   last:   hard: 0(s)  soft: 0(s)
>   current: 0(bytes)   hard: 0(bytes)  soft: 0(bytes)
>   allocated: 0hard: 0 soft: 0
>   sadb_seq=0 pid=5405 refcnt=1
> 
> spi field looks wrongs :(
>
> That works fine on FreeBSD 10.3
> 
> Same problem on a FreeBSD 11.1-STABLE #1 r326391: Thu Nov 30 12:07:50
> CET 2017 

The SPI isn't used with TCP-MD5 (it is not sent over the network). It is
there because it is required to create an SA in the SADB. In 11.0 the
SADB/SPDB were changed and now each SA must have a unique SPI. To avoid
breaking old applications, a compatibility shim was added: for TCP-MD5 SAs
it is still supported to use the single SPI 0x1000, and you are allowed to
add several SAs with that same SPI, but internally they will use
auto-generated values.
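
For reference, a sketch of the kind of setkey(8) input that relies on this shim: both directions are added with SPI 0x1000 and the kernel assigns the real values. tcpmd5_sa_config is an illustrative helper name; that the original /etc/ipsec.conf looked like this is an assumption based on the quoted setkey -D output.

```sh
# tcpmd5_sa_config (illustrative helper): print setkey(8) "add" lines for a
# TCP-MD5 SA pair. Both directions use SPI 0x1000, which the compatibility
# shim described above accepts.
tcpmd5_sa_config() {
    printf 'add %s %s tcp 0x1000 -A tcp-md5 "%s";\n' "$1" "$2" "$3"
    printf 'add %s %s tcp 0x1000 -A tcp-md5 "%s";\n' "$2" "$1" "$3"
}

# Usage, as root (addresses from the quoted setkey -D output; the key shown
# there in hex, 73656372 6574, is the ASCII string "secret"):
#   tcpmd5_sa_config 129.20.128.149 129.20.128.78 secret | setkey -c
```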

Two years ago I sent a patch to the bird developers, but did not
receive any answer.

-- 
WBR, Andrey V. Elsukov





Re: GRE/gif/netgraph tunnel speed on 10Gbit channel

2018-05-29 Thread Andrey V. Elsukov
On 29.05.2018 13:58, Vitalij Satanivskij wrote:
> Thank you Andrey.
> 
> I'm test with value of 62 without any success ^( 

So, is there no difference at all? The same bit rate with and without
the module loaded? Can you share the configs and parameters you used for testing?

-- 
WBR, Andrey V. Elsukov



signature.asc
Description: OpenPGP digital signature


Re: GRE/gif/netgraph tunnel speed on 10Gbit channel

2018-05-29 Thread Andrey V. Elsukov
On 29.05.2018 13:05, Vitalij Satanivskij wrote:
> 
> Hello.
> 
> Looks like it's impossible to set as sysctl are readonly and through 
> loader.conf it's also not setable.
> 
> AVE> You can try to increase kern.ipc.max_linkhdr up to 60-80 bytes.
> AVE> I think this can increase throughput a bit, since this can reduce the
> AVE> need to allocate extra mbuf when new IP header is encapsulated.

Hm, yes, it is read-only. I was probably thinking of my local patches...
You can try this kernel module to change the value at run time.

-- 
WBR, Andrey V. Elsukov
# $FreeBSD$

KMOD=   linkhdr
SRCS=   main.c
MK_MAN= no

.include <bsd.kmod.mk>




Re: GRE/gif/netgraph tunnel speed on 10Gbit channel

2018-05-29 Thread Andrey V. Elsukov
On 29.05.2018 10:17, Vitalij Satanivskij wrote:
> System version: 11.2-BETA2 FreeBSD 11.2-BETA2 #0 r334027
> I also tested FreeBSD 11.1-PRERELEASE #6 r320593
> 
> Kernels: GENERIC and CUSTOM (mostly with unused drivers removed)
> 
> For testing I use iperf; on the plain 10Gbit channel I easily get
> 9.8-9.9 Gbit/s. For the tunnels I even tried running multiple instances
> of iperf (e.g. on different ports).
> 
> 
> So the questions are: is this a normal speed for tunnels?
> What tuning can I try to speed it up?

You can try increasing kern.ipc.max_linkhdr to 60-80 bytes.
I think this can increase throughput a bit, since it can reduce the
need to allocate an extra mbuf when a new IP header is added for
encapsulation.

-- 
WBR, Andrey V. Elsukov





Re: multiple if_ipsec

2018-05-13 Thread Andrey V. Elsukov
On 08.05.2018 16:51, Andrey V. Elsukov wrote:
> I think racoon needs some patches to properly support several if_ipsec
> interfaces, but I have no spare time to do this job.
> I recommend using strongswan; it has active, responsive developers
> who may at least give some help.

Hi,

Today I hacked on ipsec-tools a bit and made a patch that adds support
for multiple if_ipsec interfaces.

https://people.freebsd.org/~ae/patch-reqid.diff

You can put this patch into the ipsec-tools/files/ directory and then
rebuild the package. I'm not sure about compatibility with generic
configurations; I tested only the case with two if_ipsec tunnels.

What it does:
* adds a new configuration option for the sainfo section, "reqid NUM";
* extends the policy index to contain the reqid, so racoon's security
policies from multiple interfaces no longer overlap;
* extends logging to print the reqid in some places.

How it is expected to be used:

In racoon.conf you have several "remote IP-address {}" sections. Each
section should have a "ph1id NUM" option, which is used to select the
corresponding "sainfo {}". You can have many "sainfo anonymous {}"
sections with different "remoteid NUM" values, where NUM should match
the "ph1id NUM" of a remote section. You also need to add a "reqid N"
option to these sainfo sections. This reqid should match the value
configured on the if_ipsec interface.

I.e. "ph1id NUM" and "remoteid NUM" create the relation between the
"sainfo" and "remote" sections, and the "reqid N" option is used to look up
the corresponding SP in the SPDB and install the proper SA with that reqid.

An example based on your config:

remote 10.9.8.2
{
    exchange_mode main,aggressive;
    doi ipsec_doi;
    situation identity_only;

    my_identifier address 10.9.8.3;
    peers_identifier address 10.9.8.2;
    ph1id 10982;

    nonce_size 16;
    initial_contact on;
    proposal_check obey;    # obey, strict, or claim
    passive off;

    proposal {
        encryption_algorithm 3des;
        hash_algorithm sha1;
        authentication_method pre_shared_key;
        dh_group 2;
    }
}

remote 10.9.8.6
{
    exchange_mode main,aggressive;
    doi ipsec_doi;
    situation identity_only;

    my_identifier address 10.9.8.3;
    peers_identifier address 10.9.8.6;
    ph1id 10986;

    nonce_size 16;
    initial_contact on;
    proposal_check obey;
    passive off;

    proposal {
        encryption_algorithm aes;
        hash_algorithm sha256;
        authentication_method pre_shared_key;
        dh_group 2;
    }
}

sainfo anonymous
{
    remoteid 10982;
    reqid 100;
    lifetime time 24 hour;

    pfs_group 2;
    encryption_algorithm 3des;
    authentication_algorithm hmac_sha1;
    compression_algorithm deflate;
}

sainfo anonymous
{
    remoteid 10986;
    reqid 200;
    lifetime time 24 hour;

    pfs_group 2;
    encryption_algorithm aes;
    authentication_algorithm hmac_sha256;
    compression_algorithm deflate;
}

sainfo anonymous
{
    lifetime time 30 min;

    pfs_group 2;
    encryption_algorithm des;
    authentication_algorithm hmac_md5;
    compression_algorithm deflate;
}
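
For the interface side, a minimal /etc/rc.conf sketch that would pair with the two reqid values in the sainfo sections might look like the following. The inner point-to-point addresses are hypothetical placeholders, and it is an assumption that the tunnel endpoints are the same addresses used as identifiers in the "remote" sections.

```shell
# Two if_ipsec interfaces, one per peer; each reqid matches a sainfo section.
cloned_interfaces="ipsec0 ipsec1"

# reqid 100 pairs with "remoteid 10982" (peer 10.9.8.2).
create_args_ipsec0="reqid 100"
# Inner addresses 192.168.100.x are placeholders for illustration only.
ifconfig_ipsec0="inet 192.168.100.1 192.168.100.2 tunnel 10.9.8.3 10.9.8.2"

# reqid 200 pairs with "remoteid 10986" (peer 10.9.8.6).
create_args_ipsec1="reqid 200"
# Inner addresses 192.168.200.x are placeholders for illustration only.
ifconfig_ipsec1="inet 192.168.200.1 192.168.200.2 tunnel 10.9.8.3 10.9.8.6"
```

Once the interfaces exist, the "reqid:" line in the output of "ifconfig ipsec0" and "ifconfig ipsec1" can be compared against the values in racoon.conf.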

-- 
WBR, Andrey V. Elsukov




