subject:"pf route\-to issues"

Re: [External] : Re: pf route-to issues

2021-01-26 Thread Stuart Henderson

On 2021/01/26 09:29, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
> > > 
> > > 
> > > I'm not sure if proposed scenario real. Let's assume there
> > > is a PF box with three NICs running on this awkward set up
> > > 
> > >   em1 ... 192.168.1.10
> > > 
> > >   em0
> > > 
> > >   em2 ... 192.168.1.10
> > > 
> > > em0 is attached to LAN em1 and em2 are facing to internet which is
> > > reachable with two different physical lines. both lines are connected 
> > > via
> > > equipment, which uses fixed IP address 192.168.1.10 and PF admin has
> > > no way to change that.
> > 
> > in this scenario are em1 and em2 connected to the same ethernet
> > segment and ip subnet?
> 
> em1 and em2 are connected to different wires (broadcast domains).
> 
> and I agree this set up is really awkward. not sure if it will work
> because it also depends on how ARP table is organized. I think it
> works on Solaris, but I have not tried that with OpenBSD.

I don't know about ethernet/ARP, but that does work with pppoe.
FWIW this is how I was doing things. (I'm not doing this any more)

PRIO_LINES="81.187.8.161@pppoe4 81.187.218.235@pppoe5"
PRIO_LINES_V6="fe80::@pppoe4 fe80::@pppoe5"
..
pass in log quick on {lan, wifi} proto udp to 193.35.128.0/20 port {500, 4500} 
tag uma-orange route-to {$PRIO_LINES}
pass in log quick on {lan, wifi} proto {esp, ah} to 193.35.128.0/20 tag 
uma-orange route-to {$PRIO_LINES}
pass out log quick on aaisp tagged uma-orange nat-to 81.187.41.148/30 random 
sticky-address static-port

pass in quick inet proto {tcp,udp} from {(self), 10.71.12.5, 192.168.10.5} to 
!10/8 \
port domain tag priority-dns route-to {$PRIO_LINES}
pass in quick inet6 proto {tcp,udp} from {(self), 2001:8b0:6412:11::5} to 
!2001:8b0:6412::/48 \
port domain tag priority-dns route-to {$PRIO_LINES_V6}
pass out on aaisp tagged priority-dns
..

In these I used the "my side" address on the pppoe interfaces because
the "ISP side" address was always 81.187.81.187. AFAIK PF didn't care
about the actual address when it was point-to-point and just a dummy
address worked. (Same with the standard route table too, but that
got changed, I think maybe with ART).

> > the other is to add add routes "beyond" 192.168.1.10 and route-to
> > them:
> > 
> >   # route add 127.0.1.10 192.168.1.10 -ifp em1
> >   # route add 127.0.2.10 192.168.1.10 -ifp em2
> > 
> > then you can write rules like this:
> > 
> >   pass in on em0 from 172.16.0.0/16 route-to 127.0.1.10
> >   pass in on em0 from 172.17.0.0/16 route-to 127.0.2.10

That sounds like a usable hack. Maybe even better than before (this
route should go down with the interface, so the traffic wouldn't
end up blackholed like it did with route-to ip@iface).

I think that's easier to deal with than rtables if you have incoming
traffic to services running on the router itself.

> yes I understand that. I'm not able to judge how many currently working
> configurations will remain broken after we kill 'address@interface' form. 
> I'm sure many deployments will get fixed with simple tweak:
>   echo 'address@interface'|cut -d @ -f 1

standard ethernet:  address@em0 -> address

ppp, different isps:address@pppoe0 -> (pppoe0:peer)

ppp with same isps, or
conflicting ethernet:   rtable or "routes beyond" hack

dhcp, rtadv:ETOOHARD, don't think it worked before without
scripted pf.conf changes anyway. use rtable.

Re: [External] : Re: pf route-to issues

2021-01-26 Thread Alexandr Nedvedicky

Hello,

On Tue, Jan 26, 2021 at 12:33:25PM +0100, Alexander Bluhm wrote:
> On Tue, Jan 26, 2021 at 10:39:30AM +1000, David Gwynne wrote:
> > > But what about dup-to?  The packet is duplicated for both directions.
> > > I guess the main use case for dup-to is implementing a monitor port.
> > > There you have to pass packets stateless, otherwise it would not
> > > work anyway.  The strange semantics is not related to this diff.
> >
> > are you saying i should skip pf_test for all dup-to generated packets?
> 
> I am not sure.
> 
> When we have an in dup-to rule, the incoming packets in request
> direction are dupped and tested with the out ruleset.  The reply
> packets for this state are also dupped, but not tested when they
> leave the dup interface.
> 
> This is inconsistent and cannot work statefully.  Stateful filtering
> with dupped packets does not make sense anyway.  The only working
> config is "pass out on dup-interface no state".
> 
> Do we think this rule should be required?
> 
> 1. No packet should leave an interface without a rule.
> 
> if (pd->dir == PF_IN || s->rt == PF_DUPTO) {
> if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)
> 
> 2. The config says we want a monitor port.  We risk that the
>original packet and the dupped packet match the same rule.
>Stateful filtering cannot work, we do not expect reply packets
>for the dups.
> 
> if (pd->dir == PF_IN && s->rt != PF_DUPTO) {
> if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)
> 
> 3. Some sort of problem was there before, but different.  Don't
>address it now.
> 

another option would be to mark duped packet as GENERATED
to short circuit pf_test() here:

6871 
6872 if ((*m0)->m_pkthdr.pf.flags & PF_TAG_GENERATED)
6873 return (PF_PASS);
6874 

perhaps excluding those changes from current diff is good.
this seems to be separate issue anyway.

thanks and
regards
sashan

Re: pf route-to issues

2021-01-26 Thread Claudio Jeker

On Tue, Jan 26, 2021 at 12:33:25PM +0100, Alexander Bluhm wrote:
> On Tue, Jan 26, 2021 at 10:39:30AM +1000, David Gwynne wrote:
> > > But what about dup-to?  The packet is duplicated for both directions.
> > > I guess the main use case for dup-to is implementing a monitor port.
> > > There you have to pass packets stateless, otherwise it would not
> > > work anyway.  The strange semantics is not related to this diff.
> > 
> > are you saying i should skip pf_test for all dup-to generated packets?
> 
> I am not sure.
> 
> When we have an in dup-to rule, the incoming packets in request
> direction are dupped and tested with the out ruleset.  The reply
> packets for this state are also dupped, but not tested when they
> leave the dup interface.
> 
> This is inconsistent and cannot work statefully.  Stateful filtering
> with dupped packets does not make sense anyway.  The only working
> config is "pass out on dup-interface no state".
> 
> Do we think this rule should be required?

dup-to is tricky. In general you should run the collector on its own
interface and then 'set skip on $dupif' could work. I would assume that
the copy of the packet is bypassing pf and is just sent directly without
hitting pf_test again. At least for route-to & reply-to I would expect
this behaviour (like I do not expect that I need an extra pass rule to
allow a rdr-to through).
 
> 1. No packet should leave an interface without a rule.
> 
> if (pd->dir == PF_IN || s->rt == PF_DUPTO) {
> if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)
> 
> 2. The config says we want a monitor port.  We risk that the
>original packet and the dupped packet match the same rule.
>Stateful filtering cannot work, we do not expect reply packets
>for the dups.
> 
> if (pd->dir == PF_IN && s->rt != PF_DUPTO) {
> if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)

I guess this is for the case where route-to is used in PF_IN. I agree this
should be done so that the state table is properly set. Skipping this for
the copy of dup-to packets makes sense.
Running pf_test() for the same mbuf and direction but with different ifp
is causing more harm the good.

> 3. Some sort of problem was there before, but different.  Don't
>address it now.
> 
> Maybe 2 has less impact for the users and is easy to understand.
> We should document that in the man page.
> 
> > > We are reaching a state where this diff can go in.  I just startet
> > > a regress run with it.  OK bluhm@
> > 
> > hopefully i fixed the pfctl error messages up so the regress tests arent
> > too unhappy.
> 
> pf forward and pf fragment tests pass.  They include route-to and
> reply-to rules.  I have no test for dup-to.  Regress pfctl fails,
> but I think dlg@ has a diff for that.
> 
> bluhm
> 

-- 
:wq Claudio

Re: pf route-to issues

2021-01-26 Thread Alexander Bluhm

On Tue, Jan 26, 2021 at 10:39:30AM +1000, David Gwynne wrote:
> > But what about dup-to?  The packet is duplicated for both directions.
> > I guess the main use case for dup-to is implementing a monitor port.
> > There you have to pass packets stateless, otherwise it would not
> > work anyway.  The strange semantics is not related to this diff.
> 
> are you saying i should skip pf_test for all dup-to generated packets?

I am not sure.

When we have an in dup-to rule, the incoming packets in request
direction are dupped and tested with the out ruleset.  The reply
packets for this state are also dupped, but not tested when they
leave the dup interface.

This is inconsistent and cannot work statefully.  Stateful filtering
with dupped packets does not make sense anyway.  The only working
config is "pass out on dup-interface no state".

Do we think this rule should be required?

1. No packet should leave an interface without a rule.

if (pd->dir == PF_IN || s->rt == PF_DUPTO) {
if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)

2. The config says we want a monitor port.  We risk that the
   original packet and the dupped packet match the same rule.
   Stateful filtering cannot work, we do not expect reply packets
   for the dups.

if (pd->dir == PF_IN && s->rt != PF_DUPTO) {
if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)

3. Some sort of problem was there before, but different.  Don't
   address it now.

Maybe 2 has less impact for the users and is easy to understand.
We should document that in the man page.

> > We are reaching a state where this diff can go in.  I just startet
> > a regress run with it.  OK bluhm@
> 
> hopefully i fixed the pfctl error messages up so the regress tests arent
> too unhappy.

pf forward and pf fragment tests pass.  They include route-to and
reply-to rules.  I have no test for dup-to.  Regress pfctl fails,
but I think dlg@ has a diff for that.

bluhm

Re: [External] : Re: pf route-to issues

2021-01-26 Thread Alexandr Nedvedicky

Hello,

On Tue, Jan 26, 2021 at 10:39:30AM +1000, David Gwynne wrote:
> On Mon, Jan 25, 2021 at 04:19:11PM +0100, Alexander Bluhm wrote:
> > On Fri, Jan 22, 2021 at 06:07:59PM +1000, David Gwynne wrote:
> > > --- sys/conf/GENERIC  30 Sep 2020 14:51:17 -  1.273
> > > +++ sys/conf/GENERIC  22 Jan 2021 07:33:30 -
> > > @@ -82,6 +82,7 @@ pseudo-device   msts1   # MSTS line discipl
> > >  pseudo-deviceendrun  1   # EndRun line discipline
> > >  pseudo-devicevnd 4   # vnode disk devices
> > >  pseudo-deviceksyms   1   # kernel symbols device
> > > +pseudo-devicekstat
> > >  #pseudo-device   dt  # Dynamic Tracer
> > >
> > >  # clonable devices
> > 
> > This is an unrelated chunk.
> 
> oh yeah...
> 
> > > +pf_route(struct pf_pdesc *pd, struct pf_state *s)
> > ...
> > > + if (pd->dir == PF_IN) {
> > >   if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)
> > 
> > Yes, this is the correct logic.  When the packet comes in, pf
> > overrides forwarding, tests the out rules, and sends it.  For
> > outgoing packets on out route-to rules we have already tested the
> > rules.  It also works for reply-to the other way around.
> 
> yep.

I'm suggesting to keep current if () in. and change it with follow
up commit, which will adjust match rule for route-to/reply-to.
would it be OK with you, David?

> 
> > But what about dup-to?  The packet is duplicated for both directions.
> > I guess the main use case for dup-to is implementing a monitor port.
> > There you have to pass packets stateless, otherwise it would not
> > work anyway.  The strange semantics is not related to this diff.
> 
> are you saying i should skip pf_test for all dup-to generated packets?

this makes perfect sense for me.

> 
> > We are reaching a state where this diff can go in.  I just startet
> > a regress run with it.  OK bluhm@
> 
> hopefully i fixed the pfctl error messages up so the regress tests arent
> too unhappy.


thanks and
regards
sashan

Re: [External] : Re: pf route-to issues

2021-01-26 Thread Alexandr Nedvedicky

Hello,


> > >   goto bad;
> > 
> > here we do 'goto bad', which does not call if_put().
> 
> yes it does. the whole chunk with the diff applied is:
> 
> done:
> if (s->rt != PF_DUPTO)
> pd->m = NULL;
> if_put(ifp);
> rtfree(rt);
> return;
> 
> bad:
> m_freem(m0);
> goto done;
> }
> 
> bad drops the mbuf and then goes to done.

yes you are absolutely right, I need glasses.

thanks and
regards
sashan

Re: [External] : Re: pf route-to issues

2021-01-26 Thread Alexandr Nedvedicky

Hello,

> 
> i'll need help with "match on em0 route-to $if_em0_peer". or we can do
> that later separately?

may be can we keep this line in pf_route() untouched at least for now:

6041 
6042 if (pd->kif->pfik_ifp != ifp) {
6043 if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)
6044 goto bad;
6045 else if (m0 == NULL)
6046 goto done;
6047 if (m0->m_len < sizeof(struct ip)) {
6048 DPFPRINTF(LOG_ERR,
6049 "%s: m0->m_len < sizeof(struct ip)", __func__);
6050 goto bad;
6051 }
6052 ip = mtod(m0, struct ip *);
6053 }
6054 

I think if () at line 6042 does not hurt pfsync(4). This should be removed
with commit, which will introduce the 'match ... route-to'. There should be
more detailed explanation in my response to email from bluhm@.

thanks and
regards
sashan

Re: [External] : Re: pf route-to issues

2021-01-26 Thread Alexandr Nedvedicky

Hello,


> > 
> > 
> > I'm not sure if proposed scenario real. Let's assume there
> > is a PF box with three NICs running on this awkward set up
> > 
> > em1 ... 192.168.1.10
> > 
> > em0
> > 
> > em2 ... 192.168.1.10
> > 
> > em0 is attached to LAN em1 and em2 are facing to internet which is
> > reachable with two different physical lines. both lines are connected 
> > via
> > equipment, which uses fixed IP address 192.168.1.10 and PF admin has
> > no way to change that.
> 
> in this scenario are em1 and em2 connected to the same ethernet
> segment and ip subnet?

em1 and em2 are connected to different wires (broadcast domains).

and I agree this set up is really awkward. not sure if it will work
because it also depends on how ARP table is organized. I think it
works on Solaris, but I have not tried that with OpenBSD.

> 
> > the 'address@interface' syntax is the only way to define rules:
> > 
> > pass in on em0 from 172.16.0.0/16 route-to 192.168.1.10@em1
> > pass in on em0 from 172.17.0.0/16 route-to 192.168.1.10@em2
> > 
> > regardless of how much real such scenario is I believe it can
> > currently work.
> 
> this is a very awkward configuration. while i think what it's trying
> to do is useful, how it is expressed relies on something i want to
> break (fix?).

yes I completely agree. I believe this 'awkward thinking' stems
back to ipf (ipfilter). I found my ancient notes to find some details
on this. My notes refer to ipf.conf(5) manpage [1].

If I understand the purpose of 'address@interface' syntax right,
it allows us to forward packets using explicit pair [ next-hop IP,
interface ]. Thus firewall can emulate a route, which might be missing in
routing table. It just 'blindly' sends matching packet to next-hop using
desired interface assuming that 'next-hop' host will do the right thing
with such packet.

> 
> one of the original reasons i wanted to break this kind of config
> is because pfsync hasn't got a way to exchange interace information,
> and different firewalls are going to have different interface
> topologies anyway. one of the reasons to only use a destination/next
> hop as the argument to route-to rules was so pfsync would work.

yes I understand that. I'm not able to judge how many currently working
configurations will remain broken after we kill 'address@interface' form. 
I'm sure many deployments will get fixed with simple tweak:
echo 'address@interface'|cut -d @ -f 1
but I'm not sure how many people do still need to order desired NIC. 

> 
> i'm pretty sure this is broken at the moment because of bugs in the
> routing code. it is possible to configure routes to 192.168.1.10
> via both em1 and em2 if net.inet.ip.multipath is set to 1, but im
> sure the llinfo (arp and rtable) part of this kind of multipath
> route setup does not work reliably. i guess i should try and get
> my fixes for this into the tree.
> 
> there are two alternate ways i can think of to do this. the first
> is to configure an rtable for each interface:
> 
>   # route -T 1 add default 192.168.1.10 -ifp em1
>   # route -T 2 add default 192.168.1.10 -ifp em2
> 
> then you could write rules like this:
> 
>   pass in on em0 from 172.16.0.0/16 rtable 1
>   pass in on em0 from 172.17.0.0/16 rtable 2
> 
> the other is to add add routes "beyond" 192.168.1.10 and route-to
> them:
> 
>   # route add 127.0.1.10 192.168.1.10 -ifp em1
>   # route add 127.0.2.10 192.168.1.10 -ifp em2
> 
> then you can write rules like this:
> 
>   pass in on em0 from 172.16.0.0/16 route-to 127.0.1.10
>   pass in on em0 from 172.17.0.0/16 route-to 127.0.2.10
> 
> this will likely hit the same bugs in the rtable/arp code i referred
> to above though.
> 
> also, note that i haven't tested either of these.
> 

I think this sounds like a plan how to deal with edge cases,
which can't get fixed with simple 'cut -d @ -f 1'

thanks and
regards
sashan

[1] https://www.freebsd.org/cgi/man.cgi?query=ipf.conf&sektion=5&apropos=0

(search for reply-to)

Re: [External] : Re: pf route-to issues

2021-01-25 Thread David Gwynne

On Mon, Jan 25, 2021 at 05:38:40PM +0100, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
> On Mon, Jan 25, 2021 at 03:21:29PM +0100, Alexander Bluhm wrote:
> > Hi,
> > 
> > Some personal thoughts.  I am happy when pf route-to gets simpler.
> > Especially I have never understood what this address@interface
> > syntax is used for.
> > 
> > I cannot estimate what configuration is used by our cutomers in
> > many installations.  Simple syntax change address@interface ->
> > address of next hob should be no problem.  Slight semantic changes
> > have to be dealt with.  Current packet flow is complicated and may
> > be inspired by old NAT behavior.  As long it becomes more sane and
> > easier to understand, we should change it.
> 
> 
> I'm not sure if proposed scenario real. Let's assume there
> is a PF box with three NICs running on this awkward set up
> 
>   em1 ... 192.168.1.10
> 
>   em0
> 
>   em2 ... 192.168.1.10
> 
> em0 is attached to LAN em1 and em2 are facing to internet which is
> reachable with two different physical lines. both lines are connected via
> equipment, which uses fixed IP address 192.168.1.10 and PF admin has
> no way to change that.

in this scenario are em1 and em2 connected to the same ethernet
segment and ip subnet?

> the 'address@interface' syntax is the only way to define rules:
> 
>   pass in on em0 from 172.16.0.0/16 route-to 192.168.1.10@em1
>   pass in on em0 from 172.17.0.0/16 route-to 192.168.1.10@em2
> 
> regardless of how much real such scenario is I believe it can
> currently work.

this is a very awkward configuration. while i think what it's trying
to do is useful, how it is expressed relies on something i want to
break (fix?).

one of the original reasons i wanted to break this kind of config
is because pfsync hasn't got a way to exchange interace information,
and different firewalls are going to have different interface
topologies anyway. one of the reasons to only use a destination/next
hop as the argument to route-to rules was so pfsync would work.

i'm pretty sure this is broken at the moment because of bugs in the
routing code. it is possible to configure routes to 192.168.1.10
via both em1 and em2 if net.inet.ip.multipath is set to 1, but im
sure the llinfo (arp and rtable) part of this kind of multipath
route setup does not work reliably. i guess i should try and get
my fixes for this into the tree.

there are two alternate ways i can think of to do this. the first
is to configure an rtable for each interface:

  # route -T 1 add default 192.168.1.10 -ifp em1
  # route -T 2 add default 192.168.1.10 -ifp em2

then you could write rules like this:

  pass in on em0 from 172.16.0.0/16 rtable 1
  pass in on em0 from 172.17.0.0/16 rtable 2

the other is to add add routes "beyond" 192.168.1.10 and route-to
them:

  # route add 127.0.1.10 192.168.1.10 -ifp em1
  # route add 127.0.2.10 192.168.1.10 -ifp em2

then you can write rules like this:

  pass in on em0 from 172.16.0.0/16 route-to 127.0.1.10
  pass in on em0 from 172.17.0.0/16 route-to 127.0.2.10

this will likely hit the same bugs in the rtable/arp code i referred
to above though.

also, note that i haven't tested either of these.

cheers,
dlg

Re: [External] : Re: pf route-to issues

2021-01-25 Thread David Gwynne

On Mon, Jan 25, 2021 at 06:17:02PM +0100, Alexandr Nedvedicky wrote:
> Hello,
> 
> pf_route() might leak a refence to ifp.

oh no :(

> > Index: sys/net/pf.c
> > ===
> > RCS file: /cvs/src/sys/net/pf.c,v
> > retrieving revision 1.1101
> > diff -u -p -r1.1101 pf.c
> > --- sys/net/pf.c19 Jan 2021 22:22:23 -  1.1101
> > +++ sys/net/pf.c22 Jan 2021 07:33:31 -
> 
> 
> 
> > @@ -5998,48 +5994,40 @@ pf_route(struct pf_pdesc *pd, struct pf_
> >  
> > ip = mtod(m0, struct ip *);
> >  
> 
> > +
> > +   ifp = if_get(rt->rt_ifidx);
> > if (ifp == NULL)
> > goto bad;
> 
> here we get a reference to ifp.

yep.

> > @@ -6082,9 +6060,9 @@ pf_route(struct pf_pdesc *pd, struct pf_
> >  */
> > if (ip->ip_off & htons(IP_DF)) {
> > ipstat_inc(ips_cantfrag);
> > -   if (r->rt != PF_DUPTO)
> > +   if (s->rt != PF_DUPTO)
> > pf_send_icmp(m0, ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG,
> > -   ifp->if_mtu, pd->af, r, pd->rdomain);
> > +   ifp->if_mtu, pd->af, s->rule.ptr, pd->rdomain);
> > goto bad;
> 
>   here we do 'goto bad', which does not call if_put().

yes it does. the whole chunk with the diff applied is:

done:
if (s->rt != PF_DUPTO)
pd->m = NULL;
if_put(ifp);
rtfree(rt);
return;

bad:
m_freem(m0);
goto done;
}

bad drops the mbuf and then goes to done.

Re: pf route-to issues

2021-01-25 Thread David Gwynne

On Mon, Jan 25, 2021 at 04:19:11PM +0100, Alexander Bluhm wrote:
> On Fri, Jan 22, 2021 at 06:07:59PM +1000, David Gwynne wrote:
> > --- sys/conf/GENERIC30 Sep 2020 14:51:17 -  1.273
> > +++ sys/conf/GENERIC22 Jan 2021 07:33:30 -
> > @@ -82,6 +82,7 @@ pseudo-device msts1   # MSTS line discipl
> >  pseudo-device  endrun  1   # EndRun line discipline
> >  pseudo-device  vnd 4   # vnode disk devices
> >  pseudo-device  ksyms   1   # kernel symbols device
> > +pseudo-device  kstat
> >  #pseudo-device dt  # Dynamic Tracer
> >
> >  # clonable devices
> 
> This is an unrelated chunk.

oh yeah...

> > +pf_route(struct pf_pdesc *pd, struct pf_state *s)
> ...
> > +   if (pd->dir == PF_IN) {
> > if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)
> 
> Yes, this is the correct logic.  When the packet comes in, pf
> overrides forwarding, tests the out rules, and sends it.  For
> outgoing packets on out route-to rules we have already tested the
> rules.  It also works for reply-to the other way around.

yep.

> But what about dup-to?  The packet is duplicated for both directions.
> I guess the main use case for dup-to is implementing a monitor port.
> There you have to pass packets stateless, otherwise it would not
> work anyway.  The strange semantics is not related to this diff.

are you saying i should skip pf_test for all dup-to generated packets?

> We are reaching a state where this diff can go in.  I just startet
> a regress run with it.  OK bluhm@

hopefully i fixed the pfctl error messages up so the regress tests arent
too unhappy.

Re: [External] : Re: pf route-to issues

2021-01-25 Thread David Gwynne

On Mon, Jan 25, 2021 at 03:21:29PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> Some personal thoughts.  I am happy when pf route-to gets simpler.
> Especially I have never understood what this address@interface
> syntax is used for.

even after staring at it for so long, i still don't get it. i do think
it was a reimplementation of an ipfilter thing, but i don't think it
makes a lot of sense in ipfilter either.

> I cannot estimate what configuration is used by our cutomers in
> many installations.  Simple syntax change address@interface ->
> address of next hob should be no problem.  Slight semantic changes
> have to be dealt with.  Current packet flow is complicated and may
> be inspired by old NAT behavior.  As long it becomes more sane and
> easier to understand, we should change it.

route-to $destination, not $next_hop...

the biggest change we have to agree on at the moment is whether we're
changing the semantic of "pf runs when a packet goes over an interface"
to "pf runs when a packet goes in or out of the stack". this affects
whether pf_test runs again when route-to changes the interface.

> But I don't like artificial restrictions.  We don't know all use
> cases.  reply-to and route-to could be used for both in and out
> rules.  I have used them for strange divert-to on bridge setups.
> It should stay that way.

i don't think it's complicated to support route-to and reply-to on both
in and out rules. we've already found that there's use cases for
reply-to on inbound rules, doing things on bridges just adds to that.
it could be used on tpmr(4) too...

> It would be nice to keep state-less route-to.  I have found a special
> case with that in the code of our product.  But it looks like dead
> code, so I would not object to remove state-less route-to for now.

ok. thank you.

> bluhm

Re: [External] : Re: pf route-to issues

2021-01-25 Thread David Gwynne

On Mon, Jan 25, 2021 at 02:30:46PM +0100, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
...
> > 
> > i dont understand the kif lifetimes though. can we just point a
> > pdesc at an arbitrary kif or do we need ot reference count?
> > 
> 
> as long as we don't release NET_LOCK() (or PF_LOCK() in near future),
> the reference to kif should remain valid.
> 
> kif is very own PF's abstraction of network interfaces. there are
> two ways how one can create a kif:
> 
>   there is either rule loaded, which refers to interface, which
>   is not plumbed yet.
> 
>   the new interface gets plumbed and PF must have a kif for it.
> 
> the reference count currently used makes sure we destroy kif when
> either interface is gone or when the rule, which refers kif is
> gone.
> 
> hope I remember details above right.

ok. sounds simple enough...

> > > I think this might be way to go.
> > > 
> > > My only concern now is that such change is subtle. I mean the
> > > 
> > >   pass out ... route-to 
> > > 
> > > will change behavior in sense that current code will dispatch
> > > packet to new interface and run pf_test() again. Once your diff
> > > will be in the same rule will be accepted, but will bring entirely
> > > different behaviour: just dispatch packet to new interface.
> > 
> > yeah.
> > 
> > the counter example is that i was extremely surprised when i
> > discovered that pf_test gets run again when the outgoing interface
> > is changed with a route-to rule.
> 
> surprised because you've forgot about current model? which is:
> run pf_test() whenever packet crosses the interface?

forgot isn't the right word. i was in the room when henning and mcbride
reworked pf and came up with the stack and wire side key terminology and
semantics, and i've spent a lot of time looking at the ip input and
output code where pf_test is called close to the stack. it just wasn't
obvious to me that pf filtered over an interface rather than filtered
in and out of the stack. apart from route-to in pf, i'm not sure it is a
meaningful difference either.

> > there's subtlety either way, we're just figuring out which one we're
> > going for.
> 
> yes exactly. there are trade offs either way.
> 
> 
> 
> > > I think this is acceptable. If this will cause a friction we can 
> > > always
> > > adjust the code in follow up commit to allow state-less 
> > > route-to/reply-to
> > > with no support from pfsync(4).
> > 
> > if we're going to support route-to on match rules i think this will be
> > easy to implement. 
> > 
> 
> I think there must be some broader consent on model change
> from current which is run pf_test() for every NIC crossing
> to new way, which is run pf_test() at most two times.

agreed.

> > > > > > lastly, the "argument" or address specified with route-to (and
> > > > > > reply-to and dup-to) is a destination address, not a next-hop. this
> > > > > > has been discussed on the lists a couple of times before, so i won't
> > > > > > go over it again, except to reiterate that it allows pf to force
> > > > > > "sticky" path selection while opening up the possibility for ecmp
> > > > > > and failover for where that path traverses.
> > > > > 
> > > > > I keep forgetting about it as I still stick to current 
> > > > > interpretation.
> > > > > 
> > > > > 
> > > > > I've seen changes to pfctl. Diff below still allows rule:
> > > > > 
> > > > > pass in on net0 from 192.168.1.0/24 to any route-to 
> > > > > 10.10.10.10@em0
> > > > 
> > > > Is there use case for the @interface syntax apart from the current
> > > > route-to rules? If not, we can just delete it.
> > > 
> > > perhaps I'm still not quite on the same page as you then. I also
> > > had no time to entirely test you diff. The way I understand your
> > > effort is to change route-to behavior such it will be using
> > > a destination instead of next-hop@interface. Or are you planning
> > > to keep current form ('route-to next-hop@interface') working?
> > 
> > if we ignore route-to, what's the use case for the interface part of
> > address@interface? it doesnt seem to be accepted as part of an address
> > in other parts of the grammar:
> > 
> > dlg@kbuild ~$ echo pass in from 192.168.0.0@vmx0 | sudo pfctl -nf -
> > stdin:1: @if syntax not permitted in from or to
> > stdin:1: skipping rule due to errors
> > stdin:1: rule expands to no valid combination
> > dlg@kbuild ~$ echo pass from 192.168.0.0@vmx0 | sudo pfctl -nf -
> > stdin:1: @if syntax not permitted in from or to
> > stdin:1: skipping rule due to errors
> > stdin:1: rule expands to no valid combination
> > dlg@kbuild ~$ echo pass nat-to 192.168.0.0@vmx0 | sudo pfctl -nf -
> > stdin:1: @if not permitted
> > stdin:1: nat-to and rdr-to require a direction
> > stdin:1: skipping rule due to errors
> > stdin:1: rule expands to no valid combination
> > dlg@kbuild ~$ echo pass nat-to 192.168.0.0@vmx0 | sudo pfc

Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexandr Nedvedicky

Hello,

pf_route() might leak a refence to ifp.

> Index: sys/net/pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1101
> diff -u -p -r1.1101 pf.c
> --- sys/net/pf.c  19 Jan 2021 22:22:23 -  1.1101
> +++ sys/net/pf.c  22 Jan 2021 07:33:31 -



> @@ -5998,48 +5994,40 @@ pf_route(struct pf_pdesc *pd, struct pf_
>  
>   ip = mtod(m0, struct ip *);
>  

> +
> + ifp = if_get(rt->rt_ifidx);
>   if (ifp == NULL)
>   goto bad;

here we get a reference to ifp.

>  
> - if (pd->kif->pfik_ifp != ifp) {
> + /* A locally generated packet may have invalid source address. */
> + if ((ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET &&
> + (ifp->if_flags & IFF_LOOPBACK) == 0)
> + ip->ip_src = ifatoia(rt->rt_ifa)->ia_addr.sin_addr;
> +
> + if (pd->dir == PF_IN) {
>   if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)
>   goto bad;
>   else if (m0 == NULL)
> @@ -6052,16 +6040,6 @@ pf_route(struct pf_pdesc *pd, struct pf_
>   ip = mtod(m0, struct ip *);
>   }
>  
> - rt = rtalloc(sintosa(dst), RT_RESOLVE, rtableid);
> - if (!rtisvalid(rt)) {
> - ipstat_inc(ips_noroute);
> - goto bad;
> - }
> - /* A locally generated packet may have invalid source address. */
> - if ((ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET &&
> - (ifp->if_flags & IFF_LOOPBACK) == 0)
> - ip->ip_src = ifatoia(rt->rt_ifa)->ia_addr.sin_addr;
> -
>   in_proto_cksum_out(m0, ifp);
>  
>   if (ntohs(ip->ip_len) <= ifp->if_mtu) {
> @@ -6082,9 +6060,9 @@ pf_route(struct pf_pdesc *pd, struct pf_
>*/
>   if (ip->ip_off & htons(IP_DF)) {
>   ipstat_inc(ips_cantfrag);
> - if (r->rt != PF_DUPTO)
> + if (s->rt != PF_DUPTO)
>   pf_send_icmp(m0, ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG,
> - ifp->if_mtu, pd->af, r, pd->rdomain);
> + ifp->if_mtu, pd->af, s->rule.ptr, pd->rdomain);
>   goto bad;

here we do 'goto bad', which does not call if_put().

>   }
>  
> @@ -6108,8 +6086,9 @@ pf_route(struct pf_pdesc *pd, struct pf_
>   ipstat_inc(ips_fragmented);
>  
>  done:
> - if (r->rt != PF_DUPTO)
> + if (s->rt != PF_DUPTO)
>   pd->m = NULL;
> + if_put(ifp);
>   rtfree(rt);
>   return;
>  


pf_route6() suffers from the same issue.


thanks and
regards
sashan

Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexandr Nedvedicky

Hello,


On Mon, Jan 25, 2021 at 03:21:29PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> Some personal thoughts.  I am happy when pf route-to gets simpler.
> Especially I have never understood what this address@interface
> syntax is used for.
> 
> I cannot estimate what configuration is used by our cutomers in
> many installations.  Simple syntax change address@interface ->
> address of next hob should be no problem.  Slight semantic changes
> have to be dealt with.  Current packet flow is complicated and may
> be inspired by old NAT behavior.  As long it becomes more sane and
> easier to understand, we should change it.


I'm not sure if proposed scenario real. Let's assume there
is a PF box with three NICs running on this awkward set up

em1 ... 192.168.1.10

em0

em2 ... 192.168.1.10

em0 is attached to LAN em1 and em2 are facing to internet which is
reachable with two different physical lines. both lines are connected via
equipment, which uses fixed IP address 192.168.1.10 and PF admin has
no way to change that.

the 'address@interface' syntax is the only way to define rules:

pass in on em0 from 172.16.0.0/16 route-to 192.168.1.10@em1
pass in on em0 from 172.17.0.0/16 route-to 192.168.1.10@em2

regardless of how much real such scenario is I believe it can
currently work.



> 
> But I don't like artificial restrictions.  We don't know all use
> cases.  reply-to and route-to could be used for both in and out
> rules.  I have used them for strange divert-to on bridge setups.
> It should stay that way.
> 

OK I agree.


regards
sashan

Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexandr Nedvedicky

hello,


> 
> > +pf_route(struct pf_pdesc *pd, struct pf_state *s)
> ...
> > +   if (pd->dir == PF_IN) {
> > if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)
> 
> Yes, this is the correct logic.  When the packet comes in, pf
> overrides forwarding, tests the out rules, and sends it.  For
> outgoing packets on out route-to rules we have already tested the
> rules.  It also works for reply-to the other way around.

the change above changes the behavior from 'do pf_test() when packet
crosses interface' to 'call pf_test() at most two times'. the change
essentially breaks the behavior I've mentioned earlier.

table  { 10.10.10.10, 172.16.1.1 }

pass out on em0 from 192.168.1.0/24 to any route-to 
pass out on em1 from 192.168.1.0 to any nat-to (em1)
pass out on em2 all 

the story goes as follows:
destination/next-hop 10.10.10.10 is reached via em1
destination/next-hop 172.16.1.1 is reached via em2

for arbitrary reason we have to do NAT on em1 interface.
with current PF things do work. trading current 'if (...)'
for 'if (pd->dir == PF_IN)' breaks it. The NAT won't happen.

I think this can be solved by using 'match ... route-to ...' rule suggested by
David. However...  I actually start to wonder does it hurt us to keep current
behavior? Why do we want to change it? I'm getting worried this particular
tid-bit will bring more harm than good. It also does not help much pfsync(4),
if I understand things right.


> 
> But what about dup-to?  The packet is duplicated for both directions.
> I guess the main use case for dup-to is implementing a monitor port.
> There you have to pass packets stateless, otherwise it would not
> work anyway.  The strange semantics is not related to this diff.

this is a good point.


thanks and
regards
sahan

Re: pf route-to issues

2021-01-25 Thread Alexander Bluhm

On Fri, Jan 22, 2021 at 06:07:59PM +1000, David Gwynne wrote:
> --- sys/conf/GENERIC  30 Sep 2020 14:51:17 -  1.273
> +++ sys/conf/GENERIC  22 Jan 2021 07:33:30 -
> @@ -82,6 +82,7 @@ pseudo-device   msts1   # MSTS line discipl
>  pseudo-deviceendrun  1   # EndRun line discipline
>  pseudo-devicevnd 4   # vnode disk devices
>  pseudo-deviceksyms   1   # kernel symbols device
> +pseudo-devicekstat
>  #pseudo-device   dt  # Dynamic Tracer
>  
>  # clonable devices

This is an unrelated chunk.

> +pf_route(struct pf_pdesc *pd, struct pf_state *s)
...
> + if (pd->dir == PF_IN) {
>   if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)

Yes, this is the correct logic.  When the packet comes in, pf
overrides forwarding, tests the out rules, and sends it.  For
outgoing packets on out route-to rules we have already tested the
rules.  It also works for reply-to the other way around.

But what about dup-to?  The packet is duplicated for both directions.
I guess the main use case for dup-to is implementing a monitor port.
There you have to pass packets stateless, otherwise it would not
work anyway.  The strange semantics is not related to this diff.

We are reaching a state where this diff can go in.  I just startet
a regress run with it.  OK bluhm@

Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexander Bluhm

Hi,

Some personal thoughts.  I am happy when pf route-to gets simpler.
Especially I have never understood what this address@interface
syntax is used for.

I cannot estimate what configuration is used by our cutomers in
many installations.  Simple syntax change address@interface ->
address of next hob should be no problem.  Slight semantic changes
have to be dealt with.  Current packet flow is complicated and may
be inspired by old NAT behavior.  As long it becomes more sane and
easier to understand, we should change it.

But I don't like artificial restrictions.  We don't know all use
cases.  reply-to and route-to could be used for both in and out
rules.  I have used them for strange divert-to on bridge setups.
It should stay that way.

It would be nice to keep state-less route-to.  I have found a special
case with that in the code of our product.  But it looks like dead
code, so I would not object to remove state-less route-to for now.

bluhm

Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexandr Nedvedicky

Hello,


> > 
> > If I understand the idea right, then basically 'match out on em0'
> > figures out the new 'outbound interface' so either 'pass out on em1...' 
> > or
> > 'pass out on em2...' will kick in. In other words:
> > 
> > depending on the destination picked up from  table,
> > the route-to action will override the em0 interface to
> > either em1 or em2.
> 
> yes.
> 
> i dont understand the kif lifetimes though. can we just point a
> pdesc at an arbitrary kif or do we need ot reference count?
> 

as long as we don't release NET_LOCK() (or PF_LOCK() in near future),
the reference to kif should remain valid.

kif is very own PF's abstraction of network interfaces. there are
two ways how one can create a kif:

there is either rule loaded, which refers to interface, which
is not plumbed yet.

the new interface gets plumbed and PF must have a kif for it.

the reference count currently used makes sure we destroy kif when
either interface is gone or when the rule, which refers kif is
gone.

hope I remember details above right.


> > I think this might be way to go.
> > 
> > My only concern now is that such change is subtle. I mean the
> > 
> > pass out ... route-to 
> > 
> > will change behavior in sense that current code will dispatch
> > packet to new interface and run pf_test() again. Once your diff
> > will be in the same rule will be accepted, but will bring entirely
> > different behaviour: just dispatch packet to new interface.
> 
> yeah.
> 
> the counter example is that i was extremely surprised when i
> discovered that pf_test gets run again when the outgoing interface
> is changed with a route-to rule.

surprised because you've forgot about current model? which is:
run pf_test() whenever packet crosses the interface?

> 
> there's subtlety either way, we're just figuring out which one we're
> going for.

yes exactly. there are trade offs either way.



> > I think this is acceptable. If this will cause a friction we can always
> > adjust the code in follow up commit to allow state-less 
> > route-to/reply-to
> > with no support from pfsync(4).
> 
> if we're going to support route-to on match rules i think this will be
> easy to implement. 
> 

I think there must be some broader consent on model change
from current which is run pf_test() for every NIC crossing
to new way, which is run pf_test() at most two times.


> > > 
> > > > > 
> > > > > lastly, the "argument" or address specified with route-to (and
> > > > > reply-to and dup-to) is a destination address, not a next-hop. this
> > > > > has been discussed on the lists a couple of times before, so i won't
> > > > > go over it again, except to reiterate that it allows pf to force
> > > > > "sticky" path selection while opening up the possibility for ecmp
> > > > > and failover for where that path traverses.
> > > > 
> > > > I keep forgetting about it as I still stick to current 
> > > > interpretation.
> > > > 
> > > > 
> > > > I've seen changes to pfctl. Diff below still allows rule:
> > > > 
> > > > pass in on net0 from 192.168.1.0/24 to any route-to 10.10.10.10@em0
> > > 
> > > Is there use case for the @interface syntax apart from the current
> > > route-to rules? If not, we can just delete it.
> > 
> > perhaps I'm still not quite on the same page as you then. I also
> > had no time to entirely test you diff. The way I understand your
> > effort is to change route-to behavior such it will be using
> > a destination instead of next-hop@interface. Or are you planning
> > to keep current form ('route-to next-hop@interface') working?
> 
> if we ignore route-to, what's the use case for the interface part of
> address@interface? it doesnt seem to be accepted as part of an address
> in other parts of the grammar:
> 
> dlg@kbuild ~$ echo pass in from 192.168.0.0@vmx0 | sudo pfctl -nf -
> stdin:1: @if syntax not permitted in from or to
> stdin:1: skipping rule due to errors
> stdin:1: rule expands to no valid combination
> dlg@kbuild ~$ echo pass from 192.168.0.0@vmx0 | sudo pfctl -nf -
> stdin:1: @if syntax not permitted in from or to
> stdin:1: skipping rule due to errors
> stdin:1: rule expands to no valid combination
> dlg@kbuild ~$ echo pass nat-to 192.168.0.0@vmx0 | sudo pfctl -nf -
> stdin:1: @if not permitted
> stdin:1: nat-to and rdr-to require a direction
> stdin:1: skipping rule due to errors
> stdin:1: rule expands to no valid combination
> dlg@kbuild ~$ echo pass nat-to 192.168.0.0@vmx0 | sudo pfctl -nf - 
> 
> if route-to isn't going to use it, can we cut the @interface part out of
> fthe address parser?
> 

that's also my understanding. the form 192.168.0.0@net0 is currently
used by route-to et.al. only. So we can eventually let it go.

however there is a strong objection from Sven, which wants it to keep
it around.


> > unless we don't

Re: [External] : Re: pf route-to issues

2021-01-25 Thread David Gwynne

On Mon, Jan 25, 2021 at 01:11:35PM +0100, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
> > > 
> > > I understand that simple is better here, so I won't object
> > > if we will lean towards simplified model above. However I still
> > > would like to share my view on current PF.
> > > 
> > > the way I understand how things (should) work currently is fairly 
> > > simple:
> > > 
> > >   we always run pf_test() as packet crosses interface.
> > >   packet can cross interface either in outbound or
> > >   inbound direction.
> > 
> > That's how I understand the current code. I'm proposing that we change
> > the semantics so they are:
> > 
> > - we always run pf_test as a packet enters or leaves the network stack.
> > - pf is able to filter or apply policy based on various attributes
> >   of the packet such as addresses and ports, but also metadata about
> >   the packet such as the current prio, or the interface it came
> >   from or is going to.
> > - changing a packet or it's metadata does not cause a rerun of pf_test.
> > - route-to on an incoming packet basically bypasses the default
> >   stack processing with a "fast route" out of the stack.
> > 
> > > this way we can always create a complex route-to loops,
> > > however it can also solve some route-to vs. NAT issues.
> > > consider those fairly innocent rules:
> > > 
> > > 8<---8<---8<--8<
> > > table  { 10.10.10.10, 172.16.1.1 }
> > > 
> > > pass out on em0 from 192.168.1.0/24 to any route-to 
> > > pass out on em1 from 192.168.1.0 to any nat-to (em1)
> > > pass out on em2 all 
> > > 8<---8<---8<--8<
> > > 
> > > Rules above should currently work, but will stop if we will
> > > go with simplified model.
> > 
> > The entries in  make the packet go out em1 and em2?
> 
> yes they do. let's say 10.10.10.10 is reached over em1, 172.16.1.1 is
> reached over em2. sorry I have not specified that in my earlier email.

npz.

> 
> > 
> > I'm ok with breaking configs like that. We don't run pf_test again for
> > other changes to the packet, so if we do want to support something like
> > that I think we should make the following work:
> > 
> >   # pf_pdesc kif is em0
> >   match out on em0 from 192.168.1.0/24 to any route-to 
> >   # pf_pdesc kif is now em1
> >   pass out on em1 from 192.168.1.0 to any nat-to (em1)
> >   pass out on em2 all
> > 
> > This is more in line with how NAT rules operate.
> 
> If I understand the idea right, then basically 'match out on em0'
> figures out the new 'outbound interface' so either 'pass out on em1...' or
> 'pass out on em2...' will kick in. In other words:
> 
>   depending on the destination picked up from  table,
>   the route-to action will override the em0 interface to
>   either em1 or em2.

yes.

i dont understand the kif lifetimes though. can we just point a
pdesc at an arbitrary kif or do we need ot reference count?

> I think this might be way to go.
> 
> My only concern now is that such change is subtle. I mean the
> 
>   pass out ... route-to 
> 
> will change behavior in sense that current code will dispatch
> packet to new interface and run pf_test() again. Once your diff
> will be in the same rule will be accepted, but will bring entirely
> different behaviour: just dispatch packet to new interface.

yeah.

the counter example is that i was extremely surprised when i
discovered that pf_test gets run again when the outgoing interface
is changed with a route-to rule.

there's subtlety either way, we're just figuring out which one we're
going for.

> > > I'll be OK with your simplified model if it will make things
> > > more explicit:
> > > 
> > >   route-to option should be applied on inbound rules
> > >   only
> > 
> > This would restrict how we currently write rules. See below about how we
> > would be using it.
> > 
> > >   reply-to option should be applied on outbound rule
> > >   only
> > 
> > I'm using reply-to on inbound rules. On these boxes I have a service
> > (it's a dns resolver running unbound) that is accessible only via
> > gre(4) tunnels, and I need the replies to those connections to go
> > out the same interface they came in on. I'm running an older version of
> > my diff, so I can have rules like this to make it work:
> > 
> >   pass in quick on gre0 reply-to gre0:peer
> >   pass in quick on gre1 reply-to gre1:peer
> > 
> > The DNS traffic isn't going through this box, the replies that
> > unbound is generating match the state created by the inbound rule.
> > 
> > If I'm remembering correctly, sthen@ had a similar use case.
> 
> you are right, I did not think much of a local bound traffic.
> in this case reply-to needs to be kept as-is.
> 
> > 
> > >   dup-to option can go either way (in/out)
> > 
> > Yep.
> > 
> > > does it make sense? IMO yes, because doing route-to
> > >

Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexandr Nedvedicky

Hello,


> > 
> > I understand that simple is better here, so I won't object
> > if we will lean towards simplified model above. However I still
> > would like to share my view on current PF.
> > 
> > the way I understand how things (should) work currently is fairly 
> > simple:
> > 
> > we always run pf_test() as packet crosses interface.
> > packet can cross interface either in outbound or
> > inbound direction.
> 
> That's how I understand the current code. I'm proposing that we change
> the semantics so they are:
> 
> - we always run pf_test as a packet enters or leaves the network stack.
> - pf is able to filter or apply policy based on various attributes
>   of the packet such as addresses and ports, but also metadata about
>   the packet such as the current prio, or the interface it came
>   from or is going to.
> - changing a packet or it's metadata does not cause a rerun of pf_test.
> - route-to on an incoming packet basically bypasses the default
>   stack processing with a "fast route" out of the stack.
> 
> > this way we can always create a complex route-to loops,
> > however it can also solve some route-to vs. NAT issues.
> > consider those fairly innocent rules:
> > 
> > 8<---8<---8<--8<
> > table  { 10.10.10.10, 172.16.1.1 }
> > 
> > pass out on em0 from 192.168.1.0/24 to any route-to 
> > pass out on em1 from 192.168.1.0 to any nat-to (em1)
> > pass out on em2 all 
> > 8<---8<---8<--8<
> > 
> > Rules above should currently work, but will stop if we will
> > go with simplified model.
> 
> The entries in  make the packet go out em1 and em2?

yes they do. let's say 10.10.10.10 is reached over em1, 172.16.1.1 is
reached over em2. sorry I have not specified that in my earlier email.

> 
> I'm ok with breaking configs like that. We don't run pf_test again for
> other changes to the packet, so if we do want to support something like
> that I think we should make the following work:
> 
>   # pf_pdesc kif is em0
>   match out on em0 from 192.168.1.0/24 to any route-to 
>   # pf_pdesc kif is now em1
>   pass out on em1 from 192.168.1.0 to any nat-to (em1)
>   pass out on em2 all
> 
> This is more in line with how NAT rules operate.

If I understand the idea right, then basically 'match out on em0'
figures out the new 'outbound interface' so either 'pass out on em1...' or
'pass out on em2...' will kick in. In other words:

depending on the destination picked up from  table,
the route-to action will override the em0 interface to
either em1 or em2.

I think this might be way to go.

My only concern now is that such change is subtle. I mean the

pass out ... route-to 

will change behavior in sense that current code will dispatch
packet to new interface and run pf_test() again. Once your diff
will be in the same rule will be accepted, but will bring entirely
different behaviour: just dispatch packet to new interface.


> 
> > I'll be OK with your simplified model if it will make things
> > more explicit:
> > 
> > route-to option should be applied on inbound rules
> > only
> 
> This would restrict how we currently write rules. See below about how we
> would be using it.
> 
> > reply-to option should be applied on outbound rule
> > only
> 
> I'm using reply-to on inbound rules. On these boxes I have a service
> (it's a dns resolver running unbound) that is accessible only via
> gre(4) tunnels, and I need the replies to those connections to go
> out the same interface they came in on. I'm running an older version of
> my diff, so I can have rules like this to make it work:
> 
>   pass in quick on gre0 reply-to gre0:peer
>   pass in quick on gre1 reply-to gre1:peer
> 
> The DNS traffic isn't going through this box, the replies that
> unbound is generating match the state created by the inbound rule.
> 
> If I'm remembering correctly, sthen@ had a similar use case.

you are right, I did not think much of a local bound traffic.
in this case reply-to needs to be kept as-is.

> 
> > dup-to option can go either way (in/out)
> 
> Yep.
> 
> > does it make sense? IMO yes, because doing route-to
> > on outbound path feels unnatural to me.
> 
> I agree that it feels a bit unnatural, but so far all the route-to rules
> I've been writing have been on pass out rules. That could be peculiar to
> my setup, but we generally allow packets in on our external links, and
> apply policy on the outbound interface heading towards the relevant
> service. eg:
> 
>   block
>   pass in on $if_external
>   pass out on $if_webservers proto tcp to port { http https }
>   pass out on $if_relays proto { tcp udp } to port domain
> 
> We'd be sprinkling route-to on these pass out rules to tie connections
> to specific backends.
> 
> > 
> > 
> > 
> > > 
> > >

Re: [External] : Re: pf route-to issues

2021-01-24 Thread Sven F.

On Sun, Jan 24, 2021 at 11:51 PM David Gwynne  wrote:

> On Mon, Jan 25, 2021 at 02:50:12AM +0100, Alexandr Nedvedicky wrote:
> > Hello,
> >
> > >
> > > ok. i don't know how to split up the rest of the change though.
> > >
> > > here's an updated diff that includes the rest of the kernel changes and
> > > the pfctl and pf.conf tweaks.
> > >
> > > it's probably useful for me to try and explain at a high level what
> > > i think the semantics should be, otherwise we might end up arguing
> about
> > > which bits of the current config i broke.
> > >
> > > so, from an extremely high level point of view, and apologies if
> > > this is condescending, pf sits between the network stack and an
> > > interface that a packet travels on. for connections handled by the
> > > local box, this means packets come from the stack and get an output
> > > interface selected by a route lookup, then pf checks it, and then
> > > it goes out the selected interface. replies come into an interface,
> > > get checked by pf, and then enter the stack. when forwarding, a
> > > packet comes into an interface, pf checks it, the stack does a route
> > > lookup to pick an interface, pf checks it again, and then it goes
> > > out the interface.
> > >
> > > so what does it mean when route-to (or reply-to) gets involved? i'm
> > > saying that when route-to is applied to a packet, pf takes the packet
> > > away from the stack and immediately forwards it toward to specified
> > > destination address. for a packet entering the system, ie, when the
> > > packet is going from the interface into the stack, route-to should
> > > pretend that it is forwarding the packet and basically push it
> > > straight out an interface. however, like normal forwarding via the
> > > stack, there might be some policy on packets leaving that interface
> that
> > > you want to apply, so pf should run pf_test in that situation so the
> > > policy can be applied. this is especially useful if you need to apply
> > > nat-to when packets leave a particular interface.
> > >
> > > however, if you route-to when a packet is on the way out of the
> > > stack, i'm arguing that pf should not run again against that packet.
> > > currently route-to rules run pf_test again if the interface the packet
> > > is routed out of changes, which means pf runs multiple times against a
> > > packet if rules keep changing which interface it goes out. this means
> > > there's loop prevention in pf to mitigate against this, and weird
> > > potentials for multiple states to be created when nat gets involved.
> > >
> > > for simplicity, both in terms of reasoning and code i think pf should
> > > only be run once when a packet enters the system, and only once when it
> > > leaves the system. the only reason i can come up with for running
> > > pf_test multiple times when route-to changes the outgoing interface is
> > > so you can check the packet with "pass out on $new_if" type rules. we
> > > don't rerun pf again when nat/rdr changes addresses, so this feels
> > > inconsistent to me.
> >
> > I understand that simple is better here, so I won't object
> > if we will lean towards simplified model above. However I still
> > would like to share my view on current PF.
> >
> > the way I understand how things (should) work currently is fairly
> simple:
> >
> >   we always run pf_test() as packet crosses interface.
> >   packet can cross interface either in outbound or
> >   inbound direction.
>
> That's how I understand the current code. I'm proposing that we change
> the semantics so they are:
>
> - we always run pf_test as a packet enters or leaves the network stack.
> - pf is able to filter or apply policy based on various attributes
>   of the packet such as addresses and ports, but also metadata about
>   the packet such as the current prio, or the interface it came
>   from or is going to.
> - changing a packet or it's metadata does not cause a rerun of pf_test.
> - route-to on an incoming packet basically bypasses the default
>   stack processing with a "fast route" out of the stack.
>
> > this way we can always create a complex route-to loops,
> > however it can also solve some route-to vs. NAT issues.
> > consider those fairly innocent rules:
> >
> > 8<---8<---8<--8<
> > table  { 10.10.10.10, 172.16.1.1 }
> >
> > pass out on em0 from 192.168.1.0/24 to any route-to 
> > pass out on em1 from 192.168.1.0 to any nat-to (em1)
> > pass out on em2 all
> > 8<---8<---8<--8<
> >
> > Rules above should currently work, but will stop if we will
> > go with simplified model.
>
> The entries in  make the packet go out em1 and em2?
>
> I'm ok with breaking configs like that. We don't run pf_test again for
> other changes to the packet, so if we do want to support something like
> that I think we should make the following work:
>
>   # pf_pdesc kif is em0
>

Re: [External] : Re: pf route-to issues

2021-01-24 Thread David Gwynne

On Mon, Jan 25, 2021 at 02:50:12AM +0100, Alexandr Nedvedicky wrote:
> Hello,
> 
> > 
> > ok. i don't know how to split up the rest of the change though.
> > 
> > here's an updated diff that includes the rest of the kernel changes and
> > the pfctl and pf.conf tweaks.
> > 
> > it's probably useful for me to try and explain at a high level what
> > i think the semantics should be, otherwise we might end up arguing about
> > which bits of the current config i broke.
> > 
> > so, from an extremely high level point of view, and apologies if
> > this is condescending, pf sits between the network stack and an
> > interface that a packet travels on. for connections handled by the
> > local box, this means packets come from the stack and get an output
> > interface selected by a route lookup, then pf checks it, and then
> > it goes out the selected interface. replies come into an interface,
> > get checked by pf, and then enter the stack. when forwarding, a
> > packet comes into an interface, pf checks it, the stack does a route
> > lookup to pick an interface, pf checks it again, and then it goes
> > out the interface.
> > 
> > so what does it mean when route-to (or reply-to) gets involved? i'm
> > saying that when route-to is applied to a packet, pf takes the packet
> > away from the stack and immediately forwards it toward to specified
> > destination address. for a packet entering the system, ie, when the
> > packet is going from the interface into the stack, route-to should
> > pretend that it is forwarding the packet and basically push it
> > straight out an interface. however, like normal forwarding via the
> > stack, there might be some policy on packets leaving that interface that
> > you want to apply, so pf should run pf_test in that situation so the
> > policy can be applied. this is especially useful if you need to apply
> > nat-to when packets leave a particular interface.
> > 
> > however, if you route-to when a packet is on the way out of the
> > stack, i'm arguing that pf should not run again against that packet.
> > currently route-to rules run pf_test again if the interface the packet
> > is routed out of changes, which means pf runs multiple times against a
> > packet if rules keep changing which interface it goes out. this means
> > there's loop prevention in pf to mitigate against this, and weird
> > potentials for multiple states to be created when nat gets involved.
> > 
> > for simplicity, both in terms of reasoning and code i think pf should
> > only be run once when a packet enters the system, and only once when it
> > leaves the system. the only reason i can come up with for running
> > pf_test multiple times when route-to changes the outgoing interface is
> > so you can check the packet with "pass out on $new_if" type rules. we
> > don't rerun pf again when nat/rdr changes addresses, so this feels
> > inconsistent to me.
> 
> I understand that simple is better here, so I won't object
> if we will lean towards simplified model above. However I still
> would like to share my view on current PF.
> 
> the way I understand how things (should) work currently is fairly simple:
> 
>   we always run pf_test() as packet crosses interface.
>   packet can cross interface either in outbound or
>   inbound direction.

That's how I understand the current code. I'm proposing that we change
the semantics so they are:

- we always run pf_test as a packet enters or leaves the network stack.
- pf is able to filter or apply policy based on various attributes
  of the packet such as addresses and ports, but also metadata about
  the packet such as the current prio, or the interface it came
  from or is going to.
- changing a packet or it's metadata does not cause a rerun of pf_test.
- route-to on an incoming packet basically bypasses the default
  stack processing with a "fast route" out of the stack.

> this way we can always create a complex route-to loops,
> however it can also solve some route-to vs. NAT issues.
> consider those fairly innocent rules:
> 
> 8<---8<---8<--8<
> table  { 10.10.10.10, 172.16.1.1 }
> 
> pass out on em0 from 192.168.1.0/24 to any route-to 
> pass out on em1 from 192.168.1.0 to any nat-to (em1)
> pass out on em2 all 
> 8<---8<---8<--8<
> 
> Rules above should currently work, but will stop if we will
> go with simplified model.

The entries in  make the packet go out em1 and em2?

I'm ok with breaking configs like that. We don't run pf_test again for
other changes to the packet, so if we do want to support something like
that I think we should make the following work:

  # pf_pdesc kif is em0
  match out on em0 from 192.168.1.0/24 to any route-to 
  # pf_pdesc kif is now em1
  pass out on em1 from 192.168.1.0 to any nat-to (em1)
  pass out on em2 all

This is more in line with how NAT rules operate.

> I'll be OK with

Re: [External] : Re: pf route-to issues

2021-01-24 Thread Alexandr Nedvedicky

Hello,

> 
> ok. i don't know how to split up the rest of the change though.
> 
> here's an updated diff that includes the rest of the kernel changes and
> the pfctl and pf.conf tweaks.
> 
> it's probably useful for me to try and explain at a high level what
> i think the semantics should be, otherwise we might end up arguing about
> which bits of the current config i broke.
> 
> so, from an extremely high level point of view, and apologies if
> this is condescending, pf sits between the network stack and an
> interface that a packet travels on. for connections handled by the
> local box, this means packets come from the stack and get an output
> interface selected by a route lookup, then pf checks it, and then
> it goes out the selected interface. replies come into an interface,
> get checked by pf, and then enter the stack. when forwarding, a
> packet comes into an interface, pf checks it, the stack does a route
> lookup to pick an interface, pf checks it again, and then it goes
> out the interface.
> 
> so what does it mean when route-to (or reply-to) gets involved? i'm
> saying that when route-to is applied to a packet, pf takes the packet
> away from the stack and immediately forwards it toward to specified
> destination address. for a packet entering the system, ie, when the
> packet is going from the interface into the stack, route-to should
> pretend that it is forwarding the packet and basically push it
> straight out an interface. however, like normal forwarding via the
> stack, there might be some policy on packets leaving that interface that
> you want to apply, so pf should run pf_test in that situation so the
> policy can be applied. this is especially useful if you need to apply
> nat-to when packets leave a particular interface.
> 
> however, if you route-to when a packet is on the way out of the
> stack, i'm arguing that pf should not run again against that packet.
> currently route-to rules run pf_test again if the interface the packet
> is routed out of changes, which means pf runs multiple times against a
> packet if rules keep changing which interface it goes out. this means
> there's loop prevention in pf to mitigate against this, and weird
> potentials for multiple states to be created when nat gets involved.
> 
> for simplicity, both in terms of reasoning and code i think pf should
> only be run once when a packet enters the system, and only once when it
> leaves the system. the only reason i can come up with for running
> pf_test multiple times when route-to changes the outgoing interface is
> so you can check the packet with "pass out on $new_if" type rules. we
> don't rerun pf again when nat/rdr changes addresses, so this feels
> inconsistent to me.

I understand that simple is better here, so I won't object
if we will lean towards simplified model above. However I still
would like to share my view on current PF.

the way I understand how things (should) work currently is fairly simple:

we always run pf_test() as packet crosses interface.
packet can cross interface either in outbound or
inbound direction.

this way we can always create a complex route-to loops,
however it can also solve some route-to vs. NAT issues.
consider those fairly innocent rules:

8<---8<---8<--8<
table  { 10.10.10.10, 172.16.1.1 }

pass out on em0 from 192.168.1.0/24 to any route-to 
pass out on em1 from 192.168.1.0 to any nat-to (em1)
pass out on em2 all 
8<---8<---8<--8<

Rules above should currently work, but will stop if we will
go with simplified model.

I'll be OK with your simplified model if it will make things
more explicit:

route-to option should be applied on inbound rules
only

reply-to option should be applied on outbound rule
only

dup-to option can go either way (in/out)

does it make sense? IMO yes, because doing route-to
on outbound path feels unnatural to me.



> 
> this also breaks the ability to do route-to without states. is there a
> reason to do that apart from the DSR type things? did we agree that
> those use cases could be handled by sloppy states instead?

If I remember correct we need to make 'keep state' mandatory
for route-to so it can work well with pfsync(4), right?

> 
> lastly, the "argument" or address specified with route-to (and
> reply-to and dup-to) is a destination address, not a next-hop. this
> has been discussed on the lists a couple of times before, so i won't
> go over it again, except to reiterate that it allows pf to force
> "sticky" path selection while opening up the possibility for ecmp
> and failover for where that path traverses.

I keep forgetting about it as I still stick to current interpretation.


I've seen changes to pfctl. Diff below still allows rule:

pass in on net0 from 192.168.1.0/24 to any route

Re: pf route-to issues

2021-01-22 Thread David Gwynne

On Fri, Jan 08, 2021 at 04:43:39PM +0100, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
> > 
> > revision 1.294
> > date: 2003/01/02 01:56:56;  author: dhartmei;  state: Exp;  lines: +27 -49;
> > When route-to/reply-to is used in combination with address translation,
> > pf_test() may be called twice for the same packet. In this case, make
> > sure the translation is only applied in the second call. This solves
> > the problem with state insert failures where the second pf_test() call
> > tried to insert another state entry after the first call's translation.
> > ok henning@, mcbride@, thanks to Joe Nall for additional testing.
> > 
> > 
> > I have tested your diffs in my setup, they all pass.  I have not
> > tested the scenario mentioned in the commit message.  Note that the
> > address translation implementation in 2003 was different from what
> > we have now.  And sasha@'s analysis shows that the current code is
> > wrong in other use cases.
> > 
> 
> I've completely forgot there was a change in NAT. Therefore I could
> not understand the commit message.
> 
> 
> 
> > 
> > The only way to find out is to commit it.  It reduces comlexity that
> > noone understands.
> > 
> > OK bluhm@ to remove the check
> > 
> > Please leave the "if (pd->kif->pfik_ifp != ifp)" around pf_test()
> > in pf_route() as it is for now.
> 
> I agree with bluhm@ here. we should proceed with small steps in such
> case and let things to settle down before making next move.

ok. i don't know how to split up the rest of the change though.

here's an updated diff that includes the rest of the kernel changes and
the pfctl and pf.conf tweaks.

it's probably useful for me to try and explain at a high level what
i think the semantics should be, otherwise we might end up arguing about
which bits of the current config i broke.

so, from an extremely high level point of view, and apologies if
this is condescending, pf sits between the network stack and an
interface that a packet travels on. for connections handled by the
local box, this means packets come from the stack and get an output
interface selected by a route lookup, then pf checks it, and then
it goes out the selected interface. replies come into an interface,
get checked by pf, and then enter the stack. when forwarding, a
packet comes into an interface, pf checks it, the stack does a route
lookup to pick an interface, pf checks it again, and then it goes
out the interface.

so what does it mean when route-to (or reply-to) gets involved? i'm
saying that when route-to is applied to a packet, pf takes the packet
away from the stack and immediately forwards it toward to specified
destination address. for a packet entering the system, ie, when the
packet is going from the interface into the stack, route-to should
pretend that it is forwarding the packet and basically push it
straight out an interface. however, like normal forwarding via the
stack, there might be some policy on packets leaving that interface that
you want to apply, so pf should run pf_test in that situation so the
policy can be applied. this is especially useful if you need to apply
nat-to when packets leave a particular interface.

however, if you route-to when a packet is on the way out of the
stack, i'm arguing that pf should not run again against that packet.
currently route-to rules run pf_test again if the interface the packet
is routed out of changes, which means pf runs multiple times against a
packet if rules keep changing which interface it goes out. this means
there's loop prevention in pf to mitigate against this, and weird
potentials for multiple states to be created when nat gets involved.

for simplicity, both in terms of reasoning and code i think pf should
only be run once when a packet enters the system, and only once when it
leaves the system. the only reason i can come up with for running
pf_test multiple times when route-to changes the outgoing interface is
so you can check the packet with "pass out on $new_if" type rules. we
don't rerun pf again when nat/rdr changes addresses, so this feels
inconsistent to me.

i also don't think route-to is used much. getting basic functionality
working is surprisingly hard, so the complicated possibilities in the
current code are almost certainly not taken advantage of. we're going to
break existing configurations anyway, so if we can agree that pf only
runs twice even if route-to gets involved, then i'm not going to feel
bad about breaking something this complicated anyway.

this also breaks the ability to do route-to without states. is there a
reason to do that apart from the DSR type things? did we agree that
those use cases could be handled by sloppy states instead?

lastly, the "argument" or address specified with route-to (and
reply-to and dup-to) is a destination address, not a next-hop. this
has been discussed on the lists a couple of times before, so i won't
go over it again, except to reit

Re: pf route-to issues

2021-01-08 Thread Alexandr Nedvedicky

Hello,


> 
> revision 1.294
> date: 2003/01/02 01:56:56;  author: dhartmei;  state: Exp;  lines: +27 -49;
> When route-to/reply-to is used in combination with address translation,
> pf_test() may be called twice for the same packet. In this case, make
> sure the translation is only applied in the second call. This solves
> the problem with state insert failures where the second pf_test() call
> tried to insert another state entry after the first call's translation.
> ok henning@, mcbride@, thanks to Joe Nall for additional testing.
> 
> 
> I have tested your diffs in my setup, they all pass.  I have not
> tested the scenario mentioned in the commit message.  Note that the
> address translation implementation in 2003 was different from what
> we have now.  And sasha@'s analysis shows that the current code is
> wrong in other use cases.
> 

I've completely forgot there was a change in NAT. Therefore I could
not understand the commit message.



> 
> The only way to find out is to commit it.  It reduces comlexity that
> noone understands.
> 
> OK bluhm@ to remove the check
> 
> Please leave the "if (pd->kif->pfik_ifp != ifp)" around pf_test()
> in pf_route() as it is for now.

I agree with bluhm@ here. we should proceed with small steps in such
case and let things to settle down before making next move.


thanks and
regards
sashan

Re: pf route-to issues

2021-01-08 Thread Alexander Bluhm

On Tue, Jan 05, 2021 at 10:05:39PM +1000, David Gwynne wrote:
> If the idea is to avoid running most of pf_test again if route-to is
> applied during ip_output, I think this tweaked diff is simpler. Is there
> a valid use case for running some of pf_test again after route-to is
> applied?

I found the original commit that introduced this strange check.

revision 1.294
date: 2003/01/02 01:56:56;  author: dhartmei;  state: Exp;  lines: +27 -49;
When route-to/reply-to is used in combination with address translation,
pf_test() may be called twice for the same packet. In this case, make
sure the translation is only applied in the second call. This solves
the problem with state insert failures where the second pf_test() call
tried to insert another state entry after the first call's translation.
ok henning@, mcbride@, thanks to Joe Nall for additional testing.

I have tested your diffs in my setup, they all pass.  I have not
tested the scenario mentioned in the commit message.  Note that the
address translation implementation in 2003 was different from what
we have now.  And sasha@'s analysis shows that the current code is
wrong in other use cases.

The check in pf_find_state() seems to be unrelated to the call to
pf_test() in pf_route().  I have to rethink it separately.

How can we figure out what happens when we remove the check?  It
may harm some cases and benefit others or make no sense at all.  My
regression test, which tests each feature individually, is not
affected.

The only way to find out is to commit it.  It reduces comlexity that
noone understands.

OK bluhm@ to remove the check

Please leave the "if (pd->kif->pfik_ifp != ifp)" around pf_test()
in pf_route() as it is for now.

Re: pf route-to issues

2021-01-07 Thread Alexandr Nedvedicky

Hello,

sorry to come back after while...

On Tue, Jan 05, 2021 at 10:05:39PM +1000, David Gwynne wrote:
> On Mon, Jan 04, 2021 at 06:37:54PM +0100, Alexander Bluhm wrote:
> > On Mon, Jan 04, 2021 at 11:21:50PM +1000, David Gwynne wrote:
> > > this chunk pops out as a standalone change.
> > >
> > > having pf_find_state() return PF_PASS here means the callers short
> > > circuit and let the packet go through without running it through the
> > > a lot of the state handling, which includes things like protocol state
> > > updates, nat, scrubbing, some pflog handling, and most importantly,
> > > later calls to pf_route().
> > 
> > pf_route() calls pf_test() again with a different interface.
> > 
> > The idea of this code is, that the interface which is passed to
> > pf_test() from ip_output() is wrong.  The call to pf_set_rt_ifp()
> > changes it in the state.
> > 
> > In the pf_test() call from ip_output() we skip the tests.  We know
> > they will happen in pf_test() called from pf_route().  Without this
> > chunk we would do state handling twice with different interfaces.
> > 
> > Is that analysis correct?
> 
> I think so, but I didn't get as much time to poke at this today as I was
> hoping.
> 
> If the idea is to avoid running most of pf_test again if route-to is
> applied during ip_output, I think this tweaked diff is simpler. Is there
> a valid use case for running some of pf_test again after route-to is
> applied?
> 
> The pf_set_rt_ifp() stuff could be cleaned up if we can get away with
> this.

I think this should go in. I was trying to test this change, but
I'm unable to create set up which would work on google cloud in the
way I need. I suspect failing to convince underlying vswitch
to carry packets between test hosts for me.

as soon as I start to do something more fancy, packets start to disappear
in underlying fabric...

If I understand the change right we really need to take care of
route-to action for inbound packet 'to convince PF' stack
the packet actually enters firewall on interface desired by
route-to action ordered by rule found by currently executed pf_test().

for outbound case the IP stack will be executing the pf_test().

so yes, I'm OK with this change.

regards
sashan

Re: pf route-to issues

2021-01-05 Thread David Gwynne

On Mon, Jan 04, 2021 at 06:37:54PM +0100, Alexander Bluhm wrote:
> On Mon, Jan 04, 2021 at 11:21:50PM +1000, David Gwynne wrote:
> > this chunk pops out as a standalone change.
> >
> > having pf_find_state() return PF_PASS here means the callers short
> > circuit and let the packet go through without running it through the
> > a lot of the state handling, which includes things like protocol state
> > updates, nat, scrubbing, some pflog handling, and most importantly,
> > later calls to pf_route().
> 
> pf_route() calls pf_test() again with a different interface.
> 
> The idea of this code is, that the interface which is passed to
> pf_test() from ip_output() is wrong.  The call to pf_set_rt_ifp()
> changes it in the state.
> 
> In the pf_test() call from ip_output() we skip the tests.  We know
> they will happen in pf_test() called from pf_route().  Without this
> chunk we would do state handling twice with different interfaces.
> 
> Is that analysis correct?

I think so, but I didn't get as much time to poke at this today as I was
hoping.

If the idea is to avoid running most of pf_test again if route-to is
applied during ip_output, I think this tweaked diff is simpler. Is there
a valid use case for running some of pf_test again after route-to is
applied?

The pf_set_rt_ifp() stuff could be cleaned up if we can get away with
this.

Index: pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1097
diff -u -p -r1.1097 pf.c
--- pf.c4 Jan 2021 12:48:27 -   1.1097
+++ pf.c5 Jan 2021 11:18:14 -
@@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
}
 
*state = s;
-   if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
-   ((s->rule.ptr->rt == PF_ROUTETO &&
-   s->rule.ptr->direction == PF_OUT) ||
-   (s->rule.ptr->rt == PF_REPLYTO &&
-   s->rule.ptr->direction == PF_IN)))
-   return (PF_PASS);
 
return (PF_MATCH);
 }
@@ -6049,7 +6043,7 @@ pf_route(struct pf_pdesc *pd, struct pf_
if (ifp == NULL)
goto bad;
 
-   if (pd->kif->pfik_ifp != ifp) {
+   if (pd->dir == PF_IN) {
if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS)
goto bad;
else if (m0 == NULL)
@@ -6204,7 +6198,7 @@ pf_route6(struct pf_pdesc *pd, struct pf
if (ifp == NULL)
goto bad;
 
-   if (pd->kif->pfik_ifp != ifp) {
+   if (pd->dir == PF_IN) {
if (pf_test(AF_INET6, PF_OUT, ifp, &m0) != PF_PASS)
goto bad;
else if (m0 == NULL)

Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky

Hello,

On Mon, Jan 04, 2021 at 06:37:54PM +0100, Alexander Bluhm wrote:
> On Mon, Jan 04, 2021 at 11:21:50PM +1000, David Gwynne wrote:
> > this chunk pops out as a standalone change.
> >
> > having pf_find_state() return PF_PASS here means the callers short
> > circuit and let the packet go through without running it through the
> > a lot of the state handling, which includes things like protocol state
> > updates, nat, scrubbing, some pflog handling, and most importantly,
> > later calls to pf_route().
> 
> pf_route() calls pf_test() again with a different interface.

yes, that's correct. In ideal world one of those should
happen:

a) no state is found, try to find matching rule

b) matching rule found, create a new state if the rule orders to do so

c) no rule found either, just accept packet
> 
> The idea of this code is, that the interface which is passed to
> pf_test() from ip_output() is wrong.  The call to pf_set_rt_ifp()
> changes it in the state.

can you clarify what 'wrong' means here?

> > Index: pf.c
> > ===
> > RCS file: /cvs/src/sys/net/pf.c,v
> > retrieving revision 1.1097
> > diff -u -p -r1.1097 pf.c
> > --- pf.c4 Jan 2021 12:48:27 -   1.1097
> > +++ pf.c4 Jan 2021 13:08:26 -
> > @@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
> > }
> >
> > *state = s;
> > -   if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
verify interface matches the one specified in state for outbound packet.

if the interfaces don't match, then

> > -   ((s->rule.ptr->rt == PF_ROUTETO &&
> > -   s->rule.ptr->direction == PF_OUT) ||

we must be either dealing with state created by route-to action bound
to outbound rule.

> > -   (s->rule.ptr->rt == PF_REPLYTO &&
> > -   s->rule.ptr->direction == PF_IN)))
> > -   return (PF_PASS);

or reply-to action bound to inbound rule...

...if it is the case then we short circuit the state check and
assume the state matches...

let's consider the rules as follows:

pass out from 1.2.3.4 to any route-to 10.20.30.40@em1
pass out from any to 1.2.3.4 route-to 192.168.1.10@em0
pass out on em1 from 1.2.3.4 to any nat-to (em1)

let's further assume there is outbound packet on em0 interface.  the packet
matches the rule with 'route-to' action. state gets created:

out@em0 1.2.3.4 -> a.b.c.d route-to 10.20.30.40@em1

pf_route() gets called, and packet arrives to pf_test() again as outbound
on em1 interface. No state should be found, because em1 != em0.
so that complex if does not kick in.

Now let there be a response packet at em1 interface. It gets
translated to a.b.c.d -> 1.2.3.4 and forwarded. let's further
assume the packet hits (default) route, which goes via em3.
There is outbound packet at em3 inspected by pf_test():

out@em3 a.b.c.d -> 1.2.3.4

no matching state is found because of em3 != em0, so packet
hits the 'pass out from any to 1.2.3.4' rule. The rule creates
state:

out@em3 a.b.c.d -> 1.2.3.4 route-to 192.168.1.10@em0

and packet is sent out via pf_route('em0'). The packet enters
pf_test() again. We find matching state this time:

in@em0 1.2.3.4 -> a.b.c.d route-to@10.20.30.40@em1

let's see if the complex condition is met this time:

pd->dir == PF_OUT   (it is outbound packet)
pd->kif == em0  (bound to em0 interface)
s->rt_kif == em1(route* action is bound to em1)
s->rule.ptr->direction == PF_OUT (state got created by outbound rule)
s->rule.ptr == PF_ROUTETO (there is a route-to action at rule)

It looks like the complex if() test is satisfied this time, so
we bail out with PF_PASS. So what are consequences of skipping
proper state check?

first of all the state does not get updated. If the packet, which bypasses
state check is SYN-ACK, the state is left in SYN-SENT state, although the
connection initiator is in connected state already (assuming it receives
SYN-ACK).

I hope I have not missed anything in story above. it seems to me
the code in question can indeed do some harm (sometimes...)

thanks and
regards
sashan

Re: pf route-to issues

2021-01-04 Thread Alexander Bluhm

On Mon, Jan 04, 2021 at 11:21:50PM +1000, David Gwynne wrote:
> this chunk pops out as a standalone change.
> 
> having pf_find_state() return PF_PASS here means the callers short
> circuit and let the packet go through without running it through the
> a lot of the state handling, which includes things like protocol state
> updates, nat, scrubbing, some pflog handling, and most importantly,
> later calls to pf_route().

pf_route() calls pf_test() again with a different interface.

The idea of this code is, that the interface which is passed to
pf_test() from ip_output() is wrong.  The call to pf_set_rt_ifp()
changes it in the state.

In the pf_test() call from ip_output() we skip the tests.  We know
they will happen in pf_test() called from pf_route().  Without this
chunk we would do state handling twice with different interfaces.

Is that analysis correct?

bluhm

> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1097
> diff -u -p -r1.1097 pf.c
> --- pf.c  4 Jan 2021 12:48:27 -   1.1097
> +++ pf.c  4 Jan 2021 13:08:26 -
> @@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
>   }
>  
>   *state = s;
> - if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
> - ((s->rule.ptr->rt == PF_ROUTETO &&
> - s->rule.ptr->direction == PF_OUT) ||
> - (s->rule.ptr->rt == PF_REPLYTO &&
> - s->rule.ptr->direction == PF_IN)))
> - return (PF_PASS);
>  
>   return (PF_MATCH);
>  }

Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky

Hello,



> My diff removes the kif here ...
> 
> > > > -   if (rv == 0) {
> > > > -   s->rt_kif = r->route.kif;
> > > > +   if (rv == 0)
> > > > s->natrule.ptr = r;
> > > > -   }
> 
> ... and the {}.
> 
> Anyway, it should not be commited without the userland part.
> (and not without compling it :-)

yes you are right. there was a conflict and I ignored the resulting
pf.c.rej. entirely my fault.

your diff looks good and should go in. The userland part will be tricky.
perhaps we should fork existing `host()` function and tailor it to route-to et.
al. The current `host()` as-is still tries to resolve name to interface if dns
fails. We don't want it now.

your diff reads OK to me.

thanks and
regards
sashan

> 
> bluhm

Re: pf route-to issues

2021-01-04 Thread Alexander Bluhm

On Mon, Jan 04, 2021 at 04:32:45PM +0100, Alexandr Nedvedicky wrote:
> so either rt_kif must stay for a while, or your new diff (rebased on top 
> of
> stuff committed already) must be expanded by the nit pick I've sent.

The diff I sent contains this bit.  I still think the merge bug is
on your side.

> to put it clear: I'm concerned the diff posted here:
>   https://marc.info/?l=openbsd-tech&m=160976516119388&w=2
> is not complete and should not be committed as is.

It compiles, I recreated the diff and attached it.

> > > -   s->rt_kif = NULL;
> > > if (!r->rt)
> > > return (0);

My diff removes the kif here ...

> > > -   if (rv == 0) {
> > > -   s->rt_kif = r->route.kif;
> > > +   if (rv == 0)
> > > s->natrule.ptr = r;
> > > -   }

... and the {}.

Anyway, it should not be commited without the userland part.
(and not without compling it :-)

bluhm

Index: net/if_pfsync.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_pfsync.c,v
retrieving revision 1.280
diff -u -p -r1.280 if_pfsync.c
--- net/if_pfsync.c 4 Jan 2021 12:48:27 -   1.280
+++ net/if_pfsync.c 4 Jan 2021 15:57:17 -
@@ -613,6 +613,7 @@ pfsync_state_import(struct pfsync_state 
 
/* copy to state */
st->rt_addr = sp->rt_addr;
+   st->rt = sp->rt;
st->creation = getuptime() - ntohl(sp->creation);
st->expire = getuptime();
if (ntohl(sp->expire)) {
@@ -643,7 +644,6 @@ pfsync_state_import(struct pfsync_state 
 
st->rule.ptr = r;
st->anchor.ptr = NULL;
-   st->rt_kif = NULL;
 
st->pfsync_time = getuptime();
st->sync_state = PFSYNC_S_NONE;
@@ -1857,7 +1857,7 @@ pfsync_undefer(struct pfsync_deferral *p
if (drop)
m_freem(pd->pd_m);
else {
-   if (st->rule.ptr->rt == PF_ROUTETO) {
+   if (st->rt == PF_ROUTETO) {
if (pf_setup_pdesc(&pdesc, st->key[PF_SK_WIRE]->af,
st->direction, st->kif, pd->pd_m, NULL) !=
PF_PASS) {
@@ -1866,11 +1866,11 @@ pfsync_undefer(struct pfsync_deferral *p
}
switch (st->key[PF_SK_WIRE]->af) {
case AF_INET:
-   pf_route(&pdesc, st->rule.ptr, st);
+   pf_route(&pdesc, st);
break;
 #ifdef INET6
case AF_INET6:
-   pf_route6(&pdesc, st->rule.ptr, st);
+   pf_route6(&pdesc, st);
break;
 #endif /* INET6 */
default:
Index: net/pf.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/pf.c,v
retrieving revision 1.1097
diff -u -p -r1.1097 pf.c
--- net/pf.c4 Jan 2021 12:48:27 -   1.1097
+++ net/pf.c4 Jan 2021 15:57:17 -
@@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
}
 
*state = s;
-   if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
-   ((s->rule.ptr->rt == PF_ROUTETO &&
-   s->rule.ptr->direction == PF_OUT) ||
-   (s->rule.ptr->rt == PF_REPLYTO &&
-   s->rule.ptr->direction == PF_IN)))
-   return (PF_PASS);
 
return (PF_MATCH);
 }
@@ -1186,6 +1180,7 @@ pf_state_export(struct pfsync_state *sp,
 
/* copy from state */
strlcpy(sp->ifname, st->kif->pfik_name, sizeof(sp->ifname));
+   sp->rt = st->rt;
sp->rt_addr = st->rt_addr;
sp->creation = htonl(getuptime() - st->creation);
expire = pf_state_expires(st);
@@ -3433,16 +3428,13 @@ pf_set_rt_ifp(struct pf_state *s, struct
struct pf_rule *r = s->rule.ptr;
int rv;
 
-   s->rt_kif = NULL;
-   if (!r->rt)
+   if (r->rt == PF_NOPFROUTE)
return (0);
 
rv = pf_map_addr(af, r, saddr, &s->rt_addr, NULL, sns, 
&r->route, PF_SN_ROUTE);
-   if (rv == 0) {
-   s->rt_kif = r->route.kif;
-   s->natrule.ptr = r;
-   }
+   if (rv == 0)
+   s->rt = r->rt;
 
return (rv);
 }
@@ -5973,15 +5965,13 @@ pf_rtlabel_match(struct pf_addr *addr, s
 
 /* pf_route() may change pd->m, adjust local copies after calling */
 void
-pf_route(struct pf_pdesc *pd, struct pf_rule *r, struct pf_state *s)
+pf_route(struct pf_pdesc *pd, struct pf_state *s)
 {
struct mbuf *m0, *m1;
struct sockaddr_in  *dst, sin;
struct rtentry  *rt = NULL;
struct ip   *ip;
struct ifnet*ifp = NULL;
-   struct pf_addr   naddr;
-   struct pf_src_node  *sns[PF_SN_MAX];
int  error = 0;
un

Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky

Hello,

I'm sorry I was not clear enough in my earlier email.

On Mon, Jan 04, 2021 at 03:56:45PM +0100, Alexander Bluhm wrote:
> On Mon, Jan 04, 2021 at 03:26:15PM +0100, Alexandr Nedvedicky wrote:
> > you refactoring diff requires a minor finishing touch to keep the
> > stuff compiling:
> 
> Did I commit something that does not compile?  I just made cvs
> update on another machine.  There it worked.

all diffs you commit compile, there are no doubts in this.
I was talking about change sent here:

https://marc.info/?l=openbsd-tech&m=160976516119388&w=2

diff above contains chunk here, which removes rt_kif.

8<---8<---8<---8<---8<8<
Index: net/pfvar.h
===
RCS file: /cvs/src/sys/net/pfvar.h,v
retrieving revision 1.497
diff -u -p -r1.497 pfvar.h
--- net/pfvar.h 14 Oct 2020 19:22:14 - 1.497
+++ net/pfvar.h 4 Jan 2021 12:52:02 -
@@ -762,7 +762,6 @@ struct pf_state {
  struct pf_sn_head  src_nodes;
  struct pf_state_key *key[2]; /* addresses stack and wire  */
  struct pfi_kif  *kif;
- struct pfi_kif  *rt_kif;
  u_int64_t   packets[2];
  u_int64_t   bytes[2];
  int32_tcreation;
8<---8<---8<---8<---8<8<

> 
> The rt_kif in pf_state still exists.  The diff below should not
> be necessary.  Maybe you forgot to clean pfvar.h.
> 

so either rt_kif must stay for a while, or your new diff (rebased on top of
stuff committed already) must be expanded by the nit pick I've sent.


to put it clear: I'm concerned the diff posted here:
https://marc.info/?l=openbsd-tech&m=160976516119388&w=2
is not complete and should not be committed as is.

thanks and
regards
sashan
> 
> > 8<---8<---8<---8<---8<8<
> > diff --git a/sys/net/pf.c b/sys/net/pf.c
> > index b8766df1686..3f9f5b13add 100644
> > --- a/sys/net/pf.c
> > +++ b/sys/net/pf.c
> > @@ -3428,16 +3428,13 @@ pf_set_rt_ifp(struct pf_state *s, struct pf_addr 
> > *saddr, sa_family_t af,
> > struct pf_rule *r = s->rule.ptr;
> > int rv;
> >  
> > -   s->rt_kif = NULL;
> > if (!r->rt)
> > return (0);
> >  
> > rv = pf_map_addr(af, r, saddr, &s->rt_addr, NULL, sns, 
> > &r->route, PF_SN_ROUTE);
> > -   if (rv == 0) {
> > -   s->rt_kif = r->route.kif;
> > +   if (rv == 0)
> > s->natrule.ptr = r;
> > -   }
> >  
> > return (rv);
> >  }
> > 8<---8<---8<---8<---8<8<
> > 
> > 
> > thanks and
> > regards
> > sashan
>

Re: pf route-to issues

2021-01-04 Thread Alexander Bluhm

On Mon, Jan 04, 2021 at 03:26:15PM +0100, Alexandr Nedvedicky wrote:
> you refactoring diff requires a minor finishing touch to keep the
> stuff compiling:

Did I commit something that does not compile?  I just made cvs
update on another machine.  There it worked.

The rt_kif in pf_state still exists.  The diff below should not
be necessary.  Maybe you forgot to clean pfvar.h.

bluhm

> 8<---8<---8<---8<---8<8<
> diff --git a/sys/net/pf.c b/sys/net/pf.c
> index b8766df1686..3f9f5b13add 100644
> --- a/sys/net/pf.c
> +++ b/sys/net/pf.c
> @@ -3428,16 +3428,13 @@ pf_set_rt_ifp(struct pf_state *s, struct pf_addr 
> *saddr, sa_family_t af,
> struct pf_rule *r = s->rule.ptr;
> int rv;
>  
> -   s->rt_kif = NULL;
> if (!r->rt)
> return (0);
>  
> rv = pf_map_addr(af, r, saddr, &s->rt_addr, NULL, sns, 
> &r->route, PF_SN_ROUTE);
> -   if (rv == 0) {
> -   s->rt_kif = r->route.kif;
> +   if (rv == 0)
> s->natrule.ptr = r;
> -   }
>  
> return (rv);
>  }
> 8<---8<---8<---8<---8<8<
> 
> 
> thanks and
> regards
> sashan

Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky

Hello,

On Mon, Jan 04, 2021 at 01:57:24PM +0100, Alexander Bluhm wrote:
> On Mon, Jan 04, 2021 at 11:46:16AM +0100, Alexandr Nedvedicky wrote:
> > > let's put this in and then i'll have a look. ok by me.
> > bluhm's diff is fine with me.
> 
> Refactoring is commited, here is the remaining kernel diff after merge.
> 
> bluhm
> 



> Index: net/pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1097
> diff -u -p -r1.1097 pf.c
> --- net/pf.c  4 Jan 2021 12:48:27 -   1.1097
> +++ net/pf.c  4 Jan 2021 12:52:02 -
> @@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
>   }
> 
>   *state = s;
> - if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
> - ((s->rule.ptr->rt == PF_ROUTETO &&
> - s->rule.ptr->direction == PF_OUT) ||
> - (s->rule.ptr->rt == PF_REPLYTO &&
> - s->rule.ptr->direction == PF_IN)))
> - return (PF_PASS);
> 
>   return (PF_MATCH);
>  }

I've sent OK to dlg for the chunk above. I think it's worth to have this
fix in separate changeset.


you refactoring diff requires a minor finishing touch to keep the
stuff compiling:

8<---8<---8<---8<---8<8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index b8766df1686..3f9f5b13add 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -3428,16 +3428,13 @@ pf_set_rt_ifp(struct pf_state *s, struct pf_addr 
*saddr, sa_family_t af,
struct pf_rule *r = s->rule.ptr;
int rv;
 
-   s->rt_kif = NULL;
if (!r->rt)
return (0);
 
rv = pf_map_addr(af, r, saddr, &s->rt_addr, NULL, sns, 
&r->route, PF_SN_ROUTE);
-   if (rv == 0) {
-   s->rt_kif = r->route.kif;
+   if (rv == 0)
s->natrule.ptr = r;
-   }
 
return (rv);
 }
8<---8<---8<---8<---8<8<


thanks and
regards
sashan

Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky

Hello,

On Mon, Jan 04, 2021 at 11:21:50PM +1000, David Gwynne wrote:
> On Mon, Jan 04, 2021 at 01:57:24PM +0100, Alexander Bluhm wrote:
> > On Mon, Jan 04, 2021 at 11:46:16AM +0100, Alexandr Nedvedicky wrote:
> > > > let's put this in and then i'll have a look. ok by me.
> > > bluhm's diff is fine with me.
> > 
> > Refactoring is commited, here is the remaining kernel diff after merge.
> 
> this chunk pops out as a standalone change.
> 
> having pf_find_state() return PF_PASS here means the callers short
> circuit and let the packet go through without running it through the
> a lot of the state handling, which includes things like protocol state
> updates, nat, scrubbing, some pflog handling, and most importantly,
> later calls to pf_route().
> 
> ok?

I think this should go in. I've seen it in bluhm's larger diff,
which I still need to finish.

I'm fine if change will be committed as it solves a real bug.

OK sashan

> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1097
> diff -u -p -r1.1097 pf.c
> --- pf.c  4 Jan 2021 12:48:27 -   1.1097
> +++ pf.c  4 Jan 2021 13:08:26 -
> @@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
>   }
>  
>   *state = s;
> - if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
> - ((s->rule.ptr->rt == PF_ROUTETO &&
> - s->rule.ptr->direction == PF_OUT) ||
> - (s->rule.ptr->rt == PF_REPLYTO &&
> - s->rule.ptr->direction == PF_IN)))
> - return (PF_PASS);
>  
>   return (PF_MATCH);
>  }

Re: pf route-to issues

2021-01-04 Thread David Gwynne

On Mon, Jan 04, 2021 at 01:57:24PM +0100, Alexander Bluhm wrote:
> On Mon, Jan 04, 2021 at 11:46:16AM +0100, Alexandr Nedvedicky wrote:
> > > let's put this in and then i'll have a look. ok by me.
> > bluhm's diff is fine with me.
> 
> Refactoring is commited, here is the remaining kernel diff after merge.

this chunk pops out as a standalone change.

having pf_find_state() return PF_PASS here means the callers short
circuit and let the packet go through without running it through the
a lot of the state handling, which includes things like protocol state
updates, nat, scrubbing, some pflog handling, and most importantly,
later calls to pf_route().

ok?

Index: pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1097
diff -u -p -r1.1097 pf.c
--- pf.c4 Jan 2021 12:48:27 -   1.1097
+++ pf.c4 Jan 2021 13:08:26 -
@@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
}
 
*state = s;
-   if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
-   ((s->rule.ptr->rt == PF_ROUTETO &&
-   s->rule.ptr->direction == PF_OUT) ||
-   (s->rule.ptr->rt == PF_REPLYTO &&
-   s->rule.ptr->direction == PF_IN)))
-   return (PF_PASS);
 
return (PF_MATCH);
 }

Re: pf route-to issues

2021-01-04 Thread Alexander Bluhm

On Mon, Jan 04, 2021 at 11:46:16AM +0100, Alexandr Nedvedicky wrote:
> > let's put this in and then i'll have a look. ok by me.
> bluhm's diff is fine with me.

Refactoring is commited, here is the remaining kernel diff after merge.

bluhm

Index: net/if_pfsync.c
===
RCS file: /cvs/src/sys/net/if_pfsync.c,v
retrieving revision 1.280
diff -u -p -r1.280 if_pfsync.c
--- net/if_pfsync.c 4 Jan 2021 12:48:27 -   1.280
+++ net/if_pfsync.c 4 Jan 2021 12:52:01 -
@@ -613,6 +613,7 @@ pfsync_state_import(struct pfsync_state 
 
/* copy to state */
st->rt_addr = sp->rt_addr;
+   st->rt = sp->rt;
st->creation = getuptime() - ntohl(sp->creation);
st->expire = getuptime();
if (ntohl(sp->expire)) {
@@ -643,7 +644,6 @@ pfsync_state_import(struct pfsync_state 
 
st->rule.ptr = r;
st->anchor.ptr = NULL;
-   st->rt_kif = NULL;
 
st->pfsync_time = getuptime();
st->sync_state = PFSYNC_S_NONE;
@@ -1857,7 +1857,7 @@ pfsync_undefer(struct pfsync_deferral *p
if (drop)
m_freem(pd->pd_m);
else {
-   if (st->rule.ptr->rt == PF_ROUTETO) {
+   if (st->rt == PF_ROUTETO) {
if (pf_setup_pdesc(&pdesc, st->key[PF_SK_WIRE]->af,
st->direction, st->kif, pd->pd_m, NULL) !=
PF_PASS) {
@@ -1866,11 +1866,11 @@ pfsync_undefer(struct pfsync_deferral *p
}
switch (st->key[PF_SK_WIRE]->af) {
case AF_INET:
-   pf_route(&pdesc, st->rule.ptr, st);
+   pf_route(&pdesc, st);
break;
 #ifdef INET6
case AF_INET6:
-   pf_route6(&pdesc, st->rule.ptr, st);
+   pf_route6(&pdesc, st);
break;
 #endif /* INET6 */
default:
Index: net/pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1097
diff -u -p -r1.1097 pf.c
--- net/pf.c4 Jan 2021 12:48:27 -   1.1097
+++ net/pf.c4 Jan 2021 12:52:02 -
@@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
}
 
*state = s;
-   if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
-   ((s->rule.ptr->rt == PF_ROUTETO &&
-   s->rule.ptr->direction == PF_OUT) ||
-   (s->rule.ptr->rt == PF_REPLYTO &&
-   s->rule.ptr->direction == PF_IN)))
-   return (PF_PASS);
 
return (PF_MATCH);
 }
@@ -1186,6 +1180,7 @@ pf_state_export(struct pfsync_state *sp,
 
/* copy from state */
strlcpy(sp->ifname, st->kif->pfik_name, sizeof(sp->ifname));
+   sp->rt = st->rt;
sp->rt_addr = st->rt_addr;
sp->creation = htonl(getuptime() - st->creation);
expire = pf_state_expires(st);
@@ -3433,16 +3428,13 @@ pf_set_rt_ifp(struct pf_state *s, struct
struct pf_rule *r = s->rule.ptr;
int rv;
 
-   s->rt_kif = NULL;
-   if (!r->rt)
+   if (r->rt == PF_NOPFROUTE)
return (0);
 
rv = pf_map_addr(af, r, saddr, &s->rt_addr, NULL, sns, 
&r->route, PF_SN_ROUTE);
-   if (rv == 0) {
-   s->rt_kif = r->route.kif;
-   s->natrule.ptr = r;
-   }
+   if (rv == 0)
+   s->rt = r->rt;
 
return (rv);
 }
@@ -5973,15 +5965,13 @@ pf_rtlabel_match(struct pf_addr *addr, s
 
 /* pf_route() may change pd->m, adjust local copies after calling */
 void
-pf_route(struct pf_pdesc *pd, struct pf_rule *r, struct pf_state *s)
+pf_route(struct pf_pdesc *pd, struct pf_state *s)
 {
struct mbuf *m0, *m1;
struct sockaddr_in  *dst, sin;
struct rtentry  *rt = NULL;
struct ip   *ip;
struct ifnet*ifp = NULL;
-   struct pf_addr   naddr;
-   struct pf_src_node  *sns[PF_SN_MAX];
int  error = 0;
unsigned int rtableid;
 
@@ -5991,11 +5981,11 @@ pf_route(struct pf_pdesc *pd, struct pf_
return;
}
 
-   if (r->rt == PF_DUPTO) {
+   if (s->rt == PF_DUPTO) {
if ((m0 = m_dup_pkt(pd->m, max_linkhdr, M_NOWAIT)) == NULL)
return;
} else {
-   if ((r->rt == PF_REPLYTO) == (r->direction == pd->dir))
+   if ((s->rt == PF_REPLYTO) == (s->direction == pd->dir))
return;
m0 = pd->m;
}
@@ -6008,44 +5998,31 @@ pf_route(struct pf_pdesc *pd, struct pf_
 
ip = mtod(m0, struct ip *);
 
-   memset(&sin, 0, sizeof(sin));
-   dst = &sin;
-   dst->sin_family = AF_INET;
-   ds

Re: pf route-to issues

2021-01-04 Thread David Gwynne




> On 4 Jan 2021, at 9:27 pm, Alexandr Nedvedicky 
>  wrote:
> 
> Hello,
> 
> there is one more thing, which just came up on my mind.
> 
> 
>> 
>> so i want to change route-to in pfctl so it takes a nexthop instead
>> of an interface. you could argue that pf already lets you do this,
>> because there's some bs nexthop@interface syntax. my counter argument
>> is that the interface the nexthop is reachable over is redundant, and it
>> makes fixing some of the other problems harder if we keep it.
>> 
> 
>what is your plan for dup-to then? if my understanding of dup-to is
>correct, then it allows administrator to copy matching packets and send
>them out dedicated interface so another physical box (box with running
>snort) can intercept them and process them.
> 
>I remember we had to do some assumptions about this, when porting PF to
>Solaris. So Solaris interpretation of option
> 
>   'dup-to net12'
> 
>is to send out copy of matching packet via net12 interface.  because there
>is no next-hop specified, we just use link broadcast when pushing out the
>packet to network. I agree this is a hack. If route-to will be changed
>to accept next-hop instead of interface, then we will be able to kill
>such hack.

route-to, reply-to, and dup-to take an address as an argument. In dup-to's 
case, the packet is duplicated and then the copy is routed toward the 
destination.

As discussed previously, the address argument is a destination, it's not a link 
local address per se. The destination address could be a directly connected or 
link local address, but it could also be via a gateway (including a those 
learnt by a dynamic routing protocol), and/or via multiple gateways. This 
allows for failover or ECMP like we get with packets where the path is selected 
by the rtable lookup.

dlg

> 
> 
> 
>> 
>> if we limit the information needed for pf_route to a nexthop address,
>> and which direction the address is used, this is doable. both the
>> pf_state and pfsync_state structs already contain an address to store a
>> nexthop in, i just had to move the route-to direction from the rule into
>> the state. this is easy with pf_state, but i used a spare pad field in
>> pfsync_state for this.
>> 
> 
>this should be fine, because route-to et.al. don't work with 'block' rules.
> 
> 
> thanks and
> regards
> sashan

Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky

Hello,

there is one more thing, which just came up on my mind.


> 
> so i want to change route-to in pfctl so it takes a nexthop instead
> of an interface. you could argue that pf already lets you do this,
> because there's some bs nexthop@interface syntax. my counter argument
> is that the interface the nexthop is reachable over is redundant, and it
> makes fixing some of the other problems harder if we keep it.
> 

what is your plan for dup-to then? if my understanding of dup-to is
correct, then it allows administrator to copy matching packets and send
them out dedicated interface so another physical box (box with running
snort) can intercept them and process them.

I remember we had to do some assumptions about this, when porting PF to
Solaris. So Solaris interpretation of option

'dup-to net12'

is to send out copy of matching packet via net12 interface.  because there
is no next-hop specified, we just use link broadcast when pushing out the
packet to network. I agree this is a hack. If route-to will be changed
to accept next-hop instead of interface, then we will be able to kill
such hack.



> 
> if we limit the information needed for pf_route to a nexthop address,
> and which direction the address is used, this is doable. both the
> pf_state and pfsync_state structs already contain an address to store a
> nexthop in, i just had to move the route-to direction from the rule into
> the state. this is easy with pf_state, but i used a spare pad field in
> pfsync_state for this.
> 

this should be fine, because route-to et.al. don't work with 'block' rules.


thanks and
regards
sashan

Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky

Hello,


> 
> the stack should provide it on a 4 byte boundary, but it has uint64_t
> members. however, it is also __packed, so the compiler makes no
> assumptions about alignment.
> 
> > Is there anything else that can be split out easily?
> 
> let's put this in and then i'll have a look. ok by me.
> 

bluhm's diff is fine with me.

thanks and
regards
sashan

Re: pf route-to issues

2021-01-03 Thread David Gwynne

On Mon, Jan 04, 2021 at 12:58:17AM +0100, Alexander Bluhm wrote:
> On Sun, Jan 03, 2021 at 06:56:20PM +0100, Alexander Bluhm wrote:
> > I am currently running a full regress to find more fallout.
> 
> These regress tests fail:
> 
> sys/net/pf_forward
> sys/net/pf_fragment
> sbin/pfctl
> 
> The first two are easy to fix.  That means my tests using route-to
> work fine with your diff.  Just remove the @interface as below.

pretty much, yes.

> pfctl tests pfail8 and pf13 use very strange routespec syntax.  You
> might want to take a look at what that meant before and what should
> be valid now.

this is another syntax which we seem to agree is confusing. this makes
me more convinced that it needs to be changed.

pfail8.in and pf13.in should be modified to route-to an IP address
instead of an interface. these regress tests are a bit confusing
because they're just testing the parser and the addresses that
they're using aren't configured anywhere.

pfail8.ok shows that pfctl should generate some more specific error
messages, which is easily fixed.

> 
> bluhm
> 
> Index: regress/sys/net/pf_forward/pf.conf
> ===
> RCS file: /mount/openbsd/cvs/src/regress/sys/net/pf_forward/pf.conf,v
> retrieving revision 1.5
> diff -u -p -r1.5 pf.conf
> --- regress/sys/net/pf_forward/pf.conf11 Jan 2018 03:23:16 -  
> 1.5
> +++ regress/sys/net/pf_forward/pf.conf3 Jan 2021 23:26:54 -
> @@ -17,22 +17,22 @@ pass out inet6
>  pass in  to $AF_IN6/64 af-to inet  from $PF_OUT  to $ECO_IN/24   tag af
>  pass out inettagged af
> 
> -pass in  to $RTT_IN/24  route-to $RT_IN@$PF_IFOUT  tag rttin
> -pass out   tagged rttin
> -pass in  to $RTT_IN6/64 route-to $RT_IN6@$PF_IFOUT tag rttin
> -pass out   tagged rttin
> +pass in  to $RTT_IN/24  route-to $RT_IN  tag rttin
> +pass out tagged rttin
> +pass in  to $RTT_IN6/64 route-to $RT_IN6 tag rttin
> +pass out tagged rttin
> 
> -pass in  to $RTT_OUT/24 tag rttout
> -pass out route-to $RT_IN@$PF_IFOUT  tagged rttout
> -pass in  to $RTT_OUT6/64tag rttout
> -pass out route-to $RT_IN6@$PF_IFOUT tagged rttout
> +pass in  to $RTT_OUT/24   tag rttout
> +pass out route-to $RT_IN  tagged rttout
> +pass in  to $RTT_OUT6/64  tag rttout
> +pass out route-to $RT_IN6 tagged rttout
> 
> -pass in  from $RPT_IN/24  reply-to $SRC_OUT@$PF_IFIN  tag rptin
> -pass out  tagged rptin
> -pass in  from $RPT_IN6/64 reply-to $SRC_OUT6@$PF_IFIN tag rptin
> -pass out  tagged rptin
> +pass in  from $RPT_IN/24  reply-to $SRC_OUT  tag rptin
> +pass out tagged rptin
> +pass in  from $RPT_IN6/64 reply-to $SRC_OUT6 tag rptin
> +pass out tagged rptin
> 
> -pass in  from $RPT_OUT/24  tag rptout
> -pass out   reply-to $SRC_OUT@$PF_IFIN  tagged rptout
> -pass in  from $RPT_OUT6/64 tag rptout
> -pass out   reply-to $SRC_OUT6@$PF_IFIN tagged rptout
> +pass in  from $RPT_OUT/24 tag rptout
> +pass out   reply-to $SRC_OUT  tagged rptout
> +pass in  from $RPT_OUT6/64tag rptout
> +pass out   reply-to $SRC_OUT6 tagged rptout
> Index: regress/sys/net/pf_fragment/pf.conf
> ===
> RCS file: /mount/openbsd/cvs/src/regress/sys/net/pf_fragment/pf.conf,v
> retrieving revision 1.5
> diff -u -p -r1.5 pf.conf
> --- regress/sys/net/pf_fragment/pf.conf   7 Jun 2017 20:09:07 -   
> 1.5
> +++ regress/sys/net/pf_fragment/pf.conf   3 Jan 2021 23:28:07 -
> @@ -10,7 +10,7 @@ pass outnat-to $PF_OUT
>  pass in  to $RDR_IN6/64 rdr-to $ECO_IN6 allow-opts tag rdr
>  pass outnat-to $PF_OUT6 allow-opts tagged rdr
> 
> -pass in  to $RTT_IN/24 allow-opts tag rtt
> -pass outroute-to $RT_IN@$PF_IFOUT  allow-opts tagged rtt
> -pass in  to $RTT_IN6/64allow-opts tag rtt
> -pass outroute-to $RT_IN6@$PF_IFOUT allow-opts tagged rtt
> +pass in  to $RTT_IN/24   allow-opts tag rtt
> +pass outroute-to $RT_IN  allow-opts tagged rtt
> +pass in  to $RTT_IN6/64  allow-opts tag rtt
> +pass outroute-to $RT_IN6 allow-opts tagged rtt
>

Re: pf route-to issues

2021-01-03 Thread David Gwynne

On Sun, Jan 03, 2021 at 06:56:20PM +0100, Alexander Bluhm wrote:
> On Sun, Jan 03, 2021 at 02:00:00PM +1000, David Gwynne wrote:
> > On Tue, Oct 20, 2020 at 09:27:09AM +1000, David Gwynne wrote:
> > We've been running this diff in production for the last couple of
> > months, and it's been solid for us so far. Ignoring the fixes for
> > crashes, I personally find it a lot more usable than the current
> > route-to rules too.
> >
> > Can I commit it?
> 
> The diff is quite large and does multiple things at a time.

In hindsight I agree. It was hard to see while being so close to it.

> In general I also did not understand why I have to say em0@10.0.0.1
> for routing and it took me a while to figure out what to put into
> pf.conf.  I use this syntax in /usr/src/regress/sys/net/pf_forward/pf.conf.
> This has to be fixed after this goes in.  I will care about regress
> as this test is quite complex an needs several machines to setup.
> I am currently running a full regress to find more fallout.
> 
> I do not use pfsync, so I cannot say what the consequences of the
> change are in this area.  Also I don't know why pf-route interfaces
> were designed in such a strange way.

we do use pfsync, and not being able to use it with route-to has been a
point of friction for us for years.

as for the design, i think it was copied (imperfectly) from ipfilter.
look for "Policy Based Routing" in
https://www.freebsd.org/cgi/man.cgi?query=ipf&sektion=5.

> From a user perspective it is not clear, why route-to should not
> work together with no-state.  So we should either fix it or document
> it and add a check in the parser.  Is fixing hard?

pf_route only takes a state now, so i'd say it is non-trivial. for now
i'll go with documentation and a check in the parser..

> Are we losing any other features apart from this strange arp reuse
> you described in your mail?

i wouldn't say the arp reuse is a feature.

> There is some refactoring in your diff.  I splitted it to make
> review easier.  I think this should go in first.  Note that the
> pf_state variable is called st in if_pfsync.c.  Can we be consistent
> here?  Is the pfsync_state properly aligned?  During import it comes
> from an mbuf.

the stack should provide it on a 4 byte boundary, but it has uint64_t
members. however, it is also __packed, so the compiler makes no
assumptions about alignment.

> Is there anything else that can be split out easily?

let's put this in and then i'll have a look. ok by me.

> 
> bluhm
> 
> Index: net/if_pfsync.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_pfsync.c,v
> retrieving revision 1.279
> diff -u -p -r1.279 if_pfsync.c
> --- net/if_pfsync.c   12 Dec 2020 11:49:02 -  1.279
> +++ net/if_pfsync.c   3 Jan 2021 17:16:55 -
> @@ -612,7 +612,7 @@ pfsync_state_import(struct pfsync_state
>   st->rtableid[PF_SK_STACK] = ntohl(sp->rtableid[PF_SK_STACK]);
> 
>   /* copy to state */
> - bcopy(&sp->rt_addr, &st->rt_addr, sizeof(st->rt_addr));
> + st->rt_addr = sp->rt_addr;
>   st->creation = getuptime() - ntohl(sp->creation);
>   st->expire = getuptime();
>   if (ntohl(sp->expire)) {
> @@ -1843,6 +1843,7 @@ pfsync_undefer(struct pfsync_deferral *p
>  {
>   struct pfsync_softc *sc = pfsyncif;
>   struct pf_pdesc pdesc;
> + struct pf_state *st = pd->pd_st;
> 
>   NET_ASSERT_LOCKED();
> 
> @@ -1852,35 +1853,32 @@ pfsync_undefer(struct pfsync_deferral *p
>   TAILQ_REMOVE(&sc->sc_deferrals, pd, pd_entry);
>   sc->sc_deferred--;
> 
> - CLR(pd->pd_st->state_flags, PFSTATE_ACK);
> + CLR(st->state_flags, PFSTATE_ACK);
>   if (drop)
>   m_freem(pd->pd_m);
>   else {
> - if (pd->pd_st->rule.ptr->rt == PF_ROUTETO) {
> - if (pf_setup_pdesc(&pdesc,
> - pd->pd_st->key[PF_SK_WIRE]->af,
> - pd->pd_st->direction, pd->pd_st->rt_kif,
> - pd->pd_m, NULL) != PF_PASS) {
> + if (st->rule.ptr->rt == PF_ROUTETO) {
> + if (pf_setup_pdesc(&pdesc, st->key[PF_SK_WIRE]->af,
> + st->direction, st->kif, pd->pd_m, NULL) !=
> + PF_PASS) {
>   m_freem(pd->pd_m);
>   goto out;
>   }
> - switch (pd->pd_st->key[PF_SK_WIRE]->af) {
> + switch (st->key[PF_SK_WIRE]->af) {
>   case AF_INET:
> - pf_route(&pdesc,
> - pd->pd_st->rule.ptr, pd->pd_st);
> + pf_route(&pdesc, st->rule.ptr, st);
>   break;
>  #ifdef INET6
>   case AF_INET6:
> - pf_route6(&pdesc,
> - pd->pd_st->rule.ptr, pd->pd_st);
> +

Re: pf route-to issues

2021-01-03 Thread Alexander Bluhm

On Sun, Jan 03, 2021 at 06:56:20PM +0100, Alexander Bluhm wrote:
> I am currently running a full regress to find more fallout.

These regress tests fail:

sys/net/pf_forward
sys/net/pf_fragment
sbin/pfctl

The first two are easy to fix.  That means my tests using route-to
work fine with your diff.  Just remove the @interface as below.

pfctl tests pfail8 and pf13 use very strange routespec syntax.  You
might want to take a look at what that meant before and what should
be valid now.

bluhm

Index: regress/sys/net/pf_forward/pf.conf
===
RCS file: /mount/openbsd/cvs/src/regress/sys/net/pf_forward/pf.conf,v
retrieving revision 1.5
diff -u -p -r1.5 pf.conf
--- regress/sys/net/pf_forward/pf.conf  11 Jan 2018 03:23:16 -  1.5
+++ regress/sys/net/pf_forward/pf.conf  3 Jan 2021 23:26:54 -
@@ -17,22 +17,22 @@ pass out inet6  
 pass in  to $AF_IN6/64 af-to inet  from $PF_OUT  to $ECO_IN/24   tag af
 pass out inettagged af
 
-pass in  to $RTT_IN/24  route-to $RT_IN@$PF_IFOUT  tag rttin
-pass out   tagged rttin
-pass in  to $RTT_IN6/64 route-to $RT_IN6@$PF_IFOUT tag rttin
-pass out   tagged rttin
+pass in  to $RTT_IN/24  route-to $RT_IN  tag rttin
+pass out tagged rttin
+pass in  to $RTT_IN6/64 route-to $RT_IN6 tag rttin
+pass out tagged rttin
 
-pass in  to $RTT_OUT/24 tag rttout
-pass out route-to $RT_IN@$PF_IFOUT  tagged rttout
-pass in  to $RTT_OUT6/64tag rttout
-pass out route-to $RT_IN6@$PF_IFOUT tagged rttout
+pass in  to $RTT_OUT/24   tag rttout
+pass out route-to $RT_IN  tagged rttout
+pass in  to $RTT_OUT6/64  tag rttout
+pass out route-to $RT_IN6 tagged rttout
 
-pass in  from $RPT_IN/24  reply-to $SRC_OUT@$PF_IFIN  tag rptin
-pass out  tagged rptin
-pass in  from $RPT_IN6/64 reply-to $SRC_OUT6@$PF_IFIN tag rptin
-pass out  tagged rptin
+pass in  from $RPT_IN/24  reply-to $SRC_OUT  tag rptin
+pass out tagged rptin
+pass in  from $RPT_IN6/64 reply-to $SRC_OUT6 tag rptin
+pass out tagged rptin
 
-pass in  from $RPT_OUT/24  tag rptout
-pass out   reply-to $SRC_OUT@$PF_IFIN  tagged rptout
-pass in  from $RPT_OUT6/64 tag rptout
-pass out   reply-to $SRC_OUT6@$PF_IFIN tagged rptout
+pass in  from $RPT_OUT/24 tag rptout
+pass out   reply-to $SRC_OUT  tagged rptout
+pass in  from $RPT_OUT6/64tag rptout
+pass out   reply-to $SRC_OUT6 tagged rptout
Index: regress/sys/net/pf_fragment/pf.conf
===
RCS file: /mount/openbsd/cvs/src/regress/sys/net/pf_fragment/pf.conf,v
retrieving revision 1.5
diff -u -p -r1.5 pf.conf
--- regress/sys/net/pf_fragment/pf.conf 7 Jun 2017 20:09:07 -   1.5
+++ regress/sys/net/pf_fragment/pf.conf 3 Jan 2021 23:28:07 -
@@ -10,7 +10,7 @@ pass outnat-to $PF_OUT  
 pass in  to $RDR_IN6/64 rdr-to $ECO_IN6 allow-opts tag rdr
 pass outnat-to $PF_OUT6 allow-opts tagged rdr
 
-pass in  to $RTT_IN/24 allow-opts tag rtt
-pass outroute-to $RT_IN@$PF_IFOUT  allow-opts tagged rtt
-pass in  to $RTT_IN6/64allow-opts tag rtt
-pass outroute-to $RT_IN6@$PF_IFOUT allow-opts tagged rtt
+pass in  to $RTT_IN/24   allow-opts tag rtt
+pass outroute-to $RT_IN  allow-opts tagged rtt
+pass in  to $RTT_IN6/64  allow-opts tag rtt
+pass outroute-to $RT_IN6 allow-opts tagged rtt

Re: pf route-to issues

2021-01-03 Thread Alexander Bluhm

On Sun, Jan 03, 2021 at 02:00:00PM +1000, David Gwynne wrote:
> On Tue, Oct 20, 2020 at 09:27:09AM +1000, David Gwynne wrote:
> We've been running this diff in production for the last couple of
> months, and it's been solid for us so far. Ignoring the fixes for
> crashes, I personally find it a lot more usable than the current
> route-to rules too.
> 
> Can I commit it?

The diff is quite large and does multiple things at a time.

In general I also did not understand why I have to say em0@10.0.0.1
for routing and it took me a while to figure out what to put into
pf.conf.  I use this syntax in /usr/src/regress/sys/net/pf_forward/pf.conf.
This has to be fixed after this goes in.  I will care about regress
as this test is quite complex an needs several machines to setup.
I am currently running a full regress to find more fallout.

I do not use pfsync, so I cannot say what the consequences of the
change are in this area.  Also I don't know why pf-route interfaces
were designed in such a strange way.

>From a user perspective it is not clear, why route-to should not
work together with no-state.  So we should either fix it or document
it and add a check in the parser.  Is fixing hard?

Are we losing any other features apart from this strange arp reuse
you described in your mail?

There is some refactoring in your diff.  I splitted it to make
review easier.  I think this should go in first.  Note that the
pf_state variable is called st in if_pfsync.c.  Can we be consistent
here?  Is the pfsync_state properly aligned?  During import it comes
from an mbuf.

Is there anything else that can be split out easily?

bluhm

Index: net/if_pfsync.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_pfsync.c,v
retrieving revision 1.279
diff -u -p -r1.279 if_pfsync.c
--- net/if_pfsync.c 12 Dec 2020 11:49:02 -  1.279
+++ net/if_pfsync.c 3 Jan 2021 17:16:55 -
@@ -612,7 +612,7 @@ pfsync_state_import(struct pfsync_state 
st->rtableid[PF_SK_STACK] = ntohl(sp->rtableid[PF_SK_STACK]);
 
/* copy to state */
-   bcopy(&sp->rt_addr, &st->rt_addr, sizeof(st->rt_addr));
+   st->rt_addr = sp->rt_addr;
st->creation = getuptime() - ntohl(sp->creation);
st->expire = getuptime();
if (ntohl(sp->expire)) {
@@ -1843,6 +1843,7 @@ pfsync_undefer(struct pfsync_deferral *p
 {
struct pfsync_softc *sc = pfsyncif;
struct pf_pdesc pdesc;
+   struct pf_state *st = pd->pd_st;
 
NET_ASSERT_LOCKED();
 
@@ -1852,35 +1853,32 @@ pfsync_undefer(struct pfsync_deferral *p
TAILQ_REMOVE(&sc->sc_deferrals, pd, pd_entry);
sc->sc_deferred--;
 
-   CLR(pd->pd_st->state_flags, PFSTATE_ACK);
+   CLR(st->state_flags, PFSTATE_ACK);
if (drop)
m_freem(pd->pd_m);
else {
-   if (pd->pd_st->rule.ptr->rt == PF_ROUTETO) {
-   if (pf_setup_pdesc(&pdesc,
-   pd->pd_st->key[PF_SK_WIRE]->af,
-   pd->pd_st->direction, pd->pd_st->rt_kif,
-   pd->pd_m, NULL) != PF_PASS) {
+   if (st->rule.ptr->rt == PF_ROUTETO) {
+   if (pf_setup_pdesc(&pdesc, st->key[PF_SK_WIRE]->af,
+   st->direction, st->kif, pd->pd_m, NULL) !=
+   PF_PASS) {
m_freem(pd->pd_m);
goto out;
}
-   switch (pd->pd_st->key[PF_SK_WIRE]->af) {
+   switch (st->key[PF_SK_WIRE]->af) {
case AF_INET:
-   pf_route(&pdesc,
-   pd->pd_st->rule.ptr, pd->pd_st);
+   pf_route(&pdesc, st->rule.ptr, st);
break;
 #ifdef INET6
case AF_INET6:
-   pf_route6(&pdesc,
-   pd->pd_st->rule.ptr, pd->pd_st);
+   pf_route6(&pdesc, st->rule.ptr, st);
break;
 #endif /* INET6 */
default:
-   unhandled_af(pd->pd_st->key[PF_SK_WIRE]->af);
+   unhandled_af(st->key[PF_SK_WIRE]->af);
}
pd->pd_m = pdesc.m;
} else {
-   switch (pd->pd_st->key[PF_SK_WIRE]->af) {
+   switch (st->key[PF_SK_WIRE]->af) {
case AF_INET:
ip_output(pd->pd_m, NULL, NULL, 0, NULL, NULL,
0);
@@ -1892,12 +1890,12 @@ pfsync_undefer(struct pfsync_deferral *p
break;
 #endif /* INET6 */
default:
-   unhandled_af(pd->pd_st->key[PF_SK_WIRE

Re: pf route-to issues

2021-01-02 Thread David Gwynne

On Tue, Oct 20, 2020 at 09:27:09AM +1000, David Gwynne wrote:
> 
> i am feeling very warm and fuzzy about this diff at the moment.

We've been running this diff in production for the last couple of
months, and it's been solid for us so far. Ignoring the fixes for
crashes, I personally find it a lot more usable than the current
route-to rules too.

Can I commit it?

Index: sbin/pfctl/parse.y
===
RCS file: /cvs/src/sbin/pfctl/parse.y,v
retrieving revision 1.707
diff -u -p -r1.707 parse.y
--- sbin/pfctl/parse.y  16 Dec 2020 18:01:16 -  1.707
+++ sbin/pfctl/parse.y  3 Jan 2021 03:53:02 -
@@ -276,6 +276,7 @@ struct filter_opts {
struct redirspec nat;
struct redirspec rdr;
struct redirspec rroute;
+   u_int8_t rt;
 
/* scrub opts */
int  nodf;
@@ -284,15 +285,6 @@ struct filter_opts {
int  randomid;
int  max_mss;
 
-   /* route opts */
-   struct {
-   struct node_host*host;
-   u_int8_t rt;
-   u_int8_t pool_opts;
-   sa_family_t  af;
-   struct pf_poolhashkey   *key;
-   }route;
-
struct {
u_int32_t   limit;
u_int32_t   seconds;
@@ -518,7 +510,6 @@ int parseport(char *, struct range *r, i
 %type  ipspec xhost host dynaddr host_list
 %type  table_host_list tablespec
 %type  redir_host_list redirspec
-%type  route_host route_host_list routespec
 %typeos xos os_list
 %type  portspec port_list port_item
 %type   uids uid_list uid_item
@@ -975,7 +966,7 @@ anchorrule  : ANCHOR anchorname dir quick
YYERROR;
}
 
-   if ($9.route.rt) {
+   if ($9.rt) {
yyerror("cannot specify route handling "
"on anchors");
YYERROR;
@@ -1843,37 +1834,13 @@ pfrule  : action dir logquick interface 
decide_address_family($7.src.host, &r.af);
decide_address_family($7.dst.host, &r.af);
 
-   if ($8.route.rt) {
-   if (!r.direction) {
+   if ($8.rt) {
+   if ($8.rt != PF_DUPTO && !r.direction) {
yyerror("direction must be explicit "
"with rules that specify routing");
YYERROR;
}
-   r.rt = $8.route.rt;
-   r.route.opts = $8.route.pool_opts;
-   if ($8.route.key != NULL)
-   memcpy(&r.route.key, $8.route.key,
-   sizeof(struct pf_poolhashkey));
-   }
-   if (r.rt) {
-   decide_address_family($8.route.host, &r.af);
-   if ((r.route.opts & PF_POOL_TYPEMASK) ==
-   PF_POOL_NONE && ($8.route.host->next != 
NULL ||
-   $8.route.host->addr.type == PF_ADDR_TABLE ||
-   DYNIF_MULTIADDR($8.route.host->addr)))
-   r.route.opts |= PF_POOL_ROUNDROBIN;
-   if ($8.route.host->next != NULL) {
-   if (!PF_POOL_DYNTYPE(r.route.opts)) {
-   yyerror("address pool option "
-   "not supported by type");
-   YYERROR;
-   }
-   }
-   /* fake redirspec */
-   if (($8.rroute.rdr = calloc(1,
-   sizeof(*$8.rroute.rdr))) == NULL)
-   err(1, "$8.rroute.rdr");
-   $8.rroute.rdr->host = $8.route.host;
+   r.rt = $8.rt;
}
 
if (expand_divertspec(&r, &$8.divert))
@@ -2137,30 +2104,14 @@ filter_opt  : USER uids {
sizeof(filter_opts.nat.pool_opts));
filter_opts.nat.pool_opts.staticport = 1;
}
-   | ROUTETO routespec pool_opts {
-   filter_opts.route.host = $2;
-

Re: pf route-to issues

2020-10-19 Thread David Gwynne

On Mon, Oct 19, 2020 at 12:33:25PM +0100, Stuart Henderson wrote:
> On 2020/10/19 19:53, David Gwynne wrote:
> > On Mon, Oct 19, 2020 at 09:34:31AM +0100, Stuart Henderson wrote:
> > > On 2020/10/19 15:35, David Gwynne wrote:
> > > > every few years i try and use route-to in pf, and every time it
> > > > goes badly. i tried it again last week in a slightly different
> > > > setting, and actually tried to understand the sharp edges i hit
> > > > this time instead of giving up. it turns out there are 2 or 3
> > > > different things together that have cause me trouble, which is why
> > > > the diff below is so big.
> > > 
> > > I used to route-to/reply-to quite a lot at places with poor internet
> > > connections to split traffic between lines (mostly those have better
> > > connections now so I don't need it as often). It worked as I expected -
> > > but I only ever used it with the interface specified.
> > 
> > cool. did it work beyond the first packet in a connection?
> 
> It must have done. The webcams would have utterly broken the rest of
> traffic if it hadn't :)


> 
> > > I mostly used it with pppoe interfaces so the peer address was unknown
> > > at ruleset load time. (I was lucky and had static IPs my side, but the
> > > ISP side was variable). I relied on the fact that once packets are
> > > directed at a point-point interface there's only one place for them to
> > > go. I didn't notice that ":peer" might be useful here (and the syntax
> > > 'route-to pppoe1:peer@pppoe1' is pretty awkward so I probably wouldn't
> > > have come up with it), I had 0.0.0.1@pppoe1, 0.0.0.2@pppoe2 etc
> > > (though actually I think it works with $any_random_address@pppoeX).
> > 
> > yes. i was trying to use it with peers over ethernet, and always
> > struggled with the syntax.
> > 
> > > > the first and i would argue most fundamental problem is a semantic
> > > > problem. if you ask a random person who has some clue about networks
> > > > and routing what they would expect the "argument" to route-to or
> > > > reply-to to be, they would say "a nexthop address" or "a gateway
> > > > address". eg, say i want to force packets to a specific backend
> > > > server without using NAT, i would write a rule like this:
> > > > 
> > > >   n_servers="192.0.2.128/27"
> > > >   pass out on $if_internal to $n_servers route-to 192.168.0.1
> > > > 
> > > > pfctl will happily parse this, shove it into the kernel, let you read
> > > > the rules back out again with pfctl -sr, and it all looks plausible, but
> > > > it turns out that it's using the argument to route-to as an interface
> > > > name. because rulesets can refer to interfaces that don't exist yet, pf
> > > > just passes the IP address around as a string, hoping i'll plug in an
> > > > interface with a driver name that looks like an ip address. i spent
> > > > literally a day trying to figure out why a rule like this wasn't
> > > > working.
> > > 
> > > I don't think I tried this, but the pf.conf(5) BNF syntax suggests it's
> > > supposed to work. So either doc or implementation bug there.
> > 
> > im leaning toward implementation bug.
> > 
> > >  route  = ( "route-to" | "reply-to" | "dup-to" )
> > >   ( routehost | "{" routehost-list "}" )
> > >   [ pooltype ]
> > > 
> > >  routehost-list = routehost [ [ "," ] routehost-list ]
> > > 
> > >  routehost  = host | host "@" interface-name |
> > >   "(" interface-name [ address [ "/" mask-bits ] ] ")"
> > > 
> > > > the second problem is that the pf_route calls from pfsync don't
> > > > have all the information it is supposed to have. more specifically,
> > > > an ifp pointer isn't set which leads to a segfault. the ifp pointer
> > > > isn't set because pfsync doesnt track which interface a packet is
> > > > going out, it assumes the ip layer will get it right again later, or a
> > > > rule provided something usable.
> > > > 
> > > > the third problem is that pf_route relies on information from rules to
> > > > work correctly. this is a problem in a pfsync environment because you
> > > > cannot have the same ruleset on both firewalls 100% of the time, which
> > > > means you cannot have route-to/reply-to behave consistently on a pair of
> > > > firwalls 100% of the time.
> > > 
> > > I didn't run into this because pppoe(4) and pfsync/carp don't really
> > > go well together, but ouch!
> > > 
> > > > all of this together makes things work pretty obviously and smoothly.
> > > > in my opinion anyway. route-to now works more like rdr-to, it just
> > > > feels like it changes the address used for the route lookup rather
> > > > than changing the actual IP address in the packet. it also works
> > > > predictably in a pfsync pair, which is great from the point of view of
> > > > high availability.
> > > > 
> > > > the main caveat is that it's not backward compatible. if you're already
> > > > using route-to, you will need to tweak your rules to have them par

Re: pf route-to issues

2020-10-19 Thread Stuart Henderson

On 2020/10/19 19:53, David Gwynne wrote:
> On Mon, Oct 19, 2020 at 09:34:31AM +0100, Stuart Henderson wrote:
> > On 2020/10/19 15:35, David Gwynne wrote:
> > > every few years i try and use route-to in pf, and every time it
> > > goes badly. i tried it again last week in a slightly different
> > > setting, and actually tried to understand the sharp edges i hit
> > > this time instead of giving up. it turns out there are 2 or 3
> > > different things together that have cause me trouble, which is why
> > > the diff below is so big.
> > 
> > I used to route-to/reply-to quite a lot at places with poor internet
> > connections to split traffic between lines (mostly those have better
> > connections now so I don't need it as often). It worked as I expected -
> > but I only ever used it with the interface specified.
> 
> cool. did it work beyond the first packet in a connection?

It must have done. The webcams would have utterly broken the rest of
traffic if it hadn't :)

> > I mostly used it with pppoe interfaces so the peer address was unknown
> > at ruleset load time. (I was lucky and had static IPs my side, but the
> > ISP side was variable). I relied on the fact that once packets are
> > directed at a point-point interface there's only one place for them to
> > go. I didn't notice that ":peer" might be useful here (and the syntax
> > 'route-to pppoe1:peer@pppoe1' is pretty awkward so I probably wouldn't
> > have come up with it), I had 0.0.0.1@pppoe1, 0.0.0.2@pppoe2 etc
> > (though actually I think it works with $any_random_address@pppoeX).
> 
> yes. i was trying to use it with peers over ethernet, and always
> struggled with the syntax.
> 
> > > the first and i would argue most fundamental problem is a semantic
> > > problem. if you ask a random person who has some clue about networks
> > > and routing what they would expect the "argument" to route-to or
> > > reply-to to be, they would say "a nexthop address" or "a gateway
> > > address". eg, say i want to force packets to a specific backend
> > > server without using NAT, i would write a rule like this:
> > > 
> > >   n_servers="192.0.2.128/27"
> > >   pass out on $if_internal to $n_servers route-to 192.168.0.1
> > > 
> > > pfctl will happily parse this, shove it into the kernel, let you read
> > > the rules back out again with pfctl -sr, and it all looks plausible, but
> > > it turns out that it's using the argument to route-to as an interface
> > > name. because rulesets can refer to interfaces that don't exist yet, pf
> > > just passes the IP address around as a string, hoping i'll plug in an
> > > interface with a driver name that looks like an ip address. i spent
> > > literally a day trying to figure out why a rule like this wasn't
> > > working.
> > 
> > I don't think I tried this, but the pf.conf(5) BNF syntax suggests it's
> > supposed to work. So either doc or implementation bug there.
> 
> im leaning toward implementation bug.
> 
> >  route  = ( "route-to" | "reply-to" | "dup-to" )
> >   ( routehost | "{" routehost-list "}" )
> >   [ pooltype ]
> > 
> >  routehost-list = routehost [ [ "," ] routehost-list ]
> > 
> >  routehost  = host | host "@" interface-name |
> >   "(" interface-name [ address [ "/" mask-bits ] ] ")"
> > 
> > > the second problem is that the pf_route calls from pfsync don't
> > > have all the information it is supposed to have. more specifically,
> > > an ifp pointer isn't set which leads to a segfault. the ifp pointer
> > > isn't set because pfsync doesnt track which interface a packet is
> > > going out, it assumes the ip layer will get it right again later, or a
> > > rule provided something usable.
> > > 
> > > the third problem is that pf_route relies on information from rules to
> > > work correctly. this is a problem in a pfsync environment because you
> > > cannot have the same ruleset on both firewalls 100% of the time, which
> > > means you cannot have route-to/reply-to behave consistently on a pair of
> > > firwalls 100% of the time.
> > 
> > I didn't run into this because pppoe(4) and pfsync/carp don't really
> > go well together, but ouch!
> > 
> > > all of this together makes things work pretty obviously and smoothly.
> > > in my opinion anyway. route-to now works more like rdr-to, it just
> > > feels like it changes the address used for the route lookup rather
> > > than changing the actual IP address in the packet. it also works
> > > predictably in a pfsync pair, which is great from the point of view of
> > > high availability.
> > > 
> > > the main caveat is that it's not backward compatible. if you're already
> > > using route-to, you will need to tweak your rules to have them parse.
> > > however, i doubt anyone is using this stuff because it feels very broken
> > > to me.
> > 
> > Do you expect this to work with a bracketed "address" to defer lookup
> > until rule evaluation time? i.e.
> > 
> > pass out proto tcp to any p

Re: pf route-to issues

2020-10-19 Thread David Gwynne

On Mon, Oct 19, 2020 at 12:28:19PM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
> > > 
> > > it seems to me 'route-to vs. pfsync' still needs more thought.  the
> > > next-hop IP address in route-to may be different for each PF box
> > > linked by pfsync(4). To be honest I have no answer to address this
> > > issue at the moment.
> > 
> > i have thought about that a little bit. we could play with what the
> > argument to route-to means. rather than requiring it to be a directly
> > connected host/gateway address, we could interpret it as a destination
> > address, and use the gateway for that destination as the nexthop.
> > 
> > eg, if i have the following routing table on frontend a:
> > 
> > Internet:
> > DestinationGatewayFlags   Refs  Use   Mtu  Prio 
> > Iface
> > default192.168.96.33  UGS6   176171 - 8 
> > vmx0 
> > 224/4  127.0.0.1  URS00 32768 8 lo0 
> >  
> > 10.0.0.0/30192.168.0.1UGS00 - 8 
> > gre0 
> 
> > 
> > and this routing table on frontend b:
> > 
> > Internet:
> > DestinationGatewayFlags   Refs  Use   Mtu  Prio 
> > Iface
> > default192.168.96.33  UGS987548 - 8 
> > aggr0
> > 224/4  127.0.0.1  URS00 32768 8 lo0 
> >  
> > 10.0.0.0/30192.168.0.3UGS00 - 8 
> > gre0 
> 
> > 
> > if gre0 on both frontends pointed at different legs on the same backend
> > server, i could write a pf rule like this:
> > 
> > pass out to port 80 route-to 10.0.0.1
> > 
> > 10.0.0.1 would then end up as the rt_addr field in pf_state and
> > pfsync_state.
> > 
> > both frontend a and b would lookup the route to 10.0.0.1, and then
> > use 192.168.0.1 and 192.168.0.3 as the gateway address respectively.
> > both would end up pushing the packet over their gre link to the
> > same backend. the same semantic would work if the link to the backend
> > was over ethernet instead of a tunnel.
> > 
> > > > thoughts?
> > > > 
> > > 
> > > What you've said makes sense. However I still feel pfsync(4)
> > > does not play well with route-to.
> > 
> > maybe your opinion is different if the above makes sense?
> > 
> 
> Thanks for detailed explanation. This is good enough to make me happy.
> The remaining questions on this are sort of 'homework for me' to poke
> to PF source code, for example:
>   are we doing route look up for every packet? or route look up
>   is performed when state is created/imported? (and we cache
>   outbound interface + next-hop along the state)

the "destination" address is determined when the state is created and
stored in rt_addr if pf_state. the route lookup using that address is
done per packet in pf_route.

>   also what happens when route does not exist on pfsync peer, which
>   receives state? How admin will discover state failed to import?

the route lookup or interface lookup fails and the packet is dropped.
there are no counters to show this is happening though :(

> Anyway, your plan above looks solid to me now. It's certainly more 
> flexible
> (?reliable?) to select route to particular destination, than using pair of
> interface,next-hop.

cool, i'll keep working on it then.

> 
> thanks and
> regards
> sashan

Re: pf route-to issues

2020-10-19 Thread Alexandr Nedvedicky

Hello,


> > 
> > it seems to me 'route-to vs. pfsync' still needs more thought.  the
> > next-hop IP address in route-to may be different for each PF box
> > linked by pfsync(4). To be honest I have no answer to address this
> > issue at the moment.
> 
> i have thought about that a little bit. we could play with what the
> argument to route-to means. rather than requiring it to be a directly
> connected host/gateway address, we could interpret it as a destination
> address, and use the gateway for that destination as the nexthop.
> 
> eg, if i have the following routing table on frontend a:
> 
> Internet:
> DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
> default192.168.96.33  UGS6   176171 - 8 vmx0 
> 224/4  127.0.0.1  URS00 32768 8 lo0  
> 10.0.0.0/30192.168.0.1UGS00 - 8 gre0 

> 
> and this routing table on frontend b:
> 
> Internet:
> DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
> default192.168.96.33  UGS987548 - 8 aggr0
> 224/4  127.0.0.1  URS00 32768 8 lo0  
> 10.0.0.0/30192.168.0.3UGS00 - 8 gre0 

> 
> if gre0 on both frontends pointed at different legs on the same backend
> server, i could write a pf rule like this:
> 
> pass out to port 80 route-to 10.0.0.1
> 
> 10.0.0.1 would then end up as the rt_addr field in pf_state and
> pfsync_state.
> 
> both frontend a and b would lookup the route to 10.0.0.1, and then
> use 192.168.0.1 and 192.168.0.3 as the gateway address respectively.
> both would end up pushing the packet over their gre link to the
> same backend. the same semantic would work if the link to the backend
> was over ethernet instead of a tunnel.
> 
> > > thoughts?
> > > 
> > 
> > What you've said makes sense. However I still feel pfsync(4)
> > does not play well with route-to.
> 
> maybe your opinion is different if the above makes sense?
> 

Thanks for detailed explanation. This is good enough to make me happy.
The remaining questions on this are sort of 'homework for me' to poke
to PF source code, for example:
are we doing route look up for every packet? or route look up
is performed when state is created/imported? (and we cache
outbound interface + next-hop along the state)

also what happens when route does not exist on pfsync peer, which
receives state? How admin will discover state failed to import?

Anyway, your plan above looks solid to me now. It's certainly more flexible
(?reliable?) to select route to particular destination, than using pair of
interface,next-hop.

thanks and
regards
sashan

Re: pf route-to issues

2020-10-19 Thread David Gwynne

On Mon, Oct 19, 2020 at 09:34:31AM +0100, Stuart Henderson wrote:
> On 2020/10/19 15:35, David Gwynne wrote:
> > every few years i try and use route-to in pf, and every time it
> > goes badly. i tried it again last week in a slightly different
> > setting, and actually tried to understand the sharp edges i hit
> > this time instead of giving up. it turns out there are 2 or 3
> > different things together that have cause me trouble, which is why
> > the diff below is so big.
> 
> I used to route-to/reply-to quite a lot at places with poor internet
> connections to split traffic between lines (mostly those have better
> connections now so I don't need it as often). It worked as I expected -
> but I only ever used it with the interface specified.

cool. did it work beyond the first packet in a connection?

> I mostly used it with pppoe interfaces so the peer address was unknown
> at ruleset load time. (I was lucky and had static IPs my side, but the
> ISP side was variable). I relied on the fact that once packets are
> directed at a point-point interface there's only one place for them to
> go. I didn't notice that ":peer" might be useful here (and the syntax
> 'route-to pppoe1:peer@pppoe1' is pretty awkward so I probably wouldn't
> have come up with it), I had 0.0.0.1@pppoe1, 0.0.0.2@pppoe2 etc
> (though actually I think it works with $any_random_address@pppoeX).

yes. i was trying to use it with peers over ethernet, and always
struggled with the syntax.

> > the first and i would argue most fundamental problem is a semantic
> > problem. if you ask a random person who has some clue about networks
> > and routing what they would expect the "argument" to route-to or
> > reply-to to be, they would say "a nexthop address" or "a gateway
> > address". eg, say i want to force packets to a specific backend
> > server without using NAT, i would write a rule like this:
> > 
> >   n_servers="192.0.2.128/27"
> >   pass out on $if_internal to $n_servers route-to 192.168.0.1
> > 
> > pfctl will happily parse this, shove it into the kernel, let you read
> > the rules back out again with pfctl -sr, and it all looks plausible, but
> > it turns out that it's using the argument to route-to as an interface
> > name. because rulesets can refer to interfaces that don't exist yet, pf
> > just passes the IP address around as a string, hoping i'll plug in an
> > interface with a driver name that looks like an ip address. i spent
> > literally a day trying to figure out why a rule like this wasn't
> > working.
> 
> I don't think I tried this, but the pf.conf(5) BNF syntax suggests it's
> supposed to work. So either doc or implementation bug there.

im leaning toward implementation bug.

>  route  = ( "route-to" | "reply-to" | "dup-to" )
>   ( routehost | "{" routehost-list "}" )
>   [ pooltype ]
> 
>  routehost-list = routehost [ [ "," ] routehost-list ]
> 
>  routehost  = host | host "@" interface-name |
>   "(" interface-name [ address [ "/" mask-bits ] ] ")"
> 
> > the second problem is that the pf_route calls from pfsync don't
> > have all the information it is supposed to have. more specifically,
> > an ifp pointer isn't set which leads to a segfault. the ifp pointer
> > isn't set because pfsync doesnt track which interface a packet is
> > going out, it assumes the ip layer will get it right again later, or a
> > rule provided something usable.
> > 
> > the third problem is that pf_route relies on information from rules to
> > work correctly. this is a problem in a pfsync environment because you
> > cannot have the same ruleset on both firewalls 100% of the time, which
> > means you cannot have route-to/reply-to behave consistently on a pair of
> > firwalls 100% of the time.
> 
> I didn't run into this because pppoe(4) and pfsync/carp don't really
> go well together, but ouch!
> 
> > all of this together makes things work pretty obviously and smoothly.
> > in my opinion anyway. route-to now works more like rdr-to, it just
> > feels like it changes the address used for the route lookup rather
> > than changing the actual IP address in the packet. it also works
> > predictably in a pfsync pair, which is great from the point of view of
> > high availability.
> > 
> > the main caveat is that it's not backward compatible. if you're already
> > using route-to, you will need to tweak your rules to have them parse.
> > however, i doubt anyone is using this stuff because it feels very broken
> > to me.
> 
> Do you expect this to work with a bracketed "address" to defer lookup
> until rule evaluation time? i.e.
> 
> pass out proto tcp to any port 22 route-to (pppoe1:peer)

in my opinion route-to should be able to take whatever rdr-to accepts.
however, i just tried it and it doesnt currently work, but i'm sure i
can figure it out.

> I think that will be all that's needed to allow converting the pppoe
> use case. I don't have a multiple pppoe setup handy bu

Re: pf route-to issues

2020-10-19 Thread David Gwynne

On Mon, Oct 19, 2020 at 09:46:19AM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> disclaimer: I have no chance to run pfsync on production, I'm very
> inexperienced with pfsync(4).

i wrote the defer code in pfsync, and i think i wrote the code in
pfsync that calls pf_route badly, so noones perfect :)

> 
> > 
> > the third problem is that pf_route relies on information from rules to
> > work correctly. this is a problem in a pfsync environment because you
> > cannot have the same ruleset on both firewalls 100% of the time, which
> > means you cannot have route-to/reply-to behave consistently on a pair of
> > firwalls 100% of the time.
> > 
> > my solution to both these problems is reduce the amount of information
> > pf_route needs to work with, to make sure that the info it does need
> > is in the pf state structure, and that pfsync handles it properly.
> > 
> > if we limit the information needed for pf_route to a nexthop address,
> > and which direction the address is used, this is doable. both the
> > pf_state and pfsync_state structs already contain an address to store a
> > nexthop in, i just had to move the route-to direction from the rule into
> > the state. this is easy with pf_state, but i used a spare pad field in
> > pfsync_state for this.
> > 
> 
> it seems to me 'route-to vs. pfsync' still needs more thought.  the
> next-hop IP address in route-to may be different for each PF box
> linked by pfsync(4). To be honest I have no answer to address this
> issue at the moment.

i have thought about that a little bit. we could play with what the
argument to route-to means. rather than requiring it to be a directly
connected host/gateway address, we could interpret it as a destination
address, and use the gateway for that destination as the nexthop.

eg, if i have the following routing table on frontend a:

Internet:
DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
default192.168.96.33  UGS6   176171 - 8 vmx0 
224/4  127.0.0.1  URS00 32768 8 lo0  
10.0.0.0/30192.168.0.1UGS00 - 8 gre0 
127/8  127.0.0.1  UGRS   00 32768 8 lo0  
127.0.0.1  127.0.0.1  UHhl   1  142 32768 1 lo0  
192.168.0.0192.168.0.0UHl00 - 1 gre0 
192.168.0.1192.168.0.0UHh11 - 8 gre0 
192.168.96.32/27   192.168.96.34  UCn2   122849 - 4 vmx0 
192.168.96.33  00:00:5e:00:01:47  UHLch  114611 - 3 vmx0 
192.168.96.34  00:50:56:a1:73:91  UHLl   0   362231 - 1 vmx0 
192.168.96.60  fe:e1:ba:d0:74:ef  UHLc   059690 - 3 vmx0 
192.168.96.63  192.168.96.34  UHb00 - 1 vmx0 

and this routing table on frontend b:

Internet:
DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
default192.168.96.33  UGS987548 - 8 aggr0
224/4  127.0.0.1  URS00 32768 8 lo0  
10.0.0.0/30192.168.0.3UGS00 - 8 gre0 
127/8  127.0.0.1  UGRS   00 32768 8 lo0  
127.0.0.1  127.0.0.1  UHhl   1   62 32768 1 lo0  
192.168.0.2192.168.0.2UHl00 - 1 gre0 
192.168.0.3192.168.0.2UHh11 - 8 gre0 
192.168.96.32/27   192.168.96.55  UCn362186 - 4 aggr0
192.168.96.33  00:00:5e:00:01:47  UHLch  1 7442 - 3 aggr0
192.168.96.35  00:23:42:d0:56:8e  UHLc   0 1905 - 3 aggr0
192.168.96.55  fe:e1:ba:d0:e0:83  UHLl   0   178119 - 1 aggr0
192.168.96.60  fe:e1:ba:d0:74:ef  UHLc   031021 - 3 aggr0
192.168.96.63  192.168.96.55  UHb00 - 1 aggr0

if gre0 on both frontends pointed at different legs on the same backend
server, i could write a pf rule like this:

pass out to port 80 route-to 10.0.0.1

10.0.0.1 would then end up as the rt_addr field in pf_state and
pfsync_state.

both frontend a and b would lookup the route to 10.0.0.1, and then
use 192.168.0.1 and 192.168.0.3 as the gateway address respectively.
both would end up pushing the packet over their gre link to the
same backend. the same semantic would work if the link to the backend
was over ethernet instead of a tunnel.

> > thoughts?
> > 
> 
> What you've said makes sense. However I still feel pfsync(4)
> does not play well with route-to.

maybe your opinion is different if the above makes sense?

> thanks and
> regards
> sashan

no, thank you for reading my long email.

cheers,
dlg

Re: pf route-to issues

2020-10-19 Thread Stuart Henderson

On 2020/10/19 15:35, David Gwynne wrote:
> every few years i try and use route-to in pf, and every time it
> goes badly. i tried it again last week in a slightly different
> setting, and actually tried to understand the sharp edges i hit
> this time instead of giving up. it turns out there are 2 or 3
> different things together that have cause me trouble, which is why
> the diff below is so big.

I used to route-to/reply-to quite a lot at places with poor internet
connections to split traffic between lines (mostly those have better
connections now so I don't need it as often). It worked as I expected -
but I only ever used it with the interface specified.

I mostly used it with pppoe interfaces so the peer address was unknown
at ruleset load time. (I was lucky and had static IPs my side, but the
ISP side was variable). I relied on the fact that once packets are
directed at a point-point interface there's only one place for them to
go. I didn't notice that ":peer" might be useful here (and the syntax
'route-to pppoe1:peer@pppoe1' is pretty awkward so I probably wouldn't
have come up with it), I had 0.0.0.1@pppoe1, 0.0.0.2@pppoe2 etc
(though actually I think it works with $any_random_address@pppoeX).

> the first and i would argue most fundamental problem is a semantic
> problem. if you ask a random person who has some clue about networks
> and routing what they would expect the "argument" to route-to or
> reply-to to be, they would say "a nexthop address" or "a gateway
> address". eg, say i want to force packets to a specific backend
> server without using NAT, i would write a rule like this:
> 
>   n_servers="192.0.2.128/27"
>   pass out on $if_internal to $n_servers route-to 192.168.0.1
> 
> pfctl will happily parse this, shove it into the kernel, let you read
> the rules back out again with pfctl -sr, and it all looks plausible, but
> it turns out that it's using the argument to route-to as an interface
> name. because rulesets can refer to interfaces that don't exist yet, pf
> just passes the IP address around as a string, hoping i'll plug in an
> interface with a driver name that looks like an ip address. i spent
> literally a day trying to figure out why a rule like this wasn't
> working.

I don't think I tried this, but the pf.conf(5) BNF syntax suggests it's
supposed to work. So either doc or implementation bug there.

 route  = ( "route-to" | "reply-to" | "dup-to" )
  ( routehost | "{" routehost-list "}" )
  [ pooltype ]

 routehost-list = routehost [ [ "," ] routehost-list ]

 routehost  = host | host "@" interface-name |
  "(" interface-name [ address [ "/" mask-bits ] ] ")"

> the second problem is that the pf_route calls from pfsync don't
> have all the information it is supposed to have. more specifically,
> an ifp pointer isn't set which leads to a segfault. the ifp pointer
> isn't set because pfsync doesnt track which interface a packet is
> going out, it assumes the ip layer will get it right again later, or a
> rule provided something usable.
> 
> the third problem is that pf_route relies on information from rules to
> work correctly. this is a problem in a pfsync environment because you
> cannot have the same ruleset on both firewalls 100% of the time, which
> means you cannot have route-to/reply-to behave consistently on a pair of
> firwalls 100% of the time.

I didn't run into this because pppoe(4) and pfsync/carp don't really
go well together, but ouch!

> all of this together makes things work pretty obviously and smoothly.
> in my opinion anyway. route-to now works more like rdr-to, it just
> feels like it changes the address used for the route lookup rather
> than changing the actual IP address in the packet. it also works
> predictably in a pfsync pair, which is great from the point of view of
> high availability.
> 
> the main caveat is that it's not backward compatible. if you're already
> using route-to, you will need to tweak your rules to have them parse.
> however, i doubt anyone is using this stuff because it feels very broken
> to me.

Do you expect this to work with a bracketed "address" to defer lookup
until rule evaluation time? i.e.

pass out proto tcp to any port 22 route-to (pppoe1:peer)

I think that will be all that's needed to allow converting the pppoe
use case. I don't have a multiple pppoe setup handy but I can probably
hack together some sort of test.

I've also used route-to with squid "transparent" proxying (shown in
the pkg-readme), I don't do that any more but I can put a squid test
together easily enough.

> @@ -1842,37 +1833,18 @@ pfrule: action dir logquick interface 
>   decide_address_family($7.src.host, &r.af);
>   decide_address_family($7.dst.host, &r.af);
>  
> - if ($8.route.rt) {
> + if ($8.rt) {
> + if ($8.rt != PF_DUPTO && !r.direction) {
>

Re: pf route-to issues

2020-10-19 Thread Alexandr Nedvedicky

Hello,

disclaimer: I have no chance to run pfsync on production, I'm very
inexperienced with pfsync(4).


> 
> the third problem is that pf_route relies on information from rules to
> work correctly. this is a problem in a pfsync environment because you
> cannot have the same ruleset on both firewalls 100% of the time, which
> means you cannot have route-to/reply-to behave consistently on a pair of
> firwalls 100% of the time.
> 
> my solution to both these problems is reduce the amount of information
> pf_route needs to work with, to make sure that the info it does need
> is in the pf state structure, and that pfsync handles it properly.
> 
> if we limit the information needed for pf_route to a nexthop address,
> and which direction the address is used, this is doable. both the
> pf_state and pfsync_state structs already contain an address to store a
> nexthop in, i just had to move the route-to direction from the rule into
> the state. this is easy with pf_state, but i used a spare pad field in
> pfsync_state for this.
> 

it seems to me 'route-to vs. pfsync' still needs more thought.  the
next-hop IP address in route-to may be different for each PF box
linked by pfsync(4). To be honest I have no answer to address this
issue at the moment.

> 
> thoughts?
> 

What you've said makes sense. However I still feel pfsync(4)
does not play well with route-to.


thanks and
regards
sashan

pf route-to issues

2020-10-18 Thread David Gwynne

every few years i try and use route-to in pf, and every time it
goes badly. i tried it again last week in a slightly different
setting, and actually tried to understand the sharp edges i hit
this time instead of giving up. it turns out there are 2 or 3
different things together that have cause me trouble, which is why
the diff below is so big.

the first and i would argue most fundamental problem is a semantic
problem. if you ask a random person who has some clue about networks
and routing what they would expect the "argument" to route-to or
reply-to to be, they would say "a nexthop address" or "a gateway
address". eg, say i want to force packets to a specific backend
server without using NAT, i would write a rule like this:

  n_servers="192.0.2.128/27"
  pass out on $if_internal to $n_servers route-to 192.168.0.1

pfctl will happily parse this, shove it into the kernel, let you read
the rules back out again with pfctl -sr, and it all looks plausible, but
it turns out that it's using the argument to route-to as an interface
name. because rulesets can refer to interfaces that don't exist yet, pf
just passes the IP address around as a string, hoping i'll plug in an
interface with a driver name that looks like an ip address. i spent
literally a day trying to figure out why a rule like this wasn't
working.

i happened to be talking to pascoe@ at the time, and his vague memory
was that the idea was to try and switch the interface a packet was going
to travel over, but to try and reuse the arp lookup from the parent one.
neither of us could figure out why that would be a good idea though.

the best i can say about this is that it only really makes some
kind of sense if you're moving a packet into a tunnel. tunnels don't
really care about nexthops and will happily route anything you give
them. if you were trying to add a route to the routing table to do this,
you'd be specifying the peer address on a tunnel interface as the
gateway. pf has a if0:peer syntax that makes this convenient to write.

so i want to change route-to in pfctl so it takes a nexthop instead
of an interface. you could argue that pf already lets you do this,
because there's some bs nexthop@interface syntax. my counter argument
is that the interface the nexthop is reachable over is redundant, and it
makes fixing some of the other problems harder if we keep it.

the second and third problems i hit are when route-to is used on
a pair of boxes that have pfsync and pfsync defer set up. when defer
is enabled, pfsync takes the packet away from the forwarding path,
and when it has some confidence that the peer is aware of the state,
then it tries to push the packet back out.

to understand the following, be aware that route-to, reply-to, and
dup-to are implemented in pf in a pair of functions called pf_route
and pf_route6. if i say pf_route, just assume i'm talking about
both of these functions.

the second problem is that the pf_route calls from pfsync don't
have all the information it is supposed to have. more specifically,
an ifp pointer isn't set which leads to a segfault. the ifp pointer
isn't set because pfsync doesnt track which interface a packet is
going out, it assumes the ip layer will get it right again later, or a
rule provided something usable.

the third problem is that pf_route relies on information from rules to
work correctly. this is a problem in a pfsync environment because you
cannot have the same ruleset on both firewalls 100% of the time, which
means you cannot have route-to/reply-to behave consistently on a pair of
firwalls 100% of the time.

my solution to both these problems is reduce the amount of information
pf_route needs to work with, to make sure that the info it does need
is in the pf state structure, and that pfsync handles it properly.

if we limit the information needed for pf_route to a nexthop address,
and which direction the address is used, this is doable. both the
pf_state and pfsync_state structs already contain an address to store a
nexthop in, i just had to move the route-to direction from the rule into
the state. this is easy with pf_state, but i used a spare pad field in
pfsync_state for this.

the pf_state struct has had which interface the route is using
removed. there's no simple way to sync interface information between
pfsync peers on the wire, and the need for them is marginal at best.
things are much simpler if we can get away with not having this info.

a bonus problem i hit is that there's code in pf_match that
appears to try and short circuit some processing of states when
route-to/reply-to is in effect. this has two consequences. first,
if you're using route-to with tcp states, half the tcp state machine
is is skipped. when you look at these states with pfctl -vvss,
one half of the TCP state never moves forward. secondly, because
the processing is short circuited, it never falls through to the
end of pf_test where the actual call to pf_route is done. so the
first packet is properly handled by pf_

56 matches

Mail list logo