On Thu, May 26, 2016 at 2:05 PM, Ben Pfaff <b...@ovn.org> wrote:

> On Thu, May 26, 2016 at 12:39:36PM -0400, Flavio Fernandes wrote:
> > To be discussed:
> >
> > One caveat here is that ttl should be decremented before vm
> > reaches <IP2.2>, assuming the ping packet is coming from
> > a remote subnet.
> >
> > In short, we ideally would have the match augmented to look
> > like this.
> >
> >   "(inport == %s || ip.ttl > 1) && (ip4.dst == "IP_FMT" || ip4.dst ==
> "IP_FMT") && ...
> >
> > Afer talking to Russel [1] it seems that we have either a bug or
> > a known limitation on how a match on ip.ttl can be used with the
> > greater than operator. I tried a couple of tweaks [2] against a
> > test ping with ttl 10 and concluded that both
> > negation and greater than are not working as expected. Assuming
> > ip.ttl should be supported in the match, there are 3 issues that
> > should be followed upon, which may be outside the scope of this
> > patch:
> >   1) fix/support negation and greater than operator for ip.ttl
> >   2) update doc on supported matches (ovn-sb);
> >   3) expand unit tests to verify ip.ttl in match
>
> Inequalities are intentionally unsupported for ip.ttl.  You can see this
> pretty clearly from ovstest output:
>
>     $ echo 'ip.ttl == 1' | tests/ovstest test-ovn parse-expr
>     ip.ttl == 0x1
>     $ echo 'ip.ttl > 1' | tests/ovstest test-ovn parse-expr
>     Only == and != operators may be used with nominal field ip.ttl.
>
>
Very cool! Thanks for mentioning about 'tests/ovstest test-ovn parse-expr'.
Do you think it would be a good idea to have this ttl expression added to
ovn.at ? If so, I can do it in a separate patch.



> The proximate reason for this is that ip.ttl is a nominal field.  That's
> because the OpenFlow field for IP TTL (NXM_NX_IP_TTL) is nonmaskable;
> that is, at the OpenFlow layer, OVS rejects attempts to match on a
> subset of bits in the TTL.
>
> It would be easy to change both of these decisions.  Making
> NXM_NX_IP_TTL maskable is a one-line change to a comment (!), and making
> ip.ttl ordinal would be equally trivial.
>
> But I made NXM_NX_IP_TTL nonmaskable for good reason.  That is because
> inequality matches are expensive.  The comparison "ip.ttl > 1" needs 7
> OpenFlow flows, each of which tests only one bit.  This can be a big
> deal because each of those flows ends up in a separate hash table in the
> classifier, adding 7 separate hash table lookups to any classifier
> search, which is inefficient.
>
> That's why the existing ip.ttl checks instead use "ip.ttl == {0,1}",
> which only uses 2 OpenFlow flows, both of which can fit in the same
> classifier lookup table.
>
>
Make sense. Just for giggles I did a little experiment using my vanilla
linux distro
as a router to see if it cared about ttl when responging to pings in a
similar
situation.

To my surprise -- kinda not -- the linux router code was happy to reply to
that
ping without prejudice. ;)  See below [pingTest] for the steps and output.

With that, I'm now proposing that we skip doing anything
beyond the removal of the inport from the logical flow match. If you do not
agree, please make it known on the next patch I submit for this.


> So, it's better if we can use a design that can use a higher-priority
> flow to handle too-low TTLs as a special case, then handle the common
> case with a lower priority flow.
>

And you already added that [ttldis]. My only point was that this is done
later
in the router datapath processing, so the packets destined to the router
are the
only ones that do not have the ttl check.

** update **: Darrell pointed out the excerpt in RFC that indicates that
ttl checking
is not to happen when packet is destined to the router. With that, there is
no
further discussion required on this particular topic.


> I may be microoptimizing.  I am open to that discussion.
>

Not at all, blp! And this is all very good info. I really appreciate it.

Best,

-- flaviof

[ttldis]:
https://github.com/openvswitch/ovs/commit/9975d7beb36ab3aadfd07c9d566f8d3d1d340fc4#diff-2c35162acf6ad144624954fdc4c3d9f4R1348

[pingTest]:
https://gist.github.com/bee26f82113ff9dd2715c6451e19718c

vm ---(192.168.50.0/24)--- rtr ---(192.168.111.0/24)

vm: 192.168.50.200
rtr (port1): 192.168.50.254
rtr (port2): 192.168.111.254

vagrant@ubuntu:~$ sudo ip route 192.168.111.254 255.255.255.255
192.168.50.254

vagrant@ubuntu:~$ sudo tcpdump -n -vvv -i eth1 &
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535
bytes

vagrant@ubuntu:~$ ping -t 1 -c 1 -I eth1 192.168.111.254 ; echo "--X-X--"
PING 192.168.111.254 (192.168.111.254) from 192.168.50.200 eth1: 56(84)
bytes of data.
64 bytes from 192.168.111.254: icmp_seq=1 ttl=64 time=0.526 ms

--- 192.168.111.254 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.526/0.526/0.526/0.000 ms
--X-X--
vagrant@ubuntu:~$ 19:53:21.606126 IP (tos 0x0, ttl 1, id 21870, offset 0,
flags [DF], proto ICMP (1), length 84)
    192.168.50.200 > 192.168.111.254: ICMP echo request, id 3656, seq 1,
length 64
19:53:21.606632 IP (tos 0x0, ttl 64, id 47178, offset 0, flags [none],
proto ICMP (1), length 84)
    192.168.111.254 > 192.168.50.200: ICMP echo reply, id 3656, seq 1,
length 64
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to