Re: [networking-discuss] Start of code review for IP datapath refactoring

Erik Nordmark Sun, 27 Sep 2009 14:02:21 -0700

Sebastien Roy wrote:

usr/src/uts/common/inet/iptun/iptun.c


* 746-750: This explains the implementation of conn_get_ixa().  This
  comment should be the intro block-comment to conn_get_ixa() in
  ip_attr.c.


Fixed.

* 759: This looks like it belongs in conn_get_ixa() as there is only
  one possible value for the ixa_cred of the ixa returned by
  conn_get_ixa(), and that is conn_cred.

That code isn't needed; too much copy and paste. (It only applies to UDPwhere we want to track the cred for each send* so that SCM_UCRED can beused on the receive side.)However, re-reading the TX code I realize we need to always restart withixa_tsl = crgetlabel(ixa_cred) before we call tsol_check_dest(), butthat code belongs in ip_set_destination_*() where we call tsol_check_dest().

* 761-769: This also looks like something that should be in
  conn_get_ixa().

Ditto. (it only applies to UDP ancillary data where the label changefor each packet due to SCM_UCRED).

* 768: Won't we need to set ixa_tsl on a per-packet basis?

Yes, we do that by calling tsol_check_label_* in the output path. But todo that we need the initial label which is based on the zone.

* 786-800: It seems like there should be some utility function to set
  conn addresses.  For example, the fact that IPv4 addresses are
  stored as IPv4-mapped IPv6 addresses in the conn_t seems like an
  implementation detail of ip that iptun shouldn't really need to care
  about.


I already responded to this one.

* 848: iptun_get_maxmtu() doesn't set the tunnel MTU, it just returns
  the maximum possible link MTU for a tunnel link.  The call to
  iptun_update_mtu() at 886 does what you want anyway (and that calls
  iptun_get_maxmtu() to do its job).


Fixed.

* 862: This can't happen because we call iptun_unbind() before
  iptun_bind() if the tunnel is already bound.  I suppose you could
  optimize things to make the iptun_unbind() unnecessary if you want.


I'll remove 862.

* 870,872: We should just have an ipcl_conn_insert() that calls
  ipcl_conn_insert_v[4|6]() depending on conn_ipversion.
  tcp_conn_request() and conn_connect() both have the same little
  if/else dance.


OK, but for uniformity it makes sense to do the same for ipcl_bind_insert.

* 891: The convention in this file is to use "done:" as the label, as
  the code that goes through here is in most cases not related to bind
  failure.


Will fix.

* 1311: The current process is some dladm incantation that will go
  away momentarily.  It's probably not meaningful to save that away as
  conn_cpid.


I'll use NOPID instead.

* 1795: There are some places in the code where ip_get_pmtu() is
  called with an argument of connp->conn_ixa directly without first
  getting a safe reference using conn_get_ixa() (in icmp_icmp_input()
  for example).  Is that deliberate?

We only call ip_get_pmtu() in one place, but there are places where weuse conn_ixa directly. However, currently that is only in iptun_create().

Where did you see it be used?

* 1807: I know they're not used, but it would help legibility to call
  the first arguments using names representing what was passed in.  On
  a related note, there seems to be no reason why the first argument
  couldn't simply be a "conn_t *" instead of "void *".


Agreed. Will fix.

* 1824: You don't need a default case.

OK


* 1976: What is this call to ip_attr_connect() about?


If ixa_ire was NULL we got a safe copy.

But I think I'll change the ixa_notify upcall to pass up theip_xmit_attr_t; that way we will have the currently used ip_xmit_attr_tand can modify it directly which avoids the conn_get_ixa iniptun_get_maxmtu.

* 2029: typo: there is no iptun_get_pmtu().


Should say ip_get_pmtu()

* 2043: In the case of IPTUN_FIXED_MTU, don't we only need update
  ixa_pmtu when we initially set iptun_mtu?


I already responded to this one.
I'll also add some comment to make this more clear.

* 2093: The ICMP error doesn't need to have the same cred as the
  original packet?


I read RFC 791 and there aren't credentials for IP packets ;-)

What is the purpose of the cred on the packet?

We use them for getpeerucred() and SCM_UCRED. They have also beenoverloaded to carry a label, but with refactoring we have the explicitixa_tsl for the label.

* 2150,2200: The cred associated with the conn_t won't necessarily be the
  same as the cred associated with the data.  The label we want in the
  packet is the one associated with the data, right?  Labels used to
  be computed using dblk creds...

I was hoping to already have merged with labeled IPsec, to be able toanswer authoritatively.Given that iptun in onnv-gate has the implicit side effect of preservingdb_credp hence the label from the original ICMP error, it makes sense todo the same by explicitly passing the ts_label_t to the iptun_icmpfunctions.

* 2677: The first argument is always a conn_t *, so there appears to
  be no reason to pass in void * that is then set to a conn_t *.

conn_recv was introduced by FireEngine - conn_recvicmp just follows whatconn_recv started - it doesn't make sense to have them be different.And I've exceeded the quota for how much of FireEngine I'm allowed toclean up ;-)

* 2677: The third argument is always NULL, so why have a third argument?

It is non-NULL for tcp, just like conn_recv has a non-NULL thirdargument for TCP.

* 2769: The first argument is always a conn_t *, so let's not use void *.


See above.

* 2769: The third argument is always NULL, let's remove it.


See above.

* 2781: Can we use ira_ip_hdr_len to make this faster and simple?

It will be a bit faster at the expense of duplicated code. Theduplication would be to have

        outer_hlen = ira->ira_ip_hdr_length;
        if (ira->ira_flags & IRAF_IS_IPV4) {
                outer4 = (ipha_t *)data_mp->b_rptr;
                outer6 = NULL;
        } else {
                outer4 = NULL;
                outer6 = (ip6_t *)data_mp->b_rptr;
        }
        if (MBLKL(data_mp) == outer_hlen) {
                inner_mp = data_mp->b_cont;
                ipha = (ipha_t *)inner_mp->b_rptr;
        } else {
                inner_mp = data_mp;
                ipha = (ipha_t *)(data_mp->b_rptr + outer_hlen);
        }
        switch (IPH_HDR_VERSION(ipha)) {
        case IPV4_VERSION:
                if (inner_mp->b_wptr - (uint8_t *)ipha < sizeof (ipha_t))
                        goto drop;
                inner4 = ipha;
                inner6 = NULL;
                break;
        case IPV6_VERSION:
                if (inner_mp->b_wptr - (uint8_t *)ipha < sizeof (ip6_t))
                        goto drop;
                *inner4 = NULL;
                *inner6 = (ip6_t *)ipha;
                break;
        default:
                goto drop;
        }
thus a subset of what iptun_find_headers does.
Is that what you want?

* 2804: I believe the answer is yes, but danmcd should confirm.  In
  the old world, I believe an ipsec_in_t M_CTL would be pre-pended
  even on a clear-text packet if there was global policy present.
  This would cause ipsec_tun_inbound() to be called on such packets.


I'll check with Dan.

  One would thus assume that IRAF_IPSEC_SECURE should be set if there
  was global policy present in the ip input path(?).

That doesn't seem like the natural semantics of a flag that says SECURE- it shouldn't be set if the packet was received in the clear.

Perhaps an IRAF_IPSEC_HAS_GLOBAL_POLICY would make sense.

* 2989: Don't we lose the db_credp here?

Yes, intentionally. db_credp in the new world is only needed for certainmessages going to sockfs for getpeerucred(), and in the case of TX forcertain messages going to rpcmod.

Using db_credp less means less overloaded semantics.

* 3183,3276: We call iptun_find_headers() twice in a row for 6to4
  output processing.

The alternative would be to pass 5 more arguments (outer{4,6},inner{4,6} and outer_hlen) to iptun_output_common(). Is that better?

* 3246-3259: Can this be done before calling iptun_output_common() so
  that iptun_output_common() becomes a tail-call?


ixa_refrele() can't be done until we are done with the ixa.

If you think the extra stack frame makes a big difference I can apply apragma inline.

* 3297: I'm thinking that it might make sense to unconditionally
  include the labeling overhead in the link MTU if
  is_system_labeled().  This would simplify things a bit I think and
  would reduce the round trips necessary to make pmtu converge on
  labeled systems.  This isn't new with your changes, but just a
  fleeting thought.

The labeling overhead is not constant; different 6to4 destinations canhave CIPSO and not have CIPSO.I don't know if it would make sense to always subtract the maximum sizeof a CIPSO option on a labeled system. Was that what you had in mind?


   Erik
_______________________________________________
networking-discuss mailing list
[email protected]

Re: [networking-discuss] Start of code review for IP datapath refactoring

Reply via email to