On Sun, 2009-09-27 at 14:01 -0700, Erik Nordmark wrote:
> Sebastien Roy wrote:
>
> > usr/src/uts/common/inet/iptun/iptun.c
> > * 759: This looks like it belongs in conn_get_ixa() as there is only
> > one possible value for the ixa_cred of the ixa returned by
> > conn_get_ixa(), and that is conn_cred.
>
> That code isn't needed; too much copy and paste. (It only applies to UDP
> where we want to track the cred for each send* so that SCM_UCRED can be
> used on the receive side.)
> However, re-reading the TX code I realize we need to always restart with
> ixa_tsl = crgetlabel(ixa_cred) before we call tsol_check_dest(), but
> that code belongs in ip_set_destination_*() where we call tsol_check_dest().
Okay.
>
> > * 761-769: This also looks like something that should be in
> > conn_get_ixa().
>
> Ditto. (it only applies to UDP ancillary data where the label change
> for each packet due to SCM_UCRED).
Okay.
>
> > * 768: Won't we need to set ixa_tsl on a per-packet basis?
>
> Yes, we do that by calling tsol_check_label_* in the output path. But to
> do that we need the initial label which is based on the zone.
Got it.
> > * 862: This can't happen because we call iptun_unbind() before
> > iptun_bind() if the tunnel is already bound. I suppose you could
> > optimize things to make the iptun_unbind() unnecessary if you want.
>
> I'll remove 862.
Perhaps an ASSERT() as well?
> > * 870,872: We should just have an ipcl_conn_insert() that calls
> > ipcl_conn_insert_v[4|6]() depending on conn_ipversion.
> > tcp_conn_request() and conn_connect() both have the same little
> > if/else dance.
>
> OK, but for uniformity it makes sense to do the same for ipcl_bind_insert.
Yes.
> > * 1795: There are some places in the code where ip_get_pmtu() is
> > called with an argument of connp->conn_ixa directly without first
> > getting a safe reference using conn_get_ixa() (in icmp_icmp_input()
> > for example). Is that deliberate?
>
> We only call ip_get_pmtu() in one place, but there are places where we
> use conn_ixa directly. However, currently that is only in iptun_create().
> Where did you see it be used?
The question was more about the usage in icmp_icmp_input(), where I had
noticed conn_ixa directly. Is the usage there safe?
> >
> > * 1976: What is this call to ip_attr_connect() about?
>
> If ixa_ire was NULL we got a safe copy.
>
> But I think I'll change the ixa_notify upcall to pass up the
> ip_xmit_attr_t; that way we will have the currently used ip_xmit_attr_t
> and can modify it directly which avoids the conn_get_ixa in
> iptun_get_maxmtu.
Good idea.
> > * 2093: The ICMP error doesn't need to have the same cred as the
> > original packet?
>
> I read RFC 791 and there aren't credentials for IP packets ;-)
>
> What is the purpose of the cred on the packet?
> We use them for getpeerucred() and SCM_UCRED. They have also been
> overloaded to carry a label, but with refactoring we have the explicit
> ixa_tsl for the label.
I'm simply concerned that the TX label associated with the ICMP error
will be correct. If your code ensures that, then that's fine.
> > * 2150,2200: The cred associated with the conn_t won't necessarily be the
> > same as the cred associated with the data. The label we want in the
> > packet is the one associated with the data, right? Labels used to
> > be computed using dblk creds...
>
> I was hoping to already have merged with labeled IPsec, to be able to
> answer authoritatively.
> Given that iptun in onnv-gate has the implicit side effect of preserving
> db_credp hence the label from the original ICMP error, it makes sense to
> do the same by explicitly passing the ts_label_t to the iptun_icmp
> functions.
Okay.
> > * 2677: The first argument is always a conn_t *, so there appears to
> > be no reason to pass in void * that is then set to a conn_t *.
>
> conn_recv was introduced by FireEngine - conn_recvicmp just follows what
> conn_recv started - it doesn't make sense to have them be different.
> And I've exceeded the quota for how much of FireEngine I'm allowed to
> clean up ;-)
I don't see why they couldn't be different, but okay, it's your call.
> > * 2677: The third argument is always NULL, so why have a third argument?
>
> It is non-NULL for tcp, just like conn_recv has a non-NULL third
> argument for TCP.
I missed that.
> > * 2781: Can we use ira_ip_hdr_len to make this faster and simple?
>
> It will be a bit faster at the expense of duplicated code. The
> duplication would be to have
> outer_hlen = ira->ira_ip_hdr_length;
> if (ira->ira_flags & IRAF_IS_IPV4) {
> outer4 = (ipha_t *)data_mp->b_rptr;
> outer6 = NULL;
> } else {
> outer4 = NULL;
> outer6 = (ip6_t *)data_mp->b_rptr;
> }
> if (MBLKL(data_mp) == outer_hlen) {
> inner_mp = data_mp->b_cont;
> ipha = (ipha_t *)inner_mp->b_rptr;
> } else {
> inner_mp = data_mp;
> ipha = (ipha_t *)(data_mp->b_rptr + outer_hlen);
> }
> switch (IPH_HDR_VERSION(ipha)) {
> case IPV4_VERSION:
> if (inner_mp->b_wptr - (uint8_t *)ipha < sizeof (ipha_t))
> goto drop;
> inner4 = ipha;
> inner6 = NULL;
> break;
> case IPV6_VERSION:
> if (inner_mp->b_wptr - (uint8_t *)ipha < sizeof (ip6_t))
> goto drop;
> *inner4 = NULL;
> *inner6 = (ip6_t *)ipha;
> break;
> default:
> goto drop;
> }
> thus a subset of what iptun_find_headers does.
> Is that what you want?
Not quite, I was thinking that iptun_find_headers() could take an
outer_hlen as input if that was already a known quantity. That would
make the IPv6 input path a tad faster in the face of extension headers,
which there are by default (the encapsulation limit option).
> > * 2804: I believe the answer is yes, but danmcd should confirm. In
> > the old world, I believe an ipsec_in_t M_CTL would be pre-pended
> > even on a clear-text packet if there was global policy present.
> > This would cause ipsec_tun_inbound() to be called on such packets.
>
> I'll check with Dan.
Okay.
>
> > One would thus assume that IRAF_IPSEC_SECURE should be set if there
> > was global policy present in the ip input path(?).
>
> That doesn't seem like the natural semantics of a flag that says SECURE
> - it shouldn't be set if the packet was received in the clear.
> Perhaps an IRAF_IPSEC_HAS_GLOBAL_POLICY would make sense.
That's fine with me.
>
> > * 2989: Don't we lose the db_credp here?
>
> Yes, intentionally. db_credp in the new world is only needed for certain
> messages going to sockfs for getpeerucred(), and in the case of TX for
> certain messages going to rpcmod.
> Using db_credp less means less overloaded semantics.
Got it.
>
> > * 3183,3276: We call iptun_find_headers() twice in a row for 6to4
> > output processing.
>
> The alternative would be to pass 5 more arguments (outer{4,6},
> inner{4,6} and outer_hlen) to iptun_output_common(). Is that better?
I suppose not. I thought that perhaps this could be fixed by
reorganizing the calling chain, but it doesn't look that simple.
>
> > * 3246-3259: Can this be done before calling iptun_output_common() so
> > that iptun_output_common() becomes a tail-call?
>
> ixa_refrele() can't be done until we are done with the ixa.
>
> If you think the extra stack frame makes a big difference I can apply a
> pragma inline.
Not a big deal; if it's not a trivial fix, I wouldn't go out of my way
for the stack frame savings.
>
> > * 3297: I'm thinking that it might make sense to unconditionally
> > include the labeling overhead in the link MTU if
> > is_system_labeled(). This would simplify things a bit I think and
> > would reduce the round trips necessary to make pmtu converge on
> > labeled systems. This isn't new with your changes, but just a
> > fleeting thought.
>
> The labeling overhead is not constant; different 6to4 destinations can
> have CIPSO and not have CIPSO.
> I don't know if it would make sense to always subtract the maximum size
> of a CIPSO option on a labeled system. Was that what you had in mind?
Yes (this is what IPsec overhead is; a maximum possible overhead), but I
admittedly don't know enough about TX labeling to know if it makes sense
to do something similar. An RFE for another day.
Thanks,
-Seb
_______________________________________________
networking-discuss mailing list
[email protected]