Nils Goroll wrote:
> Hi Erik and all,
>
> I am sorry for this very late comment; I should have replied during
> the review phase, but I didn't get to it. It was only today that I got
> reminded by the commit notification and read through the PSARC emails
> and some of the discussion on networking-discuss.
Thanks for your questions and comments.
> First of all, two questions on the new source IP selection:
>
> - Will it still be possible to use routing entries to force use of a
>   certain source address by destination? I've seen a couple of
>   installations making use of this.
What type of configuration is this?

Let me venture a guess. You have two (or more) IP subnet prefixes
assigned to a wire and the router(s) have an IP address in each subnet.
Then the host has something like
    bge0   10.1.0.33/24
    bge0:1 10.2.0.33/24
with 10.1.0.1 and 10.2.0.1 being the routers' IP addresses.
Then you could have some statically added routes point at 10.1.0.1 and
some point at 10.2.0.1, and that would affect the source address
selected. Is that close to the configuration?
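For concreteness, I'd expect such a setup to use something like (the
destination prefixes here are made up):

    route add -net 192.168.10.0 -netmask 255.255.255.0 10.1.0.1
    route add -net 192.168.20.0 -netmask 255.255.255.0 10.2.0.1

with the expectation that traffic to the first prefix picks 10.1.0.33
as its source address and traffic to the second picks 10.2.0.33.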
The fact that it worked this way in S10 (and until recently in Nevada)
is more or less an accident. In fact, if you configure IPMP you'd find
that the source addresses (on bge0 and the other NICs in the IPMP
group) are selected in a round-robin fashion.
With the IP datapath refactoring it behaves the same way whether or not
IPMP is configured.
Part of the issue we had to fix was this confusion between routing and
source address selection. IP routing selects a nexthop (the IP address
of the router) and an outbound (physical) interface. That outbound
interface is then used to find a candidate set of IP addresses. There
aren't RFCs that prescribe this for IPv4, since almost everybody did it
based on the BSD source base 20 years ago. For IPv6 it is actually part
of the RFC set. And none of that talks about any "logical interfaces"
like we have in Solaris.
It would be good to know what external factors drove the configuration
in this direction, to see what other approaches we have available. In
the case of shared-IP zones things are different, since we have the
added constraint that the source address be assigned to the zone.
> - Will explicit source address selection via in_pktinfo_t still work?
>   I am looking after a couple of customer cases descending from a
>   (very) old case regarding source address selection for RPC over UDP
>   with (pre-Clearview) IPMP, and I would very much like to see them
>   get closed one day (greetings to everyone involved in those cases,
>   you'll know what I am referring to).
Yes, the IPv4 and IPv6 pktinfo socket options can be used to set the
source address. So can bind(), and IP_MULTICAST_IF for IPv4 multicast.
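To make that concrete, here is a minimal sketch (my own illustration,
not code from the putback; error handling elided) of a UDP sender that
requests a particular IPv4 source address per datagram by attaching
IP_PKTINFO ancillary data; ipi_spec_dst carries the desired source:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <string.h>

    /*
     * Send one datagram to dst, asking IP to use src as the source
     * address. A zero ipi_ifindex leaves the outbound interface
     * choice to the usual routing lookup.
     */
    static int
    send_with_source(int fd, const struct sockaddr_in *dst,
        const void *buf, size_t len, struct in_addr src)
    {
            struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
            union {                         /* aligned cmsg buffer */
                    struct cmsghdr align;
                    char space[CMSG_SPACE(sizeof (struct in_pktinfo))];
            } cbuf;
            struct msghdr msg;
            struct cmsghdr *cmsg;
            struct in_pktinfo *pki;

            (void) memset(&msg, 0, sizeof (msg));
            (void) memset(&cbuf, 0, sizeof (cbuf));
            msg.msg_name = (void *)dst;
            msg.msg_namelen = sizeof (*dst);
            msg.msg_iov = &iov;
            msg.msg_iovlen = 1;
            msg.msg_control = cbuf.space;
            msg.msg_controllen = sizeof (cbuf.space);

            cmsg = CMSG_FIRSTHDR(&msg);
            cmsg->cmsg_level = IPPROTO_IP;
            cmsg->cmsg_type = IP_PKTINFO;
            cmsg->cmsg_len = CMSG_LEN(sizeof (struct in_pktinfo));
            pki = (struct in_pktinfo *)CMSG_DATA(cmsg);
            pki->ipi_spec_dst = src;        /* requested source address */

            return (sendmsg(fd, &msg, 0) == -1 ? -1 : 0);
    }

The IPv6 equivalent uses IPV6_PKTINFO with an in6_pktinfo, where
ipi6_addr names the source.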
> I would like to mention that a long time ago I had to debug weird
> behavior of a certain firewall HA solution which used multicast MAC
> addresses for unicast IP addresses in order to achieve load spreading
> and uninterrupted service by having hosts send all traffic to all
> nodes of a cluster (the interfaces of the nodes were members of the
> same multicast group, and ARP requests for the unicast "cluster IP
> address" were answered with the MAC address corresponding to that
> multicast IP address).
>
> This certainly is quite an exotic case (and an approach which, to put
> it kindly, I didn't find particularly clean) and I am not sure whether
> such applications still exist, but it might be relevant to check
> whether the refactored code handles this properly.
While I'm not sure it is well-defined what "properly" means here
(sending unicast IP packets to a multicast MAC address is very
questionable), I'm not aware of any changes we've made to ARP that
would reject such things.
> IIRC, I've used the now discontinued behavior as a simple and flexible
> configuration on GigE networks where jumbo frames were supported by
> *some* hosts only. By using logical addresses on the server with
> different MTUs, clients could select between "default MTU" and "jumbo
> frame enabled" server addresses, which was really useful for
> optimizing performance for UDP-based applications (I am generalizing;
> I only ever used this for NFS/UDP).
>
> From an administrator's standpoint, the advantage of this
> configuration was that no client-IP-specific configuration was
> involved, as there would be with setting routes with -mtu.
So you would have two separate IP subnets on the same wire, and each
client would somehow be configured to know that bge0 has an MTU of 9k
and bge0:1 has an MTU of 1500?

> I am aware that the particular customer installation I am talking
> about here is not clean in the sense that, if jumbo frames are used on
> a broadcast domain they should be supported by all devices, but
> sometimes it is hard to convince users to implement a clear design
> when what they have is just working for them and changing things would
> imply significant additional cost.
With those configurations, does broadcast and multicast of large packets
actually work?
I would suggest you try
ping -sn 224.0.0.1 8192
ping -sn ff02::1 8192
ping -sn -i bge0 255.255.255.255 8192
and make sure that all the systems on the Ethernet respond.
For the above problem of wanting to selectively use jumbo frames: since
you already have separate IP subnet prefixes, why not just run those IP
subnets on separate VLANs? That would ensure that multicast and
broadcast work as expected.
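As a sketch (the VLAN ID and link name here are made up), with Crossbow
that could be as simple as:

    dladm create-vlan -l bge0 -v 100 jumbo0
    ifconfig jumbo0 plumb 10.2.0.33/24 up

which keeps the jumbo-frame subnet in its own broadcast domain.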
> My questions are:
>
> * Could you please explain why, in the former implementation, it was
>   not well defined which MTU was applied to multicast packets?
Try the above pings and let me know how it goes.
The issue that is obvious for multicast and for the 255.255.255.255
broadcast is that these packets are routed out a physical interface,
since there is no correlation between the destination IP address and
any IP subnet prefix.
But as I mentioned above, the IP architecture that everybody seems to
assume for IPv4, and that is written down for IPv6, is about routing
packets out (physical) interfaces. This can be found deep down in the
MIB RFCs that describe how routes are represented. The Solaris notion
of logical interfaces doesn't fit into those descriptions of routing.
> * Why would it cause any harm to keep an interface MTU for logical
>   interfaces?
It causes confusion, and is by definition incomplete and unworkable as
shown by the case of multicast and all-ones broadcast.
> My understanding is that
>
> - for unicast packets, the effective MTU would be the minimum of the
>   MTUs of the logical interface, the physical interface, the
>   destination ire, and the destination IP
> - for multicast packets, the effective MTU would be the minimum of the
>   MTUs of the logical interface and the physical interface
The issue is that packets are routed out physical network interfaces.
Which logical interface does the 224.0.0.1 address belong to?
Erik