Darren Reed wrote:
> On 27/07/10 04:38 AM, James Carlson wrote:
>> Darren Reed wrote:
>>> Finally, given that you feel so strongly about this and given that
>>> this is OpenSolaris, feel free to file a bug in bugzilla along with
>>> a new design and code that fixes this issue. Nothing speaks louder
>>> in an open source community than contributions of new working code.
>>
>> The only "new design" I would offer would be to back this errant fix out
>> of the gate.  The code was better the way it was before, and the change
>> does not actually fix any real problem that anyone encountered.  It's
>> gratuitous, and I'm surprised that a reviewer didn't object.
>>
>> Given that this is OpenSolaris, I was hoping that we could have a
>> meaningful conversation about this change on a mailing list that's
>> intended for that purpose.  Obviously, that's not to be, because I'm
>> instead getting weird demands that I name other "multitasking operating
>> systems" that implement standardized protocols.
> 
> Let me summarise it like this: there was a feature present that
> nobody used and if they did, they would need to reboot their
> system after they did so in order to restore proper operation.
> Is that the kind of feature we need/want in Solaris?

Given that we have in fact used this feature as part of SolarMAX stress
testing, I very much doubt that this is a complete accounting of the issues.

The only thing that's really affected by this problem is the old BSD
routing socket interface.  In the case of adding a route, you can always
use the interface name instead of the ifIndex (sockaddr_dl supports
both), and that works fine.  In the case of receiving a "surprising"
truncated rtm_index value, you can do the smart thing and validate all
potentially matching interface information you've got, as dhcpagent
already does by doing a comparison under a 0xffff mask.
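
As a rough illustration of that masked comparison (a sketch only, not the
actual dhcpagent code; the helper names and the sample interface names and
index values are made up for this example):

```python
# Hypothetical helpers for illustration; not taken from dhcpagent.
RTM_INDEX_MASK = 0xffff  # the routing-socket index is truncated to 16 bits


def rtm_index_matches(rtm_index, if_index):
    """True if a truncated rtm_index could refer to the full 32-bit if_index."""
    return (rtm_index & RTM_INDEX_MASK) == (if_index & RTM_INDEX_MASK)


def candidate_interfaces(rtm_index, known):
    """Return the names in {name: if_index} that match under the mask, sorted."""
    return sorted(name for name, idx in known.items()
                  if rtm_index_matches(rtm_index, idx))
```

With known = {"ip.tun0": 0x10003, "bge0": 2, "ppp0": 0x20003}, an rtm_index
of 3 matches both ip.tun0 and ppp0.  So the mask only narrows the set of
candidates, and the caller still has to check whatever other interface
information it holds.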

And the previous code had the nice property that the 32-bit index
numbers (which _are_ widely used both in ON code and in other projects
like Quagga) are "almost never" reused, because even a busy system is
unlikely to get 2^32 plumb/unplumb sequences in a single OS instance,
while 2^16 plumb/unplumb events are certainly not impossible, and are
even downright likely if you have some kind of dynamic process
involved -- such as with tunnels or PPP.
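
To put rough numbers on that, here is a back-of-the-envelope sketch.  It
assumes index values are handed out sequentially and never reused before
the space wraps, and it assumes a sustained rate of one plumb/unplumb
event per second.  Both are assumptions chosen for illustration, not
measurements or a description of the actual allocator:

```python
SECONDS_PER_DAY = 86400.0


def seconds_until_wrap(index_bits, events_per_second):
    """Seconds of sustained events before an index space of this width is used up."""
    return (2 ** index_bits) / events_per_second


def days_until_wrap(index_bits, events_per_second):
    """Same quantity as seconds_until_wrap, expressed in days."""
    return seconds_until_wrap(index_bits, events_per_second) / SECONDS_PER_DAY
```

At one event per second, days_until_wrap(16, 1) comes to about 0.76 days
(roughly 18 hours), while days_until_wrap(32, 1) comes to about 49,710
days, which is well over a century.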

And, to the best of my knowledge, no application was ever
hindered in any particular way by the existing (albeit admittedly "bad")
BSD compatibility interfaces.  This means the fix wasn't really fixing
any specific problem, but rather addressing some perceived issue.

And the now more likely roll-over event is toxic to SNMP, because it
violates a constraint of the Interfaces MIB, so there's really no way of
knowing how wide the blast radius might be.

Thus my argument is simple: there is no need to "fix" this old interface
by breaking an existing interface.

If there's some compelling need to address the deficiencies in the old
BSD interfaces, I believe those should be addressed head-on rather than
hobbling the underlying mechanisms to fit.  This would mean figuring out
a good way to version the existing interface and addressing the numerous
defects in this area -- including the notable lack of sockaddr.sa_len on
Solaris and truncated (16-bit-only) rtm_flags.

Forcing the underlying ifIndex numbers to stay in a 16-bit range is no
real fix at all, because it addresses only a tiny part of the problem,
and a part that (at least historically) has not really been all that
interesting or limiting.

-- 
James Carlson         42.703N 71.076W         <[email protected]>
_______________________________________________
networking-discuss mailing list
[email protected]
