Re: Article on spammers and their infrastructure

2009-12-24 Thread Jon Lewis

On Tue, 22 Dec 2009, Leo Vegoda wrote:


  ASSIGNED PA: This address space has been assigned to an End User for use
  with services provided by the issuing LIR. It cannot be kept when
  terminating services provided by the LIR.

My interpretation of the above is ASSIGNED PA is the equivalent of my
assigning IP space to a customer who either buys transit (connectivity)
from us or colo's or buys server hosting from us where they will use that
IP space.  We don't simply lease out IP space for customers to use as
they please on other networks.


I am sure that your interpretation was the original intent of the policy
text. However, the wording could also be read in a way that allows an LIR to
just provide registry services, without providing any connectivity services.


That's one hell of a stretch.  Registry services aren't needed if they 
don't have the IP space, so saying that the service the end user is buying 
that justifies the IP assignment is 'registration services' is a circular 
argument.


--
 Jon Lewis   |  I route
 Senior Network Engineer |  therefore you are
 Atlantic Net|
_ http://www.lewis.org/~jlewis/pgp for PGP public key_



Re: Ipsec/VRF Mpls ?

2009-12-24 Thread Kenny Sallee
Hello Stephane - if you search Google for "VRF aware IPsec" you will find
links and relevant information and configs.

I did this on older hardware by creating an IPsec tunnel between two routable
loopbacks and creating a GRE tunnel that used the loopbacks as tunnel
source and destination.  Then place the GRE tunnel in a VRF.

Kenny

On Fri, Dec 18, 2009 at 11:03 PM, Stephane MAGAND stmagconsult...@gmail.com
 wrote:

 Hi

 After a first post with zero answers (thanks a lot ..) I am trying a second
 post to get a little help.

 I am looking for a simple sample configuration for a Cisco 2821 to connect
 IPsec routers to an MPLS IP VPN backbone.

 My Cisco 2821 has two interfaces, one connected to my MPLS network
 and the second to the Internet.

 I created two VRFs, one for site-to-site and the second for remote user
 access.

 Does anyone have a config for this? I have never actually used IPsec on
 Cisco.

 The site-to-site routers are C1721s, and remote users use the Cisco IPsec
 client. I want RADIUS authentication (and it is the RADIUS server that
 sends the VRF).

 thanks for your help
 Stephane



Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Eddy Martinez
On Dec 24, 2009, at 9:51 AM, Randy Bush wrote:

 I'm more persistent than smart, and I tell ya, if you prep well
 enough, you can hand your checklist to a stoned intern and you'll
 have no worries at all.
 
 this works in a tech culture where folk follow mops obsessively.  my
 experience is that most north american engineers are too smart to do
 that, and take shortcuts.
 
 randy
 

Being a North American Engineer, I resent that remark.  =]

I _do_ create action plans and _do_ quarterback each step and _do_ slap down 
any attempt to deviate. 


Eddy





Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Randy Bush
 I _do_ create action plans and _do_ quarterback each step and _do_
 slap down any attempt to deviate.

imagine a network engineering culture where the concept of 'attempt to
deviate' just does not occur.

randy



Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Eddy Martinez
On Dec 24, 2009, at 10:09 AM, Randy Bush wrote:

 I _do_ create action plans and _do_ quarterback each step and _do_
 slap down any attempt to deviate.
 
 imagine a network engineering culture where the concept of 'attempt to
 deviate' just does not occur.
 
 randy


=]

The networking group is under control. 

It's the software engineers that start making edits to configs and code on the 
fly, improvisation at its finest. I guess my scope of interaction is greater 
than just networking. The hard part is that it's a peer situation: how do you 
elevate the members of another team who have a lesser standard of operation? 
Also, they feel it's fine to act like a cowboy and tackle problems on the fly, 
as long as the product is live before the window closes. Then there is the 
almighty "We can't back out, we already made too many changes" that makes me 
want to grab rope and attach it to the ceiling. 

Have a Merry Christmas, 
Eddy 






Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Jim Shankland

Eddy Martinez wrote:

On Dec 24, 2009, at 10:09 AM, Randy Bush wrote:


I _do_ create action plans and _do_ quarterback each step and _do_
slap down any attempt to deviate.

imagine a network engineering culture where the concept of 'attempt to
deviate' just does not occur.


I find the thought of *any* culture in which attempts to deviate
just do not occur a little unnerving.

Jim Shankland

http://blog.oliver-gassner.de/archives/225-Guenter-Eich,-Traeume.html



Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread David Andersen
On Dec 24, 2009, at 1:09 PM, Randy Bush wrote:

 I _do_ create action plans and _do_ quarterback each step and _do_
 slap down any attempt to deviate.
 
 imagine a network engineering culture where the concept of 'attempt to
 deviate' just does not occur.

Are you trying to suggest that this is something horrible, or that it's the 
future of network engineering? :)

I'm actually serious in asking the question, despite the grin.

  -Dave


Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Randy Bush
 imagine a network engineering culture where the concept of 'attempt to
 deviate' just does not occur.
 
 Are you trying to suggest that this is something horrible, or that
 it's the future of network engineering? :)

neither.  it is one [type of] ops engineering culture, and a very
successful one.  it seems, from this gaijin's naive point of view, to be
the common one in japan.

when i try to 'sell' configuration automation, they are confused by how
important it is to me.  they have a hard time seeing the need because
mops just work.  my read is that this is because people do not have the
arrogance to take shortcuts.  

when one is raised knowing that one's responsibility to the group is
more important than how smart one may think that one is, mops work.

randy



Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Dave Israel



I _do_ create action plans and _do_ quarterback each step and _do_
slap down any attempt to deviate.
  

imagine a network engineering culture where the concept of 'attempt to
deviate' just does not occur.



Are you trying to suggest that this is something horrible, or that it's the 
future of network engineering? :)

I'm actually serious in asking the question, despite the grin.
  


Possibly, he is trying to hint at a connection with Nazis, so somebody 
will mention it, invoking Godwin's Law, and bringing a fruitless 
religious thread to a close.


There's a full range of methods, with "just do it" on one side, 
"deviation is terms for dismissal" on the other, and plenty of shades of 
gray in between.  I've seen both extremes result in excessive downtime. 
(How impromptu engineering can go wrong shouldn't take much imagination; 
the "no deviation" rule is especially hysterical when the backout plan 
doesn't work, but even without that, the "one thing didn't work exactly 
right, back it out and try again in two weeks" effect is destructive to 
both progress and morale.)  Working with the dynamic and quality of the 
team is more important than any change management paradigm.


-Dave


Re: Article on spammers and their infrastructure

2009-12-24 Thread Jon Lewis
Wouldn't that be kind of pointless?  ARIN policies are proposed by the 
public, not ARIN staff or board members.


https://www.arin.net/policy/pdp.html

 Policy proposals may be submitted by anyone in the global Internet
 community except for members of the ARIN Board of Trustees or the ARIN
 staff.

On Wed, 23 Dec 2009, O'Reirdan, Michael wrote:


JD

Great point, I am more than happy to have a couple of people from ARIN or
RIPE as guests at the next MAAWG in SFO or the subsequent one in Barcelona.

Mike


On 12/23/09 1:18 PM, J.D. Falk jdfalk-li...@cybernothing.org wrote:


On Dec 22, 2009, at 11:58 PM, Christopher Morrow wrote:


On Wed, Dec 23, 2009 at 1:12 AM, Paul Ferguson fergdawgs...@gmail.com wrote:


Folks should not be so obtuse about these activities. It's almost blatantly
in-your-face, so to speak. These guys have no fear of retribution.


no real argument, but... 'please provide some set of workable solutions'

The ARIN meetings (at least) are open, please come and help guide
policies. I'm sure RIPE also wouldn't mind a discussion, if there
could be some positive policy outcome.


Rather than expecting anti-spam researchers to lobby at ARIN and RIPE meetings,
perhaps ARIN and RIPE representatives could visit anti-spam meetings such as
MAAWG to ask how they can help?

I'd be happy to make some introductions.

--
J.D. Falk jdf...@returnpath.net
Return Path Inc










--
 Jon Lewis   |  I route
 Senior Network Engineer |  therefore you are
 Atlantic Net|
_ http://www.lewis.org/~jlewis/pgp for PGP public key_



Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Scott Weeks

: this works in a tech culture where folk follow mops obsessively.  my
: experience is that most north american engineers are too smart to do
: that, and take shortcuts

 and _do_ slap down any attempt to deviate

: imagine a network engineering culture where the concept of 'attempt to
: deviate' just does not occur

 the network group is under control


Hopefully, at least some of that was tongue-in-cheek.

For managers: saved LOTS of dollars when deviating from MoPs by fixing AFU 
things not thought of in the MoP.

For fellow netgeeks:  no one woke you up because the AFU things were fixed 
while you slept.

scott



Re: IPv6 allocations, deaggregation, etc.

2009-12-24 Thread Scott Leibrand

On 12/23/2009 12:31 AM, George Bonser wrote:

Apologies in advance for the top post.
   


Likewise.  These are general comments, though, so I don't feel too 
badly...  :-)


It sounds like you're on the right track.  You discovered the 2009-5 
Multiple Discrete Networks draft policy, which should allow you a 
separate /48 for each discrete network.  That is somewhat orthogonal to 
the question of whether you should get separate resources from each RIR 
whose region you operate a network in.  If the networks on different 
continents are discrete, I think the answer there is yes.


I'll also point out another resource for discussing topics like this, 
particularly if it appears that a change in policy would be needed to 
accommodate your needs: ARIN's Public Policy Mailing List (PPML), 
https://www.arin.net/participate/mailing_lists/index.html.  That's where 
2009-5 came from, and I know there are still some needs unmet by current 
ARIN IPv6 address policy, so we're always looking for more good ideas, 
and feedback on the ones being discussed.  At the moment, there are some 
very interesting discussions ongoing about how to rewrite ARIN IPv6 
address policy to simplify it while making provider independent 
addressing more widely available and making it easier to filter traffic 
engineering deaggregates without accidentally filtering multihomed 
networks.  And on the IPv4 side, there are two policy proposals on the 
docket to lower ARIN's minimum allocation size to /23 or /24.


I encourage anyone on this list who's interested in these topics to 
browse the PPML archives, look over the full list of active draft 
policies and policy proposals at https://www.arin.net/policy/proposals/, 
and subscribe to PPML.  We need all the input we can get.


Thanks,
Scott Leibrand
elected volunteer member of the ARIN Advisory Council, but speaking only 
for myself





My initial idea was to use a /48, divide it up into /56 nets for each facility 
with /64 subnets within each facility.  We would announce a /48 to our transit 
providers that I would expect them to announce in turn to their peers and we 
would also announce the more specific /56 nets to the transit providers that I 
would expect them not to announce to their peers.  My current vlan requirements 
per facility would support such an addressing plan.  In order to make that 
work, we would need the same transit providers in each region as our locations 
are not meshed internally.  We don’t have dedicated connectivity from the US to 
the UK or China, for example.  Currently that is not a problem as far as 
connectivity is concerned as my US providers appear in Europe and my China 
provider appears in the US. BUT when I consider the possibilities of South 
America and Africa and finding a transit provider that has a robust presence 
everywhere, my choices are very limited.  I need to be multihomed and I need to 
be provider agnostic in my addressing.
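
The carving described above (one /48, a /56 per facility, /64 subnets inside each facility) can be sketched with Python's standard ipaddress module. This is only an illustration: 2001:db8::/48 is taken from the IPv6 documentation range and stands in for a real assignment, and the variable names are made up for the example.

```python
import ipaddress
from itertools import islice

# Assumption: 2001:db8::/48 (documentation space) stands in for the real /48.
regional = ipaddress.ip_network("2001:db8::/48")

# A /48 holds 2**(56-48) = 256 blocks of size /56, one or more per facility.
facility_blocks = list(regional.subnets(new_prefix=56))

# Each /56 in turn holds 256 /64 subnets, for example one per VLAN.
first_facility = facility_blocks[0]
first_vlans = list(islice(first_facility.subnets(new_prefix=64), 3))

print(len(facility_blocks))            # number of /56 blocks in the /48
print(facility_blocks[1])              # second facility block
print([str(n) for n in first_vlans])   # first three /64s in facility 0
```

With 256 /56 blocks per /48 and 256 /64 subnets per /56, the plan leaves a lot of headroom for an organization with a few dozen facilities.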



Using that scheme above does create some potential performance issues. While my 
transit provider collects the traffic from a remote location and routes it to 
the more specific location in my network, if a provider in Europe, for example, 
sees only the /48 announced from the US, maybe they haul the traffic across an 
ocean to a point where they peer with my provider … who then must haul it back 
to Europe to the /56 corresponding to the destination because the original 
traffic source doesn’t see my /56 unless they are using the same transit 
provider I am.



Then based on earlier discussion on the list a while back, I was concerned that 
a /48 wasn’t even enough to get me connected to some nets that were apparently 
filtering smaller than a /48 but my mind is somewhat eased in that respect and 
I believe that a /48 announced from space where /48s are issued will be 
accepted by most people.



Then I was informed of ARIN 2009-5 which seems aimed at our situation; data 
centers widely separated by large geographical distances that are fairly 
autonomous and aren’t directly connected by dedicated links.  It now seems that 
we (and the rest of the Internet) might be better served if we get a RIPE AS 
and net block for our Europe operations, and APNIC AS and net block for our 
APAC operations and get a regional /48 that I can split into /56 nets for the 
various satellite facilities within that region as those satellite offices CAN 
be directly connected to the regional data center which would act as the 
regional communications hub.



There are probably 16 different ways to slice this but I would like to get it 
as close to “right” as possible to prevent us having to renumber later while at 
the same time not taking more space than we need.  A /48 per region seems like 
the right way to go at the present time.  So we would have a /48 for the US, a 
/48 for Asia (and possibly one /48 dedicated to China) and a /48 for Europe.  
Satellite facilities would collect a /56 (or two or three) out of that regional 
block 

Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Scott Weeks


flameproof panties == ON  :-)

:mops work.

It depends on who wrote it and the experience (on the particular network) 
of the person who generated it.

scott



The cost of nines

2009-12-24 Thread Eric Brunner-Williams

Hi all,

On the 7th of next month I'll be participating in an ICANN 
consultation on the proposed draft registry agreement, and the number 
of nines that have crept into it, relative to what was expected of 
new registry operators a decade ago, is one of the hidden cost 
increases I will discuss with ICANN's lawyers, who are responsible for 
the extra nines.


I'm looking for sources of cost-per-nine, network provisioning, and 
host provisioning, where host is usually a bunch of boxen, not just 
a pizza box.
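
For readers who want the arithmetic behind "nines", here is a rough sketch of how each extra nine shrinks the yearly downtime budget. It assumes a 365-day year and counts all downtime equally; the draft registry agreement may define its measurement windows differently, so treat the figures as illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (3 -> 99.9%)."""
    unavailability = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):10.2f} minutes/year")
```

Each additional nine cuts the budget by a factor of ten, which is why the cost of provisioning tends to climb steeply with every nine added to an SLA.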


The way the requirements are now, a startup of another .museum, say 
for libraries or archives, or a new .coop, or a new linguistic and 
cultural TLD, say a .scot, has to provide a higher level of performance 
than Verisign currently does for com/net/name, which is slightly 
absurd, if not worse.


I can cite sources, or not, as preferred, and while CORE is 
comfortable at any number ICANN's lawyers can come up with under the 
theory that more nines is what "security and stability" mean, my goal 
is to allow real startups, like .museum and .coop were in 2001, to avoid 
being forced to outsource registry operations to an already highly 
capitalized registry service provider, for competition policy reasons.


I'm also in the market for recent failure data, such as Ultra's 
yesterday, and Verisign's v6, not for competitive reasons, but to show 
that the SLA expectation of ICANN's lawyers may need modification if 
placed proximal to actual operational failure data.


Off-list or on, and thanks in advance, from my Yule tree to your own.

Cheers,
Eric





RE: IPv6 allocations, deaggregation, etc.

2009-12-24 Thread George Bonser
 -Original Message-
 From: Scott Leibrand
 
 It sounds like you're on the right track.  You discovered the 2009-5
 Multiple Discrete Networks draft policy, which should allow you a
 separate /48 for each discrete network.  That is somewhat orthogonal to
 the question of whether you should get separate resources from each RIR
 whose region you operate a network in.  If the networks on different
 continents are discrete, I think the answer there is yes.

The extent to which they are discrete is really more of a function of the 
partners those networks serve when it comes to the data centers.  While most of 
our partners are regional, that is more by happenstance than by design and I 
see it changing over time as more of them operate outside of their home 
region.  I also want to ensure a design that allows us to serve anyone from 
anywhere which further fuzzes how discrete each potentially is.  And this is 
actually the part where I am having the most trouble sorting the best practice. 
 There are some advantages to doing it either way.  I could get a /45 to handle 
everything.  Having a /45 would allow me to aggregate /48s where practical 
while obtaining individual /48 networks would not guarantee they would be in 
any sort of contiguous space and not likely allow me to aggregate them even 
where physically possible to do so.  
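
The aggregation point can be illustrated with a short sketch using Python's ipaddress module. The prefixes are assumptions drawn from the IPv6 documentation range: eight contiguous /48s collapse into a single /45, while /48s handed out from unrelated parts of the address space stay separate announcements.

```python
import ipaddress

# Assumption: documentation prefixes stand in for real assignments.
# Eight /48s carved out of one /45 (third hextet 0 through 7).
contiguous = [ipaddress.ip_network(f"2001:db8:{i:x}::/48") for i in range(8)]

# Three /48s obtained individually, not adjacent to each other.
scattered = [
    ipaddress.ip_network(p)
    for p in ("2001:db8:0::/48", "2001:db8:10::/48", "2001:db8:20::/48")
]

agg_contiguous = list(ipaddress.collapse_addresses(contiguous))
agg_scattered = list(ipaddress.collapse_addresses(scattered))

print([str(n) for n in agg_contiguous])  # a single covering prefix
print([str(n) for n in agg_scattered])   # still three separate prefixes
```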

One possible problem of using a US block globally is that someone might see a 
source address from me and assume it is originating in the US if they are using 
some sort of geolocation in order to direct service.  That might cause me to be 
directed to a sub-optimal service portal depending on who I am communicating 
with.

Getting blocks from the regions served seems to be the way that will cause less 
of a problem overall at the cost of ability to aggregate the blocks should the 
entire network become fully physically integrated at some point in the future.

 I'll also point out another resource for discussing topics like this,
 particularly if it appears that a change in policy would be needed to
 accommodate your needs: ARIN's Public Policy Mailing List (PPML),
 https://www.arin.net/participate/mailing_lists/index.html

Thanks for the pointer, Scott, I will have a look.

George




Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Michael Dillon
 imagine a network engineering culture where the concept of 'attempt to
 deviate' just does not occur.

 Are you trying to suggest that this is something horrible, or that it's the 
 future of network engineering? :)

The model of network engineering that grew up during the 1990s is forever
gone unless you work in a smaller organization where people have to wear
many hats. In the big ISPs, now identical to the big telcos, operations and
engineering design duties are separated. The operations folks do not
deviate from the written plans that they work with. If the slightest thing
happens that is not in the plan, they roll back the changes as specified in
the plan. They don't fix anything unless it is officially broken, with
trouble tickets filed and escalations up to senior management. That is
about the only time that operations people can get away with taking
shortcuts and creative solutions.

On the other hand, the engineering design folks should spend a good part of
their day trying out things, thinking up new ideas, poking around equipment
and software to see how far it can be pushed. Then, when they have learned
something and are ready to implement it in the network, they write a
detailed plan for operations. Then some other engineering folks test the
heck out of that design to try and find fault with it. After all the faults
are fixed, it goes to operations and the engineering design folks move on
to something else unless serious problems occur and operations needs a
design engineer to approve some sensible action to be taken. The operations
folk can't take the sensible action because that would deviate from their
plans, but getting engineering design folks involved gives them an out for
real emergencies.

So the term "network engineering" is ambiguous because a lot of people use
it to mean the 90's style job where engineering design activity and
operational activity were all jumbled together.

In some companies, taking the engineering design track not only means that
you lose enable on the routers, but you lose all TACACS access and have to
get authorisation from a VP just to ask for a copy of the running config on
a production router. Some people like ops because they see a lot of stuff
go by and learn from it, get their CCIE and move into design engineering.
Others like ops because they are scared of the responsibility for thinking
up what to do next, and making a mistake.

As far as I can see, the only way to get a job that mixes ops and design is
to be in 3rd or 4th level support, which is the top of the technical
escalation chain where a few excellent design engineers do have enable on
the routers because they fix important problems in near realtime. I suspect
that it would be advantageous to have a career in which you worked for a
while in ops before moving into design engineering if you want to get into
top-level support.

Take all this with a grain of salt. Every company does things a bit
differently, and the terminology that is used is ambiguous. It would be
interesting to see what others have to say about this answer.

--Michael Dillon



Re: IPv6 allocations, deaggregation, etc.

2009-12-24 Thread Michael Dillon
 I can't in good conscience justify a /32.  That is just too much space.

Then you need to go back to IPv6 101.

 I believe I can, however, justify a separate /48 in Europe and APAC with
 my various offices and data centers in that region coming from the /48
 for that region.

A /48 is for a single site. If you are operating a network connecting many
sites, then you are a network operator and should get a /32 block.

Don't try to fit more into a /48 than one single site.

If you need to announce /33 or /34 prefixes to make things work, then
deal with it. Talk to providers and explain what is going on. IPv6 routing
is in its infancy and many people tend to set it up and let it run on
autopilot. There is no law saying that you must announce one and
only one /32 aggregate everywhere.
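
As a rough illustration of what announcing /33 or /34 prefixes out of a /32 looks like, the sketch below splits a /32 with Python's ipaddress module. The prefix 2001:db8::/32 is the IPv6 documentation range, used here only as a stand-in for a real allocation.

```python
import ipaddress

# Assumption: the documentation /32 stands in for a real allocation.
allocation = ipaddress.ip_network("2001:db8::/32")

# Two /33 halves, e.g. one announced per region.
halves = [str(n) for n in allocation.subnets(new_prefix=33)]

# Four /34 quarters if a finer regional split is needed.
quarters = [str(n) for n in allocation.subnets(new_prefix=34)]

print(halves)
print(quarters)
```

The covering /32 can still be announced alongside the more specifics, so networks that filter the longer prefixes retain a path.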

For real technical solutions to your problem, you are probably better off
going to the IPv6-ops list. Subscription info is here:
http://lists.cluenet.de/mailman/listinfo/ipv6-ops

--Michael Dillon





Re: Article on spammers and their infrastructure

2009-12-24 Thread Leo Vegoda
On Dec 24, 2009, at 8:59 AM, Jon Lewis wrote:

[…]

 I am sure that your interpretation was the original intent of the policy
 text. However, the wording could also be read in a way that allows an LIR to
 just provide registry services, without providing any connectivity services.
 
 That's one hell of a stretch.  Registry services aren't needed if they 
 don't have the IP space, so saying that the service the end user is buying 
 that justifies the IP assignment is 'registration services' is a circular 
 argument.

Of course - but if you wanted to provide services to spammers and their friends 
it's the sort of stretch you'd find yourself making.

Regards,

Leo


RE: IPv6 allocations, deaggregation, etc.

2009-12-24 Thread George Bonser


 -Original Message-
 From: Michael Dillon [mailto:wavetos...@googlemail.com]
 Sent: Thursday, December 24, 2009 4:11 PM
 To: nanog@nanog.org
 Subject: Re: IPv6 allocations, deaggregation, etc.
 
  I can't in good conscience justify a /32.  That is just too much
 space.
 
 Then you need to go back to IPv6 101.

This is an end user application, not an ISP application.

Something between a /32 and a /48 would suffice.  The idea was that a /32 is 
too large (in my opinion) for an organization that isn't planning on having 
more than 20 sites in the next 5 years.  If it were 200, that would be a 
different story.

If having a block smaller than a /32 breaks something, then it needs to break 
early so it can be addressed before things progress much further.  And getting 
a /32 would appear to violate ARIN's policy:

6.5.8.2. Initial assignment size

Organizations that meet the direct assignment criteria are eligible to receive 
a direct assignment. The minimum size of the assignment is /48. Organizations 
requesting a larger assignment must provide documentation justifying the need 
for additional subnets. An HD-Ratio of .94 must be met for all assignments 
larger than a /48.

These assignments shall be made from a distinctly identified prefix and shall 
be made with a reservation for growth of at least a /44. This reservation may 
be assigned to other organizations later, at ARIN's discretion.



If we were to number all sites globally into a /45, we could meet the .94 
HD-Ratio but with the potential problems noted in earlier traffic on this 
thread.  I am now leaning toward expanding my request to a /45 if we go with a 
global block or a /46 if we go with only using ARIN allocations in North 
American operations. 
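
For reference, the HD-Ratio (defined in RFC 3194) is the logarithm of the number of units in use divided by the logarithm of the number of units available. The sketch below computes it for a block of a given prefix length. Which unit is counted (here /48s inside a /45, or /56s inside a /48) is an assumption made for illustration; the policy text quoted above does not state the unit, so check the current ARIN policy before relying on these numbers.

```python
import math


def hd_ratio(units_used: int, block_prefix: int, unit_prefix: int) -> float:
    """HD-Ratio = log(units in use) / log(units available in the block)."""
    total = 2 ** (unit_prefix - block_prefix)
    return math.log(units_used) / math.log(total)


def min_units_for(threshold: float, block_prefix: int, unit_prefix: int) -> int:
    """Smallest number of units in use that reaches the given HD-Ratio."""
    total = 2 ** (unit_prefix - block_prefix)
    return math.ceil(total ** threshold)


# Hypothetical example: seven of the eight /48s in a /45 are in use.
print(round(hd_ratio(7, 45, 48), 4))
# Hypothetical example: /56 units needed inside one /48 to reach 0.94.
print(min_units_for(0.94, 48, 56))
```

Because the ratio is logarithmic, small blocks have to be nearly full to reach 0.94, while very large blocks reach it at a much lower percentage of use.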

 Don't try to fit more into a /48 than one single site.

Yeah, I think I pretty much get that, at this point.  I can hang small 
offices off of a data center, giving them one or more /56 nets each but yeah, 
trying to split a /48 between data centers is probably counter-productive.


 If you need to announce /33 or /34 prefixes to make things work, then
 deal with it. Talk to providers and explain what is going on. IPv6
 routing
 is in its infancy and many people tend to set it up and let it run on
 autopilot. There is no law saying that you must announce one and
 only one /32 aggregate everywhere.

Agreed.  Wasn't planning on it but if we did eventually become fully integrated 
globally, I would probably announce the larger aggregate(s) out of one main 
location, maybe handing any unassigned traffic to a honey-net or something.  At 
least if a mistake is made somewhere in addressing, that would give me a 
backstop so that we could provide a temporary fix for the problem quickly 
until it got fixed correctly.  If someone misconfigures something and traffic 
goes out with the wrong subnet SA but still in our block (say someone 
transposes a couple of subnet digits someplace), at least the reply traffic 
would come back to someplace I have some control over and could route (or 
tunnel) the reply traffic back to where it needs to go until the root cause 
could be fixed.  It would be ugly and slow for a while but it wouldn't be 
completely broken until a maintenance window where we could correct the 
underlying problem.  Things like that offer an opportunity to fix emergencies 
quickly and schedule more disruptive corrective actions for a later time when 
people can plan for the outage.  It is yet another advantage of having a larger 
global block over a gaggle of smaller scattered blocks.
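
The backstop idea relies on longest-prefix matching: traffic for a correctly numbered subnet follows the more specific route, and anything else inside the block falls through to the covering aggregate. A minimal sketch, assuming documentation prefixes and made-up site names, with a plain dictionary standing in for a routing table:

```python
import ipaddress

# Assumption: hypothetical routing table; prefixes are documentation space.
routes = {
    ipaddress.ip_network("2001:db8::/45"): "main-site backstop",
    ipaddress.ip_network("2001:db8:1::/48"): "site-A",
    ipaddress.ip_network("2001:db8:2::/48"): "site-B",
}


def lookup(addr: str) -> str:
    """Return the next hop chosen by longest-prefix match."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in routes if ip in net]
    if not matches:
        return "unreachable"
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(lookup("2001:db8:1::10"))  # matches the specific /48 for site-A
print(lookup("2001:db8:5::10"))  # mistyped subnet, caught by the /45
print(lookup("2001:db8:9::10"))  # outside the /45 entirely
```

With scattered /48s there is no covering prefix, so the second lookup would have nowhere to land.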

 
 For real technical solutions to your problem, you are probably better
 off
 going to the IPv6-ops list  

Signed up yesterday :)

 
 --Michael Dillon

Thanks, Michael.

George



Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Dobbins, Roland

On Dec 25, 2009, at 7:01 AM, Michael Dillon wrote:

 It would be interesting to see what others have to say about this answer.

I think it's a pretty accurate summation of how these things work in a lot of 
big organizations, all over the world.

There's a detrimental side to it, in that in the engineering org, the 
near-complete siloing away from ops can lead to an ivory-tower/King Canute type 
of mentality; in the ops org, this phenomenon in turn can lead to increasing 
frustration and lowered morale, which in turn leads to apathy and poor customer 
service. 

All too often, one ends up with mutually-hostile engineering and ops teams who 
waste time and energy actively working to frustrate one another's ambitions, 
rather than combining their efforts to design, build, and operate the best 
network possible.  Which in turn leads to many of the frustrations experienced 
every day by the end-customer.

---
Roland Dobbins rdobb...@arbor.net // http://www.arbornetworks.com

Injustice is relatively easy to bear; what stings is justice.

-- H.L. Mencken






RE: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread George Bonser


 -Original Message-
 From: Dobbins, Roland

 On Dec 25, 2009, at 7:01 AM, Michael Dillon wrote:
 
  It would be interesting to see what others have to say about this
 answer.
 
 I think it's a pretty accurate summation of how these things work in a
 lot of big organizations, all over the world.


I think that one must keep in mind that there are two kinds of
check-lists.  There is a takeoff list where you can always choose to go
back to the ramp and fly another day if something doesn't check out but
there is a different priority when someone is already in the air and
something goes wrong.  You can't decide to land a different day.  In
that case you must rely on experience and knowledge to handle the
situation as it presents itself.  Sure, you can have some basic checks
for things even in an emergency but you can't know how the problem is
going to present itself ahead of time.  In cases like that you have a set
of general parameters but the person at the controls needs to have
leeway to both clearly identify the nature of the problem and mitigate
the same if possible and that might include calling in some extra eyes
in order to identify things that might be going on with applications or
other devices that aren't specifically network gear.

So you can put a lot of process around changes in advance but there
isn't quite as much to manage incidents that strike out of the clear
blue.  Too much process at that point could impede progress in clearing
the issue.  Capt. Sullenberger did not need to fill out an incident
report, bring up a conference bridge, and give a detailed description of
what was happening with his plane, the status of all subsystems, and his
proposed plan of action (subject to consensus of those on the conference
bridge) and get approval for deviation from his initial flight plan
before he took the required actions to land the plane as best as he
could under the circumstances.  And while that is a bit extreme in the
sense of most networks in that lives are not often at stake, some
concepts are the same (and there might be networks supporting various
occupations on this planet where lives might actually be at stake in the
case of a network failure during some sort of activity).

One of the most efficient shops I worked in was when the production
internet operation was owned by the engineering department.  Corporate
operations owned the internal corporate IT, but engineering owned the
internet production data centers and network operations.  If engineering
released a code revision that blew up the network, the VP of Engineering
was responsible for the entire picture, not just the software piece.
Same is true where a networking change blew up the application.  Having
the responsibility for the entire system (software, hardware
platforms, and networking) under the same organization resulted in a much
smoother operation without backbiting and greater access to and sharing
of resources between the application engineers, the systems
administrators, and the network engineers.




Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Dobbins, Roland

On Dec 25, 2009, at 9:27 AM, George Bonser wrote:

 Capt. Sullenberger did not need to fill out an incident
 report, bring up a conference bridge, and give a detailed description of
 what was happening with his plane, the status of all subsystems, and his
 proposed plan of action (subject to consensus of those on the conference
 bridge) and get approval for deviation from his initial flight plan
 before he took the required actions to land the plane as best as he
 could under the circumstances.

Conversely, the ever-increasing outright hostility and contempt evinced towards 
their customers by airlines worldwide -  especially US-based airlines - over 
the last decade or so, all in the name of 'regulations', offers a useful 
counterexample.

When it comes to larger organizations, this latter scenario is more the norm 
than what you describe, in my experience.  Critical problems are left 
unresolved for days/weeks/months; if one attempts to report an issue which is 
causing problems for many of an organization's customers worldwide, but one 
isn't oneself a direct customer of said organization, one is as often as not 
ignored and shunted aside.

This isn't specific to the SP realm; it's simply a function of increased size, 
which leads to increased bureaucratization, which leads to dehumanization and 
the subordination of the organization's ostensible goals to internal politics, 
one-upsmanship, and blame-laying, no matter the industry in question.  The 
folks with a can-do attitude who're willing to buck the system in order to do 
the right thing for the customer stand out in stark contrast to their peers, 
and in many cases end up paying a price in terms of career advancement because 
of their willingness to Do The Right Thing.

'Process' is all too often merely a ruse designed to avoid responsibility, 
shift blame/liability, justify hiring lower-cost/unqualified employees whilst 
shedding expensive/competent employees, and indulge in empire-building.  We've 
seen this throughout corporate America with the 'permanent Y2K' of SoX and 
HIPAA, and the increasing involvement of government in terms of 
telecommunications-related rule-making which ends up directly affecting SPs.

I'm a big advocate of standards and change-control, and not an advocate of 
seat-of-the-pants, midnight engineering - except when the latter is necessary, 
as in the examples you give.  

Unfortunately, many folks who work in larger organizations are actively 
prohibited from indulging in fluid, situationally-appropriate problem 
resolution; and because of the aforementioned siloing of ops and engineering, 
their valuable first-hand experiences and the lessons learned thereby aren't 
taken into account during the design and rulemaking processes.

---
Roland Dobbins rdobb...@arbor.net // http://www.arbornetworks.com

Injustice is relatively easy to bear; what stings is justice.

-- H.L. Mencken






Re: Revisiting the Aviation Safety vs. Networking discussion

2009-12-24 Thread Scott Howard
On Thu, Dec 24, 2009 at 6:27 PM, George Bonser gbon...@seven.com wrote:

 So you can put a lot of process around changes in advance but there
 isn't quite as much to manage incidents that strike out of the clear
 blue.  Too much process at that point could impede progress in clearing
 the issue.  Capt. Sullenberger did not need to fill out an incident
 report, bring up a conference bridge, and give a detailed description of
 what was happening with his plane, the status of all subsystems, and his
 proposed plan of action (subject to consensus of those on the conference
 bridge) and get approval for deviation from his initial flight plan
 before he took the required actions to land the plane as best as he
 could under the circumstances.



*mayday mayday mayday. Cactus fifteen thirty nine hit birds, we've lost
thrust (in/on) both engines we're turning back towards LaGuardia* - Capt.
Sullenberger

Not exactly detailed, but he definitely initiated an incident report
(the mayday), gave a description of what was happening with his plane, the
status of [the relevant] subsystems, and his proposed plan of action -
even in the order you've asked for!

His actions were then subject to the consensus of those on the conference
bridge (ie, ATC) who could have denied his actions if they believed they
would have made the situation worse (ie, if what they were proposing would
have had them on a collision course with another plane). In this case, the
conference bridge gave approval for his course of action (*ok uh, you need
to return to LaGuardia? turn left heading of uh two two zero.* - ATC)

5 seconds before they made the above call they were reaching for the QRH
(Quick Reference Handbook), which contains checklists of the steps to take
in such a situation - including what to do in the event of loss of both
engines due to multiple birdstrikes.  They had no need to confer with others
as to what actions to take to try and recover from the problem, or what
order to take them in, because that pre-work had already been carried out
when the checklists were written.

Of course, at the end of the day, training, skill and experience played a
very large part in what transpired - but so did the actions of the people on
the conference bridge (You can't get much more of a conference bridge
than open radio frequencies), and the checklists they have for almost every
conceivable situation.

  Scott.