Re: Fiber Cut in CA?

2010-02-02 Thread charles
That is one long protect path. Yikes. 





Re: Mitigating human error in the SP

2010-02-02 Thread gb10hkzo-nanog



>>Otherwise, as Suresh notes, the only way to eliminate human error completely
>>is to eliminate the presence of humans in the activity.
and, hence, by reference:
>> Automated config deployment / provisioning.

That's the funniest thing I've read all day... ;-)

A little pessimistic rant ;-)

Who writes the scripts that you use, who writes the software that you use ?
There will always be at least one human somewhere, and where there's a human 
writing software tools, there's scope for bugs and unexpected issues.  Whether 
inadvertent or not, they will always be there.

If the excrement is going to hit the proverbial fan, try as you might to stop 
it, it will happen.  Nothing in the IT / ISP / Telco world is ever going to be 
perfect; it is far too complex, with too many dependencies.  Yes, you might play in your 
perfect little labs until the cows come home ... but there always has been 
and always will be an element of risk when you start making changes in 
production.

Face it, unless you follow the rigorous change control and development 
practices that they use for avionics or other high-risk environments, you are 
always going to be left with some element of risk.

How much risk your company is prepared to take is something for the men in 
black (suits) to decide because it correlates directly with how much $$$ they 
are prepared to throw your way to help you mitigate the risk ... ;-)

That's my 2  over .. thanks for listening (or not !) 
;-)






Re: [Pauldotcom] Skiddy Interview

2010-02-02 Thread Rick Tait
This self-proclaimed "hacker" was no more than a script-kiddie with a spot
of luck and half a brain enough to follow social-media-password chains. What
an absolute buffoon. I applaud the English student interviewer for
maintaining his composure while wasting his time with the "black hat", who
by the way [I'll save listmembers the torture of listening to elevated rants
about his "sql injections in the PHP" and his "paypal hacks" and his net
total $ gain of maybe $5000  over 4+ years of his extreme hacking] is 18
years old, in the middle of getting married and starting a family, but he'd
give it all up to work for the FBI to "you know, help out with the hacking
scene, and stop all the DDoSing wars between hacker crews".

Perhaps the most appalling thing was that he could barely muster up a single
question to ask his composed, educated and interesting interviewer, except
what it would be like to meet a "black hat", [ie himself]. My god. One
almost wants a member of a real (ie, foreign, serious $$$-earning) hacker
"crew" to swoop down on one of these morons and pwn them in such an
egregious manner so they might understand what a real penetration expert is
and be so scared as to. just. stop.

*facepalm*

--
Rick Tait
e: ri...@stickam.com
t: 213-915-UNIX

Charles de 
Gaulle
- "The better I get to know men, the more I find myself loving dogs."

On Sat, Jan 30, 2010 at 2:36 PM, andrew.wallace <
andrew.wall...@rocketmail.com> wrote:

> -- Forwarded message --
> From: andrew.wallace 
> Date: Sat, Jan 30, 2010 at 9:31 PM
> Subject: Re: [Pauldotcom] Skiddy Interview
> To: Adrian Crenshaw 
> Cc: PaulDotCom Security Weekly Mailing List <
> pauldot...@mail.pauldotcom.com>
>
>
> On Sat, Jan 30, 2010 at 3:10 PM, Adrian Crenshaw 
> wrote:
> > Kind of interesting Skiddy Interview:
> >
> > http://hackerpublicradio.org/eps/hpr0505.mp3
> >
> > Guy seems pretty uneducated, but it gives you an idea of the mentality.
> No
> > offence meant to the HPR podcast, it has some good stuff.
> > Like your comments.
> >
> > Adrian
> >
>
> He mentions selling a Bank of America employee account starting around
> 7 minutes 40 seconds, which just suffered a Denial of Service attack
> to its website.
>
> http://isc.sans.org/diary.html?storyid=8119
>
> Any connection?
>
> Of course probably not, but just thought i'd throw it out there anyway.
>
> Andrew
>
>


Re: Mitigating human error in the SP

2010-02-02 Thread gordon b slater
On Tue, 2010-02-02 at 12:26 +, gb10hkzo-na...@yahoo.co.uk wrote:

>  Nothing in the IT / ISP / Telco world is ever going to be perfect, 
>  far too complex with many dependencies.   Yes you might play in your 
>  perfect little labs until the cows come home . but there always
>  has been and always will be an element of risk when you start making
>  changes in production.
>  Face it, unless you follow the rigorous change control and development
>  practices that they use for avionics or other high-risk environments,
>  you are always going to be left with some element of risk.

Agreed.

I'd say that 10 minutes of checklist creation at the onset of a change
plan, then 5 minutes of checklist revision/debrief per day is time well
spent. After a couple of months attitudes to SOPs usually change.

_insert duplicate of aviation-style check-listing and human factors
reporting thread here_  


Gord

--
next thread: Stateful Firewalls vs Randy, round two, `ding-ding`
followed by: "help - SORBS has me blacklisted", again
:)




Re: Mitigating human error in the SP

2010-02-02 Thread Mark Smith
On Mon, 1 Feb 2010 21:21:52 -0500
Chadwick Sorrell  wrote:

> Hello NANOG,
> 
> Long time listener, first time caller.
> 
> A recent organizational change at my company has put someone in charge
> who is determined to make things perfect.  We are a service provider,
> not an enterprise company, and our business is doing provisioning work
> during the day.  We recently experienced an outage when an engineer,
> troubleshooting a failed turn-up, changed the ethertype on the wrong
> port losing both management and customer data on said device.  This
> isn't a common occurrence, and the engineer in question has a pristine
> track record.
> 

Why didn't the customer have a backup link if their service was so
important to them and indirectly your upper management? If your
upper management are taking this problem that seriously, then your
*sales people* didn't do their job properly - they should be ensuring
that customers with high availability requirements have a backup link,
or aren't led to believe that the single-point-of-failure service will
be highly available.


> This outage, of a high profile customer, triggered upper management to
> react by calling a meeting just days after.  Put bluntly, we've been
> told "Human errors are unacceptable, and they will be completely
> eliminated.  One is too many."
> 

If upper management don't understand that human error is a risk factor
that can't be completely eliminated, then I suggest "self-eliminating"
and finding yourself a job somewhere else. The only way you'll avoid
human error having any impact on production services is to not change
anything - which pretty much means not having a job anyway ...


> I am asking the respectable NANOG engineers
> 
> What measures have you taken to mitigate human mistakes?
> 
> Have they been successful?
> 
> Any other comments on the subject would be appreciated, we would like
> to come to our next meeting armed and dangerous.
> 
> Thanks!
> Chad
> 



Re: Mitigating human error in the SP

2010-02-02 Thread Nick Hilliard
On 02/02/2010 02:21, Chadwick Sorrell wrote:
> This outage, of a high profile customer, triggered upper management to
> react by calling a meeting just days after.  Put bluntly, we've been
> told "Human errors are unacceptable, and they will be completely
> eliminated.  One is too many."

Leaving the PHB rhetoric aside for a few moments, this comes down to two
things: 1. cost vs. return and 2. realisation that service availability is
a matter of risk management, not a product bolt-on that you can install in
your operations department in a matter of days.

Pilot error can be substantially reduced by a variety of different things,
most notably good quality training, good quality procedures and
documentation, lab staging of all potentially service-affecting operations,
automation of lots of tasks, good quality change management control,
pre/post project analysis, and basic risk analysis of all regular procedures.

You'll note that all of these things cost time and money to develop,
implement and maintain; also, depending on the operational service model
which you currently use, some of them may dramatically affect operational
productivity one way or another.  This often leads to a significant
increase in staffing / resourcing costs in order to maintain similar levels
of operational service.  It also tends to lead to inflexibility at various
levels, which can have a knock-on effect in terms of customer expectation.

Other things which will help your situation from a customer interaction
point of view are rigorous use of maintenance windows and good
communications to ensure that they understand that there are risks
associated with maintenance.

Your management is obviously pretty upset about this incident.  If they
want things to change, then they need to realise that reducing pilot error
is not just a matter of getting someone to bark at the tech people until
the problem goes away.  They need to be fully aware at all levels that risk
management of this sort is a major undertaking for a small company, and
that it needs their full support and buy-in.

Nick



RE: [NANOG] NREN Network Design

2010-02-02 Thread Tarig Y. Adam
Hi Rashed

This is the first time I have heard of the app-tec company. I'm very happy to 
see this in Sudan. In fact, we applied for PI address space from AfriNIC and our 
request was approved. But the current design depends on our ISP (SUDATEL), and 
we are using their routers. We have a router (a single point of access to our 
NREN) in every university, and two NOCs in our top universities (Sudan and 
Khartoum). To connect them, our ISP interconnects them with a layer 3 MPLS VPN 
at the Provider Edge, so our traffic is routed through the ISP. I think this is 
the problem, but how can we solve it?

Thanks
-
Eng. Tarig Yassin Adam
Sudanese Universities' Information Network (SUIN)
T:
+249925659149

- Original Message -
From: Rashed Alwarrag 
To: Tarig Yassin Adam  ,   
Cc:
Date: Tuesday, February 2 2010 08:01 AM
Subject: RE: [NANOG] NREN Network Design

Tariq
It's really nice to hear from Sudan on NANOG :). As Alex stated, the problem 
is not clear at all. PI address space / BGP peering could be a solution for it. 
VNE (Virtual Network Environment) is for isolating the applications located on 
one machine into virtual networks, the way VMware uses it. So can you please 
give us more details about the problem?

Thanks a lot 

Rashed Alwarrag
Applied Technologies 
NOC Manager 



-Original Message-
From: Tarig Yassin Adam [mailto:ta...@suin.edu.sd] 
Sent: Tuesday, February 02, 2010 5:26 AM
To: nanog@nanog.org
Subject: [NANOG] NREN Network Design

I'm trying to redesign the Sudanese NREN (National Research & Education Network). 
We provide end-to-end service to our customers. Our network is built over local 
ISPs. The problem with the current design is that each time we need to change 
our infrastructure IP addresses, we have to go back to the ISP. How can we 
solve this? I heard about something called Virtual Network Environment, which 
gives us full control of the ISP routers; is it the best solution? What about 
other NRENs? Thanks

-
Eng. Tarig Yassin Adam
Sudanese Universities' Information Network (SUIN)
T:
+249925659149
E: ta...@sustech.edu


Re: Mitigating human error in the SP

2010-02-02 Thread Paul Corrao
Humans make errors.  

For your upper management to think  they can build a foundation of reliability 
on the theory that humans won't make errors is self deceiving.

But that isn't where the story ends.  That's where it begins.  Your 
infrastructure, processes and tools should all be designed with that in mind so 
as to reduce or eliminate the impact that human error will have on the 
reliability of the service you provide to your customers.

So, for the example you gave there are a few things that could be put in place. 
 The first one, already mentioned by Chad, is that mission critical services 
should not be designed with single points of failure - that situation should be 
remediated.  

Another question  to be asked - since this was provisioning work being done, 
and it was apparently being done on production equipment, could the work have 
been done at a time of day (or night) when an error would not have been as much 
of a problem?

You don't say how long the outage lasted, but given the reaction by your upper 
management, I would infer that it lasted for a while.  That raises the next 
question.  Who besides the engineer making the mistake was aware of the fact 
that work on production equipment was occurring?  The reason this is important 
is because having the NOC know that work is occurring would give them a leg up 
on locating where the problem is once they get the trouble notification.

Paul


On Feb 2, 2010, at 8:16 AM, Mark Smith wrote:

> On Mon, 1 Feb 2010 21:21:52 -0500
> Chadwick Sorrell  wrote:
> 
>> Hello NANOG,
>> 
>> Long time listener, first time caller.
>> 
>> A recent organizational change at my company has put someone in charge
>> who is determined to make things perfect.  We are a service provider,
>> not an enterprise company, and our business is doing provisioning work
>> during the day.  We recently experienced an outage when an engineer,
>> troubleshooting a failed turn-up, changed the ethertype on the wrong
>> port losing both management and customer data on said device.  This
>> isn't a common occurrence, and the engineer in question has a pristine
>> track record.
>> 
> 
> Why didn't the customer have a backup link if their service was so
> important to them and indirectly your upper management? If your
> upper management are taking this problem that seriously, then your
> *sales people* didn't do their job properly - they should be ensuring
> that customers with high availability requirements have a backup link,
> or aren't led to believe that the single-point-of-failure service will
> be highly available.
> 
> 
>> This outage, of a high profile customer, triggered upper management to
>> react by calling a meeting just days after.  Put bluntly, we've been
>> told "Human errors are unacceptable, and they will be completely
>> eliminated.  One is too many."
>> 
> 
> If upper management don't understand that human error is a risk factor
> that can't be completely eliminated, then I suggest "self-eliminating"
> and find yourself a job somewhere else. The only way you'll avoid
> human error having any impact on production services is to not change
> anything - which pretty much means not having a job anyway ...
> 
> 
>> I am asking the respectable NANOG engineers
>> 
>> What measures have you taken to mitigate human mistakes?
>> 
>> Have they been successful?
>> 
>> Any other comments on the subject would be appreciated, we would like
>> to come to our next meeting armed and dangerous.
>> 
>> Thanks!
>> Chad
>> 
> 




Re: NREN Network Design

2010-02-02 Thread Tarig Y. Adam
Hi Stephane

Thanks, your diagram is very expressive. And sorry for forgetting to send this 
message to my own list (afnog).
Regarding the subject: the next hop for our customers' default route is the ISP 
routers, then the SUIN routers. Is this a normal situation?


-
Eng. Tarig Yassin Adam
Sudanese Universities' Information Network (SUIN)
T:
+249925659149
- Original Message -
From: Stephane Bortzmeyer 
To: Tarig Yassin Adam 
Cc: Phil Regnauld  ,   
Date: Tuesday, February 2 2010 11:40 AM
Subject: Re: NREN Network Design
On Mon, Feb 01, 2010 at 10:04:07PM -0500,
 Tarig Yassin Adam  wrote 
 a message of 597 lines which said:

> hi phil

[And thanks for forwarding this discussion where it belongs. It is sad
that a discussion on an african NREN was first carried to a North
America list.]

> We already applied to AfriNIC, they allocated to us /18 ipv4 we're
> using two ISPs to connect to our customers and the same time we're
> using these ISPs for the Internet.

And you use BGP? You have your own AS?

> The ISP using tunnel layer 3. 

I am sorry, I cannot parse this sentence.

> For customer sites to reach our NOC they need next hop to the ISP
> routers which we do not have a control. 

The way I understand it (you are really short on details), the
universities are your customers and there is always an ISP between
NREN and UNI:

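+---------+
|         |        +---------+
|  NREN   |--------|  ISP 1  |--- UNI 1
+---------+        +---------+
     |                  |     \
 +----------+           |      \
 |  ISP 2   |     +----------+  \
 +----------+-----| Internet |   +--------+
   |       |      +----------+   | UNI 2  |
   |       |                     |        |
+------+ +-------+               +--------+
|UNI 3 | | UNI 4 |
+------+ +-------+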


Is my schema OK? If no, please provide yours. If yes, I wonder, what
service provides the NREN if no connectivity? If you just provide
Internet servers (HTTP, XMPP, etc), there is no need to "tunnel"
anything over the ISP.


Re: Mitigating human error in the SP

2010-02-02 Thread Joe Provo
On Mon, Feb 01, 2010 at 09:21:52PM -0500, Chadwick Sorrell wrote:
> Hello NANOG,
> 
> Long time listener, first time caller.
[snip]
> What measures have you taken to mitigate human mistakes?
> 
> Have they been successful?
> 
> Any other comments on the subject would be appreciated, we would like
> to come to our next meeting armed and dangerous.

Define your processes well, and have management sign off so there is no blame 
game and people realize they are all on the same side.  Use peer review.  
Don't start automating until you have a working system, and then get 
the humans out of the repetitive bits.  Don't build monolithic systems.
Test your automation well.  Be sure to have a symmetric *de-provisioning* 
for any provisioning, or else you will be relying on humans to clean out 
the cruft instead of addressing the problem. 
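
A minimal sketch of the symmetric provisioning / de-provisioning idea, in
Python. The Step class, the set_key helper and the in-memory "device" dict
are hypothetical stand-ins for whatever your tooling really talks to. The
only point is that each provisioning step is registered together with its
inverse, so teardown (or rollback after a failed turn-up) is the same list
run backwards.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Device = Dict[str, str]  # hypothetical stand-in for real device state


@dataclass
class Step:
    name: str
    apply: Callable[[Device], None]
    undo: Callable[[Device], None]  # every step ships with its inverse


def set_key(key: str, value: str) -> Step:
    """Build a step that sets one config key and knows how to remove it."""
    return Step(
        name=f"set {key}",
        apply=lambda dev: dev.__setitem__(key, value),
        undo=lambda dev: dev.pop(key, None),
    )


def deprovision(dev: Device, done: List[Step]) -> None:
    """Undo already-applied steps in reverse order."""
    for step in reversed(done):
        step.undo(dev)


def provision(dev: Device, steps: List[Step]) -> List[Step]:
    """Apply steps in order; if one fails, undo the ones already applied."""
    done: List[Step] = []
    try:
        for step in steps:
            step.apply(dev)
            done.append(step)
    except Exception:
        deprovision(dev, done)
        raise
    return done


device: Device = {}
applied = provision(device, [set_key("vlan", "120"), set_key("descr", "cust-A")])
after_provision = dict(device)      # both keys present
deprovision(device, applied)
after_deprovision = dict(device)    # back to the empty starting state
```

Returning the list of applied steps from provision() is a design choice: the
same record drives both the failure rollback and the later clean-up, so
nobody has to remember what was configured.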

Extend accountability throughout the organization - replace commission-
minded sales folks with relationship-minded account management. 

Always have OoB.  Require vendors to be *useful* under OoB conditions,
at least to your more advanced employees.

Expect errors in the system and in execution; develop ways to check
for them and be prepared to modify methods, procedures and tools 
without multiple years and inter-departmental bureaucracy.  Change and
errors happen, so capitalize on those events to improve your service
and systems rather than emphasizing punishment.

Cheers,

Joe
-- 
 RSUC / GweepNet / Spunk / FnB / Usenix / SAGE



Threading the senderbase reputation needle

2010-02-02 Thread Drew Weaver
Howdy,

Has anyone come up with a reverse DNS 'pattern' that one can employ that will 
prevent Senderbase from assigning a poor reputation to an entire /24 because 
they saw an email they didn't like from a single IP address?

We're an infrastructure provider, which means that we lease servers, etc to 
customers and everything we do uses static IPs.

Our current 'default (before the customer changes it)' is a 
x.x.x.x.static.domain.com, apparently Senderbase cannot look up CIDR boundaries 
in the RIR database (even though we spend a lot of time making sure that we 
publish the CIDR information) so they just assume that each 'offender' owns the 
entire /24 and they also consider any 'email' from the static.domain.com domain 
to be the 'same offender' (which is completely silly).

The other little annoyance about their system is that we assign CIDR blocks to 
users (almost always a /29). These CIDRs include IP addresses like the gateway 
address, the broadcast address, the network address, etc., and the users may only 
use 2-3 of the IPs in the /29, but they expect us or the user to set a 'custom 
looking' reverse DNS on all of the IPs in the range. Originally, we were not 
putting any reverse DNS on our IPs until the customer requested it (or did it 
themselves via our system) but then we ran into problems with some RBLs that 
require reverse DNS on all IPs, and other RBLs that require matching forward 
and reverse DNS on all IPs. 
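
To make the /29 point concrete, here is a small sketch using Python's
standard ipaddress module. 192.0.2.8/29 is a documentation prefix, not one
of our blocks, static.example.com stands in for the real default zone, and
treating the first usable address as the gateway is an assumption for
illustration only.

```python
import ipaddress


def describe_block(cidr: str, zone: str = "static.example.com"):
    """Return (address, role, default hostname, PTR owner name) per IP in cidr."""
    net = ipaddress.ip_network(cidr)
    first_usable = next(net.hosts())
    rows = []
    for ip in net:  # iterates every address, including network and broadcast
        if ip == net.network_address:
            role = "network"
        elif ip == net.broadcast_address:
            role = "broadcast"
        elif ip == first_usable:
            role = "gateway"  # assumption: first usable address is the gateway
        else:
            role = "customer"
        # x.x.x.x.<zone> mirrors the default naming pattern described above
        rows.append((str(ip), role, f"{ip}.{zone}", ip.reverse_pointer))
    return rows


rows = describe_block("192.0.2.8/29")
for address, role, hostname, ptr_name in rows:
    print(f"{address:<12} {role:<9} {ptr_name:<24} -> {hostname}")
```

Of the eight addresses, three are never a mail source at all (network,
gateway, broadcast), yet each still needs a PTR record if an RBL insists on
reverse DNS for every IP in the range.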

I've contacted Senderbase for advice on what specifically we need to do but 
they've been vague at best and I have even asked them for examples of companies 
who 'meet their specifications' but I wasn't given any.

I'm considering doing something like customerX.static.domain.com but then I 
can see other problems with that also.

Any advice?

-Drew




Re: Mitigating human error in the SP

2010-02-02 Thread Chadwick Sorrell
On Tue, Feb 2, 2010 at 9:09 AM, Paul Corrao  wrote:
> Humans make errors.
>
> For your upper management to think  they can build a foundation of 
> reliability on the theory that humans won't make errors is self deceiving.
>
> But that isn't where the story ends.  That's where it begins.  Your 
> infrastructure, processes and tools should all be designed with that in mind 
> so as to reduce or eliminate the impact that human error will have on the 
> reliability of the service you provide to your customers.
>
> So, for the example you gave there are a few things that could be put in 
> place.  The first one, already mentioned by Chad, is that mission critical 
> services should not be designed with single points of failure - that 
> situation should be remediated.

Agreed.

> Another question  to be asked - since this was provisioning work being done, 
> and it was apparently being done on production equipment, could the work have 
> been done at a time of day (or night) when an error would not have been as 
> much of a problem?

As it stands now, businesses want to turn their services up when they
are in the office.  We do all new turn-ups during the day; anything
requiring a roll or maintenance window is scheduled in the middle of
the night.

> You don't say how long the outage lasted, but given the reaction by your 
> upper management, I would infer that it lasted for a while.  That raises the 
> next question.  Who besides the engineer making the mistake was aware of the 
> fact that work on production equipment was occurring?  The reason this is 
> important is because having the NOC know that work is occurring would give 
> them a leg up on locating where the problem is once they get the trouble 
> notification.

The actual error happened when someone was troubleshooting a turn-up,
where in the past the customer in question has had their ethertype set
wrong.  It wasn't a provisioning problem as much as someone
troubleshooting why it didn't come up with the customer.  Ironically,
the NOC was on the phone when it happened, and the switch was rebooted
almost immediately and the outage lasted 5 minutes.
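
One cheap guard against this class of mistake is to have the tooling refuse
changes on ports flagged as carrying management or uplink traffic unless the
engineer explicitly overrides it. A rough Python sketch follows; the port
names, the PROTECTED_PORTS set and the set_ethertype function are all
hypothetical and do not correspond to any vendor's API.

```python
PROTECTED_PORTS = {"ge-0/0/0"}  # hypothetical: ports carrying management/uplink


class ProtectedPortError(Exception):
    """Raised when a change targets a protected port without an override."""


def set_ethertype(config: dict, port: str, ethertype: str, force: bool = False) -> dict:
    """Return a new config with the port's ethertype changed.

    Refuses to touch protected ports unless force=True, so a slip of the
    port number fails loudly instead of cutting off management.
    """
    if port not in config:
        raise KeyError(f"unknown port {port!r}")
    if port in PROTECTED_PORTS and not force:
        raise ProtectedPortError(f"{port} is protected; re-run with force=True")
    # Copy so the caller's running config is never modified in place.
    new_config = {name: dict(settings) for name, settings in config.items()}
    new_config[port]["ethertype"] = ethertype
    return new_config


running = {
    "ge-0/0/0": {"ethertype": "0x8100"},  # management/uplink port
    "ge-0/0/5": {"ethertype": "0x8100"},  # customer port being turned up
}
candidate = set_ethertype(running, "ge-0/0/5", "0x88a8")
```

This would not have removed the need for troubleshooting, but typing the
wrong port would have produced an error message rather than an outage.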

Chad



RE: Threading the senderbase reputation needle

2010-02-02 Thread Jason Gurtz
> Has anyone come up with a reverse DNS 'pattern' that one can employ that
> will prevent Senderbase from assigning a poor reputation to an entire /24
> because they saw an email they didn't like from a single IP address?
> 
> We're an infrastructure provider, which means that we lease servers, etc
> to customers and everything we do uses static IPs.
[...] 
> Any advice?

Since email reputation is now being based on the neighborhood theory, you
must do one of the following (hopefully #1):

1.) Provide custom reverse DNS for the customer.  BCP for SMTP server DNS
is matching forward and reverse DNS.  Anything else is suspect...

2.) Set up a relay host and funnel all customers mail through it.

Side effects of each:

1.) Slightly more work on the front end (but hey, even AT&T will do this
for business DSL customers).  People will know you have clue.  The
technical staff at your customers will be happy and recommend you to their
peers (well, I guess this depends a bit on what kind of customers you
have).

2.) You have taken responsibility for all your customers' outbound mail
flows.  You will need to scale an abuse desk and maintain effective
anti-spam policies (including customer education).  If you don't run an
effective abuse desk (including blocking your own customers outbound mail
when necessary), you will be blacklisted eventually anyway.  You could
charge extra for or outsource this ESP service.
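
For option 1, "matching forward and reverse DNS" can be checked mechanically.
Below is a minimal Python sketch, assuming the system resolver is good enough
for a spot check. The function name forward_confirmed and the injectable
lookup parameters are my own invention for illustration; the demo at the
bottom uses canned answers and a documentation address, not real customer
data.

```python
import socket
from typing import Callable, List


def _ptr_lookup(ip: str) -> str:
    """Reverse lookup (PTR) via the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def _addr_lookup(hostname: str) -> List[str]:
    """Forward lookup via the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def forward_confirmed(
    ip: str,
    ptr_lookup: Callable[[str], str] = _ptr_lookup,
    addr_lookup: Callable[[str], List[str]] = _addr_lookup,
) -> bool:
    """True if the PTR name of ip resolves forward to the same ip."""
    try:
        hostname = ptr_lookup(ip)
        return ip in addr_lookup(hostname)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False


# Demo with canned answers so it runs without touching the network.
# 192.0.2.0/24 is a documentation prefix; the hostname is made up.
fake_ptr = {"192.0.2.10": "mail.customer.example"}
fake_fwd = {"mail.customer.example": ["192.0.2.10"]}
ok = forward_confirmed("192.0.2.10", fake_ptr.__getitem__, fake_fwd.__getitem__)
```

Called with just an IP address, the same function uses the real resolver, so
it could be run over every address in a customer block before handing it
over.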

~JasonG



RE: Threading the senderbase reputation needle

2010-02-02 Thread Drew Weaver
> Since email reputation is now being based on the neighborhood theory you
> must do one of the following:
>
> Do one of the following (hopefully #1):
>
> 1.) Provide custom reverse DNS for the customer.  BCP for SMTP server DNS
> is matching forward and reverse DNS.  Anything else is suspect...
>
> 2.) Set up a relay host and funnel all customers mail through it.
>
> Side effects of each:
>
> 1.) Slightly more work on the front end (but hey, even AT&T will do this
> for business DSL customers).  People will know you have clue.  The
> technical staff at your customers will be happy and recommend you to their
> peers (well, I guess this depends a bit on what kind of customers you
> have).
>
> 2.) You have taken responsibility for all your customers' outbound mail
> flows.  You will need to scale an abuse desk and maintain effective
> anti-spam policies (including customer education).  If you don't run an
> effective abuse desk (including blocking your own customers outbound mail
> when necessary), you will be blacklisted eventually anyway.  You could
> charge extra for or outsource this ESP service.

Okay, as I mentioned, we allow the customers to set their reverse DNS to 
whatever they want as long as the forward and the reverse match. We don't own 
the customer's domains nor do we host the DNS for 99% of them, so I'm not sure 
how we could enforce a rule saying that everyone on our network has to have 
their reverse DNS set a certain way. That is why we set it up like we did, 
because we can control hostnames within our domain and we can set the PTR 
record to match. Like I said before we're a hosting company, we sell Co-Lo, 
Dedicated servers, and Virtualization products. 

It seems somewhat impossible to employ either of your suggestions in our 
environment.

thanks,
-Drew





Re: Mitigating human error in the SP

2010-02-02 Thread Larry Sheldon

On 2/2/2010 6:26 AM, gb10hkzo-na...@yahoo.co.uk wrote:





>>> Otherwise, as Suresh notes, the only way to eliminate human error
>>> completely is to eliminate the presence of humans in the
>>> activity.
>
> and,hence by reference.
>
>>> Automated config deployment / provisioning.
>
> That's the funniest thing I've read all day... ;-)
>
> A little pessimistic rant ;-)
>
> Who writes the scripts that you use, who writes the software that you
> use ?  There will always be at least one human somewhere, and where
> there's a human writing software tools, there's scope for bugs and
> unexpected issues.  Whether inadvertent or not, they will always be
> there.
>
> If the excrement is going to hit the proverbial fan, try as you might
> to stop it, it will happen.  Nothing in the IT / ISP / Telco world is
> ever going to be perfect, far too complex with many dependencies.
> Yes you might play in your perfect little labs until the cows come
> home ... but there always has been and always will be an element of
> risk when you start making changes in production.
>
> Face it, unless you follow the rigorous change control and
> development practices that they use for avionics or other high-risk
> environments, you are always going to be left with some element of
> risk.
>
> How much risk your company is prepared to take is something for the
> men in black (suits) to decide because it correlates directly with
> how much $$$ they are prepared to throw your way to help you mitigate
> the risk ... ;-)
>
> That's my 2  over .. thanks for listening (or
> not !) ;-)


Add to that the stuff that always sounds like a cop-out, even to the
victims--the "human error" made by people not on your payroll, the
vendors that are responsible for the misleading (or absent)
documentation, for the CLI stuff that doesn't work just the way a
reasonable person would expect it to, for the hardware that fails
dirty, and on and on--a very long list.  Exacerbated by management that
cheaps out on equipment, software, documentation, training, and staff.


Even with a lab with a rich fabric of equipment, there will be most of 
the other things to contend with.


A reasonable and competent management will not only provide what is 
needed for a reasonable error rate (which indeed can approach one over 5 
nines) but will also provide the means of recovery when the inevitable 
happens.  That might involve "needless" expense like additional staff, 
redundant equipment, alternate paths, ...


But it won't involve whippings until the morale improves or reductions 
in staff and funding until the errors go away.



--
"Government big enough to supply everything you need is big enough to
take everything you have."

Remember:  The Ark was built by amateurs, the Titanic by professionals.

Requiescas in pace o email
Ex turpi causa non oritur actio
Eppure si rinfresca

ICBM Targeting Information:  http://tinyurl.com/4sqczs
http://tinyurl.com/7tp8ml




Re: Threading the senderbase reputation needle

2010-02-02 Thread Rich Kulawiec
On Tue, Feb 02, 2010 at 09:37:44AM -0500, Drew Weaver wrote:
> Has anyone come up with a reverse DNS 'pattern' that one can employ
> that will prevent Senderbase from assigning a poor reputation to an entire
> /24 because they saw an email they didn't like from a single IP address?

I think this discussion would be much better on the mailop list, but
the short answer here is "real mail servers have real, non-generic names
with matching forward/reverse DNS".
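
For anyone who wants to check their own hosts, "matching forward/reverse DNS" can be sketched roughly as below. This is only a minimal illustration: the function names are made up for this example, the lookups are injectable so the check can be exercised without touching real DNS, and it says nothing about how Senderbase or any other reputation service actually evaluates a host.

```python
import socket


def _ptr_name(ip):
    # Reverse lookup: the name the PTR record for this address points to.
    return socket.gethostbyaddr(ip)[0]


def _addresses(name):
    # Forward lookup: every address the name resolves to.
    return {info[4][0] for info in socket.getaddrinfo(name, None)}


def is_forward_confirmed(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if the PTR name for ip resolves back to ip.

    reverse_lookup and forward_lookup are hypothetical hooks for this
    sketch; by default the system resolver is used.
    """
    reverse_lookup = reverse_lookup or _ptr_name
    forward_lookup = forward_lookup or _addresses
    try:
        name = reverse_lookup(ip)
        return ip in forward_lookup(name)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # a missing PTR or a name that does not resolve both fail the check.
        return False
```

Calling is_forward_confirmed("192.0.2.10") against live DNS only tells you whether the pair matches; whether the name is also "non-generic" is a separate judgment this sketch does not attempt.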

---Rsk



Re: Threading the senderbase reputation needle

2010-02-02 Thread Ronald Cotoni
On Tue, Feb 2, 2010 at 10:32 AM, Drew Weaver  wrote:
> Since email reputation is now being based on the neighborhood theory you
> must do one of the following:
>
> Do one of the following (hopefully #1):
>
> 1.) Provide custom reverse DNS for the customer.  BCP for SMTP server DNS
> is matching forward and reverse DNS.  Anything else is suspect...
>
> 2.) Set up a relay host and funnel all customers mail through it.
>
> Side effects of each:
>
> 1.) Slightly more work on the front end (but hey, even AT&T will do this
> for business DSL customers).  People will know you have clue.  The
> technical staff at your customers will be happy and recommend you to their
> peers (well, I guess this depends a bit on what kind of customers you
> have).
>
> 2.) You have taken responsibility for all your customers' outbound mail
> flows.  You will need to scale an abuse desk and maintain effective
> anti-spam policies (including customer education).  If you don't run an
> effective abuse desk (including blocking your own customers outbound mail
> when necessary), you will be blacklisted eventually anyway.  You could
> charge extra for or outsource this ESP service.
> ==
>
> Okay, as I mentioned, we allow the customers to set their reverse DNS to 
> whatever they want as long as the forward and the reverse match. we don't own 
> the customer's domains nor do we host the DNS for 99% of them, so I'm not 
> sure how we could enforce a rule saying that everyone on our network has to 
> have their reverse DNS set a certain way. That is why we set it up like we 
> did, because we can control hostnames within our domain and we can set the 
> PTR record to match. Like I said before we're a hosting company, we sell 
> Co-Lo, Dedicated servers, and Virtualization products.
>
> It seems somewhat impossible to employ either of your suggestions in our 
> environment.
>
> thanks,
> -Drew
>
>
>
>

I used to work at a hosting company and we had a few solutions in
place.  Whenever a client purchased a server or an additional block
of IPs, it was assigned reverse DNS based on the hostname of
their server.  This even included example.com sometimes.  The client
could then change it as they wished.  Another option we had was an
outgoing spam filter set up with ASSP.  This scrubbed all outgoing mail
for spam messages.  Honestly, the first option was good enough for most
people.  About 99.95% of your clients assign a forward DNS name to their
server/colo/virtualization products.  Just make it a requirement that
they provide that before you turn up their service.  This prevents
DUHLs from listing you for those generic RDNS names.



Unix Sysadmin/Net Eng in the Toronto Area?

2010-02-02 Thread Carlos Kamtha
Greetings, 

I am looking for an experienced FreeBSD/Linux/Cisco/Juniper person.

If you live in Toronto or the GTA and are interested please drop me a line 
offlist. 

Cheers, 

Carlos. 



RE: Threading the senderbase reputation needle

2010-02-02 Thread Drew Weaver
I think this discussion would be much better on the mailop list, but
the short answer here is "real mail servers have real, non-generic names
with matching forward/reverse DNS".


That certainly is true, but if a "real mail server" that has a real, non-generic
name with matching forward/reverse DNS happens to be in the same /24 as a
server that doesn't, it is given a poor reputation by Senderbase, since
Senderbase cannot do simple RIR lookups to see the scope of that particular
customer's network/impact.
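
To make the "neighborhood" effect concrete, here is a rough sketch of what per-/24 scoring implies. It is an assumption-laden illustration only: the function name and the addresses are hypothetical, and this is not a description of how Senderbase actually computes reputation.

```python
import ipaddress


def slash24(addr):
    # The /24 network containing an IPv4 address.
    return ipaddress.ip_network(f"{addr}/24", strict=False)


def collateral_damage(addresses, bad):
    """List the addresses that share a /24 with any address in bad.

    Under a per-/24 reputation model these are the hosts that inherit a
    poor score without having sent anything objectionable themselves.
    """
    bad_nets = {slash24(b) for b in bad}
    return [a for a in addresses if a not in bad and slash24(a) in bad_nets]
```

With hosts 192.0.2.10, 192.0.2.200 and 198.51.100.7, and only 192.0.2.200 misbehaving, the function returns 192.0.2.10: a different customer, same /24, same reputation.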

-Drew

  




Re: Mitigating human error in the SP

2010-02-02 Thread Jared Mauch
We have solved 98% of this with standard configurations and templates.

To deviate from this requires management approval/exception approval after an 
evaluation of the business risks.

Automation of config building is not too hard, and certainly things like 
peer-groups (Cisco) and regular groups (Juniper) make it easier.

If you go for the holy grail, you want something that takes into account the 
following:

1) each phase in the provisioning/turn-up state
2) each phase in infrastructure troubleshooting (turn-up, temporary 
outage/temporary testing, production)
3) automated pushing of config via load override/commit replace to your config 
space.

Obviously testing, etc. is important.  I've found that whenever a human is 
involved, mistakes happen.  There is also the "Software is imperfect" mantra 
that should be repeated.  I find vendors at times have demanding customers who 
want perfection.  Bugs happen and outages happen; the question is how you 
respond to these risks.

If you have poor handling of bugs, outages, etc.. in your process or are 
decision gridlocked, very bad things happen.
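
As a rough illustration of the "standard configurations and templates" idea above, a template-driven generator might look something like the sketch below. The template text, field names and render_peer function are hypothetical examples, not anyone's production tooling; the point is only that a fixed template plus validated parameters leaves less room for a typo than hand-typed config.

```python
from string import Template

# Hypothetical per-customer BGP stanza; everything variable is a named field.
PEER_TEMPLATE = Template(
    "neighbor $peer_ip remote-as $peer_as\n"
    "neighbor $peer_ip peer-group $group\n"
    "neighbor $peer_ip description $customer\n"
)


def render_peer(params):
    """Render one peer stanza from a dict of parameters.

    Template.substitute raises KeyError if any field is missing, so an
    incomplete work order fails here instead of on the router.
    """
    return PEER_TEMPLATE.substitute(params)
```

A call such as render_peer({"peer_ip": "192.0.2.1", "peer_as": 64500, "group": "CUSTOMERS", "customer": "example-co"}) yields three config lines; leaving out any field raises an error before anything is pushed.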

- Jared

On Feb 1, 2010, at 9:21 PM, Chadwick Sorrell wrote:

> Hello NANOG,
> 
> Long time listener, first time caller.
> 
> A recent organizational change at my company has put someone in charge
> who is determined to make things perfect.  We are a service provider,
> not an enterprise company, and our business is doing provisioning work
> during the day.  We recently experienced an outage when an engineer,
> troubleshooting a failed turn-up, changed the ethertype on the wrong
> port losing both management and customer data on said device.  This
> isn't a common occurrence, and the engineer in question has a pristine
> track record.
> 
> This outage, of a high profile customer, triggered upper management to
> react by calling a meeting just days after.  Put bluntly, we've been
> told "Human errors are unacceptable, and they will be completely
> eliminated.  One is too many."
> 
> I am asking the respectable NANOG engineers
> 
> What measures have you taken to mitigate human mistakes?
> 
> Have they been successful?
> 
> Any other comments on the subject would be appreciated, we would like
> to come to our next meeting armed and dangerous.
> 
> Thanks!
> Chad




Re: Mitigating human error in the SP

2010-02-02 Thread James Downs


On Feb 2, 2010, at 9:33 AM, Jared Mauch wrote:


We have solved 98% of this with standard configurations and templates.

To deviate from this requires management approval/exception approval  
after an evaluation of the business risks.


I would also point Chad to this book: http://bit.ly/cShEIo (Amazon
Link to Visible Ops).


It's very useful to have your management read it.  You may or may not  
be able to or want to use a full ITIL process, but understanding how  
these policies and procedures can/should work, and using the ones that  
apply makes sense.


Change control, tracking, and configuration management are going to be  
key to avoiding mistakes, and being able to rapidly repair when one is  
made.


Unfortunately, most management that demands No Tolerance, Zero Error  
from operations won't read the book.


Good luck.. I'd bet most of the people on this list have been there  
one time or another.


Cheers,
-j



Re: Mitigating human error in the SP

2010-02-02 Thread JC Dill

Chadwick Sorrell wrote:

This outage, of a high profile customer, triggered upper management to
react by calling a meeting just days after.  Put bluntly, we've been
told "Human errors are unacceptable, and they will be completely
eliminated.  One is too many."


Good, Fast, Cheap - pick any two.  No, you can't have all three.

Here, Good is defined by your pointy-haired bosses as an 
impossible-to-achieve zero error rate.[1]  Attempting to achieve this is 
either going to cost $$$, or your operations speed (how long it takes 
people to do things) is going to drop like a rock.  Your first action 
should be to make sure upper management understands this so they can set 
the appropriate priorities on Good, Fast, and Cheap, and make the 
appropriate budget changes.


It's going to cost $$$ to hire enough people to have the staff necessary 
to double-check things in a timely manner, OR things are going to slow 
way down as the existing staff is burdened by necessary double-checking 
of everything and triple-checking of some things required to try to 
achieve a zero error rate.  They will also need to spend $$$ on software 
(to automate as much as possible) and testing equipment.  They will also 
never actually achieve a zero error rate as this is an impossible task 
that no organization has ever achieved, no matter how much emphasis or 
money they pour into it (e.g. Windows vulnerabilities) or how important 
(see Challenger, Columbia, and the Mars Climate Orbiter incidents).


When you put a $$$ cost on trying to achieve a zero error rate, 
pointy-haired bosses are usually willing to accept a normal error rate.  
Of course, they want you to try to avoid errors, and there are a lot of 
simple steps you can take in that effort (basic checklists, automation, 
testing) which have been mentioned elsewhere in this thread that will 
cost some money but not the $$$ that is required to try to achieve a 
zero error rate.  Make sure they understand that the budget they 
allocate for these changes will be strongly correlated to how Good (zero 
error rate) and Fast (quick operational responses to turn-ups and 
problems) the outcome of this initiative is.


jc

[1]  http://www.godlessgeeks.com/LINKS/DilbertQuotes.htm

2. "What I need is a list of specific unknown problems we will 
encounter." (Lykes Lines Shipping)


6. "Doing it right is no excuse for not meeting the schedule." (R&D 
Supervisor, Minnesota Mining & Manufacturing/3M Corp.)






Re: Mitigating human error in the SP

2010-02-02 Thread Chadwick Sorrell
On Tue, Feb 2, 2010 at 12:45 PM, James Downs  wrote:
>
> On Feb 2, 2010, at 9:33 AM, Jared Mauch wrote:
>
> We have solved 98% of this with standard configurations and templates.
>
> To deviate from this requires management approval/exception approval after
> an evaluation of the business risks.
>
> I would also point Chad to this book: http://bit.ly/cShEIo (Amazon Link to
> Visible Ops).
> It's very useful to have your management read it.  You may or may not be
> able to or want to use a full ITIL process, but understanding how these
> policies and procedures can/should work, and using the ones that apply makes
> sense.
> Change control, tracking, and configuration management are going to be key
> to avoiding mistakes, and being able to rapidly repair when one is made.
> Unfortunately, most management that demands No Tolerance, Zero Error from
> operations won't read the book.
> Good luck.. I'd bet most of the people on this list have been there one time
> or another.
> Cheers,
> -j

Interesting book, maybe I'll bring that to the next meeting.  Thanks
for the heads up on that.



Re: Mitigating human error in the SP

2010-02-02 Thread Larry Sheldon

On 2/2/2010 11:33 AM, Jared Mauch wrote:

We have solved 98% of this with standard configurations and
templates.

To deviate from this requires management approval/exception approval
after an evaluation of the business risks.

Automation of config building is not too hard, and certainly things
like peer-groups (cisco) and regular groups (juniper) make it
easier.


Those things and some of the others that have been mentioned will go a 
very long way to prevent the second occurrence.


Only training, adequate (number and quality) staff, and a 
quality-above-all-else culture have a prayer of preventing the first 
occurrence.  (For sure, lots of the second-occurrence-preventers may be 
part of that quality-first culture.)


--
"Government big enough to supply everything you need is big enough to
take everything you have."

Remember:  The Ark was built by amateurs, the Titanic by professionals.

Requiescas in pace o email
Ex turpi causa non oritur actio
Eppure si rinfresca

ICBM Targeting Information:  http://tinyurl.com/4sqczs
http://tinyurl.com/7tp8ml




Re: Fiber Cut in CA?

2010-02-02 Thread Bill Stewart
On Tue, Feb 2, 2010 at 12:04 AM,   wrote:
> That is one long protect path. Yikes.

There be mountains in the way, with deserts in between, and not a lot
of people to justify diversity or railroads and highways to run it
along.
Not many carriers have more than one fiber route across Arizona and
New Mexico, especially for the newer high-capacity fibers (i.e. built
this millennium, after the financial excesses of the 90s.)
I'm no longer current on what routes are being used by what carriers,
but if you don't have two routes across northern Arizona ( I-10/I-40,
with restoration routes like Barstow->LasVegas->Flagstaff->Phoenix),
then the next alternative is Barstow->LasVegas->SaltLakeCity->Denver,
at which point some carriers have routes down to Phoenix via Tucumcari
or Amarillo, and the rest are going to go through Dallas, and anybody
who doesn't have the LasVegas->SLC route is going to use
Sacramento->SLC->Denver, possibly also including San Jose, depending
on what routes they've got across California.

So, yeah, instead of the nice short 2200-mile restoration routes you
can use if SF->Seattle fails, cable cuts in the Southwest can be
really long...
-- 

 Thanks; Bill

Note that this isn't my regular email account - It's still experimental so far.
And Google probably logs and indexes everything you send it.



Re: Fiber Cut in CA?

2010-02-02 Thread Matt Simmons
And in an open desert, backhoes can smell fiber from miles away.

On Tue, Feb 2, 2010 at 3:27 PM, Bill Stewart  wrote:
> On Tue, Feb 2, 2010 at 12:04 AM,   wrote:
>> That is one long protect path. Yikes.
>
> There be mountains in the way, with deserts in between, and not a lot
> of people to justify diversity or railroads and highways to run it
> along.
> Not many carriers have more than one fiber route across Arizona and
> New Mexico, especially for the newer high-capacity fibers (i.e. built
> this millennium, after the financial excesses of the 90s.)
> I'm no longer current on what routes are being used by what carriers,
> but if you don't have two routes across northern Arizona ( I-10/I-40,
> with restoration routes like Barstow->LasVegas->Flagstaff->Phoenix),
> then the next alternative is Barstow->LasVegas->SaltLakeCity->Denver,
> at which point some carriers have routes down to Phoenix via Tucumcari
> or Amarillo, and the rest are going to go through Dallas, and anybody
> who doesn't have the LasVegas->SLC route is going to use
> Sacramento->SLC->Denver, possibly also including San Jose, depending
> on what routes they've got across California.
>
> So, yeah, instead of the nice short 2200-mile restoration routes you
> can use if SF->Seattle fails, cable cuts in the Southwest can be
> really long...
> --
> 
>             Thanks;     Bill
>
> Note that this isn't my regular email account - It's still experimental so 
> far.
> And Google probably logs and indexes everything you send it.
>
>



-- 

LITTLE GIRL: But which cookie will you eat FIRST?
COOKIE MONSTER: Me think you have misconception of cookie-eating process.



ip address management

2010-02-02 Thread Pavel Dimow
Hello,

Does anybody know what happened to ipat?

http://nethead.de/index.php/ipat
http://nanog.cluepon.net/index.php/Tools_and_Resources

Any other suggestions for a good FOSS IP address management app with
IPv6 support?



RE: ip address management

2010-02-02 Thread Scott Berkman
I was about to suggest IPPlan, but it lacks IPv6 support.  Here is
one I found doing some searching, but I haven't used it myself:

http://sourceforge.net/projects/haci/

-Scott

-Original Message-
From: Pavel Dimow [mailto:paveldi...@gmail.com] 
Sent: Tuesday, February 02, 2010 3:55 PM
To: nanog@nanog.org
Subject: ip address management

Hello,

does anybody knows what happend with ipat?

http://nethead.de/index.php/ipat
http://nanog.cluepon.net/index.php/Tools_and_Resources

Any other suggestion for a good foss ip address management app with
ipv6 support?





Datacenter for DR in northwestern NJ/NY

2010-02-02 Thread Matt Sprague
Hello NANOG!

Does anyone know of some strong datacenters in northwestern NJ, or north of 
Westchester NY without getting too far away from NYC?

I'm looking for a DR colo solution for a site that is in NYC; this needs to be 
at least 50 miles away from NYC, but I'm trying to keep it not too much further than 
that for convenience.  I'm also trying to keep this to top level providers as 
there may be compliance requirements.

Thanks in advance for any responses.
--
Matt Sprague
ReadyTechs, LLC

mspra...@readytechs.com
973-455-0606 x1204 (voice)
http://www.readytechs.com/



RE: Datacenter for DR in northwestern NJ/NY

2010-02-02 Thread Ray Sanders
Datapipe has a facility in N.J...
Not sure if they are 50mi from NYC

Mobile email powered by the force...

 Original Message 
From: "Matt Sprague" 
Date: 2/2/10 2:19 pm
To: "nanog@nanog.org" 
Subj: Datacenter for DR in northwestern NJ/NY
Hello NANOG!

Does anyone know of some strong datacenters in northwestern NJ, or north of 
Westchester NY without getting too far away from NYC?

I'm looking for a DR colo solution for a site that is in NYC; this needs to be 
at least 50m away from NYC, but I'm trying to keep it not too much further than 
that for convenience.  I'm also trying to keep this to top level providers as 
there may be compliance requirements.

Thanks in advance for any responses.
--
Matt Sprague
ReadyTechs, LLC

mspra...@readytechs.com
973-455-0606 x1204 (voice)
http://www.readytechs.com/




RE: Datacenter for DR in northwestern NJ/NY

2010-02-02 Thread Matt Sprague
Thanks! 

I was looking at them also (I live in that area), but you're right, they're 
just inside a 50mi radius. 

--
Matt Sprague
Delivery Director 
ReadyTechs llc
973.455.0606 x1204


-Original Message-
From: Ray Sanders [mailto:ray.sand...@villagevoicemedia.com] 
Sent: Tuesday, February 02, 2010 4:41 PM
To: nanog@nanog.org; Matt Sprague
Subject: RE: Datacenter for DR in northwestern NJ/NY

Datapipe has a facility in N.J...
Not sure if they are 50mi from NYC

Mobile email powered by the force...

 Original Message 
From: "Matt Sprague" 
Date: 2/2/10 2:19 pm
To: "nanog@nanog.org" 
Subj: Datacenter for DR in northwestern NJ/NY
Hello NANOG!

Does anyone know of some strong datacenters in northwestern NJ, or north of 
Westchester NY without getting too far away from NYC?

I'm looking for a DR colo solution for a site that is in NYC; this needs to be 
at least 50m away from NYC, but I'm trying to keep it not too much further than 
that for convenience.  I'm also trying to keep this to top level providers as 
there may be compliance requirements.

Thanks in advance for any responses.
--
Matt Sprague
ReadyTechs, LLC

mspra...@readytechs.com
973-455-0606 x1204 (voice)
http://www.readytechs.com/




RE: Datacenter for DR in northwestern NJ/NY

2010-02-02 Thread Scott Berkman
Might be better off going to Philly; it's only about an hour and a half away,
and you'll likely have better connectivity options.  Most of the big data
centers in NJ are well within the 50 mile requirement (Bergen County,
Hoboken, Newark, Jersey City).
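
If you want to sanity-check candidate sites against the 50 mile requirement, a great-circle distance is easy to compute. The sketch below uses the haversine formula; the function names are made up for this example, the coordinates you feed it are approximate city-center values rather than actual facility locations, and a compliance auditor may well measure distance differently.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, approximate


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def meets_minimum(primary, candidate, minimum_miles=50.0):
    # primary and candidate are (latitude, longitude) tuples in degrees.
    return distance_miles(*primary, *candidate) >= minimum_miles
```

With rough city-center coordinates, meets_minimum((40.7128, -74.0060), (39.9526, -75.1652)) (NYC to Philadelphia) passes, while the same check against Newark at about (40.7357, -74.1724) does not.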

-Scott

-Original Message-
From: Matt Sprague [mailto:mspra...@readytechs.com] 
Sent: Tuesday, February 02, 2010 4:16 PM
To: nanog@nanog.org
Subject: Datacenter for DR in northwestern NJ/NY

Hello NANOG!

Does anyone know of some strong datacenters in northwestern NJ, or north of
Westchester NY without getting too far away from NYC?

I'm looking for a DR colo solution for a site that is in NYC; this needs to
be at least 50m away from NYC, but I'm trying to keep it not too much
further than that for convenience.  I'm also trying to keep this to top
level providers as there may be compliance requirements.

Thanks in advance for any responses.
--
Matt Sprague
ReadyTechs, LLC

mspra...@readytechs.com
973-455-0606 x1204 (voice)
http://www.readytechs.com/





Re: Datacenter for DR in northwestern NJ/NY

2010-02-02 Thread Matt Simmons
Sungard has some nice datacenters in Philly. I'm in one that's still
being built out, and I haven't regretted it yet.

--Matt

On Tue, Feb 2, 2010 at 4:50 PM, Scott Berkman  wrote:
> Might be better off going to Philly, its only about an hour and a half away,
> and you'll likely have better connectivity options.  Most of the big data
> centers in NJ are well within the 50 mile requirement (Bergen County,
> Hoboken, Newark, Jersey City).
>
>        -Scott
>
> -Original Message-
> From: Matt Sprague [mailto:mspra...@readytechs.com]
> Sent: Tuesday, February 02, 2010 4:16 PM
> To: nanog@nanog.org
> Subject: Datacenter for DR in northwestern NJ/NY
>
> Hello NANOG!
>
> Does anyone know of some strong datacenters in northwestern NJ, or north of
> Westchester NY without getting too far away from NYC?
>
> I'm looking for a DR colo solution for a site that is in NYC; this needs to
> be at least 50m away from NYC, but I'm trying to keep it not too much
> further than that for convenience.  I'm also trying to keep this to top
> level providers as there may be compliance requirements.
>
> Thanks in advance for any responses.
> --
> Matt Sprague
> ReadyTechs, LLC
>
> mspra...@readytechs.com
> 973-455-0606 x1204 (voice)
> http://www.readytechs.com/
>
>
>
>



-- 

LITTLE GIRL: But which cookie will you eat FIRST?
COOKIE MONSTER: Me think you have misconception of cookie-eating process.



RE: Datacenter for DR in northwestern NJ/NY

2010-02-02 Thread Greg D. Moore

At 04:41 PM 2/2/2010, Ray Sanders wrote:

Datapipe has a facility in N.J...



I have to admit, I've used the Datapipe facility.  I'm underwhelmed.

You might also want to try Albany.  Smaller providers, but a few up 
there (Time Warner for example) that may work.


Decent infrastructure, a train ride away.  Good brew pubs.



Not sure if they are 50mi from NYC

Mobile email powered by the force...

 Original Message 
From: "Matt Sprague" 
Date: 2/2/10 2:19 pm
To: "nanog@nanog.org" 
Subj: Datacenter for DR in northwestern NJ/NY
Hello NANOG!

Does anyone know of some strong datacenters in northwestern NJ, or 
north of Westchester NY without getting too far away from NYC?


I'm looking for a DR colo solution for a site that is in NYC; this 
needs to be at least 50m away from NYC, but I'm trying to keep it 
not too much further than that for convenience.  I'm also trying to 
keep this to top level providers as there may be compliance requirements.


Thanks in advance for any responses.
--
Matt Sprague
ReadyTechs, LLC

mspra...@readytechs.com
973-455-0606 x1204 (voice)
http://www.readytechs.com/


Greg D. Moore   President   moor...@greenms.com
Ask me about lily, an RPI based chat system: http://lilycore.sourceforge.net/

Help honor our WWII Veterans: http://www.honorflight.org/ 





RE: Datacenter for DR in northwestern NJ/NY

2010-02-02 Thread Cerniglia, Brandon
Cervalis has facilities in Wappingers, NY,
about 1.5 hours from NYC.

-Original Message-
From: Matt Sprague [mailto:mspra...@readytechs.com]
Sent: Tuesday, February 02, 2010 4:16 PM
To: nanog@nanog.org
Subject: Datacenter for DR in northwestern NJ/NY

Hello NANOG!

Does anyone know of some strong datacenters in northwestern NJ, or north of 
Westchester NY without getting too far away from NYC?

I'm looking for a DR colo solution for a site that is in NYC; this needs to be 
at least 50m away from NYC, but I'm trying to keep it not too much further than 
that for convenience.  I'm also trying to keep this to top level providers as 
there may be compliance requirements.

Thanks in advance for any responses.
--
Matt Sprague
ReadyTechs, LLC

mspra...@readytechs.com
973-455-0606 x1204 (voice)
http://www.readytechs.com/





Re: Datacenter for DR in northwestern NJ/NY

2010-02-02 Thread Steven Bellovin

On Feb 2, 2010, at 5:52 PM, Cerniglia, Brandon wrote:

> Cervalis has facilities in wappingers ny
> 1.5 hours from NYC


Hmm -- where do the fibers run from a facility like that?  Are they all homed to 
NYC, or are there runs to, say, Albany or Boston?
> 
> -Original Message-
> From: Matt Sprague [mailto:mspra...@readytechs.com]
> Sent: Tuesday, February 02, 2010 4:16 PM
> To: nanog@nanog.org
> Subject: Datacenter for DR in northwestern NJ/NY
> 
> Hello NANOG!
> 
> Does anyone know of some strong datacenters in northwestern NJ, or north of 
> Westchester NY without getting too far away from NYC?
> 
> I'm looking for a DR colo solution for a site that is in NYC; this needs to 
> be at least 50m away from NYC, but I'm trying to keep it not too much further 
> than that for convenience.  I'm also trying to keep this to top level 
> providers as there may be compliance requirements.
> 
> Thanks in advance for any responses.
> --
> Matt Sprague
> ReadyTechs, LLC
> 
> mspra...@readytechs.com
> 973-455-0606 x1204 (voice)
> http://www.readytechs.com/
> 
> 
> 
> 


--Steve Bellovin, http://www.cs.columbia.edu/~smb








Re: Fiber Cut in CA?

2010-02-02 Thread Bret Clark
Good point...so if the cut is in the middle of nowhere without easy
access...then how the hell did it get cut? Malicious?

Matt Simmons wrote:

And in an open desert, back hoes can smell fiber from miles away.

On Tue, Feb 2, 2010 at 3:27 PM, Bill Stewart [1] wrote:

On Tue, Feb 2, 2010 at 12:04 AM,  [2] wrote:

That is one long protect path. Yikes.

There be mountains in the way, with deserts in between, and not a lot
of people to justify diversity or railroads and highways to run it
along.
Not many carriers have more than one fiber route across Arizona and
New Mexico, especially for the newer high-capacity fibers (i.e. built
this millennium, after the financial excesses of the 90s.)
I'm no longer current on what routes are being used by what carriers,
but if you don't have two routes across northern Arizona ( I-10/I-40,
with restoration routes like Barstow->LasVegas->Flagstaff->Phoenix),
then the next alternative is Barstow->LasVegas->SaltLakeCity->Denver,
at which point some carriers have routes down to Phoenix via Tucumcari
or Amarillo, and the rest are going to go through Dallas, and anybody
who doesn't have the LasVegas->SLC route is going to use
Sacramento->SLC->Denver, possibly also including San Jose, depending
on what routes they've got across California.

So, yeah, instead of the nice short 2200-mile restoration routes you
can use if SF->Seattle fails, cable cuts in the Southwest can be
really long...
--

Thanks; Bill

Note that this isn't my regular email account - It's still experimental so far.
And Google probably logs and indexes everything you send it.

References

   1. mailto:nonobvi...@gmail.com
   2. mailto:char...@knownelement.com


Re: Fiber Cut in CA?

2010-02-02 Thread Steven Bellovin

On Feb 2, 2010, at 6:36 PM, Bret Clark wrote:

>   Good point...so if the cut is in the middle of nowhere without easy
>   access...then how the hell did it get cut? Malicious?

Some hikers were lost in the desert and tossed down some fiber, waiting for a 
backhoe to show up and save them, but it was confused by the scent of a much 
longer, juicier piece


>   Matt Simmons wrote:
> 
> And in an open desert, back hoes can smell fiber from miles away.
> 
> On Tue, Feb 2, 2010 at 3:27 PM, Bill Stewart [1] wrote:
> 
> On Tue, Feb 2, 2010 at 12:04 AM,  [2] wrote:
> 
> That is one long protect path. Yikes.
> 
> There be mountains in the way, with deserts in between, and not a lot
> of people to justify diversity or railroads and highways to run it
> along.
> Not many carriers have more than one fiber route across Arizona and
> New Mexico, especially for the newer high-capacity fibers (i.e. built
> this millennium, after the financial excesses of the 90s.)
> I'm no longer current on what routes are being used by what carriers,
> but if you don't have two routes across northern Arizona ( I-10/I-40,
> with restoration routes like Barstow->LasVegas->Flagstaff->Phoenix),
> then the next alternative is Barstow->LasVegas->SaltLakeCity->Denver,
> at which point some carriers have routes down to Phoenix via Tucumcari
> or Amarillo, and the rest are going to go through Dallas, and anybody
> who doesn't have the LasVegas->SLC route is going to use
> Sacramento->SLC->Denver, possibly also including San Jose, depending
> on what routes they've got across California.
> 
> So, yeah, instead of the nice short 2200-mile restoration routes you
> can use if SF->Seattle fails, cable cuts in the Southwest can be
> really long...
> --
> 
>Thanks; Bill
> 
> Note that this isn't my regular email account - It's still experimental so 
> far.
> And Google probably logs and indexes everything you send it.
> 
> References
> 
>   1. mailto:nonobvi...@gmail.com
>   2. mailto:char...@knownelement.com
> 


--Steve Bellovin, http://www.cs.columbia.edu/~smb








RE: Fiber Cut in CA?

2010-02-02 Thread Scott Berkman
Cross-country fibers very often follow existing utility rights of way.  So even
in a wide open desert, the places the fibers go are the "busy" spots.
Sometimes it's train tracks, sometimes it's gas pipelines, sometimes it's
electric, sometimes it's a road, but very rarely is fiber like that "on its
own".

So the cut was likely construction on whatever the fiber was near.  The other 
option is that the fiber provider was actually doing maintenance (adding 
capacity, fixing a troubled strand) and did the damage themselves.

-Scott

-----Original Message-----
From: Bret Clark [mailto:bcl...@spectraaccess.com] 
Sent: Tuesday, February 02, 2010 6:37 PM
To: nanog
Subject: Re: Fiber Cut in CA?

   Good point...so if the cut is in the middle of nowhere without easy
   access...then how the hell did it get cut? Malicious?

   Matt Simmons wrote:

And in an open desert, back hoes can smell fiber from miles away.

On Tue, Feb 2, 2010 at 3:27 PM, Bill Stewart [1] wrote:

On Tue, Feb 2, 2010 at 12:04 AM,  [2] wrote:

That is one long protect path. Yikes.

There be mountains in the way, with deserts in between, and not a lot
of people to justify diversity or railroads and highways to run it
along.
Not many carriers have more than one fiber route across Arizona and
New Mexico, especially for the newer high-capacity fibers (i.e. built
this millennium, after the financial excesses of the 90s.)
I'm no longer current on what routes are being used by what carriers,
but if you don't have two routes across northern Arizona ( I-10/I-40,
with restoration routes like Barstow->LasVegas->Flagstaff->Phoenix),
then the next alternative is Barstow->LasVegas->SaltLakeCity->Denver,
at which point some carriers have routes down to Phoenix via Tucumcari
or Amarillo, and the rest are going to go through Dallas, and anybody
who doesn't have the LasVegas->SLC route is going to use
Sacramento->SLC->Denver, possibly also including San Jose, depending
on what routes they've got across California.

So, yeah, instead of the nice short 2200-mile restoration routes you
can use if SF->Seattle fails, cable cuts in the Southwest can be
really long...
--

Thanks; Bill

Note that this isn't my regular email account - It's still experimental so far.
And Google probably logs and indexes everything you send it.

References

   1. mailto:nonobvi...@gmail.com
   2. mailto:char...@knownelement.com





RE: Fiber Cut in CA?

2010-02-02 Thread Michael J McCafferty

I believe in this case the ticket mentions it was at the site of an
"on-going water project". Contrary to what may seem logical to those not
familiar with the area, the area out that way is loaded with very
productive farm land and there are lots of aqueducts and irrigation.

Mike

On Tue, 2010-02-02 at 19:41 -0500, Scott Berkman wrote:
> Cross-country fibers very often follow existing utility rights of way.  So
> even in a wide open desert, the places the fibers go are the "busy" spots.
> Sometimes it's train tracks, sometimes it's gas pipelines, sometimes it's
> electric, sometimes it's a road, but very rarely is fiber like that "on its
> own".
> 
> So the cut was likely construction on whatever the fiber was near.  The other 
> option is that the fiber provider was actually doing maintenance (adding 
> capacity, fixing a troubled strand) and did the damage themselves.
> 
>   -Scott
> 

-- 

Michael J. McCafferty
Principal
M5 Hosting
http://www.m5hosting.com

You can have your own custom Dedicated Server up and running today !
RedHat Enterprise, CentOS, Ubuntu, Debian, OpenBSD, FreeBSD, and more





Re: Mitigating human error in the SP

2010-02-02 Thread Chadwick Sorrell
Thanks for all the comments!

On Tue, Feb 2, 2010 at 1:01 PM, JC Dill  wrote:
> Chadwick Sorrell wrote:
>>
>> This outage, of a high profile customer, triggered upper management to
>> react by calling a meeting just days after.  Put bluntly, we've been
>> told "Human errors are unacceptable, and they will be completely
>> eliminated.  One is too many."
>
> Good, Fast, Cheap - pick any two.  No you can't have all three.
>
> Here, Good is defined by your pointy-haired bosses as an
> impossible-to-achieve zero error rate.[1]  Attempting to achieve this is
> either going to cost $$$, or your operations speed (how long it takes people
> to do things) is going to drop like a rock.  Your first action should be to
> make sure upper management understands this so they can set the appropriate
> priorities on Good, Fast, and Cheap, and make the appropriate budget
> changes.
>
> It's going to cost $$$ to hire enough people to have the staff necessary to
> double-check things in a timely manner, OR things are going to slow way down
> as the existing staff is burdened by necessary double-checking of everything
> and triple-checking of some things required to try to achieve a zero error
> rate.  They will also need to spend $$$ on software (to automate as much as
> possible) and testing equipment.  They will also never actually achieve a
> zero error rate as this is an impossible task that no organization has ever
> achieved, no matter how much emphasis or money they pour into it (e.g.
> Windows vulnerabilities) or how important (see Challenger, Columbia, and the
> Mars Climate Orbiter incidents).
>
> When you put a $$$ cost on trying to achieve a zero error rate,
> pointy-haired bosses are usually willing to accept a normal error rate.  Of
> course, they want you to try to avoid errors, and there are a lot of simple
> steps you can take in that effort (basic checklists, automation, testing)
> which have been mentioned elsewhere in this thread that will cost some money
> but not the $$$ that is required to try to achieve a zero error rate.  Make
> sure they understand that the budget they allocate for these changes will be
> strongly correlated to how Good (zero error rate) and Fast (quick
> operational responses to turn-ups and problems) the outcome of this
> initiative is.
>
> jc
>
> [1]  http://www.godlessgeeks.com/LINKS/DilbertQuotes.htm
>
> 2. "What I need is a list of specific unknown problems we will encounter."
> (Lykes Lines Shipping)
>
> 6. "Doing it right is no excuse for not meeting the schedule." (R&D
> Supervisor, Minnesota Mining & Manufacturing/3M Corp.)
>
>
>
>



Re: Fiber Cut in CA?

2010-02-02 Thread Blake Covarrubias
This is actually in my service area. 

There is an on-going water construction project along Interstate 8 by the 
Kiewit Corporation, and other entities, which are working on the All American 
Canal Lining Project.

http://www.iid.com/Water/AllAmericanCanalLiningProject
http://www.kiewit.com/projects/water-resources/all-american-canal.aspx

I drive by that area often and it is always very busy with workers and large 
machinery.

--
Blake Covarrubias

On Feb 2, 2010, at 6:02 PM, Michael J McCafferty wrote:

> 
>   I believe in this case the ticket mentions it was at the site of an
> "on-going water project". Contrary to what may seem logical to those not
> familiar with the area, the area out that way is loaded with very
> productive farm land and there are lots of aqueducts and irrigation.
> 
> Mike
> 
> On Tue, 2010-02-02 at 19:41 -0500, Scott Berkman wrote:
>> Cross-country fibers very often follow existing utility rights of way.  So
>> even in a wide open desert, the places the fibers go are the "busy" spots.
>> Sometimes it's train tracks, sometimes it's gas pipelines, sometimes it's
>> electric, sometimes it's a road, but very rarely is fiber like that "on its
>> own".
>> 
>> So the cut was likely construction on whatever the fiber was near.  The 
>> other option is that the fiber provider was actually doing maintenance 
>> (adding capacity, fixing a troubled strand) and did the damage themselves.
>> 
>>  -Scott
>> 
> 
> -- 
> 
> Michael J. McCafferty
> Principal
> M5 Hosting
> http://www.m5hosting.com
> 
> You can have your own custom Dedicated Server up and running today !
> RedHat Enterprise, CentOS, Ubuntu, Debian, OpenBSD, FreeBSD, and more
> 
> 
> 




Re: Mitigating human error in the SP

2010-02-02 Thread Michael Dillon
> Automated config deployment / provisioning.   And sanity checking
> before deployment.

Easy to say, not so easy to do. For instance, that incorrect port was identified
by a number or name. Theoretically, if an automated tool pulls the number/name
from a database and issues the command, then the error cannot happen. But how
does the number/name get into the database?

I've seen a situation where a human being enters that number, copying it from
another application screen. We hope that it is done by copy/paste all the
time but who knows? And even copy/paste can make mistakes if the selection
is done by mouse by someone who isn't paying enough attention.

But wait! How did the other application come up with that number for copying?
Actually, it was copy-pasted from yet a third application, and that application
got it by copy paste from a spreadsheet.

It is easy to create a tangled mess of OSS applications that are glued together
by lots of manual human effort creating numerous opportunities for human error.
So while I wholeheartedly support automation of network configuration, that is
not a magic bullet. You also need to pay attention to the whole process, the
whole chain of information flow.
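
To make the "sanity checking before deployment" idea concrete, here is a
minimal sketch of a pre-deployment check on a port identifier. Everything in
it is an assumption for illustration: the inventory dictionary, the port and
customer names, and the function name are hypothetical, and a real check
would query whatever system of record you actually trust.

```python
# Hypothetical inventory: port name -> customer the port is assigned to.
# In practice this would come from the authoritative database.
INVENTORY = {
    "ge-0/0/1": "customer-a",
    "ge-0/0/2": "customer-b",
}


def check_port_change(port, customer, inventory=INVENTORY):
    """Return a list of problems; an empty list means the change looks sane."""
    problems = []
    cleaned = port.strip()
    if cleaned != port:
        # Stray whitespace is a typical sign of a sloppy copy/paste.
        problems.append("port name has stray whitespace (copy/paste slip?)")
    owner = inventory.get(cleaned)
    if owner is None:
        problems.append(f"port {cleaned!r} is not in the inventory")
    elif owner != customer:
        problems.append(f"port {cleaned!r} belongs to {owner}, not {customer}")
    return problems
```

A check like this only catches a wrong port if the inventory itself is
right, which is the point about the whole chain of information flow.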

And there are other things that may be even more effective such as hiding your
human errors. This is commonly called a "maintenance window" and it involves
an absolute ban on making any network change, no matter how trivial, outside
of a maintenance window. The human error can still occur but because it is
in a maintenance window, the customer either doesn't notice, or if it is planned
maintenance, they don't complain because they are expecting a bit of disruption
and have agreed to the planned maintenance window.
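
A rough sketch of how a tool might enforce that ban, assuming a daily window
of 02:00 to 05:00 local time. The window times and function names are
hypothetical; the break-fix exemption is one possible policy, not a
recommendation.

```python
from datetime import datetime, time

# Assumed window: 02:00-05:00 local time, every day (hypothetical values).
WINDOW_START = time(2, 0)
WINDOW_END = time(5, 0)


def in_maintenance_window(when, start=WINDOW_START, end=WINDOW_END):
    """True if `when` falls inside the window, including windows that wrap midnight."""
    t = when.time()
    if start <= end:
        return start <= t < end
    # Window such as 23:00-01:00 spans midnight.
    return t >= start or t < end


def allow_change(when, break_fix=False):
    """Planned changes need the window; break-fix work is exempt in this sketch."""
    return break_fix or in_maintenance_window(when)
```

A deployment script could call allow_change(datetime.now()) and refuse to
push anything when it returns False.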

That only leaves break-fix work which is where the most skilled and trusted
engineers work on the live network outside of maintenance windows to fix
stuff that is seriously broken. It sounds like the event in the original posting
was something like that, but perhaps not, because this kind of break-fix work
should only be done when there is already a customer-affecting issue.

By the way, even break-fix changes can, and should be, tested in a lab
environment before you push them onto the network.

--Michael Dillon



Re: Mitigating human error in the SP

2010-02-02 Thread Suresh Ramasubramanian
Never said it was, and never said foolproof either.  Minimizing the
chance of error is what I'm after - and ssh'ing in + hand typing
configs isn't the way to go.

Use a known good template to provision stuff - and automatically
deploy it, and the chances of human error go down quite a lot. Getting
it down to zero defect from there is another kettle of fish altogether
- a much more expensive one, with dev / test, staging and production
environments, documented change processes, maintenance windows etc.
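
As a minimal sketch of the "known good template" idea: the template text
below is a made-up, vendor-neutral format, and the function and parameter
names are hypothetical. It uses Python's standard string.Template, whose
substitute() raises KeyError if a placeholder is left unfilled, so a
half-completed config never gets rendered.

```python
from string import Template

# Hypothetical, vendor-neutral template; the syntax is illustrative only.
PORT_TEMPLATE = Template(
    "interface $port\n"
    " description $customer\n"
    " vlan $vlan\n"
)


def render_port_config(port, customer, vlan):
    """Render the template, rejecting VLAN IDs outside the usable 802.1Q range."""
    if not 1 <= vlan <= 4094:
        raise ValueError(f"VLAN {vlan} is outside 1-4094")
    return PORT_TEMPLATE.substitute(port=port, customer=customer, vlan=vlan)
```

The operator supplies three values and never types the surrounding config
by hand, which is where most of the reduction in error comes from.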

On Wed, Feb 3, 2010 at 7:00 AM, Michael Dillon
 wrote:
>
> It is easy to create a tangled mess of OSS applications that are glued 
> together
> by lots of manual human effort creating numerous opportunities for human 
> error.
> So while I wholeheartedly support automation of network configuration, that is
> not a magic bullet. You also need to pay attention to the whole process, the
> whole chain of information flow.



-- 
Suresh Ramasubramanian (ops.li...@gmail.com)



Re: Mitigating human error in the SP

2010-02-02 Thread Michael Dillon
> The actual error happened when someone was troubleshooting a turn-up,
> where in the past the customer in question has had their ethertype set
> wrong.  It wasn't a provisioning problem as much as someone
> troubleshooting why it didn't come up with the customer.  Ironically,
> the NOC was on the phone when it happened, and the switch was rebooted
> almost immediately and the outage lasted 5 minutes.

This is why large operators have a "ready for service" protocol. The customer
is never billed until it is officially RFS, and to make it RFS requires more
than an operational network, it also requires the customer to agree in writing
that they have a fully functional connection.

This is another way of hiding human error, because now the up-down-up is
just part of the provisioning process. There is a record of the RFS date-time
so if the customer complains about an outage BEFORE that point, they can
be politely reminded when RFS happened and that charging does not
start until AFTER that point.
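
The RFS rule can be expressed as a small calculation. This is only a sketch
under the assumption that the RFS date-time is recorded per circuit; the
function name and the idea of counting outage seconds are hypothetical, not
any operator's actual billing logic.

```python
from datetime import datetime


def billable_outage_seconds(outage_start, outage_end, rfs_time):
    """Count only the part of an outage that falls after ready-for-service."""
    if rfs_time is None or outage_end <= rfs_time:
        # Circuit not yet RFS, or the whole outage predates RFS.
        return 0.0
    counted_from = max(outage_start, rfs_time)
    return (outage_end - counted_from).total_seconds()
```

With an RFS time of 12:00, a five-minute drop at 11:00 counts for nothing,
while the same drop at 13:00 counts in full.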

--Michael Dillon



Re: Mitigating human error in the SP

2010-02-02 Thread David Hiers
If your manager pretends that they can manage humans without a few
well-worn human factor books on their shelf, quit.




David









On Tue, Feb 2, 2010 at 5:36 PM, Michael Dillon
 wrote:
>> The actual error happened when someone was troubleshooting a turn-up,
>> where in the past the customer in question has had their ethertype set
>> wrong.  It wasn't a provisioning problem as much as someone
>> troubleshooting why it didn't come up with the customer.  Ironically,
>> the NOC was on the phone when it happened, and the switch was rebooted
>> almost immediately and the outage lasted 5 minutes.
>
> This is why large operators have a "ready for service" protocol. The customer
> is never billed until it is officially RFS, and to make it RFS requires more
> than an operational network, it also requires the customer to agree in writing
> that they have a fully functional connection.
>
> This is another way of hiding human error, because now the up-down-up is
> just part of the provisioning process. There is a record of the RFS date-time
> so if the customer complains about an outage BEFORE that point, they can
> be politely reminded when RFS happened and that charging does not
> start until AFTER that point.
>
> --Michael Dillon
>
>



Re: Mitigating human error in the SP

2010-02-02 Thread Steven Bellovin

On Feb 2, 2010, at 8:36 PM, Suresh Ramasubramanian wrote:

> Never said it was, and never said foolproof either.  Minimizing the
> chance of error is what I'm after - and ssh'ing in + hand typing
> configs isn't the way to go.
> 
> Use a known good template to provision stuff - and automatically
> deploy it, and the chances of human error go down quite a lot. Getting
> it down to zero defect from there is another kettle of fish altogether
> - a much more expensive one, with dev / test, staging and production
> environments, documented change processes, maintenance windows etc.
> 
Yup.  Or use a database and a template-driven compiler.  See "Configuration 
management and security", IEEE Journal on Selected Areas in Communications, 
27(3):268-274, April 2009, by myself and Randy Bush, 
http://www.cs.columbia.edu/~smb/papers/config-jsac.pdf (the system described is 
Randy's work, from many years ago).
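
For readers who want a feel for the general shape of "database plus
template-driven compiler", here is a toy sketch. It is not the system
described in the paper; the table layout, column names, and output format
are all assumptions made up for illustration, using Python's standard
sqlite3 and string.Template modules.

```python
import sqlite3
from string import Template

# Hypothetical one-line output format; real device syntax will differ.
LINE = Template("interface $port description $customer vlan $vlan")


def compile_configs(db):
    """Generate one config line per row of the ports table, in a stable order."""
    rows = db.execute("SELECT port, customer, vlan FROM ports ORDER BY port")
    return [LINE.substitute(port=p, customer=c, vlan=v) for p, c, v in rows]


# In-memory database standing in for the real source of truth.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE ports ("
    "port TEXT PRIMARY KEY, customer TEXT NOT NULL, vlan INTEGER NOT NULL)"
)
db.executemany(
    "INSERT INTO ports VALUES (?, ?, ?)",
    [("ge-0/0/2", "customer-b", 200), ("ge-0/0/1", "customer-a", 100)],
)
```

Calling compile_configs(db) then produces the full set of lines from the
database alone, and the schema constraints (primary key, NOT NULL) reject
some bad input before it ever reaches a device.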



--Steve Bellovin, http://www.cs.columbia.edu/~smb








Research Project: Internet capacity during pandemic events

2010-02-02 Thread haska

Hello everyone,

My name is Mike Haska, and I am a graduate student at the University  
of Alberta. I am conducting research into Internet capacity issues  
during pandemic events. In order to analyze certain aspects of this  
topic, I need to get in touch with representatives from the major  
Internet service providers in Canada - some of whom, I am hoping, are  
members of this distribution.


Specifically, I am looking to get in touch with individuals who are  
familiar with the structure of their network and with any pandemic  
contingency plans that are in place within their organization.


If you think you may be able to assist, or if you know of anyone who  
could, please contact me at (haska at ualberta.ca) and I will provide  
further information on all aspects of this study.


To put your mind at ease - I'm not fishing around for sensitive  
information or your root passwords; I'm looking for an overview of  
your policies and your responses to hypothetical scenarios. Your  
confidentiality is assured and you are welcome to preview all the  
questions to be asked before you commit to participating in any way.


I feel this topic has important implications to network operators in  
Canada, so any support you can offer to this research project is  
greatly appreciated.


Best regards,
-Mike



Re: Research Project: Internet capacity during pandemic events

2010-02-02 Thread Sean Donelan


http://www.ncs.gov/library/pubs/Pandemic%20Comms%20Impact%20Study%20(December%202007).pdf

Department of Homeland Security
Pandemic Influenza Impact on Communications Networks Study
December 2007