Re: latency (was: RE: cooling door)

2008-03-30 Thread Mikael Abrahamsson


On Sat, 29 Mar 2008, Frank Coluccio wrote:


Understandably, some applications fall into a class that requires very-short
distances for the reasons you cite, although I'm still not comfortable with the
setup you've outlined. Why, for example, are you showing two Ethernet switches
for the fiber option (which would naturally double the switch-induced latency),
but only a single switch for the UTP option?


Yes, I am showing a case where you have switches in each rack so each rack 
is uplinked with a fiber to a central aggregation switch, as opposed to 
having a lot of UTP from the rack directly into the aggregation switch.



Now, I'm comfortable in ceding this point. I should have made allowances for this
type of exception in my introductory post, but didn't, as I also omitted mention
of other considerations for the sake of brevity. For what it's worth, propagation
over copper is faster than propagation over fiber, as copper has a higher nominal
velocity of propagation (NVP) rating than does fiber, but not by enough to cause
the difference you've cited.


The 2/3 speed of light in fiber, as opposed to the propagation speed in 
copper, was not what I had in mind.



As an aside, the manner in which o-e-o and e-o-e conversions take place when
transitioning from electronic to optical states, and back, affects latency
differently across differing link assembly approaches used. In cases where 
10Gbps


My opinion is that the major factors of added end-to-end latency in my 
example are that the packet has to be serialised three times as opposed to 
once, and that there are three lookups instead of one. Lookups take time; 
putting the packet on the wire takes time.


Back in the 10 megabit/s days, there were switches that did cut-through: 
if the output port was free the instant the packet came in, the switch 
would make its forwarding decision as soon as the header was received and 
start sending the packet out the egress port before it had been completely 
received on the ingress port.



By chance, is the deserialization you cited earlier perhaps related to this
inverse muxing process? If so, then that would explain the disconnect, and
one shouldn't despair, because there is a direct path to avoiding this.


No, it's the store-and-forward architecture used in all modern equipment 
(that I know of). A packet has to be completely taken in over the wire 
into a buffer, a lookup has to be done as to where this packet should be 
put out, it needs to be sent over a bus or fabric, and then it has to be 
clocked out on the outgoing port from another buffer. This adds latency in 
each switch hop on the way.
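
As a rough model of where the time goes (a Python sketch; the lookup and 
fabric numbers are guesses for illustration, not vendor figures):

    # Added one-way delay from store-and-forward switching.  Each switch
    # must receive the frame completely before it can retransmit, so each
    # hop adds one full serialization time plus lookup and fabric-transfer
    # costs (the 2 us figures below are guesses, not vendor data).
    def added_latency_us(n_switches, frame_bytes, speed_bps,
                         lookup_us=2.0, fabric_us=2.0):
        ser_us = frame_bytes * 8 / speed_bps * 1e6
        return n_switches * (ser_us + lookup_us + fabric_us)

    # Direct UTP to one aggregation switch vs rack -> agg -> rack fiber:
    print(added_latency_us(1, 1500, 1e9))  # one switch:    ~16 us
    print(added_latency_us(3, 1500, 1e9))  # three switches: ~48 us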


As Adrian Chadd mentioned in the email sent after yours, this can of 
course be handled by modifying or creating new protocols that handle this 
fact. It's just that with what is available today, this is a problem. Each 
directory listing or file access takes a bit longer over NFS with added 
latency, and this reduces performance in current protocols.


Programmers who do client/server applications are starting to notice this 
and I know of companies that put latency-inducing applications in the 
development servers, so that the programmer is exposed to the same 
conditions in the development environment as in the real world. This means 
for some that they have to write more advanced SQL queries to get 
everything done in a single query, instead of asking multiple queries and 
changing the queries depending on what the first query result was.
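
A contrived illustration of the difference (Python, with a made-up schema; 
the point is the round trips, not the SQL itself):

    import sqlite3

    # Hypothetical schema, just to show the round-trip pattern.
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE prices (tier TEXT, item TEXT, price REAL);
        INSERT INTO customers VALUES (1, 'wholesale');
        INSERT INTO prices VALUES ('wholesale', 'widget', 8.0);
        INSERT INTO prices VALUES ('retail', 'widget', 10.0);
    """)
    cust_id = 1

    # Chatty: two round trips, and the second query depends on the first,
    # so over a WAN each one costs a full RTT.
    tier = cur.execute("SELECT type FROM customers WHERE id = ?",
                       (cust_id,)).fetchone()[0]
    rows = cur.execute("SELECT item, price FROM prices WHERE tier = ?",
                       (tier,)).fetchall()

    # Single round trip: do the dependent lookup server-side with a join.
    rows = cur.execute("""
        SELECT p.item, p.price
          FROM prices p JOIN customers c ON p.tier = c.type
         WHERE c.id = ?""", (cust_id,)).fetchall()
    print(rows)  # [('widget', 8.0)]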


Also, protocols such as SMB and NFS that use message blocks over TCP have 
to be abandoned and replaced with real streaming protocols and large 
window sizes. Xmodem wasn't a good idea back then, and it's not a good idea 
now (even though the blocks today are larger than the 128 bytes of 20-30 
years ago).
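
The arithmetic against stop-and-wait block protocols is stark: throughput 
is capped by block size over RTT, no matter how fast the link (an 
illustrative sketch):

    # A stop-and-wait block protocol keeps one request in flight: ask for
    # a block, wait for it, ask for the next.  Throughput is capped at
    # block_size / RTT regardless of link speed.
    def stop_and_wait_mbps(block_bytes, rtt_s):
        return block_bytes * 8 / rtt_s / 1e6

    for rtt_ms in (0.1, 1.0, 10.0):
        cap = stop_and_wait_mbps(64 * 1024, rtt_ms / 1e3)
        print(f"64 KB blocks at {rtt_ms} ms RTT: <= {cap:.0f} Mbit/s")
    # 0.1 ms -> ~5243, 1 ms -> ~524, 10 ms -> ~52 Mbit/s, on any link.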


--
Mikael Abrahamsson    email: [EMAIL PROTECTED]


Re: latency (was: RE: cooling door)

2008-03-30 Thread Paul Vixie

[EMAIL PROTECTED] (Mikael Abrahamsson) writes:

 ...
 Back in the 10 megabit/s days, there were switches that did cut-through:
 if the output port was free the instant the packet came in, the switch
 would make its forwarding decision as soon as the header was received and
 start sending the packet out the egress port before it had been completely
 received on the ingress port.

had packet sizes scaled with LAN transmission speed, i would agree.  but
the serialization time for 1500 bytes at 10MBit was ~1.2ms, and went down
by a factor of 10 for FastE (~120us), another factor of 10 for GigE (~12us)
and another factor of 10 for 10GE (~1.2us).  even those of us using jumbo
grams are getting less serialization delay at 10GE (~7us) than we used to
get on a DEC LANbridge 100 which did cut-through after the header (~28us).
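
the numbers fall straight out of frame bits over line rate; a quick python
check:

    # serialization delay = frame bits / line rate
    for name, bps in (("10M", 1e7), ("FastE", 1e8),
                      ("GigE", 1e9), ("10GE", 1e10)):
        print(f"1500B @ {name:>5}: {1500 * 8 / bps * 1e6:7.1f} us")
    print(f"9000B jumbogram @ 10GE: {9000 * 8 / 1e10 * 1e6:.1f} us")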

 ..., it's the store-and-forward architecture used in all modern equipment 
 (that I know of). A packet has to be completely taken in over the wire 
 into a buffer, a lookup has to be done as to where this packet should be 
 put out, it needs to be sent over a bus or fabric, and then it has to be 
 clocked out on the outgoing port from another buffer. This adds latency in 
 each switch hop on the way.

you may be right about the TCAM lookup times having an impact, i don't know
if they've kept pace with transmission speed either.  but per someone's
theory here yesterday, software (kernel and IP stack) architecture is more
likely to be at fault: there are still plenty of "queue it here, it'll go
out next time the device or timer interrupt handler fires" code paths, and
those can be in the ~1ms or even ~10ms range.  this doesn't show up on file
transfer benchmarks since packet trains usually do well, but miss an ACK,
or send a ping, and you'll see a shelf.

 As Adrian Chadd mentioned in the email sent after yours, this can of 
 course be handled by modifying or creating new protocols that handle this 
 fact. It's just that with what is available today, this is a problem. Each 
 directory listing or file access takes a bit longer over NFS with added 
 latency, and this reduces performance in current protocols.

here again it's not just the protocols, it's the application design, that 
has to be modernized.  i've written plenty of code that tries to cut down
the number of bytes of RAM that get copied or searched, which ends up not
going faster on modern CPUs (or sometimes going slower) because of the
minimum transfer size between L2 and DRAM.  similarly, a program that sped
up on a VAX 780 when i taught it to match the size domain of its disk I/O
to the 512-byte size of a disk sector, either fails to go faster on modern
high-bandwidth I/O and log structured file systems, or actually goes slower.

in other words you don't need NFS/SMB, or E-O-E, or the WAN, to erode what
used to be performance gains through efficiency.  there's plenty enough new
latency (expressed as a factor of clock speed) in the path to DRAM, the
path to SATA, and the path through ZFS, to make it necessary that any
application that wants modern performance be re-oriented to take a
modern (which in this case means streaming) approach.  correspondingly,
applications which take this approach don't suffer as much when they move
from SATA to NFS or iSCSI.

 Programmers who do client/server applications are starting to notice this
 and I know of companies that put latency-inducing applications in the
 development servers so that the programmer is exposed to the same
 conditions in the development environment as in the real world.  This
 means for some that they have to write more advanced SQL queries to get
 everything done in a single query instead of asking multiple queries and
 changing the queries depending on what the first query result was.

while i agree that turning one's SQL into transactions that are more like
applets (such that, for example, you're sending over the content for a
potential INSERT that may not happen depending on some SELECT, because the
end-to-end delay of getting back the SELECT result is so much higher than
the cost of the lost bandwidth from occasionally sending a useless INSERT)
will take better advantage of modern hardware and software architecture
(which means in this case, streaming), it's also necessary to teach our
SQL servers that ZFS recordsize=128k means what it says, for file system
reads and writes.  a lot of SQL users who have moved to a streaming model
using a lot of transactions have merely seen their bottleneck move from the
network into the SQL server.
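
one way to express that applet-like pattern in plain SQL is to ship the
content and the test in the same round trip (a sketch; table and values
are made up):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

    # instead of SELECT, wait one RTT, then maybe INSERT: ship the payload
    # and the test together, and let the server discard the payload if the
    # row is already there.  one round trip, at the cost of occasionally
    # sending a useless payload -- bandwidth being the cheap resource here.
    con.execute("""
        INSERT INTO events (id, payload)
        SELECT ?, ?
        WHERE NOT EXISTS (SELECT 1 FROM events WHERE id = ?)""",
        (42, "some bulk content", 42))
    con.commit()
    print(con.execute("SELECT id FROM events").fetchall())  # [(42,)]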

 Also, protocols such as SMB and NFS that use message blocks over TCP have 
 to be abandoned and replaced with real streaming protocols and large 
 window sizes. Xmodem wasn't a good idea back then, and it's not a good idea 
 now (even though the blocks today are larger than the 128 bytes of 20-30 
 years ago).

i think xmodem and kermit moved enough total data volume (expressed as a
factor of transmission speed) back in their day to deserve an honourable
retirement.  but i'd agree, if an application is moved to a new environment
where everything (DRAM timing, CPU clock, I/O bandwidth, network bandwidth,
etc) is 10X faster, but the application only runs 2X faster, then it's time
to rethink more.  but the culprit will usually not be new network latency.
-- 
Paul Vixie

minimizing BGP link failure detection time

2008-03-30 Thread Ang Kah Yik


Hi all,

I'm currently getting started with BGP, so if I'm asking the obvious, 
please forgive my ignorance.


On the topic of BGP convergence, may I ask what the current best 
practices are for ensuring rapid link-failure detection - especially when 
dealing with an interface that is connected to 3rd-party L2 
infrastructure? E.g. an interface connected to an EBGP peer via Metro-E.


As far as I'm aware, the most commonly used method to minimize link 
failure detection time is to tune the keepalive/hold timers. I 
understand that there are alternatives such as BFD and next-hop 
tracking, but support for these is limited on certain platforms.
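
If my arithmetic is right, the difference is orders of magnitude (a quick
sketch; the BFD values are just typical configured numbers I've seen, not
platform defaults):

    # Worst-case time to declare a peer dead, by mechanism.
    def bgp_detect_s(hold_timer_s):
        # BGP declares the peer down when the hold timer expires.
        return hold_timer_s

    def bfd_detect_s(tx_interval_ms, multiplier):
        # BFD declares the session down after `multiplier` missed hellos.
        return tx_interval_ms * multiplier / 1000.0

    print(bgp_detect_s(180))     # common default hold timer: 180 s
    print(bgp_detect_s(15))      # tuned 5s/15s timers:        15 s
    print(bfd_detect_s(300, 3))  # BFD at 300 ms x 3:          0.9 s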


I have looked through this presentation from APNIC 21.
http://www.apnic.net/meetings/21/docs/sigs/routing/routing-pres-hughes-bgp.pdf

What do you guys think of the recommended timers (5s and 15s for the 
keepalive and hold timers respectively)? And since the presentation 
is more than a year old, has BFD since become the better solution?


Looking forward to your suggestions.

Thanks
--
ANG Kah Yik (bangky)



RE: latency (was: RE: cooling door)

2008-03-30 Thread Fred Reimer

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
 Paul Vixie
 Sent: Sunday, March 30, 2008 10:35 AM
 To: nanog@merit.edu
 Subject: Re: latency (was: RE: cooling door)
 
 
 [EMAIL PROTECTED] (Mikael Abrahamsson) writes:
 
  Programmers who do client/server applications are starting to notice this
  and I know of companies that put latency-inducing applications in the
  development servers so that the programmer is exposed to the same
  conditions in the development environment as in the real world.  This
  means for some that they have to write more advanced SQL queries to get
  everything done in a single query instead of asking multiple queries and
  changing the queries depending on what the first query result was.
 
 while i agree that turning one's SQL into transactions that are more like
 applets (such that, for example, you're sending over the content for a
 potential INSERT that may not happen depending on some SELECT, because the
 end-to-end delay of getting back the SELECT result is so much higher than
 the cost of the lost bandwidth from occasionally sending a useless INSERT)
 will take better advantage of modern hardware and software architecture
 (which means in this case, streaming), it's also necessary to teach our
 SQL servers that ZFS recordsize=128k means what it says, for file system
 reads and writes.  a lot of SQL users who have moved to a streaming model
 using a lot of transactions have merely seen their bottleneck move from the
 network into the SQL server.

I have seen first-hand (I worked for a company and diagnosed issues with their
applications from a network perspective, prompting a major re-write of the
software) how developers work with their SQL servers, application
servers, and clients all on the same L2 switch.  They often do not duplicate
the environment they are going to be deploying the application into, and
therefore assume that the network is going to perform the same.  So, when
there are problems, they blame the network.  Often the root problem is the
architecture of the application itself and not the network.  All the
servers and client workstations have Gigabit connections to the same L2
switch, and they are honestly astonished when there are issues running the
same application over a typical enterprise network with clients of different
speeds (10/100/1000, full and/or half duplex).  Surprisingly (to me), they
even expect the same performance out of a WAN.

Application developers today need a network guy on their team: one who
can help them understand how their proposed application architecture would
perform over various customer networks, and who can make suggestions as to
how the architecture can be modified to let the application take advantage
of the network's capabilities.  Mikael (seems to) complain that developers
have to put latency-inducing applications into the development environment.
I'd say that those developers are some of the few who actually have a clue,
and are doing the right thing.

  Also, protocols such as SMB and NFS that use message blocks over TCP have
  to be abandoned and replaced with real streaming protocols and large
  window sizes. Xmodem wasn't a good idea back then, and it's not a good idea
  now (even though the blocks today are larger than the 128 bytes of 20-30
  years ago).
 
 i think xmodem and kermit moved enough total data volume (expressed as a
 factor of transmission speed) back in their day to deserve an honourable
 retirement.  but i'd agree, if an application is moved to a new environment
 where everything (DRAM timing, CPU clock, I/O bandwidth, network bandwidth,
 etc) is 10X faster, but the application only runs 2X faster, then it's time
 to rethink more.  but the culprit will usually not be new network latency.
 --
 Paul Vixie

It may be difficult to switch to a streaming protocol if the underlying data
sets are block-oriented.

Fred Reimer, CISSP, CCNP, CQS-VPN, CQS-ISS
Senior Network Engineer
Coleman Technologies, Inc.
954-298-1697






RE: latency (was: RE: cooling door)

2008-03-30 Thread Mikael Abrahamsson


On Sun, 30 Mar 2008, Fred Reimer wrote:


 how the architecture can be modified to let the application take advantage
 of the network's capabilities.  Mikael (seems to) complain that developers
 have to put latency-inducing applications into the development environment.
 I'd say that those developers are some of the few who actually have a clue,
 and are doing the right thing.


I was definitely not complaining; I brought it up as an example where 
developers have clue and where they're doing the right thing.


I've too often been involved in customer complaints which ended up being 
the fault of Microsoft SMB, with the customer firmly convinced that it 
must be a network problem, since MS is a world standard and that can't be 
changed. Even proposing to change TCP window settings to make FTP transfers 
quicker is met with the same scepticism.
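
The window arithmetic itself is simple enough to show them (a sketch with
made-up path numbers):

    # TCP can have at most one window in flight per RTT, so throughput is
    # bounded by window / RTT; filling a path takes at least the
    # bandwidth-delay product in window.
    def max_mbps(window_bytes, rtt_s):
        return window_bytes * 8 / rtt_s / 1e6

    def window_needed_bytes(target_mbps, rtt_s):
        return target_mbps * 1e6 * rtt_s / 8

    print(max_mbps(65535, 0.030))            # 64 KB window, 30 ms: ~17 Mbit/s
    print(window_needed_bytes(1000, 0.030))  # GigE at 30 ms: ~3.75 MB window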


Even after describing to them the propagation delay of light in 
fiber and the physical limitations, they're still very suspicious about it 
all.


--
Mikael Abrahamsson    email: [EMAIL PROTECTED]


Re: latency (was: RE: cooling door)

2008-03-30 Thread Steven M. Bellovin

On Sun, 30 Mar 2008 13:03:18 +0800
Adrian Chadd [EMAIL PROTECTED] wrote:
 
 Oh, and kernel hz tickers can have similar effects on network
 traffic, if the application does dumb stuff. If you're (un)lucky then
 you may see 1 or 2ms of delay between packet input and scheduling
 processing. This doesn't matter so much over 250ms + latent links but
 matters on 0.1ms - 1ms latent links.
 
 (Can someone please apply some science to this and publish best
 practices please?)
 
There's been a lot of work done on TCP throughput.  Roughly speaking,
and holding everything else constant, throughput is inversely proportional
to the round-trip time.  That is, if you double the RTT -- even from .1 ms
to .2 ms -- you halve the throughput on (large) file transfers.  See
http://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html for one
summary; feed "tcp throughput equation" into your favorite search
engine for a lot more references.  Another good reference is RFC 3448,
which relates throughput to packet size (also a linear factor, but if
serialization delay is significant then increasing the packet size will
increase the RTT), packet loss rate, the TCP retransmission timeout
(which can be approximated as 4x the RTT), and the number of packets
acknowledged by a single TCP acknowledgement.
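
The simplified form of that equation (the Mathis et al. approximation, with
the constant taken as sqrt(3/2)) is easy to play with:

    import math

    def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
        # Steady-state estimate: rate ~ (MSS/RTT) * C/sqrt(p), C ~ sqrt(3/2).
        # Crude, but it shows the scaling behavior.
        return mss_bytes * 8 / rtt_s * math.sqrt(1.5) / math.sqrt(loss_rate)

    # Doubling the RTT halves the estimate, as described above:
    for rtt_s in (0.0001, 0.0002):
        gbps = mathis_throughput_bps(1460, rtt_s, 1e-4) / 1e9
        print(f"{rtt_s * 1000:.1f} ms RTT, 0.01% loss: ~{gbps:.1f} Gbit/s")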

On top of that, there are lots of application issues, as a number of
people have pointed out.

--Steve Bellovin, http://www.cs.columbia.edu/~smb


RE: latency (was: RE: cooling door)

2008-03-30 Thread Fred Reimer
Thanks for the clarification; that's why I put the "seems to" in the reply.

Fred Reimer, CISSP, CCNP, CQS-VPN, CQS-ISS
Senior Network Engineer
Coleman Technologies, Inc.
954-298-1697




RE: latency (was: RE: cooling door)

2008-03-30 Thread Buhrmaster, Gary

 

 ... feed "tcp throughput equation" into your favorite search
 engine for a lot more references.

There has been a lot of work in some OS stacks
(Vista and recent Linux kernels) to enable TCP
auto-tuning (of one form or another), which is
attempting to hide some of the worst of the TCP
uglinesses from the application/end-users.  I
am not convinced this is always a good thing,
since having the cruft exposed to the developers
(in particular) means one needs to plan for
errors and less-than-ideal cases.

Gary



Re: cooling door

2008-03-30 Thread Henry Yen

Perhaps this is apropos:

  Linkname: Slashdot | Iceland Woos Data Centers As Power Costs Soar
   URL: http://hardware.slashdot.org/hardware/08/03/29/2331218.shtml

On Sat, Mar 29, 2008 at 23:29:18PM -0400, Robert Boyle wrote:
 
 At 02:11 PM 3/29/2008, Alex Pilosov wrote:
  Can someone please, pretty please with sugar on top, explain the point
  behind high power density?
 
 More equipment in your existing space means more revenue and more profit.
 
  Raw real estate is cheap (basically, nearly free). Increasing power
  density per sqft will *not* decrease cost, beyond 100W/sqft, the real
  estate costs are a tiny portion of total cost. Moving enough air to cool
  400 (or, in your case, 2000) watts per square foot is *hard*.
 
 It depends on where you are located, but I understand what you are 
 saying. However, the space is the cheap part. Installing the 
 electrical power, switchgear, ATS gear, Gensets, UPS units, power 
 distribution, cable/fiber distribution, connectivity to the 
 datacenter, core and distribution routers/switches are all basically 
 stepped incremental costs. If you can leverage the existing floor 
 infrastructure then you maximize the return on your investment.
 
  I've started to recently price things as cost per square amp. (That is,
  1A power, conditioned, delivered to the customer rack and cooled). Space
  is really irrelevant - to me, as colo provider, whether I have 100A going
  into a single rack or 5 racks, is irrelevant. In fact, my *costs*
  (including real estate) are likely to be lower when the load is spread
  over 5 racks. Similarly, to a customer, all they care about is getting
  their gear online, and can care less whether it needs to be in 1 rack or
  in 5 racks.
 
 I don't disagree with what you have written above, but if you can get 
 100A into all 5 racks (and cool it!), then you have five times the 
 revenue with the same fixed infrastructure costs (with the exception 
 of a bit more power, GenSet, UPS and cooling, but the rest of my 
 costs stay the same.)

-- 
Henry Yen   Aegis Information Systems, Inc.
Senior Systems Programmer   Hicksville, New York


Re: cooling door

2008-03-30 Thread Matthew Petach

On 3/29/08, Alex Pilosov [EMAIL PROTECTED] wrote:

 Can someone please, pretty please with sugar on top, explain the point
  behind high power density?

  Raw real estate is cheap (basically, nearly free). Increasing power
  density per sqft will *not* decrease cost, beyond 100W/sqft, the real
  estate costs are a tiny portion of total cost. Moving enough air to cool
  400 (or, in your case, 2000) watts per square foot is *hard*.

  I've started to recently price things as cost per square amp. (That is,
  1A power, conditioned, delivered to the customer rack and cooled). Space
  is really irrelevant - to me, as colo provider, whether I have 100A going
  into a single rack or 5 racks, is irrelevant. In fact, my *costs*
  (including real estate) are likely to be lower when the load is spread
  over 5 racks. Similarly, to a customer, all they care about is getting
  their gear online, and can care less whether it needs to be in 1 rack or
  in 5 racks.

  To rephrase vijay, what is the problem being solved?

I have not yet found a way to split the ~10kW power/cooling
demand of a T1600 across 5 racks.  Yes, when I want to put
a pair of them into an exchange point, I can lease 10 racks,
put T1600s in two of them, and leave the other 8 empty; but
that hasn't helped either me (the customer) or the exchange
point provider: they've had to burn more real estate for empty
racks that can never be filled, I'm paying for floor space in my
cage that I'm probably going to end up using for storage rather
than just have it go to waste, and we still have the problem of
two very hot spots that need relatively 'point' cooling solutions.

There are very specific cases where high density power and
cooling cannot simply be spread out over more space; thus,
research into areas like this is still very valuable.

Matt


Re: latency (was: RE: cooling door)

2008-03-30 Thread Paul Vixie

[EMAIL PROTECTED] (Buhrmaster, Gary) writes:

  ... feed "tcp throughput equation" into your favorite search
  engine for a lot more references.
 
 There has been a lot of work in some OS stacks
 (Vista and recent linux kernels) to enable TCP
 auto-tuning (of one form or another), ...

on http://www.onlamp.com/pub/a/bsd/2008/02/26/whats-new-in-freebsd-70.html
i'd read that freebsd 7 also has some tcp auto tuning logic.
-- 
Paul Vixie


Re: latency (was: RE: cooling door)

2008-03-30 Thread Steven M. Bellovin

On 30 Mar 2008 21:00:25 +
Paul Vixie [EMAIL PROTECTED] wrote:

 
 [EMAIL PROTECTED] (Buhrmaster, Gary) writes:
 
   ... feed "tcp throughput equation" into your favorite search
   engine for a lot more references.
  
  There has been a lot of work in some OS stacks
  (Vista and recent linux kernels) to enable TCP
  auto-tuning (of one form or another), ...
 
 on
 http://www.onlamp.com/pub/a/bsd/2008/02/26/whats-new-in-freebsd-70.html
 i'd read that freebsd 7 also has some tcp auto tuning logic.

There are certain things that the stack can do, like auto-adjusting the
window size, tuning retransmission intervals, etc.  But other problems
are at the application layer, as you noted a few posts ago.

--Steve Bellovin, http://www.cs.columbia.edu/~smb


Re: latency (was: RE: cooling door)

2008-03-30 Thread Frank Coluccio

Mikael, I see your points more clearly now with respect to the number of turns
affecting latency. In analyzing this further, however, it becomes apparent that
the collapsed backbone regimen may, in many scenarios, offer far fewer
opportunities for turns, while creating more of them in others.

For the former class of winning applications, that is because it eliminates
local access/distribution/aggregation switches and an entire lineage of
hierarchical in-building routing elements.

As for the latter class of losing applications: no doubt, if a collapsed
backbone design were drop-shipped into place on a Friday evening, as is, there
would surely be some losers that would require re-designing, or maybe simply
some re-tuning, or that may need to be treated as one-offs entirely.

BTW, in case there is any confusion concerning my earlier allusion to SMB, it
had nothing to do with the size of message blocks, protocols, or anything else
affecting a transaction profile's latency numbers. Instead, I was referring to
the _s_mall-to-_m_edium-sized _b_usiness class of customers that the cable
operator Bright House Networks was targeting with its passive optical network
business-grade offering, fwiw.
--

Mikael, All, I truly appreciate the comments and criticisms you've offered on
this subject up until now, in connection with the upstream hypothesis that began
with a post by Michael Dillon. However, I shall not impose this topic on the
larger audience any further; I would welcome a continuation _offlist_ with
anyone so inclined. If anything worthwhile results, I'd be pleased to post it
here at a later date. TIA.

Frank A. Coluccio
DTI Consulting Inc.
212-587-8150 Office
347-526-6788 Mobile


Re: latency (was: RE: cooling door)

2008-03-30 Thread Frank Coluccio

Silly me. I didn't mean turns alone; I intended to include the number of
state transitions (e-o, o-e, e-e, etc.) in my preceding reply, as well.

Frank A. Coluccio
DTI Consulting Inc.
212-587-8150 Office
347-526-6788 Mobile


Re: cooling door

2008-03-30 Thread Brandon Butterworth

 I can lease 10 racks,
 put T1600s in two of them, and leave the other 8 empty; but
 that hasn't helped either me the customer or the exchange
 point provider; they've had to burn more real estate for empty
 racks that can never be filled

Seems fine to me: you used your power in two racks; getting more power
will cost the provider more, so you will have to pay more. This is just
an extreme example of blade servers causing half a rack to be empty;
the facts haven't changed, no matter how bad seeing empty racks feels.

Waste implies you could use the space at no additional cost; old pricing
models were vulnerable to gaming on combinations they'd not thought
through.

It might be easier for people to understand in these cases
if the provider put yellow/black striped tape over the unused
space with a big sign saying "not yours".

 I'm paying for floor space in my
 cage that I'm probably going to end up using for storage rather
 than just have it go to waste

That's nice. I hate cages where you have no room for tools or to work,
and where the kit hits the walls when you try and unrack it. I've no
idea how they fit a normal engineer in some.

 and we still have the problem of
 two very hot spots that need relatively 'point' cooling solutions.

Accepted, big fan on the back of the rack? Plenty of empty
space for such solutions.

High-density servers seem to be vendor driven: they let vendors
charge more and make you buy the switch and other ancillaries
you'd likely choose more cheaply from others. And when new models come
out, the whole lot gets replaced rather than just the odd few U
of servers. The convenience may be worth the high price in some
situations.

Density is just another DC design parameter to be optimised for
profit.


brandon

-- 
You know a nanog thread has gone on too long when I
overcome inertia and post. More science please.



Re: NXDOMAIN data needed for survey

2008-03-30 Thread Valdis . Kletnieks
On Fri, 28 Mar 2008 14:25:22 PDT, Scott Weeks said:

 Why would you assume this?  That wouldn't be my first assumption after
 reading the thread.  I would assume folks would Do The Right Thing.

There is no Right Thing that is *so* obviously right that some significant
fraction of the community won't refuse to do it anyhow.  Witness the
flame-fests we have regarding ingress/egress filtering, BGP prefix
filtering/validation, harboring spammers and other similar low-lifes, and
so on...




Re: cooling door

2008-03-30 Thread paul

 I have a need for a 1U that will just act as a backup (higher MX) mailserver
 and, occasionally, deliver some large .iso images at under 10Mbit/sec 
 :) And I'm sure that there are other technically savvy users just like me
 that could help you out with this surplus space!  :)

see http://www.vix.com/personalcolo/ for some places to host that backup MX.
(note, i have no business affiliation with any of the entities listed there.)


Re: rack power question

2008-03-30 Thread vijay gill
On Sun, Mar 23, 2008 at 2:15 PM, [EMAIL PROTECTED] wrote:


 Given that power and HVAC are such key issues in building
 big datacenters, and that fiber to the office is now a reality
 virtually everywhere, one wonders why someone doesn't start
 building out distributed data centers. Essentially, you put
 mini data centers in every office building, possibly by
 outsourcing the enterprise data centers. Then, you have a
 more tractable power and HVAC problem. You still need to
 scale things but it since each data center is roughly comparable
 in size it is a lot easier than trying to build out one
 big data center.


Latency matters. Also, multiple small data centers will be more expensive
than a few big ones, especially if you are planning on average load vs peak
load heat rejection models.



 If you move all the entreprise services onto virtual servers
 then you can free up space for colo/hosting services.


There is no such thing in my experience. You free up a few thousand cores;
they get consumed by the next-lower-priority project that was sitting around
waiting on CPU.



 You can even still sell to bulk customers, because few will
 complain that they have to deliver equipment to three
 data centers, one two blocks west and another three blocks
 north. X racks spread over 3 locations will work for everyone
 except people who need the physical proximity for
 clustering-type applications.


Racks spread over n locations that aren't within a campus will be more
expensive to connect.

/vijay



 --Michael Dillon