On Sat, 2006-07-01 at 16:26 +0200, Andi Kleen wrote:
> On Saturday 01 July 2006 01:01, Tom Tucker wrote:
> > On Fri, 2006-06-30 at 14:16 -0700, David Miller wrote:
> > 
> > > The TOE folks have tried to submit their hooks and drivers
> > > on several occasions, and we've rejected it every time.
> > 
> > iWARP != TOE
> 
> Perhaps a good start of that discussion David asked for would 
> be if you could give us an overview of the differences
> and how you avoid the TOE problems.

I think Roland already gave the high-level overview. For those
interested in some of the details, the API for iWARP transports was
originally conceived independently from IB and is documented in the
RDMAC Verbs Specification found here:

http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf

The protocol specifications are available here:
http://www.ietf.org/html.charters/rddp-charter.html

As Roland mentioned, the RDMAC verbs are *very* similar to the IB verbs
and so when we were thinking about how to design an API for iWARP we
concluded it would be best to leverage the tremendous amount of work
already done for IB by OpenFabrics and then work iteratively to extend
this API to include features unique to iWARP. This work has been ongoing
since September of 2005. 

There is an open source svn repository available for the iWARP source at
https://openib.org/svn/gen2/branches/iwarp.

There is also an open source NFS over RDMA implementation for Linux
available here: http://sourceforge.net/projects/nfs-rdma


So how do we avoid the TOE pitfalls with iWARP? I think it depends on
the pitfall. At the low level:

- Stale Network/Address Information: Path MTU changes, ICMP redirects,
and ARP next-hop changes need netlink notifier events so that hardware
can be updated when they occur. I see this support as an extension (new
events) to an existing service and a relatively low level of "parallel
stack integration". iSCSI and IB could also benefit from these events.

- Port Space Collision, i.e. socket apps and RDMA/iWARP apps collide on
a port number: The RDMA CMA needs to be able to allocate and deallocate
port numbers. The services that do this today are not exported, however,
and would need some minor tweaking. iSCSI and IB benefit from these
services as well.

- netfilter rules, syn-flood, conn-rate, etc.: You pointed out that
if connection establishment (SYN, SYN/ACK) were done in the native
stack, this would account for the bulk of the netfilter utility.
However, this probably results in falling into many of the TOE traps
people take issue with.
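
To make the first two items above concrete, here is a rough user-space
model of what is being asked for. It is not kernel code and not the
proposed patch. Every name in it (NetEvent, NotifierChain,
OffloadedConnection, PortSpace) and the port range are hypothetical,
chosen only for illustration. The idea is that the host stack publishes
path changes so an adapter can refresh its cached state, and that socket
and RDMA users draw port numbers from one shared allocator.

```python
# Illustrative model only; all class and field names are hypothetical.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List


class NetEvent(Enum):
    PMTU_CHANGE = auto()    # path MTU to a destination changed
    REDIRECT = auto()       # ICMP redirect chose a new gateway
    NEIGH_UPDATE = auto()   # ARP next-hop MAC changed


class NotifierChain:
    """The host stack publishes events; offload drivers subscribe."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[NetEvent, dict], None]] = []

    def register(self, callback: Callable[[NetEvent, dict], None]) -> None:
        self._subscribers.append(callback)

    def notify(self, event: NetEvent, info: dict) -> None:
        for callback in self._subscribers:
            callback(event, info)


@dataclass
class OffloadedConnection:
    """Addressing state an adapter caches, which can go stale."""
    dst: str
    path_mtu: int
    gateway: str
    next_hop_mac: str

    def on_net_event(self, event: NetEvent, info: dict) -> None:
        if info.get("dst") != self.dst:
            return  # event concerns some other destination
        if event is NetEvent.PMTU_CHANGE:
            self.path_mtu = info["mtu"]
        elif event is NetEvent.REDIRECT:
            self.gateway = info["gateway"]
        elif event is NetEvent.NEIGH_UPDATE:
            self.next_hop_mac = info["mac"]


class PortSpace:
    """One allocator shared by socket users and the RDMA CM."""

    def __init__(self, low: int = 32768, high: int = 61000) -> None:
        self._ephemeral = range(low, high)   # arbitrary example range
        self._in_use: Dict[int, str] = {}

    def allocate(self, owner: str, port: int = 0) -> int:
        if port == 0:  # caller wants any free ephemeral port
            port = next((p for p in self._ephemeral
                         if p not in self._in_use), 0)
            if port == 0:
                raise OSError("no free ports")
        elif port in self._in_use:
            raise OSError(f"port {port} is held by {self._in_use[port]}")
        self._in_use[port] = owner
        return port

    def release(self, port: int) -> None:
        self._in_use.pop(port, None)


# Usage: the adapter's cached MTU follows the host stack's view.
chain = NotifierChain()
conn = OffloadedConnection("10.0.0.2", 1500, "10.0.0.1", "aa:bb:cc:dd:ee:01")
chain.register(conn.on_net_event)
chain.notify(NetEvent.PMTU_CHANGE, {"dst": "10.0.0.2", "mtu": 1400})
```

With a single PortSpace, a second allocate() of a port that a socket
already holds fails instead of silently colliding, which is the whole
point of exporting the existing allocation services.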

WRT http://linux-net.osdl.org/index.php/TOE

Security Updates

"A TOE net stack is closed source firmware. Linux engineers have no way
to fix security issues that arise. As a result, only non-TOE users will
receive security updates, leaving random windows of vulnerability for
each TOE NIC's users."

- A Linux security update may or may not be relevant to a vendor's
implementation.

- If a vendor's implementation has a security issue then the customer
must rely on the vendor to fix it. This is no less true for iWARP than
for any adapter.

Point-in-time Solution

"Each TOE NIC has a limited lifetime of usefulness, because system
hardware rapidly catches up to TOE performance levels, and eventually
exceeds TOE performance levels. We saw this with 10mbit TOE, 100mbit
TOE, gigabit TOE, and soon with 10gig TOE."

- iWARP needs to do protocol processing in order to validate and
evaluate TCP payload in advance of direct data placement. This
requirement is independent of CPU speed. 
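
As a rough illustration of why this work cannot be skipped, the sketch
below validates a frame before copying its payload into the destination
buffer. The frame layout (2-byte length, payload, 4-byte CRC) and the
use of zlib.crc32 are stand-ins invented for this example. They are not
the wire format defined by the rddp drafts, and the function names are
hypothetical.

```python
# Toy model of "validate, then place". Not the real iWARP framing.
import struct
import zlib


def build_frame(payload: bytes) -> bytes:
    """Wrap payload as: 2-byte length | payload | 4-byte CRC."""
    header = struct.pack("!H", len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + struct.pack("!I", crc)


def place_frame(frame: bytes, dest: bytearray, offset: int) -> bool:
    """Copy the payload to dest[offset:] only if the frame is intact."""
    if len(frame) < 6:
        return False                      # too short for header + CRC
    (length,) = struct.unpack_from("!H", frame, 0)
    if len(frame) != 2 + length + 4:
        return False                      # length field disagrees
    (crc,) = struct.unpack_from("!I", frame, 2 + length)
    if zlib.crc32(frame[:2 + length]) != crc:
        return False                      # corrupted in flight
    if offset < 0 or offset + length > len(dest):
        return False                      # would write outside the buffer
    dest[offset:offset + length] = frame[2:2 + length]
    return True


# Usage: a good frame lands in the buffer, a damaged one is dropped.
buffer = bytearray(16)
placed = place_frame(build_frame(b"hello"), buffer, 4)
```

Every check has to pass before a single byte is written to application
memory, so the per-frame processing is there regardless of how fast the
host CPU is.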

Different Network Behavior

"System administrators are quite familiar with how the Linux network
stack interoperates with the world at large. TOE is a black box, each
NIC requires re-examination of network behavior. Network scanners and
analysis tools must be updated, or they will provide faulty analysis."

- Native Linux Tools like tcpdump, netstat, etc... will not work as
expected. 

- Network Analyzers such as Finisar, etc... will work just fine.

Performance

"Experience has shown that TOE implementations require additional work
(programming the hardware, hardware-specific socket manipulation) to set
up and tear down connections. For connection intensive protocols such as
HTTP, TOE often underperforms."

- I suspect that connection rates for RDMA adapters fall well below the
rates attainable with a dumb device. That said, none of the RDMA
applications that I know of are connection intensive. Even for TOE,
the later HTTP versions make connection rates less of an issue.

Hardware-specific limits 

"TOE NICs are more resource limited than your overall computer system.
This is most readily apparent under load, when trying to support
thousands of simultaneous connections. TOE NICs simply do not have the
memory resources to buffer thousands of connections, much less have the
CPU power to handle such loads. Further, each TOE NIC has different
resource limitations (often unpublished, only to be discovered at the
worst moments)."

- Any hardware device has this issue, and so does iWARP.

"Once resources are exhausted, TOE will either fall back to 100%
software net stack, defeating the purpose of TOE, or will deny service
to additional clients."

- A depleted iWARP adapter will simply fail the request. There is no
parallel iWARP stack to fall back on.

Resource-based denial-of-service attacks 

"If an attacker can discover the TOE NIC model in use, they can use this
information to enable resource-based algorithmic attacks. For example, a
SYN flood could potentially use up all TOE resources in a matter of
seconds. The TOE NIC will either stop accepting connections (complete
DoS), or will constantly bounce back to the software net stack."

- True of iWARP too.

RFC compliance 

"Linux is the most RFC-compliant network stack available. TOE can only
diminish this. Further, as a black box, each TOE NIC will have a
different level of RFC compliance, and different TCP/IP features they
do/don't support."

- True of iWARP too.

Linux features 

"TOE is by definition poorly integrated into Linux. TOE NICs will not
provide netfilter, packet scheduling, QoS, and many other features that
Linux users depend on. Or if they do provide this, they implement the
features in a vendor-specific manner. The featureset becomes
vendor-specific."

- This is the problem we're trying to solve...incrementally and
responsibly.

Requires vendor-specific tools 

"In order to configure a TOE NIC, hardware-specific tools are usually
required. This dramatically increases support costs."

- OpenFabrics is an attempt to solve this not only across vendors, but
also across transports (at this time IB and iWARP).

Poor user support 

"Linux engineers cannot provide an adequate level of support for TOE
users, and must instead refer users to the vendor -- who in all
likelihood cares more about non-Linux operating systems."

- This will certainly be true for iWARP early on.

Short term kernel maintenance 

"Supporting TOE requires massive, heavily invasive hooks into the
network stack. This increases the kernel maintenance burden on Linux
engineers, to support a solution Linux engineers have no control over."

- iWARP does not use sockets and does not share data structures with the
TCP stack. 
- I do not think, however, that the patches in question amount to
"massive, heavily invasive hooks into the network stack".

Long term user support 

"Linux has been in existence for over a decade, and some pieces of
decade-old hardware continue to be used and supported. In contrast, most
hardware vendors end-of-life (stop supporting) their hardware after just
a few years. For most hardware vendors, the sales of old hardware simply
do not justify dedicating engineers to Linux support for many years."

- If the hooks are not hideous and invasive then support should not be
any more onerous than for any other hardware device.

Long term kernel maintenance 

"Similarly, kernel engineers must support TOE for as long as users
continue to use the hardware. Hardware vendors disappear, get bought, or
simply disappear (go out of business) during our maintenance timeframe.
Once a hardware vendor loses interest in Linux, TOE NICs will cease to
receive security updates, and hardware issues become incredibly
difficult to debug. Each new generation of system hardware often
requires re-examination of hardware drivers, a task made far more
difficult without a hardware vendor to receive questions."

- This seems like a general rant against any hardware device and so it
applies to iWARP too. 

Eliminates global system view 

"With TOE, the system no longer has a complete picture of all resources
used by network connections. Some connections are software-based, and
thus limited by existing policy controls (such as per-socket memory
limits). Other connections are managed by TOE, and these details are
hidden. As such, the VM cannot adequately manage overall socket buffer
memory usage, TOE-enabled connections cannot be rate-limited by the same
controls as software-based connections, per-user socket security limits
may be ignored, etc."

- iWARP doesn't use socket buffers.

"Linux has several TCP Congestion Control algorithms available. For TOE
connections, this would no longer be true, all the congestion control
would be done by proprietary vendor specific algorithms on the card."

- I don't know of any proprietary congestion control algorithms built
into iWARP and doubt they would work between vendors. There is an iWARP
Interoperability Lab at UNH that tests this kind of thing.


> 
> -Andi
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
