Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-10-10 Thread David Miller
From: Sean Hefty <[EMAIL PROTECTED]>
Date: Wed, 10 Oct 2007 14:01:07 -0700

> > The hack to use a socket and bind it to claim the port was just for 
> > demonstrating the idea.  The correct solution, IMO, is to enhance the
> > core low level 4-tuple allocation services to be more generic (eg: not 
> > be tied to a struct sock).  Then the host tcp stack and the host rdma 
> > stack can allocate TCP/iWARP ports/4tuples from this common exported 
> > service and share the port space.  This allocation service could also be 
> > used by other deep adapters like iscsi adapters if needed.
> 
> Since iWarp runs on top of TCP, the port space is really the same. 
> FWIW, I agree that this proposal is the correct solution to support iWarp.

But you can be sure it's not going to happen, sorry.

It would mean that we'd need to export the entire TCP socket table so
then when iWARP connections are created you can search to make sure
there is not an existing full 4-tuple that is the same.

It is not just about local TCP ports.
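To make the 4-tuple point concrete: TCP tells connections apart by (local address, remote address, local port, remote port), so two established connections can legitimately share local port 80. A shared allocator therefore cannot only hand out local ports; it would have to check each new iWARP connection against every existing one. A minimal C sketch of that check, where `struct tuple4` and `tuple_in_use()` are hypothetical and a plain array stands in for the host stack's socket hash tables:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection identity: TCP demultiplexes on all four fields. */
struct tuple4 {
    uint32_t laddr, raddr;   /* IPv4 addresses, host byte order */
    uint16_t lport, rport;
};

static bool tuple_eq(const struct tuple4 *a, const struct tuple4 *b)
{
    return a->laddr == b->laddr && a->raddr == b->raddr &&
           a->lport == b->lport && a->rport == b->rport;
}

/*
 * Returns true if 'want' collides with an existing connection.
 * The array stands in for the host stack's socket table; a shared
 * TCP/iWARP allocator would need to search the real table this way.
 */
static bool tuple_in_use(const struct tuple4 *table, size_t n,
                         const struct tuple4 *want)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tuple_eq(&table[i], want))
            return true;
    return false;
}
```

A second connection on local port 80 with a different remote address passes the check, a case that a scheme tracking only local ports cannot express.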

iWARP needs to live in its separate little container and not
contaminate the rest of the networking; this is the deal.  Any
suggested such change which breaks that deal will be NACK'd by all of
the core networking developers.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-10-10 Thread Sean Hefty
> The hack to use a socket and bind it to claim the port was just for
> demonstrating the idea.  The correct solution, IMO, is to enhance the
> core low level 4-tuple allocation services to be more generic (eg: not
> be tied to a struct sock).  Then the host tcp stack and the host rdma
> stack can allocate TCP/iWARP ports/4tuples from this common exported
> service and share the port space.  This allocation service could also be
> used by other deep adapters like iscsi adapters if needed.

Since iWarp runs on top of TCP, the port space is really the same.
FWIW, I agree that this proposal is the correct solution to support iWarp.


- Sean


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-10-09 Thread James Lentini

On Mon, 8 Oct 2007, Steve Wise wrote:

> The correct solution, IMO, is to enhance the core low level 4-tuple 
> allocation services to be more generic (eg: not be tied to a struct 
> sock).  Then the host tcp stack and the host rdma stack can allocate 
> TCP/iWARP ports/4tuples from this common exported service and share 
> the port space.  This allocation service could also be used by other 
> deep adapters like iscsi adapters if needed.

As a developer of an RDMA ULP, NFS-RDMA, I like this approach because 
it will simplify the configuration of an RDMA device and the services 
that use it.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-10-08 Thread Steve Wise



David Miller wrote:
> From: Sean Hefty <[EMAIL PROTECTED]>
> Date: Thu, 09 Aug 2007 14:40:16 -0700
>
>> Steve Wise wrote:
>>
>>> Any more comments?
>> Does anyone have ideas on how to reserve the port space without using a
>> struct socket?
>
> How about we just remove the RDMA stack altogether?  I am not at all
> kidding.  If you guys can't stay in your sand box and need to cause
> problems for the normal network stack, it's unacceptable.  We were
> told all along that if RDMA went into the tree none of this kind of
> stuff would be an issue.
>
> These are exactly the kinds of problems people like myself
> were dreading.  These subsystems have no business using the TCP port
> space of the Linux software stack, absolutely none.
>
> After TCP port reservation, what's next?  It seems an at least
> bi-monthly event that the RDMA folks need to put their fingers
> into something else in the normal networking stack.  No more.
>
> I will NACK any patch that opens up sockets to eat up ports or
> anything stupid like that.

Hey Dave,

The hack to use a socket and bind it to claim the port was just for 
demonstrating the idea.  The correct solution, IMO, is to enhance the
core low level 4-tuple allocation services to be more generic (eg: not 
be tied to a struct sock).  Then the host tcp stack and the host rdma 
stack can allocate TCP/iWARP ports/4tuples from this common exported 
service and share the port space.  This allocation service could also be 
used by other deep adapters like iscsi adapters if needed.
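The "common exported service" proposed here could look roughly like the following user-space sketch: a single bitmap of local ports owned by neither stack, from which both host TCP and the RDMA CM reserve explicit ports or pick ephemeral ones. Every name here (`struct port_space`, `port_space_reserve()`, `port_space_alloc()`, `port_space_release()`) is hypothetical, not an existing kernel interface; locking is omitted, and the 32768-61000 ephemeral range is an assumption borrowed from the usual `ip_local_port_range` default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PORT_MAX   65536
#define EPHEM_LOW  32768   /* assumed ephemeral range (ip_local_port_range) */
#define EPHEM_HIGH 61000

/* Hypothetical port space shared by host TCP and the RDMA CM. */
struct port_space {
    uint8_t used[PORT_MAX / 8];   /* one bit per local port */
};

static bool port_test(const struct port_space *ps, uint16_t p)
{
    return ps->used[p / 8] & (1u << (p % 8));
}

static void port_set(struct port_space *ps, uint16_t p, bool on)
{
    if (on)
        ps->used[p / 8] |= (uint8_t)(1u << (p % 8));
    else
        ps->used[p / 8] &= (uint8_t)~(1u << (p % 8));
}

static void port_space_init(struct port_space *ps)
{
    memset(ps->used, 0, sizeof(ps->used));
}

/* Claim a specific port (explicit bind); returns 0, or -1 if taken. */
static int port_space_reserve(struct port_space *ps, uint16_t port)
{
    assert(port != 0);
    if (port_test(ps, port))
        return -1;
    port_set(ps, port, true);
    return 0;
}

/* Pick the first free ephemeral port; returns it, or 0 if exhausted. */
static uint16_t port_space_alloc(struct port_space *ps)
{
    for (uint32_t p = EPHEM_LOW; p <= EPHEM_HIGH; p++) {
        if (!port_test(ps, (uint16_t)p)) {
            port_set(ps, (uint16_t)p, true);
            return (uint16_t)p;
        }
    }
    return 0;
}

static void port_space_release(struct port_space *ps, uint16_t port)
{
    port_set(ps, port, false);
}
```

With one bitmap behind both stacks, a host TCP listener on 3260 (iSCSI) makes a later RDMA CM reservation of the same port fail up front instead of the two colliding on the wire.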


Will you NAK such a solution if I go implement it and submit for review?
The dual ip subnet solution really sux, and I'm trying one more time
to see if you will entertain the common port space solution, if done
correctly.


Thanks,

Steve.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-28 Thread David Miller
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Tue, 28 Aug 2007 12:38:07 -0700

> It seems that the NIC would also have to look into a TCP stream (and
> handle out of order segments etc) to find message boundaries for this
> to be equivalent to what an RDMA NIC does.

It would work for data that accumulates in-order, give or take a small
window, just like LRO does.
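For reference, LRO-style aggregation for a single flow looks roughly like this: segments are merged only while each one starts at the next expected sequence number, and any gap or overflow flushes the aggregate up the stack. This is a minimal sketch; `struct lro_agg` and `lro_receive()` are hypothetical, and real LRO implementations also check the 4-tuple, TCP flags, and options before merging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LRO_MAX_BYTES 65536

/* Hypothetical per-flow aggregation state. */
struct lro_agg {
    uint32_t start_seq;          /* sequence number of first byte held */
    uint32_t next_seq;           /* sequence number expected next */
    size_t   len;                /* bytes currently aggregated */
    unsigned char buf[LRO_MAX_BYTES];
    unsigned flushes;            /* super-packets handed to the stack */
};

static void lro_flush(struct lro_agg *a)
{
    if (a->len > 0)
        a->flushes++;            /* stand-in for passing one big skb up */
    a->len = 0;
}

/* Merge an in-order segment, or flush and start over on a gap/overflow. */
static void lro_receive(struct lro_agg *a, uint32_t seq,
                        const void *data, size_t n)
{
    assert(n <= LRO_MAX_BYTES);
    if (a->len == 0 || seq != a->next_seq || a->len + n > LRO_MAX_BYTES) {
        lro_flush(a);
        a->start_seq = seq;
    }
    memcpy(a->buf + a->len, data, n);
    a->len += n;
    a->next_seq = seq + (uint32_t)n;
}
```

Out-of-order data simply ends the current aggregate; nothing here has to find message boundaries inside the stream, which is the extra work Roland argues an RDMA NIC does.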


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-28 Thread Roland Dreier
Sorry for the long latency, I was at the beach all last week.

 > > And direct data placement really does give you a factor of two at
 > > least, because otherwise you're stuck receiving the data in one
 > > buffer, looking at some of the data at least, and then figuring out
 > > where to copy it.  And memory bandwidth is if anything becoming more
 > > valuable; maybe LRO + header splitting + page remapping tricks can get
 > > you somewhere but as NCPUS grows then it seems the TLB shootdown cost
 > > of page flipping is only going to get worse.

 > As Herbert has said already, people can code for this just like
 > they have to code for RDMA.

No argument, you need to change the interface to take advantage of RDMA.

 > There is no fundamental difference from converting an application
 > to sendfile or similar.

Yes, on the transmit side, there's not much difference from sendfile
or splice, although RDMA may give a slightly nicer interface that also
gives basically the equivalent of AIO.

 > The only thing this needs is a
 > "recvmsg_I_dont_care_where_the_data_is()" call.  There are no alignment
 > issues unless you are trying to push this data directly into the
 > page cache.

I don't understand how this gives you the same thing as direct data
placement (DDP).  There are many situations where the sender knows
where the data has to go and if there's some way to pass that to the
receiver, so that info can be used in the receive path to put the data
in the right place, the receiver can save a copy.  This is
fundamentally the same "offload" that an FC HBA does -- the SCSI
midlayer queues up commands like "read block A and put the data at
address X" and "read block B and put the data at address Y" and the
HBA matches tags on incoming data to put the blocks at the right
addresses, even if block B is received before block A.
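The FC HBA analogy maps onto a small tag-matching table: the receiver registers (tag, buffer) pairs in advance, each incoming segment carries a tag and an offset, and the data is written straight to its final address in whatever order it arrives. A minimal sketch loosely modeled on the steering tags (STags) of iWARP's DDP layer; `struct ddp_table`, `ddp_register()` and `ddp_place()` are hypothetical, and `memcpy()` stands in for the NIC's DMA write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TAGS 16

/* Hypothetical registration: where data carrying 'tag' must land. */
struct placement {
    uint32_t       tag;
    unsigned char *base;
    size_t         len;
};

struct ddp_table {
    struct placement ent[MAX_TAGS];
    size_t n;
};

static int ddp_register(struct ddp_table *t, uint32_t tag,
                        unsigned char *base, size_t len)
{
    if (t->n == MAX_TAGS)
        return -1;
    t->ent[t->n++] = (struct placement){ tag, base, len };
    return 0;
}

/*
 * Write one incoming segment at base + offset of the buffer registered
 * under 'tag', with no intermediate receive buffer.  Returns 0, or -1
 * for an unknown tag or an out-of-bounds placement.
 */
static int ddp_place(struct ddp_table *t, uint32_t tag, size_t offset,
                     const void *data, size_t n)
{
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < t->n; i++) {
        struct placement *p = &t->ent[i];
        if (p->tag != tag)
            continue;
        if (offset > p->len || n > p->len - offset)
            return -1;
        memcpy(p->base + offset, data, n);   /* stands in for NIC DMA */
        return 0;
    }
    return -1;
}
```

Block B landing before block A needs no reordering buffer, since each segment already says where it belongs.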

RFC 4297 has some discussion of the various approaches, and while you
might not agree with their conclusions, it is interesting reading.

 > Couple this with a card that makes sure that on a per-page basis, only
 > data for a particular flow (or group of flows) will accumulate.

It seems that the NIC would also have to look into a TCP stream (and
handle out of order segments etc) to find message boundaries for this
to be equivalent to what an RDMA NIC does.

 - R.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-21 Thread David Miller
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Mon, 20 Aug 2007 18:16:54 -0700

> And direct data placement really does give you a factor of two at
> least, because otherwise you're stuck receiving the data in one
> buffer, looking at some of the data at least, and then figuring out
> where to copy it.  And memory bandwidth is if anything becoming more
> valuable; maybe LRO + header splitting + page remapping tricks can get
> you somewhere but as NCPUS grows then it seems the TLB shootdown cost
> of page flipping is only going to get worse.

As Herbert has said already, people can code for this just like
they have to code for RDMA.

There is no fundamental difference from converting an application
to sendfile or similar.

The only thing this needs is a
"recvmsg_I_dont_care_where_the_data_is()" call.  There are no alignment
issues unless you are trying to push this data directly into the
page cache.

Couple this with a card that makes sure that on a per-page basis, only
data for a particular flow (or group of flows) will accumulate.

People already make cards that can do stuff like this, it can be done
statelessly with an on-chip dynamically maintained flow table.
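A rough sketch of the kind of card described here, assuming a simple direct-mapped flow table: the NIC hashes each packet's 4-tuple to a slot, and every slot owns one page that only ever accumulates payload for the flow currently occupying it. It tracks no TCP state (no sequence numbers, no retransmission), which is what keeps it "stateless". `struct flow_key`, `ft_steer()` and the hash are hypothetical; a real device would hand completed pages to the host rather than just reset them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FT_SLOTS 64
#define PAGE_SZ  4096

/* Hypothetical flow key: the NIC only recognizes flows, it does not track them. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* One flow-table slot: the flow it currently owns and that flow's page. */
struct ft_slot {
    struct flow_key key;
    int             valid;
    size_t          fill;
    unsigned char   page[PAGE_SZ];
};

static struct ft_slot flow_table[FT_SLOTS];

static unsigned ft_hash(const struct flow_key *k)
{
    uint32_t h = k->saddr ^ k->daddr ^ ((uint32_t)k->sport << 16 | k->dport);
    h ^= h >> 16;
    return h % FT_SLOTS;
}

static int key_eq(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/*
 * Append payload to the page owned by this flow.  A slot collision or a
 * full page completes the current page (here simply reset), so a page
 * never mixes bytes from two flows.  Returns the slot used.
 */
static struct ft_slot *ft_steer(const struct flow_key *k,
                                const void *payload, size_t n)
{
    assert(n <= PAGE_SZ);
    struct ft_slot *s = &flow_table[ft_hash(k)];
    if (!s->valid || !key_eq(&s->key, k) || s->fill + n > PAGE_SZ) {
        s->key = *k;    /* previous page would be handed to the host here */
        s->valid = 1;
        s->fill = 0;
    }
    memcpy(s->page + s->fill, payload, n);
    s->fill += n;
    return s;
}
```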

And best yet it doesn't turn off every feature in the networking nor
bypass it for the actual protocol processing.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-20 Thread Roland Dreier
[TSO / LRO discussion snipped -- it's not the main point so no sense
spending energy arguing about it]

 > Just be realistic and accept that RDMA is a point in time solution,
 > and like any other such technology takes flexibility away from users.
 > 
 > Horizontal scaling of cpus up to huge arity cores, network devices
 > using large numbers of transmit and receive queues and classification
 > based queue selection, are all going to work to make things like RDMA
 > even more irrelevant than they already are.

To me there is a real fundamental difference between RDMA and
traditional SOCK_STREAM / SOCK_DATAGRAM networking, namely that
messages can carry the address where they're supposed to be
delivered (what the IETF calls "direct data placement").  And on top
of that you can build one-sided operations aka put/get aka RDMA.

And direct data placement really does give you a factor of two at
least, because otherwise you're stuck receiving the data in one
buffer, looking at some of the data at least, and then figuring out
where to copy it.  And memory bandwidth is if anything becoming more
valuable; maybe LRO + header splitting + page remapping tricks can get
you somewhere but as NCPUS grows then it seems the TLB shootdown cost
of page flipping is only going to get worse.

Don't get too hung up on the fact that current iWARP (RDMA over IP)
implementations are using TCP offload -- to me that is just a side
effect of doing enough processing on the NIC side of the PCI bus to be
able to do direct data placement.  InfiniBand with completely different
transport, link and physical layers is one way to implement RDMA
without TCP offload and I'm sure there will be others -- eg Intel's
IOAT stuff could probably evolve to the point where you could
implement iWARP with software TCP and the data placement offloaded to
some DMA engine.

 - R.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread David Miller
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 22:23:01 -0700

> Also, looking at the complexity and bug-fixing effort that go into
> making TSO work vs the really pretty small gain it gives also makes
> part of me wonder whether the noble proclamations about
> maintainability are always taken to heart.

The cpu and bus utilization improvements of TSO on the sender side are
more than significant.  Ask anyone who looks closely at this.

For example, as part of his batching work Krishna Kumar has been
posting lots of numbers lately on the netdev list, I'm sure he can
post more specific numbers comparing the current stack in the case of
TSO disabled vs. TSO enabled if that is what you need to see how
beneficial TSO in fact is.

If TSO is such a lose why does pretty much every ethernet chip vendor
implement it in hardware?  If you say it's just because Microsoft
defines TSO in their NDIS, that's a total cop-out.  It really does help
performance a lot.  Why did the Xen folks bother making generic
software TSO infrastructure for the kernel for the benefit of their
virtualization network device?  Why would someone as bright as Herbert
Xu even bother to implement that stuff if TSO gives a "pretty small
gain"?

Similarly for LRO and this isn't defined in NDIS at all.  Vendors are
going so far as to put full flow tables in their chips in order to do
LRO better.

Using the bugs and issues we've run into while implementing TSO as
evidence there is something wrong with it is a total straw man.  Look
how many times the filesystem page cache has been rewritten over the
years.

Use the TSO problems as more of an example of how shitty a programmer
I must be. :)

Just be realistic and accept that RDMA is a point in time solution,
and like any other such technology takes flexibility away from users.

Horizontal scaling of cpus up to huge arity cores, network devices
using large numbers of transmit and receive queues and classification
based queue selection, are all going to work to make things like RDMA
even more irrelevant than they already are.

If you can't see that this is the future, you have my condolences.
Because frankly, the signs are all around that this is where things
are going.

The work doesn't belong in these special purpose devices; it belongs
in the far-end-node compute resources, and our computers are getting
more and more of these general purpose compute engines every day.
We will be constantly moving away from specialized solutions and
towards those which solve large classes of problems for large groups
of people.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread Roland Dreier
 > This is also a series of falsehoods.  All packet filtering,
 > queue management, and packet scheduling facilities work perfectly
 > fine and as designed with both LRO and TSO.

I'm not sure I follow.  Perhaps "broken" was too strong a word to use,
but if you pass a huge segment to a NIC with TSO, then you've given
the NIC control of scheduling the packets that end up getting put on
the wire.  If your software packet scheduling is operating at a bigger
scale, then things work fine, but I don't see how you can say that TSO
doesn't lead to head-of-line blocking etc at short time scales.  And
yes of course I agree you can make sure things work by using short
segments or not using TSO at all.
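What the NIC does with a TSO super-segment can be sketched as below: one large send is cut into MSS-sized wire segments with consecutive sequence numbers, which then leave back-to-back. Software scheduling sees one unit; the wire sees the whole train. `struct wire_seg` and `tso_segment()` are hypothetical, and header replication, checksums, and flag handling are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one on-the-wire segment produced by TSO. */
struct wire_seg {
    uint32_t seq;   /* TCP sequence number of the first payload byte */
    uint32_t len;   /* payload bytes, at most MSS */
};

/*
 * Split one super-segment of 'total' bytes starting at 'seq' into
 * MSS-sized wire segments.  'out' must hold ceil(total / mss) entries;
 * returns the number of segments written.
 */
static size_t tso_segment(uint32_t seq, uint32_t total, uint32_t mss,
                          struct wire_seg *out, size_t cap)
{
    assert(mss > 0);
    size_t n = 0;
    for (uint32_t off = 0; off < total; off += mss) {
        assert(n < cap);
        uint32_t left = total - off;
        out[n].seq = seq + off;
        out[n].len = left < mss ? left : mss;
        n++;
    }
    return n;
}
```

With a 1448-byte MSS, a 64 KB send becomes 46 back-to-back frames, the last one carrying 376 bytes.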

Similarly with LRO the packets that get passed to the stack are not
the packets that were actually on the wire.  Sure, most filtering will
work fine but eg are you sure your RTT estimates aren't going to get
screwed up and cause some subtle bug?  And I could trot out all the
same bugaboos that are brought up about RDMA and warn darkly about
security problems with bugs in NIC hardware that after all has to
parse and rewrite TCP and IP packets.

Also, looking at the complexity and bug-fixing effort that go into
making TSO work vs the really pretty small gain it gives also makes
part of me wonder whether the noble proclamations about
maintainability are always taken to heart.

Of course I know everything I just wrote is wrong because I forgot to
refer to the crucial axiom that stateless == good && RDMA == bad.
And sometimes it's unfortunate that in Linux when there's disagreement
about something, the default action is *not* to do something.

Sorry for prolonging this argument.  Dave, I should say that I
appreciate all the work you've done in helping build the most kick-ass
networking stack in history.  And as I said before, I have plenty of
interesting work to do however this turns out, so I'll try to leave
any further arguing to people who actually have a dog in this fight.

 - R.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread David Miller
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 16:31:07 -0700

>  > >  > When using RDMA you lose the capability to do packet shaping,
>  > >  > classification, and all the other wonderful networking facilities
>  > >  > you've grown to love and use over the years.
>  > > 
>  > > Same thing with TSO and LRO and who knows what else.
>  > 
>  > Not true at all.  Full classification and filtering still is usable
>  > with TSO and LRO.
> 
> Well, obviously with TSO and LRO the packets that the stack sends or
> receives are not the same as what's on the wire.  Whether that breaks
> your wonderful networking facilities or not depends on the specifics
> of the particular facility I guess -- for example shaping is clearly
> broken by TSO.  (And people can wonder what the packet trains TSO
> creates do to congestion control on the internet, but the netdev crowd
> has already decided that TSO is "good" and RDMA is "bad")

This is also a series of falsehoods.  All packet filtering,
queue management, and packet scheduling facilities work perfectly
fine and as designed with both LRO and TSO.

When problems come up, they are bugs, and we fix them.

Please stop spreading this FUD about TSO and LRO.

The fact is that RDMA bypasses the whole stack so that supporting
these facilities is not even _POSSIBLE_.  With stateless offloads it
is possible to support all of these facilities, and we do.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread Roland Dreier
 > >  > When using RDMA you lose the capability to do packet shaping,
 > >  > classification, and all the other wonderful networking facilities
 > >  > you've grown to love and use over the years.
 > > 
 > > Same thing with TSO and LRO and who knows what else.
 > 
 > Not true at all.  Full classification and filtering still is usable
 > with TSO and LRO.

Well, obviously with TSO and LRO the packets that the stack sends or
receives are not the same as what's on the wire.  Whether that breaks
your wonderful networking facilities or not depends on the specifics
of the particular facility I guess -- for example shaping is clearly
broken by TSO.  (And people can wonder what the packet trains TSO
creates do to congestion control on the internet, but the netdev crowd
has already decided that TSO is "good" and RDMA is "bad")

 - R.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread David Miller
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 12:52:39 -0700

>  > When using RDMA you lose the capability to do packet shaping,
>  > classification, and all the other wonderful networking facilities
>  > you've grown to love and use over the years.
> 
> Same thing with TSO and LRO and who knows what else.

Not true at all.  Full classification and filtering still is usable
with TSO and LRO.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread Roland Dreier
 > > Isn't RDMA _part_ of the "software net stack" within Linux?

 > It very much is not so.

This is just nit-picking.  You can draw the boundary of the "software
net stack" wherever you want, but I think Sean's point was just that
RDMA drivers already are part of Linux, and we all want them to get
better.

 > When using RDMA you lose the capability to do packet shaping,
 > classification, and all the other wonderful networking facilities
 > you've grown to love and use over the years.

Same thing with TSO and LRO and who knows what else.  I know you're
going to make a distinction between "stateless" and "stateful"
offloads, but really it's just an arbitrary distinction between things
you like and things you don't.

 > Imagine if you didn't know any of this, you purchase and begin to
 > deploy a huge piece of RDMA infrastructure, you then get the mandate
 > from IT that you need to add firewalling on the RDMA connections at
 > the host level, and "oh shit" you can't?

It's ironic that you bring up firewalling.  I've had vendors of iWARP
hardware tell me they would *love* to work with the community to make
firewalling work better for RDMA connections.  But instead we get the
catch-22 of your changing arguments -- first, you won't even consider
changes that might help RDMA work better in the name of
maintainability; then you have to protect poor, ignorant users from
accidentally using RDMA because of some problem or another; and then
when someone tries to fix some of the problems you mention, it's back
to step one.

Obviously some decisions have been prejudged here, so I guess this
moves to the realm of politics.  I have plenty of interesting
technical stuff, so I'll leave it to the people with a horse in the
race to find ways to twist your arm.

 - R.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-16 Thread David Miller
From: Tom Tucker <[EMAIL PROTECTED]>
Date: Thu, 16 Aug 2007 08:43:11 -0500

> Isn't RDMA _part_ of the "software net stack" within Linux?

It very much is not so.

When using RDMA you lose the capability to do packet shaping,
classification, and all the other wonderful networking facilities
you've grown to love and use over the years.

I'm glad this is a surprise to you, because it illustrates the
point some of us keep trying to make about technologies like
this.

Imagine if you didn't know any of this, you purchase and begin to
deploy a huge piece of RDMA infrastructure, you then get the mandate
from IT that you need to add firewalling on the RDMA connections at
the host level, and "oh shit" you can't?

This is why none of us core networking developers like RDMA at all.
It's totally not integrated with the rest of the Linux stack and on
top of that it even gets in the way.  It's an aberration, an eyesore,
and a constant source of consternation.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-16 Thread Tom Tucker
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote:

[...snip...]

> > I think removing the RDMA stack is the wrong thing to do, and you 
> > shouldn't just threaten to yank entire subsystems because you don't like 
> > the technology.  Let's keep this constructive, can we?  RDMA should get
> > the respect of any other technology in Linux.  Maybe it's a niche in your
> > opinion, but come on, there's more RDMA users than say, the sparc64 
> > port.  Eh?
> 
> It's not about being a niche.  It's about creating a maintainable 
> software net stack that has predictable behavior.

Isn't RDMA _part_ of the "software net stack" within Linux? Why isn't
making RDMA stable, supportable and maintainable equally as important as
any other subsystem? 

> 
> Needing to reach out of the RDMA sandbox and reserve net stack resources 
> away from itself travels a path we've consistently avoided.
> 
> 
> >> I will NACK any patch that opens up sockets to eat up ports or
> >> anything stupid like that.
> > 
> > Got it.
> 
> Ditto for me as well.
> 
>   Jeff


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-15 Thread Roland Dreier
 > Needing to reach out of the RDMA sandbox and reserve net stack
 > resources away from itself travels a path we've consistently avoided.

Where did the idea of an "RDMA sandbox" come from?  Obviously no one
disagrees with keeping things clean and maintainable, but the idea
that RDMA is a second-class citizen that doesn't get any input into
the evolution of the networking code seems kind of offensive to me.

 - R.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-15 Thread Jeff Garzik

Steve Wise wrote:
> David Miller wrote:
>> From: Sean Hefty <[EMAIL PROTECTED]>
>> Date: Thu, 09 Aug 2007 14:40:16 -0700
>>
>>> Steve Wise wrote:
>>>
>>>> Any more comments?
>>> Does anyone have ideas on how to reserve the port space without using
>>> a struct socket?
>>
>> How about we just remove the RDMA stack altogether?  I am not at all
>> kidding.  If you guys can't stay in your sand box and need to cause
>> problems for the normal network stack, it's unacceptable.  We were
>> told all along that if RDMA went into the tree none of this kind of
>> stuff would be an issue.
>
> I think removing the RDMA stack is the wrong thing to do, and you
> shouldn't just threaten to yank entire subsystems because you don't like
> the technology.  Let's keep this constructive, can we?  RDMA should get
> the respect of any other technology in Linux.  Maybe it's a niche in your
> opinion, but come on, there's more RDMA users than say, the sparc64
> port.  Eh?

It's not about being a niche.  It's about creating a maintainable
software net stack that has predictable behavior.

Needing to reach out of the RDMA sandbox and reserve net stack resources
away from itself travels a path we've consistently avoided.

>> I will NACK any patch that opens up sockets to eat up ports or
>> anything stupid like that.
>
> Got it.

Ditto for me as well.

Jeff




Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-15 Thread Steve Wise



David Miller wrote:
> From: Sean Hefty <[EMAIL PROTECTED]>
> Date: Thu, 09 Aug 2007 14:40:16 -0700
>
>> Steve Wise wrote:
>>
>>> Any more comments?
>> Does anyone have ideas on how to reserve the port space without using a
>> struct socket?
>
> How about we just remove the RDMA stack altogether?  I am not at all
> kidding.  If you guys can't stay in your sand box and need to cause
> problems for the normal network stack, it's unacceptable.  We were
> told all along that if RDMA went into the tree none of this kind of
> stuff would be an issue.

I think removing the RDMA stack is the wrong thing to do, and you
shouldn't just threaten to yank entire subsystems because you don't like
the technology.  Let's keep this constructive, can we?  RDMA should get
the respect of any other technology in Linux.  Maybe it's a niche in your
opinion, but come on, there's more RDMA users than say, the sparc64
port.  Eh?

> These are exactly the kinds of problems people like myself
> were dreading.  These subsystems have no business using the TCP port
> space of the Linux software stack, absolutely none.

Ok, although IMO it's the correct solution.  But I'll propose other
solutions below.  I ask for your feedback (and everyone's!) on these
alternate solutions.

> After TCP port reservation, what's next?  It seems an at least
> bi-monthly event that the RDMA folks need to put their fingers
> into something else in the normal networking stack.  No more.

The only other change requested and committed, if I recall correctly, was
for netevents, and that enabled both InfiniBand and iWARP to integrate
with the neighbour subsystem.  I think that was a useful and needed
change.  Prior to that, these subsystems were snooping ARP replies to
trigger events.  That was back in 2.6.18 or 2.6.19 I think...

> I will NACK any patch that opens up sockets to eat up ports or
> anything stupid like that.

Got it.

Here are alternate solutions that avoid the need to share the port space:

Solution 1)

1) admins must setup an alias interface on the iwarp device for use with 
rdma.  This interface will have to be a separate subnet from the "TCP 
used" interface.  And with a canonical name that indicates its "for rdma 
only".  Like eth2:iw or eth2:rdma.  There can be many of these per device.


2) admins make sure their sockets/tcp services don't use the interface 
configured in #1, and their rdma services do use said interface.


3) iwarp providers must translate binds to ipaddr 0.0.0.0 into the 
associated "for rdma only" ip addresses.  They can do this by searching 
for all aliases with the canonical name that are aliases of the TCP 
interface for their nic device.  Or: somehow refuse to handle incoming 
connections to any address other than the "for rdma use" addresses, and 
instead pass them up to the host stack without offloading them.


This will avoid the collisions as long as the above steps are followed.
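A minimal user-space sketch of the bind translation in step 3, assuming the provider can see a list of (interface name, IPv4 address) pairs and that rdma-only aliases use labels beginning with "iw" or "rdma" (an assumed convention, generalized from the eth2:iw / eth2:rdma examples above). The function name and the interface table are hypothetical; a real provider would read the kernel's address list instead.

```python
import ipaddress

# Assumed alias-label prefixes, generalized from the eth2:iw / eth2:rdma examples.
RDMA_LABEL_PREFIXES = ("iw", "rdma")


def rdma_bind_addresses(bind_addr, parent_dev, interfaces):
    """Map an RDMA bind request to the addresses the iWARP provider listens on.

    interfaces is a hypothetical list of (name, ipv4_address) pairs,
    e.g. ("eth2:iw0", "10.1.0.5").  A wildcard bind (0.0.0.0) is narrowed
    to the "for rdma only" aliases of parent_dev; a specific address is
    accepted only if it is one of those aliases.
    """
    rdma_addrs = []
    for name, addr in interfaces:
        dev, sep, label = name.partition(":")
        # Only aliases of the parent NIC whose label marks them rdma-only count.
        if dev == parent_dev and sep and label.startswith(RDMA_LABEL_PREFIXES):
            rdma_addrs.append(addr)

    if ipaddress.ip_address(bind_addr).is_unspecified:
        return rdma_addrs
    if bind_addr in rdma_addrs:
        return [bind_addr]
    raise ValueError(f"{bind_addr} is not an rdma-only address on {parent_dev}")
```

With eth2 at 10.0.0.5 and eth2:iw0 at 10.1.0.5, a wildcard rdma bind would only ever listen on 10.1.0.5, leaving the eth2 address (and its port space) to the host TCP stack.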


Solution 2)

Another possibility would be for the driver to create two net devices 
(and hence two interface names) like "eth2" and "iw2", and artificially 
separate the RDMA stuff that way.


These two solutions are similar in that they create an "rdma only" interface.
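Both variants come down to the same per-connection decision on the iWARP NIC: offload connections addressed to the rdma-only interface and pass everything else up to the host stack. A minimal sketch of that decision, assuming the rdma-only interface sits on its own subnet (as Solution 1 requires); the function name and return strings are hypothetical.

```python
import ipaddress


def route_incoming_syn(dst_addr, rdma_only_subnets):
    """Decide who should handle an incoming TCP SYN seen by an iWARP NIC.

    Connections addressed to an rdma-only subnet are offloaded; everything
    else is passed up to the host TCP stack untouched.
    """
    dst = ipaddress.ip_address(dst_addr)
    for subnet in rdma_only_subnets:
        if dst in ipaddress.ip_network(subnet):
            return "offload"
    return "host"
```

For example, if 10.1.0.0/24 is configured on eth2:iw (or iw2), a SYN to 10.1.0.5 is offloaded while one to the eth2 address goes to the host stack. As the Cons below note, nothing here stops the host stack itself from using the rdma-only address.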

Pros:
- is not intrusive into the core networking code
- very minimal changes needed, and only in the code of the iwarp 
providers, who are the ones with this problem

- makes it clear which subnets are RDMA only

Cons:
- relies on system admin to set it up correctly.
- native stack can still "use" this rdma-only interface and the same 
port space issue will exist.



For the record, here are possible port-sharing solutions Dave sez he'll NAK:

Solution NAK-1)

The rdma-cma just allocates a socket and binds it to reserve TCP ports.
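The patch did this in the kernel with kernel_bind() on a socket struct. A rough user-space analogue of the same idea, assuming IPv4 on loopback and no SO_REUSEADDR: a TCP socket that is bound but never listens or connects still holds the port, so other binds to it fail with EADDRINUSE until it is closed. Function names are hypothetical.

```python
import errno
import socket


def reserve_tcp_port(addr="127.0.0.1", port=0):
    """Claim a TCP port by binding a socket that never listens or connects.

    While the returned socket stays open, the host TCP stack rejects other
    binds to the same addr:port.  port=0 lets the kernel pick a free one.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, port))
    except OSError:
        sock.close()
        raise
    return sock, sock.getsockname()[1]


def port_is_free(addr, port):
    """Probe whether addr:port can currently be bound by someone else."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((addr, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
    finally:
        probe.close()
```

The holder socket is exactly the idle, never-used entry in the TCP tables that the Cons below count against this approach.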

Pros:
- minimal changes needed to implement (always a plus in my mind :)
- simple, clean, and it works (KISS)
- if no RDMA is in use, there is no impact on the native stack
- no need for a separate RDMA interface

Cons:
- wastes memory
- puts a TCP socket in the "CLOSED" state in the pcb tables.
- Dave will NAK it :)

Solution NAK-2)

Create a low-level sockets-agnostic port allocation service that is 
shared by both TCP and RDMA.  This way, the rdma-cm can reserve ports in 
an efficient manner instead of doing it via kernel_bind() using a sock 
struct.
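A toy sketch of what such a shared service could look like, assuming a single lock-protected table keyed by local port with an owner tag ("tcp" or "rdma"). The class and method names are hypothetical, not an existing kernel API, and it tracks local ports only. As noted elsewhere in this thread, a real solution would also have to check full 4-tuples.

```python
import errno
import threading


class PortSpace:
    """Hypothetical sockets-agnostic local-port allocator (Solution NAK-2).

    Both the host TCP stack and the RDMA CM would call reserve()/release(),
    so neither needs a struct sock to claim a port.  Tracks local ports only.
    """

    def __init__(self, low=32768, high=60999):
        self._low, self._high = low, high
        self._owners = {}  # port -> owner tag, e.g. "tcp" or "rdma"
        self._next = low   # where the next ephemeral search starts
        self._lock = threading.Lock()

    def reserve(self, owner, port=0):
        """Reserve a specific port, or any free one in [low, high] if port == 0."""
        with self._lock:
            if port:
                if port in self._owners:
                    raise OSError(errno.EADDRINUSE,
                                  f"port {port} held by {self._owners[port]}")
                self._owners[port] = owner
                return port
            span = self._high - self._low + 1
            for i in range(span):
                # Wrap around the range starting from where the last search ended.
                candidate = self._low + (self._next - self._low + i) % span
                if candidate not in self._owners:
                    self._owners[candidate] = owner
                    self._next = candidate + 1
                    return candidate
            raise OSError(errno.EADDRINUSE, "ephemeral port range exhausted")

    def release(self, owner, port):
        """Give a port back; only its current owner may release it."""
        with self._lock:
            if self._owners.get(port) != owner:
                raise ValueError(f"{owner} does not hold port {port}")
            del self._owners[port]
```

The hard part the Cons point at is not this table but making the existing TCP bind and connect paths consult it instead of their own socket-based hash tables.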


Pros:
- probably the correct solution (my opinion :) if we went down the path 
of sharing port space

- if no RDMA is in use, there is no impact on the native stack
- no need for a separate RDMA interface

Cons:

- very intrusive change because the port allocation code is tightly 
bound to the host stack and sock struct, etc.

- Dave will NAK it :)


Steve.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-09 Thread Sean Hefty

How about we just remove the RDMA stack altogether?  I am not at all
kidding.  If you guys can't stay in your sand box and need to cause
problems for the normal network stack, it's unacceptable.  We were
told all along that if RDMA went into the tree none of this kind of
stuff would be an issue.


There are currently two RDMA solutions available.  Each solution has 
different requirements and uses the normal network stack differently. 
Infiniband uses its own transport.  iWarp runs over TCP.


We have tried to leverage the existing infrastructure where it makes sense.


After TCP port reservation, what's next?  It seems an at least
bi-monthly event that the RDMA folks need to put their fingers
into something else in the normal networking stack.  No more.


Currently, the RDMA stack uses its own port space.  This causes a 
problem for iWarp, and that is the problem Steve is trying to solve.  I'm 
not an iWarp guru, so I don't know what options exist.  Can iWarp use 
its own address family?  Identify specific IP addresses for iWarp use? 
Restrict iWarp to specific port numbers?  Let the app control the 
correct operation?  I don't know.


Steve merely defined a problem and suggested a possible solution.  He's 
looking for constructive help trying to solve the problem.


- Sean


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-09 Thread David Miller
From: Sean Hefty <[EMAIL PROTECTED]>
Date: Thu, 09 Aug 2007 14:40:16 -0700

> Steve Wise wrote:
> > Any more comments?
> 
> Does anyone have ideas on how to reserve the port space without using a 
> struct socket?

How about we just remove the RDMA stack altogether?  I am not at all
kidding.  If you guys can't stay in your sand box and need to cause
problems for the normal network stack, it's unacceptable.  We were
told all along that if RDMA went into the tree none of this kind of
stuff would be an issue.

These are exactly the kinds of problems that people like myself
were dreading.  These subsystems have no business using the TCP port
space of the Linux software stack, absolutely none.

After TCP port reservation, what's next?  It seems an at least
bi-monthly event that the RDMA folks need to put their fingers
into something else in the normal networking stack.  No more.

I will NACK any patch that opens up sockets to eat up ports or
anything stupid like that.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-09 Thread Sean Hefty

Steve Wise wrote:

Any more comments?


Does anyone have ideas on how to reserve the port space without using a 
struct socket?


- Sean