Re: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]

2007-02-20 Thread Fab Tillier
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Hal Rosenstock
Sent: Tuesday, February 20, 2007 1:43 PM

On Tue, 2007-02-20 at 16:08, Fab Tillier wrote:
> -Original Message-
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, February 20, 2007 10:57 AM
> 
> On Tue, 2007-02-20 at 13:56, Fab Tillier wrote:
> [ftillier] This isn't just an install issue - it's a build issue.
> Anyone that wants to build OpenSM will need to find/download/install
> the pthreads library so that the build will succeed.  If linking
> statically, the resulting executable will not require any special
> installation.  It's only an install issue if you link dynamically to
> pthreads.

OK; then build and install. How big an issue is this ?

I thought DLLs were dynamically linked but I'm a Windows plebe. 

[ftillier] When you build, the linker needs the import library for
pthreads so that the functions get resolved as being imported from the
pthreads DLL.  The dependency on the pthreads DLL is then created and
the DLL will be loaded dynamically, assuming it can be found in the
path.

So for the build process, you need to have the pthreads library
available to the build tool (path to the lib).  This requires installing
the pthreads developer package or however it's done.

If you statically link the pthreads lib, rather than dynamically link,
then all the pthreads goodies go directly into the executable and you
remove the dependency on an external DLL.  The build process
requirements are no different than for the dynamically linked case.

There is also the possibility to remove the link-time dependency by
calling GetProcAddress to explicitly resolve the pthreads entrypoints.
This method still requires having the DLL loaded on the user's systems.
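
As an illustration of the GetProcAddress option, here is a minimal C sketch; it is not code from OpenSM. It assumes a hypothetical thread_api table with two entries and routes the lookup through a callback so the resolver can also be exercised off Windows with a stub. The names thread_api, thread_api_resolve, thread_api_load, stub_lookup and stub_noop are illustrative, and the DLL name is left to the caller because the thread does not name one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Generic function pointer; the caller casts it to the real signature. */
typedef void (*generic_fn)(void);

/* Symbol lookup callback; returns NULL when the name is not exported. */
typedef generic_fn (*lookup_fn)(void *module, const char *name);

/* Hypothetical table of the pthreads entry points an application calls. */
struct thread_api {
    generic_fn create; /* pthread_create */
    generic_fn join;   /* pthread_join */
};

/* Resolve every entry point; 0 on success, -1 if any one is missing. */
int thread_api_resolve(struct thread_api *api, lookup_fn lookup, void *module)
{
    assert(api != NULL && lookup != NULL);
    api->create = lookup(module, "pthread_create");
    api->join = lookup(module, "pthread_join");
    return (api->create != NULL && api->join != NULL) ? 0 : -1;
}

#ifdef _WIN32
/* Real lookup on Windows: GetProcAddress against a loaded DLL handle. */
static generic_fn win32_lookup(void *module, const char *name)
{
    return (generic_fn)GetProcAddress((HMODULE)module, name);
}

/* dll_name comes from the caller; the DLL must be on the search path. */
int thread_api_load(struct thread_api *api, const char *dll_name)
{
    HMODULE dll = LoadLibraryA(dll_name);
    return dll != NULL ? thread_api_resolve(api, win32_lookup, dll) : -1;
}
#endif

/* Stand-in lookup for other hosts: module is a NULL-terminated table. */
struct stub_symbol {
    const char *name;
    generic_fn fn;
};

void stub_noop(void) {}

generic_fn stub_lookup(void *module, const char *name)
{
    const struct stub_symbol *sym = module;
    for (; sym != NULL && sym->name != NULL; ++sym)
        if (strcmp(sym->name, name) == 0)
            return sym->fn;
    return NULL;
}
```

On Windows a caller would run thread_api_load once at startup and bail out if it returns -1. There is no import library at link time, but the DLL still has to be present on the user's system.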

Personally, I would rather see static linkage to the pthreads library so
that only the builds are affected (something only 'experts' will be
doing), while not affecting the common user.

-Fab

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related [was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]

2007-02-20 Thread Fab Tillier


-Original Message-
From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 20, 2007 10:57 AM

On Tue, 2007-02-20 at 13:56, Fab Tillier wrote:
> Submissions to the OFW project are supposed to be bound by the
> contributor's agreement:
> 
> I can see this causing
> problems for builds, as people would need to find/install the pthreads
> library before OpenSM would build successfully.

Could install documentation for OpenSM on Windows minimize this as an
issue ?

[ftillier] This isn't just an install issue - it's a build issue.
Anyone that wants to build OpenSM will need to find/download/install the
pthreads library so that the build will succeed.  If linking statically,
the resulting executable will not require any special installation.
It's only an install issue if you link dynamically to pthreads.

-Fab





Re: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related [was: Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]

2007-02-20 Thread Fab Tillier
Submissions to the OFW project are supposed to be bound by the
contributor's agreement:

http://windows.openib.org/openib/contribute.aspx

Contributing code under anything but a BSD license violates condition 1,
though there shouldn't be issues with dual licenses as long as one of
the available licenses is a BSD license.

In any case, we're not talking about putting the pthreads library in
source or binary form in the OFW SVN, right?  We're just talking about
having OpenSM link to the pthreads library that is out-of-tree.  So the
question is whether there are any licensing issues with having a BSD
code include an out-of-tree LGPL file that would affect the ability to
retain the BSD license on the OpenSM files.  I can see this causing
problems for builds, as people would need to find/install the pthreads
library before OpenSM would build successfully.

-Fab

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Hal Rosenstock
Sent: Tuesday, February 20, 2007 10:38 AM
To: ofw@lists.openfabrics.org
Cc: Gilad Shainer; OPENIB
Subject: [ofw] [Fwd: Re: [openib-general] [Fwd: Re: win related [was:
Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]

Also, looping in the OpenFabrics Windows email list on this.

-- Hal

-Forwarded Message-

From: Hal Rosenstock <[EMAIL PROTECTED]>
To: Tzachi Dar <[EMAIL PROTECTED]>
Cc: OPENIB , Gilad Shainer
<[EMAIL PROTECTED]>
Subject: Re: [openib-general] [Fwd: Re: win related [was: Re: [PATCH
1/2] opensm: sigusr1: syslog() fixes]]
Date: 20 Feb 2007 13:21:38 -0500

Hi Tzachi,

On Thu, 2007-02-08 at 16:24, Tzachi Dar wrote:
> See below.

I would like to get back to trying to close on this discussion.

> Thanks
> Tzachi 
> 
> > -Original Message-
> > From: Sasha Khapyorsky [mailto:[EMAIL PROTECTED] 
> > Sent: Thursday, February 08, 2007 9:47 PM
> > To: Tzachi Dar
> > Cc: Yossi Leybovich; Gilad Shainer; Yevgeny Kliteynik; 
> > OPENIB; Michael S. Tsirkin; Hal Rosenstock
> > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] 
> > opensm: sigusr1: syslog() fixes]]
> > 
> > On 20:31 Thu 08 Feb , Tzachi Dar wrote:
> > > The windows open IB has decided on using a BSD only license. 
> > > The common implementation of pthreads as far as I know is 
> > LGPL, which 
> > > means that it can not be used in open IB.
> > 
> > Why not? AFAIK it works perfectly (see (5,6 and Preamble)):
> > http://www.gnu.org/copyleft/lesser.html
> > 
> > And of course there are tons of examples when BSD software 
> > links against LGPLed glibc.
> 
> I can of course write you an answer that will be more than 5 pages
> long of why *I* don't think that using GPL software is bad for
> everyone, but I guess that my opinion doesn't really matter, so I
> won't do it.
> The page that you have referenced is of the GNU org, and even there it
> is hard to say that they are trying to encourage you to use the LGPL
> license. In any case, the main point is that when open IB windows was
> formed there was a general decision that it will use BSD license. If
> we start having components with the LGPL this will break that
> decision, and therefore this requires some voting of the open IB
> organization.

I may be missing your point but is there something in the Windows
OpenIB/OpenFabrics license that precludes using Windows OpenIB licensed
code (e.g. BSD like license) in concert with non OpenIB code (like LGPL)
? Isn't that essentially what using the Windows pthreads DLL with OpenSM
would be like ? As I understand it, I don't think this requires a
license change or anything in the OpenIB Windows charter prevents this
or needs changing.

> > > The only two ways that I see around this are 1) Change the 
> > license of 
> > > open IB windows which might be a complicated thing. 2) Find an 
> > > implementation of pthreads that is BSD.
> > 
> > BTW, just wondering... What is relation between windows open 
> > IB and OFA (and OFA's "dual-license rule")?
> Well, the way I see it one can take code from the Linux part under the
> BSD license and use it in the Windows part. The other way around seems
> fine to me, but some say that since the Windows BSD license requires
> that some text will always remain there, the other way around is not
> possible. As I'm not an expert in that area I don't know who is right.

I don't see how this affects what is being discussed about OpenSM. In
all the cases I'm aware of, the portability is from Linux to Windows and
not the other way around.

-- Hal

> > Sasha
> > 
> > > 
> > > Thanks
> > > Tzachi
> > > 
> > > > -Original Message-
> > > > From: Sasha Khapyorsky [mailto:[EMAIL PROTECTED]
> > > > Sent: Thursday, February 08, 2007 7:46 PM
> > > > To: Tzachi Dar; Yossi Leybovich
> > > > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal Rosenstock
> > > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2]
> > > > opensm: sigusr1: syslog() fixes]]
> > > > 
> > > > On 11:24 Sun 21 Jan , Yevgeny Kliteynik wrote:
> > > >

Re: [openib-general] [openfabrics-ewg] drop mthca from svn?

2006-08-28 Thread Fab Tillier
> -Original Message-
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 28, 2006 4:17 PM
> 
> With that said, why would maintaining mthca exclusively
> in git make it harder to track?  If anything I would think
> it would make it slightly easier, since "git log rev1..rev2
> drivers/infiniband/hw/mthca" and "git diff rev1..rev2
> drivers/infiniband/hw/mthca" are a lot faster than the
> svn equivalents.

Is git supported in Windows?  Right now, with MTHCA in SVN, it's possible to
do all development under Windows.  I don't know jack about git, so if
there's a Windows client that concern is moot.

- Fab






[openib-general] Bugzilla

2006-04-12 Thread Fab Tillier
Hi Bryan,

Could you rename the current "OpenIB" product to "OpenIB Linux", and create an
"OpenIB Windows" project with the following components:

IPoIB
WSD
IB Core
MT23108
MTHCA
Diagnostics
OpenSM
SRP
Utils

Thanks,

- Fab




RE: [openib-general] Plans for libibverbs 1.0, 1.1 and beyond

2006-02-16 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 16, 2006 11:18 AM
> 
> I'm also thinking of moving my libibverbs and libmthca development
> trees to git (most likely hosted at kernel.org).  This has the
> drawback of moving their development repositories out of the common
> openib.org svn tree.  However, it will make handling 1.0, 1.1 and
> feature development branches much easier.  I'd like to hear opinions
> on this before I make a decision.

I don't think the host of the repository matters.  I don't know anything about
git and its potential impact on the OpenIB dual license.  As long as development
is still done under the dual license I don't see any problem using a different
SCM tool.  Would we be able to host the git tree on the same server as SVN?

- Fab




RE: [openib-general] RE: [Openib-windows] NFS performance and general disk network export advice (Linux-Windows)

2006-02-10 Thread Fab Tillier
Hi Tom,

> Fab:
> 
> As you point out, we've been focused on the main trunk and our target
> test platforms are Linux based. That said, we actually had Beta versions
> of NDIS and Winsock Direct drivers for the AMSO adapter, so we know this
> works and we know where the dead are buried.
> 
> It probably makes sense to wait until the core iWARP support is merged
> into the main trunk. However, when/if you decide to merge up from the
> main trunk and pick up iWARP support, I am more than happy to help you
> with any issues that you may have.

I think you misunderstood what I meant - I think it makes sense to have
functionality like the CMA in Windows as that provides applications with a
valuable service (using IP addressing to establish IB connections).  I don't
have any plans to merge in iWarp support though, as I don't understand what the
iWarp community's requirements are and haven't followed the discussions very
closely.  There is no policy or plan for merging code from Linux to Windows.

Lastly, with the Microsoft RDMA and TCP chimney designs, I don't know if it
makes sense for iWarp to layer into the IB framework rather than into the OS
provided one.

If there is to be iWarp support in OpenIB Windows, I expect iWarp vendors to
drive that effort.
 
Hopefully this clears things up.

- Fab




[openib-general] RE: [Openib-windows] NFS performance and general disk network export advice (Linux-Windows)

2006-02-09 Thread Fab Tillier
Hi Paul,

> I'm looking to export a filesystem from each of four linux
> 64bit boxes to a single Windows server 2003 64bit Ed.
> 
> Has anyone achieved this already using an IB transport? Can
> I use NFS over IPoIB cross platform? i.e. do both ends
> support a solution?

IPoIB will interoperate cross platform, so any higher-level services you layer
above TCP/IP or UDP/IP should work fine.

> Is NFS over RDMA compatible with Windows (pretty sure the
> answer is no to this one but love to be proven wrong). I've
> attached Tom's announcement of the latest to the bottom of
> this email. I don't think Windows has the RDMA abstraction
> (yet)?

There is no NFS over RDMA file system for OpenIB Windows.  It would be great to
have it, but the focus is currently on getting the core stack stable and
released.

The long-term goal, at least from my perspective, is to match functionality
between OpenIB Linux and Windows, even if the APIs aren't identical.  The
reality is that the iWARP crowd hasn't really been involved in the Windows
project, and have not driven any requirements, so that stack is continuing to be
focused on IB only.  I don't have a timeline for getting functionality matched
up, and we could certainly use more hands on deck for the Windows project.

> Are windows IB drivers (Openib or Mellanox) compatible with
> these options?
> Do I layer Windows services for Unix on top of the Windows IB
> drivers and IPoIB to achieve a cross platform NFS?

I don't know what you would need to do to get NFS working on Windows, but that
should be an orthogonal problem to getting IB working.  If NFS works on Windows
over GbE, it should work without a problem over IPoIB.

> Has anyone done much in the way of NFS performance
> comparisons of NFS over IPoIB in cross-platform situations
> vs say Gigabit ethernet. Does it work :) What is large file
> throughput and processor loading - I'm aiming for 150-200
> MB/s on large files on 4x SDR IB (possibly DDR if we can
> fit the bigger 144 port switch chassis into our rack layout
> for 50-ish nodes).

I can tell you that IPoIB performance on Windows is pretty awful.  The reason
for that is that the IPoIB driver shoehorns itself into the NDIS stack as an
802.3 Ethernet NIC, and thus gets 6-byte Ethernet MAC addresses.  Further,
Windows doesn't have any IB knowledge, so the IPoIB driver is responsible for
all ARP and DHCP encapsulation to match the IPoIB protocol on the wire.  This
involves snooping both outbound and inbound packets to see if they need
conversion, which does nasty stuff to performance.
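
To make the snooping cost concrete, here is a minimal C sketch of the per-packet check such a driver has to run. It is not the OpenIB Windows driver's code. It assumes untagged Ethernet II frames carrying IPv4 and uses the standard EtherType values (0x0800 IPv4, 0x0806 ARP) and DHCP/BOOTP UDP ports (67 and 68); the function name frame_needs_conversion and the enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ETH_HDR_LEN = 14,        /* dst MAC (6) + src MAC (6) + EtherType (2) */
    ETHERTYPE_IPV4 = 0x0800,
    ETHERTYPE_ARP = 0x0806,
    IP_MIN_HDR_LEN = 20,
    UDP_HDR_LEN = 8,
    IP_PROTO_UDP = 17,
    DHCP_SERVER_PORT = 67,
    DHCP_CLIENT_PORT = 68
};

/* Read a big-endian (network order) 16-bit value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static int is_dhcp_port(uint16_t port)
{
    return port == DHCP_SERVER_PORT || port == DHCP_CLIENT_PORT;
}

/* Returns 1 if the frame carries ARP or DHCP and so would need its
 * hardware-address fields rewritten, 0 if it can pass through as is.
 * IP fragments are not handled in this sketch. */
int frame_needs_conversion(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < ETH_HDR_LEN)
        return 0;

    uint16_t ethertype = read_be16(frame + 12);
    if (ethertype == ETHERTYPE_ARP)
        return 1;
    if (ethertype != ETHERTYPE_IPV4)
        return 0;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    size_t ip_len = len - ETH_HDR_LEN;
    if (ip_len < IP_MIN_HDR_LEN)
        return 0;

    size_t ihl = (size_t)(ip[0] & 0x0F) * 4; /* IP header length in bytes */
    if (ihl < IP_MIN_HDR_LEN || ip_len < ihl + UDP_HDR_LEN)
        return 0;
    if (ip[9] != IP_PROTO_UDP)
        return 0;

    uint16_t src_port = read_be16(ip + ihl);
    uint16_t dst_port = read_be16(ip + ihl + 2);
    return is_dhcp_port(src_port) || is_dhcp_port(dst_port);
}
```

Every frame in both directions pays for at least the EtherType test, and IPv4 frames pay for the header walk as well, which is the overhead described above.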

Depending on the host CPU, 150-200MB/s should be achievable (I've seen 150+MB/s
in some of my testing).

> Are there any alternatives to using NFS that may be better
> and that would 'transparently' receive a performance boost
> with IB compared with using a simple NFS/gigabit ethernet
> solution. Must be fairly straightforward, ideally application
> neutral (configure a drive and load/unload script for Linux
> and it just happens) and compatible between Win2003 and
> Linux? Alternatives using perhaps Samba on the Linux side?

If you only have a single Windows box that has to read data from one or more
Linux boxes, you might have some success with making the Linux boxes SRP
targets, and then using the Windows SRP driver to access the Linux boxes.  The
SRP target driver would have to handle SRP commands and perform local disk
access.

Of course, the file system would have to be Windows compatible with this
solution, but you should be able to get the full RDMA performance since there
would be no network stack involved.  You'd also need to make sure that only a
single system accesses the data on the disks exported as SRP targets to prevent
corruption as those disks would appear as locally attached drives to the Windows
box.

I am unaware of an SRP target implementation for Linux, though, so that may not
be a viable option for you.

> My lack of knowledge of IB in the windows world has got me
> concerned over whether this is actually achievable (easily).
> 
> I hope to be trying this once we get a Windows 2003 machine,
> but hope someone can encourage me that it's a breeze prior to
> my coming unstuck in a month or so!

The IB stuff should be a breeze to get functional and interoperating.  Whether
performance matches your requirements/expectations is another thing.  Do report
back if you have any questions or run into any problems along the way.

- Fab




RE: [openib-general] Reregister Memory Region Verb

2006-01-25 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 25, 2006 10:49 AM
> 
> Fab Tillier wrote:
> >> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> >> Sent: Wednesday, January 25, 2006 10:06 AM
> >>
> >> Fab Tillier wrote:
> >>>> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> >>>> Sent: Wednesday, January 25, 2006 9:22 AM
> >>>>
> >>>>>> although we would prefer that it wouldn't block if possible
> >>>>>
> >>>>> mmm. All the current memory registration verbs both user and
> >>>>> kernel are blocking, is it an issue for you?
> >>>>>
> >>>>
> >>>> If you need to do memory registrations in a context where blocking
> >>>> is not an option then you really need FMR work requests as in RDMAC
> >>>> and InfiniBand 1.2 verbs.
> >>>
> >>> No.  The blocking semantics of memory registration APIs is a
> >>> deliberate design choice and not a limitation of the hardware.  It
> >>> is possible (though more complicated) to make the API asynchronous.
> >>> No existing IB stack to date has ever done so, however.
> >>>
> >>> - Fab
> >>
> >> If asynchronous memory registration (via work request) is such a bad
> >> idea then why is it part of both the RDMAC iWARP and InfiniBand 1.2
> >> verb specifications?
> >
> > You misunderstood me.  I didn't say anything about FMR being
> > a bad idea, just that regular MRs could be made to work in a
> > non-blocking manner.  Non-blocking calls don't require FMR, it could
> > be done without.
> 
> Yes, it is possible to specify a set of conditions where
> a memory registration verb would not have to block. But
> is it worthwhile to specify that under those conditions
> that it MUST NOT block?
> 
> For verification purposes it is much simpler if a given
> verb is either guaranteed to never block, or is considered
> subject to blocking. It is much easier to check whether a
> routine that is supposed to be non-blocking NEVER makes a
> call to a routine that could block than it is to check that
> it never makes a call to a routine with the set of conditions
> that might cause it to block.
> 
> So if applications have need to do registration where they
> are *guaranteed* that they will not block then I believe
> an asynch API (i.e., work requests) is a much better
> solution than adding lots of asterisks explaining
> when a call that normally "can block" will in fact
> be guaranteed not to block. The list of non-blocking
> scenarios is real easy to generate as a SHOULD NOT
> list, but it gets very tricky if you convert it to
> a MUST NOT list.

I wholeheartedly agree that having an API that *may* block is much worse than
just treating it as always blocking from a maintenance perspective.  What I was
referring to was that all the verb APIs could be made asynchronous, putting the
burden on the API provider to handle any blocking issues and not the end user.
You don't need work requests to have asynchronous APIs.  However, this is a
pretty significant change that I don't see happening for Linux (however I've had
a long term dream of making async verbs a reality in Windows for kernel
clients).
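
A minimal C sketch of what an asynchronous registration call without work requests could look like, assuming a completion callback and a provider-side queue. None of these names exist in any IB stack; reg_mr_async, reg_queue_drain, reg_queue and record_result are hypothetical, and the sketch is single-threaded, so a real provider would need locking and a worker context.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Completion callback: status 0 means the registration succeeded. */
typedef void (*reg_done_fn)(void *context, int status, unsigned int key);

struct reg_request {
    void *addr;
    size_t length;
    reg_done_fn done;
    void *context;
};

/* Hypothetical provider state: a small FIFO of pending registrations. */
struct reg_queue {
    struct reg_request pending[MAX_PENDING];
    size_t head;
    size_t count;
    unsigned int next_key;
};

/* Never blocks: records the request and returns at once.
 * Returns 0 if queued, -1 if the queue is full. */
int reg_mr_async(struct reg_queue *q, void *addr, size_t length,
                 reg_done_fn done, void *context)
{
    assert(q != NULL && done != NULL);
    if (q->count == MAX_PENDING)
        return -1;
    size_t tail = (q->head + q->count) % MAX_PENDING;
    q->pending[tail] = (struct reg_request){ addr, length, done, context };
    q->count++;
    return 0;
}

/* Runs in a context that is allowed to block (for example a worker
 * thread). Returns the number of requests completed. */
size_t reg_queue_drain(struct reg_queue *q)
{
    assert(q != NULL);
    size_t completed = 0;
    while (q->count > 0) {
        struct reg_request req = q->pending[q->head];
        q->head = (q->head + 1) % MAX_PENDING;
        q->count--;
        /* Placeholder for the slow part: pinning pages, programming the HCA. */
        int status = (req.addr != NULL && req.length > 0) ? 0 : -1;
        unsigned int key = (status == 0) ? ++q->next_key : 0;
        req.done(req.context, status, key);
        completed++;
    }
    return completed;
}

/* Example callback that records the outcome for the caller to inspect. */
struct reg_result {
    int fired;
    int status;
    unsigned int key;
};

void record_result(void *context, int status, unsigned int key)
{
    struct reg_result *r = context;
    r->fired = 1;
    r->status = status;
    r->key = key;
}
```

The blocking work moves into reg_queue_drain, which the API provider owns, so the caller's contract is simply "never blocks" with no conditions attached.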

- Fab




RE: [openib-general] Reregister Memory Region Verb

2006-01-25 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 25, 2006 10:06 AM
> 
> Fab Tillier wrote:
> >> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> >> Sent: Wednesday, January 25, 2006 9:22 AM
> >>
> >>>> although we would prefer that it wouldn't block if possible
> >>>
> >>> mmm. All the current memory registration verbs both user and kernel
> >>> are blocking, is it an issue for you?
> >>>
> >>
> >> If you need to do memory registrations in a context where blocking is
> >> not an option then you really need FMR work requests as in RDMAC and
> >> InfiniBand 1.2 verbs.
> >
> > No.  The blocking semantics of memory registration APIs is a
> > deliberate design choice and not a limitation of the
> > hardware.  It is possible (though more
> > complicated) to make the API asynchronous.  No existing IB
> > stack to date has ever done so, however.
> >
> > - Fab
> 
> If asynchronous memory registration (via work request) is
> such a bad idea then why is it part of both the RDMAC iWARP
> and InfiniBand 1.2 verb specifications?

You misunderstood me.  I didn't say anything about FMR being a bad idea, just
that regular MRs could be made to work in a non-blocking manner.  Non-blocking
calls don't require FMR, it could be done without.

- Fab





RE: [openib-general] Reregister Memory Region Verb

2006-01-25 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 25, 2006 9:22 AM
> 
> >> although we would prefer that it wouldn't block if possible
> >
> > mmm. All the current memory registration verbs both user and
> > kernel are blocking, is it an issue for you?
> >
> 
> If you need to do memory registrations in a context where
> blocking is not an option then you really need FMR work
> requests as in RDMAC and InfiniBand 1.2 verbs.

No.  The blocking semantics of memory registration APIs is a deliberate design
choice and not a limitation of the hardware.  It is possible (though more
complicated) to make the API asynchronous.  No existing IB stack to date has
ever done so, however.

- Fab




RE: [openib-general] RE: [PATCH] OpenSM/ib_types.h: Modify ib_port_info_compute_rate

2005-12-18 Thread Fab Tillier
Hi Hal,

> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Sunday, December 18, 2005 7:48 AM
> 
> On Sun, 2005-12-18 at 06:53, Eitan Zahavi wrote:
> > Hi Hal,
> >
> > The attached patch is fine. Please go ahead and commit it.
> >
> > BTW:
> > In the following commit 4509 you have changed the name of a
> > switch info record field.
> > Note this is an API change and has a severe effect on any
> > application using ib_types.h (and there are plenty of these)
> 
> Is this "API" frozen for all time ? How would you propose that this
> "API" evolve ? I do not see where there is any versioning to the API.

The ib_types.h file in the OpenSM project was originally lifted from the IBAL
project.  The Windows OpenIB Project, since it's derived from IBAL, has that
file.  Currently, OpenSM has its own shadow copy of that header, so changes to
it must be carefully controlled to keep OpenSM building on Windows.

Hopefully that helps explain.

- Fab




RE: [openib-general] [RFC] IB_AT_MOST

2005-12-16 Thread Fab Tillier
Hi Michael,

> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 16, 2005 5:58 AM
> 
> Hi!
> I recently noted that some middleware seems to use the "as much
> as possible" approach, for example, using maximum possible value
> for max_rd_atomic or other fields, in create/modify qp.
> 
> An obvious thing could be to perform query_device and use max.
> values from there. However, it turns out that hardware max supported
> values might not be easy to express in terms of a single constant.
> Consider for example the max number of s/g entries supported per
> WQE: mellanox HCAs support different number of these for RC and UD
> QPs. So whatever single number query device reports, using it will
> never achieve what the user wants for all QP types.
> 
> Rather than extending the device query for all thinkable hardware
> weirdness, I'd like to propose, instead, the following API extension
> (below): passing a negative value in e.g. qp attribute would have the
> meaning: let hardware use at most the specified value.
> This, as opposed to the usual "at least the specified value" meaning
> for positive values.
> 
> How does the following work, for an API? Please comment.

I don't understand the IB_AT_MOST macro.  If someone uses IB_AT_MOST( 1 ) and
the hardware supports 4, they will get 4, which is definitely not "at most 1".

I would rename it to IB_MAX, and define it as -1 or something like that.
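
A minimal C sketch of the IB_MAX idea, assuming a single sentinel value that means "give me whatever the hardware supports for this QP type". The proposed patch is not shown in this thread, so the helper ib_resolve_attr and its signature are hypothetical and not part of any verbs API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel, following the "-1 or something like that" idea. */
#define IB_MAX (-1)

/* Resolve a requested attribute value against the hardware limit that
 * applies to this particular QP type.
 *   requested == IB_MAX : grant the hardware maximum
 *   0 <= requested <= hw_max : grant what was asked for ("at least")
 *   anything else : fail
 * Returns 0 and stores the result in *granted, or -1 on failure. */
int ib_resolve_attr(int requested, int hw_max, int *granted)
{
    assert(granted != NULL && hw_max >= 0);
    if (requested == IB_MAX) {
        *granted = hw_max;
        return 0;
    }
    if (requested < 0 || requested > hw_max)
        return -1;
    *granted = requested;
    return 0;
}
```

With a plain sentinel there is no numeric argument to misread: ib_resolve_attr(IB_MAX, 4, &n) yields 4, and the caller never writes a "1" that ends up meaning 4.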

- Fab




RE: [openib-general] [PATCH] [CMA] support for SDP + standard protocol

2005-12-13 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 13, 2005 10:39 AM
> 
> >I understand that SDP needs address translation services as well as
> >its own private data. However, I think it could be implemented using
> >optional API functions that allow the ULP to modify the private data
> >per its need, rather than adding ULP knowledge into CMA.
> >As example, if ISER spec will be modified, or some new ULP
> >implemented, that needed their own private data, we'll need to modify
> >CMA again, as well as creating a dependency between CMA versions and
> >ULPs.
> 
> The CMA must be aware of the format of the data in order to
> set and extract the IP addressing information.  SDP and the
> new CMA format locate these in different areas of the private
> data.  The CMA only defines the SDP hello header, and
> restricts its definition to the location of the IP addresses,
> source port, and version information.
> 
> If a ULP wants to define their own private data format and move
> the locations of any of those fields, then yes, the CMA would
> need to be changed again.  But I don't see how any API changes
> can prevent this, since the CMA must be able to extract the data
> on the remote side.

Now that the IB spec is going to have a section for how to support IP addressing
in CM MADs, there shouldn't be any need for a ULP to duplicate that
functionality.  SDP is a special case because it predates the IP addressing
extension to the CM protocol.

- Fab




RE: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens

2005-12-02 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 02, 2005 4:06 PM
> 
> Socket listen semantics have nothing to do with Ethernet.
> They are Unix/POSIX. In fact a major point of socket semantics
> is that they worked over multiple networks.

The IB CM doesn't provide socket semantics.  Period, end of story.  Providing
socket semantics is higher level functionality (the CMA), and outside the scope
of the IB CM and this email thread.

> Sockets are part of the problem when it comes to transferring
> data once a connection is established, which is why we have
> QPs and CQs.

Irrelevant.

> But there is a very simple transport neutral definition of
> passive side connection setup. The server issues a listen.
> The server receives connection requests. The service can
> optionally hand off the connection request, accept it
> or reject it.

There is no notion of per-request handoff in IB - you either accept or reject -
that's it.  The reject can cause a redirect, but that requires a new connection
request from the client.

> That model is a natural extension of both TCP connection
> setup and the InfiniBand CM. 

How does the IB CM protocol support hand off?

> It allows the server to deal
> with destination multiplexing. DAPL and IT-API both already
> work this way.
> 
> Are you opposed to transport neutral connection establishment?

I don't give a hoot about transport neutral connection establishment, DAPL, or
IT-API in the scope of this email thread.  They just aren't relevant whatsoever.
This thread is about adding private data comparison functionality to the IB CM.
The IB CM is the module to which the CMA interfaces.  The CMA is a separate
module providing higher level functionality, and is designed to provide
transport neutral connection establishment, specifically IP addressing over IB.

As Sean originally stated in the mail that started this thread, the CMA will
make use of the private data comparison functionality.  Adding this
functionality to the IB CM is simpler than implementing it in the CMA while at
the same time providing additional flexibility to future users of the IB CM that
wish to have similar functionality.
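
A minimal C sketch of private data comparison as discussed here: each listen carries a data/mask pair in addition to its service ID, and an incoming REQ is handed to the first listen whose masked bytes match. This is not the patch under discussion; listen_entry, find_listen, COMPARE_SIZE and the owner field are hypothetical names and sizes chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMPARE_SIZE 8 /* hypothetical number of private data bytes compared */

struct listen_entry {
    uint64_t service_id;
    uint8_t data[COMPARE_SIZE]; /* expected bytes */
    uint8_t mask[COMPARE_SIZE]; /* bits that matter; all zero matches anything */
    int owner;                  /* hypothetical id of the listening client */
};

/* Compare only the masked bits; bytes past priv_len are treated as zero. */
static int private_data_matches(const struct listen_entry *l,
                                const uint8_t *priv, size_t priv_len)
{
    for (size_t i = 0; i < COMPARE_SIZE; ++i) {
        uint8_t byte = (i < priv_len) ? priv[i] : 0;
        if ((byte & l->mask[i]) != (l->data[i] & l->mask[i]))
            return 0;
    }
    return 1;
}

/* Return the first listen whose service ID and masked private data both
 * match the incoming REQ, or NULL if the REQ should be rejected. */
const struct listen_entry *find_listen(const struct listen_entry *listens,
                                       size_t count, uint64_t service_id,
                                       const uint8_t *priv, size_t priv_len)
{
    assert(listens != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (listens[i].service_id == service_id &&
            private_data_matches(&listens[i], priv, priv_len))
            return &listens[i];
    }
    return NULL;
}
```

Two listeners can then share one service ID and be told apart by a byte of private data, which is the multiplexing the CMA would rely on without maintaining its own lookup structure.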

- Fab



RE: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens

2005-12-02 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 02, 2005 3:02 PM
> 
> There are many reasons why an established RDMA connection
> cannot be passed between processes, but I know of no
> reason why a Connection Request cannot be passed to a child
> or third process where it can be accepted.
> 
> Why not emulate the existing solution rather than creating
> a new interface that is transport specific?

Allowing a connection request to come in on one CID (which is associated with
the listening process) and letting that connection be accepted by a different
process requires making changes to the user-mode CM infrastructure to allow CIDs
to be migrated safely between processes.  This is very likely to be more
difficult than adding private data comparison to the IB CM.

This is all under the covers for socket applications.  It avoids the need for
the CMA to keep an efficiently searchable tree of listen requests to perform
private data comparison when the IB CM already does 90% of the work.

To sum up, it is simpler to add the private data compare functionality to the IB
CM than to add it to every client that wants it.  The changes required don't
complicate the API significantly, certainly within the grasp of someone
interfacing to verbs.  I know this from experience because I've done it before.

> Or conversely, if you truly think this is of general utility,
> why not implement it in INETD as well?

I wasn't making the case that it has general utility, just that it has utility
within the realm of IB connection management.  Someone else is welcome to expand
the scope if they see fit, but that's not what I'm advocating.

- Fab



RE: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens

2005-12-02 Thread Fab Tillier
> From: Tom Tucker [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 02, 2005 2:14 PM
> 
> Am I correct to assume that this functionality is unique to the IB CM
> and is not going to be exposed through the CMA?

My understanding is that the CMA would make use of that functionality, but it
would not be exposed to users of the CMA.

- Fab



RE: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens

2005-12-02 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 02, 2005 12:28 PM
> 
> Fab Tillier wrote:
> >
> > Your focus is strictly on TCP socket semantics, but we're
> > talking about IB CM functionality - the IB CM does more than
> > just provide TCP socket semantics.
> >
> > Imagine a user-mode IB application (not virtualization mind
> > you, but just an
> > app) that wants to listen on a given SID (because the SID
> > defines the application), but wants to discriminate incoming
> > requests based on some content in the private data.  Multiple
> > instances of that application can only work properly if the
> > CM performs the private data comparison to properly dispatch
> > the incoming requests to the right user-mode process.
> >
> > If the CM doesn't provide the private data compare
> > functionality, then the app developer needs to create a
> > kernel agent to perform this functionality for the app.  The
> > functionality is simple enough, and has potential value to
> > multiple clients, that it makes sense to have the IB CM provide it.
> 
> You are proposing that the API be made more complex and
> you do not have any justification other than something
> some user-mode application *might* want to do.

In Windows, the Winsock Direct provider does exactly this, and would require a
kernel component if the IB CM wasn't providing this functionality.  WSD uses the
private data to carry the IP address of the client, but uses its own private
data format.

I believe some native-IB MPI implementations make use of similar functionality,
using the rank of the process in the private data.  This allows such
implementations to limit the size of their SID range to a single value or a
single value per job.

> Why are these different user-mode applications sharing
> a Service ID in the first place? On what basis do they
> trust each other? How do they co-ordinate their filtering?
> Couldn't they use CM redirection to share the Service ID?

The world is larger than just TCP-compatible applications.  I'm not talking
about two applications sharing a SID, but two instances of one application
sharing a SID.  Imagine processes in a larger MPI job - the SID can be used to
differentiate jobs, and the private data comparison can be used to differentiate
different processes within that job.  Alternatively, the SID could be constant,
and the job ID and rank could be expressed in the private data, with the
IB-level CM performing all the proper dispatching.

I don't think CM redirection would work since both apps are on the same system,
and share the same CM.  There can only be a single connection ID namespace per
HCA GUID or things quickly become ambiguous.

> The goal was supposed to be providing TCP-compatible
> connection setup, but this is describing something that
> is decidedly un-TCP-like. TCP applications differentiate
> within the daemon, or redirect connections. If they split
> connections based upon packet content it is only done by
> very sophisticated L7 load balancers that identify cookies
> or other HTTP content.

The goal of the CMA *is* to support TCP-compatible semantics, but that is not
the goal of the IB CM.  The IB CM already keeps track of listens and performs
lookups when a REQ comes in based on service ID.  Extending it to do some fairly
basic extra checking is far simpler than adding duplicate lookup functionality
to the CMA.  This allows the IB CM to do all the filtering at once as part of
REQ matching, and thus simplifies the CMA.  It also allows user-mode apps to use
similar functionality without requiring a kernel agent.
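
As a rough illustration of the matching described above, here is a minimal
sketch of a listen lookup that first matches on service ID and then applies a
masked comparison to the REQ private data.  The structure and function names
are hypothetical and are not the IB CM API; the 92-byte length is the private
data size of a CM REQ, and the "owner" field merely stands in for whichever
user-mode process registered the listen.

```c
#include <stddef.h>
#include <stdint.h>

#define PRIV_DATA_LEN 92  /* private data bytes carried by an IB CM REQ */

/* Hypothetical listen entry; names are illustrative, not the IB CM API. */
struct listen_entry {
    uint64_t service_id;
    uint8_t  data[PRIV_DATA_LEN];  /* expected private data bytes */
    uint8_t  mask[PRIV_DATA_LEN];  /* 1-bits select the bits that must match */
    int      owner;                /* stand-in for the listening process */
};

/* Return 1 if every masked bit of the REQ private data equals the pattern. */
static int priv_data_matches(const struct listen_entry *l,
                             const uint8_t *req_data)
{
    for (size_t i = 0; i < PRIV_DATA_LEN; i++)
        if ((req_data[i] ^ l->data[i]) & l->mask[i])
            return 0;
    return 1;
}

/* Find the listen that should receive a REQ; NULL if none matches. */
static const struct listen_entry *
match_req(const struct listen_entry *tbl, size_t n,
          uint64_t service_id, const uint8_t *req_data)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].service_id == service_id &&
            priv_data_matches(&tbl[i], req_data))
            return &tbl[i];
    return NULL;
}
```

With two listens on the same SID whose patterns differ in the first byte, a
REQ is handed to whichever listen matches its private data, and a REQ that
matches neither finds no listener.  A real implementation would presumably use
a searchable tree rather than a linear scan.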

Anyhow, do you have an objection to the CM enabling simple comparisons on
private data?  If so, what are your objections (aside from it not being
TCP-like)?

Thanks,

- Fab




RE: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens

2005-12-02 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 02, 2005 12:13 PM
> 
> Sean Hefty wrote:
> > Fab Tillier wrote:
> >>> Just listen on the Service ID / Port and let the ULP sort them out
> >>> by destination IP address.
> >>
> >> That only works if there is a single kernel module providing the
> >> extra checks. Multiple user-mode ULPs cannot do the checking in
> >> user-mode - the checking must be done in the kernel to figure out
> >> which user-mode client to hand the request to.
> >>
> >> I think putting in restrictions to the comparisons possible is fine,
> >> as the functionality of having the CM facilitate some sort of
> >> filtering is useful.
> >
> > My concern with pushing this to the ULP is that it requires
> > the ULP to track service IDs for reference counting purposes
> > and adds additional synchronization to the ULP that could have been
> > handled by the CM.
> >
> > I'm looking at what the full effect of implementing this in the ULP
> > would be.
> 
> I'm still missing something.
> 
> I don't see how filtering in the CM is of benefit in either case. The
> work either belongs in the Hypervisor or in the Daemon, not the CM.

Your focus is strictly on TCP socket semantics, but we're talking about IB CM
functionality - the IB CM does more than just provide TCP socket semantics.

Imagine a user-mode IB application (not virtualization mind you, but just an
app) that wants to listen on a given SID (because the SID defines the
application), but wants to discriminate incoming requests based on some content
in the private data.  Multiple instances of that application can only work
properly if the CM performs the private data comparison to properly dispatch the
incoming requests to the right user-mode process.

If the CM doesn't provide the private data compare functionality, then the app
developer needs to create a kernel agent to perform this functionality for the
app.  The functionality is simple enough, and has potential value to multiple
clients, that it makes sense to have the IB CM provide it.

- Fab



RE: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens

2005-12-02 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 02, 2005 10:59 AM
> 
> Just listen on the Service ID / Port and let the ULP sort them
> out by destination IP address.

That only works if there is a single kernel module providing the extra checks.
Multiple user-mode ULPs cannot do the checking in user-mode - the checking must
be done in the kernel to figure out which user-mode client to hand the request
to.

I think putting in restrictions to the comparisons possible is fine, as the
functionality of having the CM facilitate some sort of filtering is useful.

- Fab



RE: [openib-general] userspace physical memory

2005-11-22 Thread Fab Tillier
> From: yipee [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 22, 2005 9:00 AM
> 
> Hi,
> 
> Is there a way to do physical memory registration from user-space?

No, there is not.  The IB spec specifically calls out physical registration as a
privileged-only operation.

- Fab



RE: [openib-general] OpenSM Debug

2005-11-20 Thread Fab Tillier
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Sunday, November 20, 2005 4:59 AM
> 
> Hi Fab,
> 
> On Sat, 2005-11-19 at 13:50, Fab Tillier wrote:
> >
> > That's correct - structure definitions change between the debug and
> > release builds of complib.  The code above is there because in Linux,
> > the library created by complib has the same name in debug and release
> > builds, so it is possible to have a mismatch between the type of
> > build for opensm and complib.  In Windows, I solved this by adding a
> > debug-only suffix to the library name (complibd vs. complib) so that
> > the risk of linkage errors is eliminated.  I have suggested in the
> > past that the Linux complib adopt a similar naming scheme and
> > that doing runtime checks for linkage errors was indicative of a
> > poor design.
> >
> > This has been the basis for me pushing back on adding the
> > cl_is_debug function to the Windows version of complib.
> 
> Is there a convention for naming debug libraries in Linux ?

I'm no Linux expert, so I have no clue here.  Perhaps the C libraries already
have some method?

> Is there any reason why the 2 versions of the libraries (with different
> names) shouldn't be allowed concurrently to exist and just link with the
> desired one ?

There is none that I can think of.  In fact, the Windows drivers allow both the
debug and release versions of the user-mode components to co-exist, as well as
mixing debug and release kernel drivers.  This makes it easy to debug a single
component without affecting timings in the whole stack.

- Fab



RE: [openib-general] OpenSM Debug

2005-11-19 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Saturday, November 19, 2005 8:38 AM
> 
> >The following code snippet is in opensm/main.c:
> >
> >  if ( osm_is_debug() != cl_is_debug() )
> >  {
> >    fprintf(stderr, "ERROR: OpenSM and Complib were compiled using different modes\n");
> >    fprintf(stderr, "ERROR: OpenSM debug:%d Complib debug:%d \n",
> >            osm_is_debug(), cl_is_debug() );
> >    exit(1);
> >  }
> >
> >Is there a reason debug can't be turned on independently in OpenSM and
> >the component library ?
> 
> There used to be a restriction that you couldn't mix a free/release
> version of the component library with a debug version of a client,
> and vice-versa.  The debug version of complib added fields to
> structures that were not needed in the release version, resulting
> in different structure sizes between free and debug versions.
> This is probably still the case.

That's correct - structure definitions change between the debug and release
builds of complib.  The code above is there because in Linux, the library
created by complib has the same name in debug and release builds, so it is
possible to have a mismatch between the type of build for opensm and complib.
In Windows, I solved this by adding a debug-only suffix to the library name
(complibd vs. complib) so that the risk of linkage errors is eliminated.  I have
suggested in the past that the Linux complib adopt a similar naming scheme and
that doing runtime checks for linkage errors was indicative of a poor design.

This has been the basis for me pushing back on adding the cl_is_debug function
to the Windows version of complib.
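
To make the hazard concrete, here is a minimal sketch of how a debug-only
field changes a structure's size.  The structures are hypothetical and only
modeled on the idea of a complib object that carries extra bookkeeping in
debug builds; they are not the actual complib definitions.

```c
#include <stddef.h>

/* Hypothetical release-build layout of a list item. */
struct item_release {
    void *next;
    void *prev;
};

/* Hypothetical debug-build layout: same item plus a debug-only field. */
struct item_debug {
    void *next;
    void *prev;
    void *owner_list;   /* debug-only: which list the item is on */
};

/* A client compiled against one layout but linked to a library built with
 * the other would read and write at the wrong offsets.  Comparing the sizes
 * each side was compiled with exposes the mismatch. */
static int layouts_compatible(size_t client_size, size_t library_size)
{
    return client_size == library_size;
}
```

Giving the debug library its own name (complibd vs. complib) moves this check
to link time, so no runtime comparison is needed.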

- Fab



RE: [swg] RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-11 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> 
> Fab Tillier wrote:
> >> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> >> Sent: Friday, November 11, 2005 1:12 PM
> >>
> >> How does this prevent a non-privileged client running on a remote
> >> host with current CM software from generating a connection request
> >> to the targeted Service ID with the entire private data coming from
> >> the non-privileged consumer.
> >
> > There is no need to prevent a non-privileged client from
> > generating connection requests.  Where does this requirement
> > come from?  Who cares where the private data comes from as
> > long as the recipient, whether privileged or not, has a way
> > of validating that it matches the path record information?
> >
> > Specifically, adding the logic in the low level IB CM to
> > validate the private data will tie the IB CM to address
> > translation for IPoIB, which I think is better done at a
> > higher level (like the CMA).
> >
> > If a higher level entity is going to be responsible for
> > validating the private data, the low level IB CM doesn't do
> > squat with the reserved bit.  The low level CM API must now
> > expose the bit to allow clients to specify it so that REQs
> > can be routed to them, so that two requests with the same SID
> > can be distinguished from one another by this reserved bit.
> > Thus if the bit has to be exposed through the low-level IB CM
> > it is no more than a 65th bit for a service ID.
> >
> By the time the connection request is passed to the application
> the remote IP address needs to be validated.

I agree - by the time the upper-most, IP addressing aware application gets it,
whoever is sending the connection request up must have done the validation.

> I don't care whether the remote CM validated it (and is known
> to be privileged software) or if the local CM validates it
> with a reverse lookup.

A reverse lookup isn't needed - a forward lookup is.  The whole point of passing
the IP addresses in the private data was to solve the ambiguity of reverse
lookups.

> What I do not want is to kick this problem up to the application.
> If it is kicked up to the application it is no longer TCP-compatible
> connection setup, because that responsibility does not exist over TCP.

I agree.  I'm just pointing out that the validation of the private data does not
have to be done by a privileged entity, so trying to put a bunch of bits in the
protocol to require enforcement by privileged code is unnecessary.  That means
that the CMA functionality could (not should) be implemented in user-mode.

- Fab




RE: [swg] RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-11 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 11, 2005 1:12 PM
> 
> How does this prevent a non-privileged client running on a remote host with
> current CM software from generating a connection request to the targeted
> Service ID with the entire private data coming from the non-privileged
> consumer.

There is no need to prevent a non-privileged client from generating connection
requests.  Where does this requirement come from?  Who cares where the private
data comes from as long as the recipient, whether privileged or not, has a way
of validating that it matches the path record information?

Specifically, adding the logic in the low level IB CM to validate the private
data will tie the IB CM to address translation for IPoIB, which I think is
better done at a higher level (like the CMA).

If a higher level entity is going to be responsible for validating the private
data, the low level IB CM doesn't do squat with the reserved bit.  The low level
CM API must now expose the bit to allow clients to specify it so that REQs can
be routed to them, so that two requests with the same SID can be distinguished
from one another by this reserved bit.  Thus if the bit has to be exposed
through the low-level IB CM it is no more than a 65th bit for a service ID.

> A current CM does not know that the Service ID requires it to
> generate/validate any portion of the private data.

The CM doesn't need to validate any private data.  The CM only needs to pass the
incoming REQ to a client that listened on that particular SID.  The client that
listened on the particular SID is expected to know the private data format and
to validate it as it sees fit.

> A current CM does not know how to use a later version number or to set a
> bit that is currently defined as reserved.

I don't think we need the reserved bit at all.  I agree with Sean it just adds a
65th bit to the SID that is unnecessary.  We don't need a privileged-only
implementation, either.  As long as we have forward lookups of IP to GID
available through address translation, any recipient of a CM REQ with the
IP-address in the private data can validate that the IP addresses are
appropriate for the IB path specified in the CM REQ.

- Fab



RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-11 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 11, 2005 9:57 AM
> 
> Sean Hefty wrote in response to Arkady Kanevsky:
> 
> > What this proposal defines is basically a 65th bit for the
> > service ID.  If the new 65 bit SID is:
> >
> > 1 - private data has this format
> > 0 - private data format is unknown
> >
> > Why do we need this 65th bit?
> 
> Because current software can set any of the 64 bits.
> There is no assurance that any bit within the current 64
> being set means that privileged software on the remote
> side is vouching for the standardized portion of the
> private data.

Do we need the remote side to vouch for that portion of the private data?  The
recipient of a CM REQ can validate fully that the GIDs in the path record match
the IP addresses.  That was the whole point of this proposal - eliminate the
need to do some reverse lookup of GID to IP based on the source GID in the path
record.  With the source IP provided in the private data, the recipient of the
CM REQ can do a forward lookup of that IP address and validate that the GID
returned matches the one in the CM REQ path.  Thus, all address translation can
use forward lookups and we eliminate the flaws of the reverse lookup schemes
that are currently in use.

It doesn't matter one bit if the CM REQ private data was formatted by a
privileged entity or not - garbage in the private data can be detected by the
receiving entity, even one that sits above the IB CM.
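
The forward-lookup check described above could look roughly like the sketch
below.  The table and function names are hypothetical; the table stands in for
whatever address translation service maps an IP address to a GID, and IPv4
addresses are kept in host byte order purely for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical IP-to-GID table standing in for address translation. */
struct ip_gid_entry {
    uint32_t ipv4;        /* host byte order, for brevity */
    uint8_t  gid[16];
};

/* Forward lookup: IP address to GID.  Returns NULL if the IP is unknown. */
static const uint8_t *lookup_gid(const struct ip_gid_entry *tbl, size_t n,
                                 uint32_t ipv4)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].ipv4 == ipv4)
            return tbl[i].gid;
    return NULL;
}

/* Accept the REQ only if the source IP claimed in the private data resolves
 * to the source GID found in the REQ's path information. */
static int validate_req_source(const struct ip_gid_entry *tbl, size_t n,
                               uint32_t claimed_src_ip,
                               const uint8_t req_src_gid[16])
{
    const uint8_t *gid = lookup_gid(tbl, n, claimed_src_ip);

    return gid != NULL && memcmp(gid, req_src_gid, 16) == 0;
}
```

Nothing in this check depends on who formatted the private data: a claimed IP
that does not resolve to the GID in the path is rejected whether the sender
was privileged or not.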

- Fab



RE: [openib-general] socket based connection model for IB proposal - round 3

2005-11-10 Thread Fab Tillier
> From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 10, 2005 1:37 PM
> 
> It will be discussed at IBTA SWG meeting next week Tu.
> Please, post your comments before that.

Looks fine to me overall.  The only thing I would change is make the version
field 4 bits rather than just 2, and shift the IP version down 2 bits,
eliminating the reserved bits.  That way, the first byte is split evenly between
protocol version and IP version.

Do we even need to indicate the IP version, or can IPv4 addresses be expressed
as IPv6 addresses just by zeroing the first 12 bytes?
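
A minimal sketch of both suggestions follows, assuming the protocol version
occupies the high nibble of the first byte and the IP version the low nibble
(the exact bit placement is an assumption, as are all function names).  The
address helpers implement the zero-prefix idea literally as asked in the
question above.

```c
#include <stdint.h>
#include <string.h>

/* First byte split evenly: high nibble = protocol version (assumed),
 * low nibble = IP version (assumed). */
static uint8_t pack_versions(uint8_t proto_ver, uint8_t ip_ver)
{
    return (uint8_t)(((proto_ver & 0x0f) << 4) | (ip_ver & 0x0f));
}

static uint8_t proto_version(uint8_t b) { return (uint8_t)(b >> 4); }
static uint8_t ip_version(uint8_t b)    { return (uint8_t)(b & 0x0f); }

/* Store an IPv4 address (4 bytes, network order) in a 16-byte field by
 * zeroing the first 12 bytes. */
static void ipv4_to_16(const uint8_t ipv4[4], uint8_t out[16])
{
    memset(out, 0, 12);
    memcpy(out + 12, ipv4, 4);
}

/* Treat a 16-byte field as IPv4 if its first 12 bytes are zero. */
static int is_ipv4_in_16(const uint8_t addr[16])
{
    static const uint8_t zeros[12];

    return memcmp(addr, zeros, 12) == 0;
}
```

One caveat with the zero-prefix test: the IPv6 unspecified address (::) and
loopback address (::1) also begin with 12 zero bytes, so they would be taken
for IPv4.  IPv6 defines an IPv4-mapped form (::ffff:a.b.c.d) that avoids this,
which may be a reason to keep an explicit IP version field or use that form.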

I don't understand the relevance of the 0-based VA or Send with Invalidate
discussion points.  They seem orthogonal to the socket-based CM proposal, and
IMO should be moved to a separate proposal.

I have no opinion one way or another on the presence of the protocol field.  It
could just as well be left as "flags" for the consumer to do with what they
please.

- Fab



RE: [openib-general] [PATCH/RFC]

2005-11-07 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 07, 2005 4:53 PM
> 
> Any comments on changing the signature of the struct ib_device
> resize_cq method to take the new CQ size rather than a pointer to the
> new CQ size?  The low-level driver would then be responsible for
> updating the cq->cqe member itself (possibly with proper locking).
> This would also make the prototype match the create_cq method and the
> actual ib_resize_cq() function.

This matches what I plan on doing in Windows, so it sounds good to me.
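
For readers following along, the proposed shape could be sketched as below.
The types are minimal stand-ins rather than the kernel's struct ib_device and
struct ib_cq, and the driver's power-of-two rounding is an invented example of
why the low-level driver, not the caller, should record the final size.

```c
/* Minimal stand-ins for the kernel types; only the fields needed here. */
struct sketch_cq {
    int cqe;                      /* current number of CQ entries */
};

struct sketch_device {
    /* Proposed form: the new size is passed by value, matching create_cq
     * and ib_resize_cq(); the driver updates cq->cqe itself. */
    int (*resize_cq)(struct sketch_cq *cq, int cqe);
};

/* Hypothetical low-level driver method.  It rounds the request up to a
 * power of two and stores the size it actually allocated. */
static int drv_resize_cq(struct sketch_cq *cq, int cqe)
{
    int actual = 1;

    if (cqe < 1 || cqe > (1 << 30))
        return -1;
    while (actual < cqe)
        actual <<= 1;
    cq->cqe = actual;             /* a real driver would hold a lock here */
    return 0;
}

/* Core-layer wrapper with the same prototype as the method it calls. */
static int sketch_resize_cq(struct sketch_device *dev, struct sketch_cq *cq,
                            int cqe)
{
    return dev->resize_cq ? dev->resize_cq(cq, cqe) : -1;
}
```

With this shape, asking for 100 entries leaves cq->cqe at 128 in the sketch,
and a rejected request leaves the old value untouched.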

- Fab



RE: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB

2005-11-04 Thread Fab Tillier
> From: Bob Woodruff [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 04, 2005 10:58 AM
> 
> Fab wrote,
> >There is not a 1:1 relationship between a UDP application socket
> >and an IB QP, rather there is a single IB connection between systems
> >over which traffic from multiple UDP sockets flows.
> 
> That would probably provide better scalability, since there
> would not be a 1:1 mapping between UDP sockets and IB connections,
> however for large clusters there may still be a scalability issue
> if every node needs to have a connection to every other node.
> If you implemented it on top of datagrams instead, then each node
> would only need one QP, rather than one for every node in the cluster.

Doing a UDP to IB-UD protocol is unlikely to buy you anything over just using
IPoIB.  I don't know about doing UDP to IB-RDD, but the complexity of supporting
end to end contexts and RDD QPs seems to me to outweigh the complexity of doing
SW multiplexing over multiple IB-RC QPs.  I don't think software multiplexing
over IB-RC costs much from both a system/HCA resource and performance
perspective, especially compared to doing something like uDAPL or SDP where
there's a 1:1 relationship between EP or socket to QP, respectively.
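
To put rough numbers on the trade-off, the sketch below counts the QPs one
node would need under the three schemes discussed.  It is back-of-envelope
arithmetic only: it assumes every node talks to every other node, and that in
the 1:1 model each node keeps a fixed, made-up number of sockets open to each
peer.

```c
/* QPs needed on one node under three schemes; all inputs are illustrative. */
struct qp_counts {
    unsigned long per_socket_rc;  /* 1:1 socket-to-QP mapping */
    unsigned long muxed_rc;       /* one RC connection per peer node */
    unsigned long ud;             /* a single UD QP shared by all peers */
};

static struct qp_counts qps_per_node(unsigned long nodes,
                                     unsigned long sockets_per_peer)
{
    struct qp_counts c;
    unsigned long peers = nodes > 0 ? nodes - 1 : 0;

    c.per_socket_rc = peers * sockets_per_peer;
    c.muxed_rc = peers;
    c.ud = 1;
    return c;
}
```

For a 1000-node cluster with 100 sockets per peer, this gives 99900 QPs for
the 1:1 mapping, 999 for multiplexing over RC, and 1 for UD.  The counts say
nothing about the reliability work a UD-based design would have to do in
software.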

- Fab



RE: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB

2005-11-04 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 04, 2005 10:30 AM
> 
> Rick Frank wrote:
> > No we do not use TCP sockets - we use too many connections for this 100k+.
> 
> Isn't RDS implemented on top of reliable IB/RDMA connections anyway?

There is not a 1:1 relationship between a UDP application socket and an IB QP,
rather there is a single IB connection between systems over which traffic from
multiple UDP sockets flows.

- Fab



RE: [openib-general][PATCH] local device search with source address wildcard

2005-11-04 Thread Fab Tillier
> From: Steve Wise [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 04, 2005 10:23 AM
> 
> - Original Message -
> From: "Sean Hefty" <[EMAIL PROTECTED]>
> To: "Tom Tucker" <[EMAIL PROTECTED]>
> Cc: 
> 
> > Tom Tucker wrote:
> >> Sean:
> >>
> >> I was looking through ip_resolve_local and it looks to me like
> >> if the source address is 0, it will end up getting set to the
> >> destination IP instead of the IP address of the local interface.
> >
> > The intent of ip_resolve_local() is to check if a given destination
> > address is on the local system.  If it is and no source address is
> > specified, then the source address is set to the same address as the
> > destination.
> >
> 
> This doesn't sound correct to me.  The src ip address is supposed to be
> the local ip address to be used for establishing the connection.  If you
> set it to the destination address, then you'd end up passing that
> address to the peer in the private data, and that is incorrect...

If the destination address is on the local system, then the user is establishing
a loopback connection.  I think that if the user didn't specify a source
address, returning the same address as the destination should give the proper
results.

For loopback connections, source and destination can (and will likely) be the
same.
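
The behaviour being discussed can be sketched as follows.  This is not the
actual ip_resolve_local() code; the function name, the flat array of local
addresses, and the use of 0 as the source wildcard are assumptions made for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: local_addrs stands in for the system's configured interface
 * addresses.  Returns 1 if dst is a local address (filling in *src when it
 * was the 0 wildcard), 0 if dst is not local. */
static int resolve_local(const uint32_t *local_addrs, size_t n,
                         uint32_t *src, uint32_t dst)
{
    for (size_t i = 0; i < n; i++) {
        if (local_addrs[i] == dst) {
            if (*src == 0)
                *src = dst;   /* loopback: source defaults to destination */
            return 1;
        }
    }
    return 0;
}
```

An explicitly specified source address is left alone, and a destination that
is not local leaves the wildcard source unresolved for the normal routing
path to handle.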

- Fab



RE: [openib-general] Re: [OpenSM] SA database query tool

2005-11-02 Thread Fab Tillier
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 02, 2005 1:11 PM
> 
> As I said there is likely a real SA client that will be developed. In
> the short term, you can use some diag as an example but these are SMP
> rather than GMP based (except for perfquery). There is some SA
> infrastructure in place but I'm not sure how well it works. Would you be
> using RMPP too as little has exercised it to date ?

RMPP would be required for a query of all service registrations.

> There's sa_call and just an ib_path_query right now (in
> libibmad/src/sa.c). A service query could be easily added. RMPP is not
> supported yet at this level.
> 
> > > What is the timeframe for this need ?
> >
> > I'm thinking of debugging tools that would be useful for me at SC05.
> 
> I was planning on using ibis at SC05 if this was needed.

If there are Windows boxes on the same IB fabric, you could pretty easily write
a program to do the query for you.  Windows supports user-mode SA queries
including RMPP.

I don't know if this is practical for your SC05 needs.

- Fab



RE: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation

2005-11-01 Thread Fab Tillier
> From: Kenneth L Jeffries [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 01, 2005 4:45 PM
> 
> > My objections are the following (as I said in my previous mail):
> >  - I don't like allocating a 1 KB IU for every send IU, since most of
> >that memory will probably never be used.
> >  - I'm not convinced that it's _ever_ a win to have the target do
> >another RDMA to fetch the indirect buffer list.  You need to
> >convince me that it's not better to simply tell the upper layers
> >what the limit on s/g list length is to fit in the current IU size.
> 
> I also don't want to allocate 1KB IU's. If IU's were fixed size, I'd want
> (probably, depending on performance testing) a fixed size of 350 bytes
> (from Fab Tillier's 64KB i/o, 4KB pages, Windows) or possibly even
> the minimum DDBD (as Fab Tillier also says).  1KB IU's with thousands
> of RC's causes me a lot of wasted space heartburn.

Even 350 bytes is a burden - imagine a target that supports a queue depth of
1000 I/Os from a few dozen initiators.  Ideally, I'd like to see us use just
DDBDs and the 64-byte IU, along with registering the data buffers on a per-I/O
basis, either via FMR or regular MRs.

> [as an aside, it sure would be nice if we could do an SRP-3 (since SRP-2
> is dead) where multiple direct descriptors would be allowed. The only
> way to get multiple descriptors now is with indirect descriptors.]

That saves you 20 bytes - not a huge gain.

> I am pretty sure that someone doing a video server might want to do, say,
> 1MB i/o's. 1MB with 4KB pages means 256 descriptors and an iu of
> something over 4096 bytes. I definitely don't want to be told by the srp
> initiator that I need to use 4KB iu's. (So we agree there.)

For large I/O, doing a registration of the buffer and sending a DDBD with a
single descriptor might well provide the best performance.  If you look at the
traffic on the wire, having the target do multiple page-sized RDMA operations is
far less efficient than creating a virtual contiguous (to the target) region
that a single RDMA operation can service.

- Fab

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation

2005-11-01 Thread Fab Tillier
> From: John Kingman [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 01, 2005 7:26 AM
> 
> On Mon, 31 Oct 2005, Roland Dreier wrote:
> 
> >With that said I don't think I like this patch.  I don't think it's a
> >win to allocate 1 KB IUs when we'll almost never have gather/scatter
> >lists that big.  Even the 256 byte IUs that the current driver uses
> >seem on the borderline of being too big.
> >
> >Also, is it really a win to have the target fetch a large indirect
> >buffer list?  It seems like it would be better for performance to give
> >the SCSI layer a limit on the size of the gather/scatter list we
> >support so that our indirect buffer lists always fit in the IUs we send.
> 
> Without knowing what the optimal values should be, perhaps we should
> make some of these module parameters.

The Windows SRP initiator sizes the IU to be capable of performing a 64KB I/O
with all SGEs specified in the IU.  It takes 350 bytes to be able to put the
full SGL into an IDBD IU, assuming 4K pages.

An alternative is to always force a RDMA read of the SGL, and just go with the
minimum size IU.  I don't know how this would affect performance, though -
likely increased latencies.  In fact, in environments where each I/O buffer can
be registered (via regular or fast MR) on the fly, DDBD should be used and the
IU would be a constant 64 bytes.  This should yield the best performance.
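The IU sizes mentioned in this thread can be approximated with a short
calculation.  This is only a sketch: the field sizes are assumptions based on
the SRP_CMD layout (a 48-byte fixed part including the CDB, 16-byte memory
descriptors, and a 20-byte indirect header), and the function names are made
up for illustration.  It reproduces the 64-byte DDBD case exactly but lands
slightly under the 350-byte figure, so the real Windows initiator presumably
accounts for something this sketch does not.

```python
# Approximate SRP_CMD IU sizes.  Field sizes are assumptions, see above.
SRP_CMD_FIXED = 48         # SRP_CMD up to and including the 16-byte CDB
MEM_DESC = 16              # address (8) + memory handle (4) + length (4)
INDIRECT_HEADER = 16 + 4   # table descriptor + total length field

def ddbd_iu_size():
    """IU carrying a single direct data buffer descriptor."""
    return SRP_CMD_FIXED + MEM_DESC

def idbd_iu_size(io_bytes, page_size=4096, aligned=True):
    """IU carrying an indirect descriptor with one entry per page."""
    pages = -(-io_bytes // page_size)   # ceiling division
    if not aligned:
        pages += 1                      # an unaligned buffer spans one more page
    return SRP_CMD_FIXED + INDIRECT_HEADER + pages * MEM_DESC

print("DDBD:", ddbd_iu_size())
print("IDBD, 64KB aligned:", idbd_iu_size(64 * 1024))
print("IDBD, 64KB unaligned:", idbd_iu_size(64 * 1024, aligned=False))
print("IDBD, 1MB aligned:", idbd_iu_size(1024 * 1024))
```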

- Fab



RE: [openib-general] TCP/IP connection service over IB

2005-10-21 Thread Fab Tillier
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 21, 2005 12:38 PM
> 
> On Fri, 21 Oct 2005, Sean Hefty wrote:
> 
> > > sean> version(8) | reserved(8) | src port (16)
> > version(1) | reserved(1) | src port (2)
> > > sean> src ip (16)
> > > sean> dst ip (16)
> > > sean> user private data (56)  /* for version 1 */
> > >
> > > Are the numbers in parens in bytes or bits? It looks like a mixture to me.
> >
> > Uhm.. they were a mix.  Changed above to bytes.
> 
> Ok. I assume that your 1 byte of version information is broken into 2
> 4-bit pieces, one for the protocol version and one for the IP version.

Doesn't leading-zero-padding the IPv4 addresses to be 16 bytes eliminate the
need for an IP version field?
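A minimal sketch of what that could look like, using the byte layout quoted
above.  The function names and the decode rule (12 leading zero bytes means
IPv4) are assumptions for illustration, not part of the proposal.

```python
import ipaddress
import struct

# Layout from the quoted proposal, sizes in bytes:
# version(1) | reserved(1) | src port(2) | src ip(16) | dst ip(16) | private data(56)
HEADER = struct.Struct("!BBH16s16s56s")

def pad_ip(text):
    """Return a 16-byte address; IPv4 is left-padded with zero bytes."""
    return ipaddress.ip_address(text).packed.rjust(16, b"\x00")

def unpad_ip(raw):
    """Hypothetical decode rule: 12 leading zero bytes means IPv4."""
    if raw[:12] == bytes(12):
        return ipaddress.IPv4Address(raw[12:])
    return ipaddress.IPv6Address(raw)

def pack_req(version, src_port, src_ip, dst_ip, user_data=b""):
    """Build the private data; user_data is zero-padded to 56 bytes."""
    return HEADER.pack(version, 0, src_port,
                       pad_ip(src_ip), pad_ip(dst_ip), user_data)

msg = pack_req(1, 2049, "192.168.0.1", "fe80::1")
version, _, port, src, dst, _ = HEADER.unpack(msg)
print(len(msg), version, port, unpad_ip(src), unpad_ip(dst))
```

One caveat with this decode rule: an IPv6 address whose first 12 bytes happen
to be zero (::1, for instance) would be read back as IPv4, so the padding
alone is not completely unambiguous.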

- Fab




RE: [openib-general] Re: [swg] Re: private data...

2005-10-20 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 20, 2005 1:40 PM
> 
> Fab Tillier wrote:
> > I'd like to see us define the protocol independent of the service ID.
> > We can then establish a service ID range to be used with this protocol
> > for NFS/RDMA, or for more generic TCP mappings, but these are two
> > different issues to me.
> 
> But the protocol (if you define a private data format as a protocol) has no
> meaning to the CM.  It only has meaning to the application that's listening on
> the service ID.

The same can be said of the starting local QPN, responder resource, initiator
depth, starting PSN, MTU, and so forth.  The CM doesn't care about these - the
application does, as these settings affect how it configures its QP and what
features of its protocol it can use.

>  Using a reserved bit in the REQ mixes the CM's protocol
> (which is to process REQs, REPs, etc.) with that of the application.

There are a number of fields that are not used by the CM state machine that are
included in these MADs already.  These fields are defined in the CM protocol not
because they impact MAD processing in the CM, but because they represent minimum
information needed to configure a QP and client.

- Fab




RE: [openib-general] Re: [swg] Re: private data...

2005-10-20 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 20, 2005 1:30 PM
> 
> Using a bit in the REQ means that the higher level connection management
> protocol needs to receive and process CM REQs.  How does the REQ get routed to
> the higher level CM?  If it's based on service ID, then why is the bit needed
> at all?  If I'm routing based on this bit, then I could just as easily define
> this protocol to exist on a single service ID, and still route on service ID.  The
> upper level CM can then demultiplex to the correct application based on the
> addresses found in the private data.
> 
> Using a reserved bit is essentially adding a 65th bit to the service ID.

I disagree.  Using a reserved bit indicates that the first 32-bytes of private
data have a known format and can be evaluated by an entity shared by multiple
clients (the CMA).

The service ID on the other hand indicates what protocol is implemented over the
connection once it is established.
 
> In any case, I don't see how defining this private data format without
> specifying which service IDs use it is all that useful.

You can do both, but I think they are separate.  The protocol can be useful
outside the scope of DAPL or NFS/RDMA.  WSD could use it, and then use a
higher-level CM to do all the IP to IB path management rather than duplicating
it.

- Fab




RE: [openib-general] Re: [swg] Re: private data...

2005-10-20 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 20, 2005 1:18 PM
> 
> Fab> My understanding was that we want the IBTA to add a section
> Fab> in the IB spec to define this higher-level connection
> Fab> management protocol, specifically the use of the first
> Fab> 32-bytes of the private data in the REQ to contain the source
> Fab> and destination IP addresses associated with the source and
> Fab> destination GIDs in the primary and alternate paths.
> 
> Yes, but there's no point in doing this unless there's a defined range
> of service IDs to map TCP ports onto.  If every protocol needs to
> define its own service ID mapping, then the protocol might as well
> define how it uses the IB CM private data to carry IP addressing info.
> This is exactly what SDP does today.  However, this solution is
> apparently not acceptable for NFS/RDMA.  Hence the current discussion.

I'm not saying we shouldn't define a range of service IDs, I'm questioning
whether we should restrict the use of this protocol to just the defined range of
service IDs.  I think there's a benefit in having different protocols use a
well-established and defined way of mapping IP addresses to IB.

I'd like to see us define the protocol independent of the service ID.  We can
then establish a service ID range to be used with this protocol for NFS/RDMA, or
for more generic TCP mappings, but these are two different issues to me.

- Fab




RE: [openib-general] Re: [swg] Re: private data...

2005-10-20 Thread Fab Tillier
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 20, 2005 12:59 PM
> 
> On Thu, 20 Oct 2005, Fab Tillier wrote:
> 
> > > From: James Lentini [mailto:[EMAIL PROTECTED]
> > > Sent: Thursday, October 20, 2005 11:39 AM
> > >
> > > I like Sean's idea better. Have a well-known service id or range of
> > > service ids on which this protocol is used. I think of it as a service
> > > running on top of the CM protocol for using IP addresses on native IB.
> > > I don't think it should be mandatory for every CM connection.
> >
> > The well known service ID implies that a DAPL application *would*
> > prevent a TCP application from using a particular port, which seems
> > to conflict with your statement that DAPL apps shouldn't prevent TCP apps
> > from working.
> 
> I didn't understand what you meant by TCP application. I assumed you
> meant an application that uses the Berkeley sockets API to communicate
> over TCP, but I see now that is not what you meant.  This IBTA
> proposal does not involve any interactions with the TCP protocol
> stack.

I meant a TCP application that was re-routed over IB through the use of some
protocol (SDP-like).  SDP itself isn't a good example because it already handles
the IP addressing issues itself in the hello message.

> > That's not to say you couldn't have one range of service IDs for TCP
> > applications,
> 
> What do you mean by "TCP applications" in this context?

Applications that expect TCP-like behavior with respect to IP address and port
usage.

> > and another range for DAPL applications,
> 
> I don't see a reason why DAPL applications couldn't take advantage of
> the services being provided by the proposed protocol.

It depends on whether DAPL expects to consume a full TCP address (IP+port), or
is just using the IP addresses to 'facilitate' connection establishment.

> > and yet another range per protocol or application that wishes to use
> > IP addressing during connection establishment.
> 
> How are the applications in this group different from the "TCP
> applications" above?

An application may wish to use IP addresses (without port numbers) to allow
users to easily specify addressing information in a way they are familiar with.
However, such an application may not care about the port number at all, and
there's no need to force it to claim a port (and thus prevent someone who cares
about port numbers from getting one).

DAPL to me fell into this category, but maybe it falls into the "TCP" category.

> > However, this doesn't extend the CM protocol, but just creates an
> > ad-hoc group of protocols that happen to define the first 32-bytes
> > of their private data similarly.
> >
> > Having a bit in the CM REQ indicate whether the first 32-bytes of
> > private data contain the source and destination IP addresses allows
> > any app using any service ID to use IP addresses as source and
> > destination identifiers regardless of what protocol they actually
> > use once the connection is established.
> 
> For a particular protocol, I would expect this addressing service
> either to be used or not used. I can't envision a situation where you
> would want the protocol to use this service in some situations and not
> use the service in others.

Using a bit allows the protocol to be used independently of the service ID,
allowing any client, using any service ID, to use the facility if it so desires.


I wasn't advocating allowing arbitrary use of the protocol with any given
service ID, and I agree with you that the protocol would be either used or not
given a particular service ID.

> If multiple protocols are going to be using the same service id (sometimes
> a server for protocol X is listening on service ID Z, sometimes
> a server for protocol Y is listening on service ID Z,...) and their
> use of this service isn't consistent, then I agree that the CM bit
> solves this problem.

The CM bit allows protocol usage to be clear and independent of service ID.  It
comes down to whether we want to tie protocol use with a set of SIDs, rather
than defining a protocol generically, and just tying SID usage to protocol use.

- Fab




RE: [openib-general] Re: [swg] Re: private data...

2005-10-20 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 20, 2005 12:11 PM
> 
> Fab Tillier wrote:
> > That's not to say you couldn't have one range of service IDs for TCP
> > applications, and another range for DAPL applications, and yet another
> > range per protocol or application that wishes to use IP addressing
> > during connection establishment.  However, this doesn't extend the
> > CM protocol, but just creates an ad-hoc group of protocols that happen
> > to define the first 32-bytes of their private data similarly.
> 
> If applications map their "port" numbers to different service IDs, then
> there's no need to define the private data at all.  The CM can perform
> its job without changes and route based purely on service IDs.  The only
> reason to use a reserved bit or change the version is if the CM needs to
> look into the private data.
> 
> The definition of private data is an issue for an upper level connection
> manager.  My hope is that this can be defined such that the upper level
> connection manager can support multiple transports, so I don't have to
> build an upper level upper level connection manager.

My understanding was that we want the IBTA to add a section in the IB spec to
define this higher-level connection management protocol, specifically the use of
the first 32-bytes of the private data in the REQ to contain the source and
destination IP addresses associated with the source and destination GIDs in the
primary and alternate paths.

If that's not the case, then why is the IBTA SW working group involved here?
Why do they care?

If my understanding is correct, the bit would have meaning to this higher-level
connection management protocol, and not to the lower level IB connection
management protocol.  Defining a range of service IDs for protocols that use
this feature creates a bound group that then requires a rev of the spec anytime
someone else wants in on the fun.  I think defining the higher level protocol
without restricting the scope of service IDs would be beneficial.

> > Having a bit in the CM REQ indicate whether the first 32-bytes of
> > private data contain the source and destination IP addresses allows
> > any app using any service ID to use IP addresses as source and
> > destination identifiers regardless of what protocol they actually
> > use once the connection is established.
> 
> What does the CM do with this bit?

The IB CM does nothing.  A higher-level, IP addressing aware CM protocol defined
by the IBTA would.  If a connection request comes in on a particular SID handled
by the higher level CM and doesn't have the bit set, then the request should be
rejected as malformed.  If the bit is set, the higher level CM could check that
the source and destination IP addresses provided match the GIDs specified in the
primary and alternate paths.
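The check described above might be sketched roughly as follows.  Everything
named here is hypothetical: the bit position, the Reject exception, and the
gid_of_ip lookup table are placeholders, and only the primary path is checked
to keep the sketch short.

```python
import ipaddress

IP_INFO_BIT = 0x01   # hypothetical position of the reserved bit

class Reject(Exception):
    """Raised when the higher-level CM would reject the REQ."""

def check_req(flags, private_data, primary_sgid, primary_dgid, gid_of_ip):
    """Validate a REQ arriving on a SID owned by the higher-level CM.

    gid_of_ip is a hypothetical mapping from IP address to GID.
    """
    if not flags & IP_INFO_BIT:
        raise Reject("malformed: IP addressing bit not set")
    if len(private_data) < 32:
        raise Reject("malformed: private data too short")
    # First 32 bytes: 16-byte source IP followed by 16-byte destination IP.
    src = ipaddress.IPv6Address(private_data[:16])
    dst = ipaddress.IPv6Address(private_data[16:32])
    if gid_of_ip.get(src) != primary_sgid:
        raise Reject("source IP does not match source GID")
    if gid_of_ip.get(dst) != primary_dgid:
        raise Reject("destination IP does not match destination GID")
    return src, dst
```

The lower-level IB CM never looks at the bit in this sketch; it only routes
the REQ by service ID to whoever runs check_req.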

- Fab




RE: [openib-general] Re: [swg] Re: private data...

2005-10-20 Thread Fab Tillier
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 20, 2005 11:39 AM
> 
> I like Sean's idea better. Have a well-known service id or range of
> service ids on which this protocol is used. I think of it as a service
> running on top of the CM protocol for using IP addresses on native IB.
> I don't think it should be mandatory for every CM connection.

The well known service ID implies that a DAPL application *would* prevent a TCP
application from using a particular port, which seems to conflict with your statement
that DAPL apps shouldn't prevent TCP apps from working.

That's not to say you couldn't have one range of service IDs for TCP
applications, and another range for DAPL applications, and yet another range per
protocol or application that wishes to use IP addressing during connection
establishment.  However, this doesn't extend the CM protocol, but just creates
an ad-hoc group of protocols that happen to define the first 32-bytes of their
private data similarly.

Having a bit in the CM REQ indicate whether the first 32-bytes of private data
contain the source and destination IP addresses allows any app using any service
ID to use IP addresses as source and destination identifiers regardless of what
protocol they actually use once the connection is established.

Defining service ID ranges for particular protocols then becomes the
responsibility of the organizations defining such protocols and the owner of the
OUI with which the service ID ranges are defined, and is outside the scope of
the IBTA.

- Fab




RE: [openib-general] Re: [swg] Re: private data...

2005-10-20 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 20, 2005 11:11 AM
> 
> Fab Tillier wrote:
> > I would personally rather see a reserved bit get used.
> > Imagine a system that has two protocols installed that
> > use IP addressing.  That system might want to have different
> > apps listening on the same port number over both, even though
> > the protocols are different.
> 
> I don't think that this maps well to TCP.  Apps need to listen on
> different ports.

Are DAPL apps TCP apps?  I thought they just wanted to use IP addresses for
connection establishment, but weren't actual TCP apps.  If DAPL apps aren't TCP
apps, should they block usage of the TCP port from real TCP apps?

> > Having a reserved bit in the REQ indicate the presence of IP
> > addressing information (including source and destination port
> > numbers) in the private data seems most flexible to me.
> 
> How would a reserved bit help here?  How does the CM know which
> app to give the request to?

Based on the ServiceID provided by the applications on both sides of the
connection.

> My preference is to use the service ID, with a mapping that looks like:
> 
> (OPENIB_OUI << 48) + port number
> 
> because that makes my job easier.  :)

I think having a range of service IDs defined for TCP applications makes sense.

So for TCP apps, the port number would be encapsulated in the SID as you
suggest, and non-TCP apps that want to use IP addresses for connection
establishment wouldn't care about ports and would use their own SID.  This
eliminates the need to put the port numbers in the private data - only the
source and destination IP addresses.
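A rough sketch of the port-to-SID mapping quoted above.  The value of
OPENIB_OUI is not given in this thread, so PREFIX below is a made-up 16-bit
placeholder chosen only so that the result fits in a 64-bit service ID; the
function names are likewise hypothetical.

```python
PREFIX = 0x1234   # hypothetical placeholder, NOT a real OUI or assigned value

def port_to_sid(port, prefix=PREFIX):
    """Map a 16-bit port number to a 64-bit service ID."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return (prefix << 48) + port

def sid_to_port(sid, prefix=PREFIX):
    """Return the port if sid is in the mapped range, else None."""
    if sid >> 48 != prefix:
        return None                      # some other protocol's SID
    if (sid >> 16) & 0xFFFFFFFF:
        return None                      # bits between prefix and port are set
    return sid & 0xFFFF

sid = port_to_sid(2049)
print(hex(sid), sid_to_port(sid))
```

A non-TCP application would simply pick a SID outside this range, which
sid_to_port reports as None.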

- Fab




RE: [openib-general] Re: [swg] Re: private data...

2005-10-20 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 20, 2005 10:25 AM
> 
> If we use an IBTA assigned service ID, I think that this can be
> defined without using a reserved bit or changing a version number.
> The two possible implementations that I see are using a single
> service ID, or mapping port numbers to a range of assigned service
> IDs.

I would personally rather see a reserved bit get used.  Imagine a system that
has two protocols installed that use IP addressing.  That system might want to
have different apps listening on the same port number over both, even though the
protocols are different.

Having a reserved bit in the REQ indicate the presence of IP addressing
information (including source and destination port numbers) in the private data
seems most flexible to me.

- Fab




RE: [openib-general] Re: [swg] Re: private data...

2005-10-20 Thread Fab Tillier
> From: Michael Krause [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 20, 2005 10:00 AM
> 
> This is really an IBTA issue to resolve and to ensure that backward
> compatibility with existing applications is maintained.  Hence, this exercise
> of who is broken or not is inherently flawed in that one cannot comprehend all
> implementations that may exist. Therefore, the spec should use either a new
> version number or a reserved bit to indicate that there is a defined format to
> the private data portion or not.   This is no different than what is done in
> other technologies such as PCIe.  Those applications that require the existing
> semantics will be confined to the existing associated infrastructure.  Those
> that want the new IP semantics set the bit / version and operate within the
> restricted private data space available.  It is that simple.

While I agree with you, the issue at hand is that DAPL tries to do both -
providing IP semantics to the application *and* 64-bytes of private data.  While
the IBTA may use a reserved bit to differentiate native IB or IP-enhanced
connection establishment MADs, if DAPL is to use this feature then DAPL clients
will lose some of their private data.  This gets us back to how to handle DAPL
clients that depend on the full 64 bytes of private data and how to support
them, which is a DAPL issue IMO and not an IBTA issue.  The IBTA should do
what's right for IB independently of DAPL, and define a proper IP-enhanced CM
protocol.

- Fab




RE: [openib-general] [RFC] OpenSM Interactive Console

2005-10-18 Thread Fab Tillier
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 18, 2005 12:11 PM
> 
> If you have a request for a command you would like in the console, I
> would like to compile a list of these.

I think it would be great to have console commands to dump information from the
SM - like linear and multicast forwarding tables, service registrations, LID
assignment, etc.  Maybe there's a way already to do this interactively, but I'm
not aware of one.  If there is, please ignore me.

Thanks,

- Fab



RE: [openib-general] Re: IBM eHCA testing..

2005-10-14 Thread Fab Tillier
> From: Troy Benjegerdes [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 14, 2005 9:09 AM
> 
> I just discovered another problem.. We have been running PVFS2 over
> IPoIB on the same subnet, and in debugging this, I restarted opensm
> several times, and somewhere in the stack a PVFS2 write failed. I
> wouldn't think that a short downtime of the SM from restarting it would
> cause any IPoIB TCP sessions to fall over..

If the path has already been resolved, traffic (even multicast) between existing
nodes will survive the SM going down.  You run into issues if you try to talk to
a new node and attempt to contact the SM for a path record, or if you try to
bring up a new interface.

- Fab



RE: [openib-general] QP with large starting sequence adds latency to RDMA READ???

2005-10-13 Thread Fab Tillier
> From: Arlin Davis [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 13, 2005 9:42 AM
> 
> Sean Hefty wrote:
> 
> > Arlin Davis wrote:
> >
> >> I just noticed some RDMA read performance issues that seem to be
> >> related to the QP starting sequence number. If I set the starting
> >> sequence to 1 then all is fine but if I set it to 0x1 then it
> >> seems to add ~40us to my 32KB RDMA read operation (polling for
> >> completions). Has anyone seen anything like this?
> >
> >
> > Has anyone else noticed this issue?  You could try to reproduce this
> > by using the rdma_bw test and changing the PSN.
> >
> > - Sean
> >
> 
> I added a starting PSN and RDMA READ option to the rdma_bw test and was
> able to reproduce on a PCI-E adapter with 4.6.2 firmware. I retried on a
> system with 4.7.0 and it looks like the problem is fixed. However,  I
> see nothing about this problem in the "bug fix" list in the release
> notes. Can someone at Mellanox confirm this problem with RDMA reads and
> add to release notes as a fix so it is documented somewhere?
> 
> http://www.mellanox.com/products/fw_images/fw-25208-4_7_0-release_notes.pdf

Note that I have seen similar behavior (drop in bandwidth) correlated to
starting PSN using Winsock Direct under Windows, so this doesn't seem to be a
uDAPL or Linux issue.  As it did for Arlin, the issue disappeared in firmware 4.7.0,
and I too would like to see some confirmation that there was an issue and that
it was fixed.

Thanks,

- Fab



RE: [openib-general] DMA mapping abuses in MAD layer

2005-10-11 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 11, 2005 8:39 PM
> 
> On mainstream architectures, it turns out that we can get away with
> violating this rule.  However, on non-cache-coherent architectures
> like PowerPC 4xx, dma_map_single(..., DMA_TO_DEVICE) does a cache
> flush, which makes sure that the contents of the CPU's cache are
> really written to memory.  If a driver then changes the contents of
> the buffer after the call to dma_map_single(), then it's quite likely
> that the change will be made only in the CPU's cache and the device
> will end up DMA-ing the old data.
> 
> The problem I hit is in ib_post_send_mad(), specifically:
> 
>   smp = (struct ib_smp *)send_wr->wr.ud.mad_hdr;
>   if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
>   ret = handle_outgoing_dr_smp(mad_agent_priv, smp,
>send_wr);
> 
> basically, when the MAD layer goes to send a directed route reply, it
> changes the MAD buffer after the DMA mapping is done.  The HCA
> doesn't see the change, the wrong packet gets sent and the SM never
> sees replies to its queries.
> 
> Adding a PPC-specific cache flush call after the call to
> handle_outgoing_dr_smp() fixes things to the point that the port can
> be brought to ACTIVE, and in fact IPoIB works as well.  However, this
> is just a kludge -- the real fix will need to be more invasive.  It
> seems that the whole interface to the MAD layer may need to be
> reorganized to avoid doing this.

Why not just use inline sends for the special QPs and remove the need to perform
any DMA mappings on the send side altogether?
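The stale-data problem and the inline alternative can be illustrated with a
toy model.  This is not kernel code: the class and its methods are invented
for illustration, with flush() standing in for the cache flush done at
mapping time and inline_payload() standing in for the CPU copying the data
into the work request when the send is posted.

```python
class NonCoherentBuffer:
    """Toy model of a buffer on a non-cache-coherent machine."""

    def __init__(self, data):
        self.cache = bytearray(data)         # what the CPU reads and writes
        self.memory = bytearray(len(data))   # what the device DMAs from

    def cpu_write(self, offset, value):
        self.cache[offset] = value           # lands in the cache only

    def flush(self):
        self.memory[:] = self.cache          # stand-in for the flush at map time

    def device_read(self):
        return bytes(self.memory)            # device never sees the cache

    def inline_payload(self):
        return bytes(self.cache)             # CPU copies current contents itself

buf = NonCoherentBuffer(b"\x00\x00")
buf.flush()                 # buffer is mapped for DMA
buf.cpu_write(0, 0xFF)      # buffer is modified after the mapping
print("device sees:", buf.device_read())
print("inline send carries:", buf.inline_payload())
```

In the model, the device reads the old contents while the inline copy carries
the modified ones, which is the difference the question above is getting at.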

- Fab



RE: [openib-general] Re: [PATCH] [CMA] RDMA CM abstraction module

2005-10-10 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 10, 2005 11:16 AM
> 
> Michael S. Tsirkin wrote:
> > Maybe rdma_connection (these things encapsulate connection state)?
> > Or, rdma_sock or rdma_socket, since people are used to the fact that
> > connections are sockets?
> 
> Any objection to rdma_socket?

I don't like rdma_socket, since you can't actually perform any I/O operations on
the rdma_socket, unlike normal sockets.  We're dealing only with the connection
part of the problem, and the name should reflect that.  So rdma_connection,
rdma_conn, or rdma_cid seem more appropriate.

- Fab



RE: [openib-general] Lustre Network Driver - KDAPL or verbs?

2005-10-09 Thread Fab Tillier
> From: Peter J. Braam [mailto:[EMAIL PROTECTED]
> Sent: Sunday, October 09, 2005 2:18 PM
> 
> Cluster File Systems, Inc and its customers have been wondering if the Lustre
> Network Driver (LND) for OpenIB gen2, which we will begin to develop during
> the coming months, should be based on kdapl or verbs.
> 
> The driver we plan to develop should strive to address several goals:
>  - high reliability and performance
>  - allow interoperability between user and kernel level
>  - allow interoperability, or better, portability among different operating
> systems (Linux, OS X, Windows, Solaris)
>  - be suitable for inclusion in the Linux kernel

I think that suitability for inclusion in the Linux kernel is going to be
mutually exclusive with portability between different operating systems.  If you
want to be in the Linux kernel, you need to be a native Linux driver, and not
use any sorts of abstraction layers.  Feedback to date on abstraction layers has
been consistently clear that they will not be tolerated in the kernel.

With the ongoing work to support both IB and iWarp devices under the OpenIB
verbs, I think coding directly to verbs would be just fine.  You'll likely want
to use the higher level CM abstraction being developed now for establishing
connections in a transport neutral manner, but the verbs themselves should be
the same.  Others more closely involved can likely give you better guidance.

With all this said, I'm personally interested to see a cluster file system on
top of the OpenIB Windows stack, and since kDAPL doesn't exist in Windows at the
moment, interfacing to native verbs would be my preference.  There really aren't
that many differences in verbs, though Windows will likely make you deal with
more things asynchronously depending on your IRQL.  I'd be happy to field
specific questions about Windows on the openib-windows mailing list if you have
them.

Cheers,

- Fab



RE: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 05, 2005 12:07 PM
> 
> Shirley> A port failure here means that a software client failed
> Shirley> to initialize on that port. It doesn't matter whether the
> Shirley> link is up or down, or whether there is a hardware or
> Shirley> firmware problem. If any of these software errors
> Shirley> occurs, the upper-level users can't use that port
> Shirley> correctly, or even the whole device. This is easy to
> Shirley> demonstrate: inject errors during client registration
> Shirley> and then start the upper-level users. The result can be
> Shirley> a kernel hang or a kernel oops. For example, if
> Shirley> mad_client fails to initialize its ports and you start
> Shirley> ipoib_client, ifconfig will hang in the kernel. If
> Shirley> sa_client fails, the IPoIB multicast join will hit a
> Shirley> kernel oops. Starting the upper-level users without
> Shirley> checking that the resources they depend on were
> Shirley> allocated is buggy. It is definitely worth spending time
> Shirley> to address this.
> 
> Yes, I agree we should fix the bugs in error handling during
> registration.  However, I don't think that a mask of ports is the
> right answer -- it doesn't seem to address the real issue.  We should
> just make sure that if, say, the MAD layer fails to initialize a
> device, then all clients that depend on the MAD layer don't try to use
> that device.

Shouldn't a user get an error (not an oops) if they try to use the MAD layer for
a device that didn't initialize properly within the MAD layer?  Doesn't the MAD
layer check that requests for a device are valid?  It seems that adding such
checks would be much simpler than trying to figure out how to express these
limitations to the various ULPs.
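A minimal sketch of the kind of check I mean, in plain C rather than kernel
code.  The names mad_device and mad_register_client are hypothetical and do
not come from the MAD layer; the point is only that a client asking for a
device that failed to initialize gets an error code back instead of touching
state that was never set up.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device state kept by a MAD-like layer. */
struct mad_device {
    int port_count;
    int initialized;   /* nonzero only if this layer's setup succeeded */
};

/*
 * Hypothetical entry point used by upper-level protocols.
 * Returns 0 on success or a negative errno value.  It never uses
 * state belonging to a device that failed to initialize.
 */
int mad_register_client(const struct mad_device *dev, int port)
{
    if (dev == NULL || !dev->initialized)
        return -ENODEV;          /* device is not usable through this layer */
    if (port < 1 || port > dev->port_count)
        return -EINVAL;          /* IB port numbers start at 1 */
    return 0;
}
```

With a check like this in place, a ULP only needs to handle the error return,
which it has to do anyway for things like allocation failures.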

- Fab



RE: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Fab Tillier
> From: Shirley Ma [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 05, 2005 11:56 AM
> 
> A port failure here means that a software client failed to initialize on that
> port. It doesn't matter whether the link is up or down, or whether there is a
> hardware or firmware problem. If any of these software errors occurs, the
> upper-level users can't use that port correctly, or even the whole device.
> This is easy to demonstrate: inject errors during client registration and then
> start the upper-level users. The result can be a kernel hang or a kernel oops.
> For example, if mad_client fails to initialize its ports and you start
> ipoib_client, ifconfig will hang in the kernel. If sa_client fails, the IPoIB
> multicast join will hit a kernel oops. Starting the upper-level users without
> checking that the resources they depend on were allocated is buggy. It is
> definitely worth spending time to address this.

These sound like bugs in the code, where we don't handle failures gracefully.  I
think fixing that is probably much more useful.  There will always be situations
where runtime errors can occur (memory allocation failure, for example), and all
upper-level protocols must handle failures of these calls.

Putting in code and requiring every client to compare all the various bit fields
they're interested in doesn't remove the need for proper error handling.  Proper
error handling should resolve both the ifconfig hang and multicast join oops.

Just my $0.02

- Fab



RE: [openib-general] Re: opensm and SIGINT

2005-09-21 Thread Fab Tillier
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 21, 2005 6:01 PM
> 
> Hi Viswa,
> 
> On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote:
> > Currently opensm traps SIGINT. There was some discussion about removing
> > that. I am currently running some tests on opensm
> > by killing (SIGKILL) and restarting opensm. So far I have not found
> > any resource leak issues. Is there a plan to remove that
> > signal handler? Ideally it should not exist.
> 
> Eitan stated that this was historical in nature for gen1 drivers which
> had resource tracking problems: "if OpenSM left without cleaning up all
> used resources (like MAD buffers and UD-AVs), the driver oops'ed."
> 
> I think that (eliminating the handler for SIGINT) can at least be done
> for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor
> layers for starters. I will experiment with gen2 and let you know.

I'd like to see signal handling removed from the Windows version too.  If
there's a bug in the transport due to resource leaks, that needs to be
fixed, not masked by handling signals.

- Fab



RE: [openib-general] [PATCH] Allow setting of NodeDescription

2005-09-16 Thread Fab Tillier
> From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 15, 2005 9:55 PM
> 
> Roland Dreier wrote:
> > The kernel will need to have the HCA port active with the mthca driver
> > running before it can mount root and get to /etc/sysconfig/network or
> > wherever the hostname is set.
> 
> If we limit our solution to the common case:
> On a regular non disk-less machine is it possible to have the node description
> be set before the QP0 is physically UP?

I don't know where the machine name is set in Linux.  In Windows, it is stored
in the registry, and loaded when the access layer first loads (before any QPs
are allocated).

- Fab



RE: [openib-general] [PATCH] Allow setting of NodeDescription

2005-09-15 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 15, 2005 9:01 AM
> 
> Jack> The resulting set of NodeDescription strings present in the
> Jack> SM and SA could then be a race-dependent salad (depending on
> Jack> the timing of QP0 entering RTS state, SM subnet sweep, and
> Jack> resetting of the local NodeDescription string).
> 
> Yes, it's unfortunate.
> 
> But I don't see any way to handle the situation arising when booting
> over IB, where a system needs the SM to bring its port to active
> before it can boot, but where the system doesn't know its host name
> until after it boots.

What happens during the handoff from the boot environment to the OS?  Does the
HCA get disabled and then the mthca driver starts fresh?  Or does the mthca
driver inherit a device that is already fully initialized?  If it gets
re-initialized, don't the ports go down when the boot agent shuts down (and the
SM should get a GID out of service trap), followed by the ports going up when
mthca starts?  Or is the problem that the boot driver doesn't know when the
handoff is, and thus can't disable the device?

Thanks,

- Fab



RE: [openib-general] [PATCH] Allow setting of NodeDescription

2005-09-14 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 14, 2005 10:18 PM
> 
> Fab> To me, non volatile in this context means something like
> Fab> using the system name.
> 
> Huh??  To me non-volatile means not changing.

What I meant is that having the users be able to set anything they want at
runtime isn't non-volatile.  One could argue that the system name is volatile,
since it can be changed, but how often do people change their system names once
the system is set up?

Just ignore me if I don't make sense. :)

- Fab



RE: [openib-general] [PATCH] Allow setting of NodeDescription

2005-09-14 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 14, 2005 6:56 PM
> 
> Roland>  - Allows userspace to set node_desc by writing into sysfs file,
> Hal> Shouldn't there be a non volatile way to do this ?
> 
> I'm not sure what "non volatile" means in this context.  For Mellanox
> HCAs, one can already set a permanent NodeDescription in the flash
> when burning firmware.

To me, non volatile in this context means something like using the system name.
Would this be hard to do?  In fact, I would prefer to see the system name used
instead of whatever is programmed in the HCA as the default.  Granted, this
makes the node description burned in the firmware useless, but that doesn't seem
like a big deal.

- Fab



RE: [openib-general] Re: Opensm - casting issues #2

2005-09-13 Thread Fab Tillier
> From: Ryan, Jim [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 13, 2005 10:07 AM
> 
> My recollection is Matt Leininger, could be wrong

I believe that Matt was just the messenger, conveying his organization's
position on the matter.  Whether or not we agree with that position is
immaterial - it is Sandia's prerogative.

There was a need to have a separate repository independent of hosting issues.

- Fab




RE: [openib-general] Re: Opensm - casting issues #2

2005-09-13 Thread Fab Tillier
> From: Ronald G Minnich [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 13, 2005 10:11 AM
> 
> Roland Dreier wrote:
> 
> > Actually I think the issue was somewhat different.  Microsoft is so
> > allergic to the GPL that they asked for the code to be in a physically
> > separate repository.
> >
> 
> that makes much more sense, ah, well, not really, but it is easier to
> understand. I doubt the Labs would have any objection to Windows code.

Sandia did object, so we found an alternate host for the Windows SVN repository.

> Actually, I kind of wish that the code were all at openib.org. Should we
> really pay that much heed to requests of this sort if it makes life hard
> for openib people?

The code is all under the openib.org domain.  The Windows SVN repository is at
svn://windows.openib.org.

Given the goals of the Windows project, heeding Microsoft's requests made sense.

- Fab



RE: [openib-general] Re: Opensm - casting issues #2

2005-09-13 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 13, 2005 10:08 AM
> 
> Sean> My understanding is that the labs, who control the OpenIB
> Sean> servers, refused to host any Windows related code, forcing
> Sean> it to have a separate repository.
> 
> Christoph> It shouldn't be difficult to find someone to host it.
> Christoph> I could maybe ask if such a repo could be put at the
> Christoph> lst.de servers.
> 
> Actually I think the issue was somewhat different.  Microsoft is so
> allergic to the GPL that they asked for the code to be in a physically
> separate repository.

Microsoft requested a separate repository, not separate servers.  Sandia
currently hosts the OpenIB SVN repository for Linux and did not want to host the
Windows code since they have no interest in it.  Yes, this makes things a bit
more cumbersome, but such is life.

The DDK license supposedly has limitations that make it incompatible with the
GPL license - building GPL code with the DDK would be a violation of the DDK
license somehow.  I have no interest in revisiting this topic - it is what it
is, we've argued endlessly about it already, so let's just move on.

That said, I personally don't see any issue with user-mode tools being
dual-licensed - it's the core bits that can't be.  As far as I'm concerned,
having OpenSM maintained in the Linux SVN repository is fine.  It would be handy
to have a shadow in the Windows repository so that it's easy to get and build,
and that's what I think the plan is.

As a note, the uDAPL code in the Windows SVN has the uDAPL triple license.

- Fab



RE: [openib-general] RFC: ib_set_comp_handler

2005-09-12 Thread Fab Tillier
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Monday, September 12, 2005 10:03 AM
> 
> Quoting r. Fab Tillier <[EMAIL PROTECTED]>:
> > It seems that what you really want is a way to disarm a CQ, not change the
> > completion handler.
> Yes, but changing the handler to an empty function looks like an easy
> way to do it, without adding conditions on typical event path.

It might be worth changing the name of the function to reflect that -
ib_clr_comp_handler, for example.

> See the patch I posted separately.

Sorry, I missed that.  I'll take a look.

> > Are CQs shared between sockets in SDP, or does each socket
> > have its own CQ?
> 
> Currently each socket has 2 CQs.

Doesn't the verbs layer provide synchronization between CQ callbacks and CQ
destruction?  Why not just destroy the CQs and avoid having to change the
handler?

- Fab



RE: [openib-general] RFC: ib_set_comp_handler

2005-09-12 Thread Fab Tillier
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Sunday, September 11, 2005 1:32 PM
> 
> Hi!
> I'd like to add a capability to change the cq completion handler.
>
> It seems this can't be done in the ULP without introducing additional
> indirection and/or locking, which I'd like to avoid.

You need to have locking to properly synchronize so that the user can know when
their "old" callback handler will cease to be invoked.  I agree however that you
need some help from the verbs layer because only it knows when callbacks are in
progress.

> I'd use it in sdp to disable cq events while a connection is destroyed.

Why not just move the QP to the reset state to suppress any further completions,
then poll the CQ for any prior completions?  Aren't you guaranteed that once the
QP is in reset, all pending CQEs have been written?

> It also seems like ipoib could use such a capability, simply blocking
> completion events instead of waiting for 5 seconds in ipoib_ib_dev_stop.
> I expect this to be useful in other scenarios (IPoIB NAPI?).

It seems that what you really want is a way to disarm a CQ, not change the
completion handler.  Are CQs shared between sockets in SDP, or does each socket
have its own CQ?

- Fab



RE: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP

2005-09-07 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 07, 2005 11:55 AM
> 
> Fab Tillier wrote:
> >>I'm not sure.  The first thought that comes to mind is having a MAD
> >>redirection module that the CM could query before sending any message.
> >
> > Since the ARI for a redirect is defined by the IB spec, why not have
> > the CM just update the redirect_qpn (or better yet the forthcoming
> > redirection gizmo) itself when it receives such an REJ?
> 
> The CM still needs to know that redirection should occur, cache that
> information somewhere, and use it for other CM traffic to that same
> destination.  Updating the redirect_qpn only fixes the issue for that
> single connection.  My thought was to have a module that the CM would
> call to insert a redirected endpoint, and then call again to see if an
> outbound MAD should be redirected.  Hopefully, such a module could be
> defined to be useful for other management classes.

Right, hence the "or better yet the forthcoming redirection gizmo".  My point
was that the CM should handle this, not the CM client (in this case SRP).

- Fab



RE: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP

2005-09-07 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 07, 2005 10:53 AM
> 
> Roland Dreier wrote:
> > One question on the CM interface:
> >
> > > + cm_id->redirect_qpn =
> > > + be32_to_cpu(*(u32 *)(event->param.rej_rcvd.ari + 32))
> > > + & 0x00ff;
> >
> > It seems a little awkward that a consumer has to poke a value back
> > into the cm_id structure.  Sean, how do you want to handle this?
> 
> I'm not sure.  The first thought that comes to mind is having a MAD
> redirection module that the CM could query before sending any message.

Since the ARI for a redirect is defined by the IB spec, why not have the CM just
update the redirect_qpn (or better yet the forthcoming redirection gizmo) itself
when it receives such an REJ?

In any case, the CM has to drive any updates of cached redirection information
since it is the recipient of the REJ carrying that information.
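As a rough sketch of what "the CM does it itself" could look like, here is the
extraction from the quoted patch pulled into a standalone helper.  The helper
name ari_get_redirect_qpn and both macros are hypothetical; the offset of 32 is
taken from the quoted patch, and the 24-bit mask is my assumption based on
QPNs being 24 bits wide.  Reading byte by byte also avoids the unaligned u32
cast.

```c
#include <stddef.h>
#include <stdint.h>

#define ARI_REDIRECT_QP_OFFSET 32u         /* offset used in the quoted patch */
#define QPN_MASK               0x00ffffffu /* assumption: QPNs are 24 bits */

/*
 * Hypothetical helper: read the big-endian 32-bit word at the offset
 * the quoted patch uses and keep the low 24 bits as the QPN.
 * Returns 0 on success, -1 if the ARI is too short to hold the field.
 */
int ari_get_redirect_qpn(const uint8_t *ari, size_t ari_len, uint32_t *qpn)
{
    if (ari == NULL || qpn == NULL || ari_len < ARI_REDIRECT_QP_OFFSET + 4)
        return -1;

    const uint8_t *p = ari + ARI_REDIRECT_QP_OFFSET;
    uint32_t word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    *qpn = word & QPN_MASK;
    return 0;
}
```

The CM would call something like this from its own REJ handling and store the
result, so that the consumer never has to poke a value back into the cm_id.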

- Fab



RE: [openib-general] [PATCH] [CM] 1/2 Fix CM redirection

2005-09-07 Thread Fab Tillier
> From: John Kingman [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 07, 2005 10:09 AM
> 
> I found that CM handling for SRP is broken when handling a REJ with
> reason 24 (Port and CM Redirection) with a RedirectLID supplied.  As
> stated in the spec, if RedirectLID is non-zero, it is the DLID a
> requester _shall_ use to access the class services.  I believe that
> without this support, CM does not comply with C13-28 with respect to
> RedirectLID.
> 
> In my testing, the following patches seem to fix the problem.  If there
> is a better way to fix the problem, I would appreciate the input.
> 
> Signed-off-by: John Kingman  storagegear.com>
> 
> Index: ib_cm.h
> ===
> --- ib_cm.h   (revision 3328)
> +++ ib_cm.h   (working copy)
> @@ -290,6 +290,7 @@ struct ib_cm_id {
>   enum ib_cm_lap_state lap_state;  /* internal CM/debug use */
>   __be32  local_id;
>   __be32  remote_id;
> + u32 redirect_qpn;
>  };
> 
>  /**
> 
> Index: cm.c
> ===
> --- cm.c  (revision 3328)
> +++ cm.c  (working copy)
> @@ -167,13 +167,14 @@ static int cm_alloc_msg(struct cm_id_pri
>   struct ib_mad_agent *mad_agent;
>   struct ib_mad_send_buf *m;
>   struct ib_ah *ah;
> + u32 qpn = cm_id_priv->id.redirect_qpn ? cm_id_priv->id.redirect_qpn : 1;
> 
>   mad_agent = cm_id_priv->av.port->mad_agent;
>   ah = ib_create_ah(mad_agent->qp->pd, &cm_id_priv->av.ah_attr);
>   if (IS_ERR(ah))
>   return PTR_ERR(ah);
> 
> - m = ib_create_send_mad(mad_agent, 1, cm_id_priv->av.pkey_index,
> + m = ib_create_send_mad(mad_agent, qpn, cm_id_priv->av.pkey_index,
>  ah, 0, sizeof(struct ib_mad_hdr),
>  sizeof(struct ib_mad)-sizeof(struct ib_mad_hdr),
>  GFP_ATOMIC);

Why not just initialize redirect_qpn to 1 and just use it always?
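Something along these lines, as a simplified userspace sketch rather than a
patch against cm.c.  The struct and function names are hypothetical stand-ins
for the cm_id and its helpers; the only point is that the field starts out as
QP1, so the send path can use it unconditionally.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the cm_id in the patch. */
struct cm_id_sketch {
    uint32_t redirect_qpn;   /* QP that CM MADs are sent to */
};

/* Default to QP1 when the ID is created. */
void cm_id_sketch_init(struct cm_id_sketch *id)
{
    id->redirect_qpn = 1;
}

/* Record a redirect once; later sends pick it up automatically. */
void cm_id_sketch_redirect(struct cm_id_sketch *id, uint32_t qpn)
{
    id->redirect_qpn = qpn;
}

/* The send path reads the field directly, with no "? :" fallback. */
uint32_t cm_id_sketch_dest_qpn(const struct cm_id_sketch *id)
{
    return id->redirect_qpn;
}
```

That removes the conditional from cm_alloc_msg and makes a zero value mean
nothing special.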



RE: [openib-general] Re: OpenSM 1.8.0 libvendor initial merge nits

2005-08-31 Thread Fab Tillier
> From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 31, 2005 10:45 AM
> 
> Hal Rosenstock wrote:
> > Hi again Yael & Eitan,
> >
> > I've now merged the OpenSM 1.8.0 libvendor changes and found
> > the following:
> >
> > General nits:
> >
> > There are a number of violations of the coding style here as well. Also,
> > there is some unneeded whitespace added to a number of files.
>
> We should run osm_check_n_fix; that will get this fixed.
> I also think we need to decide if we want to change the OpenSM coding
> style to use tabs or we keep the no-tabs rule.

I personally prefer tabs to spaces, as it has less potential for people's
individual tab width to mess with the code.

- Fab




RE: [openib-general] Re: ibv_get_async_event

2005-08-30 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 30, 2005 4:42 PM
> 
> Sean> Roland, would a patch to fix this that is similar to what
> Sean> was done for uCM be acceptable?  (I can describe the method
> Sean> in more detail if you'd like.)  The drawback is that it
> Sean> basically adds reference counting, which would require
> Sean> calling ibv_put_event().
> 
> Hmm, I'd rather just sweep through the list of events when we destroy
> a CQ/QP/SRQ and delete any events that refer to the object we're
> destroying.  It's on my to-do list but I'll definitely take patches if
> you do it first.

Couldn't an event be "in flight" when the user destroys an object, causing the
event to be delivered post-destruction?
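To make the concern concrete, here is a toy, single-threaded model of the
reference-counting approach Sean describes in the quoted text.  All names are
hypothetical and this is not libibverbs code.  An event that has been handed
to the consumer stays counted against the object until it is put back, so a
destroy cannot complete underneath it.

```c
#include <errno.h>

/* Hypothetical object (CQ, QP or SRQ) that can generate async events. */
struct evt_object {
    unsigned int events_reported;  /* events handed to the consumer */
    unsigned int events_released;  /* events the consumer has returned */
};

/* Called when an event naming this object is delivered to the consumer. */
void evt_report(struct evt_object *obj)
{
    obj->events_reported++;
}

/* The consumer is done with one event (the "put" step in the proposal). */
void evt_put(struct evt_object *obj)
{
    if (obj->events_released < obj->events_reported)
        obj->events_released++;
}

/*
 * Destroy succeeds only when no delivered event is still outstanding.
 * A real implementation would more likely block here than fail.
 */
int evt_destroy(const struct evt_object *obj)
{
    if (obj->events_released != obj->events_reported)
        return -EBUSY;
    return 0;
}
```

Sweeping the pending list at destroy time only covers events that have not
been handed out yet; the counter covers the ones already in the consumer's
hands.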

- Fab



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 1:58 PM
> 
> On Wed, 24 Aug 2005, Fab Tillier wrote:
> 
> > > From: Roland Dreier [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, August 24, 2005 11:03 AM
> > >
> > > Fab> Why can't the IPV field be ignored?  If a listen wants only
> > > Fab> IPV4 addresses, it would specify a 16-byte compare buffer
> > > Fab> with the first 12 bytes zero, the next 4 filled with the IPV4
> > > Fab> address, and would set the offset to that of the hello
> > > Fab> message's destination address (32).
> > >
> > > Yes, you're right for SDP.  I guess if we're comfortable mandating
> > > that all protocols put their source and destination IPs in the private
> > > data for the IB case, then this works.  Of course it's somewhat
> > > awkward to pass this information into the transport-neutral CM API but
> > > I think this can be worked around.
> >
> > I don't know if we need to mandate IP usage - it's up to the
> > application.  Any application that wants semantics similar
> > to the way socket listens work (especially when bound to one of
> > multiple IP addresses on a port) would have to
> > define its private data to accommodate this.
> >
> >  At the IB level, the contents of the private data are still opaque,
> > even to the CM.  The CM would only expose the ability to have it
> > perform an initial triage of requests by doing binary comparisons
> > over regions of private data.  It doesn't know (or need to know)
> > what the data represents - it only cares about finding a match (or
> > not).  The CM doesn't define any sort of policy here, and I don't
> > think it should.  It's just bytes to the CM, and it's doing a blind
> > comparison without interpreting the contents.
> 
> You need to consider what makes sense for *both* ib and iwarp. Keep in
> mind that the correct API will allow a consumer to use ib and iwarp
> devices transparently. In other words there will be one code path that
> supports both.

I believe using the private data makes the most sense from the IB perspective.
One could even argue that it is the only way to provide positive "getpeername"
functionality.  Use of the IB private data does not require identical use of
private data in other technologies.

> If we were to adopt your proposal, the consumer would need to perform
> unnecessary operations on iWARP.

It doesn't have to impact the client if there's some intermediate abstraction to
isolate the client from the IB CM details (including private data use).

> A transport neutral client would be forced to put IP information into
> its CM private data on iWARP.
> 
> Likewise, a transport neutral server would be forced to pass a
> private data offset and binary blob to the listen API call on iWARP.
> 
> Neither of these make sense.

A higher-level CM abstraction could implement the policy of private data use
when running on IB without the client's involvement.  The end result still is
that you end up with a wire protocol that needs to be documented so that someone
without that exact CM abstraction knows where and how to format the private data
as well as how to interpret it.  If the IBTA defines something like this, all
these issues go away.  I don't know if the IBTA can define this without
affecting existing protocols like SDP and iSER that already define how to
encapsulate the source and destination information in the private data.

Using the private data, either by the client or some IB-specific CM abstraction,
will remove the need for any reverse lookups.  A forward lookup that checks the
incoming source GID against the source IP in the private data can validate the
IP address.  Performing a forward lookup via ARP is going to be a lot faster
than ATS if the ARP entry already exists.  On large fabrics, ARP is also going
to scale better since there's not one single entity responsible for responding
to every node's requests.

> These API problems are secondary to the burden you would be placing on
> the protocols. As has been mentioned in a previous email, extending
> the current protocols to use this convention will require further
> standardization and in some cases may not be compatible with their
> current architecture.

I think biting the bullet now on establishing these standards for applications
using IP addressing over IB, whether in the IBTA or in each application, is
going to give us the best long term result.

- Fab



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 11:18 AM
> 
> For IB, using private data to listen on a specific IP address seems the
> easiest thing to do.  (Maybe we could do it by mapping different IP
> addresses to different service IDs, requiring registration and lookup?)

The problem with the SID method is that the SID namespace is smaller than the
IPV6 address name space.  There's no way to get every possible IPV6 address
represented by a 64-bit SID.  This further ignores the rules for SIDs in the IB
specification.  I think private data is the only way to do this properly.

> If the CM abstraction layer expected those values to be returned in the
> REP message, it could validate that the remote side is using the same
> protocol to ensure some degree of backwards compatibility.
> 
> I don't know if it makes more sense to push private data checks into the
> actual CM or keep them in a CM abstraction layer.  My guess is that the
> former may be the easier implementation.

I think putting the checks in the CM makes the most sense, though it should be
done in a generic fashion.  A CM abstraction layer could then simply apply a
policy for private data usage - where in the private data it stores the IP
address information.

Layering it this way allows the private data compare to be used for things other
than IP addresses.  Add functionality without imposing policy.

- Fab



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 11:03 AM
> 
> Fab> Why can't the IPV field be ignored?  If a listen wants only
> Fab> IPV4 addresses, it would specify a 16-byte compare buffer
> Fab> with the first 12 bytes zero, the next 4 filled with the IPV4
> Fab> address, and would set the offset to that of the hello
> Fab> message's destination address (32).
> 
> Yes, you're right for SDP.  I guess if we're comfortable mandating
> that all protocols put their source and destination IPs in the private
> data for the IB case, then this works.  Of course it's somewhat
> awkward to pass this information into the transport-neutral CM API but
> I think this can be worked around.

I don't know if we need to mandate IP usage - it's up to the application.  Any
application that wants semantics similar to the way socket listens work
(especially when bound to one of multiple IP addresses on a port) would have to
define its private data to accommodate this.
 
At the IB level, the contents of the private data are still opaque, even to the
CM.  The CM would only expose the ability to have it perform an initial triage
of requests by doing binary comparisons over regions of private data.  It
doesn't know (or need to know) what the data represents - it only cares about
finding a match (or not).  The CM doesn't define any sort of policy here, and I
don't think it should.  It's just bytes to the CM, and it's doing a blind
comparison without interpreting the contents.
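A minimal sketch of such a blind compare, assuming a listen carries an offset,
a length, and a pattern.  The struct and function names are hypothetical and
are not part of any CM API.  The second helper builds the filter from the
quoted SDP example: a 16-byte buffer, first 12 bytes zero, IPv4 address in the
last 4, compared at offset 32.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical listen filter: compare `len` bytes at `offset`. */
struct pdata_filter {
    size_t offset;
    size_t len;
    uint8_t pattern[16];
};

/* Returns 1 on match, 0 otherwise.  The bytes are never interpreted. */
int pdata_filter_match(const struct pdata_filter *f,
                       const uint8_t *pdata, size_t pdata_len)
{
    if (f->len > sizeof f->pattern)
        return 0;
    if (f->offset > pdata_len || f->len > pdata_len - f->offset)
        return 0;   /* compare region falls outside the private data */
    return memcmp(pdata + f->offset, f->pattern, f->len) == 0;
}

/*
 * Policy lives outside the compare: this builds the filter for the
 * quoted IPv4 case (12 zero bytes, then the address, at offset 32).
 */
void pdata_filter_ipv4(struct pdata_filter *f, const uint8_t ipv4[4])
{
    memset(f, 0, sizeof *f);
    f->offset = 32;
    f->len = 16;
    memcpy(f->pattern + 12, ipv4, 4);
}
```

The match function knows nothing about IP; only the helper that fills in the
filter does, and that helper could just as well live in the application or in
a CM abstraction layer.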

- Fab




RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 11:14 AM
> 
> On 8/24/05, Fab Tillier <[EMAIL PROTECTED]> wrote:
> > > From: Roland Dreier [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, August 24, 2005 10:16 AM
> > >
> > > Fab> Knowledge of actual IP addresses would be up to the consumer.
> > > Fab> However, the IB CM can facilitate checks by allowing the user
> > > Fab> to specify an offset and length in the private data to match
> > > Fab> to for incoming requests.
> > >
> > > This seems too complex and at the same time too limited to me.  For
> > > one thing -- although I think ATS should die -- this doesn't support
> > > ATS reverse lookups.
> >
> > I think if all ULPs provide their source and destination IP in the private
> > data, you can eliminate the reverse lookup altogether.  A simple forward
> > lookup is all that's needed to validate that the source GID in the REQ
> > matches the reported source IP in the private data.  The forward lookup
> > could be done via ATS or via ARP, but the CM doesn't need to care which
> > method is used.
> 
> That is not an option.
> 
> The applications are expecting source/destination network addresses
> that come from a network layer, not from the peer application. IP has
> no problem meeting this requirement. This is an IB problem that needs
> to be solved within the scope of IB without changing any ULPs.

If the app wants to use source/destination network addresses, there isn't a
problem.  The problem is the app wants to use IP addresses, which are *not*
network addresses in IB.  So the app needs to decide between one of two things -
be aware of IB network addresses, or provide meaning to IP addresses over IB.
The latter can't be done reliably under the covers - ATS reverse lookups won't
tell you the IP the source actually used, and there's no way to do so without
either using private data in the CM REQ or requiring a 1:1 mapping of IB:IP
addresses.  The 1:1 IB:IP mapping is not feasible, so the only way to know what
IP address the application used is to embed that into the private data.  I would
expect protocols that try to use IP as their addressing would accommodate this
in their IB usage, just like SDP accommodates it in the hello message.

- Fab



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 10:16 AM
> 
> Fab> Knowledge of actual IP addresses would be up to the consumer.
> Fab> However, the IB CM can facilitate checks by allowing the user
> Fab> to specify an offset and length in the private data to match
> Fab> to for incoming requests.
> 
> This seems too complex and at the same time too limited to me.  For
> one thing -- although I think ATS should die -- this doesn't support
> ATS reverse lookups.

I think if all ULPs provide their source and destination IP in the private data,
you can eliminate the reverse lookup altogether.  A simple forward lookup is all
that's needed to validate that the source GID in the REQ matches the reported
source IP in the private data.  The forward lookup could be done via ATS or via
ARP, but the CM doesn't need to care which method is used.
 
> For another, it doesn't handle something like
> the SDP Hello header, where the IP version is at a certain offset, and
> then the IP address is interpreted according to the IP address.

Why can't the IPV field be ignored?  If a listen wants only IPV4 addresses, it
would specify a 16-byte compare buffer with the first 12 bytes zero, the next 4
filled with the IPV4 address, and would set the offset to that of the hello
message's destination address (32).
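
As a rough sketch of the compare buffer described above: the structure and
function names here are hypothetical (they are not an existing CM interface),
and the offset and lengths are simply the values quoted in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values taken from the text above: a 16-byte address field at offset 32. */
#define HELLO_DST_ADDR_OFFSET 32
#define HELLO_ADDR_LEN        16

/* Hypothetical listen parameters; not an existing CM structure. */
struct pdata_compare {
    uint8_t buf[HELLO_ADDR_LEN]; /* bytes the REQ private data must contain */
    uint8_t len;                 /* number of bytes to compare */
    uint8_t offset;              /* offset into the REQ private data */
};

/* Fill in a compare for an IPv4-only listen: 12 zero bytes, then the
 * 4 address bytes (already in network byte order). */
static void pdata_compare_ipv4(struct pdata_compare *cmp, const uint8_t ipv4[4])
{
    assert(cmp != NULL && ipv4 != NULL);
    memset(cmp->buf, 0, 12);
    memcpy(cmp->buf + 12, ipv4, 4);
    cmp->len = HELLO_ADDR_LEN;
    cmp->offset = HELLO_DST_ADDR_OFFSET;
}
```

The IPV field is never looked at here; the listen only states which bytes it
expects at which position.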

> What makes it really ugly is that it's perfectly reasonable for one
> consumer to listen to a service at 192.168.11.12 and another consumer
> to listen to the same service at 192.168.98.99.  How do we handle this
> in the IB case??

As long as the service IP address (the local address on the listening side) is
always advertised in the same place in the private data, this isn't a problem.
The compare lengths and offsets would be identical for both services, but the
compare buffer contents would differ.  Did I miss what you were getting at?

- Fab



RE: [openib-general] RDMA connection and address translation API

2005-08-24 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 24, 2005 9:27 AM
> 
> Tom> I think I understand, but the purpose of specifying the IP
> Tom> address in the listen is not to filter incoming connect
> Tom> requests, but rather to determine which devices I listen
> Tom> on. I think this works for the IB case as well. So the
> Tom> utility of the IP address specified in the listen is only to
> Tom> determine which devices the sid is created on. Does this make
> Tom> sense or am I missing something?
> 
> Well, that's not what I would expect.  Suppose I have a device
> configured with local addresses 192.168.11.12 and 192.168.98.99 and I
> start listening for some service at the address 192.168.11.12.  I
> don't think I should see a connection request if a remote system tries
> to connect to 192.168.98.99 (even though it's the same network
> interface as 192.168.11.12).

I think the IB CM needs to be able to do two things.  It needs to allow a listen
to be bound to a specific port - using the port GUID or the LID or something
along those lines.  The Windows CM currently takes a port GUID as input to allow
binding requests to a local IB port.  Incoming MADs are matched based on which
port they came in on.  This does introduce the limitation that sending CM MADs
to a port other than the one you wish to connect to won't have the desired
result if the ULP performs port filtering.  I don't think this is a big deal.

Knowledge of actual IP addresses would be up to the consumer.  However, the IB
CM can facilitate checks by allowing the user to specify an offset and length in
the private data to match against incoming requests.  ULPs that would want to
distinguish between IP addresses on a given port would put the IP in their
private data, and instruct the CM to compare a specific value at a specific
offset and length for every incoming REQ.  The Windows CM does this - a listen
takes as input a private data compare buffer, buffer length, and offset within
the REQ private data to perform the comparison.

Without the CM performing the private data comparison for the client, there is
no way for the CM to route a request to the proper listener based on something
like IP.
Using a generic private data compare mechanism enables the users to do whatever
they feel like, without putting knowledge of IP addresses and whatnot into the
IB CM or dictating how clients must use their private data.

A lookup of a listen for an incoming request changes from just being based on
SID to taking as additional parameters the port GUID on which the REQ was
received and the REQ's private data in case a private data compare needs to be
performed.
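
A minimal sketch of such a lookup, to make the idea concrete.  The structure,
the field names, and the convention that a port GUID of 0 means "any port" are
all assumptions made for illustration; this is not the Windows CM code.  The
only fixed number is the 92 bytes of private data carried by a CM REQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REQ_PDATA_LEN 92 /* private data size of an IB CM REQ */

/* Hypothetical listen entry; field names are illustrative only. */
struct listen_entry {
    uint64_t sid;        /* service ID */
    uint64_t port_guid;  /* assumption: 0 means "any local port" */
    const uint8_t *cmp;  /* compare buffer, NULL = no private data compare */
    uint8_t cmp_len;
    uint8_t cmp_offset;
};

/* Blind byte comparison: the CM never interprets the contents. */
static bool listen_matches(const struct listen_entry *l, uint64_t sid,
                           uint64_t port_guid, const uint8_t *pdata)
{
    if (l->sid != sid)
        return false;
    if (l->port_guid != 0 && l->port_guid != port_guid)
        return false;
    if (l->cmp == NULL)
        return true;
    if ((size_t)l->cmp_offset + l->cmp_len > REQ_PDATA_LEN)
        return false; /* compare window falls outside the private data */
    return memcmp(pdata + l->cmp_offset, l->cmp, l->cmp_len) == 0;
}

/* Look up a listen by SID, receiving port GUID and REQ private data. */
static const struct listen_entry *
find_listen(const struct listen_entry *tbl, size_t n, uint64_t sid,
            uint64_t port_guid, const uint8_t *pdata)
{
    assert(tbl != NULL && pdata != NULL);
    for (size_t i = 0; i < n; i++)
        if (listen_matches(&tbl[i], sid, port_guid, pdata))
            return &tbl[i];
    return NULL;
}
```

Two listens on the same SID and port with the same offset and length, but
different compare buffer contents, resolve to different entries.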

- Fab



RE: [openib-general] [RFC] [uCM] proposed API changes

2005-08-11 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 11, 2005 12:10 PM
> 
> There is an issue with adding context.  When a connection REQ is received, a
> new kernel cm_id is created.  This cm_id doesn't have any context associated
> with it.  For kernel clients, this isn't a big deal, since all events
> associated with a single cm_id are serialized.  A kernel app can set the
> context as part of their REQ handling.

Serialize events for user-mode cm_ids, and allow the user client to set the
context from their REQ handler.  The latter is probably pretty easy to do, but
in and of itself doesn't solve the problem with the out-of-order events and
races between setting the context and receiving an event.

> Userspace clients will run into the same situation, where no context is
> defined.  But events for the same cm_id are not serialized for userspace
> clients.  An app can receive a REJ event for a newly created cm_id that does
> not have a context.  (They can even process the REJ event before the REQ
> event is seen.)  Searching in this case is unavoidable.  I'm not even sure of
> the right way to handle this situation.

A search on a REJ isn't a big deal - it should be a rare case as it will only
occur if the remote side times out or aborts.  A client could ignore the REJ
because sending the REP will fail if a REJ was received.

> In a more generic sense, userspace clients need to be able to handle out of
> order events if they use multiple threads for event handling.  For example,
> MRA to a REQ, REP received, and REJ received events could all occur at the
> same time.  (In this case, a userspace context would be valid.)

If you allow the user to target a get_event call to a specific cm_id this
problem goes away.  If the user issues multiple requests against the same cm_id,
they need to be ready to deal with out-of-order event reporting.  This also
solves the context issue, since the REJ won't be reported until the user
requests an event from that specific cm_id.
 
- Fab



RE: [openib-general] [PATCH][RFC] uverbs SRQ implementation

2005-08-03 Thread Fab Tillier
> From: Grant Grundler [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 03, 2005 10:56 AM
> 
> On Wed, Aug 03, 2005 at 09:28:04AM -0700, Roland Dreier wrote:
> > Feedback in the meantime appreciated, though...
> ...
> > if (!pd  || pd->uobject->context  != file->ucontext ||
> > !scq || scq->uobject->context != file->ucontext ||
> > -   !rcq || rcq->uobject->context != file->ucontext) {
> > +   !rcq || rcq->uobject->context != file->ucontext ||
> > +   (cmd.is_srq && (!srq || srq->uobject->context != file->ucontext))) {
> 
> I think it's redundant to test cmd.is_srq.
> srq is NULL if cmd.is_srq is not set.
> ie !srq should short circuit the rest of the test.
> 
> if idr_find() fails, I would expect it to return NULL.

If idr_find returns NULL when cmd.is_srq is non-zero, then the user passed an
invalid parameter.  Likewise, if the SRQ is not null, but its context doesn't
match, that's also an invalid parameter.

If cmd.is_srq is zero, then a NULL SRQ is perfectly fine, and there's no need to
fail the call.

That is, the check for (!srq || srq->uobject->context != file->ucontext) must
only be performed if cmd.is_srq is non-zero.
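
To spell out the cases, here is a small stand-alone sketch.  The types are
stripped-down stand-ins for the kernel objects and the helper name is made up;
only the condition itself comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel objects; only what the check needs. */
struct ucontext { int id; };
struct uobject  { struct ucontext *context; };
struct srq      { struct uobject *uobject; };

/* Returns true when the SRQ argument must be rejected as invalid. */
static bool srq_arg_invalid(int is_srq, const struct srq *srq,
                            const struct ucontext *caller)
{
    assert(caller != NULL);
    if (!is_srq)
        return false; /* no SRQ requested: a NULL srq is perfectly fine */
    return !srq || srq->uobject->context != caller;
}
```

Dropping the is_srq test would turn the first case (is_srq == 0, srq == NULL)
into a failure, which is the case that has to keep working.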

- Fab



RE: [openib-general] Re: create several RC QPs with the same init attributes structure causes the init attribute structure to be changed

2005-08-03 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 03, 2005 9:15 AM
> 
> Dotan> I work with gen2 with svn rev 2946 on Mellanox HCA 23108.
> Dotan> When i try to create several RC QPs with the same init
> Dotan> attributes structure,
> 
> Dotan> i can see that this structure is being changed by the verb.
> 
> This is expected and correct according to our API: the actual values
> allocated for the QP are returned in the pointer passed in by the consumer.

Why doesn't this happen with UD/UC QPs?  Is inline data not supported on those?

Also, why does the size keep growing?  It seems that if you request 1 SGE, you
get 28 bytes of max_inline.  If you then request 28 bytes of max_inline, you get
2 SGEs back, and 96 bytes of max_inline.

It seems to me something's off with the calculations, like using < rather than
<= or something like that.  If 1 SGE gives you 28 bytes, requesting 28 bytes
should give you 1 SGE.

I would have expected output like this:

s_wr 1, r_wr 1, s_sge 1, r_sge 1, max_inline 0 
s_wr 1, r_wr 1, s_sge 1, r_sge 1, max_inline 28
s_wr 1, r_wr 1, s_sge 1, r_sge 1, max_inline 28
s_wr 1, r_wr 1, s_sge 1, r_sge 1, max_inline 28
s_wr 1, r_wr 1, s_sge 1, r_sge 1, max_inline 28
s_wr 1, r_wr 1, s_sge 1, r_sge 1, max_inline 28
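
To illustrate the kind of off-by-one I have in mind: this is only a guess at
the cause, using a simplified linear model in which each SGE slot holds 28
bytes of inline data.  It is not the actual driver calculation, and both
function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define INLINE_PER_SGE 28u /* from the observed output: 1 SGE -> 28 bytes */

/* Suspected behaviour: an exact fit still spills into another SGE. */
static uint32_t sge_for_inline_buggy(uint32_t bytes)
{
    return bytes / INLINE_PER_SGE + 1;
}

/* Expected behaviour: round up, so an exact fit needs no extra SGE. */
static uint32_t sge_for_inline(uint32_t bytes)
{
    assert(bytes <= UINT32_MAX - INLINE_PER_SGE); /* avoid wrap-around */
    return (bytes + INLINE_PER_SGE - 1) / INLINE_PER_SGE;
}
```

With the round-up version, feeding the returned max_inline back in gives the
same number of SGEs, so repeated creates with the same structure stay stable.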

- Fab



RE: [openib-general] Re: why does the value of the node_guid don't have the machine endianess?

2005-08-03 Thread Fab Tillier
> From: Dotan Barak [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 02, 2005 11:02 PM
> 
> I expect to get this values with the endianess of the host that i'm working
> on, and if i will print the node_guid as a number it will be the same as the
> sys_fs value.
>
> I don't see any reason for the driver to return this value in the endianess of
> the network, i think that it is better that the driver will return the value
> of this attribute in the host order, instead of every application that query
> for this attribute will change the order of it.

It's a matter of consistency.  The stack doesn't perform byte swapping on MAD
payloads either, and SA requests (for NodeInfo, for example) will return node
GUIDs.  It's simpler to set the expectation for the client that all GUIDs are
always treated in network order, that way the client doesn't have to distinguish
between getting a GUID from a SA response, or getting a GUID from the device
directly.  It also removes the need to perform byte swapping to put the GUID in
network order when issuing SA requests that need that information (if there are
any).

Personally, I would prefer to see the GUIDs always reported in network order in
all places.  We don't want to add byte swapping policy to the MAD layer.
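
For a client that only wants to print the value, the conversion is small.  A
sketch, with made-up helper names and an output format that is just one
plausible choice: the GUID stays in network order everywhere and is converted
only at the point of display.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Convert a 64-bit GUID held in network (big-endian) order to host order.
 * Reads the bytes individually, so it works on any host endianness. */
static uint64_t guid_be_to_host(uint64_t guid_be)
{
    uint8_t b[8];
    uint64_t host = 0;

    memcpy(b, &guid_be, sizeof(b));
    for (int i = 0; i < 8; i++)
        host = (host << 8) | b[i];
    return host;
}

/* Format a network-order GUID for display as four 16-bit hex groups. */
static void guid_format(uint64_t guid_be, char *out, size_t len)
{
    uint64_t h = guid_be_to_host(guid_be);

    assert(out != NULL);
    snprintf(out, len, "%04x:%04x:%04x:%04x",
             (unsigned)(h >> 48) & 0xffff, (unsigned)(h >> 32) & 0xffff,
             (unsigned)(h >> 16) & 0xffff, (unsigned)h & 0xffff);
}
```

The same network-order value can be copied straight into a MAD payload
without any swapping.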

- Fab




RE: [openib-general] Re: [Rdma-developers] Meeting(07/22) summary:OpenRDMA community development discussion

2005-08-01 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 01, 2005 9:46 AM
> 
> Yaron Haviv wrote:
> > we can spend time and discuss theories and intentions, at the end of the
> > day an iWarp RNIC cannot just reside under IB-Verbs without major
> > changes to the overall infrastructure.
> 
> I don't disagree with having a common connection library that supports both
> IB and iWarp, or that you could derive a solution from kDAPL.  But based on
> the proposed APIs that I've seen, I believe that an RNIC could reside under
> IB verbs with minimal changes, and would likely be the best engineered
> solution for including RNIC support in Linux.

Just for clarity, when you say verbs you exclude connection
establishment/management, right?

I think keeping the two distinct is important in this discussion, as it seems
there is some confusion - some people refer to verbs as verbs + CM, others as
just verbs.

Here's my take from the discussions so far:
- RNICs can probably be made to work under the IB verbs (with changes of
course).
- RNICs can probably not be made to work under the IB CM (not that I've seen
this suggested).

- Fab



RE: [openib-general] RE: userspace event reporting

2005-07-20 Thread Fab Tillier
> From: Libor Michalek [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 20, 2005 5:55 PM
> 
> On Wed, Jul 20, 2005 at 05:17:09PM -0700, Fab Tillier wrote:
> > > From: Libor Michalek [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, July 20, 2005 5:02 PM
> > >
> > >   My question is do we really want this, since an application will likely
> > > have yet another table containing the app specific connection structure.
> > > How that table is locked and managed will differ based on the threading
> > > model. I think it's an incorrect assumption that if we create this level
> > > thread safety in the uCM then the app connection structure can be placed
> > > in a uCM user supplied pointer (e.g. the kCM context variable) and avoid
> > > this type of app level connection table.
> >
> > Ok, now that I think more about it, I think that having the ability to do
> > an async, per-CM ID get_event call would help tremendously.  The
> > application knows what CM IDs it has, and thus can initiate such a
> > get_event request for each. The app can then maintain its own reference
> > count on its internal structures (which is really what is needs to do),
> > rather than rely on the CM providing the reference counting and
> > synchronization.
> 
>   Well, I think having a per-CM ID get_event is a useful thread based
> programming model, but I don't think it's the one and only approach
> for all applications. Again, it's a layer that could (should?) be
> implemented entirely in userspace on top of the current interface.

I'm trying to think how you could implement the per-CM ID get event without
introducing an extra thread that would sit there polling for events, and then
perform the multiplexing.  Maybe having extra threads isn't a big deal, but
that's where the async model comes in handy - a single threaded app can get
exactly the behavior it wants using AIO.

>   Also, I agree with your last sentence. I don't think it necessarily
> follows or depends on the previous two, it's true for all most all
> serious apps, not just if there is a per-CM ID get_event.

Right - it's a matter of the CM enabling the app to do what it needs to do.

> > Is it even possible to make async get_event requests, from a coding
> > perspective? Would the resulting usage model work for clients?  If
> > using an AIO read request, could the file offset could be used to
> > convey the CM ID being polled?
> 
>   Well, you can poll on the file descriptor, which means you can submit
> an AIO poll request. You could submit an AIO read, and you would get
> back the ABI data in your buffer, which contains the cm_id. However, I'm
> not sure what you're getting at.

I don't know how AIO works in Linux.  The idea would be to allow a single
threaded client to issue a get_event AIO request for each CM ID that client has,
and then use the same thread to reap the AIO completions in whatever order they
come in.  This model allows the client to easily keep references on its internal
structures associated with each individual CM ID.

Does that make more sense?  Does Linux have a concept similar to the I/O
completion ports in Windows for gathering AIO requests?

- Fab



RE: [openib-general] Re: uCM init_qp_attr() [Re: uCM create connectionID]

2005-07-20 Thread Fab Tillier
> From: Libor Michalek [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 20, 2005 5:12 PM
> 
> On Thu, Jun 30, 2005 at 07:01:22PM -0700, Arlin Davis wrote:
> >
> > The uDAPL code is now connecting properly but I am having difficulty
> > setting the QP states properly without the ib_cm_init_qp_attr() call.
> > Any chance of providing this call in uCM?
> 
>   Sean, To do this I'll need to add a kernel CM function to get the
> connections qp attributes from 'struct cm_id_private' inorder to pass
> them to userspace. Any preference on what form the function should take?

Why can't you just expose the existing kernel ib_cm_init_qp_attr call?  It seems
that call provides the access to cm_id_private.  As long as you buffer the
qp_attr structure properly, what are the issues?

- Fab



RE: [openib-general] RE: userspace event reporting

2005-07-20 Thread Fab Tillier
> From: Libor Michalek [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 20, 2005 5:02 PM
> 
>   My question is do we really want this, since an application will likely
> have yet another table containing the app specific connection structure.
> How that table is locked and managed will differ based on the threading
> model. I think it's an incorrect assumption that if we create this level
> thread safety in the uCM then the app connection structure can be placed
> in a uCM user supplied pointer (e.g. the kCM context variable) and avoid
> this type of app level connection table.

Ok, now that I think more about it, I think that having the ability to do an
async, per-CM ID get_event call would help tremendously.  The application knows
what CM IDs it has, and thus can initiate such a get_event request for each.
The app can then maintain its own reference count on its internal structures
(which is really what it needs to do), rather than rely on the CM providing the
reference counting and synchronization.

Is it even possible to make async get_event requests, from a coding perspective?
Would the resulting usage model work for clients?  If using an AIO read request,
could the file offset be used to convey the CM ID being polled?

- Fab



[openib-general] RE: userspace event reporting

2005-07-12 Thread Fab Tillier
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 12, 2005 3:15 PM
> 
> Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> >
> > I believe that this is purely a userspace issue.  I can't see why
> > using a mutex wouldn't work, but I believe that get_event() currently
> > blocks waiting for an event.
> >
> > Note that get_event() may be reporting events associated with an
> > object other than the one being destroyed.
> 
> Maybe, create a special kind of event after the object has been destroyed?

Adding such an event in effect makes destruction asynchronous - the user must
wait until the destroy event before freeing their context.  This is similar to
what is done in Windows, but still requires tracking each CM ID in user-mode to
provide the proper serialization of events.  All non-destroyed events must have
been put back to the CM before the destroy event can be generated.  Note also
that having the kernel take references to track get_event/put_event pairs can
lead to issues if an app dies before performing the put unless there's a good
way for the kernel to distinguish graceful from abortive CM ID destruction.

So I think anyway I look at it, the uCM ends up needing to shadow the CM IDs in
user-land to maintain references.  If it does that, then Sean's earlier idea of
performing reference counting and blocking destruction should work fine and
maintain the synchronous behavior that exists today.

- Fab



RE: [openib-general] userspace event reporting

2005-07-12 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 12, 2005 12:29 PM
> 
> Currently uCM doesn't associate a data structure with the cm_id.  It passes an
> index directly between the application and the kernel, so adding this would
> require abstracting away the cm_id, which is a fair amount of work.  I don't
> see any other reasonable alternative however.

Actually, it gets a bit messier, since the get_event call would need an extra
input parameter to identify which CM ID to perform the operation on.  This is a
pretty significant semantic change since an application with many connections
will now need to make many get_event calls, one for each CM ID it wants events
from.  Unless the get_event operation can work asynchronously, this requires a
thread per CM ID, which sucks.

I don't see a clean way to solve this without the above semantic change - the
users can't perform their own reference counting because they don't know which
CM ID will get the next event.

- Fab



RE: [openib-general] userspace event reporting

2005-07-12 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 12, 2005 11:53 AM
> 
> This fell out of the uCM connection ID discussion...
> 
> There's an issue reporting events to userspace clients for an object that a
> user may have destroyed.  The problem exists with user verbs, but is much
> more likely to be seen by a userspace CM client.  To avoid reporting events
> for a destroyed object, I think that something similar to the following
> could be used from userspace:

I had to put something like this in place in the Windows CM, except that I had
to use async destroy semantics to allow preserving the existing CM API.

> destroy() should set a state marking destruction and wait for a reference
> count to go to 0 before transitioning to the kernel.  The kernel code should
> destroy the associated kernel object and then discard any unclaimed events.

You don't need a state here - just preset the reference count to 1.  The
sequence should follow something like:

1. Tell kernel to destroy CM ID.
2. Kernel flushes all pending events, completes (with failure) any requests from
a previous call to get_event().
3. deref, and wait for zero.

> get_event() should check the object state and discard the event if the object
> is being destroyed.  It should increment a reference count before reporting
> any events.
No need to check for the state.  If the object was destroyed already, the user
has a bug and there's not much you can do about it.  Something like this:
1. inc ref_cnt
2. call kernel to get event.
3. if failure dec ref_cnt.

You need to bump the ref_cnt before making the kernel call to get the event to
prevent a call to destroy() from blowing the CM ID away while an event request
is outstanding (and the status isn't known yet).

> put_event() should decrement the reference count and unblock destroy if the
> reference count goes to 0.

Yep.
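
Putting the three pieces together, a user-space sketch could look roughly like
this.  The structure, the function names, and the stub callbacks standing in
for the kernel calls are all hypothetical; it only shows the reference
counting, not the real uCM interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space shadow of a CM ID; not the uCM's real layout. */
struct ucm_shadow {
    pthread_mutex_t lock;
    pthread_cond_t  zero;    /* signalled when ref_cnt reaches 0 */
    int             ref_cnt; /* preset to 1 on creation */
};

/* Stubs standing in for the kernel calls (hypothetical). */
static bool stub_kernel_ok(void)      { return true; }
static bool stub_kernel_fail(void)    { return false; }
static void stub_kernel_destroy(void) { }

static void shadow_init(struct ucm_shadow *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->zero, NULL);
    s->ref_cnt = 1;
}

static void shadow_deref(struct ucm_shadow *s)
{
    pthread_mutex_lock(&s->lock);
    assert(s->ref_cnt > 0);
    if (--s->ref_cnt == 0)
        pthread_cond_broadcast(&s->zero);
    pthread_mutex_unlock(&s->lock);
}

/* get_event: take the reference before asking the kernel for an event. */
static bool shadow_get_event(struct ucm_shadow *s, bool (*kernel_get)(void))
{
    pthread_mutex_lock(&s->lock);
    s->ref_cnt++;
    pthread_mutex_unlock(&s->lock);

    if (!kernel_get()) { /* request failed, or was flushed by destroy */
        shadow_deref(s);
        return false;
    }
    return true; /* reference is held until put_event */
}

/* put_event: release the reference taken by get_event. */
static void shadow_put_event(struct ucm_shadow *s)
{
    shadow_deref(s);
}

/* destroy: the kernel flushes pending events first, then the initial
 * reference is dropped and we wait for outstanding events to be put back. */
static void shadow_destroy(struct ucm_shadow *s, void (*kernel_destroy)(void))
{
    kernel_destroy();
    pthread_mutex_lock(&s->lock);
    s->ref_cnt--;
    while (s->ref_cnt > 0)
        pthread_cond_wait(&s->zero, &s->lock);
    pthread_mutex_unlock(&s->lock);
    pthread_cond_destroy(&s->zero);
    pthread_mutex_destroy(&s->lock);
}
```

No separate "destroying" state is needed: the preset reference keeps the count
above zero until destroy drops it.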

- Fab



RE: [openib-general] Re: calculate inline data size

2005-07-08 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Friday, July 08, 2005 11:43 AM
> 
> >> Gleb> But how should I know what maximum size that I can use? By
> >> Gleb> trial and error?
> >>
> >> This is a good question that I don't have an answer to right now.
> >
> >Why not add the max inline size to the CA attributes?  HCAs that don't
> >support inline data would set it to zero.
> >
> >It does require the user to query the CA attributes to get the limit, but
> >that's probably OK.
> 
> In general I agree that the ib_device_attr is a good place to store this.  The
> problem is that the maximum changes depending on the work request type (send
> or RDMA write), if immediate data is used, and number of sge's.

As far as I know, the maximum size supported by the Mellanox HCAs is larger than
optimum - that is, the performance goes down due to the data copy into the WQE
as compared to formatting the data segment and letting the HCA DMA the data.  

If we expect all implementations that support inline data to do so beyond the
optimal point, I don't see an issue with the maximum being the smallest maximum
of all the different types of work requests.

So my vote is for adding the max inline to the device attributes, and making it
the maximum for the most constrained work request (probably RDMA Write with
immediate?).
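
In other words, something along these lines on the driver side.  The enum and
the per-type limits are hypothetical; a real driver would derive them from its
own WQE layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical work request types a driver might distinguish. */
enum wr_type {
    WR_SEND,
    WR_SEND_IMM,
    WR_RDMA_WRITE,
    WR_RDMA_WRITE_IMM,
    WR_TYPE_COUNT
};

/* Report the smallest per-type limit, so the advertised value is valid
 * for every work request type.  All zeros means inline is unsupported. */
static uint32_t device_max_inline(const uint32_t limit[WR_TYPE_COUNT])
{
    uint32_t min;

    assert(limit != NULL);
    min = limit[0];
    for (size_t i = 1; i < WR_TYPE_COUNT; i++)
        if (limit[i] < min)
            min = limit[i];
    return min;
}
```

A consumer then reads a single number from the device attributes and never
has to reason about which work request type it is about to post.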

- Fab




RE: [openib-general] Re: calculate inline data size

2005-07-08 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Friday, July 08, 2005 11:33 AM
> 
> Gleb> But how should I know what maximum size that I can use? By
> Gleb> trial and error?
> 
> This is a good question that I don't have an answer to right now.

Why not add the max inline size to the CA attributes?  HCAs that don't support
inline data would set it to zero.

It does require the user to query the CA attributes to get the limit, but that's
probably OK.

- Fab




RE: [openib-general] Re: IBDM and IBMgtSim Proposal Comments

2005-07-08 Thread Fab Tillier
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Friday, July 08, 2005 12:47 AM
> 
> Performance is not the only reason to have a clean architecture.
> Why cant the windows driver match the openib API?

OSMV is not the Windows API.  The Windows stack has a clean architecture, is
already implemented, and works.  Why rewrite?

Also, the Linux API is not *the* openib API.  It's specific to Linux.

If there are issues with the architecture of the Windows drivers, let's hear
them and fix them.  Otherwise, just changing the API to match doesn't make
sense, especially given how different the driver models are between the two
operating systems.

Remember that code sharing is not a goal.  One reason for this is that the Linux
kernel code will never incorporate features that make sense in Windows but not
Linux for the sake of similarity in APIs.  The Linux stack is designed to fit
into Linux properly and not to be a piece of portable code.  The Windows stack
will do the same, but for Windows.

There's nothing that prevents the Windows or Linux APIs from evolving to more
closely match one another.  In fact, I plan on changing the CQ poll semantics to
match the Linux model more closely.

- Fab



RE: [Openib-windows] RE: [openib-general] IBDM and IBMgtSim ProposalComments

2005-07-07 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 07, 2005 2:53 PM
> 
> It seems to me that the person writing the code will decide whether they
> should interface at the lowest layer or write to a higher level abstraction
> for portability.  I don't see that either way is wrong.  And if the higher
> level abstraction is the native interface on one platform, that sounds like a
> benefit.

It seems unclear to me whether changes made to the existing IB diags to make
them run over OSMV would be accepted, even if Eitan made the changes.  That's my
impression given the dialogue in this thread - there's an attitude of umad or
bust.

Aside from that, I agree - whoever gets it done first gets it done their way.

- Fab




[openib-general] RE: [Openib-windows] RE: IBDM and IBMgtSim Proposal Comments

2005-07-07 Thread Fab Tillier
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 07, 2005 2:36 PM
> 
> On Thu, 2005-07-07 at 17:26, Fab Tillier wrote:
> > > So it appears there are 3 choices:
> > > 1. Port OpenIB Linux libraries to Windows and OpenIB Linux "diags" port
> > > as well (second part is less work than next alternative)
> > > 2. Port OpenIB Linux diagnostics to OSM vendor layer
> > > 3. No OpenIB "Linux" diagnostics in the windows environment
> >
> > What about:
> > 4. Port OpenIB Linux "diags" to Windows/IBAL.
> >
> > Seems like a pretty obvious choice, and what I've been talking about in my
> > previous emails.
> 
> Yes, that is another alternative but is similar to choice 2 in terms of
> affecting all the applications.

All the applications will be affected by being ported from Linux to Windows, not
to mention changes in the umad interface necessary to make it work on Windows.
Porting to IBAL has the benefit that the tools interface to the lowest API
available to them.  If interfacing at the lowest layer isn't important, the
choice is between porting the diags to OSMV or porting umad to Windows.  I
don't know which is easier, and it probably doesn't matter.  What matters is
which gets done first.

However, it doesn't sound like an effort from Eitan to port the diagnostics to
OSMV would be accepted into the Linux SVN repository simply on the basis that it
isn't umad.  What gives?

- Fab



[openib-general] RE: [Openib-windows] RE: IBDM and IBMgtSim Proposal Comments

2005-07-07 Thread Fab Tillier
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 07, 2005 2:31 PM
> 
> Quoting r. Fab Tillier <[EMAIL PROTECTED]>:
> > umad in Linux is the functional equivalent of IBAL in Windows, period.
> 
> Doesn't IBAL also include uverbs, ucm and whatnot?

Yes, in Windows all the components are integrated into a single access layer and
work in harmony.  So let me rephrase:  umad in Linux is the functional
equivalent of the MAD services of IBAL in Windows, period.

- Fab




RE: [openib-general] IBDM and IBMgtSim Proposal Comments

2005-07-07 Thread Fab Tillier
> From: Yaron Haviv [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 07, 2005 2:09 PM
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] [mailto:openib-general-
> > [EMAIL PROTECTED] On Behalf Of Fab Tillier
> > Sent: Thursday, July 07, 2005 11:37 PM
> >
> > > From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> > > Sent: Thursday, July 07, 2005 10:56 AM
> > >
> > > In the OpenIB architecture, umad is the lowest layer library and the
> > > diagnostics are built on that.
> >
> > That's only true in the *Linux* OpenIB Architecture.  Windows is
> > different - the access layer already provides support for user-level
> > MAD clients, and the API is very close (if not identical) to the
> > IBAL interface OpenSM was originally written to.
> 
> From my understanding the main advantage for using the OSM Vendor
> specific layer is that it is also present in Windows ?

OSMV already exists over a variety of transports, as Eitan mentioned.  It is one
option for developing portable MAD-based tools and diagnostics.

> or does it have some other advantage over the umad layer (from Hal's
> response seems like umad has better layering/functionality) ?

I'm not familiar with it, so I can't answer what functional advantage it has.  I
would expect that as an abstraction layer, it will hide some of the
functionality in umad, just like it likely hides functionality in IBAL.  That's
the price of using a higher level abstraction.  Hal seems to be making the
argument that the lowest layer is the best to use for Linux, but somehow not for
Windows.  I'm just questioning the inconsistency, independent of OSMV.

> If that is the case then you can also suggest replacing the OpenIB
> verbs layer or CM, etc. with the IBAL one because it's present in Windows

No, that wasn't the point.  The point was that we have CM, verbs, MAD and so
forth support in Linux and Windows, independent of OSMV, and more importantly
independent of one another.  Just like I'm not suggesting moving the Windows
code to Linux, I'm pushing back against gratuitously moving Linux code to
Windows.  If there's a reason to do it, great.  If it's just because the Linux
code works, I'll be one of the first to point out that the Windows code works
too.

> I believe if we want to do a major change in the management
> infrastructure that is live and kicking (can probably improve like
> always), we need a much better reason than "it's done this way in Windows".

Turn your above statement around: "I believe if we want to do a major change in
the management infrastructure that is live and kicking (can probably improve
like always) we need a much better reason than 'it's done this way in Linux'."

We're not talking about changing the umad interface in Linux.  Eitan is
proposing having all diagnostics interface to OSMV to facilitate portability.
Hal is proposing porting umad to Windows.  I'm saying just use IBAL - it's
already there and works.

- Fab




[openib-general] RE: [Openib-windows] RE: IBDM and IBMgtSim Proposal Comments

2005-07-07 Thread Fab Tillier
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 07, 2005 2:05 PM
> 
> On Thu, 2005-07-07 at 16:43, Fab Tillier wrote:
> > umad is the lowest level API in Linux, but not in Windows.  So either the
> > diagnostics interface to the lowest level layer (umad for Linux, IBAL for
> > Windows), or the diagnostics interface to some higher abstraction layer.
> > If a higher abstraction layer, why not use the existing OSM vendor layer
> > and skip porting umad to Windows altogether?
> 
> In Linux, OSM vendor layer is implemented on top of umad. Whether it is
> a higher abstraction layer is another matter. It is an abstraction with
> different semantics and may be higher as I am not sure whether the umad
> and mad libraries could be put on top of the OSM vendor layer but the
> other way 'round works.

Since OSMV works over umad, IBAL, Gen1 and so forth, to me that makes it a
higher level abstraction.  I'm not here to debate which is higher - I don't
care.  I was just making the point that if the argument for interfacing to umad
in Linux is that it is the lowest level interface possible, that same argument
should be made for Windows.  Consistency in logic is what I'm asking for here.

> 
> I looked at IBAL some time ago but can't comment now on how it compares.
> 
> So it appears there are 3 choices:
> 1. Port OpenIB Linux libraries to Windows and OpenIB Linux "diags" port
> as well (second part is less work than next alternative)
> 2. Port OpenIB Linux diagnostics to OSM vendor layer
> 3. No OpenIB "Linux" diagnostics in the windows environment

What about:
4. Port OpenIB Linux "diags" to Windows/IBAL.

Seems like a pretty obvious choice, and what I've been talking about in my
previous emails.

I don't care about OSMV - I care about the access layer in Windows.  The
functionality for user-level MAD clients is already there - why not use it?  If
there are problems with the existing MAD interface, let's address them.  If that
ends up making the interface more similar to the Linux umad interface, so be it.

However, I'm not in favor of just porting umad to Windows because someone
randomly decreed that it is *the* user-mode MAD interface for the industry.  It
isn't.  umad in Linux is the functional equivalent of IBAL in Windows, period.

- Fab



[openib-general] RE: [Openib-windows] RE: IBDM and IBMgtSim Proposal Comments

2005-07-07 Thread Fab Tillier
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 07, 2005 1:38 PM
> 
> On Thu, 2005-07-07 at 16:33, Eitan Zahavi wrote:
> > >
> > > Have you looked at the umad and mad libraries ? These are not IBAL.
> > [EZ] Yes I know. This is why they do not work on Windows IBAL.
> 
> Is it a requirement for them to work on Windows IBAL ?

The OpenIB Windows stack is IBAL.

> There is no reason the MAD and UMAD libraries couldn't be ported to
> Windows.

Is there any reason to port these when the Windows OpenIB stack already has
fully implemented and functioning user-mode MAD client support?  What's the
value in porting the API?  Is it portable with no changes so applications using
the umad interface don't have to change between Linux and Windows?

I'm guessing the diagnostics will have to change regardless of what MAD API they
interface to unless they interface to an abstraction layer that hides OS
differences.  If the diagnostics are going to change, why not just change them
to interface directly to the native MAD API for the target platform?
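
To make the "abstraction layer that hides OS differences" option concrete, here
is a minimal sketch in Python.  Everything in it is hypothetical: MadTransport,
LoopbackTransport and ping are illustrative names, not part of umad, IBAL or
the OSM vendor layer, and the loopback backend exists only so the sketch runs
without any InfiniBand stack.  A real backend would wrap the native MAD API of
each platform behind the same two calls.

```python
from abc import ABC, abstractmethod


class MadTransport(ABC):
    """Hypothetical portability layer; real backends would wrap umad or IBAL."""

    @abstractmethod
    def send(self, dest_lid, mad): ...

    @abstractmethod
    def recv(self): ...


class LoopbackTransport(MadTransport):
    """Stand-in backend so the sketch runs without any InfiniBand stack."""

    def __init__(self):
        self._queue = []

    def send(self, dest_lid, mad):
        # Echo the MAD straight back, tagged with the LID it was sent to.
        self._queue.append((dest_lid, bytes(mad)))

    def recv(self):
        return self._queue.pop(0)


def ping(transport, dest_lid):
    """A toy 'diagnostic' that only touches the abstract interface."""
    request = bytes(256)  # an all-zero 256-byte MAD placeholder
    transport.send(dest_lid, request)
    src, reply = transport.recv()
    return src == dest_lid and len(reply) == 256
```

The trade-off is the one discussed in this thread: the tool itself never
changes between platforms, but it can only use what the abstract interface
chooses to expose.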

- Fab



[openib-general] RE: [Openib-windows] RE: IBDM and IBMgtSim Proposal Comments

2005-07-07 Thread Fab Tillier
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 07, 2005 12:45 PM
> 
> On Thu, 2005-07-07 at 14:58, Eitan Zahavi wrote:
> > [EZ] There is no need to port UMAD to windows!!! We already have OSM
> > Vendor ported to it. It works on top of the existing IBAL API
> > (actually this is the first OSM Vendor that was ever built).
> 
> There is if the OpenIB diagnostics and other applications in the Linux
> environment which are not on top of the "OSM" vendor layer are to work
> in the Windows environment. That was what started this whole thread.

The discussion was about porting the diagnostics to the Windows environment.
Whether that's done by porting umad and the MAD libraries to run on top of the
IBAL MAD services APIs or by porting the diagnostics to interface directly to
IBAL hasn't been settled on.  I suppose it will be up to whoever ports the
diagnostics.  If I was porting them, I'd make them interface to the lowest level
available (just like they do in Linux).  If Eitan was porting them, he'd have
them run over the OSM vendor layer.  If you were porting them, you'd port umad.

How is porting umad any different than using the OSM vendor layer?  umad is the
lowest level API in Linux, but not in Windows.  So either the diagnostics
interface to the lowest level layer (umad for Linux, IBAL for Windows), or the
diagnostics interface to some higher abstraction layer.  If a higher abstraction
layer, why not use the existing OSM vendor layer and skip porting umad to
Windows altogether?

- Fab



RE: [openib-general] IBDM and IBMgtSim Proposal Comments

2005-07-07 Thread Fab Tillier
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 07, 2005 10:56 AM
> 
> In the OpenIB architecture, umad is the lowest layer library and the
> diagnostics are built on that.

That's only true in the *Linux* OpenIB Architecture.  Windows is different - the
access layer already provides support for user-level MAD clients, and the API is
very close (if not identical) to the IBAL interface OpenSM was originally
written to.

> The OSM vendor layer is built on top of
> this. So when the umad and mad libraries are ported to Windows,
> everything on top of this will work. This includes all diagnostics
> (OpenIB ones as well as the additional tools you are proposing to add).

Is there a plan to migrate the Linux umad and MAD libraries and replace the
user-mode MAD support already in the Windows stack?  If so, does it make sense
or are umad and the MAD libraries too closely tied to Linux?

I haven't had a chance to look carefully at the umad interface - does anyone
know how it compares to the IBAL MAD service interface?

- Fab



RE: [openib-general] gen2/rnic-pi differences

2005-07-01 Thread Fab Tillier
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Friday, July 01, 2005 9:15 AM
> 
> RNIC-PI defined several verb layer features that if supported
> eliminate the need for a DTO_COOKIE. If the information can
> be integrated with existing verb layer structures it is a
> major improvement in efficiency, at worst case it merely
> requires the verb layer to implement the same workarounds
> that the Access Layer is already forced to use.
> 
> These features are:
>   all verb layer objects have a consumer supplied
>   identifier (os_data) that is used to identify that
>   object back to the consumer in all completions and
>   callbacks. So instead of getting the QPID you get
>   the EP pointer (assuming that is what you supplied).

This should be really easy to implement for Mellanox HCAs - the mthca driver
already has to resolve the QP structure when processing completions, and getting
the user's QP context and including it in the work completion should be a
trivial addition (for someone familiar with the code base).

>   Three flags are identified per work request that can
>   be ignored, passed through or fully implemented. They
>   are Local Solicited, Consumer0 and Consumer1. If 'Local
>   Solicited' is defined it means that completion of the
>   work request should be treated as though it were a
>   solicited event (i.e., it qualifies for 'next solicited
>   event' callback notification).

Would these flags be returned in the work completion?  I don't know if I quite
understand what you're requesting here.  Do Consumer0 and Consumer1 represent
bits in a flags field?

In the Mellanox HCA implementation, the 64-bit work request ID is stored by the
driver and recovered upon a completion.  Basically, the HCA driver maintains
DTO_COOKIE-like information for each work request already.  Due to this lookup
requirement, the information stored per work request could be arbitrarily large
if so desired.  I don't know if that holds true for the PathScale HCA hardware,
though - if it doesn't it would require an additional lookup in the HCA driver
to get back to this information, in effect pushing the DTO_COOKIE concept into
the HCA driver rather than leaving it in the consumer.
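
As a rough illustration of the per-work-request lookup described above, here is
a small Python sketch.  It is not the mthca driver's actual data structure:
WrSlot, WorkQueueShadow and the flags field are hypothetical names, and the
sketch assumes the hardware completion reports the index of the slot that
finished.

```python
from dataclasses import dataclass


@dataclass
class WrSlot:
    wr_id: int      # 64-bit work request ID supplied by the consumer
    flags: int = 0  # room for extra per-request data (e.g. consumer flags)


class WorkQueueShadow:
    """Hypothetical driver-side table, one slot per outstanding work request."""

    def __init__(self, depth):
        self._slots = [None] * depth
        self._head = 0  # next slot to post into

    def post(self, wr_id, flags=0):
        index = self._head
        if self._slots[index] is not None:
            raise RuntimeError("work queue full")
        self._slots[index] = WrSlot(wr_id & 0xFFFFFFFFFFFFFFFF, flags)
        self._head = (index + 1) % len(self._slots)
        return index  # assumed to be what the completion reports back

    def complete(self, index):
        slot = self._slots[index]
        if slot is None:
            raise KeyError(f"no outstanding work request in slot {index}")
        self._slots[index] = None  # free the slot for reuse
        return slot
```

Because the slot is looked up anyway to recover wr_id, adding more fields to it
costs memory but no extra lookup, which is the point made above.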

- Fab



RE: [openib-general] [PATCH] user_mad: Add receive side RMPP support

2005-06-30 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 30, 2005 2:46 PM
> 
> Fab Tillier wrote:
> > Why not just expect the user to read a MAD until the read returns zero?
> >
> > I'm thinking something like this:
> >
> > read( offset = 0, len = 256 )
> > read( offset = 256, len =  )
> >
> > So a read at offset 0 would block waiting for the next MAD, but a read with
> > a non-zero offset would return EOF if the full MAD was read.
> >
> > Thoughts?
> 
> The user would need to know how large of a buffer to allocate for the read.
>   If the user needs to allocate two buffers, then they either need to be
> able to process data that spans multiple buffers, or a second data copy is
> required.  There's also an issue if the user is using multiple threads...

Ok, so my idea sucked.  What about not coalescing in the kernel, and just
handing up MADs to be coalesced in user-mode?  It would require a buffer
allocation in UM, as well as copies, but those currently happen in the kernel
and could be eliminated, no?
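
A minimal sketch of what user-mode coalescing could look like, in Python.  The
segment fields used here (transaction id, segment number, last-segment flag,
payload) are assumptions made for illustration and do not reflect the real
RMPP header layout or the user_mad ABI; Reassembler is a hypothetical name, and
segment numbers start at 1 in this sketch.

```python
class Reassembler:
    """Coalesces per-segment payloads into whole messages in user mode."""

    def __init__(self):
        self._partial = {}  # transaction id -> list of payloads so far

    def feed(self, tid, seg_num, is_last, payload):
        """Add one segment; return the full message once the last arrives."""
        parts = self._partial.setdefault(tid, [])
        if seg_num != len(parts) + 1:
            # Out-of-order or duplicate segment: drop the partial message.
            del self._partial[tid]
            raise ValueError(f"unexpected segment {seg_num} for {tid:#x}")
        parts.append(payload)
        if not is_last:
            return None
        del self._partial[tid]
        return b"".join(parts)  # the single user-mode copy
```

The caller only learns the total size when the last segment arrives, so no
length has to be reported up front.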

- Fab



RE: [openib-general] [PATCH] user_mad: Add receive side RMPP support

2005-06-30 Thread Fab Tillier
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 30, 2005 2:32 PM
> 
> Roland Dreier wrote:
> > I understand and agree with the sentiment of not wanting to add
> > another ioctl() to get the length.  Instead, how about returning a
> > ib_user_mad_hdr with a status of ENOSPC and putting the actual length
> > somewhere.  I'm not sure if it's better to change the ABI and add a
> > length field to ib_user_mad_hdr, or if we want to return the first 36
> > bytes of an RMPP MAD so the user can figure out the correct length.
> 
> Unless the MAD layer modifies the received MAD, the length may not be set in
> the header.  Setting this doesn't seem like a big deal.  If it is set, then
> I'm guessing that we'd want to set the PayloadLength to the value indicated
> by the spec, but that's not the easiest value to use in order to determine
> the size of the read.
> 
> Given that, I think that it makes more sense to add a length field to the
> ib_user_mad_hdr.

Why not just expect the user to read a MAD until the read returns zero?

I'm thinking something like this:

read( offset = 0, len = 256 )
read( offset = 256, len =  )

So a read at offset 0 would block waiting for the next MAD, but a read with a
non-zero offset would return EOF if the full MAD was read.
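
Spelled out as a runnable Python sketch, the read loop would look roughly like
this.  FakeMadSource is a hypothetical stand-in for the user_mad file, not the
real kernel interface, and it raises where the real read at offset 0 would
block.

```python
from collections import deque

MAD_SIZE = 256  # size of a single (non-RMPP) MAD in bytes


class FakeMadSource:
    """Hypothetical stand-in for the user_mad file; not the real kernel ABI."""

    def __init__(self, mads):
        self._pending = deque(mads)  # coalesced MADs waiting to be read
        self._current = b""          # MAD currently being read

    def read(self, offset, length):
        if offset == 0:
            # A read at offset 0 moves on to the next MAD.  The real
            # interface would block here; the sketch raises instead.
            if not self._pending:
                raise BlockingIOError("no MAD pending")
            self._current = self._pending.popleft()
        # Past the end of the current MAD this slice is empty, i.e. EOF.
        return self._current[offset:offset + length]


def read_whole_mad(src, chunk=MAD_SIZE):
    """Read one MAD in chunks until a read returns zero bytes."""
    parts = []
    offset = 0
    while True:
        data = src.read(offset, chunk)
        if not data:
            break
        parts.append(data)
        offset += len(data)
    return b"".join(parts)
```

The sketch joins the chunks into one buffer at the end, which is an extra copy
on the caller's side.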

Thoughts?

- Fab


