Re: [openib-general] prototype version of ebus driver

2005-10-28 Thread Troy Benjegerdes
On Wed, Oct 26, 2005 at 04:56:08PM +0200, IBMEHCA DD wrote:
> On kernels 2.6.13 and 2.6.14, an "ebus" driver is needed to enable the ehca
> driver on POWER5.
> I just uploaded a prototype patch to gen2/users/ehca svn 3879
> 

Please get some responses from the PPC64 maintainers, or possibly
linux-kernel.

I'd like to see ehca get reviewed as well, but it may be a little early
for that ;)
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [git pull] InfiniBand updates for 2.6.14

2005-10-28 Thread Roland Dreier
Andrew> That would be suitable, I guess.  It's a bit of a hassle,
Andrew> but some bugs will likely be found, and useful suggestions
Andrew> will be made.

No objections here... the more people I can get reading patches, the
better.  I'll see about scripting something to make it a
semi-automatic part of my workflow.

 - R.


Re: [openib-general] Re: ehca testing

2005-10-28 Thread Troy Benjegerdes
On Thu, Oct 27, 2005 at 10:03:17AM -0700, Roland Dreier wrote:
> OK, looks like you have two problems.  First of all, you seem to have
> two versions of ib_mthca, one of which gets picked up by hotplug on
> boot and one of which gets picked up by modprobe.  Notice how you
> don't see the
> 
> dev->ib_dev.node_type = 1
> 
> line when mthca runs on boot?  The only explanation I can come up with
> for that would be that you have an old version of it in an initrd or
> something that's screwing things up.

Whoops, that's exactly what's going on.. Now to figure out how to not
have IB stuff included in my initrd..


[openib-general] Re: [git pull] InfiniBand updates for 2.6.14

2005-10-28 Thread Andrew Morton
Roland Dreier <[EMAIL PROTECTED]> wrote:
>
> Andrew> a) arrange for the current infiniband devel tree to be
> Andrew> included in -mm and
> 
> Sure.  How do you want to handle that?  The way I've been working
> lately is to merge things onto my "upstream" branch when I intend for
> them to go to Linus eventually, and merge that onto the "for-linus"
> branch when I'm going to ask Linus to pull.  I guess it would make
> sense for you to grab the upstream branch for -mm.

That suits.  I'll include

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git#upstream

> Andrew> b) arrange for infiniband patches to get wider review than this?
> 
> No objection from me.  How do you suggest I do that?  Post things to
> linux-kernel as I merge them into git?

That would be suitable, I guess.  It's a bit of a hassle, but some bugs
will likely be found, and useful suggestions will be made.


[openib-general] Re: [git pull] InfiniBand updates for 2.6.14

2005-10-28 Thread Roland Dreier
Andrew> a) arrange for the current infiniband devel tree to be
Andrew> included in -mm and

Sure.  How do you want to handle that?  The way I've been working
lately is to merge things onto my "upstream" branch when I intend for
them to go to Linus eventually, and merge that onto the "for-linus"
branch when I'm going to ask Linus to pull.  I guess it would make
sense for you to grab the upstream branch for -mm.

Andrew> b) arrange for infiniband patches to get wider review than this?

No objection from me.  How do you suggest I do that?  Post things to
linux-kernel as I merge them into git?

Thanks,
  Roland


[openib-general] Re: [git pull] InfiniBand updates for 2.6.14

2005-10-28 Thread Andrew Morton
Roland Dreier <[EMAIL PROTECTED]> wrote:
>
>  43 files changed, 2675 insertions(+), 1773 deletions(-)

That's rather a lot of code.  AFAIK it hasn't been past linux-kernel.  It
hasn't been in -mm.

Can we please

a) arrange for the current infiniband devel tree to be included in -mm and

b) arrange for infiniband patches to get wider review than this?

Thanks.


[openib-general] [git pull] InfiniBand updates for 2.6.14

2005-10-28 Thread Roland Dreier
Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

The pull will get the following changes:

Jack Morgenstein:
  [IB] Add checks to multicast attach and detach
  [IB] mthca: Report correct atomic capability
  [IB] mthca: Fill in more fields in query_port method
  [IB] mthca: Better limit checking and reporting
  [IB] mthca: Don't enter QP into MCG more than once.

Roland Dreier:
  [IB] uverbs: ABI-breaking fixes for userspace verbs
  [IB] uverbs: Fix up resource creation error paths
  [IB] uverbs: Add device-specific ABI version attribute
  [IB] uverbs: reject invalid memory registration permission flags
  [IB] Check port number in ib_query_port()/ib_modify_port()
  [IB] mthca: SRQ limit reached events
  [IB] mthca: detect SRQ overflow
  [IB] Fix leak on MAD initialization failure
  [IPoIB] Rename ipoib_create_qp() -> ipoib_init_qp() and fix error cleanup
  [IB] uverbs: unlock correctly in error paths
  [IB] fail SA queries if device initialization failed
  [IB] uverbs: Add a mask of device methods allowed for userspace
  [IB] uverbs: Add ABI structures for more commands
  [IB] uverbs: Implement more commands
  [IB] ucm: quiet sparse warnings
  [IPoIB] Improve ipoib_timeout() output
  [IB] mthca: Use enum in mthca_alloc_db() prototype
  [IB] mthca: Add struct pci_driver.owner field
  [IB] Fail sysfs queries after device is unregistered
  [IB] cm: Add missing break in switch
  [IB] user_mad: trivial coding style fixes
  [IB] user_mad: Use class_device.devt
  [IB] mthca: Always re-arm EQs in mthca_tavor_interrupt()
  Merge master.kernel.org:/.../torvalds/linux-2.6
  [IB] Add idr_destroy() calls on module unload
  Manual merge of for-linus to upstream (fix conflicts in drivers/infiniband/core/ucm.c)
  [IB] mthca: correct modify QP attribute masks for UC
  [IB] simplify mad_rmpp.c:alloc_response_msg()
  [IB] mthca: first pass at catastrophic error reporting
  [IB] ib_umad: fix crash when freeing send buffers
  [IPoIB] Drop RX packets when out of memory
  [IB] umad: Fix device lifetime problems
  [IB] uverbs: Fix device lifetime problems
  Merge master.kernel.org:/.../torvalds/linux-2.6
  [IB] fix up class_device_create() calls

Sean Hefty:
  [IB] merge ucm.h into ucm.c
  [IB] CM: bind IDs to a specific device
  [IB] CM: Fix initialization of QP attributes for UC QPs.
  [IB] Fix MAD layer DMA mappings to avoid touching data buffer once mapped
  [IB] ib_umad: various cleanups

 drivers/infiniband/core/agent.c  |  301 ++---
 drivers/infiniband/core/agent.h  |   13 
 drivers/infiniband/core/agent_priv.h |   62 --
 drivers/infiniband/core/cm.c |  217 +++
 drivers/infiniband/core/cm_msgs.h|1 
 drivers/infiniband/core/device.c |   12 
 drivers/infiniband/core/mad.c|  329 +-
 drivers/infiniband/core/mad_priv.h   |8 
 drivers/infiniband/core/mad_rmpp.c   |  112 ++-
 drivers/infiniband/core/mad_rmpp.h   |2 
 drivers/infiniband/core/sa_query.c   |  272 
 drivers/infiniband/core/smi.h|2 
 drivers/infiniband/core/sysfs.c  |   16 
 drivers/infiniband/core/ucm.c|  267 ++--
 drivers/infiniband/core/ucm.h|   83 ---
 drivers/infiniband/core/user_mad.c   |  403 ++--
 drivers/infiniband/core/uverbs.h |   62 +-
 drivers/infiniband/core/uverbs_cmd.c |  858 +-
 drivers/infiniband/core/uverbs_main.c|  503 ++-
 drivers/infiniband/core/verbs.c  |   18 -
 drivers/infiniband/hw/mthca/Makefile |3 
 drivers/infiniband/hw/mthca/mthca_catas.c|  153 +
 drivers/infiniband/hw/mthca/mthca_cmd.c  |   11 
 drivers/infiniband/hw/mthca/mthca_dev.h  |   22 +
 drivers/infiniband/hw/mthca/mthca_eq.c   |   21 +
 drivers/infiniband/hw/mthca/mthca_mad.c  |   72 --
 drivers/infiniband/hw/mthca/mthca_main.c |   11 
 drivers/infiniband/hw/mthca/mthca_mcg.c  |   11 
 drivers/infiniband/hw/mthca/mthca_memfree.c  |3 
 drivers/infiniband/hw/mthca/mthca_memfree.h  |3 
 drivers/infiniband/hw/mthca/mthca_provider.c |   49 +
 drivers/infiniband/hw/mthca/mthca_qp.c   |   16 
 drivers/infiniband/hw/mthca/mthca_srq.c  |   43 +
 drivers/infiniband/hw/mthca/mthca_user.h |6 
 drivers/infiniband/ulp/ipoib/ipoib.h |   23 -
 drivers/infiniband/ulp/ipoib/ipoib_ib.c  |  122 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c|   15 
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c   |9 
 include/rdma/ib_cm.h |   10 
 include/rdma/ib_mad.h|   66 +-
 include/rdma/ib_user_cm.h|   10 
 include/rdma/ib_user_verbs.h 

Re: uDAPL Problem : [WasRe: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Arlin Davis

Aniruddha Bohra wrote:

> Now, I have a problem with udapl :
>
> The following is a code snippet from :
> dapl_ib_dto.h
>
> for (i = 0; i < segments; i++ ) {
>     if ( !local_iov[i].segment_length )
>         continue;
>
>     ds_array_p->addr  = (uint64_t) local_iov[i].virtual_address;
>     ds_array_p->length = local_iov[i].segment_length;
>     ds_array_p->lkey  = local_iov[i].lmr_context;
>
>     dapl_dbg_log (  DAPL_DBG_TYPE_EP,
>                     " post_snd: lkey 0x%x va %p len %d \n",
>                     ds_array_p->lkey, ds_array_p->addr,
>                     ds_array_p->length );
>
>     total_len += ds_array_p->length;
>     wr.num_sge++;
>     ds_array_p++;
> }
>
> The following is the relevant part of the log with DAPL_DBG_TYPE=0x
>
> dapl_ep_post_send (0x8087110, 2, 0x80f9910, %P, b5f395bc)
> post_snd: ep 0x8087110 op 2 ck 0x8087374 sgs 2 l_iov 0x80f9910 r_iov 0xbfc29060 f 0
>
> post_snd: ep 0x8087110 cookie 0x8087374 segs 2 l_iov 0x80f9910
> post_snd: lkey 0x10de003b va 0xb5f3976c len 0
> post_snd: lkey 0x10de003b va 0xb5f39924 len 0
>
> From the above loop, how is this possible :
> If local_iov[i].segment_length == 0, it should not be printed. And
> if the assignment is successful, len must not be 0.
>
> Any ideas? Of course following this, the ep is disconnected in the
> next step :(

The local_iov (LMR) length is 64 bits and the ibv_sge (ds_array) length is
32 bits, so it truncates.

Sounds like you set up a transfer greater than 4GB-1?

If you query the device via uDAPL you will see the max limits (2GB):

query_hca: (a0.0) ep 64512 ep_q 65535 evd 65408 evd_q 131071
query_hca: msg 2147483648 rdma 2147483648 iov 59 lmr 131056 rmr 0
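
To make the truncation concrete, here is a minimal sketch (not the uDAPL
code; example_truncate_length() and example_set_sge_length() are hypothetical
names used only for illustration). Assigning a 64-bit length to a 32-bit
field keeps only the low 32 bits, so a length that is an exact multiple of
4GB would pass the "!segment_length" check and still be logged as len 0:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: mimics an unchecked assignment of a 64-bit
 * segment length into a 32-bit scatter/gather length field.
 * Only the low 32 bits survive the conversion.
 */
uint32_t example_truncate_length(uint64_t segment_length)
{
	return (uint32_t)segment_length;
}

/*
 * Hypothetical checked variant: refuses lengths that do not fit in
 * 32 bits instead of silently truncating them.
 * Returns true and stores the length on success, false otherwise.
 */
bool example_set_sge_length(uint32_t *sge_length, uint64_t segment_length)
{
	if (segment_length > UINT32_MAX)
		return false;

	*sge_length = (uint32_t)segment_length;
	return true;
}
```

With these helpers, example_truncate_length(1ULL << 32) yields 0 even though
the 64-bit value is nonzero, while example_set_sge_length() rejects the same
value. Whether a check like this belongs in the provider or in the caller is
a design choice this sketch does not settle.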

-arlin







[openib-general] [PATCH] fix umad object lifetime stuff

2005-10-28 Thread Roland Dreier
I just committed the following patch for user_mad.c, which fixes
various issues with possibly freeing various data structures before
the last reference is gone.  For example, cdev_del() might return
before the last reference to the cdev is gone, so freeing a structure
containing the cdev is wrong at that point.  (Side note: it's
essentially impossible to use cdev_init() safely unless the cdev in
question is statically allocated as part of the module).

Something like this is probably required for ucm and anything else
that exports a character device, since everyone seems to have copied
my bad user_mad code.  But I haven't had a chance to do anything
beyond user_mad and uverbs so far...
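
The general pattern can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the committed kernel code: every
example_* name is hypothetical, a plain int stands in for struct kref,
example_lock()/example_unlock() are no-op stand-ins for taking a spinlock,
and the sketch is single-threaded. The point is the ordering: open looks up
the table entry and takes a reference under the lock, and removal clears the
table entry under the lock before dropping the owner's reference.

```c
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_MAX_PORTS 4

/* Hypothetical refcounted object; refcount stands in for struct kref. */
struct example_device {
	int refcount;
	int *freed_flag;	/* set to 1 on release, for demonstration only */
};

static struct example_device *example_table[EXAMPLE_MAX_PORTS];

/* No-op stand-ins for a real lock protecting example_table[]. */
static void example_lock(void) { }
static void example_unlock(void) { }

/* Drop one reference; free the object when the last one goes away. */
void example_put(struct example_device *dev)
{
	if (--dev->refcount == 0) {
		if (dev->freed_flag)
			*dev->freed_flag = 1;
		free(dev);
	}
}

/* Allocate a device holding the owner's initial reference and publish it. */
struct example_device *example_add(int slot, int *freed_flag)
{
	struct example_device *dev;

	if (slot < 0 || slot >= EXAMPLE_MAX_PORTS)
		return NULL;

	dev = calloc(1, sizeof *dev);
	if (!dev)
		return NULL;

	dev->refcount = 1;
	dev->freed_flag = freed_flag;

	example_lock();
	example_table[slot] = dev;
	example_unlock();
	return dev;
}

/* "open": look up and take a reference while holding the lock. */
struct example_device *example_open(int slot)
{
	struct example_device *dev;

	if (slot < 0 || slot >= EXAMPLE_MAX_PORTS)
		return NULL;

	example_lock();
	dev = example_table[slot];
	if (dev)
		dev->refcount++;
	example_unlock();
	return dev;
}

/* "remove": unpublish under the lock, then drop the owner's reference. */
void example_remove(int slot)
{
	struct example_device *dev;

	if (slot < 0 || slot >= EXAMPLE_MAX_PORTS)
		return;

	example_lock();
	dev = example_table[slot];
	example_table[slot] = NULL;
	example_unlock();

	if (dev)
		example_put(dev);
}
```

In this sketch, an object opened before example_remove() stays allocated
until the matching example_put(), and any example_open() after the removal
fails because the table entry is already cleared.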

 - R.

--- infiniband/core/user_mad.c  (revision 3890)
+++ infiniband/core/user_mad.c  (working copy)
@@ -64,18 +64,39 @@ enum {
IB_UMAD_MINOR_BASE = 0
 };
 
+/*
+ * Our lifetime rules for these structs are the following: each time a
+ * device special file is opened, we look up the corresponding struct
+ * ib_umad_port by minor in the umad_port[] table while holding the
+ * port_lock.  If this lookup succeeds, we take a reference on the
+ * ib_umad_port's struct ib_umad_device while still holding the
+ * port_lock; if the lookup fails, we fail the open().  We drop these
+ * references in the corresponding close().
+ *
+ * In addition to references coming from open character devices, there
+ * is one more reference to each ib_umad_device representing the
+ * module's reference taken when allocating the ib_umad_device in
+ * ib_umad_add_one().
+ *
+ * When destroying an ib_umad_device, we clear all of its
+ * ib_umad_ports from umad_port[] while holding port_lock before
+ * dropping the module's reference to the ib_umad_device.  This is
+ * always safe because any open() calls will either succeed and obtain
+ * a reference before we clear the umad_port[] entries, or fail after
+ * we clear the umad_port[] entries.
+ */
+
 struct ib_umad_port {
-   intdevnum;
-   struct cdevdev;
-   struct class_deviceclass_dev;
-
-   intsm_devnum;
-   struct cdevsm_dev;
-   struct class_devicesm_class_dev;
+   struct cdev   *dev;
+   struct class_device   *class_dev;
+
+   struct cdev   *sm_dev;
+   struct class_device   *sm_class_dev;
struct semaphore   sm_sem;
 
struct ib_device  *ib_dev;
struct ib_umad_device *umad_dev;
+   intdev_num;
u8 port_num;
 };
 
@@ -102,13 +123,25 @@ struct ib_umad_packet {
struct ib_user_mad mad;
 };
 
+static struct class *umad_class;
+
 static const dev_t base_dev = MKDEV(IB_UMAD_MAJOR, IB_UMAD_MINOR_BASE);
-static spinlock_t map_lock;
+
+static DEFINE_SPINLOCK(port_lock);
+static struct ib_umad_port *umad_port[IB_UMAD_MAX_PORTS];
 static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS * 2);
 
 static void ib_umad_add_one(struct ib_device *device);
 static void ib_umad_remove_one(struct ib_device *device);
 
+static void ib_umad_release_dev(struct kref *ref)
+{
+   struct ib_umad_device *dev =
+   container_of(ref, struct ib_umad_device, ref);
+
+   kfree(dev);
+}
+
 static int queue_packet(struct ib_umad_file *file,
struct ib_mad_agent *agent,
struct ib_umad_packet *packet)
@@ -534,13 +567,23 @@ static long ib_umad_ioctl(struct file *f
 
 static int ib_umad_open(struct inode *inode, struct file *filp)
 {
-   struct ib_umad_port *port =
-   container_of(inode->i_cdev, struct ib_umad_port, dev);
+   struct ib_umad_port *port;
struct ib_umad_file *file;
 
+   spin_lock(&port_lock);
+   port = umad_port[iminor(inode) - IB_UMAD_MINOR_BASE];
+   if (port)
+   kref_get(&port->umad_dev->ref);
+   spin_unlock(&port_lock);
+
+   if (!port)
+   return -ENXIO;
+
file = kzalloc(sizeof *file, GFP_KERNEL);
-   if (!file)
+   if (!file) {
+   kref_put(&port->umad_dev->ref, ib_umad_release_dev);
return -ENOMEM;
+   }
 
spin_lock_init(&file->recv_lock);
init_rwsem(&file->agent_mutex);
@@ -556,6 +599,7 @@ static int ib_umad_open(struct inode *in
 static int ib_umad_close(struct inode *inode, struct file *filp)
 {
struct ib_umad_file *file = filp->private_data;
+   struct ib_umad_device *dev = file->port->umad_dev;
struct ib_umad_packet *packet, *tmp;
int i;
 
@@ -570,6 +614,8 @@ static int ib_umad_close(struct inode *i
 
kfree(file);
 
+   kref_put(&dev->ref, ib_umad_release_dev);
+
return 0;
 }
 
@@ -586,30 +632,46 @@ static struct file_operations umad_fops 
 
 static int ib_umad_sm_open(struct inode *inode, struct file *filp)
 {
-   struct ib_umad_port *port =
-   container_of(inode->i_cdev, struct ib_umad_port, sm_dev);
+   struct ib_umad_p

uDAPL Problem : [WasRe: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Aniruddha Bohra

Roland Dreier wrote:

> > OK so, what options do I have right now -- compile a new kernel and
> > apply patches and
> > continue, or is there some patch that I can apply ?
>
> I don't think anyone has prepared a kzalloc() patch, but just adding
> something like
>
> static void *kzalloc(size_t size, unsigned int flags)
> {
>         void *ret = kmalloc(size, flags);
>         if (ret)
>                 memset(ret, 0, size);
>         return ret;
> }
>
> to files that use kzalloc() should let you use 2.6.13 (assuming there
> are no other incompatibilities).




Thanks, that works.

Now, I have a problem with udapl :

The following is a code snippet from :
dapl_ib_dto.h

for (i = 0; i < segments; i++ ) {
    if ( !local_iov[i].segment_length )
        continue;

    ds_array_p->addr  = (uint64_t) local_iov[i].virtual_address;
    ds_array_p->length = local_iov[i].segment_length;
    ds_array_p->lkey  = local_iov[i].lmr_context;

    dapl_dbg_log (  DAPL_DBG_TYPE_EP,
                    " post_snd: lkey 0x%x va %p len %d \n",
                    ds_array_p->lkey, ds_array_p->addr,
                    ds_array_p->length );

    total_len += ds_array_p->length;
    wr.num_sge++;
    ds_array_p++;
}

The following is the relevant part of the log with DAPL_DBG_TYPE=0x

dapl_ep_post_send (0x8087110, 2, 0x80f9910, %P, b5f395bc)
post_snd: ep 0x8087110 op 2 ck 0x8087374 sgs 2 l_iov 0x80f9910 r_iov 0xbfc29060 f 0

post_snd: ep 0x8087110 cookie 0x8087374 segs 2 l_iov 0x80f9910
post_snd: lkey 0x10de003b va 0xb5f3976c len 0
post_snd: lkey 0x10de003b va 0xb5f39924 len 0




From the above loop, how is this possible :
If local_iov[i].segment_length == 0, it should not be printed. And
if the assignment is successful, len must not be 0.

Any ideas? Of course following this, the ep is disconnected in the next
step :(


Also a minor patch, you can see that %P is printed as %P and not used as
a format character.

Index: common/dapl_ep_post_rdma_write.c
===
--- common/dapl_ep_post_rdma_write.c(revision 3892)
+++ common/dapl_ep_post_rdma_write.c(working copy)
@@ -78,7 +78,7 @@
DAT_RETURN dat_status;

dapl_dbg_log (DAPL_DBG_TYPE_API,
- "dapl_ep_post_rdma_write (%p, %d, %p, %P, %p, %x)\n",
+ "dapl_ep_post_rdma_write (%p, %d, %p, %p, %p, %x)\n",
 ep_handle,
 num_segments,
 local_iov,
Index: common/dapl_ep_post_send.c
===
--- common/dapl_ep_post_send.c  (revision 3892)
+++ common/dapl_ep_post_send.c  (working copy)
@@ -75,7 +75,7 @@
DAT_RETURN dat_status;

dapl_dbg_log (DAPL_DBG_TYPE_API,
- "dapl_ep_post_send (%p, %d, %p, %P, %x)\n",
+ "dapl_ep_post_send (%p, %d, %p, %p, %x)\n",
 ep_handle,
 num_segments,
 local_iov,
Index: common/dapl_srq_post_recv.c
===
--- common/dapl_srq_post_recv.c (revision 3892)
+++ common/dapl_srq_post_recv.c (working copy)
@@ -79,7 +79,7 @@
DAT_RETURN dat_status;

dapl_dbg_log (DAPL_DBG_TYPE_API,
- "dapl_srq_post_recv (%p, %d, %p, %P)\n",
+ "dapl_srq_post_recv (%p, %d, %p, %p)\n",
 srq_handle,
 num_segments,
 local_iov,
Index: common/dapl_ep_post_recv.c
===
--- common/dapl_ep_post_recv.c  (revision 3892)
+++ common/dapl_ep_post_recv.c  (working copy)
@@ -79,7 +79,7 @@
DAT_RETURN dat_status;

dapl_dbg_log (DAPL_DBG_TYPE_API,
- "dapl_ep_post_recv (%p, %d, %p, %P, %x)\n",
+ "dapl_ep_post_recv (%p, %d, %p, %p, %x)\n",
 ep_handle,
 num_segments,
 local_iov,

Thanks
Aniruddha





[openib-general] Re: [PATCH] [SRP] srp_cm_handler expanded response handling

2005-10-28 Thread Roland Dreier
Thanks, applied.

 - R.


Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Roland Dreier
> OK so, what options do I have right now -- compile a new kernel and
> apply patches and
> continue, or is there some patch that I can apply ?

I don't think anyone has prepared a kzalloc() patch, but just adding
something like

static void *kzalloc(size_t size, unsigned int flags)
{
        void *ret = kmalloc(size, flags);
        if (ret)
                memset(ret, 0, size);
        return ret;
}

to files that use kzalloc() should let you use 2.6.13 (assuming there
are no other incompatibilities).
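
A compatibility shim like this is usually guarded by a version check so it
only applies to older kernels. The following is a userspace sketch of that
pattern, not kernel code: EXAMPLE_KERNEL_VERSION() uses the same packing as
the kernel's KERNEL_VERSION() macro, EXAMPLE_VERSION_CODE is a hypothetical
stand-in for LINUX_VERSION_CODE (assumed here to be 2.6.13), and malloc()
stands in for kmalloc().

```c
#include <stdlib.h>
#include <string.h>

/* Same (a << 16) + (b << 8) + c packing as the kernel's KERNEL_VERSION(). */
#define EXAMPLE_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical stand-in for LINUX_VERSION_CODE: pretend we build on 2.6.13. */
#ifndef EXAMPLE_VERSION_CODE
#define EXAMPLE_VERSION_CODE EXAMPLE_KERNEL_VERSION(2, 6, 13)
#endif

#if EXAMPLE_VERSION_CODE < EXAMPLE_KERNEL_VERSION(2, 6, 14)
/* Older than 2.6.14: provide the zeroing allocator ourselves. */
#define EXAMPLE_NEEDS_ZALLOC_SHIM 1

void *example_zalloc(size_t size)
{
	void *ret = malloc(size);	/* malloc() stands in for kmalloc() */

	if (ret)
		memset(ret, 0, size);
	return ret;
}
#else
/* 2.6.14 or newer: the real allocator already exists, so no shim. */
#define EXAMPLE_NEEDS_ZALLOC_SHIM 0
#define example_zalloc(size) calloc(1, (size))
#endif
```

With the assumed version of 2.6.13 the shim branch is compiled in. Building
with EXAMPLE_VERSION_CODE set to 2.6.14 or later selects the other branch.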

 - R.


Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Aniruddha Bohra

Roland Dreier wrote:

> > With 3892 I now get the following warnings on compilation:
> > WARNING:
> > /lib/modules/2.6.13bohra/kernel/drivers/infiniband/hw/mthca/ib_mthca.ko
> > needs unknown symbol kzalloc
> > WARNING:
> > /lib/modules/2.6.13bohra/kernel/drivers/infiniband/core/ib_umad.ko
> > needs unknown symbol kzalloc
>
> Yes, kzalloc() was added in 2.6.14.  Now that 2.6.14 has been
> released, the subversion trunk is targeted against that kernel rather
> than the old 2.6.13 release.
>
> - R.

OK so, what options do I have right now -- compile a new kernel and
apply patches and continue, or is there some patch that I can apply?

Thanks
Aniruddha



Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Roland Dreier
> With 3892 I now get the following warnings on compilation:
> WARNING:
> /lib/modules/2.6.13bohra/kernel/drivers/infiniband/hw/mthca/ib_mthca.ko
> needs unknown symbol kzalloc
> WARNING:
> /lib/modules/2.6.13bohra/kernel/drivers/infiniband/core/ib_umad.ko
> needs unknown symbol kzalloc

Yes, kzalloc() was added in 2.6.14.  Now that 2.6.14 has been
released, the subversion trunk is targeted against that kernel rather
than the old 2.6.13 release.

 - R.


Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Aniruddha Bohra

Roland Dreier wrote:

> > Now there is an OOPS in the dmesg :
>
> This really looks like the bug I fixed in r3889.  What svn rev are
> your kernel modules built from?
>
> - R.


And of course, the module does not load :
Oct 28 16:21:57 hora-3 kernel: ib_mthca: Unknown symbol kzalloc
Oct 28 16:21:58 hora-3 kernel: ib_umad: Unknown symbol kzalloc

Aniruddha




Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Aniruddha Bohra

Roland Dreier wrote:

> > Now there is an OOPS in the dmesg :
>
> This really looks like the bug I fixed in r3889.  What svn rev are
> your kernel modules built from?
>
> - R.

With 3892 I now get the following warnings on compilation:
WARNING: /lib/modules/2.6.13bohra/kernel/drivers/infiniband/hw/mthca/ib_mthca.ko needs unknown symbol kzalloc
WARNING: /lib/modules/2.6.13bohra/kernel/drivers/infiniband/core/ib_umad.ko needs unknown symbol kzalloc



Aniruddha



Re: [openib-general] OpenSM causes kernel trap

2005-10-28 Thread Roland Dreier
Jay> I also looked into "user_mad.c" and see that you don't have
Jay> the same compatibility defines for kzalloc that you used in
Jay> sdp.

Right, now that 2.6.14 is out, we won't try to maintain backward
compatibility with 2.6.13 in the main subversion trunk.

 - R.


Re: [openib-general] OpenSM causes kernel trap

2005-10-28 Thread Jay Higley

Roland Dreier wrote:

> Jay> I am using the source downloaded from kernel.org for 2.6.14
> Jay> with only the sk98lin and infiniband patches.  What newer
> Jay> headers are you referring to?  Where are they supposed to be
> Jay> located?
>
> If you link a subversion tree into your kernel's drivers/infiniband
> subdirectory, then you have to rm -rf include/rdma in your kernel
> tree, or else the build will pick up the old headers from the kernel
> tree instead of the new headers from the subversion tree.
>
> - R.

Thanks.  I'll try that.  I also looked into "user_mad.c" and see that
you don't have the same compatibility defines for kzalloc that you used
in sdp.  I've attempted to duplicate them and am trying the recompile
with 2.6.13 right now.


-Jay


Re: [openib-general] cq callback question

2005-10-28 Thread Roland Dreier
Steve> This may seem like a dumb question, but can a kernel ULP
Steve> assume that after returning from ib_destroy_qp(), there
Steve> will be no more callbacks for that QP on the associated cq
Steve> event handler?

No, I don't think that's a valid assumption, at least with the current
code.  Also, there's no requirement in Documentation/infiniband/core_locking.txt
that destroy QP operations synchronize against CQ callbacks.

It is valid to assume that no callbacks will happen after
ib_destroy_cq() returns.

 - R.


Re: [openib-general] OpenSM causes kernel trap

2005-10-28 Thread Roland Dreier
Jay> I am using the source downloaded from kernel.org for 2.6.14
Jay> with only the sk98lin and infiniband patches.  What newer
Jay> headers are you referring to?  Where are they supposed to be
Jay> located?

If you link a subversion tree into your kernel's drivers/infiniband
subdirectory, then you have to rm -rf include/rdma in your kernel
tree, or else the build will pick up the old headers from the kernel
tree instead of the new headers from the subversion tree.

 - R.


Re: [openib-general] OpenSM causes kernel trap

2005-10-28 Thread Jay Higley

Sean Hefty wrote:

> Jay Higley wrote:
>
> > I updated to version 3891 and tried it with the 2.6.13.4 Kernel that
> > I was using and got unresolved symbol errors for kzalloc.  I upgraded
> > the kernel to 2.6.14 and tried again and got the below compile
> > errors.  As an aside, when I was running the unpatched openSM on a
> > single-processor system I occasionally got it to start up, but it
> > would hang and the port state would never change.  The same sort of
> > behavior as in the "OpenSM crash with today's trunk" thread.
>
> It looks like you have old header files (possibly the original ones
> shipped with 2.6.14).
>
> I'm updating my systems to 2.6.14 at the moment, and will start
> testing this once done.
>
> - Sean

I am using the source downloaded from kernel.org for 2.6.14 with only the
sk98lin and infiniband patches.  What newer headers are you referring
to?  Where are they supposed to be located?


-Jay


Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Roland Dreier
> Now there is an OOPS in the dmesg :

This really looks like the bug I fixed in r3889.  What svn rev are
your kernel modules built from?

 - R.


[openib-general] Re: [PATCH] add node_guid to struct ib_device

2005-10-28 Thread Roland Dreier
Sean> Thanks.  I forgot to include the changes to sysfs.c in my
Sean> previous patch.  Not sure if we want to wait on this until
Sean> the other drivers have been updated.  We'll probably want to
Sean> remove node_guid from the device attributes as well.

Yes, I think that needs to wait until you or someone else updates
ipath and ehca.

 - R.


Re: [openib-general] OpenSM causes kernel trap

2005-10-28 Thread Sean Hefty

Jay Higley wrote:
> I updated to version 3891 and tried it with the 2.6.13.4 Kernel that I
> was using and got unresolved symbol errors for kzalloc.  I upgraded the
> kernel to 2.6.14 and tried again and got the below compile errors.  As an
> aside, when I was running the unpatched openSM on a single-processor
> system I occasionally got it to start up, but it would hang and the port
> state would never change.  The same sort of behavior as in the "OpenSM
> crash with today's trunk" thread.


It looks like you have old header files (possibly the original ones shipped with 
2.6.14).


I'm updating my systems to 2.6.14 at the moment, and will start testing this 
once done.


- Sean


Re: [openib-general] OpenSM causes kernel trap

2005-10-28 Thread Jay Higley

Roland Dreier wrote:

> BTW, Jay, can you confirm that this patch fixes your problem too?
>
> Thanks,
>  Roland

I updated to version 3891 and tried it with the 2.6.13.4 Kernel that I
was using and got unresolved symbol errors for kzalloc.  I upgraded the
kernel to 2.6.14 and tried again and got the below compile errors.  As an
aside, when I was running the unpatched openSM on a single-processor
system I occasionally got it to start up, but it would hang and the port
state would never change.  The same sort of behavior as in the "OpenSM
crash with today's trunk" thread.


-Jay Higley

  CC [M]  drivers/infiniband/core/addr.o
  CC [M]  net/sched/em_text.o
  CC [M]  net/sctp/outqueue.o
  CC [M]  net/sunrpc/xprt.o
drivers/infiniband/core/addr.c:330: warning: initialization from incompatible pointer type
  CC [M]  net/sctp/ulpqueue.o
  CC [M]  drivers/infiniband/core/at.o
drivers/infiniband/core/at.c:1547: warning: initialization from incompatible pointer type
  CC [M]  drivers/infiniband/core/cm.o
  CC [M]  net/sctp/command.o
drivers/infiniband/core/cm.c: In function `cm_alloc_msg':
drivers/infiniband/core/cm.c:179: error: `IB_MGMT_MAD_HDR' undeclared (first use in this function)
drivers/infiniband/core/cm.c:179: error: (Each undeclared identifier is reported only once
drivers/infiniband/core/cm.c:179: error: for each function it appears in.)
drivers/infiniband/core/cm.c:180: error: too few arguments to function `ib_create_send_mad'
drivers/infiniband/core/cm.c:187: error: structure has no member named `ah'
drivers/infiniband/core/cm.c:188: error: structure has no member named `retries'
drivers/infiniband/core/cm.c: In function `cm_alloc_response_msg':
drivers/infiniband/core/cm.c:209: error: `IB_MGMT_MAD_HDR' undeclared (first use in this function)
drivers/infiniband/core/cm.c:210: error: too few arguments to function `ib_create_send_mad'
drivers/infiniband/core/cm.c:215: error: structure has no member named `ah'
drivers/infiniband/core/cm.c: In function `cm_free_msg':
drivers/infiniband/core/cm.c:222: error: structure has no member named `ah'
drivers/infiniband/core/cm.c: In function `cm_insert_listen':
drivers/infiniband/core/cm.c:371: error: structure has no member named `device'
drivers/infiniband/core/cm.c:371: error: structure has no member named `device'
drivers/infiniband/core/cm.c:374: error: structure has no member named `device'
drivers/infiniband/core/cm.c:374: error: structure has no member named `device'
drivers/infiniband/core/cm.c:376: error: structure has no member named `device'
drivers/infiniband/core/cm.c:376: error: structure has no member named `device'
drivers/infiniband/core/cm.c: In function `cm_find_listen':
drivers/infiniband/core/cm.c:398: error: structure has no member named `device'
drivers/infiniband/core/cm.c:401: error: structure has no member named `device'
drivers/infiniband/core/cm.c:403: error: structure has no member named `device'
drivers/infiniband/core/cm.c: At top level:
drivers/infiniband/core/cm.c:543: error: conflicting types for 'ib_create_cm_id'
include/rdma/ib_cm.h:306: error: previous declaration of 'ib_create_cm_id' was here
drivers/infiniband/core/cm.c:543: error: conflicting types for 'ib_create_cm_id'
include/rdma/ib_cm.h:306: error: previous declaration of 'ib_create_cm_id' was here
drivers/infiniband/core/cm.c: In function `ib_create_cm_id':
drivers/infiniband/core/cm.c:553: error: structure has no member named `device'
drivers/infiniband/core/cm.c: In function `ib_destroy_cm_id':
drivers/infiniband/core/cm.c:681: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast
drivers/infiniband/core/cm.c:692: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast
drivers/infiniband/core/cm.c:709: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast
drivers/infiniband/core/cm.c: In function `ib_send_cm_req':
drivers/infiniband/core/cm.c:935: error: structure has no member named `timeout_ms'
drivers/infiniband/core/cm.c:944: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type
drivers/infiniband/core/cm.c:944: error: too few arguments to function `ib_post_send_mad'
drivers/infiniband/core/cm.c: In function `cm_issue_rej':
drivers/infiniband/core/cm.c:989: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type
drivers/infiniband/core/cm.c:989: error: too few arguments to function `ib_post_send_mad'
drivers/infiniband/core/cm.c: In function `cm_dup_req_handler':
drivers/infiniband/core/cm.c:1197: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type
drivers/infiniband/core/cm.c:1197: error: too few arguments to function `ib_post_send_mad'
drivers/infiniband/core/cm.c: In function `cm_match_req':
drivers/infiniband/core/cm.c:1237: error: structure has no member named `device'
drivers/infiniband/core/cm.c: In function `ib_send_cm_rep':
drivers/inf

Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Aniruddha Bohra

Hal Rosenstock wrote:
> Or perhaps something crashed and didn't clean up properly. Does this occur
> immediately after a boot ?

After a fresh reboot of the machines on the switch, I get the log at
http://www.cs.rutgers.edu/~bohra/osm-v2.log

The opensm process does not crash but hangs. The state of the port never 
changes.


Now there is an oops in dmesg:

Oct 28 13:52:13 hora-3 OpenSM[5168]: OpenSM Rev:openib-1.1.0
Oct 28 13:52:14 hora-3 kernel: Unable to handle kernel paging request at virtual address 0910
Oct 28 13:52:14 hora-3 kernel:  printing eip:
Oct 28 13:52:14 hora-3 kernel: f883f12d
Oct 28 13:52:14 hora-3 kernel: *pde = 
Oct 28 13:52:14 hora-3 kernel: Oops:  [#1]
Oct 28 13:52:14 hora-3 kernel: SMP
Oct 28 13:52:14 hora-3 kernel: Modules linked in: ib_uverbs ib_umad ipv6 i2c_dev i2c_core sunrpc dm_mod video button battery ac uhci_hcd hw_random ib_mthca ib_mad ib_core e1000 floppy
Oct 28 13:52:14 hora-3 kernel: CPU:1
Oct 28 13:52:14 hora-3 kernel: EIP:0060:[]Not tainted VLI
Oct 28 13:52:14 hora-3 kernel: EFLAGS: 00010286   (2.6.13bohra)
Oct 28 13:52:14 hora-3 kernel: EIP is at ib_post_send_mad+0x1c/0x1b1 [ib_mad]
Oct 28 13:52:14 hora-3 kernel: eax: 0900   ebx: c1a7d900   ecx: c1a7d918   edx: 
Oct 28 13:52:14 hora-3 kernel: esi: c1a7d918   edi: f6571f68   ebp: f6571efc   esp: f6571ed8
Oct 28 13:52:14 hora-3 kernel: ds: 007b   es: 007b   ss: 0068
Oct 28 13:52:14 hora-3 kernel: Process opensm (pid: 5224, threadinfo=f657 task=f7dfb020)
Oct 28 13:52:14 hora-3 kernel: Stack: f883ef5a  c1a7d800 080bd018 f6571efc  f6a42900 a0f684f6
Oct 28 13:52:14 hora-3 kernel:f6571f68 f6571f74 f88f1728  0018 00e8 00d0 f6a42948
Oct 28 13:52:14 hora-3 kernel:f68bda24  0009 a0f684f6 0009 c1a7d918  0100
Oct 28 13:52:14 hora-3 kernel: Call Trace:
Oct 28 13:52:14 hora-3 kernel:  [] show_stack+0x7c/0x92
Oct 28 13:52:14 hora-3 kernel:  [] show_registers+0x152/0x1ca
Oct 28 13:52:14 hora-3 kernel:  [] die+0xf4/0x16f
Oct 28 13:52:14 hora-3 kernel:  [] do_page_fault+0x463/0x649
Oct 28 13:52:14 hora-3 kernel:  [] error_code+0x4f/0x54
Oct 28 13:52:14 hora-3 kernel:  [] ib_umad_write+0x2d0/0x30e [ib_umad]
Oct 28 13:52:14 hora-3 kernel:  [] vfs_write+0x155/0x15a
Oct 28 13:52:14 hora-3 kernel:  [] sys_write+0x3d/0x64
Oct 28 13:52:14 hora-3 kernel:  [] sysenter_past_esp+0x54/0x75
Oct 28 13:52:14 hora-3 kernel: Code: e8 d8 63 af c7 89 d8 83 c4 0c 5b 5e 5f 5d c3 55 89 e5 57 56 89 c6 53 83 ec 18 85 f6 89 55 f0 0f 84 ff 00 00 00 8b 46 08 8d 5e e8 <8b> 50 10 8b 7b 14 85 d2 0f 84 7c 01 00 00 8b 4e 18 85 c9 74 0b



Thanks
Aniruddha




> From: [EMAIL PROTECTED] on behalf of Sean Hefty
> Sent: Fri 10/28/2005 12:01 PM
> To: Aniruddha Bohra
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] OpenSM crash with today's trunk
>
> Aniruddha Bohra wrote:
> >> Oh well, I guess this is a different bug.  Is there an oops or
> >> anything in your kernel log, or is this just a userspace crash?
> >
> > This is what I see :
> > Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
> > Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
> > Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
> >
> > Is this useful?
>
> Is there any chance opensm is already running on the system?  It sounds like
> something has already registered to receive the same MADs that opensm wants to
> receive.
>
> - Sean


Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Aniruddha Bohra

Hal Rosenstock wrote:
> Or perhaps something crashed and didn't clean up properly. Does this occur
> immediately after a boot ?

This is after a clean reboot.
There are two systems on the switch and this is the only active one.
I will reboot both and see again.

Thanks
Aniruddha




> From: [EMAIL PROTECTED] on behalf of Sean Hefty
> Sent: Fri 10/28/2005 12:01 PM
> To: Aniruddha Bohra
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] OpenSM crash with today's trunk
>
> Aniruddha Bohra wrote:
> >> Oh well, I guess this is a different bug.  Is there an oops or
> >> anything in your kernel log, or is this just a userspace crash?
> >
> > This is what I see :
> > Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
> > Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
> > Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
> >
> > Is this useful?
>
> Is there any chance opensm is already running on the system?  It sounds like
> something has already registered to receive the same MADs that opensm wants to
> receive.
>
> - Sean


[openib-general] RE: [PATCH] add node_guid to struct ib_device

2005-10-28 Thread Sean Hefty
>Thanks, I applied the following version (doesn't add a private kzalloc()
>now that 2.6.14 is out and doesn't rename cap_mask_mutex).

Thanks.  I forgot to include the changes to sysfs.c in my previous patch.
Not sure if we want to wait on this until the other drivers have been updated.
We'll probably want to remove node_guid from the device attributes as well.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>


Index: sysfs.c
===
--- sysfs.c	(revision 3892)
+++ sysfs.c	(working copy)
@@ -622,21 +622,15 @@
 static ssize_t show_node_guid(struct class_device *cdev, char *buf)
 {
 	struct ib_device *dev = container_of(cdev, struct ib_device, class_dev);
-	struct ib_device_attr attr;
-	ssize_t ret;
 
 	if (!ibdev_is_alive(dev))
 		return -ENODEV;
 
-	ret = ib_query_device(dev, &attr);
-	if (ret)
-		return ret;
-
 	return sprintf(buf, "%04x:%04x:%04x:%04x\n",
-		       be16_to_cpu(((__be16 *) &attr.node_guid)[0]),
-		       be16_to_cpu(((__be16 *) &attr.node_guid)[1]),
-		       be16_to_cpu(((__be16 *) &attr.node_guid)[2]),
-		       be16_to_cpu(((__be16 *) &attr.node_guid)[3]));
+		       be16_to_cpu(((__be16 *) &dev->node_guid)[0]),
+		       be16_to_cpu(((__be16 *) &dev->node_guid)[1]),
+		       be16_to_cpu(((__be16 *) &dev->node_guid)[2]),
+		       be16_to_cpu(((__be16 *) &dev->node_guid)[3]));
 }
 
 static CLASS_DEVICE_ATTR(node_type, S_IRUGO, show_node_type, NULL);
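
For anyone who wants to check the output format outside the kernel, here is a minimal userspace sketch of the formatting that show_node_guid() does in the patch above. It is not kernel code: be16_at() and format_node_guid() are hypothetical helpers made up for this illustration, and the GUID is passed as eight raw bytes in big-endian order, which is an assumption standing in for the __be64 field.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for be16_to_cpu(): read two bytes in
 * big-endian (network) order, independent of host endianness.
 */
static unsigned int be16_at(const unsigned char *p)
{
	return ((unsigned int)p[0] << 8) | p[1];
}

/*
 * Format an 8-byte GUID held in big-endian byte order as four
 * colon-separated 16-bit hex groups plus a newline, the same layout the
 * sysfs attribute prints.  Returns what snprintf() returns.
 */
static int format_node_guid(char *buf, size_t len, const unsigned char guid[8])
{
	return snprintf(buf, len, "%04x:%04x:%04x:%04x\n",
			be16_at(guid + 0), be16_at(guid + 2),
			be16_at(guid + 4), be16_at(guid + 6));
}
```

With the bytes 00 02 c9 02 00 00 12 34 as input, the buffer holds "0002:c902:0000:1234" followed by a newline. Reading the groups byte by byte is why the result does not depend on the host's endianness, which is the same reason the kernel code goes through be16_to_cpu().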





Re: [openib-general] IB traffic generators

2005-10-28 Thread Grant Grundler
On Fri, Oct 28, 2005 at 08:40:09AM -0400, Suresh Shelvapille wrote:
> can you please point me to some traffic generators out there.

Hypothetically one could use IPoIB and pktgen driver to generate
UDP-like traffic. Someone more experienced than I could rewrite
pktgen driver to use OpenIB Verbs API to produce "raw" IB traffic.
ib_pktgen would be a cool ULP to have for testing.

grant


RE: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Hal Rosenstock
Or perhaps something crashed and didn't clean up properly. Does this occur 
immediately after a boot ?



From: [EMAIL PROTECTED] on behalf of Sean Hefty
Sent: Fri 10/28/2005 12:01 PM
To: Aniruddha Bohra
Cc: openib-general@openib.org
Subject: Re: [openib-general] OpenSM crash with today's trunk



Aniruddha Bohra wrote:
>> Oh well, I guess this is a different bug.  Is there an oops or
>> anything in your kernel log, or is this just a userspace crash?
>> 
> This is what I see :
> Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
> Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
> Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
>
> Is this useful?

Is there any chance opensm is already running on the system?  It sounds like
something has already registered to receive the same MADs that opensm wants to
receive.

- Sean


Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Sean Hefty

Aniruddha Bohra wrote:
>> Oh well, I guess this is a different bug.  Is there an oops or
>> anything in your kernel log, or is this just a userspace crash?
>
> This is what I see :
> Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
> Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
> Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
>
> Is this useful?


Is there any chance opensm is already running on the system?  It sounds like 
something has already registered to receive the same MADs that opensm wants to 
receive.


- Sean


RE: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Eitan Zahavi
This means you have another SM or application already registered for handling SubnetManagement packets. Thus OpenSM fails to start (register as the handler for such requests). The crash is a bug that should be solved. 

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
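
The registration conflict described above can be pictured with a small userspace sketch. This is not the ib_mad implementation: the table, the size limits and the names register_handler() and unregister_handler() are all hypothetical, and the only point is that a second registration for the same management class and method is refused while the first one is still in place.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_CLASS  256	/* hypothetical bound on management class values */
#define MAX_METHOD 128	/* hypothetical bound on method values */

/* owner[class][method] is NULL while nobody handles that pair. */
static const char *owner[MAX_CLASS][MAX_METHOD];

/* Claim one class/method pair; fail with -EBUSY if it is already taken. */
static int register_handler(const char *name, unsigned int mgmt_class,
			    unsigned int method)
{
	if (mgmt_class >= MAX_CLASS || method >= MAX_METHOD)
		return -EINVAL;
	if (owner[mgmt_class][method])
		return -EBUSY;	/* the "already in use" case */
	owner[mgmt_class][method] = name;
	return 0;
}

/* Release a pair so that another handler can claim it. */
static void unregister_handler(unsigned int mgmt_class, unsigned int method)
{
	if (mgmt_class < MAX_CLASS && method < MAX_METHOD)
		owner[mgmt_class][method] = NULL;
}
```

In this model a handler that exits without calling unregister_handler() keeps later registrations failing, which is one way to read the "something crashed and didn't clean up properly" suggestion made elsewhere in this thread.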



> -Original Message-
> From: Aniruddha Bohra [mailto:[EMAIL PROTECTED]]
> Sent: Friday, October 28, 2005 5:28 PM
> To: Roland Dreier
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] OpenSM crash with today's trunk
> 
> Roland Dreier wrote:
> 
> >    Aniruddha> I tried with r3888 and r3891 with the same result.
> >
> >Oh well, I guess this is a different bug.  Is there an oops or
> >anything in your kernel log, or is this just a userspace crash?
> >
> >
> This is what I see :
> Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
> Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
> Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
> 
> Is this useful?
> 
> Aniruddha
> 
> 

Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Aniruddha Bohra

Roland Dreier wrote:
>     Aniruddha> I tried with r3888 and r3891 with the same result.
>
> Oh well, I guess this is a different bug.  Is there an oops or
> anything in your kernel log, or is this just a userspace crash?

This is what I see :
Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM

Is this useful?

Aniruddha




[openib-general] cq callback question

2005-10-28 Thread Steve Wise
This may seem like a dumb question, but can a kernel ULP assume that after 
returning from ib_destroy_qp(), there will be no more callbacks for that QP 
on the associated cq event handler?





[openib-general] Re: [PATCH] add node_guid to struct ib_device

2005-10-28 Thread Roland Dreier
Thanks, I applied the following version (doesn't add a private kzalloc()
now that 2.6.14 is out and doesn't rename cap_mask_mutex).

By the way, the ipath and ehca drivers will need something similar.

 - R.

--- include/rdma/ib_verbs.h	(revision 3861)
+++ include/rdma/ib_verbs.h	(working copy)
@@ -951,6 +951,7 @@
 	u64                          uverbs_cmd_mask;
 	int                          uverbs_abi_ver;
 
+	__be64                       node_guid;
 	u8                           node_type;
 	u8                           phys_port_cnt;
 };
--- hw/mthca/mthca_provider.c	(revision 3830)
+++ hw/mthca/mthca_provider.c	(working copy)
@@ -45,6 +45,14 @@
 #include "mthca_user.h"
 #include "mthca_memfree.h"
 
+static void init_query_mad(struct ib_smp *mad)
+{
+	mad->base_version  = 1;
+	mad->mgmt_class    = IB_MGMT_CLASS_SUBN_LID_ROUTED;
+	mad->class_version = 1;
+	mad->method        = IB_MGMT_METHOD_GET;
+}
+
 static int mthca_query_device(struct ib_device *ibdev,
 			      struct ib_device_attr *props)
 {
@@ -55,7 +63,7 @@
 
 	u8 status;
 
-	in_mad  = kmalloc(sizeof *in_mad, GFP_KERNEL);
+	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
 		goto out;
@@ -64,12 +72,8 @@
 
 	props->fw_ver              = mdev->fw_ver;
 
-	memset(in_mad, 0, sizeof *in_mad);
-	in_mad->base_version       = 1;
-	in_mad->mgmt_class         = IB_MGMT_CLASS_SUBN_LID_ROUTED;
-	in_mad->class_version      = 1;
-	in_mad->method             = IB_MGMT_METHOD_GET;
-	in_mad->attr_id            = IB_SMP_ATTR_NODE_INFO;
+	init_query_mad(in_mad);
+	in_mad->attr_id = IB_SMP_ATTR_NODE_INFO;
 
 	err = mthca_MAD_IFC(mdev, 1, 1,
 			    1, NULL, NULL, in_mad, out_mad,
@@ -127,20 +131,16 @@
 	int err = -ENOMEM;
 	u8 status;
 
-	in_mad  = kmalloc(sizeof *in_mad, GFP_KERNEL);
+	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
 		goto out;
 
 	memset(props, 0, sizeof *props);
 
-	memset(in_mad, 0, sizeof *in_mad);
-	in_mad->base_version       = 1;
-	in_mad->mgmt_class         = IB_MGMT_CLASS_SUBN_LID_ROUTED;
-	in_mad->class_version      = 1;
-	in_mad->method             = IB_MGMT_METHOD_GET;
-	in_mad->attr_id            = IB_SMP_ATTR_PORT_INFO;
-	in_mad->attr_mod           = cpu_to_be32(port);
+	init_query_mad(in_mad);
+	in_mad->attr_id  = IB_SMP_ATTR_PORT_INFO;
+	in_mad->attr_mod = cpu_to_be32(port);
 
 	err = mthca_MAD_IFC(to_mdev(ibdev), 1, 1,
 			    port, NULL, NULL, in_mad, out_mad,
@@ -219,18 +219,14 @@
 	int err = -ENOMEM;
 	u8 status;
 
-	in_mad  = kmalloc(sizeof *in_mad, GFP_KERNEL);
+	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
 		goto out;
 
-	memset(in_mad, 0, sizeof *in_mad);
-	in_mad->base_version       = 1;
-	in_mad->mgmt_class         = IB_MGMT_CLASS_SUBN_LID_ROUTED;
-	in_mad->class_version      = 1;
-	in_mad->method             = IB_MGMT_METHOD_GET;
-	in_mad->attr_id            = IB_SMP_ATTR_PKEY_TABLE;
-	in_mad->attr_mod           = cpu_to_be32(index / 32);
+	init_query_mad(in_mad);
+	in_mad->attr_id  = IB_SMP_ATTR_PKEY_TABLE;
+	in_mad->attr_mod = cpu_to_be32(index / 32);
 
 	err = mthca_MAD_IFC(to_mdev(ibdev), 1, 1,
 			    port, NULL, NULL, in_mad, out_mad,
@@ -258,18 +254,14 @@
 	int err = -ENOMEM;
 	u8 status;
 
-	in_mad  = kmalloc(sizeof *in_mad, GFP_KERNEL);
+	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
 		goto out;
 
-	memset(in_mad, 0, sizeof *in_mad);
-	in_mad->base_version       = 1;
-	in_mad->mgmt_class         = IB_MGMT_CLASS_SUBN_LID_ROUTED;
-	in_mad->class_version      = 1;
-	in_mad->method             = IB_MGMT_METHOD_GET;
-	in_mad->attr_id            = IB_SMP_ATTR_PORT_INFO;
-	in_mad->attr_mod           = cpu_to_be32(port);
+	init_query_mad(in_mad);
+	in_mad->attr_id  = IB_SMP_ATTR_PORT_INFO;
+	in_mad->attr_mod = cpu_to_be32(port);
 
 	err = mthca_MAD_IFC(to_mdev(ibdev), 1, 1,
 			    port, NULL, NULL, in_mad, out_mad,
@@ -283,13 +275,9 @@
 
 	memcpy(gid->raw, out_mad->data + 8, 8);
 
-	memset(in_mad, 0, sizeof *in_mad);
-	in_mad->base_version       = 1;
-	in_mad->mgmt_class         = IB_MGMT_CLASS_SUBN_LID_ROUTED;
-	in_mad->class_version      = 1;
-	in_mad->method
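
The pattern in the (truncated) patch above, a zeroing allocation followed by a shared init helper in place of kmalloc() plus memset() plus repeated field assignments, can be tried in userspace with the rough sketch below. Everything in it is an assumption made for illustration: struct query_mad is a heavily trimmed stand-in for struct ib_smp, calloc() plays the role of kzalloc(), the SKETCH_* values are placeholders rather than the kernel's constants, and byte-order conversion of attr_id and attr_mod is left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed stand-in for struct ib_smp. */
struct query_mad {
	uint8_t  base_version;
	uint8_t  mgmt_class;
	uint8_t  class_version;
	uint8_t  method;
	uint16_t attr_id;
	uint32_t attr_mod;
	uint8_t  data[64];
};

/* Placeholder values for this sketch only. */
#define SKETCH_CLASS_SUBN_LID_ROUTED 0x01
#define SKETCH_METHOD_GET            0x01

/* Fill in the header fields that every query in the patch has in common. */
static void init_query_mad(struct query_mad *mad)
{
	mad->base_version  = 1;
	mad->mgmt_class    = SKETCH_CLASS_SUBN_LID_ROUTED;
	mad->class_version = 1;
	mad->method        = SKETCH_METHOD_GET;
}

/*
 * Allocate a zeroed query and set the per-call fields.  calloc() returns
 * zeroed memory, so no separate memset() is needed before the init helper.
 * The caller frees the result with free().
 */
static struct query_mad *alloc_query_mad(uint16_t attr_id, uint32_t attr_mod)
{
	struct query_mad *mad = calloc(1, sizeof *mad);

	if (!mad)
		return NULL;
	init_query_mad(mad);
	mad->attr_id  = attr_id;
	mad->attr_mod = attr_mod;
	return mad;
}
```

Each caller is then left with only the two assignments that actually differ between queries, which is the duplication the patch removes from the mthca query functions.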

Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Roland Dreier
Aniruddha> I tried with r3888 and r3891 with the same result.

Oh well, I guess this is a different bug.  Is there an oops or
anything in your kernel log, or is this just a userspace crash?

If it's just opensm crashing then I'm not much use in debugging.

 - R.


Re: [openib-general] Boot over IB - support in Bproc status?

2005-10-28 Thread Eric W. Biederman
Hb Chen <[EMAIL PROTECTED]> writes:

> Hi,
> Can anyone point out the current status of Boot over IB - support in Bproc?

We have it working here :)

kexec appears to work fine with the openIB stack.  The raw packet interfaces
are a little more difficult to use in the kernel because of the long MAC
address.  But no real problems.

> Also what is the other solution about "mass boot over IB" now?  (openSM, 
> SRP...)

Eric


RE: [openib-general] IB traffic generators

2005-10-28 Thread Hal Rosenstock
Hi Suri,
 
The only traffic generator I am aware of is from Agilent (E2950 series) but 
they discontinued their IB support a while ago. I'm not sure if it is still 
available from them.
 
-- Hal



From: [EMAIL PROTECTED] on behalf of Suresh Shelvapille
Sent: Fri 10/28/2005 8:40 AM
To: openib-general@openib.org
Subject: [openib-general] IB traffic generators




Folks:

can you please point me to some traffic generators out there.

Thanks,
Suri



[openib-general] IB traffic generators

2005-10-28 Thread Suresh Shelvapille

Folks:

can you please point me to some traffic generators out there.

Thanks,
Suri 



Re: [openib-general] OpenSM crash with today's trunk

2005-10-28 Thread Aniruddha Bohra

Roland Dreier wrote:
> I believe that this is in r3889.
>
> - R.

I tried with r3888 and r3891 with the same result.

Aniruddha



RE: [openib-general] SRQ limit reached async event.

2005-10-28 Thread Tziporet Koren
Which HCA are you using?
Until recently, the SRQ limit event was supported only on mem-free HCAs.
It is now supported on full-mem HCAs too, but you need special FW for this (the 4.7.400 release will be next week).


In gen2 it is already implemented for both types of HCAs, and it will work if you have the correct FW.
In VAPI (gen1) you need an update of VAPI for this, since we blocked it for full-mem cards.


Tziporet


-Original Message-
From: Galen M. Shipman [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 28, 2005 12:34 AM
To: openib-general@openib.org
Subject: [openib-general] SRQ limit reached async event.



Hello,


Does anyone know if openib supports the SRQ limit asynchronous event?
I am working with Mellanox verbs right now and it doesn't seem to
support this. I say this because I have to set the srq_limit
attribute via VAPI_modify_srq in order to get the event;
unfortunately, when I call VAPI_modify_srq I get:  error in
VAPI_modify_srq: Not implemented


Any insight is appreciated.


Thanks,


Galen

