Re: [openib-general] [PATCH] process locked in D state.

2005-06-27 Thread Gleb Natapov
On Mon, Jun 27, 2005 at 09:24:45AM -0700, Roland Dreier wrote:
> Gleb> This is what happens: ibv_close_device() closes cmd_fd and
> Gleb> then calls free_context().  free_context() calls munmap to
> Gleb> unmap doorbell registers.  In the kernel, sys_munmap gets the
> Gleb> mm->mmap_sem semaphore and calls do_munmap.  do_munmap is
> Gleb> the last user of the file so it calls the release method of the
> Gleb> file (ib_uverbs_close() in our case). ib_uverbs_close()
> Gleb> calls ib_dealloc_ucontext(). ib_dealloc_ucontext() notices
> Gleb> that there is memory still registered on the file and calls
> Gleb> ib_umem_release(). And there we are trying to acquire
> Gleb> mm->mmap_sem one more time.
> 
> Thanks for the good debugging work.
> 
> Gleb> In the attached patch I use down_write_trylock() instead of
> Gleb> down_write() in ib_umem_release(). If the semaphore is already
> Gleb> locked we will not update the locked_vm statistics. This way
> Gleb> a malicious user can only cause harm to itself.
> 
> I don't like this solution -- as you point out, down_write_trylock()
> may fail if there is even momentary contention on the mmap_sem.  So
> for example a different malicious user could poll on /proc//maps
> and cause our locked_vm to continue to grow.
> 
You are right. For a moment I forgot that /proc//maps is readable by
the world.

> How about if we use schedule_work() to defer the modification of
> locked_vm?
It seems this is the only sane way to do it.

--
Gleb.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] OpenSM on multiple HCA machine

2005-06-27 Thread Eitan Zahavi
I was not aware of an issue with multiple HCAs in the OpenIB (gen2) stack.
The Gen1 stack had this issue and it was resolved. I hope to be able to focus on the OpenIB stack in the coming months so that I can help Hal fix these kinds of issues too.

Eitan Zahavi
> -----Original Message-----
> From: Bernhard Fischer [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, June 28, 2005 12:14 AM
> To: Hal Rosenstock
> Cc: Eitan Zahavi; [EMAIL PROTECTED]; 'openib-general@openib.org'
> Subject: Re: [openib-general] Re: IB Diagnositic Tools
> 
> On Mon, Jun 27, 2005 at 02:04:16PM -0400, Hal Rosenstock wrote:
> >Hi Eitan,
> >
> >On Sat, 2005-06-25 at 15:25, Eitan Zahavi wrote:
> >> Following the discussion about the debug tools, I would like to
> >> propose using OpenSM Vendor layer as a common layer for developing the
> >> debug tools.
> 
> Hal,
> 
> This is kinda offtopic, but (iirc) i once stumbled over the issue of
> "port" vs. "mgmt port" [back then i had access to two 2-port cards]
> where you may have said something along the lines of
> "There is clearly a bug for multi HCAs in osm_vendor_get_all_port_attr
> which is in the vendor layer. This needs to be fixed and is our problem.
> So I am close to being able to commit what I now have for this and fix
> this later (as there are other multi HCA issues)."
> Just curious.. did somebody already have a chance to touch those or not?
> 
> On a related note (just the same thing, i tend to think)
> /* "local ports" is(?) phys, shouldn't this exclude port 0 then ? */
> 
> I'm not too familiar with these kinds of questions, which might be defined
> in a spec, so any hint on this would be well received, at least from my
> part.
> 
> anyone? Eitan?
> --
> thank you,
> Bernhard




[openib-general] Re: A new simple ulp (SPTS)

2005-06-27 Thread Jeff Carr
On 06/27/05 13:32, Michael S. Tsirkin wrote:

> Does this mean the bandwidth is 200-300 MByte/sec?

[EMAIL PROTECTED]:~# ./fast_test.pl 10
starting sends
0 messages/sec (0 megabits/sec)
21845 messages/sec (2730 megabits/sec)
26214 messages/sec (3276 megabits/sec)
30247 messages/sec (3780 megabits/sec)
32768 messages/sec (4095 megabits/sec)
32768 messages/sec (4095 megabits/sec)
34192 messages/sec (4274 megabits/sec)
35288 messages/sec (4411 megabits/sec)
36157 messages/sec (4519 megabits/sec)
35746 messages/sec (4468 megabits/sec)

These are the results with 16KB messages; so the other speed problem is
likely due to the smaller 2KB message size or deficiencies in my code.
At any rate, this is good enough for what I need it for.

Jeff


[openib-general] Re: A new simple ulp (SPTS)

2005-06-27 Thread Jeff Carr
On 06/27/05 13:32, Michael S. Tsirkin wrote:
> Quoting r. Jeff Carr <[EMAIL PROTECTED]>:
> 
>>Here is an updated version and a simple perl script that tests its
>>performance. With 2K messages, these were the performance numbers
>>between 2 systems (3.6 GHz Xeon w/ 133 MHz/64-bit PCI).
>>
>>[EMAIL PROTECTED]:~# ./fast_test.pl 20
>>starting sends
>>0 messages/sec (0 Mb/sec)
>>131072 messages/sec (2047 Mb/sec)
>>131072 messages/sec (2047 Mb/sec)
>>131072 messages/sec (2047 Mb/sec)
>>174762 messages/sec (2730 Mb/sec)
>>163840 messages/sec (2559 Mb/sec)
>>196608 messages/sec (3071 Mb/sec)
>>183500 messages/sec (2867 Mb/sec)
>>174762 messages/sec (2730 Mb/sec)
>>196618 messages/sec (3072 Mb/sec)
>>187254 messages/sec (2925 Mb/sec)
>>180232 messages/sec (2816 Mb/sec)
>>196616 messages/sec (3072 Mb/sec)
>>189333 messages/sec (2958 Mb/sec)
>>183507 messages/sec (2867 Mb/sec)
>>196614 messages/sec (3072 Mb/sec)
>>190656 messages/sec (2978 Mb/sec)
>>202571 messages/sec (3165 Mb/sec)
>>138785 messages/sec (2168 Mb/sec)
>>131075 messages/sec (2048 Mb/sec)
>>
>>Jeff
>>
> 
> 
> Does this mean the bandwidth is 200-300 MByte/sec?

Well, 3000 Mb/sec / 8 == 375 MB/sec :)

Yes. It's not as fast as I would hope. I'm not sure what the cause is
yet. I think it has more to do with the small 2K message size.

If I run this test at 16/64KB message size it's likely to have better
performance. The PCI bus has theoretical throughput around 700MB/sec.

> What is the CPU utilization number (easy to sample with e.g. vstat)?

The CPU is maxed out in this code because it's set to spin while polling.
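
For reference, the Mb/sec column in the test output looks like messages/sec times the message size in bits, divided by 2^20. That reading is an assumption inferred from the printed numbers (131072 x 2048 x 8 / 2^20 = 2048), not something taken from fast_test.pl. A minimal sketch of the conversion, with hypothetical function names:

```c
#include <stdint.h>

/* Assumed unit: 1 Mb = 2^20 bits, inferred from the printed figures. */
#define BITS_PER_MEGABIT (1024.0 * 1024.0)

/* Megabits per second for a given message rate and message size in bytes. */
static double megabits_per_sec(uint64_t msgs_per_sec, uint64_t msg_bytes)
{
	return (double)msgs_per_sec * (double)msg_bytes * 8.0 / BITS_PER_MEGABIT;
}

/* Megabytes per second: eight bits to the byte. */
static double megabytes_per_sec(double mbits_per_sec)
{
	return mbits_per_sec / 8.0;
}
```

With these helpers, megabytes_per_sec(3000.0) gives the 375 MB/sec figure quoted above.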

Jeff


[openib-general] [PATCH] updates to copyright

2005-06-27 Thread Sean Hefty
The following patch adds copyright statements for Mellanox, Sun,
and Intel.  I'm not the usual maintainer for all of the files listed,
so I wanted to verify that there are no issues applying this.  Note
that for Mellanox I copied an existing copyright statement, rather
than using Michael's patch.  (Tom, this should include your patch.)

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>

Index: include/ib_cm.h
===
--- include/ib_cm.h (revision 2723)
+++ include/ib_cm.h (working copy)
@@ -2,6 +2,7 @@
  * Copyright (c) 2004 Intel Corporation.  All rights reserved.
  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
Index: include/ib_verbs.h
===
--- include/ib_verbs.h  (revision 2723)
+++ include/ib_verbs.h  (working copy)
@@ -4,6 +4,7 @@
  * Copyright (c) 2004 Intel Corporation.  All rights reserved.
  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
Index: include/ib_cache.h
===
--- include/ib_cache.h  (revision 2723)
+++ include/ib_cache.h  (working copy)
@@ -1,5 +1,7 @@
 /*
  * Copyright (c) 2004 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
Index: include/ib_fmr_pool.h
===
--- include/ib_fmr_pool.h   (revision 2723)
+++ include/ib_fmr_pool.h   (working copy)
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
Index: core/agent.c
===
--- core/agent.c    (revision 2723)
+++ core/agent.c    (working copy)
@@ -4,6 +4,7 @@
  * Copyright (c) 2004, 2005 Intel Corporation.  All rights reserved.
  * Copyright (c) 2004, 2005 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
Index: core/device.c
===
--- core/device.c   (revision 2723)
+++ core/device.c   (working copy)
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2004 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
Index: core/user_mad.c
===
--- core/user_mad.c (revision 2723)
+++ core/user_mad.c (working copy)
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2004 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Voltaire, Inc. All rights reserved. 
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
Index: core/cm.c
===
--- core/cm.c   (revision 2723)
+++ core/cm.c   (working copy)
@@ -2,6 +2,7 @@
  * Copyright (c) 2004 Intel Corporation.  All rights reserved.
  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
Index: core/mad.c
===
--- core/mad.c  (revision 2723)
+++ core/mad.c  (working copy)
@@ -1,5 +1,7 @@
 /*
  * Copyright (c) 2004, 2005 Voltaire, Inc. All rights re

Re: [openib-general] [PATCH] process locked in D state.

2005-06-27 Thread Roland Dreier
Something like this should work...

 - R.

--- infiniband/core/uverbs_mem.c	(revision 2710)
+++ infiniband/core/uverbs_mem.c	(working copy)
@@ -37,6 +37,13 @@
 
 #include "uverbs.h"
 
+struct ib_umem_account_work {
+	struct work_struct work;
+	struct mm_struct  *mm;
+	unsigned long  diff;
+};
+
+
 static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
 {
 	struct ib_umem_chunk *chunk, *tmp;
@@ -160,21 +167,53 @@ out:
 	return ret;
 }
 
-void ib_umem_release(struct ib_device *dev, struct ib_umem *umem)
+static void ib_umem_account(void *work_ptr)
 {
-	struct mm_struct *mm;
+	struct ib_umem_account_work *work = work_ptr;
 
-	mm = get_task_mm(current);
+	down_write(&work->mm->mmap_sem);
+	work->mm->locked_vm -= work->diff;
+	up_write(&work->mm->mmap_sem);
+	mmput(work->mm);
+	kfree(work);
+}
 
-	if (mm) {
-		down_write(&mm->mmap_sem);
-		mm->locked_vm -= PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
-	}
+void ib_umem_release(struct ib_device *dev, struct ib_umem *umem)
+{
+	struct mm_struct *mm;
 
 	__ib_umem_release(dev, umem, 1);
 
+	mm = get_task_mm(current);
 	if (mm) {
-		up_write(&mm->mmap_sem);
-		mmput(mm);
+		/*
+		 * We may be called with the mm's mmap_sem already
+		 * held.  This can happen when a userspace munmap() is
+		 * the call that drops the last reference to our file
+		 * and calls our release method.  If there are memory
+		 * regions to destroy, we'll end up here and not be
+		 * able to take the mmap_sem.
+		 *
+		 * To handle this, we try to grab the mmap_sem, and if
+		 * we can't get it immediately, we defer the
+		 * accounting to the system workqueue.
+		 */
+		if (down_write_trylock(&mm->mmap_sem)) {
+			mm->locked_vm -= PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
+			up_write(&mm->mmap_sem);
+			mmput(mm);
+		} else {
+			struct ib_umem_account_work *work;
+
+			work = kmalloc(sizeof *work, GFP_KERNEL);
+			if (!work)
+				return;
+
+			INIT_WORK(&work->work, ib_umem_account, work);
+			work->mm   = mm;
+			work->diff = PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
+
+			schedule_work(&work->work);
+		}
 	}
 }
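
The try-then-defer pattern in the patch above can be modelled in plain userspace C. This is only a single-threaded sketch under stated assumptions: struct toy_mm, the boolean standing in for mmap_sem, and the linked list standing in for the system workqueue are all hypothetical and are not kernel interfaces.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for mm_struct: a non-recursive lock plus the counter it guards. */
struct toy_mm {
	bool          locked;     /* models mmap_sem being held for write */
	unsigned long locked_vm;  /* pages accounted as locked */
};

/* Deferred accounting request, queued when the lock cannot be taken. */
struct deferred_account {
	struct deferred_account *next;
	struct toy_mm           *mm;
	unsigned long            diff;
};

static struct deferred_account *pending;  /* models the system workqueue */

static bool toy_trylock(struct toy_mm *mm)
{
	if (mm->locked)
		return false;
	mm->locked = true;
	return true;
}

static void toy_unlock(struct toy_mm *mm)
{
	mm->locked = false;
}

/*
 * Subtract diff pages now if the lock is free; otherwise queue the
 * update. Returns 0 if the update was applied or queued, -1 if the
 * request could not be allocated.
 */
static int release_accounting(struct toy_mm *mm, unsigned long diff)
{
	struct deferred_account *work;

	if (toy_trylock(mm)) {
		mm->locked_vm -= diff;
		toy_unlock(mm);
		return 0;
	}

	work = malloc(sizeof *work);
	if (!work)
		return -1;
	work->mm   = mm;
	work->diff = diff;
	work->next = pending;
	pending    = work;
	return 0;
}

/*
 * Apply queued updates and return how many were applied. A real
 * worker would sleep in down_write(); this toy version simply stops
 * at the first request whose lock is still held.
 */
static int run_deferred(void)
{
	int done = 0;

	while (pending && toy_trylock(pending->mm)) {
		struct deferred_account *work = pending;

		pending = work->next;
		work->mm->locked_vm -= work->diff;
		toy_unlock(work->mm);
		free(work);
		done++;
	}
	return done;
}
```

Unlike a bare trylock, contention here only delays the accounting: the update is applied once the lock holder lets go instead of being dropped.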


Re: [openib-general] ibv_open_device() broken in r2720?

2005-06-27 Thread Shirley Ma

>since the base minor has moved from 128 to 192.

That could be the reason.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


Re: [openib-general] ibv_open_device() broken in r2720?

2005-06-27 Thread Roland Dreier
Shirley> I am running netpipe test for ibv, the error is Couldn't
Shirley> create InfiniBand context, seems like ibv_open_device()
Shirley> returns NULL for r2720. Is it broken, or did I do something
Shirley> wrong?

Are you using udev, or creating /dev/ entries by hand?  If you are not
using udev then you need to recreate the /dev/infiniband/uverbsN
files, since the base minor has moved from 128 to 192.

If that's not it, please post strace output for your program.

 - R.


[openib-general] ibv_open_device() broken in r2720?

2005-06-27 Thread Shirley Ma

I am running the netpipe test for ibv, and the
error is "Couldn't create InfiniBand context"; it seems like ibv_open_device()
returns NULL for r2720. Is it broken, or did I do something wrong?

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


Re: [openib-general] Re: SDP: still getting sk_alloc() panic, any ideas?

2005-06-27 Thread Libor Michalek
On Mon, Jun 27, 2005 at 02:27:54PM -0700, Tom Duffy wrote:
> On Mon, 2005-06-27 at 11:17 -0700, Libor Michalek wrote:
> >   The problem is that each call to sk_alloc() is grabbing a reference to
> > the module, but it checks to make sure that there already is at least one
> > reference, if not the top BUG is triggered. In the case of the passive
> > connection there are no other references to the module. You can see that
> > the problem goes away if you open just one socket, even if you don't
> > listen on it, and then try the failing passive connect. When a socket is
> > created it actually grabs two references to the module, one at the sock
> > level and one at the sk level. The first reference at the sock level does
> > not trigger the BUG since it's through another code path. (try_module_get
> > vs. __module_get) This is why we only hit this during passive connect
> > to a system that has no active SDP sockets.
> > 
> >   Not sure the right way to fix this, maybe check to see if the socket
> > table size (dev_root_s.sk_entry) is greater than 0 in sdp_cm_req_handler()
> > before even performing the alloc...
> 
> Hrm.  That seems ugly.  How about a patch to upstream changing
> sk_alloc() to use try_module_get().

  I'm thinking the listen_lookup needs to be moved earlier in the
req_handler ahead of the sk_alloc, since it makes no sense to do
the alloc if we are not going to queue the new incoming connection,
since it just leads to a destroy in the same function.
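
The difference between the two reference paths described in the quoted text can be shown with a toy model. This is a sketch only: struct toy_module and both functions are hypothetical stand-ins for the behaviour described above, not the kernel's definitions, and the BUG is replaced by a false return so the model can be exercised safely.

```c
#include <stdbool.h>

/* Toy model of a module's state; not the kernel's struct module. */
struct toy_module {
	bool live;      /* false once unload has begun */
	int  refcount;
};

/* Models try_module_get(): succeeds whenever the module is live. */
static bool toy_try_get(struct toy_module *m)
{
	if (!m->live)
		return false;
	m->refcount++;
	return true;
}

/*
 * Models __module_get(): the caller promises that a reference already
 * exists. The kernel hits a BUG when that promise is broken; the toy
 * version reports it by returning false.
 */
static bool toy_strict_get(struct toy_module *m)
{
	if (m->refcount == 0)
		return false;
	m->refcount++;
	return true;
}
```

With no sockets open the count is zero, so the strict path fails on the first passive connect, while a single earlier toy_try_get() (the sock-level reference) makes it succeed.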

-Libor


Re: [openib-general] [Fwd: Returned mail: see transcript for details]

2005-06-27 Thread Tom Duffy
On Mon, 2005-06-27 at 09:51 -0700, Sean Hefty wrote:
> Assuming that these haven't been applied yet, I will try to get to this 
> today or tomorrow.

Great, I don't think they have.

Thanks,

-tduffy



[openib-general] Re: [PATCH] cleanup dat provider registration

2005-06-27 Thread James Lentini



On Sun, 26 Jun 2005, Christoph Hellwig wrote:

> On Wed, Jun 22, 2005 at 05:13:03PM -0400, James Lentini wrote:
> > This is an excellent simplification. Committed in revision 2682 with a
> > few minor modifications:
> >
> >  - kept printouts in dat_registry_add_provider,
> >    dat_registry_remove_provider, and dat_registry_list_providers
> >
> >  - updated printout in dat_ia_close (this wasn't something you
> >    changed)
> >
> >  - removed parens around sizeof
>
> kernel style is to have parens around it, but all of the openib code
> is different.  well, let's keep it that way.
>
> >  - removed space in front of labels
>
> lots of new kernel code uses the space, but again it's okay to stick
> to the surrounding code.
>
> > The last two are for consistency with the coding style we've been
> > using. If we've deviated from what is acceptable, let us know.
> >
> > Given this simplification, I can think of a few more changes:
> >
> >  - rename api.c to registry.c
>
> Not yet.  The code will get some major surgery still, and as part of
> that split into different files again maybe, just on very different
> boundaries.
>
> >  - remove all "dictionary" references: rename dat_dictionary_search to
> >    dat_provider_list_search, rename struct dat_dictionary_entry to
> >    struct dat_provider_list_entry, rename
> >    dat_dictionary_key_is_equal() to dat_provider_info_is_equal()
>
> I'll plan bigger changes in that area that will kill the dictionary
> term, so a simple search and replace is probably not worth it.


Oh well, I already did the cleanup.

I'll add an item to the TODO list to update the API with a 
register/callback interface for accessing the provider.



[openib-general] Re: [PATCH] [udapl] fix build for x86_64

2005-06-27 Thread Arlin Davis

Signed-off-by: Arlin Davis <[EMAIL PROTECTED]>

Index: dapl/openib/dapl_ib_dto.h
===
--- dapl/openib/dapl_ib_dto.h   (revision 2720)
+++ dapl/openib/dapl_ib_dto.h   (working copy)
@@ -88,7 +88,7 @@
   total_len = 0;
   wr.next = 0;
   wr.num_sge = 0;
-   wr.wr_id = (uint64_t)cookie;
+   wr.wr_id = (uint64_t)(unsigned long)cookie;
   wr.sg_list = ds_array_p;

   for (i = 0; i < segments; i++ ) {
@@ -162,7 +162,7 @@
   wr.opcode = op_type;
   wr.num_sge = 0;
   wr.send_flags = 0;
-   wr.wr_id = (uint64_t)cookie;
+   wr.wr_id = (uint64_t)(unsigned long)cookie;
   wr.sg_list = ds_array_p;
   total_len = 0;
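
The patch above converts the cookie pointer to an integer of pointer width before widening it to 64 bits. Compilers such as GCC can warn when a pointer is cast directly to an integer of a different size, which is presumably what the intermediate cast avoids. A minimal round-trip sketch follows; the helper names are hypothetical, and it uses C99 uintptr_t where the patch uses unsigned long.

```c
#include <stdint.h>

/* Hypothetical helpers; the patch itself casts inline via unsigned long. */
static uint64_t wr_id_from_cookie(const void *cookie)
{
	/* Convert at pointer width first, then widen to 64 bits. */
	return (uint64_t)(uintptr_t)cookie;
}

/* Recover the pointer when the work completion hands the wr_id back. */
static void *cookie_from_wr_id(uint64_t wr_id)
{
	return (void *)(uintptr_t)wr_id;
}
```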



[openib-general] RE: [PATCH][kdapl] Integrate dapl_hca_alloc/dapl_hca_free to dapl_provider.c

2005-06-27 Thread James Lentini


Ok. Removed in revision 2723.

On Sun, 26 Jun 2005, Itamar Rabenstein wrote:

> > uDAPL used the dapl_hca structure's ia_list to clean up IA
> > resources when the user space process forked. When the code was ported
> > to the kernel, the list was retained, but the cleanup wasn't.
> > Suppose a kDAPL consumer (another kernel module) is unloaded and
> > forgets to close an IA. What do we want to have happen when the kDAPL
> > module is unloaded? Don't we want the resources associated with any
> > open IAs cleaned up?
> >
> > james
>
> NO!!!
>
> If another module has a bug (forgot to close the IA) we are not going to hide
> it.
> The ib_dat_provider module will not be able to go down (be unloaded) until the
> bug is fixed.
> The kernel is not a place where you clean up other modules' bugs.
>
> uDAPL needs this feature, not kDAPL.
>
> Please delete this list.
>
> Itamar




Re: [openib-general] Re: IB Diagnositic Tools

2005-06-27 Thread Hal Rosenstock
On Mon, 2005-06-27 at 17:13, Bernhard Fischer wrote:
> Hal,
> 
> This is kinda offtopic, 

Yes, these are different topics.

> but (iirc) i once stumbled over the issue of
> "port" vs. "mgmt port" [back then i had access to two 2-port cards]
> where you may have said something along the lines of
> "There is clearly a bug for multi HCAs in osm_vendor_get_all_port_attr
> which is in the vendor layer. This needs to be fixed and is our problem.
> So I am close to being able to commit what I now have for this and fix
> this later (as there are other multi HCA issues)."
> Just curious.. did somebody already have a chance to touch those or not?

I don't think it has been fixed :-( I will need to refresh myself on
this again.

> On a related note (just the same thing, i tend to think)
> /* "local ports" is(?) phys, shouldn't this exclude port 0 then ? */

I guess it depends on what local ports is being used for. This is a
switch issue where there are 2 types of port 0: base and enhanced.
Neither of these is a physical port. Enhanced port 0 is like an HCA
port. This is relevant to PortInfo components and also to which counters
might be available. In general, architecturally speaking, local
ports would include all local ports whether physical IB ports or not, so
on a switch port 0 is included. Is port 0 causing an issue somewhere?

-- Hal

> I'm not too familiar with these kinds of questions, which might be defined
> in a spec, so any hint on this would be well received, at least from my
> part.
> 
> anyone? Eitan?



Re: [openib-general] mapping between IP address and device name

2005-06-27 Thread Roland Dreier
Itamar> But the ATS will not solve the problem of "many to one".
Itamar> What will the nfs module do if the result from
Itamar> the ATS is a list of IPs, only one of which
Itamar> has permission to the nfs?  ATS can't tell you who is the
Itamar> source IP.

Thomas> The NFS server exports will function just fine in such a
Thomas> case.  This is no different from any other multihomed
Thomas> client, and /etc/exports can be configured appropriately.

I'm not sure I understand this.  At best, ATS can give you back a list
of IPs.  How do you decide which one to check against the exports?

In a pure IP world, every packet from a multihomed client carries a
source IP address.  So a server can use getpeername() to determine
which address a client is connecting from.  This is fundamentally
different from ATS.
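
To make the pure IP case concrete, here is a minimal userspace sketch of a server asking a connected socket for its peer's address with getpeername(). The helper name peer_ipv4() is hypothetical, and the sketch handles IPv4 only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Write the dotted-quad address of the peer connected to fd into buf.
 * Returns 0 on success, -1 on failure or if the peer is not IPv4.
 */
static int peer_ipv4(int fd, char *buf, size_t len)
{
	struct sockaddr_storage ss;
	socklen_t sslen = sizeof ss;

	if (getpeername(fd, (struct sockaddr *)&ss, &sslen) < 0)
		return -1;
	if (ss.ss_family != AF_INET)
		return -1;

	if (!inet_ntop(AF_INET, &((struct sockaddr_in *)&ss)->sin_addr,
		       buf, (socklen_t)len))
		return -1;
	return 0;
}
```

A server would call this on the descriptor returned by accept() and compare the result against its export list; a lookup that returns several candidate addresses for one client gives it no single answer to compare.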

 - R.


Re: [openib-general] Re: SDP: still getting sk_alloc() panic, any ideas?

2005-06-27 Thread Tom Duffy
On Mon, 2005-06-27 at 11:17 -0700, Libor Michalek wrote:
>   The problem is that each call to sk_alloc() is grabbing a reference to
> the module, but it checks to make sure that there already is at least one
> reference, if not the top BUG is triggered. In the case of the passive
> connection there are no other references to the module. You can see that
> the problem goes away if you open just one socket, even if you don't
> listen on it, and then try the failing passive connect. When a socket is
> created it actually grabs two references to the module, one at the sock
> level and one at the sk level. The first reference at the sock level does
> not trigger the BUG since it's through another code path. (try_module_get
> vs. __module_get) This is why we only hit this during passive connect
> to a system that has no active SDP sockets.
> 
>   Not sure the right way to fix this, maybe check to see if the socket
> table size (dev_root_s.sk_entry) is greater than 0 in sdp_cm_req_handler()
> before even performing the alloc...

Hrm.  That seems ugly.  How about a patch to upstream changing
sk_alloc() to use try_module_get().

Right now, we could make SDP depend on !CONFIG_MODULE_UNLOAD.  That is
even uglier!

-tduffy



Re: [openib-general] Re: IB Diagnositic Tools

2005-06-27 Thread Bernhard Fischer
On Mon, Jun 27, 2005 at 02:04:16PM -0400, Hal Rosenstock wrote:
>Hi Eitan,
>
>On Sat, 2005-06-25 at 15:25, Eitan Zahavi wrote: 
>> Following the discussion about the debug tools, I would like to
>> propose using OpenSM Vendor layer as a common layer for developing the
>> debug tools. 

Hal,

This is kinda offtopic, but (iirc) i once stumbled over the issue of
"port" vs. "mgmt port" [back then i had access to two 2-port cards]
where you may have said something along the lines of
"There is clearly a bug for multi HCAs in osm_vendor_get_all_port_attr
which is in the vendor layer. This needs to be fixed and is our problem.
So I am close to being able to commit what I now have for this and fix
this later (as there are other multi HCA issues)."
Just curious.. did somebody already have a chance to touch those or not?

On a related note (just the same thing, i tend to think)
/* "local ports" is(?) phys, shouldn't this exclude port 0 then ? */

I'm not too familiar with these kinds of questions, which might be defined
in a spec, so any hint on this would be well received, at least from my
part.

anyone? Eitan?
-- 
thank you,
Bernhard


[openib-general] Re: A new simple ulp (SPTS)

2005-06-27 Thread Michael S. Tsirkin
Quoting r. Jeff Carr <[EMAIL PROTECTED]>:
> Here is an updated version and a simple perl script that tests its
> performance. With 2K messages, these were the performance numbers
> between 2 systems (3.6 GHz Xeon w/ 133 MHz/64-bit PCI).
> 
> [EMAIL PROTECTED]:~# ./fast_test.pl 20
> starting sends
> 0 messages/sec (0 Mb/sec)
> 131072 messages/sec (2047 Mb/sec)
> 131072 messages/sec (2047 Mb/sec)
> 131072 messages/sec (2047 Mb/sec)
> 174762 messages/sec (2730 Mb/sec)
> 163840 messages/sec (2559 Mb/sec)
> 196608 messages/sec (3071 Mb/sec)
> 183500 messages/sec (2867 Mb/sec)
> 174762 messages/sec (2730 Mb/sec)
> 196618 messages/sec (3072 Mb/sec)
> 187254 messages/sec (2925 Mb/sec)
> 180232 messages/sec (2816 Mb/sec)
> 196616 messages/sec (3072 Mb/sec)
> 189333 messages/sec (2958 Mb/sec)
> 183507 messages/sec (2867 Mb/sec)
> 196614 messages/sec (3072 Mb/sec)
> 190656 messages/sec (2978 Mb/sec)
> 202571 messages/sec (3165 Mb/sec)
> 138785 messages/sec (2168 Mb/sec)
> 131075 messages/sec (2048 Mb/sec)
> 
> Jeff
> 

Does this mean the bandwidth is 200-300 MByte/sec?
What is the CPU utilization number (easy to sample with e.g. vstat)?

-- 
MST


[openib-general] Re: [PATCH] process locked in D state.

2005-06-27 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> How about if we use schedule_work() to defer the modification of
> locked_vm?

Good idea.

This would also reduce the time during which we hold the mm
semaphore, which is nice.


-- 
MST


Re: [openib-general] A new simple ulp (SPTS)

2005-06-27 Thread Jeff Carr
On 06/22/05 09:43, Jeff Carr wrote:
> On 06/21/2005 12:50 PM, Roland Dreier wrote:
> 
> 
>>What happens if you try replacing the send_flags line with the one you
>>have commented out?
>>
>>+ // send_wr.send_flags = IB_SEND_SIGNALED;
> 
> 
> Thanks, you are correct. IB_SEND_SIGNALED gives me the behavior I was
> expecting.

Here is an updated version and a simple perl script that tests its
performance. With 2K messages, these were the performance numbers
between 2 systems (3.6 GHz Xeon w/ 133 MHz/64-bit PCI).

[EMAIL PROTECTED]:~# ./fast_test.pl 20
starting sends
0 messages/sec (0 Mb/sec)
131072 messages/sec (2047 Mb/sec)
131072 messages/sec (2047 Mb/sec)
131072 messages/sec (2047 Mb/sec)
174762 messages/sec (2730 Mb/sec)
163840 messages/sec (2559 Mb/sec)
196608 messages/sec (3071 Mb/sec)
183500 messages/sec (2867 Mb/sec)
174762 messages/sec (2730 Mb/sec)
196618 messages/sec (3072 Mb/sec)
187254 messages/sec (2925 Mb/sec)
180232 messages/sec (2816 Mb/sec)
196616 messages/sec (3072 Mb/sec)
189333 messages/sec (2958 Mb/sec)
183507 messages/sec (2867 Mb/sec)
196614 messages/sec (3072 Mb/sec)
190656 messages/sec (2978 Mb/sec)
202571 messages/sec (3165 Mb/sec)
138785 messages/sec (2168 Mb/sec)
131075 messages/sec (2048 Mb/sec)

Jeff

config INFINIBAND_SPTS
	tristate "A Simple Page Transfer Scheme over InfiniBand (SPiTS)"
	depends on INFINIBAND
	---help---
	  All this does is let you establish a simple connection
	  between two infiniband hosts so you can do 2 way
	  data transfers.

EXTRA_CFLAGS += -Idrivers/infiniband/include

obj-$(CONFIG_INFINIBAND_SPTS)   += ib_spts.o
ib_spts-y   := spts.o

obj-$(CONFIG_INFINIBAND_SPTS)   += ib_cm_spts.o
ib_cm_spts-y:= cm_spts.o

obj-$(CONFIG_INFINIBAND_SPTS)   += ib_cm_spts_client.o
ib_cm_spts_client-y := client_start.o

obj-$(CONFIG_INFINIBAND_SPTS)   += spts_fast.o
spts_fast-y := fast.o

all:
	make -C $(BUILDDIR) SUBDIRS=$(PWD) modules

clean:
	rm -rf .tmp_versions
	rm -f *.o *.ko .*.o.cmd .*.o.d .*.ko.cmd *.mod.c

mytest:
	/root/bin/make_and_copy_modules.pl
	./restart_spts.pl

/*
 * Copyright (c) 2005 Linux Machines Inc.  All rights reserved.
 *
 * This software is available to you under the terms of the GNU
 * General Public License (GPL) Version 2, available from the file
 * COPYING in the main directory of this source tree.
 *
 */


#include 
#include 

#include "spts.h"

MODULE_AUTHOR("Jeff Carr");
MODULE_DESCRIPTION("Trigger a client connection");
MODULE_LICENSE("GPL");

void ib_cm_spts_conn_client(int slid, int dlid);

static int slid = 0;
static int dlid = 0;

module_param(slid, int, 0444);
module_param(dlid, int, 0444);

MODULE_PARM_DESC(slid, "Source LID to use for connection.");
MODULE_PARM_DESC(dlid, "Destination LID to use for connection.");

static int __init client_start_init(void)
{
	/* comment out for now for other tests... */
	printk(KERN_INFO "client_start_init() slid = %d, dlid = %d\n", slid, dlid);
	ib_cm_spts_conn_client(slid, dlid);

	return 0;
}

static void __exit client_start_cleanup(void)
{
	return;
}

module_init(client_start_init);
module_exit(client_start_cleanup);
/*
 *  A Simple Page Transfer Scheme "SPTS" listener
 *
 *
 * Copyright (c) 2005 Linux Machines Inc.  All rights reserved.
 * Copyright (c) 2004 Intel Corporation.  All rights reserved.
 *
 * This software is available to you under the terms of the GNU
 * General Public License (GPL) Version 2, available from the file
 * COPYING in the main directory of this source tree.
 *
 *
 * This self contained kernel module will listen, establish,
 * destroy and manage connections. 
 *
 * I wrote this module so I could tuck away all the details
 * of making and watching connections somewhere from SPiTS.
 *
 * I tried to make this simple and generic enough that it might
 * be a good reference point for anyone else that might be
 * in the same Infiniband boat. 
 *
 * -- Jeff Carr <[EMAIL PROTECTED]>
 *
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 
#include 

#include "spts.h"

MODULE_AUTHOR("Jeff Carr");
MODULE_DESCRIPTION("A Simple CM listener intended for use with SPTS");
MODULE_LICENSE("GPL");

DECLARE_MUTEX(ib_cm_spts_lock);

static void ib_cm_spts_add_one		(struct ib_device *device);
static void ib_cm_spts_remove_one	(struct ib_device *device);

static struct ib_client ib_cm_spts_client = {
	.name   = "ib_cm_spts",
	.add    = ib_cm_spts_add_one,
	.remove = ib_cm_spts_remove_one
};

struct cm_spts {
	struct workqueue_struct *wq;
	struct work_struct      work;
	atomic_t                connects_left;
	atomic_t                disconnects_left;
	struct semaphore        sem;
	wait_queue_head_t       wait;
};

static struct cm_spts test;
struct ib_cm_id *listen_id;
struct ib_device *mydevice;
static int port_num = 1;
ib_cm_c

Re: [openib-general] [PATCH] mthca: report board id in sysfs

2005-06-27 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] [PATCH] mthca: report board id in sysfs
> 
> Michael> It's really an ASCII string, so it looks the same
> Michael> (byte-swapped) on all architectures.
> 
> I'm confused... if it's just a string, why don't we just use memcpy()
> to get it from the mailbox?
> 
>  - R.
> 

Because firmware puts it byte-swapped in the mailbox.
E.g. if firmware writes "foo", the mailbox holds "\0oof".

The board ID should be something like "MT_00A001"; the mailbox has
something like "0_TM00A0\0\0\01".
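
For illustration, a minimal user-space sketch (not the mthca code) of undoing
that layout by reversing the bytes inside each 32-bit word. The helper name
unswap_board_id is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: copy len bytes from the
 * mailbox image to out, reversing the bytes inside each 4-byte word.
 * len must be a multiple of 4.
 */
static void unswap_board_id(const unsigned char *mailbox,
                            unsigned char *out, size_t len)
{
    size_t i;

    assert(len % 4 == 0);
    for (i = 0; i < len; i += 4) {
        out[i + 0] = mailbox[i + 3];
        out[i + 1] = mailbox[i + 2];
        out[i + 2] = mailbox[i + 1];
        out[i + 3] = mailbox[i + 0];
    }
}
```

Fed the 16 mailbox bytes '0','_','T','M', '0','0','A','0', 0,0,0,'1', 0,0,0,0,
this yields "MT_00A001" followed by NUL padding.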


-- 
MST
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] RE: IB Diagnositic Tools

2005-06-27 Thread Eitan Zahavi

> > [EZ] It is available (the code was part of IBAL but needed some fixes
> > etc).
> 
> Needed or still needs some fixes ?
[EZ] Work in progress, but really close to completion.
> 
> 
> The current OpenIB topology file has a place where these annotations can
> be made (and displayed).
[EZ] How would you define the internal structure of a 288port switch in the existing topology file? 
Would it support writing code that is able to report something like "board spine2 of system mySwitch is missing"? 


The code that supports all that is part of the simulator code I posted long ago.
Please take a look, especially at Fabric.h, SysDef.h and ibnl_parser.yy in
https://openib.org/svn/gen2/utils/src/linux-user/ibdm/datamodel
> 
> 
> This brings in more things that are not currently ported to OpenIB and
> also there are some issues with some of these tools.
[EZ] Never heard of any specific issue. Can you describe these issues?





RE: [openib-general] mapping between IP address and device name

2005-06-27 Thread Talpey, Thomas
At 03:10 AM 6/26/2005, Itamar Rabenstein wrote:
>But the ATS will not solve the problem of "many to one".
>What will the NFS module do if the result from the ATS is
>a list of IPs, only one of which has permission for NFS?
>ATS can't tell you which one is the source IP.

The NFS server exports will function just fine in such a case.
This is no different from any other multihomed client, and
/etc/exports can be configured appropriately.

What wouldn't be useful would be to use MAC addresses (GIDs)
for mounting, exports, etc. Can you imagine administering a network
where hardware addresses were the only naming? No sysadmin
would even entertain such an idea.

Tom.


[openib-general] RE: IB Diagnositic Tools

2005-06-27 Thread Hal Rosenstock
Hi again Eitan,

On Mon, 2005-06-27 at 14:29, Eitan Zahavi wrote:
> Hi Hal,
> 
> > 
> > Is the OpenSM vendor layer available in Windows for OpenIB or is
> this
> > something which needs to be developed ?
> [EZ] It is available (the code was part of IBAL but needed some fixes
> etc).

Needed or still needs some fixes ?
 
> > > Since this layer is already available on both Windows and Linux
> stacks
> > > it could allow us to have the same code tree for both.
> > 
> > Will the Linux distros take it this way (with #ifdef OS)

> [EZ] The different implementations are already included in the OpenSM
> code on the OpenIB trunk. They are not named Win/Linux but IBAL, TS. I
> propose we perform some restructuring where each "vendor" has its own
> package and all of them are actually made into the same "lib" name.
> Such that for OpenSM build it should not make any difference which
> vendor it is linked with. 

Currently, the OpenIB build (for Linux) just deals with the OpenIB
vendor layer. The others are there for tracking to OpenSM releases and
are not part of the build (or current requirements for the build).
 
> > Not sure exactly what you have in mind here but there is something
> like
> > this on the TODO list. How are GUIDs and LIDs aggregated into a name
> ?
> > Is this SystemImageGUID ?
> [EZ] I mean using topology file to map discovered topology to
> specified topology and thus enable the use of the user given names for
> the various systems (instead of guids).

The current OpenIB topology file has a place where these annotations can
be made (and displayed).

> > Not sure what you mean exactly by MAD and topology manipulations
> layers.
> > Are there different tools ? What do they provide different from the
> > current OpenIB diagnostics ?
> [EZ] The MAD layer provides scripting interface for sending receiving
> mads of the various classes. 
> The analysis layers perform topology matching, LFT traversals, credit
> loop analysis, MFT connectivity checks, routing hops histograms, etc.

This brings in more things that are not currently ported to OpenIB and
also there are some issues with some of these tools.
 
> > > If there is an interest in these tools we can provide an open
> version
> > > of those in week or two.
> > 
> > Will the development then be done in the OpenIB tree or will the
> drop
> > model be used ? On the Linux side, the community desires the tools
> to be
> > built with autotools.

> [EZ] The idea is to move the current tools into OpenIB and use OpenIB
> as the development environment. This includes OpenSM and the MAD
> layers.

> All should be autotools.

Glad to hear it.

-- Hal




[openib-general] user_mad: Support RMPP on receive side

2005-06-27 Thread Hal Rosenstock
Hi,

I am in the process of enabling receive side RMPP in user_mad. There is
an open issue about read() (the length returned when the supplied buffer
is too small); if it cannot be done that way, an additional ioctl would
be needed.

I would prefer to do it the read() way, as that seems more efficient: a
MAD agent doing receives can do a normal MAD-sized read, and if a larger
RMPP-sized packet comes in, it just does another read with the larger
size. With the ioctl, the user would need an ioctl to get the size
for the read whenever any RMPPs are possible.
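
To make the read()-based flow concrete, here is a minimal user-space sketch.
It is only a model of the convention being proposed: fake_mad_dev,
fake_mad_read and recv_mad are invented for this example and are not part of
user_mad, and MAD_SIZE assumes the usual 256-byte MAD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAD_SIZE 256  /* assumed size of a single, non-RMPP MAD */

/* Hypothetical stand-in for the device: one pending packet. */
struct fake_mad_dev {
    const unsigned char *pkt;
    size_t               pkt_len;
};

/*
 * Models the proposed convention, not the real user_mad code: if the
 * buffer is too small, nothing is copied and the required length is
 * returned; otherwise the packet is copied and its length returned.
 */
static size_t fake_mad_read(struct fake_mad_dev *dev, void *buf, size_t count)
{
    if (count < dev->pkt_len)
        return dev->pkt_len;       /* caller must retry with this size */
    memcpy(buf, dev->pkt, dev->pkt_len);
    return dev->pkt_len;
}

/*
 * Agent side: try a normal MAD-sized read first and grow the buffer
 * only when a larger RMPP message is pending.  Returns a malloc()ed
 * buffer (caller frees) and stores the packet length in *len.
 */
static unsigned char *recv_mad(struct fake_mad_dev *dev, size_t *len)
{
    size_t size = MAD_SIZE;
    unsigned char *buf = malloc(size);
    size_t ret;

    assert(len != NULL);
    if (!buf)
        return NULL;

    ret = fake_mad_read(dev, buf, size);
    if (ret > size) {              /* too small: ret is the size needed */
        unsigned char *bigger = realloc(buf, ret);

        if (!bigger) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        size = ret;
        ret = fake_mad_read(dev, buf, size);
    }
    *len = ret;
    return buf;
}
```

In the common case this costs one read; only an oversized RMPP message costs
a second one, and no separate size query is needed.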

Any objections ? Thanks.

-- Hal

-Forwarded Message-

From: Hal Rosenstock <[EMAIL PROTECTED]>
To: Roland Dreier <[EMAIL PROTECTED]>
Cc: openib-general@openib.org
Subject: [openib-general] Re: [RFC] [PATCH] user_mad: Support RMPP on send side
Date: 20 May 2005 09:23:00 -0400

On Wed, 2005-05-18 at 19:04, Roland Dreier wrote: 
> This looks OK to check in with one small comment on the following:
> 
> - if (copy_to_user(buf, &packet->mad, sizeof packet->mad))
> + if (copy_to_user(buf, &packet->mad,
> +  min(count, packet->length +
> +  sizeof (struct ib_user_mad
>   ret = -EFAULT;
>   else
> - ret = sizeof packet->mad;
> + ret = count;
> 
> This code will truncate a received MAD that is bigger than the buffer
> passed into read(), but return the full size of the packet.  I don't
> think read() is allowed to do this: the return value can be at most
> the count value passed in by the user.
> 
> I think we have two options: truncate and return the actual amount of
> data read to the user, or return an error if the user's buffer is too
> small.

The man page for read states:
   read()  attempts to read up to count bytes from file descriptor fd into
   the buffer starting at buf.
RETURN VALUE
   On success, the number of bytes read is returned (zero indicates end of
   file), and the file position is advanced by this number.

For RMPP reads (next set of changes to this), it would eliminate an
additional call (ioctl) if a return length larger than the count
supplied meant that the read did not occur, and gave the count
(buffer size) to use for the next read to get the entire packet.

It appears to me that there is nothing that enforces the man page
behavior above in Linux so this is a "convention". Is this something we
can take advantage of or do we need to require the additional call to
get the buffer length ?

-- Hal



Re: [openib-general] SDP sk_data_ready() callback

2005-06-27 Thread Libor Michalek
On Wed, Jun 22, 2005 at 10:39:43AM +0200, Arne Redlich wrote:
> Hi,
> 
> I'm trying to use SDP from within the kernel. My problem is that the
> code relies on sk_data_ready() (this callback is modified to wake up a
> Rx thread before executing the original function), but sk_data_ready()
> apparently never gets called. Is there any way to fix this?

  You mean that you replace sk->sk_data_ready, with a similar but
slightly modified version and so rely on sk->sk_data_ready() being
called?

  You're right it's currently not being called, instead we're calling
the function to which it's pointing directly, and for no real reason.
Also, the function that's being called is a duplicate of the function
sock_def_readable() in net/sock.c. I'll look at correcting these
issues.
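
As a rough user-space illustration of why a replaced callback never fires
when the handler is called by name, consider the toy model below. None of
these names are SDP or kernel symbols; toy_sock only mimics the callback slot.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sock; only the callback slot is modeled. */
struct toy_sock {
    void (*data_ready)(struct toy_sock *sk);
    int default_calls;   /* times the stock handler ran */
    int hooked_calls;    /* times the user's replacement ran */
};

/* Plays the role of the default "readable" handler. */
static void toy_def_readable(struct toy_sock *sk)
{
    sk->default_calls++;
}

/* A user's replacement: do extra work, then chain to the default. */
static void toy_hooked_ready(struct toy_sock *sk)
{
    sk->hooked_calls++;          /* e.g. wake up an Rx thread */
    toy_def_readable(sk);
}

/* Behaviour as described above: the handler is called by name. */
static void toy_rx_direct(struct toy_sock *sk)
{
    toy_def_readable(sk);        /* bypasses whatever the user installed */
}

/* Intended behaviour: always go through the pointer. */
static void toy_rx_via_pointer(struct toy_sock *sk)
{
    assert(sk->data_ready != NULL);
    sk->data_ready(sk);
}
```

With toy_hooked_ready installed in data_ready, toy_rx_direct() leaves
hooked_calls at 0, while toy_rx_via_pointer() runs the hook and then the
default handler.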

-Libor


[openib-general] RE: IB Diagnositic Tools

2005-06-27 Thread Eitan Zahavi

Hi Hal,


> 
> Is the OpenSM vendor layer available in Windows for OpenIB or is this
> something which needs to be developed ?
[EZ] It is available (the code was part of IBAL but needed some fixes etc).
> 
> > Since this layer is already available on both Windows and Linux stacks
> > it could allow us to have the same code tree for both.
> 
> Will the Linux distros take it this way (with #ifdef OS)
[EZ] The different implementations are already included in the OpenSM code on the OpenIB trunk. They are not named Win/Linux but IBAL, TS. I propose we do some restructuring where each "vendor" has its own package and all of them are built into the same "lib" name, so that for the OpenSM build it makes no difference which vendor it is linked with.

> 
> Not sure exactly what you have in mind here but there is something like
> this on the TODO list. How are GUIDs and LIDs aggregated into a name ?
> Is this SystemImageGUID ?
[EZ] I mean using topology file to map discovered topology to specified topology and thus enable the use of the user given names for the various systems (instead of guids).

> 
> 
> Not sure what you mean exactly by MAD and topology manipulations layers.
> Are there different tools ? What do they provide different from the
> current OpenIB diagnostics ?
[EZ] The MAD layer provides a scripting interface for sending and receiving MADs of the various classes.
The analysis layers perform topology matching, LFT traversals, credit loop analysis, MFT connectivity checks, routing hop histograms, etc.

> 
> > If there is an interest in these tools we can provide an open version
> > of those in week or two.
> 
> Will the development then be done in the OpenIB tree or will the drop
> model be used ? On the Linux side, the community desires the tools to be
> built with autotools.
[EZ] The idea is to move the current tools into OpenIB and use OpenIB as the development environment. This includes OpenSM and the MAD layers.

All should be autotools.




[openib-general] RE: IB Diagnositic Tools

2005-06-27 Thread Eitan Zahavi

Hi Fabian


> 
> I think this is a decent idea.  My only reservations are that it would require
> everyone to learn the OSM Vendor Layer API.  It might also not allow testing
> nuances in the access layer APIs, which might be useful.
[EZ] This is true. But the API is simple. The MAD flow API is:
bind - to get a handle for sending mads of specific class and registering callbacks
send - to send a mad
get_mad - to get a mad buffer
put_mad - to return it to the driver


The rest can be found in the OpenSM repository under osm_vendor_api.h
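
To give a feel for that flow, here is a self-contained mock with the same
four steps. It is not the OSM vendor API: every demo_* name and signature is
invented for illustration, and "send" simply loops the MAD back to the
registered callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAD_SIZE 256

struct demo_mad {
    unsigned char data[DEMO_MAD_SIZE];
    int           in_use;
};

typedef void (*demo_rcv_cb)(const struct demo_mad *mad, void *ctx);

/* Handle returned by bind: one management class plus its callback. */
struct demo_bind {
    unsigned char   mgmt_class;
    demo_rcv_cb     on_receive;
    void           *ctx;
    struct demo_mad pool[4];     /* tiny fixed buffer pool */
};

/* bind: register a class and a receive callback, fill in the handle. */
static void demo_bind_init(struct demo_bind *h, unsigned char mgmt_class,
                           demo_rcv_cb cb, void *ctx)
{
    memset(h, 0, sizeof *h);
    h->mgmt_class = mgmt_class;
    h->on_receive = cb;
    h->ctx = ctx;
}

/* get_mad: borrow a buffer from the pool, or NULL if none is free. */
static struct demo_mad *demo_get_mad(struct demo_bind *h)
{
    size_t i;

    for (i = 0; i < sizeof h->pool / sizeof h->pool[0]; i++) {
        if (!h->pool[i].in_use) {
            h->pool[i].in_use = 1;
            return &h->pool[i];
        }
    }
    return NULL;
}

/* put_mad: hand the buffer back to the pool. */
static void demo_put_mad(struct demo_bind *h, struct demo_mad *mad)
{
    (void)h;
    assert(mad->in_use);
    mad->in_use = 0;
}

/* send: this mock delivers the MAD straight to the callback. */
static void demo_send(struct demo_bind *h, struct demo_mad *mad)
{
    if (h->on_receive)
        h->on_receive(mad, h->ctx);
}

/* Example callback: count received MADs in the int that ctx points to. */
static void demo_count_cb(const struct demo_mad *mad, void *ctx)
{
    (void)mad;
    (*(int *)ctx)++;
}
```

A tool would bind once per class, then loop over get_mad, fill, send and
put_mad, with all receive handling living in the callback.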


> 
> So I think it would be useful to have the test run over each low level MAD API,
> as well as to the OSM Vendor Layer.  I'm a bit weary of adding extra layers
> between the tests and the access layer - it just creates more areas where things
> can go wrong.  That said, I'm not dead set on this and could be convinced
> otherwise, but I just don't know enough about the OSM Vendor Layer at the moment
> and don't have many cycles to learn it.
[EZ] I agree. Code testing should be done in all layers. But writing cluster debug tools is easier with a higher abstraction layer (callbacks vs. polling or blocking reads).

> 
> 
> By system names, you mean node descriptions?
[EZ] If the user provides a file describing the topology in terms of systems, then the code uses the names from that file in its reports.

For example: Assuming you have a cluster built of a 288port switch and 288 HCAs.
The topology description could then be:
IBSW288 mySwitch
   Leaf1/P1 -> HCA Rack1-Node1 P1
   Leaf1/P2 -> HCA Rack1-Node2 P1
   ...
   Leaf1/P12 -> HCA Rack2-Node3 P1
   Leaf2/P1 ->   HCA anyNameYouWant P2
   


Any error report can then use these names, for example:
Error with cable from mySwitch/Leaf2/P1 to anyNameYouWant/P1
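
A minimal sketch of how such a report could be produced from a name table.
This is not the ibdm code: the demo_* names are hypothetical, and the step
that matches discovered GUIDs against the topology file is skipped entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One cable from the topology description: switch port -> host port. */
struct demo_link {
    const char *leaf;       /* e.g. "Leaf1" */
    int         leaf_port;  /* e.g. 1 for P1 */
    const char *host;       /* e.g. "Rack1-Node1" */
    int         host_port;
};

/* A few rows in the spirit of the example above. */
static const struct demo_link demo_links[] = {
    { "Leaf1", 1,  "Rack1-Node1", 1 },
    { "Leaf1", 2,  "Rack1-Node2", 1 },
    { "Leaf1", 12, "Rack2-Node3", 1 },
};

/*
 * Format a cable error for the link at (leaf, port) using the names
 * from the table.  Returns 0 on success, -1 if the port is not listed.
 */
static int demo_cable_error(const char *sys, const char *leaf, int port,
                            char *out, size_t outlen)
{
    size_t i;

    assert(out != NULL && outlen > 0);
    for (i = 0; i < sizeof demo_links / sizeof demo_links[0]; i++) {
        const struct demo_link *l = &demo_links[i];

        if (strcmp(l->leaf, leaf) == 0 && l->leaf_port == port) {
            snprintf(out, outlen,
                     "Error with cable from %s/%s/P%d to %s/P%d",
                     sys, l->leaf, l->leaf_port, l->host, l->host_port);
            return 0;
        }
    }
    return -1;
}
```

For system "mySwitch", leaf "Leaf1" and port 2 this produces
"Error with cable from mySwitch/Leaf1/P2 to Rack1-Node2/P1".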





[openib-general] Re: SDP: still getting sk_alloc() panic, any ideas?

2005-06-27 Thread Libor Michalek
On Thu, Jun 23, 2005 at 01:06:47PM -0700, Tom Duffy wrote:
> I am still getting the panic when you try to connect to a machine and
> it is not listening (but has ib_sdp loaded):
> 
> [EMAIL PROTECTED] ~]# --- [cut here ] - [please
> bite here ] -
> Kernel BUG at "/build1/tduffy/openib-work/linux-2.6.12-openib/in:352
> invalid operand:  [1] SMP
> CPU 1
> 
> Any idea about this?

  Sorry, I've been out of the office for a few days. 

  The problem is that each call to sk_alloc() grabs a reference to the
module, but first checks that there is already at least one reference;
if there is none, the BUG above is triggered. In the case of the passive
connection there are no other references to the module. You can see that
the problem goes away if you open just one socket, even if you don't
listen on it, and then try the failing passive connect. When a socket is
created it actually grabs two references to the module, one at the sock
level and one at the sk level. The first reference, at the sock level,
does not trigger the BUG since it goes through another code path
(try_module_get vs. __module_get). This is why we only hit this during a
passive connect to a system that has no active SDP sockets.

  I'm not sure of the right way to fix this; maybe check whether the
socket table size (dev_root_s.sk_entry) is greater than 0 in
sdp_cm_req_handler() before even performing the alloc...
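
The difference between the two paths can be sketched with a toy reference
counter in user space. This is a model of the behaviour described above, not
kernel code; the toy_* names are made up, and where the kernel would hit the
BUG the model just returns -1.

```c
#include <assert.h>

/* Toy model of a module's reference count; not the kernel's struct module. */
struct toy_module {
    int live;    /* module is loaded and not being unloaded */
    int refs;    /* current reference count */
};

/* Models try_module_get(): succeeds on a live module even when refs == 0. */
static int toy_try_module_get(struct toy_module *m)
{
    if (!m->live)
        return 0;
    m->refs++;
    return 1;
}

/*
 * Models __module_get(): the caller promises that a reference is
 * already held.  The kernel enforces that with a BUG; here we return -1.
 */
static int toy_module_get_held(struct toy_module *m)
{
    if (m->refs == 0)
        return -1;           /* where the BUG would fire */
    m->refs++;
    return 0;
}

/* Drop one reference. */
static void toy_module_put(struct toy_module *m)
{
    assert(m->refs > 0);
    m->refs--;
}
```

With refs at 0 (no sockets open), toy_module_get_held() fails, which is the
passive connect case; after one toy_try_module_get() (the socket creation
path) the same call succeeds.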

-Libor



[openib-general] Re: IB Diagnositic Tools

2005-06-27 Thread Hal Rosenstock
Hi Eitan,

On Sat, 2005-06-25 at 15:25, Eitan Zahavi wrote: 
> Following the discussion about the debug tools, I would like to
> propose using OpenSM Vendor layer as a common layer for developing the
> debug tools. 

Is the OpenSM vendor layer available in Windows for OpenIB or is this
something which needs to be developed ?

> Since this layer is already available on both Windows and Linux stacks
> it could allow us to have the same code tree for both.

Will the Linux distros take it this way (with #ifdef OS)

> Also I would like to propose developing an enhanced functionality for
> some of the tools.
> 
> Especially adding the concept of reporting using "system names" rather
> than GUIDs and LIDs.

Not sure exactly what you have in mind here but there is something like
this on the TODO list. How are GUIDs and LIDs aggregated into a name ? 
Is this SystemImageGUID ?

> The discovery tool would also be enhanced to perform some basic health
> checks for the fabric.

Sure that can be an additional option. What are the basic health checks
you want to add ?

> As we (Mellanox) already have a "MADs" and "Topology" manipulations
> layers implemented we plan to open them in the OpenIB repository as
> well as develop the enhanced debug capability in OpenIB.

Not sure what you mean exactly by MAD and topology manipulations layers.
Are there different tools ? What do they provide different from the
current OpenIB diagnostics ?

> If there is an interest in these tools we can provide an open version
> of those in week or two.

Will the development then be done in the OpenIB tree or will the drop
model be used ? On the Linux side, the community desires the tools to be
built with autotools.

-- Hal

> Eitan
> 
> Eitan Zahavi
> 
> Design Technology Director
> 
> Mellanox Technologies LTD
> 
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> 
> P.O. Box 586 Yokneam 20692 ISRAEL
> 



[openib-general] RE: IB Diagnositic Tools

2005-06-27 Thread Fab Tillier
> From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
> Sent: Saturday, June 25, 2005 12:25 PM
> 
> Following the discussion about the debug tools, I would like to propose using
> OpenSM Vendor layer as a common layer for developing the debug tools. Since
> this layer is already available on both Windows and Linux stacks it could
> allow us to have the same code tree for both.

I think this is a decent idea.  My only reservations are that it would require
everyone to learn the OSM Vendor Layer API.  It might also not allow testing
nuances in the access layer APIs, which might be useful.

So I think it would be useful to have the test run over each low level MAD API,
as well as to the OSM Vendor Layer.  I'm a bit wary of adding extra layers
between the tests and the access layer - it just creates more areas where things
can go wrong.  That said, I'm not dead set on this and could be convinced
otherwise, but I just don't know enough about the OSM Vendor Layer at the moment
and don't have many cycles to learn it.

> Also I would like to propose developing an enhanced functionality for some of
> the tools.
>
> Especially adding the concept of reporting using "system names" rather than
> GUIDs and LIDs.

By system names, you mean node descriptions?

> The discovery tool would also be enhanced to perform some basic health checks
> for the fabric.

I think this would be valuable.

> As we (Mellanox) already have a "MADs" and "Topology" manipulations layers
> implemented we plan to open them in the OpenIB repository as well as develop
> the enhanced debug capability in OpenIB.
>
> If there is an interest in these tools we can provide an open version of those
> in week or two.

Sounds good, thanks!

- Fab



Re: [openib-general] [PATCH] mthca: report board id in sysfs

2005-06-27 Thread Roland Dreier
Michael> It's really an ASCII string, so it looks the same
Michael> (byte-swapped) on all architectures.

I'm confused... if it's just a string, why don't we just use memcpy()
to get it from the mailbox?

 - R.


Re: [openib-general] [PATCH] mthca: report board id in sysfs

2005-06-27 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] [PATCH] mthca: report board id in sysfs
> 
> Michael> 4 last words in query adapter include the board id
> Michael> (byte-swapped).  Show this board id in sysfs.
> 
> Thanks, looks useful.
> 
> Is the VSD really always byte-swapped or just in big-endian order?  In
> other words, should this:
> 
> + adapter->board_id[i] = __swab32p(outbox + i + QUERY_ADAPTER_BOARD_ID_OFFSET / 4);
> 
> use be32_to_cpup() instead?
> 
>  - R.
> 

It's really an ASCII string, so it looks the same (byte-swapped) on all
architectures.

-- 
MST


[openib-general] [PATCH][dapl] cleanup dapl_cookie

2005-06-27 Thread Bernhard Fischer
Hi James,

untested.

- cleanup dapl_cookie.c: remove unneeded local variables and simplify
  branches to be consistent with dapl_rmr_cookie_alloc().

Signed-off-by: Bernhard Fischer <[EMAIL PROTECTED]>

thank you,
Bernhard
Index: users/jlentini/linux-kernel/dat-provider/dapl_cookie.c
===
--- users/jlentini/linux-kernel/dat-provider/dapl_cookie.c  (revision 2715)
+++ users/jlentini/linux-kernel/dat-provider/dapl_cookie.c  (working copy)
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved.
  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
+ * Copyright (c) 2005 Bernhard Fischer, All rights reserved.
  *
  * This Software is licensed under one of the following licenses:
  *
@@ -136,8 +137,8 @@ u32 dapl_cb_create(struct dapl_cookie_bu
}
 
return DAT_SUCCESS;
-   } else
-   return DAT_INSUFFICIENT_RESOURCES;
+   }
+   return DAT_INSUFFICIENT_RESOURCES;
 }
 
 /*
@@ -157,8 +158,7 @@ u32 dapl_cb_create(struct dapl_cookie_bu
  */
 void dapl_cb_free(struct dapl_cookie_buffer *buffer)
 {
-   if (NULL != buffer->pool) 
-   kfree(buffer->pool);
+   kfree(buffer->pool);
 }
 
 /*
@@ -181,24 +181,19 @@ void dapl_cb_free(struct dapl_cookie_buf
 u32 dapl_cb_get(struct dapl_cookie_buffer *buffer,
struct dapl_cookie **cookie_ptr)
 {
-   u32 dat_status;
int new_head;
 
BUG_ON(cookie_ptr == NULL);
 
new_head = (atomic_read(&buffer->head) + 1) % buffer->pool_size;
 
-   if (new_head == atomic_read(&buffer->tail)) {
-   dat_status = DAT_INSUFFICIENT_RESOURCES;
-   goto bail;
-   } else {
-   atomic_set(&buffer->head, new_head);
+   if (new_head == atomic_read(&buffer->tail))
+   return DAT_INSUFFICIENT_RESOURCES;
 
-   *cookie_ptr = &buffer->pool[atomic_read(&buffer->head)];
-   dat_status = DAT_SUCCESS;
-   }
-bail:
-   return dat_status;
+   atomic_set(&buffer->head, new_head);
+
+   *cookie_ptr = &buffer->pool[atomic_read(&buffer->head)];
+   return DAT_SUCCESS;
 }
 
 /*
@@ -224,24 +219,19 @@ u32 dapl_rmr_cookie_alloc(struct dapl_co
  struct dapl_cookie **cookie_ptr)
 {
struct dapl_cookie *cookie;
-   u32 dat_status;
 
if (DAT_SUCCESS != dapl_cb_get(buffer, &cookie)) {
*cookie_ptr = NULL;
-   dat_status =
-   DAT_ERROR(DAT_INSUFFICIENT_RESOURCES, DAT_RESOURCE_MEMORY);
-   goto bail;
+   return DAT_ERROR(DAT_INSUFFICIENT_RESOURCES,
+DAT_RESOURCE_MEMORY);
}
 
-   dat_status = DAT_SUCCESS;
cookie->type = DAPL_COOKIE_TYPE_RMR;
cookie->val.rmr.rmr = rmr;
cookie->val.rmr.cookie = user_cookie;
 
*cookie_ptr = cookie;
-
-bail:
-   return dat_status;
+   return DAT_SUCCESS;
 }
 
 /*
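
For readers who want to try the early-return shape outside the kernel, here
is a simplified user-space model of the ring used by dapl_cb_get(). It is an
illustration only: the demo_* names are invented, plain ints replace the
atomic_t fields, and the put side simply advances tail by one slot.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_POOL_SIZE 4

/* Toy cookie ring: head == tail means empty, one slot stays unused. */
struct demo_ring {
    int pool[DEMO_POOL_SIZE];
    int head;
    int tail;
};

/* Take the next slot, or return NULL when the ring is full. */
static int *demo_ring_get(struct demo_ring *r)
{
    int new_head = (r->head + 1) % DEMO_POOL_SIZE;

    if (new_head == r->tail)
        return NULL;         /* early return instead of goto bail */

    r->head = new_head;
    return &r->pool[r->head];
}

/* Release the oldest slot by advancing tail. */
static void demo_ring_put(struct demo_ring *r)
{
    assert(r->head != r->tail);          /* ring must not be empty */
    r->tail = (r->tail + 1) % DEMO_POOL_SIZE;
}
```

With a pool of 4, three gets succeed and the fourth returns NULL until a put
frees a slot.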

Re: [openib-general] compile issues w/sdp on 2.6.11: /gen2/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_pass.c

2005-06-27 Thread Roland Dreier
Steve> Hello, I'm having problems getting the sdp kernel module to
Steve> build against 2.6.11 and/or 2.6.11.12.  Looks like the
Steve> latest version (2665) of sdp modified some of the sdp
Steve> debugging and socket debugging.  The last sdp module that I
Steve> can get to work is r2663.

Now that 2.6.12 is out, the SDP code in the svn tree has been updated
to match the new sockets API in the 2.6.12 kernel.

 - R.


[openib-general] compile issues w/sdp on 2.6.11: /gen2/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_pass.c

2005-06-27 Thread Steve Merker
Hello,
I'm having problems getting the sdp kernel module to build against
2.6.11 and/or 2.6.11.12.  Looks like the latest version (2665) of sdp
modified some of the sdp debugging and socket debugging.  The last sdp
module that I can get to work is r2663.

Wondering if I missed a patch for my sock.h include.
-Steven

[EMAIL PROTECTED] linux-2.6.11.12]# make modules
  CHK include/linux/version.h
make[1]: `arch/i386/kernel/asm-offsets.s' is up to date.
  CC [M]  drivers/infiniband/ulp/sdp/sdp_pass.o
drivers/infiniband/ulp/sdp/sdp_pass.c: In function `sdp_cm_listen_lookup':
drivers/infiniband/ulp/sdp/sdp_pass.c:299: error: `SOCK_DBG'
undeclared (first use in this function)
drivers/infiniband/ulp/sdp/sdp_pass.c:299: error: (Each undeclared
identifier is reported only once
drivers/infiniband/ulp/sdp/sdp_pass.c:299: error: for each function it
appears in.)
drivers/infiniband/ulp/sdp/sdp_pass.c:301: error: `SOCK_LOCALROUTE'
undeclared (first use in this function)
drivers/infiniband/ulp/sdp/sdp_pass.c:307: error: `SOCK_RCVTSTAMP'
undeclared (first use in this function)
make[3]: *** [drivers/infiniband/ulp/sdp/sdp_pass.o] Error 1
make[2]: *** [drivers/infiniband/ulp/sdp] Error 2
make[1]: *** [drivers/infiniband] Error 2
make: *** [drivers] Error 2
--
[EMAIL PROTECTED] ulp]# svn info sdp/sdp_pass.c
Path: sdp/sdp_pass.c
Name: sdp_pass.c
URL: 
https://openib.org/svn/gen2/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_pass.c
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 2719
Node Kind: file
Schedule: normal
Last Changed Author: libor
Last Changed Rev: 2665
Last Changed Date: 2005-06-20 22:14:58 -0700 (Mon, 20 Jun 2005)
Text Last Updated: 2005-06-27 09:48:44 -0700 (Mon, 27 Jun 2005)
Properties Last Updated: 2005-06-24 10:07:28 -0700 (Fri, 24 Jun 2005)
Checksum: 714d1584bd100396a287538ff65f01d0


Re: [openib-general] [Fwd: Returned mail: see transcript for details]

2005-06-27 Thread Sean Hefty

Tom Duffy wrote:

I have grepped the core and include logs for any files I have changed
and added a Sun copyright.

Signed-off-by: Tom Duffy <[EMAIL PROTECTED]>



Assuming that these haven't been applied yet, I will try to get to this 
today or tomorrow.


- Sean


Re: [openib-general] [PATCH] process locked in D state.

2005-06-27 Thread Roland Dreier
Gleb> This is what happens: ibv_close_device() closes cmd_fd and
Gleb> then calls free_context().  free_context() calls munmap to
Gleb> unmap doorbell registers.  In kernel sys_munmap gets
Gleb> mm->mmap_sem semaphore and calls do_munmap.  do_munmap is
Gleb> the last user of the file so it calls release method of the
Gleb> file (ib_uverbs_close() in our case). ib_uverbs_close()
Gleb> calls ib_dealloc_ucontext(). ib_dealloc_ucontext() notices
Gleb> that there is unregistered memory on the file and calls
Gleb> ib_umem_release(). And there we are trying to acquire
Gleb> mm->mmap_sem one more time.

Thanks for the good debugging work.

Gleb> In attached patch I use down_write_trylock() instead of
Gleb> down_write() in ib_umem_release(). If semaphore is already
Gleb> locked we will not update locked_vm statistics. This way
Gleb> malicious user can only cause harm to itself.

I don't like this solution -- as you point out, down_write_trylock()
may fail if there is even momentary contention on the mmap_sem.  So
for example a different malicious user could poll on /proc//maps
and cause our locked_vm to continue to grow.

How about if we use schedule_work() to defer the modification of
locked_vm?
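
Roughly, the idea is that the release path never touches the semaphore
itself and only queues the accounting update. A single-threaded user-space
model of that shape follows; the toy_* names are invented, and a real
workqueue handler would block in down_write() rather than assert.

```c
#include <assert.h>
#include <stddef.h>

/* Toy address space: a non-recursive lock flag and the counter it guards. */
struct toy_mm {
    int  sem_held;       /* 1 while "mmap_sem" is held */
    long locked_vm;      /* pages accounted as locked */
};

/* One deferred adjustment; stands in for a work item. */
struct toy_work {
    struct toy_mm *mm;
    long           npages;
    int            pending;
};

/*
 * Release path.  It may run with the lock already held (the munmap
 * case), so it never takes the lock; it only records the work.
 */
static void toy_umem_release(struct toy_work *w, struct toy_mm *mm, long npages)
{
    assert(!w->pending);
    w->mm = mm;
    w->npages = npages;
    w->pending = 1;
}

/* Runs later from a context that holds no locks, so waiting is safe. */
static void toy_run_work(struct toy_work *w)
{
    if (!w->pending)
        return;
    assert(!w->mm->sem_held);    /* a real handler would wait here */
    w->mm->sem_held = 1;         /* down_write() */
    w->mm->locked_vm -= w->npages;
    w->mm->sem_held = 0;         /* up_write() */
    w->pending = 0;
}
```

Unlike the trylock variant, the update is never dropped: contention only
delays it until the deferred work gets the lock.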

 - R.


Re: [openib-general] [PATCH] mthca: report board id in sysfs

2005-06-27 Thread Roland Dreier
Michael> 4 last words in query adapter include the board id
Michael> (byte-swapped).  Show this board id in sysfs.

Thanks, looks useful.

Is the VSD really always byte-swapped or just in big-endian order?  In
other words, should this:

+   adapter->board_id[i] = __swab32p(outbox + i + QUERY_ADAPTER_BOARD_ID_OFFSET / 4);

use be32_to_cpup() instead?
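
To spell out the difference being asked about, here is a small user-space
sketch. The demo_* functions are stand-ins written for this example, not the
kernel helpers: an unconditional swap behaves the same on every host, while
a big-endian-to-CPU conversion is a swap only on little-endian machines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unconditional byte reversal, in the style of swab32(). */
static uint32_t demo_swab32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* 1 on a little-endian host, 0 on a big-endian one. */
static int demo_host_is_le(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* In the style of be32_to_cpu(): swap on little-endian, no-op otherwise. */
static uint32_t demo_be32_to_cpu(uint32_t x)
{
    return demo_host_is_le() ? demo_swab32(x) : x;
}

/*
 * Reverse one 4-byte word in memory.  Load, swap and store all use the
 * host byte order, so the resulting byte sequence is the same on any host.
 */
static void demo_fix_word(unsigned char word[4])
{
    uint32_t v;

    assert(word != NULL);
    memcpy(&v, word, 4);
    v = demo_swab32(v);
    memcpy(word, &v, 4);
}
```

So for data that is a character string stored with its bytes reversed, the
unconditional swap is host-independent, whereas the be32 conversion would
leave the bytes untouched on a big-endian machine.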

 - R.


[openib-general] [PATCH] mthca: report board id in sysfs

2005-06-27 Thread Michael S. Tsirkin
The last 4 words of the QUERY_ADAPTER output contain the board id
(byte-swapped). Show this board id in sysfs.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

Index: hw/mthca/mthca_dev.h
===
--- hw/mthca/mthca_dev.h(revision 2719)
+++ hw/mthca/mthca_dev.h(working copy)
@@ -66,6 +66,10 @@ enum {
 };
 
 enum {
+   MTHCA_BOARD_ID_LEN = 0x10
+};
+
+enum {
MTHCA_EQ_CONTEXT_SIZE =  0x40,
MTHCA_CQ_CONTEXT_SIZE =  0x40,
MTHCA_QP_CONTEXT_SIZE = 0x200,
@@ -245,6 +249,7 @@ struct mthca_dev {
unsigned longdevice_cap_flags;
 
u32  rev_id;
+   u8   board_id[MTHCA_BOARD_ID_LEN];
 
/* firmware info */
u64  fw_ver;
Index: hw/mthca/mthca_main.c
===
--- hw/mthca/mthca_main.c   (revision 2719)
+++ hw/mthca/mthca_main.c   (working copy)
@@ -284,6 +284,7 @@ static int __devinit mthca_init_tavor(st
 
mdev->eq_table.inta_pin = adapter.inta_pin;
mdev->rev_id= adapter.revision_id;
+   memcpy(mdev->board_id, adapter.board_id, sizeof mdev->board_id);
 
return 0;
 
Index: hw/mthca/mthca_provider.c
===
--- hw/mthca/mthca_provider.c   (revision 2719)
+++ hw/mthca/mthca_provider.c   (working copy)
@@ -955,14 +955,23 @@ static ssize_t show_hca(struct class_dev
}
 }
 
+static ssize_t show_board(struct class_device *cdev, char *buf)
+{
+   struct mthca_dev *dev = container_of(cdev, struct mthca_dev, ib_dev.class_dev);
+   return sprintf(buf, "%.*s\n", MTHCA_BOARD_ID_LEN, dev->board_id);
+}
+
+
 static CLASS_DEVICE_ATTR(hw_rev,   S_IRUGO, show_rev,NULL);
 static CLASS_DEVICE_ATTR(fw_ver,   S_IRUGO, show_fw_ver, NULL);
 static CLASS_DEVICE_ATTR(hca_type, S_IRUGO, show_hca,NULL);
+static CLASS_DEVICE_ATTR(board_id, S_IRUGO, show_board,NULL);
 
 static struct class_device_attribute *mthca_class_attributes[] = {
&class_device_attr_hw_rev,
&class_device_attr_fw_ver,
-   &class_device_attr_hca_type
+   &class_device_attr_hca_type,
+   &class_device_attr_board_id
 };
 
 int mthca_register_device(struct mthca_dev *dev)
Index: hw/mthca/mthca_cmd.c
===
--- hw/mthca/mthca_cmd.c(revision 2719)
+++ hw/mthca/mthca_cmd.c(working copy)
@@ -1087,13 +1087,14 @@ int mthca_QUERY_ADAPTER(struct mthca_dev
 {
struct mthca_mailbox *mailbox;
u32 *outbox;
-   int err;
+   int i, err;
 
 #define QUERY_ADAPTER_OUT_SIZE 0x100
 #define QUERY_ADAPTER_VENDOR_ID_OFFSET 0x00
 #define QUERY_ADAPTER_DEVICE_ID_OFFSET 0x04
 #define QUERY_ADAPTER_REVISION_ID_OFFSET   0x08
 #define QUERY_ADAPTER_INTA_PIN_OFFSET  0x10
+#define QUERY_ADAPTER_BOARD_ID_OFFSET  0xF0
 
mailbox = mthca_alloc_mailbox(dev, GFP_KERNEL);
if (IS_ERR(mailbox))
@@ -,6 +1112,9 @@ int mthca_QUERY_ADAPTER(struct mthca_dev
MTHCA_GET(adapter->revision_id, outbox, QUERY_ADAPTER_REVISION_ID_OFFSET);
MTHCA_GET(adapter->inta_pin, outbox,QUERY_ADAPTER_INTA_PIN_OFFSET);
 
+   for (i = 0; i < sizeof adapter->board_id / 4; ++i)
+   adapter->board_id[i] = __swab32p(outbox + i + QUERY_ADAPTER_BOARD_ID_OFFSET / 4);
+
 out:
mthca_free_mailbox(dev, mailbox);
return err;
Index: hw/mthca/mthca_cmd.h
===
--- hw/mthca/mthca_cmd.h(revision 2719)
+++ hw/mthca/mthca_cmd.h(working copy)
@@ -187,6 +187,7 @@ struct mthca_adapter {
u32 device_id;
u32 revision_id;
u8  inta_pin;
+   u32 board_id[MTHCA_BOARD_ID_LEN / 4];
 };
 
 struct mthca_init_hca_param {
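
A note on the "%.*s" format used in show_board() above: the precision bounds
how many bytes sprintf() reads, so a fixed-width id that fills the whole
field without a terminating NUL is still printed safely. A small userspace
sketch, assuming a 16-byte field as in the patch; toy_show_board() and
TOY_BOARD_ID_LEN are illustrative names, not driver code:

```c
#include <stdio.h>
#include <string.h>

#define TOY_BOARD_ID_LEN 0x10   /* same length as MTHCA_BOARD_ID_LEN in the patch */

/* Format a fixed-width, possibly unterminated id. The precision argument
 * caps the read at TOY_BOARD_ID_LEN bytes; output stops earlier at a NUL. */
static int toy_show_board(char *buf, const unsigned char *board_id)
{
	return sprintf(buf, "%.*s\n", TOY_BOARD_ID_LEN, (const char *)board_id);
}
```

A short id padded with NULs prints only its visible characters, and an id
that occupies all 16 bytes prints exactly 16 characters plus the newline.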

-- 
MST


[openib-general] Re: [PATCH] [kdapl] Update EP state before disconnect

2005-06-27 Thread James Lentini

Committed in revision 2718.

On Fri, 24 Jun 2005, Hal Rosenstock wrote:

halr> [kdapl] Make update of EP state consistent. Update the EP state before
halr> disconnecting.
halr> 
halr> Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>
halr> 
halr> 
halr> Index: dapl_evd.c
halr> ===
halr> --- dapl_evd.c  (revision 2713)
halr> +++ dapl_evd.c  (working copy)
halr> @@ -789,11 +789,11 @@
halr>  * reset the state to DISCONNECTED as we don't
halr>  * expect a callback on an ABRUPT disconnect.
halr>  */
halr> -   dapl_ib_disconnect(ep, DAT_CLOSE_ABRUPT_FLAG);
halr> spin_lock_irqsave(&ep->common.lock, ep->common.flags);
halr> ep->param.ep_state = DAT_EP_STATE_DISCONNECTED;
halr> spin_unlock_irqrestore(&ep->common.lock,
halr>ep->common.flags);
halr> +   dapl_ib_disconnect(ep, DAT_CLOSE_ABRUPT_FLAG);
halr> }
halr> }
halr>  
halr> 


[openib-general] Re: [openib-commits] r2629 - in gen2/branches/shaharf-ibat/src/linux-kernel/infiniband: core include

2005-06-27 Thread Hal Rosenstock
On Mon, 2005-06-27 at 07:01, Bernhard Fischer wrote:
> Hal,
> 
> On Wed, Jun 15, 2005 at 12:27:18PM -0700, [EMAIL PROTECTED] wrote:
> 
> >Support callback for route by ip
> 
> >Modified: gen2/branches/shaharf-ibat/src/linux-kernel/infiniband/core/uat.c
> >===
> >--- gen2/branches/shaharf-ibat/src/linux-kernel/infiniband/core/uat.c
> >2005-06-15 18:22:01 UTC (rev 2628)
> >+++ gen2/branches/shaharf-ibat/src/linux-kernel/infiniband/core/uat.c
> >2005-06-15 19:27:17 UTC (rev 2629)
> 
> >@@ -414,13 +495,30 @@
> > printk(KERN_ERR "ib_uat_event: uevent %p ctx %p status %d not completed\n", uevent, uevent->ctx, uevent->ctx->status);
> > uevent->ctx->rec_num = -EIO;
> > }
> >-/* Copy path records returned from SA back to userspace */
> >-if (copy_to_user(uevent->ctx->user_path_arr,
> >- uevent->ctx->path_arr,
> >- uevent->ctx->user_length))
> >-uevent->ctx->status = IB_USER_AT_STATUS_ERROR;
> >-kfree(uevent->ctx->path_arr);
> >-uevent->ctx->path_arr = NULL;
> >+switch (uevent->type) {
> >+case IB_UAT_PATH_EVENT:
> >+/* Copy path records returned from SA to userspace */
> >+if (copy_to_user(uevent->ctx->user_path_arr,
> >+ uevent->ctx->path_arr,
> >+ uevent->ctx->user_length))
> Should { result be set to FAULT here
> >+uevent->ctx->status = IB_USER_AT_STATUS_ERROR;
> }
> >+kfree(uevent->ctx->path_arr);
> >+uevent->ctx->path_arr = NULL;
> 
> 
> 
> >+break;
> >+case IB_UAT_ROUTE_EVENT:
> >+if (copy_to_user(uevent->ctx->user_ib_route,
> >+ uevent->ctx->ib_route,
> >+ sizeof(*uevent->ctx->user_ib_route)))
> 
> .. and here {
> >+uevent->ctx->status = IB_USER_AT_STATUS_ERROR;
> }
> >+kfree(uevent->ctx->ib_route);
> >+uevent->ctx->ib_route = NULL;
> 
> 
> 
> >+break;
> >+case IB_UAT_ATS_EVENT:
> >+default:
> >+printk(KERN_ERR "ib_uat_event: type %d not handled\n",
> >+   uevent->type);
> >+break;
> >+}
> > uevent->resp.callback = (u64)(unsigned long)uevent->ctx->user_callback;
> > uevent->resp.context = (u64)(unsigned long)uevent->ctx->user_context;
> > uevent->resp.req_id = uevent->ctx->req_id;
> >
> .. and take result into account before putting the response back to the
> user or is setting the status like you did above enough?

I just changed this to deal with this error as you suggested.

-- Hal



[openib-general] Re: [openib-commits] r2629 - in gen2/branches/shaharf-ibat/src/linux-kernel/infiniband: core include

2005-06-27 Thread Bernhard Fischer
Hal,

On Wed, Jun 15, 2005 at 12:27:18PM -0700, [EMAIL PROTECTED] wrote:

>Support callback for route by ip

>Modified: gen2/branches/shaharf-ibat/src/linux-kernel/infiniband/core/uat.c
>===
>--- gen2/branches/shaharf-ibat/src/linux-kernel/infiniband/core/uat.c  
>2005-06-15 18:22:01 UTC (rev 2628)
>+++ gen2/branches/shaharf-ibat/src/linux-kernel/infiniband/core/uat.c  
>2005-06-15 19:27:17 UTC (rev 2629)

>@@ -414,13 +495,30 @@
>   printk(KERN_ERR "ib_uat_event: uevent %p ctx %p status %d not completed\n", uevent, uevent->ctx, uevent->ctx->status);
>   uevent->ctx->rec_num = -EIO;
>   }
>-  /* Copy path records returned from SA back to userspace */
>-  if (copy_to_user(uevent->ctx->user_path_arr,
>-   uevent->ctx->path_arr,
>-   uevent->ctx->user_length))
>-  uevent->ctx->status = IB_USER_AT_STATUS_ERROR;
>-  kfree(uevent->ctx->path_arr);
>-  uevent->ctx->path_arr = NULL;
>+  switch (uevent->type) {
>+  case IB_UAT_PATH_EVENT:
>+  /* Copy path records returned from SA to userspace */
>+  if (copy_to_user(uevent->ctx->user_path_arr,
>+   uevent->ctx->path_arr,
>+   uevent->ctx->user_length))
Should { result be set to FAULT here
>+  uevent->ctx->status = IB_USER_AT_STATUS_ERROR;
}
>+  kfree(uevent->ctx->path_arr);
>+  uevent->ctx->path_arr = NULL;



>+  break;
>+  case IB_UAT_ROUTE_EVENT:
>+  if (copy_to_user(uevent->ctx->user_ib_route,
>+   uevent->ctx->ib_route,
>+   sizeof(*uevent->ctx->user_ib_route)))

.. and here {
>+  uevent->ctx->status = IB_USER_AT_STATUS_ERROR;
}
>+  kfree(uevent->ctx->ib_route);
>+  uevent->ctx->ib_route = NULL;



>+  break;
>+  case IB_UAT_ATS_EVENT:
>+  default:
>+  printk(KERN_ERR "ib_uat_event: type %d not handled\n",
>+ uevent->type);
>+  break;
>+  }
>   uevent->resp.callback = (u64)(unsigned long)uevent->ctx->user_callback;
>   uevent->resp.context = (u64)(unsigned long)uevent->ctx->user_context;
>   uevent->resp.req_id = uevent->ctx->req_id;
>
.. and take result into account before putting the response back to the
user or is setting the status like you did above enough?
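
To spell out what "taking the result into account" could look like, here is
a userspace sketch. Everything in it is hypothetical: toy_copy_to_user() only
mimics the copy_to_user() convention of returning the number of bytes that
could not be copied (0 on success), and the status values are placeholders.
The handler records the failure in the status field and also returns -EFAULT
so that the caller can decide not to deliver the response.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum { TOY_STATUS_OK = 0, TOY_STATUS_ERROR = 1 };  /* hypothetical status codes */

/* Stand-in for copy_to_user(): returns the number of bytes NOT copied,
 * which is 0 on success. A NULL destination models a faulting pointer. */
static size_t toy_copy_to_user(void *dst, const void *src, size_t len)
{
	if (!dst)
		return len;
	memcpy(dst, src, len);
	return 0;
}

/* Copy a result out, record a failure in *status and also report it to
 * the caller, so the response is not handed back as if it had succeeded. */
static int toy_deliver(void *user_buf, const void *result, size_t len, int *status)
{
	if (toy_copy_to_user(user_buf, result, len)) {
		*status = TOY_STATUS_ERROR;
		return -EFAULT;
	}
	*status = TOY_STATUS_OK;
	return 0;
}
```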

Thanks for clarification.


[openib-general] [PATCH] process locked in D state.

2005-06-27 Thread Gleb Natapov
Hello,

Summary:
If I call ibv_close_device() on a context that still has registered
memory, the process hangs in D state.

This is what happens:
ibv_close_device() closes cmd_fd and then calls free_context().
free_context() calls munmap to unmap the doorbell registers.
In the kernel, sys_munmap takes the mm->mmap_sem semaphore and calls
do_munmap. do_munmap is the last user of the file, so it calls the release
method of the file (ib_uverbs_close() in our case). ib_uverbs_close() calls
ib_dealloc_ucontext(). ib_dealloc_ucontext() notices that there is memory
on the file that was never unregistered and calls ib_umem_release(). And
there we try to acquire mm->mmap_sem one more time.

One way to solve this problem is to call munmap before close in
ibv_close_device(), but this will not stop a malicious user, so it is
not enough.

In the attached patch I use down_write_trylock() instead of down_write()
in ib_umem_release(). If the semaphore is already locked, we do not update
the locked_vm statistics. This way a malicious user can only harm itself.

The solution is not ideal: if some other process holds mmap_sem (for
instance, by running 'cat /proc/pid/maps'), we will not be able to update
the locked_vm counter, but the chance of this happening is close to zero.
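
The trade-off can be seen in a small userspace model. It is a sketch only:
struct toy_mm and the toy_* functions are invented for illustration, and a
single flag stands in for the semaphore being held by anyone, reader or
writer. When the trylock fails, the pages are still released but locked_vm
keeps its old value, so the counter stays too high.

```c
#include <assert.h>

struct toy_mm {
	int  sem_held;    /* 1 while "mmap_sem" is held by anyone */
	long locked_vm;   /* pages accounted as locked */
};

/* Model of down_write_trylock(): 1 if the lock was taken, 0 if contended. */
static int toy_down_write_trylock(struct toy_mm *mm)
{
	if (mm->sem_held)
		return 0;
	mm->sem_held = 1;
	return 1;
}

static void toy_up_write(struct toy_mm *mm)
{
	assert(mm->sem_held);
	mm->sem_held = 0;
}

/* Release as in the patch: skip the accounting when the lock is busy. */
static void toy_umem_release(struct toy_mm *mm, long npages)
{
	int locked = toy_down_write_trylock(mm);

	if (locked)
		mm->locked_vm -= npages;
	/* ... the pages themselves would be released here either way ... */
	if (locked)
		toy_up_write(mm);
}
```

With the lock free the counter drops as expected; with the lock already held
the call returns without blocking and the counter is left unchanged.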


Index: trunk/src/userspace/libibverbs/src/device.c
===
--- trunk/src/userspace/libibverbs/src/device.c (revision 2715)
+++ trunk/src/userspace/libibverbs/src/device.c (working copy)
@@ -121,10 +121,9 @@
close(context->async_fd);
for (i = 0; i < context->num_comp; ++i)
close(context->cq_fd[i]);
+   context->device->ops.free_context(context);
close(context->cmd_fd);
 
-   context->device->ops.free_context(context);
-
return 0;
 }
 
Index: trunk/src/linux-kernel/infiniband/core/uverbs_mem.c
===
--- trunk/src/linux-kernel/infiniband/core/uverbs_mem.c (revision 2715)
+++ trunk/src/linux-kernel/infiniband/core/uverbs_mem.c (working copy)
@@ -163,18 +163,19 @@
 void ib_umem_release(struct ib_device *dev, struct ib_umem *umem)
 {
struct mm_struct *mm;
+   int semlocked = 0;
 
mm = get_task_mm(current);
 
-   if (mm) {
-   down_write(&mm->mmap_sem);
+   if (mm && (semlocked = down_write_trylock (&mm->mmap_sem))) {
mm->locked_vm -= PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
}
 
__ib_umem_release(dev, umem, 1);
 
if (mm) {
-   up_write(&mm->mmap_sem);
+   if (semlocked)
+   up_write(&mm->mmap_sem);
mmput(mm);
}
 }
--
Gleb.

Re: [openib-general] Re: [PATCH] rdma_lat-09 and results

2005-06-27 Thread Michael S. Tsirkin
Quoting r. Bernhard Fischer <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] Re: [PATCH] rdma_lat-09 and results
> 
> On Wed, Jun 22, 2005 at 09:28:56AM -0700, Grant Grundler wrote:
> 
> >I also don't know how to express
> > if target == static then
> > MORECFLAGS = -rdynamic...
> > endif
> 
> $ cat Makefile 
> BAR:=-Os
> ifeq ($(MAKECMDGOALS),static)
> BAR += -rdynamic
> endif
> #default: all
> static: all
> all:
>   @echo "making ${MAKECMDGOALS}: $(BAR)"
> 
> 
> hth,
> Bernhard
> 

Or better:

static: BAR += -rdynamic


-- 
MST


Re: [openib-general] Re: [PATCH] rdma_lat-09 and results

2005-06-27 Thread Bernhard Fischer
On Wed, Jun 22, 2005 at 09:28:56AM -0700, Grant Grundler wrote:

>I also don't know how to express
>   if target == static then
>   MORECFLAGS = -rdynamic...
>   endif

$ cat Makefile 
BAR:=-Os
ifeq ($(MAKECMDGOALS),static)
BAR += -rdynamic
endif
#default: all
static: all
all:
@echo "making ${MAKECMDGOALS}: $(BAR)"


hth,
Bernhard