[openib-general] libehca causes segfault when not physically present..

2005-10-30 Thread Troy Benjegerdes
On an Openpower720 system with a Mellanox HCA (and no IBM eHCA
installed), I get the following when trying to run ibv_rc_pingpong:

Starting program:
/usr/src/openib-src/userspace/libibverbs/examples/.libs/ibv_rc_pingpong
[Thread debugging using libthread_db enabled]
[New Thread 4398046660640 (LWP 6167)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 4398046660640 (LWP 6167)]
hipz_galpa_store (galpa={fw_handle = 0}, offset=48, value=0)
at src/hcp_phyp.c:72
72  *(u64 *) addr = value;
(gdb) bt
#0  hipz_galpa_store (galpa={fw_handle = 0}, offset=48, value=0)
at src/hcp_phyp.c:72
#1  0x10001b7c in pp_post_recv (ctx=0x100177d0, n=-3807848)
at verbs.h:844
#2  0x10002364 in main (argc=Variable "argc" is not available.
) at examples/rc_pingpong.c:566


I assume this means that something is not checking sysfs to see
whether the driver is actually present and active.
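
For illustration, the kind of presence check meant here could look roughly like the sketch below. It is only a sketch under stated assumptions: ib_device_present() is a hypothetical helper, not something libehca or libibverbs provides, and the directory to probe is passed in by the caller (on a live system that would be /sys/class/infiniband). A userspace driver could bail out early when its device has no sysfs entry instead of touching unmapped hardware.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of libehca or libibverbs): returns 1 if
 * <sysfs_class_dir>/<dev_name> exists and is a directory, 0 otherwise.
 * On a live system sysfs_class_dir would be "/sys/class/infiniband".
 */
static int ib_device_present(const char *sysfs_class_dir, const char *dev_name)
{
    char path[512];
    struct stat st;
    int len;

    assert(sysfs_class_dir != NULL && dev_name != NULL);

    len = snprintf(path, sizeof(path), "%s/%s", sysfs_class_dir, dev_name);
    if (len < 0 || (size_t)len >= sizeof(path))
        return 0;               /* path too long: treat as absent */

    if (stat(path, &st) != 0)
        return 0;               /* no sysfs entry: device not registered */

    /* stat() follows symlinks, which is how sysfs class entries appear */
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

A caller would test ib_device_present("/sys/class/infiniband", name) for its own device name and skip initialisation when it returns 0.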
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it

2005-10-30 Thread Tziporet Koren
Hi Roland,
When do you expect to apply the FMRs patch for SRP?


Thanks,
Tziporet


-----Original Message-----
From: Vu Pham [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 11, 2005 8:03 PM
To: Roland Dreier
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing
it



Roland,
    Thanks for reviewing it.
    In response to your feedback, I have prepared a new patch (attached).



> 
> Why put a pointer to struct list_head here instead of just a struct
> list_head?  If you just used the struct, then you wouldn't need this:
>


Done. Using struct list_head instead of pointer




> > +   u16         in_use;
> >  };
> 
> I can't find anywhere that the in_use flag is used.
>


Removed



> > +static int srp_map_fmr(struct srp_target_port *target, struct scatterlist *scat,
> > +          int sg_cnt, struct srp_request *req)
> 
> [...]
> 
> > +   return -ENOMEM;
> 
> > +           } else if (fmr_cnt <= 0) {
> 
> fmr_cnt is unsigned so I think this is going to get you in trouble.
> Might as well make fmr_cnt a plain int to make things simpler.
> 


In the previous patch, fmr_cnt was already declared as int.
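
To make the concern in the quoted review concrete, here is a minimal sketch of why a `<= 0` test on an unsigned counter misses negative error codes. The function names are hypothetical stand-ins and are not taken from the SRP patch; only the name fmr_cnt comes from the thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for how a caller might classify a return value. */

/* With an unsigned counter, a negative error code wraps to a huge value,
 * so "<= 0" is only ever true when the count is exactly zero. */
static int looks_failed_unsigned(int ret)
{
    unsigned int fmr_cnt = (unsigned int)ret;
    return fmr_cnt <= 0;
}

/* With a plain int, negative error codes are caught as intended. */
static int looks_failed_signed(int ret)
{
    int fmr_cnt = ret;
    return fmr_cnt <= 0;
}

/* Usage: the same error return seen through both declarations. */
static void demo(void)
{
    assert(looks_failed_signed(-ENOMEM));     /* error detected */
    assert(!looks_failed_unsigned(-ENOMEM));  /* error silently missed */
}
```

Declaring the counter as a plain int, as the patch is said to do, keeps the error branch reachable.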


> Also, it might be good to try and add some more comments explaining
> srp_map_fmr() -- it would definitely help me review.
> 


I added some comments - Hope they help your review (instead 
of confusing you more :))


Signed-off-by: Vu Pham <[EMAIL PROTECTED]>







RE: [openib-general] SRQ limit reached async event.

2005-10-30 Thread Tziporet Koren
FW 4.7.400 for Arbel mem-full was officially released yesterday.
Tavor (3.x) release will be by the end of the year.


Tziporet


-----Original Message-----
From: Roland Dreier [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 28, 2005 12:44 AM
To: Galen M. Shipman
Cc: openib-general@openib.org
Subject: Re: [openib-general] SRQ limit reached async event.



    Galen> Does anyone know if openib supports the SRQ limit
    Galen> asynchronous event?


Yes, openib verbs and the mthca driver support this.  However, with
current firmware, you will only receive this event for mem-free HCAs
(firmware versions 5.x and 1.x).  For mem-full HCAs (firmware versions
3.x and 4.x), you will need to use as-yet-unreleased firmware for the
event to be generated.


 - R.

[openib-general] opensm errors with ehca

2005-10-30 Thread Troy Benjegerdes
The firmware on the IBM eHCA causes opensm to emit errors like the ones
below continuously.

Is there a way we can either avoid sending P_KeyTable requests to any eHCA
GUIDs, or figure out what (if anything) is broken in their firmware?

Is this a spec violation, or just an ambiguity in the implementation?

Oct 30 17:49:46 053820 [43005960] -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x16 trans_id=0x158c) -- dropping.
Oct 30 17:49:46 053830 [43005960] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 2 DR SLID 0x0 DR DLID 0x0
Oct 30 17:49:46 053839 [43005960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Oct 30 17:49:46 053861 [43005960] -> SMP dump:
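base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x2
trans_id................0x158c
attr_id.................0x16 (P_KeyTable)
resv....................0x0
attr_mod................0x26
m_key...................0x
dr_slid.................0x
dr_dlid.................0x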

Initial path: [0][1][16]
Return path:  [0][0][0]
Reserved: [0][0][0][0][0][0][0]



[openib-general] Re: SRQ freezes up

2005-10-30 Thread Roland Dreier
> "Ami" == Ami Parlmuter <[EMAIL PROTECTED]> writes:

Ami> running ibv_srq_pingpong pops up two bugs in the SRQ.  1.  a
Ami> failure to post RRs to the SRQ after polling completions sent to
Ami> it (the verb ibv_post_srq_recv fails returning -1) 2.  as a
Ami> direct result of this, the other side gets a bad completion
Ami> with RETRY EXCEEDED error, and then the machine freezes up

Anything printed in the console from the kernel when this happens?

Ami> the first bug has been there for quite some time,

Any reason you kept it a secret until now?

Ami> the second only happens from REV 3890 (when the previous
Ami> version I tested was 3382)


I wasn't able to duplicate the exact symptoms you see, but I fixed a
couple of bugs that your test showed for me: one in the uverbs kernel
module that can cause a kernel panic, and one in the srq_pingpong
example that would cause a CQ overrun.

Do you still see problems with the latest svn code?

 - R.


RE: [openib-general] Missing ib_al.h file?

2005-10-30 Thread Hal Rosenstock
On Sun, 2005-10-30 at 11:23, Steven Wooding wrote:
> Thanks Hal.
> 
> That makes sense. I'll give that a go.

This is built as part of libosmvendor so if you build OpenSM, you will
have this to link with.

-- Hal

> Cheers,
> 
> Steve.
> 
> --- Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> >  
> > That's an IBAL file (gen1). You need to build with
> > VENDOR=openib to use this which should not need that
> > file.
> >  
> > -- Hal
> > 
> 
> 
> 
>   
>   
>   
> ___ 
> Yahoo! Messenger - NEW crystal clear PC to PC calling worldwide with 
> voicemail http://uk.messenger.yahoo.com



[openib-general] SRQ freezes up

2005-10-30 Thread Ami Parlmuter
Running ibv_srq_pingpong exposes two bugs in the SRQ:
1.  A failure to post RRs to the SRQ after polling completions sent to it (the verb ibv_post_srq_recv fails, returning -1).
2.  As a direct result of this, the other side gets a bad completion with a RETRY EXCEEDED error, and then the machine freezes up.

The first bug has been there for quite some time; the second only happens from REV 3890 (the previous version I tested was 3382).

the command lines I used with the test:
server: /usr/local/bin/ibv_srq_pingpong --port=19872 --ib-dev=mthca0 --ib-port=1 -n 1 --num-qp=200 --rx-depth=5
client:  /usr/local/bin/ibv_srq_pingpong --port=19872 --ib-dev=mthca0 --ib-port=1 -n 1 --num-qp=200 --rx-depth=5 SERVER_IP_ADDR



RE: [openib-general] Missing ib_al.h file?

2005-10-30 Thread Steven Wooding
Thanks Hal.

That makes sense. I'll give that a go.

Cheers,

Steve.

--- Hal Rosenstock <[EMAIL PROTECTED]> wrote:

> Hi,
>  
> That's an IBAL file (gen1). You need to build with
> VENDOR=openib to use this which should not need that
> file.
>  
> -- Hal
> 








[openib-general] 2.6.14 patches

2005-10-30 Thread Michael S. Tsirkin
Hi!
Sean, Hal, now that 2.6.14 is out, do you plan to apply
the patches in https://openib.org/svn/gen2/trunk/src/linux-kernel/patches/?
Once you do, I'll put reverted patches in the backport directory.

I suggest we then remove the rest of the 2.6.14-rc3 files from the patches
directory except linux-2.6.14-fib-frontend.diff - what do you guys think?  I
already did this for SDP and for linux-2.6.14-rc3-sdp_link.diff

I took the liberty of renaming linux-2.6.14-rc3-fib-frontend.diff to
linux-2.6.14-fib-frontend.diff, since the patch applies to 2.6.14 as well.

Thanks,
-- 
MST


[openib-general] Patches for Opensm

2005-10-30 Thread Yael Kalka
Hi Hal,


I noticed that you checked in a change to the osm trunk a few days ago without
sending a patch for it.
Since I am the owner of the opensm tree under Windows, and I am trying to keep
the Windows tree as similar as possible to the Linux tree, I want to know
about checkins to the osm tree so I can add the patches to the Windows tree as well.
Please send an e-mail with a patch when you commit changes to the osm tree.


Thanks,
Yael



-----Original Message-----
From: Yael Kalka [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 27, 2005 3:04 PM
To: [EMAIL PROTECTED]
Cc: openib-general@openib.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [PATCH] Opensm - fix lmc algorithm



Hi Hal,


We noticed a problem in the lmc assignment algorithm.
In the current code, when trying to run opensm with lmc > 0,
opensm goes into an infinite loop.
While debugging the problem we noticed that the lid assignment
was at fault, and we changed the algorithm. The change is in the
osm_lid_mgr_init_sweep function.
We have done some testing of the new code, and the lmc
assignment appears to be OK with the fix.
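
As a rough illustration of the alignment rule the fix relies on, the sketch below checks whether a persistent LID range is usable for a given lmc. It is not the patch itself: lmc_mask_for() and lid_range_ok() are hypothetical names, and the 0xffff mask for lmc == 0 is an assumption made for this example. A range is accepted only if its base LID has the low lmc bits clear and it spans at least 2^lmc LIDs.

```c
#include <assert.h>
#include <stdint.h>

/* Mask that clears the low `lmc` bits of a LID.
 * Assumption for this sketch: with lmc == 0 every LID is aligned,
 * so the mask is all ones (0xffff). */
static uint16_t lmc_mask_for(uint8_t lmc)
{
    assert(lmc <= 7);   /* LMC is a 3-bit field */
    if (lmc == 0)
        return 0xffff;
    return (uint16_t)~((1u << lmc) - 1u);
}

/* Returns 1 if [min_lid, max_lid] is aligned to 2^lmc and wide enough
 * to hold 2^lmc LIDs, 0 otherwise. */
static int lid_range_ok(uint16_t min_lid, uint16_t max_lid, uint8_t lmc)
{
    uint32_t num_lids = 1u << lmc;

    if (max_lid < min_lid)
        return 0;                                   /* malformed range */
    if ((min_lid & lmc_mask_for(lmc)) != min_lid)
        return 0;                                   /* not aligned */
    return ((uint32_t)(max_lid - min_lid) + 1u) >= num_lids;
}
```

For example, with lmc = 2 the range [8, 11] passes, while [9, 12] is misaligned and [8, 9] is too narrow; entries that fail such a check are the ones a sweep would discard and reassign.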


Thanks,
Yael


Signed-off-by:  Yael Kalka <[EMAIL PROTECTED]>


Index: opensm/osm_lid_mgr.c
===
--- opensm/osm_lid_mgr.c    (revision 3848)
+++ opensm/osm_lid_mgr.c    (working copy)
@@ -337,7 +337,7 @@ __osm_lid_mgr_init_sweep(
   uint16_t max_defined_lid;
   uint16_t max_persistent_lid;
   uint16_t max_discovered_lid;
-  uint16_t lid, l;
+  uint16_t lid;
   uint16_t disc_min_lid;
   uint16_t disc_max_lid;
   uint16_t db_min_lid;
@@ -349,16 +349,23 @@ __osm_lid_mgr_init_sweep(
   osm_port_t  *p_port;
   cl_qmap_t   *p_port_guid_tbl;
   uint8_t  lmc_num_lids = (uint8_t)(1 << p_mgr->p_subn->opt.lmc);
+  uint16_t lmc_mask;
+  uint16_t req_lid, num_lids;
   
   OSM_LOG_ENTER( p_mgr->p_log, __osm_lid_mgr_init_sweep );
 
+  if (p_mgr->p_subn->opt.lmc)
+    lmc_mask = ~((1 << p_mgr->p_subn->opt.lmc) - 1);
+  else
+    lmc_mask = 0x;
+
   /* if we came out of standby we need to discard any previous guid 2 lid
  info we might had */
   if ( p_mgr->p_subn->coming_out_of_standby == TRUE )
   {
 osm_db_clear( p_mgr->p_g2l );
 for (lid = 0; lid < cl_ptr_vector_get_size(&p_mgr->used_lids); lid++)
-  cl_ptr_vector_set(&p_mgr->used_lids, lid, NULL);
+  cl_ptr_vector_set(p_persistent_vec, lid, NULL);
   }
 
   /* we need to cleanup the empty ranges list */
@@ -375,7 +382,7 @@ __osm_lid_mgr_init_sweep(
 
   /* we if are on the first sweep and in re-assign lids mode 
  we should ignore all the available info and simply define one 
- hufe empty range */
+ huge empty range */
   if ((p_mgr->p_subn->first_time_master_sweep == TRUE) &&
   (p_mgr->p_subn->opt.reassign_lids == TRUE ))
   {
@@ -398,6 +405,34 @@ __osm_lid_mgr_init_sweep(
 osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid);
 for (lid = disc_min_lid; lid <= disc_max_lid; lid++)
   cl_ptr_vector_set(p_discovered_vec, lid, p_port );
+    /* make sure the guid2lid entry is valid. If not - clean it. */
+    if (!osm_db_guid2lid_get( p_mgr->p_g2l,
+  cl_ntoh64(osm_port_get_guid(p_port)),
+  &db_min_lid, &db_max_lid))
+    {
+  if ( osm_node_get_type( osm_port_get_parent_node( p_port ) ) !=
+   IB_NODE_TYPE_SWITCH)
+    num_lids = lmc_num_lids;
+  else
+    num_lids = 1;
+
+  if ((num_lids != 1) &&
+  (((db_min_lid & lmc_mask) != db_min_lid) ||
+   (db_max_lid - db_min_lid + 1 < num_lids)) )
+  {
+    /* Not alligned, or not wide enough - remove the entry */
+    osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
+ "__osm_lid_mgr_init_sweep: "
+ "Cleaning persistent entry for guid:0x%016" PRIx64
+ " illegal range:[0x%x:0x%x] \n",
+ cl_ntoh64(osm_port_get_guid(p_port)), db_min_lid,
+ db_max_lid );
+    osm_db_guid2lid_delete( p_mgr->p_g2l,
+    cl_ntoh64(osm_port_get_guid(p_port)));
+    for ( lid = db_min_lid ; lid <= db_max_lid ; lid++ )
+  cl_ptr_vector_set(p_persistent_vec, lid, NULL);
+  }
+    }
   }
 
   /* 
@@ -434,7 +469,7 @@ __osm_lid_mgr_init_sweep(
   {
 is_free = TRUE;
 /* first check to see if the lid is used by a persistent assignment */
-    if ((lid < max_persistent_lid) && cl_ptr_vector_get(p_persistent_vec, lid))
+    if ((lid <= max_persistent_lid) && cl_ptr_vector_get(p_persistent_vec, lid))
 {
   osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
    "__osm_lid_mgr_init_sweep: "
@@ -442,62 +477,86 @@ __osm_lid_mgr_init_sweep(
    lid);
   is_free = FALSE;
 }
-
-    /* check


[openib-general] [ANNOUNCE] 2.6.13 added to backport directory

2005-10-30 Thread Michael S. Tsirkin
Hello!
Now that 2.6.14 is out, patches to make svn trunk compile against
2.6.13 and older kernels have been uploaded to
https://openib.org/svn/gen2/branches/backport

Enjoy,

-- 
MST