[openib-general] libehca causes segfault when not physically present..
On an OpenPower 720 system with a Mellanox HCA (and no IBM eHCA installed), I get the following when trying to run ibv_rc_pingpong:

    Starting program: /usr/src/openib-src/userspace/libibverbs/examples/.libs/ibv_rc_pingpong
    [Thread debugging using libthread_db enabled]
    [New Thread 4398046660640 (LWP 6167)]

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 4398046660640 (LWP 6167)]
    hipz_galpa_store (galpa={fw_handle = 0}, offset=48, value=0) at src/hcp_phyp.c:72
    72          *(u64 *) addr = value;
    (gdb) bt
    #0  hipz_galpa_store (galpa={fw_handle = 0}, offset=48, value=0) at src/hcp_phyp.c:72
    #1  0x10001b7c in pp_post_recv (ctx=0x100177d0, n=-3807848) at verbs.h:844
    #2  0x10002364 in main (argc=Variable "argc" is not available.
    ) at examples/rc_pingpong.c:566

I assume this means that something somewhere is not checking sysfs to see whether the driver is actually there and active.

_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
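The kind of sysfs check suggested above could look roughly like the sketch below. This is not libehca code: the helper name ib_device_present() is hypothetical, and it assumes the device would show up as a directory under a class directory such as /sys/class/infiniband.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of libehca or libibverbs): returns true
 * if <class_dir>/<dev_name> exists and is a directory. A userspace driver
 * could run a check like this before touching any hardware mapping.
 */
static bool ib_device_present(const char *class_dir, const char *dev_name)
{
    char path[512];
    struct stat st;
    int n;

    assert(class_dir != NULL && dev_name != NULL);

    /* Build "<class_dir>/<dev_name>" and reject names that do not fit. */
    n = snprintf(path, sizeof path, "%s/%s", class_dir, dev_name);
    if (n < 0 || (size_t)n >= sizeof path)
        return false;

    /* The device counts as present only if the path is a directory. */
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller would use something like ib_device_present("/sys/class/infiniband", "ehca0") and skip the device entirely when it returns false; the device name here is only an example.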
RE: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it
Hi Roland,

When do you expect to apply the FMRs patch for SRP?

Thanks,
Tziporet

-----Original Message-----
From: Vu Pham [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 11, 2005 8:03 PM
To: Roland Dreier
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it

Roland,

Thanks for reviewing it. In response to your feedback, I have prepared a new patch (attached).

>
> Why put a pointer to struct list_head here instead of just a struct
> list_head? If you just used the struct, then you wouldn't need this:
>

Done. Using struct list_head instead of a pointer.

> > + u16 in_use;
> > };
>
> I can't find anywhere that the in_use flag is used.
>

Removed.

> > +static int srp_map_fmr(struct srp_target_port *target, struct scatterlist *scat,
> > + int sg_cnt, struct srp_request *req)
>
> [...]
>
> > + return -ENOMEM;
> > + } else if (fmr_cnt <= 0) {
>
> fmr_cnt is unsigned so I think this is going to get you in trouble.
> Might as well make fmr_cnt a plain int to make things simpler.
>

In the previous patch, fmr_cnt was already declared as int.

> Also, it might be good to try and add some more comments explaining
> srp_map_fmr() -- it would definitely help me review.
>

I added some comments. I hope they help your review (instead of confusing you more :))

Signed-off-by: Vu Pham <[EMAIL PROTECTED]>
RE: [openib-general] SRQ limit reached async event.
FW 4.7.400 for Arbel mem-full was officially released yesterday.
The Tavor (3.x) release will be by the end of the year.

Tziporet

-----Original Message-----
From: Roland Dreier [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 28, 2005 12:44 AM
To: Galen M. Shipman
Cc: openib-general@openib.org
Subject: Re: [openib-general] SRQ limit reached async event.

    Galen> Does anyone know if openib supports the SRQ limit
    Galen> asynchronous event?

Yes, openib verbs and the mthca driver support this. However, with current firmware, you will only receive this event for mem-free HCAs (firmware versions 5.x and 1.x). For mem-full HCAs (firmware versions 3.x and 4.x), you will need to use as-yet-unreleased firmware for the event to be generated.

 - R.
[openib-general] opensm errors with ehca
The firmware on the IBM eHCA causes opensm to spit out these kinds of errors all the time. Is there a way we can either not send P_KeyTable requests to any eHCA GUIDs, or figure out what (if anything) is broken in their firmware? Is this a spec violation, or just an ambiguity in the implementation?

Oct 30 17:49:46 053820 [43005960] -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x16 trans_id=0x158c) -- dropping.
Oct 30 17:49:46 053830 [43005960] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 2 DR SLID 0x0 DR DLID 0x0
Oct 30 17:49:46 053839 [43005960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Oct 30 17:49:46 053861 [43005960] -> SMP dump:
    base_ver.......0x1
    mgmt_class.....0x81
    class_ver......0x1
    method.........0x1 (SubnGet)
    D bit..........0x0
    status.........0x0
    hop_ptr........0x0
    hop_count......0x2
    trans_id.......0x158c
    attr_id........0x16 (P_KeyTable)
    resv...........0x0
    attr_mod.......0x26
    m_key..........0x
    dr_slid........0x
    dr_dlid........0x

    Initial path: [0][1][16]
    Return path:  [0][0][0]
    Reserved:     [0][0][0][0][0][0][0]
[openib-general] Re: SRQ freezes up
> "Ami" == Ami Parlmuter <[EMAIL PROTECTED]> writes:

    Ami> running ibv_srq_pingpong pops up two bugs in the SRQ. 1. a
    Ami> failure to post RRs to the SRQ after polling completions sent to
    Ami> it (the verb ibv_post_srq_recv fails returning -1) 2. as a
    Ami> direct result of this, the other side gets a bad completion
    Ami> with RETRY EXCEEDED error, and then the machine freezes up

Anything printed in the console from the kernel when this happens?

    Ami> the first bug has been there for quite some time,

Any reason you kept it a secret until now?

    Ami> the second only happens from REV 3890 (when the previous
    Ami> version I tested was 3382)

I wasn't able to duplicate the exact symptoms you see, but I fixed a couple of bugs that your test showed for me: one in the uverbs kernel module that can cause a kernel panic, and one in the srq_pingpong example that would cause a CQ overrun.

Do you still see problems with the latest svn code?

 - R.
RE: [openib-general] Missing ib_al.h file?
On Sun, 2005-10-30 at 11:23, Steven Wooding wrote:
> Thanks Hal.
>
> That makes sense. I'll give that a go.

This is built as part of libosmvendor, so if you build OpenSM, you will have this to link with.

-- Hal

> Cheers,
>
> Steve.
>
> --- Hal Rosenstock <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > That's an IBAL file (gen1). You need to build with
> > VENDOR=openib to use this which should not need that
> > file.
> >
> > -- Hal
[openib-general] SRQ freezes up
Running ibv_srq_pingpong pops up two bugs in the SRQ:

1. A failure to post RRs to the SRQ after polling completions sent to it (the verb ibv_post_srq_recv fails, returning -1).
2. As a direct result of this, the other side gets a bad completion with a RETRY EXCEEDED error, and then the machine freezes up.

The first bug has been there for quite some time; the second only happens from REV 3890 (the previous version I tested was 3382).

The command lines I used with the test:

server: /usr/local/bin/ibv_srq_pingpong --port=19872 --ib-dev=mthca0 --ib-port=1 -n 1 --num-qp=200 --rx-depth=5
client: /usr/local/bin/ibv_srq_pingpong --port=19872 --ib-dev=mthca0 --ib-port=1 -n 1 --num-qp=200 --rx-depth=5 SERVER_IP_ADDR
RE: [openib-general] Missing ib_al.h file?
Thanks Hal.

That makes sense. I'll give that a go.

Cheers,

Steve.

--- Hal Rosenstock <[EMAIL PROTECTED]> wrote:

> Hi,
>
> That's an IBAL file (gen1). You need to build with
> VENDOR=openib to use this which should not need that
> file.
>
> -- Hal
[openib-general] 2.6.14 patches
Hi!

Sean, Hal, now that 2.6.14 is out, do you plan to apply the patches in https://openib.org/svn/gen2/trunk/src/linux-kernel/patches/? Once you do, I'll put reverted patches in the backport directory.

I suggest we then remove the rest of the 2.6.14-rc3 files from the patches directory except linux-2.6.14-fib-frontend.diff. What do you guys think? I already did this for SDP and for linux-2.6.14-rc3-sdp_link.diff.

I took the liberty of renaming linux-2.6.14-rc3-fib-frontend.diff to linux-2.6.14-fib-frontend.diff, since the patch is for 2.6.14 as well.

Thanks,

-- MST
[openib-general] Patches for Opensm
Hi Hal,

I noticed that you checked in a change to the osm trunk a few days ago without sending a patch for it. Since I am the owner of the opensm tree under Windows, and I am trying to keep the Windows tree as similar as possible to the Linux tree, I want to know about check-ins to the osm tree so that I can add the patches to the Windows tree as well. Please send an e-mail with a patch when you commit changes to the osm tree.

Thanks,
Yael

-----Original Message-----
From: Yael Kalka [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 27, 2005 3:04 PM
To: [EMAIL PROTECTED]
Cc: openib-general@openib.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [PATCH] Opensm - fix lmc algorithm

Hi Hal,

We noticed a problem in the lmc assignment algorithm. In the current code, when trying to run opensm with lmc > 0, opensm goes into an infinite loop. While debugging the problem, we found that the lid assignment was at fault, and we changed the algorithm. The change is in the osm_lid_mgr_init_sweep function. We have done some testing of the new code, and it seems that the lmc assignment is OK with the fix.
Thanks,
Yael

Signed-off-by: Yael Kalka <[EMAIL PROTECTED]>

Index: opensm/osm_lid_mgr.c
===================================================================
--- opensm/osm_lid_mgr.c (revision 3848)
+++ opensm/osm_lid_mgr.c (working copy)
@@ -337,7 +337,7 @@ __osm_lid_mgr_init_sweep(
   uint16_t max_defined_lid;
   uint16_t max_persistent_lid;
   uint16_t max_discovered_lid;
-  uint16_t lid, l;
+  uint16_t lid;
   uint16_t disc_min_lid;
   uint16_t disc_max_lid;
   uint16_t db_min_lid;
@@ -349,16 +349,23 @@ __osm_lid_mgr_init_sweep(
   osm_port_t *p_port;
   cl_qmap_t *p_port_guid_tbl;
   uint8_t lmc_num_lids = (uint8_t)(1 << p_mgr->p_subn->opt.lmc);
+  uint16_t lmc_mask;
+  uint16_t req_lid, num_lids;

   OSM_LOG_ENTER( p_mgr->p_log, __osm_lid_mgr_init_sweep );

+  if (p_mgr->p_subn->opt.lmc)
+    lmc_mask = ~((1 << p_mgr->p_subn->opt.lmc) - 1);
+  else
+    lmc_mask = 0x;
+
   /* if we came out of standby we need to discard any previous guid 2 lid
      info we might had */
   if ( p_mgr->p_subn->coming_out_of_standby == TRUE )
   {
     osm_db_clear( p_mgr->p_g2l );
     for (lid = 0; lid < cl_ptr_vector_get_size(&p_mgr->used_lids); lid++)
-      cl_ptr_vector_set(&p_mgr->used_lids, lid, NULL);
+      cl_ptr_vector_set(p_persistent_vec, lid, NULL);
   }

   /* we need to cleanup the empty ranges list */
@@ -375,7 +382,7 @@ __osm_lid_mgr_init_sweep(

   /* we if are on the first sweep and in re-assign lids mode
      we should ignore all the available info and simply define one
-     hufe empty range */
+     huge empty range */
   if ((p_mgr->p_subn->first_time_master_sweep == TRUE) &&
       (p_mgr->p_subn->opt.reassign_lids == TRUE ))
   {
@@ -398,6 +405,34 @@ __osm_lid_mgr_init_sweep(
     osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid);
     for (lid = disc_min_lid; lid <= disc_max_lid; lid++)
       cl_ptr_vector_set(p_discovered_vec, lid, p_port );
+    /* make sure the guid2lid entry is valid. If not - clean it. */
+    if (!osm_db_guid2lid_get( p_mgr->p_g2l,
+                              cl_ntoh64(osm_port_get_guid(p_port)),
+                              &db_min_lid, &db_max_lid))
+    {
+      if ( osm_node_get_type( osm_port_get_parent_node( p_port ) ) !=
+           IB_NODE_TYPE_SWITCH)
+        num_lids = lmc_num_lids;
+      else
+        num_lids = 1;
+
+      if ((num_lids != 1) &&
+          (((db_min_lid & lmc_mask) != db_min_lid) ||
+           (db_max_lid - db_min_lid + 1 < num_lids)) )
+      {
+        /* Not alligned, or not wide enough - remove the entry */
+        osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
+                 "__osm_lid_mgr_init_sweep: "
+                 "Cleaning persistent entry for guid:0x%016" PRIx64
+                 " illegal range:[0x%x:0x%x] \n",
+                 cl_ntoh64(osm_port_get_guid(p_port)), db_min_lid,
+                 db_max_lid );
+        osm_db_guid2lid_delete( p_mgr->p_g2l,
+                                cl_ntoh64(osm_port_get_guid(p_port)));
+        for ( lid = db_min_lid ; lid <= db_max_lid ; lid++ )
+          cl_ptr_vector_set(p_persistent_vec, lid, NULL);
+      }
+    }
   }

   /*
@@ -434,7 +469,7 @@ __osm_lid_mgr_init_sweep(
     {
       is_free = TRUE;
       /* first check to see if the lid is used by a persistent assignment */
-      if ((lid < max_persistent_lid) && cl_ptr_vector_get(p_persistent_vec, lid))
+      if ((lid <= max_persistent_lid) && cl_ptr_vector_get(p_persistent_vec, lid))
       {
         osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
                  "__osm_lid_mgr_init_sweep: "
@@ -442,62 +477,86 @@ __osm_lid_mgr_init_sweep(
                  lid);
         is_free = FALSE;
       }
-
-      /* check
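The validity rule that the patch applies to persistent guid2lid entries (the base LID must be aligned to the LMC block, and the range must cover at least 2^lmc LIDs) can be illustrated with a small standalone sketch. This is not the opensm code: the helper names lmc_mask() and lid_range_is_valid() are hypothetical, and the sketch assumes lmc is at most 7. Passing lmc = 0 models the single-LID case that the patch uses for switches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: mask that clears the low `lmc` bits of a LID.
 * For lmc == 0 the expression yields an all-ones 16-bit mask.
 */
static uint16_t lmc_mask(uint8_t lmc)
{
    assert(lmc <= 7);
    return (uint16_t)~((1u << lmc) - 1u);
}

/*
 * Hypothetical helper mirroring the condition in the patch: a range
 * [min_lid, max_lid] is acceptable if only one LID is needed, or if
 * min_lid is aligned to the LMC block and the range is wide enough.
 */
static bool lid_range_is_valid(uint16_t min_lid, uint16_t max_lid, uint8_t lmc)
{
    int num_lids = 1 << lmc;
    int width = (int)max_lid - (int)min_lid + 1;
    bool aligned = (min_lid & lmc_mask(lmc)) == min_lid;

    if (num_lids == 1)
        return true;            /* nothing to align for a single LID */

    return aligned && width >= num_lids;
}
```

With lmc = 2, for example, the range [8, 11] passes, while [9, 12] (base not aligned) and [8, 9] (too narrow) fail and would be the kind of entry the patch removes.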
[openib-general] [ANNOUNCE] 2.6.13 added to backport directory
Hello!

Now that 2.6.14 is out, patches to make svn trunk compile against 2.6.13 and older kernels have been uploaded to https://openib.org/svn/gen2/branches/backport

Enjoy,

-- MST