On Mar 31, 2009, at 11:15 AM, Roberto Ammendola wrote:

Hi all, I am developing a btl module for a custom interconnect board (we call it apelink; it's an academic project), and I am porting the module
from 1.2 (where it used to work) to the 1.3 branch. Two issues:

1) the use of pls_rsh_agent is said to be deprecated. How do I spawn the
jobs using rsh, then?


The "pls" framework was replaced by the "plm" framework, so "plm_rsh_agent" should work. It defaults to "ssh : rsh", meaning that it will look for ssh in your path and use it if found; if not, it will look for rsh in your path and use it if found. If neither is found, it will fail.

2) Although compilation is fine, I get

[gozer1:18640] mca: base: component_find: "mca_btl_apelink" does not
appear to be a valid btl MCA dynamic component (ignored)

even with a plain ompi_info command. Probably something changed in the 1.3
branch regarding DSOs, which I should implement in my btl. Any hint?



This is likely due to dlopen failing with your component -- the most common reason for this is a missing/unresolvable symbol. There's unfortunately a bug in libtool that doesn't show you the exact symbol that is unresolvable -- it instead may give a misleading error such as "file not found". :-(

The way I have gotten around it before is to edit libltdl and add a printf. :-( Try this patch -- it compiles for me but I haven't tested it:

--- opal/libltdl/loaders/dlopen.c.~1~ 2009-03-27 08:06:52.000000000 -0400
+++ opal/libltdl/loaders/dlopen.c       2009-03-31 11:50:05.000000000 -0400
@@ -195,6 +195,9 @@

   if (!module)
     {
+        const char *error;
       DL__SETERROR (CANNOT_OPEN);
+        LT__GETERROR(error); /* fetch the dlerror() text just stored by DL__SETERROR */
+        fprintf(stderr, "Can't dlopen %s: %s\n", filename, error ? error : "(unknown error)");
     }



--
Jeff Squyres
Cisco Systems
