However, having an MCA parameter called rds_hostfile_path that belongs to the mosix rds component is really confusing. In general, we try to follow some strict rules when we define the names of MCA parameters, and these rules clearly state that the name of the component that will use the MCA parameter is supposed to be reflected in the name of the parameter. rds_hostfile_path should therefore be named rds_mosix_path.
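To make the naming rule concrete, here is a minimal sketch in C of how a full parameter name is put together from the framework, component, and parameter names. The helper build_mca_param_name is hypothetical and is not part of Open MPI; it only illustrates the convention described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (NOT an Open MPI function): joins the framework,
 * component and parameter names with underscores, following the naming
 * rule "<framework>_<component>_<param>".
 * Returns 0 on success, -1 if the result does not fit in buf. */
int build_mca_param_name(char *buf, size_t len,
                         const char *framework,
                         const char *component,
                         const char *param)
{
    assert(buf != NULL && framework != NULL &&
           component != NULL && param != NULL);

    /* snprintf returns the length the full string would have had */
    int n = snprintf(buf, len, "%s_%s_%s", framework, component, param);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Called with "rds", "mosix" and "path", this helper yields rds_mosix_path, whereas rds_hostfile_path is what the same rule produces for the hostfile component.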
Thanks,
george.

On Oct 21, 2007, at 6:24 AM, David Erukhimovich wrote:
Hi Ralph,

I'm sorry to bother you again, but adding the new component to rds still doesn't work as expected.

I've created a new component, rds_mosix. It is identical to rds_hostfile (with all parameter names changed) except:

in rds/mosix/rds_mosix_component.c/orte_rds_mosix_open:

    mca_base_param_reg_string("rds_hostfile", "path", "ORTE Host filename",
                              false, false, path,
                              &mca_rds_mosix_component.path);

in rds/mosix/rds_mosix.c/orte_rds_mosix_query:

    rc = mca_base_param_find("rds", "hostfile", "path");
    mca_base_param_lookup_string(rc, &mca_rds_mosix_component.path);
    printf("got hostfile: %s\n", mca_rds_mosix_component.path);

So I'm running:

    mpirun --mca rmaps round_robin --mca rds mosix --hostfile $MOSHOME/4hosts -np 2 hostname

and getting the output "got hostfile: <default_hostfile_path>" and not the given path.

What am I doing wrong?

Thank you
--David

---------- Forwarded message ----------
From: Ralph Castain <r...@lanl.gov>
Date: Oct 20, 2007 6:52 PM
Subject: Re: [OMPI devel] Trying to get total procs num in odls framework
To: David Erukhimovich <davider...@cs.huji.ac.il>

On 10/20/07 10:10 AM, "David Erukhimovich" <davider...@cs.huji.ac.il> wrote:

> Hi Ralph,
>
> 2. I do want the user to be able to switch between my way of process launching and the default way. I can do it using an mca flag, but I would prefer a new component. If it is not too difficult for you, please make the patch; if it is, I'll just use an mca flag.

I can make it next week - shouldn't be too big a deal. I'll let you know if otherwise.

> 1. Just remembered another difficulty I had: I've created a new rds component identical to the hostfile one. Let's call it mosix. Now, orterun is saving the hostfile path in the mca parameter rds_hostfile_path, or something like that. When I try to retrieve rds_hostfile_path or rds_mosix_path in the rds_mosix component, I always get the default hostfile path (it doesn't matter whether I gave a hostfile or not).
> And I tried everything - changing names in rds_mosix_component, declaring a new parameter rds_mosix_path in various places, etc. So now I'm just altering the existing hostfile component. Do you have any suggestions how to make it work?

How are you retrieving the path? Here is the code from hostfile:

    mca_base_param_reg_string(&mca_rds_hostfile_component.super.rds_version,
                              "path", "ORTE Host filename", false, false,
                              path, &mca_rds_hostfile_component.path);

If you look at that, it is actually looking for an mca param of "rds_hostfile_path". If you just copied this code, though, using your component's name, then you would be looking for the mca param "rds_<your-components-name>_path". What you probably need to do is hardwire it to:

    mca_base_param_reg_string("rds_hostfile", "path", "ORTE Host filename",
                              false, false, path, &default_path);

Also, you may be encountering a problem in that the rds_hostfile component is going to try and run as well as your component, and thus may overwrite what you do. You might want to try -mca rds my_component to ensure that only your component gets executed.

> Sorry for all the questions and thank you very much for the quick answers

Not a problem - hope this helps.
Ralph

> Regards
> --David

---------- Forwarded message ----------
From: Ralph Castain <r...@lanl.gov>
Date: Oct 20, 2007 5:12 PM
Subject: Re: [OMPI devel] Trying to get total procs num in odls framework
To: David Erukhimovich <davider...@cs.huji.ac.il>

Hi David

Thanks for the info - see comments below.
Ralph

On 10/20/07 6:58 AM, "David Erukhimovich" <davider...@cs.huji.ac.il> wrote:

> Hi
>
> Thank you for your answer.
>
> First of all, my two questions weren't connected; they belong to different parts of my project. And the subject of the mail should have been: Trying to get total procs num in rds framework (sorry, my mistake).
>
> Here are the parts, in the order of the last email:
>
> 1.
> I've solved the problem about getting the total num of procs in rds (I just called some function incorrectly), so sorry for disturbing you about that.
>
> Now a bit more about what I'm trying to do; maybe there is a better way than mine:
>
> I have a tool (an external application) that, given a list of machines and a number n, chooses the n best ones from the list (the least loaded ones), and if the list of machines isn't given, it just returns the n best machines from the cluster. I wish to include this in ompi. Hence, given a machinefile, it'll run the process only on the best nodes. If a machinefile isn't given, it'll take the best node that my application returns.
>
> I think the best place to implement it is in rds - after building the list of newly discovered nodes: if it is empty, fill it using my tool; otherwise filter it using my tool. It seems to me the most logical way to do it. Am I right? I am asking you because I guess you have a better knowledge of the ompi architecture.

It sounds like the correct place to me. At some point in the future, you could migrate that logic to the RAS instead, but I would just continue as you are doing for now.

> 2. The other thing I am trying to do is to make ompi run every process not directly, but through an external program. E.g., if I want to launch the program "hostname", I want the following to be launched: "<my-program> <my-program's-flags> hostname".
>
> I figured that the best way to do it is in the odls framework because there I have the exact executing point.

I guess I wouldn't do it that way if I were doing a project of my own. I would just go into the default odls module and hardcode the revised launch. I can't see this coming back into the production system, so unless you have some reason to want to run both with and without your revision, why go through the pain?

> I am currently working on the checkpoint 1.2.3. I don't work on the trunk because I need the patches to be added on some stable release. Is there a 1.2.* release where the bug is fixed?
> And if not, when can such a fixed version be stable?

I don't think there are any plans to backport that fix, though I imagine it could be done. If not, I could try and create a patch for you next week, though I would again suggest you just hardcode your change into the existing odls default component to make your life easier.
Ralph

> Thank you
> --David

---------- Forwarded message ----------
From: Ralph Castain <r...@lanl.gov>
Date: Oct 17, 2007 11:22 PM
Subject: Re: [OMPI devel] Trying to get total procs num in odls framework
To: davider...@cs.huji.ac.il
Cc: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>

Hi David

I could probably answer your questions better if I had a better understanding of what you are trying to do. For example, looking in the hostfile rds for the number of procs to be launched seems strange as the functional role of the framework is to simply learn what nodes are available.

It would also help to have some idea of what environment you are working in, and how you configured the beast.

Please see comments below.
Ralph

On 10/17/07 2:47 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> Yo Ralph --
>
> Can you answer these questions?
>
> Begin forwarded message:
>
>> From: David Erukhimovich <davider...@cs.huji.ac.il>
>> Date: October 14, 2007 5:08:45 PM EDT
>> To: de...@open-mpi.org
>> Subject: [OMPI devel] Trying to get total procs num in odls framework
>> Reply-To: Open MPI Developers <de...@open-mpi.org>
>>
>> Hello,
>>
>> I have 2 questions:
>>
>> 1. I am trying to get the total number of requested processes for the job in the 'hostfile' component in rds. I took the job object that was given as a parameter, extracted the application objects and checked how many procs each application has. The result in every run was 0. As I understand, this variable is updated before the rds part. So what am I doing wrong?

Do you mean you took the jobid given to the hostfile RDS (which isn't an object, but just a number) and did an orte_rmgr.get_app_context to get the array of app_contexts?
Is there some reason why you would want to do that there?

Depending upon what the command line looks like, it is possible for the number of procs to be zero - we allow that option and then fill in the number later. If it was specified, though, we do insert the number in the app_context object.

Maybe you could tell me what the command line looks like, the function call you used to get the "application objects", and what field you were looking at when you found zero?

>> 2. I've discovered an undocumented framework - odls.

It wasn't exactly hidden...we haven't documented it because we are lazy and the existing components cover every known environment (or so we thought). ;-)

Is there some special reason to want to create another one?

>> I've created a new component for it. The problem is that there is no way to switch between the default component and mine (--mca odls <my component> doesn't work). Is there a way to switch between odls components (I saw bprocs there and I guess it is used)?

Are you working on the trunk? What r level?

Reason I ask: I recently fixed a problem where the command line mca params were not getting passed to the orteds. Your description looks like you haven't picked up that change. If you have updated recently, and you still can't get it to work, then we likely have a lingering problem.

If I read your subject line correctly, then I am somewhat puzzled. You can look at the orte/mca/odls/base/odls_base_default_fns.c file, the orte_odls_base_default_get_add_procs_data function and see where we get the total number of procs in a job and how that is passed to the orteds.
If you have some new environment that the existing odls components can't handle, then I would strongly suggest you at least use the default functions in the base to provide as much support as possible as this will help you to keep pace with changes in the system.

I would also welcome feedback on what you encountered that required a new odls component - perhaps we can modify the base support functions to make it fit within one of the existing components.

Thanks
Ralph

>> Thank you,
>> --David

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel