I see no problem with the latest code from Wesley. The warning is silenced, and was not indicative of any buggy behavior. So for me the problem is closed.
Aurelien Le 8 août 2011 à 15:30, Barrett, Brian W a écrit : > 4) Don't design an interface where you're going to have the circular > dependency. Perhaps it's a sign that the epoch is in the wrong place, or > handled in the wrong way, or the whole naming scheme is wrong. I'm not > sure. I've relented that it doesn't need to be fixed now (since it is > clearly bigger than just the patch introducing the epochs), but I still > think it's a sign we have a basic design problem with the way we're > handling names. > > Brian > > On 8/8/11 1:23 PM, "Thomas Herault" <herault.tho...@gmail.com> wrote: > >> proc_get_epoch( proc ); >> calls ORTE_NAME_PRINT(proc) >> when in debug mode. >> >> When this is the case, ORTE_NAME_PRINT prints all fields of proc, >> including epoch. So, proc->epoch = proc_get_epoch(proc) triggers the >> valgrind warning, because of the print, in debug mode. >> This is the only reason why proc->epoch must be initialized before >> calling proc->epoch = proc_get_epoch( proc ); >> >> I agree with you it's not a good approach, for coding style / avoiding >> this kind of discussions. >> >> Alternative approaches include: >> - having proc_get_epoch have a side effect on proc->epoch instead of >> returning the correct epoch value. Assignment can then be done before the >> debug message. But this is inconsistent with the rest of the interface >> used in proc_name_fns; >> - having proc_get_epoch use another helper than ORTE_NAME_PRINT to print >> the debug info. But this requires coding an alternative >> ORTE_SOMETHING_PRINT function, only for this use; >> - having proc be defined as an opal_object, and set epoch to INVALID (or >> even UNSET) into the constructor. This could induce changes at many >> places, and there is always the risk that some changes are left >> incomplete. >> >> Thomas >> >> Le 8 août 2011 à 12:20, Barrett, Brian W a écrit : >> >>> Ok, that makes some sense. It's a really weird way of doing >>> initialization, though. Some of it is probably how Ralph chose to share >>> nidpid map code, so perhaps it's unavoidable. But since the name isn't >>> use in proc_get_epoch() (or, better not be used in there), I'm not >>> entirely sure why that made valgrind happy... >>> >>> Brian >>> >>> On 8/8/11 9:53 AM, "Wesley Bland" <wbl...@eecs.utk.edu> wrote: >>> >>>> It's not just copying the value from one to the other. >>>> The value is initialized on the first line. The proc_name is passed >>>> into >>>> the function where the jobid and vpid are used (not the epoch). The >>>> function returns a new (the correct) value for the epoch based on the >>>> passed in jobid and vpid which is assigned to the process name. >>>> >>>> This is the same thing as initializing the value to NULL. We do that >>>> all >>>> the time to avoid compiler warnings. I just can't do that here because >>>> this isn't a pointer. If it would make the code look better I can move >>>> the first assignment to the top of the function where the other >>>> initializations are. >>>> >>>> On Mon, Aug 8, 2011 at 11:41 AM, Barrett, Brian W <bwba...@sandia.gov> >>>> wrote: >>>> >>>> On 8/8/11 9:34 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: >>>> >>>>> On Aug 8, 2011, at 11:30 AM, Wesley Bland wrote: >>>>> >>>>>> The reason is because valgrind was complaining about uninitialized >>>>>> values that were passed into proc_get_epoch. I saw the same warnings >>>>>> from valgrind when I ran it. I added the code to initialize the >>>>>> values >>>>>> to what really should be the default value and the warnings went >>>>>> away. >>>>>> Since the process_name_t struct isn't an object, it doesn't have an >>>>>> initialization function like so many of the other objects in the >>>>>> code. >>>>>> This is what we have. >>>>> >>>>> Ah, I see -- you are passing peer_name into proc_get_epoch(). I >>>>> missed >>>>> that. >>>>> >>>>> Thanks! >>>>> >>>>>> peer_name.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> peer_name.vpid = peer_idx; >>>>>> + peer_name.epoch = ORTE_EPOCH_INVALID; >>>>>> peer_name.epoch = orte_ess.proc_get_epoch(&peer_name); >>>> >>>> >>>> I'm still not convinced this is a rational coding style. It seems to >>>> me >>>> that if you're just going to copy the epoch from peer_name to >>>> peer_name, >>>> just assign the epoch to INVALID and be done with it. There's no need >>>> for >>>> both the assignment added in this patch and the line after it. >>>> >>>> Now, normally I'd say that this is fixing a bug and having to fix other >>>> people's bad code shouldn't be your problem. But since you wrote both >>>> lines, fixing it to make sense seems reasonable to me ;). >>>> >>>> Brian >>>> >>>> -- >>>> Brian W. Barrett >>>> Dept. 1423: Scalable System Software >>>> Sandia National Laboratories >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> >>> -- >>> Brian W. Barrett >>> Dept. 1423: Scalable System Software >>> Sandia National Laboratories >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> devel mailing list >>> de...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> >> _______________________________________________ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> > > > -- > Brian W. Barrett > Dept. 1423: Scalable System Software > Sandia National Laboratories > > > > > > > _______________________________________________ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel