I see no problem with the latest code from Wesley. The warning is silenced, and 
was not indicative of any buggy behavior. So for me the problem is closed. 


Aurelien 

Le 8 août 2011 à 15:30, Barrett, Brian W a écrit :

> 4) Don't design an interface where you're going to have the circular
> dependency.  Perhaps it's a sign that the epoch is in the wrong place, or
> handled in the wrong way, or the whole naming scheme is wrong.  I'm not
> sure.  I've relented that it doesn't need to be fixed now (since it is
> clearly bigger than just the patch introducing the epochs), but I still
> think it's a sign we have a basic design problem with the way we're
> handling names.
> 
> Brian
> 
> On 8/8/11 1:23 PM, "Thomas Herault" <herault.tho...@gmail.com> wrote:
> 
>> proc_get_epoch( proc );
>> calls ORTE_NAME_PRINT(proc)
>> when in debug mode.
>> 
>> When this is the case, ORTE_NAME_PRINT prints all fields of proc,
>> including epoch. So, proc->epoch = proc_get_epoch(proc) triggers the
>> valgrind warning, because of the print, in debug mode.
>> This is the only reason why proc->epoch must be initialized before
>> calling proc->epoch = proc_get_epoch( proc );
>> 
>> I agree with you it's not a good approach, for coding style / avoiding
>> this kind of discussions.
>> 
>> Alternative approaches include:
>> - having proc_get_epoch have a side effect on proc->epoch instead of
>> returning the correct epoch value. Assignment can then be done before the
>> debug message. But this is inconsistent with the rest of the interface
>> used in proc_name_fns;
>> - having proc_get_epoch use another helper than ORTE_NAME_PRINT to print
>> the debug info. But this requires coding an alternative
>> ORTE_SOMETHING_PRINT function, only for this use;
>> - having proc be defined as an opal_object, and set epoch to INVALID (or
>> even UNSET) into the constructor. This could induce changes at many
>> places, and there is always the risk that some changes are left
>> incomplete.
>> 
>> Thomas
>> 
>> Le 8 août 2011 à 12:20, Barrett, Brian W a écrit :
>> 
>>> Ok, that makes some sense.  It's a really weird way of doing
>>> initialization, though.  Some of it is probably how Ralph chose to share
>>> nidpid map code, so perhaps it's unavoidable.  But since the name isn't
>>> use in proc_get_epoch() (or, better not be used in there), I'm not
>>> entirely sure why that made valgrind happy...
>>> 
>>> Brian
>>> 
>>> On 8/8/11 9:53 AM, "Wesley Bland" <wbl...@eecs.utk.edu> wrote:
>>> 
>>>> It's not just copying the value from one to the other.
>>>> The value is initialized on the first line. The proc_name is passed
>>>> into
>>>> the function where the jobid and vpid are used (not the epoch). The
>>>> function returns a new (the correct) value for the epoch based on the
>>>> passed in jobid and vpid which is assigned to the process name.
>>>> 
>>>> This is the same thing as initializing the value to NULL. We do that
>>>> all
>>>> the time to avoid compiler warnings. I just can't do that here because
>>>> this isn't a pointer. If it would make the code look better I can move
>>>> the first assignment to the top of the function where the other
>>>> initializations are.
>>>> 
>>>> On Mon, Aug 8, 2011 at 11:41 AM, Barrett, Brian W <bwba...@sandia.gov>
>>>> wrote:
>>>> 
>>>> On 8/8/11 9:34 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>>>> 
>>>>> On Aug 8, 2011, at 11:30 AM, Wesley Bland wrote:
>>>>> 
>>>>>> The reason is because valgrind was complaining about uninitialized
>>>>>> values that were passed into proc_get_epoch. I saw the same warnings
>>>>>> from valgrind when I ran it. I added the code to initialize the
>>>>>> values
>>>>>> to what really should be the default value and the warnings went
>>>>>> away.
>>>>>> Since the process_name_t struct isn't an object, it doesn't have an
>>>>>> initialization function like so many of the other objects in the
>>>>>> code.
>>>>>> This is what we have.
>>>>> 
>>>>> Ah, I see -- you are passing peer_name into proc_get_epoch().  I
>>>>> missed
>>>>> that.
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>>>   peer_name.jobid  = ORTE_PROC_MY_NAME->jobid;
>>>>>>   peer_name.vpid   = peer_idx;
>>>>>> +    peer_name.epoch  = ORTE_EPOCH_INVALID;
>>>>>>   peer_name.epoch  = orte_ess.proc_get_epoch(&peer_name);
>>>> 
>>>> 
>>>> I'm still not convinced this is a rational coding style.  It seems to
>>>> me
>>>> that if you're just going to copy the epoch from peer_name to
>>>> peer_name,
>>>> just assign the epoch to INVALID and be done with it.  There's no need
>>>> for
>>>> both the assignment added in this patch and the line after it.
>>>> 
>>>> Now, normally I'd say that this is fixing a bug and having to fix other
>>>> people's bad code shouldn't be your problem.  But since you wrote both
>>>> lines, fixing it to make sense seems reasonable to me ;).
>>>> 
>>>> Brian
>>>> 
>>>> --
>>>> Brian W. Barrett
>>>> Dept. 1423: Scalable System Software
>>>> Sandia National Laboratories
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> Brian W. Barrett
>>> Dept. 1423: Scalable System Software
>>> Sandia National Laboratories
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
> 
> 
> -- 
>  Brian W. Barrett
>  Dept. 1423: Scalable System Software
>  Sandia National Laboratories
> 
> 
> 
> 
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Reply via email to