Hi Andrew (and list):

To finish up this topic, my previous posts simply revealed some of my
fundamental misunderstandings of how valgrind deals with uninitialized
variables.  I know better now; valgrind was acting as designed.

Here is what I have learned about this.

From http://valgrind.org/docs/manual/faq.html

<quote>
5.3.   Memcheck's uninitialised value errors are hard to track down,
because they are often reported some time after they are caused. Could
Memcheck record a trail of operations to better link the cause to the
effect? [...]

Prior to version 3.4.0, the answer was "we don't know how to do it
without huge performance penalties". As of 3.4.0, try using the
--track-origins=yes option. It will run slower than usual, but will
give you extra information about the origin of uninitialised values.
</quote>

To find out about the circumstances where valgrind will actually
complain about an uninitialized value, here is what the valgrind
documentation had to say (from
http://valgrind.org/docs/manual/mc-manual.html#mc-manual.uninitvals)

<quote>
It is important to understand that your program can copy around junk
(uninitialised) data as much as it likes. Memcheck observes this and
keeps track of the data, but does not complain. A complaint is issued
only when your program attempts to make use of uninitialised data in a
way that might affect your program's externally-visible behaviour[....]
</quote>

What is not clear from the above is whether arithmetic operations on
uninitialized values trigger a valgrind complaint or not.  The 2003
article http://valgrind.org/gallery/linux_mag.html has an example of
division of an uninitialized variable by an initialized variable which
generated the following valgrind message:

==12903== Conditional jump or move depends on uninitialised value(s)
==12903==    at 0x8079FCC: __divdi3 (in /sbin/fsck.jfs)
==12903==    by 0x805CB0E: validate_super (fsckmeta.c:2331)
==12903==    by 0x805C266: validate_repair_superblock (fsckmeta.c:1833)
==12903==    by 0x806E2B5: initial_processing (xchkdsk.c:1968)

So in that case, the math library routine that does division
presumably had an if statement involving the uninitialized numerator
which triggered the message.  But you cannot count on that (math
library having if statements) for all arithmetic operations.
Furthermore, even for division that no longer seems to happen (or
else valgrind suppresses the message) with the Linux math library.
The following simple test code shows the issue:

#include <stdlib.h>
#include <stdio.h>
int main(void) {
   double *x, y=1., z;
   x=(double *) malloc(sizeof(double));
   z = *x/y;
   printf("%s %f\n", "z = *x/y =", z);
}

There are no valgrind messages at all concerning the "z = *x/y;" line
(although plenty about the subsequent printf.)

For the ephcom2 code issue that started me down this trail of
learning, the valgrind behaviour is consistent with the conclusion
that no arithmetic operations on uninitialized values are going to
cause a message from valgrind (so long as you avoid division by zero,
which is a different issue).

I did finally (without the aid of valgrind) track down where two
components of a vector on the stack which is part of the coords struct
were uninitialized in the ephcom2 code, and solving that issue quieted
all valgrind messages.  So what happened is those two uninitialized
vector components on the stack were involved in some arithmetic
operations (which triggered no complaints from valgrind) and
ultimately ended up contributing to the value of two of the
components of the testr vector (also on the stack), and the testdel
value. valgrind only produced a message concerning uninitialization
when an attempt was made to use the testdel value in an if statement.

The reason I emphasize the stack above for the vectors involved is
that the --track-origins=yes valgrind option mentioned at the start of
this post only weakly identifies the location of the original
uninitialized data when that data is on the stack, as the valgrind man
page explains:

<quote>
For uninitialised values originating from a heap block, Memcheck shows
where the block was allocated. For uninitialised values originating
from a stack allocation, Memcheck can tell you which function
allocated the value, but no more than that.
</quote>

So in the ephcom2 case, valgrind was limited to telling me that the
uninitialized problem had something to do with some stacked variable
in main.  That actually is useful to know, but does not give nearly
the detail you get with uninitialized values associated with the heap.
For example, valgrind with --track-origins=yes for the above test code
emits the following message

==20780==  Uninitialised value was created by a heap allocation
==20780==    at 0x4C244E8: malloc (vg_replace_malloc.c:236)
==20780==    by 0x400553: main (test_valgrind_heap_uninitialized.c:6)

That 6th line is the one that mallocs x so it immediately identifies
the source of the issue.  (As expected, the equivalent test for the
stack case simply says something stacked in main is the source of the
issue.)

In retrospect what I should have done instead of publicly complaining
about valgrind was the above research first to understand valgrind's
uninitialized variable reporting limitations and the important
difference in behaviour for --track-origins=yes between stacked
and heaped (malloced) data.  My response to

==2307== Conditional jump or move depends on uninitialised value(s)
==2307==    at 0x4013C4: main (testeph.c:181)
==2307==  Uninitialised value was created by a stack allocation
==2307==    at 0x400CF9: main (testeph.c:16)

should have been to immediately start looking systematically backward
in the code (along with sprinkling some if's around which appears to
be the only sure-fire method of getting valgrind to complain about
uninitialized problems) to find the origin of the chain of
mathematical computations that lead to valgrind complaining about an
uninitialized problem concerning testdel.  I got there eventually and
fixed the uninitialized problem with the ephcom2 code, but the
debugging process would have been a lot faster if I had trusted what
valgrind was telling me.

So this whole exercise has been quite a valgrind uninitialized
variable learning experience for me, and I hope the rest of you who
are interested in valgrind issues have managed to follow and learn
from my experience as well.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state implementation
for stellar interiors (freeeos.sf.net); PLplot scientific plotting software
package (plplot.org); the libLASi project (unifont.org/lasi); the Loads of
Linux Links project (loll.sf.net); and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

_______________________________________________
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel