On Jun 17, 2007, at 12:56 AM, Joshua ben Jore wrote:

On 6/16/07, Chris Dolan <[EMAIL PROTECTED]> wrote:
Josh, can you explain to us in a little more depth what this means?
Are you showing that certain input values follow the same path
through the code?

Yes.

It looks like the full path through the code is
the key to your hash of runs.  If I've understood that part
correctly, then I'm still having trouble understanding where you go
from there.  How is that a measure of code coverage?  Are you
planning to then compare those paths to the full tree of opcodes?

Ah, sorry. I have a problem at work: a pile of nasty code that
processes really gigantic piles of input per day. Much of the data is
similar, and I assume it triggers identical paths through the code. If
I wanted "full" coverage, a suitably gigantic pile of input would
probably be sufficient to cover all or most of the potential code
paths.

It is impractical to write tests using those monstrously large piles
of input. I want to filter out inputs that trigger redundant code
paths and retain only those that cause unique behaviour.

My example was an evenness detector. In one possible scenario I figure
"several thousand" numbers are sufficient to cover all the possible
code paths because "I don't understand this function." Maybe I
actually have hundreds of millions of numbers being tested, and I want
to know whether I get any additional behaviour by trying 4 if I've
already tried 2.
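The idea can be sketched concretely. Below is a minimal illustration in Python rather than Perl (the technique is language-agnostic): `trace_path` records the line numbers executed inside a toy `is_even` function, and an input is retained only when it produces a path signature not seen before. All names here (`is_even`, `trace_path`, `seen`) are hypothetical, not from Josh's actual tool.

```python
import sys

def is_even(n):
    # Toy "evenness detector" standing in for the opaque function under test.
    if n % 2 == 0:
        return True
    else:
        return False

def trace_path(func, arg):
    """Run func(arg) and record the sequence of line numbers executed in it."""
    path = []
    def tracer(frame, event, _):
        # Only record 'line' events inside func itself, not the harness.
        if event == "line" and frame.f_code is func.__code__:
            path.append(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        func(arg)
    finally:
        sys.settrace(None)
    return tuple(path)

# Keep only inputs whose execution path has not been seen before.
seen = {}
for n in [2, 3, 4, 5, 6, 100]:
    path = trace_path(is_even, n)
    if path not in seen:
        seen[path] = n

print(sorted(seen.values()))  # → [2, 3]
```

With this filter, 2 and 3 survive as representatives of the two branches, while 4, 5, 6, and 100 are discarded as redundant: they add nothing to coverage.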

What I learn from using this is that I can get equivalent coverage
from both inputs 2 and 4. I can now opt to write my tests using just
the number 2 because I don't learn anything new by using any
additional numbers like 4, 6, or 8.

So I can test for which inputs will add to my code coverage and which will not.

Josh

Got it. Thanks very much for the explanation. Does anyone know if there's been research on a topic like this? That is, how large or complex can your code grow before small changes in inputs create wildly divergent code paths? It sounds (distantly) related to the code tracing that software like Coverity does, but on a macro scale.

I wonder if path-finding algorithms (computer games, highway route finders, etc.) have any relevance in this space? Hmm, on further thought that's probably wrong, since (to put it mildly) code has much stronger opinions about which path to take than people or bots.

Chris
