On 21/11/2012 01:43, Steven D'Aprano wrote:
On Tue, 20 Nov 2012 20:07:54 +0000, Robert Kern wrote:

The source of bugs is not excessive complexity in a method, just
excessive lines of code.

Taken literally, that cannot possibly be the case.

# Version 1
def method(self, a, b, c):
    do_this(a)
    do_that(b)
    do_something_else(c)


# Version 2
def method(self, a, b, c):
    do_this(a); do_that(b); do_something_else(c)


It *simply isn't credible* that version 1 is statistically likely to have
twice as many bugs as version 2. Over-reliance on LOC is easily gamed,
especially in semicolon languages.

Logical LoC (executable LoC, number of statements, etc.) is a better measure than Physical LoC, I agree. That's not the same thing as cyclomatic complexity, though. Also, the relationship between LoC (of either type) and bugs is not linear (at least not in the small-LoC regime), so you are certainly correct that it isn't credible that version 1 is likely to have twice as many bugs as version 2. No one is saying that it is.
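
To make the physical/logical distinction concrete, here is a minimal sketch using the standard library's ast module. "Number of ast.stmt nodes" is only one possible definition of logical LoC, and the helper names physical_loc and logical_loc are made up for this illustration.

```python
import ast

# The two versions of the example method, as source strings.
VERSION_1 = '''\
def method(self, a, b, c):
    do_this(a)
    do_that(b)
    do_something_else(c)
'''

VERSION_2 = '''\
def method(self, a, b, c):
    do_this(a); do_that(b); do_something_else(c)
'''


def physical_loc(source):
    """Count non-blank physical lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def logical_loc(source):
    """Count statement nodes in the parsed source (the def itself counts as one)."""
    return sum(isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source)))


for label, source in (("version 1", VERSION_1), ("version 2", VERSION_2)):
    print(label, "physical:", physical_loc(source), "logical:", logical_loc(source))
```

Under this definition the two versions differ in physical LoC but have the same logical LoC, so joining statements with semicolons does not change the count.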

Besides, I think you have the cause and effect backwards. I would rather
say:

The source of bugs is not lines of code in a method, but excessive
complexity. It merely happens that counting complexity is hard, counting
lines of code is easy, and the two are strongly correlated, so why count
complexity when you can just count lines of code?

No, that is not the takeaway of the research. More code correlates with more bugs. More cyclomatic complexity also correlates with more bugs. You want to find out what causes bugs. What the research shows is that cyclomatic complexity is so correlated with LoC that it is going to be very difficult, or impossible, to establish a causal relationship between cyclomatic complexity and bugs. The previous research that just correlated cyclomatic complexity to bugs without controlling for LoC does not establish the causal relationship.

Keep in mind that something like 70-80% of published scientific papers
are never replicated, or cannot be replicated. Just because one paper
concludes that LOC alone is a better metric than CC doesn't necessarily
make it so. But even if we assume that the paper is valid, it is
important to understand just what it says, and not extrapolate too far.

This paper is actually a replication. It is notable for how comprehensive it is.

The paper makes various assumptions, takes statistical samples, and uses
models. (Which of course *any* such study must.) I'm not able to comment
on whether those models and assumptions are valid, but assuming that they
are, the conclusion of the paper is no stronger than the models and
assumptions. We should not really conclude that "CC has no more
predictive power than LOC". The right conclusion is that one specific
model of cyclomatic complexity, McCabe's CC, has no more predictive power
than LOC for projects written in C, C++ and Java.

How does that apply to Python code? Well, it's certainly suggestive, but
it isn't definitive.

More so than the evidence that CC is a worthwhile measure, for Python or any language.

It's also important to note that the authors point out that in their
samples of code, they found very high variance and large numbers of
outliers:

[quote]
Modules where LOC does not predict CC (or vice-versa) may indicate an
overly-complex module with a high density of decision points or an overly-
simple module that may need to be refactored.
[end quote]

So *even by the terms of this paper*, it isn't true that CC has no
predictive value over LOC -- if the CC is radically high or low for the
LOC, that is valuable to know.

Is it? What is the evidence that excess, unpredicted-by-LoC CC causes (or even correlates with) bugs? The paper points that out as a target for future research because no one has studied it yet. It may turn out to be a valid metric, but one that has a very specific utility: identifying a particular hotspot. Running CC over whole projects to compare their "quality", as the OP has done, is not a valid use of even that.
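
For what "unpredicted-by-LoC CC" could mean operationally, here is a rough sketch: fit CC against LoC with ordinary least squares and look at the residuals. The function names and the (LoC, CC) pairs are invented purely to exercise the code; they are not measurements from the paper or from any project. Whether a large residual says anything about bugs is exactly the open question.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cc_residuals(modules):
    """Map module name -> observed CC minus the CC predicted from its LoC."""
    names = list(modules)
    locs = [modules[name][0] for name in names]
    ccs = [modules[name][1] for name in names]
    intercept, slope = fit_line(locs, ccs)
    return {name: cc - (intercept + slope * loc)
            for name, loc, cc in zip(names, locs, ccs)}


# Invented (LoC, CC) pairs, hypothetical module names; not real data.
EXAMPLE_MODULES = {
    "a.py": (100, 20),
    "b.py": (200, 40),
    "c.py": (300, 60),
    "d.py": (400, 140),  # many more decision points than its size suggests
    "e.py": (500, 100),
}

residuals = cc_residuals(EXAMPLE_MODULES)
# The module whose CC is furthest from what its LoC predicts.
hotspot = max(residuals, key=lambda name: abs(residuals[name]))
print(hotspot, round(residuals[hotspot], 1))
```

This only flags a single candidate module for a closer look; it does not turn a project-wide CC total into a quality score.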

LoC is much simpler, easier to understand, and
easier to correct than CC.

Well, sure, but do you really think Perl one-liners are the paragon of
bug-free code we ought to be aiming for? *wink*

No, but introducing more statements and method calls to avoid if statements isn't either.
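
A small sketch of that trade-off, with a made-up example: the same lookup written with if/elif and with a dispatch table. The branch counter below is only a crude proxy for McCabe's CC (it counts a handful of AST node types), and all names here are hypothetical.

```python
import ast

BRANCHING = '''\
def area(shape, size):
    if shape == "square":
        return size * size
    elif shape == "circle":
        return 3.14159 * size * size
    else:
        raise KeyError(shape)
'''

DISPATCH = '''\
def square_area(size):
    return size * size

def circle_area(size):
    return 3.14159 * size * size

AREAS = {"square": square_area, "circle": circle_area}

def area(shape, size):
    return AREAS[shape](size)
'''

# Node types treated as decision points; a rough stand-in for CC, not a full implementation.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def count_statements(source):
    """Count statement nodes in the parsed source."""
    return sum(isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source)))


def count_branches(source):
    """Count the decision-point nodes listed in BRANCH_NODES."""
    return sum(isinstance(node, BRANCH_NODES) for node in ast.walk(ast.parse(source)))


for label, source in (("branching", BRANCHING), ("dispatch", DISPATCH)):
    print(label, "statements:", count_statements(source),
          "branches:", count_branches(source))
```

The dispatch version scores zero branches but has more statements and an extra layer of calls, so a lower CC number by itself does not show the code got simpler.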

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
