Measuring code complexity

John Harrop Sun, 25 Oct 2009 19:08:53 -0700

Reading http://www.paulgraham.com/power.html and specifically the section
titled "Metrics" near the top, I realized that it would be very easy to
calculate such metrics for Lisp code, and it took me literally only seconds
to hack something in Clojure:


user=> (defn count-nodes [data] (if (clojure.contrib.core/seqable? data) (+
1 (reduce + 0 (map count-nodes data))) 1))
#'user/count-nodes
user=> (count-nodes '(+ 3 (* 2 7)))
7

Each node of the AST is counted: 1 for the argument and if the argument is
seqable the sum of count-nodes applied recursively to the child nodes. The
example uses a simple math calculation and correctly counts seven nodes:
five leaves (three integer literals and two symbols) and two non-leaf nodes
(one for the product and one for the sum).

The count-nodes function itself has a code complexity of 22 by the same
measure:

user=> (count-nodes '(defn count-nodes [data] (if
(clojure.contrib.core/seqable? data) (+ 1 (reduce + 0 (map count-nodes
data))) 1)))
22

An alternative version also has 22:

user=> (count-nodes '(defn count-nodes [data] (+ 1 (if
(clojure.contrib.core/seqable? data) (reduce + 0 (map count-nodes data))
0))))
22

The alternative version might be considered preferable for other reasons;
the + 1 for the node passed to the current call is in a single place instead
of repeated. On the other hand it always does an addition even with a leaf
node, so is slightly less processor-efficient.

Both versions can be reduced to 21 by noting that the 0 in the reduce's
argument list is optional. The first version can be reduced further though
by exploiting this argument to reduce:

user=> (count-nodes '(defn count-nodes [data] (if
(clojure.contrib.core/seqable? data) (reduce + 1 (map count-nodes data)))
1))
19

The downside is that the relationship between the two 1s is even more
obscured.

The drop from 22 to 19 exactly corresponds to the complexity cost of adding
something to something: since (+ x y) adds a +, a y, and an invocation to
just x, that cost is 3. This doesn't depend on whether x is a complicated
subexpression or just a constant.

Perhaps something like count-nodes would be a better metric for clojure golf
than counting characters. The effects include that use of #(foo) has no
benefit (the expanded (fn [] (foo)) will be seen by count-nodes regardless)
and there's no penalty for using meaningful identifiers, but there are still
big wins from using higher-order functions, macros, and other potent
abstractions.

Consider this alternative implementation of count-nodes:

(defn count-nodes [data]
  (loop [s [data] n 0]
    (if (empty? s)
      n
      (let [e (first data) r (rest data) x (inc n)]
        (if (clojure.contrib.core/seqable? e)
          (recur (concat r e) x)
          (recur r x))))))

It has, itself, 50 nodes, more than double the initial implementation using
map and reduce and nearly triple the winner above! The imperative style also
is full of bookkeeping complexity, such as managing an explicit stack, which
is abstracted away in the map based implementations; the latter have the
code focused mainly on the node-counting itself rather than on the gritty
details of tree traversal and sum accumulation. The map versions are also
easy to parallelize simply by changing "map" to "pmap"; parallelizing the
imperative version requires explicit futures and in Java would furthermore
require mucking about with parts of java.util.concurrent, probably with a
LinkedBlockingDeque in there somewhere as well as a Future or two and
various anonymous Runnables.

Even a plain old serial implementation in Java is bad:

public class NodeCounter {
    public static int countNodes (Object data) {
        if (!(data instanceof Iterable)) return 1;
        int result = 1;
        for (Object e : (Iterable)data) {
            result += countNodes(e);
        }
        return result;
    }
}

Of course this can't be applied to its own source code without first
converting it into a parse tree whose non-leaf nodes are Iterable. I'll
count by hand, using the rule that a keyword, identifier, operator,
non-delimiter punctuation mark, or literal counts 1 and a balanced pair of
delimiters counts 1 (I'll just count open delimiters and ignore close
delimiters). The 1 for each semicolon replaces the () for each invocation in
a Lisp, and the reduced count exploiting operator precedence to save
parentheses or using combination operators like += may roughly balance the
added count from type identifiers and keywords. I'll also count a cast as 1
rather than the 2 I'd get from counting separately the parenthesis pair and
the enclosed type name. Still I get:

4  public class NodeCounter {
8      public static int countNodes (Object data) {
10         if (!(data instanceof Iterable)) return 1;
5          int result = 1;
8          for (Object e : (Iterable)data) {
6              result += countNodes(e);
0          }
3          return result;
0      }
0  }
------
44

This is actually slightly better than the loop/recur version in Clojure,
despite the types everywhere, probably because of the for-each loop being a
bit more abstract than loop/recur (a while with an explicit iterator would
perhaps be a fairer comparison; and map + reduce is clearly much more
abstract still) and the use of recursion. It would surely blow up to 60 or
even 70 if we added explicit stack management to traverse the tree
non-recursively and used a while loop with an explicit iterator.

The Java version has hidden complexities not counted here, too: explicit
mutable state, in the form of the local variable "result", and a
short-circuit return. As noted previously, parallelizing this would be much
harder even than parallelizing even the loop-recur Clojure implementation.

This illustrates one of Graham's points about language expressiveness well,
as well as pointing up two other advantages of Clojure: the easier
parallelizability of more abstractly specified algorithms using "map" and
from avoiding mutable state, and the benefit of Lisp's code-is-data
philosophy, which makes it easy to make a Clojure function to count AST
nodes in Clojure code (a one-liner!!) and to apply it to Clojure source
(just quote the argument), vs. the Java version not working on its own
source code without adding in a significant amount of parsing code to turn a
Java string containing source code into a count of
terminal-tokens-minus-close-delimiters. (Even a Ruby version would have this
problem!)

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Measuring code complexity

Reply via email to