Howdy all - it's been a long time, but I was just planning a return to NM to see family for the holiday and reading Mandelbrot's book on finance, and the pairing reminded me of FRIAM.
I have something I'm puzzling with, which you guys might find interesting. I want to calculate the fractal dimension of repetitive text, specifically code. I'm building a system to do automated refactoring. It's essentially a compiler. I see this as a business project, but the fractal dimensionality, I don't see any actual business in that. It's just a mild personal obsession. I read Mandelbrot's "Fractal Geometry of Nature" when I was a teenager and it changed my life. So anyway, consider a code base at a tech company. Typically, these code bases are measured in lines of code. It's a one-dimensional measurement, so in a sense, a "line" whose length is determined by the number of lines of code. However, lines of code exhibit self-similarity, and the self-similarity increases with the clumsiness of the syntax of the programming language, and with the incompetence or indifference of the programmers who wrote it. This comes from tons of personal experience, and from the fact that the less competent a programmer is, or the less they care about the company, or the more confusing a language's syntax is, the more likely a programmer is to just copy and paste some code, instead of actually figuring out what it does. I think the "line" formed by measuring the number of these lines of code would be more strictly be considered a curve, as in the Koch curve, which is a highly self-similar curve with a fractal or Hausdorff dimension greater than one; its fractal dimension comes from its self-similar filagrees. Fractals 101. http://en.wikipedia.org/wiki/Koch_snowflake http://en.wikipedia.org/wiki/Hausdorff_dimension You create the Koch curve by taking a line, inserting a deviation, and then dividing the result into fourths, and repeating the process on each fourth. You end up with this thing that looks like a snowflake or a fuzzy Star of David. Many programmers write their code by copying a line and inserting some minor variation. E.g., say I have this: this.moduleOverlay.data.info.specifics.details.render(this.display, "hello world", this); That's pretty much a real-world example. Say I need the application this comes from to display "howdy world" a little later on. Even though I think it's kind of a horrible thing to do, for expediency's sake, I might copy this code, paste it, and just modify the literal: this.moduleOverlay.data.info.specifics.details.render(this.display, "howdy world", this); Now imagine this happening very frequently in a company with a very high rate of copy-paste. The company creates its entire code base by copying lines and inserting deviations. You very soon have a very self-similar corpus of text. The recursivity in the Koch curve also has an analog, because programmers will not just copy-paste line by line, but also copy-paste giant blocks of code, instead of refactoring to objects. So although the copying is not a recursive process in most cases, you do have repetition at multiple scales. The result is that the code base, considered as a whole, has some degree of fractal self-similarity. If we take lines of code as our metric, we're going to have some kind of fractal dimensionality in the result, something greater than 1 but less than 2, just like the coastline of Britain. (again, http://en.wikipedia.org/wiki/Hausdorff_dimension) In reality, however, lines of code is a ridiculous metric, because it's meaningless. If somebody offers you a consulting gig and tells you they have a thousand lines of code, is that highly repetitive code, or entirely unique code? You could technically be talking about the same line repeated a thousand times. (In fact I worked at a company in Santa Fe which was just about that dumb, and I know if Carl is still on here, he knows who I'm talking about.) In order to do any kind of useful analysis on code, you need to turn the code into a parse tree, and analyze the tree. For instance, my automated refactoring code is still in its infancy, but can detect simple kinds of repetition and similarity, and does so by converting code into parse trees, and then comparing the trees. It's downright tautological to say that if you're looking at trees which contain subtrees, and those subtrees are equal or similar to other subtrees (both subtrees within the same tree and subtrees found in other trees), and further that the same patterns of repetition shaped both the macro and micro scales of the tree, you are talking about fractals. But I can't find anything on how to calculate the fractal dimension of a tree! The best thing I've found is something on how to calculate the fractal dimension of a network, which I'm guessing could be clumsily coerced into a method for trees. There's plenty out there about how to use fractals to **generate** an irregular tree, so I may be able to use something like that and go backwards. OK, and I found this: http://www.trusoft-international.com/ But it costs about $250, and I want something Unix-y or RESTful so I can use it programmatically. -- Giles Bowkett http://gilesbowkett.com
============================================================ FRIAM Applied Complexity Group listserv Meets Fridays 9a-11:30 at cafe at St. John's College lectures, archives, unsubscribe, maps at http://www.friam.org