On Tue, Apr 9, 2013 at 11:44 PM, John Benediktsson <[email protected]> wrote:
> Right, an algorithm that can identify possible duplications and a human that
> guides the tool to ignore or not the recommendation, and determine a good
> name for the resulting new word.
I think I have an idea for the algorithm. We need something similar to "all-subsets":

http://docs.factorcode.org/content/word-all-subsets,math.combinatorics.html

The output is slightly different, however. Instead of every subset, we want only the contiguous runs of elements, from longest to shortest. Let's call our version "all-slices". Here is an example:

    (scratchpad) { "a" "b" "c" "d" } all-slices
    { { "a" "b" "c" "d" } { "a" "b" "c" } { "b" "c" "d" } { "a" "b" } { "b" "c" } { "c" "d" } }

We're not including individual elements, because it doesn't make sense to refactor a single word into a new word.

First, we convert our word definition into an array of strings, so that

    dup last file-info directory?

becomes

    { "dup" "last" "file-info" "directory?" }

Then we calculate

    { "dup" "last" "file-info" "directory?" } all-slices

For each output array, longest first, we search the source file for occurrences of its elements joined with spaces.

We find that { "dup" "last" "file-info" "directory?" } occurs elsewhere in the file. Let's say that the user declines to refactor at this point.

{ "dup" "last" "file-info" } also occurs elsewhere. The user declines again.

{ "last" "file-info" "directory?" } occurs elsewhere too. This time the user decides to refactor and extracts "last file-info directory?" into a new word.

This worked OK, but we ignored the if statement at the end of move-into-dir. An if statement has more structure than an array of strings can express.

So it appears that we need to do more than just search the source file for occurrences of combinations of strings. We need to actually parse each word definition into an abstract syntax tree, then traverse each tree from the top down, breadth-first (BFS). For each node, we check whether an equivalent subtree occurs elsewhere. In code that doesn't use variables, equivalence should be easy to determine: there are no variable names that could differ between otherwise identical code, so comparing the trees structurally should be enough.
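The all-slices step and the search that follows it can be sketched as below, in Python rather than Factor, purely for illustration. The names all_slices, count_occurrences and duplicate_candidates are hypothetical. The sketch assumes the source file can be tokenized by splitting on whitespace, which ignores string literals and comments. It compares whole tokens instead of searching for substrings, so "dup" is not matched inside "2dup".

```python
from typing import List, Sequence, Tuple


def all_slices(words: Sequence[str]) -> List[List[str]]:
    """Every contiguous run of two or more words, longest first.

    Runs of equal length are listed left to right, which reproduces the
    listener example above. Single words are skipped on purpose.
    """
    n = len(words)
    return [
        list(words[start:start + size])
        for size in range(n, 1, -1)       # lengths n, n-1, ..., 2
        for start in range(n - size + 1)  # every offset where a run of that length fits
    ]


def count_occurrences(phrase: Sequence[str], tokens: Sequence[str]) -> int:
    """How many times `phrase` appears as a contiguous run of whole tokens."""
    k = len(phrase)
    target = list(phrase)
    return sum(
        1 for i in range(len(tokens) - k + 1) if list(tokens[i:i + k]) == target
    )


def duplicate_candidates(
    definition: Sequence[str], source_tokens: Sequence[str]
) -> List[Tuple[List[str], int]]:
    """Slices of `definition` that occur more than once in the source.

    Assumes the definition itself is part of `source_tokens`, so a count of 1
    means the only match is the definition itself.
    """
    results = []
    for piece in all_slices(definition):
        count = count_occurrences(piece, source_tokens)
        if count > 1:
            results.append((piece, count))
    return results


if __name__ == "__main__":
    # Hypothetical two-definition source file, for illustration only.
    source = """
    : word-a ( seq -- ? ) dup last file-info directory? ;
    : word-b ( seq -- ? ) last file-info directory? ;
    """
    definition = "dup last file-info directory?".split()
    for piece, count in duplicate_candidates(definition, source.split()):
        # A human would accept or decline each suggestion at this point.
        print(" ".join(piece), count)
    # prints:
    # last file-info directory? 2
    # last file-info 2
    # file-info directory? 2
```

Suggestions come out longest first, so a user who declines the longer runs is offered the shorter ones next, as in the walk-through above.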
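The tree-based approach from the last paragraph can be sketched the same way, again in Python rather than Factor. This is a simplified model, not Factor's actual parser. Each [ ... ] quotation becomes a node and each word becomes a leaf, so the two branches of an if become separate child nodes. The names parse, quotations and repeated_subtrees are hypothetical. Because the code has no variables, two subtrees count as equivalent when they are structurally equal, and Python tuples check that directly.

```python
from collections import Counter, deque
from typing import Iterable, Iterator, List, Union

# A node is either a word (str) or a quotation (tuple of nodes). Tuples are
# hashable and compare element by element, so structurally identical
# subtrees are equal and can be counted with a Counter.
Node = Union[str, tuple]


def parse(tokens: Iterable[str]) -> tuple:
    """Build a tree from whitespace-split tokens, treating "[" and "]" as quotation brackets."""
    stack: List[List[Node]] = [[]]
    for tok in tokens:
        if tok == "[":
            stack.append([])
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced ]")
            quotation = tuple(stack.pop())
            stack[-1].append(quotation)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced [")
    return tuple(stack[0])


def quotations(tree: tuple) -> Iterator[tuple]:
    """Yield every quotation in `tree`, top to bottom, breadth-first (BFS)."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(child for child in node if isinstance(child, tuple))


def repeated_subtrees(definition: tuple, all_bodies: List[tuple], min_size: int = 2) -> List[tuple]:
    """Quotations in `definition` (BFS order) that occur more than once across `all_bodies`.

    `all_bodies` should include `definition` itself, so a count above 1 means
    the same subtree also appears somewhere else.
    """
    counts = Counter(q for body in all_bodies for q in quotations(body))
    return [q for q in quotations(definition) if len(q) >= min_size and counts[q] > 1]


if __name__ == "__main__":
    # Hypothetical word bodies using placeholder words, for illustration only.
    a = parse("dup last [ foo bar ] [ drop ] if".split())
    b = parse("swap [ foo bar ] [ baz ] if".split())
    print(repeated_subtrees(a, [a, b]))  # [('foo', 'bar')]
```

This compares only whole quotations. A run of words inside a quotation, such as "last file-info directory?", would still need the slice search, applied to each node's children. Code that uses lexical variables would need its variable names normalized before this structural comparison is meaningful.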
_______________________________________________
Factor-talk mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/factor-talk
