On Tue, Apr 9, 2013 at 11:44 PM, John Benediktsson <mrj...@gmail.com> wrote:
> Right, an algorithm that can identify possible duplications and a human that
> guides the tool to ignore or not the recommendation, and determine a good
> name for the resulting new word.

I think I have an idea for the algorithm.

We need something similar to "all-subsets".

http://docs.factorcode.org/content/word-all-subsets,math.combinatorics.html

The output is slightly different, however.

Let's call our version, "all-slices".

Here is an example ...

(scratchpad) { "a" "b" "c" "d" } all-slices
{
    { "a" "b" "c" "d" }
    { "a" "b" "c" }
    { "b" "c" "d" }
    { "a" "b" }
    { "b" "c" }
    { "c" "d" }
}

We're not including individual elements, because it doesn't make sense
to refactor a single word as a new word.
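
A rough sketch of what "all-slices" might look like, written in Python
rather than Factor purely for illustration. The name all_slices and the
longest-first ordering are taken from the example above; everything else
is an assumption, not an existing library word.

```python
def all_slices(seq):
    """Return every contiguous run of at least two elements, longest first."""
    n = len(seq)
    return [
        seq[start:start + length]
        for length in range(n, 1, -1)        # lengths n, n-1, ..., 2
        for start in range(n - length + 1)   # every start position for that length
    ]


print(all_slices(["a", "b", "c", "d"]))
# [['a', 'b', 'c', 'd'], ['a', 'b', 'c'], ['b', 'c', 'd'], ['a', 'b'], ['b', 'c'], ['c', 'd']]
```

A sequence with fewer than two elements yields an empty list, which
matches the rule that single words are not worth extracting.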

First, we convert our word definition into an array of strings ...

"dup last file-info directory?" becomes
{ "dup" "last" "file-info" "directory?" }

We calculate ...

{ "dup" "last" "file-info" "directory?" } all-slices

For each output array, we search the source file for occurrences of
its concatenation.

We find that { "dup" "last" "file-info" "directory?" } occurs
elsewhere in the file.

Let's say that the user declines to refactor at this point.

{ "dup" "last" "file-info" } also occurs elsewhere.

The user declines again.

{ "last" "file-info" "directory?" }

The user decides to refactor at this point and extracts "last
file-info directory?" into a new word.
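
The search step could be sketched roughly as follows, again in Python for
illustration only. One deliberate difference from the description above:
instead of searching for the concatenated string, this compares token
lists, so that "dup" does not accidentally match inside "2dup". The
function names and the "count > 1" rule (the definition itself counts as
one occurrence) are my assumptions.

```python
def all_slices(seq):
    """Every contiguous run of at least two elements, longest first."""
    n = len(seq)
    return [seq[s:s + k] for k in range(n, 1, -1) for s in range(n - k + 1)]


def count_occurrences(tokens, candidate):
    """Count the positions where candidate appears as a contiguous run in tokens."""
    k = len(candidate)
    return sum(tokens[i:i + k] == candidate for i in range(len(tokens) - k + 1))


def duplication_candidates(definition, source):
    """Yield (slice, count) for each slice of the definition seen more than once in source."""
    source_tokens = source.split()
    for candidate in all_slices(definition.split()):
        count = count_occurrences(source_tokens, candidate)
        if count > 1:  # one occurrence is the definition itself
            yield candidate, count
```

The human would then be shown each candidate in turn, longest first, and
asked whether to extract it or move on.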

This worked ok, but we ignored the if statement at the end of move-into-dir.

An if statement has more structure than a flat array of strings can express.

So it appears that we need to do more than just search the source file
for occurrences of combinations of strings.

We need to actually parse each word definition into an abstract syntax tree.

We need to traverse each tree from top to bottom, breadth-first (BFS).

For each node, check whether an equivalent subtree occurs elsewhere.

In code that doesn't use variables, equivalence should be easy to determine:
with no names to rename, two subtrees are equivalent when they are
structurally identical.
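
A very rough sketch of the tree-based idea, in Python for illustration
only. It assumes a toy grammar in which the only nesting is a quotation
written as "[ ... ]"; a real Factor parser handles much more syntax. The
names parse, subtrees_bfs and shared_subtrees are hypothetical.

```python
from collections import Counter, deque


def parse(tokens):
    """Parse a flat token list into a nested tuple; '[' ... ']' becomes a child tuple."""
    stack = [[]]
    for tok in tokens:
        if tok == "[":
            stack.append([])
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            node = tuple(stack.pop())
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    return tuple(stack[0])


def subtrees_bfs(tree):
    """Yield the tree and every nested quotation, breadth-first from the root."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(child for child in node if isinstance(child, tuple))


def shared_subtrees(definitions):
    """Return subtrees that appear in more than one place across all definitions."""
    counts = Counter()
    for body in definitions.values():
        counts.update(subtrees_bfs(parse(body.split())))
    # Plain tuple equality is enough here because there are no variables to rename.
    return [node for node, n in counts.items() if n > 1]
```

This only compares whole quotations and whole definitions. Combining it
with the slice search above, applied to the children of each node, would
be needed to catch partial runs inside a quotation.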

_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk