one could extend that idea to include multiple vocabularies and
create/extend vocabularies, though i could imagine this getting kind of
tedious and messy :-D

you would also want some constraints on what should be considered for
refactoring, so that it actually reduces code size: you wouldn't want to
refactor 2 words used in two places like dip dup (i know, bad example) and
call it dipperdupper.
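
one rough way to express that constraint, as a Python sketch rather than Factor: count the words before and after extraction. The function name and the cost model are assumptions for illustration only; the model ignores the new word's name and stack effect declaration, which make short extractions even less attractive.

```python
def words_saved(run_length, occurrences):
    """Net change in word count if a repeated run is extracted into a new word.

    Hypothetical cost model: each call site shrinks to one word, and the
    new definition body costs run_length words once.
    """
    before = run_length * occurrences
    after = occurrences + run_length
    return before - after

# "dip dup" used in two places: 2 * 2 words before, 2 calls + 2 body words after.
print(words_saved(2, 2))  # 0
```

Under this model a candidate is only worth suggesting when the result is positive, which already rules out the two-word, two-place case.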

just my two cents

On Wed, Apr 10, 2013 at 9:38 AM, leonard <leonard14...@gmail.com> wrote:

> On Tue, Apr 9, 2013 at 11:44 PM, John Benediktsson <mrj...@gmail.com>
> wrote:
> > Right, an algorithm that can identify possible duplications and a human
> > that guides the tool to accept or ignore the recommendation, and
> > determines a good name for the resulting new word.
>
> Think I have an idea for the algorithm.
>
> We need something similar to "all-subsets".
>
> http://docs.factorcode.org/content/word-all-subsets,math.combinatorics.html
>
> The output is slightly different, however.
>
> Let's call our version, "all-slices".
>
> Here is an example ...
>
> (scratchpad) { "a" "b" "c" "d" } all-slices
> { { "a" "b" "c" "d" } { "a" "b" "c" } { "b" "c" "d" }
>   { "a" "b" } { "b" "c" } { "c" "d" } }
>
> We're not including individual elements, because it doesn't make sense
> to refactor a single word as a new word.
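
A minimal sketch of "all-slices" as described above, written in Python rather than Factor. The name `all_slices` simply mirrors the proposed word; it is not an existing library function. It returns every contiguous run of two or more elements, longest first, in the same order as the example output.

```python
def all_slices(seq):
    """Return every contiguous run of length >= 2, longest first."""
    n = len(seq)
    return [
        list(seq[start:start + size])
        for size in range(n, 1, -1)        # window sizes n, n-1, ..., 2
        for start in range(n - size + 1)   # slide the window left to right
    ]

print(all_slices(["a", "b", "c", "d"]))
```

A definition of n words yields n(n-1)/2 slices, so the number of candidates grows quadratically with the length of the definition.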
>
> First, we convert our word definition into an array of strings ...
>
> "dup last file-info directory?" becomes { "dup" "last" "file-info"
> "directory?" }
>
> We calculate ...
>
> { "dup" "last" "file-info" "directory?" } all-slices
>
> For each output array, we search the source file for occurrences of
> its concatenation.
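
The search step could be sketched like this in Python. One deliberate difference from the description: it compares token runs rather than searching for a concatenated string, so that "dup" does not match inside "2dup". The function name and the sample source line are made up for illustration.

```python
def find_occurrences(tokens, run):
    """Return the start indices where `run` appears as a contiguous token run."""
    k = len(run)
    return [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == run]

# Hypothetical source text, split on whitespace into words.
source = "dup last file-info directory? swap 2dup last file-info directory?"
tokens = source.split()

print(find_occurrences(tokens, ["last", "file-info", "directory?"]))  # [1, 6]
print(find_occurrences(tokens, ["dup", "last"]))                      # [0]
```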
>
> We find that { "dup" "last" "file-info" "directory?" } occurs
> elsewhere in the file.
>
> Let's say that the user declines to refactor at this point.
>
> { "dup" "last" "file-info" } also occurs elsewhere.
>
> The user declines again.
>
> { "last" "file-info" "directory?" }
>
> The user decides to refactor at this point and extracts "last
> file-info directory?" into a new word.
>
> This worked ok, but we ignored the if statement at the end of
> move-into-dir.
>
> An if statement has more organization than an array of strings can express.
>
> So it appears that we need to do more than just search the source file
> for occurrences of combinations of strings.
>
> We need to actually parse each word definition into an abstract syntax
> tree.
>
> We need to traverse each tree, from top to bottom, BFS.
>
> For each node, check whether an equivalent subtree occurs elsewhere.
>
> In code that doesn't use variables, equivalence should be easy to
> determine.
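
A rough Python sketch of the subtree idea, assuming each definition has already been parsed into nested tuples, with a quotation represented as a nested tuple. The two definitions below are hypothetical and only illustrate the shape. Because tuples compare and hash structurally, finding equivalent subtrees reduces to a dictionary lookup.

```python
from collections import defaultdict, deque

def duplicate_subtrees(definitions):
    """Map each repeated subtree (nested tuple) to the words that contain it."""
    seen = defaultdict(list)
    for name, body in definitions.items():
        queue = deque([body])              # breadth-first, top to bottom
        while queue:
            node = queue.popleft()
            seen[node].append(name)        # structural equality via tuple hashing
            queue.extend(child for child in node if isinstance(child, tuple))
    return {node: names for node, names in seen.items() if len(names) > 1}

# Hypothetical parsed definitions; inner tuples stand for quotations.
definitions = {
    "foo": ("dup", ("drop", "t"), ("drop", "f"), "if"),
    "bar": ("over", ("drop", "t"), ("2drop", "f"), "if"),
}
print(duplicate_subtrees(definitions))  # {('drop', 't'): ['foo', 'bar']}
```

This only compares whole quotations. Finding repeated runs inside a quotation would still need the slice search on each node's direct children.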
>
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Factor-talk mailing list
> Factor-talk@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/factor-talk
>