In case anyone's interested, attached is my interpretation of the "tree similarity" metric given in the paper I linked. The definition was somewhat vague, so I just did what I thought made sense.

    IN: scratchpad \ move-to-file \ move-to-dir word-similarity .
    35/39
    IN: scratchpad \ move-to-file \ usage word-similarity .
    0
    IN: scratchpad \ move-to-dir \ move-to-dir word-similarity .
    1

It would be interesting to implement the rest of the algorithm. See how it does in Factor.

Regards,
--Alex Vondrak

On Wed, Apr 10, 2013 at 6:35 PM, John Benediktsson wrote:

> You don't really want a flattened intersection:
>
> A word definition like this:
>
> : foo ( x -- x ) [ 2^ ] [ bitor ] bi ;
>
> Shouldn't match ``set-bit``:
>
> : set-bit ( x n -- y ) 2^ bitor ; inline
>
> You probably want something that does something like a deep-each, then for
> each subsequence, collecting any subsequence that is a duplicate of all
> possible subsequences of all quotations, or something ambitious like that.
>
> In the lint vocabulary, the lint word looks at all callable's trying to
> find any definition that includes it as a subsequence:
>
> GENERIC: lint ( obj -- seq )
>
> M: callable lint
>     [ lint-definitions-keys get-global ] dip [ subseq? ] curry
>     filter ;
>
> M: object lint drop f ;
>
> M: word lint
>     def>> [ callable? ] deep-filter [ lint ] map concat ;
>
> On Wed, Apr 10, 2013 at 5:19 PM, leonard wrote:
>
>> On Wed, Apr 10, 2013 at 2:33 PM, John Benediktsson wrote:
>>
>>> You should really look at how the lint tool works.
>>>
>>> In particular, look at "lint" and see how it looks for a word which has
>>> a definition that is contained in another word (where the second word
>>> should be calling the first instead of duplicating its definition).
>>>
>>> Your version could look for common subsequences instead, perhaps.
>>
>> Is there a word for calculating the intersection of two deep sequences?
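
To make the point about flattened intersections concrete, here is a minimal sketch. It is neither the lint implementation nor the attached similarity.factor; deep-subseq? is a hypothetical helper name chosen for illustration. It checks whether a quotation appears as a contiguous subsequence inside any nested callable, and contrasts that with matching against the flattened body:

```factor
USING: kernel math prettyprint quotations sequences
sequences.deep ;
IN: scratchpad

! Hypothetical helper (an assumption, not part of lint):
! collect the quotation itself plus every nested callable, then
! ask whether `needle` is a contiguous subsequence of any of them.
: deep-subseq? ( needle quot -- ? )
    [ callable? ] deep-filter [ subseq? ] with any? ;

! The body of foo from the quoted message: 2^ and bitor sit in
! separate quotations, so a structural search finds no match.
[ 2^ bitor ] [ [ 2^ ] [ bitor ] bi ] deep-subseq? .          ! expected: f

! Flattening first puts 2^ and bitor next to each other, which
! produces the false match against set-bit's body.
[ 2^ bitor ] [ [ 2^ ] [ bitor ] bi ] flatten subseq? .       ! expected: t

! A genuinely duplicated body nested inside another quotation.
[ 2^ bitor ] [ [ 2^ bitor ] keep ] deep-subseq? .            ! expected: t
```

The structural version respects quotation boundaries, which is why foo is not reported as duplicating set-bit, while the flattened comparison cannot tell the two apart.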
Attachment: similarity.factor
_______________________________________________
Factor-talk mailing list
https://lists.sourceforge.net/lists/listinfo/factor-talk
