In case anyone's interested, attached is my interpretation of the "tree
similarity" metric given in the paper I linked.  The definition was
somewhat vague, so I just did what I thought made sense.

IN: scratchpad \ move-to-file \ move-to-dir word-similarity .
35/39
IN: scratchpad \ move-to-file \ usage word-similarity .
0
IN: scratchpad \ move-to-dir \ move-to-dir word-similarity .
1

It would be interesting to implement the rest of the algorithm and see how
it does in Factor.

Regards,
--Alex Vondrak



On Wed, Apr 10, 2013 at 6:35 PM, John Benediktsson <mrj...@gmail.com> wrote:

> You don't really want a flattened intersection:
>
> A word definition like this:
>
>     : foo ( x -- x ) [ 2^ ] [ bitor ] bi ;
>
> Shouldn't match ``set-bit``:
>
>     : set-bit ( x n -- y ) 2^ bitor ; inline
>
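
(Inline note: a minimal sketch of the difference, not taken from the
attachment. It assumes set-bit's body is exactly the "2^ bitor" quoted
above, and set-bit-body is a hypothetical name introduced only for this
illustration. The first check searches the flattened definition, where the
nesting is lost; the second searches each nested quotation separately, in
the spirit of the lint code further down.)

```factor
USING: accessors kernel math prettyprint sequences sequences.deep ;
IN: scratchpad

! The example word from above: 2^ and bitor sit in separate quotations.
: foo ( x -- x ) [ 2^ ] [ bitor ] bi ;

! Assumed body of set-bit, written out as a literal quotation.
CONSTANT: set-bit-body [ 2^ bitor ]

! Flattened check: flatten discards the nesting, so 2^ and bitor
! become adjacent and the body can match by accident.
set-bit-body \ foo def>> flatten subseq? .

! Structural check: collect every quotation in the definition
! (including the definition itself) and test each one on its own.
\ foo def>> [ callable? ] deep-filter
[ set-bit-body swap subseq? ] any? .
```

If the two checks disagree for foo, that is the false positive the
flattened intersection would introduce.
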
> You probably want something that does something like a deep-each, then for
> each subsequence, collecting any subsequence that is a duplicate of all
> possible subsequences of all quotations, or something ambitious like that.
>
> In the lint vocabulary, the lint word looks at every callable in a word,
> trying to find any known definition that occurs in it as a subsequence:
>
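>     GENERIC: lint ( obj -- seq )
>
>     ! Keep known definitions that are a subsequence of this callable.
>     M: callable lint
>         [ lint-definitions-keys get-global ] dip [ subseq? ] curry
>         filter ;
>
>     ! Anything that is neither a callable nor a word reports nothing.
>     M: object lint drop f ;
>
>     ! Lint every callable found anywhere in the word's definition.
>     M: word lint
>         def>> [ callable? ] deep-filter [ lint ] map concat ;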
>
>
>
>
>
> On Wed, Apr 10, 2013 at 5:19 PM, leonard <leonard14...@gmail.com> wrote:
>
>> On Wed, Apr 10, 2013 at 2:33 PM, John Benediktsson <mrj...@gmail.com> wrote:
>>
>>> You should really look at how the lint tool works.
>>>
>>> In particular, look at "lint" and see how it looks for a word which has
>>> a definition that is contained in another word (where the second word
>>> should be calling the first instead of duplicating its definition).
>>>
>>> Your version could look for common subsequences instead, perhaps.
>>>
>>
>> Is there a word for calculating the intersection of two deep sequences?
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Precog is a next-generation analytics platform capable of advanced
>> analytics on semi-structured data. The platform includes APIs for building
>> apps and a phenomenal toolset for data science. Developers can use
>> our toolset for easy data analysis & visualization. Get a free account!
>> http://www2.precog.com/precogplatform/slashdotnewsletter
>> _______________________________________________
>> Factor-talk mailing list
>> Factor-talk@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/factor-talk
>>
>>
>
>

Attachment: similarity.factor
Description: Binary data
