Ugh. Should think before hitting "send".
Point is, the code in `similarity` should count how many nodes are
different. When one sequence is longer than the other, I wanted it
to effectively add the difference between the lengths (all the "remainder"
nodes on the longer sequence are necessarily different). That isn't what
the code in my previous message did. Maybe just:
: similarity ( def1 def2 -- score )
    [ weighted-number-of-shared-nodes ]
    [
        identity-tuple new pad-longest
        [ number-of-different-nodes ] 2map-sum
    ] 2bi over + / ;
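To make the counting rule concrete, here is a minimal sketch that works on flat sequences only. `count-different` is a hypothetical helper, not a word from the attached code, and it ignores nesting entirely (the code above treats definitions as trees). It compares elements position by position with `2map`, which stops at the end of the shorter sequence, and then adds the length difference so that every leftover element of the longer sequence counts as different:

```factor
USING: kernel math prettyprint sequences ;
IN: scratchpad

! Hypothetical helper, flat sequences only (no recursion into
! nested quotations).
: count-different ( seq1 seq2 -- n )
    ! 2map only walks the shared prefix; count the positions
    ! whose elements are not equal.
    [ [ = not ] 2map [ ] count ]
    ! Every "remainder" element of the longer sequence is
    ! necessarily different, so add the length difference.
    [ [ length ] bi@ - abs ] 2bi + ;

{ 1 2 3 } { 1 } count-different .       ! prints 2
{ 1 2 3 } { 1 9 3 } count-different .   ! prints 1
```

With this rule the two remainder nodes in `{ 1 2 3 } { 1 }` are counted instead of being silently dropped by `2map`.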
I'll have to step away now, before I make more of a fool of myself. :)
Spam spam spam,
--Alex Vondrak
On Wed, Apr 10, 2013 at 7:57 PM, Alex Vondrak <ajvond...@gmail.com> wrote:
> Self bug-report:
>
> IN: scratchpad [ 1 2 3 ] [ 1 ] similarity .
> 1
>
> Oh well. I suppose we'd need:
>
> : similarity ( def1 def2 -- score )
>     [ weighted-number-of-shared-nodes ]
>     [
>         [ max-length ]
>         [ [ number-of-different-nodes ] 2map-sum ] 2bi -
>     ] 2bi over + / ;
>
> But again, a little ambiguous (treating sequences as n-ary trees). Just a
> heuristic, I guess.
>
> --Alex Vondrak
>
>
>
> On Wed, Apr 10, 2013 at 7:51 PM, Alex Vondrak <ajvond...@gmail.com> wrote:
>
>> In case anyone's interested, attached is my interpretation of the "tree
>> similarity" metric given in the paper I linked. The definition was
>> somewhat vague, so I just did what I thought made sense.
>>
>> IN: scratchpad \ move-to-file \ move-to-dir word-similarity .
>> 35/39
>> IN: scratchpad \ move-to-file \ usage word-similarity .
>> 0
>> IN: scratchpad \ move-to-dir \ move-to-dir word-similarity .
>> 1
>>
>> It would be interesting to implement the rest of the algorithm and see
>> how it does in Factor.
>>
>> Regards,
>> --Alex Vondrak
>>
>>
>>
>> On Wed, Apr 10, 2013 at 6:35 PM, John Benediktsson <mrj...@gmail.com> wrote:
>>
>>> You don't really want a flattened intersection:
>>>
>>> A word definition like this:
>>>
>>> : foo ( x -- x ) [ 2^ ] [ bitor ] bi ;
>>>
>>> Shouldn't match ``set-bit``:
>>>
>>> : set-bit ( x n -- y ) 2^ bitor ; inline
>>>
>>> You probably want something that walks the definition with a deep-each
>>> and, for each subsequence, collects any subsequence that duplicates one of
>>> the possible subsequences of all quotations, or something ambitious like that.
>>>
>>> In the lint vocabulary, the lint word looks at all callables, trying to
>>> find any definition that includes it as a subsequence:
>>>
>>> GENERIC: lint ( obj -- seq )
>>>
>>> M: callable lint
>>>     [ lint-definitions-keys get-global ] dip [ subseq? ] curry
>>>     filter ;
>>>
>>> M: object lint drop f ;
>>>
>>> M: word lint
>>>     def>> [ callable? ] deep-filter [ lint ] map concat ;
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Apr 10, 2013 at 5:19 PM, leonard <leonard14...@gmail.com> wrote:
>>>
>>>> On Wed, Apr 10, 2013 at 2:33 PM, John Benediktsson <mrj...@gmail.com> wrote:
>>>>
>>>>> You should really look at how the lint tool works.
>>>>>
>>>>> In particular, look at "lint" and see how it looks for a word which
>>>>> has a definition that is contained in another word (where the second word
>>>>> should be calling the first instead of duplicating its definition).
>>>>>
>>>>> Your version could look for common subsequences instead, perhaps.
>>>>>
>>>>
>>>> Is there a word for calculating the intersection of two deep sequences?
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Precog is a next-generation analytics platform capable of advanced
>>>> analytics on semi-structured data. The platform includes APIs for
>>>> building
>>>> apps and a phenomenal toolset for data science. Developers can use
>>>> our toolset for easy data analysis & visualization. Get a free account!
>>>> http://www2.precog.com/precogplatform/slashdotnewsletter
>>>> _______________________________________________
>>>> Factor-talk mailing list
>>>> Factor-talk@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/factor-talk
>>>>
>>>>
>>>
>>
>