Halls,

1. You might do yourself some good by writing code of your own, if you can,
possibly using a combination of shell scripting and other code. I'd
recommend this, assuming you're the one in the right :), because you'll be
able to get the custom stats needed to strengthen your case, without being
limited to someone else's tools.
2. That said, a few stats in meld might be useful to some people. I wonder
whether kdiff3, another GUI diff-merge tool, outputs stats. I use both meld
and kdiff3.
3. Also, maybe look into the Levenshtein text difference algorithm. In Perl
I use Text::Levenshtein (_XS). It provides a character distance between two
texts (i.e. how many single-character edits are needed to turn one into the
other), which then readily translates to a percentage. In that respect, it's
more directly related to the amount of change than line counts are.
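
As a rough illustration of item 3, here is a minimal Python sketch of the
idea. It is not the Perl module mentioned above; the function names
levenshtein and similarity_percent are made up for this example, and
dividing by the length of the longer text is just one assumed way to turn
the distance into a percentage.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Assumed normalization: 100% means identical, 0% means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty texts are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(round(similarity_percent("kitten", "sitting"), 1))
```

For whole files you would read each one into a string and pass both to
similarity_percent. The running time grows with the product of the two
lengths, so very large files may be slow with a plain implementation like
this one.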

Jag

On Sep 28, 2017 7:09 AM, "Alan Halls" <alanjha...@gmail.com> wrote:

> Thanks, Phil, for the response. I guess I was thinking of a debug report
> such as:
> Files Analyzed: 19,543
> Folders Analyzed: 343
> Total lines of code analyzed: 1,544,346
> Total lines of code in source: 1,244,346
> Total lines of code in destination: 1,944,346
> Total lines with exact matches: 856,644
> Unique lines in source: 400,546
> Unique lines in destination: 850,546
> Similarity of source to destination: 45%
> Exact matches of greater than 25 contiguous lines of code: 943
> Exact matches of greater than 5 contiguous lines of code: 46,733
>
> I looked into the plagiarism-detector tools and haven't found anything yet
> that does PHP. The command line diff tools "should" be able to output this
> type of report; I just figured that all of this info, with the exception
> of the last 2 items, would already be tracked in the software and would
> just need to be output somewhere.
>
> Alan
>
> On Wed, Sep 27, 2017 at 4:14 PM, Phil Hord <phil.h...@gmail.com> wrote:
>
>> Alan,
>>
>> Tools already exist that more directly meet your need.  Any unix-like
>> system will have command-line tools to do most of this analysis.  I'd start
>> with "diff -b -B -w", but you can also use "comm".  The comm tool relies on
>> the files being sorted, though, so you might want to ignore "empty" lines
>> or common lines like </head>, for example.
>>
>> There are some plagiarism-detector tools that may also help, but I don't
>> have any experience with those.
>>
>> Feel free to contact me off-list if you need more specific guidance.
>> Phil
>>
>>
>> On Wed, Sep 27, 2017 at 2:49 PM Alan Halls <alanjha...@gmail.com> wrote:
>>
>>> I am involved in a legal matter regarding an employee's theft of trade
>>> secrets. In particular he stole the source code for a website that he and 2
>>> other programmers worked on for 2 years.
>>>
>>> I now have a copy of his project, and of course a copy of mine. I found
>>> the software Meld, which seems to do a great job on a one-by-one basis, but
>>> it would be very time consuming to try to end up with any "score" of how
>>> much of our original code is still in his existing project.
>>>
>>> He was sloppy and his launched public website still has our company info
>>> in the 404 page, which links you to the about us, pricing, docs, contact us
>>> pages, which all still have the original code in them, so there is no
>>> question about whether or not he did, just how much "custom" work he did
>>> for himself.
>>>
>>> I was kind of imagining a report with a total score, then the top 50
>>> matches with each of their scores. Has anyone thought of adding that in? It
>>> seems that all that info would be available already in the program, just
>>> needing a view for it to display on.
>>>
>>> _______________________________________________
>>> meld-list mailing list
>>> meld-list@gnome.org
>>> https://mail.gnome.org/mailman/listinfo/meld-list
>>
>>
>
