Re: Open source archives hosting malicious software packages

2017-09-21 Thread David Precious
On Fri, 22 Sep 2017 01:00:22 +1200
Kent Fredric  wrote:

> On 22 September 2017 at 00:11, David Cantrell 
> wrote:
> 
> > But is anyone paying attention? I assume you're talking about
> > #cpantesters, which I'm on, but I hardly ever look at it, and when
> > I do look I certainly don't look at scrollback, let alone looking at
> > scrollback *carefully*.  
> 
> It gets duty on freenode #perl too, and it's not uncommon for people
> like me to glance at https://metacpan.org/recent ( usually to see
> something and regret looking )


Yeah, freenode/#perl is the one I was referring to - 500+ sets of
eyeballs (although how many of them are people likely to recognise
typo-squatting of popular modules and go check them out I don't know).

Certainly agree that something that automatically flags up anything
suspicious-looking would be good; sending the alerts to a mailing list
would have the benefit that they aren't missed if nobody is looking at
the time.  I'd certainly be happy enough to sit on such a mailing list
and help check anything dodgy-looking.


Re: Open source archives hosting malicious software packages

2017-09-21 Thread Kent Fredric
On 22 September 2017 at 00:11, David Cantrell  wrote:

> But is anyone paying attention? I assume you're talking about
> #cpantesters, which I'm on, but I hardly ever look at it, and when I do
> look I certainly don't look at scrollback, let alone looking at
> scrollback *carefully*.

It gets duty on freenode #perl too, and it's not uncommon for people
like me to glance at https://metacpan.org/recent ( usually to see
something and regret looking )



-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL


Re: Open source archives hosting malicious software packages

2017-09-21 Thread David Cantrell
On Wed, Sep 20, 2017 at 11:13:50PM +0100, David Precious wrote:

> One thing I think is good to consider is the fact that all CPAN releases
> get announced on a quite populated IRC channel, increasing the chance of
> someone spotting a release announcement and thinking "hmm, that looks
> dodgy" - but that's of course not entirely reliable, and doesn't focus
> only on new releases.

But is anyone paying attention? I assume you're talking about
#cpantesters, which I'm on, but I hardly ever look at it, and when I do
look I certainly don't look at scrollback, let alone looking at
scrollback *carefully*.

-- 
David Cantrell | Godless Liberal Elitist

Planckton: n, the smallest possible living thing


Re: Open source archives hosting malicious software packages

2017-09-21 Thread Kent Fredric
On 21 September 2017 at 20:24, Neil Bowers  wrote:

> I’ll tweak my script to not worry about packages in the same distribution
> (eg Acme::Flat::GV and Acme::Flat::HV). Then I just need to get a list of
> new packages each day, and I’m just about there :-)

I'd probably want PAUSE trust modelling to play a part too, on the
basis that people are unlikely to typo-squat themselves, and that
recognized, reputable authors are less likely to typo-squat.

(Because reputation is an important thing to maintain in open source:
tarnish your reputation and nobody will use your stuff any more.)

Which, by inversion, means that newer authors are more disposed to
typo-squatting, and that people are more likely to typo-squat things
dissimilar to what they already own.

A long time ago, I was discussing with somebody, I can't remember who,
that we could generalize this problem as a public feed, allowing
anyone to review new module permissions assignments and changes.

Having public access to the permissions list is good, but having some
sort of feed that makes it public knowledge every time a new
permission occurs, or every time a permission change occurs, would do
wonders for this problem ( and others, like the surprise change of
hands of important but undermaintained modules into the hands of
potentially too keen maintainers )

It would even expose attempts at smuggling typo-squatted names in the
back of distros with dissimilar names, similar to cuckoo-packages.
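
As a rough illustration of the feed idea, here is a minimal Python sketch
that compares two snapshots of a permissions list and reports what changed.
It assumes the snapshots have already been parsed into a mapping of package
name to the set of user IDs holding permissions on it; the function name,
the event labels, and the package and user names are all made up for the
example and are not part of PAUSE.

```python
def permission_events(old, new):
    """Return (kind, package, user) tuples describing what changed
    between two permission snapshots (dict: package -> set of user IDs)."""
    events = []
    for package in sorted(set(old) | set(new)):
        before = old.get(package, set())
        after = new.get(package, set())
        # A package absent from the old snapshot is a brand-new namespace.
        added_kind = "new-package" if package not in old else "granted"
        for user in sorted(after - before):
            events.append((added_kind, package, user))
        for user in sorted(before - after):
            events.append(("revoked", package, user))
    return events


# Hypothetical snapshots: Foo::Bar changes hands, Foo::Baz appears.
yesterday = {"Foo::Bar": {"ALICE"}}
today = {"Foo::Bar": {"BOB"}, "Foo::Baz": {"BOB"}}

for event in permission_events(yesterday, today):
    print(*event)
```

Publishing events like these would let reviewers see both first-time
registrations and changes of hands without having to diff the full list
themselves.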


-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL


Re: Open source archives hosting malicious software packages

2017-09-21 Thread Neil Bowers
> Would anyone know of any prior art for detection of "short edit distances"?  
> (Perhaps even already on CPAN?)

As David & Zefram pointed out, Levenshtein is the classic algorithm for this, 
but there are plenty of others; in the SEE ALSO for Text::Levenshtein I’ve 
listed at least some of the ones I know of on CPAN:
https://metacpan.org/pod/Text::Levenshtein#SEE-ALSO

A better algorithm for this purpose is the Damerau-Levenshtein edit distance.
Classic Levenshtein counts the number of insertions, deletions, and 
substitutions needed to get from one string to the other. Comparing 
"Algorithm::SVM" and "Algorithm::VSM" gives an edit distance of 2.
The Damerau variant also counts a transposition of two adjacent characters as a 
single edit. This results in an edit distance of 1 for the example above, which 
is how my script found it.
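
To make the difference concrete, here is a minimal Python sketch of both 
distances. It implements the restricted form of Damerau-Levenshtein (often 
called optimal string alignment), which may differ from what the CPAN modules 
compute on some inputs; the function name and the transpositions flag are made 
up for the example.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance between a and b using a dynamic-programming table.

    With transpositions=False this is classic Levenshtein. With
    transpositions=True, swapping two adjacent characters costs 1
    (restricted Damerau-Levenshtein / optimal string alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("Algorithm::SVM", "Algorithm::VSM", transpositions=False))  # 2
print(edit_distance("Algorithm::SVM", "Algorithm::VSM"))                        # 1
```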

I used Text::Levenshtein::Damerau::XS, because it’s quicker. That’s how I found 
the examples I gave yesterday.

I’ll tweak my script to not worry about packages in the same distribution (eg 
Acme::Flat::GV and Acme::Flat::HV). Then I just need to get a list of new 
packages each day, and I’m just about there :-)
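
For what it's worth, the daily check could be sketched in Python roughly as 
follows. This is only an outline under assumptions: the function names, the 
package-to-distribution mapping, and the distribution names are invented for 
the example, and the distance function is passed in so that any edit-distance 
implementation can be used. The mismatches() helper is a crude stand-in for a 
real edit distance, included only so the example runs on its own.

```python
def new_since(yesterday, today):
    """Packages present today that were not present yesterday."""
    return sorted(set(today) - set(yesterday))


def flag_lookalikes(new_packages, known_packages, dist_of, distance,
                    max_distance=1):
    """Return (new, known) pairs whose names are suspiciously close,
    skipping pairs that ship in the same distribution."""
    flagged = []
    for new in new_packages:
        for known in known_packages:
            if new == known:
                continue
            same_dist = (dist_of.get(new) is not None
                         and dist_of.get(new) == dist_of.get(known))
            if same_dist:
                continue  # e.g. sibling packages in one distribution
            if distance(new, known) <= max_distance:
                flagged.append((new, known))
    return flagged


def mismatches(a, b):
    """Stand-in distance: differing positions plus the length difference."""
    return abs(len(a) - len(b)) + sum(x != y for x, y in zip(a, b))


# Hypothetical mapping of package name to distribution name.
dist_of = {
    "Acme::Flat::GV": "Acme-Flat",
    "Acme::Flat::HV": "Acme-Flat",
    "Algorithm::SVM": "Algorithm-SVM",
    "Algorithm::VSM": "Algorithm-VSM",
}
yesterday = ["Acme::Flat::GV", "Algorithm::SVM"]
today = yesterday + ["Acme::Flat::HV", "Algorithm::VSM"]

print(flag_lookalikes(new_since(yesterday, today), yesterday, dist_of,
                      mismatches, max_distance=2))
```

With this sample data only the Algorithm pair is reported; the Acme::Flat pair 
is skipped because both packages map to the same distribution.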

Neil