Re: Open source archives hosting malicious software packages
On Fri, 22 Sep 2017 01:00:22 +1200 Kent Fredric wrote:
> On 22 September 2017 at 00:11, David Cantrell wrote:
>
> > But is anyone paying attention? I assume you're talking about
> > #cpantesters, which I'm on, but I hardly ever look at it, and when
> > I do look I certainly don't look at scrollback, let alone looking at
> > scrollback *carefully*.
>
> It gets duty on freenode #perl too, and it's not uncommon for people
> like me to glance at https://metacpan.org/recent ( usually to see
> something and regret looking )

Yeah, freenode/#perl is the one I was referring to - 500+ sets of
eyeballs (although how many of them are people likely to recognise
typo-squatting of popular modules and go check them out, I don't know).

Certainly agree that something that automatically flags up anything
suspicious-looking would be good - sending those flags to a mailing
list would have the benefit of them not being missed if nobody was
looking at the time. I'd certainly be happy enough to sit on such a
mailing list and help check anything dodgy-looking.
Re: Open source archives hosting malicious software packages
On 22 September 2017 at 00:11, David Cantrell wrote:
> But is anyone paying attention? I assume you're talking about
> #cpantesters, which I'm on, but I hardly ever look at it, and when I do
> look I certainly don't look at scrollback, let alone looking at
> scrollback *carefully*.

It gets duty on freenode #perl too, and it's not uncommon for people
like me to glance at https://metacpan.org/recent ( usually to see
something and regret looking )

--
Kent

KENTNL - https://metacpan.org/author/KENTNL
Re: Open source archives hosting malicious software packages
On Wed, Sep 20, 2017 at 11:13:50PM +0100, David Precious wrote:
> One thing I think is good to consider is the fact that all CPAN releases
> get announced on a quite populated IRC channel, increasing the chance of
> someone spotting a release announcement and thinking "hmm, that looks
> dodgy" - but that's of course not entirely reliable, and doesn't focus
> only on new releases.

But is anyone paying attention? I assume you're talking about
#cpantesters, which I'm on, but I hardly ever look at it, and when I do
look I certainly don't look at scrollback, let alone looking at
scrollback *carefully*.

--
David Cantrell | Godless Liberal Elitist

  Planckton: n, the smallest possible living thing
Re: Open source archives hosting malicious software packages
On 21 September 2017 at 20:24, Neil Bowers wrote:
> I’ll tweak my script to not worry about packages in the same distribution
> (eg Acme::Flat::GV and Acme::Flat::HV). Then I just need to get a list of
> new packages each day, and I’m just about there :-)

I'd probably want PAUSE trust modelling to play a part too, on the
basis that people are unlikely to typo-squat themselves, and that
recognized, reputable authors are less likely to typo-squat. (Reputation
is an important thing to maintain in open source: tarnish your
reputation and nobody will use your stuff any more.)

Which, by inversion, means that newer authors are more disposed to
typo-squatting, and that people are more likely to typo-squat things
dissimilar to what they already own.

A long time ago, I was discussing with somebody, I can't remember who,
that we could generalize this problem as a public feed, allowing anyone
to review new module permissions assignments and changes.

Having public access to the permissions list is good, but having some
sort of feed that makes it public knowledge every time a new permission
occurs, or every time a permission change occurs, would do wonders for
this problem ( and others, like the surprise change of hands of
important but undermaintained modules into the hands of potentially too
keen maintainers )

It would even expose attempts at smuggling typo-squatted names in the
back of distros with dissimilar names, similar to cuckoo-packages.

--
Kent

KENTNL - https://metacpan.org/author/KENTNL
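
The two ideas in this message (weighting by author trust, and a feed of
permission changes) could be sketched roughly as below. This is an
illustration only, not an existing PAUSE feature: the `Package` record,
the `established_threshold` cut-off, the snapshot format (module name
mapped to a set of author IDs) and the author IDs themselves are all
assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical record for one indexed package."""
    name: str
    author: str           # uploader's author ID (made-up values below)
    distribution: str
    author_releases: int  # prior releases by this author, a crude trust proxy


def worth_reviewing(new: Package, similar: Package,
                    established_threshold: int = 10) -> bool:
    """Decide whether a similar-looking pair of names deserves a human look.

    Assumes the caller has already found the two names to be close.
    """
    if new.author == similar.author:
        return False  # people are unlikely to typo-squat themselves
    if new.distribution == similar.distribution:
        return False  # siblings in one distribution, e.g. Acme::Flat::GV/HV
    # Established authors are treated as lower risk; the cut-off is arbitrary.
    return new.author_releases < established_threshold


def permission_changes(old: dict, new: dict) -> list:
    """Compare two permission snapshots and list new or changed entries."""
    events = []
    for module in sorted(new):
        owners = new[module]
        if module not in old:
            events.append((module, "new", owners))
        elif old[module] != owners:
            events.append((module, "changed", owners))
    return events
```

Each tuple returned by `permission_changes` would be one item in the
feed; where the snapshots come from and how the feed is published is
left open here.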
Re: Open source archives hosting malicious software packages
> Would anyone know of any prior art for detection of "short edit distances"?
> (Perhaps even already on CPAN?)

As David & Zefram pointed out, Levenshtein is the classic algorithm for
this, but there are plenty of others; in the SEE ALSO for
Text::Levenshtein I’ve listed at least some of the ones I know of on
CPAN:

https://metacpan.org/pod/Text::Levenshtein#SEE-ALSO

A better algorithm for this purpose is the Damerau-Levenshtein edit
distance:

Classic Levenshtein counts the number of insertions, deletions, and
substitutions needed to get from one string to the other. Comparing
"Algorithm::SVM" and "Algorithm::VSM" gives an edit distance of 2.

The Damerau variant adds transpositions of adjacent characters. This
results in an edit distance of 1 for the example above, which is how my
script found it.

I used Text::Levenshtein::Damerau::XS, because it’s quicker. That’s how
I found the examples I gave yesterday.

I’ll tweak my script to not worry about packages in the same
distribution (eg Acme::Flat::GV and Acme::Flat::HV). Then I just need to
get a list of new packages each day, and I’m just about there :-)

Neil
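
To make the difference between the two distances concrete, here is a
minimal Python sketch. It is not Neil's script and not the
implementation in Text::Levenshtein::Damerau::XS: the second function
is the restricted ("optimal string alignment") form of
Damerau-Levenshtein, which agrees with the description above for a
single swap of adjacent characters. The function names and the
`near_misses` helper are assumptions made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein: also allows swapping adjacent characters."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "SV" <-> "VS".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def near_misses(new_names, existing_names, max_distance=1):
    """Pair each new name with existing names within max_distance of it."""
    return [(new, old)
            for new in new_names
            for old in existing_names
            if new != old and osa_distance(new, old) <= max_distance]


print(levenshtein("Algorithm::SVM", "Algorithm::VSM"))   # 2
print(osa_distance("Algorithm::SVM", "Algorithm::VSM"))  # 1
```

Comparing every new name against every existing one is quadratic, so a
daily run over the whole index would probably want the XS module
mentioned above rather than pure Python or pure Perl.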