Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-04 Thread Andreas Labres
On 03.12.14 17:14, Andy Allan wrote:
> Thanks for the analysis, I hope it provides developers with ideas for
> combatting it via the automated spam filters that we already have[1].

I'd suggest extending/refining the automated filter somewhat. Say:

* a novice is not allowed to post at all
* a novice who has made some changesets is allowed to post, say, once per day
* an intermediate user is allowed to post, say, once per hour
* for an expert (either subscribed for years or with lots of changesets) the
posting limit is waived

One could even think about allowing experts to delete other users' posts (as
spam). Of course, a log has to be maintained. That way no special people
(moderators etc.) are needed!

And of course the parameters need to be optimized:

* how long is a user a novice?
* is 10 changesets enough to allow him/her to post?
* when does the intermediate level start? 2 years? 100 changesets?
* what achievements are needed to reach the expert level? 4 years? 1000
changesets?

Those parameters could be tweaked on the fly, I'd say.
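As an illustration only, the tiered scheme above could be sketched in Python as below. The thresholds are the example values from the open questions (10 changesets to post at all, 2 years or 100 changesets for intermediate, 4 years or 1000 changesets for expert) and are assumptions, as are all names (`THRESHOLDS`, `User`, `user_level`, `may_post`); none of this comes from the openstreetmap-website code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical thresholds, using the example values from the proposal.
# Kept in one dict so they could be tweaked at runtime.
THRESHOLDS = {
    "min_changesets_to_post": 10,
    "intermediate_age": timedelta(days=2 * 365),
    "intermediate_changesets": 100,
    "expert_age": timedelta(days=4 * 365),
    "expert_changesets": 1000,
}

# Minimum interval between two posts per level; None means no limit.
POST_INTERVAL = {
    "novice": timedelta(days=1),
    "intermediate": timedelta(hours=1),
    "expert": None,
}


@dataclass
class User:
    created_at: datetime
    changesets: int
    last_post_at: Optional[datetime] = None


def user_level(user: User, now: datetime) -> str:
    """Either account age or changeset count is enough to reach a level."""
    age = now - user.created_at
    if age >= THRESHOLDS["expert_age"] or user.changesets >= THRESHOLDS["expert_changesets"]:
        return "expert"
    if age >= THRESHOLDS["intermediate_age"] or user.changesets >= THRESHOLDS["intermediate_changesets"]:
        return "intermediate"
    return "novice"


def may_post(user: User, now: datetime) -> bool:
    level = user_level(user, now)
    # A novice without enough changesets may not post at all.
    if level == "novice" and user.changesets < THRESHOLDS["min_changesets_to_post"]:
        return False
    interval = POST_INTERVAL[level]
    if interval is None or user.last_post_at is None:
        return True
    # Otherwise enforce the per-level rate limit.
    return now - user.last_post_at >= interval


if __name__ == "__main__":
    now = datetime(2014, 12, 4)
    print(may_post(User(now - timedelta(days=3), changesets=0), now))  # False
```

Keeping every threshold in one table is what would make the "tweak on the fly" part cheap.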

/al

___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-04 Thread Tom Hughes

On 04/12/14 11:17, Andreas Labres wrote:
> On 03.12.14 17:14, Andy Allan wrote:
>> Thanks for the analysis, I hope it provides developers with ideas for
>> combatting it via the automated spam filters that we already have[1].
>
> I'd suggest extending/refining the automated filter somewhat. Say:
>
> * a novice is not allowed to post at all
> * a novice who has made some changesets is allowed to post, say, once per day
> * an intermediate user is allowed to post, say, once per hour
> * for an expert (either subscribed for years or with lots of changesets) the
> posting limit is waived

So in other words, most of the things we already factor into our spam
scoring... We're just not quite as rigid.

In particular, you can still post (within reason) without having made any
edits - it is actually surprisingly common for non-spammers to do that.


Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/



Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-04 Thread Tom Hughes

On 03/12/14 16:14, Andy Allan wrote:
> However, spam is an arms race, and I think we might need a different
> long-term approach. I know in the past using 3rd-party spam filtering
> services was too expensive (and not really very OSM-ish either).

The main such system is Akismet and I'd love to use it, but (a) it costs
money and (b) to make it most effective we would have to send it things
like email addresses and IP addresses, which I figure people may object to.

> Perhaps we need a new set of human content moderators on the site, say
> 40-80 people with a variety of languages between them. We can consider
> grey-listing all accounts - i.e. the first few posts of every account
> are held for review automatically by default, and direct posting is
> enabled once we're more certain they aren't spammers.

Once we have a review queue and moderator system, it obviously
becomes trivial to do things like holding posts from new users for
moderation - we need the basic infrastructure first, though.
Tom




Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-04 Thread Andreas Labres
On 04.12.14 12:33, Tom Hughes wrote:
> So in other words, most of the things we already factor into our spam scoring...
> We're just not quite as rigid.

A (hidden) spam score is bad (IMO). Nobody sees it, almost nobody can test it.

A documented user level with documented rules would make much more sense and
(IMO) would much more likely be accepted.

> In particular, you can still post (within reason) without having made any edits
> - it is actually surprisingly common for non-spammers to do that.

OSM is not a blog site. OSM is about making the data better. Once you have
figured out at least a little of how OSM works, you could blog about it. IMO.

/al



Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-04 Thread Tom Hughes

On 04/12/14 12:06, Andreas Labres wrote:
> On 04.12.14 12:33, Tom Hughes wrote:
>> So in other words, most of the things we already factor into our spam scoring...
>> We're just not quite as rigid.
>
> A (hidden) spam score is bad (IMO). Nobody sees it, almost nobody can test it.

Nothing is hidden:

https://github.com/openstreetmap/openstreetmap-website/blob/master/app/models/user.rb#L210

Tom




[OSM-dev] Chinese spam diaries, an analysis

2014-12-03 Thread Andrew Hain
A spammer is periodically posting messages in Chinese to the User Diaries.
These diaries follow a distinct pattern:

1. Judging from machine translations, the messages advertise a variety of
products and services that are illegal. This may be to attract people who
would be reluctant to contact the authorities and admit what they are
looking for.

2. Diaries are posted in large batches (up to 20 or more), typically
differing only in the names of cities and provinces in the text. This
appears to be aimed at people searching through search engines.

3. Diaries rarely contain links (with occasional exceptions), so they cannot
be aimed at boosting search engine rankings for pages hosted away from OSM.

4. Numbers preceded with the letters QQ appear regularly; these may be
accounts with the Tencent QQ messaging service.

5. The spammer has come back repeatedly, creating new accounts, so it is
likely that the operation is successful.
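Patterns 2 and 4 suggest two cheap signals a filter could use. The Python sketch below is an illustration under assumptions, not how the existing OSM spam scoring works: it compares posts by Jaccard similarity of character trigrams (character n-grams avoid word segmentation, which Chinese text does not provide), so posts in a batch that differ only in a city name score high; and it flags text containing "QQ" followed by a long run of digits. The function names, the 0.7 threshold, the trigram size and the five-digit minimum are all guesses.

```python
import re
from itertools import combinations

# Assumed shape of the contact numbers in point 4: "QQ", optional spaces or
# a (half- or full-width) colon, then at least five digits.
QQ_PATTERN = re.compile(r"QQ[\s:：]*\d{5,}", re.IGNORECASE)


def shingles(text: str, n: int = 3) -> set:
    """Character n-grams with whitespace removed; suits text without word spaces."""
    text = re.sub(r"\s+", "", text)
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicate_pairs(posts: list, threshold: float = 0.7) -> list:
    """Index pairs of posts whose trigram sets overlap by at least `threshold`."""
    sets = [shingles(p) for p in posts]
    return [
        (i, j)
        for i, j in combinations(range(len(posts)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


def mentions_qq(text: str) -> bool:
    return QQ_PATTERN.search(text) is not None


if __name__ == "__main__":
    posts = [
        "Discreet service available in Beijing. Contact QQ 123456789 for prices, "
        "delivery and details. Fast and safe, open every day.",
        "Discreet service available in Shanghai. Contact QQ 123456789 for prices, "
        "delivery and details. Fast and safe, open every day.",
        "Surveyed the new cycle path along the river this weekend.",
    ]
    print(near_duplicate_pairs(posts))  # [(0, 1)]
    print([mentions_qq(p) for p in posts])  # [True, True, False]
```

Pairwise comparison is quadratic, which is fine for one user's recent posts; comparing against the whole site would need something like MinHash instead.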

I have not followed up any messaging account, keyword or link in any of the
spam posts. Among other concerns, I am wary of possible malware on scam pages.

--
Andrew




Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-03 Thread Andy Allan
On 3 December 2014 at 15:46, Andrew Hain <andrewhain...@hotmail.co.uk> wrote:
> A spammer is periodically posting messages in Chinese to the User Diaries.

Thanks for the analysis, I hope it provides developers with ideas for
combatting it via the automated spam filters that we already have[1].

However, spam is an arms race, and I think we might need a different
long-term approach. I know in the past using 3rd-party spam filtering
services was too expensive (and not really very OSM-ish either).
Perhaps we need a new set of human content moderators on the site, say
40-80 people with a variety of languages between them. We can consider
grey-listing all accounts - i.e. the first few posts of every account
are held for review automatically by default, and direct posting is
enabled once we're more certain they aren't spammers.
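A minimal sketch of the grey-listing idea, assuming a hypothetical threshold of three approved posts before an account may post directly. `Account`, `Post`, `Moderation` and `HELD_POSTS` are illustrative names, and the `is_spam` callback stands in for a human moderator's decision.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

HELD_POSTS = 3  # hypothetical: approved posts needed before direct posting


@dataclass
class Account:
    name: str
    approved_posts: int = 0


@dataclass
class Post:
    author: Account
    body: str
    published: bool = False


@dataclass
class Moderation:
    queue: deque = field(default_factory=deque)

    def submit(self, post: Post) -> None:
        # Grey-listed accounts (too few approved posts) go to the review queue.
        if post.author.approved_posts < HELD_POSTS:
            self.queue.append(post)
        else:
            post.published = True

    def review(self, is_spam: Callable[[Post], bool]) -> None:
        """Drain the queue; approved posts count towards leaving the grey list."""
        while self.queue:
            post = self.queue.popleft()
            if not is_spam(post):
                post.published = True
                post.author.approved_posts += 1
```

Because nothing is published until a moderator approves it, held spam would never reach the RSS/Atom feeds, unlike purely reactive deletion.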

Of course, this would all need coding, but I'm interested in other
people's ideas. The current situation where our spam filters can be
overwhelmed, and all the removal of spam depends on full-blown
system-administrator[2] accounts, isn't perfect!

Thanks,
Andy

[1] https://github.com/openstreetmap/openstreetmap-website/blob/master/app/models/user.rb#L211
[2] https://github.com/openstreetmap/openstreetmap-website/blob/master/app/controllers/diary_entry_controller.rb#L10



Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-03 Thread Martin Koppenhoefer
2014-12-03 17:14 GMT+01:00 Andy Allan <gravityst...@gmail.com>:

> Thanks for the analysis, I hope it provides developers with ideas for
> combatting it via the automated spam filters that we already have[1].
>
> However, spam is an arms race, and I think we might need a different
> long-term approach. I know in the past using 3rd-party spam filtering
> services was too expensive (and not really very OSM-ish either).
> Perhaps we need a new set of human content moderators on the site, say
> 40-80 people with a variety of languages between them. We can consider
> grey-listing all accounts - i.e. the first few posts of every account
> are held for review automatically by default, and direct posting is
> enabled once we're more certain they aren't spammers.

Maybe we could take a crowd-sourced approach and introduce a spam flag
that logged-in users could set, i.e. another button in the comment/reply
line saying something like "flag as spam", with a counter, and if more
than x people have clicked on it, we would automatically or manually
hide/delete the post. This should work similarly to our StackExchange-like
help system (you can flag or unflag with the same button).
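The flag button described above could look roughly like this in Python. It is a sketch: `DiaryEntry`, `toggle_flag` and `FLAG_THRESHOLD` (the "x" in the proposal, set to 3 here) are hypothetical names, and only the automatic hiding variant is shown. Storing the set of flagging users rather than a bare counter means each user counts once and the same button can unflag.

```python
from dataclasses import dataclass, field

FLAG_THRESHOLD = 3  # hypothetical value for "x"


@dataclass
class DiaryEntry:
    title: str
    flagged_by: set = field(default_factory=set)
    hidden: bool = False

    @property
    def flag_count(self) -> int:
        return len(self.flagged_by)

    def toggle_flag(self, user_id: int) -> None:
        """The same button flags or unflags; each logged-in user counts once."""
        if user_id in self.flagged_by:
            self.flagged_by.remove(user_id)
        else:
            self.flagged_by.add(user_id)
        # Automatic variant: hide once MORE than FLAG_THRESHOLD users have flagged.
        self.hidden = self.flag_count > FLAG_THRESHOLD
```

In this sketch, unflagging below the threshold unhides the entry again; a real implementation might instead keep it hidden until a moderator has looked at it.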

Cheers,
Martin


Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-03 Thread Andy Allan
On 3 December 2014 at 16:25, Martin Koppenhoefer <dieterdre...@gmail.com> wrote:

> Maybe we could take a crowd-sourced approach and introduce a spam flag
> that logged-in users could set, i.e. another button in the comment/reply
> line saying something like "flag as spam", with a counter, and if more
> than x people have clicked on it, we would automatically or manually
> hide/delete the post.

Good idea.

I'd also like to do something to prevent (as well as react to) spam,
since reaction-only processes still fill RSS/atom feeds and relayers
like https://twitter.com/osmblogs with spam posts.

Cheers,
Andy



Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-03 Thread Serge Wroclawski
I think the solution to this is actually pretty simple and straightforward.

First, right now there's only a single person who can remove spam from
diary entries or profiles.

Allowing other people (such as existing site moderators) to address
this would go a long way.

Second of all, we need a flagging mechanism. I know that Tom wants a
complete solution that includes a work queue, etc. I think that's a
very laudable goal, but something that just sends an email would be
great for now.

Third, I think we can get our queue, etc. through either funding or a
GSoC project next year, as we did with Changeset Discussions. I volunteer
to mentor it.

- Serge



Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-03 Thread Tom Hughes

On 03/12/14 16:25, Martin Koppenhoefer wrote:
> Maybe we could take a crowd-sourced approach and introduce a spam flag
> that logged-in users could set, i.e. another button in the comment/reply
> line saying something like "flag as spam", with a counter, and if more
> than x people have clicked on it, we would automatically or manually
> hide/delete the post. This should work similarly to our StackExchange-like
> help system (you can flag or unflag with the same button).

Because nobody has ever thought of that before, or maybe discussed how
it might work, or... Oh, but they have:


https://github.com/openstreetmap/openstreetmap-website/issues/841

You might also notice that there hasn't actually been any such spam for 
nearly a fortnight now. Maybe the administrators noticed there was a 
problem and made a change to combat it?


Tom




Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-03 Thread Andy Allan
On 3 December 2014 at 16:33, Serge Wroclawski <emac...@gmail.com> wrote:

> First, right now there's only a single person who can remove spam from
> diary entries or profiles.

Not strictly true - any user with site administrator privileges can
remove spam - see my previous link to the code. There are multiple
people who have those privileges. Of course, in reality it's mainly
one person (Tom) who does the work.

> Second of all, we need a flagging mechanism. I know that Tom wants a
> complete solution that includes a work queue, etc. I think that's a
> very laudable goal, but something that just sends an email would be
> great for now.

The flagging mechanism is still reactive - I'd like to look at ideas
for blocking spam before it hits the site.

Cheers,
Andy



Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-03 Thread Antje
There’s another suspicious post, in Turkish, at
http://www.openstreetmap.org/user/Medyum%20Y%C4%B1lmaz%20Eren%20Hoca/diary/28134

I personally prefer an increase in human blog moderators.


Re: [OSM-dev] Chinese spam diaries, an analysis

2014-12-03 Thread Ian Dees
On Wed, Dec 3, 2014 at 2:17 PM, Antje <2...@minoa.li> wrote:

> There’s another suspicious post, in Turkish, at
> http://www.openstreetmap.org/user/Medyum%20Y%C4%B1lmaz%20Eren%20Hoca/diary/28134
>
> I personally prefer an increase in human blog moderators.


I agree. It's usually pretty obvious (even without knowing the language)
when a diary post is spam.

I'm happy to help if someone gives me the ability to do it.