Hi,

There is obviously plenty of data that represents "good" changes. The Data Working Group's reversions could be used to train a classifier on what a bad edit looks like. Beyond that, looking for changesets that are effectively undone within a short period of time (say two weeks) might also yield some bad changesets.
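As a rough illustration only, a minimal sketch of that labelling idea in Python with scikit-learn. The changeset records, field names (reverted_at, closed_at, comment, the count fields) and the features themselves are hypothetical placeholders, not anything that exists in the OSM tooling today:

# Hypothetical sketch of the labelling heuristic above: treat a changeset as
# "bad" if it was reverted (e.g. by a DWG revert) within a short window,
# then train a plain binary classifier on toy features.
from datetime import timedelta

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

REVERT_WINDOW = timedelta(days=14)  # "undone after a short period of time"

def label_changeset(cs):
    """Return 1 (bad) if the changeset was reverted within REVERT_WINDOW, else 0."""
    reverted_at = cs.get("reverted_at")  # assumed field: time of the revert, or None
    if reverted_at is None:
        return 0
    return int(reverted_at - cs["closed_at"] <= REVERT_WINDOW)

def featurize(cs):
    """Toy feature vector; real features would come from changeset and user history."""
    return [
        cs["num_creates"],
        cs["num_modifies"],
        cs["num_deletes"],
        cs["user_edit_count"],     # prior edits by this user
        int(cs["comment"] == ""),  # empty changeset comment
    ]

def train(changesets):
    X = [featurize(cs) for cs in changesets]
    y = [label_changeset(cs) for cs in changesets]
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    return clf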
Jason

On Sun, Dec 18, 2016 at 6:38 PM, Animesh Sinha <[email protected]> wrote:
> Hi,
>
> I am a first-year master's student at Purdue University and would like to
> propose a project idea for GSoC 2017. I have worked on vandalism detection
> in Wikipedia in the past and understand how important it is to predict
> whether a piece of information is correct, since incorrect information may
> mislead others.
>
> Hence, I would like to propose this project idea:
>
> Title: Detect whether a user edit made in OSM is a vandal edit or a regular
> one.
> Summary: It is very challenging to monitor malicious edits or spam manually
> for a large, active user base. I plan to identify cases of vandalism on OSM
> by classifying edits as either regular or vandalism. This is clearly a
> binary classification task, but if the distribution of regular and vandalism
> cases in the dataset is skewed, it can also be explored as an anomaly
> detection problem.
> Requirements: Lots of data about the edits made, information about the users
> making the edits, information about the people annotating the true labels,
> etc.
>
> I would appreciate it if someone could provide feedback on the project idea
> and the requirements.
>
> Thanks,
> Animesh Sinha
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://lists.openstreetmap.org/listinfo/dev

_______________________________________________
dev mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/dev
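If the class balance really is as skewed as the proposal anticipates, the anomaly-detection framing could be prototyped roughly as below. Feature extraction is assumed to exist elsewhere (e.g. the featurize() sketch earlier in the thread), and the contamination rate is a made-up placeholder:

# Hypothetical sketch of the anomaly-detection framing: fit an IsolationForest
# on (mostly regular) edit features and flag outliers as candidate vandalism.
import numpy as np
from sklearn.ensemble import IsolationForest

def flag_suspicious(feature_matrix, expected_vandalism_rate=0.01):
    """Return a boolean mask of edits the model considers anomalous."""
    model = IsolationForest(contamination=expected_vandalism_rate, random_state=0)
    preds = model.fit_predict(np.asarray(feature_matrix))  # -1 = outlier, 1 = inlier
    return preds == -1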

