Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Platonides
Andrew Garrett wrote: > On Thu, Mar 19, 2009 at 11:54 AM, Platonides wrote: >> PS: Why isn't there a link to Special:AbuseFilter/history/$id on the >> filter view? > > There is. Oops. I was looking for it on the top bar, not at the bottom. I stand corrected.

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread David Gerard
2009/3/19 Aryeh Gregor : > On Thu, Mar 19, 2009 at 5:26 PM, Brian wrote: >> A general point - there is a *lot* of information contained in edits >> that AbuseFilter cannot practically characterize due to the complexity >> of language and the subtlety of certain types of abuse. A system with >> ac

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Aryeh Gregor
On Thu, Mar 19, 2009 at 5:26 PM, Brian wrote: > A general point - there is a *lot* of information contained in edits > that AbuseFilter cannot practically characterize due to the complexity > of language and the subtlety of certain types of abuse. A system with > access to natural language feature

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Brian
Ultimately we need a system that integrates information from multiple sources, such as WikiTrust, AbuseFilter and the Wikipedia Editorial Team. A general point - there is a *lot* of information contained in edits that AbuseFilter cannot practically characterize due to the complexity of language an

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Delirium
Brian wrote: > I just wanted to be really clear about what I mean as a specific > counter-example to this just being an example of "reconstructing that > rule set." Suppose you use the AbuseFilter rules on the entire history > of the wiki in order to generate a dataset of positive and negative > ex

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Delirium
Brian wrote: > Delirium, you do make it sound as if merely having the tagged dataset > solves the entire problem. But there are really multiple problems. One > is learning to classify what you have been told is in the dataset > (e.g., that all instances of this rule in the edit history *really > ar

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Brian
I just wanted to be really clear about what I mean as a specific counter-example to this just being an example of "reconstructing that rule set." Suppose you use the AbuseFilter rules on the entire history of the wiki in order to generate a dataset of positive and negative examples of vandalism edi

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Brian
I presented a talk at Wikimania 2007 that espoused the virtues of combining human measures of content with automatically determined measures in order to generalize to unseen instances. Unfortunately all those Wikimania talks seem to have been lost. It was related to this article on predicting the q

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Brion Vibber
On 3/19/09 12:21 PM, Alex wrote: > Yes, in one filter (filter 32) I've been watching, it was taking > 90-120ms for what seemed like simple checks (action, editcount, > difference in bytes), so I moved the editcount check last, in case it > had to pull that from the DB. The time dropped to ~3ms, but

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Alex
Robert Rohde wrote: > On Wed, Mar 18, 2009 at 8:00 PM, Andrew Garrett wrote: > >> To help a bit more with performance, I've also added a profiler within >> the interface itself. Hopefully this will encourage self-policing with >> regard to filter performance. > > Based on personal observations,

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Soxred93
Cobi (owner of ClueBot) and his roommate Crispy have already been working hard to make this specific dataset, but they've been hurt by not enough contributors. The page is here: http://en.wikipedia.org/wiki/User:Crispy1989#New_Dataset_Contribution_Interface X! On Mar 19, 2009, at 8:15 AM

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Robert Rohde
On Wed, Mar 18, 2009 at 8:00 PM, Andrew Garrett wrote: > > To help a bit more with performance, I've also added a profiler within > the interface itself. Hopefully this will encourage self-policing with > regard to filter performance. Based on personal observations, the self-profiling is quite n

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Brion Vibber
On 3/19/09 5:15 AM, Tei wrote: > since there's already a database, this sounds like it could be done by flagging > edits as "vandalism", and then reading the existing database information to > extract these details, like ip, a diff of the change, etc.. that way, > humans define what is "vandalism", a

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Brion Vibber
On Mar 18, 2009, at 20:00, Andrew Garrett wrote: > >> > To help a bit more with performance, I've also added a profiler within > the interface itself. Hopefully this will encourage self-policing with > regard to filter performance. Awesome! Maybe we could use that for templates too ... ;) -- Br

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Tei
On Thu, Mar 19, 2009 at 1:03 PM, Delirium wrote: > Brian wrote: > > This extension is very important for training machine learning > > vandalism detection bots. Recently published systems use only hundreds > > of examples of vandalism in training - not nearly enough to > > distinguish between th

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-19 Thread Delirium
Brian wrote: > This extension is very important for training machine learning > vandalism detection bots. Recently published systems use only hundreds > of examples of vandalism in training - not nearly enough to > distinguish between the variety found in Wikipedia or generalize to > new, unseen f

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Robert Rohde
On Wed, Mar 18, 2009 at 8:00 PM, Andrew Garrett wrote: > I've disabled a filter or two which were taking well in excess of > 150ms to run, and seemed to be targeted at specific vandals, without > any hits. The culprit seemed to be running about 20 regexes to > determine if an IP is in a particul

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Andrew Garrett
Tim Starling wrote: > Robert Rohde wrote: >> For Andrew or anyone else that knows, can we assume that the filter is >> smart enough that if the first part of an AND clause fails then the >> other parts don't run (or similarly if the first part of an OR >> succeeds)?  If so, we can probably optimize

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Platonides
Tim Starling wrote: > Robert Rohde wrote: >> For Andrew or anyone else that knows, can we assume that the filter is >> smart enough that if the first part of an AND clause fails then the >> other parts don't run (or similarly if the first part of an OR >> succeeds)? If so, we can probably optimize

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Soxred93
However, that simply disallows them all. On enwiki, the blanking filter warns the user, and lets them go through with it after confirmation. X! On Mar 18, 2009, at 4:51 PM, jida...@jidanni.org wrote: > AG> frown on page-blanking > > For now I just stop them on my wikis with >

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Brian
This extension is very important for training machine learning vandalism detection bots. Recently published systems use only hundreds of examples of vandalism in training - not nearly enough to distinguish between the variety found in Wikipedia or generalize to new, unseen forms of vandalism. A la

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread jidanni
AG> frown on page-blanking For now I just stop them on my wikis with $wgSpamRegex=array('/^\B$/'); I haven't tried fancier solutions yet.

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Tim Starling
Robert Rohde wrote: > For Andrew or anyone else that knows, can we assume that the filter is > smart enough that if the first part of an AND clause fails then the > other parts don't run (or similarly if the first part of an OR > succeeds)? If so, we can probably optimize rules by doing easy check
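
The short-circuit behaviour Robert asks about can be illustrated with a small sketch. It is written in Python, not in AbuseFilter's rule language, and the names `cheap_check`, `expensive_check`, `rule_matches` and the `calls` list are hypothetical. It only shows that, when `and` short-circuits, a failing cheap check keeps the expensive one from running at all. Whether AbuseFilter's own evaluator works this way is the open question in this message, so this should not be read as a description of its implementation.

```python
calls = []  # hypothetical log of which checks actually ran


def cheap_check(edit):
    """Inexpensive test on data that is already in memory."""
    calls.append("cheap")
    return edit["action"] == "edit"


def expensive_check(edit):
    """Stand-in for a costly test, e.g. one that would need a database lookup."""
    calls.append("expensive")
    return edit["user_editcount"] < 10


def rule_matches(edit):
    # Python's `and` short-circuits: expensive_check runs only if
    # cheap_check has already returned True.
    return cheap_check(edit) and expensive_check(edit)


# A page move fails the cheap check, so the expensive check is skipped.
move = {"action": "move", "user_editcount": 3}
result = rule_matches(move)
```

Under this assumption, putting the cheapest and most selective conditions first means most edits are rejected before any costly condition is evaluated.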

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Brion Vibber
On 3/18/09 12:59 PM, Tim Starling wrote: > Brion Vibber wrote: >> On 3/18/09 5:34 AM, Andrew Garrett wrote: >>> I am pleased to announce that the Abuse Filter [1] has been activated >>> on English Wikipedia! >> I've temporarily disabled it as we're seeing some performance problems >> saving edits a

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Robert Rohde
On Wed, Mar 18, 2009 at 12:59 PM, Tim Starling wrote: > Brion Vibber wrote: >> On 3/18/09 5:34 AM, Andrew Garrett wrote: >>> I am pleased to announce that the Abuse Filter [1] has been activated >>> on English Wikipedia! >> >> I've temporarily disabled it as we're seeing some performance problems

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Tim Starling
Brion Vibber wrote: > On 3/18/09 5:34 AM, Andrew Garrett wrote: >> I am pleased to announce that the Abuse Filter [1] has been activated >> on English Wikipedia! > > I've temporarily disabled it as we're seeing some performance problems > saving edits at peak time today. Need to make sure there's

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Robert Rohde
On Wed, Mar 18, 2009 at 12:43 PM, Brion Vibber wrote: > On 3/18/09 5:34 AM, Andrew Garrett wrote: >> I am pleased to announce that the Abuse Filter [1] has been activated >> on English Wikipedia! > > I've temporarily disabled it as we're seeing some performance problems > saving edits at peak time

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Brion Vibber
On 3/18/09 5:34 AM, Andrew Garrett wrote: > I am pleased to announce that the Abuse Filter [1] has been activated > on English Wikipedia! I've temporarily disabled it as we're seeing some performance problems saving edits at peak time today. Need to make sure there's functional per-filter profil
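
As a rough idea of what per-filter profiling could look like, here is a minimal Python sketch. It is not the extension's code: the class `FilterProfiler`, its methods, and the example filter ID and rule are all hypothetical. It simply accumulates wall-clock time and a run count per filter ID so that slow filters can be spotted.

```python
import time
from collections import defaultdict


class FilterProfiler:
    """Hypothetical helper that tracks cumulative run time per filter ID."""

    def __init__(self):
        self.total_seconds = defaultdict(float)
        self.runs = defaultdict(int)

    def run(self, filter_id, rule, edit):
        # Time a single evaluation of `rule` and record it even if it raises.
        start = time.perf_counter()
        try:
            return rule(edit)
        finally:
            self.total_seconds[filter_id] += time.perf_counter() - start
            self.runs[filter_id] += 1

    def average_ms(self, filter_id):
        # Average time per run in milliseconds; 0.0 if the filter never ran.
        runs = self.runs.get(filter_id, 0)
        if runs == 0:
            return 0.0
        return 1000.0 * self.total_seconds[filter_id] / runs


profiler = FilterProfiler()
# Hypothetical filter 1: flag edits that remove more than 500 bytes.
matched = profiler.run(1, lambda edit: edit["size_delta"] < -500, {"size_delta": -900})
```

Calling `profiler.average_ms(1)` afterwards gives the mean cost of that filter, which is the kind of number that lets filter authors police their own rules.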

[Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Andrew Garrett
I am pleased to announce that the Abuse Filter [1] has been activated on English Wikipedia! The Abuse Filter is an extension to the MediaWiki [2] software that powers Wikipedia allowing automatic "filters" or "rules" to be run against every edit, and to take actions if any of those rules are trigg