Jonathan wrote:
Lars wrote:

One idea for finding stats on errors is to compare changes made to Wikipedia articles. The complete text revision history is

That might make a good corpus.
Would it be possible to write a script that picks up just the spelling/grammar changes? If not, you'll be counting the effects of numerous edit wars.

I think edit wars always involve changes of more than two or three words, whereas spelling changes amount to ten characters or so. I would start with this simple length heuristic.
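A rough sketch of that length heuristic in Python follows. It assumes two revisions of an article are already available as plain strings; extracting them from the revision history is not shown. The code diffs the revisions word by word with the standard library's difflib.SequenceMatcher. It then treats a revision as a candidate spelling/grammar fix only if every change touches at most two words and ten characters. The thresholds MAX_CHARS and MAX_WORDS and all function names are hypothetical, chosen from the numbers above.

```python
import difflib
from typing import List, Tuple

# Hypothetical thresholds from the rule of thumb above:
# spelling fixes touch about ten characters, edit wars more than two or three words.
MAX_CHARS = 10
MAX_WORDS = 2


def changed_spans(old: str, new: str) -> List[Tuple[str, str, int]]:
    """Word-level diff of two revisions: (removed text, inserted text, words touched) per change."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            spans.append((" ".join(old_words[i1:i2]),
                          " ".join(new_words[j1:j2]),
                          max(i2 - i1, j2 - j1)))
    return spans


def is_small(before: str, after: str, n_words: int) -> bool:
    """Length heuristic: few words and few characters on both sides of the change."""
    return n_words <= MAX_WORDS and max(len(before), len(after)) <= MAX_CHARS


def is_minor_revision(old: str, new: str) -> bool:
    """True if the revision changes something and every change passes the length heuristic."""
    spans = changed_spans(old, new)
    return bool(spans) and all(is_small(*span) for span in spans)


if __name__ == "__main__":
    typo_old = "The cat recieve the mesage yesterday."
    typo_new = "The cat received the message yesterday."
    print(changed_spans(typo_old, typo_new))
    # [('recieve', 'received', 1), ('mesage', 'message', 1)]
    print(is_minor_revision(typo_old, typo_new))  # True

    war_old = "The policy was widely praised."
    war_new = "The policy was widely criticized by independent observers."
    print(is_minor_revision(war_old, war_new))  # False
```

Because the diff is word-based, punctuation attached to a word counts as part of that word. The filter is also only a first pass: a one-word act of vandalism would slip through, so the output still needs checking (e.g. against a dictionary) before it counts as a typo corpus.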

Some language editions of Wikipedia have their own conventions for annotating changes. For example, Polish Wikipedia uses abbreviations for typos, spelling mistakes, etc. These annotations are directly usable for us, but AFAIK nobody has reused them. The idea looks very promising, though.

Best,
Marcin
