Jonathan wrote:
Lars wrote:
One idea for finding stats on errors is to compare changes made to
Wikipedia articles. The complete text revision history is
That might make a good corpus.
Would it be possible to write a script that picks up just the
spelling/grammar changes? If not, you'll be counting the effects of
numerous edit wars.
I think that edit wars always involve a change of more than two or three
words, whereas spelling changes span roughly ten characters at most.
I would start with this simple length heuristic.
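
To make the heuristic concrete, here is a rough sketch in Python. It
assumes we already have the plain text of two consecutive revisions;
the function names and the thresholds (three words, ten characters) are
only my guesses from the numbers above and would need tuning on real data.

```python
import difflib


def changed_spans(old, new):
    """Return (old_words, new_words) for every region that differs."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def looks_like_spelling_fix(old, new, max_words=3, max_chars=10):
    """Guess whether an edit is a small correction rather than a rewrite.

    max_words and max_chars are assumed thresholds, not measured values.
    """
    spans = changed_spans(old, new)
    if not spans:
        return False  # identical revisions: nothing was corrected
    # Count the larger side of each differing region, in words and characters.
    words = sum(max(len(o), len(n)) for o, n in spans)
    chars = sum(max(len(" ".join(o)), len(" ".join(n))) for o, n in spans)
    return words <= max_words and chars <= max_chars
```

Revision pairs that pass the check would go into the candidate corpus;
everything else (including the edit wars) is simply skipped.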
Some versions of Wikipedia have their own ways of annotating the
changes, e.g., Polish Wikipedia has abbreviations for typos, spelling
mistakes etc. This is directly usable by us, but nobody has reused it
AFAIK. The idea looks very promising, though.
Best,
Marcin