Re: [Wikisource-l] Proofreading based on statistics

2013-05-27 Thread Alex Brollo
> As Alex says, no. And it's a pity. I really think it is paramount to have a central place (on Wikisource.org?) to discuss tools, templates, and procedures, as an international community. It is a project long needed. Where should we start it? > Aubrey Personally. I'll try to

Re: [Wikisource-l] Proofreading based on statistics

2013-05-26 Thread Andrea Zanni
> I remember, for example, an awesome tool from Alex Brollo, postOCR, a js script which corrects automatically most common OCR errors and converts apostrophes. > Where is this? Is it documented in English? As Alex says, no. And it's a pity. I really think it is paramount to h

Re: [Wikisource-l] Proofreading based on statistics

2013-05-25 Thread Alex Brollo
On 05/24/2013 09:11 AM, Andrea Zanni wrote: > I remember, for example, an awesome tool from Alex Brollo, postOCR, a js script which corrects automatically most common OCR errors and converts apostrophes. Where is this? Is it documented in English? Andrea mentioned two different tools merged

Re: [Wikisource-l] Proofreading based on statistics

2013-05-25 Thread Lars Aronsson
On 05/24/2013 09:11 AM, Andrea Zanni wrote: I remember, for example, an awesome tool from Alex Brollo, postOCR, a js script which corrects automatically most common OCR errors and converts apostrophes. Where is this? Is it documented in English? As an example, we are collaborating right now w

Re: [Wikisource-l] Proofreading based on statistics

2013-05-24 Thread Alex Brollo
As a user, I explored the website of Distributed Proofreaders to gather ideas about proofreading. It has been a very productive and enlightening experience, even if the whole philosophy of DP proofreading/formatting is completely different from - and incompatible with - the wiki approach. One of the tools is an

Re: [Wikisource-l] Proofreading based on statistics

2013-05-24 Thread Andrea Zanni
I completely agree with Lars. I remember, for example, an awesome tool from Alex Brollo, postOCR, a js script which automatically corrects the most common OCR errors and converts apostrophes. The tool is very useful and widely used, and it would benefit greatly from a given list of common OCR errors per bo
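
As a rough illustration of the kind of correction pass described above, here is a minimal sketch in Python. It is not Alex Brollo's postOCR script (which is JavaScript and is not reproduced in this thread): the rule list `COMMON_OCR_FIXES`, the function name `post_ocr`, and the choice to turn straight apostrophes into typographic ones are all assumptions made for the example.

```python
import re

# Hypothetical correction rules. A real list would be compiled per language
# or per book from errors actually observed in the OCR output.
COMMON_OCR_FIXES = [
    (re.compile(r"\btbe\b"), "the"),
    (re.compile(r"\bm0re\b"), "more"),
]


def post_ocr(text, fixes=COMMON_OCR_FIXES):
    """Apply whole-word OCR fixes, then convert apostrophes between letters."""
    for pattern, replacement in fixes:
        text = pattern.sub(replacement, text)
    # Assumption: "converts apostrophes" means straight (') to typographic (’).
    return re.sub(r"(?<=\w)'(?=\w)", "\u2019", text)


cleaned = post_ocr("It's tbe m0re common error.")
print(cleaned)
```

Matching on word boundaries keeps a rule such as `tbe` from touching longer words that merely contain those letters, which matters once the list of fixes grows.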

Re: [Wikisource-l] Proofreading based on statistics

2013-05-23 Thread Federico Leva (Nemo)
Lars Aronsson, 24/05/2013 01:54: It should be possible, in any language of Wikisource, to check all existing text against What do you define as existing text? Only the text currently stored in wiki pages? Also the text layer of the DjVu or PDF files in use on the wiki? Also the files uploaded

[Wikisource-l] Proofreading based on statistics

2013-05-23 Thread Lars Aronsson
It should be possible, in any language of Wikisource, to check all existing text against a known dictionary valid for that year, and to find words that are outside the dictionary. These words could be proofread in some tool similar to a CAPTCHA. They might be uncommon place names that are correctl
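
The dictionary check proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `KNOWN_WORDS` is a tiny hypothetical stand-in for "a known dictionary valid for that year", and the function name `out_of_dictionary` and the sample sentence are invented for the example.

```python
import re
from collections import Counter

# Hypothetical word list standing in for a period-appropriate dictionary.
KNOWN_WORDS = {"the", "old", "house", "stood", "near", "river", "was"}

# Runs of Unicode letters, so that words like "Dalälven" stay in one piece.
WORD_RE = re.compile(r"[^\W\d_]+")


def out_of_dictionary(text, known_words):
    """Count the words in text that are missing from known_words."""
    counts = Counter()
    for match in WORD_RE.finditer(text):
        word = match.group(0).lower()
        if word not in known_words:
            counts[word] += 1
    return counts


sample = "The old houfe stood near the river Dalälven. The houfe was old."
suspects = out_of_dictionary(sample, KNOWN_WORDS)
for word, count in suspects.most_common():
    print(word, count)
```

The output mixes a likely OCR error ("houfe") with a correctly spelled place name ("dalälven"). The check alone cannot tell them apart, which is why the flagged words would still need a human decision.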