This has reminded me to complain about Google Books. Google has the world's
best OCR (by virtue of having the largest OCR'able dataset) and also has a
mission to scan in all the public domain books they can get their hands on.
They recently updated their interface to, as they put it, "make it easier to
find our plain text versions of public domain books. If a book is available
in full view, you can click the 'Plain text' button in the toolbar."
Unfortunately the only way I've found to download the full text of a public
domain book from Google is to flip through the book a page at a time,
copying the text to your clipboard.
There are roughly 2-3 million public domain books in Google Books.


On Sat, Jun 20, 2009 at 10:10 AM, Samuel Klein <meta...@gmail.com> wrote:

> There is a wealth of work done all the time by primary source
> researchers and publishers, which could be improved on by having
> wikisource entries, translations, &c.
>
> Related question : how appropriate would large numbers of public
> domain texts, with page scans and the best available OCR [and
> translations of same], fit with what Wikisource does now?  This is
> clearly a wiki project that needs to happen : OCR even at its best
> misses rare meaning-bearing words.   If not Wikisource, where should
> this work take place?
>
> SJ
>
> On Sat, Jun 20, 2009 at 11:41 AM, David Gerard <dger...@gmail.com> wrote:
> > http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/
> >
> > Interesting. How well does this fit with what Wikisource does?
> >
> >
> > - d.
> >
> > _______________________________________________
> > foundation-l mailing list
> > foundation-l@lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
> >
> >
