On 2016-04-06 20:27, Marcin Miłkowski wrote:
>> <https://en.wikisource.org/wiki/Wikisource:What_is_Wikisource%3F>. It
>> has some bias as well, like texts being old due to copyrights, or
>> old-fashion language, but I there is an opportunity here since it has
>> many kind of documents, from legal texts to science fiction novels,
>> original or translated texts etc.
>
> I used our indexer on a large newspaper corpus, and on some corpus of
> literary works. I agree, Wikipedia is strongly biased.
Indeed. https://tatoeba.org is a nice complementary source of dialogue-style, often colloquial, text. It is much smaller than Wikipedia, but it still has more than 350,000 English sentences.

Regards
 Daniel

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel