On 2016-04-06 20:27, Marcin Miłkowski wrote:

>> <https://en.wikisource.org/wiki/Wikisource:What_is_Wikisource%3F>. It
>> has some bias as well, like texts being old due to copyrights, or
>> old-fashioned language, but I think there is an opportunity here since it has
>> many kinds of documents, from legal texts to science fiction novels,
>> original or translated texts etc.
> 
> I used our indexer on a large newspaper corpus, and on some corpus of
> literary works. I agree, Wikipedia is strongly biased.

Indeed. https://tatoeba.org is a nice complementary source for 
dialogue-style, often colloquial, texts (it's much smaller than 
Wikipedia, but still has >350,000 sentences for English).

Regards
  Daniel


------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
