On 17.08.2017 12:08, Andrej Warkentin wrote:
> Hello,
>
> in a talk at the PyData Berlin meetup I saw this project:
> https://github.com/lusy/hora-de-decir-bye-bye , where Spanish articles
> are scraped and searched for English words. In order to identify English
> words she used the dictionaries from Open Office and compared scraped
> words to the dictionaries. She mentioned the problem that not all words
> were in the dictionaries.
>
> So I thought this could be used to find (or at least help find) most
> missing words in dictionaries for all languages. One could scrape e.g.
> all Wikipedia articles of a certain language and create a candidate list
> of missing words. Or it could also be used to find domain-specific words
> by scraping e.g. scientific articles, articles from certain types of
> websites and so on.
>
> My question is whether this would be helpful at all, or whether missing
> words in dictionaries are no longer a problem. Also, I unfortunately
> don't have much spare time at the moment to work on this, so if anyone
> wants to pick this up, feel free to do so. I will let you know when I
> have implemented something myself.
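
For concreteness, the idea quoted above could be sketched roughly like this. It is only a minimal illustration, not the code of the linked project: the function name candidate_missing_words, the min_count threshold and the tiny in-memory word set are assumptions made up for this example. A real run would read scraped text and a real dictionary (whose affix rules would have to be expanded first) instead.

```python
import re
from collections import Counter


def candidate_missing_words(text, known_words, min_count=2):
    """Return (word, count) pairs for words in text that are not in known_words."""
    # Split into lowercase runs of letters; digits, underscores and punctuation are dropped.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Count only the words the dictionary does not know.
    counts = Counter(t for t in tokens if t not in known_words)
    # Require a minimum frequency so that one-off spellings are not proposed.
    return [(w, c) for w, c in counts.most_common() if c >= min_count]


# Hypothetical stand-ins for a real dictionary and a scraped corpus.
known = {"the", "cat", "sat", "on", "mat"}
sample = "The cat sat on the blockchain. The blockchain sat on teh mat."

print(candidate_missing_words(sample, known))
print(candidate_missing_words(sample, known, min_count=1))
```

With min_count=2 only "blockchain" is proposed; lowering it to 1 also lets the one-off "teh" through, so the threshold decides how much manual review the candidate list needs.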

By "missing words in dictionaries", do you mean that if "teh" had been used as an archaic spelling of "tea" in a work of Shakespeare (a completely made-up, hypothetical example), we should add "teh" to the dictionary and no longer flag it as a misspelled word?
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice