Re: equivalent and optional characters in words

Andriy Rysin Sun, 21 Apr 2013 06:35:43 -0700

On 04/21/2013 03:11 AM, Jaume Ortolà i Font wrote:

2013/4/21 Andriy Rysin <[email protected]>


    1) I would like to treat several apostrophes equally (apostrophes are
    part of the word in Ukrainian). E.g., in the dictionary and rules I
    could use ' (0x27), but I would like to be able to parse text that has
    U+2019 (and potentially U+02BC) the same way. I guess I could do a
    simple replace in the word tokenizer, but I was wondering if there's a
    better way.

This is what is done in Catalan. So far I have found no problems.

Jaume

Thanks, will try that. Another one: what's the recommended way to store
knowledge about alternative spellings for a word, e.g. color vs. colour?
It looks like it would make sense to code this relation in the dictionary
so that we don't have to introduce a regex for the alternative spelling
and repeat it multiple times in the rules. But I looked at the English
module, and it looks like such a relation is not present in the
dictionary but is instead hardcoded in the rules.
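
To make the idea concrete, here is a minimal sketch of keeping the
relation in one place instead of in per-rule regexes. It is Python for
illustration only and does not reflect LanguageTool's dictionary format;
SPELLING_VARIANTS, canonical, and same_word are hypothetical names, and
the table entries are made-up sample data.

```python
# Assumption: each variant spelling maps to one canonical spelling.
SPELLING_VARIANTS = {
    "colour": "color",
    "favour": "favor",
}


def canonical(word: str) -> str:
    """Return the canonical spelling for a word, lowercased."""
    lowered = word.lower()
    return SPELLING_VARIANTS.get(lowered, lowered)


def same_word(a: str, b: str) -> bool:
    """True if both spellings refer to the same canonical word."""
    return canonical(a) == canonical(b)


if __name__ == "__main__":
    print(same_word("Colour", "color"))
```

A rule would then match on the canonical form, so adding a new variant
only requires a new table entry rather than a change to every rule.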


Thanks
Andriy

_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
