On 2013-05-21 05:26, Andriy Rysin wrote:
> On 04/21/2013 03:11 AM, Jaume Ortolà i Font wrote:
>> 2013/4/21 Andriy Rysin
>>
>>     1) I would like to treat several apostrophes equally (apostrophes are
>>     part of the word in Ukrainian), e.g. in dictionary and rules I could
>>     use ' (0x27) but I would like to be able to parse text that has U+2019
>>     (and potentially U+02BC) the same way, I guess I could do a simple
>>     replace in word tokenizer but I was wondering if there's a better way
>>
>> This is what is done in Catalan. So far I have found no problem.
>>
> This seems to work pretty nice for *replacing* chars, but if I also
> *remove* accent (U+0301) from words in word tokenizer it looks like it
> messes up the error position in the sentence (at least in the web
> interface). Is there a right way to remove symbols I don't care about?

Yes, but you'd need to change the processing a bit: I had an idea to mark some AnalyzedTokenReadings as ignorable, so that the rules wouldn't see them. Basically, a single attribute should suffice, and in the several places where tokens are collected (for example, where you get the tokens without whitespace) these tokens would be excluded. Also, the code that checks for a preceding space would need to be reviewed so that an ignorable symbol doesn't interfere with it.

Best,
Marcin

>
> Thanks
> Andriy

_______________________________________________
Languagetool-devel mailing list
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
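
For illustration only, the ignorable-token idea described above could look roughly like the following. This is a minimal Python sketch, not LanguageTool code: `Token`, `tokens_for_rules` and `has_space_before` are hypothetical names, and it assumes the tokenizer emits the symbol to be ignored (here U+0301) as its own token. Each token keeps its offset in the original text, so error positions are not shifted by anything that is hidden from the rules.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a token reading; not the LanguageTool class."""
    text: str
    start: int               # offset in the ORIGINAL, unmodified sentence
    ignorable: bool = False  # the single attribute suggested in the reply

    @property
    def is_whitespace(self) -> bool:
        return self.text.isspace()


def tokens_for_rules(tokens):
    """Return the tokens a rule would see: no whitespace, no ignorable tokens."""
    return [t for t in tokens if not t.is_whitespace and not t.ignorable]


def has_space_before(tokens, index):
    """Check for preceding whitespace, looking past any ignorable tokens."""
    i = index - 1
    while i >= 0 and tokens[i].ignorable:
        i -= 1
    return i >= 0 and tokens[i].is_whitespace


# Example input: "Hello", a space, an ignorable accent, then "world".
tokens = [
    Token("Hello", 0),
    Token(" ", 5),
    Token("\u0301", 6, ignorable=True),
    Token("world", 7),
]
visible = tokens_for_rules(tokens)
```

In this sketch `visible` holds only "Hello" and "world", while "world" still reports its original offset and still counts as preceded by a space. Whether a real implementation would filter at this level or inside the tokenizer is an open design question.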
