On May 21, 2013 4:17 AM, "Marcin Miłkowski" <[email protected]> wrote:
>
> On 2013-05-21 05:26, Andriy Rysin wrote:
> > On 04/21/2013 03:11 AM, Jaume Ortolà i Font wrote:
> >> 2013/4/21 Andriy Rysin <[email protected]>
> >>
> >> 1) I would like to treat several apostrophes equally (apostrophes are
> >> part of the word in Ukrainian). For example, in the dictionary and
> >> rules I could use ' (0x27), but I would like to be able to parse text
> >> that has U+2019 (and potentially U+02BC) the same way. I guess I could
> >> do a simple replace in the word tokenizer, but I was wondering if
> >> there's a better way.
> >>
> >> This is what is done in Catalan. So far I have found no problems.
> >>
> > This seems to work pretty nicely for *replacing* chars, but if I also
> > *remove* the accent (U+0301) from words in the word tokenizer, it looks
> > like it messes up the error position in the sentence (at least in the
> > web interface). Is there a right way to remove symbols I don't care
> > about?
>
> Yes, but you'd need to change the processing a bit: I had an idea to mark
> some AnalyzedTokenReadings as ignorable, so that the rules wouldn't see
> them. Basically, a single attribute should suffice, and in several
> places (where you get tokens without spaces, for example) these tokens
> would be excluded. Also, the code that checks for a preceding space
> would need to be reviewed so that the ignorable symbol does not
> interfere with it.
Marcin
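The Catalan approach Jaume mentions amounts to a one-to-one character substitution in the word tokenizer. A minimal Java sketch, assuming a hypothetical `ApostropheNormalizer` helper (not an existing LanguageTool class), illustrates why replacing is harmless for error positions: each apostrophe variant is swapped for exactly one character, so the string length, and every offset after it, stays the same.

```java
// Hypothetical helper for illustration; not part of LanguageTool's API.
class ApostropheNormalizer {
    // U+2019 RIGHT SINGLE QUOTATION MARK and U+02BC MODIFIER LETTER APOSTROPHE
    // are mapped to the ASCII apostrophe (0x27) used in the dictionary and rules.
    static String normalize(String token) {
        return token.replace('\u2019', '\'').replace('\u02BC', '\'');
    }

    public static void main(String[] args) {
        String word = "p\u2019yat"; // transliterated example containing U+2019
        String normalized = normalize(word);
        // One char in, one char out: lengths match, so offsets are preserved.
        System.out.println(normalized.length() == word.length()); // prints true
    }
}
```

Removing a character (such as U+0301) breaks this length invariant, which is why the error positions drift.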
I'm not sure I understood. I don't want to exclude tokens; I want to remove
a character from a token as if it weren't there. But it looks like when the
position of an error after that token in the sentence is calculated, the
removed character is not taken into account.
It seems that if I want to remove a character, I need to remember the
previous token's position and length and use them later for the position
calculation.
Andriy
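Remembering positions, as suggested above, can be generalized to a per-character offset map. The following is a minimal Java sketch, assuming the tokenizer has access to the original text; the `AccentStripper` class and its methods are hypothetical and not part of LanguageTool. It strips U+0301 and records, for every kept character, its index in the original string, so an offset computed on the stripped text can be translated back before the error is reported.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of LanguageTool's API.
class AccentStripper {
    static final char COMBINING_ACUTE = '\u0301';

    private final String stripped;
    // toOriginal[i] is the index in the original text of stripped.charAt(i);
    // the extra last slot maps the end of the stripped text to the end of the original.
    private final int[] toOriginal;

    AccentStripper(String original) {
        StringBuilder sb = new StringBuilder(original.length());
        int[] map = new int[original.length() + 1];
        int kept = 0;
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (c != COMBINING_ACUTE) {
                sb.append(c);
                map[kept++] = i; // remember where this kept char came from
            }
        }
        map[kept] = original.length();
        stripped = sb.toString();
        toOriginal = Arrays.copyOf(map, kept + 1);
    }

    String stripped() {
        return stripped;
    }

    // Translates a start or end offset in the stripped text back to the original text.
    int toOriginalOffset(int strippedOffset) {
        return toOriginal[strippedOffset];
    }

    public static void main(String[] args) {
        AccentStripper s = new AccentStripper("ma\u0301ma"); // accent after the first 'a'
        // An error on [0, 2) in the stripped text maps to [0, 3) in the original,
        // so the accent stays attached to the letter it follows.
        System.out.println(s.stripped() + " " + s.toOriginalOffset(0) + " " + s.toOriginalOffset(2)); // prints mama 0 3
    }
}
```

A full per-character map costs one int per character; per-token bookkeeping (previous position and length) would be lighter, but the adjustment would then have to be applied everywhere offsets are computed.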
_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel