On 2013-05-21 05:26, Andriy Rysin wrote:
> On 04/21/2013 03:11 AM, Jaume Ortolà i Font wrote:
>> 2013/4/21 Andriy Rysin <[email protected]>
>>
>>     1) I would like to treat several apostrophe characters equally
>>     (apostrophes are part of the word in Ukrainian). For example, the
>>     dictionary and rules could use ' (U+0027), but I would like to
>>     parse text that contains U+2019 (and potentially U+02BC) the same
>>     way. I guess I could do a simple replace in the word tokenizer,
>>     but I was wondering if there's a better way.
>>
>> This is what is done in Catalan. So far I have found no problems.
>>
> This seems to work pretty well for *replacing* chars, but if I also
> *remove* the accent (U+0301) from words in the word tokenizer, it seems
> to throw off the error position in the sentence (at least in the web
> interface). Is there a proper way to remove symbols I don't care about?

Yes, but you'd need to change the processing a bit: my idea was to mark
some AnalyzedTokenReadings as ignorable, so that the rules wouldn't see
them. Basically, a single attribute should suffice, and in several
places (for example, where you get tokens without spaces) these tokens
would be excluded. Also, the code that checks for a preceding space
would need to be reviewed so that an ignorable symbol does not
interfere with it.
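
To make the idea concrete, here is a rough, self-contained sketch in
plain Java. It does not use the real LanguageTool classes: Token,
tokenizeWord and visibleTokens are hypothetical names invented for
illustration, and re-joining the word fragments around an ignorable
token is my assumption about how the rule-facing view could work. The
point is only that every token keeps its span in the original text, so
removing U+0301 from what the rules see does not shift error positions.

```java
import java.util.ArrayList;
import java.util.List;

final class IgnorableTokenSketch {

    static final char COMBINING_ACUTE = '\u0301';

    /** Hypothetical token: text plus its [start, end) span in the original input. */
    static final class Token {
        final String text;
        final int start;
        final int end;
        final boolean ignorable;

        Token(String text, int start, int end, boolean ignorable) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.ignorable = ignorable;
        }
    }

    /** Splits a single word so that every U+0301 becomes its own ignorable token. */
    static List<Token> tokenizeWord(String word, int offset) {
        List<Token> tokens = new ArrayList<>();
        int segmentStart = 0;
        for (int i = 0; i <= word.length(); i++) {
            boolean atEnd = i == word.length();
            if (atEnd || word.charAt(i) == COMBINING_ACUTE) {
                if (i > segmentStart) {
                    tokens.add(new Token(word.substring(segmentStart, i),
                            offset + segmentStart, offset + i, false));
                }
                if (!atEnd) {
                    tokens.add(new Token(word.substring(i, i + 1),
                            offset + i, offset + i + 1, true));
                }
                segmentStart = i + 1;
            }
        }
        return tokens;
    }

    /**
     * The view a rule would get: ignorable tokens are dropped and the
     * fragments they separated are re-joined, but spans still point into
     * the original text. Assumes the tokens all come from one word.
     */
    static List<Token> visibleTokens(List<Token> tokens) {
        List<Token> visible = new ArrayList<>();
        boolean joinWithPrevious = false;
        for (Token token : tokens) {
            if (token.ignorable) {
                joinWithPrevious = !visible.isEmpty();
                continue;
            }
            if (joinWithPrevious) {
                Token prev = visible.remove(visible.size() - 1);
                visible.add(new Token(prev.text + token.text, prev.start, token.end, false));
            } else {
                visible.add(token);
            }
            joinWithPrevious = false;
        }
        return visible;
    }
}
```

For the word "za\u0301mok" starting at offset 10, tokenizeWord yields
three tokens (the middle one ignorable), and visibleTokens yields a
single token "zamok" whose span is still 10..16, i.e. the full length
of the original word including the accent.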

Best,
Marcin



>
> Thanks
> Andriy
>
>
> ------------------------------------------------------------------------------
> Try New Relic Now & We'll Send You this Cool Shirt
> New Relic is the only SaaS-based application performance monitoring service
> that delivers powerful full stack analytics. Optimize and monitor your
> browser, app, & servers with just a few lines of code. Try New Relic
> and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_may
>
>
>
> _______________________________________________
> Languagetool-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>

