Hi, Ruud.

I can't find any documentation for it. It is used in Polish, French, Catalan,
Russian, Ukrainian and Spanish.

Implementation:

Enable it (in the Java code).
Create a "multiwords.txt" file in your resources folder, like the ones in [1].
On each line, the tokens are separated by whitespace and the tag is separated
from them by a tab.
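
To make the file format concrete, here is a minimal sketch of how one line
could be split into its tokens and its tag. This is not the LanguageTool code;
the class name MultiwordEntry and the tag "NPCN000" are only made up for the
illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative holder for one multiwords.txt-style line (hypothetical, not LanguageTool code). */
final class MultiwordEntry {
    final List<String> tokens;
    final String tag;

    MultiwordEntry(List<String> tokens, String tag) {
        this.tokens = tokens;
        this.tag = tag;
    }

    /** Splits "token token<TAB>TAG" into the whitespace-separated tokens and the tag. */
    static MultiwordEntry parse(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("No tab-separated tag in: " + line);
        }
        String phrase = line.substring(0, tab).trim();
        String tag = line.substring(tab + 1).trim();
        if (phrase.isEmpty() || tag.isEmpty()) {
            throw new IllegalArgumentException("Empty phrase or tag in: " + line);
        }
        return new MultiwordEntry(Arrays.asList(phrase.split("\\s+")), tag);
    }

    public static void main(String[] args) {
        // "NPCN000" is a placeholder tag chosen for this example only.
        MultiwordEntry entry = parse("Tel Aviv\tNPCN000");
        System.out.println(entry.tokens + " -> " + entry.tag);
    }
}
```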

Result:

The first token of the multiword is tagged with "<POSTAG>" and the last
token is tagged with "</POSTAG>".

The MultiwordChunker is case-insensitive. I would like to make it
configurable, especially for words with an uppercase first letter.
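
The tagging and the case-insensitive matching described above could be
sketched roughly like this. Again, this is only an illustration and not the
real MultiwordChunker: the class MultiwordTagSketch and the tag "NPCN000" are
hypothetical, and leaving the middle tokens untagged is my assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative multiword tagger (hypothetical, not the LanguageTool implementation). */
final class MultiwordTagSketch {

    /**
     * Returns one slot per sentence token: "<TAG>" on the first token of each
     * match, "</TAG>" on the last token, and null everywhere else.
     * Tokens are compared ignoring case.
     */
    static String[] tag(List<String> sentence, List<String> phrase, String posTag) {
        String[] tags = new String[sentence.size()];
        if (phrase.size() < 2) {
            return tags; // a multiword needs at least two tokens
        }
        for (int start = 0; start + phrase.size() <= sentence.size(); start++) {
            boolean match = true;
            for (int i = 0; i < phrase.size() && match; i++) {
                match = sentence.get(start + i).equalsIgnoreCase(phrase.get(i));
            }
            if (match) {
                tags[start] = "<" + posTag + ">";
                tags[start + phrase.size() - 1] = "</" + posTag + ">";
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Lower-case input still matches because the comparison ignores case.
        List<String> sentence = Arrays.asList("She", "flew", "to", "tel", "aviv", "yesterday");
        String[] tags = tag(sentence, Arrays.asList("Tel", "Aviv"), "NPCN000");
        for (int i = 0; i < tags.length; i++) {
            System.out.println(sentence.get(i) + "\t" + tags[i]);
        }
    }
}
```

Making the matching configurable would then mean replacing equalsIgnoreCase
with a comparison that, for example, only tolerates a difference in the first
letter.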

Regards,
Jaume


[1]
https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/pl/src/main/resources/org/languagetool/resource/pl/multiwords.txt

https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ca/src/main/resources/org/languagetool/resource/ca/multiwords.txt

2014-09-16 12:33 GMT+02:00 R.Baars <baar...@xs4all.nl>:

>  Jaume, thanks, but I am not sure.
>
> Depends on its implementation I think.
>
> Where can I find more info?
>
> Ruud
>
> On 16-09-14 at 12:26, Jaume Ortolà i Font wrote:
>
>   2014-09-16 11:21 GMT+02:00 R.J. Baars <r.j.ba...@xs4all.nl>:
>
>> We don't agree. There is a spellchecker, but also a single word ignore
>> list for it.
>> There are XML rules, but also a Simplereplace rule, a compounding rule.
>>
>> So apart from the hammer and the screwdriver, there are more tools.
>>
>>
>  There is indeed another tool for multi-words. It seems that Ruud doesn't
> know it.
>
>  We can enable a HybridDisambiguator and add a MultiwordChunker to the
> disambiguation. With this you can write a list of "multi-words", each with
> its corresponding tag, in a plain text file (multiwords.txt).
>
>  I use the MultiwordChunker with two objectives: improve disambiguation
> and avoid spelling matches in multiwords.
>
>  Would it be useful for you, Ruud?
>
>  Regards,
> Jaume
>
>
>
>
>
>> But anyway, adding the most frequent ones to the disambiguator works.
>>
>> Getting rid of wrong postags and 10% reported possible spelling errors on
>> the entire corpus is a higher priority.
>> And fixing false positives. Having almost doubled the number of rules is
>> enough for this month.
>>
>> Ruud
>>
>>
>>
>> > On 2014-09-16 at 09:03, R.J. Baars wrote:
>> >> A word like 'Aviv' is not correct unless 'Tel' is before it.
>> >> So it is best to leave Tel and Aviv out of the spell checker.
>> >> That results in spell checking reporting errors for Aviv.
>> >>
>> >> In the disambiguator, there is the option to block that, by making an
>> >> immunizing rule:
>> >>
>> >>    <!-- Tel Aviv-->
>> >>    <rule id="TEL_AVIV" name="Tel Aviv">
>> >>      <pattern>
>> >>        <token>Tel</token>
>> >>        <token>Aviv</token>
>> >>      </pattern>
>> >>      <disambig action="ignore_spelling"/>
>> >>    </rule>
>> >>
>> >> That works perfectly. But then, there are a lot of these word
>> >> combinations. Wouldn't it be better to have a multi-word ignore list
>> >> for the spell checker?
>> >>
>> >> (Or even a multi-word spell checker, not just knowing 'correct' and
>> >> 'not in list', but 'correct', 'incorrect' and 'not in list')
>> >
>> > It would not be an enhancement, as this would not give new functionality
>> > but cripple the existing one. Also, the ability to use all XML syntax is
>> > extremely important to me (I use POS tags and regular expressions), so I
>> > wouldn't make use of the multi-word spell checker anyway. So we'd have
>> > to introduce a crippled syntax that would look a little bit different
>> > for a human being but with no meaningful functional change. I don't
>> > think it's worth our time.
>> >
>> > The spell checker is best for checking individual words. Just like a
>> > hammer, it's good for nails, and not for screws. For screws, we have a
>> > screwdriver. For multi-word entities, we have more refined tools, like
>> > tagging and disambiguation and special attributes.
>> >
>> > Best,
>> > Marcin
>> >
>> >
>> ------------------------------------------------------------------------------
>> > Want excitement?
>> > Manually upgrade your production database.
>> > When you want reliability, choose Perforce.
>> > Perforce version control. Predictably reliable.
>> >
>> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
>> > _______________________________________________
>> > Languagetool-devel mailing list
>> > Languagetool-devel@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>> >
>>
>