2014-09-16 13:03 GMT+02:00 R.Baars <baar...@xs4all.nl>:

>  I see. This is probably of no use for spellchecking, but it is for
> postagging.
>
>
It gives no suggestions, but it can be used to avoid false positives in
spellchecking, if you configure tagged words to be ignored.
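
As a rough illustration of that idea, here is a minimal Python sketch (not
LanguageTool code, which is Java). The function name spelling_errors, the
(word, tag) pair representation and the tiny dictionary are all assumptions
made up for this example; it only shows that a token carrying any tag is
skipped by the spelling check.

```python
def spelling_errors(tagged_tokens, dictionary):
    """Return the words reported as spelling errors.

    tagged_tokens: list of (word, tag) pairs, tag is None when untagged.
    dictionary: set of known lowercase words (hypothetical stand-in for
    a real spelling dictionary).
    """
    return [
        word
        for word, tag in tagged_tokens
        # Assumption: any token that carries a tag is ignored by the speller.
        if tag is None and word.lower() not in dictionary
    ]


tokens = [
    ("She", None),
    ("lives", None),
    ("in", None),
    ("Abu", "<NPCNG00>"),
    ("Dhabi", "</NPCNG00>"),
]
known_words = {"she", "lives", "in"}
errors = spelling_errors(tokens, known_words)
```

Here neither "Abu" nor "Dhabi" is in the dictionary, but both are skipped
because they carry a tag.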


>
> Does
> Abu Dhabi NPCNG00
> cause both words to be tagged with that tag, or are they considered 1
> token with that postag?
>
>
Tokenization is not changed. In this case:

<token postag="<NPCNG00>">Abu</token>
<token postag="</NPCNG00>">Dhabi</token>

If there are more than two tokens, the inner tokens are not tagged.
Perhaps this should be optionally changed (i.e., tag the inner tokens too).
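
To make the behaviour concrete, here is a minimal Python sketch. It is not
the MultiwordChunker itself (that is Java); the names parse_multiword_line
and tag_multiword are hypothetical. The entry format (words separated by
white space, tag after a tab), the case-insensitive matching and the
first/last tagging follow the description in this thread.

```python
def parse_multiword_line(line):
    """Split an entry like 'Abu Dhabi<TAB>NPCNG00' into (words, tag)."""
    words, tag = line.rstrip("\n").split("\t", 1)
    return words.split(), tag.strip()


def tag_multiword(tokens, multiword, tag):
    """Return one tag (or None) per token.

    Only the first and the last token of each match are tagged; inner
    tokens stay untagged. Tokenization is left unchanged.
    """
    tags = [None] * len(tokens)
    n = len(multiword)
    if n < 2:
        # Assumption for this sketch: a multiword has at least two tokens.
        return tags
    lowered = [t.lower() for t in tokens]
    target = [w.lower() for w in multiword]
    for i in range(len(tokens) - n + 1):
        if lowered[i:i + n] == target:  # case-insensitive comparison
            tags[i] = f"<{tag}>"
            tags[i + n - 1] = f"</{tag}>"
    return tags


words, tag = parse_multiword_line("Abu Dhabi\tNPCNG00")
result = tag_multiword(["in", "Abu", "Dhabi", "today"], words, tag)
```

With a three-token entry, the middle token would get None in this sketch,
which matches the current behaviour for inner tokens.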

Regards,
Jaume





> (Might come in handy for just this tagging..)
>
> Ruud
>
> On 16-09-14 at 12:56, Jaume Ortolà i Font wrote:
>
>  Hi, Ruud.
>
>  I don't find any documentation. It is used in Polish, French, Catalan,
> Russian, Ukrainian and Spanish.
>
>  Implementation:
>
>  Enable it (Java).
> Create a "multiwords.txt" in your resources folder like these [1]. The
> tokens are separated by white space and the tag is separated by a tab.
>
>  Result:
>
>  The first token of the multiword is tagged with "<POSTAG>" and the last
> token is tagged with "</POSTAG>".
>
>  The MultiwordChunker is case-insensitive. I would like to make it
> configurable, especially for first letter uppercase.
>
>  Regards,
> Jaume
>
>
>  [1]
> https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/pl/src/main/resources/org/languagetool/resource/pl/multiwords.txt
>
>
> https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ca/src/main/resources/org/languagetool/resource/ca/multiwords.txt
>
> 2014-09-16 12:33 GMT+02:00 R.Baars <baar...@xs4all.nl>:
>
>>  Jaume, thanks, but I am not sure.
>>
>> Depends on its implementation I think.
>>
>> Where can I find more info?
>>
>> Ruud
>>
>> On 16-09-14 at 12:26, Jaume Ortolà i Font wrote:
>>
>>   2014-09-16 11:21 GMT+02:00 R.J. Baars <r.j.ba...@xs4all.nl>:
>>
>>> We don't agree. There is a spellchecker, but also a single word ignore
>>> list for it.
>>> There are XML rules, but also a Simplereplace rule, a compounding rule.
>>>
>>> So apart from the hammer and the screwdriver, there are more tools.
>>>
>>>
>>  There is indeed another tool for multi-words. It seems that Ruud
>> doesn't know it.
>>
>>  We can enable a HybridDisambiguator and add a MultiwordChunker to the
>> disambiguation. With this you can write a list of "multi-words", each with its
>> corresponding tag in a plain text file (multiwords.txt).
>>
>>  I use the MultiwordChunker with two objectives: improve disambiguation
>> and avoid spelling matches in multiwords.
>>
>>  Would it be useful for you, Ruud?
>>
>>  Regards,
>> Jaume
>>
>>
>>
>>
>>
>>> But anyway, adding the most frequent ones to the disambiguator works.
>>>
>>> Getting rid of wrong postags and 10% reported possible spelling errors on
>>> the entire corpus is a higher priority.
>>> And fixing false positives. Having almost doubled the number of rules is
>>> enough for this month.
>>>
>>> Ruud
>>>
>>>
>>>
>>> > On 2014-09-16 at 09:03, R.J. Baars wrote:
>>> >> A word like 'Aviv' is not correct unless 'Tel' is before it.
>>> >> So it is best to leave Tel and Aviv out of the spell checker.
>>> >> That results in spell checking reporting errors for Aviv.
>>> >>
>>> >> In the disambiguator, there is the option to block that, by making an
>>> >> immunizing rule:
>>> >>
>>> >>    <!-- Tel Aviv-->
>>> >>    <rule id="TEL_AVIV" name="Tel Aviv">
>>> >>      <pattern>
>>> >>        <token>Tel</token>
>>> >>        <token>Aviv</token>
>>> >>      </pattern>
>>> >>      <disambig action="ignore_spelling"/>
>>> >>    </rule>
>>> >>
>>> >> That works perfectly. But then, there are a lot of these word
>>> >> combinations. Wouldn't it be better to have a multi-word ignore list
>>> >> for the spell checker?
>>> >>
>>> >> (Or even a multi-word spell checker, not just knowing 'correct' and
>>> >> 'not in list', but 'correct', 'incorrect' and 'not in list')
>>> >
>>> > It would not be an enhancement, as this would not give new
>>> > functionality but cripple the existing one. Also, the ability to use
>>> > all XML syntax is extremely important to me (I use POS tags and
>>> > regular expressions), so I wouldn't make use of the multi-word spell
>>> > checker anyway. So we'd have to introduce a crippled syntax that would
>>> > look a little bit different for a human being but with no meaningful
>>> > functional change. I don't think it's worth our time.
>>> >
>>> > The spell checker is best for checking individual words. Just like a
>>> > hammer, it's good for nails, and not for screws. For screws, we have a
>>> > screwdriver. For multi-word entities, we have more refined tools, like
>>> > tagging and disambiguation and special attributes.
>>> >
>>> > Best,
>>> > Marcin
>>> >
>>> >
>>> ------------------------------------------------------------------------------
>>> > Want excitement?
>>> > Manually upgrade your production database.
>>> > When you want reliability, choose Perforce.
>>> > Perforce version control. Predictably reliable.
>>> >
>>> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
>>> > _______________________________________________
>>> > Languagetool-devel mailing list
>>> > Languagetool-devel@lists.sourceforge.net
>>> > https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>> >