2012/12/28 Mauro Condarelli <mc5...@mclink.it>

> My disambiguation rule needs updating, if someone can suggest how.
>
> Mario gli chiese l'ora.
>
> 121 rules activated for language Italian
>
> <S> Mario[Mario/NPR]  gli[gli/PRO-PERS-CLI-3-M-S,il/ART-M:p]  
> chiese[chiesa/NOUN-F:p]  
> l[l]'['/PON]ora[orare/VER:impr+pres+2+s,orare/VER:ind+pres+3+s,ora/ADV,ora/NOUN-F:s].[./SENT,</S>]<P/>
>
> Disambiguator log:
>
> art-ver: chiese[chiesa/NOUN-F:p,chiedere/VER:ind+past+3+s] -> 
> chiese[chiesa/NOUN-F:p]
>
> 1.) Line 1, column 7, Rule ID: GR_02_001[2]
>
> Message: L'articolo non concorda: 'le chiese'.
>
> Suggestion: le chiese
>
> Mario gli chiese l'ora.
>
>       ^^^^^^^^^^
>
>
> The problem here is that "gli" is a PROnoun, not an ARTicle, so the rule
> should not apply.
> In the same sentence, "l'" is not recognized as an ARTicle ("la"), so the
> rule is not applied to the following "ora", although it should be.
>

Hi Mauro,

Catalan has exactly the same kind of ambiguities. I have (more or less)
solved them, but the solution is quite complicated. Here are some of the
ideas I used:

- Number and gender agreement (or the lack of it) is used to keep or discard
interpretations.
- Adjacency of two consecutive verbs or two consecutive nouns is used to
discard interpretations.
- Tags for "nominal groups" and "verbal groups" are applied.
- Agreement across 4- and 3-token patterns is given more "weight" than
agreement across 2-token patterns. Example: "La porta bianca"
(article-noun-adjective in agreement) is more likely to be a nominal group
than "la porta" (article-noun or pronoun-verb).
- Etc.
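
To make the "weight" idea concrete, here is a toy sketch in Python. It is
not LanguageTool code and not what the Catalan file does literally; the
readings and tag names for "la porta bianca" are hypothetical and only
imitate the style of the Italian tags in the log above. It scores each
combination of readings by the longest run of consecutive tokens that share
gender and number, so a 3-token agreeing pattern outranks a 2-token one.

```python
from itertools import product

# Hypothetical readings; tag names only imitate the style shown in the log.
SENTENCE = [
    ("la", ["ART-F:s", "PRO-PERS-CLI-3-F-S"]),
    ("porta", ["NOUN-F:s", "VER:ind+pres+3+s"]),
    ("bianca", ["ADJ-F:s"]),
]


def agreement(tag):
    """Return (gender, number) for tags shaped like 'ART-F:s', else None."""
    head, sep, number = tag.rpartition(":")
    if not sep or "-" not in head:
        return None  # verbs and clitic pronouns carry no such tail here
    return head.rsplit("-", 1)[1], number


def longest_agreeing_run(tags):
    """Length of the longest run of adjacent tags with equal gender and number."""
    best = run = 0
    prev = None
    for tag in tags:
        feats = agreement(tag)
        if feats is None:
            run, prev = 0, None
        elif feats == prev:
            run += 1
        else:
            run, prev = 1, feats
        best = max(best, run)
    return best


def best_reading(sentence):
    """Pick the tag combination with the longest agreeing run, plus its score."""
    candidates = product(*(tags for _, tags in sentence))
    winner = max(candidates, key=longest_agreeing_run)
    return winner, longest_agreeing_run(winner)


print(best_reading(SENTENCE))      # three tokens in agreement
print(best_reading(SENTENCE[:2]))  # only two tokens in agreement
```

The score can then be read as a confidence: the full phrase gets a higher
score than the bare "la porta", which is the reason for trusting the
nominal-group reading more in the longer pattern.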

I need to tidy up the Catalan disambiguation file, and then see what can be
reused directly in other languages.

In your example, if the disambiguation rule you are using is the one I
wrote earlier, then you need to add an exception:

<rule>
    <pattern>
        <token postag="PREP.*|ART.*" postag_regexp="yes">
            <exception postag="PRO.*" postag_regexp="yes"></exception>
        </token>
        <marker>
            <and>
                <token postag="VER.*" postag_regexp="yes"></token>
                <token postag="NOUN.*|ADJ.*" postag_regexp="yes"></token>
            </and>
        </marker>
    </pattern>
    <disambig action="filter" postag="[^V].*"></disambig>
    <!-- Or: <disambig action="filter" postag="NOUN.*|ADJ.*"></disambig> -->
</rule>

Beyond that, since "gli" (ART-M:p) doesn't agree with "chiese" (NOUN-F:p),
you could discard the article-noun interpretation and keep the pronoun-verb
interpretation. To write such rules you should familiarize yourself with
"unification":
http://languagetool.wikidot.com/using-unification
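
The logic I have in mind, as a toy Python sketch rather than actual
LanguageTool rule syntax: the readings are copied from the log above, and
the function names are made up for illustration. When no article reading of
the first word agrees with any noun reading of the second, both the article
and the noun readings are dropped.

```python
import re

# Readings copied from the log in this thread: (lemma, POS tag).
GLI = [("gli", "PRO-PERS-CLI-3-M-S"), ("il", "ART-M:p")]
CHIESE = [("chiesa", "NOUN-F:p"), ("chiedere", "VER:ind+past+3+s")]

# Matches the "-<gender>:<number>" tail of tags such as ART-M:p or NOUN-F:p.
_AGR = re.compile(r"-([MF]):([sp])$")


def features(tag):
    """Return (gender, number) for ART/NOUN-style tags, or None otherwise."""
    m = _AGR.search(tag)
    return (m.group(1), m.group(2)) if m else None


def agrees(art_tag, noun_tag):
    """True when both tags carry gender and number and the values match."""
    a, n = features(art_tag), features(noun_tag)
    return a is not None and a == n


def drop_disagreeing_article_noun(first, second):
    """Discard ART readings of `first` and NOUN readings of `second`
    when no ART+NOUN pair agrees in gender and number."""
    arts = [t for _, t in first if t.startswith("ART")]
    nouns = [t for _, t in second if t.startswith("NOUN")]
    if not arts or not nouns:
        return first, second  # the article-noun reading is not in play
    if any(agrees(a, n) for a in arts for n in nouns):
        return first, second  # an agreeing pair exists, keep everything
    # Never leave a token with no reading at all: fall back to the input.
    keep_first = [r for r in first if not r[1].startswith("ART")] or first
    keep_second = [r for r in second if not r[1].startswith("NOUN")] or second
    return keep_first, keep_second


gli, chiese = drop_disagreeing_article_noun(GLI, CHIESE)
print(gli)     # only the pronoun reading is left
print(chiese)  # only the verb reading is left
```

In a real disambiguation file this check would be expressed with
unification over gender and number instead of hand-written comparisons.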

Regards,
Jaume Ortolà