Re: Valency dictionary and attribute [long mail]

2016-02-07 Thread Andriy Rysin
Hi Marcin

I was actually thinking of something even more abstract. To adjust
your example:

 



 


or


 .*xxx.*




In this case the LT core would only know that there is a set of extra
"dictionary" information that has a category and a value. This way we
can add many different types of dictionaries, which can be specific to
each language, and the rules for that language will use that
language-specific information.
The LT core will allow each language to provide this extra information
and will not care much about its content.
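
To make the category/value idea concrete, here is a minimal sketch in Python. It is not LT code: the class name ExtraInfoDictionary, the methods add and matches, and the sample entries are all assumptions made up for illustration. The store never interprets the values; a rule only asks whether some value in a category matches a regular expression.

```python
import re
from collections import defaultdict


class ExtraInfoDictionary:
    """Hypothetical per-language store of extra (category, value) information."""

    def __init__(self):
        # lemma -> category -> set of values; the values are opaque strings
        self._entries = defaultdict(lambda: defaultdict(set))

    def add(self, lemma, category, value):
        self._entries[lemma][category].add(value)

    def matches(self, lemma, category, value_regex):
        """Return True if any value stored for lemma/category fully matches the regex."""
        values = self._entries.get(lemma, {}).get(category, ())
        return any(re.fullmatch(value_regex, v) for v in values)


# Illustrative data only; real values would come from each language's lexicon.
info = ExtraInfoDictionary()
info.add("absurdalny", "valency", "{prepnp(dla,gen)}")
info.add("absurdalny", "valency", "{prepnp(w,loc)}")

print(info.matches("absurdalny", "valency", r".*prepnp\(w,.*"))
print(info.matches("absurdalny", "valency", r".*np\(acc\).*"))
```

With a store like this, adding a new kind of information (say, a category for comparative forms) would only mean adding entries, not changing the lookup code.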

Regards,
Andriy

2016-02-02 14:40 GMT-05:00 Marcin Miłkowski :
> W dniu 02.02.2016 o 18:08, Andriy Rysin pisze:
>> Hey Marcin
>>
>> this is a great addition, though I have one remark. Besides valency
>> information, some other types of information could be useful too (if we
>> are starting to head in this direction). E.g. I have rules in Ukrainian
>> that suggest the superlative form of an adjective when "самий" (very) +
>> base form is used. Currently I have the relation between the base form and
>> the comparative/superlative forms encoded in the dictionary, but in general
>> this is higher-level information that should be stored outside of the
>> tag dictionary.
>
> I would argue that in some languages (at least in Polish and English)
> this is not semantic-level information; it is grammatical, or
> morphosyntactic, information.
>
>>
>> I am wondering if we could develop a more generic approach for such
>> additional (semantic) information, e.g. split each type of this info
>> into a category and allow generic references in the token/exception,
>> something like this:
>>
>> > semantic_info=":"/>
>>
>> or even as a subelement (I assume semantic information can get pretty
>> long/complicated, so a child element may be a better choice and will
>> make it easy to add new attributes to it later)
>>
>> 
>> > value=""/>
>> > value=""/>
>> 
>>
>> so in the valency case you described (1st case) it could be:
>>
>> 
>>
>> 
>
> Valency is definitely not a semantic category:
>
> https://en.wikipedia.org/wiki/Valency_(linguistics)
>
> But your approach seems quite elegant. I would argue that valency is one
> kind of information that should be treated as key-value
>
> 
>  
> 
>
> This would match a verb that takes an accusative noun phrase (of course,
> the values would be defined per valency lexicon in a language). There
> are free valency lexicons for many languages besides Polish.
>
>>
>> Thus if we add other semantic information into LT we can use this info
>> in the logic without changing the LT core.
>
> The core XML parsing will have to be changed anyway.
>
> Best,
> Marcin
>
>>
>> Thanks
>> Andriy
>>
>> 2016-01-28 7:30 GMT-05:00 Marcin Miłkowski :
>>> Hi all,
>>>
>>> To allow for better disambiguation and have better rules, I need to
>>> include a valency dictionary with LT. These are dictionaries that
>>> specify which grammatical cases or prepositions go with which verbs etc.
>>> There are such resources for many languages that we support. And using
>>> these resources, we could enrich POS tag disambiguation a lot (I'm using
>>> a horribly long regular expression right now instead of a dictionary,
>>> for example), and write up a lot of important rules.
>>>
>>> The obvious choice for representing the dictionary (which is available
>>> for Polish under a fairly liberal license) is to use a finite-state lexicon
>>> of the kind we normally use for taggers. The dictionary will be applied after
>>> tagging because a valency dictionary requires POS tag + lexeme
>>> information. In Polish, the entries look like this:
>>>
>>> absurdalny: pewny: : : : {prepnp(dla,gen)}
>>> absurdalny: pewny: : : : {prepnp(w,loc)}
>>> absurdalny: pewny: : pred: : {prepnp(dla,gen)}+{cp(gdy)}
>>> absurdalny: pewny: : pred: : {prepnp(dla,gen)}+{cp(int)}
>>> absurdalny: potoczny: : pred: : {prepnp(dla,gen)}+{cp(jak)}
>>> absurdalny: pewny: : pred: : {prepnp(dla,gen)}+{cp(jeśli)}
>>> absurdalny: pewny: : pred: : {prepnp(dla,gen)}+{cp(kiedy)}
>>> absurdalny: pewny: : pred: : {prepnp(dla,gen)}+{cp(że)}
>>> absurdalny: pewny: : pred: : {prepnp(dla,gen)}+{cp(żeby)}
>>>
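
A rough sketch of how the Polish entries quoted above could be read, assuming each line has six colon-separated fields with the frame last. The field names (status, negation, predicativity, aspect) are guesses, since the mail does not name the columns, and ValencyEntry and parse_entry are hypothetical helpers, not part of LT.

```python
import re
from typing import List, NamedTuple


class ValencyEntry(NamedTuple):
    lemma: str
    status: str          # e.g. "pewny", "potoczny" (column name is a guess)
    negation: str        # empty in the quoted sample (column name is a guess)
    predicativity: str   # "pred" or empty (column name is a guess)
    aspect: str          # empty in the quoted sample (column name is a guess)
    frame: List[str]     # positions such as "prepnp(dla,gen)"


def parse_entry(line: str) -> ValencyEntry:
    # Split into at most six fields so the frame keeps any later colons.
    parts = [p.strip() for p in line.split(":", 5)]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    # Positions are wrapped in braces and joined with "+".
    frame = re.findall(r"\{([^}]*)\}", parts[5])
    return ValencyEntry(*parts[:5], frame)


entry = parse_entry("absurdalny: pewny: : pred: : {prepnp(dla,gen)}+{cp(że)}")
print(entry.lemma, entry.predicativity, entry.frame)
```

Entries parsed this way could then be grouped by lemma before being compiled into whatever lexicon format is chosen.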
>>> But for French (see http://bach.arts.kuleuven.be/dicovalence/) they are
>>> paragraph-based:
>>>
>>> VAL$abaisser: P0 P1
>>> VTYPE$  predicator simple
>>> VERB$   ABAISSER/abaisser
>>> NUM$10
>>> EG$ il faudra abaisser la persienne
>>> TR_DU$  laten zakken, neerhalen, neerlaten, doen dalen
>>> TR_EN$  let down, lower
>>> FRAME$  subj:pron|n:[hum], obj:pron|n:[nhum,?abs]
>>> P0$ (que), qui, je, nous, elle, il, ils, on, (ça), (ceci), celui-ci, ceux-ci
>>> P1$ que, la, le, les, en Q, ça, ceci, celui-ci, ceux-ci
>>> RP$ passif être, se passif
>>> AUX$avoir
>>>
>>>
>>> VAL$abaisser: P0 (P1)
>>> VTYPE$  predicator simple
>>> VERB$   ABAISSER/abaisser
>>> NUM$20
>>> EG$ il a raconté cette anecdote pour m'abaisser
>>> TR_DU$  vernederen, kleineren
>>> TR_EN$  humiliate
>>> FRAME$  subj:pron|n:[hum], ?obj:pron|n:[hum]
>>> P0$ (que), qu
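
The paragraph-based French format quoted above could be read along these lines. This is only a sketch under assumptions: records are taken to be separated by blank lines, each field is taken to start with an upper-case key followed by "$", and a line without such a key is treated as a continuation of the previous field. parse_records is a hypothetical helper, and the sample is shortened from the entries in the mail.

```python
import re

# A field line looks like "KEY$ value"; the key pattern is an assumption.
FIELD = re.compile(r"^([A-Z0-9_]+)\$\s*(.*)$")


def parse_records(text):
    """Split blank-line separated paragraphs into dicts of key -> value."""
    records, current, last_key = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current record, if any.
            if current:
                records.append(current)
            current, last_key = {}, None
            continue
        m = FIELD.match(line)
        if m:
            last_key = m.group(1)
            current[last_key] = m.group(2).strip()
        elif last_key is not None:
            # Wrapped line: append it to the previous field.
            current[last_key] += " " + line
    if current:
        records.append(current)
    return records


sample = """\
VAL$abaisser: P0 P1
VTYPE$  predicator simple
NUM$10
TR_EN$  let down, lower
AUX$avoir

VAL$abaisser: P0 (P1)
NUM$20
TR_EN$  humiliate
"""

records = parse_records(sample)
print(len(records), records[0]["TR_EN"], records[1]["TR_EN"])
```

Unlike the Polish format, one verb can yield several records here, so a consumer would need to key on both VAL and NUM.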

Re: Rule not working properly

2016-02-07 Thread Daniel Naber
On 2016-02-07 09:03, Marco A.G.Pinto wrote:

>  The above rule still triggers "a" even if it is an exception:
>  "A região é subdividida em duas zonas, uma a Ocidente"

I cannot reproduce that, i.e. this sentence doesn't get matched for me 
(at least not by the rule you quoted, but by others).

Regards
  Daniel


--
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


MS Word add-in translations

2016-02-07 Thread Jaume Ortolà i Font
Hi,

If you want to translate the LanguageTool MS Word add-in into your
language, you can do it now at transifex.com. See the
file WinFormStrings.resx. Most of the strings are already translated using
existing translations.

Regards,
Jaume Ortolà


Rule not working properly

2016-02-07 Thread Marco A.G.Pinto

Hello!

I have been up since 6am trying to fix my last two rules before going to 
my weekend job at around noon.


This is the status:
um
postag="AQ0CS0|PI0FS000|NCFS000">uma
postag="NCMS000">
Erro de concordância um/uma.
uma
Quero um bonita foto dela.

The above rule with the "uma" exception fixed:
"A posse dos resultados do trabalho de cada um uma vez que"


uma
postag="AQ0CS0|PI0MS000|NCMS000">a
postag="NCFS000">
Erro de concordância uma/um.
um
Quero uma bonito carro novo.

The above rule still triggers "a" even if it is an exception:
"A região é subdividida em duas zonas, uma a Ocidente"

What is wrong with the second rule?

Thanks!

Kind regards,
Marco A.G.Pinto
---
