Hi again,

I wrote the previous message using my phone, hence the brevity.

For Polish, I use unification in the disambiguator to remove POS tag
interpretations that do not agree with neighboring tokens. That really helps
in disambiguation. I also hope to use unification to group tokens into chunks
(there are such rules for Polish in Spejd, a shallow parser used to process
the National Corpus of Polish). For example, I need to detect connectives
(such as "and") that belong to a noun phrase (rather than connectives that
link two sentences). Marking up such connectives will make it possible to
detect punctuation problems that currently go undetected. But for such
purposes, I need to be able to ignore some tokens in unification.
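
To make the idea concrete, here is a minimal sketch in Python of what unification does in disambiguation. It is not LanguageTool's implementation: the function name `unify`, the `ignore` parameter and the simplified readings for "nowe i okna" are made up for illustration only.

```python
from itertools import product

def unify(tokens, features, ignore=()):
    """Keep only readings that take part in at least one agreeing combination.

    tokens:   list of tokens, each a list of readings (dict: feature -> value)
    features: feature names that must have equal values across tokens
    ignore:   indices of tokens skipped in unification (left unchanged)
    Returns the filtered tokens, or None if no agreeing combination exists.
    """
    active = [i for i in range(len(tokens)) if i not in ignore]
    kept = {i: [] for i in active}
    for combo in product(*(tokens[i] for i in active)):
        # A combination agrees if every feature has a single shared value.
        if all(len({r[f] for r in combo}) == 1 for f in features):
            for i, reading in zip(active, combo):
                if reading not in kept[i]:
                    kept[i].append(reading)
    if any(not kept[i] for i in active):
        return None
    return [kept.get(i, tokens[i]) for i in range(len(tokens))]

# Hypothetical, simplified readings: adjective, connective, noun.
tokens = [
    [{"number": "sg", "case": "nom"},
     {"number": "pl", "case": "nom"},
     {"number": "pl", "case": "acc"}],
    [{"pos": "conj"}],
    [{"number": "sg", "case": "gen"},
     {"number": "pl", "case": "nom"},
     {"number": "pl", "case": "acc"}],
]

# The connective (index 1) is ignored; the singular readings are removed.
result = unify(tokens, ("number", "case"), ignore={1})
```

In this sketch the ignored token passes through untouched, which is the behavior I would need for connectives inside a noun phrase.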

In the grammar file, there are known redundant phrases that I match by
using unification -- I want to find the phrases that agree. In addition,
there are terminological mistakes (direct translations from English) and
stylistic blunders that can only be matched reliably when the words agree.

Thanks to unification, it is very easy to find such phrases in a terse
manner. This is why it is used in morphologically rich languages. For our
purposes, the language really needs a rich tagset -- the Penn Treebank
tagset is not well suited to it, as it does not specify attributes in a
positional manner. The LT German tagset would be perfect for unification.
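
As a rough illustration of why a positional tagset helps, the sketch below splits a tag into named features that unification can then compare one by one. The layout `NOUN_LAYOUT`, the helper `parse_positional` and the sample tag are assumptions for illustration, not the actual definition of any LT tagset.

```python
# Assumed layout of a colon-separated positional noun tag (illustrative only).
NOUN_LAYOUT = ("pos", "number", "case", "gender")

def parse_positional(tag, layout=NOUN_LAYOUT, sep=":"):
    """Map each position of the tag to a feature name from the layout."""
    return dict(zip(layout, tag.split(sep)))

features = parse_positional("subst:sg:nom:m1")
```

With a tag like Penn Treebank's "NNS", number is fused into the tag name, so there is no separate slot to compare.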

We should have that info in the wiki... Sorry for not writing this before.

By the way, I believe there's unification for French as well.

Regards,
Marcin


2013/10/16 Jaume Ortolà i Font <jaumeort...@gmail.com>

> 2013/10/16 Daniel Naber <list2...@danielnaber.de>
>
>> Hi,
>>
>> although I think I understand the technical details of unification, I'm
>> not sure how/why it is used in grammar.xml. For example, if a sequence
>> of words share the same gender and number, that means there's agreement,
>> so you cannot use that to write an error rule. So is unification just
>> used to avoid false alarms in rules that are not related to agreement at
>> all?
>>
>
> Hi, Daniel.
>
> In some cases, unification is used, as you say, in rules not related to
> agreement, just to make sure the context is the expected one, to avoid
> false alarms. In rules related to agreement, the common use is probably
> <unify negate="yes">, so you can find non-agreement in words that should
> agree, e.g. determinant+noun, determinant+noun+adjective, etc. That's so in
> Catalan. It should be the same in other Latin languages, if used.
>
> Jaume
>
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>
>