On 11 July 2010 20:56, Kevin Brubeck Unhammer <[email protected]> wrote:
> 2010/7/11 Jimmy O'Regan <[email protected]>:
>> The attached patch adds a new mechanism to transfer rules: <exception>
>
> This has been on my wishlist for a while =D
>

Well, I stole the idea from LanguageTool, though it's not as flexible
as LanguageTool's exception facility.

>> Exception can contain a single <test> -- if the test evaluates to
>> 'true', the current rule is ignored, and the last applicable rule is
>> used instead (the implication being that it should only be used in
>> rules whose <pattern> contains more than one <pattern-item>).
>>
> [snip]
>> Motivation:
>>
>> The primary motivation was in dealing with Polish: highly inflected
>> (few 'markers'), adjectives can come before or after the noun.
>> Inflection *usually* gives enough information for proper segmentation,
>> but handling it properly would be a matter of having individual rules
>> for each gender, case, and number + each combination of words (i.e.,
>> multiply number of NP rules by 70). I've seen recently that it would
>> help in less inflected languages, so it's probably generally useful.
>
> I just tested it for nb->nn, where I used it to avoid chunking adj.ind
> n.def (the adjective is used adverbially, not modifying the noun),
> which in some cases can be quite important:
>
> Before:
> $ echo Ledelsen liker dårlig fokuset på utøvere som Tommy Ingebrigtsen | apertium -d . nb-nn
> Leiinga likar det dårlege fokuset på utøvarar som Tommy Ingebrigtsen
> ≈ The management likes the bad focus on athletes such as Tommy Ingebrigtsen
>
> After, correct meaning:
> $ echo Ledelsen liker dårlig fokuset på utøvere som Tommy Ingebrigtsen | apertium -d . nb-nn
> Leiinga likar dårleg fokuset på utøvarar som Tommy Ingebrigtsen
> ≈ The management doesn't like ("likes badly") the focus on athletes
> such as Tommy Ingebrigtsen
>
>
> Of course, one can always achieve the same as <exception> by using
> <choose><when> and duplicating the contents of the single-item rules,
> but, well, that means duplicating content… this looks like it would be
> a lot simpler to maintain (and less ugly than output macros).
>

Yeah; having to repeat almost all of the same rules in two layers of
transfer is a bit excessive.

>
> I'm still trying to make it break :)
>

Well, as I mentioned, if the rule that it's falling back to also has
an exception, and that exception also evaluates to true, then you will
get breakage: you'll either lose all but the first word, or it'll
segfault. Neither is acceptable.
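
To make the fallback concrete, here is a rough sketch in Python (not the
C++ of the patch). It assumes candidates are tried longest match first,
and it keeps falling back until it finds a rule whose exception does not
fire, which is one possible way to handle the chained case rather than
what the patch currently does. All names (Rule, select_rule, the tag
strings) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Tags = Sequence[str]  # one tag string per word, e.g. "adj.ind"


@dataclass
class Rule:
    name: str
    length: int                                  # number of pattern items
    matches: Callable[[Tags], bool]              # stands in for <pattern>
    exception: Optional[Callable[[Tags], bool]] = None  # stands in for <exception>


def select_rule(rules: List[Rule], words: Tags) -> Optional[Rule]:
    """Pick the longest matching rule whose exception does not fire."""
    candidates = [
        r for r in rules
        if len(words) >= r.length and r.matches(words[:r.length])
    ]
    # Longest first; the sort is stable, so ties keep their file order.
    candidates.sort(key=lambda r: r.length, reverse=True)
    for rule in candidates:
        window = words[:rule.length]
        if rule.exception is None or not rule.exception(window):
            return rule
    # Every candidate was vetoed by its exception (or nothing matched).
    return None


# Hypothetical rules modelled on the nb->nn case quoted above.
RULES = [
    Rule("adj_n", 2,
         lambda w: w[0].startswith("adj") and w[1].startswith("n"),
         exception=lambda w: w[0] == "adj.ind" and w[1] == "n.def"),
    Rule("adj", 1, lambda w: w[0].startswith("adj")),
    Rule("n", 1, lambda w: w[0].startswith("n")),
]

if __name__ == "__main__":
    print(select_rule(RULES, ["adj.ind", "n.def"]).name)
    print(select_rule(RULES, ["adj.def", "n.def"]).name)
```

With these rules, "adj.ind n.def" falls back to the single-word "adj"
rule, while "adj.def n.def" is still chunked by "adj_n". Returning None
when every candidate is vetoed leaves it to the caller to decide what to
do with the first word.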

It's not ready for primetime, but I sent it to the list for a long
list of reasons:
1) You expressed an interest in it
2) We have two students working on two alternate implementations; they
should be kept abreast of this sort of thing
3) Someone might have a very good reason for it not to go in
4) I've tried a couple of different ways to get this to work and I'm
all excited and stuff
4a) ...and I would appreciate a fresh perspective
5) The list has been awfully quiet lately... :D

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff