2010/7/24 Jimmy O'Regan <[email protected]>:
> On 24 July 2010 13:15, Kevin Brubeck Unhammer <[email protected]> wrote:
>> I've noticed a lot more rules that all could do with this <exception>,
>> at least a fifth of the sme-nob chunking rules have possibilities for
>> mis-chunking (eg. det.loc + n.ill should not be chunked, but most
>> other cases of det and n should be chunked), the same for all the
>> conjunction rules (in the first interchunk) that merge two chunks.
>>
>
> Not exactly sure what you're saying here.
>
> The main point of this is not to throw lookahead all over the place; but
> to stop a pattern from 'stealing' from a real chunk; if there's
> nothing that could follow n.ill that would form a proper chunk, then
> you don't need this, just check as normal and output two chunks from
> the existing rule - probably what you're already doing.

In the same way, in your example with ADJ N ADJ you could have one
rule matching the full three-part thing and outputting either {ADJ N}
ADJ or ADJ {N ADJ}, or even three chunks -- you just need another rule
ADJ N ADJ N for when the second adj modifies a noun.

My DET NOMCMP NOM rule can give one chunk on seeing any of:
<prn><indef><attr> <n><sg><nom><cmp> <n><pl><ill>
<prn><indef><attr> <n><sg><nom><cmp> <n><pl><com>
<prn><pers><sg><p1><gen> <n><sg><nom><cmp> <n><pl><com>
<prn><pers><sg><p1><gen> <n><sg><nom><cmp> <n><pl><loc>
<prn><dem><sg><p1><gen> <n><sg><nom><cmp> <n><sg><loc>
<prn><dem><pl><loc> <n><sg><nom><cmp> <n><pl><loc>
<prn><indef><sg><loc> <n><sg><nom><cmp> <n><sg><nom>
etc., or two chunks on seeing
<prn><indef><pl><loc> <n><sg><nom><cmp> <n><pl><ill>
(perhaps also with other combinations that I haven't discovered yet)
and if the last noun is also a compound part, I just need a DET NOMCMP
NOMCMP NOM rule too, which can output either one or two chunks.
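
To make that concrete, here is a minimal sketch in Python (not
transfer-rule XML) of the decision that single DET NOMCMP NOM rule
makes. The function names are made up for illustration, and where the
split falls when two chunks are output (determiner alone, compound
part kept with its head noun) is an assumption, as is the idea that
locative determiner + illative noun is the only splitting case.

```python
import re

TAG = re.compile(r"<([^<>]+)>")


def tags(analysis):
    """Return the tags of one analysis, e.g. '<n><pl><ill>' -> ['n', 'pl', 'ill']."""
    return TAG.findall(analysis)


def chunk_det_nomcmp_nom(det, cmp_noun, noun):
    """Return a list of chunks; each chunk is a list of the analyses in it."""
    # Assumption: the only split found so far is a locative determiner
    # followed by an illative head noun (det.loc + n.ill).
    if "loc" in tags(det) and "ill" in tags(noun):
        # Assumption: the compound part stays with its head noun.
        return [[det], [cmp_noun, noun]]
    # Every other combination is output as a single chunk.
    return [[det, cmp_noun, noun]]
```

Each newly discovered combination that must not be chunked means
another condition inside this one rule, and the DET NOMCMP NOMCMP NOM
rule needs the same conditions repeated.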

It's always possible to fix things with more redundancy. My point was
just that this <exception> could lead to much more maintainable
transfer rules.

>> In the above example I could have just added extra almost-identical
>> rules to cover all the patterns (involving a lot of redundancy), but
>> if the exception depends on target-language information even that
>> wouldn't do it. Eg. most verbs both in Bokmål and Sámi have adjective
>> forms, so we allow Sámi <v><adj> to enter into ADJ NOM rules. But some
>> Sámi verbs translate to a certain class of Bokmål verbs (lexicalised
>> passives) that don't have adj forms, these get the tag <pstv> in
>> bidix, but we can't know that from the <pattern>; here the <exception>
>> would be great.
>
> It sounds to me like you're just not being precise enough in the
> pattern items, but the whole area of derivational morphology bores me
> to sleep, so maybe I missed something.

Using pattern-items here would mean either listing every verb lemma in
the sme dictionary, apart from some hundred, in a pattern-item, or
adding tags to all of these in the _sme_ dictionary to record what
they will turn into in bidix. I wouldn't tag nouns in English with
what their gender is in Spanish; that's a bidix job.
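
A rough Python sketch of why the pattern alone can't do this: the
pattern only sees source-language tags, while <pstv> only exists on
the target side after bidix lookup. The toy bidix, its placeholder
lemmas and tags other than pstv, and the function names below are all
hypothetical, purely for illustration.

```python
# Hypothetical toy bidix: source lemma -> (target lemma, target tags).
# The lemmas are placeholders, not real Sámi or Bokmål entries.
BIDIX = {
    "verb_a": ("verb_a_nob", ["vblex"]),
    "verb_b": ("verb_b_nob", ["vblex", "pstv"]),  # lexicalised passive
}


def matches_adj_nom_pattern(source_tags):
    """What the <pattern> can check: source-language tags only."""
    return "v" in source_tags and "adj" in source_tags


def allowed_in_adj_nom(lemma, source_tags):
    """Pattern match plus an exception that looks at target-side tags."""
    if not matches_adj_nom_pattern(source_tags):
        return False
    _target_lemma, target_tags = BIDIX[lemma]
    # The exception: target verbs tagged pstv have no adjective form.
    return "pstv" not in target_tags
```

Both placeholder verbs look identical to matches_adj_nom_pattern; only
the bidix lookup tells them apart.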
--
Kevin Brubeck Unhammer