On 24 July 2010 15:41, Kevin Brubeck Unhammer <[email protected]> wrote:
> 2010/7/24 Jimmy O'Regan <[email protected]>:
>> On 24 July 2010 13:15, Kevin Brubeck Unhammer <[email protected]> wrote:
>>> I've noticed a lot more rules that all could do with this <exception>,
>>> at least a fifth of the sme-nob chunking rules have possibilities for
>>> mis-chunking (eg. det.loc + n.ill should not be chunked, but most
>>> other cases of det and n should be chunked), the same for all the
>>> conjunction rules (in the first interchunk) that merge two chunks.
>>>
>>
>> Not exactly sure what you're saying here.
>>
>> The main point of this is not to throw lookahead all over place; but
>> to stop a pattern from 'stealing' from a real chunk; if there's
>> nothing that could follow n.ill that would form a proper chunk, then
>> you don't need this, just check as normal and output two chunks from
>> the existing rule - probably what you're already doing.
>
> In the same way, in your example with ADJ N ADJ you could have one
> rule matching the full three-part thing and outputting either {ADJ N}
> ADJ or ADJ {N ADJ}, or even three chunks -- you just need another rule
> ADJ N ADJ N for when the second adj modifies a noun.
>
> My DET NOMCMP NOM rule can give one chunk on seeing any of
>
> <prn><indef><attr> <n><sg><nom><cmp> <n><pl><ill>
> <prn><indef><attr> <n><sg><nom><cmp> <n><pl><com>
> <prn><pers><sg><p1><gen> <n><sg><nom><cmp> <n><pl><com>
> <prn><pers><sg><p1><gen> <n><sg><nom><cmp> <n><pl><loc>
> <prn><dem><sg><p1><gen> <n><sg><nom><cmp> <n><sg><loc>
> <prn><dem><pl><loc> <n><sg><nom><cmp> <n><pl><loc>
> <prn><indef><sg><loc> <n><sg><nom><cmp> <n><sg><nom>
>
> etc., or two chunks on seeing
> <prn><indef><pl><loc> <n><sg><nom><cmp> <n><pl><ill>
>
> (perhaps also with other combinations that I haven't discovered yet)
>
> and if the last noun is also a compound part, I just need a DET NOMCMP
> NOMCMP NOM rule too, which can output either one or two chunks.
>
> It's always possible to fix things with more redundancy. I was just
> trying to make a point that this <exception> could lead to much more
> maintainable transfer rules.
>

Not quite. You may have more maintainable *individual* rules, but at
the cost of the increased complexity of having to then bear in mind
the backoff relationship between rules.

In summary, use it where it makes sense, and not just because you can.

>
>>> In the above example I could have just added extra almost-identical
>>> rules to cover all the patterns (involving a lot of redundancy), but
>>> if the exception depends on target-language information even that
>>> wouldn't do it. Eg. most verbs both in Bokmål and Sámi have adjective
>>> forms, so we allow Sámi <v><adj> to enter into ADJ NOM rules. But some
>>> Sámi verbs translate to a certain class of Bokmål verbs (lexicalised
>>> passives) that don't have adj forms, these get the tag <pstv> in
>>> bidix, but we can't know that from the <pattern>; here the <exception>
>>> would be great.
>>
>> It sounds to me like you're just not being precise enough in the
>> pattern items, but the whole area of derivational morphology bores me
>> to sleep, so maybe I missed something.
>
> Using pattern-items here would mean either adding all verb lemmas
> apart from some hundred from the sme dictionary into a pattern-item,

Fair point. This <exception> mechanism has the potential to open the
floodgates for a whole pile of craziness, so you'll have to excuse me
if I'm a bit defensive.

> or adding tags to all these in the _sme_ dictionary which record what
> they will turn into in bidix. I wouldn't tag nouns in English with
> what their gender is in Spanish, that's a bidix job.

There was a point in my saying that I find gisting systems
uninteresting (other than 'it's true')... I found that whole exchange
of 'pick a name that suits the minority of people who choose to think
in terms of the mechanism rather than the effect' extremely
unmotivating, and I'd really rather not have what little motivation I
have left eroded by considerations of things I don't care about. I can
see that you're trying to do that, thanks.
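
The one-rule approach discussed above for DET NOMCMP NOM can be sketched
in a few lines of Python. This is only an illustration, not Apertium
transfer-rule syntax: the tuple-of-tags word representation, the helper
names, and the place where the two-chunk case is split are all
assumptions made for the example. The only thing taken from the thread
is the condition itself: det.loc followed by n.ill gives two chunks,
the other listed combinations give one.

```python
# Illustrative sketch only: plain Python, not Apertium transfer-rule syntax.
# Each word is modelled as a tuple of tag strings (hypothetical representation).

def has(word, *tags):
    """Return True if every given tag is present on the word."""
    return all(tag in word for tag in tags)


def chunk_det_nomcmp_nom(det, cmp_part, noun):
    """One rule that matches the full DET NOMCMP NOM pattern.

    The exceptional case (locative determiner + illative noun) is handled
    inside the rule body by returning two chunks instead of one. Where the
    split falls ({DET} {NOMCMP NOM}) is an assumption made for this sketch.
    """
    if has(det, "loc") and has(noun, "ill"):
        return [[det], [cmp_part, noun]]
    return [[det, cmp_part, noun]]


cmp_part = ("n", "sg", "nom", "cmp")

# <prn><indef><pl><loc> <n><sg><nom><cmp> <n><pl><ill>: two chunks
two = chunk_det_nomcmp_nom(("prn", "indef", "pl", "loc"), cmp_part, ("n", "pl", "ill"))

# <prn><indef><attr> <n><sg><nom><cmp> <n><pl><ill>: one chunk
one = chunk_det_nomcmp_nom(("prn", "indef", "attr"), cmp_part, ("n", "pl", "ill"))
```

Keeping the decision inside a single rule means nobody has to reason
about which rule another rule backs off to; the price is that the
pattern alone no longer tells you when the words end up in one chunk.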

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
