Re: Ruta - MARKFAST
On 30.06.2014 15:31, Peter Klügl wrote:
> On 30.06.2014 14:58, armin.weg...@bka.bund.de wrote:
>> Hi, Peter!
>>
>> I got that. I restricted MARKFAST to segments. It works nearly perfectly. How does MARKFAST match things? Using
>>
>> Document{-> MARKFAST(MyType, {"a", "b", "a b"})};

Well, after giving it another thought, it is clear: the matching process considers the longest match. I don't think that returning all matches is currently supported, but it should not be complicated to add the functionality. You can open a feature request if you want.

Peter

> Hehe... I didn't even remember that this is possible. I will open an issue for string lists.
>
> The normal application of MARKFAST is with word lists:
>
> WORDLIST MyList = 'somelist.txt';
> Document{-> MARKFAST(MyType, MyList)};
>
> ... where the file somelist.txt contains something like:
>
> a
> b
> a b
>
> Files with the endings "twl" and "mtwl" are for compiled dictionaries.
>
> Just to mention: using characters in the word list that are filtered when the dictionary lookup is applied may cause unexpected behavior, because the algorithm may choose the wrong subtree. It has happened only once in our applications so far.
>
> Best,
>
> Peter
>
>> on
>>
>> a b
>>
>> yields
>>
>> "a b" and "b" but not "a".
>>
>> I would like to have "a" as well. Can this be done?
>>
>> By the way: I love Ruta.apply(). That is exactly what I needed.
>>
>> Thanks,
>> Armin
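The longest-match behaviour discussed in this message can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not Ruta's implementation: the function names longest_matches and all_matches are hypothetical, tokens are plain strings, and the lookup is assumed to restart at every token, which is consistent with the reported result ("a b" and "b", but not "a").

```python
def _entry_set(entries):
    """Split each entry into a tuple of tokens (hypothetical helper)."""
    return {tuple(entry.split()) for entry in entries}


def longest_matches(tokens, entries):
    """At each start token, report only the longest entry beginning there."""
    wanted = _entry_set(entries)
    longest = max((len(entry) for entry in wanted), default=0)
    found = []
    for start in range(len(tokens)):
        # Try the longest possible candidate first, then shorter ones.
        for size in range(min(longest, len(tokens) - start), 0, -1):
            candidate = tuple(tokens[start:start + size])
            if candidate in wanted:
                found.append(" ".join(candidate))
                break  # shorter entries at this start are not reported
    return found


def all_matches(tokens, entries):
    """Report every entry at every start token (the requested feature)."""
    wanted = _entry_set(entries)
    found = []
    for start in range(len(tokens)):
        for size in range(1, len(tokens) - start + 1):
            candidate = tuple(tokens[start:start + size])
            if candidate in wanted:
                found.append(" ".join(candidate))
    return found


print(longest_matches(["a", "b"], ["a", "b", "a b"]))  # ['a b', 'b']
print(all_matches(["a", "b"], ["a", "b", "a b"]))      # ['a', 'a b', 'b']
```

The only difference between the two functions is the break after the first hit, which is why adding an "all matches" mode was described as not complicated.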
Re: Ruta - MARKFAST
On 30.06.2014 14:58, armin.weg...@bka.bund.de wrote:
> Hi, Peter!
>
> I got that. I restricted MARKFAST to segments. It works nearly perfectly. How does MARKFAST match things? Using
>
> Document{-> MARKFAST(MyType, {"a", "b", "a b"})};

Hehe... I didn't even remember that this is possible. I will open an issue for string lists.

The normal application of MARKFAST is with word lists:

WORDLIST MyList = 'somelist.txt';
Document{-> MARKFAST(MyType, MyList)};

... where the file somelist.txt contains something like:

a
b
a b

Files with the endings "twl" and "mtwl" are for compiled dictionaries.

Just to mention: using characters in the word list that are filtered when the dictionary lookup is applied may cause unexpected behavior, because the algorithm may choose the wrong subtree. It has happened only once in our applications so far.

Best,

Peter

> on
>
> a b
>
> yields
>
> "a b" and "b" but not "a".
>
> I would like to have "a" as well. Can this be done?
>
> By the way: I love Ruta.apply(). That is exactly what I needed.
>
> Thanks,
> Armin
AW: Ruta - MARKFAST
Hi, Peter!

I got that. I restricted MARKFAST to segments. It works nearly perfectly. How does MARKFAST match things? Using

Document{-> MARKFAST(MyType, {"a", "b", "a b"})};

on

a b

yields

"a b" and "b" but not "a".

I would like to have "a" as well. Can this be done?

By the way: I love Ruta.apply(). That is exactly what I needed.

Thanks,
Armin

-----Original Message-----
From: Peter Klügl [mailto:pklu...@uni-wuerzburg.de]
Sent: Monday, 30 June 2014 12:51
To: user@uima.apache.org
Subject: Re: Ruta - MARKFAST

Hi,

On 30.06.2014 11:32, armin.weg...@bka.bund.de wrote:
> Hello!
>
> On which annotation type does MARKFAST work?

It is applied to the annotations matched by the rule element that contains the action.

Document{-> MARKFAST(...)};

... causes a dictionary lookup on the complete document.

Sentence{CONTAINS(...) -> MARKFAST(...)};

... causes a separate dictionary lookup on each of the matched sentences (e.g., no inter-sentence annotations).

> Can I restrict MARKFAST to a single annotation type, say my own token type?

No, but there is an issue that includes this functionality:

UIMA-3775: Fast multi token dictionary matching on feature values

The idea is to apply the dictionary lookup to sequences of feature values (e.g., lemmas). If the feature represents the covered text, then this would also support your use case. The issue is not top priority right now, but if you want, I can try to include it in the next release (August).

> It would be nice to restrict a Ruta script to a set of annotations by giving that set of annotations explicitly, like
>
> Document{-> INPUT(Token, Organization, Location)};

UIMA Ruta follows a different strategy compared to, e.g., JAPE and its input specification. The availability and visibility of annotations is not type-based but coverage-based. This enables the easy specification of complex patterns, but it also complicates things sometimes. If one type is set to invisible (FILTERTYPE), then all annotations of this type and all annotations of other types that they cover are invisible.

The MARKFAST action operates on the RutaStream, and thus its lookup is sensitive to the filtering settings. For example, the lookup ignores whitespace, breaks and markup with the default settings. By extending the set of filtered types, you can also change the behavior of the dictionary lookup. However, mind that annotations covered by one of the filtered types are also not accessible to the dictionary.

> All other annotations should be ignored. Is there a way to do this in Ruta? Can this be done with FILTERTYPE and RETAINTYPE? How?

Yes, but it depends on the actual occurrences of types in your document. The easiest way is to filter the types of the annotations that cover the positions that should be skipped. It's not easy to give a generic solution for this.

An example: your tokenizer creates annotations for words and numbers, but not for punctuation marks, and you want to apply the dictionary lookup only to sequences of token annotations, skipping punctuation marks.

Document{-> FILTERTYPE(PM)};
Document{-> MARKFAST(...)};

There are plans to extend and modify the concept of accessibility and visibility in UIMA Ruta sometime (>= 3.0.0). Any wishes and opinions are welcome :-)

Best,

Peter

> Cheers,
> Armin
Re: Ruta - MARKFAST
Hi,

On 30.06.2014 11:32, armin.weg...@bka.bund.de wrote:
> Hello!
>
> On which annotation type does MARKFAST work?

It is applied to the annotations matched by the rule element that contains the action.

Document{-> MARKFAST(...)};

... causes a dictionary lookup on the complete document.

Sentence{CONTAINS(...) -> MARKFAST(...)};

... causes a separate dictionary lookup on each of the matched sentences (e.g., no inter-sentence annotations).

> Can I restrict MARKFAST to a single annotation type, say my own token type?

No, but there is an issue that includes this functionality:

UIMA-3775: Fast multi token dictionary matching on feature values

The idea is to apply the dictionary lookup to sequences of feature values (e.g., lemmas). If the feature represents the covered text, then this would also support your use case. The issue is not top priority right now, but if you want, I can try to include it in the next release (August).

> It would be nice to restrict a Ruta script to a set of annotations by giving that set of annotations explicitly, like
>
> Document{-> INPUT(Token, Organization, Location)};

UIMA Ruta follows a different strategy compared to, e.g., JAPE and its input specification. The availability and visibility of annotations is not type-based but coverage-based. This enables the easy specification of complex patterns, but it also complicates things sometimes. If one type is set to invisible (FILTERTYPE), then all annotations of this type and all annotations of other types that they cover are invisible.

The MARKFAST action operates on the RutaStream, and thus its lookup is sensitive to the filtering settings. For example, the lookup ignores whitespace, breaks and markup with the default settings. By extending the set of filtered types, you can also change the behavior of the dictionary lookup. However, mind that annotations covered by one of the filtered types are also not accessible to the dictionary.

> All other annotations should be ignored. Is there a way to do this in Ruta? Can this be done with FILTERTYPE and RETAINTYPE? How?

Yes, but it depends on the actual occurrences of types in your document. The easiest way is to filter the types of the annotations that cover the positions that should be skipped. It's not easy to give a generic solution for this.

An example: your tokenizer creates annotations for words and numbers, but not for punctuation marks, and you want to apply the dictionary lookup only to sequences of token annotations, skipping punctuation marks.

Document{-> FILTERTYPE(PM)};
Document{-> MARKFAST(...)};

There are plans to extend and modify the concept of accessibility and visibility in UIMA Ruta sometime (>= 3.0.0). Any wishes and opinions are welcome :-)

Best,

Peter

> Cheers,
> Armin
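The coverage-based visibility rule stated in this reply (a filtered type hides its own annotations and every annotation of another type that they cover) can be sketched in Python. This is a deliberate simplification of that one sentence, not a description of how the RutaStream is implemented; the Annotation tuple, the visible function and the type names used in the example are assumptions made for illustration.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical stand-in for an annotation: a type name and a text span."""
    type: str
    begin: int
    end: int


def visible(annotations, filtered_types):
    """Keep the annotations that are not covered by a filtered annotation."""
    # Spans of all annotations whose type was set to invisible.
    hidden = [(a.begin, a.end) for a in annotations if a.type in filtered_types]

    def covered(a):
        # An annotation is covered if it lies completely inside a hidden span.
        return any(b <= a.begin and a.end <= e for b, e in hidden)

    return [a for a in annotations if not covered(a)]


# "Hello, world": two words, one punctuation mark, one sentence.
annotations = [
    Annotation("Sentence", 0, 12),
    Annotation("W", 0, 5),
    Annotation("PM", 5, 6),
    Annotation("Comma", 5, 6),   # another type on the same span as PM
    Annotation("W", 7, 12),
]
for a in visible(annotations, {"PM"}):
    print(a.type, a.begin, a.end)
```

Filtering PM hides the Comma annotation as well, because it is covered by a PM annotation, while the Sentence annotation stays visible because it merely contains the hidden span. This is the reason a type-based input list like the proposed INPUT(...) does not map directly onto Ruta's model.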
Ruta - MARKFAST
Hello!

On which annotation type does MARKFAST work?

Can I restrict MARKFAST to a single annotation type, say my own token type?

It would be nice to restrict a Ruta script to a set of annotations by giving that set of annotations explicitly, like

Document{-> INPUT(Token, Organization, Location)};

All other annotations should be ignored. Is there a way to do this in Ruta? Can this be done with FILTERTYPE and RETAINTYPE? How?

Cheers,
Armin
Re: AW: Ruta - MARKFAST
On 5/23/2013 9:03 AM, armin.weg...@bka.bund.de wrote:
> Hello Jörn,
>
> absolutely right. But for now I'm still a nooby. That's why I'm asking so much.

Sometimes, noobies make better contributions, because they write for other noobies :-). I would encourage you to contribute anyway. You can mark up your contribution with little tags like etc. to indicate where you're not sure, so that whoever integrates your patch pays more attention there.

-Marshall

> Cheers,
> Armin
AW: Ruta - MARKFAST
Hello Jörn,

absolutely right. But for now I'm still a nooby. That's why I'm asking so much.

Cheers,
Armin

-----Original Message-----
From: Jörn Kottmann [mailto:kottm...@gmail.com]
Sent: Thursday, 23 May 2013 14:24
To: user@uima.apache.org
Subject: Re: Ruta - MARKFAST

On 05/23/2013 01:19 PM, Peter Klügl wrote:
> That is the official documentation. An up-to-date version that describes the new features since 2.0.0 can be found in the trunk.
>
> I know that there are many passages and sections that need to be added or improved, but it is hard to find enough time for it.

Another way to improve the documentation is to contribute patches for it. If you use a specific feature of Ruta and know it well enough, just take 10 minutes, write some documentation, open a Jira issue and attach the patch to it.

Jörn
Re: Ruta - MARKFAST
On 05/23/2013 01:19 PM, Peter Klügl wrote:
> That is the official documentation. An up-to-date version that describes the new features since 2.0.0 can be found in the trunk.
>
> I know that there are many passages and sections that need to be added or improved, but it is hard to find enough time for it.

Another way to improve the documentation is to contribute patches for it. If you use a specific feature of Ruta and know it well enough, just take 10 minutes, write some documentation, open a Jira issue and attach the patch to it.

Jörn
Re: AW: AW: Ruta - MARKFAST
Hi,

On 23.05.2013 13:06, armin.weg...@bka.bund.de wrote:
> Hello Peter,
>
> Now that I understand it, it's a nice feature.
>
> By the way, where can I find good documentation of Ruta? I only know of
> http://people.apache.org/~pkluegl/site/textmarker-current/tools.textmarker.book.html

That is the official documentation. An up-to-date version that describes the new features since 2.0.0 can be found in the trunk.

I know that there are many passages and sections that need to be added or improved, but it is hard to find enough time for it. There is ongoing work by others to improve the description of the Java integration for use cases in part-of-speech tagging, and we are planning to provide screencasts for the Ruta Workbench.

Are there any specific passages that should be improved or added? I also easily forget to add important information (since I implemented it).

> and http://tmwiki.informatik.uni-wuerzburg.de/. A more detailed description would be appreciated.

This wiki refers to the old version hosted at SourceForge and should not be referred to.

Best,

Peter

> Thanks,
> Armin
AW: AW: Ruta - MARKFAST
Hello Peter,

Now that I understand it, it's a nice feature.

By the way, where can I find good documentation of Ruta? I only know of http://people.apache.org/~pkluegl/site/textmarker-current/tools.textmarker.book.html and http://tmwiki.informatik.uni-wuerzburg.de/. A more detailed description would be appreciated.

Thanks,
Armin

-----Original Message-----
From: Peter Klügl [mailto:pklu...@uni-wuerzburg.de]
Sent: Wednesday, 22 May 2013 15:09
To: user@uima.apache.org
Subject: Re: AW: Ruta - MARKFAST

Hi,

yes, this example won't work without changes, because the word list is sensitive to whitespace, e.g., you distinguish between "n.C." and "n. C.". I know this sounds like a bug, but it is rather a feature.

To solve your problem, you could remove all spaces in your word list, add "n.Chr." and "v.Chr." (without space) to your word list, or retain the spaces before calling MARKFAST:

Document{-> RETAINTYPE(SPACE)};

The short explanation is that the action and the word list won't see any spaces with the default filtering settings, so they check a candidate like "n.Chr". However, in the trie there is no "h" on that path without a space before the "C".

Best,

Peter
Re: AW: Ruta - MARKFAST
Hi,

yes, this example won't work without changes, because the word list is sensitive to whitespace, e.g., you distinguish between "n.C." and "n. C.". I know this sounds like a bug, but it is rather a feature.

To solve your problem, you could remove all spaces in your word list, add "n.Chr." and "v.Chr." (without space) to your word list, or retain the spaces before calling MARKFAST:

Document{-> RETAINTYPE(SPACE)};

The short explanation is that the action and the word list won't see any spaces with the default filtering settings, so they check a candidate like "n.Chr". However, in the trie there is no "h" on that path without a space before the "C".

Best,

Peter

On 22.05.2013 10:52, armin.weg...@bka.bund.de wrote:
> Hi Peter,
>
> your example works perfectly fine. But try this as both word list and input document:
>
> nach Christus
> nach der Zeitenwende
> n. C.
> n.C.
> nC.
> n. Chr.
> n. d. Z.
> n.d.Z.
> unserer Zeit
> unserer Zeitrechnung
> u. Z.
> u.Z.
> v. C.
> v.C.
> vC.
> v. Chr.
> v. d. Z.
> v.d.Z.
> vor Christus
> vor der Zeitenwende
> vor unserer Zeitrechnung
> v. u. Z.
> v.u.Z.
>
> "n. Chr." and "v. Chr." are not recognized. Do you get the same result?
>
> Cheers,
> Armin
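The trie explanation in this reply can be reproduced with a small Python toy model. It is not Ruta's implementation and makes two labelled assumptions: the word list is stored as a plain character trie that keeps the spaces of its entries, and a space stored in the trie is stepped over only when the current visible character has no direct child. The names build_trie, matches, END and spaces_visible are hypothetical.

```python
END = "$end"  # hypothetical marker: a word-list entry ends at this node


def build_trie(entries):
    """Build a nested-dict character trie; spaces in entries are kept."""
    root = {}
    for entry in entries:
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def matches(trie, text, spaces_visible=False):
    """Return True if the whole text follows a trie path to an entry end.

    spaces_visible=False models the default filtering: spaces of the
    document are not seen. spaces_visible=True models the situation after
    retaining spaces, where every space must be matched literally.
    """
    chars = text if spaces_visible else text.replace(" ", "")
    node = trie
    for ch in chars:
        while ch not in node:
            if spaces_visible or " " not in node:
                return False
            node = node[" "]  # step over a space that only the list contains
        node = node[ch]       # a direct child always wins over skipping
    return END in node


small = build_trie(["nC.", "v. Chr."])
big = build_trie(["n. C.", "n.C.", "nC.", "n. Chr."])

print(matches(small, "v . Chr."))                    # True
print(matches(big, "n. Chr."))                       # False
print(matches(big, "n. Chr.", spaces_visible=True))  # True
```

In the larger list the visible candidate "n.Chr." follows the path of the entry "n.C.", where no "h" comes after the "C", so the lookup fails. In the two-entry list there is no competing path, which is why the first test produced all four annotations. Adding "n.Chr." to the list or making spaces visible removes the conflict in this model as well.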
AW: Ruta - MARKFAST
Hi Peter,

your example works perfectly fine. But try this as both word list and input document:

nach Christus
nach der Zeitenwende
n. C.
n.C.
nC.
n. Chr.
n. d. Z.
n.d.Z.
unserer Zeit
unserer Zeitrechnung
u. Z.
u.Z.
v. C.
v.C.
vC.
v. Chr.
v. d. Z.
v.d.Z.
vor Christus
vor der Zeitenwende
vor unserer Zeitrechnung
v. u. Z.
v.u.Z.

"n. Chr." and "v. Chr." are not recognized. Do you get the same result?

Cheers,
Armin

-----Original Message-----
From: Peter Klügl [mailto:pklu...@uni-wuerzburg.de]
Sent: Tuesday, 21 May 2013 19:58
To: user@uima.apache.org
Subject: Re: Ruta - MARKFAST

Hi,

On 21.05.2013 15:49, armin.weg...@bka.bund.de wrote:
> Hello!
>
> Is there any possibility to match strings like
>
> nC.
> v. Chr.
>
> with MARKFAST?

Yes. Did you observe any problems? I just tested it with:

Word list:
nC.
v. Chr.

Input document:
nC.
v. Chr.
n C .
v . Chr.

Script:
PACKAGE uima.ruta.tests;
WORDLIST testList = 'test.txt';
DECLARE Test;
Document{-> MARKFAST(Test, testList)};

... creates four annotations of type Test.

Best,

Peter

> Cheers,
> Armin
Re: Ruta - MARKFAST
Hi,

On 21.05.2013 15:49, armin.weg...@bka.bund.de wrote:
> Hello!
>
> Is there any possibility to match strings like
>
> nC.
> v. Chr.
>
> with MARKFAST?

Yes. Did you observe any problems? I just tested it with:

Word list:
nC.
v. Chr.

Input document:
nC.
v. Chr.
n C .
v . Chr.

Script:
PACKAGE uima.ruta.tests;
WORDLIST testList = 'test.txt';
DECLARE Test;
Document{-> MARKFAST(Test, testList)};

... creates four annotations of type Test.

Best,

Peter

> Cheers,
> Armin
Ruta - MARKFAST
Hello!

Is there any possibility to match strings like

nC.
v. Chr.

with MARKFAST?

Cheers,
Armin