[
https://issues.apache.org/jira/browse/UIMA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655866#comment-13655866
]
Peter Klügl edited comment on UIMA-2758 at 5/13/13 10:16 AM:
-------------------------------------------------------------
There is one thing I keep thinking about:
How does the feature match influence the sequential matching? Or better: Is
there only one reasonable interpretation of the sequential matching?
Here's an example to talk about (the test case I am using right now to develop
that stuff):
Input text:
{noformat}
Peter Kluegl, Joern Kottmann, Marshall Schor
{noformat}
Rules:
{noformat}
PACKAGE org.apache.uima.ruta;
//A = full name
//B = last name
//C = first name
DECLARE Annotation D(STRING ds);
DECLARE D C(INT ci, BOOLEAN cb);
DECLARE D B(C bc);
DECLARE Annotation A(B ab, C ac);
INT count;
CW{ -> ASSIGN(count, count+1), CREATE(C, "ds" = "firstname", "ci" = count, "cb" = false)}
    CW{ -> GATHER(B, "bc" = 1), FILL(B, "ds" = "lastname")};
C{REGEXP("M.*") -> SETFEATURE("cb", true)};
(CW CW){-> CREATE(A, "ab" = B, "ac" = C)};
{noformat}
So, if I write a rule like:
{noformat}
(A.ac.ci==1 # A.ac.ci==2 # A.ac.ci==3);
{noformat}
... then what should the wildcard (#) match?
Right now, only the annotation that is actually used in the sequential
matching determines the possible annotations of the next rule element.
Therefore, the wildcard matches " Kluegl, ", because "ac" refers only to the
first name. One might instead expect the rule element to match the complete
name, since the rule element starts with "A", which refers to the complete
name. The rule itself would now create an annotation covering "Peter Kluegl,
Joern Kottmann, Marshall" (missing " Schor"). Is this behavior
intelligible/reasonable to others?
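To make the offsets concrete, here is a small Python sketch (not Ruta, and not how the rule engine is implemented). It assumes that each rule element is anchored at the first-name annotation and that the wildcard covers whatever lies between two consecutive anchors. The regular expression and the helper names `first_name_spans`, `wildcard_texts` and `covered_text` are made up for this illustration.

```python
import re

TEXT = "Peter Kluegl, Joern Kottmann, Marshall Schor"

def first_name_spans(text):
    """(begin, end) of the first word of each 'First Last' pair (stand-in for C)."""
    pattern = r"([A-Z][a-z]+) ([A-Z][a-z]+)"
    return [m.span(1) for m in re.finditer(pattern, text)]

def wildcard_texts(text, anchors):
    """Text between each pair of consecutive anchors (stand-in for '#')."""
    return [text[a_end:b_begin]
            for (_, a_end), (b_begin, _) in zip(anchors, anchors[1:])]

def covered_text(text, anchors):
    """Text from the start of the first anchor to the end of the last one."""
    return text[anchors[0][0]:anchors[-1][1]]

anchors = first_name_spans(TEXT)
print(wildcard_texts(TEXT, anchors))  # the two wildcard matches
print(covered_text(TEXT, anchors))    # what the whole rule would cover
```

Under these assumptions the gaps contain the last names, and the covered text stops after the third first name, which is the behavior described above.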
Well, I can imagine use cases where what matters is not the match of the
feature annotation, but the match of the annotation containing the feature.
I could think of a solution introducing some operator that enables navigation
in the feature structure for different parts of a rule element, but that does
not seem really straightforward.
My favorite solution would be a simple extension: Allow deep feature checks as
conditions.
{noformat}
(A{A.ac.ci==1} # A{A.ac.ci==2} # A{A.ac.ci==3});
{noformat}
Here, the wildcards would only match on " , ". A.ac.ci==1 could be interpreted
as an IS condition combined with a FEATURE condition.
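The alternative reading can be sketched the same way in Python (again only a model, not Ruta and not the engine): each rule element matches a full-name annotation, and the nested feature check merely filters which one is picked. The dataclasses `A` and `C` mirror the declared types with just the features needed here; `build_names` and `match_sequence` are hypothetical helpers. Note that this character-offset model has no whitespace filtering, so the gap comes out as ", ".

```python
from dataclasses import dataclass

TEXT = "Peter Kluegl, Joern Kottmann, Marshall Schor"

@dataclass
class C:
    """First name; 'ci' is the running counter feature."""
    begin: int
    end: int
    ci: int

@dataclass
class A:
    """Full name; feature 'ac' points at the first-name annotation."""
    begin: int
    end: int
    ac: C

def build_names(text):
    """Create one A (with nested C) per comma-separated full name."""
    names, offset = [], 0
    for i, full in enumerate(text.split(", "), start=1):
        first = full.split(" ")[0]
        first_name = C(offset, offset + len(first), i)
        names.append(A(offset, offset + len(full), first_name))
        offset += len(full) + len(", ")
    return names

def match_sequence(text, names, wanted_ci):
    """Anchor on the A whose ac.ci equals each wanted value, in order."""
    picked = [next(a for a in names if a.ac.ci == ci) for ci in wanted_ci]
    gaps = [text[x.end:y.begin] for x, y in zip(picked, picked[1:])]
    return gaps, text[picked[0].begin:picked[-1].end]

gaps, covered = match_sequence(TEXT, build_names(TEXT), [1, 2, 3])
print(gaps)     # what the wildcards cover when A is the anchor
print(covered)  # the complete match, including the last name
```

Because the anchor is the containing annotation, the last names stay inside the matched elements and the complete match includes " Schor".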
Are there any opinions about this problem? I should search for some real use
cases with parse trees.
> TextMarker: Provide support for tree structures and parse trees in rule
> language
> --------------------------------------------------------------------------------
>
> Key: UIMA-2758
> URL: https://issues.apache.org/jira/browse/UIMA-2758
> Project: UIMA
> Issue Type: New Feature
> Components: ruta
> Reporter: Peter Klügl
> Assignee: Peter Klügl
>
> Manipulation of features which refer to annotations and matching on simple
> features is currently supported, but matching on the complex values of some
> feature is not. A first step can be something like (Type Person with feature
> "title" of type Annotation):
> Person.title;
> This rule matches on all annotations, which are values of features of
> annotations of the type Person.
> This new language element can also be used for syntactic sugar when checking
> primitive feature values:
> Person.begin=0 (a Person annotation that starts at offset 0)
> This can only be a first step towards supporting tree structures. Maybe there
> is no way around something for explicitly and directly referring to certain
> annotations (which is not possible right now, but is done by using the type).