[jira] [Comment Edited] (UIMA-2758) TextMarker: Provide support for tree structures and parse trees in rule language

JIRA Mon, 13 May 2013 03:17:23 -0700

    [ https://issues.apache.org/jira/browse/UIMA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655866#comment-13655866 ]


Peter Klügl edited comment on UIMA-2758 at 5/13/13 10:16 AM:
-------------------------------------------------------------

There is one thing I keep thinking about:
How does the feature match influence the sequential matching? Or, put better: 
is there only one reasonable interpretation of the sequential matching?

Here's an example to talk about (the test case I am currently using to develop 
this feature):

Input text:
{noformat}
Peter Kluegl, Joern Kottmann, Marshall Schor
{noformat}

Rules:
{noformat}
PACKAGE org.apache.uima.ruta;

//A = full name
//B = last name
//C = first name
DECLARE Annotation D(STRING ds);
DECLARE D C(INT ci, BOOLEAN cb);
DECLARE D B(C bc);
DECLARE Annotation A(B ab, C ac);

INT count;
CW{ -> ASSIGN(count, count+1), CREATE(C, "ds" = "firstname", "ci" = count, "cb" = false)}
    CW{ -> GATHER(B, "bc" = 1), FILL(B, "ds" = "lastname")};
C{REGEXP("M.*") -> SETFEATURE("cb", true)};
(CW CW){-> CREATE(A, "ab" = B, "ac" = C)};
{noformat}
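
To make the intended outcome of these rules concrete, here is a rough Python model of the annotations they would presumably produce on the input text. It is only a sketch, not Ruta itself: the class names mirror the DECLARE statements, "CW CW" is approximated by a regular expression for two capitalized words, REGEXP is assumed to be a full match, and the helper annotate() is a hypothetical name.

```python
import re
from dataclasses import dataclass

TEXT = "Peter Kluegl, Joern Kottmann, Marshall Schor"

@dataclass
class Annotation:
    begin: int
    end: int

@dataclass
class D(Annotation):
    ds: str

@dataclass
class C(D):  # first name
    ci: int
    cb: bool

@dataclass
class B(D):  # last name
    bc: C

@dataclass
class A(Annotation):  # full name
    ab: B
    ac: C

def annotate(text):
    """Approximate the three rules: number first names, link last names, build A."""
    names = []
    count = 0
    # Assumption: "CW CW" behaves like two capitalized words separated by whitespace.
    for m in re.finditer(r"([A-Z][a-z]+)\s+([A-Z][a-z]+)", text):
        count += 1  # ASSIGN(count, count+1)
        first = C(*m.span(1), "firstname", count, False)
        # C{REGEXP("M.*") -> SETFEATURE("cb", true)}, assumed to be a full match
        if re.fullmatch(r"M.*", m.group(1)):
            first.cb = True
        last = B(*m.span(2), "lastname", first)  # GATHER(B, "bc" = 1) plus FILL
        names.append(A(*m.span(0), last, first))
    return names

names = annotate(TEXT)
for a in names:
    print(TEXT[a.begin:a.end], "| ci =", a.ac.ci, "| cb =", a.ac.cb)
```

In this model each A spans the full name, while A.ac spans only the first name, which is the distinction the question below depends on.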

So, if I write a rule like:

{noformat}
(A.ac.ci==1 # A.ac.ci==2 # A.ac.ci==3);
{noformat}

... then what should the wildcard (#) match?
Right now, only the annotation that is actually used in the sequential 
matching determines the possible annotations of the next rule element. 
Therefore, the wildcard matches " Kluegl, ", because "ac" is only the first 
name. One might instead expect the rule element to match the complete name, 
since the rule element starts with "A", which refers to the complete name. The 
rule itself would now create an annotation covering "Peter Kluegl, Joern 
Kottmann, Marshall" (missing " Schor"). Is this behavior 
intelligible/reasonable to others?

I can imagine use cases where it is not the match of the feature annotation 
that matters, but the match of the annotation containing the feature.

I could think of a solution that introduces an operator for navigating the 
feature structure in different parts of a rule element, but that does not seem 
straightforward.
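
The current behavior described above can be reproduced with a small character-level Python model. This is a simplified sketch under stated assumptions, not the Ruta matcher: annotations are reduced to (begin, end) spans, the wildcard is taken to be the text between two consecutive rule elements, and NAMES and match_sequence() are hypothetical names.

```python
import re

TEXT = "Peter Kluegl, Joern Kottmann, Marshall Schor"

# For each full name A, record its own span and the span of its first name A.ac.
# Assumption: "CW CW" is approximated by two capitalized words.
NAMES = [
    {"a": m.span(0), "ac": m.span(1)}
    for m in re.finditer(r"([A-Z][a-z]+)\s+([A-Z][a-z]+)", TEXT)
]

def match_sequence(anchor):
    """Return (wildcard texts, covered text) when each rule element is
    anchored on the span stored under `anchor`."""
    spans = [name[anchor] for name in NAMES]
    gaps = [TEXT[prev[1]:nxt[0]] for prev, nxt in zip(spans, spans[1:])]
    covered = TEXT[spans[0][0]:spans[-1][1]]
    return gaps, covered

# Anchoring on the feature value "ac" (the first name), as is done right now.
gaps, covered = match_sequence("ac")
print(gaps)     # [' Kluegl, ', ' Kottmann, ']
print(covered)  # Peter Kluegl, Joern Kottmann, Marshall
```

Because the last rule element ends with the first name "Marshall", the covered span stops before " Schor".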

My favorite solution would be a simple extension: Allow deep feature checks as 
conditions.

{noformat}
(A{A.ac.ci==1} # A{A.ac.ci==2} # A{A.ac.ci==3});
{noformat}

Here, the wildcards would only match on " , ". A.ac.ci==1 could be interpreted 
as an IS condition combined with a FEATURE condition.
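
For comparison, here is a minimal Python sketch of the proposed interpretation, in which each rule element matches the full A annotation and the deep feature check only filters which A is accepted. It is an illustration under assumptions, not an implementation: spans are plain (begin, end) tuples, ci is assumed to number the names 1, 2, 3 in document order, and by_ci and wildcard_gaps() are hypothetical names.

```python
import re

TEXT = "Peter Kluegl, Joern Kottmann, Marshall Schor"

# Spans of the full-name annotations A.
# Assumption: "CW CW" is approximated by two capitalized words.
full_names = [m.span() for m in re.finditer(r"[A-Z][a-z]+\s+[A-Z][a-z]+", TEXT)]

# Look up each A by the value of A.ac.ci, assigned in document order.
by_ci = {ci: span for ci, span in enumerate(full_names, start=1)}

def wildcard_gaps(ci_sequence):
    """Text between consecutive A annotations selected by their ci value."""
    spans = [by_ci[ci] for ci in ci_sequence]
    return [TEXT[prev[1]:nxt[0]] for prev, nxt in zip(spans, spans[1:])]

# (A{A.ac.ci==1} # A{A.ac.ci==2} # A{A.ac.ci==3});
gaps = wildcard_gaps([1, 2, 3])
covered = TEXT[by_ci[1][0]:by_ci[3][1]]
print(gaps)     # [', ', ', ']
print(covered)  # Peter Kluegl, Joern Kottmann, Marshall Schor
```

In this character-level model the gaps shrink to the comma separators, and the covered span includes " Schor" because the last rule element ends with the full name.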

Are there any opinions about this problem? I should search for some real use 
cases with parse trees.
                
                  
> TextMarker: Provide support for tree structures and parse trees in rule 
> language
> --------------------------------------------------------------------------------
>
>                 Key: UIMA-2758
>                 URL: https://issues.apache.org/jira/browse/UIMA-2758
>             Project: UIMA
>          Issue Type: New Feature
>          Components: ruta
>            Reporter: Peter Klügl
>            Assignee: Peter Klügl
>
> Manipulation of features which refer to annotations and matching on simple 
> features is currently supported, but matching on the complex values of some 
> feature is not. A first step can be something like (Type Person with feature 
> "title" of type Annotation):
> Person.title;
> This rule matches on all annotations, which are values of features of 
> annotations of the type Person.
> This new language element can also be used for syntactic sugar when checking 
> primitive feature values:
> Person.begin=0 (A Person annotation, which starts at offset 0)
> This can only be a first step towards supporting tree structures. Maybe there 
> is no way around something for explicitly and directly referring to certain 
> annotations (which is not possible right now, but is done by using the type).

