Re: Getting OOM issue while running ruta script with large texts

2022-08-10 Thread Peter Klügl

Hi,


sorry, I haven't had time yet, but I will have a look at it this weekend.


Best


Peter


On 10.08.2022 at 09:00, Md Azaz Ali wrote:

Hi Dr. Peter,

Did you have time to look at the gist? If any more information is required,
please let me know.

Also, we recently found a text that throws a stack overflow on a local system
with the same Ruta script. The text is shared below. It is part of an email
that gets translated to Base64; maybe some special symbol or something else
is present in the email body. The script breaks on this text, but we are
still not sure which actual texts caused the OOM in prod.


IkVtcGxveWVlIE5hbWUiLCJFbXBsb3llZSBDb2RlIiwiRW1wbG95ZWUgU3RhdHVzIiwi
RHJpdmVyIElEIiwiRGVmaWNpZW5jeSBDb3VudCIsIkxvY2F0aW9uIiwiRm9ybSIsIkZp
bGUgTmFtZSIsIk1lc3NhZ2UgTnVtYmVyIiwiTWVzc2FnZSBEZXNjcmlwdGlvbiIsIkV4
cGlyZWQgRGF0ZSINCiJSdWJlbiBFc2NvYmVkbyIsIjE3MDY2NjkiLCJJbiBQcm9jZXNz
IiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxlIExvZ2lzdGljcyBMTENfRElJMyIsIkFN
QkdDIiwiQW1hem9uIEJhY2tncm91bmQgU3RhdHVzIEZpbGUiLCI5MjUiLCJBbWF6b24g
QmFja2dyb3VuZCBDaGVjayBTdGF0dXMgUGVuZGluZyIsIiINCiJSdWJlbiBFc2NvYmVk
byIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxl
IExvZ2lzdGljcyBMTENfRElJMyIsIkFNWk9UIiwiQW1hem9uIENvbmR1Y3RlZCBUcmFp
bmluZyBSZXF1aXJlbWVudHMiLCIwIiwiRG9jdW1lbnQgTWlzc2luZyIsIiINCiJSdWJl
biBFc2NvYmVkbyIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJE
ZXBlbmRhYmxlIExvZ2lzdGljcyBMTENfRElJMyIsIk1FTlJWIiwiRHJpdmVyIFF1YWxp
ZmljYXRpb24iLCIwIiwiRG9jdW1lbnQgTWlzc2luZyIsIiINCiJSdWJlbiBFc2NvYmVk
byIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxl
IExvZ2lzdGljcyBMTENfRElJMyIsIlJUUlRDIiwiRHJpdmVyIFF1YWxpZmljYXRpb24i
LCIwIiwiRG9jdW1lbnQgTWlzc2luZyIsIiINCiJSdWJlbiBFc2NvYmVkbyIsIjE3MDY2
NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxlIExvZ2lzdGlj
cyBMTENfRElJMyIsIkFQUCIsIkRyaXZlciBRdWFsaWZpY2F0aW9uIiwiMTE3IiwiUFJF
VklPVVMgRU1QTE9ZTUVOVCBBRERSRVNTIElORk9STUFUSU9OIE1JU1NJTkcvSU5DT01Q
TEVURSIsIiINCiJSdWJlbiBFc2NvYmVkbyIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwi
MTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxlIExvZ2lzdGljcyBMTENfRElJMyIsIk1FQyIs
IkRyaXZlciBRdWFsaWZpY2F0aW9uIiwiMCIsIkRvY3VtZW50IE1pc3NpbmciLCIiDQo=



Thanks
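
To see what the Base64 block above contains, it can be decoded locally. The following is a minimal sketch that uses only the JDK. The class name Base64Sample and the helper decode are made up for illustration, UTF-8 is an assumed encoding, and only the first line of the sample is decoded here. The MIME decoder is chosen because it tolerates the line breaks inserted by the mail client.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Sample {

    // Hypothetical helper: decodes Base64 text that may contain line breaks.
    static String decode(String base64) {
        // The MIME decoder ignores line separators and other characters
        // outside the Base64 alphabet.
        byte[] bytes = Base64.getMimeDecoder().decode(base64);
        // Assumption: the original email body was UTF-8 encoded.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First line of the sample from the mail above.
        String firstLine =
            "IkVtcGxveWVlIE5hbWUiLCJFbXBsb3llZSBDb2RlIiwiRW1wbG95ZWUgU3RhdHVzIiwi";
        System.out.println(decode(firstLine));
    }
}
```

The decoded text begins with a quoted, comma-separated header field ("Employee Name"), so the input appears to be CSV-like data with many quotation marks and commas rather than ordinary prose.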

On Sun, Aug 7, 2022 at 11:59 AM Md Azaz Ali  wrote:


Hi Dr. Peter,


Sorry for not being able to clarify it. I have created a gist.

The gist below has the address.ruta file with one example attached to each
of the two rules.

https://gist.github.com/azazali30/635c3b80e02908e9f8387db3fda865db


Many Thanks






Re: Getting OOM issue while running ruta script with large texts

2022-08-06 Thread Peter Klügl

Hi,


I had a quick look at the rules. Given the examples you provided, only 
the first rule matches three times, the second rule not once.


So I have to ask before I can refactor the rules: what should the rules 
annotate exactly?



Best


Peter


On 05.08.2022 at 11:44, Md Azaz Ali wrote:

Hi Dr. Peter Klügl,

Yes, they are the same as in the StackOverflow question.


Re: Getting OOM issue while running ruta script with large texts

2022-08-05 Thread Peter Klügl

Hi,


the attachments are removed by the mailing list. Are the rules the same
as in the StackOverflow question?



Best,


Peter

On 04.08.2022 at 20:15, Md Azaz Ali wrote:

Hi Dr. Peter,

Here are some example addresses that the attached Ruta script is able to find.

Two Ruta rules are used: one for multiline addresses and the other for
single-line addresses.
We are also using some prepopulated EntityType annotations with the feature
location_indicator.




//Annotation EntityType with feature location_indicator is already 
present = Georgia


11175 Cicero Drive
Suite 200
Alpharetta, Georgia 30022



//EntityType with feature location_indicator is already present = 
Cambridge;MA;U.S.A


One Rogers Street
Cambridge, MA
02142-1209
U.S.A

//EntityType with feature location_indicator is already present  = 
Cambridge, MA, U.S.A.

1120 Avenue of the Americas
4th Floor
New York, NY 10036
U.S.A.


//EntityType with feature location_indicator is already present = U.S.A

11175 Cicero Drive
Suite 200
Alpharetta, Georgia 30022
U.S.A

//EntityType with feature location_indicator is already present = U.S.A

My new address is
8 Commerce Dr.
Suite 3B
Bedford, NH 03110
U.S.A


//EntityType with feature location_indicator is already present  = U.S.A.

400 Renaissance Center Drive
Suite 2600
Detroit, MI 48243
U.S.A.

//EntityType with feature location_indicator is already present  = U.S.A.

125 Wacker Drive
Suite 300
Chicago, IL 60606
U.S.A.

//EntityType with feature location_indicator is already present  = U.S.A.


1120 Avenue of the Americas
4th Floor
New York, NY 10036
U.S.A.


222 West Las Colinas Blvd. Suite 1650 North Tower Millennium Center 
Irving, TX 75039 U.S.A.



Block No. 9A, Pritech Park SEZ, RMZ Ecospace Internal Road, Bellandur, 
Bengaluru, Karnataka 560103, India




Thanks & Regard
Md Azaz Ali


Re: Getting OOM issue while running ruta script with large texts

2022-08-04 Thread Peter Klügl

Hi,


yes, I can suggest some refactored rules.

However, I do not know the common input data and the use cases. It is
easier for me if I have a few representative input snippets I can test
the refactored rules against. Can you provide some (artificial) example
text snippets?



Best


Peter


On 04.08.2022 at 11:33, Md Azaz Ali wrote:

Hi Dr. Peter Klügl,


1. We are not able to upgrade to Ruta 3.x because we would also have to
upgrade uimaj-core, and to do that we need a stable version of cleartk-ml
(which is not working with UIMA 3.x).


2. For PARAM_MAX_RULE_MATCHES and PARAM_MAX_RULE_ELEMENT_MATCHES, we
are not sure what number will be good enough.


3. If possible, can you please suggest an improved version of the above
script? It would really help.


4. Getting a new build from main-v2 is also not possible because
we can only use GA versions that are available directly in the Maven
repository.


I am attaching one script file. If you can suggest possible
improvements, it will be really helpful.


Note: I am new to Ruta, and these Ruta scripts were written by former
developers in my company who are no longer associated with the company.


Many Thanks



--
Dr. Peter Klügl
Head of Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email:peter.klu...@averbis.com
Web:https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó


Re: Getting OOM issue while running ruta script with large texts

2022-08-02 Thread Peter Klügl

Hi,


thanks for the pointer. I added an answer.

Let me know if you want to have more information about the rule refactoring.


In my experience, the life of a Ruta rule engineer is much easier if the 
Ruta rules stay small :-)



Best,


Peter


On 31.07.2022 at 21:09, Md Azaz Ali wrote:

https://stackoverflow.com/questions/73147822/getting-oom-issue-while-running-ruta-script-with-large-texts



Many Thanks





Re: Can we reuse Seeds and RutaBasic annotations for multiple script executed sequentially on same cas object

2022-08-02 Thread Peter Klügl

Hi,


thanks for the pointer.


I added an answer. Let me know if you have more questions or if the
answer is not sufficient.



Best


Peter

On 31.07.2022 at 21:08, Md Azaz Ali wrote:

https://stackoverflow.com/questions/73162285/can-we-reuse-seeds-and-rutabasic-annotations-for-multiple-script-executed-sequen



Many Thanks





Re: Ruta, Maven and Custom Engines

2022-04-21 Thread Peter Klügl

Hi,

ENGINE loads an XML descriptor. In simple Ruta projects, these descriptors
are located in descriptor/utils, which would result in
'ENGINE utils.HtmlAnnotator;'. Loading the XML descriptor from the
dependency or even the installed OSGi bundle should also work. I would
have to check what the exact problem is in the project configuration.


However, if you want to load an analysis engine from the maven 
dependencies, it is simpler to initialize it directly with uimaFIT 
instead of the xml descriptor. This would result in something like 
'UIMAFIT org.apache.uima.ruta.engine.HtmlAnnotator;'.


It should not look relative to the current package, but I have to check 
that.


Best,

Peter


On 21.04.2022 at 10:20, Michael B. wrote:

Hi Peter!

Bonus question: how do I get Engines and Type Systems included through 
maven dependencies to resolve correctly in scripts?


For instance:

PACKAGE de.miba;

ENGINE org.apache.uima.ruta.engine.HtmlAnnotator;
ENGINE org.apache.uima.ruta.engine.HtmlConverter;
TYPESYSTEM utils.HtmlTypeSystem;
TYPESYSTEM utils.SourceDocumentInformation;

Document{-> RETAINTYPE(SPACE,BREAK)};
Document{-> EXEC(HtmlAnnotator)};

Document { -> CONFIGURE(HtmlConverter, "inputView" = "_InitialView",
    "outputView" = "plain"),
  EXEC(HtmlConverter)};

All the ruta dependencies are on the class path, but the script can't 
find it ("error: "org.apache.uima.ruta.engine.HtmlAnnotator" not 
found.). I assume it's looking relative to current package (de.miba.*)?



Regards,

Michael



Re: Ruta, Maven and Custom Engines

2022-04-14 Thread Peter Klügl

Hi,


here's the pull request: https://github.com/mybyte/ruta-test/pull/1


Yes, more examples would be good, but it also depends on the dev 
environment and build approaches.



Do you know these projects?

https://github.com/averbis/ruta-pear-archetype

https://github.com/averbis/hello-world-ruta-pear


If you ignore the PEAR overhead, they also provide some examples
of project configuration.



Best


Peter


On 14.04.2022 at 14:51, Michael B. wrote:

Hi Peter!

I'd appreciate that pull request. Overall, I feel like we need more 
skeletons / example projects for RUTA. Maybe it's worth putting a few 
of those together...


Cheers,

Michael




Re: Ruta, Maven and Custom Engines

2022-04-14 Thread Peter Klügl

Hi,

I had a quick look at your ruta-test project.

1. There was a problem with the paths config of your annotator description.
If you use a Java/Maven project, it is easier to just use the classpath
instead of the Ruta paths. So I removed 'descriptor:descriptor' from the
pom and moved the descriptor folder to src/main/resources.


2. I added 'TYPESYSTEM descriptor.typeSystemDescriptor;' to your 
TestProjectMain.ruta (after the UIMAFIT import) and deleted the JCas 
cover class in 'src/main/java' as it is now generated and located at 
'target/...'


3. I added a '.addToIndexes()' in TestAnno so that the annotation is 
available and added 'EXEC(TestAnno);' in TestProjectMain.ruta so that 
the annotator is executed. The keyword UIMAFIT is just an import of the 
analysis engine.


4. I added 'Assert.assertEquals(1, JCasUtil.select(jcas, 
SomeType.class).size());' to RutaTest.java in order to check if the 
annotation is created.


If you want, I can open a pull request with the changes for your project 
or for a fork.


Best

Peter


On 13.04.2022 at 16:06, Michael B. wrote:

Hi everyone!

I'm at a loss here. Trying to create a basic RUTA project, but I just 
can't get it to work. I'd like to include a custom Engine (Java code!) 
that needs a few maven dependencies. I just can't get RUTA (via 
ENGINE/TYPESYSTEM) to properly include and initialize Java annotators.


What I tried so far:

- Create project using archetype

- Add code for custom AE

- Add TS and AE descriptors in ./descriptor folder, run JCASGen

- Add descriptor:descriptor to pom (in the 
ruta-maven-plugin section). ./descriptor folder is configured 
successfully as part of build path


Issues:

- When using ENGINE aeDescriptor in script, project doesn't run:

Caused by: java.io.FileNotFoundException: class path resource 
[aeDescriptor.xml] cannot be resolved to URL because it does not exist


- TYPESYSTEM typeSystemDescriptor <- works in editor, but maven pom / 
task throws an error (no details, assuming name/path can't be resolved)


- UIMAFIT de.miba.TestAnno works (and can be executed in ruta script), 
but Types/TS missing, so runtime errors.


Any hints or an example project where this works would be highly 
appreciated! I've uploaded the skeleton to 
https://github.com/mybyte/ruta-test



Cheers,

Michael






Re: Ruta, Maven and Custom Engines

2022-04-13 Thread Peter Klügl

Hi,


I just wanted to let you know that I will take a closer look at it as 
soon as possible, hopefully tomorrow.



Best,


Peter





Re: RUTA: NPE in RutaLiteralMatcher

2021-07-27 Thread Peter Klügl
Hi Erik,


which version of Ruta do you use? It looks like you use an old version
in your maven dependencies, in which this bug hasn't been fixed yet.


Best,


Peter

On 27.07.2021 at 12:06, Erik Fäßler wrote:
> Hi everyone,
>
> I have been having an issue with literal matching for quite some time now. In 
> the past, I just replaced all literal matching with word lists. In my current 
> script, however, this would be rather cumbersome.
> The error looks like this:
>
> java.lang.NullPointerException: null
>   at org.apache.uima.ruta.rule.RutaLiteralMatcher.getAnnotation(RutaLiteralMatcher.java:72)
>   at org.apache.uima.ruta.rule.RutaLiteralMatcher.getMatchingAnnotations(RutaLiteralMatcher.java:62)
>   at org.apache.uima.ruta.rule.RutaLiteralMatcher.getMatchingAnnotations(RutaLiteralMatcher.java:37)
>   at org.apache.uima.ruta.rule.RutaRuleElement.getAnchors(RutaRuleElement.java:52)
>   at org.apache.uima.ruta.rule.RutaRuleElement.startMatch(RutaRuleElement.java:60)
>   at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:87)
>   at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:77)
>   at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:65)
>   at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:56)
>   at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:38)
>   at org.apache.uima.ruta.block.RutaScriptBlock.apply(RutaScriptBlock.java:72)
>   at org.apache.uima.ruta.RutaModule.apply(RutaModule.java:56)
>   at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:612)
>   ... 11 common frames omitted
>
> So all I know is that something is null. But I have no idea what it could be. 
> I cannot even reproduce the issue in the Eclipse Workbench: The XMI files 
> that cause the error work without issue in Eclipse. The error only occurs 
> when running the CPE with the Ruta component that I created using the Maven 
> plugin.
> Hence, I don’t even know what part of my script is causing the error. I have 
> rules like this:
>
> (WS|"("|"/") (Greek|NUM|Roman) {->MARK(Specifier)} (WS|"("|"/");
>
> which I think could be the cause. But this is based on the fact that I could 
> fix the error with word lists until now. Hints and ideas on how to circumvent 
> this issue are appreciated.
>
> Best,
>
> Erik


Re: RUTA : ForEach block - ignoring Annotations

2021-05-05 Thread Peter Klügl
Hi,

 

yes, the FOREACH block will skip the second line because it starts with
a whitespace. All normal matching conditions will ignore it, too.


Ruta applies a coverage-based (in)visibility concept, and this
implementation needs to be able to handle overlapping annotations in a
symmetric way.

In practice, this means that every annotation that starts or ends with
something invisible is itself invisible and will be ignored. In your
example, whitespaces are still invisible (filtered), and thus the second
line is invisible for the matching condition. This sounds unreasonable,
but it is really important.
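
The rule described above can be illustrated outside of Ruta. The following is only a minimal Java sketch under stated assumptions, not Ruta's actual implementation: the class VisibilitySketch and its methods are hypothetical names, only whitespace characters are modelled as the filtered type, and lines are split the same way as the "[^\\r\\n]+" -> Line rule.

```java
import java.util.ArrayList;
import java.util.List;

class VisibilitySketch {

    /**
     * Hypothetical model of the visibility rule: a span [begin, end) is
     * visible only if neither its first nor its last character is filtered.
     */
    static boolean isVisible(String text, int begin, int end, boolean whitespaceFiltered) {
        if (begin >= end) {
            return false; // empty spans are not considered in this sketch
        }
        if (!whitespaceFiltered) {
            return true; // corresponds to making whitespace visible
        }
        return !Character.isWhitespace(text.charAt(begin))
                && !Character.isWhitespace(text.charAt(end - 1));
    }

    /** Splits the text into line spans and returns only the visible ones. */
    static List<String> visibleLines(String text, boolean whitespaceFiltered) {
        List<String> result = new ArrayList<>();
        int begin = 0;
        for (int i = 0; i <= text.length(); i++) {
            boolean atBreak = i == text.length()
                    || text.charAt(i) == '\n'
                    || text.charAt(i) == '\r';
            if (atBreak) {
                if (isVisible(text, begin, i, whitespaceFiltered)) {
                    result.add(text.substring(begin, i));
                }
                begin = i + 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "This will match\n  This won't\nBut this will\nThis too";
        // Whitespace filtered: the line with leading spaces is dropped.
        System.out.println(visibleLines(input, true));
        // Whitespace visible: all four lines are kept.
        System.out.println(visibleLines(input, false));
    }
}
```

In this model, the line with the two leading spaces is dropped as long as whitespace is filtered, and it is kept once whitespace is treated as visible or the span is trimmed so that it no longer starts with a space.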

There are many options for you to avoid this problem depending on the
overall application and use case.

- you could make whitespaces visible if they matter

Something like (not tested):

DECLARE Test;

ADDRETAINTYPE(WS);

BLOCK(ForEach) Line{}{
    W+{->Test};
}

REMOVERETAINTYPE(WS);

- you could adapt your regex

- you could trim the line annotations

Something like (not tested):

DECLARE Line;
"[^\\r\\n]+" -> Line;

ADDRETAINTYPE(WS);
Line{-> TRIM(WS)};
REMOVERETAINTYPE(WS);

Line{->SHIFT(Line,1,2)} BR;

 

 

Best

 

Peter

Am 04.05.2021 um 14:26 schrieb Michael Bach:
> Hi!
>
> I’m struggling a bit with the usage of ForEach blocks. I’ve written a few 
> rules for structure looking like this:
>
> Document{-> RETAINTYPE(BREAK)};
>
> DECLARE BR;
> "\\r?\\n" -> BR;
>
> DECLARE Line;
> "[^\\r\\n]+" -> Line;
> Line{->SHIFT(Line,1,2)} BR;
>
> DECLARE Empty_Line;
> "\\r?\\n[ ]*(\\r?\\n)" -> 1=Empty_Line;
>
> DECLARE After_Empty,Before_Empty;
> Line{->Before_Empty} Empty_Line;
> Empty_Line Line{->After_Empty};
>
> DECLARE Paragraph;
> Line+{-PARTOF(Paragraph)->Paragraph};
>
>
> Seems to do exactly what I want, but it seems that for some reason, 
> ForEach-Blocks „skip“ some of the Lines. For instance, when a line starts 
> with a leading SPACE, it is being skipped.
>
> For instance, given this script:
>
> DECLARE Test;
> BLOCK(ForEach) Line{}{
> W+{->Test};
> }
>
> And this input:
>
> This will match
>   This won’t
> But this will
> This too
>
>
> Any hints why the ForEach block might be skipping the second line?
>
> Cheers,
> Michael




Ruta Workbench LayerInstantiationException

2021-03-16 Thread Peter Klügl
Hi,


the UIMA Ruta Workbench has had problems with newer Eclipse (> 4.8?) and
Java (11) versions for some time now. We will fix the problem in the
upcoming releases (which will require increasing the minimum required
version of Eclipse).


There are several workarounds for this problem concerning the current
UIMA Ruta Workbench versions. I created a short description for some
Ruta users and I wanted to share this:


https://docs.google.com/document/d/11GnvmJfmHD-QsmCCS4F2-ueQAhIaDZPINz565kMMi-g


Best


Peter






Default value of lexer/seeder in RutaEngine

2021-02-25 Thread Peter Klügl
Hi,


I am thinking about changing the default value of the seeder parameter
in the RutaEngine from DefaultSeeder to TextSeeder. I think TextSeeder
(no MARKUP annotations) is a better default value in most use cases.


Are there opinions on that?


Best,


Peter





Re: RUTA: Copy features into new annotation

2021-01-14 Thread Peter Klügl
Hi,

Am 13.01.2021 um 12:04 schrieb Erik Fäßler:
>> :-)
>>
>> I was looking at the Person definition there, but didn't find matching
>> features.
> Oh, sorry, I did not articulate myself clear enough: In my real case work I 
> don’t have Person annotations but Organism annotation which are derived from 
> ConceptMentions. And ConceptMentions have the resourceEntryList feature.
> I apologize for the confusion. For the matter of simplicity I made up the 
> Person example in my initial E-Mail and now and bit me in the a** ;-)


Ah no, all fine. When I prepared the first exemplary rules, I wondered
about the type range of the id feature. I assumed you were using the
JCoRe type systems, as your question indicated some non-trivial
real-world use case. I had a quick look (1 min) to see if I could
identify the range for the ids of Person annotations in these type
systems, but failed... so I simply used String as range :-)



>>
>> In general, I find it better to create additional annotations for
>> complex structures instead of merging the information in an existing
>> annotation, simply for maintainability reasons. It's easier to
>> inspect unintended behavior several months later that way ...
> Great, I am with you here, feels like I did it the recommended way.
>>
>>> So actually, there is one step missing now: I need to replace merged 
>>> Organism entries with the covering OrganismEnumeration (Person and 
>>> PersonEnumeration in my example).
>>
>> I am not sure what the input/output behavior should be. Don't you have
>> two separate annotations and isn't the enum the merge of the semantic?
> You’re right. And I think I will leave it this way. I’m thinking too 
> complicated.
>>
>> Labels and inlined rules are the two best language features I added in
>> Ruta, really useful. Let me know if you want to learn more about them
>> and if there is information missing in the documentation.
>>
> No, it’s all great. It’s just not that trivial and, honestly, while I had a 
> look at the base syntax, I came quite far with cherry-picking from the 
> documentation what I needed. I did not study the syntax in great detail 
> because I could always make it work with doing it. That’s my bad. But this 
> time I didn’t know where to start so I asked. And you helped me a lot, thank 
> you so much.
> RUTA is a great tool. I only have trouble with regular exceptions in the 
> Eclipse Workbench but I got used to it and I have probably combined wrong 
> versions of RUTA and Eclipse or something.


There were several reports of problems lately which had their source in
different Java versions used.



Best,


Peter



>
> Thank you!
>
> Erik
>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>
>>> construction so this enumeration-annotation-merging might actually be easy 
>>> and I just don’t see it.
>>>
>>> Thank you so much!
>>>
>>> Erik
>>>
>>>> On 10. Jan 2021, at 16:21, Peter Klügl  wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> Am 07.01.2021 um 14:55 schrieb Erik Fäßler:
>>>>> Hi Peter and thank you once again for your excellent support of your 
>>>>> excellent RUTA software!
>>>> You are welcome :-)
>>>>
>>>>
>>>>> Your second example was very much what I needed. Thank you so far!
>>>>> I have one last bump in the road:
>>>>>
>>>>> My Person#id feature is an FSArray with ID annotations instead of a plain 
>>>>> uima.cas.String. So, one Person annotation might have multiple IDs per 
>>>>> the type system.
>>>>> The ID type has a feature “entryId”.
>>>>> In my particular case I actually have only one entry in the id array. 
>>>>> Still, I need to access this entry somehow.
>>>>> Is that at all possible in RUTA? I would need something like
>>>>>
>>>>>
>>>>> // collect ids of all covered Persons using an extra list
>>>>> STRINGLIST ids;
>>>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>>>   <-{p:Person{-> ADD(ids,p.id[0].entryId)};};
>>>>>
>>>>> This does not seem to be covered by the FeatureExpression grammar in 
>>>>> RUTA. Is there a work around? Otherwise I will have to solve it some 
>>>>> other way.
>>>> there are actual "indexed" expressions like Person.ids[0] but it's 

Re: RUTA: Copy features into new annotation

2021-01-13 Thread Peter Klügl
Hi,

Am 11.01.2021 um 08:13 schrieb Erik Fäßler:
> Hello Peter,
>
> thank you again for putting so much thought into it.
> I am a bit embarrassed to say that I already had the solution in my script 
> when I just opened Eclipse again. I think I just didn’t really try it because 
> I didn’t expect it to work.
> This works now, thank you!
>
> In order to better understand my case, here some details:
> My type system is indeed the JCoRe TS.
> And I am not working with Person annotations but with Organism mentions, but 
> I wanted to keep things simple. Organism mentions are extended from 
> ConceptMentions:
> https://github.com/JULIELab/jcore-base/blob/90b4e56d80d774add1997db29f4eb27d6b394309/jcore-types/src/main/resources/de/julielab/jcore/types/jcore-semantics-mention-types.xml#L125
>
> Those have the “resourceEntryList” feature which is an FSArray of 
> ResourceEntry instances:
> https://github.com/JULIELab/jcore-base/blob/90b4e56d80d774add1997db29f4eb27d6b394309/jcore-types/src/main/resources/de/julielab/jcore/types/jcore-basic-types.xml#L44
>
> The ResourceEntry, finally, has a feature named “entryId”.


:-)

I was looking at the Person definition there, but didn't find matching
features.



>
> The entryIds are set in a separate annotator (JCoRe Linneaus annotator). And 
> my goal is to connect multiple mentions of Organisms ("mouse and human”) into 
> a single expression for a downstream annotator that is checking the Organism 
> mentions directly in front of gene mentions. However, in the example “mouse 
> and human” it would always detect “human” but disregard “mouse”. So I thought 
> I would create new annotations to “merge” the originals.
>
> Is this how you would do it? Alternatively, I could also have merged the two 
> existing Organism annotations. I would even prefer that. But I would not know 
> how to organize this so that, in the end, instead of two single Organism 
> annotations with two resourceEntries there would be only one Organism 
> annotation with both resourceEntries.


It's hard to tell without taking a closer look.

In general, I find it better to create additional annotations for
complex structures instead of merging the information in an existing
annotation, simply for maintainability reasons. It's easier to
inspect unintended behavior several months later that way ...


>
> So actually, there is one step missing now: I need to replace merged Organism 
> entries with the covering OrganismEnumeration (Person and PersonEnumeration 
> in my example).


I am not sure what the input/output behavior should be. Don't you have
two separate annotations, and isn't the enum the merge of the semantics?

If you can give me an example, I'll write a rule for you :-)



> Is there a way to do this better in RUTA? I have to say that I have not yet 
> fully penetrated the syntax, I would have not been able to come up with the
> // collect ids of all covered Persons using a extra list
> STRINGLIST ids;
> pe:PersonEnumeration{-> pe.personIds = ids}
> <-{p:Person{-> ADD(ids,p.ids.personId)};};


Labels and inlined rules are the two best language features I added in
Ruta, really useful. Let me know if you want to learn more about them
and if there is information missing in the documentation.



Best,


Peter



>
> construction so this enumeration-annotation-merging might actually be easy 
> and I just don’t see it.
>
> Thank you so much!
>
> Erik
>
>> On 10. Jan 2021, at 16:21, Peter Klügl  wrote:
>>
>> Hi,
>>
>>
>> Am 07.01.2021 um 14:55 schrieb Erik Fäßler:
>>> Hi Peter and thank you once again for your excellent support of your 
>>> excellent RUTA software!
>>
>> You are welcome :-)
>>
>>
>>> Your second example was very much what I needed. Thank you so far!
>>> I have one last bump in the road:
>>>
>>> My Person#id feature is an FSArray with ID annotations instead of a plain 
>>> uima.cas.String. So, one Person annotation might have multiple IDs per the 
>>> type system.
>>> The ID type has a feature “entryId”.
>>> In my particular case I actually have only one entry in the id array. 
>>> Still, I need to access this entry somehow.
>>> Is that at all possible in RUTA? I would need something like
>>>
>>>
>>> // collect ids of all cover

Re: RUTA: Copy features into new annotation

2021-01-10 Thread Peter Klügl
Hi,


Am 07.01.2021 um 14:55 schrieb Erik Fäßler:
> Hi Peter and thank you once again for your excellent support of your 
> excellent RUTA software!


You are welcome :-)


>
> Your second example was very much what I needed. Thank you so far!
> I have one last bump in the road:
>
> My Person#id feature is an FSArray with ID annotations instead of a plain 
> uima.cas.String. So, one Person annotation might have multiple IDs per the 
> type system.
> The ID type has a feature “entryId”.
> In my particular case I actually have only one entry in the id array. Still, 
> I need to access this entry somehow.
> Is that at all possible in RUTA? I would need something like
>
>
> // collect ids of all covered Persons using an extra list
> STRINGLIST ids;
> pe:PersonEnumeration{-> pe.personIds = ids}
> <-{p:Person{-> ADD(ids,p.id[0].entryId)};};
>
> This does not seem to be covered by the FeatureExpression grammar in RUTA. Is 
> there a work around? Otherwise I will have to solve it some other way.


there are actually "indexed" expressions like Person.ids[0], but this is
not yet an "official" and stable feature. However, I think it's not even
necessary.


Is your typesystem available somewhere? JCoRe?

Is this a solution for you?


PACKAGE uima.ruta;

// mock types
DECLARE CC, EnumCC;
DECLARE Person (FSArray ids);
DECLARE PersonId (String personId);
DECLARE PersonEnumeration (StringArray personIds);

// mock annotations
"Trump" -> Person;
"Biden" -> Person;
"and" -> CC;
INT counter = 1;
p:Person{-> pid:CREATE(PersonId, "personId" = "id_" + (counter)),
counter = counter +1, p.ids = pid};

(COMMA? @CC){-> EnumCC};

// identify enum span
(Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};

// collect ids of all covered Persons using an extra list
STRINGLIST ids;
pe:PersonEnumeration{-> pe.personIds = ids}
    <-{p:Person{-> ADD(ids,p.ids.personId)};};


Best,


Peter



>
> Many thanks,
>
> Erik
>
>> On 7. Jan 2021, at 10:47, Peter Klügl  wrote:
>>
>> Hi Erik,
>>
>>
>> it depends on how you want to represent the information of the ids of
>> the covered Person annotations. You somehow need to represent the values
>> in the PersonEnumeration annotation. I assume that the ID feature of
>> Person is uima.cas.String? PersonEnumeration could either use one String
>> Feature, a StringArray feature or a FSArray feature (pointing to the
>> Person annotations which provide the IDs).
>>
>> Here are two examples:
>>
>>
>> PACKAGE uima.ruta;
>>
>> // mock types
>> DECLARE CC, EnumCC;
>> DECLARE Person (STRING id);
>> DECLARE PersonEnumeration (FSArray persons);
>>
>> // mock annotations
>> "Trump" -> Person ("id" = "1");
>> "Biden" -> Person ("id" = "2");
>> "and" -> CC;
>>
>> COMMA? @CC{-> EnumCC};
>>
>> // identify enum span
>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>
>> // collect all covered Persons
>> pe:PersonEnumeration{-> pe.persons = Person};
>>
>> 
>>
>> 
>>
>> PACKAGE uima.ruta;
>>
>> // mock types
>> DECLARE CC, EnumCC;
>> DECLARE Person (STRING id);
>> DECLARE PersonEnumeration (StringArray personIds);
>>
>> // mock annotations
>> "Trump" -> Person ("id" = "1");
>> "Biden" -> Person ("id" = "2");
>> "and" -> CC;
>>
>> COMMA? @CC{-> EnumCC};
>>
>> // identify enum span
>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>
>> // collect ids of all covered Persons using an extra list
>> STRINGLIST ids;
>> pe:PersonEnumeration{-> pe.personIds = ids}
>> <-{p:Person{-> ADD(ids,p.id)};};
>>
>>
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> Am 06.01.2021 um 08:29 schrieb Erik Fäßler:
>>> Hello everyone (and a happy new year :-)),
>>>
>>> I have been working on the following issue: Whenever there is conjunction 
>>> in text of two entities (e.g. [...]Biden and Trump ran for president […]) I 
>>> create a new annotation spanning both entities and the conjunction ([Biden 
>>> and Trump]_coordination). I can do this fine.
>>> However, my entities - Biden and Trump - also have the ID feature. The new 
>>> annotation should receive both IDs from the Biden and Trump

Re: RUTA: Copy features into new annotation

2021-01-07 Thread Peter Klügl
Hi Erik,


it depends on how you want to represent the information of the ids of
the covered Person annotations. You somehow need to represent the values
in the PersonEnumeration annotation. I assume that the ID feature of
Person is uima.cas.String? PersonEnumeration could either use one String
feature, a StringArray feature or an FSArray feature (pointing to the
Person annotations which provide the IDs).

Here are two examples:


PACKAGE uima.ruta;

// mock types
DECLARE CC, EnumCC;
DECLARE Person (STRING id);
DECLARE PersonEnumeration (FSArray persons);

// mock annotations
"Trump" -> Person ("id" = "1");
"Biden" -> Person ("id" = "2");
"and" -> CC;

COMMA? @CC{-> EnumCC};

// identify enum span
(Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};

// collect all covered Persons
pe:PersonEnumeration{-> pe.persons = Person};





PACKAGE uima.ruta;

// mock types
DECLARE CC, EnumCC;
DECLARE Person (STRING id);
DECLARE PersonEnumeration (StringArray personIds);

// mock annotations
"Trump" -> Person ("id" = "1");
"Biden" -> Person ("id" = "2");
"and" -> CC;

COMMA? @CC{-> EnumCC};

// identify enum span
(Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};

// collect ids of all covered Persons using an extra list
STRINGLIST ids;
pe:PersonEnumeration{-> pe.personIds = ids}
    <-{p:Person{-> ADD(ids,p.id)};};
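
The effect of the second example (collecting the ids of all Person annotations covered by the enumeration span) can also be sketched in plain Java. This is only an illustration under stated assumptions and does not use the UIMA or Ruta APIs: the class CollectIdsSketch, its nested Person class and the method collectCoveredIds are hypothetical stand-ins, and annotations are reduced to begin/end offsets plus a String id.

```java
import java.util.ArrayList;
import java.util.List;

class CollectIdsSketch {

    /** Hypothetical stand-in for a Person annotation with a String id feature. */
    static final class Person {
        final int begin;
        final int end;
        final String id;

        Person(int begin, int end, String id) {
            this.begin = begin;
            this.end = end;
            this.id = id;
        }
    }

    /**
     * Collects the ids of all persons that lie completely inside the
     * enumeration span [begin, end), in the order of the given list.
     */
    static List<String> collectCoveredIds(int begin, int end, List<Person> persons) {
        List<String> ids = new ArrayList<>();
        for (Person p : persons) {
            if (p.begin >= begin && p.end <= end) {
                ids.add(p.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Text "Trump and Biden": Trump = [0, 5), Biden = [10, 15)
        List<Person> persons = new ArrayList<>();
        persons.add(new Person(0, 5, "1"));
        persons.add(new Person(10, 15, "2"));
        // The enumeration annotation spans the whole phrase [0, 15).
        System.out.println(collectCoveredIds(0, 15, persons));
    }
}
```

The returned list plays the role of the STRINGLIST that is assigned to the personIds feature in the rule above.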




Best,


Peter


Am 06.01.2021 um 08:29 schrieb Erik Fäßler:
> Hello everyone (and a happy new year :-)),
>
> I have been working on the following issue: Whenever there is conjunction in 
> text of two entities (e.g. [...]Biden and Trump ran for president […]) I 
> create a new annotation spanning both entities and the conjunction ([Biden 
> and Trump]_coordination). I can do this fine.
> However, my entities - Biden and Trump - also have the ID feature. The new 
> annotation should receive both IDs from the Biden and Trump annotations. But 
> I couldn’t manage to do this.
>
> I have rules like this:
>
> (Person (
> "," (Person)
>  ","? PennBioIEPOSTag.value=="CC"
>  Person
> ) {->MARK(PersonEnumeration)};
>
> So an enumeration of Persons are covered with a new annotation of type 
> “PersonEnumeration”. And now “PersonEnumeration” should receive all the ID 
> features from the covered Person annotations. How can I do this?
>
> Best,
>
> Erik




Re: Discussing changes to UIMA - on which list?

2020-11-11 Thread Peter Klügl
+1 user list


Peter

Am 11.11.2020 um 18:20 schrieb Richard Eckart de Castilho:
> Hi folks,
>
> I have been working on providing UIMA with a consistent concept of how 
> annotations can relate to each other
> (e.g. overlap, follow, precede, cover, etc.) and following this on making the 
> SelectFS API of UIMAv3
> consistent with this concept. While this might sound trivial, it is actually 
> not. E.g. figuring out in
> which cases two annotations overlap if one of the annotations has a length of 
> 0 requires some
> consideration.
>
> So we have talked about this topic so far on the developer list:
>
>   
> https://lists.apache.org/thread.html/rff2b9882af077907ff1ad08e90f80a62a20efbfab587a08a5e2bc78c%40%3Cdev.uima.apache.org%3E
>
> Now I am wondering whether it would be good to instead post such topics to 
> the users list
> because it might be interesting to people here as well. Or maybe all those 
> who find this
> kind of discussion are already also subscribed to the developers list and 
> thats ok.
>
> What got me thinking into potentially moving this to the users list was that 
> at some points,
> it becomes clear that the behavior of the code (e.g. SelectFS) in some 
> (edge)-cases may not
> have received sufficient consideration in the past and may need to be changed 
> to be consistent
> with itself and with other parts of the framework (e.g. the annotation 
> relation concepts).
> This could potentially break something for a user. I believe I can make good 
> informed decisions
> when a break is so unlikely that the risk is acceptable, but I would actually 
> prefer to get
> some community feedback, e.g. to this mail:
>
>   
> https://lists.apache.org/thread.html/r640c3433db93160f77783896c182b2a8a53334434c337425040bc5d2%40%3Cdev.uima.apache.org%3E
>
> What do you think? 
>
> Would you like to see this topic and maybe similar ones in the future to be 
> discussed on the users list instead of the developers list?
>
> Cheers,
>
> -- Richard




Re: RUTA problem with extended DKPro Type System

2020-07-15 Thread Peter Klügl
(StructuredViewer.java:1205)
>     at
> org.eclipse.jface.util.OpenStrategy.fireDefaultSelectionEvent(OpenStrategy.java:251)
>     at
> org.eclipse.jface.util.OpenStrategy.access$0(OpenStrategy.java:249)
>     at
> org.eclipse.jface.util.OpenStrategy$1.handleEvent(OpenStrategy.java:308)
>     at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:89)
>     at org.eclipse.swt.widgets.Display.sendEvent(Display.java:5676)
>     at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1423)
>     at
> org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:4935)
>     at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:4429)
>     at
> org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine$5.run(PartRenderingEngine.java:1160)
>     at
> org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:338)
>     at
> org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(PartRenderingEngine.java:1049)
>     at
> org.eclipse.e4.ui.internal.workbench.E4Workbench.createAndRunUI(E4Workbench.java:155)
>     at org.eclipse.ui.internal.Workbench.lambda$3(Workbench.java:660)
>     at
> org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:338)
>     at
> org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:559)
>     at
> org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:154)
>     at
> org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:150)
>     at
> org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:203)
>     at
> org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:137)
>     at
> org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:107)
>     at
> org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:401)
>     at
> org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:255)
>     at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
>     at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:657)
>     at org.eclipse.equinox.launcher.Main.basicRun(Main.java:594)
>     at org.eclipse.equinox.launcher.Main.run(Main.java:1465)
>     at org.eclipse.equinox.launcher.Main.main(Main.java:1438)
> Caused by: XCASParsingException: Error parsing XCAS or XMI-CAS from
> source  at line , column : unknown type:
> de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData.
>     at
> org.apache.uima.cas.impl.XmiCasDeserializer$XmiCasDeserializerHandler.createException(XmiCasDeserializer.java:1635)
>     at
> org.apache.uima.cas.impl.XmiCasDeserializer$XmiCasDeserializerHandler.createException(XmiCasDeserializer.java:1657)
>     at
> org.apache.uima.cas.impl.XmiCasDeserializer$XmiCasDeserializerHandler.readFS(XmiCasDeserializer.java:490)
>     at
> org.apache.uima.cas.impl.XmiCasDeserializer$XmiCasDeserializerHandler.startElement(XmiCasDeserializer.java:409)
>     at
> org.apache.uima.util.XmlCasDeserializer$XmlCasDeserializerHandler.startElement(XmlCasDeserializer.java:150)
>     at
> java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:510)
>     at
> java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:183)
>     at
> java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:351)
>     at
> java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2710)
>     at
> java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
>     at
> java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
>     at
> java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
>     at
> java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
>     at
> java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
>     at
> java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
>     at
> java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
>     at
> org.apache.uima.util.XmlCasDeserializer.deserializeR(XmlCasDeserializer.java:111)
>     at org.apache.uima.util.CasIOUtils.load(CasIOUtils.java:366)
>     ... 144 more
>
>
>



UIMA v2 CAS deserialization in UIMA v3

2020-04-08 Thread Peter Klügl
Hi,


I currently deserialize some UIMA v2 CAS files in UIMA v3 while the
typesystems have evolved considerably.

I faced some problems, but managed to do what I wanted.


I just wanted to thank you Marshall for the great work you did and do :-)


Best,


Peter



-- 
Dr. Peter Klügl
R&D Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.klu...@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó



Re: UIMA RUTA NPE in RutaLiteralMatcher (UIMA-6195)

2020-03-19 Thread Peter Klügl
Hi,


sorry for the delayed/missing response to the ticket. I was too busy,
but it was on my todo list for this week.


The bug should already be fixed in the current snapshot version. I will
try to find the time to prepare a maintenance release ASAP.


Best,


Peter


Am 19.03.2020 um 08:53 schrieb Dominic Jehle:
> Hi,
> We've been using UIMA and Ruta in production in a machine-translation related 
> project for a while, but during the update to Ruta we've encountered the NPE 
> bug described in UIMA-6195 [https://issues.apache.org/jira/browse/UIMA-6195]. 
> It's a critical issue for our project, the bug blocks us from completing the 
> update. 
> There has been no documented Jira activity on the issue. Can the issue be 
> worked on and corrected? Is it possible to supply a patch from our side to 
> help?
> If it will be fixed, could there also be a release soon?
> Thanks!




Re: Ruta: Loading ressources from classpath?

2020-03-05 Thread Peter Klügl
Ok :-)

Am 05.03.2020 um 14:41 schrieb Erik Fäßler:
> Hi again,
>
> so I found that one can actually load resources from the classpath. I didn’t 
> realize that I needed to provide the path directly in the script, e.g.
> WORDLIST GreekList = "ruta/resources/GreekAlphabet.txt";
> Instead, I tried to let the resourcePaths point to classpath locations, which 
> didn’t work.
>
> So sorry to bother, I got it now :-)
>
>> On 5. Mar 2020, at 12:56, Peter Klügl  wrote:
>>
>> Hi Erik,
>>
>>
>> I thought classpath lookup should be possible. I'll check that and get
>> back to you (probably tomorrow)
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> Am 04.03.2020 um 16:10 schrieb Erik Fäßler:
>>> I have created a Ruta script and now I want to use it as a UIMA AE. Thanks 
>>> to the Maven plugin, I created the AE descriptor which I could get to work 
>>> easily.
>>> I have packaged the whole component into a single JAR which I then use via 
>>> the Maven dependency mechanism. This works fine until I try to use a word 
>>> list. It seems that while script and descriptor files can be loaded from a 
>>> classpath location, this does not work for resources.
>>>
>>> Right now I have the impression that I would have to deliver not only file 
>>> paths but even absolute file paths as the JavaDocs suggest.
>>>
>>> Would it be possible to change this and allow a classpath lookup like for 
>>> the script file? I would really like to just package everything into a JAR 
>>> and avoid a non-portable installation due to absolute paths.
>>>
>>> Are there other possibilities to achieve portability when using word lists?
>>>
>>> Thanks,
>>>
>>> Erik
>
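Note on the classpath approach quoted above: a path such as "ruta/resources/GreekAlphabet.txt" stays portable because it is resolved against the classpath (and therefore also inside a packaged JAR) rather than against the file system. The following is a minimal Java sketch of such a lookup in general. It is not Ruta's own resource resolution code, and the class name ClasspathWordList is made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: generic classpath lookup, not part of UIMA Ruta. */
final class ClasspathWordList {

    private ClasspathWordList() {}

    /**
     * Reads one entry per line from a classpath resource, for example
     * "ruta/resources/GreekAlphabet.txt". Returns an empty Optional when
     * no resource with that name is visible to the class loader.
     */
    static Optional<List<String>> load(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathWordList.class.getClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(resourcePath)) {
            if (in == null) {
                return Optional.empty(); // nothing with that name on the classpath
            }
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            List<String> entries = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                String entry = line.trim();
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
            return Optional.of(entries);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ClasspathWordList.load("ruta/resources/GreekAlphabet.txt") would return the list entries if the file is packaged under that path; an absolute file path is never needed.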



Re: Ruta: Loading ressources from classpath?

2020-03-05 Thread Peter Klügl
Hi Erik,


I thought classpath lookup should be possible. I'll check that and get
back to you (probably tomorrow)


Best,


Peter


Am 04.03.2020 um 16:10 schrieb Erik Fäßler:
> I have created a Ruta script and now I want to use it as a UIMA AE. Thanks to 
> the Maven plugin, I created the AE descriptor which I could get to work 
> easily.
> I have packaged the whole component into a single JAR which I then use via 
> the Maven dependency mechanism. This works fine until I try to use a word 
> list. It seems that while script and descriptor files can be loaded from a 
> classpath location, this does not work for resources.
>
> Right now I have the impression that I would have to deliver not only file 
> paths but even absolute file paths as the JavaDocs suggest.
>
> Would it be possible to change this and allow a classpath lookup like for the 
> script file? I would really like to just package everything into a JAR and 
> avoid a non-portable installation due to absolute paths.
>
> Are there other possibilities to achieve portability when using word lists?
>
> Thanks,
>
> Erik




Re: Erratic nullpointer exceptions because feature structure has no type in Ruta

2020-02-04 Thread Peter Klügl
Hi,


thank you very much for this analysis! I am glad that your problem is
hopefully resolved.


The missing reset of the annotation-based variables was considered a bug
(UIMA-5888) and was resolved in Ruta 2.7.0.

For Ruta 2.8.0, there were several improvements for handling null
values, e.g., UIMA-5663 and UIMA-5481.


I have to check the code, but I think null as an initial value should be
the default now, at least for annotation and string variables.
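The effect described in the quoted message below can be pictured with a small Java model. This is only a sketch under stated assumptions: the Script class is hypothetical, a plain String stands in for an annotation variable, and the flag resetPerDocument stands in for the reset that, according to this thread, is missing in Ruta 2.6.1 and present in later versions. It does not use any UIMA or Ruta API.

```java
import java.util.List;

/** Hypothetical model of a variable that survives from one document to the next. */
final class StaleVariableDemo {

    private StaleVariableDemo() {}

    static final class Script {
        private final boolean resetPerDocument;
        private String firstNumber; // plays the role of an annotation variable

        Script(boolean resetPerDocument) {
            this.resetPerDocument = resetPerDocument;
        }

        /** Processes one document and returns the variable's value afterwards. */
        String process(List<String> tokens) {
            if (resetPerDocument) {
                firstNumber = null; // fresh state for every new document
            }
            for (String token : tokens) {
                if (token.matches("\\d+")) {
                    firstNumber = token; // the "rule" fires and assigns the variable
                    break;
                }
            }
            // If no token matched, the value is whatever was stored before.
            return firstNumber;
        }
    }
}
```

Without the reset, a second document that never triggers the assignment still sees the value from the first document. In the real setting that stale value is a reference into the previous CAS, which is what led to the CasRuntimeException reported here.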


Best,


Peter


Am 05.02.2020 um 00:02 schrieb Mario Juric:
> Hi,
>
> I would like to follow up on this one, which I posted a while ago, since the
> cause of the issue has finally been identified.
>
> It turns out that it was caused by uninitialised Ruta annotation
> variables. Ruta doesn’t seem to allow an annotation variable to be set
> null at the point of declaration or later, so we assumed that it would
> implicitly be initialised to null during script execution for every
> new CAS, but this is not the case in Ruta 2.6.1. The variable would
> hold a reference to an annotation from the previous CAS when it hasn’t
> been assigned a new value from the new CAS, because the rule that
> makes the assignment is never fired.
>
> I have illustrated this using the attached example code where a
> CasRuntimeException is the result with Ruta 2.6.1, but the problem
> seems solved in Ruta 2.8.0. This exception could in our full setup
> sometimes trigger a NullPointerException during exception handling,
> which is what I first reported. I assume the problem would be gone in
> Ruta 3.0.0 as well.
>
> I think the matter can be considered closed, although we are not
> through with all tests yet. I wonder though if it would generally be
> useful to allow the null value to be assigned to annotation variables.
>
>
> Cheers,
> Mario
>
>> On 11 Nov 2019, at 22:19, Mario Juric <m...@unsilo.ai> wrote:
>>
>> Hi Peter,
>>
>> Ruta version is 2.7.0 with UIMA version 2.10.2 in this setup.
>>
>> To the best of my knowledge, we are not trying to access any internal Ruta
>> objects, and nothing makes any covered text assignments. 
>>
>> Cheers Mario
>>
>>> On 11 Nov 2019, at 11:13, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>> and which version of UIMA Ruta?
>>>
>>>
>>> Do you access internal Ruta objects somehow?
>>>
>>> Do you (by accident) try to assign covered text?
>>>
>>>
>>>
>>> Best,
>>>
>>>
>>> Peter
>>>
>>>
>>>
>>> Am 11.11.2019 um 10:23 schrieb Richard Eckart de Castilho:
>>>> Hi Mario,
>>>>
>>>> which version of the UIMA Java SDK are you using?
>>>>
>>>> -- Richard
>>>>
>>>>> On 11. Nov 2019, at 09:58, Mario Juric <m...@unsilo.ai> wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> A while ago we started to get some erratic null pointer exceptions
>>>>> from Ruta because the type of some feature structure element is
>>>>> null (see stack trace below). The error is not consistently
>>>>> reproducible; in fact, it rarely occurs, and when reprocessing the
>>>>> document it doesn’t happen again. We therefore think there are
>>>>> some race conditions at play when running in a multithreaded
>>>>> environment as we do in production, and I was hoping that maybe
>>>>> you would get an idea what might be causing it just by looking at
>>>>> the stack trace.
>>>>>
>>>>> Cheers
>>>>> Mario
>>>
>>
>



Re: Uima-AS 3 ?

2020-01-07 Thread Peter Klügl
Hi,


may I ask what the relationship between UIMA-AS and UIMA-DUCC is?

I always thought that DUCC uses/builds on UIMA-AS, and there is already
DUCC v3.


Best,


Peter


Am 07.01.2020 um 15:52 schrieb Jaroslaw Cwiklik:
> Hi Matthias, I will start working on v3 release soon.
> Jerry
>
> On Tue, Jan 7, 2020 at 9:28 AM koch  wrote:
>
>> Hi
>>
>> Is it planned to release v3 of uima-as ?
>>
>> best regards,
>>
>> Matthias
>>
>>
>>
>>



Re: Erratic nullpointer exceptions because feature structure has no type in Ruta

2019-11-12 Thread Peter Klügl
odule.java:56)
>>> at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:561)
>>> at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
>>> at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:401)
>>> at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:318)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
>>> at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:271)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
>>> at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:271)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
>>> at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:271)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
>>> at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
>>> at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:271)
>>> at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
>>> at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:895)
>>> at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:575)
>>>
>




Re: Erratic nullpointer exceptions because feature structure has no type in Ruta

2019-11-11 Thread Peter Klügl
Hi,


and which version of UIMA Ruta?


Do you access internal Ruta objects somehow?

Do you (by accident) try to assign covered text?



Best,


Peter



Am 11.11.2019 um 10:23 schrieb Richard Eckart de Castilho:
> Hi Mario,
>
> which version of the UIMA Java SDK are you using?
>
> -- Richard
>
>> On 11. Nov 2019, at 09:58, Mario Juric  wrote:
>>
>> Hi Peter,
>>
>> A while ago we started to get some erratic null pointer exceptions from Ruta 
>> because the type of some feature structure element is null (see stack trace 
>> below). The error is not consistently reproducible; in fact, it rarely 
>> occurs and when reprocessing the document it doesn’t happen again. We 
>> therefore think there are some race conditions at play when running in a 
>> multithreaded environment as we do in production, and I was hoping that 
>> maybe you would get an idea what might be causing it just by looking at the 
>> stack trace.
>>
>> Cheers
>> Mario




Re: Extract text from line below/above annotated keyword using RUTA

2019-11-06 Thread Peter Klügl
Hi,


here are some quick rules. It could be solved with fewer rules, and also
with better or faster ones. You essentially need one rule for detecting
the structure and one rule for assigning the semantics. The rules would
also work if you have a plain text table with more rows.


Let me know if you have questions about some parts.


Best,

Peter

TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;

DECLARE Header;
DECLARE ColumnDelimiter;
DECLARE Cell(INT column);

DECLARE Keyword (STRING label);
DECLARE Keyword UnderWriterNameKeyword, AppraiserNameLicenseKeyword, AppraisalCompanyNameKeyword;

"Underwriter's Name" -> UnderWriterNameKeyword ( "label" = "UnderWriter Name");
"Appraiser's Name/License" -> AppraiserNameLicenseKeyword ( "label" = "Appraiser Name");
"Appraisal Company Name" -> AppraisalCompanyNameKeyword ( "label" = "Appraisal Company Name");

DECLARE Entry(Keyword keyword);

EXEC(PlainTextAnnotator, {Line,Paragraph});

ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
Paragraph{->TRIM(WS)};

SPACE[3,100]{-PARTOF(ColumnDelimiter) -> ColumnDelimiter};
Line -> {ANY+{-PARTOF(Cell),-PARTOF(ColumnDelimiter) -> Cell};};
REMOVERETAINTYPE(WS);

INT index = 0;
BLOCK(structure) Line{}{
    ASSIGN(index, 0);
    Line{STARTSWITH(Paragraph) -> Header};
    c:Cell{-> c.column = index, index = index + 1};
}

Header<-{hc:Cell{hc.column == c.column}<-{k:Keyword;};}
    # c:@Cell{-PARTOF(Header) -> e:Entry, e.keyword = k};

DECLARE Entity (STRING label, STRING value);
DECLARE Entity UnderWriterName, AppraiserNameLicense, AppraisalCompanyName;

FOREACH(entry) Entry{}{
    entry{ -> CREATE(UnderWriterName, "label" = k.label, "value" = entry.ct)}<-{k:entry.keyword{PARTOF(UnderWriterNameKeyword)};};
    entry{ -> CREATE(AppraiserNameLicense, "label" = k.label, "value" = entry.ct)}<-{k:entry.keyword{PARTOF(AppraiserNameLicenseKeyword)};};
    entry{ -> CREATE(AppraisalCompanyName, "label" = k.label, "value" = entry.ct)}<-{k:entry.keyword{PARTOF(AppraisalCompanyNameKeyword)};};
}
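
The column idea behind the rules above (cells separated by runs of at least three spaces, each cell numbered within its line, and a data cell paired with the header cell of the same column) can also be sketched in plain Java. This is only an illustration of the approach outside of Ruta; the class PlainTextTable and its methods are hypothetical and do not correspond to any UIMA or Ruta API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper mirroring the column logic of the Ruta rules. */
final class PlainTextTable {

    private PlainTextTable() {}

    /** Splits a line into cells at runs of three or more spaces. */
    static List<String> cells(String line) {
        List<String> cells = new ArrayList<>();
        for (String cell : line.trim().split(" {3,}")) {
            if (!cell.isEmpty()) {
                cells.add(cell);
            }
        }
        return cells;
    }

    /** Pairs every header cell with the data cell in the same column. */
    static Map<String, String> entries(String headerLine, String dataLine) {
        List<String> header = cells(headerLine);
        List<String> data = cells(dataLine);
        Map<String, String> result = new LinkedHashMap<>();
        for (int column = 0; column < header.size() && column < data.size(); column++) {
            result.put(header.get(column), data.get(column));
        }
        return result;
    }
}
```

As with the rules, this relies on the columns being separated by at least three spaces; single or double spaces inside a cell, as in "Alice Wheaton", do not split it.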



Am 06.11.2019 um 12:45 schrieb Shashank Pathak:
> Hi Peter,
>
> I am trying to get information from an indented text file.
>
> Input file text:
> Underwriter's Name    Appraiser's Name/License    Appraisal Company Name
> Alice Wheaton         Bruce Banner                Stark Industries
>
> Approach:
> I am trying to annotate fixed keywords like "Underwriter's Name" and
> then go to the line following the annotated keyword.
> But I am not able to fetch only the underwriter's name. The rule returns all
> matching instances (Alice Wheaton Bruce, Wheaton Bruce Banner, etc.).
>
>
> Code :
>
> TYPESYSTEM utils.PlainTextTypeSystem;
> ENGINE utils.PlainTextAnnotator;
>
> EXEC(PlainTextAnnotator, {Line});
> ADDRETAINTYPE(WS);
> Line{->TRIM(WS)};
> REMOVERETAINTYPE(WS);
> Document{->FILTERTYPE(SPECIAL)};
>
> DECLARE UnderWriterKeyword, NameKeyword, UnderWriterNameKeyword;
> DECLARE UnderWriterName(String label, String value);
>
> CW{REGEXP("\\bUnderwriter") -> UnderWriterKeyword};
> CW{REGEXP("Name")->NameKeyword};
> (UnderWriterKeyword SW NameKeyword){->UnderWriterNameKeyword};
> Line{CONTAINS(UnderWriterNameKeyword)} Line -> {
>     n:CW[1,3]{-> CREATE(UnderWriterName, "label"="UnderWriter Name", "value"=n.ct)};
> };
>
> Please tell me whether it is possible to achieve this using RUTA.
> Please also share the steps to get Underwriter's Name, Appraiser's Name/License and
> Appraisal Company Name.
> I have already posted a similar question on Stack Overflow:
> https://stackoverflow.com/questions/58726610/using-ruta-get-a-data-present-in-next-line-of-annotated-keyword/58728364#58728364
>
> Thanks,
>
> Shashank Pathak
>



Using DKPro Core in UIMA Ruta example

2019-11-04 Thread Peter Klügl
Hi,


I got a few requests for a running example combining Ruta and DKPro Core
components lately. Therefore, I finally took some time to update the
example project on GitHub:

https://github.com/pkluegl/ruta-examples/tree/master/ruta-german-novel-with-dkpro


The example is on GitHub and not here at Apache since it has a GPL
dependency (Stanford CoreNLP for POS Tagging).

It's just an example project highlighting how one could use DKPro Core
components in Ruta, but it is not at all a serious solution for the task
itself.

The rules are still very old-fashioned/outdated (I'll clean them up when
I have time), but the versions are compatible again; UIMA v2 only right now.


Best,


Peter





Re: Question about covering annotations in Ruta match semantics

2019-10-22 Thread Peter Klügl
Hi,

Am 21.10.2019 um 21:46 schrieb Mario Juric:
> Thanks Peter,
>
> No problem with the delay. I was on vacation myself, and sometimes it is just 
> necessary to pull the plug :)
>
> I am just happy that you take the time to answer my questions, and I think 
> your answers help make sense of this. I now have some ideas that I can 
> experiment with to see what works, but it’s possible to use RutaBasic when 
> optional spaces are included in the rules, although it gets more awkward. I 
> would still prefer to avoid this, and having a type-based rule-logic feature 
> would make sense in our case. Shall I create a feature request for this?


Yes, please create a ticket. Even specifying what should be done helps,
especially including more use cases than my own...


Best,


Peter


>
> I wouldn’t expect you to do this any time soon, but let me know if there is 
> something I could help out with when the time comes.
>
> Cheers,
> Mario
>
>
>> On 18 Oct 2019, at 10:10 , Peter Klügl  wrote:
>>
>> Hi,
>>
>>
>> sorry for the delayed reply.
>>
>>
>> comments below...
>>
>>
>> Am 09.10.2019 um 22:19 schrieb Mario Juric:
>>> Hi Peter,
>>>
>>> Thanks a lot for the answer.
>>>
>>> I am still trying to wrap my head around this, and I understand the issues 
>>> at play when dealing with a generic rule engine, since I am looking at an 
>>> isolated case only. I was just thinking that in my particular case the 
>>> covering annotation starts before matching 'Dog Cat’, so why would its 
>>> ending right before Cat prevent the rule from firing? It doesn’t follow 
>>> Dog, and a rule like “Dog Covering {->MARK(CHASE)}” therefore wouldn’t be 
>>> matched either, but I understand now that the mere presence of something 
>>> else in the area between the two rule elements is enough for the 
>>> match to fail. However, as you describe, the presence of SPACE annotations 
>>> and a rule like Dog SPACE Cat { -> MARK(CHASE)} would succeed in matching 
>>> despite the presence of the covering annotation.
>>
>> The main thing here is probably the requirement that the logic for
>> applying the visibility concept should always be symmetric, meaning it
>> should be the same regardless if the rule matches from left to right or
>> from right to left (or inside out).
>>
>> In your example, the rule matches from left to right (I assume), so the
>> behavior that the last space is not skipped is not intuitive at all.
>> However, if the rule matched from right to left for some reason,
>> e.g., because of dynamic anchoring or a manual anchor, then the
>> inference would detect a starting Covering annotation as the next
>> possible position, which is not invisible (since nothing at all is
>> invisible). So there would actually be something that could be matched,
>> but it is not the correct type (Dog).
>>
>> I do not know if this explanation makes sense... it's easier with a
>> whiteboard ;-)
>>
>>
>>
>>> Have you ever described the implementation of the matching in some paper or 
>>> similar? I would be interested to have a look at it, but maybe it’s better 
>>> just to have a go at the code? I would certainly prefer reading a high 
>>> level abstract specification first though :)
>>
>> The last paper is the NLE journal article, which contains a high-level
>> description of the algorithm. However, this is some really
>> specific functionality for a specific scenario. So, if I write a new
>> paper, it will most likely not cover this.
>>
>>
>>> Generally I cannot just trim the annotations in the real application, since 
>>> some of these whitespaces are included in the marking for various reasons. 
>>> I therefore played around with type filtering, since I was hoping that the 
>>> type filter would allow me to match the rules while ignoring any presence 
>>> of filtered types. I was again surprised to find out that filtering the 
>>> Covering type while retaining Cat and Dog would in this case just prevent 
>>> anything from being matched, because it seems to make all those text parts 
>>> invisible where the filtered types appear, no matter if they cover any 
>>> retained annotation types. So this didn’t seem to solve my problem either, 
>>> although I could of course try to mark those areas I otherwise would 
>>> consider trimming and include those in the rules like a space or filter on 
>>

Re: Question about covering annotations in Ruta match semantics

2019-10-18 Thread Peter Klügl
Hi,


sorry for the delayed reply.


comments below...


Am 09.10.2019 um 22:19 schrieb Mario Juric:
> Hi Peter,
>
> Thanks a lot for the answer.
>
> I am still trying to wrap my head around this, and I understand the issues at 
> play when dealing with a generic rule engine, since I am looking at an 
> isolated case only. I was just thinking that in my particular case the 
> covering annotation starts before matching 'Dog Cat’, so why would its ending 
> right before Cat prevent the rule from firing? It doesn’t follow Dog, and a 
> rule like “Dog Covering {->MARK(CHASE)}” therefore wouldn’t be matched 
> either, but I understand now that the mere presence of something else 
> in the area between the two rule elements is enough for the match to 
> fail. However, as you describe, the presence of SPACE annotations and a rule 
> like Dog SPACE Cat { -> MARK(CHASE)} would succeed in matching despite the 
> presence of the covering annotation.


The main thing here is probably the requirement that the logic for
applying the visibility concept should always be symmetric, meaning it
should be the same regardless if the rule matches from left to right or
from right to left (or inside out).

In your example, the rule matches from left to right (I assume), so the
behavior that the last space is not skipped is not intuitive at all.
However, if the rule matched from right to left for some reason,
e.g., because of dynamic anchoring or a manual anchor, then the
inference would detect a starting Covering annotation as the next
possible position, which is not invisible (since nothing at all is
invisible). So there would actually be something that could be matched,
but it is not the correct type (Dog).

I do not know if this explanation makes sense... it's easier with a
whiteboard ;-)



> Have you ever described the implementation of the matching in some paper or 
> similar? I would be interested to have a look at it, but maybe it’s better 
> just to have a go at the code? I would certainly prefer reading a high level 
> abstract specification first though :)


The last paper is the NLE journal article, which contains a high-level
description of the algorithm. However, this is some really
specific functionality for a specific scenario. So, if I write a new
paper, it will most likely not cover this.


>
> Generally I cannot just trim the annotations in the real application, since 
> some of these whitespaces are included in the marking for various reasons. I 
> therefore played around with type filtering, since I was hoping that the type 
> filter would allow me to match the rules while ignoring any presence of 
> filtered types. I was again surprised to find out that filtering the Covering 
> type while retaining Cat and Dog would in this case just prevent anything 
> from being matched, because it seems to make all those text parts invisible 
> where the filtered types appear, no matter if they cover any retained 
> annotation types. So this didn’t seem to solve my problem either, although I 
> could of course try to mark those areas I otherwise would consider trimming 
> and include those in the rules like a space or filter on them, which I guess 
> is what you suggested. It suddenly just becomes somewhat awkward though, and 
> it may just be more clear to use RutaBasic with the rules instead.


Yes, the visibility concept in Ruta is not type-based but type
coverage-based (and I think that's really cool)

It is possible to extend the functionality to additionally support
type-based logic, but I do not know when this would be ready.

I would not recommend using RutaBasic in the rules (I actually do not
know right now if it would work), but if you do, then you should
probably deactivate the "empty is invisible" option.


Best,


Peter


>
> Cheers,
> Mario
>
>> On 9 Oct 2019, at 09:35 , Peter Klügl  wrote:
>>
>> Hi Mario,
>>
>>
>> I need to take a closer look as this is not the usual scenario :-)
>>
>>
>> However, without testing, I would assume that the second rule does not
>> match because the space between dog and cat is not "empty".
>>
>>
>> Normally, you have a complete partitioning provided by the seeding which
>> causes the RutaBasic annotations. If there are only a few annotations,
>> then there needs to be a decision if a text position is visible or not
>> (as you have no SPACE, BREAK and MARKUP annotation). You would expect
>> that the space between the annotations is ignored, but there is actually
>> no reason why Ruta should do that, as there is no information at all
>> that it should be ignored (... generic system, you might want to write
>> rules for

Re: Question about covering annotations in Ruta match semantics

2019-10-09 Thread Peter Klügl
Hi Mario,


I need to take a closer look as this is not the usual scenario :-)


However, without testing, I would assume that the second rule does not
match because the space between dog and cat is not "empty".


Normally, you have a complete partitioning provided by the seeding which
causes the RutaBasic annotations. If there are only a few annotations,
then there needs to be a decision if a text position is visible or not
(as you have no SPACE, BREAK and MARKUP annotation). You would expect
that the space between the annotations is ignored, but there is actually
no reason why Ruta should do that, as there is no information at all
that it should be ignored (... generic system, you might want to write
rules for whitespaces...). To avoid this problem in such
situations, there is the option to define empty RutaBasics as invisible.
Those are text positions where no annotation begins or ends (and which are
not covered by annotations), AFAIR, and sequential matching could not match
there at all anyway. Thus, the first space is ignored, but not the second,
because the Covering annotation ends there.


Does that make sense?
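
To make the case from the question concrete, here is a small Java model of the "empty" positions. It rests on a simplifying assumption that is not taken from the Ruta source code: the text is cut at every annotation boundary, and a segment counts as empty when no annotation begins at its start and none ends at its end. The "not covered by annotations" condition, recalled above only from memory, is deliberately left out, because this simpler reading is the one that reproduces the behavior reported in this thread. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical, simplified model of "empty" segments between annotation boundaries. */
final class EmptySegmentModel {

    private EmptySegmentModel() {}

    static final class Span {
        final String type;
        final int begin;
        final int end; // exclusive, as in Cat[0,3[

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Returns the empty segments, formatted like "[3,4[", in text order. */
    static List<String> emptySegments(List<Span> annotations) {
        TreeSet<Integer> boundaries = new TreeSet<>();
        for (Span a : annotations) {
            boundaries.add(a.begin);
            boundaries.add(a.end);
        }
        List<String> result = new ArrayList<>();
        Integer previous = null;
        for (Integer boundary : boundaries) {
            if (previous != null && isEmpty(previous, boundary, annotations)) {
                result.add("[" + previous + "," + boundary + "[");
            }
            previous = boundary;
        }
        return result;
    }

    /** Assumed rule: nothing begins at the segment start and nothing ends at its end. */
    private static boolean isEmpty(int begin, int end, List<Span> annotations) {
        for (Span a : annotations) {
            if (a.begin == begin || a.end == end) {
                return false;
            }
        }
        return true;
    }
}
```

With Cat[0,3[, Dog[4,7[, Cat[8,11[ and Covering[0,8[, only the segment [3,4[ is empty in this model, so only the first space could be skipped. With Covering[0,7[ instead, [7,8[ becomes empty as well, which fits the observation that the second rule then matches.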


I think there are many options for making your rules more robust, but
that depends on your complete system/pipeline. Is it an option to trim
annotations in order to avoid whitespaces at the beginning or ending? Is
it easy to identify these positions? You could create an annotation
there and filter that type.



Best,


Peter



Am 07.10.2019 um 10:21 schrieb Mario Juric:
> Hi Peter,
>
> I have a script that is executed without any seeders for performance reasons, 
> and we don’t need the seeded annotations in that case. I have an issue 
> involving annotation elements that partially cover the rule elements of 
> interest, and I do not have a simple solution for it, so I have a question 
> about the match semantics. Let me explain it using a simple example and the 
> text ‘cat dog cat’.
>
> Assume the following 4 annotation types and 2 rule statements:
>
> DECLARE Covering;
> DECLARE Cat;
> DECLARE Dog;
> DECLARE CHASE;
> Cat Dog { -> MARK(CHASE)};
> Dog Cat { -> MARK(CHASE)};
> Assume prior to script execution the following annotations with beginnings 
> and endings:
>
> Cat[0,3[
> Dog[4,7[
> Cat[8,11[
> Covering[0,8[
>
> The Covering annotation is an example of the disturbing element that I 
> observed, which has nothing or little to do with what I am trying to match. 
> It just happens to be there for a reason unrelated to these rules, but it 
> causes the second rule not to match when I expected it. Only the first rule 
> fires, but the second will also fire when I change Covering bounds to [0,7[ 
> though.
>
> The order in which elements are matched seems very different from how they 
> are usually selected from the CAS index, where you would get 'Covering Cat 
> Dog Cat’, and with this order you would intuitively expect both rules to 
> match. This would probably be overly simplified though, since I would not be 
> able to match adjacent covering annotations this way, so I believe matching 
> is somehow based on edge detection. Still, I have difficulty understanding 
> why that extra covering space makes a difference.
>
> I was hoping you could provide me with some details, and I would also like to 
> know what possible workaround options I have. I was considering playing around 
> with type filtering, but it would require a bit of adding/removing types to 
> be filtered during the script, so it didn’t seem like the simplest solution. 
> Ensuring that covering always aligns with the end of a token is another 
> possibility in this particular case, but I still need to add general 
> robustness to the Ruta script against these scenarios. Any feedback is much 
> appreciated, thanks :)
>
> Cheers,
> Mario
>



Re: Ruta 2.7.0 SeedLexer issue with special unicode characters

2019-09-26 Thread Peter Klügl
Hi,


there is no reason.


I think there are two places, as far as I remember. One is in the test utils,
which can be replaced, and one is in the generated code of the jflex lexer,
where I do not know if it can be replaced. Maybe a newer version of jflex
avoids this? Or it could be caught and wrapped in an exception...


Do you want to open an issue for this?
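
For illustration, the "catch and wrap" idea could look roughly like the following Java sketch. It assumes that the lexer signals the problem with a plain java.lang.Error, as described in the quoted message below; the class LexerErrorGuard and the exception type DocumentProcessingException are hypothetical and not part of Ruta or JFlex. Subclasses such as OutOfMemoryError are deliberately passed through untouched.

```java
import java.util.function.Supplier;

/** Hypothetical guard that turns a plain Error from a lexer call into an exception. */
final class LexerErrorGuard {

    private LexerErrorGuard() {}

    /** Hypothetical exception meaning "this one document failed". */
    static final class DocumentProcessingException extends RuntimeException {
        private static final long serialVersionUID = 1L;

        DocumentProcessingException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Runs the lexer call and wraps a plain java.lang.Error; real JVM errors pass through. */
    static <T> T guard(Supplier<T> lexerCall) {
        try {
            return lexerCall.get();
        } catch (Error e) {
            if (e.getClass() != Error.class) {
                throw e; // e.g. OutOfMemoryError, StackOverflowError, LinkageError
            }
            throw new DocumentProcessingException("Lexer failed: " + e.getMessage(), e);
        }
    }
}
```

A caller could then catch DocumentProcessingException per document and continue with the next one, instead of having the whole runtime exit.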


Best,


Peter

Am 25.09.2019 um 20:32 schrieb Mario Juric:
> Hi Peter,
>
> Just one more thing that came to my mind. Is there a particular reason for 
> throwing a java.lang.Error instead of an exception?
>
> Normally that is something only thrown by the JVM when it’s really impossible 
> to continue the process, e.g. out of memory, linkage errors or fatal VM 
> failures. It is normally not meant to be caught so our UIMA runtime 
> environment exits because of this, although it’s not a big issue when we run 
> the process as a service since it is then restarted automatically. I just 
> thought it’s maybe a bit drastic behaviour when only the document in question 
> needs to fail.
>
> Cheers,
> Mario
>
>> On 23 Sep 2019, at 09:48 , Mario Juric  wrote:
>>
>> Thanks Peter,
>>
>> I will await your confirmation of the fix, but I guess we will then stick 
>> with 2.6.1 until the next Ruta release :)
>>
>> Cheers,
>> Mario
>>> On 20 Sep 2019, at 18:09 , Peter Klügl <peter.klu...@averbis.com> wrote:
>>>
>>> Hi Mario,
>>>
>>>
>>> I did not have the chance to have a look at your example yet...
>>>
>>>
>>> Most likely, this problem is already fixed in the current trunk, but I
>>> was not able to find the time for a new release. In 2.7.0, there was a
>>> small modification in the lexer rules for the seeding, which
>>> unfortunately had some unintended side effects in the generated code,
>>> especially with unusual unicode characters. I'll try to verify that with
>>> your example in the next few days.
>>>
>>>
>>> Best,
>>>
>>>
>>> Peter
>>>
>>> Am 19.09.2019 um 12:35 schrieb Mario Juric:
>>>> Hi Peter,
>>>>
>>>> After upgrading to Ruta 2.7.0 a while ago we started getting some
>>>> errors from the SeedLexer, which we didn’t have before. It appears
>>>> related to odd unicode characters that we haven’t cleaned properly
>>>> upstream, but it is consumed by the previous version 2.6.1 where our
>>>> pipeline completes without error. I attached a small sample program
>>>> with a dummy ruta script to reproduce it.
>>>>
>>>> Which version has the correct behaviour in such cases? 2.7.0 or 2.6.1?
>>>>
>>>>
>>>> Cheers,
>>>> Mario



Re: a question about REGEXP

2019-09-20 Thread Peter Klügl
Hi,


the REGEXP condition is only a boolean function without "side effects".


You could solve your use case in Ruta with simple regex rules. You need
to use some rules which do not depend on annotations for matching in
order to create smaller annotations. Something like:

DECLARE ThirdDigitInNum;

NUM -> {"^\\d\\d(\\d)" -> 1 = ThirdDigitInNum;};
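
For illustration outside of Ruta, here is a minimal Python sketch of the same idea: the pattern is matched against the text of the number itself, and only the span of capture group 1 is kept. The helper name third_digit_span is made up for this example and is not part of Ruta; the offsets are relative to the number text, not to the document.

```python
import re

# Same pattern as in the Ruta rule above; group 1 is the third digit.
THIRD_DIGIT = re.compile(r"^\d\d(\d)")


def third_digit_span(number_text):
    """Return (begin, end, text) of the third digit, or None if there is none.

    Hypothetical helper for illustration only.
    """
    match = THIRD_DIGIT.search(number_text)
    if match is None:
        return None
    return match.start(1), match.end(1), match.group(1)


print(third_digit_span("12345"))  # (2, 3, '3')
print(third_digit_span("12"))     # None
```

The point is the same as in the rule: the condition alone only says whether the pattern matches, while the group offsets are what you need in order to create the smaller annotation.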


Best,


Peter

Am 20.09.2019 um 10:21 schrieb B. Li:
> Hi All,
>
>
> I have a question about REGEXP. I would like to extract a digit (e.g. the 
> third one) in a number (NUM). Could I use REGEXP to get the result of a 
> matched group (something like NUM{REGEXP("^\\d\\d(\\d)")})?
>
>
> Any hint would be greatly appreciated! Thanks in advance!
>
>
> Baoli
>
>
>



Re: Ruta 2.7.0 SeedLexer issue with special unicode characters

2019-09-20 Thread Peter Klügl
Hi Mario,


I did not have the chance to have a look at your example yet...


Most likely, this problem is already fixed in the current trunk, but I
was not able to find the time for a new release. In 2.7.0, there was a
small modification in the lexer rules for the seeding, which
unfortunately had some unintended side effects in the generated code,
especially with unusual unicode characters. I'll try to verify that with
your example in the next few days.


Best,


Peter

Am 19.09.2019 um 12:35 schrieb Mario Juric:
> Hi Peter,
>
> After upgrading to Ruta 2.7.0 a while ago we started getting some
> errors from the SeedLexer, which we didn’t have before. It appears
> related to odd unicode characters that we haven’t cleaned properly
> upstream, but it is consumed by the previous version 2.6.1 where our
> pipeline completes without error. I attached a small sample program
> with a dummy ruta script to reproduce it.
>
> Which version has the correct behaviour in such cases? 2.7.0 or 2.6.1?
>
>
> Cheers,
> Mario
>



Re: Usage of anchors

2019-08-30 Thread Peter Klügl
Hi,

Am 29.08.2019 um 17:31 schrieb Nikolai Krot:
> Hi Peter,
>
> Thank you for your answer. Is this the relevant issue:
> https://issues.apache.org/jira/browse/UIMA-3862 ?


Yes. (The description should really be more informative)


>
> Honestly, your answer is a revelation for me :) I originally thought that
> matching on literals should be faster because no extra step of preliminary
> annotation thereof is required. Can I expect a speed up if I implement the
> rules as follows:
>
> 1) all/most of the literals that are found in rules are first wrapped into an
> annotation, say, WRD;
>
> MARKTABLE(WRD, VocabularyOfWordsAppearingInRules);
>
> 2) the rules that rely on these literals are rewritten to be something like
> this:
>
> ... @WRD.ct == "hello" ... {-> ACTION1};
> ... @WRD.ct == "world" ... {-> ACTION2};
>
> I'm just curious. We are trying to figure out the best tactic for
> writing the rules so that they run as fast as possible.


Yes, I would assume that it is faster. However, it depends on many
factors, e.g., the distribution of the words, the length of the document,
the length of the rule and the index of the anchor.

I would recommend the usage of FOREACH in order to avoid redundant index
matches on the same annotation.

In my use cases, the initialization of the stream is often relatively
expensive since there are many Ruta components in a pipeline that each
reindex the RutaBasics anew. Thus, the speed of a rule is sometimes not
as important as the combination with other annotators.


Best,


Peter


> Best regards,
> Nikolai
>
>
> On Thu, Aug 29, 2019 at 3:26 PM Peter Klügl 
> wrote:
>
>> Hi,
>>
>> Am 29.08.2019 um 15:21 schrieb Nikolai Krot:
>>> Hi Peter,
>>>
>>> thank you for your answer. Can you confirm my understanding (I have
>>> certain difficulty understanding stacked negations)
>>>
>>> * it may be a problem if a literal string in a rule is also an anchor
>>> (either explicitly set by user or selected by rule interpreter)
>>
>> yes, it is especially inefficient because there is no index on the
>> covered text. The rule element needs to evaluate every RutaBasic in the
>> current window (document) by comparing the covered text to the string
>> value. This is of course much slower than the usual case, where you can
>> restrict the type of annotation somehow and use an annotation index.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>> Best regards,
>>> Nikolai
>>>
>>> On Thu, Aug 29, 2019 at 2:27 PM Peter Klügl 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> the second option should be preferred at least until UIMA-3862 is
>>>> resolved with some additional indexing.
>>>>
>>>> It is of course not so problematic if the literal matching condition is
>>>> not the starting anchor. However, it is still annoying that the rule
>>>> elements need to be designed according to the dynamic partitioning of the
>>>> RutaBasics. This easily leads to problems in larger pipelines.
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>> Am 29.08.2019 um 11:59 schrieb Nikolai Krot:
>>>>> Hi Peter,
>>>>>
>>>>> I have a question about this comment of yours:
>>>>>
>>>>> "... but the matching using literal string expression is still really
>>>>> inefficient."
>>>>>
>>>>> What do you mean by "inefficient"? Do you mean it is slow? Say, if I
>>>>> want to use a literal in one hundred rules, what is a better strategy:
>>>>> 1) writing the string literally in each of these 100 rules; or
>>>>> 2) annotating the string (using MARKTABLE) and then using the
>>>>> annotation in these 100 rules?
>>>>>
>>>>> Best regards,
>>>>> Nikolai
>>>>>
>>>>> On Mon, Aug 26, 2019 at 2:27 PM Peter Klügl 
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> Am 21.08.2019 um 15:47 schrieb Dominik Terweh:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> Thanks a lot for the clarification. I was wondering about (10) too.
>>>>>>>
>>>>>>> Following your explanation I was wondering, Does it make sense to
>>>

Re: Using extensions

2019-08-30 Thread Peter Klügl
Hi,

Am 29.08.2019 um 15:34 schrieb Dominik Terweh:
> Hey,
>
> I tried to understand the rules that you suggested and have a few questions 
> (see below).
> What we have (successfully) implemented so far is a set of rules that change 
> the value of the stored string, in order to produce some kind of expression 
> that is evaluated subsequently:
> a) replace numbers: "eins" becomes "(1)", "zwei|zwan" becomes "(2)"...
> b) replace factors: "zig" becomes "*(10)", "hundert" becomes "*(100)" 
> and remove "und"
> c) other ruta rules interpret the expression in chain-like order
>
> "dreimillionenzweitausendvierhunderteinundzwanzig"
> a) "(3)millionen(2)tausend(4)hundert(1)und(2)zig"
> b) "(3)*(1000000)(2)*(1000)(4)*(100)(1)(20)"
> c) "(3)*(1000000)(2)*(1000)(400)(21)" => "(3)*(1000000)(2)*(1000)(421)" => 
> "(3000000)(2000)(421)" => "(3000000)(2421)" => "(3002421)"
>
> However, we use replaceAll(string, pattern, pattern) in all these 
> transformations and fear that it might not be the optimal solution for UIMA 
> Ruta.
> Do you have any suggestion?


Why do you want to use a string feature to represent the numeric value?

I would assume that switching to a double/int feature makes it a lot
easier as you can directly perform the calculations.

Btw here's our type system for numeric values:

https://github.com/averbis/core-typesystems/blob/master/numeric-value-typesystem/src/main/resources/de/averbis/textanalysis/typesystems/NumericValueTypeSystem.xml


>
> Here are the questions for your rules:
> 1)
>> Before you can apply the dictionaries, you need to split the RutaBasics  
>> using some conjunction words in order to map the subword segments.
> How exactly can I do that? I know there is SPLIT(), but that can only split an 
> annotation on the basis of another annotation inside it, or do I misunderstand it?
> Because if I could split words then German agglutinated numbers would be no 
> problem (since we have a working solution for English).


In Ruta, you can use simple regex rules for splitting up annotations. If
you have a rule like:
"und" -> ConjunctionFragment;

Then the "und" within the word fünfundzwanzig is annotated with the type
ConjunctionFragment since the simple regex rules are not bound to
annotations at all.
However, as a result, the RutaBasics will be updated. First there was
only one for the W, afterwards there are three. The WORDTABLE operates
on RutaBasic annotations and therefore is able to find "fünf"=5 and
"zwanzig"=20

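The effect of such a split can be sketched in plain Python. This is only an illustration under simplifying assumptions, not how Ruta maintains its RutaBasics internally: the function split_on_conjunction and the tiny NUMBER_WORDS dictionary are hypothetical and exist only for this example.

```python
import re

# Tiny illustrative dictionary; a real word table would have many more entries.
NUMBER_WORDS = {"fünf": 5, "zwanzig": 20, "drei": 3, "dreißig": 30}


def split_on_conjunction(word, conjunction="und"):
    """Split a word into (begin, end, text) fragments around the conjunction.

    The conjunction itself is kept as its own fragment, roughly like the
    three segments that exist after the regex rule has fired.
    """
    fragments = []
    position = 0
    for match in re.finditer(re.escape(conjunction), word):
        if match.start() > position:
            fragments.append((position, match.start(), word[position:match.start()]))
        fragments.append((match.start(), match.end(), match.group()))
        position = match.end()
    if position < len(word):
        fragments.append((position, len(word), word[position:]))
    return fragments


fragments = split_on_conjunction("fünfundzwanzig")
# Dictionary lookup on the fragments, standing in for the word table step.
values = [NUMBER_WORDS[text] for _, _, text in fragments if text in NUMBER_WORDS]

print(fragments)  # [(0, 4, 'fünf'), (4, 7, 'und'), (7, 14, 'zwanzig')]
print(values)     # [5, 20]
```

Note that this naive split also fires inside words such as "hundert", which contains "und"; the sketch does nothing to prevent that.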

> 2)
> Is there a special reason, why you use 3 for 'thousand', when you use it with 
> POW(10, x)? Intuitively I would just use 1000.


No, I think someone (me?) thought it would be more elegant.


>
> 3)
> In your "combination with multipliers like 3 million"-rule (Rule 1), you 
> shift the annotation to span over (1,4), should it not be (1,3)?


ah yes, that's a typo.


> 4)
> In Rule 1, is num{IS(NumericValue) -> SHIFT(NumericValue,1,4)} just a 
> different way of writing num:NumericValue{-> SHIFT(NumericValue,1,4)}?


The "num" is the variable of the FOREACH block, which in this case
operates from right to left.

So, all rules of the block are performed on each NumericValue
successively. It is a bit more like an FST. The reverse order was
selected due to some calculations.

Your second rule would be performed on all NumericValue before the next
rule is executed.


>
> 5)
> What exactly is the function of the NEAR() in your Rule 1? Is it there to 
> match only "3", "3-Million" and "3-Million" but not "3-"?


Yes.

(Actually, I would not use NEAR here)


> 6)
> I tried to play Rule 1 through in my head with "zweitausendeins" and 
> "dreimillionenzweitausendeins":
> This works well for the first example


This rule was maybe not a good example after all.

I have to check it in the context of the block, but AFAIR it would not
be applied for these examples in our rule set (but others).


Best,


Peter


> (num{IS(NumericValue)-> SHIFT(NumericValue,1,4)}
> //value = 2
>
>   (Multiplicator{-> num.value = (2 * (POW(10,3)))}
> //value = 2000
> add2:NumericValue?{-> num.value = (2000 + 1), UNMARK(add2)}));
> //value = 2001
>
>
> But fails for the second:
>
> (num{IS(NumericValue)-> SHIFT(NumericValue,1,4)}
> //value = 3
>
>   (Multiplicator{-> num.value = (2 * (POW(10,6)))}
> //value = 300
> add2:NumericValue?{-> num.value = (300 + 2), UNMARK(add2)})
> //value = 302, after 1st iteration
>
> 

Re: Usage of anchors

2019-08-29 Thread Peter Klügl
Hi,

Am 29.08.2019 um 15:21 schrieb Nikolai Krot:
> Hi Peter,
>
> thank you for your answer. Can you confirm my understanding (i have certain
> difficulty understanding stacked negations)
>
> * it may be a problem if a literal string in a rule is also an anchor
> (either explicitly set by user or selected by rule interpreter)


yes, it is especially inefficient because there is no index on the
covered text. The rule element needs to evaluate every RutaBasic in the
current window (document) by comparing the covered text to the string
value. This is of course much slower than the usual case, where you can
restrict the type of annotation somehow and use an annotation index.
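
The difference can be pictured with a small Python sketch. It is a simplified model and not Ruta's actual implementation: annotations are plain (type, begin, end) tuples, and the function names as well as the "Hello" type are hypothetical.

```python
from collections import defaultdict


def find_by_literal(text, basics, literal):
    """Scan every basic annotation and compare its covered text (no index)."""
    return [a for a in basics if text[a[1]:a[2]] == literal]


def build_type_index(annotations):
    """Group annotations by type once, so later lookups avoid a full scan."""
    index = defaultdict(list)
    for annotation in annotations:
        index[annotation[0]].append(annotation)
    return index


text = "hello big world hello"
basics = [("W", 0, 5), ("W", 6, 9), ("W", 10, 15), ("W", 16, 21)]
# Hypothetical pre-annotation step, standing in for something like MARKTABLE.
marked = [("Hello", 0, 5), ("Hello", 16, 21)]

scan_hits = find_by_literal(text, basics, "hello")  # touches all four basics
index_hits = build_type_index(marked)["Hello"]      # direct lookup by type

print([(a[1], a[2]) for a in scan_hits])   # [(0, 5), (16, 21)]
print([(a[1], a[2]) for a in index_hits])  # [(0, 5), (16, 21)]
```

Both lookups find the same spans; the scan has to visit every basic annotation for every rule that starts on the literal, while the index is built once and reused.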


Best,


Peter


> Best regards,
> Nikolai
>
> On Thu, Aug 29, 2019 at 2:27 PM Peter Klügl 
> wrote:
>
>> Hi,
>>
>>
>> the second option should be preferred at least until UIMA-3862 is
>> resolved with some additional indexing.
>>
>> It is of course not so problematic if the literal matching condition is
>> not the starting anchor. However, it is still annoying that the rule
>> elements need to be designed according to the dynamic partitioning of the
>> RutaBasics. This easily leads to problems in larger pipelines.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> Am 29.08.2019 um 11:59 schrieb Nikolai Krot:
>>> Hi Peter,
>>>
>>> I have a question about this comment of yours:
>>>
>>> "... but the matching using literal string expression is still really
>>> inefficient."
>>>
>>> What do you mean by "inefficient"? Do you mean it is slow? Say, if I want
>>> to use a literal in one hundred rules, what is a better strategy:
>>> 1) writing the string literally in each of these 100 rules; or
>>> 2) annotating the string (using MARKTABLE) and then using the annotation
>>> in these 100 rules?
>>>
>>> Best regards,
>>> Nikolai
>>>
>>> On Mon, Aug 26, 2019 at 2:27 PM Peter Klügl 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> Am 21.08.2019 um 15:47 schrieb Dominik Terweh:
>>>>> Hi Peter,
>>>>>
>>>>> Thanks a lot for the clarification. I was wondering about (10) too.
>>>>>
>>>>> Following your explanation I was wondering, Does it make sense to
>>>>> anchor sequences, such as in (8) and is it "legal" to use multiple
>>>>> anchors in hierarchical fashion?
>>>>> Like A @(B @C D)?
>>>> Yes, it is "legal", but you have to be careful. (There are not enough
>>>> unit tests for those rules)
>>>>
>>>>
>>>>> Also, is there a difference between the processing of sequences of
>>>>> annotations or literals (given "A" is annotated as A and so on)?
>>>>> A @(B C D)
>>>>> Vs
>>>>> "A" @("B" "C" "D")
>>>>> Vs
>>>>> A @("B" C "D")
>>>> It should not make a difference for the result, but the matching using
>>>> literal string expression is still really inefficient.
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>>> Best
>>>>> Dominik
>>>>>
>>>>>
>>>>>
>>>>> Dominik Terweh
>>>>> Praktikant
>>>>>
>>>>> DROOMS
>>>>>
>>>>>
>>>>> Drooms GmbH
>>>>> Eschersheimer Landstraße 6
>>>>> 60322 Frankfurt, Germany
>>>>> www.drooms.com
>>>>>
>>>>> Phone:
>>>>> Fax:
>>>>> Mail: d.ter...@drooms.com
>>>>>
>>>>>
>>>>> Subscribe to the Drooms newsletter
>> https://drooms.com/en/newsletter?utm_source=newslettersignup&utm_medium=emailsignature
>>>>> Drooms GmbH; Sitz der Gesellschaft / Registered Office: Eschersheimer
>>>> Landstr. 6, D-60322 Frankfurt am Main; Geschaeftsfuehrung / Management
>>>> Board: Alexandre Grellier;
>>>>> Registergericht / Court of Registration: Amtsgericht Frankfurt am Main,
>>>> HRB 76454; Finanzamt / Tax Office: Finanzamt Frankfurt am Main,
>> USt-IdNr.:
>>>> DE 224007190
>>>>> On 21.08.19, 12:10, "Peter Klügl"  wrote:
>>>>>
>>>>> Hi,

Re: Usage of anchors

2019-08-29 Thread Peter Klügl
Hi,


the second option should be preferred at least until UIMA-3862 is
resolved with some additional indexing.

It is of course not so problematic if the literal matching condition is
not the starting anchor. However, it is still annoying that the rule
elements need to be designed according to the dynamic partitioning of the
RutaBasics. This easily leads to problems in larger pipelines.


Best,


Peter


Am 29.08.2019 um 11:59 schrieb Nikolai Krot:
> Hi Peter,
>
> I have a question about this comment of yours:
>
> "... but the matching using literal string expression is still really
> inefficient."
>
> What do you mean by "inefficient"? Do you mean it is slow? Say, if I want
> to use a literal in one hundred rules, what is a better strategy:
> 1) writing the string literally in each of these 100 rules; or
> 2) annotating the string (using MARKTABLE) and then using the annotation in
> these 100 rules?
>
> Best regards,
> Nikolai
>
> On Mon, Aug 26, 2019 at 2:27 PM Peter Klügl 
> wrote:
>
>> Hi,
>>
>>
>> Am 21.08.2019 um 15:47 schrieb Dominik Terweh:
>>> Hi Peter,
>>>
>>> Thanks a lot for the clarification. I was wondering about (10) too.
>>>
>>> Following your explanation I was wondering, Does it make sense to anchor
>>> sequences, such as in (8) and is it "legal" to use multiple anchors in
>>> hierarchical fashion?
>>> Like A @(B @C D)?
>> Yes, it is "legal", but you have to be careful. (There are not enough
>> unit tests for those rules)
>>
>>
>>> Also, is there a difference between the processing of sequences of
>>> annotations or literals (given "A" is annotated as A and so on)?
>>> A @(B C D)
>>> Vs
>>> "A" @("B" "C" "D")
>>> Vs
>>> A @("B" C "D")
>>
>> It should not make a difference for the result, but the matching using
>> literal string expression is still really inefficient.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>> Best
>>> Dominik
>>>
>>>
>>>
>>> On 21.08.19, 12:10, "Peter Klügl"  wrote:
>>>
>>> Hi,
>>>
>>> Am 20.08.2019 um 16:09 schrieb Dominik Terweh:
>>> >
>>> > Dear All,
>>> >
>>> >
>>> >
>>> > I have some questions regarding processing times and anchors ("@").
>>> >
>>> >
>>> >
>>> > First of all, is it possible to define an anchor on a disjunction?
>>> >
>>> > What I tested was to have a simple rule (1) that should start on
>> the
>>> > Element in the middle (2). Now this element had a variation (3)
>> but I
>>> > could not use the anchor in that case anymore:
>>> >
>>> > 1) A    B   C;   // works
>>> >
>>> > 2) A   @B   C;   // works
>>> >
>>> > 3) A @(B|D) C;   // NOT WORKING
>>> >
>>> > Is this behaviour intended or simply not supported?
>>> >
>>> > [NOTE: NOT WORKING means eclipse does not complain, but the rule
>> never
>>> > matches]
>>> >
>>> >
>>> >
>>> > The above led to some testing with a different setup(4), however,
>>> > since disjunctions don't seem to work, this was also not valid.
>>> >
>>> > 4) A @((B C) | (D C));   // NOT WORKING
>>> >
>>>
>>> Anchors at disjunct rule elements are syntactically supported but do

Re: Using extensions

2019-08-29 Thread Peter Klügl
Hi,


we are using a separate component for dictionary lookup, which can
combine multiple dictionaries and can also assign arbitrary feature
values. Most language-dependent information is extracted to
language-specific dictionaries and some language independent
dictionaries. There is a ticket to contribute parts of this
implementation to Ruta in order to replace WORDLIST, WORDTABLE and TRIE.
It's faster, more powerful and more stable. I have not had the time yet
to migrate it.

Most Ruta rules are language-independent. Some rules focus on different
constraints for separators, e.g., the space separator for thousands in
some languages instead of commas or periods.


I think this combination, as much fast dictionary lookup as possible and
then sequential rules for creating more complex expressions if necessary,
is a good choice with respect to speed vs maintainability.

I have no recent numbers concerning throughput, but there has always
been another component that needed to be optimized first.


I see no need for additional Ruta language extensions since they
increase the complexity of the pipeline without providing considerable
advantages for the use case.


Best,


Peter




Am 29.08.2019 um 11:39 schrieb Nikolai Krot:
> Hi Peter,
>
> From *your* perspective, for this particular task of turning written out
> numbers to their numerical representation, would it be better to
> implement it as a language extension (= one additional function) or as a set
> of ruta rules?
> Against a language extension speaks the fact that such a conversion may be
> language-dependent, that is, it does not generalize well. On the other hand,
> the language extension may be faster than plain ruta rules. Is the
> implementation of this functionality that you have at your company good in
> terms of speed?
>
> Best regards,
> Nikolai KROT
>
> On Wed, Aug 28, 2019 at 1:48 PM Peter Klügl 
> wrote:
>
>> Hi,
>>
>>
>> we (Averbis) have an annotator which does exactly what you describe, but
>> unfortunately I cannot share it.  However, I can tell that the annotator
>> is almost completely implemented in Ruta and uses no Ruta language
>> extensions.
>>
>>
>> If you want to learn more about language extensions, then there are
>> example projects in the Ruta trunk: ruta-core-ext and
>> example-projects/ruta-ep-example-extensions
>>
>>
>> If you want to build the annotator with Ruta rules, I can help you
>> create it.
>>
>>
>> As a starting point you need some dictionaries (wordtables) for numbers
>> (ein;1\neins;1\nzwei;2) , exponents/multiplicators (tausend;3) and
>> special characters (½). For German that's not too much, maybe one
>> hundred entries overall is a good start.
>>
>> Before you can apply the dictionaries, you need to split the RutaBasics
>> using some conjunction words in order to map the subword segments. You
>> can do that with a simple regex rule:
>>
>> "und" -> ConjunctionFragment;
>>
>> Then, you can write some rules that combine numbers using additions,
>> multiplications and exponents, e.g., something like:
>>
>>
>> FOREACH(num, false) NumericValue{}{
>>
>> // combination with multipliers like 3 million
>> (num{IS(NumericValue)-> SHIFT(NumericValue,1,4)}
>> SPECIAL?{REGEXP("-"), NEAR(W,0,1,true)}
>> (
>> Multiplicator{-> num.value = (num.value * (POW(10,
>> Multiplicator.value)))}
>> add2:NumericValue?{-> num.value = (num.value +
>> add2.value), UNMARK(add2)}
>> )*);
>>
>>
>> // fünfundzwanzig
>> (num{PARTOF(W)-> SHIFT(NumericValue,1,3)} ConjunctionFragment
>> add:NumericValue.value!=0{PARTOF(W), IF((NumericValue.value%1) == 0) ->
>> UNMARK(add)})
>> {-> num.value = (num.value + add.value)};
>>
>> }
>>
>>
>> At the end you get about 200 lines of Ruta ...
>>
>>
>>
>>
>> Best,
>>
>>
>> Peter
>>
>> Am 27.08.2019 um 16:30 schrieb Dominik Terweh:
>>> Dear All,
>>>
>>>
>>>
>>> When working with German written out numbers I figured, that in order
>>> to get what I want (the numeric value of a written number) I need to
>>> either hard code every single number name and use Wordtable or I need
>>> to work with the string. However, this made me thinking that this
>>> would probably be better done in a Language Extension. Unfortunately I
>>> am not sure how these work and how I can include them in my project.
>>> Also the manual did not reall

Re: Using extensions

2019-08-28 Thread Peter Klügl
Hi,


we (Averbis) have an annotator which does exactly what you describe, but
unfortunately I cannot share it.  However, I can tell that the annotator
is almost completely implemented in Ruta and uses no Ruta language
extensions.


If you want to learn more about language extensions, then there are
example projects in the Ruta trunk: ruta-core-ext and
example-projects/ruta-ep-example-extensions


If you want to build the annotator with Ruta rules, I can help you
create it.


As a starting point you need some dictionaries (wordtables) for numbers
(ein;1\neins;1\nzwei;2) , exponents/multiplicators (tausend;3) and
special characters (½). For German that's not too much, maybe one
hundred entries overall is a good start.

Before you can apply the dictionaries, you need to split the RutaBasics
using some conjunction words in order to map the subword segments. You
can do that with a simple regex rule:

"und" -> ConjunctionFragment;

Then, you can write some rules that combine numbers using additions,
multiplications and exponents, e.g., something like:


FOREACH(num, false) NumericValue{}{

    // combination with multipliers like 3 million
    (num{IS(NumericValue) -> SHIFT(NumericValue,1,4)}
        SPECIAL?{REGEXP("-"), NEAR(W,0,1,true)}
        (
            Multiplicator{-> num.value = (num.value * (POW(10, Multiplicator.value)))}
            add2:NumericValue?{-> num.value = (num.value + add2.value), UNMARK(add2)}
        )*);

    // fünfundzwanzig
    (num{PARTOF(W) -> SHIFT(NumericValue,1,3)} ConjunctionFragment
        add:NumericValue.value!=0{PARTOF(W), IF((NumericValue.value%1) == 0) -> UNMARK(add)})
    {-> num.value = (num.value + add.value)};

}


At the end you get about 200 lines of Ruta ...
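
The arithmetic behind the two rules above can be sketched in a few lines of Python. This only mirrors the calculations (value times a power of ten plus an optional addend, and units plus tens); it is not a port of the Ruta rules, and the function names are hypothetical.

```python
def with_multiplier(value, exponent, addend=0):
    """value * 10^exponent + addend, e.g. 'tausend' stored as exponent 3."""
    return value * 10 ** exponent + addend


def with_conjunction(units, tens):
    """Units word + 'und' + tens word, e.g. fünf-und-zwanzig."""
    return units + tens


print(with_multiplier(2, 3, 1))  # zweitausendeins -> 2001
print(with_multiplier(3, 6))     # drei Millionen -> 3000000
print(with_conjunction(5, 20))   # fünfundzwanzig -> 25
```

Storing the exponent (3 for "tausend") rather than the factor (1000) makes no difference to the result. Chained cases with several multipliers in one word are deliberately left out of this sketch.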




Best,


Peter

Am 27.08.2019 um 16:30 schrieb Dominik Terweh:
>
> Dear All,
>
>  
>
> When working with German written out numbers I figured that in order
> to get what I want (the numeric value of a written number) I need to
> either hard code every single number name and use Wordtable or I need
> to work with the string. However, this made me think that this
> would probably be better done in a Language Extension. Unfortunately I
> am not sure how these work and how I can include them in my project.
> Also the manual did not really help me there
> (https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.language.extensions).
>
>
>  
>
> Further I was wondering if there are any readily available extensions
> that can be used, e.g. to convert a string of number words into actual
> numbers (or replacing words on a dictionary basis, such as “one”:”1”,
> “two”:”2”,…), or an extension, that can evaluate a calculation in the
> form of a string (like “100*5+55”).  If something exists for number
> conversion it would be interesting to see if it does both, annotation
> and calculation, and how it handles different languages such as:
>
> 1) input is one token (like numbers in german, einundzwanzig)
>
> 2) input is several tokens jointly representing one number (like in
> english: twenty two)
>
> And mixed cases such as:
>
> 3) input is combination of number and string (like: 10 Millionen)
>
>  
>
> Thank you in advance for your help,
>
> Best
>
> Dominik
>
> Dominik Terweh
> Praktikant
>
> *Drooms GmbH*
> Eschersheimer Landstraße 6
> 60322 Frankfurt, Germany
> www.drooms.com <http://www.drooms.com>
>
> Phone:
> Mail: d.ter...@drooms.com <mailto:d.ter...@drooms.com>
>
> <https://drooms.com/en/newsletter?utm_source=newslettersignup&utm_medium=emailsignature>
>
> *Drooms GmbH*; Sitz der Gesellschaft / Registered Office:
> Eschersheimer Landstr. 6, D-60322 Frankfurt am Main; Geschäftsführung
> / Management Board: Alexandre Grellier;
> Registergericht / Court of Registration: Amtsgericht Frankfurt am
> Main, HRB 76454; Finanzamt / Tax Office: Finanzamt Frankfurt am Main,
> USt-IdNr.: DE 224007190
>



Re: Usage of anchors

2019-08-26 Thread Peter Klügl
Hi,


Am 21.08.2019 um 15:47 schrieb Dominik Terweh:
> Hi Peter,
>
> Thanks a lot for the clarification. I was wondering about (10) too.
>
> Following your explanation I was wondering, Does it make sense to anchor 
> sequences, such as in (8) and is it "legal" to use multiple anchors in 
> hierarchical fashion?
> Like A @(B @C D)?

Yes, it is "legal", but you have to be careful. (There are not enough
unit tests for those rules)


>
> Also, is there a difference between the processing of sequences of 
> annotations or literals (given "A" is annotated as A and so on)?
> A @(B C D)
> Vs
> "A" @("B" "C" "D")
> Vs
> A @("B" C "D")


It should not make a difference for the result, but the matching using
literal string expression is still really inefficient.


Best,


Peter


>
> Best
> Dominik
>
>
> On 21.08.19, 12:10, "Peter Klügl"  wrote:
>
> Hi,
>
> Am 20.08.2019 um 16:09 schrieb Dominik Terweh:
> >
> > Dear All,
> >
> >
> >
> > I have some questions regarding processing times and anchors ("@").
> >
> >
> >
> > First of all, is it possible to define an anchor on a disjunction?
> >
> > What I tested was to have a simple rule (1) that should start on the
> > Element in the middle (2). Now this element had a variation (3) but I
> > could not use the anchor in that case anymore:
> >
> > 1) A    B   C;   // works
> >
> > 2) A   @B   C;   // works
> >
> > 3) A @(B|D) C;   // NOT WORKING
> >
> > Is this behaviour intended or simply not supported?
> >
> > [NOTE: NOT WORKING means eclipse does not complain, but the rule never
> > matches]
> >
> >
> >
> > The above led to some testing with a different setup(4), however,
> > since disjunctions don't seem to work, this was also not valid.
> >
> > 4) A @((B C) | (D C));   // NOT WORKING
> >
>
> Anchors at disjunct rule elements are syntactically supported but do not
> work correctly. I will open a bug ticket.
>
>
> >
> >
> > Is there a scenario where anchors are valid in and before brackets?
> > From my observation I've seen that (5)-(10) are all working as
> > expected and all start matching on B. But, do they differ in terms of
> > processing? I noticed slightly longer processing times in (5) and ever
> > so slightly in (6), but not very indicative. Could (5)-(10) differ in
> > processing time?
> >
> > 5)   A   @B C
> >
> > 6)  (A   @B C)
> >
> > 7) @(A   @B C)
> >
> > 8)   A  @(B C)
> >
> > 9)   A @(@B C)
> >
> > 10)  A  (@B C)
> >
>
> Yes since different combinations of methods are called, but I think
> there should not be a big difference between (5)-(9).
>
>
> >
> >
> > Since rule (10) works as expected, why does (11) work differently and
> > start on A but not on B and D? (This would be useful in a scenario
> > where B and D combined appear less often than A)
> >
> > 11) A  ((@B C) | (@D C));   // starts matching on A
> >
> >
> >
> >
> >
>
> I have to check that. I think (10) starts with A too.
>
>
>
> Two comments for anchors and disjunct rule elements:
>
> Anchors started as a manual option to optimize the rule execution time
> compared to the automatic dynamic anchoring. However, the anchor can
> considerably change the consequences of a rule. For me, the anchor is
> more of an engineering option which also can be used to speed up the 
> r

Re: Usage of anchors

2019-08-21 Thread Peter Klügl
Hi,

On 20.08.2019 at 16:09, Dominik Terweh wrote:
>
> Dear All,
>
>  
>
> I have some questions regarding processing times and anchors ("@").
>
>  
>
> First of all, is it possible to define an anchor on a disjunction?
>
> What I tested was to have a simple rule (1) that should start on the
> Element in the middle (2). Now this element had a variation (3) but I
> could not use the anchor in that case anymore:
>
> 1) A    B   C;   // works
>
> 2) A   @B   C;   // works
>
> 3) A @(B|D) C;   // NOT WORKING
>
> Is this behaviour intended or simply not supported?
>
> [NOTE: NOT WORKING means eclipse does not complain, but the rule never
> matches]
>
>  
>
> The above led to some testing with a different setup(4), however,
> since disjunctions don't seem to work, this was also not valid.
>
> 4) A @((B C) | (D C));   // NOT WORKING
>

Anchors at disjunct rule elements are syntactically supported but do not
work correctly. I will open a bug ticket.


>  
>
> Is there a scenario where anchors are valid in and before brackets?
> From my observation I've seen that (5)-(10) are all working as
> expected and all start matching on B. But, do they differ in terms of
> processing? I noticed slightly longer processing times in (5) and ever
> so slightly in (6), but not very indicative. Could (5)-(10) differ in
> processing time?
>
> 5)   A   @B C
>
> 6)  (A   @B C)
>
> 7) @(A   @B C)
>
> 8)   A  @(B C)
>
> 9)   A @(@B C)
>
> 10)  A  (@B C)
>

Yes, since different combinations of methods are called, but I think
there should not be a big difference between (5)-(9).


>  
>
> Since rule (10) works as expected, why does (11) work differently and
> start on A but not on B and D? (This would be useful in a scenario
> where B and D combined appear less often than A)
>
> 11) A  ((@B C) | (@D C));   // starts matching on A
>
>  
>
>  
>

I have to check that. I think (10) starts with A too.



Two comments for anchors and disjunct rule elements:

Anchors started as a manual option to optimize the rule execution time
compared to the automatic dynamic anchoring. However, the anchor can
considerably change the consequences of a rule. For me, the anchor is
more of an engineering option which can also be used to speed up the rules.
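
To illustrate why a manual anchor can save time, here is a small Python sketch. It is a toy model and not Ruta's matching engine: the function name match_sequence and the flat token list are assumptions made for this example. Matching starts only at positions where the anchor element occurs, so a rare anchor means fewer candidate positions.

```python
def match_sequence(tokens, pattern, anchor=0):
    """Find all occurrences of `pattern` in `tokens`.

    Matching is only attempted where the anchor element pattern[anchor]
    occurs. Returns (start indices of the matches, number of candidates tried).
    """
    matches = []
    candidates = 0
    for i, tok in enumerate(tokens):
        if tok != pattern[anchor]:
            continue
        candidates += 1
        start = i - anchor  # where the whole pattern would have to begin
        if start < 0 or start + len(pattern) > len(tokens):
            continue
        if tokens[start:start + len(pattern)] == pattern:
            matches.append(start)
    return matches, candidates


tokens = ["A", "A", "A", "A", "B", "C", "A", "A"]
pattern = ["A", "B", "C"]
print(match_sequence(tokens, pattern, anchor=0))  # starts on every "A"
print(match_sequence(tokens, pattern, anchor=1))  # starts only on "B"
```

In this toy model both anchors find the same match and only the number of candidates differs. In real Ruta rules the anchor can also change the result, as noted above.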


Disjunct rule elements are not well supported and maintained in Ruta.
Their implementation is not efficient and they can lead to unintended
matches. Thus, their usage is not allowed in my team and I would not
recommend using them right now.


(I will try to find the time to improve the implementation)


Best,


Peter


> Thank you in advance for your answers,
>
> Best
>
> Dominik
>
-- 
Dr. Peter Klügl
R&D Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.klu...@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó



Re: how to match patterns back from the end of an input string

2019-07-26 Thread Peter Klügl
Hi,


there are no special language elements for this. However, there are many
other ways to do this (efficiently).

You could for example create an annotation on the last part of the
document with MARKLAST and then use that annotation as a starting anchor
"@" in an additional rule.


Best,


Peter


On 26.07.2019 at 11:54, B. Li wrote:
> Hi All,
>
>
> I would like to match patterns back from the end of an input string, which 
> may not end at SENTENCEEND. I am wondering whether there are some special 
> tokens like "^" and "$" in normal regular expression in RUTA.
>
>
> Thanks in advance,
>
>
> Baoli




Re: How to check matched annotation with StringList/ WordList in RUTA

2019-07-26 Thread Peter Klügl
Hi,


there are many ways to do that, one being the rules in your example. Do
the rules not work for you?


I just tested the following and got only one EntityType annotation.


Input:

The policy number is 1A-AB12345-PAD.
The policy number is 1A-AB12345-PAN.


Script:

PACKAGE uima.example;

DECLARE Annotation ProdCode;
DECLARE EntityType;

STRINGLIST CustomSL = {"PLB","PAD"};

"(?i)\\b(?=.*\\d)[1]{0,1}[A-Z0-9]{2}[\\s |-]{0,2}[A-Z0-9]{7}[\\s |-]{0,2}([A-Z]{3})\\b" -> 1 = ProdCode;

ProdCode{INLIST(CustomSL)->MARK(EntityType)};
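
For readers who want to check the logic outside of Ruta, the same two steps (capture the last three letters, then keep them only if they are in the list) can be sketched in Python. This is only an approximation under stated assumptions: the names POLICY_RE, CUSTOM_SL and entity_types are made up for this example, the regular expression is taken over from the script with single backslashes, and the membership test is case-sensitive.

```python
import re

# Same pattern as in the script; group 1 is the three-letter product code.
POLICY_RE = re.compile(
    r"(?i)\b(?=.*\d)[1]{0,1}[A-Z0-9]{2}[\s |-]{0,2}[A-Z0-9]{7}[\s |-]{0,2}([A-Z]{3})\b"
)

# Counterpart of the STRINGLIST CustomSL.
CUSTOM_SL = {"PLB", "PAD"}


def entity_types(text):
    """Return the captured product codes that are contained in CUSTOM_SL."""
    return [m.group(1) for m in POLICY_RE.finditer(text) if m.group(1) in CUSTOM_SL]


text = (
    "The policy number is 1A-AB12345-PAD.\n"
    "The policy number is 1A-AB12345-PAN.\n"
)
print(entity_types(text))
```

Only the code from the first line is kept, because PAN is not in the list.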


Best,


Peter


On 22.07.2019 at 12:41, vamsi kruthiventi wrote:
> Hi Team,
>
> I have a Policy Number of pattern :- AB-1234567-PAD, and Product Code as last 
> 3 characters (PAD) of Policy number. With my code, I am now successfully 
> extracting the Product Code (PAD) for given format of policy number. But, 
> now, I need to check the extracted Product Code(PAD) with list of Product 
> codes available. Currently, I am using STRINGLIST which has list of product 
> codes. But, I don't know how to check the extracted Product Code matches with 
> List of available Product codes.
>
> Below is my code:
>
> PACKAGE uima.ruta.example;
>
> Document{->RETAINTYPE(SPACE)};
>
> DECLARE Annotation ProdCode;
>
> "(?i)\\b(?=.*\\d)[1]{0,1}[A-Z0-9]{2}[\\s |-]{0,2}[A-Z0-9]{7}[\\s 
> |-]{0,2}([A-Z]{3})\\b"->1 = ProdCode;//<-Previously ProdCode was replaced 
> with EntityType to get last 3 chars of given REGEX
>
> STRINGLIST CustomSL = {"PLB","PAD"};
>
> ProdCode{INLIST(CustomSL)->MARK(EntityType)};//<- Requesting your help here!
>
> Ex 1:
>
> Input : The policy number is 1A-AB12345-PAD.
>
> Exp OP : PAD
>
> Ex 2:
>
> Input : The policy number is 1A-AB12345-PAN.
>
> Exp OP : Entity should not be recognized since PAN does not exist in given 
> STRINGLIST
>
> Note:
> I am a Pega developer and new to RUTA.
>
> Please share your thoughts and kindly do the needful.
>
> Thanks
> Vamsi
>



Re: How to see output of profile and statistics configuration parameters

2019-07-26 Thread Peter Klügl
Hi,


yes, you need to activate the configuration parameters profile and
statistics, together with debug and maybe debugWithMatches. How to do
that depends on how you execute your Ruta script.


If you integrate the Ruta script directly in a UIMA pipeline, then you
need to set 3-5 configuration parameters.


If you launch the script directly in the Ruta Workbench, then you can
simply choose "debug" instead of "run". The analysis engine will be
automatically configured.


The collected information is normally available in two views of the
Ruta Workbench. Open the CAS file with the CAS Editor and switch to the
Explain perspective. In the applied rules view, there is additional
information about the absolute and relative time for each script and
each rule. For the statistics, there is a separate view called statistics.


Using CONFIGURE within a script for an additional script won't do the trick,
and self-configuring is not supported as it causes the stack overflow.


Best,


Peter


(sorry that it took me so long to answer)


On 18.07.2019 at 13:34, Nikolai Krot wrote:
> Hi,
>
> I would like to measure the speed of running Ruta scripts. I do not know
> this can be done but found some configurations parameters in uima ruta
> manual, namely, *profile* and *statistics*, described around this section
> https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic.parameter.profile
>
> So, I configure it in the ruta script. For example, the script is named
> Main.ruta. In the Main.ruta, I write
>
> // in Main.ruta file
> ENGINE uima.ruta.mystuff.OtherEngine;
> CONFIGURE(uima.ruta.mystuff.OtherEngine, "profile"=true,
> "statistics"=true);
>
> And i run the script on some file. Now question one: where can I see info
> generated by the above configuration options? Are they somehow displayed in
> Annotation Browser View? Are they in generated xmi file? Are they in the
> cas object?
>
> My 2nd question is how to set the same options to the script being run. If I
> add the following lines to the script Main.ruta, I end up with stack
> overflow (my guess, caused by recursion):
>
>   // in Main.ruta file
>   ENGINE uima.ruta.mystuff.Main;
>   CONFIGURE(uima.ruta.mystuff.MainEngine, "profile"=true,
> "statistics"=true);
>
> My main question is how to measure the speed of running a
> ruta script. In my setup, I have one Main.ruta that imports several other
> scripts and runs them. I want to measure the time of all individual scripts and
> Main.ruta. Could you advise on how to do it or point me to the
> documentation where it is described?
>
> Thanks in advance,
> Nikolai
>



Averbis released type systems as open source

2019-07-08 Thread Peter Klügl
Hi,


I just wanted to mention that Averbis released a selection of type
systems as open source (Apache License 2.0).


https://github.com/averbis/core-typesystems

Types for linguistic preprocessing, generic types like 'Concept' as well
as domain-independent entities like 'Date' or 'Measurement'.


https://github.com/averbis/health-typesystems

Types for specific medical entities and relationships like 'Diagnosis',
'Medication' and also for more specialized types like 'HLA' or
'VisualAcuity'.


https://github.com/averbis/pharma-typesystems

Types mainly for IDMP.


Artifacts with descriptions, types.txt, JCas cover classes and some
generated asciidoc are available at maven central.


Best,


Peter




Re: gZdd.csv

2019-07-07 Thread Peter Klügl
Hi,


I was not able to reproduce the problem.


I attached a screenshot of my results and cc'ed your email. I
essentially used the CSV as input and was able to find all entries.


Maybe there is another encoding problem. Is the input file UTF-8? Can
you provide a sample input text file?


Best,


Peter

On 05.07.2019 at 09:46, B. Li wrote:
> Hi Peter,
>
> Attached please find the sample file. 
>
> Thanks a lot for your help!
>
> Baoli
>



Re: a Problem about WORDTABLE

2019-07-05 Thread Peter Klügl
Thanks, I will take a look at it today.


Best,


Peter


On 05.07.2019 at 09:47, B. Li wrote:
> Hi Peter,
>
>
> I sent it to your email box.
>
>
> Thanks a lot,
>
>
> Baoli
>
>
> On 7/5/2019 15:41,Peter Klügl wrote:
> Hi,
>
>
> I think the attachment got lost. Can you either send it again to my
> email address or open a Jira ticket and attach the file there?
>
>
> Best,
>
>
> Peter
>
>
> On 05.07.2019 at 09:37, B. Li wrote:
> Thanks a lot Peter.
>
> Attached please find a CSV table encoded in UTF-8. Each row in the
> file contains a single Chinese digital character and its latin
> / mathematical value. I failed to get the value in the second column
> with the following RUTA script:
>
> WORDTABLE CnDigitTable = 'gZdd.csv';
> DECLARE Annotation CnD(STRING DVal);
> Document{-> MARKTABLE(CnD, 1, CnDigitTable, "DVal" = 2)};
>
> The type CnD with a feature DVal has been defined in the type
> descriptor XML file.
>
> I have upgraded the engine to the newest 2.7.0 version, but the
> problem is not solved.  Any suggestion? Thanks.
>
> Kind regards,
>
> Baoli
>
> On 7/5/2019 14:11,Peter Klügl
> <mailto:peter.klu...@averbis.com> wrote:
>
> Hi,
>
>
> most problems with the WordTable are caused by whitespaces in the
> dictionary. Can you test if this is your issue by removing all white
> spaces in the relevant column?
>
> If this is the source of the problem, there is a configuration
> parameter
> for automatically avoiding it, but I have to check in which version it
> was introduced. However, upgrading the Ruta version is recommended in
> any case.
>
>
> If this is not the source of your problem, do you have a minimal
> example
> for reproducing it?
>
>
> Best,
>
>
> Peter
>
>
>
> On 05.07.2019 at 03:51, B. Li wrote:
>
> Hi All,
>
>
> I am trying to use a WordTable to configure and give several
> different attribute values (with different columns) to some
> SINGLE (Chinese) characters, but I always fail to get the
> correct values from columns in the WordTable file, although
> the engine can correctly recognize and mark the SINGLE
> characters. I am using RUTA 2.4.0. How can I solve this
> problem? Any hint would be greatly appreciated!
>
>
> Thanks a lot,
>
>
> Baoli LI
>
>
>



Re: a Problem about WORDTABLE

2019-07-05 Thread Peter Klügl
Hi,


I think the attachment got lost. Can you either send it again to my
email address or open a Jira ticket and attach the file there?


Best,


Peter


On 05.07.2019 at 09:37, B. Li wrote:
> Thanks a lot Peter.
>
> Attached please find a CSV table encoded in UTF-8. Each row in the
> file contains a single Chinese digital character and its latin
> / mathematical value. I failed to get the value in the second column
> with the following RUTA script:
>
> WORDTABLE CnDigitTable = 'gZdd.csv';
> DECLARE Annotation CnD(STRING DVal);
> Document{-> MARKTABLE(CnD, 1, CnDigitTable, "DVal" = 2)};
>
> The type CnD with a feature DVal has been defined in the type
> descriptor XML file.
>
> I have upgraded the engine to the newest 2.7.0 version, but the
> problem is not solved.  Any suggestion? Thanks.
>
> Kind regards,
>
> Baoli
>
> On 7/5/2019 14:11,Peter Klügl
> <mailto:peter.klu...@averbis.com> wrote:
>
> Hi,
>
>
> most problems with the WordTable are caused by whitespaces in the
> dictionary. Can you test if this is your issue by removing all white
> spaces in the relevant column?
>
> If this is the source of the problem, there is a configuration
> parameter
> for automatically avoiding it, but I have to check in which version it
> was introduced. However, upgrading the Ruta version is recommended in
> any case.
>
>
> If this is not the source of your problem, do you have a minimal
> example
> for reproducing it?
>
>
> Best,
>
>
> Peter
>
>
>
> On 05.07.2019 at 03:51, B. Li wrote:
>
> Hi All,
>
>
> I am trying to use a WordTable to configure and give several
> different attribute values (with different columns) to some
> SINGLE (Chinese) characters, but I always fail to get the
> correct values from columns in the WordTable file, although
> the engine can correctly recognize and mark the SINGLE
> characters. I am using RUTA 2.4.0. How can I solve this
> problem? Any hint would be greatly appreciated!
>
>
> Thanks a lot,
>
>
> Baoli LI
>
>
>



Re: a Problem about WORDTABLE

2019-07-04 Thread Peter Klügl
Hi,


most problems with the WordTable are caused by whitespace in the
dictionary. Can you test if this is your issue by removing all
whitespace in the relevant column?

If this is the source of the problem, there is a configuration parameter
for automatically avoiding it, but I have to check in which version it
was introduced. However, upgrading the Ruta version is recommended in
any case.


If this is not the source of your problem, do you have a minimal example
for reproducing it?
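
A quick way to run the whitespace test is to clean the table file with a small script before loading it. The following Python sketch is only a suggestion: the function name remove_whitespace_in_column is made up, the column index is zero-based (unlike the column numbers in MARKTABLE), and the ";" separator is an assumption that has to be adapted to the actual file.

```python
import csv
import io


def remove_whitespace_in_column(csv_text, column, delimiter=";"):
    """Return csv_text with all whitespace removed from one column.

    `column` is zero-based; `delimiter` is an assumption about the file format.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        if column < len(row):
            # str.split() without arguments splits on any Unicode whitespace.
            row[column] = "".join(row[column].split())
        writer.writerow(row)
    return out.getvalue()


sample = "一 ;1\n 二;2\n"
print(remove_whitespace_in_column(sample, 0))
```

If the lookup works with the cleaned file, whitespace in the dictionary was the cause.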


Best,


Peter



On 05.07.2019 at 03:51, B. Li wrote:
> Hi All,
>
>
> I am trying to use a WordTable to configure and give several different 
> attribute values (with different columns) to some SINGLE (Chinese) 
> characters, but I always fail to get the correct values from columns in the 
> WordTable file, although the engine can correctly recognize and mark the 
> SINGLE characters. I am using RUTA 2.4.0. How can I solve this problem? Any 
> hint would be greatly appreciated!
>
>
> Thanks a lot,
>
>
> Baoli LI




Re: Modularizing RUTA rules

2019-06-24 Thread Peter Klügl
Hi,


unfortunately, the import of macro definitions is not yet supported.
Right now, you need to copy them.


Best,


Peter


On 19.06.2019 at 01:23, Nikolai Krot wrote:
> Hi,
>
> I am trying to split large ruleset (Main.ruta) into several smaller and
> then work with them in the normal fashion
>
> SCRIPT path.to.my.Script1;
> SCRIPT path.to.my.Script2;
>
> CALL(Script1);
> CALL(Script2);
>
> I have some common part that should be done before calling Script1 and
> Script2. This common part includes definition of a number of CONDITION
> statements. Looks like this constructs are not available in included
> scripts. I receive an error a la
>
> Caused by: org.apache.uima.ruta.extensions.RutaParseRuntimeException:
> org.apache.uima.ruta.extensions.RutaParseRuntimeException:
> org.apache.uima.ruta.extensions.RutaParseRuntimeException:
> org.apache.uima.ruta.extensions.RutaParseRuntimeException:
> org.apache.uima.ruta.extensions.RutaParseRuntimeException:
> org.apache.uima.ruta.extensions.RutaParseRuntimeException:
> org.apache.uima.ruta.extensions.RutaParseRuntimeException:
> org.apache.uima.ruta.extensions.RutaParseRuntimeException: Error in
> Script1, line 14, "(": expected OPTIONAL, but found LPAREN
>
> that hits just before the construction is used. As soon as I copy the
> CONDITION statement to the Script1, the error disappears.
>
> The question is how to make these statements available to the included
> scripts? What if I extract them into a dedicated ruta script, can I then
> import them in all scripts that use them: Main, Script1 and Script2? In a
> way similar to TYPESYSTEM
>
> Thank you in advance.
>
> Best regards,
> Nikolai
>



Re: Text traversal order

2019-06-04 Thread Peter Klügl
Hi,

yes, it is intentional that the action of a rule does not automatically
restart the matching process, for several reasons.


The rule matching is sensitive to the consequences of a rule during the
matching process (also across rule matches in a rule apply) but not
concerning the anchors of the rule matching. A bit simplified, the rule
matching behaves like an iterator for each anchoring matching condition
(the first rule element in most cases) with following iterators for
sequential patterns creating new rule matches (matching alternatives).
This means that a failed match on the first "aa" consumes that position
and it is not investigated again, even if it could be successful with
new facts/annotations created by later rule matches.


This does not mean that you cannot specify such patterns. You could use
different rules to achieve the desired result. My first guess would be
something like:


("aa" "+" "bb"){-> FOUND};
("aa"{-> FOUND} "/")[1,10] @FOUND;

I think there are also other ways to solve it.
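
The difference between the two approaches can be modelled in a few lines of Python. This is a strongly simplified sketch and not how Ruta is implemented: tokens are plain strings, FOUND is a set of token indices, and both function names are made up. The first function visits every start position once from left to right, like the original second rule. The second one starts at FOUND and walks backwards over "aa" "/" pairs, similar in spirit to the anchored rule above but without the [1,10] limit.

```python
def mark_left_to_right(tokens, found):
    """Single pass: each start position is tried exactly once."""
    found = set(found)
    for i in range(len(tokens) - 2):
        if tokens[i] == "aa" and tokens[i + 1] == "/" and (i + 2) in found:
            found.add(i)
    return found


def mark_from_anchor(tokens, found):
    """Start at every FOUND index and extend backwards over "aa" "/" pairs."""
    result = set(found)
    for start in found:
        i = start
        while i >= 2 and tokens[i - 2] == "aa" and tokens[i - 1] == "/":
            i -= 2
            result.add(i)
    return result


# Case 3 from the question: aa / aa / aa+bb, with "aa+bb" already FOUND at index 4.
tokens = ["aa", "/", "aa", "/", "aa", "+", "bb"]
print(sorted(mark_left_to_right(tokens, {4})))  # the first "aa" is missed
print(sorted(mark_from_anchor(tokens, {4})))    # all three are marked
```

In the single pass, position 0 is checked before position 2 has been marked, so it fails and is never visited again.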

Best,

Peter


On 04.06.2019 at 11:29, Nikolai Krot wrote:
> Hi all,
>
> I have an example of rules that dont quite work, which leads me to
> realization that I dont understand how text is traversed in ruta and how
> rules are applied.
>
> Below is a simplified example of what I m doing.
>
> Say, i have a text that has "words" like this
>
> 1 aa+bb
> 2 aa / aa+bb
> 3 aa /aa /aa+bb
>
> I want to annotate the tokens as follows
>
> 1 FOUND
> 2 FOUND / FOUND
> 3 FOUND / FOUND / FOUND
>
> and there can be longer sequences separated by a slash.
>
> These are my rules:
>
> "aa" "+" "bb"  {->MARK(FOUND,1,3)};
> "aa" "/" FOUND {->MARK(FOUND, 1)};
>
> In other words: the rightmost token of the sequence is annotated first as
> FOUND. and this becomes an evidence to annotate preceeding tokens as FOUND
> as well.
>
> The thing is that only cases 1 and 2 are fully annotated. The case 3 is
> annotated only partially.
>
> 1 FOUND
> 2 FOUND / FOUND
> 3 aa / FOUND / FOUND
>
> Seems that the second rule is applied only once, though I expect it to be
> applied many times in a loop as long as there is a match. The case 3 should
> work as soon as the case 2 has been annotated, because case 3 is an
> extension of case 2.
>
> Case 3 starts to work when the second rule is duplicated. Which is not a
> good solution, in my opinion. My question is: is the above by design (rule
> matching does not restart after a match) or is it a bug in ruta? Or maybe
> there is a configuration option to choose a behaviour?
>
> Thank you in advance and best regards,
> Nikolai
>



Re: Matching REPLACEd text

2019-05-21 Thread Peter Klügl
Hi Dominik,


the REPLACE action is maybe not what you are looking for. It is part
of a use case where you want to set a replacement for specific parts of
the document, e.g. deidentification. However, as the document text is
static, these replacements are only used for modifications later on by a
different analysis engine, which essentially creates a new CAS. Thus,
applying the REPLACE action without the additional analysis engine and
another pipeline has no effect as it only sets the value of a feature
in RutaBasic.

Fixing the OCR error and producing a new document text for further
processing is in general a good idea, but that action is probably not
the best solution for implementing it.


If the initial document text should not be changed in an additional step
(multiple pipelines or a multi-view CAS), then you can still implement the
use case by storing the corrected word in a feature, e.g., in a feature
"normalized" of a type "Token". This depends of course on your pipeline
and type system. Then, you would need some dictionary lookup on the
feature values. This is currently not supported by the wordlists in
Ruta; there is still an open feature request for this. Other dictionary
lookup components most likely support such functionality. The same
applies to stems instead of corrected OCR errors. Regarding Ruta rules,
you can simply match on the feature values as usual:
t:Token{t.normalized == "bla"} or  t:Token{t.stem == "bla"}
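
The idea of keeping the document text unchanged and matching on a feature can be sketched in Python as follows. Everything here is an assumption made for illustration: the Token class, the small NORMALIZED lookup table and the function names stand in for whatever tokenizer, OCR correction and stemmer the pipeline provides. The point is that the match is decided on the normalized values while the returned offsets still refer to the original text.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    begin: int
    end: int
    text: str
    normalized: str  # corrected and stemmed form, stored like a feature


# Hypothetical lookup standing in for OCR correction plus stemming.
NORMALIZED = {
    "worked": "work", "working": "work", "works": "work",
    "warking": "work", "vvorks": "work",
}


def tokenize(text):
    """Split on word characters and attach the normalized form to each token."""
    return [
        Token(m.start(), m.end(), m.group(),
              NORMALIZED.get(m.group().lower(), m.group().lower()))
        for m in re.finditer(r"\w+", text)
    ]


def match_normalized(tokens, pattern):
    """Return (begin, end) offsets in the original text for each match."""
    n = len(pattern)
    return [
        (tokens[i].begin, tokens[i + n - 1].end)
        for i in range(len(tokens) - n + 1)
        if [t.normalized for t in tokens[i:i + n]] == pattern
    ]


text = "He vvorks hard"
for begin, end in match_normalized(tokenize(text), ["work", "hard"]):
    print(begin, end, text[begin:end])
```

The offsets can then be used for highlighting the original words, even though the comparison used the corrected forms.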


Some more comments to your specific question below...


On 20.05.2019 at 09:19, Dominik Terweh wrote:
>
> Dear All,
>
>  
>
> I am using uima to detect certain parts of contracts. Unfortunately
> the documents are not originals but scanned and due to the recognition
> of OCR I have a rather high percentage of errors. Furthermore I have
> some situations, where I would like to get the root or lemma of a word
> and match on their basis, so I thought the best solution for both of
> these problems would be the REPLACE() action, but unfortunately I seem
> not to get it working.
>
>  
>
> What I would like to achieve, given the sentences:
>
> “They worked hard”,
>
> “They were warking hard”,
>
> “He vvorks hard”,
>
> “I work hard”
>
> I would want to perform some OCR correction (“warking” -> “working”,
> “vvorks” -> “works”), like:
>
> WrongWord{-> REPLACE(CorrectWord)};
>
> And some stemming/lemmatizing (“working”,”works”,”worked” -> “work”),
> like:
>
>     Word{-> REPLACE(Stem)};
>
> After that I would like to match on the replaced text, by simply using
> the stems, like:
>
>     ANY “work” “hard”{-> MARK(WhatIWant, 2, 3)};
>
>  
>
> Now my main questions are:
>
>   * Is it possible to match on replaced text?
>
Yes, but it depends on the representation and the component. It is
possible to match on any feature values using Ruta rules, but not using
wordlists.

>   * If so, can I highlight it in the original text?
>

The highlighting depends on the CAS viewer and thus normally on the
type. For a different highlighting, you would need an additional type.


>   * Can I see the changed text in the Annotation Browser View?
>

You cannot see the replaced text in this view since it is represented in
a feature of RutaBasic which is hidden in that view.


>   * Do I first need to write the outcome to a file and then reread and
> process it?
>

It depends on your overall use case and your pipeline setup. No, if you
only rely on matching on feature values.



Best,


Peter


>  
>
> I hope you can help me with my request,
>
> Dominik
>



Re: Problem setting up a uimaFIT pipeline

2019-05-16 Thread Peter Klügl
I added a comment there...

On 16.05.2019 at 04:21, Marshall Schor wrote:
> Cross posted from stackoverflow:
>
> https://stackoverflow.com/questions/56149592/jcas-type-timex3-used-in-java-code-but-was-not-declared-in-the-xml-type-d
>
> Can someone see if the uimaFIT auto configure of the type system would work 
> with
> this setup, or if something else is needed?
>
> -Marshall
>



Re: fuzzy matching possible?

2019-05-07 Thread Peter Klügl
Hi,


in the end you need to check both, but you could refactor the
checks into a new condition like this (not tested):


CONDITION LemmaCT(ANNOTATION word, STRING check) = OR(word.lemma ==
check, word.ct == check);

w: Word{LemmaCT(w, "gearbeitet")};

... or with two string arguments for different checks for lemma and
covered text.


If you have many rules like this, I would prefer an additional analysis
engine because those rules may become less maintainable over time.


Best,


Peter



Am 04.05.2019 um 13:44 schrieb Nikolai Krot:
> Hi Peter,
>
> Thank you for the answer.
>
>> that mainly depends on the typesystem. Your rule could look something
> like:
>>
>> w:Word{OR(w.lemma == "arbeiten", w.ct == "gearbeitet")};
> I know of this syntax. My question is whether there is a shorter way to
> say that whenever I need to match word text, the matching should check
> both lemma and ct fields. Think of a few dozen rules like this...
>
> Best regards,
> Nikolai
>>
>> Best,
>>
>>
>> Peter
>>
>> Am 03.05.2019 um 18:28 schrieb Nikolai Krot:
>>> Hi Peter,
>>>
>>> Thank you for your prompt reply.
>>>
>>> Speaking about pre-annotation with another engine. Say, I managed to
>>> annotate words of interest and additionally set an attribute, something
>>> like this
>>>
>>> ... gearbeitet...
>>>
>>> Is there a simple way to configure the object of matching in ruta rules so
>>> that the rule matches over actual text ("gearbeitet" in our case) or the
>>> value of attribute "lemma" ("arbeiten" in our case)?
>>> That is, match should return True if either of the fields evaluates to
> True.
>>> This would make some rules simpler.
>>>
>>> Best regards,
>>> Nikolai
>>>
>>> On Fri, May 3, 2019 at 2:03 PM Peter Klügl 
> wrote:
>>>> Hi,
>>>>
>>>>
>>>> there is/was support for a weighted edit distance in the trie lookup,
>>>> but that functionality has not been maintained for many years.
>>>>
>>>> The dictionary lookup functionality in Ruta is overall very limited.
>>>> Normally, one uses a separate analysis engine with extended logic
>>>> (ConceptMapper?) for creating the annotations, which are then later
>>>> reused in rules.
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>> Am 03.05.2019 um 13:16 schrieb Nikolai Krot:
>>>>> Hi all,
>>>>>
>>>>> Is there a possibility to match a word somehow fuzzily in UIMA Ruta
>>>>> language? I am thinking how to overcome problems with typos and OCR
>>>>> mistakes... It is hardly possible to list all possibilities how a word
>>>>> could have been broken.
>>>>>
>>>>> Best regards,
>>>>> Nikolai Krot
>>>>>



Re: fuzzy matching possible?

2019-05-04 Thread Peter Klügl

Hi,


that mainly depends on the typesystem. Your rule could look something like:


w:Word{OR(w.lemma == "arbeiten", w.ct == "gearbeitet")};


Best,


Peter

Am 03.05.2019 um 18:28 schrieb Nikolai Krot:

Hi Peter,

Thank you for your prompt reply.

Speaking about pre-annotation with another engine. Say, I managed to
annotate words of interest and additionally set an attribute, something
like this

... gearbeitet...

Is there a simple way to configure the object of matching in ruta rules so
that the rule matches over actual text ("gearbeitet" in our case) or the
value of attribute "lemma" ("arbeiten" in our case)?
That is, match should return True if either of the fields evaluates to True.
This would make some rules simpler.

Best regards,
Nikolai

On Fri, May 3, 2019 at 2:03 PM Peter Klügl  wrote:


Hi,


there is/was support for a weighted edit distance in the trie lookup,
but that functionality has not been maintained for many years.

The dictionary lookup functionality in Ruta is overall very limited.
Normally, one uses a separate analysis engine with extended logic
(ConceptMapper?) for creating the annotations, which are then later
reused in rules.


Best,


Peter

Am 03.05.2019 um 13:16 schrieb Nikolai Krot:

Hi all,

Is there a possibility to match a word somehow fuzzily in UIMA Ruta
language? I am thinking how to overcome problems with typos and OCR
mistakes... It is hardly possible to list all possibilities how a word
could have been broken.

Best regards,
Nikolai Krot





Re: fuzzy matching possible?

2019-05-03 Thread Peter Klügl
Hi,


there is/was support for a weighted edit distance in the trie lookup,
but that functionality has not been maintained for many years.

The dictionary lookup functionality in Ruta is overall very limited.
Normally, one uses a separate analysis engine with extended logic
(ConceptMapper?) for creating the annotations, which are then later
reused in rules.
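
To illustrate the edit-distance idea: here is a minimal sketch in plain Java
of how a separate component could tolerate typos or OCR errors before the
rules run. It uses the plain, unweighted Levenshtein distance, not Ruta's
weighted trie lookup. The names FuzzyMatch, distance and matches, as well as
the threshold of one edit, are assumptions made only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of UIMA Ruta: unweighted Levenshtein distance.
final class FuzzyMatch {

    // Number of single-character insertions, deletions and substitutions
    // needed to turn a into b, computed with two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if the token is within maxEdits of at least one dictionary entry.
    static boolean matches(String token, List<String> dictionary, int maxEdits) {
        for (String entry : dictionary) {
            if (distance(token, entry) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("gearbeitet", "arbeiten");
        // "gearbeltet" is an OCR-style misreading of "gearbeitet" (one substitution).
        System.out.println(matches("gearbeltet", dictionary, 1)); // true
        System.out.println(matches("Haus", dictionary, 1));       // false
    }
}
```

A component like this would create the annotations, and the Ruta rules would
then only match on the resulting annotation type.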


Best,


Peter

Am 03.05.2019 um 13:16 schrieb Nikolai Krot:
> Hi all,
>
> Is there a possibility to match a word somehow fuzzily in UIMA Ruta
> language? I am thinking how to overcome problems with typos and OCR
> mistakes... It is hardly possible to list all possibilities how a word
> could have been broken.
>
> Best regards,
> Nikolai Krot
>



Re: how to regenerate type system files in descriptor/ directory

2019-04-30 Thread Peter Klügl
Hi,


you somehow need to activate the builder. "Cleaning" the project
(Menu->Project->Clean...) for example will do that.


If the project becomes more complicated, I recommend switching to the
maven plugin even with the overhead of maven.


Best,


Peter


Am 30.04.2019 um 18:47 schrieb Nikolai Krot:
> Hi all,
>
> I am trying to figure out which files in uima ruta project need to be
> tracked in git and which not. I have noticed that certain files in the
> directory descriptor/, namely those that correspond to my own scripts, have
> absolute paths. Therefore I decided to delete such files.
>
> These files are however necessary for running a ruta script. Hence a
> question:
>
> *how to regenerate type system files of user-defined classes in
> eclipse/uima workbench?*
>
> Based on my observations, generation of typesystem files happens when a
> project is imported in eclipse*. *Therefore I had to remove and re-add my
> project. Is there a magic button that instructs eclipse to regenerate
> missing components?
>
> Best regards,
> Nikolai
>



Re: how to set scriptPath

2019-04-18 Thread Peter Klügl
Hi,


yes, there is no real reason, but it is a hard rule in Ruta. A script has to
follow the order: package declaration -> global statements like imports
-> statements like rules or blocks
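
The ordering rule can be sketched as a simple line-based check. This is plain
Java for illustration only and has nothing to do with the actual Ruta parser:
the class name ScriptOrderCheck and its methods are made up, and it only
knows the keywords that appear in this thread (PACKAGE, ENGINE, SCRIPT).
Everything else is treated as a normal statement.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical line-based check, not the Ruta parser.
final class ScriptOrderCheck {

    // 0 = package declaration, 1 = import, 2 = any other statement.
    static int phase(String line) {
        if (line.startsWith("PACKAGE ")) {
            return 0;
        }
        if (line.startsWith("ENGINE ") || line.startsWith("SCRIPT ")) {
            return 1;
        }
        return 2;
    }

    // Returns the 1-based number of the first line that belongs to an
    // earlier phase than a line before it, or -1 if the order is fine.
    static int firstViolation(List<String> lines) {
        int highest = 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty() || line.startsWith("//")) {
                continue; // blank lines and comments may appear anywhere
            }
            int p = phase(line);
            if (p < highest) {
                return i + 1;
            }
            highest = p;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList(
            "PACKAGE uima.ruta.eng;",
            "ENGINE uima.ruta.eng.common.dateEngine;",
            "SCRIPT uima.ruta.eng.common.date;",
            "CALL(date);",
            "ENGINE uima.ruta.eng.common.xxxEngine;");
        // The second ENGINE import comes after a CALL statement.
        System.out.println(firstViolation(script)); // 5
    }
}
```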


Best,


Peter


Am 18.04.2019 um 13:58 schrieb Nikolai Krot:
> another interesting discovery concerning importing several scripts:
> all ENGINE and SCRIPT statements should *precede* any CALL command, type
> declarations, normal rules, etc., like this:
>
> // file: script/uima/ruta/eng/aaa.ruta
> PACKAGE uima.ruta.eng;
>
> // all imports
> ENGINE uima.ruta.eng.common.dateEngine;
> SCRIPT uima.ruta.eng.common.date;
> ENGINE uima.ruta.eng.common.xxxEngine;
> SCRIPT uima.ruta.eng.common.xxx;
>
> // and now run annotation
> CALL(date);
> CALL(xxx);
>
> as opposed to the incorrect:
>
> // file: script/uima/ruta/eng/aaa.ruta
> PACKAGE uima.ruta.eng;
>
> ENGINE uima.ruta.eng.common.dateEngine;
> SCRIPT uima.ruta.eng.common.date;
> CALL(date);
>
> ENGINE uima.ruta.eng.common.xxxEngine; // <-- an error is reported here
> SCRIPT uima.ruta.eng.common.xxx;
> CALL(xxx);
>
> Sorry for spamming.
>
> BR, Nikolai
>
> On Thu, Apr 18, 2019 at 12:40 PM Nikolai Krot  wrote:
>
>> Hi,
>>
>> Great day! Finally, after many experiments I managed to reuse another
>> script that is not in the same directory as the current ruta script.
>>
>> It turns out to be very easy to achieve:
>>
>> // file: script/uima/ruta/eng/aaa.ruta
>> PACKAGE uima.ruta.eng;
>>
>> // these two lines import another script from a subdirectory
>> ENGINE uima.ruta.eng.common.dateEngine;
>> SCRIPT uima.ruta.eng.common.date;
>>
>> // and either of these two performs annotation with the above imported
>> script
>> CALL(date);
>> //EXEC(date); // this also works
>>
>> I am not sure this is the right way to accomplish the goal, but still I
>> want to leave the answer here for the record, as it turned out to be very
>> time consuming to find the answer. Hopefully, the answer will save someone
>> else's time :)
>>
>> Happy Easter!
>>
>> BR, Nikolai
>>
>>
>> On Tue, Apr 16, 2019 at 11:14 AM Nikolai Krot  wrote:
>>
>>> Hi Peter,
>>>
>>> Thank you for your quick reply. Still can not get it to work. Please read
>>> my questions intertwined with your answers.
>>>
>>> I will start with an overview of the projects structure I have set up
>>>
>>> script/uima/ruta/deu/common/date.ruta
>>> script/uima/ruta/eng/common/date.ruta
>>> script/uima/ruta/eng/aaa.ruta
>>>
>>> So, the project is multilingual (hence deu and eng) and I want to keep
>>> some shared scripts under *common/* directory. An example of such a
>>> shared script is, for example, *date.ruta*, for recognizing dates,
>>> because dates can appear in any type of document.
>>> All specific stuff, that in my case is document-type/genre specific, goes
>>> outside of *common/* directory: *eng/aaa.ruta* is an example of such
>>> specific document genre-dependent script.
>>>
>>> And now I want to reuse *eng/common/date.ruta* in said *eng/aaa.ruta*
>>> that looks like this:
>>>
>>> // file: script/uima/ruta/eng/aaa.ruta
>>> PACKAGE uima.ruta.eng;
>>> SCRIPT uima.ruta.eng.common.date;  <-- this causes an error
>>> Document {->CALL(date)};
>>>
>>> I tried to do it like this but it did not work. Hence i am asking this
>>> question.
>>>
>>> On Tue, Apr 16, 2019 at 9:27 AM Peter Klügl 
>>> wrote:
>>>
>>> I assume that you use a simple Ruta project (compared to a maven project
>>>> with the ruta-maven-plugin)?
>>>>
>>> true.
>>>
>>>
>>>> Normally, you should not need to set the scriptPaths configuration
>>>> parameter as it is set automatically by the builder to the absolute
>>>> paths to the "script" folder of your ruta project.
>>>>
>>>> The file "descriptor/BasicEngine.xml" is only the template for the
>>>> generated descriptors for your Ruta scripts. Thus, it isn't used for
>>>> launching scripts.
>>>>
>>> Do you mean that *descriptor/BasicEngine.xml* is used as a basis for
>>> other descriptor/*Engine.xml files? that is, the file
>>> *descriptor/uima/ruta/eng/aaaEngine.xml* was generated from
>>> BasicEngine.xml when I first created *eng/aaa.ruta* file. Can the file
>>> aaaEngine.xml be edited manua

Re: how to set scriptPath

2019-04-18 Thread Peter Klügl
Hi,


hmm, that's strange. Do you need the additional ENGINE import for the
SCRIPT CALL to work correctly? If yes, then there is a bug in Ruta...


Best,


Peter


Am 18.04.2019 um 12:40 schrieb Nikolai Krot:
> Hi,
>
> Great day! Finally, after many experiments I managed to reuse another
> script that is not in the same directory as the current ruta script.
>
> It turns out to be very easy to achieve:
>
> // file: script/uima/ruta/eng/aaa.ruta
> PACKAGE uima.ruta.eng;
>
> // these two lines import another script from a subdirectory
> ENGINE uima.ruta.eng.common.dateEngine;
> SCRIPT uima.ruta.eng.common.date;
>
> // and either of these two performs annotation with the above imported
> script
> CALL(date);
> //EXEC(date); // this also works
>
> I am not sure this is the right way to accomplish the goal, but still I
> want to leave the answer here for the record, as it turned out to be very
> time consuming to find the answer. Hopefully, the answer will save someone
> else's time :)
>
> Happy Easter!
>
> BR, Nikolai
>
>
> On Tue, Apr 16, 2019 at 11:14 AM Nikolai Krot  wrote:
>
>> Hi Peter,
>>
>> Thank you for your quick reply. Still can not get it to work. Please read
>> my questions intertwined with your answers.
>>
>> I will start with an overview of the projects structure I have set up
>>
>> script/uima/ruta/deu/common/date.ruta
>> script/uima/ruta/eng/common/date.ruta
>> script/uima/ruta/eng/aaa.ruta
>>
>> So, the project is multilingual (hence deu and eng) and I want to keep
>> some shared scripts under *common/* directory. An example of such a
>> shared script is, for example, *date.ruta*, for recognizing dates,
>> because dates can appear in any type of document.
>> All specific stuff, that in my case is document-type/genre specific, goes
>> outside of *common/* directory: *eng/aaa.ruta* is an example of such
>> specific document genre-dependent script.
>>
>> And now I want to reuse *eng/common/date.ruta* in said *eng/aaa.ruta*
>> that looks like this:
>>
>> // file: script/uima/ruta/eng/aaa.ruta
>> PACKAGE uima.ruta.eng;
>> SCRIPT uima.ruta.eng.common.date;  <-- this causes an error
>> Document {->CALL(date)};
>>
>> I tried to do it like this but it did not work. Hence i am asking this
>> question.
>>
>> On Tue, Apr 16, 2019 at 9:27 AM Peter Klügl 
>> wrote:
>>
>> I assume that you use a simple Ruta project (compared to a maven project
>>> with the ruta-maven-plugin)?
>>>
>> true.
>>
>>
>>> Normally, you should not need to set the scriptPaths configuration
>>> parameter as it is set automatically by the builder to the absolute
>>> paths to the "script" folder of your ruta project.
>>>
>>> The file "descriptor/BasicEngine.xml" is only the template for the
>>> generated descriptors for your Ruta scripts. Thus, it isn't used for
>>> launching scripts.
>>>
>> Do you mean that *descriptor/BasicEngine.xml* is used as a basis for other
>> descriptor/*Engine.xml files? that is, the file
>> *descriptor/uima/ruta/eng/aaaEngine.xml* was generated from
>> BasicEngine.xml when I first created *eng/aaa.ruta* file. Can the file
>> aaaEngine.xml be edited manually? Is it safe? I see that this file contains
>> absolute paths and what happens when I move this project to another
>> computer that has a different directory structure? And finally, should
>> aaaEngine.xml be committed to a git repository?
>>
>>
>>> Can you check the parameter value of the descriptor of the Ruta script
>>> you want to run?
>>>
>> sorry, can not understand this. How do I locate it?
>>
>>
>>> If you want to use an additional script, you can simply import it in
>>> your script (using the correct package with "SCRIPT" and then execute it
>>> with "CALL"). The Workbench should take care of all the configuration.
>>>
>> Unfortunately, it did not work. The error is
>>
>> Exception in thread "main"
>> org.apache.uima.resource.ResourceInitializationException: Initialization of
>> annotator class "org.apache.uima.ruta.engine.RutaEngine" failed.
>> (Descriptor: file:/path/to/zzz/descriptor/uima/ruta/eng/aaaEngine.xml)
>> ...
>> Caused by: org.apache.uima.ruta.extensions.RutaParseRuntimeException:
>> Error in aaa, line 11, "SCRIPT": expected 'none', but found ScriptString
>>
>>
>>> In any case this works not as ex

Re: deleting temporary annotations

2019-04-18 Thread Peter Klügl
Hi,


why do you want to remove them? I tend to keep them for better
explainability of the rule inference.


In UIMA 2, removing the annotation will mainly only influence the index.
The garbage collector will not clean them up. Well, in UIMA 3 that
changes...


If you want to remove them, there are several options. I would prefer to
either use a parent type or an annotation list. Just a quick guess...


Use a single parent type to identify temporary annotations:


DECLARE Local;

DECLARE Local LeftBoundary;

DECLARE Local RightBoundary;

l:Local{-> UNMARK(l)};


Store temporary annotations in a list and remove the values of the list:


ANNOTATIONLIST locals;

lb:LeftBoundary{-> ADD(locals, lb)};

l:locals{-> UNMARK(l)};


Best,


Peter


Am 17.04.2019 um 21:15 schrieb Nikolai Krot:
> Hi all,
>
> When developing rules, i often use temporary annotations. For example, to
> create an anchor, another rule will rely upon. Is there any quick way to
> delete such temporary annotations (from the document) when the annotation
> process has concluded? For the time being, I delete them individually with
> an UNMARK statement, like this:
>
> leftBoundary {-> UNMARK(leftBoundary)};
>
>
> If such a shortcut does not yet exist, what do you think about extending
> DECLARE statement to include an indication of such temporary nature of the
> type, a la
>
> DECLARE leftBoundary as local;
>
>
> Opinions? Ideas?
>
> Best regards,
> Nikolai
>



Re: Flattening Annotations?

2019-04-18 Thread Peter Klügl
Hi,

I normally use inlined rules as action for those use cases:

NP -> {
  np:NP{-> UNMARK(np)} ANY;
  ANY np:@NP{-> UNMARK(np)};
  };


It is longer, but faster...


Best,

Peter


Am 16.04.2019 um 22:08 schrieb Nikolai Krot:
> Hi all,
>
> Is there a quick way to remove all annotations of specific type if they are
> embedded in a wider annotation of the same type, for example:
>
>  ...  ... ...
>
> needs to be flattened to just outermost
>
>  ......  ... 
>
> At the moment I am doing it with the following (which seems to work and is
> said to be slow)
>
> NP  { PARTOFNEQ(NP)   -> UNMARK(NP)};
>
> Best regards,
> Nikolai
>



Re: Problem running UIMA Ruta ExampleProject

2019-04-16 Thread Peter Klügl
Oh no.


I actually thought of this, but I would have expected a completely
different error, as other users have reported already.


Sorry for the inconvenience. There is already a ticket for solving this
problem for good.


Best,


Peter


Am 16.04.2019 um 13:55 schrieb Nikolai Krot:
> Hi Peter,
>
> An update: I have experimented with Eclipse and now I tend to believe that
> what made the ExampleProject work is
> * not compiling ruta-2.7.0 and setting UIMA_HOME variable to point to it;
> * but rather running [UIMA Ruta/Update project] on the project.
>
> Please consider this issue solved.
>
> Best regards,
> Nikolai
>
> On Tue, Apr 16, 2019 at 9:11 AM Peter Klügl 
> wrote:
>
>> Hi,
>>
>>
>> hmm strange, that should not be necessary and I do not know why it has
>> any influence at all.
>>
>> I'll try to reproduce your problem anyway. Maybe I will find the source of
>> the problem in Ruta.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> Am 15.04.2019 um 22:29 schrieb Nikolai Krot:
>>> Hi Peter,
>>>
>>> Finally, things started to work. Though I can not tell why exactly.
>>> Retrospectively, i would guess it happened once I compiled the stuff in
>>> ruta-2.7.0 directory and set UIMA_HOME variable. Not sure nor willing to
>>> investigate it.
>>>
>>> I think this page
>>>
>> https://uima.apache.org/d/ruta-current/tools.ruta.book.html#section.ugr.tools.ruta.workbench.install
>>> does not mention it but I remember reading something somewhere else...
>>>
>>> Best regards,
>>> Nikolai
>>>
>>>
>>>
>>> On Mon, Apr 15, 2019 at 1:54 PM Peter Klügl 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> I tested it with Windows 10, Eclipse 4.8, Java 8.
>>>>
>>>>
>>>> I'll take a closer look at it in the next few days.
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>> Am 13.04.2019 um 10:58 schrieb Nikolai Krot:
>>>>> Hi Peter,
>>>>>
>>>>> Thanks for supporting me :) This is what I tried on Ubuntu 18.04,
>>>>> unfortunately to no avail:
>>>>>
>>>>> 1) i have deleted the configuration and launched the Main.ruta script
>>>>> directly. I have the following
>>>>>
>>>>> Error occurred during initialization of boot layer
>>>>>> java.lang.LayerInstantiationException: Package
>>>>>> jdk.internal.jimage.decompressor in both module jrt.fs and module
>>>> java.base
>>>>> 2) then i have edited the configuration that was created by running
>> step
>>>> 1,
>>>>> changing the following
>>>>>   Tab JRE -> set Alternate JRE to point to java-8-openjdk-amd64
>>>>>
>>>>> Exception in thread "main" java.lang.IllegalArgumentException: Value
>> for
>>>>>> descriptor is missing - Arguments: [descriptor, null, inputFolder,
>>>>>>
>> %2Fhome%2Fkrot%2Feclipse-other%2Fruta-2.7.0%2Fexample-projects%2FExampleProject%2Finput,
>>>>>> outputFolder,
>>>>>>
>> %2Fhome%2Fkrot%2Feclipse-other%2Fruta-2.7.0%2Fexample-projects%2FExampleProject%2Foutput,
>>>>>> mode, run, encoding, UTF-8, view, _InitialView, inputRecursive, false,
>>>>>> addsdi, false, serialFormat, XMI]
>>>>>> at
>>>>>>
>> org.apache.uima.ruta.ide.launching.RutaLauncher.throwException(RutaLauncher.java:172)
>>>>>> at
>>>>>>
>> org.apache.uima.ruta.ide.launching.RutaLauncher.parseCmdLineArgs(RutaLauncher.java:111)
>>>>>> at
>>>>>>
>> org.apache.uima.ruta.ide.launching.RutaLauncher.main(RutaLauncher.java:176)
>>>>> this looks  like something very basic related to environment
>>>> configuration,
>>>>> but as being non-java guy, i can not solve it.
>>>>>
>>>>> If ExampleProject works for you, could you share your setup: OS,
>> eclipse
>>>>> version, JDK, etc whatever is relevant?
>>>>>
>>>>> Best regards,
>>>>> Nikolai
>>>>>
>>>>> On Tue, Apr 9, 2019 at 3:27 PM Peter Klügl  wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> hmm

Re: how to set scriptPath

2019-04-16 Thread Peter Klügl
Hi,


I assume that you use a simple Ruta project (as opposed to a Maven project
with the ruta-maven-plugin)?


Normally, you should not need to set the scriptPaths configuration
parameter as it is set automatically by the builder to the absolute
paths to the "script" folder of your ruta project.

The file "descriptor/BasicEngine.xml" is only the template for the
generated descriptors for your Ruta scripts. Thus, it isn't used for
launching scripts.

Can you check the parameter value of the descriptor of the Ruta script
you want to run?

If you want to use an additional script, you can simply import it in
your script (using the correct package with "SCRIPT" and then execute it
with "CALL"). The Workbench should take care of all the configuration. 


In case this does not work as expected, here are the two configuration
parameter values you would need to set:

    
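    <nameValuePair>
      <name>scriptPaths</name>
      <value>
        <array>
          <string>C:/src/ws/ws-ta/ARutaTest/script</string>
        </array>
      </value>
    </nameValuePair>


    <nameValuePair>
      <name>additionalScripts</name>
      <value>
        <array>
          <string>uima.example.Test2</string>
        </array>
      </value>
    </nameValuePair>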


Please mind the "s" in scriptPaths. The values can be relative, but then
they are relative to the datapath, which can be set in the project preferences.


I hope that helps.


Best,


Peter



Am 15.04.2019 um 22:46 schrieb Nikolai Krot:
> Hi,
>
> In a ruta rule script, I want to include and run another script that is
> located in a subdirectory w.r.t to the location of the current script. From
> reading the documentation, it seems that the variable *scriptPath* needs to
> be set to that subdirectory for *SCRIPT* directive to work. Or am I wrong?
>
> I can not figure out how to set *scriptPath*. One possibility would be to
> add this configuration to *descriptor/BasicEngine.xml* file. Unfortunately,
> I can not find any example of how to accomplish it. Should it be like below?
>
> 
>>   scriptPath
>>
>> path/to/subdirectory
>>
>> 
>>
> What if I want to set several path values?
> Do values need to be relative to the project root directory?
>
> Thank you in advance,
> Nikolai
>



Re: Problem running UIMA Ruta ExampleProject

2019-04-16 Thread Peter Klügl
Hi,


hmm strange, that should not be necessary and I do not know why it has
any influence at all.

I'll try to reproduce your problem anyway. Maybe I will find the source of
the problem in Ruta.


Best,


Peter


Am 15.04.2019 um 22:29 schrieb Nikolai Krot:
> Hi Peter,
>
> Finally, things started to work. Though I can not tell why exactly.
> Retrospectively, i would guess it happened once I compiled the stuff in
> ruta-2.7.0 directory and set UIMA_HOME variable. Not sure nor willing to
> investigate it.
>
> I think this page
> https://uima.apache.org/d/ruta-current/tools.ruta.book.html#section.ugr.tools.ruta.workbench.install
> does not mention it but I remember reading something somewhere else...
>
> Best regards,
> Nikolai
>
>
>
> On Mon, Apr 15, 2019 at 1:54 PM Peter Klügl 
> wrote:
>
>> Hi,
>>
>>
>> I tested it with Windows 10, Eclipse 4.8, Java 8.
>>
>>
>> I'll take a closer look at it in the next few days.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> Am 13.04.2019 um 10:58 schrieb Nikolai Krot:
>>> Hi Peter,
>>>
>>> Thanks for supporting me :) This is what I tried on Ubuntu 18.04,
>>> unfortunately to no avail:
>>>
>>> 1) i have deleted the configuration and launched the Main.ruta script
>>> directly. I have the following
>>>
>>> Error occurred during initialization of boot layer
>>>> java.lang.LayerInstantiationException: Package
>>>> jdk.internal.jimage.decompressor in both module jrt.fs and module
>> java.base
>>> 2) then i have edited the configuration that was created by running step
>> 1,
>>> changing the following
>>>   Tab JRE -> set Alternate JRE to point to java-8-openjdk-amd64
>>>
>>> Exception in thread "main" java.lang.IllegalArgumentException: Value for
>>>> descriptor is missing - Arguments: [descriptor, null, inputFolder,
>>>>
>> %2Fhome%2Fkrot%2Feclipse-other%2Fruta-2.7.0%2Fexample-projects%2FExampleProject%2Finput,
>>>> outputFolder,
>>>>
>> %2Fhome%2Fkrot%2Feclipse-other%2Fruta-2.7.0%2Fexample-projects%2FExampleProject%2Foutput,
>>>> mode, run, encoding, UTF-8, view, _InitialView, inputRecursive, false,
>>>> addsdi, false, serialFormat, XMI]
>>>> at
>>>>
>> org.apache.uima.ruta.ide.launching.RutaLauncher.throwException(RutaLauncher.java:172)
>>>> at
>>>>
>> org.apache.uima.ruta.ide.launching.RutaLauncher.parseCmdLineArgs(RutaLauncher.java:111)
>>>> at
>>>>
>> org.apache.uima.ruta.ide.launching.RutaLauncher.main(RutaLauncher.java:176)
>>> this looks  like something very basic related to environment
>> configuration,
>>> but as being non-java guy, i can not solve it.
>>>
>>> If ExampleProject works for you, could you share your setup: OS, eclipse
>>> version, JDK, etc whatever is relevant?
>>>
>>> Best regards,
>>> Nikolai
>>>
>>> On Tue, Apr 9, 2019 at 3:27 PM Peter Klügl  wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> hmm, I think I have to try to reproduce it...
>>>>
>>>>
>>>> Can you try it with Java 8? There is an option in the launch config.
>>>>
>>>> Can you delete the old launch config and try to launch the script
>>>> directly: Open the script in the Editor and simply press the Run button
>>>> without selecting anything?
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>> Am 09.04.2019 um 14:46 schrieb Nikolai Krot:
>>>>> Hallo Peter,
>>>>>
>>>>> Thank you for your prompt reply.
>>>>>
>>>>> As recommended, I tried Photon (4.8.0), this time on Ubuntu 18.04
>>>>> (openjdk-11), everything from the official repository. I applied the
>> same
>>>>> steps as before, except
>>>>>
>>>>> 5) in Eclipse, go to ExampleProject/input/Example1.txt, right click,
>>>> select
>>>>> [Debug As]. There is no ready configuration, so i created one. Well,
>> the
>>>>> only thing I did was pointing Launch script to Main.ruta.
>>>>>
>>>>> The error now is different, "Cannot connect to VM, socket closed".
>> When I
>>>>> just [Run as], the error pops up
>>>>>
>>>>> Error occurred during in

Re: Problem running UIMA Ruta ExampleProject

2019-04-15 Thread Peter Klügl
Hi,


I tested it with Windows 10, Eclipse 4.8, Java 8.


I'll take a closer look at it in the next few days.


Best,


Peter


Am 13.04.2019 um 10:58 schrieb Nikolai Krot:
> Hi Peter,
>
> Thanks for supporting me :) This is what I tried on Ubuntu 18.04,
> unfortunately to no avail:
>
> 1) i have deleted the configuration and launched the Main.ruta script
> directly. I have the following
>
> Error occurred during initialization of boot layer
>> java.lang.LayerInstantiationException: Package
>> jdk.internal.jimage.decompressor in both module jrt.fs and module java.base
>>
> 2) then i have edited the configuration that was created by running step 1,
> changing the following
>   Tab JRE -> set Alternate JRE to point to java-8-openjdk-amd64
>
> Exception in thread "main" java.lang.IllegalArgumentException: Value for
>> descriptor is missing - Arguments: [descriptor, null, inputFolder,
>> %2Fhome%2Fkrot%2Feclipse-other%2Fruta-2.7.0%2Fexample-projects%2FExampleProject%2Finput,
>> outputFolder,
>> %2Fhome%2Fkrot%2Feclipse-other%2Fruta-2.7.0%2Fexample-projects%2FExampleProject%2Foutput,
>> mode, run, encoding, UTF-8, view, _InitialView, inputRecursive, false,
>> addsdi, false, serialFormat, XMI]
>> at
>> org.apache.uima.ruta.ide.launching.RutaLauncher.throwException(RutaLauncher.java:172)
>> at
>> org.apache.uima.ruta.ide.launching.RutaLauncher.parseCmdLineArgs(RutaLauncher.java:111)
>> at
>> org.apache.uima.ruta.ide.launching.RutaLauncher.main(RutaLauncher.java:176)
>>
>
> This looks like something very basic related to environment configuration,
> but as a non-Java guy, I cannot solve it.
>
> If ExampleProject works for you, could you share your setup: OS, eclipse
> version, JDK, etc whatever is relevant?
>
> Best regards,
> Nikolai
>
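
A note on the "Value for descriptor is missing" error above: the argument list is printed as flat key/value pairs, the value following `descriptor` is `null`, and the folder paths are URL-encoded (`%2F` stands for `/`). The following is a minimal sketch in plain Java for reading such a list. It is not the RutaLauncher implementation, and the names `LaunchArgsCheck`, `parsePairs`, `decode` and `missingKeys` are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LaunchArgsCheck {

    // Reads a flat "key, value, key, value, ..." list into an ordered map.
    // Values are URL-decoded; a Java null or the literal text "null" is
    // stored as null so that it can be reported as missing.
    static Map<String, String> parsePairs(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            String value = args[i + 1];
            boolean missing = value == null || value.equals("null");
            pairs.put(args[i], missing ? null : decode(value));
        }
        return pairs;
    }

    // URL-decodes a single value using UTF-8 (Java 8 compatible signature).
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the keys that have no value, in their original order.
    static List<String> missingKeys(Map<String, String> pairs) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : pairs.entrySet()) {
            if (entry.getValue() == null) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] unused) {
        // A shortened copy of the argument list from the stack trace.
        String[] args = {
            "descriptor", "null",
            "inputFolder",
            "%2Fhome%2Fkrot%2Feclipse-other%2Fruta-2.7.0%2Fexample-projects%2FExampleProject%2Finput",
            "mode", "run",
            "encoding", "UTF-8"
        };
        Map<String, String> pairs = parsePairs(args);
        System.out.println("missing: " + missingKeys(pairs));
        System.out.println("inputFolder: " + pairs.get("inputFolder"));
    }
}
```

Read this way, `descriptor` is the only entry without a value, which suggests that the launch configuration did not resolve a descriptor for the script, rather than a problem with the input or output folders.
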
> On Tue, Apr 9, 2019 at 3:27 PM Peter Klügl  wrote:
>
>> Hi,
>>
>>
>> hmm, I think I have to try to reproduce it...
>>
>>
>> Can you try it with Java 8? There is an option in the launch config.
>>
>> Can you delete the old launch config and try to launch the script
>> directly: Open the script in the Editor and simply press the Run button
>> without selecting anything?
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> On 09.04.2019 at 14:46, Nikolai Krot wrote:
>>> Hello Peter,
>>>
>>> Thank you for your prompt reply.
>>>
>>> As recommended, I tried Photon (4.8.0), this time on Ubuntu 18.04
>>> (openjdk-11), everything from the official repository. I applied the same
>>> steps as before, except
>>>
>>> 5) in Eclipse, go to ExampleProject/input/Example1.txt, right click,
>> select
>>> [Debug As]. There is no ready configuration, so i created one. Well, the
>>> only thing I did was pointing Launch script to Main.ruta.
>>>
>>> The error now is different, "Cannot connect to VM, socket closed". When I
>>> just [Run as], the error pops up
>>>
>>> Error occurred during initialization of boot layer
>>>> java.lang.LayerInstantiationException: Package
>>>> jdk.internal.jimage.decompressor in both module jrt.fs and module
>> java.base
>>>  looks like something really wrong.. I know this is not related to UIMA
>>> Ruta, but perhaps you could provide me with further advice on that?
>>>
>>> Best regards,
>>> Nikolai
>>>
>>> On Mon, Apr 8, 2019 at 10:15 AM Peter Klügl 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> there are some known issues with the latest Eclipse version. Can you try
>>>> it with Eclipse Photon (4.8.0)?
>>>>
>>>> Is there more Stacktrace output? If yes, can you add it too?
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>> On 07.04.2019 at 15:36, Nikolai Krot wrote:
>>>>> Hi all,
>>>>>
>>>>> I am trying out UIMA Ruta and having a hard time figuring out how to run
>>>>> ExampleProject. I am not a Java guy and at this point already feel
>>>>> frustrated with the abundant documentation.
>>>>>
>>>>> My steps so far have been to:
>>>>>
>>>>> 1) install Eclipse Version: 2019-03 (4.11.0) on Debian Linux
>>>>> 2) install UIMA Ruta plugin from
>>>>> http://www.apache.org/dist/uima/eclipse-update-site (via Eclipse menu
>>>>> [Insta

Re: Problem running UIMA Ruta ExampleProject

2019-04-09 Thread Peter Klügl
Hi,


hmm, I think I have to try to reproduce it...


Can you try it with Java 8? There is an option in the launch config.

Can you delete the old launch config and try to launch the script
directly: Open the script in the Editor and simply press the Run button
without selecting anything?
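
To confirm which JRE a launch actually uses, a small check like the following can help. It is only a sketch: the class name `JreCheck` and the helper `majorVersion` are made up here, and it relies on the standard system properties `java.specification.version` and `java.home`. Java 8 reports `1.8`, while newer releases report just the major number, for example `11`.

```java
class JreCheck {

    // Converts a specification version string into a major version number.
    // Java 8 and older use the "1.x" scheme (for example "1.8" gives 8);
    // Java 9 and newer report the major number directly (for example "11").
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        // java.home shows which installation the running JVM belongs to.
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println("specification version = " + spec
                + " (major " + majorVersion(spec) + ")");
    }
}
```

Running it with the same JRE that is selected in the launch configuration shows whether the switch to Java 8 really took effect.
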


Best,


Peter


On 09.04.2019 at 14:46, Nikolai Krot wrote:
> Hello Peter,
>
> Thank you for your prompt reply.
>
> As recommended, I tried Photon (4.8.0), this time on Ubuntu 18.04
> (openjdk-11), everything from the official repository. I applied the same
> steps as before, except
>
> 5) in Eclipse, go to ExampleProject/input/Example1.txt, right click, select
> [Debug As]. There is no ready configuration, so i created one. Well, the
> only thing I did was pointing Launch script to Main.ruta.
>
> The error now is different, "Cannot connect to VM, socket closed". When I
> just [Run as], the error pops up
>
> Error occurred during initialization of boot layer
>> java.lang.LayerInstantiationException: Package
>> jdk.internal.jimage.decompressor in both module jrt.fs and module java.base
>>
>  looks like something really wrong.. I know this is not related to UIMA
> Ruta, but perhaps you could provide me with further advice on that?
>
> Best regards,
> Nikolai
>
> On Mon, Apr 8, 2019 at 10:15 AM Peter Klügl 
> wrote:
>
>> Hi,
>>
>>
>> there are some known issues with the latest Eclipse version. Can you try
>> it with Eclipse Photon (4.8.0)?
>>
>> Is there more Stacktrace output? If yes, can you add it too?
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>
>> On 07.04.2019 at 15:36, Nikolai Krot wrote:
>>> Hi all,
>>>
>>> I am trying out UIMA Ruta and having a hard time figuring out how to run
>>> ExampleProject. I am not a Java guy and at this point already feel
>>> frustrated with the abundant documentation.
>>>
>>> My steps so far have been to:
>>>
>>> 1) install Eclipse Version: 2019-03 (4.11.0) on Debian Linux
>>> 2) install UIMA Ruta plugin from
>>> http://www.apache.org/dist/uima/eclipse-update-site (via Eclipse menu
>>> [Install New Software])
>>> 3) download and unpack the source distribution of UIMA Ruta 2.7.0 (found at
>>> http://uima.apache.org/downloads.cgi#Latest%20Official%20Releases)
>>> 4) start eclipse and import ExampleProject from the distribution
>> downloaded
>>> in step 3)
>>> 5) in Eclipse, go to ExampleProject/input/Example1.txt, right click,
>> select
>>> [Debug As], then in Debug Configurations there is already Main.ruta, i
>>> simply choose it and click Debug button.
>>>
>>> As a result, I get an error Source not found. This is the stack:
>>>
>>> org.apache.uima.ruta.ide.launching.RutaLauncher at localhost:43931
>>>> Thread [main] (Suspended (exception
>>>> AnalysisEngineProcessException))
>>>> PrimitiveAnalysisEngine_impl.innerCall(CAS) line: 336
>>>> PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(CAS) line:
>>>> 321
>>>>
>>  PrimitiveAnalysisEngine_impl(AnalysisEngineImplBase).process(CAS)
>>>> line: 269
>>>> RutaLauncher.processFile(File, AnalysisEngine, CAS) line: 242
>>>> RutaLauncher.main(String[]) line: 191
>>>>
>>> Could you please help me understand what is wrong? What am I missing?
>>> Please bear with a non-Java guy if I am asking a really basic/stupid/etc
>>> question.
>>>
>>> Thank you in advance
>>> Nikolai
>>>
>> --
>> Dr. Peter Klügl
>> R&D Text Mining/Machine Learning
>>
>> Averbis GmbH
>> Salzstr. 15
>> 79098 Freiburg
>> Germany
>>
>> Fon: +49 761 708 394 0
>> Fax: +49 761 708 394 10
>> Email: peter.klu...@averbis.com
>> Web: https://averbis.com
>>
>> Headquarters: Freiburg im Breisgau
>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>
>>


Re: Problem running UIMA Ruta ExampleProject

2019-04-08 Thread Peter Klügl
Hi,


there are some known issues with the latest Eclipse version. Can you try
it with Eclipse Photon (4.8.0)?

Is there more Stacktrace output? If yes, can you add it too?


Best,


Peter



On 07.04.2019 at 15:36, Nikolai Krot wrote:
> Hi all,
>
> I am trying out UIMA Ruta and having a hard time figuring out how to run
> ExampleProject. I am not a Java guy and at this point already feel
> frustrated with the abundant documentation.
>
> My steps so far have been to:
>
> 1) install Eclipse Version: 2019-03 (4.11.0) on Debian Linux
> 2) install UIMA Ruta plugin from
> http://www.apache.org/dist/uima/eclipse-update-site (via Eclipse menu
> [Install New Software])
> 3) download and unpack the source distribution of UIMA Ruta 2.7.0 (found at
> http://uima.apache.org/downloads.cgi#Latest%20Official%20Releases)
> 4) start eclipse and import ExampleProject from the distribution downloaded
> in step 3)
> 5) in Eclipse, go to ExampleProject/input/Example1.txt, right click, select
> [Debug As], then in Debug Configurations there is already Main.ruta, i
> simply choose it and click Debug button.
>
> As a result, I get an error Source not found. This is the stack:
>
> org.apache.uima.ruta.ide.launching.RutaLauncher at localhost:43931
>> Thread [main] (Suspended (exception
>> AnalysisEngineProcessException))
>> PrimitiveAnalysisEngine_impl.innerCall(CAS) line: 336
>> PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(CAS) line:
>> 321
>> PrimitiveAnalysisEngine_impl(AnalysisEngineImplBase).process(CAS)
>> line: 269
>> RutaLauncher.processFile(File, AnalysisEngine, CAS) line: 242
>> RutaLauncher.main(String[]) line: 191
>>
> Could you please help me understand what is wrong? What am I missing?
> Please bear with a non-Java guy if I am asking a really basic/stupid/etc
> question.
>
> Thank you in advance
> Nikolai
>



Re: Conditional MARKFAST in Ruta

2019-04-07 Thread Peter Klügl

Hi,


I do not see a way to do something like a conditional MARKFAST in Ruta 
as you described.



I would postprocess the annotations with an UNMARK rule or add another 
layer by creating temp annotations. Then, only create 
MyFeatureAnnotations for those that start with a sentence.
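
The post-processing idea can be sketched in plain Java over character offsets. This is not Ruta and does not use the UIMA API; spans are assumed to be `int[]{begin, end}` pairs, and `SentenceStartFilter` and `keepSentenceInitial` are hypothetical names. The candidates stand for the temporary annotations created from the phrase list, and only those that begin exactly where a sentence begins are kept.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SentenceStartFilter {

    // Keeps only the candidate spans whose begin offset equals the begin
    // offset of some sentence. Each span is an int[]{begin, end}.
    static List<int[]> keepSentenceInitial(List<int[]> candidates, List<int[]> sentences) {
        Set<Integer> sentenceBegins = new HashSet<>();
        for (int[] sentence : sentences) {
            sentenceBegins.add(sentence[0]);
        }
        List<int[]> kept = new ArrayList<>();
        for (int[] candidate : candidates) {
            if (sentenceBegins.contains(candidate[0])) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<int[]> sentences = new ArrayList<>();
        sentences.add(new int[] {0, 20});
        sentences.add(new int[] {21, 45});

        List<int[]> candidates = new ArrayList<>();
        candidates.add(new int[] {0, 11});   // starts the first sentence
        candidates.add(new int[] {30, 41});  // in the middle of the second sentence
        candidates.add(new int[] {21, 28});  // starts the second sentence

        for (int[] span : keepSentenceInitial(candidates, sentences)) {
            System.out.println(span[0] + "-" + span[1]);
        }
    }
}
```

Because the filter only looks at the temporary candidates, MyFeatureAnnotation annotations that other rules or annotators created earlier are never touched.
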



Best,


Peter


On 04.04.2019 at 12:26, Mario Juric wrote:

Hi,

I am trying out some simple rule based feature extraction using Ruta. In one of 
the cases I want to mark up places where certain phrases occur in the beginning 
of a sentence, and I want to keep the phrases in a list that I load as resource 
and then apply using the MARKFAST action, but I don’t know how to constrain it 
to sentence beginnings only. Doing something like

Sentence { STARTSWITH(Sentence) ->
 MARKFAST(MyFeatureAnnotation, phraseList)
}

will obviously not work because the sentence always starts with a sentence, and
it doesn’t seem to put any constraint on the MARKFAST action itself. I can then
try to unmark those that don’t align with a sentence start, but it’s
kind of clumsy, and I have to be careful not to unmark other MyFeatureAnnotation
annotations created previously by other rules/annotators. It’s doable, but I’d
like to understand whether there is a smarter way that has so far eluded me
after looking at the documentation.

Any input is appreciated, thanks :)

Cheers,
Mario Juric



Re: Ruta start issues

2019-03-01 Thread Peter Klügl
Hi Erik,

On 01.03.2019 at 13:58, Erik Fäßler wrote:
> Hi Peter,
>
>>
>> Yes, eventually the type system description of the script will be used
>> to create the CAS for the launch delegate and that generated type system
>> description imports (internally) RutaInternalTypeSystem.xml. In your
>> case this description is not up to date. You can, for example, replace
>> it using the popup UIMA Ruta -> Update Project, or simply create a new
>> project and copy the descriptors. Then, maybe you need to modify the
>> script and save it in order to force the builder to update the generated
>> descriptions.
> “Update Project” did the trick, thank you!
>
> But I think that all users of Ruta 2.7 would run into the issue because I did 
> not modify the ExampleProject in any way, I just imported it the way the 
> documentation told me to from the fresh UIMA Ruta 2.7 download.


Yes, I forgot to update the descriptors, and also forgot to mention that
users should update their old projects.  Jira ticket created.


>
> Thanks a bunch for the quick help!


You are welcome :-)



Best,


Peter



>
> Yours,
>
> Erik
>
>>
>>>> Please mind that the Annotation Editor is able to remember the type
>>>> system description that was used previously (see preferences). And
>>>> concerning Ruta, the descriptors are not automatically updated when a
>>>> new version of the Workbench is installed.
>>> I am not so sure that this is an issue with the editor. The warnings and 
>>> errors happen when I run the scripts rather than when viewing the results.
>>
>> There could be a problem with the CAS Editor extensions like the Applied
>> Rules view. However, if the generated descriptor is fine, then there
>> should be no issue.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>
>>> I hope we can find a solution, I am sure I just have beginner’s headache.
>>>
>>> I am very impressed by the sophisticated tooling you have built over the 
>>> years, I would love to use it in my work.
>>>
>>> Best,
>>>
>>> Erik
>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>> On 01.03.2019 at 12:37, Erik Fäßler wrote:
>>>>> Hi all,
>>>>>
>>>>> I am trying to get started with UIMA Ruta. It really looks like an 
>>>>> interesting and useful project. So I installed a fresh Eclipse, 
>>>>> downloaded the 2.7 source distribution from the UIMA download page and 
>>>>> imported the ExampleProject.
>>>>>
>>>>> When I try to run the Main.ruta script with JRE 11.0.2 on my Mac, I get 
>>>>> the error message
>>>>>
>>>>> Error occurred during initialization of boot layer
>>>>> java.lang.LayerInstantiationException: Package 
>>>>> jdk.internal.jimage.decompressor in both module jrt.fs and module 
>>>>> java.base
>>>>>
>>>>> So I switched to JRE 1.8.0_152. Now, the code will run but output
>>>>>
>>>>> Mar 01, 2019 12:34:07 PM org.apache.uima.jcas.impl.JCasImpl 
>>>>> reportInitErrors(810)
>>>>> WARNING: 
>>>>> JCas Type "org.apache.uima.ruta.type.DebugBlockApply" implements getters 
>>>>> and setters for feature "timestamp", but the type system doesnt define 
>>>>> that feature.
>>>>> JCas Type "org.apache.uima.ruta.type.DebugRuleApply" implements getters 
>>>>> and setters for feature "timestamp", but the type system doesnt define 
>>>>> that feature.
>>>>> JCas Type "org.apache.uima.ruta.type.DebugScriptApply" implements getters 
>>>>> and setters for feature "timestamp", but the type system doesnt define 
>>>>> that feature.
>>>>>
>>>>> Then, I wanted to check out the “Ruta Explain” perspective and tried to 
>>>>> run Main.ruta with the debugger. This won’t finish at all due to the 
>>>>> exception
>>>>>
>>>>> org.apache.uima.cas.CASRuntimeException: Feature "timestamp" is not 
>>>>> defined for type "org.apache.uima.ruta.type.DebugScriptApply”.
>>>>>
>>>>> It actually seems to be straightforward what is going wrong: the compiled
>>>>> type system class does not match the descriptor. But this looks like Ruta
>>>>> internals to me, I am not quite sure if and where I went down the wrong 
>>>>> path.
>>>>>
>>>>> Any hints?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Erik
>>>> -- 
>>>> Dr. Peter Klügl
>>>> R&D Text Mining/Machine Learning
>>>>
>>>> Averbis GmbH
>>>> Tennenbacher Str. 11
>>>> 79106 Freiburg
>>>> Germany
>>>>
>>>> Fon: +49 761 708 394 0
>>>> Fax: +49 761 708 394 10
>>>> Email: peter.klu...@averbis.com
>>>> Web: https://averbis.com
>>>>
>>>> Headquarters: Freiburg im Breisgau
>>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>>>
>


Re: Ruta start issues

2019-03-01 Thread Peter Klügl
Hi Erik,

On 01.03.2019 at 12:59, Erik Fäßler wrote:
> Hi Peter,
>
> thanks for that rapid response!
>
> OK, I will stick to Java8 for the time being.
>
>> Can you check if the feature is available in the type system description
>> you used to create the CAS and, in the case of the Workbench, in the
>> type system description you used to open the CAS in the Annotation Editor?
> OK, newbie here :-) I just pushed the “run” button of the editor. I have not 
> yet learned how to specify the type system.
> The example Ruta scripts specify
> TYPESYSTEM types.BibtexTypeSystem;
> but this can’t be the issue because this type system does not import other 
> types.
>
> I can only imagine that the type system that resides in the descriptor/ 
> subdirectory is also loaded. Especially since this directory contains the 
> InternalTypeSystem.xml file that also specifies the 
> org.apache.uima.ruta.type.DebugBlockApply type - which does not specify the 
> timestamp feature.


Yes, eventually the type system description of the script will be used
to create the CAS for the launch delegate and that generated type system
description imports (internally) RutaInternalTypeSystem.xml. In your
case this description is not up to date. You can, for example, replace
it using the popup UIMA Ruta -> Update Project, or simply create a new
project and copy the descriptors. Then, maybe you need to modify the
script and save it in order to force the builder to update the generated
descriptions.


>
>> Please mind that the Annotation Editor is able to remember the type
>> system description that was used previously (see preferences). And
>> concerning Ruta, the descriptors are not automatically updated when a
>> new version of the Workbench is installed.
> I am not so sure that this is an issue with the editor. The warnings and 
> errors happen when I run the scripts rather than when viewing the results.


There could be a problem with the CAS Editor extensions like the Applied
Rules view. However, if the generated descriptor is fine, then there
should be no issue.


Best,


Peter



> I hope we can find a solution, I am sure I just have beginner’s headache.
>
> I am very impressed by the sophisticated tooling you have built over the 
> years, I would love to use it in my work.
>
> Best,
>
> Erik
>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>
>> On 01.03.2019 at 12:37, Erik Fäßler wrote:
>>> Hi all,
>>>
>>> I am trying to get started with UIMA Ruta. It really looks like an 
>>> interesting and useful project. So I installed a fresh Eclipse, downloaded 
>>> the 2.7 source distribution from the UIMA download page and imported the 
>>> ExampleProject.
>>>
>>> When I try to run the Main.ruta script with JRE 11.0.2 on my Mac, I get the 
>>> error message
>>>
>>> Error occurred during initialization of boot layer
>>> java.lang.LayerInstantiationException: Package 
>>> jdk.internal.jimage.decompressor in both module jrt.fs and module java.base
>>>
>>> So I switched to JRE 1.8.0_152. Now, the code will run but output
>>>
>>> Mar 01, 2019 12:34:07 PM org.apache.uima.jcas.impl.JCasImpl 
>>> reportInitErrors(810)
>>> WARNING: 
>>> JCas Type "org.apache.uima.ruta.type.DebugBlockApply" implements getters 
>>> and setters for feature "timestamp", but the type system doesnt define that 
>>> feature.
>>> JCas Type "org.apache.uima.ruta.type.DebugRuleApply" implements getters and 
>>> setters for feature "timestamp", but the type system doesnt define that 
>>> feature.
>>> JCas Type "org.apache.uima.ruta.type.DebugScriptApply" implements getters 
>>> and setters for feature "timestamp", but the type system doesnt define that 
>>> feature.
>>>
>>> Then, I wanted to check out the “Ruta Explain” perspective and tried to run 
>>> Main.ruta with the debugger. This won’t finish at all due to the exception
>>>
>>> org.apache.uima.cas.CASRuntimeException: Feature "timestamp" is not defined 
>>> for type "org.apache.uima.ruta.type.DebugScriptApply”.
>>>
>>> It actually seems to be straightforward what is going wrong: the compiled
>>> type system class does not match the descriptor. But this looks like Ruta
>>> internals to me, I am not quite sure if and where I went down the wrong 
>>> path.
>>>
>>> Any hints?
>>>
>>> Thanks,
>>>
>>> Erik
>> -- 
>> Dr. Peter Klügl
>> R&D Text Mining/Machine Learn

Re: Ruta start issues

2019-03-01 Thread Peter Klügl
Hi Erik,


I am glad to hear that you take a look at Ruta :-)


There are still some issues with Java 11 in the latest release.


Your analysis is completely correct. The feature "timestamp" is new in
Ruta 2.7.0.


Can you check if the feature is available in the type system description
you used to create the CAS and, in the case of the Workbench, in the
type system description you used to open the CAS in the Annotation Editor?

Please mind that the Annotation Editor is able to remember the type
system description that was used previously (see preferences). And
concerning Ruta, the descriptors are not automatically updated when a
new version of the Workbench is installed.
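
One way to check a descriptor for the feature is a small XML lookup like the sketch below. It uses only the JDK's DOM parser, not the UIMA API, and it assumes the usual type system descriptor layout with `typeDescription`, `featureDescription` and `name` elements. The class and method names (`TypeSystemFeatureCheck`, `declaresFeature`, `directChildText`) are made up, and the XML in `main` is a trimmed illustration, not a complete descriptor. The sketch looks at one document only and does not follow imports, so it would have to be pointed at the internal type system descriptor in the project's descriptor folder.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TypeSystemFeatureCheck {

    // Returns true if the descriptor XML declares the given feature
    // directly on the given type.
    static boolean declaresFeature(String xml, String typeName, String featureName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList types = doc.getElementsByTagName("typeDescription");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                if (!typeName.equals(directChildText(type, "name"))) {
                    continue;
                }
                NodeList features = type.getElementsByTagName("featureDescription");
                for (int j = 0; j < features.getLength(); j++) {
                    Element feature = (Element) features.item(j);
                    if (featureName.equals(directChildText(feature, "name"))) {
                        return true;
                    }
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse descriptor", e);
        }
    }

    // Text of the first direct child element with the given tag, or null.
    static String directChildText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Trimmed, illustrative fragment; real descriptors carry more elements.
        String xml =
              "<typeSystemDescription xmlns=\"http://uima.apache.org/resourceSpecifier\">"
            + "<types><typeDescription>"
            + "<name>org.apache.uima.ruta.type.DebugScriptApply</name>"
            + "<features><featureDescription><name>timestamp</name>"
            + "</featureDescription></features>"
            + "</typeDescription></types></typeSystemDescription>";
        System.out.println(declaresFeature(
                xml, "org.apache.uima.ruta.type.DebugScriptApply", "timestamp"));
    }
}
```

If the lookup returns false for a project's descriptor while the JCas class already expects "timestamp", that would match the warnings and the exception quoted below.
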


Best,


Peter



On 01.03.2019 at 12:37, Erik Fäßler wrote:
> Hi all,
>
> I am trying to get started with UIMA Ruta. It really looks like an 
> interesting and useful project. So I installed a fresh Eclipse, downloaded 
> the 2.7 source distribution from the UIMA download page and imported the 
> ExampleProject.
>
> When I try to run the Main.ruta script with JRE 11.0.2 on my Mac, I get the 
> error message
>
> Error occurred during initialization of boot layer
> java.lang.LayerInstantiationException: Package 
> jdk.internal.jimage.decompressor in both module jrt.fs and module java.base
>
> So I switched to JRE 1.8.0_152. Now, the code will run but output
>
> Mar 01, 2019 12:34:07 PM org.apache.uima.jcas.impl.JCasImpl 
> reportInitErrors(810)
> WARNING: 
> JCas Type "org.apache.uima.ruta.type.DebugBlockApply" implements getters and 
> setters for feature "timestamp", but the type system doesnt define that 
> feature.
> JCas Type "org.apache.uima.ruta.type.DebugRuleApply" implements getters and 
> setters for feature "timestamp", but the type system doesnt define that 
> feature.
> JCas Type "org.apache.uima.ruta.type.DebugScriptApply" implements getters and 
> setters for feature "timestamp", but the type system doesnt define that 
> feature.
>
> Then, I wanted to check out the “Ruta Explain” perspective and tried to run 
> Main.ruta with the debugger. This won’t finish at all due to the exception
>
> org.apache.uima.cas.CASRuntimeException: Feature "timestamp" is not defined 
> for type "org.apache.uima.ruta.type.DebugScriptApply”.
>
> It actually seems to be straightforward what is going wrong: the compiled
> type system class does not match the descriptor. But this looks like Ruta
> internals to me, I am not quite sure if and where I went down the wrong path.
>
> Any hints?
>
> Thanks,
>
> Erik




[ANNOUNCE] Apache UIMA Ruta 2.7.0 released

2019-02-24 Thread Peter Klügl
The Apache UIMA team is pleased to announce the release of
Apache UIMA Ruta (Rule-based Text Annotation), version 2.7.0.

The Unstructured Information Management Architecture (UIMA) is a
component framework supporting development, discovery, composition, and
deployment of multi-modal analytics tasked with the analysis of
unstructured information.

Apache UIMA is an Apache licensed open source implementation of the UIMA
specification which is being developed by a technical committee within
OASIS, a standards organization. The implementation comprises an SDK and
tooling for composing and running analytic components written in Java
and C++, with some support for Perl, Python and TCL.

Apache UIMA Ruta is a rule-based script language supported by
Eclipse-based tooling. The language is designed to enable rapid
development of text processing applications within UIMA. A special focus
lies on the intuitive and flexible domain specific language for defining
patterns of annotations. The Eclipse-based tooling,
called the Apache UIMA Ruta Workbench, was created to support the
user and to facilitate every step when writing rules. Both
the rule language and the workbench integrate
smoothly with Apache UIMA.

Major Changes in this Release

UIMA Ruta Language and Analysis Engine:

- Requires Java 8
- New language feature: label expressions at actions for directly
assigning/reusing newly created annotations. Example: Document{-> a:T1,
CREATE(T2, "ref" = a)};
- New language feature: new type of rule element for completely optional
match which does not require an existing annotation and therefore also
works at the boundary of a window/document. Example: NUM _{-PARTOF(CW)};
- Type lists can be used as matching condition.
- Initial default value of string and annotation variables is now null.
- Comparisons of annotations and annotation lists are now supported.
- New configuration parameter 'inferenceVisitors'.
- New configuration parameter 'maxRuleMatches'.
- New configuration parameter 'maxRuleElementMatches'.
- New configuration parameter 'rulesScriptName'.
- Inlined rules as condition are only evaluated if the rule element
match was successful.
- Multiple inlined rule blocks are allowed at one rule element.
- String features with allowed values are supported.
- PlainTextAnnotator supports vertical tabs.
- Various improvements for WORDTABLE.
- Thrown exceptions include script name.
- Fixed values of label for failed matches.
- Fixed inlined rules as condition at wildcards.
- Fixed resetting of annotation-based variables.
- Fixed various bugs of wildcards.
- Fixed CONTAINS condition for annotations overlapping the window.
- Fixed COUNT condition.
- Fixed setting variables by configuration parameter.

UIMA Ruta Workbench:

- Query View supports more CAS formats.
- Fixed order of scripts in Applied Rules view.
- Fixed reporting of non-existing problems in editor.


For a full list of the changes, please refer to Jira:
http://uima.apache.org/d/ruta-2.7.0/issuesFixed/jira-report.html

More information about UIMA Ruta can be found here:
http://uima.apache.org/ruta.html

 - Peter Klügl, for the Apache UIMA development team


Re: How to get TextRuler to work

2019-02-14 Thread Peter Klügl
Hi,

On 14.02.2019 at 09:44, Mandy Neumann wrote:
> Hi,
>
> yes I already read elsewhere that TextRuler is not actively
> maintained. Which is a pity because it has the potential to be a
> really cool tool! We really need something like that in our project.
>

I could talk for hours about black-box vs white-box machine learning vs
manual rule engineering, also concerning the maintenance of the annotator
and development cycles...

After all, it's a question of priorities where the available time will be
spent, and even if TextRuler is cool, it has been at the bottom of my
priority list for several years.


> Thanks for pointing me to the error log, I eventually found the
> problem in a NullPointerException - Ruta was not able to find the
> script Base.ruta I had imported in Features.ruta (I wanted to mirror
> the example project as closely as possible). Although both reside in
> the same package, it was not clear to me that I still have to use the
> fully qualified name to import.
>

Glad to hear that you found the problem.


> By the way, in case anybody wants to pick up maintaining TextRuler
> again, I would suggest to improve a bit on error handling here.


You can also help :-)

Just by creating a bug report in our Jira, you will put it on my todo
list. Even if I do not find the time to fix it, maybe others will.

https://issues.apache.org/jira/browse/UIMA-5987?jql=component%20%3D%20ruta


Best,


Peter



>
> Cheers,
>
> Mandy
>
>
> On 12.02.19 at 16:48, Peter Klügl wrote:
>> Hi,
>>
>>
>> I haven't used TextRuler for years and therefore I cannot tell right
>> away what the issue could be. I also have to mention that the view and
>> its algorithms are not actively maintained.
>>
>>
>> Is there a message in the Error Log of Eclipse?
>>
>>
>> Can you provide a minimal example for reproducing the problem?
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> On 11.02.2019 at 17:22, Mandy Neumann wrote:
>>> Hi,
>>>
>>> after fixing my initial problems with the CAS views, I now want to
>>> proceed to my main task. I want to apply TextRuler to learn a set of
>>> rules instead of writing them manually.
>>>
>>> Unfortunately, I can't get TextRuler to work in my project. The
>>> example project works, but when I start the process on my project, the
>>> view displays "Preprocessing... Loading XMI file (input): " and
>>> does nothing more. I also cannot stop, pressing the stop button has no
>>> effect, I need to restart the workbench to be able to change TextRuler
>>> settings.
>>>
>>> I'm not sure if my workflow is even right. The first step is to
>>> convert html input into XMIs with the html tags as annotations. In
>>> order to create the training data, I figured I need to include the
>>> target type system already at this step. I then used the
>>> "Annotate/Quick Annotate" options in the CAS viewer to create the gold
>>> standard annotations.
>>>
>>> I also created two Ruta scripts to be used by TextRuler. Base.ruta
>>> basically just declares the target type system. Features.ruta creates
>>> some basic annotations that should be used by TextRuler to infer rules.
>>>
>>> Does anybody have an idea why my workflow won't work?
>>>
>>> Cheers,
>>>
>>> Mandy
>>>
>>>



Re: How to get TextRuler to work

2019-02-12 Thread Peter Klügl
Hi,


I haven't used TextRuler for years and therefore I cannot tell right
away what the issue could be. I also have to mention that the view and
its algorithms are not actively maintained.


Is there a message in the Error Log of Eclipse?


Can you provide a minimal example for reproducing the problem?


Best,


Peter


On 11.02.2019 at 17:22, Mandy Neumann wrote:
> Hi,
>
> after fixing my initial problems with the CAS views, I now want to
> proceed to my main task. I want to apply TextRuler to learn a set of
> rules instead of writing them manually.
>
> Unfortunately, I can't get TextRuler to work in my project. The
> example project works, but when I start the process on my project, the
> view displays "Preprocessing... Loading XMI file (input): " and
> does nothing more. I also cannot stop, pressing the stop button has no
> effect, I need to restart the workbench to be able to change TextRuler
> settings.
>
> I'm not sure if my workflow is even right. The first step is to
> convert html input into XMIs with the html tags as annotations. In
> order to create the training data, I figured I need to include the
> target type system already at this step. I then used the
> "Annotate/Quick Annotate" options in the CAS viewer to create the gold
> standard annotations.
>
> I also created two Ruta scripts to be used by TextRuler. Base.ruta
> basically just declares the target type system. Features.ruta creates
> some basic annotations that should be used by TextRuler to infer rules.
>
> Does anybody have an idea why my workflow won't work?
>
> Cheers,
>
> Mandy
>
>



Re: Issues with Ruta workbench (Permission Denied and wrong output view)

2019-02-06 Thread Peter Klügl
Hi,


does the plain vs _InitialView problem occur in the CASes in the output
folder or in the converted folder?


"output" should contain the result of the script processing. The
_InitialView is set by the launcher, it's static and cannot be changed.

"converted" should contain additional CASes where the plain view is
copied to the _InitialView, which hasn't been set yet.


(Although I think that I have written those rules as an example some
time ago, I personally prefer to perform the HTML conversion in Java)


Best,


Peter


On 06.02.2019 at 16:18, Mandy Neumann wrote:
> Hi,
>
> after some additional digging I found this setting in the workbench
> preferences where SourceDocumentInformation is used for the output
> parameter. This seems to have fixed the permission issue, I get no
> more exceptions.
>
> Unfortunately, the problem with plain vs. _InitialView still persists,
> which is kind of annoying. Any ideas on that? (I'd like to also make
> sure that this is not causing any further problems in my planned
> workflow.)
>
> Best,
>
> Mandy
>
> On 06.02.19 at 15:40, Marshall Schor wrote:
>> hi,
>>
>> I'm not an expert, but I'm guessing that there still is a permissions issue,
>> perhaps on a different file or directory than the one you checked.
>>
>> Try having someone else take a look at your stack trace / error message, and
>> your file system permissions.  A second pair of eyes often is helpful (I speak
>> from personal experience).
>>
>> Cheers. -Marshall
>>
>> On 2/6/2019 5:44 AM, Mandy Neumann wrote:
>>> Hi all,
>>>
>>> I'm just starting to get familiar with UIMA Ruta and the workbench, and I'm
>>> having some strange issues.
>>>
>>> I got a project from a co-worker who already prepared some scripts for me to
>>> extend. The project has .html files in the input folder, and he already
>>> provided a Ruta script to convert HTML markup into annotations. The script is
>>> adapted from the Ruta manual:
>>>
>>>> ENGINE utils.HtmlAnnotator;
>>>> ENGINE utils.HtmlConverter;
>>>> ENGINE HtmlViewWriter;
>>>> TYPESYSTEM utils.HtmlTypeSystem;
>>>> TYPESYSTEM utils.SourceDocumentInformation;
>>>>
>>>> Document{->CONFIGURE(HtmlAnnotator, "onlyContent"=true),
>>>> EXEC(HtmlAnnotator,
>>>> {TAG})};
>>>>
>>>> Document { -> CONFIGURE(HtmlConverter, "inputView" = "_InitialView",
>>>>  "outputView" = "plain", "expandOffsets"=false,
>>>> "replaceLinebreaks"=true,
>>>> "skipWhitespacs"=true, "linebreakReplacement"=" ", "processAll"=true),
>>>>    EXEC(HtmlConverter)};
>>>>
>>>> Document{ -> CONFIGURE(HtmlViewWriter, "inputView" = "plain",
>>>>  "outputView" = "_InitialView", "output" = "../converted"),
>>>>  EXEC(HtmlViewWriter)};
>>> On my machine and with my settings, when I run this script, my console gets
>>> spammed with org.apache.uima.analysis_engine.AnalysisEngineProcessExceptions
>>> caused by java.io.FileNotFoundException with the message
>>> "../converted (Permission denied)". I checked the file permissions on this
>>> directory, which were 775 - I even chmodded to 777, but still the same issue.
>>>
>>> In spite of all these exceptions, the output still gets generated, though. I
>>> would be fine with it if there weren't another issue - although the script
>>> should write the annotations into _InitialView, I need to change the view to
>>> "plain" in the editor to get plain text with HTML annotations. The
>>> _InitialView still shows the html markup.
>>>
>>> I think both issues are related. Any ideas?
>>>
>>> Cheers,
>>>
>>> Mandy
>>>
>>>
>>> System Info: eclipse Oxygen.3a Release (4.7.3a), UIMA Ruta workbench 2.6.1,
>>> OS Kubuntu 18.04
>>>
>>>



Re: Ruta syntax check of missing ";" after rule statement

2018-10-05 Thread Peter Klügl
Hi Mario,


this is actually not restricted to the condition action part. You can of
course have rules without actions where the same applies.

We have a strict convention concerning the indentation. If a rule covers
several lines, the new line has to be indented. This makes it easier to
see if a semicolon is missing, especially if someone else wrote the rules.


MyType1

    MyType2;


vs

MyType1

MyType2;
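
The convention above can also be checked mechanically. The following is only a rough sketch and not part of Ruta: a hypothetical Python helper, find_suspicious_lines, that flags unindented lines following a line that did not end a statement. It assumes "//" line comments, treats a trailing ";" or "{" and a lone "}" as the end of a statement, and does not handle string literals that contain "//".

```python
from typing import List


def find_suspicious_lines(script: str) -> List[int]:
    """Return 1-based numbers of unindented lines that follow an unterminated line.

    Heuristic only: under the indentation convention, a line that continues a
    rule must be indented, so an unindented line after an unterminated line
    hints at a missing semicolon.
    """
    suspicious = []
    open_statement = False  # True while the previous code line did not end a statement
    for number, raw in enumerate(script.splitlines(), start=1):
        line = raw.split("//", 1)[0].rstrip()  # drop line comments (assumption)
        if not line.strip():
            continue  # skip blank and comment-only lines
        if open_statement and not raw[0].isspace():
            suspicious.append(number)
        # A statement is considered closed after ';', a block head '{', or a lone '}'.
        open_statement = not (line.endswith((";", "{")) or line.strip() == "}")
    return suspicious


if __name__ == "__main__":
    example = (
        "MyAnnotation { -> CREATE(SomeAnnotation)}\n"
        "Document{-> MyType};\n"
    )
    print(find_suspicious_lines(example))  # line 2 is flagged
```

Such a check cannot decide whether a rule makes sense; it only points at lines worth a second look.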



Best,


Peter




On 05.10.2018 at 08:42, Mario Juric wrote:
> Hi Peter,
>
> I think I understand the conflict, although I never considered this 
> possibility since we never used the rules in this way as illustrated by your 
> first example. We always had the constraint and action part at the end of the 
> rules followed by semicolon and never in between, but I can see why it might 
> be difficult to determine a syntax error in this case. I therefore have no 
> suggestions for improving the situation, since it would generally be 
> impossible to determine whether such a rule makes semantic sense or not. 
> However, now that we are aware of it we might stand a better chance to spot 
> the problem earlier. Thanks for the feedback :)
>
>
> Best Regards,
>
> Mario Juric
> Head of Research & Development
> UNSILO.ai <http://unsilo.ai/>
> mobile:  +45 3082 4100
> skype: mario.juric.dk
>
>
>> On 3 Oct 2018, at 15:50, Peter Klügl wrote:
>>
>> Hi Mario,
>>
>>
>> hmm, the question is how Ruta should detect that a rule actually should 
>> end. There are no semantics on whitespaces and line breaks.
>>
>>
>> Ruta should complain with a syntax error if the semicolon is missing and 
>> there is no valid stuff afterwards.
>>
>>
>> An example:
>>
>>
>> MyAnnotation { -> CREATE(SomeAnnotation)}
>>
>> Document{-> MyType};
>>
>> 
>>
>>
>> This is a valid script, but the rule will never match and the CREATE will 
>> never be executed. It's actually one rule with two rule elements.
>>
>>
>> MyAnnotation { -> CREATE(SomeAnnotation)}
>>
>> 
>>
>> ... or ...
>>
>> MyAnnotation { -> CREATE(SomeAnnotation)}
>>
>> DECLARE MyType;
>>
>>
>> This is not a valid script and a syntax error should be reported.
>>
>>
>>
>> I normally find the missing semicolons quite easily with the Explain View in 
>> the Ruta IDE, which is unfortunately only available for Eclipse.
>>
>>
>> I have right now no idea how our situation can be improved. Do you have a 
>> proposal?
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> On 02.10.2018 at 11:31, Mario Juric wrote:
>>> Hi Peter,
>>>
>>> It occasionally happens that developers forget the semicolon “;” after a 
>>> Ruta rule statement, e.g.
>>>
>>> MyAnnotation { -> CREATE(SomeAnnotation)}; <——
>>>
>>> We seem to remember that the Ruta engine previously produced syntax 
>>> errors in this case, but this doesn’t appear to happen anymore, so we ended 
>>> up wasting some time on tracking this down, because the rule actions didn’t 
>>> fire. It’s easy to overlook a detail like that when we don’t have any 
>>> syntax highlighting in IDEA, which is our preferred IDE. Shouldn’t it 
>>> produce a syntax error, or is there some reason to just skip the 
>>> statements without it?
>>>
>>> We are using Ruta version 2.6.1.
>>>
>>> Cheers,
>>> Mario
>>>



Re: Ruta syntax check of missing ";" after rule statement

2018-10-03 Thread Peter Klügl

Hi Mario,


hmm, the question is how Ruta should detect that a rule actually should 
end. There are no semantics on whitespaces and line breaks.



Ruta should complain with a syntax error if the semicolon is missing and 
there is no valid stuff afterwards.



An example:


MyAnnotation { -> CREATE(SomeAnnotation)}

Document{-> MyType};




This is a valid script, but the rule will never match and the CREATE 
will never be executed. It's actually one rule with two rule elements.



MyAnnotation { -> CREATE(SomeAnnotation)}



... or ...

MyAnnotation { -> CREATE(SomeAnnotation)}

DECLARE MyType;


This is not a valid script and a syntax error should be reported.



I normally find the missing semicolons quite easily with the Explain 
View in the Ruta IDE, which is unfortunately only available for Eclipse.



I have right now no idea how our situation can be improved. Do you have 
a proposal?



Best,


Peter


On 02.10.2018 at 11:31, Mario Juric wrote:

Hi Peter,

It occasionally happens that developers forget the semicolon “;” after a Ruta 
rule statement, e.g.

MyAnnotation { -> CREATE(SomeAnnotation)}; <——

We seem to remember that the Ruta engine previously produced syntax errors 
in this case, but this doesn’t appear to happen anymore, so we ended up wasting 
some time on tracking this down, because the rule actions didn’t fire. It’s 
easy to overlook a detail like that when we don’t have any syntax highlighting 
in IDEA, which is our preferred IDE. Shouldn’t it produce a syntax error, or is 
there some reason to just skip the statements without it?

We are using Ruta version 2.6.1.

Cheers,
Mario



Re: UIMA 3 support in Ruta and DKPro

2018-10-03 Thread Peter Klügl

Hi,


yes, I am waiting for uimaFIT 3 in order to prepare a Ruta 3 RC.


I also plan to resolve some more tickets for the next Ruta 2 release, 
which will hopefully be 2.7.0 instead of 2.6.2.


So, my plan would be to simultaneously release Ruta 2.7.0 and Ruta 3.0.0 
(with the 2.7.0 changes) before the end of the year.



Best,


Peter


On 03.10.2018 at 08:51, Mario Juric wrote:

Thanks Richard,

I will integrate the uimaFIT 3.0.0 branch and see what happens.

Cheers,
Mario

On 2 Oct 2018, at 12:00, Richard Eckart de Castilho wrote:

uimaFIT v3 is on my todo list for some time already. A few
weeks ago, I almost announced a vote for a release candidate,
but then Apache policy regarding checksums changed, mandating
a switch to SHA 256/512, so I dropped the RC again.

We are now about to release a new version of the UIMA parent
POM which will produce SHA 512 checksums for our artifacts,
then I'll go back to running the uimaFIT v2 and v3 releases.

There are no more significant changes scheduled for uimaFIT
2.5.0 and 3.0.0 - it's basically just doing the release now.

If you are brave enough to use an unreleased version, you could
just check out the uimaFIT 3.0.x branch [1] temporarily and
build the SNAPSHOTs locally.

As far as I understood, the Ruta v3 release is basically waiting
for the uimaFIT 3.0.0 release - but I don't know anything more
about if/how many changes in addition to the uimaFIT v3 release
are necessary to perform in Ruta before v3 can be released.

Cheers,

-- Richard

[1] https://github.com/apache/uima-uimafit/tree/3.0.x


On 2. Oct 2018, at 09:49, Mario Juric wrote:

Hi Peter & Richard,

We are working on migrating to UIMA 3, but we are getting an awful lot of runtime 
incompatibility complaints from DKPro and Ruta components in the form of

"JCas Class X, loaded from Y.jar, is missing required constructor; likely cause is 
wrong version (UIMA version 3 or later JCas required)."

It kinda puts migration on hold for us unless there is a workaround. Do you 
have any roadmap for migrating to UIMA 3?

Cheers,
Mario







Re: Help - Text Extraction

2018-07-17 Thread Peter Klügl
Hi,


reasonable rules largely depend on the greater use case and on the
complexity of the context. You could start with something trivial like


DECLARE Keyword, Value;
"Name|DOB|Ref\\.Num" -> Keyword;
RETAINTYPE(BREAK);
Keyword COLON #{-> Value} BREAK;
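
For comparison only, and not as a replacement for the rules: outside of Ruta, the same keyword/colon/value pattern can be approximated with a plain regular expression. The sketch below is in Python; the function name extract_fields is hypothetical, and it assumes one "Keyword: value" pair per line, as in the sample mail.

```python
import re
from typing import Dict

# Keyword at the start of a line, a colon, optional blanks, then the value
# up to the end of that line.
FIELD_PATTERN = re.compile(r"^(Name|DOB|Ref\.Num):[ \t]*(.*)$", re.MULTILINE)


def extract_fields(text: str) -> Dict[str, str]:
    """Map each known keyword to the value that follows it on the same line."""
    return {m.group(1): m.group(2).strip() for m in FIELD_PATTERN.finditer(text)}


if __name__ == "__main__":
    mail_body = "Name: TestName\nDOB: 20/10/1986\nRef.Num: 123456"
    for keyword, value in extract_fields(mail_body).items():
        print(keyword, "=", value)
```

Unlike the Ruta rules, this does not create annotations with offsets; it only illustrates which text spans the Keyword and Value types are meant to cover.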


Best,

Peter



On 13.07.2018 at 17:12, Vijay Kumar Reddy wrote:
> Hi,
>
> I am using Apache Ruta to extract the data from the email body. I have to
> apply the Ruta script on the email content to match the keyword and extract
> the values from that. Below is the sample email content,
>
> Name: TestName
> DOB: 20/10/1986
> Ref.Num: 123456
>
> I am looking for a sample script which extracts the Name, DOB and Ref.Num
> values from the email content. Can someone help with this, please?




Re: Issue while applying COUNT condition in UIMA RUTA

2018-07-06 Thread Peter Klügl
Hi,


thank you for reporting this. This is a severe bug and I will fix it as
soon as possible.


Best,


Peter


On 05.07.2018 at 08:00, amyjackson...@gmail.com wrote:
> I used the COUNT condition to find the number of punctuation marks in an 
> annotation, but I didn't receive the expected output.
>
>  DECLARE Sentence(INT pmcount);
>  "Conflicts of interest"->Sentence;
>
>  DECLARE SentenceLastToken;
>  Sentence{-PARTOF(SentenceLastToken)->MARKLAST(SentenceLastToken)};
>  INT Pmcount=0;
>
>  Sentence->{ANY+?{->SHIFT(Sentence,1,1,true)} SentenceLastToken{PARTOF(PM)};};
>  Sentence{COUNT(PM,Pmcount)->Sentence.pmcount=Pmcount};
>
> **Sample Input:**
>
>  Conflicts of interest.
>
> **Expected Output:**
>
>   Conflicts of interest
>pmcount:0
>  
> 
> **Received Output:**
>
>   Conflicts of interest
>pmcount:1
>  
> I'm facing this problem only if there is any PM after the Annotation value.
> 
>   




Re: Single and double quotes help needed

2018-05-14 Thread Peter Klügl
Hi,


did you try something like:


DECLARE InQuotes;
(SPECIAL{REGEXP("\"")} # SPECIAL{REGEXP("\"")}){-> InQuotes};


The actual rules depend on the patterns that could occur, e.g.,
different kinds of quotes, quotes in direct speech, etc.




Best,


Peter


On 10.05.2018 at 13:47, Trinka Dcunha-Crimson Interactive wrote:
> Hi,
>
>  
>
> I have been trying to capture single (' ') and double (" ") quotes in Ruta,
> but am at my wit's end figuring out how to do it.
>
> I used the escape characters but that didn't help.
>
>  
>
> Basically, I want to capture the text within the double quotes or let's say
> the comma before the double quotes in the following sentence.
>
>  
>
> "The case isn't ready," the judge said.
>
>  
>
> Could someone advise?
>
>  
>
> Best,
>
> Trinka
>




Re: Help, please

2018-04-16 Thread Peter Klügl
Hi,


I actually would not solve this task with Ruta, but would rather acquire
more training data and apply single-label multi-class classification
using a linear SVM with some iterations for the feature
extraction/weighting/normalization... as a start...

If you need to approach this with Ruta, I would do some simple
dictionary lookup for the keywords and apply some postprocessing like:

DECLARE Intent (String value);
WORDTABLE intentTable = 'intent_table.csv';
MARKTABLE(Intent, 1, intentTable, true, 4, ".,-", 2, "value" = 2);
Intent->{ANY i:@Intent{-> UNMARK(i)};};

with the table looking like:

new credit card application;Apply_for_Card
I want to apply for a new card;Apply_for_Card
papers needed to apply for a card;Apply_for_Card
open a MasterCard;Apply_for_Card
open a new credit card;Apply_for_Card
card application;Apply_for_Card
application for a new personal credit card;Apply_for_Card-Personal
I want to apply for a new personal card;Apply_for_Card-Personal
Open a business card;Apply_for_Card-Business
balance;Balance-Inquiry
...


(Param dictRemoveWS = true)



Best,

Peter



On 11.04.2018 at 19:57, Igor Mayer wrote:
> Hello, Peter!
>
> Hope you are doing well. 
> I am a new user of UIMA RUTA, and sorry that I dare to ask you
> questions directly, but I have seen this email address at
> StackOverflow and you said there that it normally takes less time to
> receive an answer. I have to solve the exercise attached to this
> letter. I fully read the UIMA RUTA Guide & References posted on
> Apache.org. However, I didn't find a good approach there.
> I have a task: the script has to 'understand' each sentence\utterance
> (one by one) and link each to one of the intents. I tried to use
> Regular Expressions as keywords, and sort sentences with some keywords
> with Contains statements. However, this approach looks really
> duplicative. So, would you be so kind as to help me? May I ask you to
> tell me the best approach to solve this task, and give an example of how
> it should look?
>
> Thank you a lot in advance! 




Re: Help, please

2018-04-13 Thread Peter Klügl
Hi,


unfortunately, I am very busy these days. I'll try to take a look at it
this weekend.


Best,


Peter


On 11.04.2018 at 19:57, Igor Mayer wrote:
> Hello, Peter!
>
> Hope you are doing well. 
> I am a new user of UIMA RUTA, and sorry that I dare to ask you
> questions directly, but I have seen this email address at
> StackOverflow and you said there that it normally takes less time to
> receive an answer. I have to solve the exercise attached to this
> letter. I fully read the UIMA RUTA Guide & References posted on
> Apache.org. However, I didn't find a good approach there.
> I have a task: the script has to 'understand' each sentence\utterance
> (one by one) and link each to one of the intents. I tried to use
> Regular Expressions as keywords, and sort sentences with some keywords
> with Contains statements. However, this approach looks really
> duplicative. So, would you be so kind as to help me? May I ask you to
> tell me the best approach to solve this task, and give an example of how
> it should look?
>
> Thank you a lot in advance! 




Re: RUTA: create own extensions

2018-03-23 Thread Peter Klügl

Hi,


that should actually work just fine as you described. The exception 
indicates that the keyword for the condition is not known.


I'll try to reproduce the problem and get back to you.


Best,


Peter


On 22.03.2018 at 17:30, Nicolas Paris wrote:

Hello

My goal is to create a custom RUTA extension that would do the following in my
own UIMA pipeline:
(DateDifferrenceInMonth(Timex3.timexValue, "2018-03-22") - 6 )  {-> DateSixMonthBeforeNow}
*where Timex is an annotation from heideltime, for example


I started with the example extensions and this
tutorial: https://stackoverflow.com/a/22067299/3865083

As stated in its last paragraph, if I want to use this extension in
eclipse, it might be complicated to import the whole DLTK toolkit.

As a first attempt, I just implemented both ExampleCondition and
ExampleConditionExtension and added
new String[]{"org.apache.uima.ruta.example.extensions.ExampleConditionExtension"}
to the
RutaEngine.PARAM_ADDITIONAL_EXTENSIONS
of my own pipeline.

As a result, when I introduce the ExampleCondition in a Ruta script, I get an
org.apache.uima.ruta.extensions.RutaParseRuntimeException
"(": expected WILDCARD, but found LPAREN

Am I missing something?

Thanks for all,







Re: Pipeline Performance Measurment (equivlent to RUTA annotation tests)

2018-03-20 Thread Peter Klügl
Hi,


I think there is no common default implementation for this; everyone
has their own implementation. For the normal use cases this can be
implemented easily. However, it gets complicated if you need fancy
features, e.g., comparing different levels of complex feature values.


Normally, you have something like:

1. CasReader for providing the expected gold annotations

2. AnnotationCopier for moving the annotations to a gold view

3. Annotators of your pipeline for creating the annotations

4. AnnotationComparator for comparing the new annotations with the
annotations of the gold view

5. EvaluationWriter for aggregating and storing the evaluation result
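
The comparison in step 4 can be sketched independently of UIMA. The following is a minimal Python illustration under stated assumptions, not an existing UIMA or Ruta component: annotations are reduced to (begin, end, type) triples, only exact matches count, and the names Span and compare are hypothetical.

```python
from typing import Dict, NamedTuple, Set


class Span(NamedTuple):
    """An annotation reduced to its offsets and type name (assumption)."""
    begin: int
    end: int
    type_name: str


def compare(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match comparison of predicted annotations against gold annotations."""
    tp = len(gold & predicted)   # annotations present in both views
    fp = len(predicted - gold)   # created by the pipeline, but not expected
    fn = len(gold - predicted)   # expected, but not created
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = {Span(0, 4, "Name"), Span(10, 20, "DOB")}
    predicted = {Span(0, 4, "Name"), Span(30, 36, "RefNum")}
    print(compare(gold, predicted))
```

In a real pipeline the two sets would be read from the gold view and the default view of the same CAS; comparing feature values, as mentioned above, needs more than this exact-match check.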


I personally do not use the Ruta evaluation anymore since it does not
provide enough features and our evaluations are integrated in our Maven
build as integration tests, thus no need for the Ruta GUI/Annotation
Testing View.


Best,


Peter


On 17.03.2018 at 20:31, Nicolas Paris wrote:
> Hello,
>
> The RUTA workbench Annotation Test is a great tool to evaluate the
> performance of a RUTA script based on a gold standard.
>
> Is there any existing tool to measure the performance based on
> input/output xmi folders?
>
> I guess it is feasible to hack the RUTA workbench by running uimafit
> pipelines from ruta, but since my pipeline has many annotator engines,
> this looks complicated to do.
>
> Thanks,




Re: XCASParsingException while using DKPro with UIMA RUTA for POS tagging

2018-02-28 Thread Peter Klügl
n.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown
>> Source)
>> at
>> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>> Source)
>> at
>> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown
>> Source)
>> at
>> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown
>> Source)
>> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown
>> Source)
>> at
>> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown
>> Source)
>> at
>> org.apache.uima.util.XmlCasDeserializer.deserializeR(XmlCasDeserializer.java:111)
>> at org.apache.uima.util.CasIOUtils.load(CasIOUtils.java:366)
>> ... 123 more
>>
>>
> The script I tried to run is as follows:
>
> IMPORT PACKAGE de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos FROM
>> desc.type.POS AS pos;
>> IMPORT de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Lemma FROM
>> desc.type.LexicalUnits;
>> UIMAFIT de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpSegmenter;
>> UIMAFIT de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordPosTagger;
>> UIMAFIT de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordLemmatizer;
>> uima.tcas.DocumentAnnotation{-CONTAINS(pos.POS)} -> {
>> uima.tcas.DocumentAnnotation{-> SETFEATURE("language", "en")};
>> EXEC(OpenNlpSegmenter);
>> EXEC(StanfordPosTagger, {pos.POS});
>> };
>
>
>
> I couldn't find any way to resolve this issue and I'm very new to RUTA. I'd be
> grateful if someone could help me out with this.
>
> [1].https://github.com/pkluegl/ruta/tree/master/ruta-german-novel-with-dkpro
>
> Thanks in advance,
> Regards,
> Ruwini
>




Re: UIMA Ruta - Conditional execution of a block

2018-02-26 Thread Peter Klügl
Hi,


you can use the normal functionality of conditions in the head rule of a
block, which results in sort of "if". This could look like:


BLOCK(p) Participant{APL==true} {

or

BLOCK(p) Participant{APL} {

You still have the iterator, so maybe you would like to add another
(outer) block just for the "if".



It is hard to say something about how the rules could be improved without a
typical use case where they are applied and the larger context.

I personally would use more annotations/types and less ANNOTATIONLIST
since Ruta debugging does not cover it and it is more readable. Using
additional types, you could write the second block probably in one rule.
Something like:

p1:Participant{APL, p1.isCustomer -> CREATE(ControlRuleDetection,
"anchors" = SelectedMark, "values" = SelectedMark)}
    <-{m1:Mark{ m1.code=="BAD", m1.value=="Y" -> TRANSFER(SelectedMark)};};

This is not a solution for you, just an example. It's also missing
the positions.


Best,


Peter


On 26.02.2018 at 06:47, Josep María Formentí Serra wrote:
> Hi all,
>
>   I have this script in Ruta:
>
> PACKAGE com.aia.tas.uima.types;
> DECLARE ControlRuleDetection;
> ANNOTATIONLIST anchors;
> ANNOTATIONLIST values;
> INTLIST positions;
> BOOLEAN APL;
> BOOLEAN BAD;
> BLOCK(h) Header{} {
> Document { -> APL = false };
> a1:Attribute{ a1.code=="general.codapl", a1.ct=="CDO" -> APL = true};
> }
> BLOCK(p) Participant{} {
> Document{ -> BAD=false, CLEAR(anchors), CLEAR(values), CLEAR(positions) };
> m1:Mark{ m1.code=="BAD", m1.value=="Y" -> BAD=true, ADD(anchors, m1),
> ADD(values, m1), ADD(positions, 1)};
> p1:Document{APL, p1.isCustomer, BAD -> CREATE(ControlRuleDetection,
> "anchors" = anchors, "values" = values, "positions" = positions)};
> }
>
>
>  where there are 2 BLOCKs; the 2nd BLOCK should only be executed when
> APL==true (calculated in the first BLOCK).
>
>  How could I do this? Other optimizations are welcome too.
>
> BR,
>   JM




Re: Lost in UIMA Ruta Workbench !

2018-02-26 Thread Peter Klügl
Hi,


if different combinations of values of ENTITY and ACTIONS should result
in different values of INTENT, you need separate rules instantiating the
possible combinations.

Does this make sense?

There are several ways to avoid redundant code then... e.g., you could
set a variable in the inlined rules for the CREATE, so that you would
restrict the combinations to the inlined rules.


Best,


Peter


On 23.02.2018 at 17:30, Anna Polychroniou wrote:
> Hello,
> I am trying to complete an exercise in NLU using UIMA Ruta.
> I have hit a wall for the last 3 days.
> I would be grateful if you could give a hint on my issue:
>
> I want to create 2 annotations ENTITY and ACTIONS for a list of sentences.
> I define a list of words for each one.
> Then I want to create a third annotation (INTENT) based on the first 2.
> Different values of ENTITY and ACTIONS must combine into the 10 different values
> of the INTENT annotation.
>
> I'm stuck on the final step, where I have to create the combined
> annotation (in bold).
> Could you please help?
> I attach my work below.
>
>
>
> PACKAGE uima.ruta.exercise;
>
>
> WORDLIST EntityList = "Entities.txt";
> WORDLIST ActionList = "Actions.txt";
> DECLARE Annotation ENTITY(STRING value);
> DECLARE Annotation ACTIONS(STRING value);
>
>
> Document{-> RETAINTYPE(BREAK)};
> DECLARE Sentence;
> BREAK #{->MARK (Sentence)} BREAK;
>
> DECLARE Annotation INTENT(STRING value);
> BLOCK(ForEach) Sentence{} {
> Document{-> MARKFAST(ENTITY, EntityList)};
> Document{-> MARKFAST(ACTIONS, ActionList)};
>
> *Document{-> CREATE(INTENT, "value" =
> "Apply_for_Card")}<-{e:ENTITY{e.value=="card"} #
> a:ACTIONS{a.value=="application"};}*
> *;}*
>
>
>
>
> Thank you,
> Anna
>




Re: Parameters for PEAR

2018-02-13 Thread Peter Klügl
Ok, I opened an issue for this.


Peter


On 12.02.2018 at 17:46, Marshall Schor wrote:
> nope. sorry. -Marshall
>
>
> On 2/9/2018 3:25 AM, Peter Klügl wrote:
>> Hi,
>>
>>
>> did you get an answer?
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>> On 10.01.2018 at 17:12, Marshall Schor wrote:
>>> I'm pinging some people who might know something about LanguageWare's use of
>>> this feature. -Marshall
>>>
>>>
>>> On 1/10/2018 6:07 AM, Peter Klügl wrote:
>>>> Hi,
>>>>
>>>>
>>>> Am 10.01.2018 um 10:57 schrieb Richard Eckart de Castilho:
>>>>>> On 16.12.2017, at 13:48, Peter Klügl  wrote:
>>>>>>
>>>>>>> Is it a problem for us to simply implement Matthias's solution: Make use
>>>>>>> of the parameters in the PearSpecifier and just set them in the wrapped
>>>>>>> analysis engine description if they are compatible?
>>>>>>>
>>>>>> Are there any opinions on this?
>>>>> First, I was a bit confused and though the "PearSpecifier" would be
>>>>> this guy here [1]. The I realized it is this one [2].
>>>>>
>>>>> Looking at where the parameters of the PearSpecifier are used: apparently 
>>>>> the
>>>>> setParameter and getParameter are only ever called directly in unit tests.
>>>>>
>>>>> Does it mean that the frameworks so far does not make any use of these 
>>>>> parameter
>>>>> as all? Or maybe they are used via some inherited methods...?
>>>>>
>>>>> It sounds reasonable to me that these parameters are forwarded to the 
>>>>> top-level
>>>>> component in the PEAR - the question I am asking myself is though: why 
>>>>> doesn't
>>>>> this already happen and (maybe) what else where these PearSpecifier 
>>>>> parameters
>>>>> intended to do then?
>>>> Yes, these are exactly the questions we had :-)
>>>>
>>>> I rather wanted to ask twice before I open an issue or implement
>>>> something. Could always be that I missed something. Initially, I thought
>>>> that the IBM guys (LanguageWare) made massive use of the PEAR concept
>>>> and they surely had some possibility to configure their PEARs.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>>
>>>>> Cheers,
>>>>>
>>>>> -- Richard
>>>>>
>>>>> [1] 
>>>>> http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.pear.installation_descriptor
>>>>> [2] 
>>>>> http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.pear.specifier




Re: Parameters for PEAR

2018-02-09 Thread Peter Klügl
Hi,


did you get an answer?


Best,


Peter


Am 10.01.2018 um 17:12 schrieb Marshall Schor:
> I'm pinging some people who might know something about LanguageWare's use of
> this feature. -Marshall




Re: Parameters for PEAR

2018-01-10 Thread Peter Klügl
Hi,


Am 10.01.2018 um 10:57 schrieb Richard Eckart de Castilho:
>> On 16.12.2017, at 13:48, Peter Klügl  wrote:
>>
>>> Is it a problem for us to simply implement Matthias's solution: Make use
>>> of the parameters in the PearSpecifier and just set them in the wrapped
>>> analysis engine description if they are compatible?
>>>
>> Are there any opinions on this?
> First, I was a bit confused and thought the "PearSpecifier" would be
> this guy here [1]. Then I realized it is this one [2].
>
> Looking at where the parameters of the PearSpecifier are used: apparently the
> setParameter and getParameter are only ever called directly in unit tests.
>
> Does it mean that the framework so far does not make any use of these
> parameters at all? Or maybe they are used via some inherited methods...?
>
> It sounds reasonable to me that these parameters are forwarded to the
> top-level component in the PEAR. The question I am asking myself, though,
> is: why doesn't this already happen, and (maybe) what else were these
> PearSpecifier parameters intended to do then?

Yes, these are exactly the questions we had :-)

I rather wanted to ask twice before I open an issue or implement
something. Could always be that I missed something. Initially, I thought
that the IBM guys (LanguageWare) made massive use of the PEAR concept
and they surely had some possibility to configure their PEARs.

Best,

Peter


> Cheers,
>
> -- Richard
>
> [1] http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.pear.installation_descriptor
> [2] http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.pear.specifier




Re: Parameters for PEAR

2018-01-10 Thread Peter Klügl
ping?


Am 16.12.2017 um 13:48 schrieb Peter Klügl:
> Hi,
>
>
> Am 13.12.2017 um 14:33 schrieb Peter Klügl:
>> ...
>>
>>
>> Is it a problem for us to simply implement Matthias's solution: Make use
>> of the parameters in the PearSpecifier and just set them in the wrapped
>> analysis engine description if they are compatible?
>>
>
> Are there any opinions on this?
>
> If not, then I would open a ticket and implement the changes.
>
>
> Best,
>
> Peter
>




Re: Parameters for PEAR

2017-12-16 Thread Peter Klügl

Hi,


Am 13.12.2017 um 14:33 schrieb Peter Klügl:
> ...
>
>
> Is it a problem for us to simply implement Matthias's solution: Make use
> of the parameters in the PearSpecifier and just set them in the wrapped
> analysis engine description if they are compatible?
>

Are there any opinions on this?

If not, then I would open a ticket and implement the changes.


Best,

Peter



Re: Parameters for PEAR

2017-12-13 Thread Peter Klügl
Hi,


if I may join the discussion *putting on my Averbis hat*


It's about configuring the annotator packaged as a PEAR. The problem is
that we cannot access the analysis engine description and change the
parameter values programmatically as it is hidden in the PearSpecifier.
The external settings are not a good solution for us, as far as I
understand the override settings. I see two disadvantages for our use
case: you need to declare and set the ExternalOverrideName, and the
external settings are somewhat global.

Let's assume we have a pipeline like PEAR1_v1(param1=a) ->
PEAR1_v1(param1=b) -> PEAR1_v2(param1=c,param2=d) -> PEAR2_v1(param2=e)

Then, let's assume that there is an arbitrary combination of these in
different aggregated annotators in one pipeline.

If I understood the implementation correctly, it could get really nasty
to solve this with external settings.
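
To make the "somewhat global" concern concrete, here is a toy model in
plain Java. It does not use the UIMA API: the Step class, the map of
global settings, and both resolve methods are hypothetical stand-ins, and
the assumption that every PEAR instance looks up its value under the same
override name is only my reading of how the external overrides behave.
It models the first three steps of the pipeline above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PearParameterSketch {

    /** Hypothetical stand-in for one PEAR entry in a pipeline (not a UIMA class). */
    static final class Step {
        final String pear;
        final Map<String, String> localParams = new HashMap<>();

        Step(String pear, String... keyValues) {
            this.pear = pear;
            // keyValues is a flat list: name1, value1, name2, value2, ...
            for (int i = 0; i + 1 < keyValues.length; i += 2) {
                localParams.put(keyValues[i], keyValues[i + 1]);
            }
        }
    }

    /** PEAR1_v1(param1=a) -> PEAR1_v1(param1=b) -> PEAR1_v2(param1=c,param2=d). */
    static List<Step> examplePipeline() {
        return Arrays.asList(
            new Step("PEAR1_v1", "param1", "a"),
            new Step("PEAR1_v1", "param1", "b"),
            new Step("PEAR1_v2", "param1", "c", "param2", "d"));
    }

    /** Global lookup: one value per name, shared by every step. */
    static List<String> resolveGlobally(List<Step> pipeline,
                                        Map<String, String> global, String name) {
        List<String> values = new ArrayList<>();
        for (Step ignored : pipeline) {
            values.add(global.get(name));
        }
        return values;
    }

    /** Per-step lookup: each step keeps the value it was configured with. */
    static List<String> resolvePerStep(List<Step> pipeline, String name) {
        List<String> values = new ArrayList<>();
        for (Step step : pipeline) {
            values.add(step.localParams.get(name));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> global = new HashMap<>();
        global.put("param1", "a");
        List<Step> pipeline = examplePipeline();
        System.out.println(resolveGlobally(pipeline, global, "param1")); // [a, a, a]
        System.out.println(resolvePerStep(pipeline, "param1"));          // [a, b, c]
    }
}
```

With a single global value, the second and third steps can no longer get
b and c unless each of them declares its own override name, which is the
bookkeeping I would like to avoid.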

There are other solutions to get this functionality up and running, like
overriding the UIMA factory, using different PEAR versions for different
parameter values, or relinking the PEAR to multiple analysis engine
descriptions. However, none of these solutions is really "acceptable".


*putting on my UIMA hat*


Is it a problem for us to simply implement Matthias's solution: Make use
of the parameters in the PearSpecifier and just set them in the wrapped
analysis engine description if they are compatible?


(I imagine someone could provide a patch for this)


Best,


Peter



Am 12.12.2017 um 15:57 schrieb Marshall Schor:
> Hi,
>
> Good question...
>
> The use of the word "parameters" in UIMA is unfortunately overloaded with
> multiple meanings.
>
> There are in general 2 kinds:  the kind used in produceAnalysisEngine - the
> so-called "additional parameters".  The other kind are the "configuration
> parameters", also called configuration settings.  These latter have a
> capability for specifying global external settings overrides.
>
> If the parameters you want to override are configuration params (which I think
> you mean, because you say they're already in the "xml"), take a look at
> https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.external_configuration_parameter_overrides
>
> Maybe that will be an easy way to address your use case:  the external
> settings could be dynamically written into a temp file and that temp file
> specified in an "additional parameters" key to produceAnalysisEngine.
>
> -Marshall
>
>
> On 12/12/2017 2:39 AM, Matthias Koch wrote:
>> Hi,
>>
>> I want to configure a PEAR dynamically. (I install the pear and want to
>> produce the analysis engine with different parameters than in the xml).
>> Is this possible? Can I use the additionalParameters? I have seen that the
>> PearSpecifier has an instance variable for parameters, but no one is using
>> (calling) it.
>>
>> I want to produce the analysisEngine with:
>> UIMAFramework.produceAnalysisEngine(resourceSpecifier, resourceManager,
>>     params);
>>
>> In this specifier there should be one or more pearSpecifiers that should be
>> configured.
>>
>> I have overridden the PearAnalysisEngineWrapper and built a loop that
>> configures the following specifier over the configurationParameterSettings.
>> It takes the parameters from the pear specifiers.
>>
>> line 257-258
>> // Parse the resource specifier
>> ResourceSpecifier specifier =
>>     UIMAFramework.getXMLParser().parseResourceSpecifier(in);
>>
>> ==> added code
>> AnalysisEngineDescription analysisEngineDescription =
>>     (AnalysisEngineDescription) specifier;
>> AnalysisEngineMetaData analysisEngineMetaData =
>>     analysisEngineDescription.getAnalysisEngineMetaData();
>> ConfigurationParameterSettings configurationParameterSettings =
>>     analysisEngineMetaData.getConfigurationParameterSettings();
>> // Copy every parameter of the PEAR specifier into the wrapped descriptor
>> for (Parameter parameter : Arrays.asList(pearSpec.getParameters())) {
>>     configurationParameterSettings.setParameterValue(parameter.getName(),
>>         parameter.getValue());
>> }
>>
>> Is it possible without overriding anything?
>>
>> UIMAJ Version: 2.10
>>
>> Sincerely
>> Matthias
>>




Re: Parameters for PEAR

2017-12-13 Thread Peter Klügl
Hi Jens,


Am 13.12.2017 um 09:53 schrieb Jens Grivolla:
> Is there a specific reason to use PEARs?

Yes. An example could be to add a previously unknown annotator with its
implementation in a running system, e.g. without restarting the
application server.
(I am not aware of any other functionality that would support that out
of the box for UIMA.)


>
> As far as I remember (but I could be wrong, it's been a few years), the
> main advantages of using them (automatic class path configuration, some
> degree of isolation between components) were lost when we wanted to change
> configuration parameters because then we would need to use the AE
> descriptor instead of the PEAR descriptor (at least with CPE).

Yes, that's the point. However, just because something does not work,
you do not have to accept it. You could fix the problem even if it is a
limited/special fix for the PEAR. The PearSpecifier theoretically
supports parameters, but these are just not used in the implementation
of UIMA. I'll comment on that more in an answer to Marshall's post.

>  If you're
> not going to use the PEAR descriptor then an installed PEAR is not much
> more than a bunch of JARs, and component descriptors with tons of
> hard-coded absolute file paths, so you should be able to just use and
> configure a component based on those descriptors (without anything
> PEAR-specific).
>
> We have since switched to doing everything with uimaFIT which gives you
> many many possibilities to adapt your workflow, configure engines
> programmatically, etc. For us the change has been hugely positive, both for
> development (and debugging) and for deployment in a wide variety of ways
> and environments.

Yes, we (Averbis) also use uimaFIT everywhere, but PEARs have an
additional use case, as I mentioned above. PEAR and uimaFIT do not need to
be mutually exclusive, as you can of course also have a PEAR of a uimaFIT
annotator.


Best,

Peter


PS: nice to hear from you again :-)


> Best,
> Jens
>
> On Tue, Dec 12, 2017 at 8:39 AM, Matthias Koch 
> wrote:
>
>> Hi,
>>
>> I want to configure a PEAR dynamically. (I install the pear and want to
>> produce the analysis engine with different parameters than in the xml).
>> Is this possible? Can I use the additionalParameters? I have seen that the
>> PearSpecifier has an instance variable for parameters, but no one is using
>> (calling) it.
>>
>> I want to produce the analysisEngine with:
>> UIMAFramework.produceAnalysisEngine(resourceSpecifier,
>>     resourceManager, params);
>>
>> In this specifier there should be one or more pearSpecifiers that should
>> be configured.
>>
>> I have overridden the PearAnalysisEngineWrapper and built a loop that
>> configures the following specifier over the configurationParameterSettings.
>> It takes the parameters from the pear specifiers.
>>
>> line 257-258
>> // Parse the resource specifier
>> ResourceSpecifier specifier =
>>     UIMAFramework.getXMLParser().parseResourceSpecifier(in);
>>
>> ==> added code
>> AnalysisEngineDescription analysisEngineDescription =
>>     (AnalysisEngineDescription) specifier;
>> AnalysisEngineMetaData analysisEngineMetaData =
>>     analysisEngineDescription.getAnalysisEngineMetaData();
>> ConfigurationParameterSettings configurationParameterSettings =
>>     analysisEngineMetaData.getConfigurationParameterSettings();
>> // Copy every parameter of the PEAR specifier into the wrapped descriptor
>> for (Parameter parameter : Arrays.asList(pearSpec.getParameters())) {
>>     configurationParameterSettings.setParameterValue(parameter.getName(),
>>         parameter.getValue());
>> }
>>
>> Is it possible without overriding anything?
>>
>> UIMAJ Version: 2.10
>>
>> Sincerely
>> Matthias
>>
>> --
>> Matthias Koch
>>
>> Averbis GmbH
>> Tennenbacher Str. 11
>> 79106 Freiburg
>> Germany
>>
>> Fon: +49 761 708 394 0
>> Fax: +49 761 708 394 10
>> Email:matthias.k...@averbis.com
>> Web:https://averbis.com
>>
>> Headquarters: Freiburg im Breisgau
>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>
>>




Re: problem encountered when running main on new input files (eclipse)

2017-11-14 Thread Peter Klügl
Hi,


that sounds strange, the quick ruta command should have no influence. Is
the test file an xmi or a txt file?


My first guess would be that the CAS Editor is not up-to-date. Can you
reproduce it and test if a refresh on the output folder helps, or close
and reopen the CAS Editor?


Best,

Peter


Am 10.11.2017 um 12:52 schrieb Giulia Donato:
> Hi, I am not sure how to describe this issue as it's the first time I am
> encountering this problem:
> I am running some rules on a test file with a single utterance.
>
> When I update my input txt file with a different utterance and run
> Main.ruta again on the updated file, the new xmi file still contains the
> old annotation.
>
> Apparently the only thing that seems to overcome this is hitting quick
> ruta on the input txt file and then running again from the run button (so
> quick ruta + run button in sequence).
>
> Thank you for any reply
>
> G.
>



Re: Erratic block variable behaviour in Ruta

2017-11-06 Thread Peter Klügl
Hi Mario,


sorry for the delayed response... I was travelling.


First of all, there should be no multithreading issues in ruta (in
normal usage), at least, I am quite confident about that.


My first guess would be that the problem is caused by the nature of
variables and their initialization in ruta.

Initializing a variable with a value (e.g., BOOLEAN ignore = false;)
does not reset its actual value during a loop like BLOCK, because
variables are declared only once and are always global. The value only
defines the initial value of the variable, to which it is reset when the
complete environment is reset (e.g., for a different CAS). The
declaration is actually ignored in the execution of the block.

So, you need to reset the value to false for each iteration in BLOCK. I
wonder if your solution with the ASSIGN in the head rule of the block
will work. The rule is applied in order to get a list of annotations
(windows for the block), and so the action is already applied before the
actual iteration starts.

Could you try something like that:


BLOCK(ForEach) EnclosingAnnotation.property=="something" {} {
    BOOLEAN ignore = false;
    ASSIGN(ignore, false);
    EnclosedAnnotation.property=="something else"{FEATURE("value",
        "ignorable") -> ASSIGN(ignore, true)};
    EnclosingAnnotation.name=="Hello"{IF(ignore == false) ->
        CREATE(AnotherAnnotation, "name" = "World")};
}
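
The effect of a variable that is declared once and shared by all
iterations can also be shown outside of Ruta. The following is only a
plain-Java analogy, not how Ruta is implemented: the class name, the
run method, and the boolean array (one entry per block window, true if
the window contains the ignorable annotation) are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BlockVariableSketch {

    // Models the Ruta variable: one shared slot for all block iterations.
    static boolean ignore = false;

    /**
     * Returns, per window, whether AnotherAnnotation would be created.
     * With resetPerIteration == false the flag set in one window leaks
     * into all following windows.
     */
    static List<Boolean> run(boolean[] windowHasIgnorable, boolean resetPerIteration) {
        ignore = false; // initial value, applied once per "CAS"
        List<Boolean> created = new ArrayList<>();
        for (boolean hasIgnorable : windowHasIgnorable) {
            if (resetPerIteration) {
                ignore = false; // corresponds to the explicit ASSIGN in the block
            }
            if (hasIgnorable) {
                ignore = true;
            }
            created.add(!ignore);
        }
        return created;
    }

    public static void main(String[] args) {
        boolean[] windows = {true, false, false};
        System.out.println(run(windows, false)); // [false, false, false]
        System.out.println(run(windows, true));  // [false, true, true]
    }
}
```

Without the reset, the first window switches the flag on and the two
later windows never create their annotation, which would look like a
rule that fires inconsistently.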



Best,

Peter


Am 29.10.2017 um 17:49 schrieb Mario Juric:
> Hi Peter,
>
> We encountered a problem with a Ruta rule behaving erratically in a
> multithreaded environment. We isolated the problem to the following rule,
> shown in pseudo form:
> BLOCK(ForEach) EnclosingAnnotation.property=="something" {} {
> BOOLEAN ignore = false;
> EnclosedAnnotation.property=="something else"{FEATURE("value",
> "ignorable") -> ASSIGN(ignore, true)};
> EnclosingAnnotation.name=="Hello"{IF(ignore == false) ->
> CREATE(AnotherAnnotation, "name" = "World")};
> }
> We identified about 1000 documents where “AnotherAnnotation” above should be
> created, and we reprocessed them several times on EC2 using Oracle JDK build
> 1.8.0_151
> with both Ruta 2.5 and UIMA 2.9 as well as Ruta 2.6.1 and UIMA 2.10.1. The
> number of inconsistencies in rule firing over many runs of the 1K appears
> erratic, between approximately 16% and approximately 0.5%, but there were
> always inconsistencies in every run. Removing the ignore condition of
> course made the issue disappear entirely, e.g.
>
> BLOCK(ForEach) EnclosingAnnotation.property=="something" {} {
> EnclosingAnnotation.name=="Hello"{ -> CREATE(AnotherAnnotation, "name" =
> "World")};
> }
> We haven’t experienced the issue in a single threaded environment yet, but we
> are not entirely sure whether it is related to multithreading, although the
> nature of the problem could point in the direction of some thread-safety
> issues around shared data inside Ruta, but that is just guessing. However,
> the workaround in our case was to rewrite the rule as follows:
> BOOLEAN ignore = false;
> BLOCK(ForEach) EnclosingAnnotation.property=="something" {-> ASSIGN(ignore,
> false)} {
> EnclosedAnnotation.property=="something else"{FEATURE("value",
> "ignorable") -> ASSIGN(ignore, true)};
> EnclosingAnnotation.name=="Hello"{IF(ignore == false) ->
> CREATE(AnotherAnnotation, "name" = "World")};
> }
> I assume the BLOCK(ForEach) action happens for every occurrence, but I haven’t
> actually verified that yet since there is usually only one occurrence in this
> particular case, but I was hoping you might be able to shed some light on
> this, and the problems we experienced with the variable declaration inside
> the block.
>
> Thanks
> Mario


