Re: UIMA Ruta into jar?

2014-10-23 Thread Alexandre Patry

On 14-10-23 09:40 AM, Piyush Paliwal wrote:

Hi Richard,

It seems to work now. Thanks. As I was only at the testing stage, I forgot to
add the other descriptors (OpenNlpTagger, etc.) before the Ruta descriptor in
the pipeline. Those were needed so that the CAS can find all types.

The solution is a little cumbersome (copy and paste), but it is workable and
therefore great.
I am glad that you made it work! If you want to reduce XML boilerplate, 
you can look at uimaFIT [1], a library offering a very nice Java API to 
replace XML descriptors.


Alexandre

[1] http://uima.apache.org/uimafit.html


Piyush

On Thu, Oct 23, 2014 at 8:10 AM, Richard Eckart de Castilho 
wrote:


On 23.10.2014, at 00:39, Piyush Paliwal  wrote:


As an example, I wish to import the following types from TypeSystem.xml
descriptor which also resides in same folder as script (both files now in
Java project).

//import the additional annotations types and alias in short name

IMPORT de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.NN FROM
uima.ruta.example.TypeSystem  AS _NN;

IMPORT de.tudarmstadt.ukp.dkpro.core.api.syntax.type.constituent.PP FROM
uima.ruta.example.TypeSystem AS _PP;

I assume you are invoking Ruta via uimaFIT? If yes, then you should make
sure that uimaFIT can find all necessary type systems via the type
detection mechanism [1].

If you are not using uimaFIT or if you have some special way to create your
CASes, make sure that all types needed by your scripts are already loaded
when the CAS is created.

UIMA does not allow changing the type system while a pipeline is running.
Thus the IMPORT declarations will normally not be interpreted when the
script is executed.

I do not know how the IMPORT (type) AS (alias) is implemented. If the alias
is set up at execution time and not at CAS initialization time, it should
work.

Alexandre?

Cheers,

-- Richard

[1]
http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#d5e531








Re: UIMA Ruta into jar?

2014-10-23 Thread Alexandre Patry

On 14-10-23 02:10 AM, Richard Eckart de Castilho wrote:

On 23.10.2014, at 00:39, Piyush Paliwal  wrote:


As an example, I wish to import the following types from TypeSystem.xml
descriptor which also resides in same folder as script (both files now in
Java project).

//import the additional annotations types and alias in short name

IMPORT de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.NN FROM
uima.ruta.example.TypeSystem  AS _NN;

IMPORT de.tudarmstadt.ukp.dkpro.core.api.syntax.type.constituent.PP FROM
uima.ruta.example.TypeSystem AS _PP;

I do not know how the IMPORT (type) AS (alias) is implemented. If the alias
is set up at execution time and not at CAS initialization time, it should
work.

Alexandre?
IMPORT instructions and aliases are resolved at the same time as 
TYPESYSTEM instructions, when the first CAS is processed.


Best,

Alexandre




Re: UIMA Ruta into jar?

2014-10-22 Thread Alexandre Patry

Hi Piyush,

A while ago, I wrote a blog post on how to package a RUTA script with 
maven:


http://textjuicer.com/blog/2013/09/08/using-ruta-in-a-maven-project/

Even if you do not use maven, it should give you an idea on the files to 
distribute in your jars.


Hope this helps,

Alexandre

On 14-10-22 07:35 AM, Piyush Paliwal wrote:

Hi,

we are developing one Ruta Project and want to access it in java project.
Currently what we did is to add the descriptor (generated from ruta script)
into UIMA pipeline which is in java project.

The pipeline can only be run on workspace, we are not able to make a single
jar of that java project and run on command line because it can not access
Ruta project as dependency.

There is also a direct way to read ruta script within java, but the script
can not import annotations from type systems if we put in java project
(i.e. it needs Ruta editor).

Any way to add Ruta project dependency into java?

Thanks.

Piyush






Re: sendCAS is slow

2014-09-24 Thread Alexandre Patry

This is good news :)

Did you try to increase the number of CAS in the pool as Jerry suggested 
[1]?


You can reply to the list as well, there are a lot of people more 
knowledgeable than me that can help you there.


[1] 
http://uima.markmail.org/search/?q=#query:+page:1+mid:4aa3ifmzg5zvj4bm+state:results


On 24/09/2014 09:49, xym210 wrote:
No, everything seems to have worked. When I deploy two collection
reader instances, the processing speed improves.



--
Sent from NetEase Mail for Android


On 2014-09-24 at 21:44, Alexandre Patry
<mailto:alexandre.pa...@keatext.com> wrote:


Did you get an error message or a stack trace?

On 24/09/2014 09:38, xym210 wrote:
It doesn't work. When I deploy the CollectionReader and the AE
colocated, it doesn't work either. Is there something I
misunderstood? Thanks.


--
Sent from NetEase Mail for Android


On 2014-09-23 at 20:57, Alexandre Patry
<mailto:alexandre.pa...@keatext.com> wrote:


Did you try to use binary serialisation instead of XML serialisation for
the CAS?

For more information on binary serialisation, you can search for the
word "binary" in the UIMA-AS user guide
( http://uima.apache.org/d/uima-as-2.6.0/uima_async_scaleout.html).

Hope this helps,

Alexandre

On 23/09/2014 03:16, xia yongmin wrote:
> Hi,
>
> I am new to UIMA, and I have run into the following problem:
>
> Suppose I have a CollectionReader, an AE and a CAS Consumer.
>
> It takes 1 ms for the CollectionReader to initialize a CAS, 5 ms for the AE
> to analyze it, and 1 ms for the CAS Consumer to consume it.
>
> It seems that I could deploy 5 instances of the AE to get five times the speed.
>
> But when I deploy 3 instances of the AE, the speed does not improve.
>
> And I found that it takes a long time for UIMA to send a CAS from the
> CollectionReader to the AE using the sendCAS(cas) method.
>
> how can I solve this problem?
>
> many thanks.
>


--
Alexandre Patry, Ph.D
Chercheur Principal / Principal Researcher
http://KeaText.com







Re: sendCAS is slow

2014-09-24 Thread Alexandre Patry

Did you get an error message or a stack trace?

On 24/09/2014 09:38, xym210 wrote:
It doesn't work. When I deploy the CollectionReader and the AE
colocated, it doesn't work either. Is there something I misunderstood?
Thanks.


--
Sent from NetEase Mail for Android


On 2014-09-23 at 20:57, Alexandre Patry
<mailto:alexandre.pa...@keatext.com> wrote:


Did you try to use binary serialisation instead of XML serialisation for
the CAS?

For more information on binary serialisation, you can search for the
word "binary" in the UIMA-AS user guide
( http://uima.apache.org/d/uima-as-2.6.0/uima_async_scaleout.html).

Hope this helps,

Alexandre

On 23/09/2014 03:16, xia yongmin wrote:
> Hi,
>
> I am new to UIMA, and I have run into the following problem:
>
> Suppose I have a CollectionReader, an AE and a CAS Consumer.
>
> It takes 1 ms for the CollectionReader to initialize a CAS, 5 ms for the AE
> to analyze it, and 1 ms for the CAS Consumer to consume it.
>
> It seems that I could deploy 5 instances of the AE to get five times the speed.
>
> But when I deploy 3 instances of the AE, the speed does not improve.
>
> And I found that it takes a long time for UIMA to send a CAS from the
> CollectionReader to the AE using the sendCAS(cas) method.
>
> how can I solve this problem?
>
> many thanks.
>


--
Alexandre Patry, Ph.D
Chercheur Principal / Principal Researcher
http://KeaText.com







Re: sendCAS is slow

2014-09-23 Thread Alexandre Patry
Did you try to use binary serialisation instead of XML serialisation for 
the CAS?


For more information on binary serialisation, you can search for the 
word "binary" in the UIMA-AS user guide 
(http://uima.apache.org/d/uima-as-2.6.0/uima_async_scaleout.html).
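
A back-of-the-envelope model of the timings quoted in the question (1 ms for the reader, 5 ms for the AE, 1 ms for the consumer) shows why extra AE instances should help in principle. This is only a sketch: it ignores serialization, queueing and network cost entirely, so it gives an upper bound, and the class and method names are made up for illustration.

```java
public class PipelineThroughput {
    /** Upper bound on CASes per second for one stage whose instances work in parallel. */
    static double stageRate(int instances, double millisPerCas) {
        return instances * 1000.0 / millisPerCas;
    }

    /** A pipeline can go no faster than its slowest stage. */
    static double pipelineRate(double... stageRates) {
        double min = Double.POSITIVE_INFINITY;
        for (double rate : stageRates) {
            min = Math.min(min, rate);
        }
        return min;
    }

    public static void main(String[] args) {
        for (int aeInstances : new int[] {1, 3, 5}) {
            double rate = pipelineRate(
                stageRate(1, 1.0),            // collection reader
                stageRate(aeInstances, 5.0),  // analysis engine
                stageRate(1, 1.0));           // CAS consumer
            System.out.printf("%d AE instance(s): at most %.0f CAS/s%n", aeInstances, rate);
        }
    }
}
```

If the measured throughput stays flat while this bound triples, the bottleneck lies outside the three stages, which is consistent with the slow sendCAS call and is why the serialization format is worth checking first.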


Hope this helps,

Alexandre

On 23/09/2014 03:16, xia yongmin wrote:

Hi,

I am new to UIMA, and I have run into the following problem:

Suppose I have a CollectionReader, an AE and a CAS Consumer.

It takes 1 ms for the CollectionReader to initialize a CAS, 5 ms for the AE
to analyze it, and 1 ms for the CAS Consumer to consume it.

It seems that I could deploy 5 instances of the AE to get five times the speed.

But when I deploy 3 instances of the AE, the speed does not improve.

And I found that it takes a long time for UIMA to send a CAS from the
CollectionReader to the AE using the sendCAS(cas) method.

how can I solve this problem?

many thanks.




--
Alexandre Patry, Ph.D
Chercheur Principal / Principal Researcher
http://KeaText.com



Re: RUTA: case insensitive regex rule?

2014-08-29 Thread Alexandre Patry

On 29/08/2014 03:34, Renaud Richardet wrote:

(How) can I make the following rule Case Insensitive?

"\\b((inter)?neurone?s?|cells?)\\b" -> Neuron;

You can turn on the "ignore case" flag by prefixing your regex with (?i):

"(?i)\\b((inter)?neurone?s?|cells?)\\b" -> Neuron;

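The (?i) prefix is the embedded-flag syntax of java.util.regex, so the pattern can be tried outside Ruta as well. A minimal sketch, assuming Ruta hands the string to Java's regex engine unchanged; the class name and sample sentence are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseInsensitiveNeuron {
    // Same pattern as in the rule above; (?i) switches on case-insensitive matching.
    static final Pattern NEURON =
        Pattern.compile("(?i)\\b((inter)?neurone?s?|cells?)\\b");

    /** Returns every substring of text matched by the pattern, in order. */
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = NEURON.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Neurons and INTERNEURONS are Cells."));
    }
}
```
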

Hope this helps,

Alexandre

--
Alexandre Patry, Ph.D
Chercheur Principal / Principal Researcher
http://KeaText.com



Re: Ruta - best practices for unit tests?

2014-08-18 Thread Alexandre Patry

Hi Renaud,

On 14-08-18 05:30 PM, Renaud Richardet wrote:

Hello,

What are best practices for writing unit tests for Ruta?

Ideally, I would like to have 1) tests that can be run on the command line
(so as to automate them in Jenkins), and
We use JUnit and it works quite well for us. I have a small example
project on GitHub with a RUTA script and its unit test
(https://github.com/apatry/ruta-with-maven). You can also look at the RUTA
test suite if you want more examples
(http://svn.apache.org/viewvc/uima/ruta/trunk/ruta-core/src/test/java/org/apache/uima/ruta/).

2) where input and expected output
can be edited in a text editor (meaning: not xmi's or java code).
Is there a reason why you want to avoid Java code for unit tests?
Building and inspecting a CAS in Java for each test allows a lot of
flexibility and makes it possible to test each analysis engine outside
of its pipeline. uimaFIT is an excellent tool for that
(http://uima.apache.org/uimafit.html).


Hope this helps,

Alexandre

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



Re: Loading a resource from the classpath

2014-08-07 Thread Alexandre Patry
Well, the error was on the other side of the screen. This works 
perfectly well with fileUrl, I only had a typo in my path.


On 07/08/2014 11:37, Alexandre Patry wrote:

Hi,

I would like to locate a resource in the classpath, something along 
the lines of:



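<externalResource>
  <name>LocationDictionary</name>
  <description>Dictionary of locations</description>
  <fileResourceSpecifier>
    <fileUrl>path/in/jar/location-dictionary.xml</fileUrl>
  </fileResourceSpecifier>
  <implementationName>org.apache.uima.conceptMapper.support.dictionaryResource.DictionaryResource_impl</implementationName>
</externalResource>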


Is it possible with existing resource specifiers or do I have to write 
my own and use a CustomResourceSpecifier?


Thanks!

Alexandre




--
Alexandre Patry, Ph.D
Chercheur Principal / Principal Researcher
http://KeaText.com



Loading a resource from the classpath

2014-08-07 Thread Alexandre Patry

Hi,

I would like to locate a resource in the classpath, something along the 
lines of:



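<externalResource>
  <name>LocationDictionary</name>
  <description>Dictionary of locations</description>
  <fileResourceSpecifier>
    <fileUrl>path/in/jar/location-dictionary.xml</fileUrl>
  </fileResourceSpecifier>
  <implementationName>org.apache.uima.conceptMapper.support.dictionaryResource.DictionaryResource_impl</implementationName>
</externalResource>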


Is it possible with existing resource specifiers or do I have to write 
my own and use a CustomResourceSpecifier?


Thanks!

Alexandre

--
Alexandre Patry, Ph.D
Chercheur Principal / Principal Researcher
http://KeaText.com



Re: dinamically type system creation

2014-05-13 Thread Alexandre Patry

Hi Tiziano,

On 13/05/2014 09:55, Tiziano Lorenzetti wrote:

Dear all,
I'm new to UIMA and I'm trying to develop an annotator that dynamically
creates a type system with several feature structures.
To accomplish this, the annotator does:

...
TypeSystemDescription tsd =
TypeSystemDescriptionFactory.createTypeSystemDescription(new String[0]);
tsd.addType("it.uniroma2.art.ExcelAnnotation", "", "uima.tcas.Annotation");
TypeDescription type = tsd.getType("it.uniroma2.art.ExcelAnnotation");
type.addFeature("newUIMAFeature", "", "uima.cas.String");
...

In another annotator, I try to access this type system and its features in
this way:

TypeSystem ts = aCAS.getTypeSystem();
Iterator types = ts.getTypeIterator();
Iterator features = ts.getFeatures();

but neither the type system nor its features are present. How can I reach
my goal?
How do you create your CAS? I guess the types should be found if you 
create it using:


CAS aCAS = CasCreationUtils.createCas(typeSystemDescription, null, null);

Hope this helps,

Alexandre

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



Re: RandomAccessFile problem in UIMA

2014-05-03 Thread Alexandre Patry

Hi Debbie,

I do not use Eclipse, so I won't be of any help regarding Maven and Eclipse
interoperability. The simplest thing is probably to download extJWNL
from http://sourceforge.net/projects/extjwnl/files/ and add all jars
under the lib/ directory to your project. Once that is done, you should be
able to load the dictionary with the following line of code:


Dictionary dictionary = Dictionary.getDefaultResourceInstance();

Let me know if it helps,

Alexandre

On 14-05-02 08:47 PM, Debbie Zhang wrote:

Thanks Alexandre for your reply!

I will try extJWNL as suggested. As I have never used maven, may I ask which
maven Eclipse plugin you use?

Thanks again for your help!

Regards,

Debbie



-Original Message-
From: Alexandre Patry [mailto:alexandre.pa...@keatext.com]
Sent: Saturday, 3 May 2014 12:13 AM
To: user@uima.apache.org
Subject: Re: RandomAccessFile problem in UIMA

Hi Debbie,

I recommend using extJWNL (https://github.com/extjwnl/extjwnl)
instead of JWNL. We made the switch from JWNL and never looked back.

As for your path problems, extJWNL distributes WordNet dictionaries as
Maven dependencies, so they should become a non-issue.

Hope this helps,

Alexandre

On 02/05/2014 03:36, Debbie Zhang wrote:

Hi,



I am having problems to use JWNL wordnet in UIMA.



JWNL uses RandomAccessFile to read wordnet dictionary files. In order
to create a PEAR file, wordnet dictionary files are put in
resources/wordnet folder under project. As resources is in my Build
Path, I have no problem to run the application I created in Eclipse.
Therefore, I am  certain the dictionary files can be read. However,
when I use UIMA Document Analyzer or UIMA CAS Visual Debugger to run
the annotation, I get the following error:



java.io.FileNotFoundException: resources/wordnet/data.noun (No such
file or
directory)



The error comes from the following code:



RandomAccessFile _file = new RandomAccessFile(path, _permissions);



I use the following code to check the current working directory of the
class:



URL location =
    PrincetonRandomAccessDictionaryFile.class.getProtectionDomain()
        .getCodeSource().getLocation();

System.out.println(location.getFile());



It seems both situation have the same location: /project/bin/



Did anyone encounter a similar problem before? Any suggestion is welcome.

Thank you!



Regards,



Debbie






--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com








Re: RandomAccessFile problem in UIMA

2014-05-02 Thread Alexandre Patry

Hi Debbie,

I recommend using extJWNL (https://github.com/extjwnl/extjwnl)
instead of JWNL. We made the switch from JWNL and never looked back.


As for your path problems, extJWNL distributes WordNet dictionaries as
Maven dependencies, so they should become a non-issue.


Hope this helps,

Alexandre

On 02/05/2014 03:36, Debbie Zhang wrote:

Hi,

  


I am having problems to use JWNL wordnet in UIMA.

  


JWNL uses RandomAccessFile to read wordnet dictionary files. In order to
create a PEAR file, wordnet dictionary files are put in resources/wordnet
folder under project. As resources is in my Build Path, I have no problem to
run the application I created in Eclipse. Therefore, I am  certain the
dictionary files can be read. However, when I use UIMA Document Analyzer or
UIMA CAS Visual Debugger to run the annotation, I get the following error:

  


java.io.FileNotFoundException: resources/wordnet/data.noun (No such file or
directory)

  


The error comes from the following code:

  


RandomAccessFile _file = new RandomAccessFile(path, _permissions);

  


I use the following code to check the current working directory of the
class:

  


URL location =
    PrincetonRandomAccessDictionaryFile.class.getProtectionDomain()
        .getCodeSource().getLocation();

System.out.println(location.getFile());

  


It seems both situation have the same location: /project/bin/

  


Did anyone encounter a similar problem before? Any suggestion is welcome.
Thank you!

  


Regards,

  


Debbie

  






--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



Re: UIMA-OPENNLP

2014-04-04 Thread Alexandre Patry
UIMA descriptors are distributed with the source code of opennlp-uima. 
You can grab them from 
http://svn.apache.org/viewvc/opennlp/trunk/opennlp-uima/descriptors/.


Hope this helps,

Alexandre

On 03/04/2014 18:53, Pathima Nusrath Hameed wrote:

Hi,

I am interested in using UIMA for clinical text data processing. I am
working on WIndows7 platform. I installed UIMA but I could not configure
OpenNLP tools. OpenNLP descriptors are not available in UIMA.
I am glad if you could help me in this matter.
I appreciate your reply.
Thank you



--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



Re: Serializing multiple JCas objects to a single file

2014-02-01 Thread Alexandre Patry

On 14-02-01 08:22 PM, Samudra Banerjee wrote:

Hi Experts,

I have a scenario where processing a wikipedia XML dump generates a 
huge number of JCas objects (~1 million), one per page. I want to 
serialize these JCas objects for later use, but generating 1 million 
different files will take a toll on the system. So I was wondering if 
there was a way to serialize multiple JCas objects to a single file 
for later retrieval. Any idea if this can be achieved?
The JDK provides classes to read and write zip files (see
http://docs.oracle.com/javase/7/docs/api/java/util/zip/package-summary.html).
You could serialize each JCas to its own entry in a zip file.
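
A minimal sketch of the idea using only java.util.zip. It assumes each JCas has already been serialized to a byte array by whatever serializer you prefer (that step is not shown), and it writes to an in-memory stream only to stay self-contained; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class CasZipArchive {
    /** Writes one zip entry per serialized CAS; keys are entry names, values the bytes. */
    static byte[] pack(Map<String, byte[]> serializedCases) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : serializedCases.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Reads every entry back, preserving the order in which they were written. */
    static Map<String, byte[]> unpack(byte[] archive) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    bytes.write(buffer, 0, n);
                }
                result.put(entry.getName(), bytes.toByteArray());
            }
        }
        return result;
    }
}
```

For a real dump you would wrap a FileOutputStream instead of the in-memory stream and name each entry after the page it came from.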


Best,

Alexandre

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



Re: UIMA Ruta 2.1.0 Issues

2013-12-17 Thread Alexandre Patry

On 2013-12-17 12:10, Peter Klügl wrote:

Am 17.12.2013 18:00, schrieb Alexandre Patry:

On 2013-12-17 11:56, Peter Klügl wrote:

Hi,

some of the rules behave as expected. It's maybe a bit counterintuitive,
but I do not see a way to improve it. I will fix the rest in the next
few days.

An example:

(SPECIAL ALL* SPECIAL) {-> MARK(TMP_GenericAllSTAR)};

ALL is a parent type of SPECIAL and * is a greedy quantifier. Therefore
ALL matches on all annotations and also on the SPECIAL annotations until
the end of the document. Then, there is no SPECIAL annotation left to
match and the rule fails.

Using a reluctant quantifier should work as expected for this specific
case:

(SPECIAL ALL*? SPECIAL) {-> MARK(TMP_GenericAllSTAR)};



Just another comment that has nothing to do with the problem :-)

The rule is of course somewhat "slow".

I would rather rewrite it in:

(SPECIAL # SPECIAL) {-> MARK(TMP_GenericAllSTAR)};

Here, the wildcard searches for the next SPECIAL annotation in the index
and has not to match on each token until the next SPECIAL annotation.

Nice trick, thanks for sharing!

Is there a cookbook somewhere where all these tricks are stored?

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



Re: UIMA Ruta 2.1.0 Issues

2013-12-17 Thread Alexandre Patry

On 2013-12-17 11:56, Peter Klügl wrote:

Hi,

some of the rules behave as expected. It's maybe a bit counterintuitive,
but I do not see a way to improve it. I will fix the rest in the next
few days.

An example:

(SPECIAL ALL* SPECIAL) {-> MARK(TMP_GenericAllSTAR)};

ALL is a parent type of SPECIAL and * is a greedy quantifier. Therefore
ALL matches on all annotations and also on the SPECIAL annotations until
the end of the document. Then, there is no SPECIAL annotation left to
match and the rule fails.
Using a reluctant quantifier should work as expected for this specific
case:


(SPECIAL ALL*? SPECIAL) {-> MARK(TMP_GenericAllSTAR)};
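
For readers more familiar with Java regular expressions, here is a rough analogy rather than Ruta code. Java's plain greedy `*` backtracks and would still succeed, so the sketch uses the possessive `*+` to mimic a quantifier that consumes everything and never gives anything back, as described above. The class name and sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierAnalogy {
    /** Returns the first match of regex in text, or null if there is none. */
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "# a b # c";
        // Possessive: ".*+" swallows the rest of the input and never releases it,
        // so the closing '#' has nothing left to match.
        System.out.println(firstMatch("#.*+#", text));
        // Reluctant: ".*?" extends only until the closing '#' can match.
        System.out.println(firstMatch("#.*?#", text));
    }
}
```
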


Hope this helps,

Alexandre

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



Re: Macros in Ruta? - How to make long scripts short?

2013-12-06 Thread Alexandre Patry

On 2013-12-06 10:46, Richard Eckart de Castilho wrote:

Hi,

assuming I have a Ruta script with recurring statements of the type

  PartOfSpeech{FEATURE("value", "N")}

Is it possible to define some kind of macro to replace this long
statement with a short-hand?

  MACRO N := PartOfSpeech{FEATURE("value", "N")}
  MACRO V := PartOfSpeech{FEATURE("value", "V")}

  N{0,2} V

As far as I know, RUTA does not support macros yet.

The closest thing I found in Ruta for such a thing was a Block - but
doesn't seem to do what I want, because I would need to ->CALL it.
I would define temporary annotations for N and V. The performance
trade-off is different, though: it consumes more memory, but
searching for N or V no longer requires scanning all part-of-speech
annotations.
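
The trade-off can be pictured with a plain-Java analogy. This is not how Ruta is implemented; the Pos class and both methods are hypothetical stand-ins, used only to contrast filtering on every lookup with materializing the result once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PosLookupTradeoff {
    /** Hypothetical stand-in for a PartOfSpeech annotation with a "value" feature. */
    static final class Pos {
        final int begin;
        final int end;
        final String value;

        Pos(int begin, int end, String value) {
            this.begin = begin;
            this.end = end;
            this.value = value;
        }
    }

    /** Option A: no extra memory, but every lookup walks all part-of-speech annotations. */
    static List<Pos> scan(List<Pos> all, String value) {
        List<Pos> hits = new ArrayList<>();
        for (Pos p : all) {
            if (p.value.equals(value)) {
                hits.add(p);
            }
        }
        return hits;
    }

    /** Option B: one pass up front builds a bucket per value, like temporary annotations. */
    static Map<String, List<Pos>> materialize(List<Pos> all) {
        Map<String, List<Pos>> byValue = new HashMap<>();
        for (Pos p : all) {
            byValue.computeIfAbsent(p.value, k -> new ArrayList<>()).add(p);
        }
        return byValue;
    }
}
```

Option B pays for the extra lists once and then answers each "N" or "V" lookup directly, which is the same bargain as marking temporary N and V annotations before the rules that use them.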


Hope this helps,

Alexandre

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



Re: Problem writing ruta extensions

2013-12-05 Thread Alexandre Patry

On 2013-12-04 12:33, Sebastian wrote:

Hi,

I'm highly interested in ruta, and its potential applications in industrial
applications. Right know I'm trying to create a simple toy condition
extension that is simply a case insensitive INLIST condition. It is
completely based on the InListCondition class, I also declared an
implementation of the IRutaConditionExtension interface.

With primitve types everything seems to work great, except when the
condition is used with a variable :

STRINGLIST MonthsList = {"january", ...};
DECLARE Month;
ANY{INSENSITIVEINLIST(MonthsList) -> MARK(Month)};

I get a class cast exception when the condition is being created, because
MonthsList is a SimpleTypeExpression and I'm expecting a StringListExpression.

Am I doing something wrong ? I suppose there is a way to resolve the
variable to the actual list, but I missed it somehow.
It may not help you to get your toy extension working, but for small
lists I like to use regular expressions, where case insensitivity comes for free:


W{REGEXP("(?i)january|february|march|...|december") -> MARK(Month)};

Regards,

Alexandre

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



Re: [ruta] How to efficiently delete an annotation only if it appears within the N first token of a document?

2013-08-28 Thread Alexandre Patry

On 2013-08-28 15:20, Peter Klügl wrote:

Am 28.08.2013 20:33, schrieb Alexandre Patry:

On 2013-08-28 12:19, Peter Klügl wrote:

On 28.08.2013 18:17, Alexandre Patry wrote:

I will be happy to test drive MARKFIRST when it will be in trunk.

It's already in the trunk. If you want, then I can also think of
something that avoid the visibility problem.
I was able to make it work in my application, but my eclipse plugin 
does not recognize the MARKFIRST keyword. Here is what I did :


1. Uninstall the RUTA Workbench plugin from eclipse
2. `mvn clean install` in ruta/trunk
3. `mvn clean package -Declipse.home=/usr/lib/eclipse 
-Duima-eclipse-jar-processor=/usr/lib/eclipse/plugins/org.eclipse.equinox.p2.jarprocessor_1.0.200.dist.jar 
-Declipse-equinox-launcher=/usr/lib/eclipse/plugins/org.eclipse.equinox.launcher_1.2.0.dist.jar` 
in ruta/trunk/ruta-eclipse-update-site
4. Re-install the RUTA Workbench plugin in eclipse from 
ruta/trunk/ruta-eclipse-update-site/target/eclipse-update-site


Did I miss something?



I will do some testing tomorrow, but my first guess is that uninstall 
does not remove the plugins, only the feature. When you install the 
feature again with the same version, then the plugins have not changed 
as they are already present in the same version. You could try to 
simply replace the plugins in your eclipse installation and restart it 
with -clean.


Your guess is right :)

I uninstalled ruta from eclipse, removed ruta jars in 
$ECLIPSE_HOME/plugins and removed all entries referencing ruta in 
$ECLIPSE_HOME/artifacts.xml. I then installed it again from eclipse and 
now it is working.


--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com


Transformez vos documents en outils de décision

<< Turn your documents into decision tools



Re: [ruta] How to efficiently delete an annotation only if it appears within the N first token of a document?

2013-08-28 Thread Alexandre Patry

On 2013-08-28 12:19, Peter Klügl wrote:

On 28.08.2013 18:17, Alexandre Patry wrote:

I will be happy to test drive MARKFIRST when it will be in trunk.

It's already in the trunk. If you want, then I can also think of
something that avoid the visibility problem.
I was able to make it work in my application, but my eclipse plugin does 
not recognize the MARKFIRST keyword. Here is what I did :


1. Uninstall the RUTA Workbench plugin from eclipse
2. `mvn clean install` in ruta/trunk
3. `mvn clean package -Declipse.home=/usr/lib/eclipse 
-Duima-eclipse-jar-processor=/usr/lib/eclipse/plugins/org.eclipse.equinox.p2.jarprocessor_1.0.200.dist.jar 
-Declipse-equinox-launcher=/usr/lib/eclipse/plugins/org.eclipse.equinox.launcher_1.2.0.dist.jar` 
in ruta/trunk/ruta-eclipse-update-site
4. Re-install the RUTA Workbench plugin in eclipse from 
ruta/trunk/ruta-eclipse-update-site/target/eclipse-update-site


Did I miss something?

Thanks,

Alexandre

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com


Transformez vos documents en outils de décision

<< Turn your documents into decision tools



Re: [ruta] How to efficiently delete an annotation only if it appears within the N first token of a document?

2013-08-28 Thread Alexandre Patry

On 2013-08-28 11:25, Peter Klügl wrote:

On 28.08.2013 16:52, Alexandre Patry wrote:

Hi,

I use RUTA and I want to delete an annotation if it is within the
first 50 tokens of a document. I came up with the following rules :

// Annotate the first token in the document
ANY{POSITION(Document, 1)-> Header};
// Append the 49 following tokens
Header{->SHIFT(Header, 1, 2)} ANY[0,49];
// Delete the first ToDelete if it is within the header
ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)};


These rules work as expected but they are *really* slow. Is there a
faster way to achieve that?


Oh yes, the first rule is really slow. I always miss an action MARKFIRST
(as there is a MARKLAST). I will add it today or tomorrow.

There are two reasons why the first rule is slow:
ANY has to look at all tokens and POSITION is just the slowest condition
in Ruta.
  
For now you could use a rule like:

ANY{STARTSWITH(Document)-> Header};
... which avoids at least the POSITION condition.

A simple test with a 200 W document:

...
ANY{POSITION(Document, 1)-> Header}; // [0.274s|93.52%]
Header{->SHIFT(Header, 1, 2)} ANY[0,49];  // [0.090s|3.07%]
ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)}; // [0.030s|1.02%]

...
ANY{STARTSWITH(Document)-> Header};  // [0.047s|50.00%]
Header{->SHIFT(Header, 1, 2)} ANY[0,49];  // [0.029s|30.85%]
ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)}; // [0.011s|11.7%]

well, that's still slow (in debug mode) and I actually wonder why the
other rules are getting faster... but I hope that the performance will
soon be improved :-)

Just tried it and it is much better, thanks!

Many of my documents start with space, so I had to update the rules to :

   Document{-> ADDRETAINTYPE(SPACE, BREAK)};
   ANY{STARTSWITH(Document) -> Header};
   // if the first token is a space, use the first non-space following it
   Header{IS({SPACE, BREAK}) -> UNMARK(Header)} ANY*?
   ANY{-PARTOF({SPACE, BREAK}) -> MARK(Header)};
   Document{-> REMOVERETAINTYPE(SPACE, BREAK)};

   Header{->SHIFT(Header, 1, 2)} ANY[0,49];
   ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)};

I will be happy to test drive MARKFIRST when it will be in trunk.

Alexandre

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com


Transformez vos documents en outils de décision

<< Turn your documents into decision tools



[ruta] How to efficiently delete an annotation only if it appears within the N first token of a document?

2013-08-28 Thread Alexandre Patry

Hi,

I use RUTA and I want to delete an annotation if it is within the first 
50 tokens of a document. I came up with the following rules :


   // Annotate the first token in the document
   ANY{POSITION(Document, 1)-> Header};
   // Append the 49 following tokens
   Header{->SHIFT(Header, 1, 2)} ANY[0,49];
   // Delete the first ToDelete if it is within the header
   ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)};


These rules work as expected but they are *really* slow. Is there a 
faster way to achieve that?


Thanks,

Alexandre

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com


Transformez vos documents en outils de décision

<< Turn your documents into decision tools



Re: Multi-view CAS and sofa-unaware AE

2013-04-03 Thread Alexandre Patry

On 13-04-03 11:10 AM, Peter Klügl wrote:

Yes, but imagine you have a CAS with 10 views and you want to apply a
primitive sofa-unaware AE on each view.

The easiest solution I found was to write a template AAE descriptor,
replaced the AE descriptor and sofa name (and mapping), instantiate the
AAE, call process(), and then repeat that for the next view.

This can get quite ugly, if you have to override parameters and you do
not know the primitive AE and its parameters.
If you are willing to use uimafit, you could do it in a simple for loop. 
It would look like this :


   // build an aggregate that will run the same analysis engine on many sofas
   final AggregateBuilder builder = new AggregateBuilder();
   for (String sofa : sofas) {
       final AnalysisEngineDescription annotator =
           AnalysisEngineFactory.createPrimitiveDescription(YourEngine.class,
               paramName1, paramValue1, paramName2, paramValue2, ...);
       // map the engine's default view onto the sofa to process
       builder.add(annotator, "_InitialView", sofa);
   }

   final AnalysisEngine engine = builder.createAggregate();
   // you can then use your engine


The documentation on the web site ( https://code.google.com/p/uimafit/) 
is quite good if you want more information.


Regards,

Alexandre



Best,

Peter


On 03.04.2013 14:38, Jörn Kottmann wrote:

Yes, you can use the sofa mapping, to map some view to the _InitialView.

Have a look here:
http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.mvs.sofa_name_mapping


Jörn

On 04/03/2013 02:19 PM, Peter Klügl wrote:

Hi,

sorry for this beginner question:

It there a shortcut to apply a sofa-unaware AE on CAS view that is not
the _InitialView?

It seems quite cumbersome to programmatically generate an aggregate
analysis engine description to wrap to sofa-unaware engine.

Best,

Peter





--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com


Transformez vos documents en outils de décision

<< Turn your documents into decision tools



Re: CAS Visualisation

2012-10-16 Thread Alexandre Patry

On 2012-10-16, at 8:31 AM, Andreas Niekler  
wrote:

> Dear UIMA Users,
> 
> i wonder what the best practice would be to render a CAS as a html snippet 
> that could be included into a webpage. I already found the 
> AnnotationViewGenerator which is producing complete html files which is far 
> to much as i just want to generate snippets.
> 
> Has anybody a nice library or script to easily convert a cas to a html based 
> structure?

I do not know if there is a class doing what you want from a CAS, but it is 
easy to extract snippets of html from a document using a library like jsoup 
(http://jsoup.org). For example, you could extract the body content in the 
following way :

 // retrieve complete document html
String html = …

// extract html under body
String snippet = Jsoup.parse(html).select("body").html();

Hope this helps,

Alexandre

> 
> Thank you very much
> 
> -- 
> Andreas Niekler, Dipl. Ing. (FH)
> NLP Group | Department of Computer Science
> University of Leipzig
> Johannisgasse 26 | 04103 Leipzig
> 
> mail: aniek...@informatik.uni-leipzig.deg.de



Re: Using JCasGen outside eclipse

2012-10-12 Thread Alexandre Patry

On Fri 12 Oct 2012 02:41:45 PM EDT, Himanshu Gahlot wrote:

Hi,

Is it possible to use the JCas generator utility in some other IDE
(IntelliJ Idea, to be specific) other than Eclipse? Something where I just
need to write the xml for the new type and the corresponding Java class
gets generated using a call to some uima class/script.


I use jcasgen along with maven in Intellij IDEA. Here are the specific 
snippets for maven :


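[...]

<dependency>
   <groupId>org.uimafit</groupId>
   <artifactId>uimafit</artifactId>
   <version>1.4.0</version>
</dependency>

[...]

<build>
   <plugins>
      <plugin>
         <groupId>org.codehaus.mojo</groupId>
         <artifactId>exec-maven-plugin</artifactId>
         <version>1.2.1</version>
         <executions>
            <execution>
               <id>jcasgen</id>
               <phase>generate-sources</phase>
               <goals>
                  <goal>java</goal>
               </goals>
            </execution>
         </executions>
         <configuration>
            <mainClass>org.uimafit.util.JCasGenPomFriendly</mainClass>
            <arguments>
               <argument>file:${project.basedir}/src/main/resources/path/to/your/types/*.xml</argument>
               <argument>${project.build.directory}/generated-sources/uima</argument>
            </arguments>
         </configuration>
      </plugin>

      <plugin>
         <groupId>org.codehaus.mojo</groupId>
         <artifactId>build-helper-maven-plugin</artifactId>
         <version>1.7</version>
         <executions>
            <execution>
               <id>add-uima-sources</id>
               <phase>generate-sources</phase>
               <goals>
                  <goal>add-source</goal>
               </goals>
               <configuration>
                  <sources>
                     <source>${project.build.directory}/generated-sources/uima</source>
                  </sources>
               </configuration>
            </execution>
         </executions>
      </plugin>
   </plugins>
</build>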


Looking at jcasgen.sh, you could also call the class 
org.apache.uima.tools.jcasgen.Jg from a run configuration.


Hope this helps,

Alexandre

--
Alexandre Patry
Ingénieur-Chercheur
http://KeaText.com


Transformez vos documents en outils de décision

<< Turn your documents into decision tools