Hi Asher!
As a workaround, you can use an empty type system,
TypeSystemDescription tsd =
TypeSystemDescriptionFactory.createTypeSystemDescription("EmptyTypeSystem");
add types programmatically,
tsd.addType(typeName, null, CAS.TYPE_NAME_ANNOTATION);
and get them later with
Type type =
Hi!
What's the best practice to combine analysis engines into CAS processors?
Should every analysis engine become its own CAS processor? Should analysis
engines be combined to aggregates which become CAS processors? What are the
conditions for doing so: technical, semantical, logical?
Best,
Hello Peter,
I found it thanks to your help. There was another Ruta script maliciously
hiding in the pipeline setting up test annotations and therefore using all of
Ruta's defaults. I discovered it as I used your code from the unit test which,
of course, works perfectly fine. I will create
Hi Peter,
doesn't work like that for me. I've removed DefaultSeeder and added my own
seeder implementing RutaAnnotationSeeder. Now, I have all of Ruta's standard
tokens plus my own tokenization at the same time.
Cheers,
Armin
-Original Message-
From: Peter Klügl
Hello Peter!
Please correct me if I'm wrong. My understanding of how Ruta works is as
follows.
1. The RutaBasic annotations are always created. RETAINTYPE and FILTERTYPE have
no influence on annotation creation. They only influence the use of those types
in rules.
2. The configuration
Hi,
how can I set configuration parameters of analysis engines created with the
Maven plugin, before they are created of course? Can the parameters be
configured from within the pom?
Cheers,
Armin
Hi Jens,
nice tips. I will try the one with the filters first. I just need to make a
few changes.
Thank you,
Armin
-Original Message-
From: j...@grivolla.net [mailto:j...@grivolla.net] On behalf of Jens Grivolla
Sent: Tuesday, 16 August 2016 13:34
To:
Hello again!
One down, one to go. Are there best practices or tricks to reduce Ruta's memory
needs? I tried to use the following script to merge names.
Document{->GREEDYANCHORING(true)};
First+ Full {->MARK(Full)};
Full Last+ {->MARK(Full)};
First+ Last+ {->MARK(Full)};
Hi!
I'm using uimaFIT 2.2.0 and uimaj 2.8.1. The collection processing engine is
slowly eating up all memory until it gets killed by the system. This happens
even when I'm just running a collection reader and no other components (no
analysis at all). Has anyone experienced a similar
Hi!
How can I remove annotations in a collection processing engine? Doing it in
process() of an annotator failed. Is this even possible?
Best,
Armin
Hi Peter,
this helped a little bit, but it is still not running. I had to add the
resources section to the pom.
...
src/main/ruta
Hi Peter,
I'd like to add something to my last post. I can force that exception to occur
by setting resolveImports to true in the plain Ruta project. There's no Java yet.
Regards,
Armin
-Original Message-
From: armin.weg...@bka.bund.de [mailto:armin.weg...@bka.bund.de]
Sent:
Hi,
how is ruta-maven-plugin supposed to be used? Is there a detailed step-by-step
description?
I've created a new empty maven project, added a script in the source folder
src/main/ruta and a text file containing a list of words to src/main/resources.
mvn package builds a ...Engine.xml and a
Hi Richard!
No, I don't use initialize() without args directly. I use
initialize(UimaContext context) and call super.initialize(context).
Best,
Armin
-Original Message-
From: Richard Eckart de Castilho [mailto:r...@apache.org]
Sent: Thursday, 30 July 2015 14:46
To:
Hello Peter!
That works but doesn't solve the underlying problem. The line is from DKPro's
StanfordNamedEntityRecognizer. Using your solution, I get the same error with
ClearTK-TimeML. There must be something wrong elsewhere. If I remember correctly,
Richard said that it may be the
Hi Peter!
The change request UIMA-4062 is implemented, isn't it? So how does an end user
use it? How do I read a word list as a UIMA external resource once and use it
with Ruta.apply() and MARKFAST on every CAS?
Thanks,
Armin
Hi!
How do I use log4j with UIMA? Specifying -Dlog4j.configuration=file:path and
-Dorg.apache.uima.logger.class=org.apache.uima.util.impl.Log4jLogger_impl on
the VM command line yields a lot of INFO messages from a lot of *_impl classes
that I do not want to see. These messages are not logged with
Hi Peter!
There is no sarcasm at all. I really want to use Maven. It works fine for me.
Convention over configuration is a nice thing.
And I prefer programming/typing over clicking. It gives me more control, is
more stable and such...
Cheers,
Armin
-Original Message-
From:
Hi Peter!
Experienced developers would maybe only use maven-based ruta project in
future and would not rely on the old Workspace projects at all.
Exactly, let me program it...
I assume that a user could convert a ruta project to a maven project and
do the building by configuring the pom.
Hi Aleksandar!
For full flexibility I use CAS (not JCas). It's a bit inelegant to use, but you
can introduce new types at runtime. Together with UIMAfit it is very nice in
JUnit tests. And you can set types (type names) as annotator parameters. For
example, you can choose the input and output
Hello!
This is a very short and simple gazetteer using Ruta.
Document{->GREEDYANCHORING(true)};
%s*{->MARKFAST(%s,'%s')};
where the first %s is replaced using String.format() by the name of the source
type, the second %s is replaced by the target type name, and the third %s is
replaced by the URL
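The substitution described above can be sketched in plain Java. This is only a sketch under assumptions: the class name, the method name, and the example values are made up for illustration, and the rule arrow is written as `->` on the assumption that the archive dropped the `>`.

```java
final class GazetteerScript {
    // Script template; the "->" arrows are an assumption about the original script.
    static final String TEMPLATE =
        "Document{->GREEDYANCHORING(true)};\n%s*{->MARKFAST(%s,'%s')};";

    /**
     * Fills the three placeholders in order: source type name,
     * target type name, and the location of the word list.
     */
    static String build(String sourceType, String targetType, String wordListLocation) {
        return String.format(TEMPLATE, sourceType, targetType, wordListLocation);
    }

    public static void main(String[] args) {
        // Placeholder values, not taken from the original message.
        System.out.println(build("Token", "City", "cities.txt"));
    }
}
```

The resulting string would then be handed to Ruta as the script text; how that is done is not shown here.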
Hi Rob!
This simple code example sends annotations of type Person, Location and
Organization to a Solr server. There must be the fields text, person, location,
and organization defined in Solr, as well. You need
org.apache.solr:solr-solrj:4.9.0 or higher jar.
Regards,
Armin
public class
Hi Carsten,
I've never used it, but according to the documentation you can do this with a
flow controller. The bad thing is, Richard told me a while ago that it is not
so easy to build your own flow controller.
Cheers,
Armin
-Original Message-
From: Carsten Schnober
Hi!
PearPackagingMavenPlugin copies the CVS subdirs to the PEAR. Can this be
changed? How?
Cheers,
Armin
Hi Erik and Jörn,
I've used Solr in the meantime. It is so easy to quickly write a CAS consumer
that sends documents to a Solr web service. Writing to a Lucene index is
minimally more work. Could this be the reason why nobody cares about the
outdated version? Is there really a need for Lucas
Hi Renauld,
that's nice, thank you. Are you using Lucene 4.x or an older version?
It's been a while since I asked that question, and I didn't get much response.
Is the project dead? Is it just too easy to code a simple annotator for Lucene
or Solr to justify the effort of maintaining Lucas and
Hi!
Is someone using Lucas? It seems to be slightly outdated. It depends on Lucene
2.9.3. Lucene is at version 4.9.0 right now. Is there an alternative?
Regards,
Armin
Hi Richard,
Nailed it. The pipeline with DKPro's StanfordNamedEntityRecognizer does work
with uimafit-core:jar:2.0.0 and uimaj-core:jar:2.4.2 but it does not work with
uimafit-core:jar:2.1.0 and uimaj-core:jar:2.6.0. It runs with
uimafit-core:2.0.0 and uimaj-core:2.6.0, too.
Thanks a lot,
Hi Richard!
It looks like you're absolutely right. I have changed all JCas stuff in the
consumer's resource to pure CAS and it works.
But why is JCas support not initialized? The reader calls an annotator for
document meta data that uses JCas. It says new
DocumentMetaData(cas.getJCas()) and
Hi,
The final runnable jar contains the META-INF/org.apache.uima.fit/types.txt from
a maven dependency and not from the project itself. Can something be done about
this?
Cheers
Armin
Hello!
On which annotation type does MARKFAST work? Can I restrict MARKFAST to a
single annotation type, say my own token type? It would be nice to restrict a
Ruta script to a set of annotations by giving that set of annotations
explicitly, like
Document{-> INPUT(Token, Organization,
Hi, Peter!
I got that. I restricted MARKFAST to segments. It works nearly perfectly.
How does MARKFAST match things? Using
Document{->MARKFAST(MyType, {"a", "b", "a b"})};
on
a b
yields
a b and b but not a.
I would like to have a as well. Can this be done?
By the way: I love Ruta.apply().
Hi Richard,
you're right. I have to use new CASes or views. Or I can use the same CAS and
restrict the analysis engine to a substring. But that would imply having
parameters for the substring's begin and end offsets in the analysis engine:
Oh, wait a minute, wasn't that my original question?
Hello!
I've got another maybe not so good idea. Why not pass an aggregate analysis
engine as a parameter? First, build an aggregate analysis engine the usual way.
Second, serialize it to an XML-string. Third, pass that string to the
SegmentProcessingAE as String parameter together with another
Hello!
Are there annotation sets in UIMA? With annotation sets you can group
annotations. For example, you may have named entity annotations in a gold
standard set and the actual named entities found by an analysis engine in
another set. In both sets the location entities are named Location,
Hello Richard!
I would like to have a writer that writes all mentions of a given type. The
type is given by name as an AE parameter. The way the mentions are formatted
should be interchangeable. So the formatter varies and should be encapsulated
as an AE resource (or maybe not?).
public class
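Leaving UIMA and uimaFIT out of the picture entirely, the idea of an interchangeable formatter can be sketched as a small strategy interface in plain Java. All names here (MentionFormatting, MentionFormatter, TSV, formatAll) are hypothetical, and mentions are reduced to begin/end offsets into the document text; wiring such a formatter into an analysis engine as a resource is not shown.

```java
import java.util.ArrayList;
import java.util.List;

final class MentionFormatting {
    /** Hypothetical strategy interface: turns one mention into one output line. */
    interface MentionFormatter {
        String format(String typeName, int begin, int end, String coveredText);
    }

    /** One possible formatter: tab-separated type name, offsets, and covered text. */
    static final MentionFormatter TSV =
        (typeName, begin, end, text) -> typeName + "\t" + begin + "\t" + end + "\t" + text;

    /** Formats every mention, given as {begin, end} pairs, with the supplied formatter. */
    static List<String> formatAll(String typeName, String documentText,
                                  int[][] offsets, MentionFormatter formatter) {
        List<String> lines = new ArrayList<>();
        for (int[] span : offsets) {
            String covered = documentText.substring(span[0], span[1]);
            lines.add(formatter.format(typeName, span[0], span[1], covered));
        }
        return lines;
    }
}
```

Swapping the output layout then only means passing a different MentionFormatter; the writer logic in formatAll stays untouched.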
Hi,
I cancelled it. Actually, I don't have a resource. I just tried to modularize
my code a little bit. But uimafit's use of injection makes this difficult and
no fun at all.
Some people consider using injection to be a good programming style. I
personally hate it. It kills my highly
Hi Richard,
I will try that and report back.
Thanks
Armin
-Original Message-
From: Richard Eckart de Castilho [mailto:r...@apache.org]
Sent: Monday, 19 May 2014 11:11
To: user@uima.apache.org
Subject: Re: uimafit - String[] parameter in Resource_ImplBase
Hi Armin,
UIMA
Hi,
What do you do with very large text documents in a UIMA pipeline, for
example 9 GB in size?
A. I expect that you split the large file before putting it into the pipeline.
Or do you use a multiplier in the pipeline to split it? Anyway, where do you
split the input file? You can not
Hi Jens,
It's a log file.
Cheers,
Armin
-Original Message-
From: Jens Grivolla [mailto:j+...@grivolla.net]
Sent: Friday, 18 October 2013 11:05
To: user@uima.apache.org
Subject: Re: Working with very large text documents
On 10/18/2013 10:06 AM, Armin Wegner wrote:
What
Dear Jens, dear Richard,
Looks like I have to use a log-file-specific pipeline. The problem was that I
did not know it before the process crashed. It would be so nice to have a
general approach.
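For a line-oriented input such as a log file, one way to keep memory bounded is to cut the input into chunks of a fixed number of lines before anything enters the pipeline. The following is a minimal plain-Java sketch of that idea only; the class and method names are hypothetical, and turning each chunk into a CAS (in a collection reader or a multiplier) is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

final class LogChunker {
    /**
     * Reads the input line by line, hands every group of maxLines lines
     * to the sink as one chunk, and returns the number of chunks emitted.
     * Only one chunk is held in memory at a time.
     */
    static int forEachChunk(Reader input, int maxLines, Consumer<String> sink) {
        int chunks = 0;
        int linesInChunk = 0;
        StringBuilder current = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                current.append(line).append('\n');
                if (++linesInChunk == maxLines) {
                    sink.accept(current.toString());
                    chunks++;
                    current.setLength(0);
                    linesInChunk = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (linesInChunk > 0) { // emit the last, shorter chunk
            sink.accept(current.toString());
            chunks++;
        }
        return chunks;
    }
}
```

Splitting on line boundaries only works when no annotation of interest spans several lines, which is exactly why this is specific to log files and not a general approach.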
Thanks,
Armin
-Original Message-
From: Richard Eckart de Castilho
Dear Richard,
to use StringStringMapEntry, needn't it subclass TOP or FeatureStructure? Is it
possible to store an arbitrary object in a CAS?
Cheers,
Armin
-Original Message-
From: Richard Eckart de Castilho [mailto:r...@apache.org]
Sent: Wednesday, 16 October 2013 18:02
Hi Thomas,
thanks for your answer. Using HashMap, does the n-th element of keySet() always
correspond to the n-th element of values()? Is this a defined behavior in Java?
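One way to sidestep the question altogether is to iterate over entrySet(), where each key arrives together with its own value, so nothing depends on two separate iterations lining up. A minimal sketch (class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EntryPairs {
    /** Builds "key=value" strings from entrySet(), so every key stays with its value. */
    static List<String> pairs(Map<String, String> map) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : map.entrySet()) {
            result.add(entry.getKey() + "=" + entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("k1", "v1");
        map.put("k2", "v2");
        // The order of the pairs is unspecified for HashMap, but each pair is consistent.
        System.out.println(pairs(map));
    }
}
```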
Cheers,
Armin
-Original Message-
From: Thomas Ginter [mailto:thomas.gin...@utah.edu]
Sent: Wednesday,
Dear Marshall,
Consider an input text from which only some parts should be processed. After
processing the text should be there in one piece again. Let A denote parts of
no interest and let b denote parts to analyse further. XAX is split up into X,
A, and X. There is nothing to do for the X
No, not for me. You can even switch to Java 7.
Armin
-Original Message-
From: Marshall Schor [mailto:m...@schor.com]
Sent: Sunday, 28 July 2013 16:05
To: uima-user
Subject: Java level prerequsite upgrade?
Dear Users,
The UIMA developers would like to be able to start
Hi,
Ruta Workbench 2.0.1 cannot be installed on Eclipse Kepler because the
dependency on DLTK is not satisfied. Any ideas?
Armin
Hi,
Using this code
AnnotationIndex<AnnotationFS> indexA = cas.getAnnotationIndex(typeA);
FSIterator<AnnotationFS> itA = indexA.iterator();
// outer loop
while (itA.hasNext()) {
AnnotationFS annotA = itA.next();
FSIterator<AnnotationFS> itB = indexA.subiterator(annotA);
//
Hi,
I'd like to use Java objects in a pipeline which are constructed before the
pipeline is run and which are still there, after the pipeline has finished its
job. Is this even possible?
Cheers,
Armin
Hi Richard,
I'm using uimafit's Resource_Impl, now. It is even easier to use than
Initializable.
Thanks for all your fast help,
Armin
-Original Message-
From: Richard Eckart de Castilho [mailto:richard.eck...@gmail.com]
Sent: Friday, 14 June 2013 09:40
To:
Hi,
the uimafit source browser at code.google shows only HTML source code.
Cheers,
Armin
Hi,
the following code uses a file namer class of class
org.uimafit.factory.initializable.Initializable to create a CAS consumer.
aggregateBuilder.add(AnalysisEngineFactory.createPrimitiveDescription(Writer.class,
Writer.PARAM_OUTPUT_DIRECTORY_PATH, output,
Hi,
TikaAnnotator depends on Tika 0.8. The current version of Tika is 1.3. Is there
a newer version of TikaAnnotator which does run with Tika 1.3?
Cheers,
Armin
Hi,
What is the right way to make a change request for Apache UIMA Dictionary
Annotator?
Cheers,
Armin
Hello Peter,
Now that I understand it, it's a nice feature.
By the way, where can I find good documentation for Ruta? I only know of
http://people.apache.org/~pkluegl/site/textmarker-current/tools.textmarker.book.html
and http://tmwiki.informatik.uni-wuerzburg.de/. A more detailed description
Hi!
In Ruta 2.0.2-SNAPSHOT, rules with an optional first element do not work. The
optional part seems to be mandatory. Using
DECLARE Test;
"a"? "b" "c"{->MARK(Test, 1, 3)};
on
a b c x b c
marks a b c (0, 5) but not b c (8, 11).
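As a point of comparison only, this is the behaviour one would expect from an optional first element, shown with java.util.regex instead of Ruta. It is an analogy, not a statement about how Ruta matches rules; the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalFirstElement {
    /** Returns "begin-end" for every match of an optional "a " followed by "b c". */
    static List<String> matches(String text) {
        Pattern pattern = Pattern.compile("(?:a )?b c");
        Matcher matcher = pattern.matcher(text);
        List<String> spans = new ArrayList<>();
        while (matcher.find()) {
            spans.add(matcher.start() + "-" + matcher.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // prints [0-5, 8-11]
        System.out.println(matches("a b c x b c"));
    }
}
```

Here both spans are found, including the second one at (8, 11) where the optional element is absent.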
Cheers,
Armin
Hello Jörn,
absolutely right. But for now I'm still a newbie. That's why I'm asking so much.
Cheers,
Armin
-Original Message-
From: Jörn Kottmann [mailto:kottm...@gmail.com]
Sent: Thursday, 23 May 2013 14:24
To: user@uima.apache.org
Subject: Re: Ruta - MARKFAST
On
Hi Peter,
your example does work perfectly fine. But try this as word list and input
document:
nach Christus
nach der Zeitenwende
n. C.
n.C.
nC.
n. Chr.
n. d. Z.
n.d.Z.
unserer Zeit
unserer Zeitrechnung
u. Z.
u.Z.
v. C.
v.C.
vC.
v. Chr.
v. d. Z.
v.d.Z.
vor Christus
vor der Zeitenwende
vor
Hi!
I've checked out Ruta 2.0.2-SNAPSHOT with
svn checkout https://svn.apache.org/repos/asf/uima/sandbox/ruta/trunk
and built it successfully with
mvn clean install.
Now, how do I install the Eclipse plugins? Is there a local repository or update
site for Eclipse? Or, which files need to be copied
Hi,
In Ruta 2.0.2-SNAPSHOT a token with begin offset 0 and end offset 2 comes
before a token with begin offset 0 and end offset 0. The token order is not as
I expected. Thus in my case, SourceDocumentAnnotation was the second token in
the token sequence and the rule didn't match. It took me
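The order described above is consistent with how UIMA's default annotation index sorts: begin offset ascending, then end offset descending, so a longer span precedes a shorter one that starts at the same position. Whether Ruta's token sequence is built from exactly that index is an assumption here. A plain-Java sketch of the ordering, with a made-up Span type standing in for annotations:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SpanOrder {
    /** A plain begin/end pair standing in for an annotation (hypothetical helper type). */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "(" + begin + "," + end + ")";
        }
    }

    /** Begin ascending; for equal begins, end descending (longer span first). */
    static final Comparator<Span> ORDER = (a, b) ->
        a.begin != b.begin ? Integer.compare(a.begin, b.begin)
                           : Integer.compare(b.end, a.end);

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(ORDER);
        return copy;
    }
}
```

With the spans (0,0), (0,2) and (3,4) as input, the sorted result puts (0,2) first, followed by the zero-length span (0,0) and then (3,4).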
Hi Peter,
I think that the rule doesn't matter. But I tried to find calendar dates. To
find out what was going wrong I reduced the original more complex rule to
DECLARE Date;
Document{->RETAINTYPE(BREAK, SPACE)};
NUM{REGEXP("\\d\\d")->MARK(Date, 1, 2)} PERIOD;
on the input text
12. Mai 1803
I
Hello!
Is there any possibility to match strings like
nC.
v. Chr.
with MARKFAST?
Cheers,
Armin
Hello!
Let A, B, C, D and F denote type names. Then, A B? C D{->MARK(F, 1, 4)} works.
A (B)? C D{->MARK(F, 1, 4)} causes a NullPointerException.
(A B)? C D{->MARK(F, 1, 4)} causes an ArrayIndexOutOfBoundsException: -1.
Any ideas?
Cheers,
Armin
Hello!
In Ruta SNAPSHOT-2.0.1 Document{->RETAINTYPE(ALL)}; does not retain SPACE. It
is the same with ANY and WS.
By the way, where can I get the newest version of Ruta (jar, svn, etc)?
Cheers,
Armin
Hi Peter,
That was really helpful,
Thanks again,
Armin
-Original Message-
From: Peter Klügl [mailto:pklu...@uni-wuerzburg.de]
Sent: Monday, 6 May 2013 15:39
To: user@uima.apache.org
Subject: Re: AW: Textmarker - Qualification of Types
Hi,
I should have mentioned that you
Hi Peter,
That is fine. I'm using 2.0.0 core jar from maven central. Can you give me a
snapshot update site, please?
Thank you,
Armin
-Original Message-
From: Peter Klügl [mailto:pklu...@uni-wuerzburg.de]
Sent: Friday, 3 May 2013 15:04
To: user@uima.apache.org
Subject: Re:
Hello Richard,
using your second suggestion, I've written a very simple CAS consumer like the
one in [2]. It's a one-liner and works fine:
public final void process(final CAS cas) throws AnalysisEngineProcessException {
try {
Hi Richard,
I'm not talking about type system descriptors, but of analysis engine
descriptors. I would like to create an AnalysisEngineDescription from an
analysis engine descriptor file, e.g. like the one the Textmarker Workbench
created in a Textmarker Eclipse project. I'd like to add this
Hi Richard,
The file is in the file system. But I don't want to create an AnalysisEngine
with AnalysisEngineFactory.createAnalysisEngineFromPath(). I'd rather like to
have an AnalysisEngineDescription. But there is no method
createAnalysisEngineDescriptionFromPath(). Is there an easy way to
Hi!
In classical UIMA you use the following code to create an AnalysisEngineDescription
from an XML descriptor file.
final Path descriptorFilePath = Paths.get("/some/path/",
"AnalysisEngineDescriptorFile.xml");
final XMLInputSource xmlInputSource = new
XMLInputSource(descriptorFilePath.toFile());
Hi!
Using org.uimafit.factory.AggregateBuilder you can aggregate analysis engines
with different type systems. Is there any way to serialize the effective type
system from such an aggregate or from the CAS it is using?
Thanks,
Armin
Hi!
The CAS editor and Textmarker are some fine Eclipse plugins to view the
results. But I haven't yet managed to build a type system which works in Java
code and in the editors. There's always something wrong with the paths, either
way. Is there a way to build a common type system that is
Thank you, Burn. That helps.
Armin
-Original Message-
From: Burn Lewis [mailto:burnle...@gmail.com]
Sent: Friday, 6 July 2012 14:58
To: user@uima.apache.org
Subject: Re: Adding relations in a CAS
We created a type system for a project that included entries for Relations
Hi there,
is there any standard or best practice for adding relations of annotations to a
CAS? I've annotated named entities in text documents. Now I'd like to have
relations between the named entities as well.
Regards,
Armin
Hi,
I'd like to call an annotator from another annotator, as it uses annotations for
which annotators already exist. Calling is not a problem, but setting the
parameter is. Can this be done at all? How? Using uimaFIT the parameter values
are injected using the
Hi Richard,
That is exactly what I wanted to achieve. I got your point. I will try your
suggestion and report back.
Thank you very much,
Armin
-Original Message-
From: Richard Eckart de Castilho [mailto:eck...@ukp.informatik.tu-darmstadt.de]
Sent: Friday, 8 June 2012
Hello Richard!
Besides view selection, your suggestion works. The outer analysis engine works
on the view given by AggregateBuilder.add(analysisEngineDescription,
CAS.NAME_DEFAULT_SOFA, viewName) as it should. The inner analysis engine is
called with analysisEngine.process(cas) from the outer
Hello!
Is there any tool with which I can visualize a view other than _InitialView? It
would be nice to do so in the Cas Editor.
Regards,
Armin
Hi Jörn!
I just updated. That's fine, thank you.
Armin
-Original Message-
From: Jörn Kottmann [mailto:kottm...@gmail.com]
Sent: Thursday, 29 December 2011 13:12
To: user@uima.apache.org
Subject: Re: AW: Tool for viewing views other than _InitialView
On 12/29/11 12:55 PM,
Hi Peter,
there is no such option in the context menu and I can't find it in the
documentation of version 2.3.1. Did I miss something?
Regards,
Armin
-Original Message-
From: Peter Klügl [mailto:pklu...@uni-wuerzburg.de]
Sent: Thursday, 29 December 2011 11:30
To:
Hi Torsten,
I got the idea. The easiest way would be to use
org.uimafit.component.xwriter.XWriter as a kind of visualizer. But that does
not work. XWriter always writes the whole CAS instead of the mapped view. I
tried
CollectionReaderDescription reader = CollectionReaderFactory...
Hi Tomas,
you can get rid of the Java classes for annotation types by using CAS instead
of JCas. This is a little less comfortable than with JCas. But you still need
to have a type system. With CAS you can generate new types at runtime. This is
very nice for testing with JUnit. But I don't
Hi Richard,
thank you. I did it the hard way using CAS. JCas works fine as well. In both
cases SourceDocumentInformation.xml has to be included as a type system.
For the latter case, I derived a JCas from a CAS with getJCas() local to the
process method as in
Hi!
I need to know the name of the source documents when writing the
resulting CASes from a pipeline which starts by reading source documents
with a collection reader. I thought that
org.apache.uima.examples.SourceDocumentInformation is the correct means
to do it. But it is just an example and it
Hi Richard,
it works. I used
String documentText = FileUtils.reader2String(new
InputStreamReader(cas.getSofaDataStream()));
Thanks
Armin
-Original Message-
From: Richard Eckart de Castilho [mailto:eckar...@tk.informatik.tu-darmstadt.de]
Sent: Sunday, 13 November 2011
Hello Jörn,
I created a new eclipse project. It works.
Thank you
Armin
-Original Message-
From: Jörn Kottmann [mailto:kottm...@gmail.com]
Sent: Tuesday, 8 November 2011 09:51
To: user@uima.apache.org
Subject: Re: Eclipse Cas Editor Unsaved Quick Annotations
Hello,
Hi,
I have text containing control characters which should not be removed or
replaced with blanks as the text is not to be changed.
There has been a discussion of how to cope with control characters when
serializing text with XmiCasSerializer in 2007. What is the status quo
of this matter? Is
Hi,
I would like to add some annotations of type TestTypeA to view ViewA
using analysis engine analysisEngine. The first version given below
works just fine. The new annotations are added to ViewA. The second
version does not work. The annotations are added to _InitialView. Why?
Logically there is
Hi,
what is the method setSofaDataURI(String uri, String mime) of JCas good
for? I thought that one could use it as an alternative to
setSofaDataString(). But doing so results in an empty String when
calling getDocumentText() in the analysis engine's process() method.
What am I missing?
Code