Re: General question about UimaFIT

2016-09-09 Thread Asher Stern
Hi.
Thanks for your answers.
So it seems that programming-language independence is the primary reason
things were done this way.

Many thanks,
Asher.


2016-09-09 18:49 GMT+03:00 Richard Eckart de Castilho :

> On 09.09.2016, at 17:37, Asher Stern  wrote:
> >
> > Your explanation really makes things clear, and also answers my question.
> >
> > I still wonder whether some mechanism could be developed to automatically
> > generate TypeDescription and TypeSystemDescription directly from a Java
> > class (under some conditions).
> > This could shorten the learning curve of UIMA and remove the need for
> > automatically-generated code, as well as for tracking XML files in the
> > classpath. (Such benefits are actually among the primary goals of
> > UimaFIT, aren't they?)
> > Though such a development, even if possible, would not be trivial.
>
> The way that UIMA works, the JCas files are not meant to be a canonical
> source of metadata information. The typesystem is an independent schema
> and the JCas classes are a convenience. They can be used, but they do
> not have to be used. In some cases, it is more reasonable to use
> the CAS API instead of the JCas API and to operate entirely without
> the JCas classes. E.g. an annotation editor where you can flexibly
> define types (such as WebAnno) would rely on the CAS API where
> annotations are accessed by name instead of on the JCas API where
> compiled Java classes are required.
>
> So JCas is optional, but type descriptors are not.
>
> With components, it is different. The descriptor is meaningless without
> the implementing component.
>
> If somebody thought it worth the effort to generate type descriptors
> from annotated Java classes, it wouldn't just be a matter of annotating
> fields or methods. Some code generation would probably be involved, maybe
> comparable to how Lombok works. That approach has its own drawbacks, ranging
> from requiring a compiler plugin to e.g. Eclipse being unable to search for
> references to auto-generated methods.
>
> I won't spend time on that idea at the moment, but maybe you want to try
> it? ;)
>
> Btw. there is ongoing work on a reimplementation of the CAS (and JCas)
> towards a UIMA v3 in the future. If you are considering diving into this,
> you may want to read a bit in the recent archives of the developer mailing
> list first.
>
> Cheers,
>
> -- Richard


Re: CPE processors and analysis engines

2016-09-09 Thread Richard Eckart de Castilho
At least put the AEs and the consumers into separate CAS processors.

Otherwise you run into one of two situations:

1) AEs are not parallelized - no performance gain
2) consumers are parallelized - depending on their implementation, you might
get mangled data, missing data, etc.
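
The reasoning behind the split can be illustrated with a minimal sketch in plain Java. This is only an analogy and does not use the CPE API: the class SplitStagesSketch and the methods analyze and run are hypothetical names. The stateless analysis stage runs on several threads, while all results are handed to a single consumer thread, so the consumer's state is never shared between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitStagesSketch {
    // Stand-in for a stateless analysis step that is safe to run on many threads.
    static String analyze(String doc) {
        return doc.toUpperCase();
    }

    // Runs the analysis stage in parallel, then passes every result to one
    // single-threaded consumer, which is the only code touching 'written'.
    static List<String> run(List<String> docs, int analysisThreads) throws Exception {
        ExecutorService analysisPool = Executors.newFixedThreadPool(analysisThreads);
        ExecutorService consumerThread = Executors.newSingleThreadExecutor();
        List<String> written = new ArrayList<>();
        try {
            List<Future<String>> analyzed = new ArrayList<>();
            for (String doc : docs) {
                analyzed.add(analysisPool.submit(() -> analyze(doc)));
            }
            List<Future<?>> writes = new ArrayList<>();
            for (Future<String> f : analyzed) {
                String result = f.get(); // wait for the parallel stage
                writes.add(consumerThread.submit(() -> { written.add(result); }));
            }
            for (Future<?> w : writes) {
                w.get(); // wait until the consumer has handled everything
            }
        } finally {
            analysisPool.shutdown();
            consumerThread.shutdown();
        }
        return written;
    }
}
```

If the consumer were instead run on the same pool as the analysis step, the unsynchronized list would be written from several threads at once, which is the kind of mangled or missing output described above.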

Cheers,

-- Richard

> On 06.09.2016, at 10:56, armin.weg...@bka.bund.de wrote:
> 
> Hi!
> 
> What's the best practice to combine analysis engines into CAS processors? 
> Should every analysis engine become its own CAS processor? Should analysis 
> engines be combined to aggregates which become CAS processors? What are the 
> conditions for doing so: technical, semantic, logical?
> 
> Best,
> Armin
> 



Re: UIMA RUTA script with stringfunctions doesnt work when executing from UIMA-Fit

2016-09-09 Thread Richard Eckart de Castilho
On 08.09.2016, at 19:46, Peter Klügl  wrote:
> 
> Sorry, that was the wrong extension... it should read:
> 
> = AnalysisEngineFactory.createEngine(RutaEngine.class, 
> RutaEngine.PARAM_ADDITIONAL_EXTENSIONS, new 
> String[]{BooleanOperationsExtension.class.getName()});
> 
> btw it's better to create a description first, and then the analysis engine.

Unless you want to reuse the engine, it's actually a good idea not to
instantiate the engine at all, but to leave that to e.g.
SimplePipeline.runPipeline(), to the CPE, or to some other pre-built execution
code which also ensures that all the lifecycle events are properly invoked on
the engine.
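
A minimal sketch of that approach, assuming uimaFIT 2.x on the classpath. The LanguageTagger annotator and the class DescriptionOnlySketch are hypothetical and stand in for the Ruta engine from the thread, whose parameters are left out here. Only a description is created, and SimplePipeline instantiates and tears down the engine itself.

```java
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.jcas.JCas;

public class DescriptionOnlySketch {
    // Hypothetical annotator used only for this illustration.
    public static class LanguageTagger extends JCasAnnotator_ImplBase {
        @Override
        public void process(JCas jcas) throws AnalysisEngineProcessException {
            jcas.setDocumentLanguage("en");
        }
    }

    static String run(String text) throws Exception {
        // Build a description only; no AnalysisEngine is instantiated here.
        AnalysisEngineDescription desc =
            AnalysisEngineFactory.createEngineDescription(LanguageTagger.class);

        JCas jcas = JCasFactory.createJCas();
        jcas.setDocumentText(text);

        // runPipeline creates the engine, processes the CAS, and invokes the
        // end-of-processing lifecycle calls on the engine.
        SimplePipeline.runPipeline(jcas, desc);
        return jcas.getDocumentLanguage();
    }
}
```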

Cheers,

-- Richard

Re: General question about UimaFIT

2016-09-09 Thread Richard Eckart de Castilho
On 09.09.2016, at 17:37, Asher Stern  wrote:
> 
> Your explanation really makes things clear, and also answers my question.
> 
> I still wonder whether some mechanism could be developed to automatically
> generate TypeDescription and TypeSystemDescription directly from a Java
> class (under some conditions).
> This could shorten the learning curve of UIMA and remove the need for
> automatically-generated code, as well as for tracking XML files in the
> classpath. (Such benefits are actually among the primary goals of
> UimaFIT, aren't they?)
> Though such a development, even if possible, would not be trivial.

The way that UIMA works, the JCas files are not meant to be a canonical
source of metadata information. The typesystem is an independent schema
and the JCas classes are a convenience. They can be used, but they do
not have to be used. In some cases, it is more reasonable to use
the CAS API instead of the JCas API and to operate entirely without
the JCas classes. E.g. an annotation editor where you can flexibly
define types (such as WebAnno) would rely on the CAS API where 
annotations are accessed by name instead of on the JCas API where
compiled Java classes are required.

So JCas is optional, but type descriptors are not.
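
As a rough illustration of working without JCas classes, here is a minimal sketch assuming uimaj-core on the classpath. It reuses the "Token" type and "length" feature from the programmatic example elsewhere in this thread; the class name CasByNameSketch and the method firstTokenLength are hypothetical. The type and feature are looked up by name at run time, so no generated Java class is needed.

```java
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.resource.metadata.TypeDescription;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.resource.metadata.impl.TypeSystemDescription_impl;
import org.apache.uima.util.CasCreationUtils;

public class CasByNameSketch {
    // Creates one "Token" annotation over the first word and returns the value
    // of its "length" feature, using only the CAS API.
    static int firstTokenLength() throws Exception {
        TypeSystemDescription tsd = new TypeSystemDescription_impl();
        TypeDescription tokenDesc = tsd.addType("Token", "", CAS.TYPE_NAME_ANNOTATION);
        tokenDesc.addFeature("length", "", CAS.TYPE_NAME_INTEGER);

        CAS cas = CasCreationUtils.createCas(tsd, null, null);
        cas.setDocumentText("This is a test.");

        // Type and feature are resolved by name, not via compiled classes.
        Type tokenType = cas.getTypeSystem().getType("Token");
        Feature lengthFeature = tokenType.getFeatureByBaseName("length");

        AnnotationFS token = cas.createAnnotation(tokenType, 0, 4); // covers "This"
        token.setIntValue(lengthFeature, token.getCoveredText().length());
        cas.addFsToIndexes(token);

        return token.getIntValue(lengthFeature);
    }
}
```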

With components, it is different. The descriptor is meaningless without
the implementing component.

If somebody thought it worth the effort to generate type descriptors
from annotated Java classes, it wouldn't just be a matter of annotating fields
or methods. Some code generation would probably be involved, maybe comparable
to how Lombok works. That approach has its own drawbacks, ranging from requiring
a compiler plugin to e.g. Eclipse being unable to search for references to
auto-generated methods.

I won't spend time on that idea at the moment, but maybe you want to try it? ;)

Btw. there is ongoing work on a reimplementation of the CAS (and JCas) towards
a UIMA v3 in the future. If you are considering diving into this, you may want
to read a bit in the recent archives of the developer mailing list first.

Cheers,

-- Richard 

Re: General question about UimaFIT

2016-09-09 Thread Jens Grivolla
And I guess you don't get JCas classes for your type system without going
through JCasGen, which is another disadvantage to generating the types on
the fly. It also kind of goes against the fact that the type system should
be something you can rely on for communication between components, so it
would tend to be static.

Just out of curiosity, what's the use case for this (except maybe unit
testing as Armin mentioned)?

Best,
Jens

On Fri, Sep 9, 2016 at 4:31 PM, Richard Eckart de Castilho 
wrote:

> On 09.09.2016, at 13:39, Asher Stern  wrote:
> >
> > Hi Armin.
> > Thanks for your quick answer!
> >
> > While the workaround is indeed helpful, I am still curious why there is no
> > regular mechanism to define new types and create new descriptors
> > programmatically, as there is for all other UIMA components.
>
> Sure you can define types programmatically... it's just that for the
> case of types, defining them through XML is actually more convenient.
> Mind that the type-system is implementation independent! You can think
> of it as a DTD or XSD.
>
> If you want to programmatically create a type, you can do this:
>
>   TypeSystemDescription tsd = new TypeSystemDescription_impl();
>   TypeDescription tokenTypeDesc =
>       tsd.addType("Token", "", CAS.TYPE_NAME_ANNOTATION);
>   tokenTypeDesc.addFeature("length", "", CAS.TYPE_NAME_INTEGER);
>
>   CAS cas = CasCreationUtils.createCas(tsd, null, null);
>   cas.setDocumentText("This is a test.");
>
> Check out [1] slides 20 following.
>
> Cheers,
>
> -- Richard
>
> [1] https://github.com/dkpro/dkpro-tutorials/blob/master/GSCL2013/tags/latest/slides/GSCL2013UIMATutorialUKP.pdf


Re: initiating CpeComponentDescriptor from String or InputStream

2016-09-09 Thread Chen Xiaobin
Thanks, Richard!

That's also the solution I had in mind: get the descriptor contents from the
database, save them as temp files, and then use those temp files to
initialize the CAS processors.
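
A minimal sketch of the temp-file step, using only the Java standard library. The class DescriptorTempFiles and the method writeToTempFile are hypothetical names, and the descriptor XML is assumed to be UTF-8 text already loaded from the database.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DescriptorTempFiles {
    // Writes descriptor XML to a temporary file and returns its absolute path.
    static String writeToTempFile(String descriptorXml) throws IOException {
        Path file = Files.createTempFile("aeDescriptor", ".xml");
        file.toFile().deleteOnExit(); // clean up when the JVM exits
        Files.write(file, descriptorXml.getBytes(StandardCharsets.UTF_8));
        return file.toAbsolutePath().toString();
    }
}
```

The returned path could then be passed to CpeDescriptorFactory.produceComponentDescriptor(...) in place of "path/to/aeDescriptor.xml". One caveat to check: if a stored descriptor imports other descriptors by relative location, those imports would presumably be resolved relative to the temp directory.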

Best,
Xiaobin

On Fri, Sep 9, 2016 at 5:28 PM, Richard Eckart de Castilho 
wrote:

> Afaik there is no such thing. That is why the uimaFIT CpeBuilder
> stores programmatically created engine descriptors in temporary
> files.
>
> Cheers,
>
> -- Richard
>
> > On 09.09.2016, at 17:15, Chen Xiaobin  wrote:
> >
> > Hi,
> > I am wondering if there is a way to create a CpeComponentDescriptor from
> > an InputStream or a String, instead of from a physical descriptor file.
> > I originally used the following code:
> >
> >     CpeCasProcessor casProcessor =
> >         CpeDescriptorFactory.produceCasProcessor(ae.getName());
> >     CpeComponentDescriptor componentDescriptor =
> >         CpeDescriptorFactory.produceComponentDescriptor("path/to/aeDescriptor.xml");
> >     casProcessor.setCpeComponentDescriptor(componentDescriptor);
> >     cpeDescription.addCasProcessor(casProcessor);
> >
> > But now in my application, all AE descriptors are stored in a database as
> > Strings. I need to construct a CPE and add some AEs to the CPE.
> >
> > Is there a way to replace the second statement of the above code with
> > something like:
> >
> >     CpeComponentDescriptor componentDescriptor =
> >         CpeDescriptorFactory.produceComponentDescriptor(
> >             **descriptorFromAnInputStream or String**);
> >
> > The UIMA API does not provide such a method in CpeDescriptorFactory.
> >
> > Thank you!
> >
> > Xiaobin
> > --
> > --
> > Eberhard Karls Universität Tübingen
> > LEAD Graduate School
> > Doctoral Candidate
> > Gartenstraße 29A · 72074 Tübingen · Germany
> > Phone +49 1765 7634 683
>
>


-- 
Xiaobin Chen
LEAD Graduate School & Research Network
Gartenstr. 29A, 72076 Tübingen


Re: General question about UimaFIT

2016-09-09 Thread Asher Stern
Hi Richard.
Many thanks!

Your explanation really makes things clear, and also answers my question.

I still wonder whether some mechanism could be developed to automatically
generate TypeDescription and TypeSystemDescription directly from a Java
class (under some conditions).
This could shorten the learning curve of UIMA and remove the need for
automatically-generated code, as well as for tracking XML files in the
classpath. (Such benefits are actually among the primary goals of
UimaFIT, aren't they?)
Though such a development, even if possible, would not be trivial.



2016-09-09 17:31 GMT+03:00 Richard Eckart de Castilho :

> On 09.09.2016, at 13:39, Asher Stern  wrote:
> >
> > Hi Armin.
> > Thanks for your quick answer!
> >
> > While the workaround is indeed helpful, I am still curious why there is no
> > regular mechanism to define new types and create new descriptors
> > programmatically, as there is for all other UIMA components.
>
> Sure you can define types programmatically... it's just that for the
> case of types, defining them through XML is actually more convenient.
> Mind that the type-system is implementation independent! You can think
> of it as a DTD or XSD.
>
> If you want to programmatically create a type, you can do this:
>
>   TypeSystemDescription tsd = new TypeSystemDescription_impl();
>   TypeDescription tokenTypeDesc =
>       tsd.addType("Token", "", CAS.TYPE_NAME_ANNOTATION);
>   tokenTypeDesc.addFeature("length", "", CAS.TYPE_NAME_INTEGER);
>
>   CAS cas = CasCreationUtils.createCas(tsd, null, null);
>   cas.setDocumentText("This is a test.");
>
> Check out [1] slides 20 following.
>
> Cheers,
>
> -- Richard
>
> [1] https://github.com/dkpro/dkpro-tutorials/blob/master/GSCL2013/tags/latest/slides/GSCL2013UIMATutorialUKP.pdf


Re: initiating CpeComponentDescriptor from String or InputStream

2016-09-09 Thread Jaroslaw Cwiklik
Can you just write the component descriptor out to a file and pass that to
the factory? I think you need a path, since the underlying code needs a
UIMA-style include, which supports import by name or location. Perhaps the
CPE could do this for you with a new API like the one you are suggesting,
but I think the quickest path for you is to create a file from the string.

  public static CpeComponentDescriptor produceComponentDescriptor(String aPath) {
    CpeComponentDescriptor componentDescriptor = new CpeComponentDescriptorImpl();
    CpeInclude include = new CpeIncludeImpl();
    include.set(aPath);
    componentDescriptor.setInclude(include);
    return componentDescriptor;
  }

-jerry


On Fri, Sep 9, 2016 at 11:15 AM, Chen Xiaobin  wrote:

> Hi,
> I am wondering if there is a way to create a CpeComponentDescriptor from
> an InputStream or a String, instead of from a physical descriptor file.
> I originally used the following code:
>
>     CpeCasProcessor casProcessor =
>         CpeDescriptorFactory.produceCasProcessor(ae.getName());
>     CpeComponentDescriptor componentDescriptor =
>         CpeDescriptorFactory.produceComponentDescriptor("path/to/aeDescriptor.xml");
>     casProcessor.setCpeComponentDescriptor(componentDescriptor);
>     cpeDescription.addCasProcessor(casProcessor);
>
> But now in my application, all AE descriptors are stored in a database as
> Strings. I need to construct a CPE and add some AEs to the CPE.
>
> Is there a way to replace the second statement of the above code with
> something like:
>
>     CpeComponentDescriptor componentDescriptor =
>         CpeDescriptorFactory.produceComponentDescriptor(
>             **descriptorFromAnInputStream or String**);
>
> The UIMA API does not provide such a method in CpeDescriptorFactory.
>
> Thank you!
>
> Xiaobin
> --
> --
> Eberhard Karls Universität Tübingen
> LEAD Graduate School
> Doctoral Candidate
> Gartenstraße 29A · 72074 Tübingen · Germany
> Phone +49 1765 7634 683
>


Re: initiating CpeComponentDescriptor from String or InputStream

2016-09-09 Thread Richard Eckart de Castilho
Afaik there is no such thing. That is why the uimaFIT CpeBuilder
stores programmatically created engine descriptors in temporary
files.

Cheers,

-- Richard

> On 09.09.2016, at 17:15, Chen Xiaobin  wrote:
> 
> Hi,
> I am wondering if there is a way to create a CpeComponentDescriptor from
> an InputStream or a String, instead of from a physical descriptor file.
> I originally used the following code:
> 
>     CpeCasProcessor casProcessor =
>         CpeDescriptorFactory.produceCasProcessor(ae.getName());
>     CpeComponentDescriptor componentDescriptor =
>         CpeDescriptorFactory.produceComponentDescriptor("path/to/aeDescriptor.xml");
>     casProcessor.setCpeComponentDescriptor(componentDescriptor);
>     cpeDescription.addCasProcessor(casProcessor);
> 
> But now in my application, all AE descriptors are stored in a database as
> Strings. I need to construct a CPE and add some AEs to the CPE.
> 
> Is there a way to replace the second statement of the above code with
> something like:
> 
>     CpeComponentDescriptor componentDescriptor =
>         CpeDescriptorFactory.produceComponentDescriptor(
>             **descriptorFromAnInputStream or String**);
> 
> The UIMA API does not provide such a method in CpeDescriptorFactory.
> 
> Thank you!
> 
> Xiaobin
> -- 
> -- 
> Eberhard Karls Universität Tübingen
> LEAD Graduate School
> Doctoral Candidate
> Gartenstraße 29A · 72074 Tübingen · Germany
> Phone +49 1765 7634 683



initiating CpeComponentDescriptor from String or InputStream

2016-09-09 Thread Chen Xiaobin
Hi,
I am wondering if there is a way to create a CpeComponentDescriptor from
an InputStream or a String, instead of from a physical descriptor file.
I originally used the following code:

    CpeCasProcessor casProcessor =
        CpeDescriptorFactory.produceCasProcessor(ae.getName());
    CpeComponentDescriptor componentDescriptor =
        CpeDescriptorFactory.produceComponentDescriptor("path/to/aeDescriptor.xml");
    casProcessor.setCpeComponentDescriptor(componentDescriptor);
    cpeDescription.addCasProcessor(casProcessor);

But now in my application, all AE descriptors are stored in a database as
Strings. I need to construct a CPE and add some AEs to the CPE.

Is there a way to replace the second statement of the above code with
something like:

    CpeComponentDescriptor componentDescriptor =
        CpeDescriptorFactory.produceComponentDescriptor(
            **descriptorFromAnInputStream or String**);

The UIMA API does not provide such a method in CpeDescriptorFactory.

Thank you!

Xiaobin
-- 
-- 
Eberhard Karls Universität Tübingen
LEAD Graduate School
Doctoral Candidate
Gartenstraße 29A · 72074 Tübingen · Germany
Phone +49 1765 7634 683


Re: General question about UimaFIT

2016-09-09 Thread Richard Eckart de Castilho
On 09.09.2016, at 13:39, Asher Stern  wrote:
> 
> Hi Armin.
> Thanks for your quick answer!
> 
> While the workaround is indeed helpful, I am still curious why there is no
> regular mechanism to define new types and create new descriptors
> programmatically, as there is for all other UIMA components.

Sure you can define types programmatically... it's just that for the
case of types, defining them through XML is actually more convenient.
Mind that the type-system is implementation independent! You can think
of it as a DTD or XSD.

If you want to programmatically create a type, you can do this:

  TypeSystemDescription tsd = new TypeSystemDescription_impl();
  TypeDescription tokenTypeDesc =
      tsd.addType("Token", "", CAS.TYPE_NAME_ANNOTATION);
  tokenTypeDesc.addFeature("length", "", CAS.TYPE_NAME_INTEGER);

  CAS cas = CasCreationUtils.createCas(tsd, null, null);
  cas.setDocumentText("This is a test.");

Check out [1] slides 20 following.

Cheers,

-- Richard

[1] 
https://github.com/dkpro/dkpro-tutorials/blob/master/GSCL2013/tags/latest/slides/GSCL2013UIMATutorialUKP.pdf

Re: General question about UimaFIT

2016-09-09 Thread Asher Stern
Hi Armin.
Thanks for your quick answer!

While the workaround is indeed helpful, I am still curious why there is no
regular mechanism to define new types and create new descriptors
programmatically, as there is for all other UIMA components.

I mean, what is the difference between the type system and all other UIMA
components that led the UimaFIT engineers to keep the XML-based
definitions for types, while getting rid of XML for all the rest of UIMA?





2016-09-09 13:59 GMT+03:00 :

> Hi Asher!
>
> As a workaround, you can use an empty type system,
>
> TypeSystemDescription tsd =
>     TypeSystemDescriptionFactory.createTypeSystemDescription("EmptyTypeSystem");
>
> add types programmatically,
>
> tsd.addType(typeName, null, CAS.TYPE_NAME_ANNOTATION);
>
> and get them later with
>
> Type type = cas.getTypeSystem().getType(typeName);
>
> The empty type system is an XML descriptor file without types residing
> somewhere in the class path. I use this for unit testing when I need a
> fresh type system.
>
> Cheers,
> Armin
>
>
> -Original Message-
> From: Asher Stern [mailto:aste...@gmail.com]
> Sent: Friday, 9 September 2016 12:17
> To: user@uima.apache.org
> Subject: General question about UimaFIT
>
> Hi.
> I have a general question regarding UimaFIT.
> In UimaFIT there is no longer a need to write and deal with XML files,
> thanks to new classes and annotations.
>
> This is the case for almost all UIMA components, like AE, AAE, CPE, etc.
> However, for type-system definition, XML files are still required.
> My question is why?
> Is there a technical issue that makes it impossible to get rid of
> type-system XMLs? Or is it intentional due to some policy?
>
>
> Thanks in advance,
> Asher
>


Re: General question about UimaFIT

2016-09-09 Thread Armin.Wegner
Hi Asher!

As a workaround, you can use an empty type system,

TypeSystemDescription tsd = 
TypeSystemDescriptionFactory.createTypeSystemDescription("EmptyTypeSystem");

add types programmatically,

tsd.addType(typeName, null, CAS.TYPE_NAME_ANNOTATION);

and get them later with

Type type = cas.getTypeSystem().getType(typeName);

The empty type system is an XML descriptor file without types residing 
somewhere in the class path. I use this for unit testing when I need a fresh 
type system.

Cheers,
Armin


-Original Message-
From: Asher Stern [mailto:aste...@gmail.com]
Sent: Friday, 9 September 2016 12:17
To: user@uima.apache.org
Subject: General question about UimaFIT

Hi.
I have a general question regarding UimaFIT.
In UimaFIT there is no longer a need to write and deal with XML files,
thanks to new classes and annotations.

This is the case for almost all UIMA components, like AE, AAE, CPE, etc.
However, for type-system definition, XML files are still required.
My question is why?
Is there a technical issue that makes it impossible to get rid of
type-system XMLs? Or is it intentional due to some policy?


Thanks in advance,
Asher




General question about UimaFIT

2016-09-09 Thread Asher Stern
Hi.
I have a general question regarding UimaFIT.
In UimaFIT there is no longer a need to write and deal with XML files,
thanks to new classes and annotations.

This is the case for almost all UIMA components, like AE, AAE, CPE, etc.
However, for type-system definition, XML files are still required.
My question is why?
Is there a technical issue that makes it impossible to get rid of
type-system XMLs? Or is it intentional due to some policy?


Thanks in advance,
Asher