Re: Dynamically bind resources to AnalysisEngine

2018-04-11 Thread Richard Eckart de Castilho
Hi,

>> Is there a way to dynamically bind/update resources for an AnalysisEngine ?
> 

> There may be more conventions / built-in ways that DKPro has
> for this scenario.

There are no conventions in DKPro Core for resource binding. It should also not
interfere if you do resource binding with any of the components you may have
implemented yourself and mix/match in a pipeline with DKPro Core components.

> This link in the UIMA Reference manual describes Resources:
> https://uima.apache.org/d/uimaj-2.10.2/references.html#ugr.ref.resources
> 
> See also the Javadocs for SharedResourceObject
> https://uima.apache.org/d/uimaj-2.10.2/apidocs/org/apache/uima/resource/SharedResourceObject.html

uimaFIT also has support for external resources. If you use DKPro Core,
I expect you also make use of uimaFIT. You can find a bit of 
documentation here:

https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.externalresources

If you want to use external resources, it is also worth having a look at
the uimaFIT test cases:

https://svn.apache.org/repos/asf/uima/uimafit/trunk/uimafit-core/src/test/java/org/apache/uima/fit/factory/ExternalResourceFactoryTest.java

In particular, you might not want to use a SharedResourceObject. Instead, you
could build your parser resource on top of Resource_ImplBase and, rather than
relying on SharedResourceObject.load(), implement arbitrary methods of your
own, e.g. "getLatestParser()".

That said, instead of rebinding resources to components, I would suggest that
you put your compiled parsers somewhere that a "parser resource" bound
to an analysis engine would be able to find them. Then, when the
AE asks the resource for the actual parser, the resource should
return the latest parser available.
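
A minimal sketch of this idea, assuming the "parser" is reduced to a word list
for brevity. The names VocabularyResource, publish(), getLatestVocabulary() and
SpellingChecker are hypothetical and not part of UIMA or uimaFIT. To stay
self-contained the sketch has no UIMA dependencies; in a real pipeline the
resource class would presumably extend Resource_ImplBase and be bound as an
external resource, and that wiring is omitted here.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical resource: holds the most recent vocabulary snapshot.
// Assumption: in UIMA this would extend Resource_ImplBase (not shown).
class VocabularyResource {
    private final AtomicReference<Set<String>> latest;

    VocabularyResource(Set<String> initial) {
        this.latest = new AtomicReference<>(snapshot(initial));
    }

    // Called by whatever process rebuilds the index vocabulary.
    void publish(Set<String> updated) {
        latest.set(snapshot(updated));
    }

    // Called by the component each time it needs the vocabulary;
    // always returns the newest published snapshot.
    Set<String> getLatestVocabulary() {
        return latest.get();
    }

    // Defensive, read-only copy so later changes by the caller are not visible.
    private static Set<String> snapshot(Set<String> words) {
        return Collections.unmodifiableSet(new HashSet<>(words));
    }
}

// Hypothetical stand-in for the spelling-correction component. It keeps a
// reference to the resource, not to a particular vocabulary.
class SpellingChecker {
    private final VocabularyResource resource;

    SpellingChecker(VocabularyResource resource) {
        this.resource = resource;
    }

    boolean isKnown(String word) {
        return resource.getLatestVocabulary().contains(word);
    }
}
```

Because the component asks the resource on every call, publishing a new
vocabulary takes effect without rebuilding the pipeline, and the other
resources (POS model, lexica) stay loaded.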

Cheers,

-- Richard

Re: Dynamically bind resources to AnalysisEngine

2018-04-11 Thread Marshall Schor
Hi,

I don't know about DKPro, so someone more familiar with its conventions could
respond.

UIMA supports a decoupling of resources, shared among annotators running in some
pipeline.  I'm guessing you're asking about this mechanism,  but before
proceeding, there's nothing preventing you from implementing an annotator (let's
call it the spelling corrector annotator) which could load a dictionary (let's
say, specified by a configuration parameter), and then have some mechanism to
"reload it", if it changes.

This link in the UIMA Reference manual describes Resources:
https://uima.apache.org/d/uimaj-2.10.2/references.html#ugr.ref.resources

See also the Javadocs for SharedResourceObject
https://uima.apache.org/d/uimaj-2.10.2/apidocs/org/apache/uima/resource/SharedResourceObject.html

These have a "load" method which the user is supposed to implement to cause the
resource to be "loaded".  Typically, if the resource, for example, implements a
hashmap, the load might read some external file and initialize the hashmap from
that.

The implementation of the load method is the responsibility of the resource
implementer. UIMA will instantiate the resource class, and call the load method,
once.

One possibility would be to have your spelling annotator check "every so often"
to see if the on-disk version has changed, and if so, call the load method
again.  If you consider doing this, remember that your annotator might (in some
deployments) be "scaled up" in multiple Java threads, so you might need to do
this under a synchronization lock.
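
A minimal sketch of such a periodic check, assuming the dictionary is a plain
text file with one word per line. The class ReloadingWordList and its methods
are hypothetical, plain Java with no UIMA dependencies; the annotator would
call reloadIfChanged() "every so often", for example once per processed
document.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashSet;
import java.util.Set;

// Hypothetical word list that re-reads its file when the file changes.
class ReloadingWordList {
    private final Path file;
    private final Object lock = new Object();
    private FileTime loadedVersion;               // guarded by lock
    private Set<String> words = new HashSet<>();  // guarded by lock

    ReloadingWordList(Path file) throws IOException {
        this.file = file;
        reloadIfChanged(); // initial load
    }

    // Re-reads the file only if its modification time differs from the
    // version currently loaded. Returns true if a reload happened.
    boolean reloadIfChanged() throws IOException {
        synchronized (lock) {
            FileTime onDisk = Files.getLastModifiedTime(file);
            if (onDisk.equals(loadedVersion)) {
                return false;
            }
            words = new HashSet<>(Files.readAllLines(file, StandardCharsets.UTF_8));
            loadedVersion = onDisk;
            return true;
        }
    }

    // Lookups take the same lock so they never observe a half-finished reload.
    boolean contains(String word) {
        synchronized (lock) {
            return words.contains(word);
        }
    }
}
```

The single lock keeps the sketch simple; with many threads and frequent
lookups, swapping in an immutable set through a volatile field would avoid
contention on reads.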

Does this help?  There may be more conventions / built-in ways that DKPro has
for this scenario.

Cheers. -Marshall


On 4/11/2018 9:54 AM, Hugues de Mazancourt wrote:
> Hello,
>
> Is there a way to dynamically bind/update resources for an AnalysisEngine ?
> My use-case is : I build a query parser that will be used to retrieve 
> information in an indexed text database.
> The parser performs spelling correction, but doesn't have to consider words 
> in the index as spelling mistakes. Thus, the (aggregate) engine is bound to 
> the index vocabulary (ie a word list).
> My point is : when the index gets updated, its vocabulary will also be 
> updated. I can re-build a new aggregate parser, with the updated resource, 
> but this takes time, mainly for loading resources that were already loaded 
> (POS model, lexica, etc.). Is there a way to update a given resource on my 
> parser without having to rebuild it ?
>
> Thanks for your help,
> PS: I'm mostly building on top of DKPro components. I may miss some basic 
> UIMA mechanisms
> Hugues de Mazancourt
> Mazancourt Conseil
>
> E: hug...@mazancourt.com
> P: +33-6 72 78 70 33
> W: http://www.mazancourt.com
>
>



Help, please

2018-04-11 Thread Igor Mayer
Hello, Peter!

Hope you are doing well.
I am a new user of UIMA RUTA. Sorry for asking you questions directly, but
I saw this email address on StackOverflow, where you said that it normally
takes less time to receive an answer this way. I have to solve the exercise
attached to this letter. I have read the UIMA RUTA Guide & References posted
on Apache.org in full. However, I did not find a good approach there.
My task is this: the script has to 'understand' each sentence/utterance (one
by one) and link each to one of the intents. I tried to use regular
expressions as keywords and to sort sentences by keyword with Contains
statements. However, this approach looks really repetitive. So, would you be
so kind as to tell me the best approach to solve this task, with an example
of how it should look?

Thank you a lot in advance!


Exercise_Intents.xlsx
Description: MS-Excel 2007 spreadsheet


Dynamically bind resources to AnalysisEngine

2018-04-11 Thread Hugues de Mazancourt
Hello,

Is there a way to dynamically bind/update resources for an AnalysisEngine ?
My use-case is : I build a query parser that will be used to retrieve 
information in an indexed text database.
The parser performs spelling correction, but doesn't have to consider words in 
the index as spelling mistakes. Thus, the (aggregate) engine is bound to the 
index vocabulary (ie a word list).
My point is : when the index gets updated, its vocabulary will also be updated. 
I can re-build a new aggregate parser, with the updated resource, but this 
takes time, mainly for loading resources that were already loaded (POS model, 
lexica, etc.). Is there a way to update a given resource on my parser without 
having to rebuild it ?

Thanks for your help,
PS: I'm mostly building on top of DKPro components. I may miss some basic UIMA 
mechanisms
Hugues de Mazancourt
Mazancourt Conseil

E: hug...@mazancourt.com
P: +33-6 72 78 70 33
W: http://www.mazancourt.com



Re: DUCC and CAS Consumers

2018-04-11 Thread Eddie Epstein
Hi Erik,

DUCC jobs can scale out a user's components in two ways: horizontally, by
running multiple processes (process_deployments_max), and vertically, by
running the pipeline defined by the CM, AE and CC components in multiple
threads (process_pipeline_count). Since the constructed top AAE is
designed to run in multiple threads, it requires that multiple deployments
be enabled for all pipeline components.

The CM and CC components are optional as they could be already included in
the specified process_descriptor_AE. The reason for explicitly specifying
CM and CC components is to facilitate high scale out. The Job's collection
reader should create CASes with references to data which will often be
segmented by the CM into a collection of CASes to be processed by the user's
AE. The initial CAS created by the driver normally does not flow into the
AE, but typically does flow to the CC after all child CASes from the CM
have been processed to trigger the CC to finalize the collection.

More information about the job model is described in the duccbook at
https://uima.apache.org/d/uima-ducc-2.2.2/duccbook.html#x1-181000III

Regards,
Eddie


On Wed, Apr 11, 2018 at 5:16 AM, Erik Fäßler 
wrote:

> Hi all,
>
> I am doing my first steps with UIMA DUCC. I stumbled across the issue that
> my CAS consumer has allowMultipleDeployments=false since it is supposed to
> write multiple CAS document texts into one large ZIP file.
> DUCC complains about the discrepancy of the processing AAE being allowed
> for multiple deployment but one of its containers (my consumer) is not.
> I did specify the consumer with the "process_descriptor_CC" job file key
> and was assuming that DUCC would take care of it. After all, it is a key of
> its own. But it seems the consumer is just wrapped into a new AAE together
> with my annotator AAE. This new top AAE created by DUCC causes the error:
> My own AAE is allowed for multiple deployment and so are its delegates. But
> the consumer is not, of course.
>
> How to handle this case? The documentation of DUCC is rather vague at this
> point. There is the section about CAS consumer changes but it doesn’t
> mention multiple deployment explicitly.
>
> What is the “process_descriptor_CC” for when it gets wrapped up into an AAE
> with the user-delivered AAE anyway?
>
> Thanks and best regards,
>
> Erik
>
>


DUCC and CAS Consumers

2018-04-11 Thread Erik Fäßler
Hi all,

I am doing my first steps with UIMA DUCC. I stumbled across the issue that my 
CAS consumer has allowMultipleDeployments=false since it is supposed to write 
multiple CAS document texts into one large ZIP file.
DUCC complains about the discrepancy of the processing AAE being allowed for 
multiple deployment but one of its containers (my consumer) is not.
I did specify the consumer with the "process_descriptor_CC" job file key and 
was assuming that DUCC would take care of it. After all, it is a key of its 
own. But it seems the consumer is just wrapped into a new AAE together with my 
annotator AAE. This new top AAE created by DUCC causes the error: My own AAE is 
allowed for multiple deployment and so are its delegates. But the consumer is
not, of course.

How to handle this case? The documentation of DUCC is rather vague at this 
point. There is the section about CAS consumer changes but it doesn’t mention 
multiple deployment explicitly.

What is the “process_descriptor_CC” for when it gets wrapped up into an AAE with 
the user-delivered AAE anyway?

Thanks and best regards,

Erik