Re: Dynamically bind resources to AnalysisEngine
Hi,

>> Is there a way to dynamically bind/update resources for an AnalysisEngine ?
>
> There may be more conventions / built-in ways that DKPro has
> for this scenario.

There are no conventions in DKPro Core for resource binding. It should also not cause any interference if you do resource binding in components you have implemented yourself and mix/match them in a pipeline with DKPro Core components.

> This link in the UIMA Reference manual describes Resources:
> https://uima.apache.org/d/uimaj-2.10.2/references.html#ugr.ref.resources
>
> See also the Javadocs for SharedResourceObject
> https://uima.apache.org/d/uimaj-2.10.2/apidocs/org/apache/uima/resource/SharedResourceObject.html

uimaFIT also has support for external resources. If you use DKPro Core, I expect you also make use of uimaFIT. You can find a bit of documentation here:
https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.externalresources

However, if you want to use external resources, it is worth having a look at
https://svn.apache.org/repos/asf/uima/uimafit/trunk/uimafit-core/src/test/java/org/apache/uima/fit/factory/ExternalResourceFactoryTest.java

In particular, you might not want to use a SharedResourceObject, but instead build your parser resource on top of Resource_ImplBase, and instead of relying on SharedResourceObject.load() you could just implement arbitrary methods, e.g. "getLatestParser()".

That said, instead of rebinding resources to components, I would suggest that you put your compiled parsers somewhere a "parser resource" bound to an analysis engine would be able to find them. Then, when the AE asks the resource for the actual parser, the resource should return the latest parser available.

Cheers,

-- Richard
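To make that concrete: the pattern Richard describes (a resource with an arbitrary accessor such as getLatestParser() that hands out the parser built from the newest compiled model) can be sketched in plain Java. In a real pipeline the class would extend Resource_ImplBase and be bound via uimaFIT's ExternalResourceFactory; everything below, including the class and method names, is a hypothetical sketch with the UIMA API left out:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Sketch of a "parser resource" that always serves the parser built
 * from the newest model file in a directory. In UIMA this class would
 * extend Resource_ImplBase; getLatestParser() is the kind of arbitrary
 * accessor the bound analysis engine would call on each request.
 */
class LatestParserResource {
    private final Path modelDir;
    private Path loadedModel;  // model file the current parser was built from
    private Object parser;     // stands in for the real compiled parser

    LatestParserResource(Path modelDir) {
        this.modelDir = modelDir;
    }

    /** Recompiles only when a newer model file has appeared, then returns it. */
    synchronized Object getLatestParser() throws IOException {
        Path newest = findNewestModel();
        if (!newest.equals(loadedModel)) {
            parser = compile(newest);  // the expensive step runs only on change
            loadedModel = newest;
        }
        return parser;
    }

    private Path findNewestModel() throws IOException {
        try (Stream<Path> files = Files.list(modelDir)) {
            return files.max(Comparator.comparing((Path p) -> {
                try {
                    return Files.getLastModifiedTime(p);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            })).orElseThrow(() -> new IOException("no model in " + modelDir));
        }
    }

    private Object compile(Path model) {
        // placeholder: a real implementation would deserialize the parser here
        return "parser built from " + model.getFileName();
    }
}
```

A variant that compares the model file's timestamp instead of its path would also pick up in-place updates to the same file.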
Re: Dynamically bind resources to AnalysisEngine
Hi,

I don't know about DKPro, so someone more familiar with its conventions could respond.

UIMA supports a decoupling of resources, shared among annotators running in some pipeline. I'm guessing you're asking about this mechanism, but before proceeding, there's nothing preventing you from implementing an annotator (let's call it the spelling corrector annotator) which could load a dictionary (say, one specified by a configuration parameter), and then have some mechanism to "reload" it if it changes.

This link in the UIMA Reference manual describes Resources:
https://uima.apache.org/d/uimaj-2.10.2/references.html#ugr.ref.resources

See also the Javadocs for SharedResourceObject:
https://uima.apache.org/d/uimaj-2.10.2/apidocs/org/apache/uima/resource/SharedResourceObject.html

These have a "load" method which the user is supposed to implement to cause the resource to be "loaded". Typically, if the resource implements, for example, a hashmap, the load might read some external file and initialize the hashmap from it. The implementation of the load method is the responsibility of the resource implementer. UIMA will instantiate the resource class and call the load method once.

One possibility would be to have your spelling annotator check "every so often" whether the on-disk version has changed, and if so, call the load method again. If you consider doing this, remember that your annotator might (in some deployments) be "scaled up" in multiple Java threads, so you might need to do this under a synchronization lock.

Does this help? There may be more conventions / built-in ways that DKPro has for this scenario.

Cheers. -Marshall

On 4/11/2018 9:54 AM, Hugues de Mazancourt wrote:
> Hello,
>
> Is there a way to dynamically bind/update resources for an AnalysisEngine ?
> My use-case is: I build a query parser that will be used to retrieve
> information in an indexed text database.
> The parser performs spelling correction, but doesn't have to consider words
> in the index as spelling mistakes. Thus, the (aggregate) engine is bound to
> the index vocabulary (i.e. a word list).
> My point is: when the index gets updated, its vocabulary will also be
> updated. I can re-build a new aggregate parser, with the updated resource,
> but this takes time, mainly for loading resources that were already loaded
> (POS model, lexica, etc.). Is there a way to update a given resource on my
> parser without having to rebuild it ?
>
> Thanks for your help,
> PS: I'm mostly building on top of DKPro components. I may miss some basic
> UIMA mechanisms
> Hugues de Mazancourt
> Mazancourt Conseil
>
> E: hug...@mazancourt.com
> P: +33-6 72 78 70 33
> W: http://www.mazancourt.com
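The reload-under-a-lock pattern Marshall describes can be sketched in plain Java. The UIMA-specific parts are left out: in a real resource this class would implement SharedResourceObject and read from the DataResource handed to load(); the class below, its names, and the plain file path standing in for the DataResource are all hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch of a reloadable dictionary resource for a spelling corrector.
 * In UIMA this would implement SharedResourceObject; load() plays the
 * role of SharedResourceObject.load(), called once by the framework.
 */
class ReloadableDictionary {
    private final Path wordListFile;
    private volatile Set<String> words = new HashSet<>();
    private FileTime loadedAt;

    ReloadableDictionary(Path wordListFile) throws IOException {
        this.wordListFile = wordListFile;
        load();
    }

    /** Reads the external word list and (re)initializes the set. */
    private synchronized void load() throws IOException {
        Set<String> fresh = new HashSet<>(Files.readAllLines(wordListFile));
        words = fresh;  // single volatile swap, so readers never see a half-built set
        loadedAt = Files.getLastModifiedTime(wordListFile);
    }

    /**
     * Called "every so often" by the annotator. Synchronized, because the
     * annotator may be scaled up across several Java threads.
     */
    synchronized void reloadIfChanged() throws IOException {
        if (!Files.getLastModifiedTime(wordListFile).equals(loadedAt)) {
            load();
        }
    }

    boolean contains(String word) {
        return words.contains(word);
    }
}
```

Lookups read the volatile reference without taking the lock, so spell-checking stays cheap while a reload is in progress.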
Help, please
Hello, Peter!

Hope you are doing well. I am a new user of UIMA Ruta, and I'm sorry to ask you questions directly, but I saw this email address on StackOverflow, where you said that it normally takes less time to receive an answer this way.

I have to solve the exercise attached to this letter. I fully read the UIMA Ruta Guide & Reference posted on apache.org, but I didn't find a good approach there. In my task, the script has to 'understand' each sentence/utterance (one by one) and link each one to one of the intents. I tried to use regular expressions as keywords and to sort sentences containing certain keywords with CONTAINS statements. However, this approach looks really duplicative.

So, would you be so kind as to tell me the best approach to solve this task, and show an example of how it should look? Thank you a lot in advance!

Exercise_Intents.xlsx
Description: MS-Excel 2007 spreadsheet
Dynamically bind resources to AnalysisEngine
Hello,

Is there a way to dynamically bind/update resources for an AnalysisEngine ?

My use-case is: I build a query parser that will be used to retrieve information in an indexed text database. The parser performs spelling correction, but doesn't have to consider words in the index as spelling mistakes. Thus, the (aggregate) engine is bound to the index vocabulary (i.e. a word list).

My point is: when the index gets updated, its vocabulary will also be updated. I can re-build a new aggregate parser, with the updated resource, but this takes time, mainly for loading resources that were already loaded (POS model, lexica, etc.). Is there a way to update a given resource on my parser without having to rebuild it ?

Thanks for your help,
PS: I'm mostly building on top of DKPro components. I may miss some basic UIMA mechanisms.

Hugues de Mazancourt
Mazancourt Conseil

E: hug...@mazancourt.com
P: +33-6 72 78 70 33
W: http://www.mazancourt.com
Re: DUCC and CAS Consumers
Hi Erik,

DUCC jobs can scale out a user's components in two ways: horizontally, by running multiple processes (process_deployments_max), and vertically, by running the pipeline defined by the CM, AE and CC components in multiple threads (process_pipeline_count). Since the constructed top AAE is designed to run in multiple threads, it requires multiple deployments to be enabled for all pipeline components.

The CM and CC components are optional, as they could already be included in the specified process_descriptor_AE. The reason for explicitly specifying CM and CC components is to facilitate high scale-out. The job's collection reader should create CASes with references to data, which will often be segmented by the CM into a collection of CASes to be processed by the user's AE. The initial CAS created by the driver normally does not flow into the AE, but typically does flow to the CC after all child CASes from the CM have been processed, to trigger the CC to finalize the collection.

More information about the job model is described in the duccbook at
https://uima.apache.org/d/uima-ducc-2.2.2/duccbook.html#x1-181000III

Regards,
Eddie

On Wed, Apr 11, 2018 at 5:16 AM, Erik Fäßler wrote:
> Hi all,
>
> I am doing my first steps with UIMA DUCC. I stumbled across the issue that
> my CAS consumer has allowMultipleDeployments=false since it is supposed to
> write multiple CAS document texts into one large ZIP file.
> DUCC complains about the discrepancy of the processing AAE being allowed
> for multiple deployment but one of its containers (my consumer) is not.
> I did specify the consumer with the "process_descriptor_CC" job file key
> and was assuming that DUCC would take care of it. After all, it is a key of
> its own. But it seems the consumer is just wrapped into a new AAE together
> with my annotator AAE. This new top AAE created by DUCC causes the error:
> my own AAE is allowed for multiple deployment, and so are its delegates,
> but the consumer, of course, is not.
>
> How to handle this case? The documentation of DUCC is rather vague at this
> point. There is a section about CAS consumer changes but it doesn't
> mention multiple deployment explicitly.
>
> What is the "process_descriptor_CC" for, when it gets wrapped up into an AAE
> with the user-delivered AAE anyway?
>
> Thanks and best regards,
>
> Erik
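For orientation, a job specification exercising the keys discussed above might look roughly like the fragment below (the descriptor paths and the numbers are hypothetical; the authoritative list of keys is in the duccbook linked above):

```
# job specification fragment (Java properties format)
process_descriptor_CM   = org/example/MySegmenterCM.xml
process_descriptor_AE   = org/example/MyAnalysisAAE.xml
process_descriptor_CC   = org/example/MyZipWriterCC.xml

# vertical scale-out: threads per process, each running the CM+AE+CC pipeline
process_pipeline_count  = 4

# horizontal scale-out: maximum number of job processes
process_deployments_max = 8
```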
DUCC and CAS Consumers
Hi all,

I am doing my first steps with UIMA DUCC. I stumbled across the issue that my CAS consumer has allowMultipleDeployments=false, since it is supposed to write multiple CAS document texts into one large ZIP file. DUCC complains about the discrepancy: the processing AAE is allowed multiple deployments, but one of its components (my consumer) is not.

I did specify the consumer with the "process_descriptor_CC" job file key and was assuming that DUCC would take care of it. After all, it is a key of its own. But it seems the consumer is just wrapped into a new AAE together with my annotator AAE. This new top AAE created by DUCC causes the error: my own AAE is allowed for multiple deployment, and so are its delegates, but the consumer, of course, is not.

How do I handle this case? The documentation of DUCC is rather vague at this point. There is a section about CAS consumer changes, but it doesn't mention multiple deployment explicitly. And what is the "process_descriptor_CC" for, when it gets wrapped up into an AAE with the user-delivered AAE anyway?

Thanks and best regards,

Erik