[ 
https://issues.apache.org/jira/browse/UIMA-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882153#action_12882153
 ] 

Eddie Epstein commented on UIMA-1818:
-------------------------------------

bq. Is the idea that every CAS that comes thru a particular specified annotator 
would be saved to the file system?
Yes.

bq. if so - maybe some parameter to control how many, or how frequently to 
sample, etc.?
Implementing JMX control to dynamically turn on/off CAS logging would 
accomplish this.
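
As a rough illustration only: the sketch below shows what such a JMX switch 
might look like, using nothing beyond the JDK's javax.management classes. It is 
not UIMA AS code. The class names, the MXBean interface and the 
"example.caslog" ObjectName domain are all made up for this example, and the 
controller is assumed to check isEnabled() before writing each CAS.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.Attribute;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical names throughout; this is not part of UIMA AS.
class CasLogJmxSketch {

    /** Management interface; JMX requires it to be public. */
    public interface CasLogControlMXBean {
        boolean isEnabled();
        void setEnabled(boolean enabled);
    }

    /** Holds the on/off flag a controller would consult before logging a CAS. */
    public static class CasLogControl implements CasLogControlMXBean {
        private final AtomicBoolean enabled = new AtomicBoolean(false);

        @Override
        public boolean isEnabled() {
            return enabled.get();
        }

        @Override
        public void setEnabled(boolean value) {
            enabled.set(value);
        }
    }

    /** Registers the control bean for one delegate key on the platform MBean server. */
    static ObjectName register(CasLogControl control, String delegateKey) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // The domain and key names are illustrative assumptions.
            ObjectName name = new ObjectName(
                "example.caslog:type=CasLogControl,delegate=" + ObjectName.quote(delegateKey));
            server.registerMBean(control, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("could not register CAS log control", e);
        }
    }

    /** Flips the flag through JMX, the way a console such as jconsole would. */
    static void setEnabledViaJmx(ObjectName name, boolean value) {
        try {
            ManagementFactory.getPlatformMBeanServer()
                .setAttribute(name, new Attribute("Enabled", value));
        } catch (JMException e) {
            throw new IllegalStateException("could not update CAS log control", e);
        }
    }
}
```

With one bean per delegate key, logging could be switched on for a few minutes 
to sample CASes and then switched off again without restarting the process.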

{quote}The "COMPONENT_ARRAY" delegate keys need the x/y/z syntax for non 
UIMA-AS cases - where an aggregate contains another aggregate, etc. This is 
already a convention in UIMA. So it would be good to just continue using it 
both for UIMA-AS cases and non-UIMA-AS cases.
{quote}
Right. The same syntax should work for UIMA CASes. To clarify, the code to 
implement this is in the aggregate controller, of which there is one for UIMA 
AS and another for core UIMA. The UIMA AS controller only sees asynchronous 
delegates, and vice versa for the core UIMA controller. This issue only covers 
the implementation for asynchronous delegates.

{quote}Would it be valuable to have a spec to say if the logging was to be 
before or after the AnalysisEngine, for each delegate? For instance, the spec 
could be e.g., someAggName/somePrimName:before:after (showing both). "before" 
could be the default.{quote}
To me, capturing output CASes is much less valuable, and more complicated to 
implement. The main use of capturing CASes going into a delegate is to be able 
to later run the delegate stand-alone in a debug environment. In my case, a 
scaled out delegate is hanging on one or more CASes and timing out. This 
utility will allow one to easily capture all the CASes sent to the queue, find 
the problem CAS and ultimately the cause.

bq.Would it be valuable to dump only the changed data (à la "delta cas")? 
(possible syntax: add modifier :delta)
This sounds more appropriately handled by CAS journaling, where all CAS 
modifications can be attributed to specific annotators.

bq.It would be good if the output was consumable by the CAS Viewer, too
Interesting. The XmiCASes will be, but only if the CAS typesystem is available. 
The typesystem description should be written into the directory along with the 
CAS files.


> Provide simple mechanism to capture all CASes input to specified delegate
> -------------------------------------------------------------------------
>
>                 Key: UIMA-1818
>                 URL: https://issues.apache.org/jira/browse/UIMA-1818
>             Project: UIMA
>          Issue Type: New Feature
>          Components: Async Scaleout
>            Reporter: Eddie Epstein
>            Assignee: Eddie Epstein
>
> The existing approach to capturing CASes sent to a component is to insert a 
> new CAS-serializer-annotator just before it in the flow, or modify the 
> component itself to serialize CASes. Both of these approaches require 
> modifications to existing code and/or component descriptors, and are somewhat 
> time-consuming and error-prone.
> A much simpler approach is to just "turn on" CAS logging for a particular 
> component using Java properties before starting the process, or to turn CAS 
> logging on/off for an already running process using JMX operations.
> This issue covers using Java properties to turn on CAS logging for any 
> delegate of an asynchronous aggregate.
> CAS logging would be controlled by the following properties:
> UIMA_CASLOG_BASE_DIRECTORY - optional; this is the directory under which 
> other directories with XmiCas files will be created. If not specified, the 
> process's current directory will be the base.
> UIMA_CASLOG_COMPONENT_ARRAY - This is a space-separated list of delegate 
> keys. If a delegate is nested inside a co-located async aggregate, the name 
> would include the key name of the aggregate, e.g. "someAggName/someDelName". 
> The XmiCas files will then be written into 
> $UIMA_CASLOG_BASE_DIRECTORY/someAggName/someDelName/
> UIMA_CASLOG_TYPE_NAME - optional; this is the name of a FeatureStructure in 
> the CAS containing a unique string to use as the name of each XmiCas file. If 
> not specified, the XmiCas file name will be NNN.xmi, where NNN is the time in 
> microseconds since the component was initialized.
> UIMA_CASLOG_FEATURE_NAME - optional unless TYPE_NAME is specified; 
> this parameter gives the string feature to use. An example of type and 
> feature names to use would be 
> "org.apache.uima.examples.SourceDocumentInformation" and "uri".
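
To make the property semantics above concrete, here is a minimal sketch of how 
they could be interpreted. It is not the UIMA AS implementation: the class and 
method names are hypothetical, and it assumes the properties arrive as ordinary 
Java system properties. Only the property names, the space-separated key list, 
the "someAggName/someDelName" nesting and the NNN.xmi default come from the 
description.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not UIMA AS code.
class CasLogConfigSketch {
    static final String BASE_DIR = "UIMA_CASLOG_BASE_DIRECTORY";
    static final String COMPONENTS = "UIMA_CASLOG_COMPONENT_ARRAY";

    /** Splits the space-separated COMPONENT_ARRAY value into delegate keys. */
    static List<String> delegateKeys(Properties props) {
        List<String> keys = new ArrayList<>();
        String raw = props.getProperty(COMPONENTS, "").trim();
        if (!raw.isEmpty()) {
            for (String key : raw.split("\\s+")) {
                keys.add(key);
            }
        }
        return keys;
    }

    /** Resolves the output directory for a key such as "someAggName/someDelName". */
    static Path logDirectory(Properties props, String delegateKey) {
        // Fall back to the process's current directory when no base is given.
        String base = props.getProperty(BASE_DIR, System.getProperty("user.dir"));
        Path dir = Paths.get(base);
        for (String part : delegateKey.split("/")) {
            dir = dir.resolve(part);
        }
        return dir;
    }

    /** Default file name when no TYPE_NAME is set: elapsed microseconds plus ".xmi". */
    static String defaultFileName(long initNanos, long nowNanos) {
        return ((nowNanos - initNanos) / 1_000L) + ".xmi";
    }
}
```

Under the same assumption, a process started with 
-DUIMA_CASLOG_COMPONENT_ARRAY="someAggName/someDelName" could pass 
System.getProperties() to these helpers at controller initialization.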

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
