sounds good. please go for it :-) -Marshall
On 6/24/2010 9:45 AM, Eddie Epstein (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/UIMA-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882153#action_12882153 ]
>
> Eddie Epstein commented on UIMA-1818:
> -------------------------------------
>
> bq. Is the idea that every CAS that comes thru a particular specified
> annotator would be saved to the file system?
> Yes.
>
> bq. if so - maybe some parameter to control how many, or how frequently to
> sample, etc.?
> Implementing JMX control to dynamically turn on/off CAS logging would
> accomplish this.
>
> {quote}The "COMPONENT_ARRAY" delegate keys need the x/y/z syntax for non
> UIMA-AS cases - where an aggregate contains another aggregate, etc. This is
> already a convention in UIMA. So it would be good to just continue using it
> both for UIMA-AS cases and non-UIMA-AS cases.
> {quote}
> Right. The same syntax should work for UIMA CASes. To clarify, the code to
> implement this is in the aggregate controller, of which there is one for UIMA
> AS and another for core UIMA. The UIMA AS controller only sees asynchronous
> delegates and vice versa for the core UIMA controller. This issue is only
> covering implementation for asynchronous delegates.
>
> {quote}Would it be valuable to have a spec to say if the logging was to be
> before or after the AnalysisEngine, for each delegate? For instance, the spec
> could be e.g., someAggName/somePrimName:before:after (showing both). "before"
> could be the default.{quote}
> To me, much less valuable to capture output CASes, and more complicated to
> implement. The main use of capturing CASes going into a delegate is to be
> able to later run the delegate stand-alone in a debug environment. In my
> case, a scaled out delegate is hanging on one or more CASes and timing out.
> This utility will allow one to easily capture all the CASes sent to the
> queue, find the problem CAS and ultimately the cause.
>
> bq. Would it be valuable to dump only the changed data (a la "delta cas")?
> (possible syntax: add modifier :delta)
> This sounds more appropriately handled by CAS journaling, where all CAS
> modifications can be attributed to specific annotators.
>
> bq. It would be good if the output was consumable by the CAS Viewer, too
> Interesting. The XmiCASes will be, but only if the CAS typesystem is
> available. The typesystem description should be written into the directory
> along with the CAS files.
>
>
>> Provide simple mechanism to capture all CASes input to specified delegate
>> -------------------------------------------------------------------------
>>
>>                 Key: UIMA-1818
>>                 URL: https://issues.apache.org/jira/browse/UIMA-1818
>>             Project: UIMA
>>          Issue Type: New Feature
>>          Components: Async Scaleout
>>            Reporter: Eddie Epstein
>>            Assignee: Eddie Epstein
>>
>> The existing approach to capturing CASes sent to a component is to insert a
>> new CAS-serializer-annotator just before it in the flow, or modify the
>> component itself to serialize CASes. Both of these approaches require
>> modifications to existing code and/or component descriptors, and are somewhat
>> time consuming and error prone.
>> A much simpler approach is to just "turn on" CAS logging for a particular
>> component using Java properties before starting the process, or to turn CAS
>> logging on/off for an already running process using JMX operations.
>> This issue covers using Java properties to turn on CAS logging for any
>> delegate of an asynchronous aggregate.
>> CAS logging would be controlled by the following properties:
>> UIMA_CASLOG_BASE_DIRECTORY - optional; this is the directory under which
>> other directories with XmiCas files will be created. If not specified, the
>> process's current directory will be the base.
>> UIMA_CASLOG_COMPONENT_ARRAY - This is a space separated list of delegate
>> keys. If a delegate is nested inside a co-located async aggregate, the name
>> would include the key name of the aggregate, e.g. "someAggName/someDelName".
>> The XmiCas files will then be written into
>> $UIMA_CASLOG_BASE_DIRECTORY/someAggName/someDelName/
>> UIMA_CASLOG_TYPE_NAME - optional; this is the name of a FeatureStructure in
>> the CAS containing a unique string to use to name each XmiCas file. If not
>> specified, the XmiCas file name will be NNN.xmi, where NNN is the time in
>> microseconds since the component was initialized.
>> UIMA_CASLOG_FEATURE_NAME - optional unless TYPE_NAME is specified;
>> this parameter gives the string feature to use. An example of type and
>> feature names to use would be
>> "org.apache.uima.examples.SourceDocumentInformation" and "uri".
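
For reference, here is a rough sketch of how the two directory-related properties described in the issue could be interpreted. It is not the UIMA AS implementation: the class name CasLogConfigSketch and both helper methods are made up for illustration, and it assumes the properties are read as plain Java system properties (e.g. -DUIMA_CASLOG_BASE_DIRECTORY=...). It only shows the mapping from delegate keys to output directories and the default NNN.xmi naming.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of UIMA or UIMA AS.
public class CasLogConfigSketch {

    /** Resolve one output directory per delegate key under the base directory. */
    static List<Path> outputDirs(String baseDir, String componentArray) {
        // If no base directory is given, fall back to the process's current directory.
        Path base = Paths.get(baseDir == null || baseDir.isEmpty() ? "." : baseDir);
        List<Path> dirs = new ArrayList<>();
        if (componentArray == null || componentArray.trim().isEmpty()) {
            return dirs; // no delegates listed, so nothing would be logged
        }
        // The list of delegate keys is space separated.
        for (String key : componentArray.trim().split("\\s+")) {
            // A key such as "someAggName/someDelName" maps to nested directories.
            Path dir = base;
            for (String part : key.split("/")) {
                if (!part.isEmpty()) {
                    dir = dir.resolve(part);
                }
            }
            dirs.add(dir);
        }
        return dirs;
    }

    /** Default file name when no type/feature name is configured: NNN.xmi. */
    static String defaultFileName(long initNanos, long nowNanos) {
        long micros = (nowNanos - initNanos) / 1_000L; // microseconds since init
        return micros + ".xmi";
    }

    public static void main(String[] args) {
        String base = System.getProperty("UIMA_CASLOG_BASE_DIRECTORY");
        String comps = System.getProperty("UIMA_CASLOG_COMPONENT_ARRAY");
        for (Path p : outputDirs(base, comps)) {
            System.out.println(p);
        }
    }
}
```

Running it with -DUIMA_CASLOG_COMPONENT_ARRAY="someAggName/someDelName" and a base directory set would print the directory the XmiCas files for that delegate would go into, per the description above.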
