Re: capabilityLangugaeFlow - computeResultSpec
It's very possible that the result specification doesn't work correctly through the JNI. Nobody has ever used it in C++ since I've been working with it.

Eddie

On Wed, Jan 23, 2008 at 4:02 PM, Marshall Schor [EMAIL PROTECTED] wrote:

Eddie - this is for you to check, I think: There is code in UimacppEngine, in the method serializeResultSpecification, which adds the result spec types and features to 2 IntVector arrays (one for types, one for features). As currently designed, these miss the subtypes of types, and all the features of types marked with the all-features (allAnnotatorFeatures) flag in the capabilities. Are these required here? Also, I notice that the result spec supports languages, but the serialization doesn't carry languages. Is that intended?

-Marshall
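If the serialized form does need the expanded sets, the expansion could happen just before the two vectors are filled. Below is a minimal Java sketch under stated assumptions: the type system is a toy held in plain maps, and `ResultSpecSerializer`, its maps, and the integer codes are hypothetical stand-ins for the real type system and IntVector arrays, not the UimacppEngine code. Languages are not modeled, which mirrors the gap noted above.

```java
import java.util.*;

/**
 * Hypothetical sketch, not the UIMA Java or C++ API: flattens a result spec
 * into two int vectors (type codes and feature codes), expanding subtypes
 * and, for types flagged allAnnotatorFeatures, the features they declare.
 */
final class ResultSpecSerializer {
  private final Map<String, Integer> typeCodes;             // type name -> type code
  private final Map<String, Integer> featureCodes;          // "Type:feature" -> feature code
  private final Map<String, List<String>> directSubtypes;   // type -> its direct subtypes
  private final Map<String, List<String>> declaredFeatures; // type -> "Type:feature" names it declares

  ResultSpecSerializer(Map<String, Integer> typeCodes, Map<String, Integer> featureCodes,
                       Map<String, List<String>> directSubtypes,
                       Map<String, List<String>> declaredFeatures) {
    this.typeCodes = typeCodes;
    this.featureCodes = featureCodes;
    this.directSubtypes = directSubtypes;
    this.declaredFeatures = declaredFeatures;
  }

  /**
   * outputTypes maps each capability type to its allAnnotatorFeatures flag.
   * Returns {types, features}, each sorted ascending without duplicates.
   * Assumes every name is present in the code maps.
   */
  int[][] serialize(Map<String, Boolean> outputTypes, Collection<String> explicitFeatures) {
    SortedSet<Integer> types = new TreeSet<>();
    SortedSet<Integer> features = new TreeSet<>();
    for (Map.Entry<String, Boolean> entry : outputTypes.entrySet()) {
      boolean allFeatures = entry.getValue();
      Deque<String> work = new ArrayDeque<>();
      work.push(entry.getKey());
      while (!work.isEmpty()) {            // visit the type and all of its subtypes
        String type = work.pop();
        types.add(typeCodes.get(type));
        if (allFeatures) {                 // the flag is propagated down to subtypes
          for (String f : declaredFeatures.getOrDefault(type, List.of())) {
            features.add(featureCodes.get(f));
          }
        }
        for (String sub : directSubtypes.getOrDefault(type, List.of())) {
          work.push(sub);
        }
      }
    }
    for (String f : explicitFeatures) {
      features.add(featureCodes.get(f));
    }
    return new int[][] {toArray(types), toArray(features)};
  }

  private static int[] toArray(SortedSet<Integer> codes) {
    return codes.stream().mapToInt(Integer::intValue).toArray();
  }
}
```

With T2 a subtype of T1 and T1 flagged allAnnotatorFeatures, serializing {T1: true, T3: false} yields the type codes of T1, T2 and T3, plus the feature codes of T1:f1 and T2:f2, but not T3:f3.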
Re: capabilityLangugaeFlow - computeResultSpec
LeHouillier, Frank D. wrote:

While making this change wouldn't affect us in any way that I can see now, it would still be possible to use the features in the Result Spec in a similar way. Suppose you have an information extraction component that extracts entities with attributes, and you want to use the Result Spec to control which attributes are actually added to the CAS. You might have a type Person, with a range of features such as Address, Phone number, Age, etc., some of which you want to output in a given configuration and others not. Suppose the information extraction component also extracts attributes which are so useless that you don't include them as features in the type system at all, such as an internal ID number. Currently, with a compiled Result Spec, you could have the annotator look up the feature by name and then reliably instantiate the feature without further ado. After your change, the feature would have to be checked to see if it actually exists.

We added code in the actual change that now checks whether the feature actually exists (for a compiled Result Spec). I thought it was better to preserve the status quo here, rather than remove this check (for performance reasons). It didn't seem like it would have any measurable performance impact; it's basically one hash table lookup.

Cheers. -Marshall

Again, this doesn't seem like that big a deal to me, but I thought I might point out that it might have a use case. In practice, it seems to me that most annotators figure out which features are available either at compile time, by using the JCas, or during the initialization of the annotator.

-----Original Message-----
From: Marshall Schor [mailto:[EMAIL PROTECTED]
Sent: Friday, January 25, 2008 3:57 PM
To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

LeHouillier, Frank D. wrote:

We have an annotator that wraps a black-box information extraction component that can return objects of a variety of types. We check the result specification to see if the object is something we want to output, based on the actual string of the type name. If you take away the compiled version of the ResultSpecification, then we will also have to check whether the type that we get back from the type system is null or not.

Hi Frank - This change would *not* take away the compiled version of the Result Spec. It would only change 1 behavior: that of returning true if a *feature* (not a type, as in your example above) was associated with a type whose capability was marked allAnnotatorFeatures, even if the feature didn't exist.

Suppose you had a type T1, and a type T2 whose supertype was T1, with features T1:f1 and T2:f2, and an output capability of T1 with allAnnotatorFeatures = true; and finally a type T3 (not inheriting from T1) with feature T3:f3, and the output capability including T3 with allAnnotatorFeatures = false. Here's the current behavior.

Before compile, the following would all return true, except as marked:
  containsType(T1)
  containsType(T2) - returns false: T2 is not in the output capability, and before compile, T2 isn't recognized as a subtype of T1
  containsFeature(T2:f2) - returns false: not in the output capability, etc.
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) - yes, that's what it does: it ignores the actual feature name because allAnnotatorFeatures is true

After compile, the following return true, except as marked:
  containsType(T1)
  containsType(T2) - T2 is not in the output capability, but is recognized as a subtype of T1
  containsFeature(T2:f2) - T1's *allAnnotatorFeatures* is inherited
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) - false: the actual features are looked up

After the change I'm proposing, everything would be the same except that containsFeature(T1:asdfasdfasdfasdf) would return true.

I don't think this would affect the way you are using result specs, but please let me know if I've misunderstood something. We don't want to impact users with this change. Thanks for your comments :-)

-Marshall

-----Original Message-----
From: Marshall Schor [mailto:[EMAIL PROTECTED]
Sent: Friday, January 25, 2008 5:06 AM
To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

The implementation for checking if a feature is in the result spec does the following: If the result spec is not compiled, it says the feature is present if it is specifically put in, or if its type has the allAnnotatorFeatures flag set. If the result spec is compiled, it says the feature is present if it is specifically put in, or if its type has the allAnnotatorFeatures flag set and the feature exists in the type system. For performance / space reasons, I'd like to drop the 2nd case; this would have the consequence of changing the result
Re: capabilityLangugaeFlow - computeResultSpec
Marshall Schor wrote:

I think my change is ready for code review. I kept all the idiosyncratic behavior of the old code, so users should not notice any difference. All the tests run, and the test case above runs in the 6000ms range. There are 3 areas changed:
1) ResultSpecification_impl is restructured for speed and a smaller memory footprint.
2) The compiling of this is deferred until the latest possible point; operations that can be done with the uncompiled form are done that way.
3) The code in the CapabilityLanguageFlow where it returns a next step now caches the result spec by component key, and only sends it down if it is different from what this controller sent the last time it invoked this component in the flow. This test depends on the precomputed result specs kept in the mTable variable being constant - which I believe they are (once they are computed) - but Michael, can you confirm this?

Yes, the mTable variable contains the precomputed result specs for sequence engines. These result specs are constant and do not change during processing. The computation is based on the output types of the aggregate that defines the capabilityLanguageFlow. If a result spec is passed in to the process method, the precomputed mTable cannot be used, since the results that should be produced may then differ from the aggregate output types.

With this change, the code in the framework that intersects the result spec with a component's output capabilities, by language, is not redone on every call, but only when the language changes. That code (to do the intersection) is running faster in any case, due to the restructuring. Because this is a big change, it would be good to do a code review of some kind - any thoughts on how to do this?

I hoped that Adam could look at this, since from my point of view he knows the code best. All the capabilityLanguageFlow-related items have already been discussed in detail on the list, and I think we now also have some good tests for this.
If the code is checked in, I can run my performance tests again to check the performance improvements. Opinions?

-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Marshall Schor wrote:

The Capability Language Flow for an aggregate is computed in CapabilityLanguageFlowController.computeFlowTable. This starts with the aggregate's output capabilities and figures out a flow, for each language, that produces all the outputs. Should this computation also include, in the set of needed outputs, the inputs that downstream annotators need from upstream ones? That part seems to be missing from this computation.

Here's an example: an aggregate G has delegates A and B. Suppose B needs A to produce some type T for some language, and T is not among G's outputs, but something that B produces is among G's outputs. Then the flow controller would need to tell A to produce T, so that B could produce the desired output at the aggregate level.

-Marshall

Adding the input capabilities automatically is fine with me.

-- Michael
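One way to fold downstream inputs into the needed set is to walk the delegates from last to first: whatever a delegate is asked to produce makes its inputs needed by everything before it. This is a minimal Java sketch assuming a fixed delegate order and ignoring languages; `FlowNeeds` and `Delegate` are hypothetical names, not part of CapabilityLanguageFlowController.

```java
import java.util.*;

/**
 * Hypothetical sketch, not CapabilityLanguageFlowController: works backwards
 * through a fixed delegate order so that inputs needed downstream become
 * outputs requested from upstream delegates. Languages are ignored.
 */
final class FlowNeeds {
  static final class Delegate {
    final String name;
    final Set<String> inputs;
    final Set<String> outputs;

    Delegate(String name, Set<String> inputs, Set<String> outputs) {
      this.name = name;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  /** Returns, for each delegate in flow order, the types it should be asked to produce. */
  static Map<String, Set<String>> computeRequests(List<Delegate> flow, Set<String> aggregateOutputs) {
    Set<String> needed = new HashSet<>(aggregateOutputs);
    List<Set<String>> requests = new ArrayList<>(Collections.nCopies(flow.size(), Set.<String>of()));
    for (int i = flow.size() - 1; i >= 0; i--) {   // last delegate first
      Delegate d = flow.get(i);
      Set<String> wanted = new TreeSet<>(d.outputs);
      wanted.retainAll(needed);                    // only what G or a later delegate needs
      requests.set(i, wanted);
      if (!wanted.isEmpty()) {
        needed.addAll(d.inputs);                   // this delegate runs, so its inputs are needed too
      }
    }
    Map<String, Set<String>> result = new LinkedHashMap<>();
    for (int i = 0; i < flow.size(); i++) {
      result.put(flow.get(i).name, requests.get(i));
    }
    return result;
  }
}
```

For the G/A/B example, with A producing T and B consuming T to produce the aggregate output U, A is asked for T even though T is not among G's outputs; without the backward propagation, A's request would be empty.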
Re: capabilityLangugaeFlow - computeResultSpec
Marshall Schor wrote:

Can I replace the class CapabilityContainer with the (now) much more efficient ResultSpecification class? It seems to me they do almost the same thing, and the ResultSpecification may be handling the corner cases better. Is there some subtle difference I'm missing? It would be nice to eliminate a class - smaller code base = less maintenance effort in the future :-)

-Marshall

Yes, if it is possible to add the missing functionality to the ResultSpecification class, that's fine with me. For example, the important method hasOutputTypeOrFeature(ouputCapabilitie, documentLanguage, doFuzzySearch) is currently not available in the ResultSpecification class.

-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
I may have missed something - I don't see what would need to be added to the ResultSpecification class. The method hasOutputTypeOrFeature(...) is always called with doFuzzySearch == true, which is how the containsType and containsFeature methods always operate in the ResultSpecification class. Is there some other difference I'm missing?

-Marshall

Michael Baessler wrote:

Marshall Schor wrote:

Can I replace the class CapabilityContainer with the (now) much more efficient ResultSpecification class? It seems to me they do almost the same thing, and the ResultSpecification may be handling the corner cases better. Is there some subtle difference I'm missing? It would be nice to eliminate a class - smaller code base = less maintenance effort in the future :-)

-Marshall

Yes, if it is possible to add the missing functionality to the ResultSpecification class, that's fine with me. For example, the important method hasOutputTypeOrFeature(ouputCapabilitie, documentLanguage, doFuzzySearch) is currently not available in the ResultSpecification class.

-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
I went back and checked the Javadocs for the ResultSpecification, prior to my reworking of it. I think I treated x-unspecified slightly wrong, and if I had done it right, the anomaly noted in the previous note (below) would not be there. The previous Javadocs all say that the setters for a typeOrFeature without a language argument are equivalent to passing in the x-unspecified language. The method containsType/Feature(foo, x-unspecified) should be made to return true only if the result specification for foo contains x-unspecified. It might not if, for instance, the setting for foo was only for the languages en and de.

A consequence of making it work this way is the following: containsType(foo, x-unspecified) will return false if foo is in the result spec only for particular languages, and containsType(foo), with no language argument, would also return false if foo is in the result spec only for particular languages.

I plan to correct the treatment of x-unspecified along these lines, to work as described above. Please post any concerns/objections :-)

-Marshall

Marshall Schor wrote:

While experimenting with this approach, I found some tests wouldn't run. (By the way, the test cases are great - they have been a great help :-) ) Here's a case I want to be sure I understand: Let's suppose that the aggregate says it produces type Foo with language x-unspecified. Let's suppose there are 2 annotators in the flow: the first one produces Foo with language en, the 2nd one produces Foo with language x-unspecified. A flow given language x-unspecified should run the 2nd annotator, skipping the first one. (This is how it works now.)

===

Here's another similar case, using the other language subsumption, between en-us and en. Let's suppose that the aggregate says it produces type Foo with language en. Let's suppose there are 2 annotators in the flow: the first one produces Foo with language en-us, the 2nd one produces Foo with language en.
A flow given language en should run the 2nd annotator, skipping the first one. (This is how it works now, I think.)

With this explanation, I see that a modification to the result spec's containsType/Feature method with a language argument is needed for this use. Currently, the ResultSpecification matching works like this:

Language arg    RsltSpc         Matches
en              en-us           no
en-us           en              yes
x-unspecified   *any*           yes   (behavior needs to be different)
en              x-unsp..        yes

Is this correct?

-Marshall

Marshall Schor wrote:

Can I replace the class CapabilityContainer with the (now) much more efficient ResultSpecification class? It seems to me they do almost the same thing, and the ResultSpecification may be handling the corner cases better. Is there some subtle difference I'm missing? It would be nice to eliminate a class - smaller code base = less maintenance effort in the future :-)

-Marshall
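The matching table, combined with the proposed x-unspecified correction, can be expressed as a single predicate. This is a minimal Java sketch under assumptions: language tags are compared as lowercase strings, and `LanguageMatch` and `matches` are hypothetical names, not the ResultSpecification API. A request for x-unspecified stands in for the no-language-argument form, per the old Javadocs.

```java
/**
 * Hypothetical sketch of the language matching rule discussed above,
 * including the proposed x-unspecified change. Not the UIMA API.
 */
final class LanguageMatch {
  static final String X_UNSPECIFIED = "x-unspecified";

  /** "en-us" -> "en"; a tag with no '-' is returned unchanged. */
  static String baseLanguage(String lang) {
    int dash = lang.indexOf('-');
    return dash < 0 ? lang : lang.substring(0, dash);
  }

  /** Does a result-spec entry for specLang cover a request for requestLang? */
  static boolean matches(String requestLang, String specLang) {
    if (specLang.equals(X_UNSPECIFIED)) {
      return true;                       // row "en / x-unsp.. / yes": covers every request
    }
    if (requestLang.equals(X_UNSPECIFIED)) {
      return false;                      // proposed: only an x-unspecified entry matches
    }
    // Exact match, or the spec names the generalized form of the request ("en-us" -> "en").
    return specLang.equals(requestLang) || specLang.equals(baseLanguage(requestLang));
  }
}
```

Under this rule, a flow for language x-unspecified skips an annotator that declares only en, and a flow for en skips an annotator that declares only en-us, which matches both cases described in the thread.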
Re: capabilityLangugaeFlow - computeResultSpec
Yes, that is correct!

- Michael

Marshall Schor wrote:

While experimenting with this approach, I found some tests wouldn't run. (By the way, the test cases are great - they have been a great help :-) ) Here's a case I want to be sure I understand: Let's suppose that the aggregate says it produces type Foo with language x-unspecified. Let's suppose there are 2 annotators in the flow: the first one produces Foo with language en, the 2nd one produces Foo with language x-unspecified. A flow given language x-unspecified should run the 2nd annotator, skipping the first one. (This is how it works now.)

===

Here's another similar case, using the other language subsumption, between en-us and en. Let's suppose that the aggregate says it produces type Foo with language en. Let's suppose there are 2 annotators in the flow: the first one produces Foo with language en-us, the 2nd one produces Foo with language en. A flow given language en should run the 2nd annotator, skipping the first one. (This is how it works now, I think.)

With this explanation, I see that a modification to the result spec's containsType/Feature method with a language argument is needed for this use. Currently, the ResultSpecification matching works like this:

Language arg    RsltSpc         Matches
en              en-us           no
en-us           en              yes
x-unspecified   *any*           yes   (behavior needs to be different)
en              x-unsp..        yes

Is this correct?

-Marshall

Marshall Schor wrote:

Can I replace the class CapabilityContainer with the (now) much more efficient ResultSpecification class? It seems to me they do almost the same thing, and the ResultSpecification may be handling the corner cases better. Is there some subtle difference I'm missing? It would be nice to eliminate a class - smaller code base = less maintenance effort in the future :-)

-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
While experimenting with this approach, I found some tests wouldn't run. (By the way, the test cases are great - they have been a great help :-) ) Here's a case I want to be sure I understand: Let's suppose that the aggregate says it produces type Foo with language x-unspecified. Let's suppose there are 2 annotators in the flow: the first one produces Foo with language en, the 2nd one produces Foo with language x-unspecified. A flow given language x-unspecified should run the 2nd annotator, skipping the first one. (This is how it works now.)

===

Here's another similar case, using the other language subsumption, between en-us and en. Let's suppose that the aggregate says it produces type Foo with language en. Let's suppose there are 2 annotators in the flow: the first one produces Foo with language en-us, the 2nd one produces Foo with language en. A flow given language en should run the 2nd annotator, skipping the first one. (This is how it works now, I think.)

With this explanation, I see that a modification to the result spec's containsType/Feature method with a language argument is needed for this use. Currently, the ResultSpecification matching works like this:

Language arg    RsltSpc         Matches
en              en-us           no
en-us           en              yes
x-unspecified   *any*           yes   (behavior needs to be different)
en              x-unsp..        yes

Is this correct?

-Marshall

Marshall Schor wrote:

Can I replace the class CapabilityContainer with the (now) much more efficient ResultSpecification class? It seems to me they do almost the same thing, and the ResultSpecification may be handling the corner cases better. Is there some subtle difference I'm missing? It would be nice to eliminate a class - smaller code base = less maintenance effort in the future :-)

-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Can I replace the class CapabilityContainer with the (now) much more efficient ResultSpecification class? It seems to me they do almost the same thing, and the ResultSpecification may be handling the corner cases better. Is there some subtle difference I'm missing? It would be nice to eliminate a class - smaller code base = less maintenance effort in the future :-)

-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
The code which checks whether a type or feature is in a result spec, for a particular language, always generalizes the language specifier by dropping the part beyond the first "-". For example, en-us and en-uk are simplified to en. Because of this, I'm thinking of shrinking the result specification (for performance / space reasons) by normalizing any language specs it uses, dropping the country extensions if present. Any objections?

-Marshall
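A minimal Java sketch of that normalization, assuming language specs are plain strings and that the special marker x-unspecified must be kept whole rather than cut down to "x" (an assumption; the note above does not say how it is handled). `LanguageNormalizer` is a hypothetical name, not part of ResultSpecification_impl.

```java
import java.util.*;

/** Hypothetical sketch of normalizing the language specs stored in a result spec. */
final class LanguageNormalizer {
  static final String X_UNSPECIFIED = "x-unspecified";

  /** "en-us" -> "en"; a spec with no '-' is returned unchanged. */
  static String normalize(String lang) {
    if (lang.equals(X_UNSPECIFIED)) {
      return lang;            // assumption: the special marker is kept whole, not cut to "x"
    }
    int dash = lang.indexOf('-');
    return dash < 0 ? lang : lang.substring(0, dash);
  }

  /** Normalizes every spec; entries such as en-us, en-uk and en collapse into one. */
  static SortedSet<String> normalizeAll(Collection<String> langs) {
    SortedSet<String> out = new TreeSet<>();
    for (String lang : langs) {
      out.add(normalize(lang));
    }
    return out;
  }
}
```

Because lookups already strip the country part of the requested language, storing only base languages should usually give the same answers. One case worth checking against the current matching rules: an entry set only for en-us would, after normalization, also match a request for plain en, where today it does not.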
RE: capabilityLangugaeFlow - computeResultSpec
We have an annotator that wraps a black-box information extraction component that can return objects of a variety of types. We check the result specification to see if the object is something we want to output, based on the actual string of the type name. If you take away the compiled version of the ResultSpecification, then we will also have to check whether the type that we get back from the type system is null or not. It isn't terribly onerous to have to check for null, but it does actually take some code modification, and this situation might be present in other people's analysis engines too.

-----Original Message-----
From: Marshall Schor [mailto:[EMAIL PROTECTED]
Sent: Friday, January 25, 2008 5:06 AM
To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

The implementation for checking if a feature is in the result spec does the following: If the result spec is not compiled, it says the feature is present if it is specifically put in, or if its type has the allAnnotatorFeatures flag set. If the result spec is compiled, it says the feature is present if it is specifically put in, or if its type has the allAnnotatorFeatures flag set and the feature exists in the type system. For performance / space reasons, I'd like to drop the 2nd case; this would have the consequence of changing the result spec to return true for features not in the type system where the type had the allAnnotatorFeatures flag set. This case shouldn't come up in practice, because I can't think of a good reason an annotator would ask whether a feature not in its type system was present. Any objections?

-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
The implementation for checking if a feature is in the result spec does the following:

If the result spec is not compiled, it says the feature is present if it is specifically put in, or if its type has the allAnnotatorFeatures flag set.

If the result spec is compiled, it says the feature is present if it is specifically put in, or if its type has the allAnnotatorFeatures flag set and the feature exists in the type system.

For performance / space reasons, I'd like to drop the 2nd case; this would have the consequence of changing the result spec to return true for features not in the type system where the type had the allAnnotatorFeatures flag set. This case shouldn't come up in practice, because I can't think of a good reason an annotator would ask whether a feature not in its type system was present. Any objections?

-Marshall
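The two rules differ only in the type-system lookup on the allAnnotatorFeatures path. This is a minimal Java sketch of that difference, assuming feature names are written as "Type:feature" and ignoring subtypes; `FeatureCheck` and its fields are hypothetical, not ResultSpecification_impl.

```java
import java.util.*;

/** Hypothetical model of the two feature checks described above; not ResultSpecification_impl. */
final class FeatureCheck {
  private final Set<String> explicitFeatures;   // features put into the spec by name, e.g. "T1:f1"
  private final Set<String> allFeaturesTypes;   // types whose capability has allAnnotatorFeatures=true
  private final Set<String> typeSystemFeatures; // features that exist in the type system

  FeatureCheck(Set<String> explicitFeatures, Set<String> allFeaturesTypes,
               Set<String> typeSystemFeatures) {
    this.explicitFeatures = explicitFeatures;
    this.allFeaturesTypes = allFeaturesTypes;
    this.typeSystemFeatures = typeSystemFeatures;
  }

  /** Assumes a qualified name such as "T1:f1". */
  private static String typeOf(String feature) {
    return feature.substring(0, feature.indexOf(':'));
  }

  /** Uncompiled rule; dropping the 2nd case would make the compiled form behave like this too. */
  boolean containsUncompiled(String feature) {
    return explicitFeatures.contains(feature) || allFeaturesTypes.contains(typeOf(feature));
  }

  /** Current compiled rule: the allAnnotatorFeatures case also needs the feature to exist. */
  boolean containsCompiled(String feature) {
    return explicitFeatures.contains(feature)
        || (allFeaturesTypes.contains(typeOf(feature)) && typeSystemFeatures.contains(feature));
  }
}
```

With T1 flagged allAnnotatorFeatures and only T1:f1 in the type system, both rules accept T1:f1, but only the uncompiled rule accepts a made-up name such as T1:noSuchFeature.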
Re: capabilityLangugaeFlow - computeResultSpec
LeHouillier, Frank D. wrote:

We have an annotator that wraps a black-box information extraction component that can return objects of a variety of types. We check the result specification to see if the object is something we want to output, based on the actual string of the type name. If you take away the compiled version of the ResultSpecification, then we will also have to check whether the type that we get back from the type system is null or not.

Hi Frank - This change would *not* take away the compiled version of the Result Spec. It would only change 1 behavior: that of returning true if a *feature* (not a type, as in your example above) was associated with a type whose capability was marked allAnnotatorFeatures, even if the feature didn't exist.

Suppose you had a type T1, and a type T2 whose supertype was T1, with features T1:f1 and T2:f2, and an output capability of T1 with allAnnotatorFeatures = true; and finally a type T3 (not inheriting from T1) with feature T3:f3, and the output capability including T3 with allAnnotatorFeatures = false. Here's the current behavior.

Before compile, the following would all return true, except as marked:
  containsType(T1)
  containsType(T2) - returns false: T2 is not in the output capability, and before compile, T2 isn't recognized as a subtype of T1
  containsFeature(T2:f2) - returns false: not in the output capability, etc.
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) - yes, that's what it does: it ignores the actual feature name because allAnnotatorFeatures is true

After compile, the following return true, except as marked:
  containsType(T1)
  containsType(T2) - T2 is not in the output capability, but is recognized as a subtype of T1
  containsFeature(T2:f2) - T1's *allAnnotatorFeatures* is inherited
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) - false: the actual features are looked up

After the change I'm proposing, everything would be the same except that containsFeature(T1:asdfasdfasdfasdf) would return true.
I don't think this would affect the way you are using result specs, but please let me know if I've misunderstood something. We don't want to impact users with this change. Thanks for your comments :-)

-Marshall

-----Original Message-----
From: Marshall Schor [mailto:[EMAIL PROTECTED]
Sent: Friday, January 25, 2008 5:06 AM
To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

The implementation for checking if a feature is in the result spec does the following: If the result spec is not compiled, it says the feature is present if it is specifically put in, or if its type has the allAnnotatorFeatures flag set. If the result spec is compiled, it says the feature is present if it is specifically put in, or if its type has the allAnnotatorFeatures flag set and the feature exists in the type system. For performance / space reasons, I'd like to drop the 2nd case; this would have the consequence of changing the result spec to return true for features not in the type system where the type had the allAnnotatorFeatures flag set. This case shouldn't come up in practice, because I can't think of a good reason an annotator would ask whether a feature not in its type system was present. Any objections?

-Marshall
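The before/after-compile listing for T1, T2 and T3 can be reproduced with a small model in which compiling means "walk up the supertype chain". This is a minimal Java sketch under assumptions: explicitly listed features are left out, feature names are written as "Type:feature", and `CompileSketch` and its methods are hypothetical, not UIMA's ResultSpecification.

```java
import java.util.*;

/**
 * Hypothetical model of the T1/T2/T3 example; not UIMA's ResultSpecification.
 * Explicitly listed features are left out to keep it short.
 */
final class CompileSketch {
  private final Map<String, String> supertypeOf;      // type -> direct supertype (roots absent)
  private final Map<String, Set<String>> featuresOf;  // type -> features declared on it
  private final Map<String, Boolean> capability;      // output capability: type -> allAnnotatorFeatures

  CompileSketch(Map<String, String> supertypeOf, Map<String, Set<String>> featuresOf,
                Map<String, Boolean> capability) {
    this.supertypeOf = supertypeOf;
    this.featuresOf = featuresOf;
    this.capability = capability;
  }

  private static String typeOf(String feature) {
    return feature.substring(0, feature.indexOf(':'));
  }

  /** The type itself followed by its supertypes, nearest first. */
  private List<String> selfAndSupertypes(String type) {
    List<String> chain = new ArrayList<>();
    for (String t = type; t != null; t = supertypeOf.get(t)) {
      chain.add(t);
    }
    return chain;
  }

  /** Before compile: only types named in the capability count. */
  boolean containsTypeUncompiled(String type) {
    return capability.containsKey(type);
  }

  /** Before compile: the feature name is ignored when allAnnotatorFeatures is set on its type. */
  boolean containsFeatureUncompiled(String feature) {
    return Boolean.TRUE.equals(capability.get(typeOf(feature)));
  }

  /** After compile: subtypes of a capability type are recognized. */
  boolean containsTypeCompiled(String type) {
    for (String t : selfAndSupertypes(type)) {
      if (capability.containsKey(t)) return true;
    }
    return false;
  }

  /** After compile: allAnnotatorFeatures is inherited, and the feature must actually exist. */
  boolean containsFeatureCompiled(String feature) {
    String type = typeOf(feature);
    if (!featuresOf.getOrDefault(type, Set.of()).contains(feature)) {
      return false;
    }
    for (String t : selfAndSupertypes(type)) {
      if (Boolean.TRUE.equals(capability.get(t))) return true;
    }
    return false;
  }
}
```

Loaded with T2 under T1 and the capability {T1: true, T3: false}, this model gives the results listed in the message: T2 and T2:f2 are recognized only after compile, and T1:asdfasdfasdfasdf is accepted only before compile.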
Re: capabilityLangugaeFlow - computeResultSpec
Michael Baessler wrote:

Michael Baessler wrote:

Adam Lally wrote:

On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] wrote:

I tried to figure out how the ResultSpecification handling in uima-core works, with all its side effects, to check how we can detect when a ResultSpec has changed. Unfortunately I was not able to; there are too many open questions where I don't know exactly whether it is right in every case ... :-( Adam, can you please look at this issue?

I can try to take a look, but I don't have a lot of time. Do you have a test case for this, where you expect I would see a significant performance improvement if I fix this?

Sorry, I have no performance test case. I checked my assumption using the debugger. I used the following main() with a loop over the process call to check whether the result spec is recomputed each time. The descriptor is the same one used in the capabilityLanguageFlow test case of the uimaj-core project. Maybe a sysout helps to detect whether the unnecessary calls are made or not. Maybe iterating more than 10 times will give you performance numbers before and after. Maybe adding additional capabilities that must be analyzed will increase the time used to compute the result spec. I will look at this tomorrow.

    import org.apache.uima.UIMAFramework;
    import org.apache.uima.analysis_engine.AnalysisEngine;
    import org.apache.uima.cas.CAS;
    import org.apache.uima.resource.ResourceSpecifier;
    import org.apache.uima.test.junit_extension.JUnitExtension;
    import org.apache.uima.util.XMLInputSource;

    public static void main(String[] args) {
      AnalysisEngine ae = null;
      try {
        // Same descriptor as the capabilityLanguageFlow test case in uimaj-core
        String desc = "SequencerCapabilityLanguageAggregateES.xml";
        XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
        ResourceSpecifier specifier = UIMAFramework.getXMLParser()
            .parseResourceSpecifier(in);
        ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
        CAS cas = ae.newCAS();
        String text = "Hello world!";
        cas.setDocumentText(text);
        cas.setDocumentLanguage("en");
        // Process the same CAS repeatedly to see whether the result spec is recomputed each time
        for (int i = 0; i < 10; i++) {
          ae.process(cas);
        }
      } catch (Exception ex) {
        ex.printStackTrace();
      }
    }

-- Michael

When setting the loop counter to 1000, I get 6000ms without recomputing the result spec and 27000ms when recomputing the result spec.
I think this should be sufficient for testing.

I think my change is ready for code review. I kept all the idiosyncratic behavior of the old code, so users should not notice any difference. All the tests run, and the test case above runs in the 6000ms range. There are 3 areas changed:
1) ResultSpecification_impl is restructured for speed and a smaller memory footprint.
2) The compiling of this is deferred until the latest possible point; operations that can be done with the uncompiled form are done that way.
3) The code in the CapabilityLanguageFlow where it returns a next step now caches the result spec by component key, and only sends it down if it is different from what this controller sent the last time it invoked this component in the flow. This test depends on the precomputed result specs kept in the mTable variable being constant - which I believe they are (once they are computed) - but Michael, can you confirm this?

With this change, the code in the framework that intersects the result spec with a component's output capabilities, by language, is not redone on every call, but only when the language changes. That code (to do the intersection) is running faster in any case, due to the restructuring. Because this is a big change, it would be good to do a code review of some kind - any thoughts on how to do this?

-Marshall
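The caching in item 3 can be sketched as "remember the last spec object sent per component key, and send again only when it changes". This is a minimal Java sketch with hypothetical names (`ResultSpecCache`, `specToSend`); it compares by identity, which is only valid under the assumption stated in item 3 that the precomputed specs in mTable are constant objects.

```java
import java.util.*;

/**
 * Hypothetical sketch of "send the result spec only when it changed";
 * not the CapabilityLanguageFlowController code.
 */
final class ResultSpecCache<R> {
  private final Map<String, R> lastSent = new HashMap<>();   // component key -> spec sent last time

  /**
   * Returns the spec to attach to the next step for this component, or null
   * if the same spec object was sent last time (so the component can skip
   * recomputing its intersection). Identity comparison is enough only because
   * the precomputed specs are assumed to be constant objects.
   */
  R specToSend(String componentKey, R spec) {
    R previous = lastSent.put(componentKey, spec);
    return previous == spec ? null : spec;
  }
}
```

If a flow processes several documents in the same language, only the first step for each component carries a spec; a language change switches to a different precomputed spec object, which is then sent once.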
Re: capabilityLangugaeFlow - computeResultSpec
Adam Lally wrote:

On Jan 24, 2008 9:51 AM, Michael Baessler [EMAIL PROTECTED] wrote:

Without looking at the code, I didn't understand why this is a consequence of the behavior you described above. I thought you said "and if the type has subtypes, it adds those too"? Anyway, I definitely think that this should work. By the definition of subtype, A-subtype *IS A* A. So if an aggregate wants type A produced, then A-subtype should be produced.

Why should an AE or a flow produce A-subtype when only A is required?

Because an instance of A-subtype is also, by definition, an instance of A. Say a downstream annotator wants input type Person. I have upstream annotators that can produce instances of GovernmentOfficial, Actor, and Author, all of which are subtypes of Person. Shouldn't the upstream annotators produce these types?

From my point of view, when using the capabilityLanguageFlow, the application must specify all three or four Person subtypes if they should occur in the result. I think this is flow-specific; another flow could do it differently. I absolutely agree that the result spec that is responsible for what can be produced should automatically contain all the subtypes if the Person type is added.

-- Michael
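The two positions differ in how a requested type is compared against a produced type: by subsumption (Adam: an Actor *is a* Person) or by exact name (Michael: list every subtype you need). This is a minimal Java sketch of the subsumption check, assuming single inheritance held in a map; `SubsumptionSketch` and its methods are hypothetical names.

```java
import java.util.*;

/** Hypothetical sketch contrasting subsumption with exact-name matching. */
final class SubsumptionSketch {
  /** True if type equals ancestor or inherits from it; supertypeOf maps type -> direct supertype. */
  static boolean isSubsumedBy(String type, String ancestor, Map<String, String> supertypeOf) {
    for (String t = type; t != null; t = supertypeOf.get(t)) {
      if (t.equals(ancestor)) return true;
    }
    return false;
  }

  /** Subsumption view: a request for Person is satisfied by producing any Person subtype. */
  static boolean covers(Set<String> requested, String produced, Map<String, String> supertypeOf) {
    for (String r : requested) {
      if (isSubsumedBy(produced, r, supertypeOf)) return true;
    }
    return false;
  }

  /** "Specify all you need" view: only the exact type names listed count. */
  static boolean coversExactly(Set<String> requested, String produced) {
    return requested.contains(produced);
  }
}
```

With GovernmentOfficial, Actor and Author all under Person, a request for Person covers Actor under subsumption but not under exact matching, and a request for Actor never covers plain Person under either rule.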
Re: capabilityLangugaeFlow - computeResultSpec
Without actually testing this (so this may be a wrong conclusion), it seems to me that the code in CapabilityLanguageFlowController that sets up the result specs for components, by language, in the mFlowTable, ignores the typesOrFeatures that the result spec adds when compile() is called. If you recall, the compile method for result specifications augments the set of types/features by doing 2 things: if the type has allAnnotatorFeatures=true, it adds all the features of the type; and if the type has subtypes, it adds those too, propagating the allAnnotatorFeatures processing down. A consequence would be that the mFlowTable would miss these cases:
  - An aggregate wants type A output, and has a delegate with output capability A-subtype.
  - An aggregate wants feature F output, and has a delegate with output capability type A, with allAnnotatorFeatures marked, having that feature.

Can anyone confirm this (perhaps by adding a test case :-) )? Michael - do you know what the design intent was for this? If things are as I've conjectured above, is this something that needs to be fixed, or is it working as intended?

-Marshall
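Both cases can be reproduced by comparing a selection that uses the capability names exactly as written with one that first applies compile-style expansion. This is a minimal Java sketch under assumptions: a delegate is "selected" when its outputs overlap what the aggregate wants, and `FlowTableSketch`, `raw`, `expand` and `selects` are hypothetical names, not CapabilityLanguageFlowController.

```java
import java.util.*;

/**
 * Hypothetical sketch of the two cases above; not CapabilityLanguageFlowController.
 * A delegate is selected when its outputs overlap what the aggregate wants.
 */
final class FlowTableSketch {
  /** Names exactly as written in the capabilities, with no expansion. */
  static Set<String> raw(Map<String, Boolean> typeFlags, Set<String> explicitFeatures) {
    Set<String> out = new TreeSet<>(explicitFeatures);
    out.addAll(typeFlags.keySet());
    return out;
  }

  /** Compile-style: add subtypes, and let allAnnotatorFeatures pull in declared features. */
  static Set<String> expand(Map<String, Boolean> typeFlags, Set<String> explicitFeatures,
                            Map<String, List<String>> directSubtypes,
                            Map<String, List<String>> featuresOf) {
    Set<String> out = new TreeSet<>(explicitFeatures);
    Deque<Map.Entry<String, Boolean>> work = new ArrayDeque<>(typeFlags.entrySet());
    while (!work.isEmpty()) {
      Map.Entry<String, Boolean> e = work.pop();
      out.add(e.getKey());
      if (e.getValue()) {
        out.addAll(featuresOf.getOrDefault(e.getKey(), List.of()));
      }
      for (String sub : directSubtypes.getOrDefault(e.getKey(), List.of())) {
        work.push(Map.entry(sub, e.getValue()));   // the flag propagates to subtypes
      }
    }
    return out;
  }

  static boolean selects(Set<String> aggregateWants, Set<String> delegateOutputs) {
    return !Collections.disjoint(aggregateWants, delegateOutputs);
  }
}
```

In case 1 (aggregate wants A, delegate outputs ASub) and case 2 (aggregate wants A:F, delegate outputs A with allAnnotatorFeatures), the raw comparison selects nothing, while the expanded comparison selects the delegate.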
Re: capabilityLangugaeFlow - computeResultSpec
Marshall Schor wrote:

Without actually testing this (so this may be a wrong conclusion), it seems to me that the code in CapabilityLanguageFlowController that sets up the result specs for components, by language, in the mFlowTable, ignores the typesOrFeatures that the result spec adds when compile() is called. If you recall, the compile method for result specifications augments the set of types/features by doing 2 things: if the type has allAnnotatorFeatures=true, it adds all the features of the type; and if the type has subtypes, it adds those too, propagating the allAnnotatorFeatures processing down. A consequence would be that the mFlowTable would miss these cases:
  - An aggregate wants type A output, and has a delegate with output capability A-subtype.
  - An aggregate wants feature F output, and has a delegate with output capability type A, with allAnnotatorFeatures marked, having that feature.

Can anyone confirm this (perhaps by adding a test case :-) )? Michael - do you know what the design intent was for this? If things are as I've conjectured above, is this something that needs to be fixed, or is it working as intended?

Yes, that is correct. The mFlowTable only contains those output types that are specified as output types in the aggregate AE. The guideline for the capabilityLanguageFlow was to specify in the aggregate all the output results (including all interim results) that must be produced. If we now change the mFlowTable content to match the resultSpec, we also change the capabilityLanguageFlow. So if we do that, how can I prevent a subtype from being produced when only its supertype must be produced? So I prefer to stay with the current design - specify all you need. What do you think?

-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
What about allAnnotatorFeatures? Suppose the aggregate says it needs a particular feature of a particular type. Suppose a delegate is marked as producing that type, and has allAnnotatorFeatures marked. This wouldn't work.

You could say that in this case the output capability of the delegate *must not* rely on allAnnotatorFeatures, but instead *must* explicitly list the features it produces. In one sense, this could be a good idea, because no delegate could *accurately* mark that it outputs allAnnotatorFeatures anyway, due to the possibility that some other component could add features to the type in question, completely unknown to this delegate - and of course, this delegate would not be setting those other features.

This would lead to another question - should we deprecate allAnnotatorFeatures because of this?

-Marshall

Michael Baessler wrote:

Marshall Schor wrote:

Without actually testing this (so this may be a wrong conclusion), it seems to me that the code in CapabilityLanguageFlowController that sets up the result specs for components, by language, in the mFlowTable, ignores the typesOrFeatures that the result spec adds when compile() is called. If you recall, the compile method for result specifications augments the set of types/features by doing 2 things: if the type has allAnnotatorFeatures=true, it adds all the features of the type; and if the type has subtypes, it adds those too, propagating the allAnnotatorFeatures processing down. A consequence would be that the mFlowTable would miss these cases:
  - An aggregate wants type A output, and has a delegate with output capability A-subtype.
  - An aggregate wants feature F output, and has a delegate with output capability type A, with allAnnotatorFeatures marked, having that feature.

Can anyone confirm this (perhaps by adding a test case :-) )?
Michael - do you know what the design intent was for this - if things are as I've conjectured above, is this something that needs to be fixed, or is it working as intended? Yes that is correct. The mFlowTable only contains these output types that are specified in the aggregate ae as output type. The guideline for the capabilityLanguageFlow was to specify all output results (with all interim results) in the aggregate that must be produced. I we now change the mFlowTable content to match the resultSpec we also changes the capabilityLanguageFlow. So if we do that, how can I prevent the a sub types isn't produced if a super type must be produced? So I prefer to stay with the current design - specify all you need. What do you think? -- Michale
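The point that no delegate can accurately claim allAnnotatorFeatures can be made concrete. In this hypothetical sketch (the class name, the features, and the idea of a merged feature map are invented for illustration), another component contributes Person:address to the merged type system, and expanding the flag then claims a feature the delegate never sets.

```java
import java.util.*;

// Hypothetical illustration, not UIMA code: the flag is expanded against the merged
// type system, which may contain features added by components this delegate never saw.
class AllFeaturesOverclaimSketch {
  // the features this delegate's code actually sets (invented for the example)
  static final Set<String> SET_BY_DELEGATE = Set.of("Person:name", "Person:age");

  // what the capability claims once allAnnotatorFeatures is expanded against the merged type system
  static Set<String> claimed(Map<String, Set<String>> mergedFeatures, String type) {
    return new TreeSet<>(mergedFeatures.getOrDefault(type, Set.of()));
  }

  // features the capability claims but the delegate never produces
  static Set<String> overclaimed(Map<String, Set<String>> mergedFeatures, String type) {
    Set<String> result = claimed(mergedFeatures, type);
    result.removeAll(SET_BY_DELEGATE);
    return result;
  }
}
```

If the delegate instead listed Person:name and Person:age explicitly, the claim would stay accurate no matter what other descriptors add to Person.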
Re: capabilityLangugaeFlow - computeResultSpec
From this point of view: +1 to deprecate allAnnotatorFeatures. -- Michael
Re: capabilityLangugaeFlow - computeResultSpec
On Jan 24, 2008 7:54 AM, Marshall Schor [EMAIL PROTECTED] wrote: If you recall, the compile method for result specifications augments the set of types/features by doing 2 things: if the type has allAnnotatorFeatures=true, it adds all the features of the type; and if the type has subtypes, it adds those too, propagating the allAnnotatorFeatures processing down. A consequence would be that the mFlowTable would miss these cases: An aggregate wants type A output, and has a delegate with output capability A-subtype. Without looking at the code, I didn't understand why this is a consequence of the behavior you described above. Didn't you say "and if the type has subtypes, it adds those too"? Anyway, I definitely think that this should work. By the definition of subtype, A-subtype *IS A* A. So if an aggregate wants type A produced, then A-subtype should be produced. An aggregate wants Feature F output, and has a delegate with output capability type-A with allAnnotatorFeatures marked, having that feature. We should be supporting this as well. Again, I didn't follow why the behavior you described above doesn't do this. -Adam
Re: capabilityLangugaeFlow - computeResultSpec
Adam Lally wrote: On Jan 24, 2008 7:54 AM, Marshall Schor [EMAIL PROTECTED] wrote: If you recall, the compile method for result specifications augments the set of types/features by doing 2 things: if the type has allAnnotatorFeatures=true, it adds all the features of the type; and if the type has subtypes, it adds those too, propagating the allAnnotatorFeatures processing down. A consequence would be that the mFlowTable would miss these cases: An aggregate wants type A output, and has a delegate with output capability A-subtype. Without looking at the code, I didn't understand why this is a consequence of the behavior you described above. Didn't you say "and if the type has subtypes, it adds those too"? Anyway, I definitely think that this should work. By the definition of subtype, A-subtype *IS A* A. So if an aggregate wants type A produced, then A-subtype should be produced. Why should an AE or a flow produce A-subtype when only A is required? -- Michael
Re: capabilityLangugaeFlow - computeResultSpec
The thing that adds allAnnotatorFeatures and subtypes is compiling the result spec. The builder of the mFlowTable doesn't compile the result spec before using it, so these expansions never happen there. -Marshall
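A toy model of the mismatch: the flow-table builder matches names literally, while compile()-style expansion would add subtypes to what the aggregate wants and all features to delegates marked allAnnotatorFeatures. FlowTableSketch, its type system, and the matching rule (schedule a delegate if any of its outputs overlaps the wanted set) are hypothetical simplifications, not the CapabilityLanguageFlowController code.

```java
import java.util.*;

// Hypothetical model: which delegates a flow table would schedule, with and without expansion.
class FlowTableSketch {
  // toy type system: ASub is a subtype of A; type B declares feature B:F
  static final Map<String, String> SUPERTYPE = Map.of("ASub", "A");
  static final Map<String, List<String>> FEATURES = Map.of("B", List.of("B:F"));

  // compile()-style expansion of the aggregate's outputs: add all subtypes of listed types
  static Set<String> withSubtypes(Set<String> wanted) {
    Set<String> out = new HashSet<>(wanted);
    boolean changed = true;
    while (changed) {
      changed = false;
      for (Map.Entry<String, String> e : SUPERTYPE.entrySet())
        if (out.contains(e.getValue()) && out.add(e.getKey())) changed = true;
    }
    return out;
  }

  // compile()-style expansion of a delegate's outputs when it is marked allAnnotatorFeatures
  static Set<String> withAllFeatures(Set<String> outputs) {
    Set<String> out = new HashSet<>(outputs);
    for (String type : outputs) out.addAll(FEATURES.getOrDefault(type, List.of()));
    return out;
  }

  // delegates (in name order) whose possibly expanded outputs overlap what the aggregate wants
  static List<String> schedule(Set<String> wanted, Map<String, Set<String>> delegateOutputs,
                               Set<String> allFeaturesDelegates, boolean compile) {
    Set<String> w = compile ? withSubtypes(wanted) : wanted;
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Set<String>> d : new TreeMap<>(delegateOutputs).entrySet()) {
      boolean expand = compile && allFeaturesDelegates.contains(d.getKey());
      Set<String> outs = expand ? withAllFeatures(d.getValue()) : d.getValue();
      if (!Collections.disjoint(w, outs)) result.add(d.getKey());
    }
    return result;
  }
}
```

For the wanted set {A, B:F}, a delegate d1 producing ASub and a delegate d2 producing B with allAnnotatorFeatures are both skipped by literal matching and both scheduled with expansion, which corresponds to the two cases listed earlier in the thread.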
Re: capabilityLangugaeFlow - computeResultSpec
On Jan 24, 2008 9:51 AM, Michael Baessler [EMAIL PROTECTED] wrote: Without looking at the code, I didn't understand why this is a consequence of the behavior you described above. Didn't you say "and if the type has subtypes, it adds those too"? Anyway, I definitely think that this should work. By the definition of subtype, A-subtype *IS A* A. So if an aggregate wants type A produced, then A-subtype should be produced. Why should an AE or a flow produce A-subtype when only A is required? Because an instance of A-subtype is also by definition an instance of A. Say a downstream annotator wants input type Person. I have upstream annotators that can produce instances of GovernmentOfficial, Actor, and Author, all of which are subtypes of Person. Shouldn't the upstream annotators produce these types? -Adam
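Adam's IS-A argument amounts to a walk up the supertype chain. The hierarchy and annotator names below follow his Person example but are otherwise invented; SubsumptionSketch is a hypothetical helper, not a UIMA class.

```java
import java.util.*;

// Hypothetical sketch: a request for a supertype is satisfied by any producer of one of its subtypes.
class SubsumptionSketch {
  // invented hierarchy following the Person example
  static final Map<String, String> SUPERTYPE = Map.of(
      "GovernmentOfficial", "Person",
      "Actor", "Person",
      "Author", "Person",
      "Person", "Annotation");

  // true if 'type' equals 'wanted' or inherits from it
  static boolean isA(String type, String wanted) {
    for (String t = type; t != null; t = SUPERTYPE.get(t))
      if (t.equals(wanted)) return true;
    return false;
  }

  // producers (in name order) whose declared output type satisfies the downstream input
  static List<String> producersFor(String wanted, Map<String, String> producerOutput) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, String> e : new TreeMap<>(producerOutput).entrySet())
      if (isA(e.getValue(), wanted)) result.add(e.getKey());
    return result;
  }
}
```

The relation is one-directional: an Actor is a Person, but a Person is not necessarily an Actor, which is why Michael's question of producing only the supertype is a separate concern.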
Re: capabilityLangugaeFlow - computeResultSpec
In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) constructor was used when the result spec was set by an application calling the process method with the resultSpec argument. In the current version it seems that only the constructor with the precomputed FlowTable is used. But I can't say whether that is correct, since I don't know the details of the ResultSpec restructuring (maybe only Adam knows). But you are right: if this constructor isn't necessary, both the code and the constructor can be removed. Seems that the architecture has changed here. :-) -- Michael Marshall Schor wrote: If this is removed or if it is never called, then there is a section of the logic in CapabilityLanguageFlowObject which is never used, because mNodeList == null: if (mNodeList != null) { // 80 or so lines of code elided } Can this logic be removed? -Marshall Marshall Schor wrote: The class CapabilityLanguageFlowObject has 2 defined constructors, but one is never used/referenced: CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) Can this be removed? -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Thanks. I'll see about comparing the older method with the current method, to verify this. -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Marshall Schor wrote: In looking through the code for ResultSpecification_Impl, there seems to be an inconsistency - unless I (quite possibly :-) ) missed something. The containsType(...) method operates in one of 2 ways, depending on whether or not the result specification has been compiled by calling the compile method. If the result spec has not been compiled, then containsType(...) returns true iff the type specified is equal(...) to a type in the Result Specification. If it has been compiled, then containsType returns true iff the type specified is equal to a type *or any of its subtypes* in the Result Specification. This is because compiling a resultSpecification adds the subtypes. Can others confirm this? In actual use within annotators, it may be that the result spec is always compiled before use (I haven't yet traced that down). Yes, you are right: when the result spec is compiled, all subtypes of a type are additionally added to the map. The same holds for features if allAnnotatorFeatures is set to true. Should the code and Javadocs be updated to have containsType return true for subtypes of types in the result spec, always? I think both ways should return the same result. But which way is correct? If I specify a type in the result spec, is it correct that all its subtypes are also in? If I want the subtypes in the result spec, that is easy to do; but what if I want only the supertype in the result spec, without its subtypes? -- Michael
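The two behaviors of containsType(...) described above can be reproduced with a small model. ContainsTypeSketch and its addType method are hypothetical; the model only mimics the described effect that compile() adds subtypes to the stored set, so lookups switch from an exact match to a subtype-aware match.

```java
import java.util.*;

// Hypothetical model of the inconsistency, not ResultSpecification_Impl itself.
class ContainsTypeSketch {
  private final Map<String, String> supertype;  // toy type system: type -> supertype
  private final Set<String> types = new HashSet<>();

  ContainsTypeSketch(Map<String, String> supertype) { this.supertype = supertype; }

  void addType(String type) { types.add(type); }

  // mimics compile(): add every type whose supertype chain reaches a listed type
  void compile() {
    for (String t : supertype.keySet())
      for (String s = supertype.get(t); s != null; s = supertype.get(s))
        if (types.contains(s)) { types.add(t); break; }
  }

  // a plain set lookup: exact match before compile(), subtype-aware only after it
  boolean containsType(String type) { return types.contains(type); }
}
```

Because the lookup itself never changes, the answer depends only on whether compile() has run, which is the inconsistency being discussed.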
Re: capabilityLangugaeFlow - computeResultSpec
When looking at the tests for the capability language flow, I see two tests: one with the result spec argument in the process() method and one without. In older UIMA versions, when using the debugger, I see that both constructors are used there. -- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Marshall Schor wrote: I'm thinking of simplifying the CapabilityContainer class. Right now it has code to process input as well as output capabilities, but the input ones appear never to be used. Can anyone confirm that? If confirmed, I would propose to remove the part related to input capabilities. Currently I think that is true. The idea behind this CapabilityContainer was that maybe someone could create a sophisticated flow that computes the best sequence for the engines based on their input and output capabilities... But if that is needed, we can add the input capabilities again. :-) There is a HashMap, outputToFCapability, whose keys are Strings corresponding to an output type-or-feature name, for any language, for any capability-set. The values do not seem to be used. I'd like to replace this with a HashSet. Any objections? Yes, that seems to be correct. -- Michael
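Since only the keys of outputToFCapability are consulted, the proposed change amounts to collecting output names into a set. The Capability holder and OutputNamesSketch below are hypothetical stand-ins, not the UIMA capability classes.

```java
import java.util.*;

// Hypothetical sketch of replacing a map whose values are never read with a set of its keys.
class OutputNamesSketch {
  // toy stand-in for one capability set: its languages and output type/feature names
  static final class Capability {
    final List<String> languages;
    final List<String> outputs;
    Capability(List<String> languages, List<String> outputs) {
      this.languages = languages;
      this.outputs = outputs;
    }
  }

  // every output type-or-feature name, for any language, for any capability set;
  // languages are deliberately ignored, mirroring the "for any language" keys
  static Set<String> outputNames(List<Capability> capabilities) {
    Set<String> names = new HashSet<>();
    for (Capability c : capabilities) names.addAll(c.outputs);
    return names;
  }
}
```

A membership test such as names.contains("Token") then answers the only question the map was being used for.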
Re: capabilityLangugaeFlow - computeResultSpec
OK. This would confirm that the other constructor is no longer needed, since the test that passes a result-spec arg in the process method no longer calls that. Thanks. -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Easy to see - just trace the test case... -Marshall Michael Baessler wrote: But it would still be interesting to know why this is never needed and how it works now. -- Michael
Re: capabilityLangugaeFlow - computeResultSpec
I did this trace. Here's how it works now, without calling this constructor. The process(cas, result-spec) call goes to AggregateAnalysisEngine_Impl, which calls setResultSpecification on the AEEngine_impl object, which 1) clones the result-spec object, 2) adds capabilities to it from the *inputs* of all components of this aggregate, and 3) uses this one cloned object as the result spec passed down to each component. Before going further - Michael - a question: isn't this union-with-all-inputs behavior something you didn't want for capability language flow? Or maybe it doesn't matter, because real applications that use capability language flow don't pass a result spec in the top-level call to the analysis engine's process method? -Marshall
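The three steps Marshall traced (clone, union with all delegates' inputs, share one object) can be sketched with plain sets standing in for result specs. AggregateResultSpecSketch and its distribute method are hypothetical; only the three steps come from the trace.

```java
import java.util.*;

// Hypothetical model of the traced behavior, not the UIMA implementation.
class AggregateResultSpecSketch {
  static Map<String, Set<String>> distribute(Set<String> callerSpec, Map<String, Set<String>> delegateInputs) {
    // 1) clone, so the caller's spec is left untouched
    Set<String> shared = new TreeSet<>(callerSpec);
    // 2) add the inputs of every delegate in the aggregate
    for (Set<String> inputs : delegateInputs.values()) shared.addAll(inputs);
    // 3) hand the same object to every delegate
    Set<String> view = Collections.unmodifiableSet(shared);
    Map<String, Set<String>> perDelegate = new TreeMap<>();
    for (String delegate : delegateInputs.keySet()) perDelegate.put(delegate, view);
    return perDelegate;
  }
}
```

Every delegate therefore sees the union of the caller's outputs and all delegates' inputs, which is the behavior Michael is asked about.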
Re: capabilityLangugaeFlow - computeResultSpec
OK, will do. -- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Fine with me. That seems to be the way it worked in the past, so we should not change it! -- Michael Marshall Schor wrote: Given that (as far as I can tell - let's see, that would be AFAICT) the resultSpec is *always* used in compiled mode (because the wrapper always compiles it), the current implementation would have the effect that 1) the allFeatures flag would work, and 2) subtypes of a type specified in the resultSpec would also be implicitly in the resultSpec. Therefore, to keep the implementation behavior constant (a good thing to try for, always :-) ) we should ensure any changes continue to exhibit this behavior, and update the Javadocs and documentation to reflect this. Other opinions? -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Here's the trace of how this works, when run from a top-level process(cas) call: the call goes to the AnalysisEngine_Impl process method, which calls processAndOutputNewCASes in the same object. This calls the ASB_impl process method, which creates a new AggregateCasIterator(aCAS). This constructor calls computeFlow on the ...asb.impl.FlowControllerContainer object, which calls the particular flow controller's computeFlow method. In this case, the flow controller is the CapabilityLanguageFlowController. Since this is a new CAS coming in to the aggregate, the computeFlow method makes a new CapabilityLanguageFlowObject, passing in the pre-computed Flow Table. So that's how it uses this constructor, in the case where no specific result spec is passed. -Marshall
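The traced path ends with the flow object being built from the precomputed flow table rather than from a result spec. A hypothetical sketch of that shape follows; the class names, the table layout (language to ordered delegate keys), and the fallback to x-unspecified for unknown languages are assumptions, not taken from the UIMA source.

```java
import java.util.*;

// Hypothetical sketch: the table is computed once; each new CAS just gets a flow object
// that walks the precomputed sequence for its language.
class PrecomputedFlowSketch {
  static final String UNSPECIFIED = "x-unspecified";

  // toy flow object: yields delegate keys in order
  static final class FlowObject implements Iterator<String> {
    private final Iterator<String> steps;
    FlowObject(List<String> steps) { this.steps = steps.iterator(); }
    public boolean hasNext() { return steps.hasNext(); }
    public String next() { return steps.next(); }
  }

  private final Map<String, List<String>> flowTable;  // language -> ordered delegate keys
  PrecomputedFlowSketch(Map<String, List<String>> flowTable) { this.flowTable = flowTable; }

  // analogous to computeFlow for a new CAS: no result spec is consulted, only the table;
  // falling back to x-unspecified for unknown languages is an assumption of this sketch
  FlowObject computeFlow(String documentLanguage) {
    List<String> steps = flowTable.getOrDefault(documentLanguage,
        flowTable.getOrDefault(UNSPECIFIED, List.of()));
    return new FlowObject(steps);
  }
}
```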
Re: capabilityLangugaeFlow - computeResultSpec
So in the older version of the capabilityLanguageFlow the inputs were not recognized. But I think it is not bad that these are added automatically, since the flow can't work if they are missing! -- Michael
Re: capabilityLangugaeFlow - computeResultSpec
On Jan 23, 2008 8:06 AM, Michael Baessler [EMAIL PROTECTED] wrote: In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) constructor was used when the result spec was set by an application calling the process method with the resultSpec argument. In the current version it seems that only the constructor with the precomputed FlowTable is used. But I can't say whether that is correct, since I don't know the details of the ResultSpec restructuring (maybe only Adam knows). I think no one knows exactly. This area of the code grew somewhat organically to address requirements over time. I don't think I ever fully understood how CapabilityLanguageFlow was implemented. When I was adding the custom flow controller in v2.0, I did my best to port whatever behavior was there and make sure all the test cases passed. It turned out we were missing some important test cases, though, and that's how we came around to adding the SimpleStepWithResultSpec class in order to replicate the old behavior. I think the key thing is to make sure we have the right test cases in place to be sure we're preserving backward compatibility, and then I'm all for having Marshall clean up the code so it makes more sense. -Adam
Re: capabilityLangugaeFlow - computeResultSpec
Eddie - this is for you to check I think: There is code in UimacppEngine in method serializeResultSpecification which adds result spec types and features to 2 IntVector arrays (one for Types, one for Features). As currently designed, these miss getting the subtypes of types, and all the features for types marked with the all-features flag in the capabilities. Are these required here? Also, I notice that the result spec supports languages - but the serialization for this doesn't support languages. Is that intended? -Marshall
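For illustration, here is a minimal sketch of what "getting the subtypes of types, and all the features for all-features types" could mean before the two arrays are filled. It is not the UimacppEngine code. ToyTypeSystem, ResultSpecSerializer and TypeEntry are hypothetical. Plain int[] arrays stand in for IntVector, and the type system is a toy (integer codes, one parent per type) rather than the UIMA TypeSystem API. It also assumes that an all-features flag on a type pulls in the features declared on its subtypes. Languages are left out, as in the current serialization.

```java
import java.util.*;

/** Hypothetical toy type system: integer type codes, at most one parent per type,
 *  and the feature codes declared directly on each type. Not the UIMA TypeSystem API. */
class ToyTypeSystem {
  private final Map<Integer, Integer> parentOf = new HashMap<>();
  private final Map<Integer, List<Integer>> declared = new HashMap<>();

  void addType(int type, Integer parent, List<Integer> features) {
    if (parent != null) parentOf.put(type, parent);
    declared.put(type, features);
  }

  /** The given type plus all of its direct and indirect subtypes. */
  Set<Integer> typeAndSubtypes(int type) {
    Set<Integer> result = new TreeSet<>();
    for (int candidate : declared.keySet()) {
      // Walk up from the candidate; reaching `type` means the candidate is subsumed by it.
      for (Integer cur = candidate; cur != null; cur = parentOf.get(cur)) {
        if (cur == type) { result.add(candidate); break; }
      }
    }
    return result;
  }

  List<Integer> featuresOf(int type) {
    return declared.getOrDefault(type, List.of());
  }
}

/** Hypothetical stand-in for filling the two arrays: returns {typeCodes, featureCodes}. */
class ResultSpecSerializer {
  /** A result-spec type entry: a type code plus its all-features flag. */
  static final class TypeEntry {
    final int type;
    final boolean allFeatures;
    TypeEntry(int type, boolean allFeatures) { this.type = type; this.allFeatures = allFeatures; }
  }

  static int[][] serialize(ToyTypeSystem ts, List<TypeEntry> types, List<Integer> explicitFeatures) {
    Set<Integer> typeCodes = new TreeSet<>();
    Set<Integer> featureCodes = new TreeSet<>(explicitFeatures);
    for (TypeEntry e : types) {
      for (int t : ts.typeAndSubtypes(e.type)) {
        typeCodes.add(t);                                         // include every subtype
        if (e.allFeatures) featureCodes.addAll(ts.featuresOf(t)); // expand the all-features flag
      }
    }
    return new int[][] {
      typeCodes.stream().mapToInt(Integer::intValue).toArray(),
      featureCodes.stream().mapToInt(Integer::intValue).toArray()
    };
  }

  public static void main(String[] args) {
    ToyTypeSystem ts = new ToyTypeSystem();
    ts.addType(1, null, List.of(10, 11)); // Annotation
    ts.addType(2, 1, List.of(20));        // Person
    ts.addType(3, 2, List.of(30));        // Employee, a subtype of Person
    ts.addType(4, 1, List.of(40));        // Token
    int[][] out = serialize(ts, List.of(new TypeEntry(2, true)), List.of(40));
    System.out.println(Arrays.toString(out[0]) + " " + Arrays.toString(out[1]));
    // prints [2, 3] [20, 30, 40]
  }
}
```

Expanding on the Java side keeps the C++ side a plain membership test. The cost is larger arrays when a type has many subtypes.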
Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification
I'll fix the Javadocs to correspond to what the code does. As a result, addResultFeature(1-feature, languages) will *add* to the existing languages, while addResultFeature(1-feature) will *replace* all existing languages with x-unspecified. -Marshall Marshall Schor wrote: I'm doing a redesign of the result spec area to improve performance. The basic idea is to put a hasBeenChanged flag into the result spec object, and to use it being false to let users avoid recomputing things. Why not use equals()? Because a single result spec object is shared among multiple users, and when it is updated, it is updated in place (so there is no other object to compare it to). Looking at the ResultSpec object: it has a hashMap that stores the Types and Features (TypeOrFeature objects) as the keys; the values are hashSets holding the languages for which these types and features are in the result spec. (There is a special hash set having just the entry of the default language = UNSPECIFIED_LANGUAGE = x-unspecified.) I'm going to try to make the default language hash set a constant, and create just one instance of it; this should improve performance, especially when languages are not being used. There are 2 kinds of methods to add types/features to a result spec: ones with language(s) and ones without. The ones without reset any language spec associated with the type or feature(s) to the UNSPECIFIED_LANGUAGE. The ones with a language sometimes replace the language associated with the type/feature, and other times add the language (assuming the type/feature is already an entry in the hashMap of types and features).
Methods which replace any existing languages:
- setResultTypesAndFeatures(array of TypeOrFeature): replace with x-unspecified language
- setResultTypesAndFeatures(array of TypeOrFeature, languages): replace with languages
- addResultTypeOrFeature(1-TypeOrFeature): replace with x-unspecified language
- addResultTypeOrFeature(1-TypeOrFeature, languages): replace with languages
- addResultType(String, boolean): replace with x-unspecified language
- addResultFeature(1-feature, languages): replace with languages
Methods which add to existing languages:
- addResultType(1-type, boolean, languages): adds languages
- addResultFeature(1-feature): adds x-unspecified
The set... method essentially clears the result spec and sets it with completely new information, so it is reasonable that it replaces any existing language information. The addResult methods, when used to add a type or feature which is already present, are inconsistent, with some methods adding and others replacing. This behavior is documented in the JavaDocs for the class. The JavaDocs have the behavior for adding a Feature by name reversed relative to the behavior for adding a Type by name: in one case, including the language is treated as a replace, in the other as an add. This seems likely to be a bug in the Javadocs. The code for addResultFeature is reversed from the Javadocs: the code will add languages if they are specified, but replaces them (with x-unspecified) if languages are not specified in the method call. Does anyone know what the correct behavior of these methods is supposed to be? -Marshall
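A minimal sketch of the storage idea described above: a map from type-or-feature name to a language set, one shared immutable default set for x-unspecified, and copy-on-write when languages are added to an entry that still points at the shared constant. It follows the behavior settled on in this thread (with languages: add; without: replace). ToyResultSpec and its methods are hypothetical and use plain String names instead of TypeOrFeature objects. This is not ResultSpecification_impl.

```java
import java.util.*;

/** Hypothetical model: type-or-feature name -> set of languages. Not ResultSpecification_impl. */
class ToyResultSpec {
  static final String UNSPECIFIED_LANGUAGE = "x-unspecified";

  /** One shared, immutable instance for the common "no languages" case. */
  static final Set<String> DEFAULT_LANGUAGES = Collections.singleton(UNSPECIFIED_LANGUAGE);

  private final Map<String, Set<String>> languagesByName = new HashMap<>();

  /** No languages given: replace any existing languages with the shared default set. */
  void add(String name) {
    languagesByName.put(name, DEFAULT_LANGUAGES);
  }

  /** Languages given: add them to whatever languages the entry already has. */
  void addWithLanguages(String name, String... languages) {
    Set<String> existing = languagesByName.get(name);
    Set<String> target;
    if (existing == null) {
      target = new HashSet<>();
    } else if (existing == DEFAULT_LANGUAGES) {
      target = new HashSet<>(existing); // copy-on-write: never touch the shared constant
    } else {
      target = existing;                // a set this spec already owns
    }
    target.addAll(Arrays.asList(languages));
    languagesByName.put(name, target);
  }

  Set<String> languages(String name) {
    return Collections.unmodifiableSet(languagesByName.getOrDefault(name, Collections.<String>emptySet()));
  }

  boolean usesSharedDefault(String name) {
    return languagesByName.get(name) == DEFAULT_LANGUAGES;
  }

  public static void main(String[] args) {
    ToyResultSpec rs = new ToyResultSpec();
    rs.add("Person");                    // {x-unspecified}, shared instance
    rs.addWithLanguages("Person", "en"); // {x-unspecified, en}, private copy
    System.out.println(rs.languages("Person").size() + " " + DEFAULT_LANGUAGES.size()); // prints 2 1
  }
}
```

Entries that never get a language all point at the same instance. That is where the hoped-for saving comes from when languages are not used.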
Re: capabilityLangugaeFlow - computeResultSpec
Some corner cases. Case 1: If using the method to alter an existing result spec by adding a single type with an associated set of languages, the passed in allAnnotatorFeatures boolean will now be unioned with any existing setting of this. Javadocs updated to reflect this. Case 2: If you have a capability for language 1 which says output type A (not all features), and have another capability for language 2 which says output type A (allAnnotatorFeatures), this will be represented in the result spec by having language 1 also be for all features. Case 3: when setting the result spec, passing null in as the value of the languages (for those set/add things that take language arrays) will be equivalent to passing in the one language x-unspecified. So, in particular, if a spec says produce type A for lang 1 and 2, and then you use the addResultType(for type A, null-passed-in-for-language-spec) this will add the language x-unspecified for type A. I will attempt to document these in the Javadocs. Please post a response if these corner cases need to be handled differently. -Marshall
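The three corner cases can be made concrete with a small sketch. It assumes that each type has one entry holding its languages plus a single allAnnotatorFeatures flag shared by all of those languages, which is why case 2 ends up giving language 1 all features as well. CornerCaseSpec is hypothetical. Its addResultType is loosely named after the method discussed here and is not the real signature.

```java
import java.util.*;

/** Hypothetical per-type entry: a language set plus one allAnnotatorFeatures flag. */
class CornerCaseSpec {
  static final String UNSPECIFIED_LANGUAGE = "x-unspecified";

  private static final class Entry {
    final Set<String> languages = new HashSet<>();
    boolean allAnnotatorFeatures;
  }

  private final Map<String, Entry> entries = new HashMap<>();

  void addResultType(String type, boolean allAnnotatorFeatures, String[] languages) {
    // Case 3: a null language array is treated as { "x-unspecified" }.
    String[] langs = (languages == null) ? new String[] { UNSPECIFIED_LANGUAGE } : languages;
    Entry e = entries.computeIfAbsent(type, k -> new Entry());
    e.languages.addAll(Arrays.asList(langs));
    // Cases 1 and 2: the flag is unioned (OR-ed) and applies to every language of the type.
    e.allAnnotatorFeatures |= allAnnotatorFeatures;
  }

  boolean allAnnotatorFeatures(String type) {
    Entry e = entries.get(type);
    return e != null && e.allAnnotatorFeatures;
  }

  Set<String> languages(String type) {
    Entry e = entries.get(type);
    return (e == null) ? Collections.<String>emptySet() : Collections.unmodifiableSet(e.languages);
  }

  public static void main(String[] args) {
    CornerCaseSpec rs = new CornerCaseSpec();
    rs.addResultType("A", false, new String[] { "lang1" }); // capability 1: A, not all features
    rs.addResultType("A", true, new String[] { "lang2" });  // capability 2: A, all features
    rs.addResultType("A", false, null);                     // null languages
    System.out.println(rs.allAnnotatorFeatures("A") + " " + new TreeSet<>(rs.languages("A")));
    // prints true [lang1, lang2, x-unspecified]
  }
}
```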
Re: capabilityLangugaeFlow - computeResultSpec
In looking through the code for ResultSpecification_Impl, there seems to be an inconsistency, unless I (quite possibly :-) ) missed something. Calls to the containsType(...) method operate in one of 2 ways, depending on whether or not the result specification has been compiled by calling the compile method. If the result spec has not been compiled, then containsType(...) returns true iff the specified type is equal (via equals(...)) to a type in the Result Specification. If it has been compiled, then containsType returns true iff the specified type is equal to a type *or any of its subtypes* in the Result Specification. This is because compiling a result specification adds the subtypes. Can others confirm this? In actual use within annotators, it may be that the result spec is always compiled before use (I haven't traced that down yet). Should the code and Javadocs be updated so that containsType always returns true for subtypes of types in the result spec? -Marshall
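One way to make containsType consistent is to have the uncompiled path walk up the supertype chain. It then gives the same answer that the compiled closure gives. The sketch below assumes types are plain names with a single-parent map. SubsumingTypeLookup is hypothetical and does not use the UIMA TypeSystem API.

```java
import java.util.*;

/** Hypothetical lookup whose containsType answer does not depend on whether compile() ran. */
class SubsumingTypeLookup {
  private final Map<String, String> parentOf;         // type name -> supertype name
  private final Set<String> specTypes = new HashSet<>();
  private Set<String> compiled;                       // spec types plus subtypes; null when stale

  SubsumingTypeLookup(Map<String, String> parentOf) {
    this.parentOf = new HashMap<>(parentOf);
  }

  void addType(String type) {
    specTypes.add(type);
    compiled = null; // any change invalidates the precomputed closure
  }

  /** Precomputes the closure so later lookups are a single hash probe. */
  void compile() {
    Set<String> closure = new HashSet<>(specTypes);
    for (String type : parentOf.keySet()) {
      if (hasSpecAncestorOrSelf(type)) closure.add(type);
    }
    compiled = closure;
  }

  /** True iff the type is in the spec or is a subtype of a type in the spec. */
  boolean containsType(String type) {
    return (compiled != null) ? compiled.contains(type) : hasSpecAncestorOrSelf(type);
  }

  private boolean hasSpecAncestorOrSelf(String type) {
    for (String cur = type; cur != null; cur = parentOf.get(cur)) {
      if (specTypes.contains(cur)) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    Map<String, String> parents = Map.of("Entity", "Annotation", "Person", "Entity",
        "Employee", "Person", "Token", "Annotation");
    SubsumingTypeLookup rs = new SubsumingTypeLookup(parents);
    rs.addType("Person");
    boolean before = rs.containsType("Employee");
    rs.compile();
    System.out.println(before + " " + rs.containsType("Employee") + " " + rs.containsType("Entity"));
    // prints true true false
  }
}
```

Compiling then becomes purely a performance optimization rather than something that changes results.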
Re: capabilityLangugaeFlow - computeResultSpec
I'm thinking of simplifying the CapabilityContainer class. Right now it has code to process input as well as output capabilities, but the input ones appear never to be used. Can anyone confirm that? If confirmed, I would propose removing the part related to input capabilities. There is a HashMap, outputToFCapability, whose keys are Strings corresponding to an output type-or-feature name, for any language, for any capability set. The values do not seem to be used. I'd like to replace this with a HashSet. Any objections? -Marshall
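When a map's values are never read, a set of its keys carries the same information. The sketch below builds that set from hypothetical capability objects. ToyCapability and OutputIndex are illustrative names, not CapabilityContainer's API. Feature names use the "Type:feature" form only as an assumption for the example.

```java
import java.util.*;

/** Hypothetical capability shape: languages plus output type-or-feature names. */
final class ToyCapability {
  final List<String> languages;
  final List<String> outputs;

  ToyCapability(List<String> languages, List<String> outputs) {
    this.languages = languages;
    this.outputs = outputs;
  }
}

/** Membership-only index over all output names, for any language and any capability set. */
class OutputIndex {
  private final Set<String> outputNames = new HashSet<>();

  OutputIndex(List<ToyCapability> capabilities) {
    for (ToyCapability c : capabilities) outputNames.addAll(c.outputs);
  }

  boolean isOutput(String typeOrFeatureName) {
    return outputNames.contains(typeOrFeatureName);
  }

  public static void main(String[] args) {
    OutputIndex idx = new OutputIndex(List.of(
        new ToyCapability(List.of("en"), List.of("Person", "Person:age")),
        new ToyCapability(List.of("de"), List.of("Organization"))));
    System.out.println(idx.isOutput("Person:age") + " " + idx.isOutput("Token")); // prints true false
  }
}
```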
Re: capabilityLangugaeFlow - computeResultSpec
Yes, I think so. This test dumps the result spec for each AE to a file to check whether it was computed correctly. The computation of the result spec is done during the initialization of the aggregate AE, when the capability language flow is created. The precomputed result spec could later be used during document processing, but currently it is not: it is recomputed each time. For my simple performance test I removed the second computation that is done during runtime processing (PrimitiveAnalysisEngine_impl.java: protected ResultSpecification computeAnalysisComponentResultSpec()), so the originally computed result spec is used. But we cannot remove this code completely, since a result spec may be provided by the application, and then it must be recomputed dynamically. -- Michael Marshall Schor wrote: Michael - I'm confused about how this test is set up. The test descriptor this code uses loads an aggregate, and then runs a process method which ends up calling some dummy process method called SequencerTestAnnotator. This process method dumps (to a file) the result spec. Is that the case you're running? How do you turn the (re)computation of the result spec on and off? -Marshall Michael Baessler wrote: Michael Baessler wrote: Adam Lally wrote: On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] wrote: I tried to figure out how the ResultSpecification handling in uima-core works, with all its side effects, to check how we could detect when a ResultSpec has changed. Unfortunately I was not able to; there are too many open questions where I don't know exactly whether it is right in every case ... :-( Adam, can you please look at this issue? I can try to take a look, but I don't have a lot of time. Do you have a test case for this, where you expect I would see a significant performance improvement if I fix this? Sorry, I have no performance test case. I checked my assumption using the debugger.
I used the following main() with a loop over the process call to check whether the result spec is recomputed each time. The descriptor is the same as the one used in the capabilityLanguageFlow test case of the uimaj-core project. Maybe a sysout helps to detect whether the unnecessary calls are made. Iterating more than 10 times may give you performance numbers before and after. Adding more capabilities that must be analyzed may also increase the time used to compute the result spec. I will look at this tomorrow. public static void main(String[] args) { AnalysisEngine ae = null; try { String desc = "SequencerCapabilityLanguageAggregateES.xml"; XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc)); ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(in); ae = UIMAFramework.produceAnalysisEngine(specifier, null, null); CAS cas = ae.newCAS(); String text = "Hello world!"; cas.setDocumentText(text); cas.setDocumentLanguage("en"); for (int i = 0; i < 10; i++) { ae.process(cas); } } catch (Exception ex) { ex.printStackTrace(); } } -- Michael When I set the loop counter to 1000, I get 6000ms without recomputing the result spec and 27000ms when recomputing it. I think this should be sufficient for testing. -- Michael
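The reported timings translate into a per-call cost. This back-of-the-envelope arithmetic assumes that the only difference between the two runs is the recomputation. Under that assumption, recomputing costs about 21 ms per process() call, roughly 78% (21000 of 27000 ms) of the slower run. RecomputeCost is a hypothetical helper.

```java
/** Arithmetic on the timings reported above (1000 calls: 6000 ms vs. 27000 ms). */
class RecomputeCost {
  /** Extra milliseconds per process() call attributable to recomputing the result spec. */
  static double overheadPerCallMs(long withoutMs, long withMs, int iterations) {
    return (double) (withMs - withoutMs) / iterations;
  }

  /** How many times slower the run with recomputation is. */
  static double slowdown(long withoutMs, long withMs) {
    return (double) withMs / withoutMs;
  }

  public static void main(String[] args) {
    System.out.println(overheadPerCallMs(6000, 27000, 1000)); // prints 21.0
    System.out.println(slowdown(6000, 27000));                // prints 4.5
  }
}
```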
Re: capabilityLangugaeFlow - computeResultSpec
Adam Lally wrote: On Dec 18, 2007 8:55 AM, Michael Baessler [EMAIL PROTECTED] wrote: Hi, I got a request saying that the computation of the result spec for the capabilityLanguageFlow takes too much time. I looked at the code and found something interesting... maybe I'm wrong, I'm not sure. When looking at ASB_impl.java at processUntilNextOutputCas() I found the following: //check if we have to set result spec, to support capability language flow if (nextStep instanceof SimpleStepWithResultSpec) { ResultSpecification rs = ((SimpleStepWithResultSpec)nextStep).getResultSpecification(); if (rs != null) { nextAe.setResultSpecification(rs); } } // invoke next AE in flow CasIterator casIter = null; CAS outputCas = null; //used if the AE we call outputs a new CAS try { casIter = nextAe.processAndOutputNewCASes(cas); When a capabilityLanguageFlow is used, the ResultSpec for the flow engines is precomputed if possible. The code above takes this precomputed ResultSpec from the flow node and sets it for the current AE. When I go deeper into casIter = nextAe.processAndOutputNewCASes(cas); I found the following in the callAnalysisComponentProcess() method of the PrimitiveAnalysisEngine_impl.java class: if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) { mLastTypeSystem = view.getTypeSystem(); mCurrentResultSpecification.compile(mLastTypeSystem); // the actual ResultSpec we send to the component is formed by // looking at this primitive AE's declared output types and eliminating // any that are not in mCurrentResultSpecification.
ResultSpecification analysisComponentResultSpec = computeAnalysisComponentResultSpec( mCurrentResultSpecification, getAnalysisEngineMetaData().getCapabilities()); // compile result spec - necessary to get type subsumption to work properly analysisComponentResultSpec.compile(mLastTypeSystem); mAnalysisComponent.setResultSpecification(analysisComponentResultSpec); mResultSpecChanged = false; } Any time the ResultSpec has changed, it is recomputed. But the ResultSpec is marked as changed any time setResultSpecification() is called. So what does this mean? The first code fragment in this email shows how the ResultSpec is taken from the flow controller and set on the AE, so the result spec is marked as changed. The second code fragment shows what is executed when the ResultSpec has been changed, and how it is recomputed. This means that the ResultSpec is recomputed each time process is called. I don't think this is necessary. That seems like a good analysis of the situation. I think what we need is to detect when the ResultSpecification has actually changed and when it hasn't. That might be tricky to do right. If we just check whether the new ResultSpecification is == to the existing ResultSpecification, that wouldn't work if the ResultSpecification had been modified in place (it would be == but the contents wouldn't be the same). Perhaps we could add a dirty flag to the ResultSpecification to catch this. I tried to figure out how the ResultSpecification handling in uima-core works, with all its side effects, to check how we could detect when a ResultSpec has changed. Unfortunately I was not able to; there are too many open questions where I don't know exactly whether it is right in every case ... :-( Adam, can you please look at this issue? Thanks Michael
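Here is a minimal sketch of detecting an actual change rather than any call to setResultSpecification(). It swaps the suggested dirty flag for a modification counter. When one spec object is shared by several engines, a boolean cleared by the first engine would hide the change from the others, whereas each engine can remember the last version it saw. VersionedSpec and CachingEngine are hypothetical. recomputeCount stands in for the compile() plus computeAnalysisComponentResultSpec(...) work.

```java
import java.util.*;

/** Hypothetical result spec that counts its own modifications. */
class VersionedSpec {
  private final Set<String> types = new HashSet<>();
  private long version;

  void addType(String type) {
    if (types.add(type)) version++; // only real changes bump the version
  }

  long version() { return version; }
}

/** Hypothetical engine that recomputes its component spec only when something differs. */
class CachingEngine {
  private VersionedSpec current;
  private VersionedSpec lastSpec;
  private long lastVersion = -1;
  private Object lastTypeSystem;
  int recomputeCount; // demonstration only

  void setResultSpecification(VersionedSpec spec) {
    current = spec; // no unconditional "changed" flag here
  }

  void process(Object typeSystem) {
    if (current == null) return;
    boolean changed = current != lastSpec          // a different object was set
        || current.version() != lastVersion        // same object, modified in place
        || typeSystem != lastTypeSystem;           // the type system changed
    if (changed) {
      recomputeCount++; // stands in for compile() + computeAnalysisComponentResultSpec(...)
      lastSpec = current;
      lastVersion = current.version();
      lastTypeSystem = typeSystem;
    }
  }

  public static void main(String[] args) {
    VersionedSpec spec = new VersionedSpec();
    spec.addType("Person");
    CachingEngine ae = new CachingEngine();
    Object typeSystem = new Object();
    for (int i = 0; i < 1000; i++) {
      ae.setResultSpecification(spec); // what the flow does before every call
      ae.process(typeSystem);
    }
    System.out.println(ae.recomputeCount); // prints 1
  }
}
```

The == check alone would miss in-place edits. The version check catches them, and setting the same unchanged spec repeatedly no longer triggers recomputation.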
Re: capabilityLangugaeFlow - computeResultSpec
On Dec 18, 2007 8:55 AM, Michael Baessler [EMAIL PROTECTED] wrote: Hi, I got a request saying that the computation of the result spec for the capabilityLanguageFlow takes too much time. I looked at the code and found something interesting... maybe I'm wrong, I'm not sure. When looking at ASB_impl.java at processUntilNextOutputCas() I found the following: //check if we have to set result spec, to support capability language flow if (nextStep instanceof SimpleStepWithResultSpec) { ResultSpecification rs = ((SimpleStepWithResultSpec)nextStep).getResultSpecification(); if (rs != null) { nextAe.setResultSpecification(rs); } } // invoke next AE in flow CasIterator casIter = null; CAS outputCas = null; //used if the AE we call outputs a new CAS try { casIter = nextAe.processAndOutputNewCASes(cas); When a capabilityLanguageFlow is used, the ResultSpec for the flow engines is precomputed if possible. The code above takes this precomputed ResultSpec from the flow node and sets it for the current AE. When I go deeper into casIter = nextAe.processAndOutputNewCASes(cas); I found the following in the callAnalysisComponentProcess() method of the PrimitiveAnalysisEngine_impl.java class: if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) { mLastTypeSystem = view.getTypeSystem(); mCurrentResultSpecification.compile(mLastTypeSystem); // the actual ResultSpec we send to the component is formed by // looking at this primitive AE's declared output types and eliminating // any that are not in mCurrentResultSpecification.
ResultSpecification analysisComponentResultSpec = computeAnalysisComponentResultSpec( mCurrentResultSpecification, getAnalysisEngineMetaData().getCapabilities()); // compile result spec - necessary to get type subsumption to work properly analysisComponentResultSpec.compile(mLastTypeSystem); mAnalysisComponent.setResultSpecification(analysisComponentResultSpec); mResultSpecChanged = false; } Any time the ResultSpec has changed, it is recomputed. But the ResultSpec is marked as changed any time setResultSpecification() is called. So what does this mean? The first code fragment in this email shows how the ResultSpec is taken from the flow controller and set on the AE, so the result spec is marked as changed. The second code fragment shows what is executed when the ResultSpec has been changed, and how it is recomputed. This means that the ResultSpec is recomputed each time process is called. I don't think this is necessary. That seems like a good analysis of the situation. I think what we need is to detect when the ResultSpecification has actually changed and when it hasn't. That might be tricky to do right. If we just check whether the new ResultSpecification is == to the existing ResultSpecification, that wouldn't work if the ResultSpecification had been modified in place (it would be == but the contents wouldn't be the same). Perhaps we could add a dirty flag to the ResultSpecification to catch this. Beyond that, it seems to me that the ResultSpec mCurrentResultSpecification and the computed ResultSpec analysisComponentResultSpec have the same content. Not in all cases. The computeAnalysisComponentResultSpec() method does an intersection of the ResultSpec with the component's output capabilities. I suppose with CapabilityLanguageFlow, it would never output any type that's not in the component's output capabilities. However, think of the case of a nested aggregate where CapabilityLanguageFlow is used in the outermost aggregate.
This would cause setResultSpecification to be called on the sub-aggregate. That in turn causes the ResultSpecification for each annotator to be computed by the intersection of the sub-aggregate's ResultSpecification with that annotator's output capabilities. -Adam
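The nested-aggregate case shows why the two specs can differ: each level intersects what it was asked for with what it declares as output. The following sketch reduces a result spec to a set of type names and ignores languages, features and subsumption. IntersectionSketch.intersect is a hypothetical simplification of computeAnalysisComponentResultSpec, not its real signature.

```java
import java.util.*;

/** Hypothetical simplification: a result spec is just a set of type names. */
class IntersectionSketch {
  /** Keep only the requested names that the component declares as outputs. */
  static Set<String> intersect(Set<String> requested, Set<String> declaredOutputs) {
    Set<String> result = new TreeSet<>(requested);
    result.retainAll(declaredOutputs);
    return result;
  }

  public static void main(String[] args) {
    Set<String> outer = Set.of("Person", "Token", "Sentence");          // set by the outer flow
    Set<String> subAggregate = intersect(outer, Set.of("Person", "Token", "Organization"));
    Set<String> annotator = intersect(subAggregate, Set.of("Token", "Lemma"));
    System.out.println(subAggregate); // prints [Person, Token]
    System.out.println(annotator);    // prints [Token]
  }
}
```

The annotator's spec ([Token]) is strictly smaller than the sub-aggregate's ([Person, Token]), so mCurrentResultSpecification and analysisComponentResultSpec need not match.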