
Re: capabilityLangugaeFlow - computeResultSpec

2008-03-22 Thread Eddie Epstein

Very possible that results specification doesn't work correctly
through the JNI. Nobody has ever used them in C++ since I've been
working with it.

Eddie

On Wed, Jan 23, 2008 at 4:02 PM, Marshall Schor [EMAIL PROTECTED] wrote:
 Eddie - this is for you to check I think:

  There is code in UimacppEngine in method serializeResultSpecification
  which adds result spec types and features to 2 IntVector arrays (one for
  Types, one for Features).  As currently designed, these miss getting
  the subtypes of types, and all the features for types marked with the
  all-features flag in the capabilities.

  Are these required here?

  Also, I notice that the result spec supports languages - but the
  serialization for this doesn't support languages.  Is that intended?

  -Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-02-01 Thread Marshall Schor


LeHouillier, Frank D. wrote:

While making this change wouldn't affect us in any way as I can see now,
it would still be possible to use the Features in the Result Spec in a
similar way.  


Suppose you have an information extraction component that extracts
entities with attributes and you want to control which attributes are
actually being added to the CAS with the Result Spec.  You might have
type Person, with a range of features such as Address, Phone number,
Age, etc. some of which you want to output in a given configuration and
others not.  Suppose the information extraction component also extracts
attributes which are so useless that you don't include them as features
in the type system at all such as an internal id number.  Currently,
with a compiled Result Spec you could have the annotator look up the
feature on the basis of the name of the feature and then you could
reliably instantiate the feature without further ado.  After your
change, the feature would have to be checked to see if it actually
exists.  
  
We added code in the actual change that now checks to see if the feature 
actually exists (for a compiled Result Spec).  I thought it was better 
to preserve the status quo here, rather than remove this check (for 
performance reasons).  It didn't seem like it would have any measurable 
performance impact - it's one hash table lookup, basically.


Cheers. -Marshall

Again, this doesn't seem like it is that big a deal to me but I thought
I might just point out that it might have a use case.  In practice, it
seems to me that most annotators figure out the features available
either during compilation by using the JCas or during the initialization
of the Annotator.  


-Original Message-
From: Marshall Schor [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 25, 2008 3:57 PM

To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

LeHouillier, Frank D. wrote:
  

We have an annotator that wraps a black box information extraction
component that can return objects of a variety of types.  We check the
result specification to see if the object is something we want to output
based on the actual string of the name of the type.  If you take away the
compiled version of the ResultSpecification then we will have to also
check whether the type that we get back from the type system is null or
not.


Hi Frank -

This change would *not* take away the compiled version of the Result 
Spec.  It would only change 1 behavior - that of returning true if a 
*feature* (not a type, as in your example above) was associated with a 
type where the capability was marked allAnnotatorFeatures, even if the
Feature didn't exist.

Suppose you had a type T1, and a type T2 whose super-type was T1, and 
features T1:f1 and T2:f2, with an output capability = T1 with 
allAnnotatorFeatures = true, and finally T3 (not inheriting from T1) with 
feature T3:f3, and the output capability including T3 with 
allAnnotatorFeatures = false.



Here's the current behavior:

Before compile, the following all return true except as marked:
   containsType(T1)
   containsType(T2)   returns false: T2 is not in the output capability,
                      and before compile, T2 isn't recognized as a
                      subtype of T1
   containsFeature(T2:f2)   returns false: not in output, etc.
   containsFeature(T1:f1)
   containsFeature(T1:asdfasdfasdfasdf)  yes... that's what it does -
                      it ignores the actual feature name because
                      allAnnotatorFeatures is true

After compile, the following return true except as marked:
   containsType(T1)
   containsType(T2)   T2 is not in the output capability, but is
                      recognized as a subtype of T1
   containsFeature(T2:f2)   T1's *allAnnotatorFeatures* is inherited
   containsFeature(T1:f1)
   containsFeature(T1:asdfasdfasdfasdf)  returns false: the actual
                      features are looked up

After the change I'm proposing, everything would be the same except that

   containsFeature(T1:asdfasdfasdfasdf) would return true.

I don't think this would affect the way you are using result specs, but 
please let me know if I've misunderstood something.  We don't want to 
impact users with this change.
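
To make the before/after-compile difference concrete, here is a minimal sketch in plain Java. It is not the UIMA ResultSpecification code: the class FeatureCheckSketch and all of its methods are hypothetical stand-ins, and subtype inheritance is left out on purpose. It only models the rule that an allAnnotatorFeatures type accepts any feature name before compile, and only features that exist in the type system after compile.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the feature check discussed above; not the UIMA class.
final class FeatureCheckSketch {
    private final Set<String> explicitFeatures = new HashSet<>();  // e.g. "T3:f3"
    private final Set<String> allFeatureTypes = new HashSet<>();   // types marked allAnnotatorFeatures
    private final Set<String> typeSystemFeatures;                  // features that really exist
    private boolean compiled = false;

    FeatureCheckSketch(Set<String> typeSystemFeatures) {
        this.typeSystemFeatures = new HashSet<>(typeSystemFeatures);
    }

    void addFeature(String fullFeatureName) {
        explicitFeatures.add(fullFeatureName);
    }

    void addTypeWithAllFeatures(String typeName) {
        allFeatureTypes.add(typeName);
    }

    void compile() {
        compiled = true;
    }

    boolean containsFeature(String fullFeatureName) {
        if (explicitFeatures.contains(fullFeatureName)) {
            return true;                       // specifically put in
        }
        int colon = fullFeatureName.indexOf(':');
        if (colon < 0) {
            return false;                      // not of the form Type:feature
        }
        String typeName = fullFeatureName.substring(0, colon);
        if (!allFeatureTypes.contains(typeName)) {
            return false;
        }
        // Before compile the feature name is ignored; after compile it is looked up.
        return !compiled || typeSystemFeatures.contains(fullFeatureName);
    }
}
```

With T1 marked allAnnotatorFeatures and only T1:f1 in the type system, containsFeature("T1:asdfasdfasdfasdf") is true before compile() and false afterwards, which is the behavior the lists above describe.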


Thanks for your comments :-)

-Marshall
  

-Original Message-
From: Marshall Schor [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 25, 2008 5:06 AM

To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

The implementation for checking if a feature is in the result spec does
the following:

If the result-spec is not compiled, it says the feature is present if
it is specifically put in, or if its type has the allAnnotatorFeatures
flag set.

If the result-spec is compiled, it says the feature is present if it
is specifically put in, or if its type has the allAnnotatorFeatures
flag set and the feature exists in the type system.

For performance / space reasons, I'd like to drop the 2nd case; this 
would have the consequence of changing the result

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-29 Thread Michael Baessler


Marshall Schor wrote:
I think my change is ready for code review.  I kept all the 
idiosyncratic behavior of the old code, so users should not notice any 
difference.  All the tests run, and the test case above runs in the 6000ms 
range.

There are 3 areas changed:
1) ResultSpecification_impl is restructured for speed and smaller 
memory footprint
2) The compiling of this is deferred till the latest possible point; 
operations that can be done with the uncompiled form are done that way.
3) The code in the CapabilityLanguageFlow where it returns a next step 
now caches the result spec by component key, and only sends it down if 
it is different from what this controller sent the last time it 
invoked this component in the flow.
This test depends on the precomputed result specs kept in the mTable 
variable being constant - which I believe they are (once they are 
computed) - but Michael, can you confirm this?

Yes, the mTable variable contains the precomputed result specs for 
sequence engines. These result specs are constant and do not change 
during the processing. The computation is done based on the output types 
of the aggregate that defines the capabilityLanguageFlow. If the result 
spec is passed in by the process method, the precomputed mTable cannot 
be used, since the results that should be produced may then differ from 
the aggregate output types.

With this change, the code in the framework to intersect the result 
spec with a component's output capabilities, by language, is not 
redone on every call, but only when the language changes.  That code 
(to do the intersection) is running faster, in any case, due to the 
restructuring.
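
The caching described in item 3 can be sketched roughly as follows. This is only an illustration under assumptions: the class ResultSpecSendCache and its shouldSend method are hypothetical names, and comparing by object identity is an assumption that relies on the precomputed result specs being constant, as confirmed above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers the last result spec sent to each component.
final class ResultSpecSendCache<S> {
    private final Map<String, S> lastSent = new HashMap<>();

    /**
     * Returns true when the spec differs from the one last sent to this
     * component, and records it; returns false when it can be skipped.
     */
    boolean shouldSend(String componentKey, S spec) {
        // Identity comparison is enough only because the precomputed specs
        // are assumed never to change once they are computed.
        if (lastSent.containsKey(componentKey) && lastSent.get(componentKey) == spec) {
            return false;
        }
        lastSent.put(componentKey, spec);
        return true;
    }
}
```

A flow that keeps the same language sends the spec to a component once and skips it on later calls; when the language changes, a different precomputed spec is looked up and sent again.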


Because this is a big change it would be good to do a code review of 
some kind - any thoughts on how to do this?
I hoped that Adam could look at this, since he knows the code best from 
my point of view. All the capabilityLanguageFlow related items have been 
discussed already on the list in detail and I think now we also have 
some good tests for this.
Once the code is checked in, I can rerun my performance tests to check 
the performance improvements.


Opinions?

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-28 Thread Michael Baessler


Marshall Schor wrote:
The Capability Language Flow for an aggregate is computed in 
CapabilityLanguageFlowController.computeFlowTable.


This starts with the aggregate's output capabilities, and figures out a 
flow for each language that produces all the outputs.


Should this computation also include in the set of needed outputs, 
inputs that downstream annotators need from upstream ones?  That part 
seems to be missing in this computation?


Here's an example:

An aggregate G has delegates A and B.  If B needs A to produce some 
type T for some language, and T is not among G's outputs, but 
something that B produces is among G's outputs, the flow controller 
would need to tell A to produce T so that B could produce the desired 
output at the aggregate level.
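
One way to picture the proposed computation is a backward walk over the flow, sketched here in plain Java. Everything in it is hypothetical (the class NeededOutputsSketch, the Delegate holder, the method delegatesToRun), and it ignores languages entirely; it is not the code in computeFlowTable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: decide which delegates must run so that the
// aggregate's outputs (and the inputs those delegates need) get produced.
final class NeededOutputsSketch {

    static final class Delegate {
        final String name;
        final Set<String> inputs;
        final Set<String> outputs;

        Delegate(String name, Set<String> inputs, Set<String> outputs) {
            this.name = name;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    static List<String> delegatesToRun(List<Delegate> flowOrder, Set<String> aggregateOutputs) {
        Set<String> needed = new HashSet<>(aggregateOutputs);
        List<String> toRun = new ArrayList<>();
        // Walk from the last delegate to the first, so that a delegate's
        // inputs become requirements for the delegates upstream of it.
        for (int i = flowOrder.size() - 1; i >= 0; i--) {
            Delegate d = flowOrder.get(i);
            if (!Collections.disjoint(d.outputs, needed)) {
                toRun.add(0, d.name);
                needed.addAll(d.inputs);
            }
        }
        return toRun;
    }
}
```

In the example above, G wants only B's output, B declares T as an input, and A declares T as an output, so the walk selects both A and B even though T is not among G's outputs.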


-Marshall

Adding the input capabilities automatically is fine with me.

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-28 Thread Michael Baessler


Marshall Schor wrote:
Can I replace the class CapabilityContainer with the much more 
efficient (now) ResultSpecification class?


It seems to me they do almost the same thing, and the 
ResultSpecification may be handling the corner cases better.


Is there some subtle difference I'm missing?  It would be nice to 
eliminate a class -
smaller code base = less maintenance effort in the future :-)

-Marshall

Yes, if it is possible to add the missing functionality to the 
ResultSpecification class, fine with me.
For example, the important method 
hasOutputTypeOrFeature(ouputCapabilitie, documentLanguage, 
doFuzzySearch) is currently
not available in the ResultSpecification class.

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-28 Thread Marshall Schor

I may have missed something - I don't see what would need to be added to 
the ResultSpecification class.  The method hasOutputTypeOrFeature(...) 
is always called with doFuzzySearch == true, which is how the 
containsType or containsFeature methods operate (always) in the Result 
Specification class.


Is there some other difference I'm missing?

-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-28 Thread Marshall Schor

I went back and checked the Javadocs for the ResultSpecification, prior 
to my reworking of it.  I think I treated the x-unspecified slightly 
wrong, and if I had done it right, then the anomaly noted in the 
previous note (below) would not be there.


The previous Javadocs all say that the setters for a typeOrFeature 
without a language argument, are equivalent to passing in the 
x-unspecified language.   The method containsType/Feature(foo, 
x-unspecified) should be made to return true only if the Result 
specification for this contained x-unspecified.  It might not, if, for 
instance, the setting for Foo was only for languages en and de. 


A consequence of making it work this way is the following:

   containsType(foo, x-unspecified) will return false if foo is in
   the result spec only for particular languages.

   and the containsType(foo)   no language argument
   would also return false, if foo is in the result spec
   only for particular languages.

I plan to correct the treatment of x-unspecified, along these lines, to 
work as described above.
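
As a rough illustration of the corrected treatment, here is a small plain-Java sketch. The class UnspecifiedLanguageSketch and its methods are hypothetical, not the ResultSpecification API, and the en-us to en generalization is deliberately left out so that only the x-unspecified rule is visible.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the corrected x-unspecified handling.
final class UnspecifiedLanguageSketch {
    static final String UNSPECIFIED = "x-unspecified";

    private final Map<String, Set<String>> languagesByType = new HashMap<>();

    // A setter without a language argument is equivalent to x-unspecified.
    void addType(String type) {
        addType(type, UNSPECIFIED);
    }

    void addType(String type, String language) {
        languagesByType.computeIfAbsent(type, k -> new HashSet<>()).add(language);
    }

    // A query without a language argument is a query for x-unspecified.
    boolean containsType(String type) {
        return containsType(type, UNSPECIFIED);
    }

    boolean containsType(String type, String language) {
        Set<String> languages = languagesByType.get(type);
        if (languages == null) {
            return false;
        }
        // An x-unspecified entry in the spec applies to every language.
        if (languages.contains(UNSPECIFIED)) {
            return true;
        }
        // Otherwise the language must be listed; this is false for
        // x-unspecified when the type is set only for particular languages.
        return languages.contains(language);
    }
}
```

With foo added only for en and de, both containsType("foo", "x-unspecified") and containsType("foo") return false, while containsType("foo", "en") still returns true.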

Please post any concerns/objections :-)

-Marshall

Marshall Schor wrote:
While experimenting with this approach, I found some tests wouldn't 
run.  (By the way, the test cases are great - they have been a great 
help :-) ).


Here's a case I want to be sure I understand:

Let's suppose that the aggregate says it produces type Foo with 
language x-unspecified.


Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language en, the 2nd one produces Foo with 
language x-unspecified.  A flow given language x-unspecified 
should run the 2nd annotator, skipping the first one.  (This is how it 
works now).


===

Here's another similar case, using the other language subsumption 
between en-us and en.


Let's suppose that the aggregate says it produces type Foo with 
language en.


Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language en-us, the 2nd one produces Foo with 
language en.  A flow given language en should run the 2nd 
annotator, skipping the first one. (This is how it works now, I think).


With this explanation, I see there is a modification to the result 
spec's containsType/Feature method with a language argument needed for 
this use. Currently, the ResultSpecification matching works like this:

 Language arg    RsltSpc     Matches
  en             en-us       no
  en-us          en          yes
  x-unspecified  *any*       yes   (behavior needs to be different)
  en             x-unsp..    yes
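
The four rows of the table can be written down as a small function. This is a sketch of the behavior as described in the table only; the class LanguageMatchSketch and the method matchesCurrent are hypothetical names, not the framework's code.

```java
// Hypothetical model of the *current* matching rules listed in the table.
final class LanguageMatchSketch {
    static final String UNSPECIFIED = "x-unspecified";

    static boolean matchesCurrent(String languageArg, String specLanguage) {
        if (languageArg.equals(UNSPECIFIED)) {
            return true;   // row 3: matches *any* (the behavior to be changed)
        }
        if (specLanguage.equals(UNSPECIFIED)) {
            return true;   // row 4: an x-unspecified spec entry matches every language
        }
        if (languageArg.equals(specLanguage)) {
            return true;
        }
        // Row 2: the argument is generalized by dropping the part after the
        // first '-', so en-us matches a spec entry for en. Row 1: the reverse
        // does not hold, so en does not match a spec entry for en-us.
        int dash = languageArg.indexOf('-');
        return dash > 0 && languageArg.substring(0, dash).equals(specLanguage);
    }
}
```

Under the corrected treatment, only the first branch would change: an x-unspecified argument would match only a spec entry that is itself x-unspecified.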

Is this correct?

-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-28 Thread Michael Baessler


Yes, that is correct!

- Michael

Marshall Schor wrote:
While experimenting with this approach, I found some tests wouldn't 
run.  (By the way, the test cases are great - they have been a great 
help :-) ).


Here's a case I want to be sure I understand:

Let's suppose that the aggregate says it produces type Foo with 
language x-unspecified.


Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language en, the 2nd one produces Foo with 
language x-unspecified.  A flow given language x-unspecified 
should run the 2nd annotator, skipping the first one.  (This is how it 
works now).


===

Here's another similar case, using the other language subsumption 
between en-us and en.


Let's suppose that the aggregate says it produces type Foo with 
language en.


Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language en-us, the 2nd one produces Foo with 
language en.  A flow given language en should run the 2nd 
annotator, skipping the first one. (This is how it works now, I think).


With this explanation, I see there is a modification to the result 
spec's containsType/Feature method with a language argument needed for 
this use. Currently, the ResultSpecification matching works like this:

 Language arg    RsltSpc     Matches
  en             en-us       no
  en-us          en          yes
  x-unspecified  *any*       yes   (behavior needs to be different)
  en             x-unsp..    yes

Is this correct?

-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-28 Thread Marshall Schor

While experimenting with this approach, I found some tests wouldn't 
run.  (By the way, the test cases are great - they have been a great 
help :-) ).


Here's a case I want to be sure I understand:

Let's suppose that the aggregate says it produces type Foo with language 
x-unspecified.


Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language en, the 2nd one produces Foo with language 
x-unspecified.  A flow given language x-unspecified should run the 
2nd annotator, skipping the first one.  (This is how it works now).


===

Here's another similar case, using the other language subsumption 
between en-us and en.


Let's suppose that the aggregate says it produces type Foo with language 
en.


Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language en-us, the 2nd one produces Foo with 
language en.  A flow given language en should run the 2nd annotator, 
skipping the first one. (This is how it works now, I think).


With this explanation, I see there is a modification to the result 
spec's containsType/Feature method with a language argument needed for 
this use. 
Currently, the ResultSpecification matching works like this:

 Language arg    RsltSpc     Matches
  en             en-us       no
  en-us          en          yes
  x-unspecified  *any*       yes   (behavior needs to be different)
  en             x-unsp..    yes

Is this correct?

-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-26 Thread Marshall Schor

Can I replace the class CapabilityContainer with the much more efficient 
(now) ResultSpecification class?


It seems to me they do almost the same thing, and the 
ResultSpecification may be handling the corner cases better.


Is there some subtle difference I'm missing?  It would be nice to 
eliminate a class -
smaller code base = less maintenance effort in the future :-)

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-25 Thread Marshall Schor

The code which checks if a type or feature is in a result spec, for a 
particular language, always includes generalizing the language specifier 
by dropping the part beyond the first -.  For example, en-us and 
en-uk are simplified to en.  Because of this, I'm thinking of 
shrinking the result specification (for performance / space reasons) by 
normalizing any language specs it uses by dropping the country 
extensions, if present.
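
The normalization being proposed amounts to something like the following sketch. The class LanguageNormalizer and its methods are hypothetical names; keeping x-unspecified whole is my assumption, since the message above only talks about country extensions such as en-us.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of normalizing language specs before storing them.
final class LanguageNormalizer {

    /** Drops the part beyond the first '-', e.g. "en-us" becomes "en". */
    static String baseLanguage(String language) {
        if (language.equals("x-unspecified")) {
            return language;   // assumption: the special tag is kept as-is
        }
        int dash = language.indexOf('-');
        return dash < 0 ? language : language.substring(0, dash);
    }

    /** Normalizes a collection of languages; duplicates collapse into one entry. */
    static Set<String> normalize(Collection<String> languages) {
        Set<String> result = new TreeSet<>();
        for (String language : languages) {
            result.add(baseLanguage(language));
        }
        return result;
    }
}
```

The space saving comes from the collapse: en-us, en-uk and en all end up as the single entry en.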


Any objections?

-Marshall

RE: capabilityLangugaeFlow - computeResultSpec

2008-01-25 Thread LeHouillier, Frank D.

We have an annotator that wraps a black box information extraction
component that can return objects of a variety of types.  We check the
result specification to see if the object is something we want to output
based on the actual string of the name of the type.  If you take away the
compiled version of the ResultSpecification then we will have to also
check whether the type that we get back from the type system is null or
not.  It isn't terribly onerous to have to check for null but it does
actually take some code modification and this situation might be present
in other people's analysis engines too.

-Original Message-
From: Marshall Schor [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 25, 2008 5:06 AM
To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

The implementation for checking if a feature is in the result spec does 
the following:

If the result-spec is not compiled, it says the feature is present if 
it is specifically put in, or if its type has the allAnnotatorFeatures
flag set.

If the result-spec is compiled, it says the feature is present if it 
is specifically put in, or if its type has the allAnnotatorFeatures
flag set and the feature exists in the type system.

For performance / space reasons, I'd like to drop the 2nd case; this 
would have the consequence of changing the result spec to return true 
for features not in the type system where the type had the 
allAnnotatorFeatures flag set.  This case shouldn't come up in practice 
because I can't think of a good reason an annotator would ask if a
feature not in its type system was present. 

Any objections?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-25 Thread Marshall Schor

The implementation for checking if a feature is in the result spec does 
the following:


If the result-spec is not compiled, it says the feature is present if 
it is specifically put in, or if its type has the allAnnotatorFeatures 
flag set.


If the result-spec is compiled, it says the feature is present if it 
is specifically put in, or if its type has the allAnnotatorFeatures flag 
set and the feature exists in the type system.


For performance / space reasons, I'd like to drop the 2nd case; this 
would have the consequence of changing the result spec to return true 
for features not in the type system where the type had the 
allAnnotatorFeatures flag set.  This case shouldn't come up in practice 
because I can't think of good reason an annotator would ask if a feature 
not in its type system was present. 


Any objections?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-25 Thread Marshall Schor


LeHouillier, Frank D. wrote:

We have an annotator that wraps a black box information extraction
component that can return objects of a variety of types.  We check the
result specification to see if the object is something we want to output
based on the actual string of the name of the type.  If you take away the
compiled version of the ResultSpecification then we will have to also
check whether the type that we get back from the type system is null or
not.  

Hi Frank -

This change would *not* take away the compiled version of the Result 
Spec.  It would only change 1 behavior - that of returning true if a 
*feature* (not a type, as in your example above) was associated with a 
type where the capability was marked allAnnotatorFeatures, even if the 
Feature didn't exist.


Suppose you had a type T1, and a type T2 whose super-type was T1, and 
features T1:f1 and T2:f2, with an output capability = T1 with 
allAnnotatorFeatures = true, and finally T3 (not inheriting from T1) with 
feature T3:f3, and the output capability including T3 with 
allAnnotatorFeatures = false.



Here's the current behavior:

Before compile, the following all return true except as marked:
  containsType(T1)
  containsType(T2)   returns false: T2 is not in the output capability,
                     and before compile, T2 isn't recognized as a
                     subtype of T1
  containsFeature(T2:f2)   returns false: not in output, etc.
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf)  yes... that's what it does - 
                     it ignores the actual feature name because
                     allAnnotatorFeatures is true


After compile, the following return true except as marked:
  containsType(T1)
  containsType(T2)   T2 is not in the output capability, but is
                     recognized as a subtype of T1
  containsFeature(T2:f2)   T1's *allAnnotatorFeatures* is inherited
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf)  returns false: the actual
                     features are looked up
 
After the change I'm proposing, everything would be the same except that

  containsFeature(T1:asdfasdfasdfasdf) would return true.

I don't think this would affect the way you are using result specs, but 
please let me know if I've misunderstood something.  We don't want to 
impact users with this change.


Thanks for your comments :-)

-Marshall


-Original Message-
From: Marshall Schor [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 25, 2008 5:06 AM

To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

The implementation for checking if a feature is in the result spec does 
the following:

If the result-spec is not compiled, it says the feature is present if 
it is specifically put in, or if its type has the allAnnotatorFeatures
flag set.

If the result-spec is compiled, it says the feature is present if it 
is specifically put in, or if its type has the allAnnotatorFeatures
flag set and the feature exists in the type system.

For performance / space reasons, I'd like to drop the 2nd case; this 
would have the consequence of changing the result spec to return true 
for features not in the type system where the type had the 
allAnnotatorFeatures flag set.  This case shouldn't come up in practice 
because I can't think of a good reason an annotator would ask if a
feature not in its type system was present. 


Any objections?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-25 Thread Marshall Schor


Michael Baessler wrote:

Michael Baessler wrote:

Adam Lally wrote:
On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] 
wrote:
 
I tried to figure out how the ResultSpecification handling in 
uima-core works with all side effects, to check how to detect
when a ResultSpec has changed. Unfortunately I was not able
to; there are too many open questions where I don't know
exactly if it is right in every case ... :-(

Adam can you please look at this issue?




I can try to take a look, but I don't have a lot of time.  Do you have
a test case for this, where you expect I would see a significant
performance improvement if I fix this?
  
Sorry, I have no performance test case. I checked my assumption using 
the debugger.


I used the following main() with a loop over the process call to 
check if the result spec is recomputed each time.
The descriptor is the same as used in the capabilityLanguageFlow test 
case of the uimaj-core project.

Maybe a sysout helps to detect if the unnecessary calls are done or not.

Maybe iterating more than 10 times will give you performance 
numbers before and after. Maybe adding additional capabilities
that must be analyzed will increase the time used to compute the 
result spec. I will look at this tomorrow.


  public static void main(String[] args) {

    AnalysisEngine ae = null;
    try {
      // Same descriptor as the capabilityLanguageFlow test case in uimaj-core
      String desc = "SequencerCapabilityLanguageAggregateES.xml";

      XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));

      ResourceSpecifier specifier = UIMAFramework.getXMLParser()
          .parseResourceSpecifier(in);
      ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
      CAS cas = ae.newCAS();
      String text = "Hello world!";
      cas.setDocumentText(text);
      cas.setDocumentLanguage("en");
      // Call process() repeatedly to see whether the result spec
      // is recomputed on every call
      for (int i = 0; i < 10; i++) {
        ae.process(cas);
      }
    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }

-- Michael
When setting the loop counter to 1000 I have 6000ms without 
recomputing the result spec and
27000ms when recomputing the result spec. I think this should be 
sufficient for testing.

I think my change is ready for code review.  I kept all the 
idiosyncratic behavior of the old code, so users should not notice any 
difference.  All the tests run, and the test case above runs in the 6000ms 
range. 


There are 3 areas changed:
1) ResultSpecification_impl is restructured for speed and smaller memory 
footprint
2) The compiling of this is deferred till the latest possible point; 
operations that can be done with the uncompiled form are done that way.
3) The code in the CapabilityLanguageFlow where it returns a next step 
now caches the result spec by component key, and only sends it down if 
it is different from what this controller sent the last time it invoked 
this component in the flow. 

This test depends on the precomputed result specs kept in the mTable 
variable being constant - which I believe they are (once they are 
computed) - but Michael, can you confirm this? 

With this change, the code in the framework to intersect the result 
spec with a component's output capabilities, by language, is not redone 
on every call, but only when the language changes.  That code (to do the 
intersection) is running faster, in any case, due to the restructuring.


Because this is a big change it would be good to do a code review of 
some kind - any thoughts on how to do this?


-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-25 Thread Michael Baessler


Adam Lally wrote:

On Jan 24, 2008 9:51 AM, Michael Baessler [EMAIL PROTECTED] wrote:
  

Without looking at the code, I didn't understand why this is a
consequence of the behavior you described above.  I thought you said
"and if the type has subtypes, it adds those too"?  Anyway, I
definitely think that this should work.  By the definition of subtype,
A-subtype *IS A* A.  So if an aggregate wants type A produced, then
A-subtype should be produced.
  

Why should an ae or a flow produce A-subtype when only A is required?




Because an instance of A-subtype is also by definition an instance of
A.  Say a downstream annotator wants input type Person.  I have
upstream annotators that can produce instances of GovernmentOfficial,
Actor, and Author, all of which are subtypes of Person.  Shouldn't the
upstream annotator produce these types?

From my point of view, when using the capabilityLanguageFlow the 
application must specify all three or four
person subtypes when they should occur in the result. I think this is 
flow specific; another flow can do it differently.


I absolutely agree that the result spec that is responsible for what 
can be produced should contain all types automatically if
the Person type is added.

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Marshall Schor

Without actually testing this (so this may be a wrong conclusion) - it 
seems to me that the code in CapabilityLanguageFlowController that sets 
up the result specs for components, by language, in the mFlowTable, 
ignores the typesOrFeatures that the result spec adds when compile() is 
called.


If you recall, the compile method for results specifications augments 
the set of types/features by doing 2 things:  if the type has 
allAnnotatorFeatures=true, it adds all the features of the type; and if 
the type has subtypes, it adds those too, propagating the 
allAnnotatorFeatures processing down.
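
The two augmentation steps can be sketched over a toy type system as follows. This is an illustration only: the class CompileExpansionSketch, its maps, and its compile method are hypothetical and do not reflect how ResultSpecification_impl stores things; each type lists just the features it introduces itself.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of what compile() adds to the declared types/features.
final class CompileExpansionSketch {
    // Toy type system: direct subtypes and directly introduced features per type.
    final Map<String, List<String>> subtypes = new HashMap<>();
    final Map<String, List<String>> features = new HashMap<>();

    /** declared maps a type name to its allAnnotatorFeatures flag. */
    Set<String> compile(Map<String, Boolean> declared) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Boolean> entry : declared.entrySet()) {
            expand(entry.getKey(), entry.getValue(), result);
        }
        return result;
    }

    private void expand(String type, boolean allFeatures, Set<String> result) {
        result.add(type);
        if (allFeatures) {
            // Step 1: add all the features of the type.
            for (String feature : features.getOrDefault(type, Collections.<String>emptyList())) {
                result.add(type + ":" + feature);
            }
        }
        // Step 2: add the subtypes, propagating the flag down.
        for (String sub : subtypes.getOrDefault(type, Collections.<String>emptyList())) {
            expand(sub, allFeatures, result);
        }
    }
}
```

A flow table built only from the declared entries sees just the type A, whereas the compiled set also holds A's subtypes and their features, which is where the two missed cases below come from.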


A consequence would be that the mFlowTable would miss these cases:

  An aggregate wants type A output, and has a delegate with output 
capability A-subtype.


  An aggregate wants Feature F output, and has a delegate with output 
capability type-A with allAnnotatorFeatures marked, having that feature.


Can anyone confirm this?  (perhaps adding a test case :-) )?

Michael - do you know what the design intent was for this - if things 
are as I've conjectured above, is this something that needs to be fixed, 
or is it working as intended?


-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Michael Baessler


Marshall Schor wrote:
Without actually testing this (so this may be a wrong conclusion) - it 
seems to me that the code in CapabilityLanguageFlowController that 
sets up the result specs for components, by language, in the 
mFlowTable, ignores the typesOrFeatures that the result spec adds when 
compile() is called.


If you recall, the compile method for results specifications augments 
the set of types/features by doing 2 things:  if the type has 
allAnnotatorFeatures=true, it adds all the features of the type; and 
if the type has subtypes, it adds those too, propagating the 
allAnnotatorFeatures processing down.


A consequence would be that the mFlowTable would miss these cases:

  An aggregate wants type A output, and has a delegate with output 
capability A-subtype.


  An aggregate wants Feature F output, and has a delegate with output 
capability type-A with allAnnotatorFeatures marked, having that feature.


Can anyone confirm this?  (perhaps adding a test case :-) )?

Michael - do you know what the design intent was for this - if things 
are as I've conjectured above, is this something that needs to be 
fixed, or is it working as intended?

Yes, that is correct. The mFlowTable only contains those output types 
that are specified in the aggregate AE as output types. The guideline 
for the capabilityLanguageFlow was to specify in the aggregate all 
output results (including all interim results) that must be produced.


If we now change the mFlowTable content to match the resultSpec, we 
also change the capabilityLanguageFlow. If we do that, how can I 
prevent a subtype from being produced when only a supertype must be 
produced? So I prefer to stay with the current design - specify all 
you need.


What do you think?

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Marshall Schor

What about allAnnotatorFeatures?  Suppose the aggregate says it needs a 
particular Feature of a particular type.  Suppose a delegate is marked 
as producing that type, and has allAnnotatorFeatures marked.  This 
wouldn't work. 

You could say in this case that the output capability of the delegate 
*must not* rely on allAnnotatorFeatures, but instead *must* explicitly 
list those features it produces.  In one sense, this could be a good 
idea, because no delegate could *accurately* mark that it outputs 
allAnnotatorFeatures anyway, due to the possibility that some other 
component could add features to the type in question, completely unknown 
to this delegate - and of course, this delegate would not be setting 
those other features.


This would lead to another question - should we deprecate 
allAnnotatorFeatures because of this?


-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Michael Baessler


From this point of view..

+1 to deprecate allAnnotatorFeatures

-- Michael


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Adam Lally

On Jan 24, 2008 7:54 AM, Marshall Schor [EMAIL PROTECTED] wrote:
 If you recall, the compile method for results specifications augments
 the set of types/features by doing 2 things:  if the type has
 allAnnotatorFeatures=true, it adds all the features of the type; and if
 the type has subtypes, it adds those too, propagating the
 allAnnotatorFeatures processing down.

 A consequence would be that the mFlowTable would miss these cases:

An aggregate wants type A output, and has a delegate with output
 capability A-subtype.


Without looking at the code, I didn't understand why this is a
consequence of the behavior you described above.  I thought you said
"and if the type has subtypes, it adds those too"?  Anyway, I
definitely think that this should work.  By the definition of subtype,
A-subtype *IS A* A.  So if an aggregate wants type A produced, then
A-subtype should be produced.

An aggregate wants Feature F output, and has a delegate with output
 capability type-A with allAnnotatorFeatures marked, having that feature.


We should be supporting this as well.  Again I didn't follow why the
behavior you described above doesn't do this.

-Adam

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Michael Baessler


Adam Lally wrote:

On Jan 24, 2008 7:54 AM, Marshall Schor [EMAIL PROTECTED] wrote:
  

If you recall, the compile method for results specifications augments
the set of types/features by doing 2 things:  if the type has
allAnnotatorFeatures=true, it adds all the features of the type; and if
the type has subtypes, it adds those too, propagating the
allAnnotatorFeatures processing down.

A consequence would be that the mFlowTable would miss these cases:

   An aggregate wants type A output, and has a delegate with output
capability A-subtype.




Without looking at the code, I didn't understand why this is a
consequence of the behavior you described above.  I thought you said
and if the type has subtypes, it adds those too?  Anyway, I
definitely think that this should work.  By the definition of subtype,
A-subtype *IS A* A.  So if an aggregate wants type A produced, then
A-subtype should be produced.

Why should an ae or a flow produce A-subtype when only A is required?

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Marshall Schor

The thing that adds allAnnotatorFeatures and subtypes is compiling the 
result spec. The builder of the mFlowTable doesn't compile the 
resultspec before using it - so it doesn't have these consequences.


-Marshall

Adam Lally wrote:

On Jan 24, 2008 7:54 AM, Marshall Schor [EMAIL PROTECTED] wrote:
  

If you recall, the compile method for results specifications augments
the set of types/features by doing 2 things:  if the type has
allAnnotatorFeatures=true, it adds all the features of the type; and if
the type has subtypes, it adds those too, propagating the
allAnnotatorFeatures processing down.

A consequence would be that the mFlowTable would miss these cases:

   An aggregate wants type A output, and has a delegate with output
capability A-subtype.




Without looking at the code, I didn't understand why this is a
consequence of the behavior you described above.  I thought you said
and if the type has subtypes, it adds those too?  Anyway, I
definitely think that this should work.  By the definition of subtype,
A-subtype *IS A* A.  So if an aggregate wants type A produced, then
A-subtype should be produced.

  

   An aggregate wants Feature F output, and has a delegate with output
capability type-A with allAnnotatorFeatures marked, having that feature.




We should be supporting this as well.  Again I didn't follow why the
behavior you described above doesn't do this.

-Adam

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Adam Lally

On Jan 24, 2008 9:51 AM, Michael Baessler [EMAIL PROTECTED] wrote:
  Without looking at the code, I didn't understand why this is a
  consequence of the behavior you described above.  I thought you said
  and if the type has subtypes, it adds those too?  Anyway, I
  definitely think that this should work.  By the definition of subtype,
  A-subtype *IS A* A.  So if an aggregate wants type A produced, then
  A-subtype should be produced.
 Why should an ae or a flow produce A-subtype when only A is required?


Because an instance of A-subtype is also by definition an instance of
A.  Say a downstream annotator wants input type Person.  I have
upstream annotators that can produce instances of GovernmentOfficial,
Actor, and Author, all of which are subtypes of Person.  Shouldn't the
upstream annotator produce these types?
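
As a rough illustration of the is-a argument (a toy hierarchy held in a map, not the UIMA type system API; SubsumptionSketch, PARENT and isA are hypothetical names):

```java
import java.util.Map;

// Toy hierarchy for illustration; not the UIMA TypeSystem API.
class SubsumptionSketch {
  // child type -> parent type
  static final Map<String, String> PARENT = Map.of(
      "GovernmentOfficial", "Person",
      "Actor", "Person",
      "Author", "Person");

  // True if 'type' is 'wanted' or one of its (transitive) subtypes.
  static boolean isA(String type, String wanted) {
    for (String t = type; t != null; t = PARENT.get(t)) {
      if (t.equals(wanted)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isA("Actor", "Person")); // true
    System.out.println(isA("Person", "Actor")); // false
  }
}
```

Under this reading, a request for Person is satisfied by an annotator that declares Actor as output, but not the other way around.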

  -Adam

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Michael Baessler

In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList, 
ResultSpecification resultSpec) constructor was used when the result 
spec was set by an application using the process method with the 
resultSpec argument. In the current version it seems that only the 
version with the precomputed FlowTable is used. But I can't say if that 
is correct or not, since I don't know the details of the ResultSpec 
restructuring (maybe only Adam knows). But you are right: if this 
constructor isn't necessary, both the code and the constructor can be 
removed.


Seems that the architecture has changed here. :-)

-- Michael

Marshall Schor wrote:
If this is removed or if it is never called, then there is a section 
of the logic in CapabilityLanguageFlowObject which is never used, 
because mNodeList == null:



if (mNodeList != null) {
 // about 80 lines of code elided
}

Can this logic be removed?

-Marshall

Marshall Schor wrote:
The class CapabilityLanguageFlowObject has 2 defined constructors, 
but one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, ResultSpecification 
resultSpec)


Can this be removed?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor

Thanks.  I'll see about comparing the older method with the current 
method, to verify this.  -Marshall



Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Michael Baessler



Marshall Schor wrote:
In looking thru the code for ResultSpecification_Impl, there seems to 
be an inconsistency - unless I (quite possible :-) ) missed 
something.


The calls to the containsType(...) method operate in one of 2 ways, 
depending on whether or not the result specification has been 
compiled by calling the compile method.


If the result spec has not been compiled, then containsType(...) 
returns true iff the type specified is equal(...) to a type in the 
Result Specification.


If it has been compiled, then the containsType returns true iff the 
type specified is equal to a type *or any of its subtypes* in the 
Result Specification.  This is because compiling a resultSpecification 
adds the subtypes.
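
The two behaviours described above can be sketched roughly as follows. This is a toy class whose names are assumptions for illustration; it is not the ResultSpecification_Impl code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy class for illustration; not ResultSpecification_Impl.
class ContainsTypeSketch {
  private final Set<String> declared = new HashSet<>();
  private final Set<String> expanded = new HashSet<>();
  private boolean compiled = false;

  void addType(String type) {
    declared.add(type);
    compiled = false; // any change invalidates the compiled form
  }

  // Expands the declared types with all transitive subtypes.
  void compile(Map<String, List<String>> directSubtypes) {
    expanded.clear();
    Deque<String> todo = new ArrayDeque<>(declared);
    while (!todo.isEmpty()) {
      String t = todo.pop();
      if (expanded.add(t)) {
        todo.addAll(directSubtypes.getOrDefault(t, List.of()));
      }
    }
    compiled = true;
  }

  // Exact match before compile(); subtype-aware after compile().
  boolean containsType(String type) {
    return compiled ? expanded.contains(type) : declared.contains(type);
  }

  public static void main(String[] args) {
    ContainsTypeSketch spec = new ContainsTypeSketch();
    spec.addType("Person");
    System.out.println(spec.containsType("Actor")); // false
    spec.compile(Map.of("Person", List.of("Actor")));
    System.out.println(spec.containsType("Actor")); // true
  }
}
```

The same call gives a different answer depending only on whether compile() ran first, which is the inconsistency in question.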


Can others confirm this?  In actual use within annotators, it may be 
that the result spec is always compiled before use (I haven't yet 
traced that down).

Yes, you are right: when the result spec is compiled, all subtypes of a 
type are additionally added to the map. The same goes for features, if 
allAnnotatorFeatures is set to true.


Should the code and Javadocs be updated to have containsType return 
true for subtypes of types in the result spec, always?
I think both ways should return the same result. But which way is 
correct? If I specify a type in the result spec is it correct that all 
subtypes are also in?
If I just want to have the sub types in the result spec it is easy to 
do, but what if I only want to have the super types in the result spec 
without the subtypes?


-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Michael Baessler

When looking at the tests for the capability language flow I see both 
tests: one with the result spec argument in the process() method and one 
without.
In older UIMA versions, when using the debugger, I see that both 
constructors are used there.


-- Michael

Marshall Schor wrote:
Thanks.  I'll see about comparing the older method with the current 
method, to verify this.  -Marshall



Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Michael Baessler


Marshall Schor wrote:
I'm thinking of simplifying the CapabilityContainer class.  Right now 
it has code to process input as well as output capabilities, but the 
input ones appear never to be used.  Can anyone confirm that?  If 
confirmed, I would propose to remove the part related to input 
capabilities.

Currently I think that is true. The idea behind this CapabilityContainer 
was that maybe someone could create a sophisticated flow that computes 
the best sequence for the engines based on their input and output 
capabilities... But if that is needed we can also add the input 
capabilities again. :-)


There is a HashMap, outputToFCapability, whose keys are Strings 
corresponding to an output type-or-feature name, for any language, for 
any capability-set.  The values do not seem to be used.  I'd like to 
replace this with a HashSet.  Any objections?

Yes, that seems to be correct.
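
A minimal sketch of the proposed swap, assuming the map values really are never read. OutputNamesSketch and its methods are hypothetical stand-ins, not the CapabilityContainer code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in; not the CapabilityContainer implementation.
class OutputNamesSketch {
  // Before: a map whose values are stored but never looked at.
  static Map<String, Object> asMap(List<String> names) {
    Map<String, Object> m = new HashMap<>();
    for (String n : names) {
      m.put(n, new Object()); // placeholder value, never read
    }
    return m;
  }

  // After: only membership is kept.
  static Set<String> asSet(List<String> names) {
    return new HashSet<>(names);
  }

  public static void main(String[] args) {
    List<String> names = List.of("Person", "Person:age");
    // Membership answers are the same for both representations.
    System.out.println(
        asMap(names).containsKey("Person") == asSet(names).contains("Person"));
  }
}
```

If every caller only asks "is this name present?", containsKey on the map and contains on the set answer identically, so the change should be behaviour-preserving.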

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor

OK.  This would confirm that the other constructor is no longer needed, 
since the test that passes a result-spec arg in the process method no 
longer calls that.


Thanks.  -Marshall

Michael Baessler wrote:
When looking at the tests for the capability language flow I see both 
tests one with the result spec argument in the process() method and 
one without.
In older UIMA versions, when using the debugger I see that both 
constructors are used there.


-- Michael


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor


Easy to see- just trace the test case...  -Marshall

Michael Baessler wrote:
But it would still be interesting to know why this is never needed and 
how it works now.


-- Michael


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor


I did this trace.  Here's how it works now, without calling this.

The process(cas, result-spec) call goes to AggregateAnalysisEngine_Impl 
which calls setResultSpecification on the AEEngine_impl object, which

1) clones the result-spec object
2) adds capabilities to it from the *inputs* of all components of this 
aggregate
3) uses this one cloned object as the result spec passed down to each 
component.
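
The three steps can be sketched like this. It is a simplified model with plain string sets; PropagateSpecSketch and specForDelegates are hypothetical names, not the actual engine code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of the trace above; hypothetical names throughout.
class PropagateSpecSketch {
  // delegateInputs: delegate key -> declared input types/features.
  static Set<String> specForDelegates(Set<String> requested,
                                      Map<String, List<String>> delegateInputs) {
    // 1) clone the caller's result spec so it is left untouched
    Set<String> copy = new TreeSet<>(requested);
    // 2) add the *inputs* of every component of the aggregate
    for (List<String> inputs : delegateInputs.values()) {
      copy.addAll(inputs);
    }
    // 3) this single object is what every component receives
    return copy;
  }

  public static void main(String[] args) {
    Set<String> requested = Set.of("Person");
    Map<String, List<String>> inputs =
        Map.of("tokenizer", List.of(), "ner", List.of("Token"));
    System.out.println(specForDelegates(requested, inputs)); // [Person, Token]
  }
}
```

Note that in this model Token ends up in the spec every delegate sees, even though the caller asked only for Person.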


Before going further - Michael - a question: isn't this 
union-with-all-inputs-behavior something you didn't want for capability 
language flow?


Maybe it doesn't matter in that the use of capability language flow is 
not done in the real application use cases by passing the result spec in 
the top level call to the process method of the analysis engine?


-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Michael Baessler


OK, will do.

-- Michael

Marshall Schor wrote:

Easy to see- just trace the test case...  -Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Michael Baessler

Fine with me. It seems to be the way it worked in the past, so we 
should not change it!


-- Michael

Marshall Schor wrote:
Given that (as far as I can tell - let's see, that would be AFAICT), 
the resultSpec is *always* used in compiled mode (because the wrapper 
always compiles it), the current implementation would have the effect 
that


 1) the allFeatures flag would work
 2) subtypes of a type specified in the resultSpec would also be 
implicitly in the resultSpec


Therefore, to keep the implementation behavior constant (a good thing 
to try for, always :-) ) we should ensure any changes continue to 
exhibit this behavior, and update the Javadocs and documentation to 
reflect this.


Other opinions?

-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor

Here's the trace of how this works, when run from a top level 
process(cas) call:


1) the call goes to the AnalysisEngine_Impl process method, which calls 
processAndOutputNewCASes in the same object.  This calls the ASB_impl 
process method, which creates a new AggregateCasIterator(aCAS).  This 
constructor calls computeFlow on the ...asb.impl.FlowControllerContainer 
object.  This calls the particular flow controller's computeFlow 
method.  In this case, the flowController is the 
CapabilityLanguageFlowController.  Since this is a new CAS coming in to 
the aggregate, the computeFlow method makes a new 
CapabilityLanguageFlowObject, passing in the pre-computed Flow Table.

So that's how it uses this constructor, in the case where no specific 
result spec is passed.
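
The shape of that last step (a table built once, a new flow object per incoming CAS) might be sketched as below. FlowTableSketch is a hypothetical stand-in and the table layout (language to delegate order) is an assumption; the real mFlowTable contents are not shown in this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; not the UIMA FlowController classes.
class FlowTableSketch {
  // Built once when the controller is initialized (assumed layout).
  private final Map<String, List<String>> flowTable;

  FlowTableSketch(Map<String, List<String>> flowTable) {
    this.flowTable = flowTable;
  }

  // Called once per incoming CAS; returns a fresh, independent step queue
  // that is filled from the precomputed table rather than recomputed.
  Deque<String> computeFlow(String documentLanguage) {
    return new ArrayDeque<>(
        flowTable.getOrDefault(documentLanguage, List.of()));
  }

  public static void main(String[] args) {
    FlowTableSketch controller =
        new FlowTableSketch(Map.of("en", List.of("tokenizer", "ner")));
    Deque<String> flow = controller.computeFlow("en");
    System.out.println(flow.poll()); // tokenizer
    System.out.println(flow.poll()); // ner
  }
}
```

Each CAS gets its own queue, so consuming steps for one CAS does not affect the flow of the next.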


-Marshall

Marshall Schor wrote:

Easy to see- just trace the test case...  -Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Michael Baessler

So in the older version of the capabilityLanguageFlow the inputs were 
not recognized. But I think it is not bad that these are added 
automatically, since the flow can't work if those are missing!

-- Michael

Marshall Schor wrote:

I did this trace.  Here's how it works now, without calling this.

The process(cas, result-spec) call goes to 
AggregateAnalysisEngine_Impl which calls setResultSpecification on the 
AEEngine_impl object, which

1) clones the result-spec object
2) adds capabilities to it from the *inputs* of all components of this 
aggregate
3) uses this one cloned object as the result spec passed down to each 
component.
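
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names (AggregateSpecSketch, specForComponents) are hypothetical, and types are reduced to plain strings rather than real ResultSpecification objects.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; not the UIMA implementation. */
class AggregateSpecSketch {
    /**
     * Builds the single spec handed down to every component:
     * a copy of the caller's spec plus the inputs of all components.
     */
    static Set<String> specForComponents(Set<String> callerSpec,
                                         List<Set<String>> componentInputs) {
        // 1) clone, so the caller's object is never modified
        Set<String> shared = new TreeSet<>(callerSpec);
        // 2) add the *inputs* of all components of the aggregate
        for (Set<String> inputs : componentInputs) {
            shared.addAll(inputs);
        }
        // 3) this one object is what each component receives
        return shared;
    }
}
```

Because every component receives the same union, a component may be asked for types that only a sibling needs as input, which is the behavior questioned below.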


Before going further - Michael - a question: isn't this 
union-with-all-inputs-behavior something you didn't want for 
capability language flow?


Maybe it doesn't matter in that the use of capability language flow is 
not done in the real application use cases by passing the result spec 
in the top level call to the process method of the analysis engine?


-Marshall

Marshall Schor wrote:

Easy to see- just trace the test case...  -Marshall

Michael Baessler wrote:
But it would still be interesting why this is never needed and how 
it works now.


-- Michael

Marshall Schor wrote:
OK.  This would confirm that the other constructor is no longer 
needed, since the test that passes a result-spec arg in the process 
method no longer calls that.


Thanks.  -Marshall

Michael Baessler wrote:
When looking at the tests for the capability language flow I see 
both tests one with the result spec argument in the process() 
method and one without.
In older UIMA versions, when using the debugger I see that both 
constructors are used there.


-- Michael

Marshall Schor wrote:
Thanks.  I'll see about comparing the older method with the 
current method, to verify this.  -Marshall


Michael Baessler wrote:
In older UIMA versions the CapabilityLanguageFlowObject(List 
aNodeList, ResultSpecification resultSpec)  constructor was used 
when the result was set by an application using the process 
method with the resultSpec argument. In the current version it 
seems that only the version with the precomputed FlowTable is 
used. But I can't say if that is correct or not since I don't 
know the details about the ResultSpec restructuring (maybe only 
Adam knows). But you are right, if this constructor isn't 
necessary both, the code and the constructor, can be removed.


Seems that the architecture has changed here. :-)

-- Michael

Marshall Schor wrote:
If this is removed or if it is never called, then there is a 
section of the logic in CapabilityLanguageFlowObject which is 
never used, because mNodeList == null:



if (mNodeList != null) {
 //  80 or so lines of code elided
}

Can this logic be removed?

-Marshall

Marshall Schor wrote:
The class CapabilityLanguageFlowObject has 2 defined 
constructors, but one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, 
ResultSpecification resultSpec)


Can this be removed?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Adam Lally

On Jan 23, 2008 8:06 AM, Michael Baessler [EMAIL PROTECTED] wrote:
 In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList,
 ResultSpecification resultSpec)  constructor was used when the result
 was set by an application using the process method with the resultSpec
 argument. In the current version it seems that only the version with the
 precomputed FlowTable is used. But I can't say if that is correct or not
 since I don't know the details about the ResultSpec restructuring (maybe
 only Adam knows).

I think no one knows exactly.  This area of the code grew somewhat
organically to address requirements over time. I don't think I ever
fully understood how CapabilityLanguageFlow was implemented.  When
I was adding the custom flow controller in v2.0, I did my best to port
whatever behavior was there and make sure all the test cases passed.
It turned out we were missing some important test cases though, and
that's how we came around to adding the SimpleStepWithResultSpec class
in order to replicate the old behavior.  I think the key thing is to
make sure we have the right test cases in place to be sure we're
preserving backward compatibility, and then I'm all for having
Marshall clean up the code so it makes more sense.

  -Adam

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor


Eddie - this is for you to check I think:

There is code in UimacppEngine in method serializeResultSpecification 
which adds result spec types and features to 2 IntVector arrays (one for 
Types, one for Features).  As currently designed, these miss getting 
the subtypes of types, and all the features for types marked with the 
all-features flag in the capabilities. 

Are these required here? 

Also, I notice that the result spec supports languages - but the 
serialization for this doesn't support languages.  Is that intended?


-Marshall

Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification

2008-01-23 Thread Marshall Schor

I'll fix the Javadocs to correspond to what the code does.  This will 
have the result that
  addResultFeature(1-feature, languages) will *add* to the existing 
languages, while
  addResultFeature(1-feature) will *replace* all existing languages 
with x-unspecified.
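
A minimal sketch of the behavior described above, assuming a plain map from feature name to language set. LanguageMapSketch and its fields are hypothetical names for illustration, not the ResultSpecification_impl code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of add-versus-replace language handling. */
class LanguageMapSketch {
    static final String UNSPECIFIED = "x-unspecified";

    private final Map<String, Set<String>> languagesByFeature = new HashMap<>();

    /** With languages: *add* them to whatever is already recorded. */
    void addResultFeature(String feature, String[] languages) {
        languagesByFeature
            .computeIfAbsent(feature, k -> new HashSet<>())
            .addAll(Arrays.asList(languages));
    }

    /** Without languages: *replace* everything with x-unspecified. */
    void addResultFeature(String feature) {
        Set<String> only = new HashSet<>();
        only.add(UNSPECIFIED);
        languagesByFeature.put(feature, only);
    }

    Set<String> languagesFor(String feature) {
        return languagesByFeature.getOrDefault(feature, Collections.emptySet());
    }
}
```

Calling the two-argument form twice with "en" and then "de" leaves both languages recorded; a later one-argument call drops both and leaves only x-unspecified.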


-Marshall


Marshall Schor wrote:

I'm doing a redesign for the result spec area to improve performance.

The basic idea is to put a hasBeenChanged flag into the result spec 
object, and use it being false to enable users to avoid recomputing 
things.
Why not use an equality check? Because a single result spec object is shared 
among multiple users, and when updated, the object is updated in place 
(so there is no other object to compare it to).
Looking at the ResultSpec object - it has a hashMap that stores the 
Types and Features (TypeOrFeature objects) as the keys; the values are 
hashSets holding languages for which these types and features are in 
the result spec.  (There is a special hash set having just the entry 
of the default language = UNSPECIFIED_LANGUAGE = x-unspecified).
I'm going to try and make the default language hash set a constant, 
and create just one instance of it - this should improve performance, 
especially when languages are not being used.


There are 2 kinds of methods to add types/features to a result spec:  
ones with language(s) and ones without.

   The ones without reset any language spec associated with the type or
   feature(s) to the UNSPECIFIED_LANGUAGE.

   The ones with a language, sometimes replace  the language
   associated with the type/feature, and other times, they add the
   language (assuming the type/feature is already an entry in the
   hashMap of types and features).

   methods which are replacing any existing languages:

   setResultTypesAndFeatures(array of TypeOrFeature)
      replaces with x-unspecified language
   setResultTypesAndFeatures(array of TypeOrFeature, languages)
      replaces with languages
   addResultTypeOrFeature(1-TypeOrFeature)
      replaces with x-unspecified language
   addResultTypeOrFeature(1-TypeOrFeature, languages)
      replaces with languages
   addResultType(String, boolean)
      replaces with x-unspecified language
   addResultFeature(1-feature, languages)
      replaces with languages x-unspecified

   methods which are adding to existing languages:

   addResultType(1-type, boolean, languages)  adds languages
   addResultFeature(1-feature)   adds x-unspecified

The set... method essentially clears the result spec and sets it 
with completely new information, so it is reasonable that it replaces 
any existing language information.


The addResult methods, when used to add a type or feature which is 
already present, are inconsistent - with one method adding, and the 
others, replacing. This behavior is documented in the JavaDocs for the 
class.


The JavaDocs have the behavior for adding a Feature by name reversed 
with the behavior for adding a Type by name.  In one case, including 
the language is treated as a replace, in the other as an add.  This 
seems likely a bug in the Javadocs. The code for the addResultFeature 
is reversed from the Javadocs: the code will add languages if 
specified, but replaces (with the x-unspecified) if languages are 
not specified in the method call.


Does anyone know what the correct behavior of these methods is 
supposed to be?


-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor


Some corner cases.

Case 1:  If using the method to alter an existing result spec by adding 
a single type with an associated set of languages,  the passed in 
allAnnotatorFeatures boolean will now be unioned with any existing 
setting of this.  Javadocs updated to reflect this.


Case 2: If you have a capability for language 1 which says output type A 
(not all features), and have another capability for language 2 which 
says output type A (allAnnotatorFeatures), this will be represented in 
the result spec by having language 1 also be for all features.


Case 3: when setting the result spec, passing null in as the value of 
the languages (for those set/add things that take language arrays) will 
be equivalent to passing in the one language x-unspecified.  So, in 
particular, if a spec says produce type A for lang 1 and 2, and then you 
use the addResultType(for type A, null-passed-in-for-language-spec) this 
will add the language x-unspecified for type A. 
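
The corner cases above can be illustrated with a small sketch. It assumes one entry per type holding a language set and a single allAnnotatorFeatures flag; CornerCaseSketch and TypeEntry are hypothetical names, not the real classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the three corner cases; not UIMA code. */
class CornerCaseSketch {
    static final String UNSPECIFIED = "x-unspecified";

    private static final class TypeEntry {
        final Set<String> languages = new TreeSet<>();
        boolean allAnnotatorFeatures;
    }

    private final Map<String, TypeEntry> entries = new HashMap<>();

    void addResultType(String type, boolean allAnnotatorFeatures, String[] languages) {
        TypeEntry e = entries.computeIfAbsent(type, k -> new TypeEntry());
        // Cases 1 and 2: the flag is unioned, so once true it applies
        // to every language recorded for this type.
        e.allAnnotatorFeatures |= allAnnotatorFeatures;
        // Case 3: null languages behave like the single language x-unspecified.
        if (languages == null) {
            e.languages.add(UNSPECIFIED);
        } else {
            e.languages.addAll(Arrays.asList(languages));
        }
    }

    boolean isAllAnnotatorFeatures(String type) {
        TypeEntry e = entries.get(type);
        return e != null && e.allAnnotatorFeatures;
    }

    Set<String> languagesFor(String type) {
        TypeEntry e = entries.get(type);
        return e == null ? new TreeSet<>() : e.languages;
    }
}
```

With a single flag per type there is no way to record "all features for language 2 only", which is why language 1 ends up marked for all features as well.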

I will attempt to document these in the Javadocs.  Please post a 
response if these corner cases need to be handled differently.


-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-22 Thread Marshall Schor

The class CapabilityLanguageFlowObject has 2 defined constructors, but 
one is never used/referenced:

CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec)

Can this be removed?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-22 Thread Marshall Schor

If this is removed or if it is never called, then there is a section of 
the logic in CapabilityLanguageFlowObject which is never used, because 
mNodeList == null:



if (mNodeList != null) {
 //  80 or so lines of code elided
}

Can this logic be removed?

-Marshall

Marshall Schor wrote:
The class CapabilityLanguageFlowObject has 2 defined constructors, but 
one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, ResultSpecification 
resultSpec)


Can this be removed?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-22 Thread Marshall Schor

Looking through the code for ResultSpecification_Impl, there 
seems to be an inconsistency - unless I (quite possible :-) ) missed 
something.


The calls to the containsType(...) method operate in one of 2 ways, 
depending on whether or not the result specification has been compiled 
by calling the compile method.


If the result spec has not been compiled, then containsType(...) returns 
true iff the type specified is equal(...) to a type in the Result 
Specification.


If it has been compiled, then the containsType returns true iff the type 
specified is equal to a type *or any of its subtypes* in the Result 
Specification.  This is because compiling a resultSpecification adds the 
subtypes.
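
A minimal sketch of the two behaviors, assuming the type hierarchy is given as a map from a type name to its direct subtypes. ContainsTypeSketch and its members are hypothetical and only mimic the described effect of compile.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not ResultSpecification_impl. */
class ContainsTypeSketch {
    private final Set<String> declared = new HashSet<>();
    private final Set<String> withSubtypes = new HashSet<>();
    private boolean compiled = false;

    void addType(String typeName) {
        declared.add(typeName);
        compiled = false; // a change invalidates the compiled form
    }

    /** Expands the declared types with all of their transitive subtypes. */
    void compile(Map<String, List<String>> directSubtypes) {
        withSubtypes.clear();
        Deque<String> todo = new ArrayDeque<>(declared);
        while (!todo.isEmpty()) {
            String t = todo.pop();
            if (withSubtypes.add(t)) {
                todo.addAll(directSubtypes.getOrDefault(t, Collections.emptyList()));
            }
        }
        compiled = true;
    }

    /** Uncompiled: exact match only. Compiled: subtypes match as well. */
    boolean containsType(String typeName) {
        return compiled ? withSubtypes.contains(typeName) : declared.contains(typeName);
    }
}
```

The same query therefore gives different answers before and after compile, which is the inconsistency in question.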


Can others confirm this?  In actual use within annotators, it may be 
that the result spec is always compiled before use (I haven't yet traced 
that down).


Should the code and Javadocs be updated to have containsType return true 
for subtypes of types in the result spec, always?


-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-22 Thread Marshall Schor

I'm thinking of simplifying the CapabilityContainer class.  Right now it 
has code to process input as well as output capabilities, but the input 
ones appear never to be used.  Can anyone confirm that?  If confirmed, I 
would propose to remove the part related to input capabilities.


There is a HashMap, outputToFCapability, whose keys are Strings 
corresponding to an output type-or-feature name, for any language, for 
any capability-set.  The values do not seem to be used.  I'd like to 
replace this with a hashSet.  Any objections?


-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-21 Thread Michael Baessler

Yes, I think so. This test dumps the result spec for each AE to a file 
to check if it was computed correctly.
The computation of the result spec is done during the initialization of 
the aggregate AE when the capability language flow is created.
The precomputed result spec can later be used in the document 
processing, but this is currently not used. It is recomputed each time.


For my simple performance test I removed the second computation that 
is done during runtime processing
( PrimitiveAnalysisEngine_impl.java: protected ResultSpecification 
computeAnalysisComponentResultSpec() ). So the original computed result 
spec is used.
But we cannot remove this code completely since it can happen that a 
result spec is provided by the application and it must be recomputed 
dynamically.


-- Michael

Marshall Schor wrote:

Michael -

I'm confused about how this test is setup.  The test descriptor this 
code uses loads an aggregate, and then runs a process method which 
ends up calling some dummy process method called 
SequencerTestAnnotator.  This process method dumps (to a file) the 
result spec.  Is that the case you're running?


How do you turn on and off the (re)computation of the result spec?

-Marshall

Michael Baessler wrote:

Michael Baessler wrote:

Adam Lally wrote:
On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] 
wrote:
 
I tried to figure out how the ResultSpecification handling in 
uima-core

works with all side effects to check how it can be done
to detect when a ResultSpec has changed. Unfortunately I was not able
to, there are too many open questions where I don't know
exactly if it is right in any case ... :-(

Adam can you please look at this issue?




I can try to take a look, but I don't have a lot of time.  Do you have
a test case for this, where you expect I would see a significant
performance improvement if I fix this?
  
Sorry, I have no performance test case. I checked my assumption using 
the debugger.


I used the following main() with a loop over the process call to 
check if the result spec is recomputed each time.
The descriptor is the same as used in the capabilityLanguageFlow 
test case of the uimaj-core project.
Maybe a sysout helps to detect if the unnecessary calls are done or 
not.


Maybe iterating more than 10 times will give you performance 
numbers before and after. Maybe adding additional capabilities
that must be analyzed will increase the time used to compute the 
result spec. I will look at this tomorrow.


 public static void main(String[] args) {

   AnalysisEngine ae = null;
   try {
     String desc = "SequencerCapabilityLanguageAggregateES.xml";

     XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
     ResourceSpecifier specifier = UIMAFramework.getXMLParser()
         .parseResourceSpecifier(in);
     ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);

     CAS cas = ae.newCAS();
     String text = "Hello world!";
     cas.setDocumentText(text);
     cas.setDocumentLanguage("en");
     // call process() repeatedly to see whether the result spec
     // is recomputed on every call
     for (int i = 0; i < 10; i++) {
       ae.process(cas);
     }
   } catch (Exception ex) {
     ex.printStackTrace();
   }
 }

-- Michael
When setting the loop counter to 1000 I have 6000ms without 
recomputing the result spec and
27000ms when recomputing the result spec. I think this should be 
sufficient for testing.


-- Michael

Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification

2008-01-21 Thread Marshall Schor


I'm doing a redesign for the result spec area to improve performance.

The basic idea is to put a hasBeenChanged flag into the result spec 
object, and use it being false to enable users to avoid recomputing 
things.
Why not use an equality check? Because a single result spec object is shared 
among multiple users, and when updated, the object is updated in place 
(so there is no other object to compare it to).
Looking at the ResultSpec object - it has a hashMap that stores the 
Types and Features (TypeOrFeature objects) as the keys; the values are 
hashSets holding languages for which these types and features are in the 
result spec.  (There is a special hash set having just the entry of the 
default language = UNSPECIFIED_LANGUAGE = x-unspecified). 

I'm going to try and make the default language hash set a constant, and 
create just one instance of it - this should improve performance, 
especially when languages are not being used.
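
A sketch of the shared-constant idea, assuming the map values are language sets as described above. DefaultLanguageSketch and DEFAULT_LANGUAGES are hypothetical names; the real field names may differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of one shared default-language set. */
class DefaultLanguageSketch {
    /** Created once; immutable, so sharing it between entries is safe. */
    static final Set<String> DEFAULT_LANGUAGES = Collections.singleton("x-unspecified");

    private final Map<String, Set<String>> languagesByName = new HashMap<>();

    /** Entries added without languages all point at the same set instance. */
    void addWithoutLanguages(String typeOrFeatureName) {
        languagesByName.put(typeOrFeatureName, DEFAULT_LANGUAGES);
    }

    Set<String> languagesFor(String typeOrFeatureName) {
        return languagesByName.get(typeOrFeatureName);
    }
}
```

The trade-off is that code which later adds a language to such an entry has to copy the set first, since the shared constant cannot be modified in place.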


There are 2 kinds of methods to add types/features to a result spec:  
ones with language(s) and ones without. 


   The ones without reset any language spec associated with the type or
   feature(s) to the UNSPECIFIED_LANGUAGE.

   The ones with a language, sometimes replace  the language
   associated with the type/feature, and other times, they add the
   language (assuming the type/feature is already an entry in the
   hashMap of types and features).

   methods which are replacing any existing languages:

   setResultTypesAndFeatures(array of TypeOrFeature)
      replaces with x-unspecified language
   setResultTypesAndFeatures(array of TypeOrFeature, languages)
      replaces with languages
   addResultTypeOrFeature(1-TypeOrFeature)
      replaces with x-unspecified language
   addResultTypeOrFeature(1-TypeOrFeature, languages)
      replaces with languages
   addResultType(String, boolean)
      replaces with x-unspecified language
   addResultFeature(1-feature, languages)
      replaces with languages x-unspecified

   methods which are adding to existing languages:

   addResultType(1-type, boolean, languages)  adds languages
   addResultFeature(1-feature)   adds x-unspecified

The set... method essentially clears the result spec and sets it with 
completely new information, so it is reasonable that it replaces any 
existing language information.


The addResult methods, when used to add a type or feature which is already 
present, are inconsistent - with one method adding, and the others, 
replacing. This behavior is documented in the JavaDocs for the class.


The JavaDocs have the behavior for adding a Feature by name reversed 
with the behavior for adding a Type by name.  In one case, including the 
language is treated as a replace, in the other as an add.  This seems 
likely a bug in the Javadocs. The code for the addResultFeature is 
reversed from the Javadocs: the code will add languages if specified, 
but replaces (with the x-unspecified) if languages are not specified 
in the method call.


Does anyone know what the correct behavior of these methods is 
supposed to be?


-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-11 Thread Marshall Schor


Michael -

I'm confused about how this test is setup.  The test descriptor this 
code uses loads an aggregate, and then runs a process method which ends 
up calling some dummy process method called SequencerTestAnnotator.  
This process method dumps (to a file) the result spec.  Is that the case 
you're running?


How do you turn on and off the (re)computation of the result spec?

-Marshall

Michael Baessler wrote:

Michael Baessler wrote:

Adam Lally wrote:
On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] 
wrote:
 
I tried to figure out how the ResultSpecification handling in 
uima-core

works with all side effects to check how it can be done
to detect when a ResultSpec has changed. Unfortunately I was not able
to, there are too many open questions where I don't know
exactly if it is right in any case ... :-(

Adam can you please look at this issue?




I can try to take a look, but I don't have a lot of time.  Do you have
a test case for this, where you expect I would see a significant
performance improvement if I fix this?
  
Sorry, I have no performance test case. I checked my assumption using 
the debugger.


I used the following main() with a loop over the process call to 
check if the result spec is recomputed each time.
The descriptor is the same as used in the capabilityLanguageFlow test 
case of the uimaj-core project.

Maybe a sysout helps to detect if the unnecessary calls are done or not.

Maybe iterating more than 10 times will give you performance 
numbers before and after. Maybe adding additional capabilities
that must be analyzed will increase the time used to compute the 
result spec. I will look at this tomorrow.


 public static void main(String[] args) {

   AnalysisEngine ae = null;
   try {
     String desc = "SequencerCapabilityLanguageAggregateES.xml";

     XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
     ResourceSpecifier specifier = UIMAFramework.getXMLParser()
         .parseResourceSpecifier(in);
     ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);

     CAS cas = ae.newCAS();
     String text = "Hello world!";
     cas.setDocumentText(text);
     cas.setDocumentLanguage("en");
     // call process() repeatedly to see whether the result spec
     // is recomputed on every call
     for (int i = 0; i < 10; i++) {
       ae.process(cas);
     }
   } catch (Exception ex) {
     ex.printStackTrace();
   }
 }

-- Michael
When setting the loop counter to 1000 I have 6000ms without 
recomputing the result spec and
27000ms when recomputing the result spec. I think this should be 
sufficient for testing.


-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-09 Thread Michael Baessler


Michael Baessler wrote:

Adam Lally wrote:
On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] 
wrote:
 

I tried to figure out how the ResultSpecification handling in uima-core
works with all side effects to check how it can be done
to detect when a ResultSpec has changed. Unfortunately I was not able
to, there are too many open questions where I don't know
exactly if it is right in any case ... :-(

Adam can you please look at this issue?




I can try to take a look, but I don't have a lot of time.  Do you have
a test case for this, where you expect I would see a significant
performance improvement if I fix this?
  
Sorry, I have no performance test case. I checked my assumption using 
the debugger.


I used the following main() with a loop over the process call to check 
if the result spec is recomputed each time.
The descriptor is the same as used in the capabilityLanguageFlow test 
case of the uimaj-core project.

Maybe a sysout helps to detect if the unnecessary calls are done or not.

Maybe iterating more than 10 times will give you performance 
numbers before and after. Maybe adding additional capabilities
that must be analyzed will increase the time used to compute the 
result spec. I will look at this tomorrow.


 public static void main(String[] args) {

   AnalysisEngine ae = null;
   try {
     String desc = "SequencerCapabilityLanguageAggregateES.xml";

     XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
     ResourceSpecifier specifier = UIMAFramework.getXMLParser()
         .parseResourceSpecifier(in);
     ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);

     CAS cas = ae.newCAS();
     String text = "Hello world!";
     cas.setDocumentText(text);
     cas.setDocumentLanguage("en");
     // call process() repeatedly to see whether the result spec
     // is recomputed on every call
     for (int i = 0; i < 10; i++) {
       ae.process(cas);
     }
   } catch (Exception ex) {
     ex.printStackTrace();
   }
 }

-- Michael
When setting the loop counter to 1000 I have 6000ms without recomputing 
the result spec and
27000ms when recomputing the result spec. I think this should be 
sufficient for testing.


-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-08 Thread Michael Baessler


Adam Lally wrote:

On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] wrote:
  

I tried to figure out how the ResultSpecification handling in uima-core
works with all side effects to check how it can be done
to detect when a ResultSpec has changed. Unfortunately I was not able
to, there are too many open questions where I don't know
exactly if it is right in any case ... :-(

Adam can you please look at this issue?




I can try to take a look, but I don't have a lot of time.  Do you have
a test case for this, where you expect I would see a significant
performance improvement if I fix this?
  
Sorry, I have no performance test case. I checked my assumption using the 
debugger.


I used the following main() with a loop over the process call to check 
if the result spec is recomputed each time.
The descriptor is the same as used in the capabilityLanguageFlow test 
case of the uimaj-core project.

Maybe a sysout helps to detect if the unnecessary calls are done or not.

Maybe iterating more than 10 times will give you performance 
numbers before and after. Maybe adding additional capabilities
that must be analyzed will increase the time used to compute the result 
spec. I will look at this tomorrow.


 public static void main(String[] args) {

   AnalysisEngine ae = null;
   try {
     String desc = "SequencerCapabilityLanguageAggregateES.xml";

     XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
     ResourceSpecifier specifier = UIMAFramework.getXMLParser()
         .parseResourceSpecifier(in);
     ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);

     CAS cas = ae.newCAS();
     String text = "Hello world!";
     cas.setDocumentText(text);
     cas.setDocumentLanguage("en");
     // call process() repeatedly to see whether the result spec
     // is recomputed on every call
     for (int i = 0; i < 10; i++) {
       ae.process(cas);
     }
   } catch (Exception ex) {
     ex.printStackTrace();
   }
 }

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

2008-01-07 Thread Michael Baessler


Adam Lally wrote:

On Dec 18, 2007 8:55 AM, Michael Baessler [EMAIL PROTECTED] wrote:
  

Hi,
I got the request on my table that the computation of the result spec
for the capabilityLanguageFlow takes too much time.
I looked at the code and found something interesting... maybe I'm wrong,
I'm not sure.

When looking at the ASB_impl.java at processUntilNextOutputCas() I found
the following:

    // check if we have to set result spec, to support capability language flow
    if (nextStep instanceof SimpleStepWithResultSpec) {
      ResultSpecification rs =
          ((SimpleStepWithResultSpec) nextStep).getResultSpecification();
      if (rs != null) {
        nextAe.setResultSpecification(rs);
      }
    }
    // invoke next AE in flow
    CasIterator casIter = null;
    CAS outputCas = null; // used if the AE we call outputs a new CAS
    try {
      casIter = nextAe.processAndOutputNewCASes(cas);

When a capabilityLanguageFlow is used, the ResultSpecs for the flow
engines are precomputed if possible. The code above takes this
precomputed ResultSpec from the flow node and sets it on the current AE.

When I go deeper to

 casIter = nextAe.processAndOutputNewCASes(cas);

I found in the PrimitiveAnalysisEngine_impl.java class in the
callAnalysisComponentProcess() method the following:

if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) {
  mLastTypeSystem = view.getTypeSystem();
  mCurrentResultSpecification.compile(mLastTypeSystem);
  // the actual ResultSpec we send to the component is formed by
  // looking at this primitive AE's declared output types and eliminating
  // any that are not in mCurrentResultSpecification.
  ResultSpecification analysisComponentResultSpec =
      computeAnalysisComponentResultSpec(
          mCurrentResultSpecification,
          getAnalysisEngineMetaData().getCapabilities());
  // compile result spec - necessary to get type subsumption to work properly
  analysisComponentResultSpec.compile(mLastTypeSystem);
  mAnalysisComponent.setResultSpecification(analysisComponentResultSpec);
  mResultSpecChanged = false;
}

Whenever the ResultSpec has changed, it is recomputed. But the ResultSpec
is marked as changed every time setResultSpecification() is called.
So what does this mean? The first code fragment in the email shows how
the ResultSpec is taken from the flow controller and set on the AE,
so the result spec counts as changed. The second code fragment shows what is
executed when the ResultSpec has changed and how it is recomputed.
This means that the ResultSpec is recomputed each time process is
called. I don't think this is necessary.




That seems like a good analysis of the situation.  I think what we
need is to detect when the ResultSpecification has actually changed
and when it hasn't.  That might be tricky to do right.  If we just
check if the new ResultSpecification is == to the existing
ResultSpecification, that wouldn't work if the ResultSpecification had
been modified (it would be == but the contents wouldn't be the same).
Perhaps we could add a dirty flag to the ResultSpecification to catch
this.
I tried to figure out how the ResultSpecification handling in uima-core 
works with all side effects to check how it can be done
to detect when a ResultSpec has changed. Unfortunately I was not able 
to, there are too many open questions where I don't know

exactly if it is right in any case ... :-(

Adam can you please look at this issue?

Thanks Michael

Re: capabilityLangugaeFlow - computeResultSpec

2007-12-18 Thread Adam Lally

On Dec 18, 2007 8:55 AM, Michael Baessler [EMAIL PROTECTED] wrote:
 Hi,
 I got the request on my table that the computation of the result spec
 for the capabilityLanguageFlow takes too much time.
 I looked at the code and found something interesting... maybe I'm wrong,
 I'm not sure.

 When looking at the ASB_impl.java at processUntilNextOutputCas() I found
 the following:

     // check if we have to set result spec, to support capability language flow
     if (nextStep instanceof SimpleStepWithResultSpec) {
       ResultSpecification rs =
           ((SimpleStepWithResultSpec) nextStep).getResultSpecification();
       if (rs != null) {
         nextAe.setResultSpecification(rs);
       }
     }
     // invoke next AE in flow
     CasIterator casIter = null;
     CAS outputCas = null; // used if the AE we call outputs a new CAS
     try {
       casIter = nextAe.processAndOutputNewCASes(cas);

 When a capabilityLanguageFlow is used, the ResultSpecs for the flow
 engines are precomputed if possible. The code above takes this
 precomputed ResultSpec from the flow node and sets it on the current AE.

 When I go deeper to

  casIter = nextAe.processAndOutputNewCASes(cas);

 I found in the PrimitiveAnalysisEngine_impl.java class in the
 callAnalysisComponentProcess() method the following:

 if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) {
   mLastTypeSystem = view.getTypeSystem();
   mCurrentResultSpecification.compile(mLastTypeSystem);
   // the actual ResultSpec we send to the component is formed by
   // looking at this primitive AE's declared output types and eliminating
   // any that are not in mCurrentResultSpecification.
   ResultSpecification analysisComponentResultSpec =
       computeAnalysisComponentResultSpec(
           mCurrentResultSpecification,
           getAnalysisEngineMetaData().getCapabilities());
   // compile result spec - necessary to get type subsumption to work properly
   analysisComponentResultSpec.compile(mLastTypeSystem);
   mAnalysisComponent.setResultSpecification(analysisComponentResultSpec);
   mResultSpecChanged = false;
 }

 Whenever the ResultSpec has changed, it is recomputed. But the ResultSpec
 is marked as changed every time setResultSpecification() is called.
 So what does this mean? The first code fragment in the email shows how
 the ResultSpec is taken from the flow controller and set on the AE,
 so the result spec counts as changed. The second code fragment shows what is
 executed when the ResultSpec has changed and how it is recomputed.
 This means that the ResultSpec is recomputed each time process is
 called. I don't think this is necessary.


That seems like a good analysis of the situation.  I think what we
need is to detect when the ResultSpecification has actually changed
and when it hasn't.  That might be tricky to do right.  If we just
check if the new ResultSpecification is == to the existing
ResultSpecification, that wouldn't work if the ResultSpecification had
been modified (it would be == but the contents wouldn't be the same).
Perhaps we could add a dirty flag to the ResultSpecification to catch
this.
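
A minimal sketch of the dirty-flag idea, under stated assumptions: SketchResultSpec and SketchEngine are hypothetical stand-ins, not the UIMA classes, and types are plain strings.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a result spec that tracks modification. */
class SketchResultSpec {
    private final Set<String> types = new HashSet<>();
    private boolean dirty = true; // a new spec has never been consumed

    void addType(String typeName) {
        // Set.add returns true only if the set actually changed
        if (types.add(typeName)) {
            dirty = true;
        }
    }

    boolean isDirty() { return dirty; }

    void markClean() { dirty = false; }
}

/** Hypothetical consumer that recomputes only when it has to. */
class SketchEngine {
    private SketchResultSpec current;
    int recomputeCount = 0;

    void setResultSpec(SketchResultSpec rs) {
        // Recompute for a different object, or for the same object
        // if it was modified in place since the last call.
        if (rs != current || rs.isDirty()) {
            current = rs;
            recomputeCount++; // stands in for the expensive recomputation
            rs.markClean();
        }
    }
}
```

One caveat of this sketch: if several engines share one spec object, the first engine to call markClean() hides the change from the others. A modification counter that each engine compares against its own last-seen value would avoid that.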

 Beyond that it seems to me that the ResultSpec mCurrentResultSpecification
 and the computed ResultSpec analysisComponentResultSpec
 have the same content.


Not in all cases.  The computeAnalysisComponentResultSpec() method
does an intersection of the ResultSpec with the component's output
capabilities.  I suppose with CapabilityLanguageFlow, it would never
output any type that's not in the component's output capabilities.
However think of the case of a nested aggregate where
CapabilityLanguageFlow is used in the outermost aggregate.  This would
cause setResultSpecification to be called on the sub-aggregate.  That
in turn causes the ResultSpecification for each annotator to be
computed by the intersection of the sub-aggregate's
ResultSpecification with that annotator's output capabilities.
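
The intersection can be sketched as below, assuming both the requested spec and the declared output capabilities are reduced to sets of type names. IntersectSketch and componentResultSpec are hypothetical names; the real computeAnalysisComponentResultSpec() also deals with features and languages.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of intersecting a spec with output capabilities. */
class IntersectSketch {
    /** Keeps only the requested types that the component declares as outputs. */
    static Set<String> componentResultSpec(Set<String> requested,
                                           Set<String> declaredOutputs) {
        Set<String> result = new TreeSet<>(declaredOutputs);
        result.retainAll(requested);
        return result;
    }
}
```

In the nested-aggregate case the requested set is the sub-aggregate's spec, so an annotator inside it can receive a strictly smaller spec than the one set on the sub-aggregate.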

-Adam
