Re: Generification of FSIndex

2009-08-28 Thread Marshall Schor


Jörn Kottmann wrote:
> Marshall Schor wrote:
>> Jörn Kottmann wrote:
>>  
>>> Marshall Schor wrote:
>>>
 OK - sounds like this won't be on the fix list for 2.3.0 ;-)
 And, given the value of being able to do this in selected cases, it
 doesn't sound right, either, to change the arguments to find, moveTo,
 etc from FeatureStructure.
 So let's leave this as is. 
>>> I created a jira issue to change moveTo back to FeatureStructure
>>> argument,
>>> it was not changed yet because I misunderstood you.
>>>
>>> Which other methods are you referring to ?
>>> 
>>
>> The other methods that take a FeatureStructure for purposes of
>> specifying a location in an index:
>>
>> For these, I'm not sure if the test for equality includes testing that
>> the "types" are the same, or just is testing (for sorted and set
>> indexes) that the declared "keys" are "equal".
>>
>> The methods are:
>>
>> contains
>> find
>> compare (although the javadoc says the result is undefined if the FS's
>> are not of the type of the index, but I wonder if that's checked?)
>> iterator(fs)
>>   
>
> Here are the declarations of these methods:
>
> boolean contains(FeatureStructure fs);
> FeatureStructure find(FeatureStructure fs);
> Maybe we should return T here ?
+1, since FSIndex which has this method is already parameterized on T -
it should be returning only things of type T.
>
> FeatureStructure find(FeatureStructure fs);
> I think it makes sense to return a FS of type T here.
+1
>
> int compare(FeatureStructure fs1, FeatureStructure fs2);
> If we use T here, it would enforce the API specification a little,
> because at least the java type must match.
I think we should keep this, because it could change the behavior in
this case:

Suppose you have an index over Token, and you have an instance of
Sentence, and want to see if a token is in a sentence.  Assuming the
Token index uses the same keys as the base Annotation Index, doing a
compare between Token and Sentence would be useful and make sense.

-Marshall


Re: Generification of FSIndex

2009-08-27 Thread Jörn Kottmann

Marshall Schor wrote:

Jörn Kottmann wrote:
  

Marshall Schor wrote:


OK - sounds like this won't be on the fix list for 2.3.0 ;-)
And, given the value of being able to do this in selected cases, it
doesn't sound right, either, to change the arguments to find, moveTo,
etc from FeatureStructure.
So let's leave this as is.   
  

I created a jira issue to change moveTo back to FeatureStructure
argument,
it was not changed yet because I misunderstood you.

Which other methods are you referring to ?



The other methods that take a FeatureStructure for purposes of
specifying a location in an index:

For these, I'm not sure if the test for equality includes testing that
the "types" are the same, or just is testing (for sorted and set
indexes) that the declared "keys" are "equal".

The methods are:

contains
find
compare (although the javadoc says the result is undefined if the FS's
are not of the type of the index, but I wonder if that's checked?)
iterator(fs)
  


Here are the declarations of these methods:

boolean contains(FeatureStructure fs);
FeatureStructure find(FeatureStructure fs);
Maybe we should return T here ?

FeatureStructure find(FeatureStructure fs);
I think it makes sense to return a FS of type T here.

int compare(FeatureStructure fs1, FeatureStructure fs2);
If we use T here, it would enforce the API specification a little,
because at least the java type must match.

Jörn


Re: Generification of FSIndex

2009-08-26 Thread Thilo Goetz
Marshall Schor wrote:
> 
> Jörn Kottmann wrote:
>> Marshall Schor wrote:
>>> OK - sounds like this won't be on the fix list for 2.3.0 ;-)
>>> And, given the value of being able to do this in selected cases, it
>>> doesn't sound right, either, to change the arguments to find, moveTo,
>>> etc from FeatureStructure.
>>> So let's leave this as is.   
>> I created a jira issue to change moveTo back to FeatureStructure
>> argument,
>> it was not changed yet because I misunderstood you.
>>
>> Which other methods are you referring to ?
> 
> The other methods that take a FeatureStructure for purposes of
> specifying a location in an index:
> 
> For these, I'm not sure if the test for equality includes testing that
> the "types" are the same, or just is testing (for sorted and set
> indexes) that the declared "keys" are "equal".
> 
> The methods are:
> 
> contains
> find
> compare (although the javadoc says the result is undefined if the FS's
> are not of the type of the index, but I wonder if that's checked?)

I looked yesterday, there is no check.  There is definitely room
for improvement here.  I'll open a Jira issue, but I'm not proposing
to do anything for this release.

--Thilo

> iterator(fs)
> 
> -Marshall
> 
>> Jörn
>>
>>



Re: Generification of FSIndex

2009-08-26 Thread Marshall Schor


Jörn Kottmann wrote:
> Marshall Schor wrote:
>> OK - sounds like this won't be on the fix list for 2.3.0 ;-)
>> And, given the value of being able to do this in selected cases, it
>> doesn't sound right, either, to change the arguments to find, moveTo,
>> etc from FeatureStructure.
>> So let's leave this as is.   
> I created a jira issue to change moveTo back to FeatureStructure
> argument,
> it was not changed yet because I misunderstood you.
>
> Which other methods are you referring to ?

The other methods that take a FeatureStructure for purposes of
specifying a location in an index:

For these, I'm not sure if the test for equality includes testing that
the "types" are the same, or just is testing (for sorted and set
indexes) that the declared "keys" are "equal".

The methods are:

contains
find
compare (although the javadoc says the result is undefined if the FS's
are not of the type of the index, but I wonder if that's checked?)
iterator(fs)

-Marshall

> Jörn
>
>


Re: Generification of FSIndex

2009-08-26 Thread Jörn Kottmann

Marshall Schor wrote:
OK - sounds like this won't be on the fix list for 2.3.0 ;-) 


And, given the value of being able to do this in selected cases, it
doesn't sound right, either, to change the arguments to find, moveTo,
etc from FeatureStructure.
So let's leave this as is. 
  

I created a jira issue to change moveTo back to FeatureStructure argument,
it was not changed yet because I misunderstood you.

Which other methods are you referring to ?

Jörn


Re: Generification of FSIndex

2009-08-11 Thread Marshall Schor


Thilo Goetz wrote:
> Marshall Schor wrote:
>   
>> Thilo Goetz wrote:
>> 
>>> Marshall Schor wrote:
>>>   
>>>   
 Thilo Goetz wrote:
 
 
> Adam Lally wrote:
>   
>   
>   
>> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote:
>> 
>> 
>> 
>>> Adam Lally wrote:
>>>   
>>>   
>>>   
 On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann 
 wrote:

 
 
 
> Marshall Schor wrote:
>
>   
>   
>   
>> The generification of FSIndex currently specifies one type, > extends
>> FeatureStructure> that is the type of item being returned.
>>
>>
>> The contains and find methods have arguments of type 
>> FeatureStructure.
>> These could be changed to take type "T".
>>
>>
>> 
>> 
>> 
> No I do not think that they could be changed to take type T.
> Lets take the case of the contains method.
> The javadoc says:
> "Check if the index contains an element equal to the given feature 
> structure
> according to the
> ordering of the index. Note that this is in general not the same as 
> feature
> structure identity."
> and it for the param fs it says "The FS we're looking for.". There is 
> no
> place where
> it says that contains can only be called for FSes which have the type 
> of the
> index.
>
> The change of the parameter from FeatureStructure to T would also 
> change
> the contract of the method a little, because then it would not be 
> possible
> anymore
> to pass a FeatureStructure which has not type T.
>
>
>   
>   
>   
 I agree.  It's sometimes useful to call FSIterator.moveTo method and
 pass an FS of a Type other than the one that the index was defined
 over, as part of implementing something like a subiterator.

 
 
 
>>> I agree with the case where it's the "bag" index being used, because
>>> that uses a test which works on all feature structures.
>>>
>>> However, for Set and Sorted, the implication of passing a FS which is
>>> not in the type hierarchy is, according to the JavaDocs, "undefined".
>>> This is because the code assumes the layout of the features and their
>>> values is appropriate for the type.  In other words, if the type of some
>>> key was a string, it might take a value from the main int heap and use
>>> it as an index into the string array - and if the int heap object was
>>> not the right type, it could pull an arbitrary value from that slot, and
>>> end up throwing an array index out of bounds exception.  When I looked,
>>> it didn't appear to me that the code checked for any kind of type
>>> subsumption before proceeding... (but I may have missed it...)
>>>
>>> It could turn out that the data (whatever is being pulled) would just
>>> happen to "match", even though the types are different.
>>>
>>> Because this is an undefined operation that could throw various kinds of
>>> runtime errors, or return an equal match where none really exists, I
>>> think it should not be allowed, for set and sorted indexes.
>>>
>>>   
>>>   
>>>   
>> In the case of the AnnotationIndex, the object that you pass to
>> FSIterator.moveTo must be a subtype of Annotation (else you would get
>> all the weird effects that you describe).  But it is still valid for a
>> user to do:
>> FSIndex index = cas.getAnnotationIndex(annotType1);
>> index.moveTo(annotType2);
>>
>> where annotType1 and annotType2 are both subtypes of 
>> uima.tcas.Annotation.
>>
>> In general, the object that you pass to moveTo() must be a subtype of
>> the type that was in the index definition (in the user's descriptor,
>> or for the case of the built-in AnnotationIndex,
>> uima.tcas.Annotation).
>>
>>   -Adam
>> 
>> 
>> 
> One concrete example of Adam's point: suppose you have a sentenceFS
> and a tokenIterator.  Then tokenIterator.moveTo(sentenceFS) will
> position the token iterator at the first word in the sentence (modulo
> some subtleties that are beside the point here).  Very useful.
>   
>   
>   
 Yes.  This works because the "index" being iterated over is the general
 annotation index (the one that

Re: Generification of FSIndex

2009-08-11 Thread Thilo Goetz


Marshall Schor wrote:
> 
> Thilo Goetz wrote:
>> Marshall Schor wrote:
>>   
>>> Thilo Goetz wrote:
>>> 
 Adam Lally wrote:
   
   
> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote:
> 
> 
>> Adam Lally wrote:
>>   
>>   
>>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann 
>>> wrote:
>>>
>>> 
>>> 
 Marshall Schor wrote:

   
   
> The generification of FSIndex currently specifies one type,  FeatureStructure> that is the type of item being returned.
>
>
> The contains and find methods have arguments of type FeatureStructure.
> These could be changed to take type "T".
>
>
> 
> 
 No I do not think that they could be changed to take type T.
 Lets take the case of the contains method.
 The javadoc says:
 "Check if the index contains an element equal to the given feature 
 structure
 according to the
 ordering of the index. Note that this is in general not the same as 
 feature
 structure identity."
 and it for the param fs it says "The FS we're looking for.". There is 
 no
 place where
 it says that contains can only be called for FSes which have the type 
 of the
 index.

 The change of the parameter from FeatureStructure to T would also 
 change
 the contract of the method a little, because then it would not be 
 possible
 anymore
 to pass a FeatureStructure which has not type T.


   
   
>>> I agree.  It's sometimes useful to call FSIterator.moveTo method and
>>> pass an FS of a Type other than the one that the index was defined
>>> over, as part of implementing something like a subiterator.
>>>
>>> 
>>> 
>> I agree with the case where it's the "bag" index being used, because
>> that uses a test which works on all feature structures.
>>
>> However, for Set and Sorted, the implication of passing a FS which is
>> not in the type hierarchy is, according to the JavaDocs, "undefined".
>> This is because the code assumes the layout of the features and their
>> values is appropriate for the type.  In other words, if the type of some
>> key was a string, it might take a value from the main int heap and use
>> it as an index into the string array - and if the int heap object was
>> not the right type, it could pull an arbitrary value from that slot, and
>> end up throwing an array index out of bounds exception.  When I looked,
>> it didn't appear to me that the code checked for any kind of type
>> subsumption before proceeding... (but I may have missed it...)
>>
>> It could turn out that the data (whatever is being pulled) would just
>> happen to "match", even though the types are different.
>>
>> Because this is an undefined operation that could throw various kinds of
>> runtime errors, or return an equal match where none really exists, I
>> think it should not be allowed, for set and sorted indexes.
>>
>>   
>>   
> In the case of the AnnotationIndex, the object that you pass to
> FSIterator.moveTo must be a subtype of Annotation (else you would get
> all the weird effects that you describe).  But it is still valid for a
> user to do:
> FSIndex index = cas.getAnnotationIndex(annotType1);
> index.moveTo(annotType2);
>
> where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation.
>
> In general, the object that you pass to moveTo() must be a subtype of
> the type that was in the index definition (in the user's descriptor,
> or for the case of the built-in AnnotationIndex,
> uima.tcas.Annotation).
>
>   -Adam
> 
> 
 One concrete example of Adam's point: suppose you have a sentenceFS
 and a tokenIterator.  Then tokenIterator.moveTo(sentenceFS) will
 position the token iterator at the first word in the sentence (modulo
 some subtleties that are beside the point here).  Very useful.
   
   
>>> Yes.  This works because the "index" being iterated over is the general
>>> annotation index (the one that's built-it) - and the presumeption is
>>> that "token" and "sentenceFS" are both subtypes of AnnotationFS.
>>>
>>> Is there any reason *not* to add a check to the various methods in
>>> indexing that take one or more Feature Structures, to see if they are
>>> being passed a subtype of the type being indexed (except for bag indexes)?
>>> 
>> Sure.  Suppose B < A, and we have an index on B, but the ordering
>> relation is defined in terms of features that B inher

Re: Generification of FSIndex

2009-08-11 Thread Marshall Schor


Thilo Goetz wrote:
> Marshall Schor wrote:
>   
>> Thilo Goetz wrote:
>> 
>>> Adam Lally wrote:
>>>   
>>>   
 On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote:
 
 
> Adam Lally wrote:
>   
>   
>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote:
>>
>> 
>> 
>>> Marshall Schor wrote:
>>>
>>>   
>>>   
 The generification of FSIndex currently specifies one type, >>> FeatureStructure> that is the type of item being returned.


 The contains and find methods have arguments of type FeatureStructure.
 These could be changed to take type "T".


 
 
>>> No I do not think that they could be changed to take type T.
>>> Lets take the case of the contains method.
>>> The javadoc says:
>>> "Check if the index contains an element equal to the given feature 
>>> structure
>>> according to the
>>> ordering of the index. Note that this is in general not the same as 
>>> feature
>>> structure identity."
>>> and it for the param fs it says "The FS we're looking for.". There is no
>>> place where
>>> it says that contains can only be called for FSes which have the type 
>>> of the
>>> index.
>>>
>>> The change of the parameter from FeatureStructure to T would also change
>>> the contract of the method a little, because then it would not be 
>>> possible
>>> anymore
>>> to pass a FeatureStructure which has not type T.
>>>
>>>
>>>   
>>>   
>> I agree.  It's sometimes useful to call FSIterator.moveTo method and
>> pass an FS of a Type other than the one that the index was defined
>> over, as part of implementing something like a subiterator.
>>
>> 
>> 
> I agree with the case where it's the "bag" index being used, because
> that uses a test which works on all feature structures.
>
> However, for Set and Sorted, the implication of passing a FS which is
> not in the type hierarchy is, according to the JavaDocs, "undefined".
> This is because the code assumes the layout of the features and their
> values is appropriate for the type.  In other words, if the type of some
> key was a string, it might take a value from the main int heap and use
> it as an index into the string array - and if the int heap object was
> not the right type, it could pull an arbitrary value from that slot, and
> end up throwing an array index out of bounds exception.  When I looked,
> it didn't appear to me that the code checked for any kind of type
> subsumption before proceeding... (but I may have missed it...)
>
> It could turn out that the data (whatever is being pulled) would just
> happen to "match", even though the types are different.
>
> Because this is an undefined operation that could throw various kinds of
> runtime errors, or return an equal match where none really exists, I
> think it should not be allowed, for set and sorted indexes.
>
>   
>   
 In the case of the AnnotationIndex, the object that you pass to
 FSIterator.moveTo must be a subtype of Annotation (else you would get
 all the weird effects that you describe).  But it is still valid for a
 user to do:
 FSIndex index = cas.getAnnotationIndex(annotType1);
 index.moveTo(annotType2);

 where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation.

 In general, the object that you pass to moveTo() must be a subtype of
 the type that was in the index definition (in the user's descriptor,
 or for the case of the built-in AnnotationIndex,
 uima.tcas.Annotation).

   -Adam
 
 
>>> One concrete example of Adam's point: suppose you have a sentenceFS
>>> and a tokenIterator.  Then tokenIterator.moveTo(sentenceFS) will
>>> position the token iterator at the first word in the sentence (modulo
>>> some subtleties that are beside the point here).  Very useful.
>>>   
>>>   
>> Yes.  This works because the "index" being iterated over is the general
>> annotation index (the one that's built-it) - and the presumeption is
>> that "token" and "sentenceFS" are both subtypes of AnnotationFS.
>>
>> Is there any reason *not* to add a check to the various methods in
>> indexing that take one or more Feature Structures, to see if they are
>> being passed a subtype of the type being indexed (except for bag indexes)?
>> 
>
> Sure.  Suppose B < A, and we have an index on B, but the ordering
> relation is defined in terms of features that B inherits from
>A.
> Then I can use an A to position iterators on the B index.  This
> works now, and I don't see why we should prohibit it.
>   

Right, I knew I wa

Re: Generification of FSIndex

2009-08-11 Thread Thilo Goetz
Marshall Schor wrote:
> 
> Thilo Goetz wrote:
>> Adam Lally wrote:
>>   
>>> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote:
>>> 
 Adam Lally wrote:
   
> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote:
>
> 
>> Marshall Schor wrote:
>>
>>   
>>> The generification of FSIndex currently specifies one type, >> FeatureStructure> that is the type of item being returned.
>>>
>>>
>>> The contains and find methods have arguments of type FeatureStructure.
>>> These could be changed to take type "T".
>>>
>>>
>>> 
>> No I do not think that they could be changed to take type T.
>> Lets take the case of the contains method.
>> The javadoc says:
>> "Check if the index contains an element equal to the given feature 
>> structure
>> according to the
>> ordering of the index. Note that this is in general not the same as 
>> feature
>> structure identity."
>> and it for the param fs it says "The FS we're looking for.". There is no
>> place where
>> it says that contains can only be called for FSes which have the type of 
>> the
>> index.
>>
>> The change of the parameter from FeatureStructure to T would also change
>> the contract of the method a little, because then it would not be 
>> possible
>> anymore
>> to pass a FeatureStructure which has not type T.
>>
>>
>>   
> I agree.  It's sometimes useful to call FSIterator.moveTo method and
> pass an FS of a Type other than the one that the index was defined
> over, as part of implementing something like a subiterator.
>
> 
 I agree with the case where it's the "bag" index being used, because
 that uses a test which works on all feature structures.

 However, for Set and Sorted, the implication of passing a FS which is
 not in the type hierarchy is, according to the JavaDocs, "undefined".
 This is because the code assumes the layout of the features and their
 values is appropriate for the type.  In other words, if the type of some
 key was a string, it might take a value from the main int heap and use
 it as an index into the string array - and if the int heap object was
 not the right type, it could pull an arbitrary value from that slot, and
 end up throwing an array index out of bounds exception.  When I looked,
 it didn't appear to me that the code checked for any kind of type
 subsumption before proceeding... (but I may have missed it...)

 It could turn out that the data (whatever is being pulled) would just
 happen to "match", even though the types are different.

 Because this is an undefined operation that could throw various kinds of
 runtime errors, or return an equal match where none really exists, I
 think it should not be allowed, for set and sorted indexes.

   
>>> In the case of the AnnotationIndex, the object that you pass to
>>> FSIterator.moveTo must be a subtype of Annotation (else you would get
>>> all the weird effects that you describe).  But it is still valid for a
>>> user to do:
>>> FSIndex index = cas.getAnnotationIndex(annotType1);
>>> index.moveTo(annotType2);
>>>
>>> where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation.
>>>
>>> In general, the object that you pass to moveTo() must be a subtype of
>>> the type that was in the index definition (in the user's descriptor,
>>> or for the case of the built-in AnnotationIndex,
>>> uima.tcas.Annotation).
>>>
>>>   -Adam
>>> 
>> One concrete example of Adam's point: suppose you have a sentenceFS
>> and a tokenIterator.  Then tokenIterator.moveTo(sentenceFS) will
>> position the token iterator at the first word in the sentence (modulo
>> some subtleties that are beside the point here).  Very useful.
>>   
> 
> Yes.  This works because the "index" being iterated over is the general
> annotation index (the one that's built-it) - and the presumeption is
> that "token" and "sentenceFS" are both subtypes of AnnotationFS.
> 
> Is there any reason *not* to add a check to the various methods in
> indexing that take one or more Feature Structures, to see if they are
> being passed a subtype of the type being indexed (except for bag indexes)?

Sure.  Suppose B < A, and we have an index on B, but the ordering
relation is defined in terms of features that B inherits from
   A.
Then I can use an A to position iterators on the B index.  This
works now, and I don't see why we should prohibit it.


> -Marshall
> 
> 
>> --Thilo
>>
>>
>>
>>   


Re: Generification of FSIndex

2009-08-11 Thread Marshall Schor


Thilo Goetz wrote:
> Adam Lally wrote:
>   
>> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote:
>> 
>>> Adam Lally wrote:
>>>   
 On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote:

 
> Marshall Schor wrote:
>
>   
>> The generification of FSIndex currently specifies one type, > FeatureStructure> that is the type of item being returned.
>>
>>
>> The contains and find methods have arguments of type FeatureStructure.
>> These could be changed to take type "T".
>>
>>
>> 
> No I do not think that they could be changed to take type T.
> Lets take the case of the contains method.
> The javadoc says:
> "Check if the index contains an element equal to the given feature 
> structure
> according to the
> ordering of the index. Note that this is in general not the same as 
> feature
> structure identity."
> and it for the param fs it says "The FS we're looking for.". There is no
> place where
> it says that contains can only be called for FSes which have the type of 
> the
> index.
>
> The change of the parameter from FeatureStructure to T would also change
> the contract of the method a little, because then it would not be possible
> anymore
> to pass a FeatureStructure which has not type T.
>
>
>   
 I agree.  It's sometimes useful to call FSIterator.moveTo method and
 pass an FS of a Type other than the one that the index was defined
 over, as part of implementing something like a subiterator.

 
>>> I agree with the case where it's the "bag" index being used, because
>>> that uses a test which works on all feature structures.
>>>
>>> However, for Set and Sorted, the implication of passing a FS which is
>>> not in the type hierarchy is, according to the JavaDocs, "undefined".
>>> This is because the code assumes the layout of the features and their
>>> values is appropriate for the type.  In other words, if the type of some
>>> key was a string, it might take a value from the main int heap and use
>>> it as an index into the string array - and if the int heap object was
>>> not the right type, it could pull an arbitrary value from that slot, and
>>> end up throwing an array index out of bounds exception.  When I looked,
>>> it didn't appear to me that the code checked for any kind of type
>>> subsumption before proceeding... (but I may have missed it...)
>>>
>>> It could turn out that the data (whatever is being pulled) would just
>>> happen to "match", even though the types are different.
>>>
>>> Because this is an undefined operation that could throw various kinds of
>>> runtime errors, or return an equal match where none really exists, I
>>> think it should not be allowed, for set and sorted indexes.
>>>
>>>   
>> In the case of the AnnotationIndex, the object that you pass to
>> FSIterator.moveTo must be a subtype of Annotation (else you would get
>> all the weird effects that you describe).  But it is still valid for a
>> user to do:
>> FSIndex index = cas.getAnnotationIndex(annotType1);
>> index.moveTo(annotType2);
>>
>> where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation.
>>
>> In general, the object that you pass to moveTo() must be a subtype of
>> the type that was in the index definition (in the user's descriptor,
>> or for the case of the built-in AnnotationIndex,
>> uima.tcas.Annotation).
>>
>>   -Adam
>> 
>
> One concrete example of Adam's point: suppose you have a sentenceFS
> and a tokenIterator.  Then tokenIterator.moveTo(sentenceFS) will
> position the token iterator at the first word in the sentence (modulo
> some subtleties that are beside the point here).  Very useful.
>   

Yes.  This works because the "index" being iterated over is the general
annotation index (the one that's built-it) - and the presumeption is
that "token" and "sentenceFS" are both subtypes of AnnotationFS.

Is there any reason *not* to add a check to the various methods in
indexing that take one or more Feature Structures, to see if they are
being passed a subtype of the type being indexed (except for bag indexes)?

-Marshall


> --Thilo
>
>
>
>   


Re: Generification of FSIndex

2009-08-10 Thread Thilo Goetz
Adam Lally wrote:
> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote:
>> Adam Lally wrote:
>>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote:
>>>
 Marshall Schor wrote:

> The generification of FSIndex currently specifies one type,  FeatureStructure> that is the type of item being returned.
>
>
> The contains and find methods have arguments of type FeatureStructure.
> These could be changed to take type "T".
>
>
 No I do not think that they could be changed to take type T.
 Lets take the case of the contains method.
 The javadoc says:
 "Check if the index contains an element equal to the given feature 
 structure
 according to the
 ordering of the index. Note that this is in general not the same as feature
 structure identity."
 and it for the param fs it says "The FS we're looking for.". There is no
 place where
 it says that contains can only be called for FSes which have the type of 
 the
 index.

 The change of the parameter from FeatureStructure to T would also change
 the contract of the method a little, because then it would not be possible
 anymore
 to pass a FeatureStructure which has not type T.


>>> I agree.  It's sometimes useful to call FSIterator.moveTo method and
>>> pass an FS of a Type other than the one that the index was defined
>>> over, as part of implementing something like a subiterator.
>>>
>> I agree with the case where it's the "bag" index being used, because
>> that uses a test which works on all feature structures.
>>
>> However, for Set and Sorted, the implication of passing a FS which is
>> not in the type hierarchy is, according to the JavaDocs, "undefined".
>> This is because the code assumes the layout of the features and their
>> values is appropriate for the type.  In other words, if the type of some
>> key was a string, it might take a value from the main int heap and use
>> it as an index into the string array - and if the int heap object was
>> not the right type, it could pull an arbitrary value from that slot, and
>> end up throwing an array index out of bounds exception.  When I looked,
>> it didn't appear to me that the code checked for any kind of type
>> subsumption before proceeding... (but I may have missed it...)
>>
>> It could turn out that the data (whatever is being pulled) would just
>> happen to "match", even though the types are different.
>>
>> Because this is an undefined operation that could throw various kinds of
>> runtime errors, or return an equal match where none really exists, I
>> think it should not be allowed, for set and sorted indexes.
>>
> 
> In the case of the AnnotationIndex, the object that you pass to
> FSIterator.moveTo must be a subtype of Annotation (else you would get
> all the weird effects that you describe).  But it is still valid for a
> user to do:
> FSIndex index = cas.getAnnotationIndex(annotType1);
> index.moveTo(annotType2);
> 
> where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation.
> 
> In general, the object that you pass to moveTo() must be a subtype of
> the type that was in the index definition (in the user's descriptor,
> or for the case of the built-in AnnotationIndex,
> uima.tcas.Annotation).
> 
>   -Adam

One concrete example of Adam's point: suppose you have a sentenceFS
and a tokenIterator.  Then tokenIterator.moveTo(sentenceFS) will
position the token iterator at the first word in the sentence (modulo
some subtleties that are beside the point here).  Very useful.

--Thilo



Re: Generification of FSIndex

2009-08-10 Thread Jörn Kottmann

Marshall Schor wrote:

Marshall Schor wrote:
  

Adam Lally wrote:
  


On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote:
  

  

Marshall Schor wrote:

  


The generification of FSIndex currently specifies one type,  that is the type of item being returned.


The contains and find methods have arguments of type FeatureStructure.
These could be changed to take type "T".

  

  

No I do not think that they could be changed to take type T.
Lets take the case of the contains method.
The javadoc says:
"Check if the index contains an element equal to the given feature structure
according to the
ordering of the index. Note that this is in general not the same as feature
structure identity."
and it for the param fs it says "The FS we're looking for.". There is no
place where
it says that contains can only be called for FSes which have the type of the
index.

The change of the parameter from FeatureStructure to T would also change
the contract of the method a little, because then it would not be possible
anymore
to pass a FeatureStructure which has not type T.


  


I agree.  It's sometimes useful to call FSIterator.moveTo method and
pass an FS of a Type other than the one that the index was defined
over, as part of implementing something like a subiterator.
  

  

I agree with the case where it's the "bag" index being used, because
that uses a test which works on all feature structures.

However, for Set and Sorted, the implication of passing a FS which is
not in the type hierarchy is, according to the JavaDocs, "undefined". 
This is because the code assumes the layout of the features and their

values is appropriate for the type.  In other words, if the type of some
key was a string, it might take a value from the main int heap and use
it as an index into the string array - and if the int heap object was
not the right type, it could pull an arbitrary value from that slot, and
end up throwing an array index out of bounds exception.  When I looked,
it didn't appear to me that the code checked for any kind of type
subsumption before proceeding... (but I may have missed it...)

It could turn out that the data (whatever is being pulled) would just
happen to "match", even though the types are different.

Because this is an undefined operation that could throw various kinds of
runtime errors, or return an equal match where none really exists, I
think it should not be allowed, for set and sorted indexes.
  



Or - we should change the implementation of these methods to check for
type subsumption of the arguments before calling the low-level compare code.
  

The implementation should handle this case correctly, thats the only
way to fix it. With generification it can not be fixed because a client 
would

still be allowed to use a FSIndex as a raw type.

Jörn


Re: Generification of FSIndex

2009-08-10 Thread Adam Lally
On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote:
> Adam Lally wrote:
>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote:
>>
>>> Marshall Schor wrote:
>>>
 The generification of FSIndex currently specifies one type, >>> FeatureStructure> that is the type of item being returned.


 The contains and find methods have arguments of type FeatureStructure.
 These could be changed to take type "T".


>>> No I do not think that they could be changed to take type T.
>>> Lets take the case of the contains method.
>>> The javadoc says:
>>> "Check if the index contains an element equal to the given feature structure
>>> according to the
>>> ordering of the index. Note that this is in general not the same as feature
>>> structure identity."
>>> and it for the param fs it says "The FS we're looking for.". There is no
>>> place where
>>> it says that contains can only be called for FSes which have the type of the
>>> index.
>>>
>>> The change of the parameter from FeatureStructure to T would also change
>>> the contract of the method a little, because then it would not be possible
>>> anymore
>>> to pass a FeatureStructure which has not type T.
>>>
>>>
>>
>> I agree.  It's sometimes useful to call FSIterator.moveTo method and
>> pass an FS of a Type other than the one that the index was defined
>> over, as part of implementing something like a subiterator.
>>
>
> I agree with the case where it's the "bag" index being used, because
> that uses a test which works on all feature structures.
>
> However, for Set and Sorted, the implication of passing a FS which is
> not in the type hierarchy is, according to the JavaDocs, "undefined".
> This is because the code assumes the layout of the features and their
> values is appropriate for the type.  In other words, if the type of some
> key was a string, it might take a value from the main int heap and use
> it as an index into the string array - and if the int heap object was
> not the right type, it could pull an arbitrary value from that slot, and
> end up throwing an array index out of bounds exception.  When I looked,
> it didn't appear to me that the code checked for any kind of type
> subsumption before proceeding... (but I may have missed it...)
>
> It could turn out that the data (whatever is being pulled) would just
> happen to "match", even though the types are different.
>
> Because this is an undefined operation that could throw various kinds of
> runtime errors, or return an equal match where none really exists, I
> think it should not be allowed, for set and sorted indexes.
>

In the case of the AnnotationIndex, the object that you pass to
FSIterator.moveTo must be a subtype of Annotation (else you would get
all the weird effects that you describe).  But it is still valid for a
user to do:
FSIndex index = cas.getAnnotationIndex(annotType1);
index.moveTo(annotType2);

where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation.

In general, the object that you pass to moveTo() must be a subtype of
the type that was in the index definition (in the user's descriptor,
or for the case of the built-in AnnotationIndex,
uima.tcas.Annotation).

  -Adam


Re: Generification of FSIndex

2009-08-10 Thread Marshall Schor


Marshall Schor wrote:
> Adam Lally wrote:
>   
>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote:
>>   
>> 
>>> Marshall Schor wrote:
>>> 
>>>   
 The generification of FSIndex currently specifies one type, >>> FeatureStructure> that is the type of item being returned.


 The contains and find methods have arguments of type FeatureStructure.
 These could be changed to take type "T".

   
 
>>> No I do not think that they could be changed to take type T.
>>> Lets take the case of the contains method.
>>> The javadoc says:
>>> "Check if the index contains an element equal to the given feature structure
>>> according to the
>>> ordering of the index. Note that this is in general not the same as feature
>>> structure identity."
>>> and it for the param fs it says "The FS we're looking for.". There is no
>>> place where
>>> it says that contains can only be called for FSes which have the type of the
>>> index.
>>>
>>> The change of the parameter from FeatureStructure to T would also change
>>> the contract of the method a little, because then it would not be possible
>>> anymore
>>> to pass a FeatureStructure which has not type T.
>>>
>>> 
>>>   
>> I agree.  It's sometimes useful to call FSIterator.moveTo method and
>> pass an FS of a Type other than the one that the index was defined
>> over, as part of implementing something like a subiterator.
>>   
>> 
>
> I agree with the case where it's the "bag" index being used, because
> that uses a test which works on all feature structures.
>
> However, for Set and Sorted, the implication of passing a FS which is
> not in the type hierarchy is, according to the JavaDocs, "undefined". 
> This is because the code assumes the layout of the features and their
> values is appropriate for the type.  In other words, if the type of some
> key was a string, it might take a value from the main int heap and use
> it as an index into the string array - and if the int heap object was
> not the right type, it could pull an arbitrary value from that slot, and
> end up throwing an array index out of bounds exception.  When I looked,
> it didn't appear to me that the code checked for any kind of type
> subsumption before proceeding... (but I may have missed it...)
>
> It could turn out that the data (whatever is being pulled) would just
> happen to "match", even though the types are different.
>
> Because this is an undefined operation that could throw various kinds of
> runtime errors, or return an equal match where none really exists, I
> think it should not be allowed, for set and sorted indexes.
>   

Or - we should change the implementation of these methods to check for
type subsumption of the arguments before calling the low-level compare code.

-Marshall
> -Marshall
>   
>>   -Adam
>>
>>
>>   
>> 
>
>
>   


Re: Generification of FSIndex

2009-08-10 Thread Marshall Schor


Adam Lally wrote:
> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote:
>   
>> Marshall Schor wrote:
>> 
>>> The generification of FSIndex currently specifies one type, >> FeatureStructure> that is the type of item being returned.
>>>
>>>
>>> The contains and find methods have arguments of type FeatureStructure.
>>> These could be changed to take type "T".
>>>
>>>   
>> No I do not think that they could be changed to take type T.
>> Lets take the case of the contains method.
>> The javadoc says:
>> "Check if the index contains an element equal to the given feature structure
>> according to the
>> ordering of the index. Note that this is in general not the same as feature
>> structure identity."
>> and it for the param fs it says "The FS we're looking for.". There is no
>> place where
>> it says that contains can only be called for FSes which have the type of the
>> index.
>>
>> The change of the parameter from FeatureStructure to T would also change
>> the contract of the method a little, because then it would not be possible
>> anymore
>> to pass a FeatureStructure which has not type T.
>>
>> 
>
> I agree.  It's sometimes useful to call FSIterator.moveTo method and
> pass an FS of a Type other than the one that the index was defined
> over, as part of implementing something like a subiterator.
>   

I agree with the case where it's the "bag" index being used, because
that uses a test which works on all feature structures.

However, for Set and Sorted, the implication of passing a FS which is
not in the type hierarchy is, according to the JavaDocs, "undefined". 
This is because the code assumes the layout of the features and their
values is appropriate for the type.  In other words, if the type of some
key was a string, it might take a value from the main int heap and use
it as an index into the string array - and if the int heap object was
not the right type, it could pull an arbitrary value from that slot, and
end up throwing an array index out of bounds exception.  When I looked,
it didn't appear to me that the code checked for any kind of type
subsumption before proceeding... (but I may have missed it...)

It could turn out that the data (whatever is being pulled) would just
happen to "match", even though the types are different.

Because this is an undefined operation that could throw various kinds of
runtime errors, or return an equal match where none really exists, I
think it should not be allowed, for set and sorted indexes.

-Marshall
>   -Adam
>
>
>   


Re: Generification of FSIndex

2009-08-10 Thread Adam Lally
On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote:
> Marshall Schor wrote:
>>
>> The generification of FSIndex currently specifies one type, > FeatureStructure> that is the type of item being returned.
>>
>>
>> The contains and find methods have arguments of type FeatureStructure.
>> These could be changed to take type "T".
>>
>
> No I do not think that they could be changed to take type T.
> Lets take the case of the contains method.
> The javadoc says:
> "Check if the index contains an element equal to the given feature structure
> according to the
> ordering of the index. Note that this is in general not the same as feature
> structure identity."
> and it for the param fs it says "The FS we're looking for.". There is no
> place where
> it says that contains can only be called for FSes which have the type of the
> index.
>
> The change of the parameter from FeatureStructure to T would also change
> the contract of the method a little, because then it would not be possible
> anymore
> to pass a FeatureStructure which has not type T.
>

I agree.  It's sometimes useful to call FSIterator.moveTo method and
pass an FS of a Type other than the one that the index was defined
over, as part of implementing something like a subiterator.
  -Adam


Re: Generification of FSIndex

2009-08-10 Thread Jörn Kottmann

Jörn Kottmann wrote:

No I do not think that they could be changed to take type T.
Lets take the case of the contains method.
The javadoc says:
"Check if the index contains an element equal to the given feature 
structure according to the
ordering of the index. Note that this is in general not the same as 
feature structure identity."
and it for the param fs it says "The FS we're looking for.". There is 
no place where
it says that contains can only be called for FSes which have the type 
of the index.


The change of the parameter from FeatureStructure to T would also change
the contract of the method a little, because then it would not be 
possible anymore

to pass a FeatureStructure which has not type T.

The java developers had similar case in the Collection interface.
Collection has a contains method which has still the signature 
contains(Object)

and not contains(E).

See here:
http://java.sun.com/javase/6/docs/api/java/util/Collection.html

Jörn


Re: Generification of FSIndex

2009-08-10 Thread Jörn Kottmann

Marshall Schor wrote:

The generification of FSIndex currently specifies one type,  that is the type of item being returned.


The contains and find methods have arguments of type FeatureStructure. 
These could be changed to take type "T".
  

No I do not think that they could be changed to take type T.
Lets take the case of the contains method.
The javadoc says:
"Check if the index contains an element equal to the given feature 
structure according to the
ordering of the index. Note that this is in general not the same as 
feature structure identity."
and it for the param fs it says "The FS we're looking for.". There is no 
place where
it says that contains can only be called for FSes which have the type of 
the index.


The change of the parameter from FeatureStructure to T would also change
the contract of the method a little, because then it would not be 
possible anymore

to pass a FeatureStructure which has not type T.

Jörn