Re: Generification of FSIndex
Jörn Kottmann wrote: > Marshall Schor wrote: >> Jörn Kottmann wrote: >> >>> Marshall Schor wrote: >>> OK - sounds like this won't be on the fix list for 2.3.0 ;-) And, given the value of being able to do this in selected cases, it doesn't sound right, either, to change the arguments to find, moveTo, etc from FeatureStructure. So let's leave this as is. >>> I created a jira issue to change moveTo back to FeatureStructure >>> argument, >>> it was not changed yet because I misunderstood you. >>> >>> Which other methods are you referring to ? >>> >> >> The other methods that take a FeatureStructure for purposes of >> specifying a location in an index: >> >> For these, I'm not sure if the test for equality includes testing that >> the "types" are the same, or just is testing (for sorted and set >> indexes) that the declared "keys" are "equal". >> >> The methods are: >> >> contains >> find >> compare (although the javadoc says the result is undefined if the FS's >> are not of the type of the index, but I wonder if that's checked?) >> iterator(fs) >> > > Here are the declarations of these methods: > > boolean contains(FeatureStructure fs); > FeatureStructure find(FeatureStructure fs); > Maybe we should return T here ? +1, since FSIndex which has this method is already parameterized on T - it should be returning only things of type T. > > FeatureStructure find(FeatureStructure fs); > I think it makes sense to return a FS of type T here. +1 > > int compare(FeatureStructure fs1, FeatureStructure fs2); > If we use T here, it would enforce the API specification a little, > because at least the java type must match. I think we should keep this, because it could change the behavior in this case: Suppose you have an index over Token, and you have an instance of Sentence, and want to see if a token is in a sentence. Assuming the Token index uses the same keys as the base Annotation Index, doing a compare between Token and Sentence would be useful and make sense. -Marshall
Re: Generification of FSIndex
Marshall Schor wrote: Jörn Kottmann wrote: Marshall Schor wrote: OK - sounds like this won't be on the fix list for 2.3.0 ;-) And, given the value of being able to do this in selected cases, it doesn't sound right, either, to change the arguments to find, moveTo, etc from FeatureStructure. So let's leave this as is. I created a jira issue to change moveTo back to FeatureStructure argument, it was not changed yet because I misunderstood you. Which other methods are you referring to ? The other methods that take a FeatureStructure for purposes of specifying a location in an index: For these, I'm not sure if the test for equality includes testing that the "types" are the same, or just is testing (for sorted and set indexes) that the declared "keys" are "equal". The methods are: contains find compare (although the javadoc says the result is undefined if the FS's are not of the type of the index, but I wonder if that's checked?) iterator(fs) Here are the declarations of these methods: boolean contains(FeatureStructure fs); FeatureStructure find(FeatureStructure fs); Maybe we should return T here ? FeatureStructure find(FeatureStructure fs); I think it makes sense to return a FS of type T here. int compare(FeatureStructure fs1, FeatureStructure fs2); If we use T here, it would enforce the API specification a little, because at least the java type must match. Jörn
Re: Generification of FSIndex
Marshall Schor wrote: > > Jörn Kottmann wrote: >> Marshall Schor wrote: >>> OK - sounds like this won't be on the fix list for 2.3.0 ;-) >>> And, given the value of being able to do this in selected cases, it >>> doesn't sound right, either, to change the arguments to find, moveTo, >>> etc from FeatureStructure. >>> So let's leave this as is. >> I created a jira issue to change moveTo back to FeatureStructure >> argument, >> it was not changed yet because I misunderstood you. >> >> Which other methods are you referring to ? > > The other methods that take a FeatureStructure for purposes of > specifying a location in an index: > > For these, I'm not sure if the test for equality includes testing that > the "types" are the same, or just is testing (for sorted and set > indexes) that the declared "keys" are "equal". > > The methods are: > > contains > find > compare (although the javadoc says the result is undefined if the FS's > are not of the type of the index, but I wonder if that's checked?) I looked yesterday, there is no check. There is definitely room for improvement here. I'll open a Jira issue, but I'm not proposing to do anything for this release. --Thilo > iterator(fs) > > -Marshall > >> Jörn >> >>
Re: Generification of FSIndex
Jörn Kottmann wrote: > Marshall Schor wrote: >> OK - sounds like this won't be on the fix list for 2.3.0 ;-) >> And, given the value of being able to do this in selected cases, it >> doesn't sound right, either, to change the arguments to find, moveTo, >> etc from FeatureStructure. >> So let's leave this as is. > I created a jira issue to change moveTo back to FeatureStructure > argument, > it was not changed yet because I misunderstood you. > > Which other methods are you referring to ? The other methods that take a FeatureStructure for purposes of specifying a location in an index: For these, I'm not sure if the test for equality includes testing that the "types" are the same, or just is testing (for sorted and set indexes) that the declared "keys" are "equal". The methods are: contains find compare (although the javadoc says the result is undefined if the FS's are not of the type of the index, but I wonder if that's checked?) iterator(fs) -Marshall > Jörn > >
Re: Generification of FSIndex
Marshall Schor wrote: OK - sounds like this won't be on the fix list for 2.3.0 ;-) And, given the value of being able to do this in selected cases, it doesn't sound right, either, to change the arguments to find, moveTo, etc from FeatureStructure. So let's leave this as is. I created a jira issue to change moveTo back to FeatureStructure argument, it was not changed yet because I misunderstood you. Which other methods are you referring to ? Jörn
Re: Generification of FSIndex
Thilo Goetz wrote: > Marshall Schor wrote: > >> Thilo Goetz wrote: >> >>> Marshall Schor wrote: >>> >>> Thilo Goetz wrote: > Adam Lally wrote: > > > >> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote: >> >> >> >>> Adam Lally wrote: >>> >>> >>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote: > Marshall Schor wrote: > > > > >> The generification of FSIndex currently specifies one type, > extends >> FeatureStructure> that is the type of item being returned. >> >> >> The contains and find methods have arguments of type >> FeatureStructure. >> These could be changed to take type "T". >> >> >> >> >> > No I do not think that they could be changed to take type T. > Lets take the case of the contains method. > The javadoc says: > "Check if the index contains an element equal to the given feature > structure > according to the > ordering of the index. Note that this is in general not the same as > feature > structure identity." > and it for the param fs it says "The FS we're looking for.". There is > no > place where > it says that contains can only be called for FSes which have the type > of the > index. > > The change of the parameter from FeatureStructure to T would also > change > the contract of the method a little, because then it would not be > possible > anymore > to pass a FeatureStructure which has not type T. > > > > > I agree. It's sometimes useful to call FSIterator.moveTo method and pass an FS of a Type other than the one that the index was defined over, as part of implementing something like a subiterator. >>> I agree with the case where it's the "bag" index being used, because >>> that uses a test which works on all feature structures. >>> >>> However, for Set and Sorted, the implication of passing a FS which is >>> not in the type hierarchy is, according to the JavaDocs, "undefined". >>> This is because the code assumes the layout of the features and their >>> values is appropriate for the type. In other words, if the type of some >>> key was a string, it might take a value from the main int heap and use >>> it as an index into the string array - and if the int heap object was >>> not the right type, it could pull an arbitrary value from that slot, and >>> end up throwing an array index out of bounds exception. When I looked, >>> it didn't appear to me that the code checked for any kind of type >>> subsumption before proceeding... (but I may have missed it...) >>> >>> It could turn out that the data (whatever is being pulled) would just >>> happen to "match", even though the types are different. >>> >>> Because this is an undefined operation that could throw various kinds of >>> runtime errors, or return an equal match where none really exists, I >>> think it should not be allowed, for set and sorted indexes. >>> >>> >>> >>> >> In the case of the AnnotationIndex, the object that you pass to >> FSIterator.moveTo must be a subtype of Annotation (else you would get >> all the weird effects that you describe). But it is still valid for a >> user to do: >> FSIndex index = cas.getAnnotationIndex(annotType1); >> index.moveTo(annotType2); >> >> where annotType1 and annotType2 are both subtypes of >> uima.tcas.Annotation. >> >> In general, the object that you pass to moveTo() must be a subtype of >> the type that was in the index definition (in the user's descriptor, >> or for the case of the built-in AnnotationIndex, >> uima.tcas.Annotation). >> >> -Adam >> >> >> > One concrete example of Adam's point: suppose you have a sentenceFS > and a tokenIterator. Then tokenIterator.moveTo(sentenceFS) will > position the token iterator at the first word in the sentence (modulo > some subtleties that are beside the point here). Very useful. > > > Yes. This works because the "index" being iterated over is the general annotation index (the one that
Re: Generification of FSIndex
Marshall Schor wrote: > > Thilo Goetz wrote: >> Marshall Schor wrote: >> >>> Thilo Goetz wrote: >>> Adam Lally wrote: > On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote: > > >> Adam Lally wrote: >> >> >>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann >>> wrote: >>> >>> >>> Marshall Schor wrote: > The generification of FSIndex currently specifies one type, FeatureStructure> that is the type of item being returned. > > > The contains and find methods have arguments of type FeatureStructure. > These could be changed to take type "T". > > > > No I do not think that they could be changed to take type T. Lets take the case of the contains method. The javadoc says: "Check if the index contains an element equal to the given feature structure according to the ordering of the index. Note that this is in general not the same as feature structure identity." and it for the param fs it says "The FS we're looking for.". There is no place where it says that contains can only be called for FSes which have the type of the index. The change of the parameter from FeatureStructure to T would also change the contract of the method a little, because then it would not be possible anymore to pass a FeatureStructure which has not type T. >>> I agree. It's sometimes useful to call FSIterator.moveTo method and >>> pass an FS of a Type other than the one that the index was defined >>> over, as part of implementing something like a subiterator. >>> >>> >>> >> I agree with the case where it's the "bag" index being used, because >> that uses a test which works on all feature structures. >> >> However, for Set and Sorted, the implication of passing a FS which is >> not in the type hierarchy is, according to the JavaDocs, "undefined". >> This is because the code assumes the layout of the features and their >> values is appropriate for the type. In other words, if the type of some >> key was a string, it might take a value from the main int heap and use >> it as an index into the string array - and if the int heap object was >> not the right type, it could pull an arbitrary value from that slot, and >> end up throwing an array index out of bounds exception. When I looked, >> it didn't appear to me that the code checked for any kind of type >> subsumption before proceeding... (but I may have missed it...) >> >> It could turn out that the data (whatever is being pulled) would just >> happen to "match", even though the types are different. >> >> Because this is an undefined operation that could throw various kinds of >> runtime errors, or return an equal match where none really exists, I >> think it should not be allowed, for set and sorted indexes. >> >> >> > In the case of the AnnotationIndex, the object that you pass to > FSIterator.moveTo must be a subtype of Annotation (else you would get > all the weird effects that you describe). But it is still valid for a > user to do: > FSIndex index = cas.getAnnotationIndex(annotType1); > index.moveTo(annotType2); > > where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation. > > In general, the object that you pass to moveTo() must be a subtype of > the type that was in the index definition (in the user's descriptor, > or for the case of the built-in AnnotationIndex, > uima.tcas.Annotation). > > -Adam > > One concrete example of Adam's point: suppose you have a sentenceFS and a tokenIterator. Then tokenIterator.moveTo(sentenceFS) will position the token iterator at the first word in the sentence (modulo some subtleties that are beside the point here). Very useful. >>> Yes. This works because the "index" being iterated over is the general >>> annotation index (the one that's built-it) - and the presumeption is >>> that "token" and "sentenceFS" are both subtypes of AnnotationFS. >>> >>> Is there any reason *not* to add a check to the various methods in >>> indexing that take one or more Feature Structures, to see if they are >>> being passed a subtype of the type being indexed (except for bag indexes)? >>> >> Sure. Suppose B < A, and we have an index on B, but the ordering >> relation is defined in terms of features that B inher
Re: Generification of FSIndex
Thilo Goetz wrote: > Marshall Schor wrote: > >> Thilo Goetz wrote: >> >>> Adam Lally wrote: >>> >>> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote: > Adam Lally wrote: > > >> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote: >> >> >> >>> Marshall Schor wrote: >>> >>> >>> The generification of FSIndex currently specifies one type, >>> FeatureStructure> that is the type of item being returned. The contains and find methods have arguments of type FeatureStructure. These could be changed to take type "T". >>> No I do not think that they could be changed to take type T. >>> Lets take the case of the contains method. >>> The javadoc says: >>> "Check if the index contains an element equal to the given feature >>> structure >>> according to the >>> ordering of the index. Note that this is in general not the same as >>> feature >>> structure identity." >>> and it for the param fs it says "The FS we're looking for.". There is no >>> place where >>> it says that contains can only be called for FSes which have the type >>> of the >>> index. >>> >>> The change of the parameter from FeatureStructure to T would also change >>> the contract of the method a little, because then it would not be >>> possible >>> anymore >>> to pass a FeatureStructure which has not type T. >>> >>> >>> >>> >> I agree. It's sometimes useful to call FSIterator.moveTo method and >> pass an FS of a Type other than the one that the index was defined >> over, as part of implementing something like a subiterator. >> >> >> > I agree with the case where it's the "bag" index being used, because > that uses a test which works on all feature structures. > > However, for Set and Sorted, the implication of passing a FS which is > not in the type hierarchy is, according to the JavaDocs, "undefined". > This is because the code assumes the layout of the features and their > values is appropriate for the type. In other words, if the type of some > key was a string, it might take a value from the main int heap and use > it as an index into the string array - and if the int heap object was > not the right type, it could pull an arbitrary value from that slot, and > end up throwing an array index out of bounds exception. When I looked, > it didn't appear to me that the code checked for any kind of type > subsumption before proceeding... (but I may have missed it...) > > It could turn out that the data (whatever is being pulled) would just > happen to "match", even though the types are different. > > Because this is an undefined operation that could throw various kinds of > runtime errors, or return an equal match where none really exists, I > think it should not be allowed, for set and sorted indexes. > > > In the case of the AnnotationIndex, the object that you pass to FSIterator.moveTo must be a subtype of Annotation (else you would get all the weird effects that you describe). But it is still valid for a user to do: FSIndex index = cas.getAnnotationIndex(annotType1); index.moveTo(annotType2); where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation. In general, the object that you pass to moveTo() must be a subtype of the type that was in the index definition (in the user's descriptor, or for the case of the built-in AnnotationIndex, uima.tcas.Annotation). -Adam >>> One concrete example of Adam's point: suppose you have a sentenceFS >>> and a tokenIterator. Then tokenIterator.moveTo(sentenceFS) will >>> position the token iterator at the first word in the sentence (modulo >>> some subtleties that are beside the point here). Very useful. >>> >>> >> Yes. This works because the "index" being iterated over is the general >> annotation index (the one that's built-it) - and the presumeption is >> that "token" and "sentenceFS" are both subtypes of AnnotationFS. >> >> Is there any reason *not* to add a check to the various methods in >> indexing that take one or more Feature Structures, to see if they are >> being passed a subtype of the type being indexed (except for bag indexes)? >> > > Sure. Suppose B < A, and we have an index on B, but the ordering > relation is defined in terms of features that B inherits from >A. > Then I can use an A to position iterators on the B index. This > works now, and I don't see why we should prohibit it. > Right, I knew I wa
Re: Generification of FSIndex
Marshall Schor wrote: > > Thilo Goetz wrote: >> Adam Lally wrote: >> >>> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote: >>> Adam Lally wrote: > On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote: > > >> Marshall Schor wrote: >> >> >>> The generification of FSIndex currently specifies one type, >> FeatureStructure> that is the type of item being returned. >>> >>> >>> The contains and find methods have arguments of type FeatureStructure. >>> These could be changed to take type "T". >>> >>> >>> >> No I do not think that they could be changed to take type T. >> Lets take the case of the contains method. >> The javadoc says: >> "Check if the index contains an element equal to the given feature >> structure >> according to the >> ordering of the index. Note that this is in general not the same as >> feature >> structure identity." >> and it for the param fs it says "The FS we're looking for.". There is no >> place where >> it says that contains can only be called for FSes which have the type of >> the >> index. >> >> The change of the parameter from FeatureStructure to T would also change >> the contract of the method a little, because then it would not be >> possible >> anymore >> to pass a FeatureStructure which has not type T. >> >> >> > I agree. It's sometimes useful to call FSIterator.moveTo method and > pass an FS of a Type other than the one that the index was defined > over, as part of implementing something like a subiterator. > > I agree with the case where it's the "bag" index being used, because that uses a test which works on all feature structures. However, for Set and Sorted, the implication of passing a FS which is not in the type hierarchy is, according to the JavaDocs, "undefined". This is because the code assumes the layout of the features and their values is appropriate for the type. In other words, if the type of some key was a string, it might take a value from the main int heap and use it as an index into the string array - and if the int heap object was not the right type, it could pull an arbitrary value from that slot, and end up throwing an array index out of bounds exception. When I looked, it didn't appear to me that the code checked for any kind of type subsumption before proceeding... (but I may have missed it...) It could turn out that the data (whatever is being pulled) would just happen to "match", even though the types are different. Because this is an undefined operation that could throw various kinds of runtime errors, or return an equal match where none really exists, I think it should not be allowed, for set and sorted indexes. >>> In the case of the AnnotationIndex, the object that you pass to >>> FSIterator.moveTo must be a subtype of Annotation (else you would get >>> all the weird effects that you describe). But it is still valid for a >>> user to do: >>> FSIndex index = cas.getAnnotationIndex(annotType1); >>> index.moveTo(annotType2); >>> >>> where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation. >>> >>> In general, the object that you pass to moveTo() must be a subtype of >>> the type that was in the index definition (in the user's descriptor, >>> or for the case of the built-in AnnotationIndex, >>> uima.tcas.Annotation). >>> >>> -Adam >>> >> One concrete example of Adam's point: suppose you have a sentenceFS >> and a tokenIterator. Then tokenIterator.moveTo(sentenceFS) will >> position the token iterator at the first word in the sentence (modulo >> some subtleties that are beside the point here). Very useful. >> > > Yes. This works because the "index" being iterated over is the general > annotation index (the one that's built-it) - and the presumeption is > that "token" and "sentenceFS" are both subtypes of AnnotationFS. > > Is there any reason *not* to add a check to the various methods in > indexing that take one or more Feature Structures, to see if they are > being passed a subtype of the type being indexed (except for bag indexes)? Sure. Suppose B < A, and we have an index on B, but the ordering relation is defined in terms of features that B inherits from A. Then I can use an A to position iterators on the B index. This works now, and I don't see why we should prohibit it. > -Marshall > > >> --Thilo >> >> >> >>
Re: Generification of FSIndex
Thilo Goetz wrote: > Adam Lally wrote: > >> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote: >> >>> Adam Lally wrote: >>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote: > Marshall Schor wrote: > > >> The generification of FSIndex currently specifies one type, > FeatureStructure> that is the type of item being returned. >> >> >> The contains and find methods have arguments of type FeatureStructure. >> These could be changed to take type "T". >> >> >> > No I do not think that they could be changed to take type T. > Lets take the case of the contains method. > The javadoc says: > "Check if the index contains an element equal to the given feature > structure > according to the > ordering of the index. Note that this is in general not the same as > feature > structure identity." > and it for the param fs it says "The FS we're looking for.". There is no > place where > it says that contains can only be called for FSes which have the type of > the > index. > > The change of the parameter from FeatureStructure to T would also change > the contract of the method a little, because then it would not be possible > anymore > to pass a FeatureStructure which has not type T. > > > I agree. It's sometimes useful to call FSIterator.moveTo method and pass an FS of a Type other than the one that the index was defined over, as part of implementing something like a subiterator. >>> I agree with the case where it's the "bag" index being used, because >>> that uses a test which works on all feature structures. >>> >>> However, for Set and Sorted, the implication of passing a FS which is >>> not in the type hierarchy is, according to the JavaDocs, "undefined". >>> This is because the code assumes the layout of the features and their >>> values is appropriate for the type. In other words, if the type of some >>> key was a string, it might take a value from the main int heap and use >>> it as an index into the string array - and if the int heap object was >>> not the right type, it could pull an arbitrary value from that slot, and >>> end up throwing an array index out of bounds exception. When I looked, >>> it didn't appear to me that the code checked for any kind of type >>> subsumption before proceeding... (but I may have missed it...) >>> >>> It could turn out that the data (whatever is being pulled) would just >>> happen to "match", even though the types are different. >>> >>> Because this is an undefined operation that could throw various kinds of >>> runtime errors, or return an equal match where none really exists, I >>> think it should not be allowed, for set and sorted indexes. >>> >>> >> In the case of the AnnotationIndex, the object that you pass to >> FSIterator.moveTo must be a subtype of Annotation (else you would get >> all the weird effects that you describe). But it is still valid for a >> user to do: >> FSIndex index = cas.getAnnotationIndex(annotType1); >> index.moveTo(annotType2); >> >> where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation. >> >> In general, the object that you pass to moveTo() must be a subtype of >> the type that was in the index definition (in the user's descriptor, >> or for the case of the built-in AnnotationIndex, >> uima.tcas.Annotation). >> >> -Adam >> > > One concrete example of Adam's point: suppose you have a sentenceFS > and a tokenIterator. Then tokenIterator.moveTo(sentenceFS) will > position the token iterator at the first word in the sentence (modulo > some subtleties that are beside the point here). Very useful. > Yes. This works because the "index" being iterated over is the general annotation index (the one that's built-it) - and the presumeption is that "token" and "sentenceFS" are both subtypes of AnnotationFS. Is there any reason *not* to add a check to the various methods in indexing that take one or more Feature Structures, to see if they are being passed a subtype of the type being indexed (except for bag indexes)? -Marshall > --Thilo > > > >
Re: Generification of FSIndex
Adam Lally wrote: > On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote: >> Adam Lally wrote: >>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote: >>> Marshall Schor wrote: > The generification of FSIndex currently specifies one type, FeatureStructure> that is the type of item being returned. > > > The contains and find methods have arguments of type FeatureStructure. > These could be changed to take type "T". > > No I do not think that they could be changed to take type T. Lets take the case of the contains method. The javadoc says: "Check if the index contains an element equal to the given feature structure according to the ordering of the index. Note that this is in general not the same as feature structure identity." and it for the param fs it says "The FS we're looking for.". There is no place where it says that contains can only be called for FSes which have the type of the index. The change of the parameter from FeatureStructure to T would also change the contract of the method a little, because then it would not be possible anymore to pass a FeatureStructure which has not type T. >>> I agree. It's sometimes useful to call FSIterator.moveTo method and >>> pass an FS of a Type other than the one that the index was defined >>> over, as part of implementing something like a subiterator. >>> >> I agree with the case where it's the "bag" index being used, because >> that uses a test which works on all feature structures. >> >> However, for Set and Sorted, the implication of passing a FS which is >> not in the type hierarchy is, according to the JavaDocs, "undefined". >> This is because the code assumes the layout of the features and their >> values is appropriate for the type. In other words, if the type of some >> key was a string, it might take a value from the main int heap and use >> it as an index into the string array - and if the int heap object was >> not the right type, it could pull an arbitrary value from that slot, and >> end up throwing an array index out of bounds exception. When I looked, >> it didn't appear to me that the code checked for any kind of type >> subsumption before proceeding... (but I may have missed it...) >> >> It could turn out that the data (whatever is being pulled) would just >> happen to "match", even though the types are different. >> >> Because this is an undefined operation that could throw various kinds of >> runtime errors, or return an equal match where none really exists, I >> think it should not be allowed, for set and sorted indexes. >> > > In the case of the AnnotationIndex, the object that you pass to > FSIterator.moveTo must be a subtype of Annotation (else you would get > all the weird effects that you describe). But it is still valid for a > user to do: > FSIndex index = cas.getAnnotationIndex(annotType1); > index.moveTo(annotType2); > > where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation. > > In general, the object that you pass to moveTo() must be a subtype of > the type that was in the index definition (in the user's descriptor, > or for the case of the built-in AnnotationIndex, > uima.tcas.Annotation). > > -Adam One concrete example of Adam's point: suppose you have a sentenceFS and a tokenIterator. Then tokenIterator.moveTo(sentenceFS) will position the token iterator at the first word in the sentence (modulo some subtleties that are beside the point here). Very useful. --Thilo
Re: Generification of FSIndex
Marshall Schor wrote: Marshall Schor wrote: Adam Lally wrote: On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote: Marshall Schor wrote: The generification of FSIndex currently specifies one type, that is the type of item being returned. The contains and find methods have arguments of type FeatureStructure. These could be changed to take type "T". No I do not think that they could be changed to take type T. Lets take the case of the contains method. The javadoc says: "Check if the index contains an element equal to the given feature structure according to the ordering of the index. Note that this is in general not the same as feature structure identity." and it for the param fs it says "The FS we're looking for.". There is no place where it says that contains can only be called for FSes which have the type of the index. The change of the parameter from FeatureStructure to T would also change the contract of the method a little, because then it would not be possible anymore to pass a FeatureStructure which has not type T. I agree. It's sometimes useful to call FSIterator.moveTo method and pass an FS of a Type other than the one that the index was defined over, as part of implementing something like a subiterator. I agree with the case where it's the "bag" index being used, because that uses a test which works on all feature structures. However, for Set and Sorted, the implication of passing a FS which is not in the type hierarchy is, according to the JavaDocs, "undefined". This is because the code assumes the layout of the features and their values is appropriate for the type. In other words, if the type of some key was a string, it might take a value from the main int heap and use it as an index into the string array - and if the int heap object was not the right type, it could pull an arbitrary value from that slot, and end up throwing an array index out of bounds exception. When I looked, it didn't appear to me that the code checked for any kind of type subsumption before proceeding... (but I may have missed it...) It could turn out that the data (whatever is being pulled) would just happen to "match", even though the types are different. Because this is an undefined operation that could throw various kinds of runtime errors, or return an equal match where none really exists, I think it should not be allowed, for set and sorted indexes. Or - we should change the implementation of these methods to check for type subsumption of the arguments before calling the low-level compare code. The implementation should handle this case correctly, thats the only way to fix it. With generification it can not be fixed because a client would still be allowed to use a FSIndex as a raw type. Jörn
Re: Generification of FSIndex
On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor wrote: > Adam Lally wrote: >> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote: >> >>> Marshall Schor wrote: >>> The generification of FSIndex currently specifies one type, >>> FeatureStructure> that is the type of item being returned. The contains and find methods have arguments of type FeatureStructure. These could be changed to take type "T". >>> No I do not think that they could be changed to take type T. >>> Lets take the case of the contains method. >>> The javadoc says: >>> "Check if the index contains an element equal to the given feature structure >>> according to the >>> ordering of the index. Note that this is in general not the same as feature >>> structure identity." >>> and it for the param fs it says "The FS we're looking for.". There is no >>> place where >>> it says that contains can only be called for FSes which have the type of the >>> index. >>> >>> The change of the parameter from FeatureStructure to T would also change >>> the contract of the method a little, because then it would not be possible >>> anymore >>> to pass a FeatureStructure which has not type T. >>> >>> >> >> I agree. It's sometimes useful to call FSIterator.moveTo method and >> pass an FS of a Type other than the one that the index was defined >> over, as part of implementing something like a subiterator. >> > > I agree with the case where it's the "bag" index being used, because > that uses a test which works on all feature structures. > > However, for Set and Sorted, the implication of passing a FS which is > not in the type hierarchy is, according to the JavaDocs, "undefined". > This is because the code assumes the layout of the features and their > values is appropriate for the type. In other words, if the type of some > key was a string, it might take a value from the main int heap and use > it as an index into the string array - and if the int heap object was > not the right type, it could pull an arbitrary value from that slot, and > end up throwing an array index out of bounds exception. When I looked, > it didn't appear to me that the code checked for any kind of type > subsumption before proceeding... (but I may have missed it...) > > It could turn out that the data (whatever is being pulled) would just > happen to "match", even though the types are different. > > Because this is an undefined operation that could throw various kinds of > runtime errors, or return an equal match where none really exists, I > think it should not be allowed, for set and sorted indexes. > In the case of the AnnotationIndex, the object that you pass to FSIterator.moveTo must be a subtype of Annotation (else you would get all the weird effects that you describe). But it is still valid for a user to do: FSIndex index = cas.getAnnotationIndex(annotType1); index.moveTo(annotType2); where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation. In general, the object that you pass to moveTo() must be a subtype of the type that was in the index definition (in the user's descriptor, or for the case of the built-in AnnotationIndex, uima.tcas.Annotation). -Adam
Re: Generification of FSIndex
Marshall Schor wrote: > Adam Lally wrote: > >> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote: >> >> >>> Marshall Schor wrote: >>> >>> The generification of FSIndex currently specifies one type, >>> FeatureStructure> that is the type of item being returned. The contains and find methods have arguments of type FeatureStructure. These could be changed to take type "T". >>> No I do not think that they could be changed to take type T. >>> Lets take the case of the contains method. >>> The javadoc says: >>> "Check if the index contains an element equal to the given feature structure >>> according to the >>> ordering of the index. Note that this is in general not the same as feature >>> structure identity." >>> and it for the param fs it says "The FS we're looking for.". There is no >>> place where >>> it says that contains can only be called for FSes which have the type of the >>> index. >>> >>> The change of the parameter from FeatureStructure to T would also change >>> the contract of the method a little, because then it would not be possible >>> anymore >>> to pass a FeatureStructure which has not type T. >>> >>> >>> >> I agree. It's sometimes useful to call FSIterator.moveTo method and >> pass an FS of a Type other than the one that the index was defined >> over, as part of implementing something like a subiterator. >> >> > > I agree with the case where it's the "bag" index being used, because > that uses a test which works on all feature structures. > > However, for Set and Sorted, the implication of passing a FS which is > not in the type hierarchy is, according to the JavaDocs, "undefined". > This is because the code assumes the layout of the features and their > values is appropriate for the type. In other words, if the type of some > key was a string, it might take a value from the main int heap and use > it as an index into the string array - and if the int heap object was > not the right type, it could pull an arbitrary value from that slot, and > end up throwing an array index out of bounds exception. When I looked, > it didn't appear to me that the code checked for any kind of type > subsumption before proceeding... (but I may have missed it...) > > It could turn out that the data (whatever is being pulled) would just > happen to "match", even though the types are different. > > Because this is an undefined operation that could throw various kinds of > runtime errors, or return an equal match where none really exists, I > think it should not be allowed, for set and sorted indexes. > Or - we should change the implementation of these methods to check for type subsumption of the arguments before calling the low-level compare code. -Marshall > -Marshall > >> -Adam >> >> >> >> > > >
Re: Generification of FSIndex
Adam Lally wrote: > On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote: > >> Marshall Schor wrote: >> >>> The generification of FSIndex currently specifies one type, >> FeatureStructure> that is the type of item being returned. >>> >>> >>> The contains and find methods have arguments of type FeatureStructure. >>> These could be changed to take type "T". >>> >>> >> No I do not think that they could be changed to take type T. >> Lets take the case of the contains method. >> The javadoc says: >> "Check if the index contains an element equal to the given feature structure >> according to the >> ordering of the index. Note that this is in general not the same as feature >> structure identity." >> and it for the param fs it says "The FS we're looking for.". There is no >> place where >> it says that contains can only be called for FSes which have the type of the >> index. >> >> The change of the parameter from FeatureStructure to T would also change >> the contract of the method a little, because then it would not be possible >> anymore >> to pass a FeatureStructure which has not type T. >> >> > > I agree. It's sometimes useful to call FSIterator.moveTo method and > pass an FS of a Type other than the one that the index was defined > over, as part of implementing something like a subiterator. > I agree with the case where it's the "bag" index being used, because that uses a test which works on all feature structures. However, for Set and Sorted, the implication of passing a FS which is not in the type hierarchy is, according to the JavaDocs, "undefined". This is because the code assumes the layout of the features and their values is appropriate for the type. In other words, if the type of some key was a string, it might take a value from the main int heap and use it as an index into the string array - and if the int heap object was not the right type, it could pull an arbitrary value from that slot, and end up throwing an array index out of bounds exception. When I looked, it didn't appear to me that the code checked for any kind of type subsumption before proceeding... (but I may have missed it...) It could turn out that the data (whatever is being pulled) would just happen to "match", even though the types are different. Because this is an undefined operation that could throw various kinds of runtime errors, or return an equal match where none really exists, I think it should not be allowed, for set and sorted indexes. -Marshall > -Adam > > >
Re: Generification of FSIndex
On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann wrote: > Marshall Schor wrote: >> >> The generification of FSIndex currently specifies one type, > FeatureStructure> that is the type of item being returned. >> >> >> The contains and find methods have arguments of type FeatureStructure. >> These could be changed to take type "T". >> > > No I do not think that they could be changed to take type T. > Lets take the case of the contains method. > The javadoc says: > "Check if the index contains an element equal to the given feature structure > according to the > ordering of the index. Note that this is in general not the same as feature > structure identity." > and it for the param fs it says "The FS we're looking for.". There is no > place where > it says that contains can only be called for FSes which have the type of the > index. > > The change of the parameter from FeatureStructure to T would also change > the contract of the method a little, because then it would not be possible > anymore > to pass a FeatureStructure which has not type T. > I agree. It's sometimes useful to call FSIterator.moveTo method and pass an FS of a Type other than the one that the index was defined over, as part of implementing something like a subiterator. -Adam
Re: Generification of FSIndex
Jörn Kottmann wrote: No I do not think that they could be changed to take type T. Lets take the case of the contains method. The javadoc says: "Check if the index contains an element equal to the given feature structure according to the ordering of the index. Note that this is in general not the same as feature structure identity." and it for the param fs it says "The FS we're looking for.". There is no place where it says that contains can only be called for FSes which have the type of the index. The change of the parameter from FeatureStructure to T would also change the contract of the method a little, because then it would not be possible anymore to pass a FeatureStructure which has not type T. The java developers had similar case in the Collection interface. Collection has a contains method which has still the signature contains(Object) and not contains(E). See here: http://java.sun.com/javase/6/docs/api/java/util/Collection.html Jörn
Re: Generification of FSIndex
Marshall Schor wrote: The generification of FSIndex currently specifies one type, that is the type of item being returned. The contains and find methods have arguments of type FeatureStructure. These could be changed to take type "T". No I do not think that they could be changed to take type T. Lets take the case of the contains method. The javadoc says: "Check if the index contains an element equal to the given feature structure according to the ordering of the index. Note that this is in general not the same as feature structure identity." and it for the param fs it says "The FS we're looking for.". There is no place where it says that contains can only be called for FSes which have the type of the index. The change of the parameter from FeatureStructure to T would also change the contract of the method a little, because then it would not be possible anymore to pass a FeatureStructure which has not type T. Jörn