Re: CAS and CasView redesign - question if all views should share the same indexes?
I haven't thought this through yet, but here's how I see indexes and their relation to views right now. Let me know if this agrees with your views, or how it differs.

The index repository is a set of indexes, at least right now. All it can do is give you indexes. The index repository of the CAS holds all indexes; a view's repository holds a subset thereof. An index is retrieved by name (i.e., each index has at least one name). Currently, if there is more than one index with the same indexing spec but different names, all those names actually point to the same physical index. However, that choice is transparent to the user.

I assume this needs to change. If we have more than one view, and they all have annotation indexes, those should be different indexes (at least conceptually, but I think also physically). So views create a simple sort of namespace: an index can belong either to the global namespace or to that of a view. All indexes can be accessed from the CAS, but only global indexes and the indexes for the given view can be accessed from the index repository of that view.

--Thilo
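The visibility rule proposed above can be sketched in a few lines of Java. This is only a toy model under stated assumptions: the class `NamespaceSketch`, its methods, and the `view/index` naming scheme are hypothetical and are not part of the UIMA API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the proposed namespaces; not UIMA code.
class NamespaceSketch {
    // Names of indexes that live in the global namespace.
    private final Set<String> globalIndexes = new LinkedHashSet<>();
    // Names of indexes that live in each view's namespace, keyed by view name.
    private final Map<String, Set<String>> viewIndexes = new LinkedHashMap<>();

    void defineGlobalIndex(String indexName) {
        globalIndexes.add(indexName);
    }

    void defineViewIndex(String viewName, String indexName) {
        viewIndexes.computeIfAbsent(viewName, k -> new LinkedHashSet<>()).add(indexName);
    }

    // From the CAS, every index is reachable; view indexes are qualified
    // by their view name (an assumed naming scheme).
    Set<String> visibleFromCas() {
        Set<String> all = new LinkedHashSet<>(globalIndexes);
        for (Map.Entry<String, Set<String>> entry : viewIndexes.entrySet()) {
            for (String indexName : entry.getValue()) {
                all.add(entry.getKey() + "/" + indexName);
            }
        }
        return all;
    }

    // From a view, only the global indexes and that view's own indexes are reachable.
    Set<String> visibleFromView(String viewName) {
        Set<String> visible = new LinkedHashSet<>(globalIndexes);
        Set<String> own = viewIndexes.getOrDefault(viewName, Collections.<String>emptySet());
        for (String indexName : own) {
            visible.add(viewName + "/" + indexName);
        }
        return visible;
    }
}
```

With two views that each define an `annotations` index, each view sees its own annotation index plus any global index, while the CAS sees all of them.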
Re: CAS and CasView redesign - question if all views should share the same indexes?
On 12/21/06, Thilo Goetz [EMAIL PROTECTED] wrote:

I haven't thought this through yet, but here's how I see indexes and their relation to views right now. Let me know if this agrees with your views, or how it differs. The index repository is a set of indexes, at least right now. All it can do is give you indexes. The index repository of the CAS holds all indexes; a view's repository holds a subset thereof. An index is retrieved by name (i.e., each index has at least one name). Currently, if there is more than one index with the same indexing spec but different names, all those names actually point to the same physical index. However, that choice is transparent to the user. I assume this needs to change. If we have more than one view, and they all have annotation indexes, those should be different indexes (at least conceptually, but I think also physically). So views create a simple sort of namespace: an index can belong either to the global namespace or to that of a view. All indexes can be accessed from the CAS, but only global indexes and the indexes for the given view can be accessed from the index repository of that view.

I think this basically makes sense. I want to clarify, though, that we *do* currently have different indexes for each view (for example, each view has its own annotation index, which holds the annotations relating to that view's sofa). This is done by replicating the index repository for each view.

A key question is: do all views have the same set of index _definitions_? Currently, yes - the component descriptors declare index definitions without reference to views, and consequently, for every view we create an instance of each defined index. Your note above, and Marshall's, argue that this shouldn't necessarily be the case -- some indexes may make sense only for certain views (but also, only for certain components, a further complication).
I think that probably makes sense, but I'm not sure it's a critical thing to implement now, if we haven't seen a real use case where it's a problem to create instances of indexes in every view even if they're not used.

The other key idea here is the global index repository that contains all of the indexes from all views -- we don't currently have anything like that. Take the annotation index as an example, and say there are multiple views, each with their own annotation index. I also want to enable operations on the CAS like "get me all annotations in all views", or "get me all annotations of type Person in all views". To do that we also create an annotation index in the base CAS (the global namespace).

I think you could do such a thing in your suggestion; if you had a global annotation index, then whenever anyone did view.addFsToIndexes(myAnnot) in any view, myAnnot would also be added to the global annotation index (because you said the global index is visible from the index repository of the view).

My idea was a little different, and I guess maybe just an implementation detail. Instead of actually adding myAnnot to a separate, global index, I would just add it to its own view's index. Then, when someone asks for an iterator off of the global annotation index, I would do a dynamic merge of the annotation indexes in all views (the same way we do merging of indexes across types). But the effect is the same - we have a global index that provides access to everything that was indexed in any view.

-Adam
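The dynamic merge described above can be sketched as a k-way merge over per-view sorted lists. This is a minimal illustration only: `ToyAnnotation`, `MergedViewIndexSketch`, and the ordering (begin ascending, then end ascending) are assumptions made for the example and do not reflect how the UIMA index code is actually implemented. The sketch returns a list for brevity where a real implementation would presumably return a lazy iterator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for an annotation; not a UIMA class.
class ToyAnnotation {
    final String view;
    final int begin;
    final int end;

    ToyAnnotation(String view, int begin, int end) {
        this.view = view;
        this.begin = begin;
        this.end = end;
    }
}

class MergedViewIndexSketch {
    // Assumed toy ordering: by begin offset, then by end offset.
    static final Comparator<ToyAnnotation> ORDER =
        Comparator.comparingInt((ToyAnnotation a) -> a.begin).thenComparingInt(a -> a.end);

    // One cursor per view index: the iterator plus its current head element.
    private static final class Cursor {
        final Iterator<ToyAnnotation> it;
        ToyAnnotation head;

        Cursor(Iterator<ToyAnnotation> it) {
            this.it = it;
            this.head = it.next();
        }
    }

    // Merges per-view indexes (each already sorted by ORDER) without
    // copying anything into a separate global index beforehand.
    static List<ToyAnnotation> mergeAll(List<List<ToyAnnotation>> perViewIndexes) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((x, y) -> ORDER.compare(x.head, y.head));
        for (List<ToyAnnotation> index : perViewIndexes) {
            if (!index.isEmpty()) {
                queue.add(new Cursor(index.iterator()));
            }
        }
        List<ToyAnnotation> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll();
            merged.add(smallest.head);
            if (smallest.it.hasNext()) {
                smallest.head = smallest.it.next();
                queue.add(smallest); // re-insert with its new head
            }
        }
        return merged;
    }
}
```

The design point is the one made in the message: each annotation is stored once, in its own view's index, and the "global" ordering exists only while someone iterates.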
Re: Eclipse Annotation Editor
All the code is owned by my employer Calcucare GmbH (www.calcucare.com). I think we have to sign the CCLA too.

The CCLA and ICLA are now signed and sent via facsimile. How should we proceed now?

I can prepare the code at SourceForge for the move to Apache. This would be:
+ changing the license from CPL to the Apache License
+ cleaning the code of Eclipse source code
+ adapting to your code guidelines
+ making a last release at SourceForge
+ cleaning up the code

Jörn
Re: CAS and CasView redesign - question if all views should share the same indexes?
Adam Lally wrote:

Thilo's stuff snipped

I think this basically makes sense. I want to clarify, though, that we *do* currently have different indexes for each view (for example, each view has its own annotation index, which holds the annotations relating to that view's sofa). This is done by replicating the index repository for each view.

Right. I would like to change that in the course of introducing CasViews.

A key question is: do all views have the same set of index _definitions_? Currently, yes - the component descriptors declare index definitions without reference to views, and consequently, for every view we create an instance of each defined index. Your note above, and Marshall's, argue that this shouldn't necessarily be the case -- some indexes may make sense only for certain views (but also, only for certain components, a further complication). I think that probably makes sense, but I'm not sure it's a critical thing to implement now, if we haven't seen a real use case where it's a problem to create instances of indexes in every view even if they're not used.

Hm, somehow, we need to distinguish between indexes that are global to all views, and those that are local to a view. How do we do that?

The other key idea here is the global index repository that contains all of the indexes from all views -- we don't currently have anything like that. Take the annotation index as an example, and say there are multiple views, each with their own annotation index. I also want to enable operations on the CAS like "get me all annotations in all views", or "get me all annotations of type Person in all views". To do that we also create an annotation index in the base CAS (the global namespace).
I think you could do such a thing in your suggestion; if you had a global annotation index, then whenever anyone did view.addFsToIndexes(myAnnot) in any view, myAnnot would also be added to the global annotation index (because you said the global index is visible from the index repository of the view). My idea was a little different, and I guess maybe just an implementation detail. Instead of actually adding myAnnot to a separate, global index, I would just add it to its own view's index. Then, when someone asks for an iterator off of the global annotation index, I would do a dynamic merge of the annotation indexes in all views (the same way we do merging of indexes across types). But the effect is the same - we have a global index that provides access to everything that was indexed in any view.

I didn't mean to suggest having duplicate indexes. What I meant to say was, each view should have its own annotation index. In the CAS, each of these annotation indexes can be accessed separately. In fact, I think this is pretty much what you're saying as well.

I don't see a use case for a global merged annotation index, other than tooling and utilities. And even for tooling, I think it makes sense to access the annotations for each view separately. If we need to iterate over annotations from different views sorted by their offsets, irrespective of the sofa they point into, we can provide a utility function that does that on the fly.

Note however that this implies that one should never do addFsToIndexes() on the CAS with an annotation, as it would be added to all annotation indexes. My suggestion implies that the index repository itself is agnostic of views and sofas. If you add an annotation to the wrong repository, it's your own fault.

So to summarize, I would suggest that annotation indexes, for example, only live in views; there is no global annotation index (neither conceptually, nor physically).
To access annotations from the CAS, you still need to access view-specific indexes. Non-sofa indexes, on the other hand, only exist in the global namespace. The only rule of visibility is that one view can not access the view-specific indexes of another view. Everything else is always visible. So what I haven't figured out for myself is, what makes a sofa-index a sofa-index? Do we need a declaration, or can we figure this out automatically? --Thilo
Re: Backwards compatibility for CAS API redesign
On 12/21/06, Thilo Goetz [EMAIL PROTECTED] wrote:

The idea is that a CAS has a current view (best term I can think of for it right now). Any methods on the CAS that are view-oriented will apply to the current view. This includes but is not limited to:

    getSofa()
    getDocumentText()
    getIndexRepository()
    addFsToIndexes()
    createAnnotation(int begin, int end) //needs to know which Sofa to refer to

It seems to me that this makes the CAS a view, maybe a deprecated one ;-)

Well, the current view isn't fixed. For each annotator that's called the current view may actually be a different physical view. That's why I think the better mental model is of the CAS having several views and at any given time one is designated as the current view.

Note that this approach also allows single-sofa application code to work. We have a lot of code that does:

    AnalysisEngine ae = ...
    CAS cas = ae.newCAS();
    cas.setDocumentText(someString);
    ae.process(cas);

and I think it would be really nice if this continues to work.

Very true, if this should cease to work, it would break a lot of code. +1 to preserving this functionality.

Excellent. There haven't been nearly enough +1's in this thread so far. :)

    /**
     * Gets the global index repository, which provides access to all indexed FS
     * in the entire CAS.
     */
    FSIndexRepository CAS.getGlobalIndexRepository()

    /**
     * Gets the index repository for the current view.
     */
    FSIndexRepository CAS.getIndexRepository()

And what about addFsToIndexes()? I guess it should be local to the current view.

Yes, for backwards compatibility to work we would need CAS.addFsToIndexes() to apply to the current view only.

What I'm not so sure about is, do we need addToAllIndexes()? It doesn't make sense anyway to add annotations to indexes of other views.

We need to sort out the meaning of global indexes over on the other thread before we can come to a final answer here.
But, I was hoping that if we have CAS.getGlobalIndexRepository() we'd also have CAS.addFsToGlobalIndexes(), just for consistency of naming.

I'll just say this once, because I know I won't get through with changing it: to me, the term view in this context has different associations from what we mean by it. When I hear indexes and views, I think databases. In DBs, a view is just a different way to look at your data, and not necessarily a filter. Our views are always filters, and don't make the data accessible in any different way than it was before. On the other hand, our use of the term index is not DB-conformant either, so maybe I should just get this association out of my head. I do wonder if other people have the same issue, though.

Duly noted. :) Maybe documentation can help... in the chapter that introduces the CAS we can point out that our definitions are not consistent with how those terms are used in databases.

-Adam
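The "current view" model discussed in this message can be sketched as plain delegation. Everything here is a toy under stated assumptions: `ToyView`, `CurrentViewCasSketch`, `setCurrentView`, and the `"default"` view name are hypothetical and only mirror the shape of the proposal, not the real CAS interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a view: one document text plus its indexed FS.
class ToyView {
    final String name;
    String documentText;
    final List<Object> indexed = new ArrayList<>();

    ToyView(String name) {
        this.name = name;
    }
}

// Hypothetical CAS that holds several views and designates one as current.
class CurrentViewCasSketch {
    static final String DEFAULT_VIEW = "default"; // assumed name, for illustration

    private final Map<String, ToyView> views = new LinkedHashMap<>();
    private ToyView current;

    CurrentViewCasSketch() {
        // Single-sofa code never switches views, so it always hits this one.
        current = getOrCreateView(DEFAULT_VIEW);
    }

    ToyView getOrCreateView(String name) {
        return views.computeIfAbsent(name, ToyView::new);
    }

    // The framework would call this before handing the CAS to each annotator.
    void setCurrentView(String name) {
        current = getOrCreateView(name);
    }

    // View-oriented methods simply delegate to whichever view is current.
    void setDocumentText(String text) {
        current.documentText = text;
    }

    String getDocumentText() {
        return current.documentText;
    }

    void addFsToIndexes(Object fs) {
        current.indexed.add(fs);
    }
}
```

Under this sketch, code that only ever calls `setDocumentText` and `addFsToIndexes` on the CAS keeps working unchanged, which is the backwards-compatibility property both posters want to preserve.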
Re: CAS and CasView redesign - question if all views should share the same indexes?
Adam Lally wrote:

On 12/21/06, Thilo Goetz [EMAIL PROTECTED] wrote:

I didn't mean to suggest having duplicate indexes. What I meant to say was, each view should have its own annotation index. In the CAS, each of these annotation indexes can be accessed separately. In fact, I think this is pretty much what you're saying as well. I don't see a use case for a global merged annotation index, other than tooling and utilities. And even for tooling, I think it makes sense to access the annotations for each view separately.

I think maybe we should take a step back and try to agree on a few basic things that we want to be true of CASes and CasViews. Here are the ideas that I had, mostly drawing on the definition in the UIMA spec proposal.

(1) The CAS is the container for all of the analysis data (as per the UIMA spec). It must be possible to create FS directly on the CAS and there must be some reasonable way to retrieve the FS in the CAS without having to be concerned with views.

Agreed. It should be possible to say, on the global index repository: give me all indexes. This will include the global indexes, as well as all view-specific indexes. You can then iterate over all data in all indexes, without knowing anything about views.

(2) A CasView is a way of accessing a subset of FS in the CAS. It must be possible to assert that an FS is a _member_ of a CasView, and there must be some reasonable way to retrieve the members of the CasView.

In the general CAS, we can only access those FSs that are in some index. If you need to be able to retrieve any FS whatsoever, you need to define a bag index over all types. I would propose to handle views the same way. An FS is a member of a view iff it's contained in one of the indexes specific to the view. The same FS may live in several indexes, belonging to different views. That seems in accordance with the spec proposal.
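The membership rule proposed here (an FS belongs to a view iff one of that view's indexes contains it) can be written down directly. The sketch below is a toy under stated assumptions: `ViewMembershipSketch` and its methods are hypothetical names, and indexes are modeled as plain sets rather than real UIMA indexes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: view name -> index name -> FS contained in that index.
class ViewMembershipSketch {
    private final Map<String, Map<String, Set<Object>>> viewIndexes = new HashMap<>();

    void addToViewIndex(String viewName, String indexName, Object fs) {
        viewIndexes
            .computeIfAbsent(viewName, k -> new HashMap<>())
            .computeIfAbsent(indexName, k -> new HashSet<>())
            .add(fs);
    }

    // An FS is a member of a view iff at least one of the view's indexes contains it.
    boolean isMember(String viewName, Object fs) {
        Map<String, Set<Object>> indexes = viewIndexes.get(viewName);
        if (indexes == null) {
            return false;
        }
        for (Set<Object> index : indexes.values()) {
            if (index.contains(fs)) {
                return true;
            }
        }
        return false;
    }

    // The same FS may be a member of several views at once.
    Set<String> viewsContaining(Object fs) {
        Set<String> result = new TreeSet<>();
        for (String viewName : viewIndexes.keySet()) {
            if (isMember(viewName, fs)) {
                result.add(viewName);
            }
        }
        return result;
    }
}
```

Note that under this definition there is no separate membership assertion: adding an FS to a view-specific index is what makes it a member.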
snip If we need to iterate over annotations from different views sorted by their offsets, irrespective of the sofa they point into, we can provide a utility function that does that on the fly. I agree that it doesn't make much sense that if I access annotations irrespective of sofas, they would be sorted by begin, end. However, I still think I might just want to get all annotations (of some type) and not care about the order. You can do that under my proposal: just get all annotation indexes for all views and iterate over each of them in turn. If we need a utility function for that, it's easy enough to do. Note however that this implies that one should never do addFsToIndexes() on the CAS with an annotation, as it would be added to all annotation indexes. My suggestion implies that the index repository itself is agnostic of views and sofas. If you add an annotation to the wrong repository, it's your own fault. This behavior doesn't mesh well with the 3 ideas above. To me, indexing an FS in the CAS just means that I want to be able to retrieve this FS back out of the CAS later. It does not mean that I'm asserting it to be a member of any view. A view to me is just a set of indexes; moreover, it's a subset of the set of all indexes, which are exactly the indexes defined in the CAS. When I add a FS to all those indexes, it will be added to all applicable indexes, and that means all view indexes as well. Alternatively, we can say adding an FS in the CAS means adding it to global, non-view indexes only. That would make sense, but it doesn't sync with the idea that the CAS index repository contains all indexes, not just the global ones. Maybe we need a special API for that, addFsToGlobalIndexes(). So maybe getGlobalIndexRepository() should be called something else, to avoid confusion. getCompleteIndexRepository() or something. 
Moreover, I think the reverse direction should be true -- indexing an FS in a view's index repository DOES add it (at least conceptually) to indexes that apply to the CAS as a whole. I liked this latter idea because it provided a way to get at all the FS in the CAS without having to be concerned with views. I agree, and I hope that has been clear from my previous posts. Any view-specific index is visible from the CAS, in my approach. So to summarize, I would suggest that annotation indexes, for example, only live in views, there is no global annotation index (neither conceptually, nor physically). To access annotations from the CAS, you still need to access view-specific indexes. Non-sofa indexes, on the other hand, only exist in the global namespace. The only rule of visibility is that one view can not access the view-specific indexes of another view. Everything else is always visible. So what I haven't figured out for myself is, what makes a sofa-index a sofa-index? Do we need a declaration, or can we figure this out automatically? I think it's a view-index, not necessarily a sofa-index (for now it doesn't matter, but we may someday
[jira] Closed: (UIMA-135) Remove Entity View mode from DocumentAnalyzer
[ http://issues.apache.org/jira/browse/UIMA-135?page=all ] Adam Lally closed UIMA-135. --- Resolution: Fixed Changed entity view mode to use a user-supplied EntityResolver object, rather than depend on an IBM-specific typesystem. Remove Entity View mode from DocumentAnalyzer - Key: UIMA-135 URL: http://issues.apache.org/jira/browse/UIMA-135 Project: UIMA Issue Type: Task Components: Tools Reporter: Adam Lally Assigned To: Adam Lally Fix For: 2.1 The DocumentAnalyzer's entity view mode is currently broken, and it only ever worked for annotators that used an IBM-proprietary type system. We need to remove this mode and leave the ability for IBM to add such capability in its own derivative of the DocumentAnalyzer.
Re: CAS and CasView redesign - question if all views should share the same indexes?
Re: Need for Global indexes

Adam Lally wrote:

snip

Moreover, I think the reverse direction should be true -- indexing an FS in a view's index repository DOES add it (at least conceptually) to indexes that apply to the CAS as a whole. I liked this latter idea because it provided a way to get at all the FS in the CAS without having to be concerned with views.

I agree, and I hope that has been clear from my previous posts. Any view-specific index is visible from the CAS, in my approach.

OK, as I said above I think I was just stuck on whether or not the thing that from the base CAS gives you a merged view of all the view indexes was called an index, or whether it's just a utility method.

I'm using the terms index definitions and index instances here; we can have one global set of index definitions (or not :-) while having multiple index instances for those definitions, one per view, and perhaps (a conceptual, maybe not real) one for the base CAS or global view or whatever we want to call it - something used by people not concerned about views.

What is the use case for the global view set of indexes? I can't recall the use case for this, beyond being able to get all the data. This thread has suggested other utilities that can effectively merge the results from other views' index instances. Are there other use cases?

We had once discussed a use case where some collection of parts (annotators) that worked with views wanted to share some data that was global to their views. We thought that the best-practice way to do that was to have this collection of parts define another view to serve as their global-sharing-place, in preference to a system-provided global-sharing-place, because that would enable this collection of parts to be combined with other parts in the future without any accidental collisions in the global-sharing-space from other unknown users of this space.

I guess I would vote to have the thing that gets all the FS in all views be just a utility method.
I hope if we put our minds to it we can get this done for 2.1. I'm hoping after 2.1 we can go a good long time without breaking backwards compatibility again. +1 to that :-) -Marshall
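The "just a utility method" option favored above might look roughly like the following. This is a sketch under stated assumptions: `AllViewsUtilitySketch` and `allIndexedFs` are hypothetical names, views are modeled as a map from view name to a list of indexed FS, and the identity-based de-duplication is an illustrative choice for FS that are indexed in more than one view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical utility; no global index is maintained anywhere.
class AllViewsUtilitySketch {
    // Visits each view's indexed FS in turn and returns every FS once,
    // in first-seen order, with no sort order across views.
    static List<Object> allIndexedFs(Map<String, List<Object>> fsByView) {
        // Identity-based set: the same FS object indexed in two views counts once.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Object> result = new ArrayList<>();
        for (List<Object> viewContents : fsByView.values()) {
            for (Object fs : viewContents) {
                if (seen.add(fs)) {
                    result.add(fs);
                }
            }
        }
        return result;
    }
}
```

Because the work happens only when the utility is called, nothing has to be kept in sync when an FS is added to or removed from a view.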