Re: Eclipse Annotation Editor

2006-12-14 Thread Thilo Goetz

Hi Joern,

is this your Text Analysis Environment on SourceForge 
(https://sourceforge.net/projects/tae/)?  Looks pretty cool!  This would 
be a nice addition to our Eclipse-based tooling.


--Thilo

Jörn Kottmann wrote:

Hello,

I have developed an Eclipse editor for XCAS files. It can add, remove, 
and change Annotations and FeatureStructures. This is done within the 
editor and some views.
The plugin also defines its own project and has a special explorer for 
it, like the Package Explorer in JDT. It also supports the execution of 
Annotators and CASConsumers against the XCAS files stored inside the 
project.

If you are interested in adding this or parts of it to the UIMA project, 
I would like to contribute the code and the time to integrate it.

Let me know what you think,
Jörn


Re: Eclipse Annotation Editor

2006-12-14 Thread Jörn Kottmann
Yes, there are still some compatibility issues with UIMA. The current 
release has many smaller issues, many of which are already fixed. I will 
make the next release soon.





[jira] Created: (UIMA-128) ll_setStringValue not checking if feature range is subtype of String with Allowed Values, not doing Allowed Value check

2006-12-14 Thread Marshall Schor (JIRA)
ll_setStringValue not checking if feature range is subtype of String with 
Allowed Values, not doing Allowed Value check
---

                 Key: UIMA-128
                 URL: http://issues.apache.org/jira/browse/UIMA-128
             Project: UIMA
          Issue Type: Bug
          Components: Core Java Framework
    Affects Versions: 2.1
            Reporter: Marshall Schor
            Priority: Minor


The JCas code generated for setting string values uses the ll_setStringValue 
method in the CASImpl.  This method does not check if the type of the feature 
being set is a *subtype* of String with allowed values, and doesn't throw the 
needed exception if the item being set is not in the set of allowed values.
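
To illustrate the missing check, here is a minimal, self-contained Java sketch. It does not use the UIMA API: AllowedValuesSketch, StringSubtype, and the choice of IllegalArgumentException are hypothetical stand-ins, and treating null as "unset" is an assumption of the sketch. It only shows the kind of validation a setter would need when the feature's range is a String subtype with allowed values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a feature whose range is a restricted String subtype. */
final class AllowedValuesSketch {

    /** Stand-in for a String subtype that declares a fixed set of allowed values. */
    static final class StringSubtype {
        final String name;
        final Set<String> allowedValues;

        StringSubtype(String name, String... allowedValues) {
            this.name = name;
            this.allowedValues =
                Collections.unmodifiableSet(new HashSet<>(Arrays.asList(allowedValues)));
        }
    }

    private final StringSubtype range;
    private String value;

    AllowedValuesSketch(StringSubtype range) {
        this.range = range;
    }

    /** Sets the value, rejecting anything outside the allowed set. */
    void setStringValue(String newValue) {
        // Assumption of this sketch: null means "unset" and is always accepted.
        if (newValue != null && !range.allowedValues.contains(newValue)) {
            throw new IllegalArgumentException(
                "Value \"" + newValue + "\" is not allowed for type " + range.name);
        }
        value = newValue;
    }

    String getStringValue() {
        return value;
    }
}
```

With a subtype declared as new StringSubtype("example.Color", "red", "green") (a made-up type name), setting "red" succeeds and setting "purple" throws, leaving the previous value in place.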

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Eclipse Annotation Editor

2006-12-14 Thread Adam Lally

On 12/14/06, Thilo Goetz wrote:

Hi Joern,

is this your Text Analysis Environment on SourceForge
(https://sourceforge.net/projects/tae/)?  Looks pretty cool!  This would
be a nice addition to our Eclipse-based tooling.

--Thilo



I got this from SourceForge but was unable to run it.  The net.sf.tae
plugins show up with red X's in the plugin registry, even though I've
installed GEF and UIMA 1.3.2.  There's nothing in the error log.  What
might I be doing wrong?

Anyway, it sounds like this would be a useful addition.  Thanks, Joern,
for offering to contribute it.

What do the other committers think -- would this make a good first
project for our UIMA sandbox?

-Adam


Re: [jira] Created: (UIMA-116) Always deliver the base CAS to the process method

2006-12-14 Thread Thilo Goetz

Adam Lally wrote:

On 12/13/06, Thilo Goetz wrote:


I couldn't agree more (except for the default bag indexes).  It makes no
sense at all that global indexes must be accessed via a particular view.



I can't tell what exactly you're agreeing to.  Are you thinking that
anything indexed in a view would also be by definition indexed in the
"global" view?  Do we need different index definitions for the global
view (so we don't have a global index over annotations sorted by
begin, end but containing annotations from multiple Sofas)?


I was agreeing to your statement that non-sofa (i.e., non-annotation) 
indexes make sense in a global view.


I would think that annotations for different sofas would be in different 
indexes.  Not sure what we currently do though.  All those indexes might 
be accessible from the global view.



In that case, a view could be seen as just a set of indexes, with
possibly just two methods: getIndexes() (and variations) and
addToIndexes(FS).  The base CAS would be a view on everything.  A view
might be what we now call index repository.  In fact, if we just rename
the index repository to "view", we're done ;-).  Just a little
implementation to make more than one index repository possible.



We haven't addressed Sofas yet.

The base CAS does not have a single subject of analysis, so methods
like getDocumentText() and its relatives are a problem.  These methods
should belong to a view.  (According to the spec, not all views
necessarily have a Sofa, but it is a common use case supported by the
particular kind of view called an Anchored View.)


Sure, those would be on the view as well.  Would we then have 
text-specific views, like TCasView?  I'm not proposing this, mind you, 
just asking.



So no CAS cas = inCas.getView()?



Certainly, I never liked that idea but are we back to essentially 
requiring:

CasView viewOfMySofa = inCas.getView() ?


+1 to that.



Since inCas.getDocumentText() would not work, and inCas cannot be used
to iterate over or index annotations belonging to a particular Sofa.


inCas.getDocumentText() would not work, +1 to that.  However, I was 
thinking that you would be able to access *all* indexes (and their 
contents of course) from the CAS, not just the sofa-neutral ones. 
Perhaps you wouldn't know how to interpret the sofa information, but the 
annotations would still be accessible.  I think this is consistent with 
your idea/proposal that the CAS is the container of all data.  Was this 
also what you were thinking?


--Thilo



Re: [jira] Created: (UIMA-128) ll_setStringValue not checking if feature range is subtype of String with Allowed Values, not doing Allowed Value check

2006-12-14 Thread Thilo Goetz

Oops.  Let me look into that.

--Thilo




Re: Eclipse Annotation Editor

2006-12-14 Thread Michael Baessler

Adam Lally wrote:

What do the other committers think -- would this make a good first
project for our UIMA sandbox?

Yes, I think this is a good first component for the UIMA sandbox. But first 
we have to clarify the details of the submission...

It seems that we need an Apache Software Grant for this code.

I found the following at www.apache.org/licenses:
Software Grants

When an individual or corporation decides to donate a body of existing 
software or documentation to one of the Apache projects, they need to 
execute a formal Software Grant agreement with the ASF. Typically, this 
is done after negotiating approval with the ASF Incubator or one of the 
PMCs, since the ASF will not accept software unless there is a viable 
community available to support a collaborative project.


-- Michael


Re: [jira] Created: (UIMA-116) Always deliver the base CAS to the process method

2006-12-14 Thread Adam Lally

On 12/14/06, Thilo Goetz wrote:

Sure, those would be on the view as well.  Would we then have
text-specific views, like TCasView?  I'm not proposing this, mind you,
just asking.



My sense is we want to stay away from modality-specific views.
Perhaps I might change my mind if there were a compelling argument.  I
think we'll need to have a JCasView, though.


CasView viewOfMySofa = inCas.getView() ?

+1 to that.


I'm OK with this being the way that new code does things, but still
think it's important to have a deprecation strategy for older code.


inCas.getDocumentText() would not work, +1 to that.  However, I was
thinking that you would be able to access *all* indexes (and their
contents of course) from the CAS, not just the sofa-neutral ones.
Perhaps you wouldn't know how to interpret the sofa information, but the
annotations would still be accessible.  I think this is consistent with
your idea/proposal that the CAS is the container of all data.  Was this
also what you were thinking?



Let me see if I can summarize where we are now:

* Index definitions are shared across the entire CAS.
* Each defined index will have one instance in the CAS as well as an
instance for each view (or sofa?  right now sofas and views are 1-1 so
it doesn't matter but I wonder what the right terminology is)
* You can add FS to the indexes in a view (or multiple views).  You
can also add FS to the indexes on the CAS, which is a place to store
indexed FS that don't belong to any view.
* If you get an iterator over an index from the CAS, this iterator
will return you FS that were indexed in the CAS as well as FS that were
indexed in any view.
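
The summary above can be pictured with a small, self-contained Java sketch. It is not the UIMA API: CasIndexSketch, View, getAllIndexed(), and the use of plain strings in place of feature structures are all hypothetical, and a single list stands in for "every defined index". It only models the last two bullets: FS can be added to a view or to the CAS itself, and iterating from the CAS returns both.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: a CAS-level index plus one index instance per view. */
final class CasIndexSketch {

    /** One instance of the (shared) index definition; strings stand in for FS. */
    static final class View {
        private final List<String> indexed = new ArrayList<>();

        void addToIndexes(String fs) {
            indexed.add(fs);
        }

        /** Returns a copy so callers cannot modify the index directly. */
        List<String> getIndexed() {
            return new ArrayList<>(indexed);
        }
    }

    private final View casLevel = new View();
    private final Map<String, View> views = new LinkedHashMap<>();

    /** Returns the named view, creating it on first use. */
    View getView(String name) {
        return views.computeIfAbsent(name, n -> new View());
    }

    /** Indexes an FS on the CAS itself, i.e. in no particular view. */
    void addToIndexes(String fs) {
        casLevel.addToIndexes(fs);
    }

    /** Iterating from the CAS sees CAS-level FS first, then every view's FS. */
    List<String> getAllIndexed() {
        List<String> all = casLevel.getIndexed();
        for (View view : views.values()) {
            all.addAll(view.getIndexed());
        }
        return all;
    }
}
```

In this model a view only ever sees what was indexed in it, while the CAS sees everything, which is one possible reading of "the CAS is the container of all data".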


-Adam


DocumentAnnotation and type-merging

2006-12-14 Thread Adam Lally

Yes, this topic again...

While working on a utility to migrate code from IBM UIMA to Apache
UIMA, I encountered the case where the user's project has a definition
of com.ibm.uima.jcas.tcas.DocumentAnnotation.  That's because JCasGen
creates this, to account for cases where the user has defined their
own features to add to DocumentAnnotation.

As far as the migration goes, this would need to become
org.apache.uima.jcas.tcas.DocumentAnnotation.  But I can't do that
with search-and-replace, I'd actually have to move the file to a
different directory.  Instead of doing that, this may become a manual
migration step.  (If the user did not add custom code, just deleting
this file and rerunning JCasGen is sufficient.)

It also occurred to me that we lost uima_jcas_builtin_types.jar as
part of our Maven restructuring.  This jar contained the
DocumentAnnotation class, which was not in uima_core.jar.  This was
done so that applications could exclude this jar from their classpath
to allow a user-generated DocumentAnnotation class to be used instead
of the one that ships with UIMA.  But this is pretty ugly.

Questions:
1) Do we continue to support adding features to DocumentAnnotation?
2) If so, should we delete the JCAS DocumentAnnotation class from the
framework code entirely, to avoid the problem that required the
uima_jcas_builtin_types.jar workaround?
3) What should the migration story be for existing user code?

-Adam


JUnit test extension files

2006-12-14 Thread Michael Baessler

Hi,

I think we should move all files from the uimaj-test-util project that 
are used in uimaj-core into the uimaj-core project. I would like to 
remove uimaj-core's dependency on uimaj-test-util.

The files are:
ExceptionPrinter.java
FileCompare.java
JUnitExtension.java
TestPropertyReader.java

Why is this necessary? I would like to use the uimaj-test-util project 
to provide some helper classes for annotator testing, so that users 
who write analysis components for our sandbox can all use the same 
test methods. This makes things easier to understand. To do this, 
uimaj-test-util needs a dependency on uimaj-core.


What do you think?

-- Michael




Re: JUnit test extension files

2006-12-14 Thread Adam Lally

On 12/14/06, Michael Baessler wrote:

I think we should move all files from the uimaj-test-util project that
are used in uimaj-core into the uimaj-core project.


They are used not only in uimaj-core but also in other projects like
uimaj-cpe.  So I think these need to be in a separate project that can
be a dependency of both.

If they are added to uimaj-core/src/main/java, they will end up in
uima-core.jar, which I don't think we want.  If they are added to
uimaj-core/src/test/java, then uimaj-core's unit tests will work but
uimaj-cpe's will not (uimaj-cpe can't refer to test case code inside
uimaj-core).

However, it's worth considering if we need these classes at all.  As
noted in http://issues.apache.org/jira/browse/UIMA-45 the
TestPropertyReader.getJUnitBasePath() method is currently hacked up to
allow locating resource files in the classpath.  It might be better to
scrap this and update all the unit tests to do ClassLoader lookups
themselves.
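
A ClassLoader lookup of the kind suggested here might look roughly like the sketch below. TestResourceSketch and requireResource are hypothetical names, not existing uimaj-test-util code. The only library call used is the standard ClassLoader.getResource(String), which returns null when nothing is found, so the helper turns that into an explicit failure.

```java
import java.net.URL;

/** Hypothetical helper: locate a test resource on the classpath or fail loudly. */
final class TestResourceSketch {

    /**
     * Looks up a resource by its classpath-relative name and throws if it is
     * missing, so a misplaced test file shows up as a clear error message.
     */
    static URL requireResource(ClassLoader loader, String name) {
        URL url = loader.getResource(name);
        if (url == null) {
            throw new IllegalStateException(
                "Test resource not found on classpath: " + name);
        }
        return url;
    }
}
```

A test would then call something like requireResource(MyTest.class.getClassLoader(), "data/test.xml") (class and resource name made up here) instead of computing a base path.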

-Adam


[jira] Reopened: (UIMA-61) CasCreationUtils.createCas(Collection) silently ignores TypeSystemDescription objects,

2006-12-14 Thread Adam Lally (JIRA)
 [ http://issues.apache.org/jira/browse/UIMA-61?page=all ]

Adam Lally reopened UIMA-61:


 
This broke some user code.  Reconsidering.

> CasCreationUtils.createCas(Collection) silently ignores TypeSystemDescription 
> objects,
> --
>
>                 Key: UIMA-61
>                 URL: http://issues.apache.org/jira/browse/UIMA-61
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Adam Lally
>         Assigned To: Adam Lally
>            Priority: Minor
>             Fix For: 2.1
>
> Attachments: UIMA-61-testCase.patch
>
>
> The CasCreationUtils.createCas(Collection,...) methods only accept certain 
> kinds of objects in the Collection: AnalysisEngineDescription, 
> CollectionReaderDescription, CasInitializerDescription, 
> CasConsumerDescription, or ProcessingResourceMetaData.  Any other kinds of 
> objects in the collection are silently ignored.
> A user tried to pass a TypeSystemDescription object, expecting that it would 
> be used to initialize the CAS type system.  This didn't work but didn't cause 
> an error, so the user had a hard time figuring out what was wrong with their 
> application.
> There's no reason why these methods couldn't accept TypeSystemDescription 
> objects (as well as FsIndexCollection and TypePriorities objects).  
> Furthermore they should throw an error if passed a type of object that is not 
> allowed.
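
The behaviour asked for in the description (accept the known kinds of objects, reject everything else with an error rather than skipping it) can be sketched in plain Java as follows. This is not the CasCreationUtils code: DescriptorCollectionSketch, acceptAll, and the two empty marker interfaces are hypothetical stand-ins for the real UIMA metadata types, and IllegalArgumentException is just a placeholder for whatever exception the framework would use.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical model of validating a mixed collection of descriptor objects. */
final class DescriptorCollectionSketch {

    // Empty marker interfaces standing in for real UIMA metadata types.
    interface AnalysisEngineDescription {}
    interface TypeSystemDescription {}

    /**
     * Returns the elements in their original order, or throws on the first
     * element of an unsupported type (including null) instead of ignoring it.
     */
    static List<Object> acceptAll(Collection<?> specs) {
        List<Object> accepted = new ArrayList<>();
        for (Object spec : specs) {
            if (spec instanceof AnalysisEngineDescription
                    || spec instanceof TypeSystemDescription) {
                accepted.add(spec);
            } else {
                String kind = (spec == null) ? "null" : spec.getClass().getName();
                throw new IllegalArgumentException(
                    "Unsupported element in collection: " + kind);
            }
        }
        return accepted;
    }
}
```

Failing fast on the first unsupported element is a design choice of the sketch; it makes the "silently ignored" case from the report impossible, at the cost of rejecting collections that older code may have passed without problems.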





[jira] Closed: (UIMA-61) CasCreationUtils.createCas(Collection) silently ignores TypeSystemDescription objects,

2006-12-14 Thread Adam Lally (JIRA)
 [ http://issues.apache.org/jira/browse/UIMA-61?page=all ]

Adam Lally closed UIMA-61.
--

Resolution: Fixed

Previously this threw an exception if an Aggregate AE descriptor contained a 
URISpecifier.  This broke some user code.
Fixed problem by adding support for URISpecifiers.





Progress on Migration Utility

2006-12-14 Thread Adam Lally

I've made some good progress on a utility that can be used to help
users migrate their code from IBM UIMA to Apache UIMA.

The class is org.apache.uima.tools.migration.IbmUimaToApacheUima in
uimaj-tools, and there are corresponding .bat/.sh scripts in
uimaj-distr.

It's basically just a glorified search-and-replace utility, but it has
some special treatment of package names to make sure that it only
updates actual UIMA package names, not just everything with a
com.ibm.uima prefix (several of our users within IBM use that prefix
for their own packages).
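
As a rough illustration of that idea (not the actual IbmUimaToApacheUima code), the following self-contained Java sketch renames a com.ibm.uima name only when its package part is in a list of known UIMA packages. PackageRenameSketch, migrate, and the short KNOWN_PACKAGES list are hypothetical; the list is an illustrative subset, and the "package segments start with a lowercase letter" rule is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a package-aware search-and-replace. */
final class PackageRenameSketch {

    private static final String OLD_PREFIX = "com.ibm.uima";
    private static final String NEW_PREFIX = "org.apache.uima";

    // Illustrative subset only; a real tool would need the complete list.
    private static final Set<String> KNOWN_PACKAGES = new HashSet<>(Arrays.asList(
        "com.ibm.uima",
        "com.ibm.uima.cas",
        "com.ibm.uima.jcas.tcas",
        "com.ibm.uima.util"));

    // The old prefix followed by zero or more segments that start with a
    // lowercase letter or underscore; a capitalized segment ends the match.
    private static final Pattern UIMA_NAME = Pattern.compile(
        "\\bcom\\.ibm\\.uima\\b((?:\\.[a-z_][A-Za-z0-9_]*)*)");

    /** Rewrites the prefix only where the matched package is a known one. */
    static String migrate(String source) {
        Matcher m = UIMA_NAME.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String pkg = OLD_PREFIX + m.group(1);
            String replacement =
                KNOWN_PACKAGES.contains(pkg) ? NEW_PREFIX + m.group(1) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this list, "import com.ibm.uima.cas.CAS;" becomes "import org.apache.uima.cas.CAS;", while "import com.ibm.uima.myproject.Foo;" is left untouched because com.ibm.uima.myproject is not a known package.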

Anyway I've tried it on some pretty big UIMA projects and it seems to
do very well.  Here are some things it won't handle:

* User code that's in a package with the same exact name as one of the
UIMA packages.  Hopefully this occurs rarely, but unfortunately
there's one common case - DocumentAnnotation - which I mentioned in a
previous email. In such a case the package statement would get
replaced, but the .java file will then be in the wrong place in the
source tree.

* Package names that are prefixed by com.ibm.uima AND whose next segment
starts with a capital letter.  I hope no one has a package named
com.ibm.uima.MyPackage.  This would be treated as a class name and
replaced with org.apache.uima.MyPackage wherever it occurs.

* Use of _undocumented_ classes in the com.ibm.uima.util package,
because these moved to a different package than the documented
classes.  Can be fixed in Eclipse by a simple Organize Imports
operation.

* xi:include in descriptors.  There's no easy way to automatically
replace this, unfortunately.  Users will have to manually replace them
with the appropriate use of .


More work on this will probably be needed as we make more decisions to
change things, for example if we entirely remove TCAS.

-Adam