Re: uima website updated, please review index-draft.html

2006-12-12 Thread Michael Baessler

Adam Lally wrote:

I still agree with my own suggestion. ;)  Given the lack of objections
from anyone, if you have the bandwidth I'd suggest you draft something
and let us review.
I added the news section to the website, feel free to comment or add 
some news.


-- Michael


UIMA sandbox component build

2006-12-13 Thread Michael Baessler

Hi,

when writing the first analysis component for the UIMA sandbox I came to 
the point where I have to provide a build for the component.
My first thought was to do this also with Maven but when trying to add 
the pom.xml some incompatibilities came to my mind.


- To package an analysis component as a pear file we have to use eclipse, 
since we do not have a pear packager outside of eclipse. So users have 
to use eclipse if they want to package an analysis component.
- To package the analysis component correctly we have to put the 
resource and the code to the directories specified in the pear 
description (UIMA nature) layout but this layout does not match the 
Maven layout.


What do you think, how should we proceed?

1. Should we keep the UIMA nature layout for UIMA sandbox projects and 
add a Maven build that has a different file structure than suggested by 
Maven and used in the UIMA core framework?


2. Should we keep the UIMA nature layout for UIMA sandbox projects and 
build it with an ant build?


other suggestions?

-- Michael






Re: UIMA sandbox component build

2006-12-13 Thread Michael Baessler

Adam Lally wrote:

Hmmm.. how hard would it be to have a pear packager that runs outside
of Eclipse?  It does seem like a poor dependency to have since we
claim UIMA doesn't require Eclipse.
So in that case we can also think about creating a pear package during 
the build.
I will check if it is possible to redesign the code to have an API to 
build a pear package and maybe to create a pear package during the build.

I thought the default PEAR directory layout was optional.  It would be
nice if the Maven layout were supported.
I don't think the UIMA nature is optional. Some of the directories must 
exist but maybe we can change that...


So our goal is to have a Maven based structure for the sandbox 
components and to spend time adapting the pear utilities to this.

Again, let me check how expensive this is...

-- Michael


Re: Eclipse Annotation Editor

2006-12-14 Thread Michael Baessler

Adam Lally wrote:

What do the other commiters think -- would this make a good first
project for our UIMA sandbox?
Yes, I think this is a good first component for the UIMA sandbox. But first 
we have to clarify the details for the submission...

It seems that we need an Apache Software Grant for this code.

I found the following at: www.apache.org/licenses
Software Grants


When an individual or corporation decides to donate a body of existing 
software or documentation to one of the Apache projects, they need to 
execute a formal Software Grant agreement with the ASF. Typically, this 
is done after negotiating approval with the ASF Incubator or one of the 
PMCs, since the ASF will not accept software unless there is a viable 
community available to support a collaborative project.


-- Michael


JUnit test extension files

2006-12-14 Thread Michael Baessler

Hi,

I think we should move all files from the uimaj-test-util project that 
are used in uimaj-core into the uimaj-core project. I would like to 
remove the dependency of uimaj-core on uimaj-test-util.
These are:
ExceptionPrinter.java
FileCompare.java
JUnitExtension.java
TestPropertyReader.java

Why is this necessary? I would like to use the uimaj-test-util project 
to provide some helper classes for annotator testing, so that users 
who write analysis components for our sandbox can all use the same test 
methods. This makes things easier to understand. To do this, 
uimaj-test-util needs a dependency on uimaj-core.


What do you think?

-- Michael




Re: JUnit test extension files

2006-12-15 Thread Michael Baessler

Adam Lally wrote:

They are used not only in uimaj-core but also in other projects like
uimaj-cpe.  So I think these need to be in a separate project that can
be a dependency of both.

If they are added to uimaj-core/src/main/java, they will end up in
uima-core.jar, which I don't think we want.  If they are added to
uimaj-core/src/test/java, then uimaj-core's unit tests will work but
uimaj-cpe's will not (uimaj-cpe can't refer to test case code inside
uimaj-core).

However, it's worth considering if we need these classes at all.  As
noted in http://issues.apache.org/jira/browse/UIMA-45 the
TestPropertyReader.getJUnitBasePath() method is currently hacked up to
allow locating resource files in the classpath.  It might be better to
scrap this and update all the unit tests to do ClassLoader lookups
themselves.
I see, but I think we still need some helper classes for the current 
tests. I also think that when using the class loader to find resources, 
this should be externalized to a helper class so that the tests can 
easily access the resources.
Another example is the file compare helper class or the JUnit exception 
printer stuff that is used in some of the tests. So my suggestion is to 
keep the uimaj-test-util project but fix and redesign some of the 
methods to use class loading to find resources. For example, replace the
JUnitExtension.getFile() method with the code below to load the 
resources using the class loader.


public static File getFile(String aRelativeFilePath) {
   URL url = JUnitExtension.class.getClassLoader().getResource(aRelativeFilePath);
   File file = null;
   if (url != null) {
      file = new File(url.getFile());
   }
   return file;
}

Rewrite all other tests that do not yet load their resources this way. 
So the only open item is the tests that just create temp files using the 
JUnitBasePath. For these we can use the Java temp file mechanism and 
specify a working directory where the file should be created. The 
working directory could be the parent directory of one of the file 
resources already loaded in the test.
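As a rough sketch of the two ideas above (class-loader lookup plus temp files created next to an already loaded resource), something like the following could work. TestResourceHelper and createTempFileNear are made-up names for illustration only, not existing uimaj-test-util classes. The lookup goes through a URI instead of url.getFile() on the assumption that paths containing blanks should resolve as well.

```java
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helper; names are illustrative, not the uimaj-test-util API. */
final class TestResourceHelper {

  private TestResourceHelper() {
  }

  /** Resolves a classpath-relative resource to a File, or returns null if it is not found. */
  static File getFile(String relativePath) {
    URL url = TestResourceHelper.class.getClassLoader().getResource(relativePath);
    if (url == null) {
      return null;
    }
    try {
      // Going through a URI decodes escapes such as %20, which url.getFile() leaves in place.
      return new File(url.toURI());
    } catch (URISyntaxException | IllegalArgumentException e) {
      // Fall back to the raw path, e.g. for URLs that are not plain file: URLs.
      return new File(url.getFile());
    }
  }

  /** Creates a temp file in the directory of anchor, or in the default temp dir if anchor is null. */
  static File createTempFileNear(File anchor, String prefix, String suffix) throws IOException {
    File workingDir = (anchor == null) ? null : anchor.getParentFile();
    File tmp = File.createTempFile(prefix, suffix, workingDir);
    tmp.deleteOnExit(); // keep the test directories clean
    return tmp;
  }
}
```

A test would first load one of its resources with getFile() and then pass that file as the anchor, so no JUnitBasePath is needed.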

To create my sandbox test helper classes I will use another project 
called uimaj-component-test-util. This project will depend on uimaj-core.


-- Michael




Re: Release plan

2006-12-19 Thread Michael Baessler

Thilo Goetz wrote:
I have added a short page on a release plan to the Wiki.  I'm sure 
this isn't the final word, and we'll keep refining it as we go.


Given this is our first release in the Incubator, and from what I've 
been observing on [EMAIL PROTECTED], I think we should calculate 
about 2 weeks for the voting process.  It would be quite amazing if we 
could get it right the first time.


I guess we want about 2 weeks of testing, doc review etc. before that. 
So this would mean we enter the test phase around the middle of 
January.  So here's a tentative schedule:


now - 1/21/07:     coding, general enhancements
1/22/07 - 2/4/07:  testing & bug fixing
2/5/07:            cut the release, start the vote
2/15/07:           release UIMA 2.1

Let me know what you think.  I'll add this to the Wiki once we have 
agreement on the dev list.


--Thilo



+1

Fine with me.

-- Michael


Re: JUnit test extension files

2006-12-19 Thread Michael Baessler

On 12/15/06, Michael Baessler <[EMAIL PROTECTED]> wrote:

So my suggestion is to
keep the
uimaj-test-util project but fix and redesign some of the methods to use
class loading to find resources. For example replace the
JUnitExtension.getFile() method with the code below to load the
resources using the class loader.

public static File getFile(String aRelativeFilePath) {
   URL url = JUnitExtension.class.getClassLoader().getResource(aRelativeFilePath);
   File file = null;
   if (url != null) {
      file = new File(url.getFile());
   }
   return file;
}

Rewrite all other tests that currently do not use this way to use it.
I made the changes described above and checked them in to SVN ... 
everything works fine again in my eclipse workspace and only one test 
fails with the Maven build.
I will check that during the next days. If anyone has problems with some 
tests, please let me know.


-- Michael



Re: JUnit test extension files

2007-01-03 Thread Michael Baessler

Michael Baessler wrote:
To create my sandbox test helper classes I will use another project 
called uimaj-component-test-util. This project will depend on uimaj-core.


I added the uimaj-component-test-util project with some helper classes 
for e.g. annotator components. These helper classes can be used e.g. for 
sandbox components.


I will add some documentation on the website/wiki when I have finished 
my work.

I also added the project to the uimaj POM module list. If anything 
doesn't work after my change, please let me know.


-- Michael



Re: Sandbox projects use of "org.apache.uima"

2007-01-04 Thread Michael Baessler

Marshall Schor wrote:
I wonder if things like the Whitespace Annotator in the sandbox should 
use "org.apache.uima" as the start of their package names.
I think this is fine if the thing is destined to be an incorporated 
part of the uima java project.

Should it be something else, otherwise?

-Marshall


I would vote to start at least with "org.apache.uima...", since the 
stuff belongs to the Apache UIMA project. If we should not use exactly 
the same name space we can also use
"org.apache.uima_annotators..." or "org.apache.uima_sandbox.annotators" 
if we think it is better.


-- Michael


Re: Are configuration parameter groups deprecated in version 2?

2007-01-04 Thread Michael Baessler

Eddie Epstein wrote:

[EMAIL PROTECTED] wrote:


The current parameter group design requires replicating all
configuration
parameters for each language. This is neither elegant nor easy to
maintain. A better design would use common configuration parameters with
the provision to override specific parameters for a given configuration.
All parameters could be in a flat space, but a parameter could be
addressed using a variable suffix. For example:

   windowSize = (Integer)getConfigParameterValue("WindowSize", suffix);

where suffix could be a string variable extracted from the CAS. If
"WindowSize" + suffix did not exist as a parameter, the framework would
attempt to use just "WindowSize" as a parameter key.
  

But how do I know in my annotator code what groups/suffixes are
available? Currently I can ask for the groups and
get back an array of available groups.


The list of suffixes would be dynamically created based on scanning all
configuration parameters and returning a list of unique suffixes found.

Eddie


  

So if there is an API to return all suffixes, fine with me.
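To make the discussed behaviour concrete, here is a small toy model of it: lookup with a suffix, fallback to the plain name, and a scan that returns all suffixes in use. SuffixedParameters is a made-up class, not part of the UIMA API. The separator between name and suffix is my assumption as well; the proposal above concatenates name and suffix directly, which would make the scan for suffixes ambiguous.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the proposal; hypothetical, not part of the UIMA API. */
final class SuffixedParameters {

  private final Map<String, Object> values = new LinkedHashMap<String, Object>();
  private final String separator; // assumed convention, e.g. "_"

  SuffixedParameters(String separator) {
    this.separator = separator;
  }

  void put(String name, Object value) {
    values.put(name, value);
  }

  /** Looks up name + separator + suffix first and falls back to the plain name. */
  Object get(String name, String suffix) {
    if (suffix != null && !suffix.isEmpty()) {
      Object specific = values.get(name + separator + suffix);
      if (specific != null) {
        return specific;
      }
    }
    return values.get(name);
  }

  /** Scans all parameter names and returns the unique suffixes found, sorted. */
  Set<String> suffixes() {
    Set<String> result = new TreeSet<String>();
    for (String key : values.keySet()) {
      int idx = key.lastIndexOf(separator);
      if (idx > 0 && idx + separator.length() < key.length()) {
        result.add(key.substring(idx + separator.length()));
      }
    }
    return result;
  }
}
```

With "WindowSize" and "WindowSize_de" defined, a lookup for suffix "de" returns the specific value, any other suffix gets the common one, and suffixes() plays the role of the current "ask for the groups" call.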

-- Michael


Re: License question for Snowball annotators in Sandbox

2007-01-05 Thread Michael Baessler

Marshall Schor wrote:
And I guess this project needs a "NOTICES" file, documenting all of 
these issues.  
I added a NOTICE.txt file with all the information I think is necessary. 
Feel free to edit if you think something is missing.


-- Michael


Re: CAS/CasView design, another summary

2007-01-08 Thread Michael Baessler

Thilo Goetz wrote:

So here's what I would do:

- No view API in UIMA 2.x.  Possibly change the view/sofa related APIs 
in a manner indicated above (CAS getView(String viewName) -> void 
setView(String viewName) etc).  I would find that much more intuitive, 
but I'm not hopeful that other people on this list will.


- Start discussing UIMA 3 now.  Take a clean slate approach, consider 
how we want to handle indexing in the future.  Have a couple of beta 
releases to get feedback from users.


- Support both UIMA 2 and 3 in parallel, for as long as necessary. 
Consider interoperability, possibly in the same JVM.


I really think that the CAS/view redesign proposal as it currently 
stands will make things even worse, and that it should be reconsidered.
Till now I have only read the posts in the CAS/CasView discussion and 
tried to understand the details, sometimes without success :-) I think 
this also depends on my current knowledge and I have to do my homework 
to contribute to that discussion.


My current point of view, as far as I understand the discussion, is that 
we do not really have a consensus on the CAS/CasView architecture. So I 
like the suggestion to start this discussion again for UIMA 3 and do 
things step by step. This would help me and maybe others (if we find 
some contributors after the release) to contribute to the discussion.
This suggestion also seems more reasonable when thinking about the 
first UIMA release, with code freeze in about two weeks, to get out a 
stable version of UIMA.


Just my point of view.

-- Michael



Re: [jira] Assigned: (UIMA-161) adding documentation for PEAR API

2007-01-08 Thread Michael Baessler

Marshall Schor (JIRA) wrote:

 [ 
https://issues.apache.org/jira/browse/UIMA-161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marshall Schor reassigned UIMA-161:
---

Assignee: Michael Baessler  (was: Marshall Schor)

Minor formatting edits done.  Transferring back to Michael Baessler as 
requested.

  

adding documentation for PEAR API
-

Key: UIMA-161
URL: https://issues.apache.org/jira/browse/UIMA-161
Project: UIMA
 Issue Type: Improvement
 Components: Documentation
   Affects Versions: 2.1
   Reporter: Michael Baessler
Assigned To: Michael Baessler
Fix For: 2.1


adding some chapters about the PEAR API to the documentation. E.g. how to 
install a PEAR file using the API.



  

Hi Lev,

can you please review the PEAR API documentation changes done with this 
issue?


Thanks

-- Michael



Re: DocumentAnnotation and type-merging

2007-01-09 Thread Michael Baessler

Adam Lally wrote:

On 12/22/06, Adam Lally <[EMAIL PROTECTED]> wrote:

On 12/22/06, Marshall Schor <[EMAIL PROTECTED]> wrote:
> Re: What to do about Document Annotation for 2.1.
>
> a) Do the work to make it easy to get singletons (or whatever we're
> calling this feature) out of the CAS
>
> b) Change JCasGen to not generate DocumentAnnotation if the merged
> version = the base
>
> c) Add the DocumentAnnotation jar back to the build, with instructions
> to remove it from the
> deployment and substitute a JCasGenerated one if the application 
> needs a special one.
>
> I've may have forgotten some point(s) - does this address the issues?


My only complaints are that I think it is ugly to have a
uima-jcas-builtin-types.jar (or uima-document-annotation.jar or
whatever it's called) that has just this one class in it, and I think
this is a dangerous feature that users can hang themselves with.

But, I suppose I can live with this solution... keeping support for
the feature so we don't break compatibility, but putting sufficient
warnings in to do our best to keep users from hanging themselves with
it.

I do think that it seems like an improvement to not generate a new
version of DocumentAnnotation if the merged version = the base.  Why
didn't we think of that before?




I created a JIRA issue for this.

I'm pretty sure (a) is still controversial, and I suppose not strictly
needed for v2.1.  I'll post on a separate thread to at least get the
ball rolling.

For (c) I think I will have to create a new component
uimaj-document-annotation that has just the one Java class
(org.apache.uima.jcas.tcas.DocumentAnnotation) in it, and which builds
into uimaj-document-annotation.jar.  It's a little ugly but I haven't
heard a better suggestion.  Agreed?

-Adam

Right now, it seems to be the only way. Fine with me.

What about the documentation? I think we should add a paragraph about 
the uimaj-document-annotation.jar and how it should be used/replaced.
There is already a chapter in the documentation that discusses issues 
with the document annotation, but it mainly talks about the extension 
class loader issues.


-- Michael


Re: Eliminating TCAS

2007-01-09 Thread Michael Baessler

Fine with me.  +1

-- Michael


Marshall Schor wrote:

+1 from me.

-Marshall

Adam Lally wrote:

Even if we don't do the larger view redesign, I still think it makes
sense to drop TCAS:
http://issues.apache.org/jira/browse/UIMA-115

There is little reason for TCAS to still exist at this point - it no
longer defines any methods that are not on CAS, and the methods
CAS.getTCAS() and CAS.getTCAS(SofaFS) have been deprecated since 2.0,
along with a lot of other TCAS-related methods.   So this change would
remove quite a lot of deprecated methods (as opposed to adding more!).

And it's a good candidate for search-and-replace:

* TCAS goes to CAS
* getTCAS goes to getView.  [Or better, "getTCAS()" goes to
"getCurrentView()" and then "getTCAS(" goes to "getView(" - I like
that better than having a 0-arg CAS.getView() method.]
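The order of the replacements matters here, so as a minimal sketch (TcasRename is a made-up name, and a plain text replace is an assumption about how the change would be applied, not how it was actually done):

```java
/** Hypothetical illustration of the rename order described above. */
final class TcasRename {

  private TcasRename() {
  }

  static String migrate(String source) {
    return source
        // 1. The 0-arg call first, otherwise step 2 would turn it into getView().
        .replace("getTCAS()", "getCurrentView()")
        // 2. The remaining calls take an argument, e.g. getTCAS(sofa).
        .replace("getTCAS(", "getView(")
        // 3. Finally the type name itself.
        .replace("TCAS", "CAS");
  }
}
```

For example, migrate("TCAS t = cas.getTCAS();") gives "CAS t = cas.getCurrentView();". Note that step 3 also hits any other identifier that contains "TCAS", so the result still needs a review by hand.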

Can we get consensus on this at least?
-Adam










Re: OSGi enablement and JCas

2007-01-09 Thread Michael Baessler

Eddie Epstein wrote:
The work done here with OSGi convinced me that distributed type 
definition is incompatible and should be eliminated. Instead, UIMA 
should have type system objects containing one or more type definitions 
and their associated JCas classes.
Type system objects would be a good fit for OSGi, and be more 
consistent with the concept of types being shared between different 
analytic components.

Did I understand you correctly? You would like each bundle to have a 
service interface containing the type system that the bundle defines 
and, of course, the implementation of that, so that all other bundles 
can refer to these bundles and use their implementation. But this only 
works when we do not support distributed type definitions, where the 
same type can be defined/extended by another bundle. Right?


-- Michael



Re: OSGi enablement and JCas

2007-01-09 Thread Michael Baessler

Adam Lally wrote:

On 1/9/07, Eddie Epstein <[EMAIL PROTECTED]> wrote:

On 1/9/07, Thilo Goetz <[EMAIL PROTECTED]> wrote:
> a) backward compatibility and b) some people actually seem to be using
> it and don't want to be without it.

Then perhaps OSGi components will not be compatible with distributed
definitions,
but other component packaging will still be supported that do.



That makes sense to me.

-Adam


I don't like the idea that the UIMA OSGi extension does not support 
distributed type definitions but the "old" framework does. I think OSGi 
is the future and if
we do not support distributed type definitions with OSGi we should 
deprecate that completely. But I don't think this is really possible.


-- Michael



Re: Does anyone object to removing @author tags in comments at the top of source files?

2007-01-09 Thread Michael Baessler

Adam Lally wrote:

On 1/9/07, Marshall Schor <[EMAIL PROTECTED]> wrote:
This is not on our "code conventions" page, but I'd like to put it 
there.


I think we discussed this a long time ago - and found the "apache way"
was more to honor the community, rather than call out specific authors
of files.  For those who wish attribution, the suggestion was to add
that to the NOTICES file.

It was suggested that this allows less time/energy being spent on whose
name should be included (e.g., if someone removes some things, do they
get to add their name to the set of "authors").

If no objections, I'll enter a Jira issue to clean up the remaining
@author tags.



+1

-Adam



+1

-- Michael


Re: [jira] Created: (UIMA-164) Add source build

2007-01-10 Thread Michael Baessler

Adam Lally wrote:

I think we're agreed that the docbooks sources should be in the UIMA
source distribution.  The question then is what to do about building
it.  It seems our choices are:
(a) Include all the jars in SVN as part of our source distribution.
The licenses all seem OK, but we will need to update our NOTICE file,
for our source release only.
(b) Do not include any of the jars.  Require that users get them from
our SVN, instead.  I guess that means we wouldn't have to do anything
with our NOTICE file. (But IANAL.)

Which should we do?
I would prefer a) since if I download a source distribution I don't 
want to load additional files from the SVN. That seems like unnecessary 
effort to me.


-- Michael




Re: [jira] Assigned: (UIMA-161) adding documentation for PEAR API

2007-01-10 Thread Michael Baessler

Lev Kozakov wrote:
I believe, the additional text contains enough details to make 
developers understand how to install a PEAR and use it to deploy AE in 
UIMA. My only concern is the example application in the 
tug.application.xml file (in tutorials_and_user_guides). This 
application mentions a PEAR ("get Resource Specifier from XML file or 
PEAR"), but does not refer to the PEAR API documentation added by 
Michael. 
I would prefer to replace the "get Resource Specifier from XML file or 
PEAR" comments in the tutorials_and_user_guides with "get Resource 
Specifier from XML file".
The samples provided there do not show or mention how to work with PEARs 
and PackageBrowser objects. So I think we should remove the "or PEAR" 
part from the comment.

Other opinions?

-- Michael



Re: Global FS variables, another suggestion

2007-01-11 Thread Michael Baessler

Thilo Goetz wrote:

CAS.declareFsVariable(String name, Type type)
CAS.isFsVariable(String name):boolean
CAS.getFsVariableType(String name):Type

I would like to have an additional method like

CAS.getFsVariableForType(Type type):FS

so that an analysis component does not have to know the variable name. 
It only has to know that for a certain type a variable exists that 
contains the FS of interest, and can retrieve it.
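To show what I mean, here is a toy stand-in for the proposed methods. Nothing of this exists in the CAS today: types are plain strings and FSs are plain objects here, setFsVariable is a setter I made up for the example, and what should happen when two variables are declared with the same type is an open question (this sketch simply throws).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for the proposed CAS methods; hypothetical, not UIMA API. */
final class FsVariableRegistry {

  private final Map<String, String> typeByName = new HashMap<String, String>();
  private final Map<String, Object> valueByName = new HashMap<String, Object>();

  void declareFsVariable(String name, String typeName) {
    if (typeByName.containsKey(name)) {
      throw new IllegalArgumentException("already declared: " + name);
    }
    typeByName.put(name, typeName);
  }

  boolean isFsVariable(String name) {
    return typeByName.containsKey(name);
  }

  String getFsVariableType(String name) {
    return typeByName.get(name);
  }

  /** Made-up setter, only needed so the example has a value to return. */
  void setFsVariable(String name, Object fs) {
    if (!isFsVariable(name)) {
      throw new IllegalArgumentException("not declared: " + name);
    }
    valueByName.put(name, fs);
  }

  /** Returns the value of the one variable declared for typeName, or null if there is none. */
  Object getFsVariableForType(String typeName) {
    String match = null;
    for (Map.Entry<String, String> entry : typeByName.entrySet()) {
      if (entry.getValue().equals(typeName)) {
        if (match != null) {
          throw new IllegalStateException("more than one variable of type " + typeName);
        }
        match = entry.getKey();
      }
    }
    return (match == null) ? null : valueByName.get(match);
  }
}
```

The component that consumes the value then only needs the type, not the name that some other component picked for the variable.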

-- Michael




Re: [jira] Commented: (UIMA-193) PEAR Encoding Test gives NullPointerException under Sun Java 1.4.2

2007-01-22 Thread Michael Baessler

Adam Lally wrote:

For me (I just tried again), the failing tests are:
testUTF8WithSignature
testUTF16NoSignature

Can one of the other committers try this and see if they get the same
result as I do?

The same for me when using the SUN JVM 1.4.2_12

-- Michael


Re: Writing something about the Sandbox

2007-01-23 Thread Michael Baessler

Marshall Schor wrote:
Michael - could you write something about the Sandbox on Apache?  
Purpose, what's in it.  Something short - to be included in the 
docbook docs.


-Marshall


So you want to include a short paragraph about the sandbox to the 
docbook of the UIMA 2.1 release. I would only add something similar as 
we have on the web:


   The UIMA sandbox is a workspace that is open to all UIMA committers
   and developers who would like to contribute code and join the UIMA
   developer community. The sandbox is designed to host analysis
   components like annotators, parsers or consumers, as well as UIMA
   tooling. All the components developed in the sandbox are free to use
   and licensed under the Apache Software License.


   A list of proposed analysis components and tooling for UIMA is
   available at the UIMA wiki and can be discussed there.


I would not tell people what's currently in it, since we have no release 
for the sandbox. My plan is to also have a release for the sandbox as 
soon as we have released the core framework. For the next UIMA release 
we can add the components that have already been released in the 
sandbox. Does this make sense?


-- Michael




Re: Writing something about the Sandbox

2007-01-23 Thread Michael Baessler

Adam Lally wrote:

On 1/23/07, Tong Fin <[EMAIL PROTECTED]> wrote:

I think Sandbox may have different meaning for other Apache project
developers: It is a place that developers develop the code to test/try 
new ideas/concepts. If the developed code is mature enough, it will be 
moved to the main stream build.


I think that's what we mean, too.

Indeed, I thought the same but explained it more particularly...

I didn't think that the sandbox would have "releases".  I was
imagining that people who wanted to try out sandbox code would just
get it from SVN.  Eventually, if there's something in the sandbox
that's considered mature enough and makes sense to include the main
stream SDK build, we'll move it to the uimaj code branch.

My thoughts about the sandbox release:
- when we think (or later also others) that one of the components in 
  the sandbox is "ready to use", we build a binary version of it and 
  make it available at the download section.
- Users can browse all released sandbox components and use them out of 
  the box, without having to build them themselves.


-- Michael


Re: tar files with long paths

2007-01-23 Thread Michael Baessler

Thilo Goetz wrote:
Aix out of the box ships with a version of tar that can't handle these 
files, either.  I guess we should document the issue and tell people 
to use the zip version if they have problems.  There are unzip tools 
for every platform nowadays.


Are you sure? I think if you have a plain AIX installation, there is no 
zip utility installed out of the box. I would hope that most admins 
install a zip utility, so we hopefully have no problems :-)

-- Michael



Re: test plan put up on cwiki

2007-01-29 Thread Michael Baessler

Marshall Schor wrote:
Put the Jira issues I think have some doc relevance into a table. Doc 
issues in Jira marked as closed I did not include... I presume the doc 
work was completed :-)


I added a UIMA framework test plan template on the UIMA wiki 
(http://cwiki.apache.org/confluence/display/UIMA/Test+Plan+Template). I 
added some tests that I think are not covered by our automated test 
framework. Please review the tests and add additional ones or remove 
tests if you think they are already covered by our automated test 
framework. Also feel free to improve the test case descriptions or add 
topics if you think it is important.


My plan is to reuse the test plan template for each release, so when the 
plan is reviewed, I will copy the template to the Release Plan 2.1 
section, also on the UIMA wiki, and everyone can add his name to the 
tests he would like to execute.


-- Michael


Re: test plan put up on cwiki

2007-01-29 Thread Michael Baessler

Adam Lally wrote:

+1 (with one additional test I added, for the CPE GUI).

Note I've already done all four CPM and the CPE GUI test and I have
them as automated scripts that I can easily rerun on new levels.
Great, so when I put the plan into the release plan section, please 
update the plan with that information.


-- Michael


Re: Selecting a Logo

2007-01-29 Thread Michael Baessler

Adam Lally wrote:

The logo artwork has been posted to
http://issues.apache.org/jira/browse/UIMA-22.  Please vote for your
favorite.

We need a banner and a small logo (16x16, we can resize what's posted
to fit) for the window icon.  A larger logo I may use on the "about"
dialogs.

As for my vote... I think the white-background banner looks better and
is easier to add to the GUI (our old banner had a white-background).
And I think I would go with #6 UIMA-logo-big.png for the logo.

-Adam


My votes are: #2 and #5. But when really using #5 we have to change #2 
since this also contains the people ..:-)


-- Michael


Re: [VOTE] Approve the new UIMA logo

2007-01-30 Thread Michael Baessler

Marshall Schor wrote:

Adam Lally wrote:

Oh, BTW I am +1.  :)

-Adam

On 1/29/07, Adam Lally <[EMAIL PROTECTED]> wrote:

The proposal is to adopt the following as the UIMA logo:
https://issues.apache.org/jira/secure/attachment/12349816/UIMA-logo-big-without-people.png 



This would be used in our graphical tools as the window icon, in the
about box, and as part of the banner.  It may be used elsewhere such
as on the website or in the documentation.

Please vote as follows:
[ ] +1  In favor of adopting the logo
[ ] +0  Abstain / No Opinion
[ ] -0  Don't support adopting the logo but see no better alternative
[ ] -1  Opposed to adopting the logo






+1



+1

-- Michael


Re: test plan put up on cwiki

2007-01-30 Thread Michael Baessler

Michael Baessler wrote:
My plan is to reuse the test plan template for each release, so when 
the plan is reviewed, I will copy the template to the Release Plan 
2.1 section, also on the UIMA wiki, and everyone can add his name 
to the tests he would like to execute.
I copied the test plan template to the 2.1 release plan section... 
please add your names to the tests you "would like to do".
After the test is executed, please provide some more descriptive 
information about what you have done.. so we get a better test case 
description for the next release.



Thanks

--Michael


Re: Banner Text

2007-01-30 Thread Michael Baessler

Adam Lally wrote:

This still needs a decision to be made.  Proposed options for the text
in the lower right portion of the banner are:

(1) An Open Source Project   [this is what's currently there]
(2) An Apache Incubator Project
(3) An Apache Open Source Effort - http://incubator.apache.org/uima
(4) http://incubator.apache.org/uima


Personally, I agree with Thilo that #3 is too much to put on a banner.
But I don't really like #4 either, with the URL as part of the
graphic (and presumably not hyperlinked, at least I'm not going to
bother getting that to work).

#2 seems about right.  Of course we'll have to remove "Incubator" when
we graduate, but that doesn't seem like a big deal.

Other thoughts/suggestions?

-Adam


#2 seems the best for me.

-- Michael



UIMA documentation improvements

2007-01-31 Thread Michael Baessler

Hi,

I have some suggestions, that I think will improve our documentation. 
Please let me know what you think.


1) I miss a readme file in our distribution that contains some first 
steps like:
   - Environment variables/setup for command line tools: how do I set 
     them, so that I can use the provided script files
   - Verification steps to verify if the installation was correct
   - Getting Started section that links me to the HTML or PDF 
     documents. Explain which book contains the content I'm interested in.


2) Maybe we can improve the names of our pdf files. They should at least 
   contain UIMA in the file name. If someone copies the files from the
   UIMA install directory to another place, the names are not related 
   to UIMA.


3) Thilo already changed the doc directory layout slightly. The docs 
   directory now contains an api, html and pdf directory. Within the 
   html directory there are the subdirectories for the different 
   docbooks. I think it would be good to have something like an 
   index.html directly in the html directory that links to the 
   different books we have. It would also give users a short overview 
   and tell them where to find the information they need.




  
  


Re: Writing something about the Sandbox

2007-02-01 Thread Michael Baessler

Marshall Schor wrote:
The difference may be one of emphasis.  Or, maybe I've misunderstood 
what Michael said (sorry if true.)

I'm trying to emphasize the purpose of the Sandbox is not in
what it hosts (Annotators, Cas Consumers, Tools, etc.), but rather 
that it is more free of the constraints
we have for other projects / subprojects.  There's less need for 
consensus here, more opportunity to

try new things, some of which might be accepted eventually, others, not.
That was my intention. The sandbox is a place where new things can be 
developed... for example annotators, tools... and so on.
Maybe some of these projects will also get integrated into the core 
framework. But I'm not sure if, e.g., annotator components will be added 
to the core. I think such analysis components will always stay in the 
sandbox and can be downloaded there. Other opinions?


And I agree with Thilo that we should add some of the sandbox 
components to the next release, so that we can provide binary versions 
of components that are ready to use.

-- Michael



Re: [VOTE] accept CAS Editor (tae) bulk contribution into Sandbox?

2007-02-01 Thread Michael Baessler

Marshall Schor wrote:
The Jira issue http://issues.apache.org/jira/browse/UIMA-155 contains 
a proposed submission to the Sandbox of some tooling supporting editing 
of CAS data.

As part of our process for accepting bulk submissions, we need a vote.

Please vote as follows:
[ ] +1  In favor of accepting the submission into the Sandbox.
[ ] +0  Abstain / No Opinion
[ ] -0  Have some concerns (please state) but won't block acceptance 
into the Sandbox
[ ] -1  Have serious issue (please state) that needs resolution before 
proceeding.


-Marshall



+1

-- Michael


Re: UIMA documentation improvements

2007-02-01 Thread Michael Baessler

Marshall Schor wrote:

Michael Baessler wrote:

Hi,

I have some suggestions, that I think will improve our documentation. 
Please let me know what you think.


1) I miss a readme file in our distribution that contains some first 
steps like:
   - Environment variables/setup for command line tools: how do I 
set them, so that I can use the provided script files

   - Verification steps to verify if the installation was correct
   - Getting Started section that links me to the HTML or PDF 
documents. Explain which book contains the content I'm interested in.


+1  There is a Jira issue open for this:  
https://issues.apache.org/jira/browse/UIMA-240


2) Maybe we can improve the names of our pdf files. They should at 
least contain UIMA in the file name. If someone copies the files from 
the
   UIMA install directory to another place, the names are not related 
to UIMA.



Probably late to do for 2.1 release.  Would impact the docbook build 
script.  Could do in next release.
I opened a Jira issue for this: 
https://issues.apache.org/jira/browse/UIMA-258 (release 2.2.)



3) Thilo already changed the doc directory layout slightly. The docs 
directory now contains an api, html and pdf directory. Within the html 
directory there
   are the subdirectories for the different docbooks. I think it would 
be good to have something like an index.html directly in the html 
directory that links to the
   different books we have. It would also give users a short overview 
and tell them where to find the information they need.


+1 to having an index.html.  It could link, besides the docbook html 
books, other useful things, like the readmes, etc.
I would like to have the readme of 1) on the top level of the UIMA 
distribution and not in the docs sub folder. Can the index.html be 
created also using docbook with links,
or must we create that file without using docbook, since only we know 
our build structure for the html books?


-- Michael



Re: Writing something about the Sandbox

2007-02-01 Thread Michael Baessler

Marshall Schor wrote:

Michael Baessler wrote:


Maybe some of these projects will also get integrated into the core 
framework. But I'm not sure whether, e.g., annotator components will be 
added to the
core. I think such analysis components will always stay in the sandbox 
and can be downloaded there. Other opinions?


I prefer creating "subprojects" of Apache UIMA to hold these, for 
reasons stated in previous notes.  For instance, how about a 
subproject called "Apache UIMA Components", holding annotators?  
(Another subproject might be "corpii" - common test data, etc.)  We 
could do distributions/releases of these.
So you mean that when a sandbox analysis component is ready to use, instead 
of adding it to the core framework we will add it to the "Apache UIMA 
Components" project.
That's also fine with me. But in this case the sandbox is only a 
temporary space and never produces any binary distributed 
component, right? In that case we also do not
need to add the sandbox to any release, since if a sandbox component "is 
ready" we vote and add it to the core framework or to the components 
subproject. That is a kind of redefinition of the current sandbox.

-- Michael


How to know which Jira issues are fixed with a UIMA level

2007-02-06 Thread Michael Baessler

Hi,

for all who are interested: I have written a tool, LevelIssueAnalyzer, 
that analyzes the commit log messages, based on the last level name and 
revision information, and extracts all Jira issue keys that are part of 
the commit messages. So the tool can show all Jira issues that have been 
fixed with the newly created level. Also, for each detected Jira issue, 
the issue abstract is retrieved from the web. For the detection to work, 
the newly created level must have a key like "levelname:uimaj-2.1.0-003" 
in the commit log.
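
The key extraction step described above can be sketched in a few lines of Java. This is only an illustration and not the LevelIssueAnalyzer code itself: the class name IssueKeyExtractor, the method extractKeys and the assumption that issue keys always look like "UIMA-<number>" are made up for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the real LevelIssueAnalyzer.
class IssueKeyExtractor {

  // Assumption: issue keys have the form "UIMA-<digits>".
  private static final Pattern ISSUE_KEY = Pattern.compile("\\bUIMA-\\d+\\b");

  // Collects each distinct issue key, in order of first appearance.
  static Set<String> extractKeys(List<String> commitMessages) {
    Set<String> keys = new LinkedHashSet<String>();
    for (String message : commitMessages) {
      Matcher matcher = ISSUE_KEY.matcher(message);
      while (matcher.find()) {
        keys.add(matcher.group());
      }
    }
    return keys;
  }

  public static void main(String[] args) {
    // Made-up commit messages, only used to exercise the sketch.
    List<String> log = Arrays.asList(
        "UIMA-22: replace splash screen",
        "levelname:uimaj-2.1.0-003",
        "UIMA-259 add overview page, see also UIMA-22");
    for (String key : extractKeys(log)) {
      System.out.println("https://issues.apache.org/jira/browse/" + key);
    }
  }
}
```

The level marker line contains no issue key, so it is simply skipped, and an issue mentioned in several commits is reported once.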


The usage is very simple, see the sample below:

Call the tool with the level name of the last level that was created:

LevelIssueAnalyzer uimaj-2.1.0-002

The output will be:

The created level will contain/fix the following Jira issues:
https://issues.apache.org/jira/browse/UIMA-22: [#UIMA-22] Tools still 
use IBM splashscreen - ASF JIRA

...

The tool is located in a new project called uimaj-internal-tools that is 
not part of the distribution and will not be built automatically. To 
build the component use the provided pom.xml and see the HOWTO 
documentation.


If you have any additional questions or comments, please let me know.

-- Michael




Re: Next test level built

2007-02-06 Thread Michael Baessler

Thilo Goetz wrote:
Release test level uimaj-2.1.0-003 will shortly be available on people 
in /home/twgoetz/uima-distributions/2.1.0/003.


--Thilo



The level contains the following JIRA issues:
https://issues.apache.org/jira/browse/UIMA-259: [#UIMA-259] add an 
overview HTML document with links to the different HTML book
https://issues.apache.org/jira/browse/UIMA-257: [#UIMA-257] Document 
Analyzer sometimes names style map file incorrectly
https://issues.apache.org/jira/browse/UIMA-256: [#UIMA-256] CVD manual 
not displayed in distribution
https://issues.apache.org/jira/browse/UIMA-266: [#UIMA-266] 
DocumentAnalyzer also use wrong default directory docs/examples/data
https://issues.apache.org/jira/browse/UIMA-264: [#UIMA-264] CAS Visual 
Debugger does not support CAS Multiplier components
https://issues.apache.org/jira/browse/UIMA-253: [#UIMA-253] default path 
for CAS annotation viewer does not exist
https://issues.apache.org/jira/browse/UIMA-274: [#UIMA-274] CDE add new 
feature - not making visible the additional input fields for element 
types when range type changed
https://issues.apache.org/jira/browse/UIMA-263: [#UIMA-263] CAS Visual 
Debugger shows an error message when a user tries to open the log file 
but not log file was written.
https://issues.apache.org/jira/browse/UIMA-262: [#UIMA-262] CAS Visual 
Debugger command line parameters does not work
https://issues.apache.org/jira/browse/UIMA-272: [#UIMA-272] CVD manual  
& help missing?
https://issues.apache.org/jira/browse/UIMA-261: [#UIMA-261] In Glossary 
section of docs, linked glossary terms are not rendered.
https://issues.apache.org/jira/browse/UIMA-260: [#UIMA-260] Set env vars 
in setUimaClasspath and the Eclipser run configs
https://issues.apache.org/jira/browse/UIMA-280: [#UIMA-280] add 
uimaj-internal-tools project and add tooling to detect jira issues for 
an UIMA level.
https://issues.apache.org/jira/browse/UIMA-22: [#UIMA-22] Tools still 
use IBM splashscreen


Open test items for UIMA release 2.1

2007-02-07 Thread Michael Baessler

Here is the list with open test items for the UIMA release 2.1

- There are some documentation chapters that are currently not reviewed. 
Please see the TestPlan2.1 on the wiki for details

  http://cwiki.apache.org/confluence/display/UIMA/TestPlan2.1

- We have some unassigned test cases like:
   - UIMA_uimaj_Core_003 - sofa related test
   - UIMA_uimaj_Examples_001 - Test provided UIMA examples


So please assign and finish testing also on these items so that we get 
all done by the end of the week.


-- Michael


Re: Test level uimaj-2.1.0-004

2007-02-09 Thread Michael Baessler

Thilo Goetz wrote:
The level is available as usual on people.apache.org, in 
/home/twgoetz/uima-distributions/2.1.0/004.  Below I include the 
output from Michael's level analyzer.


This is clearly not the final level, as at least I have some more 
things to take care of.  There was also a high level of bug fixing 
activity over the last few days.  I can build a level on the weekend, 
if that's helpful to anyone.  Otherwise, I would like to aim for 
Monday to build a release candidate and start the vote.  Let me know 
what you all think.
I successfully verified my open test cases with uimaj-2.1.0-004. So it's 
fine with me to build the final level on Monday.


-- Michael


Re: releasing - looking at what others are doing

2007-02-11 Thread Michael Baessler

Adam Lally wrote:

You don't need a tool to do this.  Just go to JIRA and click the
"Release Notes" link at the top.

I opened an issue UIMA-254 a while back that says we need release
notes, and that JIRA can generate them.
I agree with Adam: for creating release notes we can also use JIRA, we 
don't need to use my tool.
My tool is helpful when you need to know which issues were fixed for 
levels during the test, for example, or when having branches/tags: what 
was fixed based on such a tag/branch.


Further, I played around with the JIRA release notes generation and 
figured out that the release notes are created based on the "Fixed 
Version" attribute of an issue.
So if that field contains no data, the issue is not added to the release 
notes of the selected release. :-) We have about 50 fixed issues that 
have no version specified in the
"Fixed Version" field. Some of them are certainly issues from the 
migration of UIMA to Apache, and it doesn't matter if they appear in the 
release notes or not. But others, I think, should appear.


So my suggestion is that the assignee reopens the issues that should 
appear in the release notes and fixes them again, setting the 
"Fixed Version" attribute correctly.

What do others think?

-- Michael




Re: releasing - looking at what others are doing

2007-02-11 Thread Michael Baessler

Marshall Schor wrote:

Michael Baessler wrote:

Adam Lally wrote:

You don't need a tool to do this.  Just go to JIRA and click the
"Release Notes" link at the top.

I opened an issue UIMA-254 a while back that says we need release
notes, and that JIRA can generate them.
I agree with Adam: for creating release notes we can also use JIRA, we 
don't need to use my tool.
My tool is helpful when you need to know which issues were fixed 
for levels during the test, for example, or when having branches/tags: 
what was fixed based on such a tag/branch.


Further, I played around with the JIRA release notes generation and 
figured out that the release notes are created based on the 
"Fixed Version" attribute of an issue.
So if that field contains no data, the issue is not added to the 
release notes of the selected release. :-) We have about 50 fixed 
issues that have no version specified in the
"Fixed Version" field. Some of them are certainly issues from the 
migration of UIMA to Apache, and it doesn't matter if they appear in 
the release notes or not. But others, I think, should appear.


So my suggestion is that the assignee reopens the issues that 
should appear in the release notes and fixes them again, setting 
the "Fixed Version" attribute correctly.

What do others think?


+1 to marking things fixed in 2.1 as fixed in 2.1.
Did your comment mean to imply that for things that are not real bug 
fixes but rather are just porting, we should mark as "blank"?  I don't 
think I like that idea.
Here's a couple of thoughts.  My (slight, easy to talk me out of it) 
preference is to just go ahead and mark everything in Jira that is 
included in 2.1, as being included in 2.1.  Yes, that will be a big 
list.  But we can say why it's so big.


Alternatively, we could create another release number - call it 
2.1-migration-to-Apache for instance, and mark those things that we 
think should not be in the 2.1 "fix" list, as being in this other 
list.  That way, we're clearly documenting things, making it easier 
for the reviewers (and future users) to understand what's going on.


-Marshall



So in this case, I would mark all issues that are currently fixed as 
fixed in 2.1. I'm fine with that. I don't like the idea of creating another 
release. I think we cannot clearly divide all issues between two 
releases.


-- Michael.


Re: [jira] Commented: (UIMA-210) faulty use of .read(buffer...) in several places - not checking for fewer than expected bytes/chars read

2007-02-12 Thread Michael Baessler

Marshall Schor wrote:

Michael Baessler (JIRA) wrote:
[ 
https://issues.apache.org/jira/browse/UIMA-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472098 
]

Michael Baessler commented on UIMA-210:
---

What exactly do you want me to check?
  


Hi Michael - in the best of all possible worlds, (1) code inspection 
of the changes to see if they look right, (2) running a test or 2 to 
see if the code works :-)
In the real world - probably just a code inspection, and if you have 
one test that exercises one of the instances of this, that would be 
good to run :-)


So let me check if I understand all of that correctly.

My code was:

  public static String file2String(File file) throws IOException {
    // Read the file into a string using a char buffer.
    char[] buf = new char[1];
    int charsRead;
    BufferedReader reader = new BufferedReader(new FileReader(file));
    StringBuffer strbuf = new StringBuffer();
    while ((charsRead = reader.read(buf)) >= 0) {
      strbuf.append(buf, 0, charsRead);
    }
    reader.close();
    final String text = strbuf.toString();
    return text;
  }


you changed that to:

  public static String file2String(File file) throws IOException {
    BufferedReader bReader = new BufferedReader(new FileReader(file));
    int length = (int) file.length();
    // Read the file into a string using a char buffer.
    char[] buf = new char[length];
    try {
      // will read all the chars of the file, calling read repeatedly
      // as needed in the underlying layer
      // Note: this 3 argument version is the only one documented
      // to do this.
      bReader.read(buf, 0, length);
    } finally {
      bReader.close();
    }
    return new String(buf);
  }


and later to:

  public static String file2String(File file) throws IOException {
    return reader2String(new FileReader(file), (int) file.length());
  }

  public static String reader2String(Reader reader, int bufSize)
      throws IOException {
    char[] buf = new char[bufSize];
    int read_so_far = 0;
    try {
      while (read_so_far < bufSize) {
        int count = reader.read(buf, read_so_far, bufSize - read_so_far);
        if (0 > count) {
          break;
        }
        read_so_far += count;
      }
    } finally {
      reader.close();
    }
    return new String(buf, 0, read_so_far);
  }

This version seems to be correct, but it is not efficient for, e.g., Chinese 
documents. The buffer is sized by file.length(), which counts bytes, so with 
a multi-byte encoding the character buffer is bigger than needed.


BTW: This version works fine when running the WhitespaceTokenizer test 
cases that use the FileUtils implementation.
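
One possible way around the oversized buffer is to read in fixed-size chunks and let a StringBuilder grow as needed, so the buffer size no longer depends on the byte length of the file. The following is only a sketch under that assumption; ReaderUtil, readAll and the chunk size of 4096 are hypothetical and not part of the FileUtils code discussed here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper for illustration; not the FileUtils code from this thread.
class ReaderUtil {

  // Reads everything from the reader in fixed-size chunks, then closes it.
  static String readAll(Reader reader) throws IOException {
    char[] buf = new char[4096]; // chunk size is an arbitrary choice
    StringBuilder text = new StringBuilder();
    try {
      int count;
      // read() may return fewer chars than requested; -1 signals end of stream
      while ((count = reader.read(buf, 0, buf.length)) >= 0) {
        text.append(buf, 0, count);
      }
    } finally {
      reader.close();
    }
    return text.toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readAll(new StringReader("some text")));
  }
}
```

The loop still checks the return value of every read call, which was the original point of UIMA-210, but it never allocates more than one chunk plus the characters actually read.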


-- Michael


Re: releasing - looking at what others are doing

2007-02-12 Thread Michael Baessler

Adam Lally wrote:

On 2/11/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

So my suggestion is, that the assignee reopens these issues that should
appear in the release notes and "fixed them again" with adding the
"Fixed Version" attribute correctly.
What do others think?



BTW you can use the JIRA "Bulk Change" feature to update many issues
at once.  Just do a search for all your closed/resolved issues that
have no "fix version", then click the "bulk change" link.  You can
reopen all the issues, then close them all again assigning the correct
fix version.  However this loses the setting of the "resolution" field
which is annoying.

I think you have seen it... it is done. :-)

For the next time: I saw that you can mark a check box so that no email is 
sent out for these changes. :-)




Re: setting up for download of release

2007-02-12 Thread Michael Baessler

Marshall Schor wrote:

I looked at several incubator projects to see how they're doing things.

Most have a downloads html page on their website (so do we - for 
downloading the Eclipse Code Style Prefs).


Where they host the download zip / tar file varies, including:
  people.apache.org/some-user
  the project website, under a dir called "downloads"
  The maven repository for incubating projects:  
people.apache.org/repo/m2-incubating-repository/org/apache/openjpa/openjpa-project/0.9.6-incubating/openjpa-project-0.9.6-incubating-binary.zip 



I don't know the reason for the repeated names in this example - they 
seem redundant to me.


Have we discussed how to do this?  My preference would be to borrow 
something like OpenJPA does
(incubator.apache.org/openjpa/downloads.html) and change it to suit.  
I don't have a strong
feeling about using Maven repository - I think that may be useful if 
other projects are treating your project
as a maven component - and I don't think we've crossed that bridge 
yet, so something simpler would probably
be better for now?   My (slight) preference would be to have our 
downloads in our existing /downloads directory.


I suppose we need a "temporary place" to put these before they are 
officially blessed.  Maybe we could set up a dir called
to_be_approved, with subdirs /downloads and /download.html - this 
latter being a proposed update to our existing

download page.  Other opinions?

-Marshall



+1

The simple way without using Maven sounds good to me.

-- Michael


Re: [jira] Commented: (UIMA-299) Remove SNAPSHOT from version numbers prior to release

2007-02-13 Thread Michael Baessler

Adam Lally wrote:

If it's agreed that our next build should be a release candidate,
should I go ahead and update the version numbers to 2.1.0-incubating
right now?
I think we should do the change now. The next level will be the first 
release candidate where all should be in place.


-- Michael


Re: Remove Sandbox and Website issues from 2.1?

2007-02-13 Thread Michael Baessler

Adam Lally wrote:

I took a look at the release notes generated by JIRA.  We might want
to take out some of the things relating to the sandbox and website
since they aren't part of the release.  Examples:

# [UIMA-95] - add sandbox infrastructure
# [UIMA-151] - Add project for uima whitespace tokenizer implementation
# [UIMA-154] - add snowball annotator project
# [UIMA-69] - update website with SVN commit comment guidelines for
JIRA issue tracker

There may be others, we could search based on the "component" field in
the JIRA issue.


Also are the following in the source distribution?  If so I guess they
can stay in the release notes, if not we might want to get rid of them
as well:
# [UIMA-152] - add component test utilities project
# [UIMA-280] - add uimaj-internal-tools project and add tooling to
detect jira issues for an UIMA level.
I removed all closed issues from the 2.1 release that belong to the 
components "Sandbox" and "Website".


All closed issues of the component "No component" are still in. I think 
it is OK, since most of the issues are pre-setup of code.


-- Michael


Re: Release Notes committed, ready for release candidate?

2007-02-13 Thread Michael Baessler

Adam Lally wrote:

On 2/13/07, Thilo Goetz <[EMAIL PROTECTED]> wrote:

I have created a tag for the release candidate and will upload the files
soon.  Meanwhile, I'm wondering how we should proceed with checking in
new code for the next few days.  If we want to continue/start
development for the next release, we need to branch now.  Should we do
that, or can we hold off checking in new code for the next few days?  If
we branch now and find any more issues with the release candidate, we'll
need to make fixes in two branches.  On the other hand, we don't expect
any more issues, do we ;-)

Opinions?  I'd personally be happy to wait with branching until we've
had the first round of feedback from the Incubator PMC folks...



I'm happy to hold off on checking anything new in for now.
I agree to wait with new stuff until we get feedback from the 
Incubator PMC.


If anyone is idle during this time... we also have the sandbox :-)

-- Michael


Re: [VOTE] accept CAS Editor (tae) bulk contribution into Sandbox?

2007-02-14 Thread Michael Baessler

Thilo Goetz wrote:

Hi Joern,

we're all a bit pressed for time while we're preparing for our first 
release.  We'll try to check in your code during the next few days. 
Once that's done, we should start discussions on the dev list on how 
to proceed: what your plans are for further development, what features 
to add etc.


As Marshall mentioned, commit access is not usually granted to a new 
contributor immediately.  I hope that was clear to you before you 
decided to contribute your code.  In addition to the links Marshall 
provided, I'd like to draw your attention to 
http://www.apache.org/foundation/how-it-works.html, which explains 
things pretty well.  We hope you will continue contributing to UIMA 
and become a committer.

Hi Joern,

I have checked in your code to the UIMA sandbox. The URL is: 
http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/CasEditor/


I created only one project called CasEditor and hope that your Eclipse 
plugins also work with a single Eclipse project layout. If not, 
we have to separate them again.
I also created an initial pom.xml based on the UIMA Eclipse plugins, but 
there are still some open items to work on (final names, dependencies...).


Before I checked in the code I refactored the packages to 
org.apache.uima.caseditor and removed old UIMA 1.4 dependencies.


I think we should create a list of TODOs that we have to complete before the 
code is ready to use. E.g.

   - get the code compiled as plugin with Maven
   - Test if the plugin still works with the current UIMA version
   - remove author and CVS comments
   - ...
Maybe you can do that on the UIMA Wiki 
(http://cwiki.apache.org/confluence/display/UIMA/UIMA+Sandbox+Components)


Please take a look at the code and let me know what you think. Any 
further changes from your side should be handled as patches, so we can 
easily check it in.


Please let me know what you think. Improvements and suggestions are welcome.

-- Michael



Re: [VOTE] accept CAS Editor (tae) bulk contribution into Sandbox?

2007-02-15 Thread Michael Baessler

Jörn Kottmann wrote:
This will break some smaller things, but nothing which cannot be 
changed. Maybe this will cause some trouble for the plugin tests, 
but I am not sure.

What is the reason to merge them ?
Besides the comments Marshall already made, the Maven build will be very 
complex for the current sandbox structure if we divide your stuff into 
three projects.


I think it is now difficult to get it back working since you have done 
a few changes at once.
New namespace for all classes, porting from IBM Uima to Apache Uima, 
merging and

maven support.

In my opinion the fastest way to get everything working again is that I 
make these changes myself

and then attach the result to the JIRA issue again (as a patch?).

I have to do the following task to complete the contribution:

+ Change namespace
+ remove author and cvs tags
+ Port to apache uima
+ Maybe merge the plugins
+ add maven support

What do you think ?
If you think it is easier for you to do all this again, it's fine with 
me. But it will take some time.

But if doable, please use the current CasEditor layout I already checked in.

-- Michael


Re: [VOTE] accept CAS Editor (tae) bulk contribution into Sandbox?

2007-02-15 Thread Michael Baessler

Jörn Kottmann wrote:

Ok I have now merged my version into the CasEditor project from svn.

Can I add an extra test folder for the plugin tests e.g 
src/plugintest/java ?

The default directory layout that Maven would like is:

For production code:
src/main/java
src/main/resources

For test code:
src/test/java
src/test/resources

Only these directories are automatically recognized by Maven, so if you 
could use these, that would be great.
If you have to use another one like src/plugintest/java, you have to 
specify it separately in the pom.xml.


My code depends on java 5.
How can I overwrite the compiler settings inside the pom.xml 
(currently it tries to use java 1.4)?

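You can add the following to the pom.xml

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.5</source>
    <target>1.5</target>
  </configuration>
</plugin>

in addition to the other plugins.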

-- Michael




Re: [VOTE] Release of Apache UIMA 2.1.0

2007-02-16 Thread Michael Baessler

Thilo Goetz wrote:
I'm uploading release candidate 3 to 
/home/twgoetz/uima-distributions/2.1.0/RC3 on people.a.o.  Changes to 
the previous version are confined to fixing licensing issues:


- added LICENSE and NOTICE files to all jars
- added license header to one Java source file and one XML file
- removed one redundant file

I went through the RAT reports again and think we're clean.  We all 
know how much time we spent working on test and documentation the last 
few weeks.  So please cast your vote on releasing uimaj-2.1.0-RC3 as 
Apache UIMA 2.1.


[ ] +1 Approve release
[ ] -1 Veto release (please give reason)

--Thilo



+1

-- Michael


Re: [jira] Updated: (UIMA-155) add cas editor (tae) project

2007-02-21 Thread Michael Baessler

Marshall Schor wrote:
I applied this patch, and committed the changes except for 2 files 
which I think are not "needed":


build.xml
preferences.ini

If these should be checked in, please explain the reason.  I think 
they're either built by the mvn eclipse:eclipse operation or are 
particular to individual users depending on their personal Eclipse setup.


In order to get this to compile properly, I had to make some import 
changes, in particular, there were imports:


import com.sun.org.apache.xml.internal.serialize.OutputFormat;
import com.sun.org.apache.xml.internal.serialize.XMLSerializer;

which I changed to

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;


Is this correct?  The other imports seem to be particular to the sun 
jvm and not part of other jvms.  They're also marked "internal" which 
makes me think they can't be safely used here.

I think using

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;

is not OK. If I remember correctly, these classes are not available 
when using a plain Sun JVM. Is that possible?


I think in the past we changed the whole UIMA code so that we don't have 
to use these classes. Is it possible to use the 
org.apache.uima.util.XMLSerializer, which can also write formatted output 
using javax.xml.transform.Transformer with javax.xml.transform.OutputKeys?
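
For reference, formatted XML output with only the standard javax.xml.transform classes could look roughly like the sketch below. It deliberately does not use org.apache.uima.util.XMLSerializer, because its exact API is not shown in this thread; the class XmlPrettyPrinter and its format method are hypothetical names for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper for illustration; uses only JDK classes.
class XmlPrettyPrinter {

  // Re-serializes the given XML text with indentation switched on.
  static String format(String xml) throws TransformerException {
    // A transformer created without a stylesheet copies input to output.
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    transformer.transform(new StreamSource(new StringReader(xml)),
        new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws TransformerException {
    System.out.println(format("<a><b>x</b><c>y</c></a>"));
  }
}
```

The amount of indentation depends on the XSLT implementation shipped with the JVM, so the exact layout of the output may differ between JVMs.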


-- Michael





Re: Enhancement to XMI deserializer as a foundation for remote parallel processing

2007-02-22 Thread Michael Baessler

Adam Lally wrote:

I am working on some modifications to the XMI deserializer that will
allow the following scenario:

1) A CAS is serialized to XMI.
2) Copies of the XMI documents are sent to multiple remote services
3) Each remote service appends to the CAS (does not delete or modify
existing stuff) and responds with a new serialized XMI CAS)
4) The multiple XMI responses are all merged back into a single CAS 
instance


This would permit multiple remote services that don't depend on each
other to run in parallel, assuming they only append to the CAS (which
is common).  Of course there's other work on the runtime needed to
actually do the parallel invocations, but XMI serializer support is a
prerequisite.

The basic XMI deserializer changes aren't too complicated.  First,
when the CAS is originally serialized in step 1, we make available to
the caller the maximum xmi:id value in that CAS (this is mostly
already done, we just need to add a public accessor for this value).
Next, we allow passing this value to the deserializer as an optional
argument that essentially says "deserialize only XMI elements whose
xmi:id is greater than the specified value".

Any comments on this approach?
I currently don't understand how the xmi:id is generated. For example, we 
have a CAS in step 1 that is serialized with a max xmi:id of 100.
Now two other remote services get this XMI document, do their 
processing on it and serialize the CAS again. The first service adds, for 
example, FS with ids from 101 to 150, right?
And what does the second service do? I think it also starts numbering at 
101. So how are the XMI documents merged in the XMI deserializer? Is it 
necessary
to merge the xmi:id attributes before creating the CAS, or is it 
sufficient to just read the additional xmi:ids (greater than 100) and 
add them to the CAS?


I think when xmi:ids must be merged before the CAS is created, the XMI 
deserializer has to take care of the references with special offset 
values. That is easy when using



  


but more difficult when using




How will the merging be done?

-- Michael


Re: Enhancement to XMI deserializer as a foundation for remote parallel processing

2007-02-22 Thread Michael Baessler

Adam Lally wrote:

On 2/22/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

I currently don't understand how the xmi:id is generated. For example, we
have a CAS in step 1 that is serialized with a max xmi:id of 100.
Now two other remote services get this XMI document, do their
processing on it and serialize the CAS again. The first service adds, for
example, FS with ids from 101 to 150, right?
And what does the second service do? I think it also starts numbering at
101.


Yes, that's correct.  The IDs used by the two services for new FS they
create will not be unique.


So how are the XMI documents merged in the XMI deserializer? Is it
necessary
to merge the xmi:id attributes before creating the CAS, or is it
sufficient to just read the additional xmi:ids (greater than 100) and
add them to the CAS?



Good questions...

It actually should work without having to merge (if you mean to make
unique) the IDs produced by the services.  Take your example, when the
original max ID is 100 (I call 100 the "merge point"), and the two
responses "xmi1" and "xmi2" both have appended FS with xmi:id=101:

The deserialization of xmi1 works as normal - all FS including the one
with ID 101 are added to the CAS.  When the deserialization of xmi2 is
done, the deserializer code is told that the "merge point" is 100.
The effect of this is that all of the FS with xmi:id <= 100 are
ignored.  The FS with xmi:id = 101 will be added to the CAS as a new
FeatureStructure.  It doesn't matter that xmi1 also had an FS with the
same id - the deserializer will know that because 101 is greater than
the mergePoint, a new FS should be created.

As for references between FS.. during the deserialization of xmi2, the
deserializer knows that any reference to an xmi:id with value <= 100
is a reference to an FS that pre-existed in the CAS, while any
reference to an xmi:id with value > 100 is a reference to an FS that
is part of the xmi2 document that's currently being deserialized.  So
all the information is there to do this correctly.  If xmi2 contained
a reference to id 101, the deserializer would know that this was
supposed to refer to id 101 in xmi2, NOT in xmi1.
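
The merge point rule described above can be illustrated with a small, self-contained Java model. It is only a sketch of the idea and not the XMI deserializer: Element, Fs, Cas and deserialize are hypothetical stand-ins, each FS carries at most one reference, and a reference value of 0 is assumed to mean "no reference".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of merge-point deserialization; all names are hypothetical.
class MergePointDemo {

  // Stand-in for a serialized FS: xmi:id, label, optional reference (0 = none).
  static final class Element {
    final int id;
    final String label;
    final int ref;
    Element(int id, String label, int ref) {
      this.id = id;
      this.label = label;
      this.ref = ref;
    }
  }

  // Stand-in for a feature structure living in the CAS.
  static final class Fs {
    final String label;
    Fs target; // resolved reference, null if there is none
    Fs(String label) {
      this.label = label;
    }
  }

  // Stand-in for the CAS: all FS, plus the shared ones (id <= merge point) by id.
  static final class Cas {
    final List<Fs> all = new ArrayList<Fs>();
    final Map<Integer, Fs> shared = new HashMap<Integer, Fs>();
  }

  static void deserialize(List<Element> doc, Cas cas, int mergePoint) {
    // FS appended by this document; their ids are only meaningful locally.
    Map<Integer, Fs> appended = new HashMap<Integer, Fs>();
    List<Element> created = new ArrayList<Element>();
    for (Element e : doc) {
      boolean preExisting = e.id <= mergePoint;
      if (preExisting && cas.shared.containsKey(e.id)) {
        continue; // already in the CAS from an earlier document: ignore
      }
      Fs fs = new Fs(e.label);
      cas.all.add(fs);
      if (preExisting) {
        cas.shared.put(e.id, fs);
      } else {
        appended.put(e.id, fs);
      }
      created.add(e);
    }
    // References at or below the merge point point into the shared part of the
    // CAS, references above it point into the document being read.
    for (Element e : created) {
      if (e.ref == 0) {
        continue;
      }
      Fs source = e.id <= mergePoint ? cas.shared.get(e.id) : appended.get(e.id);
      source.target = e.ref <= mergePoint ? cas.shared.get(e.ref) : appended.get(e.ref);
    }
  }

  public static void main(String[] args) {
    Cas cas = new Cas();
    deserialize(Arrays.asList(new Element(100, "doc", 0),
        new Element(101, "tokenA", 100)), cas, 100);
    deserialize(Arrays.asList(new Element(100, "doc", 0),
        new Element(101, "entityB", 0), new Element(102, "relB", 101)), cas, 100);
    for (Fs fs : cas.all) {
      System.out.println(fs.label + " -> " + (fs.target == null ? "-" : fs.target.label));
    }
  }
}
```

In this model both documents use id 101, yet the reference from relB resolves to entityB and not to tokenA, because ids above the merge point are looked up only within the document currently being read.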



I think when xmi:ids must be merged before the CAS is created, the XMI
deserializer has to take care of the references with special offset
values. That is easy when using


   


but more difficult when using





The deserializer knows the TypeSystem of the CAS, so will know that
myFoo is defined as a reference and not an integer.  This is in fact
needed for "normal" deserialization to work, even without merging.

Thanks!

So the xmi:ids that are not unique do not need to be merged; the XMI 
documents can just be added to the CAS one after the other.


-- Michael







Re: Thoughts on extending FlowController API

2007-02-28 Thread Michael Baessler

Adam Lally wrote:

On 2/26/07, Adam Lally <[EMAIL PROTECTED]> wrote:

3) Notification of errors to allow continuing after a failure.  This
would support an action like the current CPM's "continue" action.
There would be a new API:
Flow.onFailure(String failedAnalysisEngineKey, Throwable failure)

If the runtime wanted to continue after a failure, it would call this
method on the Flow Controller, and then would go back to calling
hasNext/next.  Without this notification, a "continue" action wouldn't
make much sense, because a dynamic FlowController may make an
assumption that the last step it issued completed successfully.

Note for #2 and #3 I'm not intending on having the existing framework
call these methods, yet.  These Flow Controller extensions are a
prerequisite for doing more advanced flow things like parallel flows
and error recovery.



Actually as I think about it more I wonder if it would be better if
when I add the Flow.onFailure() I also change the framework to call
this method when an error occurs.  The existing FixedFlowControllers
(such as fixed flow) could just refuse to continue, so the default
behavior would be unchanged.  This could be a configuration parameter
on the FixedFlowController so people could configure their AEs to
continue after errors.  It seems like this would provide some value so
may be worth doing, rather than just adding a method that's never
called.
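
To make the proposal a bit more concrete, here is a minimal sketch of how a runtime loop and a fixed flow with a "continue on failure" switch might interact. Everything in it is an assumption for illustration: SketchFlow is a hypothetical stand-in for the proposed Flow extension (the return type of onFailure was not specified in the proposal), engine keys are plain strings, and the toy run loop is not the UIMA framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OnFailureSketch {
    /** Hypothetical stand-in for a Flow that is notified of failures. */
    interface SketchFlow {
        /** Returns the key of the next engine, or null when the flow is done. */
        String next();

        void onFailure(String failedAnalysisEngineKey, Throwable failure);
    }

    /** Fixed sequence that either skips past a failed step or stops. */
    static final class FixedSketchFlow implements SketchFlow {
        private final List<String> keys;
        private final boolean continueOnFailure; // assumed configuration parameter
        private int index = 0;
        private boolean aborted = false;

        FixedSketchFlow(List<String> keys, boolean continueOnFailure) {
            this.keys = new ArrayList<>(keys);
            this.continueOnFailure = continueOnFailure;
        }

        @Override
        public String next() {
            if (aborted || index >= keys.size()) {
                return null;
            }
            return keys.get(index++);
        }

        @Override
        public void onFailure(String failedAnalysisEngineKey, Throwable failure) {
            // Default behavior: refuse to continue unless configured otherwise.
            if (!continueOnFailure) {
                aborted = true;
            }
        }
    }

    /** Toy runtime loop: engines whose key is in failingKeys throw. */
    static List<String> run(SketchFlow flow, List<String> failingKeys) {
        List<String> completed = new ArrayList<>();
        String key;
        while ((key = flow.next()) != null) {
            try {
                if (failingKeys.contains(key)) {
                    throw new IllegalStateException(key + " failed");
                }
                completed.add(key);
            } catch (RuntimeException e) {
                // Notify the flow, then go back to calling next().
                flow.onFailure(key, e);
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("ae1", "ae2", "ae3");
        List<String> failing = Arrays.asList("ae2");
        System.out.println(run(new FixedSketchFlow(keys, true), failing));  // [ae1, ae3]
        System.out.println(run(new FixedSketchFlow(keys, false), failing)); // [ae1]
    }
}
```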

Possibly we might want to allow the application to control whether the
FlowController is consulted when an error occurs.  This could be made
configurable through the additionalParams map when the Aggregate AE is
constructed.  Then an application could always use "terminate on
error" mode if desired, regardless of the FlowController being used. 
That also sounds more reasonable to me... but I have some additional
comments/questions from my side.


How would the additionalParams map be used to configure my application
to 'continue' or 'terminate' in case of errors? Will it be configurable
for each analysis engine separately? I think that would be very useful,
since the error handling depends on the analysis engine. And when using
the additionalParams map, does the application have to take care of how
to get the configuration, or will that be part of one of the common
descriptors? I think a good place to specify this would be the
flowConstraints section in an aggregate descriptor.


With a built-in flow, it can look like:
   <flowConstraints>
     <fixedFlow>
       <node>ae1</node>
       <node>ae2</node>
     </fixedFlow>
   </flowConstraints>

but when having a FlowController plugged in, this section is missing. 
But I wonder why. I think for these flows, the order of the
analysis engines can also be relevant. How does this work currently? I 
think the order of the analysis engine definition is used, right?

Why don't we have a section like:

   <flowConstraints>
     <customFlow>
       <node>ae1</node>
       <node>ae2</node>
     </customFlow>
   </flowConstraints>

to specify the customFlow items in an order of choice? That would make it
easy to pass additional information for each analysis engine to the
FlowController.


-- Michael




Re: Thoughts on extending FlowController API

2007-03-02 Thread Michael Baessler

Adam Lally wrote:

On 2/28/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

but when having a FlowController plugged in, this section is missing.


Actually it is possible, but not required, to have a <flowConstraints>
section when using a custom FlowController.  (I think the CDE supports
this too, but I'm not sure.)


But I wonder why. I think for these flows, the order of the
analysis engines can also be relevant. How does this work currently? I
think the order of the analysis engine definition is used, right?


The reason it's optional is that a custom FlowController often
wouldn't use a fixed ordering of AnalysisEngines - it may make dynamic
flow decisions based on other criteria.

Note that FlowControllers can define configuration parameters just
like AEs can, so whatever information the FlowController needs to make
routing decisions can be provided that way, if it can't be represented
by the <flowConstraints> object.

The ordering of analysis engine definitions can't be used to make flow
decisions.  These are put into a HashMap and the ordering is lost
before it gets to the FlowController.
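
As a small illustration of why the declaration order cannot be relied on once the delegates sit in a HashMap, the following sketch compares it with a LinkedHashMap. The delegate keys and descriptor file names are made up for the example, and the sketch makes no claim about which map type the framework should use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DelegateOrderSketch {
    /** Returns the keys in the order the map iterates over them. */
    static List<String> keysAsSeen(Map<String, String> delegates) {
        return new ArrayList<>(delegates.keySet());
    }

    /** Registers the delegates in declaration order. */
    static void declare(Map<String, String> delegates, String... keys) {
        for (String key : keys) {
            delegates.put(key, key + ".xml"); // hypothetical descriptor names
        }
    }

    public static void main(String[] args) {
        String[] declared = {"tokenizer", "tagger", "parser", "entities"};

        Map<String, String> hashed = new HashMap<>();
        Map<String, String> linked = new LinkedHashMap<>();
        declare(hashed, declared);
        declare(linked, declared);

        // HashMap: iteration order is unspecified and may differ from declaration order.
        System.out.println(keysAsSeen(hashed));
        // LinkedHashMap: iteration order is the insertion (declaration) order.
        System.out.println(keysAsSeen(linked));
    }
}
```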

But how does it work when I would like to implement something like the
CapabilityLanguage flow we already have as a built-in flow? When I
implement this as a custom flow I would like to specify the possible
order of analysis engines. The custom flow can then decide whether all
the engines are called, but if they are, it should use the order I have
specified. So I think in this case it is not a fixed flow that a user
will specify for the custom flow.


-- Michael


Re: Thoughts on extending FlowController API

2007-03-02 Thread Michael Baessler

Adam Lally wrote:

On 3/2/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

But how does it work when I would like to implement something like the
CapabilityLanguage flow we already have as a built-in flow? When I
implement this as a custom flow I would like to specify the possible
order of analysis engines. The custom flow can then decide whether all
the engines are called, but if they are, it should use the order I have
specified. So I think in this case it is not a fixed flow that a user
will specify for the custom flow.



You can specify both a custom FlowController AND a <fixedFlow> (or
<capabilityLanguageFlow>) element in your descriptor.  Your
FlowController will be called but it can access all of the
AnalysisEngineMetaData from the aggregate descriptor, including the
fixedFlow and capabilityLanguageFlow sections.

But how can I specify an order for my custom flow? Is this also possible
or do I have to use configuration parameter settings?


-- Michael



Re: Thoughts on extending FlowController API

2007-03-06 Thread Michael Baessler

Adam Lally wrote:

On 3/2/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

But how can I specify an order for my custom flow? Is this also possible
or do I have to use configuration parameter settings?

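Like this:

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>

  <delegateAnalysisEngineSpecifiers>
    <delegateAnalysisEngine key="a1">
      <import .../>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="a2">
      <import .../>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>

  <flowController key="...">
    <import .../>
  </flowController>

  <analysisEngineMetaData>
    <name>Aggregate with custom flow controller</name>
    <version>1.0</version>
    <vendor>The Apache Software Foundation</vendor>

    <flowConstraints>
      <fixedFlow>
        <node>a1</node>
        <node>a2</node>
      </fixedFlow>
    </flowConstraints>
  </analysisEngineMetaData>
</analysisEngineDescription>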


Your custom flow controller in its initialization can query the
<fixedFlow> and find out that the sequence is supposed to be a1,
a2; and it can act accordingly.
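
A rough sketch of that idea: the custom flow takes the declared node sequence as its ordering and decides per engine whether to call it. The class OrderedCustomFlowSketch is hypothetical, and reading the node list from the aggregate metadata is left out; the sequence is simply passed in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

final class OrderedCustomFlowSketch {
    // Order taken from the declared flow nodes, e.g. a1, a2.
    private final List<String> declaredOrder;

    OrderedCustomFlowSketch(String... fixedFlowNodes) {
        this.declaredOrder =
            Collections.unmodifiableList(new ArrayList<>(Arrays.asList(fixedFlowNodes)));
    }

    /**
     * Decides dynamically which engines run, but always keeps the
     * declared order for those that do.
     */
    List<String> plan(Predicate<String> shouldRun) {
        List<String> steps = new ArrayList<>();
        for (String key : declaredOrder) {
            if (shouldRun.test(key)) {
                steps.add(key);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        OrderedCustomFlowSketch flow = new OrderedCustomFlowSketch("a1", "a2", "a3");
        // Pretend a2 does not support the document language and is skipped.
        System.out.println(flow.plan(key -> !key.equals("a2"))); // [a1, a3]
    }
}
```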
But is this not a little bit confusing for our users when using a 
fixedFlow constraint just to configure a custom flow?


I think it would be better to have an additional flowConstraint tag like
<customFlow> to specify the order of the custom flow, so that fixedFlow
and capabilityLanguageFlow only have to be used when these flows are
actually used.

-- Michael


org.apache.uima.flow.impl.FixedFlowController.java

2007-03-06 Thread Michael Baessler

Hi,

the current org.apache.uima.flow.impl.FixedFlowController.java has some 
invalid UTF-8 characters in the "possible values" comment. So the file 
could not be read in my workspace.

I changed the bad characters and replaced them with a ":".

So all should be fine now.

-- Michael


Re: org.apache.uima.flow.impl.FixedFlowController.java

2007-03-06 Thread Michael Baessler

Adam Lally wrote:

On 3/6/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

Hi,

the current org.apache.uima.flow.impl.FixedFlowController.java has some
invalid UTF-8 characters in the "possible values" comment. So the file
could not be read in my workspace.
I changed the bad characters and replaced them with a ":".

So all should be fine now.



Sorry about that.  Is there an Eclipse preference we can set to
prevent this from happening again?


You can set your default text encoding to UTF-8.

Eclipse preferences -> General -> Workspace -> Text file encoding
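
Independent of the Eclipse setting, a file can also be checked for invalid UTF-8 before it is committed. The following is a minimal sketch using the strict decoder from the Java standard library; the class name Utf8CheckSketch and the sample bytes are only illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8CheckSketch {
    /** Returns true only if the bytes decode as UTF-8 without any error. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] clean = "possible values: a, b".getBytes(StandardCharsets.UTF_8);
        // 0x96 is a dash in some single-byte encodings but is not valid UTF-8 on its own.
        byte[] broken = {'a', (byte) 0x96, 'b'};
        System.out.println(isValidUtf8(clean));  // true
        System.out.println(isValidUtf8(broken)); // false
    }
}
```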

-- Michael



Re: Ready for RC4?

2007-03-07 Thread Michael Baessler

Thilo Goetz wrote:
I'll be gone on vacation for the next two weeks starting this 
Saturday, so Michael is taking over doing the builds.  He'll build 
this one already so he gets in the habit ;-)


I wonder if we'll have a release by the time I get back...
OK, the release candidate - RC4 - is ready and available at:  
http://people.apache.org/~mbaessler/UIMAReleases/uimaj-2.1.0-incubating-RC4/. 



The level has passed our automated regression test suite without errors.

-- Michael


Re: RAT reports?

2007-03-07 Thread Michael Baessler

Adam Lally wrote:

Thilo or Michael, can you produce new annotated RAT reports for RC4 as
you did for the previous release candidate?

Thanks,
 -Adam



We are currently working on it...

-- Michael


Re: RAT reports?

2007-03-07 Thread Michael Baessler

Michael Baessler wrote:

Adam Lally wrote:

Thilo or Michael, can you produce new annotated RAT reports for RC4 as
you did for the previous release candidate?

Thanks,
 -Adam



We are currently working on it...
The RAT reports are ready and available at 
http://people.apache.org/~mbaessler/UIMAReleases/uimaj-2.1.0-incubating-RC4/


-- Michael


Re: Ready for RC4?

2007-03-07 Thread Michael Baessler

Adam Lally wrote:

On 3/7/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

OK, the release candidate - RC4 - is ready and available at:
http://people.apache.org/~mbaessler/UIMAReleases/uimaj-2.1.0-incubating-RC4/. 





I did a diff between RC4 and RC3 and see no unintended changes.  There
appear to be no .class file changes at all.  The images appear
correct in the pdfs now.

I checked that the signatures check out, the src dist builds, and
documentanalyzer runs on a simple example.

OK the RAT reports are in place and all seems to be fine.

Adam, will you post the new release again on Apache incubator?

-- Michael


Re: RAT reports?

2007-03-07 Thread Michael Baessler

Adam Lally wrote:

On 3/7/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

The RAT reports are ready and available at
http://people.apache.org/~mbaessler/UIMAReleases/uimaj-2.1.0-incubating-RC4/ 





The RAT reports seem to have many mentions of RC3 in them, for example:
c:/code/RC3/uimaj-2.1.0-RC3/uimaj-distr/target/uimaj-2.1.0-incubating/DISCLAIMER 



Oops... I fixed this now. I compared the two RAT reports from RC3 and RC4
to check the differences. As there were only a few changes in the
reports, we decided to just update the annotated RAT report Thilo
created last time with the changes. It seems that the locations were not
updated correctly. But this should be fixed now.


The current RC4 RAT report is also available containing all the data.

-- Michael


Re: [VOTE] Release uimaj-2.1.0-RC4

2007-03-07 Thread Michael Baessler

Adam Lally wrote:

Release candidate 4 is located here:
http://people.apache.org/~mbaessler/UIMAReleases/uimaj-2.1.0-incubating-RC4/ 



This release fixes issues with our NOTICE file, removes redundant
LICENSE and NOTICE files from the source distrib., corrects some of
the metadata in our Maven POM files, and has a rebuild of the
documentation to correctly include figures. There are no code changes.

Please cast your vote on releasing this as Apache uimaj-2.1.0-incubating:

[ ] +1 Approve release
[ ] -1 Veto release (please give reason)


Here is my +1.

-Adam



+1 from my side.

-- Michael


Re: RAT reports?

2007-03-07 Thread Michael Baessler

Adam Lally wrote:

On 3/7/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

Oops... I fixed this now. I compared the two RAT reports from RC3 and RC4
to check the differences. As there were only a few changes in the
reports, we decided to just update the annotated RAT report Thilo
created last time with the changes. It seems that the locations were not
updated correctly. But this should be fixed now.

The current RC4 RAT report is also available containing all the data.



OK that looks better.

However this file was removed from the source distrib. but is still
showing in the RAT report:
 N 
D:/code/apache/uimaj-2.1.0-RC4/uimaj-distr/target/uimaj-2.1.0-incubating/uima-docbooks/LICENSE.txt 




I also fixed that... and it seems that the idea was a little bit buggy. 
So maybe it is better to focus on the full RAT report that was created 
for RC4.


I have also rechecked the files... so I hope I used the correct versions 
and there are no more errors. :-)


-- Michael


Re: Release notes for RC4

2007-03-08 Thread Michael Baessler

Adam Lally wrote:

Michael, could you please post the RELEASE_NOTES.html file from RC4 in
your public_html directory under
UIMAReleases/uimaj-2.1.0-incubating-RC4?  I'd like to link to it from
the VOTE message I send to [EMAIL PROTECTED]

Thanks,
 -Adam



Done.

-- Michael


Re: Thoughts on extending FlowController API

2007-03-09 Thread Michael Baessler

Adam Lally wrote:

More thought on the flow controller / flowConstraints topic:

I think there's a fundamental question here as to how the flow ought
to be specified, now that we've opened things up so that the flow
specification might take a variety of forms, not just a flat list.

Do we want to:
(a) support specifying the flow through the FlowController's
configuration parameters

OR

(b) support extending the <flowConstraints> section of the aggregate
descriptor with new kinds of flows in addition to <fixedFlow> and
<capabilityLanguageFlow>.  We might even imagine a new flow element that
could be filled in with arbitrary XML, it being the FlowController's
job to make sense out of this.


An advantage of (a) is that we use the common configuration parameter
mechanisms we already have, so for example we could use the same GUIs
we use for setting other parameters to also set the parameters on the
flow controller.  (In contrast, if we allow arbitrary XML, the user
would need an XML editor to be able to edit the flow.)

Advantages of (b): It's closer to what the user already knows.  It can
be much less verbose than using configuration parameters (which also
require overrides in the aggregate if the flow is to be specified
there).  If there's already an XML syntax for the flow it could
potentially be used directly.  (Although this last could also be done
with an external resource referring to a separate file containing the
flow definitions.)


I prefer (b). I think the users already know how to specify a flow and
it seems easier to me to specify the flow using the flowConstraints than
using configuration parameters.

I'm currently not sure what will be the best way to do this, but we
should try to get feedback from our users/community on what they think
about this. Do we know anyone who is already using custom flow
controllers?

-- Michael



Re: Allowing custom service adapters to be plugged in to UIMA

2007-03-14 Thread Michael Baessler

Adam Lally wrote:

Currently there's no easy way to plug in an additional kind of service
adapter (to support a protocol other than SOAP or Vinci).  UIMA
already has the foundation for pluggable adapters, with its use of
descriptors and factory methods that produce Resource objects (like
AnalysisEngines) from descriptors.  But we've never provided a way for
users to plug in their own adapter classes without editing internal
framework configuration files.  Here's a simple suggestion that would
change that:

We could add a new ResourceSpecifier (descriptor) type:
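<customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
  <resourceClassName>com.foo.MyCustomServiceAdapter</resourceClassName>
  <parameters>
    <parameter name="..." value="..."/>
    ...
  </parameters>
</customResourceSpecifier>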

The <resourceClassName> specifies the exact name of some user class
which must be located on the classpath (the UIMA extension classpath
will work, if provided).  That class must implement the UIMA Resource
interface (for an AE service adapter it would also have to implement
the AnalysisEngine interface).  The Resource interface provides a
method initialize(ResourceSpecifier,Map) which the factory calls and
passes the resource specifier.  The user would implement the
initialize method to read the <parameters> and set itself up.

All the basic support for this is already there.  It's relatively easy
to add a new kind of ResourceSpecifier and the associated factory for
instantiating the Resource from the specifier.  Then there would be
the documentation about how to implement your resource class, which
would be a little more work.
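
The factory step could look roughly like the sketch below: load the named class, check that it implements the expected interface, and hand it the parameters. This is only an illustration under assumptions. SketchResource is a simplified, hypothetical stand-in for the UIMA Resource interface (its initialize takes only a parameter map), and EchoAdapter and the "endpoint" parameter are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class CustomResourceSketch {
    /** Simplified, hypothetical stand-in for the Resource interface. */
    interface SketchResource {
        /** Returns false if the resource cannot use the given parameters. */
        boolean initialize(Map<String, String> parameters);
    }

    /** Example user class that would be named in the specifier. */
    static final class EchoAdapter implements SketchResource {
        private String endpoint;

        public EchoAdapter() {
        }

        @Override
        public boolean initialize(Map<String, String> parameters) {
            endpoint = parameters.get("endpoint"); // hypothetical parameter name
            return endpoint != null;
        }

        String endpoint() {
            return endpoint;
        }
    }

    /** Instantiates the named class and passes it the specifier parameters. */
    static SketchResource produce(String resourceClassName,
                                  Map<String, String> parameters,
                                  ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(resourceClassName, true, loader);
            if (!SketchResource.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(
                    resourceClassName + " does not implement SketchResource");
            }
            SketchResource resource =
                (SketchResource) cls.getDeclaredConstructor().newInstance();
            if (!resource.initialize(parameters)) {
                throw new IllegalArgumentException(
                    resourceClassName + " rejected its parameters");
            }
            return resource;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + resourceClassName, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new HashMap<>();
        parameters.put("endpoint", "http://localhost:8080/service");
        SketchResource resource = produce(EchoAdapter.class.getName(), parameters,
            CustomResourceSketch.class.getClassLoader());
        System.out.println(resource.getClass().getSimpleName()); // EchoAdapter
    }
}
```

Passing the ClassLoader explicitly mirrors the remark that the class may live on an extension classpath rather than the application classpath.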

Thoughts?

This sounds very useful to me.

+1 for doing this.

-- Michael



Re: Posted release files

2007-03-15 Thread Michael Baessler

Adam Lally wrote:

I moved our release artifacts under www.people.apache.org/dist/uima
and modified our downloads page to point to them.  I did the svn
extract to www.incubator.apache.org/uima so it should go live soon.
Once that happens we should send an announcement to [EMAIL PROTECTED],
uima-dev, and uima-user.  I can do that tomorrow morning, or anyone
else can feel free to do it first.  There are some examples of
announcements in the [EMAIL PROTECTED] archives.

What do you think: should we add an additional link to the KEYS file in
the SVN repository to our download page? There is already a link
available, but this goes to the trunk version of the KEYS file.
Currently the file content is the same, and hopefully that will not
change in the future. But we never know.


So I thought about a link like:
http://svn.apache.org/repos/asf/incubator/uima/uimaj/branches/uimaj-2.1.0/uimaj-distr/src/main/readme/KEYS

So users that want to verify the release have the correct KEYS file in 
place.


I also verified the download of the files. All seems to be correct. The 
user can start downloading :-)


-- Michael


UIMA pear runtime

2007-03-15 Thread Michael Baessler
Currently it is not possible to run an installed pear file out of the
box in UIMA, I mean by just specifying the pear installation path or
something similar.
To run installed pear files, a lot of user configuration and
implementation is necessary. So it would be nice to have a UIMA pear
runtime that can run an installed pear file out of the box.


With the suggestion of having custom resource specifiers
<customResourceSpecifier> we can provide an easy way to integrate such a
UIMA pear runtime.
We just have to implement a new PearAnalysisEngineWrapper that extends
the AnalysisEngineImplBase class and knows how to start an installed
pear file. All the necessary information is available and can be parsed
from the metadata of the installed pear. The utilities, e.g. to
dynamically load the classes (UIMA extension class loader), are also in
place and can be used. So an example of the <customResourceSpecifier>
can look like:


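<customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
   <resourceClassName>org.apache.PearAnalysisEngineWrapper</resourceClassName>
   <parameters>
     <parameter name="..."
       value="/path/to/the/root/directory/of/the/installed/pear/file"/>
   </parameters>
</customResourceSpecifier>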


This solution will also work out of the box in our tooling. The tools
do not have to implement a PEAR runtime engine themselves.


Do we have any limitations when implementing this approach?

Thoughts?

-- Michael







Re: Website logo

2007-03-15 Thread Michael Baessler

Adam Lally wrote:
Should we replace the Apache Incubator logo on our website with the 
UIMA logo?
I think we should additionally add the UIMA logo to our website. What
about having one on the left and the other on the right side of the page?
When checking some other Apache Incubator projects, most of them still 
have the Apache Incubator logo.


-- Michael




Re: UIMA pear runtime

2007-03-21 Thread Michael Baessler

Adam Lally wrote:

I think there may be a problem with the JCas classes if your adapter
uses a different ClassLoader for the AnalysisEngine in the PEAR than
the ClassLoader that's being used by the calling
aggregate/application.

When a CAS is created it has to be told what ClassLoader it will use
to locate JCas classes.  If the PEAR ClassLoader is known only inside
the PearAnalysisEngineWrapper, this will not be available to the CAS
and the classes can't be loaded.

This could be addressed using CAS serialization.  So the
PearAnalysisEngineWrapper would create its own CAS with the correct
ClassLoader.  When its process method was called it would need to
serialize the input CAS and deserialize it into its "private" CAS
before calling the wrapped AE.  It would have to serialize on the way
back as well.


Let me understand the real issue here. When the CAS is created it gets a
ClassLoader that is used to locate the JCas classes.

As far as I know, the CAS stores the references to the JCas classes, right?
So when the aggregate creates these references in the CAS, the pear
runtime wrapper with the UIMAClassLoader cannot use its own JCas
classes since they are not referenced in the CAS. Is that also right? So
it would be nice if the JCas references in the CAS could be changed
later. Would that be possible?


If so, would it be possible to provide a CAS.reinitialize(ClassLoader)
method to reinitialize the JCas classes in the CAS when the ClassLoader
changes?
This method could be called either by an application or by the UIMA pear
runtime wrapper.


With that, would it be possible to provide a JCas reference map within
the CAS for each ClassLoader that is passed to the CAS? In that case we
could just switch the JCas references when the ClassLoader changes and
would not need to reinitialize everything again, except the first time.
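
To illustrate what such a per-ClassLoader reference map might look like, here is a small sketch. It is not how the CAS is implemented: PerLoaderClassCacheSketch and its methods are hypothetical, and a JDK class stands in for a JCas cover class. It only shows that each loader gets its own table, which is filled on first use and simply reused when switching back.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

final class PerLoaderClassCacheSketch {
    // One table of resolved classes per ClassLoader.
    private final Map<ClassLoader, Map<String, Class<?>>> tables = new HashMap<>();
    private int resolutions = 0;

    /** Resolves className through the given loader, reusing earlier results. */
    Class<?> lookup(ClassLoader loader, String className) {
        Map<String, Class<?>> table = tables.computeIfAbsent(loader, l -> new HashMap<>());
        Class<?> cls = table.get(className);
        if (cls == null) {
            try {
                cls = Class.forName(className, false, loader);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("cannot resolve " + className, e);
            }
            table.put(className, cls);
            resolutions++; // counts only real resolutions, not cache hits
        }
        return cls;
    }

    int resolutions() {
        return resolutions;
    }

    public static void main(String[] args) {
        ClassLoader parent = PerLoaderClassCacheSketch.class.getClassLoader();
        // Two loaders standing in for the aggregate's and the PEAR's ClassLoader.
        ClassLoader aggregateLoader = new URLClassLoader(new URL[0], parent);
        ClassLoader pearLoader = new URLClassLoader(new URL[0], parent);

        PerLoaderClassCacheSketch cache = new PerLoaderClassCacheSketch();
        String coverClass = "java.util.ArrayList"; // stand-in for a JCas class name
        cache.lookup(aggregateLoader, coverClass);
        cache.lookup(pearLoader, coverClass);
        cache.lookup(aggregateLoader, coverClass); // switching back is a cache hit
        System.out.println(cache.resolutions()); // 2
    }
}
```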


What do you think?

-- Michael




UIMA datapath support for pear files

2007-03-21 Thread Michael Baessler
Currently when creating a pear file it is not possible to set the UIMA
datapath as a defined value. You can create your own custom parameter
that is called data_path or DATAPATH or UIMA_datapath, but it is not
defined how the UIMA datapath should be named. I would recommend
defining: when a pear file must set the UIMA datapath, the datapath
parameter is called "uima.datapath".


With this definition it is also possible to provide an API on the
current PackageBrowser object that returns the UIMA datapath setting for
the pear file. Currently the PackageBrowser can be used to e.g. retrieve
the custom parameter settings or the classpath of a pear. When adding
the datapath API, pear files that need the UIMA datapath can be
integrated into UIMA more easily without making assumptions about what
the UIMA datapath setting of the pear might be.


I think the changes are quite simple. We have to update the
documentation so that users know how to set the UIMA datapath and we
have to implement an additional API on the PackageBrowser that searches
the configuration parameters of the PEAR for the "uima.datapath" parameter.
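
The lookup itself could be as simple as the following sketch. It assumes the PEAR configuration parameters are available as a java.util.Properties object and that a value may contain the $main_root macro used in PEAR install descriptors; the class DatapathLookupSketch and its method are hypothetical and are not the PackageBrowser API.

```java
import java.util.Optional;
import java.util.Properties;

final class DatapathLookupSketch {
    /** Proposed well-known parameter name. */
    static final String DATAPATH_PARAM = "uima.datapath";

    /**
     * Returns the datapath of the PEAR, if it declares one, with the
     * $main_root macro replaced by the installation root.
     */
    static Optional<String> datapath(Properties pearParameters, String installRoot) {
        String raw = pearParameters.getProperty(DATAPATH_PARAM);
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(raw.trim().replace("$main_root", installRoot));
    }

    public static void main(String[] args) {
        Properties parameters = new Properties();
        parameters.setProperty(DATAPATH_PARAM, "$main_root/resources");
        // Hypothetical installation directory.
        System.out.println(datapath(parameters, "/opt/pears/demo").orElse("(none)"));
        System.out.println(datapath(new Properties(), "/opt/pears/demo").isPresent());
    }
}
```

Returning an Optional makes the "no datapath declared" case explicit, so callers do not have to guess between alternative parameter names.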


Thoughts or comments?

-- Michael




Re: Allowing custom service adapters to be plugged in to UIMA

2007-03-21 Thread Michael Baessler

Adam Lally wrote:

I implemented the customResourceSpecifier as described below and
committed the changes.  Michael, maybe you can give this a try with
your PearAnalysisEngineWrapper?


I changed the implementation to use the customResourceSpecifier, and it
seems that all works fine.

I checked in my changes.

Thanks for the quick implementation.

-- Michael


Re: UIMA pear runtime

2007-03-21 Thread Michael Baessler

Adam Lally wrote:

I think there may be a problem with the JCas classes if your adapter
uses a different ClassLoader for the AnalysisEngine in the PEAR than
the ClassLoader that's being used by the calling
aggregate/application.

When a CAS is created it has to be told what ClassLoader it will use
to locate JCas classes.  If the PEAR ClassLoader is known only inside
the PearAnalysisEngineWrapper, this will not be available to the CAS
and the classes can't be loaded.

This could be addressed using CAS serialization.  So the
PearAnalysisEngineWrapper would create its own CAS with the correct
ClassLoader.  When its process method was called it would need to
serialize the input CAS and deserialize it into its "private" CAS
before calling the wrapped AE.  It would have to serialize on the way
back as well.

So there's a performance penalty, but it's still better than nothing.

I really like the fact that this would allow existing UIMA tools like
DocumentAnalyzer to execute PEARs.
So, with the current implementation I tested some different ways to do
the CAS serialization/deserialization.
The fastest one seems to be the blob serialization using a byte array.
But it seems that this serialization/deserialization either has a bug
or I am doing something wrong.

Adam or Eddie, can you please look at the code in
PearAnalysisEngineWrapper.java and check whether the
serialization/deserialization is correct?


If I use the fastest serialization and run an installed pear using CVD, I
see after the processing of the pear file a view that is called
"_InitialView" and another one that is called " _InitialVie". Also the
document language is not correct after the processing: it is " n" instead
of "en".


If I use the other serialization all works fine!

-- Michael


Re: Changing jar file names in uimaj-ep-runtime plugin

2007-03-22 Thread Michael Baessler

Adam Lally wrote:

I want to get the other committers' opinion on this.  It has to do
with Joern's proposed patch to the uimaj-ep-runtime plugin so that mvn
eclipse:eclipse will create a PDE project instead of a regular Java
project.

Because of mvn eclipse:eclipse has limited configuration options, it
appears we'd have to change the names of the jar files in the runtime
plugin to use the default Maven names, which include the version
number.  Do we want to do that?

More info in the JIRA issue 
https://issues.apache.org/jira/browse/UIMA-355.
I don't like the idea of having different names for the same version of
the jar. I think it would be better to have the same name for all jars.
As I understand it, the uimaj-core-2.2-incubating-SNAPSHOT.jar naming scheme
is the default that maven generates and as far as I know it
is also what most of the Apache projects use.

We already had the discussion about the uima jar file names earlier,
but maybe we can have it again in light of these new aspects. I think it will be
more transparent for the users if all jars that have the same content
also have the same name.


I currently can't find the main issues of the naming discussion; does
anyone else remember?


-- Michael


Re: Changing jar file names in uimaj-ep-runtime plugin

2007-03-22 Thread Michael Baessler

Jörn Kottmann wrote:

I strongly dislike jar file names with version numbers in them.  Every
time you get a new version you have to update classpath settings in
script files, eclipse run configurations, or wherever else jar files are
referred to by name.  I just think it's very impractical and would be
annoying to our users.


We can give only the jars inside the eclipse plugins version numbers.
Nobody will notice this because the osgi runtime hides this fact from
the user.
The only thing which is important for the user is that the plugin name
does not change.

I think this can be easily done inside the assemble-plugin.xml.

What do you think ?
So if the user doesn't see the jars and if the administration of the xml
files is easy to do, I'm fine with that.
I don't want to have many places where we have to change the jar file
names for each new release.


-- Michael


Re: UIMA pear runtime

2007-03-23 Thread Michael Baessler

Adam Lally wrote:

Michael, I was looking over PearAnalysisEngineWrapper.java and I think
there's something missing.

When you create your CAS you aren't telling it the ClassLoader for the
PEAR.  So if JCAS were used and the JCas classes were in the PEAR, I
think you would get class not found exceptions.  Have you tried a JCAS
annotator with this?
The way to tell the CAS about the ClassLoader is to pass the
ResourceManager (the same one you passed to produceAnalysisEngine) to
the CasCreationUtils.createCas call.  It's an optional argument you
can add in addition to the arguments you're already passing.
You are right, without the fix the processing of JCas annotators fails.
I checked in the new version that also supports JCas annotators.

Thanks!

-- Michael



Re: Website logo

2007-03-27 Thread Michael Baessler

Adam Lally wrote:
Should we replace the Apache Incubator logo on our website with the 
UIMA logo?


-Adam



I updated the website with the UIMA project logo on the right top corner.

-- Michael


Re: UIMA pear runtime

2007-03-27 Thread Michael Baessler

Adam Lally wrote:

On 3/21/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

Let me understand the real issue here. When the CAS is created it gets a
ClassLoader that is used to locate the JCas classes.
As far as I know, the CAS stores the references to the JCas classes, 
right?


Yes.


So when the aggregate creates these references in the CAS the pear
runtime wrapper with the UIMAClassLoader cannot use its own JCas
classes since they are
not referenced in the CAS. Is that also right? So it would be nice if
the JCas references in the CAS can be changed later, will that be 
possible?


If true, will it be possible to provide a CAS.reinitialize(ClassLoader)
method to reinitialize the JCas classes in the CAS when the ClassLoader
changed?
This method can either be called by an application or by the UIMA pear
runtime wrapper.

With that, will it be possible to provide a JCas reference map within
the CAS for each ClassLoader that is passed to the CAS? In that case we
can just switch the JCas references when the ClassLoader changed and
need not reinitialize everything again except the first time.

What do you think?



In theory it seems possible, Marshall would have to comment on how
difficult it is.

There's a further complication which is that JCas-generated class
_instances_ are also cached in the JCas object.  So if a JCas-based
annotator creates an annotation Java object, and the application later
uses an iterator to retrieve that object, it would get the same Java
object back.  Obviously this won't work if the application and
annotator don't share the same ClassLoader for accessing that class.

We could work around that, too, by clearing out the cache if the
ClassLoader changes, or using a different cache for each ClassLoader.
This wouldn't allow sharing additional data in Java fields that were
manually added to the JCas-generated class, although that doesn't work
when serialization is involved anyway, so may not be a big loss.

There's a lot of additional complexity involved in doing this.  Is it
worth it?  I'm not sure.
Hi Marshall, can you please comment on this? Will that be possible or does
it have any impact?


Thanks Michael



Re: Documentation link on the website

2007-03-29 Thread Michael Baessler

Thilo Goetz wrote:
I would like to suggest to move the documentation link to a more 
prominent place.  I only ever find it because I know it's there and it 
has to be somewhere.  I would like to put it in the "General" section. 
I would also like to move the "ASF" and "License" links further 
down in the list.


If there are no objections, I'll do this tomorrow sometime.

--Thilo



+1

-- Michael


Re: Next UIMA release: 2.1.1?

2007-03-29 Thread Michael Baessler

Adam Lally wrote:

On 3/29/07, Marshall Schor <[EMAIL PROTECTED]> wrote:

There is a serious problem that was fixed in the CDE (see
http://issues.apache.org/jira/browse/UIMA-364 ) which I'd like to get 
out.


How about a 2.1.1 release?  Any other things to get into that?



My gut feeling is that the hotfix path is sufficient to get this one
critical fix out but that I don't want to rush a 2.1.1 release.  There
are other useful things to get out ( in CDE descriptors
probably is the main one, I have already had people ask for it), but
I'm not comfortable with the amount of testing (not much) this and
other things have undergone.  I don't know that we have the bandwidth
to do appropriate testing for a full release anytime soon (I certainly
don't).
I think whenever we do a 2.1.1 the customResourceSpecifier stuff should 
also be in.


-- Michael



testcase cleanup

2007-03-29 Thread Michael Baessler
Some of our test cases still use deprecated UIMA framework methods. I
think we should try to clean this up where possible and use the newly
provided methods.
In some cases it can still make sense to also test some deprecated
methods, but I think these are minor special cases.


I think the most effort will be to replace TextAnalysisEngine methods
with AnalysisEngine methods.


What do others think?

-- Michael




Re: [VOTE] [HOTFIX] release hot fix Eclipse plugin for UIMA CDE tool

2007-03-30 Thread Michael Baessler

Marshall Schor wrote:
One other issue I just remembered - this artifact is not "signed".  I 
don't have the time right
now due to other pressing things, so this looks like it will slip into 
next week.
I absolutely agree with Thilo, we should fix and tag this in the 2.1.0
branch. I think we should do things as well as we can to make them
understandable for all users. And providing hot fixes for a release that
are based on trunk seems a little bit confusing to me.


If there is anything I can help with, please let me know.

-- Michael




Re: [VOTE][HOTFIX] [RETRY1] release hot fix Eclipse plugin for UIMA CDE tool

2007-04-03 Thread Michael Baessler

Adam Lally wrote:

On 4/2/07, Marshall Schor <[EMAIL PROTECTED]> wrote:

In this retry, the following has changed:
1) The artifact (the zip file which is the Eclipse plugin) now includes
the standard DISCLAIMER, LICENSE, and NOTICES files.

2) A README-HOTFIX-1 was added and appears in the Zip file when 
unzipped.


3) The artifact is signed; the website has the signature keys

4) The changes are made on top of the SVN 2.1.0 branch.

==

The CDE fix in Jira: http://issues.apache.org/jira/browse/UIMA-364 has
been packaged into a hotfix.

The hotfix consists only of one Eclipse Plugin project; this project
(uimaj-ep-configurator) was tagged in SVN:
https://svn.apache.org/repos/asf/incubator/uima/uimaj/tags/uimaj-ep-configurator-2.1.0-hotfix-1-RC1/uimaj-ep-configurator 




The artifact is located in
http://people.apache.org/~schor/org.apache.uima.desceditor.2.1.0.incubating-hotfix-1.zip; 


this is a "binary" deployable object usable by UIMA users directly,
after unzipping into the Eclipse plugins directory.

Please cast your vote on releasing:
org.apache.uima.desceditor.2.1.0.incubating-hotfix-1.zip and
including a reference to the source at:
https://svn.apache.org/repos/asf/incubator/uima/uimaj/tags/uimaj-ep-configurator-2.1.0-hotfix-1-RC1/uimaj-ep-configurator 


Note: I don't plan to zip up this source further into a release
artifact, unless it's required.

We ask that you please vote to approve this release:
[ ] +1 Approve the release as Apache UIMA 2.1.0-incubating
[ ] -1 Recommend against releasing at this time (identify issues you
consider showstoppers)



Your description of the +1 is still incorrect. :)

I am +1 to release as 
org.apache.uima.desceditor.2.1.0.incubating-hotfix-1.zip.


-Adam



+1 to release org.apache.uima.desceditor.2.1.0.incubating-hotfix-1.zip.

-- Michael


Re: testcase cleanup

2007-04-10 Thread Michael Baessler

Michael Baessler wrote:
Some of our test cases still use deprecated UIMA framework methods. I 
think we should try to clean this up where possible and use the newly 
provided methods.
In some cases it can still make sense to also test some deprecated 
methods, but I think these are minor special cases.


I think most of the effort will be replacing TextAnalysisEngine methods 
with AnalysisEngine methods.


What do others think?

No objections, I will start working on it.

-- Michael



Re: Urgent: Board report due TODAY

2007-04-11 Thread Michael Baessler

Adam Lally wrote:

Here's my suggested board report.  Please review this ASAP.
 -Adam



UIMA is a component framework for the analysis of unstructured content
such as text, audio and video.


Some recent activity:

We completed our first incubating release last month, resolving all
legal issues and obtaining the necessary Incubator PMC approval.


Items to complete before graduation:
  * Attract new committers


Community:
  * Traffic on the uima-user list has started to pick up since our
release.  We hope to be able to eventually attract some Apache UIMA
users to become committers.
  * uima-dev list has a good amount of traffic, mostly from the
original committers.
  * We have one contributor, Jörn Kottman who has submitted a few 
patches.



Code:
  * The UIMA Java framework code has been released
  * The UIMA C++ framework code was donated with a software grant and
has been added to SVN.  We're currently doing the final migration and
testing work to enable a release of the C++ framework.
  * We've established a sandbox and accepted a contribution from Jörn 
Kottman




+1

-- Michael


Re: Default Result Specifications too complicated?

2007-04-17 Thread Michael Baessler

Adam Lally wrote:

P.S. Here are the specific rules for the Result Spec (this is
documented in the manual more or less in this form):

The default Result Spec is automatically computed from the
capabilities in the component descriptors, as follows:

1) The outermost aggregate's result spec is set to the list of its
declared output types.
2) The result spec for each delegate is set to the union of the
aggregate's result spec with the set of all input types of all other
delegates in the aggregate.  (This is so that we ask each annotator to
produce types that may be needed by a subsequent annotator.  This rule
is applied independent of the order of the flow, so as to be
completely general in the case of a custom flow controller.)
3) For a nested aggregate, apply rule #2 recursively.

I think these rules make logical sense, and I can't think of any
easier rules to apply other than to forget the whole thing.
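
As a rough illustration of the three rules above, here is a minimal Java
sketch. It is not UIMA code: the Component class, its fields and the
defaultResultSpecs method are hypothetical names invented for this example,
and it assumes that a nested aggregate starts from the spec it was given as
a delegate (one reading of rule 3). It does not intersect the spec with what
a component is actually able to produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ResultSpecSketch {

    /** Hypothetical stand-in for a component descriptor; not a UIMA class. */
    static final class Component {
        final String name;
        final Set<String> inputs = new TreeSet<>();
        final Set<String> outputs = new TreeSet<>();
        final List<Component> delegates = new ArrayList<>(); // empty for a primitive

        Component(String name, String[] inputs, String[] outputs, Component... delegates) {
            this.name = name;
            this.inputs.addAll(Arrays.asList(inputs));
            this.outputs.addAll(Arrays.asList(outputs));
            this.delegates.addAll(Arrays.asList(delegates));
        }
    }

    /** Returns the default result spec per component name. */
    static Map<String, Set<String>> defaultResultSpecs(Component outermost) {
        Map<String, Set<String>> specs = new LinkedHashMap<>();
        // Rule 1: the outermost aggregate asks for its declared outputs.
        Set<String> top = new TreeSet<>(outermost.outputs);
        specs.put(outermost.name, top);
        assignToDelegates(outermost, top, specs);
        return specs;
    }

    private static void assignToDelegates(Component aggregate, Set<String> aggregateSpec,
                                          Map<String, Set<String>> specs) {
        for (Component delegate : aggregate.delegates) {
            // Rule 2: aggregate's spec plus the inputs of all *other* delegates,
            // regardless of the order in which the delegates run.
            Set<String> spec = new TreeSet<>(aggregateSpec);
            for (Component other : aggregate.delegates) {
                if (other != delegate) {
                    spec.addAll(other.inputs);
                }
            }
            specs.put(delegate.name, spec);
            // Rule 3: recurse into nested aggregates (no-op for primitives).
            assignToDelegates(delegate, spec, specs);
        }
    }

    public static void main(String[] args) {
        Component tokenizer = new Component("Tokenizer", new String[0],
                new String[] {"Token", "Sentence"});
        Component tagger = new Component("Tagger", new String[] {"Token"},
                new String[] {"PosTag"});
        Component pipeline = new Component("Pipeline", new String[0],
                new String[] {"PosTag"}, tokenizer, tagger);
        System.out.println(defaultResultSpecs(pipeline));
    }
}
```

In this example the tokenizer is asked for Token only because the tagger
lists it as an input; Sentence is not requested because neither the
aggregate nor any other delegate declares it.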

Sorry, I currently don't understand what is going on here...

You described above how the default Result Spec is computed. But I think 
the Result Spec depends on the aggregate flow that is used, doesn't it? 
So which flow is used in the example above, the fixed flow? I know that 
with the capabilityLanguageFlow things are slightly different, for example.
So my thought was that the Result Spec depends on the flow that is used, 
and if the flow does not manipulate the Result Spec, the annotators 
produce all they can.
So is "all they can" implemented as the default Result Spec? If so, I 
think in this case we can turn off/ignore the Result Spec.


-- Michael




Re: [jira] Created: (UIMA-377) add API to build PEAR packages

2007-04-18 Thread Michael Baessler

Michael Baessler (JIRA) wrote:

add API to build PEAR packages
--

 Key: UIMA-377
 URL: https://issues.apache.org/jira/browse/UIMA-377
 Project: UIMA
  Issue Type: New Feature
  Components: Core Java Framework
Reporter: Michael Baessler
 Assigned To: Michael Baessler
Priority: Minor
 Fix For: 2.2


Currently pear packages can only be created when having the PearPackager plugin 
installed. I would like to add a pear packager API to create pear files without 
using the pearPackager eclipse plugin.

Also I plan to create an ANT task to build pear packages directly in the build. 
I choose ANT since I think it is more common. We can run it in our Maven build 
and annotator developers can run it in their build as well even when they don't 
use Maven
I just want to get some other opinion about where to check-in the Java 
class for the ANT task. The class has a dependency on uima_core and to 
an ANT jar.


What do you think...

1) add it to the uima_core project where all the other pear stuff is 
checked-in


2) add it to a new project in the sandbox

-- Michael


  


Re: Default Result Specifications too complicated?

2007-04-18 Thread Michael Baessler

Adam Lally wrote:

This changed in 2.0 with the introduction of the flow controller.  The
ResultSpec no longer has any dependence on the flow.  The framework
assumes the most general case of the custom flow controller.

The effect is that an annotator's ResultSpec will include all of the
input types of *any* component in the same aggregate, even if that
component happens to run before the annotator, not after it.  We
decided that this was such a strange case that it wasn't worth worrying
about.
OK, but can the FlowController manipulate the ResultSpec for an 
annotator before it is called? Or can the FlowController only decide 
the flow of the engines (whether they are called, in which order, ...)?
If the FlowController does not manipulate the ResultSpec, where else can 
it be done? Only in the application that calls the main engine?

The default result spec is not "all they can".  A type is only
included in an annotator's default result spec if one of the following
is true:
(a) The type is listed in the outputs of all containing aggregates (so
it's concluded that the type is a final output of the whole aggregate
which the application may inspect)
(b) The type is listed in the inputs of some other component of the
aggregate (so it's concluded that the type may be a necessary
intermediate type that some other annotator may inspect in order to
produce output that's of interest to the application)

As I recall, you (Michael) and/or Thilo felt that a default result
spec of "all they can" was not appropriate, and that it was a
necessary feature to restrict the outputs of an annotator based on the
declared outputs of an enclosing aggregate.  In any case this was
always done in the capability language flow in previous versions of
UIMA, but in 2.0 was generalized to be independent of flow.
I still think that it is necessary to have a way to specify what output 
types an annotator should produce. It is not efficient for an annotator 
to always produce everything it can.


I will look into the code to get a better understanding of how this 
works in UIMA 2.1.


-- Michael



Re: CVD package renaming

2007-04-20 Thread Michael Baessler

Thilo Goetz wrote:

Adam Lally wrote:

On 4/19/07, Thilo Goetz <[EMAIL PROTECTED]> wrote:

All,

I would like to gradually clean up some of the CVD code.  To start with,
I'd like to rename the packages, annot_view to cvd, and I need to think
about ts_editor (it's not an editor).  Anybody object to that?  I'll
need to update the scripts.  Anything else I need to do when renaming
those packages?



PEAR Installer also launches CVD and may have its class name hardcoded
somewhere.

We might consider leaving around a deprecated stub class under the old
name and just having its main method call the main method of the new
class.  That way if any user had their own script it would continue to
work.
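
The stub idea can be sketched roughly as follows. Both class names are
hypothetical placeholders (the real CVD class names are not given in this
thread), and the lastArgs field exists only to make the forwarding
observable in the example.

```java
/** Hypothetical new entry point, standing in for the class in the renamed package. */
final class NewViewerMain {
    // Recorded only so that the sketch can show the arguments were forwarded.
    static String[] lastArgs;

    public static void main(String[] args) {
        lastArgs = args.clone();
        System.out.println("viewer started with " + args.length + " argument(s)");
    }
}

/**
 * Hypothetical stub kept under the old name so that existing user scripts
 * keep working; it only forwards to the new entry point.
 */
@Deprecated
final class OldViewerMain {
    public static void main(String[] args) {
        NewViewerMain.main(args);
    }
}
```

A user script that still starts the old class would then end up in the new
main method with the same arguments.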

-Adam


Good idea, I'll do that.  I'll wait another day, though, to see if 
anybody else has any suggestions/concerns.
The PearInstaller runs Gladis/CVD to verify the installed component. The 
code is available in the uimaj-tools project at 
org.apache.uima.tools.pear.install.InstallPear:runGladis().
The name of the main class that is used to start Gladis/CVD is read from 
the gladis.properties file.
Maybe when making changes here, you can also change the name Gladis 
(used in several methods and comments) to the new name.
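
To show why a package rename affects such a properties-driven launcher, here
is a minimal sketch of the general technique. It is not the InstallPear
code: the property key "main.class", the class PropertiesLauncherSketch and
the EchoTarget class are all assumptions made up for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.Properties;

/** Hypothetical tool main class, used only as a launch target in this sketch. */
final class EchoTarget {
    static int calls;

    public static void main(String[] args) {
        calls++;
        System.out.println("EchoTarget started with " + args.length + " argument(s)");
    }
}

final class PropertiesLauncherSketch {
    // Hypothetical key; the real key used in gladis.properties is not stated here.
    static final String MAIN_CLASS_KEY = "main.class";

    /** Reads the main class name from the properties and invokes its main method. */
    static void launch(Reader config, String[] args)
            throws IOException, ReflectiveOperationException {
        Properties props = new Properties();
        props.load(config);
        String className = props.getProperty(MAIN_CLASS_KEY);
        if (className == null) {
            throw new IllegalArgumentException("missing property " + MAIN_CLASS_KEY);
        }
        // A renamed package makes this lookup fail unless the properties file is updated.
        Class<?> mainClass = Class.forName(className);
        Method main = mainClass.getMethod("main", String[].class);
        main.invoke(null, (Object) args);
    }

    public static void main(String[] args) throws Exception {
        String config = MAIN_CLASS_KEY + "=" + EchoTarget.class.getName();
        launch(new StringReader(config), new String[] {"descriptor.xml"});
    }
}
```

With this pattern only the properties file needs to change when the class
moves, but an outdated entry fails at run time with a
ClassNotFoundException rather than at compile time.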


-- Michael


Re: Next release?

2007-04-27 Thread Michael Baessler

Thilo Goetz wrote:
We have a few things that people are waiting for, notably Pear related 
things.  I would propose to shoot for having the next release 
candidate ready by the end of May (so we could hopefully do the 
release in early June).  Thoughts?


How about the C++ version?  Are we keeping that completely separate, 
or should there be a common release?


I guess the next version will be 2.2.  That's our working number in 
Jira, anyway.  Any reason to go to anything else, like 2.1.1?


--Thilo


In the time frame of the next UIMA release I would propose to have a 
"sandbox" release, so that users can download some of the sandbox 
components as binaries.


-- Michael


Re: Is there any way to make CDE installation easier?

2007-04-27 Thread Michael Baessler

Adam Lally wrote:

There was just a post on uima-user (which got resolved already) about
difficulty in getting the CDE to work.  This keeps coming up, and is
almost always an EMF installation problem.  In this case it was an
incorrect EMF version for the Eclipse version being used.  Is there
anything we can do to improve the situation?  Would making an Eclipse
"feature" or setting up an update site help at all?

-Adam


Why can't we provide a list of eclipse versions we have tested with the 
corresponding EMF versions? So if users do a clean install of eclipse 
and EMF they can install the versions that work together correctly. Or 
we post a link to 
http://www.eclipse.org/modeling/emf/downloads/?project=emf where users 
can find the correct EMF version for their eclipse.


-- Michael



Re: UIMA pear runtime

2007-05-02 Thread Michael Baessler
After doing most of the UIMA pear runtime work... I would like to 
suggest something else that came to my mind when implementing the pear 
runtime.


Currently we work with a customResourceSpecifier. I would like to change 
that to a real pearSpecifier. There are two main reasons.
   - Since the pear runtime is fully integrated into the UIMA core, I 
don't like that the specifier exposes the class that is used to load 
the pears.
  With a dedicated descriptor we can hide this unnecessary setting 
from the user.
   - Now that we are able to run pears in an aggregate flow, we should 
probably also take care of the input and output capabilities of the pear 
main component descriptor. So
  my suggestion is to create a pear descriptor that can also have 
input and output capabilities, so that the FlowController can handle pears as well.


To make the whole pear stuff easier to use for the user, I would like to 
generate the pearSpecifier during the pear installation. We have all the 
necessary information when the pear is installed and can also set the 
input and output capabilities. So our users can use the generated pear 
descriptor to run the pear or to reference it in an aggregate descriptor.


What do you think?

-- Michael



Michael Baessler wrote:
Currently it is not possible to run an installed pear file out of the 
box in UIMA. I mean by just specifying the pear installation path or 
something similar.
To run installed pear files there is a lot of user configuration and 
implementation necessary. So it would be nice to have a UIMA pear 
runtime that can run an installed pear file out of the box.


With the suggestion of having custom resource specifiers 
 we can provide an easy way to integrate such 
a UIMA pear runtime.
We just have to implement a new PearAnalysisEngineWrapper that extends 
the AnalysisEngineImplBase class that knows how to start an installed 
pear file. All the necessary information is available and can be 
parsed from the metadata of the installed pear. The utilities, e.g. to 
dynamically load the classes (UIMA extension class loader), are also in 
place and can be used. So an example of the  
can look like:


xmlns="http://uima.apache.org/resourceSpecifier">
   
org.apache.PearAnalyisEngineWrapper 


   
   value="/path/to/the/root/directory/of/the/installed/pear/file"/>

   
   


This solution will also work out of the box in our tooling. The tools 
do not have to implement a PEAR runtime engine themselves.


Do we have any limitations when implementing this approach?

Thoughts?




Re: UIMA pear runtime

2007-05-02 Thread Michael Baessler

Tong Fin wrote:
My thinking is that the "pear runtime" is closer to the "tooling" than the
"framework". It is hard to draw the boundary. Also, there are many runtimes
that UIMA can/may support.
I don't think that we should do something too specific to the "pear
runtime".
If I look at it from a UIMA user's perspective, I see that there is 
something called pear to package annotators and UIMA components, but
there is no easy way to run these packaged components in the UIMA 
runtime later. That's why I think the core framework should have
a pear runtime integration and an API to run these pear packages out of 
the box without doing anything manually after the pear installation.


-- Michael


Re: UIMA pear runtime

2007-05-04 Thread Michael Baessler

Adam Lally wrote:

On 5/2/07, Michael Baessler <[EMAIL PROTECTED]> wrote:

After doing most of the UIMA pear runtime work... I would like to
suggest something else that came to my mind when implementing the pear
runtime.

Currently we work with a customResourceSpecifier. I would like to change
that to a real pearSpecifier.


OK with me.


OK I implemented the following pearSpecifier resource:


<pearSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
   <resourceType>InstalledPear</resourceType>
   <pearPath>/home/test/WhitespaceTokenizer</pearPath>
</pearSpecifier>


The resourceType is the type of the pear archive we use. So maybe in the 
future we can also work on archived pear files that do not have to be 
installed first.
The pearPath is the path to the installed pear root directory. If we 
decide to also work on archived pear files, this path can also be a 
valid pear archive file path.


Additionally the pearSpecifier can have parameters like
 
  
  
but currently I don't see the need for this.

I also added the necessary methods for the ResourceSpecifierFactory and 
for the XMLParser.


In addition to the new pearSpecifier stuff I added an extra step to the 
pear installation API that automatically creates a pearSpecifier when a 
pear file is installed.
So after the installation of a pear file, the main root directory of 
the installed pear contains a descriptor called _pear.xml 
that can be used
to easily run the installed pear file using CVD or any other UIMA 
tooling. With this, no classpath changes or anything else are necessary 
to run the pear. The created descriptor can also be added to an 
aggregate that should contain the pear.
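
As a rough sketch of what such an installation step could look like (this
is not the code that was checked in): the class and method names below are
hypothetical, the element names resourceType and pearPath are taken from
the description above, and the descriptor file name is passed in by the
caller because its exact form is not spelled out here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class PearSpecifierWriterSketch {

    /** Renders a pearSpecifier document for the given type and path. */
    static String render(String resourceType, String pearPath) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<pearSpecifier xmlns=\"http://uima.apache.org/resourceSpecifier\">\n"
            + "   <resourceType>" + escape(resourceType) + "</resourceType>\n"
            + "   <pearPath>" + escape(pearPath) + "</pearPath>\n"
            + "</pearSpecifier>\n";
    }

    /** Writes the descriptor into the root directory of the installed pear. */
    static Path write(Path installDir, String descriptorFileName) throws IOException {
        String xml = render("InstalledPear", installDir.toAbsolutePath().toString());
        Path target = installDir.resolve(descriptorFileName);
        Files.write(target, xml.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    /** Escapes the characters that are not allowed in XML text content. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for a real pear installation directory.
        Path installDir = Files.createTempDirectory("pear-sketch");
        Path descriptor = write(installDir, "example_pear.xml"); // hypothetical file name
        System.out.println("wrote " + descriptor);
    }
}
```

Generating the file from data that is already known at installation time
means the user never has to type the installation path by hand.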


With these changes I think the pear stuff is more attractive to use than 
before.


Any comments or issues on this? If not I will check in the code to SVN.

-- Michael




