Where to put large generated things for our website to access

2007-12-21 Thread Marshall Schor
Here are some examples of large files (or large collections of files):

The api java docs
The api java docs as a zip file
The 4 books in html format

It seems good to put them in people.a.o/www/incubator.a.o/uima/downloads.

It seems bad to put them in SVN (because there's no need to version
these - they're generated, and they're big, taking up SVN space).

Our current strategy is hybrid:

1) for current release only: api java docs and the api java docs.zip
file are put in people.a.o/www/incubator.a.o/uima/downloads, and are
*not* kept in SVN.
2) for current release only: the 4 books in html format are put in SVN
and copied to people.a.o/www/incubator.a.o/uima/downloads with the svn
update command.

I see, in fact, that for release 2.2.0, we managed to put the books in
html format into SVN twice - once under
 2.2.0-incubating/docs/html, and once under
 2.2.0-incubating/html
and of course, on the website, it shows up twice also...  not good.

Is there any automated process for getting the files installed on
people.a.o/www/incubator.a.o/uima/downloads?  (Has anyone written any
scripts for this?)

Does everyone agree that it's best to keep these out of SVN, and to put
them in the web server spot on people.a.o/www/incubator.a.o/uima/downloads?

===

The mirrored distribution spot contains, in addition to /binaries and
/source, a /docs directory with the following:
release/apiDocs.zip  -- plus the 3 signing files [asc, md5, sha1]
release/api/         -- unzipped set of javaDoc html files, no signing files
release/html/        -- set of 4 books as html files, no signing files
release/pdf/         -- set of 4 books as pdfs, no signing files

I think everything that's put onto the mirroring system is supposed to
be signed, because Apache doesn't control what goes on at the mirrors
(e.g., they could be hacked).
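For reference, the three companion files are produced with standard tools. A generic sketch (the artifact name is illustrative, and the gpg step assumes a configured signing key):

```shell
# Produce the .asc, .md5 and .sha1 companions for one artifact.
FILE=apiDocs.zip
gpg --armor --detach-sign "$FILE"       # writes apiDocs.zip.asc (needs a key)
md5sum  "$FILE" > "$FILE.md5"           # writes apiDocs.zip.md5
sha1sum "$FILE" > "$FILE.sha1"          # writes apiDocs.zip.sha1
```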

Currently, our download page is silent about the existence of these.

I think we should delete these from the mirroring distribution system. 
Assuming we followed the top part of this note, we would have everything
(except the pdf form of the 4 books) on the UIMA website, directly (not
going thru a mirror).

Other opinions?

-Marshall



Re: Where to put large generated things for our website to access

2007-12-21 Thread Marshall Schor
One more consideration:

Some people use a URL to javadocs as part of their javadoc build
process.  For that, we might want to consider whether to support
this - and if so, we probably need to keep the javadocs for
each version, forever.

The dist system supports this, via archiving + redirects.
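That redirect mechanism can be pictured as an Apache directive along these lines (hypothetical paths; the real redirects are maintained by infrastructure, not by us):

```apache
# Once a release leaves the mirrors, its old URL is redirected to the archive
Redirect permanent /dist/incubator/uima/docs/2.2.0-incubating http://archive.apache.org/dist/incubator/uima/docs/2.2.0-incubating
```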

-Marshall




Re: Where to put large generated things for our website to access

2007-12-22 Thread Marshall Schor
Here's an argument for keeping the big things we point to from our
website, like the javaDocs and the 4 books in html form, on the
a.o/dist/incubator/uima site:

It is automatically archived.  And, when it's deleted from the
mirror, a redirect is put in to the archive spot.

This would be ideal for being able to have older versions kept with
permanent static URLs.

So - upon further reflection - I think I'm changing my mind on this, and
am now in favor of keeping these on the mirroring system. 

We can avoid making people who are on our web site and want to view the
documentation check signatures, by not using the mirroring system and
instead pointing them to the main a.o/dist location.  I'm not
sure if this would be OK in terms of protocol for load balancing, but
I'll set the doc page up this way for now, and we can change it if we
need to.

-Marshall

Michael Baessler wrote:
 Fine with me to delete the HTML documentations (manual and javadoc) on
 the mirror. I thought we can use it and link them from our website.

 As far as I know, there is no script to upload the documentation. I
 did it manually.

 -- Michael







Re: Where to put large generated things for our website to access

2007-12-22 Thread Marshall Schor
One further thought:  A lot of projects put the RELEASE NOTES for
particular releases at the top level of the dist/ - where the file
name includes the release:  for example:
  ANT:   RELEASE-NOTES-1.7.0.html
  HTTPD:   CHANGES_2.2.6

Since these will get archived, and redirects can be done for them, their
URLs can be permanent.

To be consistent with this practice, I would like to put our release
notes for display from our web site in a.o/dist/...

They don't need to be signed, because only archives need that; also,
other projects don't sign these kinds of things. Any objections?

-Marshall





Re: Where to put large generated things for our website to access

2007-12-23 Thread Marshall Schor
I updated our website download page and documentation page.  I made the
download page work with mirrors, and changed the format for accessing
previous archived files to follow the common practice on other sites,
referring to the archive.apache.org site.

I made our documentation page refer to apache.org/dist/incubator/uima
for the doc files - and didn't put any of these into our SVN for our
website.

I also followed common practice and put our Release notes into the
apache.org/dist/i/u at the top level (I changed the name to add the
suffix of the release version).  This allows (via the archive system)
for these things to always be available.

I added the Eclipse update site to a.o/d/i/u

The only things not done yet are setting up a .htaccess file in this
directory, and adding HEADER.html and README.html files to make the
directory listing more customized.  I'm not going to tackle this right
now - if anyone else wants to take a crack at it, that's OK with me.
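For whoever picks this up, a minimal sketch of that customization, assuming the server's AllowOverride settings permit mod_autoindex directives in .htaccess:

```apache
# Pull custom fragments into the auto-generated directory listing
HeaderName HEADER.html
ReadmeName README.html
# Fancy listing with unclipped file names
IndexOptions FancyIndexing NameWidth=*
```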

-Marshall

Michael Baessler wrote:
 Fine with me.

 -- Michael

[Fwd: Re: [schor] incubator/uima/eclipseUpdateSite/]

2007-12-29 Thread Marshall Schor


 Original Message 
Subject:Re: [schor] incubator/uima/eclipseUpdateSite/
Date:   Sat, 29 Dec 2007 07:35:24 +0100 (CET)
From:   Henk P. Penning [EMAIL PROTECTED]
To: Marshall Schor [EMAIL PROTECTED]
References: [EMAIL PROTECTED]
[EMAIL PROTECTED]



On Sat, 29 Dec 2007, Marshall Schor wrote:

 Date: Sat, 29 Dec 2007 01:13:19 -0500
 From: Marshall Schor [EMAIL PROTECTED]
 To: Henk Penning [EMAIL PROTECTED], uima-dev uima-dev@incubator.apache.org
 Subject: Re: [schor] incubator/uima/eclipseUpdateSite/


 This has now been done - the signature and hash sums (MD5 and SHA1) are
 uploaded to incubator/uima/eclipseUpdateSite for the files flagged in
 the report.

Marshall Schor,

  ok ; thanks ; the checker picked it up already ; all's fine.

 -Marshall

  regards,

  Henk Penning

   _
Henk P. Penning, Computer Systems Group   R Uithof CGN-A232  _/ \_
Dept of Computer Science, Utrecht University  T +31 30 253 4106 / \_/ \
Padualaan 14, 3584CH Utrecht, the Netherlands F +31 30 253 2804 \_/ \_/
http://people.cs.uu.nl/henkp/ M [EMAIL PROTECTED]  \_/





Re: permissions owners on w.a.o/dist/incubator/uima

2007-12-29 Thread Marshall Schor
Thanks, Robert.

I think we've finished with the tasks needed for migrating to the
mirror system, perhaps with the exception of setting up .htaccess on
w.a.o/dist/incubator/uima, and adding HEADER.html and README.html to
that directory.  However, I see other projects do not necessarily have
that. 

Are any other lines (e.g., redirects of some sort) needed in the
.htaccess file at this time?

We would appreciate any checking you could do of our work to migrate
to the mirror system.

Thanks for all your help.

-Marshall

Robert Burrell Donkin wrote:
 On Dec 25, 2007 3:23 AM, Marshall Schor [EMAIL PROTECTED] wrote:
   
 When putting files into this spot, I think the permissions should
 include group writable - so others in the project can update things, and
 world read-only for obvious reasons.
 

 +1

   
 Is there a group for uima?  I think there may not be, but I don't
 remember how to check that on linux.
 

 or FreeBSD ;-)

 i use:

 grep incubator /etc/group

 but those with more BSD-fu usually have more elegant solutions than mine...

   
 If there is not, then we should
 make the group be incubator if possible.
 

 +1

 the basic infrastructure rule is one group per TLP. so, whilst UIMA is
 in the incubator, the incubator group should be used. if UIMA
 graduates to a TLP then a new uima group will be created and that
 group should be used for releases. if UIMA graduates as a subproject
 of Project Cool (say) then group cool will be used for releases.

 - robert
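Concretely, the permission scheme discussed above might be applied like this (a sketch on a placeholder directory; on people.apache.org, DIST would be the uima dist area and the group would be "incubator", per the rule Robert describes):

```shell
# Demo on a placeholder directory: group-writable, world read-only.
DIST=${DIST:-./dist-demo}
mkdir -p "$DIST"
touch "$DIST/example.zip"
# X keeps directories traversable without making plain files executable
chmod -R g+w,o+rX,o-w "$DIST"
# chgrp -R incubator "$DIST"   # on the real host, where that group exists
```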


   



Re: Where to put large generated things for our website to access

2007-12-30 Thread Marshall Schor
Robert Burrell Donkin wrote:
 (apologies for not jumping in promptly)

 On Dec 24, 2007 3:48 AM, Marshall Schor [EMAIL PROTECTED] wrote:
   
 I updated our website download page and documentation page.  I made the
 download page work with mirrors, and changed the format for accessing
 previous archived files to follow the common practice on other sites,
 referring to the archive.apache.org site.

 I made our documentation page refer to apache.org/dist/incubator/uima
 for the doc files - and didn't put any of these into our SVN for our
 website.
 

 after feeling a little uncertain about this, i asked the
 infrastructure team who gave some good arguments against storing docs
 in dist:

 1. rsync is good for large files but struggles with lots of small files
 2. mirrored documentation is not supported, so pushing all that content
 to the mirrors is wasteful
 3. released documentation should have an unchanging URL. when a
 release is archived, the documentation URL would need to change (a
 redirect would help people but not all robots).

 having release documentation permanently stored and archived is a good
 idea but it's strongly recommended that subversion is used. the zip'd
 archive is fine where it is but it would be better for the contents of
 the folders to be committed to subversion and then checked out to an
 appropriate place on the website.

 - robert
   
I felt uncertain about all of this, too.  It seems to me that the right
way to do this would be to have something like w.a.o/dist-not-mirrored/
... etc., where the same archive mechanism could be used as is used for
/dist/, but which doesn't do mirroring.  Has this come up before in
discussions - a way to have things that are not to be mirrored, but
which would reasonably be archived?  You might say that the docs don't
need to be archived (because they can always be extracted from an
archived release zip/tar), but I find having at least some older
versions of the docs quite useful in helping users running on a specific
level - I can say things like "see xxx on page yyy" and know it matches
their documentation.

It seems inefficient to store large generated things in SVN, such as the
javadocs (these are large numbers of small files) -- but I would be
happy to learn if I'm worrying about this unnecessarily.

I can see an argument against something like w.a.o/dist-not-mirrored/ -
avoiding creating even more infrastructure stuff. 

Other opinions / options? 

-Marshall


Re: Where to put large generated things for our website to access

2008-01-02 Thread Marshall Schor
Michael Baessler wrote:
 Robert Burrell Donkin wrote:
 subversion really is the way to go for release documentation
   
 OK, so as far as I understand, we go with the documentation the same
 way as with the previous Apache UIMA releases. We check in the
 documentation to SVN and provide a download similar to release
 2.2.0-incubating:
 http://incubator.apache.org/uima/downloads/releaseDocs/2.2.0-incubating/docs/html/index.html


 The JavaDocs will also go to SVN in both versions, HTML and zip.

 I can do the necessary changes, if all agree on that.

 -- Michael



+1.

Also - remove the docs from /dist/incubator/uima

-Marshall


Re: Where to put large generated things for our website to access

2008-01-02 Thread Marshall Schor
Michael Baessler wrote:
 Should we really remove the documentation from there? I think other
 projects also have their documentation there, so I think we should
 provide it too, maybe as one package to download?

 -- Michael
This one is a judgement call - I can see arguments on both sides.

We've heard that the rsync mechanism handles small numbers of large
files better than large numbers of small files - so putting only the 1
archive file to download seems a better fit, if we do this.

Putting them in /dist/ means they will be mirrored, and archived.  So we
will have dual archiving (one in SVN, and one in the archive spot). 

The mirroring would be useful *if* we expected a large load on the
apache servers for downloading these.  I think this will not be the
case.  Most of the time I use these to send people links to specific
sections of the docs; for that it would be annoying if when they clicked
the link, they were asked to pick a mirror.

Considering all of this - I'm slightly in favor of keeping the docs just
in SVN, and not on the /dist/ mirroring system.


Some more things to think about:

Since we want our docs pages to refer not only to the current release
but also previous releases, it would be good to figure out a fairly
automatic system for this.  (That was a virtue of the /dist/ - archive
system - we could point the previous releases doc links to a directory
containing all the releases, and wouldn't need to update this link for
subsequent releases). 

The other thing to do is to figure out how to keep the archived things
on our web-site.  It was suggested that we should do this like we handle
the web-site checkout.
Probably the straight-forward thing to do is to have a special directory
where all the docs we want to refer to live, have the archive link point
there, and have special links that refer to the current version.

As I recall, the web-site itself is replicated to other servers (since
after you update it, you have to wait a while for it to appear).

I have to confess that this seems quite wasteful of disk resources
(double+ copies of things like javadocs - one in SVN, one on
people.apache.org in our web-site place, and maybe (several?) additional
copies on web-servers used for incubator.apache.org/uima).

But Robert Donkin suggested this was the best way.

-Marshall






Re: [jira] Commented: (UIMA-677) improve MD5 and SHA1 checksum generation

2008-01-02 Thread Marshall Schor
Hi Thilo -

I forgot about that email trail :-)  The Eclipse update site is created
using an ant build script.  Is there a way to make the poms work for these?
That would be nicer than more build scripts.

-Marshall

Thilo Goetz wrote:
 Marshall,

 how does this relate to this mail trail:
 http://www.mail-archive.com/uima-dev%40incubator.apache.org/msg05057.html

 --Thilo

 Marshall Schor (JIRA) wrote:
   
 [ 
 https://issues.apache.org/jira/browse/UIMA-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12555343#action_12555343
  ] 

 Marshall Schor commented on UIMA-677:
 -

 I found these utilities on Linux (Suse 10) and Windows (via Cygwin):  
 sha1sum and md5sum.

 For signing the Eclipse update site (all jars need to be signed - since 
 they're archives) I wrote a small shell script. I also wrote one to 
 automatically check the signatures.

 If you put gpg into the path, the scripts should work.  I'll check them into 
 SVN.  I would suggest they be combined with the other signing script, and 
 the other signing script altered to use the sha1sum/md5sum utilities.




 
 improve MD5 and SHA1 checksum generation
 

 Key: UIMA-677
 URL: https://issues.apache.org/jira/browse/UIMA-677
 Project: UIMA
  Issue Type: Bug
  Components: Build, Packaging and Test
Reporter: Michael Baessler

 Comes up on the incubator mailing list:
 There are some problems with the MD5 and SHA1 files.
 For example, uimaj-2.2.1-incubating-bin.tar.bz2.md5:

 uimaj-2.2.1-incubating-bin.tar.bz2: 53 20 6A FB 75 1F 07 9D  BB 12 82 58 D0 7D CA 4B

 The hash is spread over two lines and into hex pairs. The normal
 format is either:
 53206afb751f079dbb128258d07dca4b
 or
 53206afb751f079dbb128258d07dca4b *uimaj-2.2.1-incubating-bin.tar.bz2
 The SHA1 checksums have the same problem.
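The expected one-line format is exactly what the stock utilities emit; a small demonstration with a throwaway file (the name is illustrative):

```shell
# md5sum's native output is already "hash  filename" on a single line.
FILE=demo.tar.bz2
echo "demo" > "$FILE"
md5sum "$FILE" > "$FILE.md5"
grep -Ec '^[0-9a-f]{32}  ' "$FILE.md5"   # prints 1: one line, 32 hex digits
```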



Re: [jira] Commented: (UIMA-677) improve MD5 and SHA1 checksum generation

2008-01-02 Thread Marshall Schor
Marshall Schor wrote:
 Hi Thilo -

 I forgot about that email trail :-)  The Eclipse update site is created
 using an ant build script.  Is there a way to make the poms work for these?
 That would be nicer than more build scripts.
   
Looking at the pom xml more carefully, I'm guessing it could be modified
to create sha1 and md5 for the eclipse update site.
I'll give it a try...

-Marshall



Re: [jira] Commented: (UIMA-677) improve MD5 and SHA1 checksum generation

2008-01-02 Thread Marshall Schor
Adding these lines in the checksum task to Thilo's pom version for
uimaj-distr worked:

  <fileset dir="../uimaj-eclipse-update-site/target/features">
    <include name="*.jar" />
  </fileset>
  <fileset dir="../uimaj-eclipse-update-site/target/plugins">
    <include name="*.jar" />
  </fileset>

I'll check in these changes to the uimaj-distr pom.  We still need to
add signing of eclipse update site jars.  I'll take a look at that.
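The signing step might end up as a loop like the following (a sketch: the directory layout is assumed from the filesets above, and it is written as a dry run - drop the echo to actually invoke gpg, which needs a configured key):

```shell
# Dry run: list the gpg command that would sign each update-site jar.
for jar in target/features/*.jar target/plugins/*.jar; do
  [ -e "$jar" ] || continue          # skip globs that matched nothing
  echo gpg --armor --detach-sign "$jar"
done
```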
-Marshall




Re: [jira] Commented: (UIMA-681) change UIMA version from 2.2.1-incubating to 2.3.0-incubating-SNAPSHOT

2008-01-03 Thread Marshall Schor
I posted a question on the maven users list asking for best practices
for addressing the updating-the-parent-link when the version changes, in
case there's something obvious we could be doing :-).

-Marshall

Michael Baessler (JIRA) wrote:
 [ 
 https://issues.apache.org/jira/browse/UIMA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1208#action_1208
  ] 

 Michael Baessler commented on UIMA-681:
 ---

 Great, seems to work. I guess I will do a mix between both suggestions. 
 But we still have to change the version number of the parent in each 
 child POM :-(

   
 change UIMA version from 2.2.1-incubating to 2.3.0-incubating-SNAPSHOT
 --

 Key: UIMA-681
 URL: https://issues.apache.org/jira/browse/UIMA-681
 Project: UIMA
  Issue Type: Task
  Components: Build, Packaging and Test
Affects Versions: 2.2.1
Reporter: Michael Baessler
Assignee: Michael Baessler
 Fix For: 2.3


 


   



Re: [jira] Closed: (UIMA-679) update UIMA website with release 2.2.1-incubating

2008-01-03 Thread Marshall Schor
Michael - can you announce UIMA 2.2.1 release on the various
announcement places?

-Marshall

Michael Baessler (JIRA) wrote:
  [ 
 https://issues.apache.org/jira/browse/UIMA-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
  ]

 Michael Baessler closed UIMA-679.
 -

 Resolution: Fixed

 I think all updates are done

   
 update UIMA website with release 2.2.1-incubating 
 --

 Key: UIMA-679
 URL: https://issues.apache.org/jira/browse/UIMA-679
 Project: UIMA
  Issue Type: New Feature
  Components: Transport Adapters - SOAP, Vinci
Reporter: Michael Baessler
Assignee: Michael Baessler

 


   



Re: Ready to announce the release ?

2008-01-03 Thread Marshall Schor
Michael Baessler wrote:

+1

I thought the Eclipse update site was documented in our manual, in the
section on setting up Eclipse: 
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.1-incubating/docs/html/overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup.install_uima_eclipse_plugins

Probably should have a side-bar link to it on our web-site, though.
-Marshall
 If all necessary updates are in place, I think we can announce the
 uimaj-2.2.1-incubating release.

 - The release artifacts are uploaded and work with the mirrors
 - The website is updated with the latest documentation
 - The release artifacts are uploaded to the Maven Incubator repository
 - The eclipse update site is in place (but currently not documented !?)

 -- Michael





Re: [jira] Commented: (UIMA-681) change UIMA version from 2.2.1-incubating to 2.3.0-incubating-SNAPSHOT

2008-01-03 Thread Marshall Schor
Marshall Schor wrote:
 I posted a question on the maven users list asking for best practices
 for addressing the updating-the-parent-link when the version changes, in
 case there's something obvious we could be doing :-).
   
The answer was:

you should try using the release feature of Maven :
http://maven.apache.org/plugins/maven-release-plugin

I looked at this, and it seems the release:prepare step does the following:

1) starts with the SVN state, 
2) updates the POMs for the release version number, 
3) runs the tests
4) commits the POMs
5) makes a TAG with those values
6) updates the POMs for the next -Snapshot 
7) commits those.

So - this might be worth trying... next release?
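Under the covers this is just plugin configuration; a minimal sketch of what it might look like in our parent POM (the tagBase URL is an assumption, and the plugin reads the repository location from the scm element):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-release-plugin</artifactId>
      <configuration>
        <!-- Where release tags are created; this path is illustrative -->
        <tagBase>https://svn.apache.org/repos/asf/incubator/uima/uimaj/tags</tagBase>
      </configuration>
    </plugin>
  </plugins>
</build>
```

With that in place, mvn release:prepare would run the numbered steps above, and mvn release:perform would check out the tag and build from it.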

-Marshall
 Michael Baessler (JIRA) wrote:
   
 [ 
 https://issues.apache.org/jira/browse/UIMA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1208#action_1208
  ] 

 Michael Baessler commented on UIMA-681:
 ---

 Great, seems to work. I guess I will do a mix between both suggestions. 
 But we still have to change the version number of the parent in each 
 child POM :-(

   
 
 change UIMA version from 2.2.1-incubating to 2.3.0-incubating-SNAPSHOT
 --

 Key: UIMA-681
 URL: https://issues.apache.org/jira/browse/UIMA-681
 Project: UIMA
  Issue Type: Task
  Components: Build, Packaging and Test
Affects Versions: 2.2.1
Reporter: Michael Baessler
Assignee: Michael Baessler
 Fix For: 2.3


 
   
   
 



   



Re: [Fwd: REMINDER: Board Reports Due THIS Week]

2008-01-07 Thread Marshall Schor

Hi Everyone -

I entered a start at a board report.  It needs some filling out -

Joern - can you add a line or two about progress in the CAS editor?

Michael and Thilo - you've been doing quite a bit of work in the sandbox 
projects - perhaps say something about progress here?


Any other additions appreciated!

-Marshall

Thilo Goetz wrote:

We're due to report this month.

 Original Message 
Subject: REMINDER: Board Reports Due THIS Week
Date: Sun, 6 Jan 2008 23:44:29 -0500
From: Noel J. Bergman [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Yes, yes, it seems early, but that's what happens when the 1st is a 
Tuesday.

:-)  All Board reports are due this week.

--- Noel



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






[Fwd: ASF grant for UIMA]

2008-01-07 Thread Marshall Schor
The Software Grant for UIMA-EE has been officially received by the 
secretary of Apache.  I'll proceed to 1) put in a Jira issue with a zip 
file for it, and 2) set up in the sandbox under

sandbox
   trunk
  uima-ee
 project 1
 project 2 etc.

the files for this.  As part of this, I plan to reconfigure the parts to 
follow the maven conventions.


-Marshall

 Original Message 
Subject:ASF grant for UIMA
Date:   Mon, 7 Jan 2008 10:37:10 -0500
From:   Jonathan Jagielski [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]



Hello,

I'm writing this email to inform you that the grant for UIMA from IBM 
was received. I'm sorry that you seem to have been kept out of the loop, 
as I replied to other emails saying that it had been received although 
it wasn't entered into the registry. This is probably my fault as I 
didn't make sure that the email I sent to others was received by you.


I'm very sorry that this has taken so long, and the grant should be 
recorded later today.


Sincerely,
Jonathan Jagielski




Re: [jira] Created: (UIMA-678) Update notice file

2008-01-08 Thread Marshall Schor
My understanding, from discussions and from reading 
http://people.apache.org/~rubys/3party.html, is that the principles for 
distributing things involving Eclipse are:


1) You can distribute binaries (but not sources) of things that are 
licensed under the EPL (Eclipse Public License, 
http://opensource.org/licenses/eclipse-1.0.php), as long as the notice 
file identifies these and provides a link to their source.  (This 
requirement comes from the EPL license.)


2) You cannot include in your distribution Eclipse sources - because 
that would require using the EPL as the license, not the Apache 
license.  The 3party page has this, though:


   For small amounts of source that is directly consumed by the ASF
   product at runtime in source form, and for which that source is
   unlikely to be changed anyway (say, by virtue of being specified by
   a standard), this action is sufficient. An example of this is the
   web-facesconfig_1_0.dtd (http://java.sun.com/dtd/web-facesconfig_1_0.dtd),
   whose inclusion is mandated by the JSR 127: JavaServer Faces
   specification (http://jcp.org/en/jsr/detail?id=127).

3) If you have source code which is a derivative work of Eclipse 
source, which can happen if you take an eclipse source file and modify 
it and incorporate the modified/customized file into your source, then 
that's a gray area I'm not too clear about.


Ignoring the version differences, what specific source code files are 
you incorporating?


-Marshall

Jörn Kottmann wrote:

what Eclipse SW does the CAS Editor include?


This depends on the eclipse version which is used to create the build.
The current eclipse version is 3.3.1.1.

The guys from the apache directory studio 
(http://directory.apache.org/studio/) also

do not include the version in the notice file.

Jörn





Re: [Fwd: ASF grant for UIMA]

2008-01-08 Thread Marshall Schor

Hi Robert -

I have some confusion about the IP form.  The page 
http://incubator.apache.org/ip-clearance/index.html seems to be written 
with an implicit assumption that a Top level project with a real, 
project level PMC is doing the receiving - so there are phrases like:


   The receiving PMC is responsible for doing the work. The Incubator
   is simply the repository of the needed information. Once a PMC
   directly checks-in a filled-out short form, the Incubator PMC will
   need to approve the paper work after which point the receiving PMC
   is free to import the code.

Other places say that this IP Clearance work needs to be done by an ASF 
Officer or Member:  for instance, on page 
http://incubator.apache.org/ip-clearance/ip-clearance-template.html it says:


   IP Clearance processing must be executed either by an Officer or a
   Member of the ASF.

So, my basic question is: does this process apply to incubator projects 
which, while incubating, receive additional code via a software grant, 
and if so, is the receiving PMC the Incubator PMC or the 
podling-learning-mode-unofficial-PMC (of which I think I am a member)?  
And, if we are to use the IP Clearance form, how do we have the 
processing executed by an Officer or Member of the ASF?


Thanks for your help and guidance, as usual :-)  -Marshall


Robert Burrell Donkin wrote:

On Jan 7, 2008 10:56 PM, Marshall Schor [EMAIL PROTECTED] wrote:
  

The Software Grant for UIMA-EE has been officially received by the
secretary of Apache.  I'll proceed to 1) put in a Jira issue with a zip
file for it, and 2) set up in the sandbox under
sandbox
trunk
   uima-ee
  project 1
  project 2 etc.

the files for this.  As part of this, I plan to reconfigure the parts to
follow the maven conventions.



remember to fill in the incubator IP clearance form :-)

- robert


  




Re: [Fwd: REMINDER: Board Reports Due THIS Week]

2008-01-09 Thread Marshall Schor
I vote for not duplicating things - can we find one link that would 
always be relevant?


-Marshall

Michael Baessler wrote:

It seems that we do not update our website with the Board Reports...
http://incubator.apache.org/uima/apache-board-status.html

Either we update it frequently or we remove the page.
Another possibility would be to link to the wiki with the Board Reports.

-- Michael

Thilo Goetz wrote:

We're due to report this month.

 Original Message 
Subject: REMINDER: Board Reports Due THIS Week
Date: Sun, 6 Jan 2008 23:44:29 -0500
From: Noel J. Bergman [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Yes, yes, it seems early, but that's what happens when the 1st is a 
Tuesday.

:-)  All Board reports are due this week.

--- Noel



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]








Re: [Fwd: REMINDER: Board Reports Due THIS Week]

2008-01-09 Thread Marshall Schor
How about just linking to the top page in the wiki for all the board 
reports?  The user would need one more click to pick the year and month, 
and then to scroll to, or search for, UIMA.


-Marshall

Michael Baessler wrote:
I don't think we'll find an official link that goes directly to the 
UIMA report.
When looking at the source of the Board Report wiki page you can 
construct a link like
http://wiki.apache.org/incubator/January2008#head-7d9a372767f91873c3e2c7152c445cc2adbb291e 

that directly links to the January 2008 report of UIMA. But I don't 
think this is a good idea... :-)


-- Michael

Marshall Schor wrote:
I vote for not duplicating things - can we find one link that would 
always be relevant?


-Marshall

Michael Baessler wrote:

It seems that we do not update our website with the Board Reports...
http://incubator.apache.org/uima/apache-board-status.html

Either we update it frequently or we remove the page.
Another possibility would be to link to the wiki with the Board 
Reports.


-- Michael

Thilo Goetz wrote:

We're due to report this month.

 Original Message 
Subject: REMINDER: Board Reports Due THIS Week
Date: Sun, 6 Jan 2008 23:44:29 -0500
From: Noel J. Bergman [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Yes, yes, it seems early, but that's what happens when the 1st is a 
Tuesday.

:-)  All Board reports are due this week.

--- Noel



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]














Re: [jira] Created: (UIMA-699) Fill out IP Clearance Form for UIMA-EE

2008-01-09 Thread Marshall Schor

Marshall Schor (JIRA) wrote:

I've filled out the IP Clearance form as far as I can.  Ken - can you 
fill in the rest and respond with any issues/concerns on the mailing 
list here?


The ip form is here: 
http://svn.apache.org/viewvc/incubator/uima/site/trunk/uima-website/xdocs/ip-clearances/uima-ee.xml?view=markup


-Marshall


Fill out IP Clearance Form for UIMA-EE
--

 Key: UIMA-699
 URL: https://issues.apache.org/jira/browse/UIMA-699
 Project: UIMA
  Issue Type: Task
  Components: Async Scaleout
Reporter: Marshall Schor
Assignee: Marshall Schor
Priority: Minor


Fill out the IP clearance form 
(http://incubator.apache.org/ip-clearance/ip-clearance-template.html ) and have an 
officer / member execute it.

  




startup issue with maven for uimaj-ee

2008-01-10 Thread Marshall Schor
For those of you who may try and build uimaj-ee in the sandbox, there is 
a 1-time maven startup problem.


We currently use a POM structure which has a common parent (in this 
case, it is uimaj-ee's POM).  The common parent factors out some common 
settings, like release numbers and formats. 

Therefore, the child POMs require the common parent in order to be 
processed.


The common parent also lists the child POMs in its modules element.  
When you do a mvn install on the parent, it builds the children.


So - the very first time you try this, the child POMs are read *before* 
the uimaj-ee POM has been installed to your local repo (currently as 
a snapshot).  The effect is that the mvn install of uimaj-ee fails: 
the child POMs can't be processed because their parent is missing 
(from the repository).


The work-around for now is to (1 time only) comment out the module 
elements in the modules section of uimaj-ee, then mvn install it (to 
your local repo).  Then you can uncomment out the module elements. and 
build normally.
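A stripped-down sketch of the structure involved (the artifactIds, module name, and version here are illustrative, not exactly what is in SVN):

```xml
<!-- Parent POM (uimaj-ee/pom.xml): lists the children as modules, -->
<!-- and the children in turn name this POM as their parent. -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.uima</groupId>
  <artifactId>uimaj-ee</artifactId>
  <version>0.7.0-incubating-SNAPSHOT</version>
  <packaging>pom</packaging>
  <modules>
    <module>uimaj-ee-core</module>  <!-- illustrative module name -->
  </modules>
</project>

<!-- Child POM (uimaj-ee-core/pom.xml): resolving this requires the -->
<!-- parent POM to be findable, e.g. already in the local repository. -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.uima</groupId>
    <artifactId>uimaj-ee</artifactId>
    <version>0.7.0-incubating-SNAPSHOT</version>
  </parent>
  <artifactId>uimaj-ee-core</artifactId>
</project>
```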


I'm not sure how to fix this in a better way.  One idea would be to put 
in the settings / configuration needed to upload SNAPSHOTs to the 
/www/people.apache.org/repo/m2-snapshot-repository/ 
(http://people.apache.org/repo/m2-snapshot-repository/) on p.a.o (see 
http://www.apache.org/dev/repository-faq.html - it says, in part:


   The /incubating/ repositories are for releases from projects within
   the Apache Incubator - incubating snapshots still go to the
   /snapshot/ repositories.

Is this something we should do?  My worry is that this would be an 
excessive load on p.a.o for every build anyone does.  Perhaps the better 
idea would be to just manually upload it once?  Anyone know the maven 
magic to do this (if so , please post)?


Of course, we would then need to configure the POMs or the local maven 
user settings to know about using the p.a.o's snapshot repo
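For the record, a sketch of what that configuration might look like in the parent POM (the repository id and the scp transport are assumptions; the matching server entry with credentials would go in each developer's settings.xml):

```xml
<distributionManagement>
  <snapshotRepository>
    <id>apache.snapshots</id>
    <!-- Deploys snapshots to the shared area on people.apache.org -->
    <url>scp://people.apache.org/www/people.apache.org/repo/m2-snapshot-repository</url>
  </snapshotRepository>
</distributionManagement>
```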

--
Another idea would be to change our POM hierarchy to split these two 
functions.  This seems like extra work/complexity, though.  Is there a 
maven parameter to temporarily ignore the module part when doing mvn 
install?

--
Other ideas?

-Marshall




Re: Incubator Eclipse Update Site How To

2008-01-10 Thread Marshall Schor

Robert Burrell Donkin wrote:

the incubator needs to document how to build a mirrored eclipse update
site. my eclipse-foo is not up to the task so i wondered if there'd
be any volunteers here in uimaland who'd be willing to help out either
by answering some questions or (even better) creating documentation
patches.

any volunteers?

- robert


  




Re: Incubator Eclipse Update Site How To

2008-01-10 Thread Marshall Schor
Well, this time I'll enter some text before pushing send :-)  I guess 
my mailing-foo or key-pushing-foo suffered a (hopefully) temporary 
breakdown...


I'll volunteer to do this.  Can you point me to where to stick the info, 
and any other protocol-ish things I should be sure to pay attention to?


-Marshall

Robert Burrell Donkin wrote:

the incubator needs to document how to build a mirrored eclipse update
site. my eclipse-foo is not up to the task so i wondered if there'd
be any volunteers here in uimaland who'd be willing to help out either
by answering some questions or (even better) creating documentation
patches.

any volunteers?

- robert


  




Re: startup issue with maven for uimaj-ee

2008-01-10 Thread Marshall Schor

Thilo Goetz wrote:



Marshall Schor wrote:
For those of you who may try and build uimaj-ee in the sandbox, there 
is a 1-time maven startup problem.

...

How do we do this in the core?  Aren't we using the same
mechanisms there?
Good question.  I did this experiment: 


   delete uimaj-ee from local maven repo, try building - get error.
   delete uimaj from local maven repo, try building - works!

So, I guess I'll do some differential analysis to see what's going 
on...  My guess is that I factored something that shouldn't be 
factored into the parent.


-Marshall


--Thilo






Re: startup issue with maven for uimaj-ee

2008-01-10 Thread Marshall Schor

Here's what differential analysis found:

The working uimaj POM had

   . . .
   <version>2.3.0-incubating-SNAPSHOT</version>
   <properties>
     <uimaj-version>2.3.0</uimaj-version>
     <uimaj-release-version>${uimaj-version}-incubating-SNAPSHOT</uimaj-release-version>
   . . .

I noticed that 2.3.0-incubating-SNAPSHOT was available as a property, so 
in the improved (but non-working :-) ) uimaj-ee POM it read:


   . . .
   <properties>
     . . .
     <uimaj-ee-version>0.7.0</uimaj-ee-version>
     <uimaj-ee-release-version>${uimaj-ee-version}-incubating-SNAPSHOT</uimaj-ee-release-version>
   . . .
   <version>${uimaj-ee-release-version}</version>

The fix was to not use a property substitution in the version element, 
and instead write out the uimaj-ee-release-version value literally.  This 
is probably a Maven defect - I'll ask about it (on the maven list).


-Marshall


Marshall Schor wrote:

Thilo Goetz wrote:



Marshall Schor wrote:
For those of you who may try and build uimaj-ee in the sandbox, 
there is a 1-time maven startup problem.

...

How do we do this in the core?  Aren't we using the same
mechanisms there?

Good question.  I did this experiment:
   delete uimaj-ee from local maven repo, try building - get error.
   delete uimaj from local maven repo, try building - works!

So, I guess I'll do some differential analysis to see what's going 
on...  My guess is that I factored something that shouldn't be 
factored into the parent.


-Marshall


--Thilo










Re: startup issue with maven for uimaj-ee

2008-01-10 Thread Marshall Schor

Marshall Schor wrote:
For those of you who may try and build uimaj-ee in the sandbox, there 
is a 1-time maven startup problem.


We currently use a POM structure which has a common parent (in this 
case, it is uimaj-ee's POM).  The common parent factors out some 
common settings, like release numbers and formats.
Therefore, the child POMs require the common parent in order to be 
processed.


The common parent also specifies in its modules element the child 
POMs.  When you do a mvn install on the parent - it builds the children.


So - the very first time you try this, the child POMs are read 
*before* the uimaj-ee's POM has been installed to your local repo 
(currently as a snapshot).  The effect is that the mvn install of 
uimaj-ee fails because the child POMs can't be processed because their 
parent is missing (in the repository).


The work-around for now is to (1 time only) comment out the module 
elements in the modules section of uimaj-ee, then mvn install it (to 
your local repo).  Then you can uncomment out the module elements. 
and build normally.
An easier work-around:  do mvn -N install. 

Maven command line arguments are documented nowhere (that I can find), 
but if you type mvn -? it tells you about this.


-Marshall


Re: startup issue with maven for uimaj-ee

2008-01-11 Thread Marshall Schor

How about this:

When it's time to generate a test build candidate, we do the basic 
release prepare process:


   change the 2.3.0-incubating-SNAPSHOT to 2.3.0-incubating
   save this as a tag in SVN using the candidate release name:
   2.3.0-rc1-incubating
   increment the base SVN to 2.4.0-incubating-SNAPSHOT

We then run tests, etc.
If we find a problem, we fix in the base, and do another release prepare 
for the next candidate:


   change the 2.4.0-incubating-SNAPSHOT to 2.3.0-incubating
   save this as a tag in SVN 2.3.0-rc2-incubating
   increment the base SVN to 2.4.0-incubating-SNAPSHOT

At some point we find we're satisfied; our last release candidate tag is 
then released;

SVN is already setup for the next level.

The only drawback I see with this is that it would conflate fixing 
release candidates with working on the next version.  We could fix that 
by incrementing the base to a version number that specifically included 
the release candidate info, such as 2.3.0-rc[n]-incubating-SNAPSHOT. 
Then, at the end, we'd need one more release:prepare step to


   update the poms to 2.3.0-incubating, tag to 2.3.0-incubating-release
   (or something like that), and then increment the poms to
   2.4.0-incubating-SNAPSHOT

Would this be a reasonable process?

-Marshall

Adam Lally wrote:

On Jan 11, 2008 8:56 AM, Marshall Schor [EMAIL PROTECTED] wrote:
  

It was also suggested that we use the maven release plugin to update
the version stuff. I think we should investigate that for our next release.




The thing that's always bugged me about the release plugin is that I
don't think it supports our usual mode of operation where we build a
release candidate, then people go off and do lots of manual testing on
it, it gets approved by the IPMC, etc., and then we want to release
exactly that release candidate.

AIUI, the release plugin builds the release from SVN, tags it, and
increments the versions for the next release, all at the same time.
So it doesn't seem to fit the above process.  If we rebuild the
release in this way, then we wouldn't be releasing _exactly_ the same
thing that had been tested and approved.  (I suppose we could diff it,
but even then I think timestamps end up in generated artifacts so it
isn't exactly the same.)

Maybe there's some way to run only the version-number-update part of
the release plugin, and not the other stuff.

-Adam


  




Re: capabilityLangugaeFlow - computeResultSpec

2008-01-11 Thread Marshall Schor

Michael -

I'm confused about how this test is setup.  The test descriptor this 
code uses loads an aggregate, and then runs a process method which ends 
up calling some dummy process method called SequencerTestAnnotator.  
This process method dumps (to a file) the result spec.  Is that the case 
you're running?


How do you turn on and off the (re)computation of the result spec?

-Marshall

Michael Baessler wrote:

Michael Baessler wrote:

Adam Lally wrote:
On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] 
wrote:
 
I tried to figure out how the ResultSpecification handling in 
uima-core

works with all side effects to check how it can be done
to detect when a ResultSpec has changed. Unfortunately I was not able
to; there are too many open questions where I don't know
exactly whether it is right in every case ... :-(

Adam can you please look at this issue?




I can try to take a look, but I don't have a lot of time.  Do you have
a test case for this, where you expect I would see a significant
performance improvement if I fix this?
  
Sorry, I have no performance test case. I checked my assumption using 
the debugger.


I used the following main() with a loop over the process call to 
check if the result spec is recomputed each time.
The descriptor is the same as used in the capabilityLanguageFlow test 
case of the uimaj-core project.

Maybe a sysout helps to detect if the unnecessary calls are done or not.

Maybe when iterating more than 10 times will give you performance 
numbers before and after. Maybe adding additional capabilities
that must be analyzed will increase the time used to compute the 
result spec. I will look at this tomorrow.


  public static void main(String[] args) {
    AnalysisEngine ae = null;
    try {
      String desc = "SequencerCapabilityLanguageAggregateES.xml";
      XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
      ResourceSpecifier specifier = UIMAFramework.getXMLParser()
          .parseResourceSpecifier(in);
      ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
      CAS cas = ae.newCAS();
      String text = "Hello world!";
      cas.setDocumentText(text);
      cas.setDocumentLanguage("en");
      for (int i = 0; i < 10; i++) {
        ae.process(cas);
      }
    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }

-- Michael
When setting the loop counter to 1000 I have 6000ms without 
recomputing the result spec and
27000ms when recomputing the result spec. I think this should be 
sufficient for testing.


-- Michael






Re: DOUBT FROM AN INDIAN STUDENT

2008-01-13 Thread Marshall Schor

Hi -

Can you please post more information so we might be better able to help?

For instance, what version of UIMA did you install, where did you get it 
from, what steps did you take when installing it, did you set up the 
environment variables as described in the README?


-Marshall

chandra sekhar wrote:

Respected sir , I am V.chandra sekhar , from INDIA doing MS in
Information Technology in DA-IICT  (One of finest Tech Schools in
INDIA), I am doing my internship  in UIMA, I installed UIMA SDK , but
i am not able to run  document analyzer.bat fle. i need to get
Document Analyzer window, but i didnt get. please help me in this
regard .

regards

v.chandra sekhar
PG - Student
DA-IICT
India


  




Re: DOUBT FROM AN INDIAN STUDENT

2008-01-13 Thread Marshall Schor

The path variable seems to show several possible problems.

There appear to be several installs of UIMA, possibly at different 
levels, from different sources, on your machine.  The PATH variable 
points to the following:


   C:\UIMA\bin;
   C:\Program Files\IBM\uima\bin;
   C:\uima\uima1\bin;
   C:\Program Files\Java\jdk1.5.0\bin;
   C:\TODAY\apache-uima\uimacpp\bin;
   C:\TODAY\apache-uima\uimacpp\examples\tutorial\src

 

Can you fix this so that the PATH variable excludes the other UIMA 
installs, and instead, points just to the one you installed?


Another thing that may be a problem is: as of version 2.2.1, Apache UIMA 
requires Java 5 or later to run.  I see in your path that you have 
Java 1.4.  Can you try fixing this too, and seeing if that helps?


-Marshall

Chandra Sekhar wrote:

Respected sir ,

I stored UIMA SDK in (TODAY folder) C:\TODAY\apache-uima\bin .

I set environment variable UIMA_HOME as C:\TODAY\apache-uima .  I set
this variable in system variables location .

This the error i am getting when i double clik on document.analyzer.bat file;


C:\TODAY\apache-uima\bin>setlocal

C:\TODAY\apache-uima\bin>call C:\TODAY\apache-uima\bin\setUimaClassPath

C:\TODAY\apache-uima\bin>set UIMA_CLASSPATH=;C:\TODAY\apache-uima\examples\resou
rces;C:\TODAY\apache-uima\lib\uima-core.jar;C:\TODAY\apache-uima\lib\uima-docume
nt-annotation.jar;C:\TODAY\apache-uima\lib\uima-cpe.jar;C:\TODAY\apache-uima\lib
\uima-tools.jar;C:\TODAY\apache-uima\lib\uima-examples.jar;C:\TODAY\apache-uima\
lib\uima-adapter-soap.jar;C:\TODAY\apache-uima\lib\uima-adapter-vinci.jar;\webap
ps\axis\WEB-INF\lib\activation.jar;\webapps\axis\WEB-INF\lib\axis.jar;\webapps\a
xis\WEB-INF\lib\commons-discovery.jar;\webapps\axis\WEB-INF\lib\commons-discover
y-0.2.jar;\webapps\axis\WEB-INF\lib\commons-logging.jar;\webapps\axis\WEB-INF\li
b\commons-logging-1.0.4.jar;\webapps\axis\WEB-INF\lib\jaxrpc.jar;\webapps\axis\W
EB-INF\lib\mail.jar;\webapps\axis\WEB-INF\lib\saaj.jar;C:\TODAY\apache-uima\lib\
jVinci.jar;;

C:\TODAY\apache-uima\bin>set PATH=C:\UIMA\bin;C:\Program Files\IBM\uima\bin;
C:\uima\uima1\bin;C:\Program Files\Java\jdk1.5.0\bin;C:\TODAY\apache-uima\uima
cpp\bin;C:\TODAY\apache-uima\uimacpp\examples\tutorial\src

C:\TODAY\apache-uima\bin>if "C:\j2sdk1.4.2_03" == "" (set UIMA_JAVA_CALL=java )
 else (set UIMA_JAVA_CALL=C:\j2sdk1.4.2_03\bin\java )

C:\TODAY\apache-uima\bin>C:\j2sdk1.4.2_03\bin\java -cp ;C:\TODAY\apache-uima\
examples\resources;C:\TODAY\apache-uima\lib\uima-core.jar;C:\TODAY\apache-uima\l
ib\uima-document-annotation.jar;C:\TODAY\apache-uima\lib\uima-cpe.jar;C:\TODAY\a
pache-uima\lib\uima-tools.jar;C:\TODAY\apache-uima\lib\uima-examples.jar;C:\TODA
Y\apache-uima\lib\uima-adapter-soap.jar;C:\TODAY\apache-uima\lib\uima-adapter-vi
nci.jar;\webapps\axis\WEB-INF\lib\activation.jar;\webapps\axis\WEB-INF\lib\axis.
jar;\webapps\axis\WEB-INF\lib\commons-discovery.jar;\webapps\axis\WEB-INF\lib\co
mmons-discovery-0.2.jar;\webapps\axis\WEB-INF\lib\commons-logging.jar;\webapps\a
xis\WEB-INF\lib\commons-logging-1.0.4.jar;\webapps\axis\WEB-INF\lib\jaxrpc.jar;\
webapps\axis\WEB-INF\lib\mail.jar;\webapps\axis\WEB-INF\lib\saaj.jar;C:\TODAY\ap
ache-uima\lib\jVinci.jar;; -Duima.home=C:\TODAY\apache-uima -Duima.datapath=
 -DVNS_HOST=localhost -DVNS_PORT=9000 -Djava.util.logging.config.file=C:\TODAY
\apache-uima\config\Logger.properties -Xms128M -Xmx800M org.apache.uima.tools.d
ocanalyzer.DocumentAnalyzer
The system cannot find the path specified.

C:\TODAY\apache-uima\bin>PAUSE
Press any key to continue . . .

sir , please give me solution for this.

regards

sekhar.


  




Re: DOUBT FROM AN INDIAN STUDENT

2008-01-13 Thread Marshall Schor

I see another problem - this is probably the direct problem.

On your machine, you have an environment variable called JAVA_HOME, and 
it is set to


C:\j2sdk1.4.2_03

However, it appears you no longer have Java installed there.

To fix, please install Java 5 (or 6 - these are the levels required for 
UIMA 2.2.1) and set the environment variable JAVA_HOME to where you 
installed it.


-Marshall

chandra sekhar wrote:

Respected sir ,

I stored UIMA SDK in (TODAY folder) C:\TODAY\apache-uima\bin .

I set environment variable UIMA_HOME as C:\TODAY\apache-uima .  I set
this variable in system variables location .

This the error i am getting when i double clik on document.analyzer.bat file;


C:\TODAY\apache-uima\bin>setlocal

C:\TODAY\apache-uima\bin>call C:\TODAY\apache-uima\bin\setUimaClassPath

C:\TODAY\apache-uima\binset UIMA_CLASSPATH=;C:\TODAY\apache-uima\examples\resou
rces;C:\TODAY\apache-uima\lib\uima-core.jar;C:\TODAY\apache-uima\lib\uima-docume
nt-annotation.jar;C:\TODAY\apache-uima\lib\uima-cpe.jar;C:\TODAY\apache-uima\lib
\uima-tools.jar;C:\TODAY\apache-uima\lib\uima-examples.jar;C:\TODAY\apache-uima\
lib\uima-adapter-soap.jar;C:\TODAY\apache-uima\lib\uima-adapter-vinci.jar;\webap
ps\axis\WEB-INF\lib\activation.jar;\webapps\axis\WEB-INF\lib\axis.jar;\webapps\a
xis\WEB-INF\lib\commons-discovery.jar;\webapps\axis\WEB-INF\lib\commons-discover
y-0.2.jar;\webapps\axis\WEB-INF\lib\commons-logging.jar;\webapps\axis\WEB-INF\li
b\commons-logging-1.0.4.jar;\webapps\axis\WEB-INF\lib\jaxrpc.jar;\webapps\axis\W
EB-INF\lib\mail.jar;\webapps\axis\WEB-INF\lib\saaj.jar;C:\TODAY\apache-uima\lib\
jVinci.jar;;

C:\TODAY\apache-uima\bin>set PATH=C:\UIMA\bin;C:\Program Files\IBM\uima\bin;
C:\uima\uima1\bin;C:\Program Files\Java\jdk1.5.0\bin;C:\TODAY\apache-uima\uima
cpp\bin;C:\TODAY\apache-uima\uimacpp\examples\tutorial\src

C:\TODAY\apache-uima\bin>if "C:\j2sdk1.4.2_03" == "" (set UIMA_JAVA_CALL=java )
 else (set UIMA_JAVA_CALL=C:\j2sdk1.4.2_03\bin\java )

C:\TODAY\apache-uima\bin>C:\j2sdk1.4.2_03\bin\java -cp ;C:\TODAY\apache-uima\
examples\resources;C:\TODAY\apache-uima\lib\uima-core.jar;C:\TODAY\apache-uima\l
ib\uima-document-annotation.jar;C:\TODAY\apache-uima\lib\uima-cpe.jar;C:\TODAY\a
pache-uima\lib\uima-tools.jar;C:\TODAY\apache-uima\lib\uima-examples.jar;C:\TODA
Y\apache-uima\lib\uima-adapter-soap.jar;C:\TODAY\apache-uima\lib\uima-adapter-vi
nci.jar;\webapps\axis\WEB-INF\lib\activation.jar;\webapps\axis\WEB-INF\lib\axis.
jar;\webapps\axis\WEB-INF\lib\commons-discovery.jar;\webapps\axis\WEB-INF\lib\co
mmons-discovery-0.2.jar;\webapps\axis\WEB-INF\lib\commons-logging.jar;\webapps\a
xis\WEB-INF\lib\commons-logging-1.0.4.jar;\webapps\axis\WEB-INF\lib\jaxrpc.jar;\
webapps\axis\WEB-INF\lib\mail.jar;\webapps\axis\WEB-INF\lib\saaj.jar;C:\TODAY\ap
ache-uima\lib\jVinci.jar;; -Duima.home=C:\TODAY\apache-uima -Duima.datapath=
 -DVNS_HOST=localhost -DVNS_PORT=9000 -Djava.util.logging.config.file=C:\TODAY
\apache-uima\config\Logger.properties -Xms128M -Xmx800M org.apache.uima.tools.d
ocanalyzer.DocumentAnalyzer
The system cannot find the path specified.

C:\TODAY\apache-uima\bin>PAUSE
Press any key to continue . . .

sir , please give me solution for this.

regards

sekhar.


  




Re: DOUBT FROM AN INDIAN STUDENT

2008-01-14 Thread Marshall Schor

Hi -

Please post what you have the JAVA_HOME environment variable set to.

It appears to be set to:

   C:\Program Files\\bin\

This doesn't look correct.

-Marshall

chandra sekhar wrote:

Respected sir , I set JAVA_HOME to C:Program Files only. Error in
document analyzer is solved,but there is an error in
adjustExamplePaths.bat file.

the error message is like this:


C:\Program Files\apache-uima\bin>setlocal

C:\Program Files\apache-uima\bin>if "C:\Program Files\" == "" (set UIMA_JAVA_CAL
L=java )  else (set UIMA_JAVA_CALL=C:\Program Files\\bin\java )

C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Fi
les\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringIn
Files C:\Program Files\apache-uima/examples\data .xml C:/Program_ Files/apach
e-uima C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.

C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Fi
les\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringIn
Files C:\Program Files\apache-uima/examples .classpath C:/Program Files/apach
e-uima C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.

C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Fi
les\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringIn
Files C:\Program Files\apache-uima/examples .launch C:/Program Files/apache-u
ima C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.

C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Fi
les\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringIn
Files C:\Program Files\apache-uima/examples .wsdd C:/Program Files/apache-uim
a C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.

C:\Program Files\apache-uima\bin>PAUSE
Press any key to continue . . .

please help me in this regard sir.


  




Re: DOUBT FROM AN INDIAN STUDENT

2008-01-14 Thread Marshall Schor

chandra sekhar wrote:

Respected Sir , now i set

 JAVA_HOME  variable to C:\Program Files
  
This still appears to be incorrect, I think.  Is there a local student 
or teacher at your university who can help you set up your JAVA_HOME 
variable to point to where Java 5 is installed?


I'm guessing that the Java installer might have installed Java at some 
place like:


   C:\Program Files\Java\jdk1.5.0_something

in which case your JAVA_HOME variable should be something like:

   C:\Program Files\Java\jdk1.5.0_something

(Of course 1.5.0_something is just an example, your actual install 
would have some number in place of the something).


-Marshall

   the error in document analyzer is solved, but there is a path not
found error in adjustExamplePaths.bat file.

The error message is like this.


C:\Program Files\apache-uima\bin>setlocal

C:\Program Files\apache-uima\bin>if "C:\Program Files\" == "" (set UIMA_JAVA_CAL
L=java )  else (set UIMA_JAVA_CALL=C:\Program Files\\bin\java )

C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Fi
les\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringIn
Files C:\Program Files\apache-uima/examples\data .xml C:/Program_ Files/apach
e-uima C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.

C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Fi
les\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringIn
Files C:\Program Files\apache-uima/examples .classpath C:/Program Files/apach
e-uima C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.

C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Fi
les\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringIn
Files C:\Program Files\apache-uima/examples .launch C:/Program Files/apache-u
ima C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.

C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Fi
les\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringIn
Files C:\Program Files\apache-uima/examples .wsdd C:/Program Files/apache-uim
a C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.

C:\Program Files\apache-uima\bin>PAUSE
Press any key to continue . . .


  




[DISCUSS] Naming for sandbox project for Asynchronous Scaleout

2008-01-14 Thread Marshall Schor
There is a new sandbox project, currently called uima-ee.  Should we 
change its name?


A suggested alternative: uima-as.

Some arguments pro / con changing the name:

Pro:

  1. uima-as goes with UIMA, Asynchronous Scaleout, and the name,
 therefore, more clearly matches the functionality.  This is good
 from the perspective of being clear and transparent to new
 users/developers.
  2. uima-ee has no official meaning; it came from a practice of
 labeling some products with these kinds of features as enterprise
 edition, such as J2EE.  This is kind of a marketing buzzword,
 without any specific semantics, and could be used to include other
 kinds of enterprise scale capabilities beyond asynchronous
 scaleout (so it is too broad for the current thing, at least).

Con:

  1. uima-ee is already in use; we'd have to do extra (but probably
 1-time) work to change it
  2. uima-ee is broader - so we could include additional enterprise
 scale capability, over time, in the new project, not related
 specifically to Asynchronous Scaleout.
  3. Written without the dash, uima-as becomes uima as, which is confusing
     (because as is a common English word, whereas uima ee has no
     such issue).
  4. It's always more work to change a name than you think.

There are probably other arguments pro / con, please post if significant :-)

Please register your opinions on doing this name change.   When you do, 
please also indicate the strength of your view and reasons for it :-) 
Except for the work, I'm slightly in favor of changing to uima-as.


-Marshall



Re: DOUBT FROM AN INDIAN STUDENT

2008-01-14 Thread Marshall Schor

chandra sekhar wrote:

Respected Sir ,

I didn't find any error messages while running both batch files.  But I didn't
get a window when I ran the documentAnalyzer.bat file.
  
I don't have a good idea what to suggest specifically.  Please see if 
you can get a professor, or another student at your university to take a 
look at your computer and setup and see if they can tell what's going 
wrong.


-Marshall



Workaround for maven eclipse:eclipse failure

2008-01-18 Thread Marshall Schor
If you run the eclipse:eclipse goal in the root POM project (uimaj), it 
runs, but doesn't do the right things.  It doesn't reliably set up 
linked resources in the .project file, and doesn't reliably set up the 
.classpath file with the proper entries for those linked resources.  The 
observed result is that you get compilation failures in Eclipse - saying 
it can't find things that are in the linked jars. 

The fix is to run eclipse:eclipse in the individual projects, not at the 
root project.
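A sketch of the workaround as a loop (the sub-project directory names here are examples; substitute the actual projects under uimaj). The leading echo makes it a dry run:

```shell
# Run the eclipse:eclipse goal per sub-project instead of from the root POM.
# Drop the leading 'echo' to actually invoke maven.
for proj in uimaj-core uimaj-cpe uimaj-tools; do
  echo mvn -f "$proj/pom.xml" eclipse:eclipse
done
```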


Also, beware of running eclipse:clean goal on the root project - it 
erases the .project file because it is cleaning, and then won't put it 
back (it sees this is a POM project, not a JAR project, and won't put 
.project files in the POM project).


If anyone has any insight on how to get the eclipse:eclipse goal to work 
from the root, that would be nice to hear.


-Marshall


Re: RESPECTED SIR , A DOUBT IN UIMA.

2008-01-19 Thread Marshall Schor

There are maybe two or three problems.

First - please check if you have a firewall that is blocking internet 
access for particular programs (many firewalls have configuration per 
program, and block out-bound internet access)  One way to check this is 
to turn off the firewall for a test, and see if the connections go thru.


Second, the update site for Eclipse EMF is documented on 
http://www.eclipse.org/modeling/emf/updates/ 
(Note - this is not the update site itself, but it tells you what the update 
site is.)  According to this, the update site is 
http://download.eclipse.org/modeling/emf/updates/site.xml


Third - if you downloaded and installed the most recent version of 
Eclipse, depending on which packaging you downloaded, it may already 
have EMF.  See the compare packages page on the download page for Eclipse: 
http://www.eclipse.org/downloads/moreinfo/compare.php


-Marshall

chandra sekhar wrote:

Respected Sir , I am sekhar from India.

Sir, when I am working with the Eclipse Modeling Framework, and I follow these
steps: Help -> Software Updates -> Find and Install etc.,

I am getting an error message like: network connection problems encountered
during search.  When I click the Details button of that window, these are the
details.


Network connection problems encountered during search.
  Unable to access "http://wiki.eclipse.org/EMF/Installation".
Unable to access site: "http://wiki.eclipse.org/EMF/Installation"
[Server returned HTTP response code: 403 Forbidden for URL:
http://wiki.eclipse.org/EMF/Installation.]
Server returned HTTP response code: 403 Forbidden for URL:
http://wiki.eclipse.org/EMF/Installation.
Unable to access site: "http://wiki.eclipse.org/EMF/Installation"
[Server returned HTTP response code: 403 Forbidden for URL:
http://wiki.eclipse.org/EMF/Installation.]
Server returned HTTP response code: 403 Forbidden for URL:
http://wiki.eclipse.org/EMF/Installation.

Can you suggest which site to use?
Please provide a solution for this.

  




Re: RESPECTED SIR , A DOUBT IN UIMA.

2008-01-20 Thread Marshall Schor
Did you install the UIMA plugins? 
If so, please do menu Window -> Show View -> Other -> PDE -> Plug-ins 
and verify that the plugin
org.apache.uima.desceditor is shown, without any error markers.  If 
there are error markers, please do
menu Window -> Show View -> Other -> PDE -> Plug-in Dependencies, and see 
if you can find org.apache.uima.desceditor and see if it is missing 
some required dependency.


-Marshall

chandra sekhar wrote:

Respected Sir,

While working with UIMA, I am running a productNumber-type example; I
have imported the project uima_examples. When I ran it using the document
analyzer, it showed a window. I gave the input and output directory paths,
clicked RUN, it ran, and as a result a new window appeared after the
run.

When I tried to create a new system descriptor file by right-clicking
descriptor -> New -> Other, I did not find the UIMA expander in that wizard. I
executed all previous steps successfully. I am getting Eclipse Modeling
Framework and Example EMF Model Creation Wizard, but not UIMA and Simple
expander in that wizard.

Regards,
sekhar

  




jar naming with/without versions

2008-01-20 Thread Marshall Schor

The basic maven build creates jars in the target directory with alternate names:
 project uimaj-core -> drop the j from uimaj, and don't suffix the 
version -> uima-core.jar


An exception to this is jVinci - jVinci becomes jVinci.jar (no version)

Another exception: when the uimaj-ep-runtime plugin is built, it has 
jars in it with the names:

  project uimaj-core -> uimaj-core-{version}.jar

These names have to match entries in the manifest.

I'm working on improvements to our maven POMs to further automate the 
builds.  I've been able to build the uimaj-ep-runtime jar so it contains 
the other jars, but it puts in the jars as named by the other projects, 
so it has, e.g., uima-core.jar (no j, and no version).


This would make our uimaj-ep-runtime internal jars follow our other jar 
naming conventions, and would reduce the need to have the uimaj-ep-runtime 
manifest updated to change version numbers.
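For reference, maven's standard way to control the built jar's file name (minus the version suffix) is the build finalName element; a sketch of what the uimaj-core POM could use (not necessarily what our POMs do today):

```xml
<!-- in uimaj-core/pom.xml: produce target/uima-core.jar
     instead of target/uimaj-core-{version}.jar -->
<build>
  <finalName>uima-core</finalName>
</build>
```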


Is this OK to do, or is there a reason we keep the uimaj-ep-runtime 
inner jars naming conventions different?


-Marshall




Re: RESPECTED SIR , A DOUBT IN UIMA.

2008-01-21 Thread Marshall Schor

Hi -

You need to add classpath entries for your project that refer to the 
UIMA jars in the lib directory of UIMA_HOME.
A simple way to do this is to open your project's properties (select the 
project, then do menu: Project - Properties) and select Java Build 
Path.  Then select the Libraries tab, and click on Add Variable.  If you 
haven't already done this, add a variable UIMA_HOME and set it to where 
you installed UIMA.  (to do this, if you need to, click Configure 
Variables...).  Select the UIMA_HOME variable, and click Extend... Then 
expand the lib folder, and select uima-core.jar.
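The result of those steps is a classpath-variable entry in the project's .classpath file; roughly (a sketch of what Eclipse generates, assuming the UIMA_HOME variable has been defined):

```xml
<!-- .classpath fragment: kind="var" entries resolve against
     the UIMA_HOME classpath variable -->
<classpathentry kind="var" path="UIMA_HOME/lib/uima-core.jar"/>
```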


-Marshall

chandra sekhar wrote:

Respected Sir ,
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.JCasRegistry;
import org.apache.uima.cas.impl.CASImpl;
import org.apache.uima.cas.impl.FSGenerator;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.impl.TypeImpl;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.impl.FeatureImpl;
import org.apache.uima.cas.Feature;
import org.apache.uima.jcas.tcas.Annotation_Type;

I'm getting the error: org.apache cannot be resolved.

regards
sekhar.

  




Re: jar naming with/without versions

2008-01-21 Thread Marshall Schor

Thanks Jörn.

I think that with some experimentation we can get the PDE nature to work 
properly - at least, that's my goal for now :-)  I'll do some more tests 
to see if I can come up with an approach which allows both maven and 
Eclipse to work.


Thanks. -Marshall


Jörn Kottmann wrote:


On Jan 21, 2008, at 10:58 AM, Michael Baessler wrote:

Is this OK to do, or is there a reason we keep the uimaj-ep-runtime 
inner jars naming conventions different?



The motivation was to add a PDE nature to the eclipse plugin projects 
with the maven eclipse plugin. The only way I found

to make it work was to add the version in the manifest.

The other side is that it never worked very well to add a PDE nature 
from the maven plugin; there were still some problems 
left and it does not work correctly, e.g. classpath problems in 
Eclipse, CAS Editor export does not work, etc.


I think it would be ok to remove the version number, remove the pde 
stuff from the POM file, and search for another way 
to address the pde project nature issue.

What do you think ?

Jörn







Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification

2008-01-21 Thread Marshall Schor

I'm doing a redesign for the result spec area to improve performance.

The basic idea is to put a hasBeenChanged flag into the result spec 
object, and use it being false to enable users to avoid recomputing 
things.
Why not use equals? Because a single result spec object is shared 
among multiple users, and when updated, the object is updated in place 
(so there is no other object to compare it to).
Looking at the ResultSpec object - it has a hashMap that stores the 
Types and Features (TypeOrFeature objects) as keys; the values are 
hashSets holding the languages for which these types and features are in the 
result spec.  (There is a special hash set having just the entry of the 
default language = UNSPECIFIED_LANGUAGE = x-unspecified.) 

I'm going to try and make the default language hash set a constant, and 
create just one instance of it - this should improve performance, 
especially when languages are not being used.
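A toy sketch of both ideas (class, field, and method names here are illustrative, not the actual uima-core API): a single shared, immutable default-language set, and a hasBeenChanged flag callers can poll instead of comparing objects:

```java
import java.util.*;

// Toy model of the proposed ResultSpecification changes.
// Class and member names are illustrative, not the actual uima-core API.
class ToyResultSpec {
    // One shared, immutable set for the common "no languages given" case,
    // instead of allocating a new hash set per entry.
    static final Set<String> DEFAULT_LANGS =
        Collections.unmodifiableSet(Collections.singleton("x-unspecified"));

    private final Map<String, Set<String>> tofToLangs = new HashMap<>();
    private boolean hasBeenChanged = false;

    void addResultType(String typeName) {
        tofToLangs.put(typeName, DEFAULT_LANGS);  // shared constant, no allocation
        hasBeenChanged = true;
    }

    // Users sharing this object poll this flag to skip recomputing derived
    // data; equals() can't help because updates happen in place.
    boolean checkAndClearChanged() {
        boolean changed = hasBeenChanged;
        hasBeenChanged = false;
        return changed;
    }

    Set<String> languagesFor(String typeName) {
        return tofToLangs.getOrDefault(typeName, Collections.emptySet());
    }
}
```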


There are 2 kinds of methods for adding types/features to a result spec:
ones with language(s) and ones without.

   The ones without reset any language spec associated with the type or
   feature(s) to the UNSPECIFIED_LANGUAGE.

   The ones with a language sometimes replace the language
   associated with the type/feature, and other times they add the
   language (assuming the type/feature is already an entry in the
   hashMap of types and features).

   methods which replace any existing languages:

   setResultTypesAndFeatures(array of TypeOrFeature)  -> replace with
   x-unspecified language
   setResultTypesAndFeatures(array of TypeOrFeature, languages)  ->
   replace with languages
   addResultTypeOrFeature(one TypeOrFeature)  -> replace
   with x-unspecified language
   addResultTypeOrFeature(one TypeOrFeature, languages)  -> replace with
   languages
   addResultType(String, boolean)  -> replace with x-unspecified
   language
   addResultFeature(one feature, languages)  -> replace with
   languages

   methods which add to existing languages:

   addResultType(one type, boolean, languages)  -> adds languages
   addResultFeature(one feature)  -> adds x-unspecified

The set... method essentially clears the result spec and sets it with 
completely new information, so it is reasonable that it replaces any 
existing language information.


The addResult methods, when used to add a type or feature which is already 
present, are inconsistent - with one method adding, and the others 
replacing.  This behavior is documented in the JavaDocs for the class.


The JavaDocs have the behavior for adding a Feature by name reversed 
from the behavior for adding a Type by name.  In one case, including the 
language is treated as a replace; in the other, as an add.  This seems 
likely to be a bug in the Javadocs.  The code for addResultFeature is 
reversed from the Javadocs: the code will add languages if specified, 
but replaces (with x-unspecified) if languages are not specified 
in the method call.


Does anyone know what the correct behavior of these methods is 
supposed to be?


-Marshall






Re: RESPECTED SIR , A DOUBT IN UIMA.

2008-01-22 Thread Marshall Schor

Perhaps this is just a couple of spelling errors?

1. Descriptor has an e (not an i) as its 2nd letter.
2. In the UIMA example descriptors, there is no 
ProductNumberDescriptor file.  Did you put one in there that you are 
trying to access?


-Marshall

chandra sekhar wrote:

Respected sir ,

I am implementing the pdf which I attached to this mail. I executed
correctly up to SPECIFY THE ANALYSIS ENGINE DESCRIPTOR.  While executing
SPECIFY THE ANALYSIS ENGINE DESCRIPTOR,
I am getting an error: An import cannot be resolved. No .xml file with file
name *ProductNumberDiscriptor.xml* was found in class path or data path.

*Note : I gave the name as ProductNumberAEDiscriptor.xml *

my data path is set to: C:\Program Files\IBM\uima\docs\examples\descriptors
my class path:

C:\Program Files\IBM\uima\bin;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program
Files\IBM\uima\docs\examples\descriptors\vinciService.


regards
sekhar.

  




Re: RESPECTED SIR , A DOUBT IN UIMA.

2008-01-22 Thread Marshall Schor
I do not know what your pdf file is.  Although you say in an earlier 
mail you attached it, attachments do not come through on this mailing 
list.  Can you describe the PDF file?  I looked through our UIMA PDFs 
and don't find this name. 

If this name is coming from some other download you have done from some 
other provider, please check to see that you have followed their 
instructions for installation.  See if you can locate that file.  Once 
you locate it, please add that path to your class path or data path.


-Marshall

chandra sekhar wrote:

Respected sir, I am just copying the names given in the pdf file.

my descriptors folder doesn't contain any file by the name
ProductNumberDescriptor. I am trying to access
ProductNumberAEDescriptor from my descriptor folder. Even so, I am
getting an error.

regards
sekhar.

  




Re: capabilityLangugaeFlow - computeResultSpec

2008-01-22 Thread Marshall Schor
The class CapabilityLanguageFlowObject has 2 defined constructors, but 
one is never used/referenced:

CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec)

Can this be removed?

-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-22 Thread Marshall Schor
If this is removed, or if it is never called, then there is a section of 
the logic in CapabilityLanguageFlowObject which is never used, because 
mNodeList == null:

if (mNodeList != null) {
  //  80 or so lines of code elided
}

Can this logic be removed?

-Marshall

Marshall Schor wrote:
The class CapabilityLanguageFlowObject has 2 defined constructors, but 
one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, ResultSpecification 
resultSpec)


Can this be removed?

-Marshall






Re: capabilityLangugaeFlow - computeResultSpec

2008-01-22 Thread Marshall Schor
In looking through the code for ResultSpecification_Impl, there seems 
to be an inconsistency - unless I (quite possibly :-) ) missed 
something.


The calls to the containsType(...) method operate in one of 2 ways, 
depending on whether or not the result specification has been compiled 
by calling the compile method.


If the result spec has not been compiled, then containsType(...) returns 
true iff the type specified is equal(...) to a type in the Result 
Specification.


If it has been compiled, then the containsType returns true iff the type 
specified is equal to a type *or any of its subtypes* in the Result 
Specification.  This is because compiling a resultSpecification adds the 
subtypes.


Can others confirm this?  In actual use within annotators, it may be 
that the result spec is always compiled before use (I haven't yet traced 
that down).


Should the code and Javadocs be updated to have containsType return true 
for subtypes of types in the result spec, always?
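The before/after-compile difference can be modeled with a toy sketch (names here are illustrative; the real ResultSpecification_Impl works on Type objects, not strings): containsType answers subtype queries only after compile has added the subtypes:

```java
import java.util.*;

// Toy model of the containsType difference described above: "compiling"
// closes the set of result types under subtyping.
class ToySpec {
    private final Set<String> types = new HashSet<>();

    void addResultType(String typeName) { types.add(typeName); }

    // compile(): add all (transitive) subtypes of every entry, given a
    // hypothetical type-hierarchy map of type -> direct subtypes.
    void compile(Map<String, List<String>> subtypesOf) {
        Deque<String> work = new ArrayDeque<>(types);
        while (!work.isEmpty()) {
            for (String sub : subtypesOf.getOrDefault(work.pop(), List.of())) {
                if (types.add(sub)) work.push(sub);
            }
        }
    }

    boolean containsType(String typeName) { return types.contains(typeName); }
}
```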


-Marshall



Re: capabilityLangugaeFlow - computeResultSpec

2008-01-22 Thread Marshall Schor
I'm thinking of simplifying the CapabilityContainer class.  Right now it 
has code to process input as well as output capabilities, but the input 
ones appear never to be used.  Can anyone confirm that?  If confirmed, I 
would propose to remove the part related to input capabilities.


There is a HashMap, outputToFCapability, whose keys are Strings 
corresponding to an output type-or-feature name, for any language, for 
any capability-set.  The values do not seem to be used.  I'd like to 
replace this with a hashSet.  Any objections?
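The map-to-set change is mechanical; a small sketch (the class, field, and method names below are hypothetical, not the actual CapabilityContainer members):

```java
import java.util.*;

// Sketch of the proposed simplification: when a HashMap's values are never
// read, a HashSet over the keys answers the same membership question.
class CapabilitySketch {
    private final Set<String> outputTofNames = new HashSet<>();

    void addOutput(String typeOrFeatureName) {
        outputTofNames.add(typeOrFeatureName);
    }

    boolean isOutput(String typeOrFeatureName) {
        return outputTofNames.contains(typeOrFeatureName);
    }
}
```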


-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor
Thanks.  I'll see about comparing the older method with the current 
method, to verify this.  -Marshall


Michael Baessler wrote:
In older UIMA versions the CapabilityLanguageFlowObject(List 
aNodeList, ResultSpecification resultSpec)  constructor was used when 
the result was set by an application using the process method with the 
resultSpec argument. In the current version it seems that only the 
version with the precomputed FlowTable is used. But I can't say if 
that is correct or not since I don't know the details about the 
ResultSpec restructuring (maybe only Adam knows). But you are right, 
if this constructor isn't necessary both, the code and the 
constructor, can be removed.


Seems that the architecture has changed here. :-)

-- Michael

Marshall Schor wrote:
If this is removed or if it is never called, then there is a section 
of the logic in CapabilityLanguageFlowObject which is never used, 
because mNodeList == null:



if (mNodeList != null) {
 //  80 or lines of code elided
}

Can this logic be removed?

-Marshall

Marshall Schor wrote:
The class CapabilityLanguageFlowObject has 2 defined constructors, 
but one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, ResultSpecification 
resultSpec)


Can this be removed?

-Marshall












Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor
OK.  This would confirm that the other constructor is no longer needed, 
since the test that passes a result-spec arg in the process method no 
longer calls that.


Thanks.  -Marshall

Michael Baessler wrote:
When looking at the tests for the capability language flow I see both 
tests one with the result spec argument in the process() method and 
one without.
In older UIMA versions, when using the debugger I see that both 
constructors are used there.


-- Michael

Marshall Schor wrote:
Thanks.  I'll see about comparing the older method with the current 
method, to verify this.  -Marshall


Michael Baessler wrote:
In older UIMA versions the CapabilityLanguageFlowObject(List 
aNodeList, ResultSpecification resultSpec)  constructor was used 
when the result was set by an application using the process method 
with the resultSpec argument. In the current version it seems that 
only the version with the precomputed FlowTable is used. But I can't 
say if that is correct or not since I don't know the details about 
the ResultSpec restructuring (maybe only Adam knows). But you are 
right, if this constructor isn't necessary both, the code and the 
constructor, can be removed.


Seems that the architecture has changed here. :-)

-- Michael

Marshall Schor wrote:
If this is removed or if it is never called, then there is a 
section of the logic in CapabilityLanguageFlowObject which is never 
used, because mNodeList == null:



if (mNodeList != null) {
 //  80 or lines of code elided
}

Can this logic be removed?

-Marshall

Marshall Schor wrote:
The class CapabilityLanguageFlowObject has 2 defined constructors, 
but one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, ResultSpecification 
resultSpec)


Can this be removed?

-Marshall



Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor

Easy to see- just trace the test case...  -Marshall

Michael Baessler wrote:
But it would still be interesting why this is never needed and how it 
works now.


-- Michael

Marshall Schor wrote:
OK.  This would confirm that the other constructor is no longer 
needed, since the test that passes a result-spec arg in the process 
method no longer calls that.


Thanks.  -Marshall

Michael Baessler wrote:
When looking at the tests for the capability language flow I see 
both tests one with the result spec argument in the process() method 
and one without.
In older UIMA versions, when using the debugger I see that both 
constructors are used there.


-- Michael

Marshall Schor wrote:
Thanks.  I'll see about comparing the older method with the current 
method, to verify this.  -Marshall


Michael Baessler wrote:
In older UIMA versions the CapabilityLanguageFlowObject(List 
aNodeList, ResultSpecification resultSpec)  constructor was used 
when the result was set by an application using the process method 
with the resultSpec argument. In the current version it seems that 
only the version with the precomputed FlowTable is used. But I 
can't say if that is correct or not since I don't know the details 
about the ResultSpec restructuring (maybe only Adam knows). But 
you are right, if this constructor isn't necessary both, the code 
and the constructor, can be removed.


Seems that the architecture has changed here. :-)

-- Michael

Marshall Schor wrote:
If this is removed or if it is never called, then there is a 
section of the logic in CapabilityLanguageFlowObject which is 
never used, because mNodeList == null:



if (mNodeList != null) {
 //  80 or lines of code elided
}

Can this logic be removed?

-Marshall

Marshall Schor wrote:
The class CapabilityLanguageFlowObject has 2 defined 
constructors, but one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, ResultSpecification 
resultSpec)


Can this be removed?

-Marshall




Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor

I did this trace.  Here's how it works now, without calling this.

The process(cas, result-spec) call goes to AggregateAnalysisEngine_Impl 
which calls setResultSpecification on the AEEngine_impl object, which

1) clones the result-spec object
2) adds capabilities to it from the *inputs* of all components of this 
aggregate
3) uses this one cloned object as the result spec passed down to each 
component.


Before going further - Michael - a question: isn't this 
union-with-all-inputs behavior something you didn't want for capability 
language flow?


Maybe it doesn't matter in that the use of capability language flow is 
not done in the real application use cases by passing the result spec in 
the top level call to the process method of the analysis engine?


-Marshall

Marshall Schor wrote:

Easy to see- just trace the test case...  -Marshall

Michael Baessler wrote:
But it would still be interesting why this is never needed and how it 
works now.


-- Michael

Marshall Schor wrote:
OK.  This would confirm that the other constructor is no longer 
needed, since the test that passes a result-spec arg in the process 
method no longer calls that.


Thanks.  -Marshall

Michael Baessler wrote:
When looking at the tests for the capability language flow I see 
both tests one with the result spec argument in the process() 
method and one without.
In older UIMA versions, when using the debugger I see that both 
constructors are used there.


-- Michael

Marshall Schor wrote:
Thanks.  I'll see about comparing the older method with the 
current method, to verify this.  -Marshall


Michael Baessler wrote:
In older UIMA versions the CapabilityLanguageFlowObject(List 
aNodeList, ResultSpecification resultSpec)  constructor was used 
when the result was set by an application using the process 
method with the resultSpec argument. In the current version it 
seems that only the version with the precomputed FlowTable is 
used. But I can't say if that is correct or not since I don't 
know the details about the ResultSpec restructuring (maybe only 
Adam knows). But you are right, if this constructor isn't 
necessary both, the code and the constructor, can be removed.


Seems that the architecture has changed here. :-)

-- Michael

Marshall Schor wrote:
If this is removed or if it is never called, then there is a 
section of the logic in CapabilityLanguageFlowObject which is 
never used, because mNodeList == null:



if (mNodeList != null) {
 //  80 or so lines of code elided
}

Can this logic be removed?

-Marshall

Marshall Schor wrote:
The class CapabilityLanguageFlowObject has 2 defined 
constructors, but one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, 
ResultSpecification resultSpec)


Can this be removed?

-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor
Here's the trace of how this works, when run from a top level 
process(cas) call:


1) the call goes to the AnalysisEngine_Impl process method, which calls 
processAndOutputNewCASes in the same object.  This calls the ASB_impl 
process method, which creates a new AggregateCasIterator(aCAS).  This 
constructor calls computeFlow on the ...asb.impl.FlowControllerContainer 
object.  This calls the particular flow controller's computeFlow 
method.  In this case, the flowController is the 
CapabilityLanguageFlowController.  Since this is a new CAS coming into the 
aggregate, the computeFlow method makes a new 
CapabilityLanguageFlowObject, passing in the pre-computed Flow Table.

So that's how it uses this constructor, in the case where no specific 
result spec is passed.


-Marshall

Marshall Schor wrote:

Easy to see- just trace the test case...  -Marshall

Michael Baessler wrote:
But it would still be interesting why this is never needed and how it 
works now.


-- Michael

Marshall Schor wrote:
OK.  This would confirm that the other constructor is no longer 
needed, since the test that passes a result-spec arg in the process 
method no longer calls that.


Thanks.  -Marshall

Michael Baessler wrote:
When looking at the tests for the capability language flow I see 
both tests one with the result spec argument in the process() 
method and one without.
In older UIMA versions, when using the debugger I see that both 
constructors are used there.


-- Michael

Marshall Schor wrote:
Thanks.  I'll see about comparing the older method with the 
current method, to verify this.  -Marshall


Michael Baessler wrote:
In older UIMA versions the CapabilityLanguageFlowObject(List 
aNodeList, ResultSpecification resultSpec)  constructor was used 
when the result was set by an application using the process 
method with the resultSpec argument. In the current version it 
seems that only the version with the precomputed FlowTable is 
used. But I can't say if that is correct or not since I don't 
know the details about the ResultSpec restructuring (maybe only 
Adam knows). But you are right, if this constructor isn't 
necessary both, the code and the constructor, can be removed.


Seems that the architecture has changed here. :-)

-- Michael

Marshall Schor wrote:
If this is removed or if it is never called, then there is a 
section of the logic in CapabilityLanguageFlowObject which is 
never used, because mNodeList == null:



if (mNodeList != null) {
 //  80 or so lines of code elided
}

Can this logic be removed?

-Marshall

Marshall Schor wrote:
The class CapabilityLanguageFlowObject has 2 defined 
constructors, but one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, 
ResultSpecification resultSpec)


Can this be removed?

-Marshall


Re: UIMA Sandbox releases

2008-01-23 Thread Marshall Schor

Re: releasing the Cas Editor - with or without some pre-packaged annotators.

I suspect that Joern would be willing to be the release manager for 
this :-).  He may even be willing to bundle some of the more stable 
sandbox components with it, but certainly not uima-as (uima-ee), which 
is not ready.


The pragmatic, least-work approach would be to pick those sandbox 
projects that would be ready now, and do one release packaging that 
included the Cas Editor.  However, I don't think that's the clearest 
approach for our users.  I think they might like to see bundles arranged 
by topics - and so might like a bundle of annotators, and might 
separately like the Cas Editor.


So - my preference for now would be to keep the Cas Editor as a 
separately packaged thing coming from the project.  If we get additional 
tools, over time, which we consider add-ons and not fundamentally 
needed as part of the core, then perhaps we can have a tools-bundle.


To do this effectively using the Maven way - we might want to have 
each tool (in one project) produce one jar (the Maven way: each project 
= one jar), at a particular version level.  These would be available 
in the Maven jar repository, and Maven tooling could be used to fetch 
them.   Maven assemblies could then be used to package several of 
these into bigger bundles.  A basic idea here is that the version of 
the assembly would be on a different schedule than the components.  So 
someone downloading an assembled bundle would get parts, each with its 
own version number.  This is similar to what you get with other big 
projects that include jars from other sources.  Parts which are stable 
and not changing would not have their version numbers incremented in 
the assembled bundle.
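A bundle along these lines could be described with a Maven assembly descriptor. This is only a hedged sketch of the idea - the artifact coordinates and the assembly id below are hypothetical, not actual UIMA artifacts:

```xml
<!-- Hypothetical assembly descriptor: the bundle carries its own
     version, while each included tool jar keeps its component version -->
<assembly>
  <id>uima-tools-bundle</id>
  <formats>
    <format>zip</format>
  </formats>
  <dependencySets>
    <dependencySet>
      <outputDirectory>lib</outputDirectory>
      <!-- illustrative artifact names only -->
      <includes>
        <include>org.example.uima:cas-editor</include>
        <include>org.example.uima:annotator-bundle</include>
      </includes>
    </dependencySet>
  </dependencySets>
</assembly>
```

Each jar listed in a dependencySet is pulled from the repository at whatever version the project POM declares, which is what gives the assembly and its parts independent version schedules.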


-Marshall


Michael Baessler wrote:

Marshall Schor wrote:

Thilo Goetz wrote:
 

Hi Marshall,

as usual, my view is pretty much the exact opposite ;-)

First of all, I don't see the sense in creating yet another
category.  To my mind, there's nothing wrong with having
mature components in the sandbox.  The only thing I would
consider is to move some sandbox components that are really
important to people into the core.
  

I think people might feel that the sandbox isn't a place to get
production-quality things, and I was hoping that some of these
components were production-quality :-)   
I think Thilo raised a good point here. We still have an empty 
framework that does not provide any linguistic functionality out of 
the box. So maybe we should think about moving sandbox components that 
are ready to use and are important for most UIMA users to the 
core. We could then also provide some more out-of-the-box analytics by 
combining the components.


For all the other Sandbox components that are ready to use but are not 
relevant for most UIMA users, we can consider doing a separate 
release for each component. I expect the release cycles are longer for 
those components, so we would not have so many Sandbox component 
releases.


Opinions?

-- Michael




Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor

Eddie - this is for you to check I think:

There is code in UimacppEngine in method serializeResultSpecification 
which adds result spec types and features to 2 IntVector arrays (one for 
Types, one for Features).  As currently designed, these miss getting 
the subtypes of types, and all the features for types marked with the 
all-features flag in the capabilities. 

Are these required here? 

Also, I notice that the result spec supports languages - but the 
serialization for this doesn't support languages.  Is that intended?


-Marshall


Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification

2008-01-23 Thread Marshall Schor
I'll fix the Javadocs to correspond to what the code does.  This will 
have the result that
  addResultFeature(1-feature, languages) will *add* to the existing 
languages, while
  addResultFeature(1-feature) will *replace* all existing languages 
with x-unspecified.


-Marshall


Marshall Schor wrote:

I'm doing a redesign for the result spec area to improve performance.

The basic idea is to put a hasBeenChanged flag into the result spec 
object, and use it being false to enable users to avoid recomputing 
things.
Why not use equals()? Because a single result spec object is shared 
among multiple users, and when updated, the object is updated in place 
(so there is no other object to compare it to).
Looking at the ResultSpec object - it has a hashMap that stores the 
Types and Features (TypeOrFeature objects) as the keys; the values are 
hashSets holding languages for which these types and features are in 
the result spec.  (There is a special hash set having just the entry 
of the default language = UNSPECIFIED_LANGUAGE = x-unspecified).
I'm going to try and make the default language hash set a constant, 
and create just one instance of it - this should improve performance, 
especially when languages are not being used.


There are 2 kinds of methods to add types/features to a result spec:  
ones with language(s) and ones without.

   The ones without reset any language spec associated with the type or
   feature(s) to the UNSPECIFIED_LANGUAGE.

   The ones with a language, sometimes replace  the language
   associated with the type/feature, and other times, they add the
   language (assuming the type/feature is already an entry in the
   hashMap of types and features).

   methods which replace any existing languages:

   setResultTypesAndFeatures(array of TypeOrFeature)            repl with
   x-unspecified language
   setResultTypesAndFeatures(array of TypeOrFeature, languages) repl with
   languages
   addResultTypeOrFeature(1-TypeOrFeature)                      repl with
   x-unspecified language
   addResultTypeOrFeature(1-TypeOrFeature, languages)           repl with
   languages
   addResultType(String, boolean)                               repl with
   x-unspecified language
   addResultFeature(1-feature, languages)                       repl with
   languages

   methods which add to existing languages:

   addResultType(1-type, boolean, languages)  adds languages
   addResultFeature(1-feature)   adds x-unspecified

The set... method essentially clears the result spec and sets it 
with completely new information, so it is reasonable that it replaces 
any existing language information.
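The replace-vs-add distinction above can be modeled with a map from a type-or-feature name to its set of languages. This is a toy sketch with illustrative names (LangSpecModel is not the real UIMA class), just to make the two flavors concrete:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the language bookkeeping described above: a map from
// type-or-feature name to the set of languages for which it is in the
// result spec.  Names are illustrative, not the real UIMA API.
class LangSpecModel {
    static final String X_UNSPECIFIED = "x-unspecified";
    final Map<String, Set<String>> tofToLangs = new HashMap<>();

    // "replace" flavor: any languages already recorded for the entry
    // are discarded
    void replaceLanguages(String tof, Set<String> langs) {
        tofToLangs.put(tof, new HashSet<>(langs));
    }

    // "add" flavor: languages are unioned with any existing entry
    void addLanguages(String tof, Set<String> langs) {
        tofToLangs.computeIfAbsent(tof, k -> new HashSet<>()).addAll(langs);
    }

    // flavor with no language argument: per the thread, this resets the
    // entry to the single default language
    void replaceWithDefault(String tof) {
        replaceLanguages(tof, Set.of(X_UNSPECIFIED));
    }
}
```

Calling addLanguages twice accumulates languages, while any replace flavor wipes out what was there before.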


The addResult methods, when used to add a type or feature which is 
already present, are inconsistent - with one method adding, and the 
others replacing. This behavior is documented in the JavaDocs for the 
class.


The JavaDocs have the behavior for adding a Feature by name reversed 
with the behavior for adding a Type by name.  In one case, including 
the language is treated as a replace, in the other as an add.  This 
seems likely a bug in the Javadocs. The code for the addResultFeature 
is reversed from the Javadocs: the code will add languages if 
specified, but replaces (with the x-unspecified) if languages are 
not specified in the method call.


Does anyone know what the correct behavior of these methods is 
supposed to be?


-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-23 Thread Marshall Schor

Some corner cases.

Case 1:  If using the method to alter an existing result spec by adding 
a single type with an associated set of languages,  the passed in 
allAnnotatorFeatures boolean will now be unioned with any existing 
setting of this.  Javadocs updated to reflect this.


Case 2: If you have a capability for language 1 which says output type A 
(not all features), and have another capability for language 2 which 
says output type A (allAnnotatorFeatures), this will be represented in 
the result spec by having language 1 also be for all features.


Case 3: when setting the result spec, passing null in as the value of 
the languages (for those set/add things that take language arrays) will 
be equivalent to passing in the one language x-unspecified.  So, in 
particular, if a spec says produce type A for lang 1 and 2, and then you 
use the addResultType(for type A, null-passed-in-for-language-spec) this 
will add the language x-unspecified for type A. 
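Corner case 3 amounts to a small normalization step on the language argument. A minimal sketch, with a hypothetical helper name (normalizeLangs is not the real UIMA API):

```java
import java.util.Set;

// Sketch of corner case 3 above: a null (or empty) language array is
// treated as the single language x-unspecified.  Illustrative helper,
// not the real UIMA API.
class LangDefaulting {
    static final String X_UNSPECIFIED = "x-unspecified";

    static Set<String> normalizeLangs(String[] languages) {
        if (languages == null || languages.length == 0) {
            return Set.of(X_UNSPECIFIED);  // default language stands in for "no languages given"
        }
        return Set.of(languages);
    }
}
```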

I will attempt to document these in the Javadocs.  Please post a 
response if these corner cases need to be handled differently.


-Marshall


Re: RESPECTED SIR , A DOUBT IN UIMA.

2008-01-23 Thread Marshall Schor

Hi -

From another email you sent, I see you got by this error.  What did you 
do to resolve this one?


-Marshall

chandra sekhar wrote:



Respected sir ,

I am implementing the pdf which i am attached with this mail. I 
executed correctly upto SPECIFY THE ANALYSIS ENGINE  DESCRIPTOR.  
While executing

SPECIFY THE ANALYSIS ENGINE  DESCRIPTOR  .
I am getting an error "An import cannot be resolved. No .xml file with 
file name *ProductNumberDiscriptor.xml* was found in class path or 
data path."



*Note: I gave the name as ProductNumberAEDiscriptor.xml*

my data path is set to : C:\Program Files\IBM\uima\docs\examples
\descriptors
my class path  :

C:\ProgramFiles\IBM\uima\bin;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program 
Files\IBM\uima\docs\examples\descriptors\vinciService.



regards
sekhar.





Re: RESPECTED SIR , A DOUBT IN UIMA.

2008-01-23 Thread Marshall Schor
For others wanting to follow this, the references to the "right side" 
and "left side" refer to the CAS Visual Debugger tool.  The PDF he 
refers to is the article from IBM developerWorks, a tutorial on 
creating UIMA applications by Nicholas Chase, in case you want to 
search for it. 

What this appears to be is that the annotator is not receiving any input 
to annotate.  Please check the previous step where you are asked to 
specify the data to be analyzed.  What did you specify?  Does that file 
actually exist?


-Marshall

 Copy below is of email sent to me and Tong Fin 23 Jan 2008, 11:23 AM

Respected Sir ,

I am implementing the pdf file attached with this mail.  I implemented 
every thing well and without errors upto* SPECIFY THE ANALYSIS 
DESCRIPTOR . *when I run the debuger , my text file is not appearing on 
the right side. I am also attaching the ProductNumberAnnotator java 
fiile I am using.   when I execute the command Run  Run 
ProductNumberAEDescriptor in debugger window. I am getting these values 
in left side of debugger .


Annotation Index [1]
  uima.tcas.Annotation[1]
  uima.tcas.DocumentAnnotation
  com.backstopmedia.uima,tutorial.ProductNumber[0]
sofalIndex [0].

Sir ,please give me a solution for this.



Re: [DISCUSS] Naming for sandbox project for Asynchronous Scaleout

2008-01-23 Thread Marshall Schor
OK, without further ado, we change the name to uima-as.  At some point 
when I get a moment, I'll enter a Jira, assign it to me, and rename the 
uimaj-ee things in SVN to uimaj-as in the sandbox.  After I do that, 
I'll notify everyone...  by posting here again.  (If some other 
committer wants to do this, that's fine, too...)


-Marshall

Jörn Kottmann wrote:

+1

Jörn

On Jan 21, 2008, at 10:49 AM, Michael Baessler wrote:


+1 for changing the name to uima-as.

I think a clear and transparent name is very important, so that people 
get interested in it and work with it.
It is also better and easier to integrate into the core if we decide to 
move it from the Sandbox to the core any time in the future.


-- Michael


Marshall Schor wrote:
There is a new sandbox project, currently called uima-ee.  Should we 
change its name?


A suggested alternative uima-as.

Some arguments pro / con changing the name

Pro:

 1. uima-as goes with UIMA, Asynchronous Scaleout, and the name,
therefore, more clearly matches the functionality.  This is good
from the perspective of being clear and transparent to new
users/developers.
 2. uima-ee has no official meaning; it came from a practice of
labeling some products with these kinds of features as "enterprise
edition", such as J2EE.  This is kind of a marketing buzzword,
without any specific semantics, and could be used to include other
kinds of enterprise-scale capabilities beyond asynchronous
scaleout (so it is too broad for the current thing, at least).

Con:

 1. uima-ee is already in use; we'd have to do extra (but probably
1-time) work to change it
 2. uima-ee is broader - so we could include additional enterprise
scale capability, over time, in the new project, not related
specifically to Asynchronous Scaleout.
 3. Written without the dash, uima-as becomes "uima as" and is confusing
(because "as" is a common English word, whereas "uima ee" has no
such issue).
 4. It's always more work to change a name than you think

There are probably other arguments pro / con, please post if 
significant :-)


Please register your opinions on doing this name change.   When you 
do, please also indicate the strength of your view and reasons for 
it :-) Except for the work, I'm slightly in favor of changing to 
uima-as.


-Marshall



Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Marshall Schor
Without actually testing this (so this may be a wrong conclusion) - it 
seems to me that the code in CapabilityLanguageFlowController that sets 
up the result specs for components, by language, in the mFlowTable, 
ignores the typesOrFeatures that the result spec adds when compile() is 
called.


If you recall, the compile method for results specifications augments 
the set of types/features by doing 2 things:  if the type has 
allAnnotatorFeatures=true, it adds all the features of the type; and if 
the type has subtypes, it adds those too, propagating the 
allAnnotatorFeatures processing down.
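The compile() expansion described above can be sketched as a worklist traversal over a type hierarchy. This is an illustrative model with hypothetical names (CompileSketch is not the real UIMA code), assuming simple maps for subtypes and features:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy sketch of the compile() expansion: for each type marked
// allAnnotatorFeatures=true, add all its features, and add all its
// subtypes, propagating the flag down.  Names are illustrative.
class CompileSketch {
    static Set<String> compile(Set<String> typesWithAllFeatures,
                               Map<String, List<String>> subtypes,
                               Map<String, List<String>> features) {
        Set<String> result = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(typesWithAllFeatures);
        while (!work.isEmpty()) {
            String t = work.pop();
            if (!result.add(t)) continue;  // already expanded
            // allAnnotatorFeatures: include every feature of the type
            for (String f : features.getOrDefault(t, List.of())) {
                result.add(t + ":" + f);
            }
            // subtypes inherit membership and the flag
            work.addAll(subtypes.getOrDefault(t, List.of()));
        }
        return result;
    }
}
```

With a type A having subtype ASub, compiling {A} yields A, A's features, ASub, and ASub's features - which is exactly what the mFlowTable construction would miss if it never compiles the spec.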


A consequence would be that the mFlowTable would miss these cases:

  An aggregate wants type A output, and has a delegate with output 
capability A-subtype.


  An aggregate wants Feature F output, and has a delegate with output 
capability type-A with allAnnotatorFeatures marked, having that feature.


Can anyone confirm this?  (perhaps adding a test case :-) )?

Michael - do you know what the design intent was for this - if things 
are as I've conjectured above, is this something that needs to be fixed, 
or is it working as intended?


-Marshall




Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Marshall Schor
What about allAnnotatorFeatures?  Suppose the aggregate says it needs a 
particular Feature of a particular type.  Suppose a delegate is marked 
as producing that type, and has allAnnotatorFeatures marked.  This 
wouldn't work. 

You could say in this case that the output capability of the delegate 
*must not* rely on allAnnotatorFeatures, but instead *must* explicitly 
list those features it produces.  In one sense, this could be a good 
idea, because no delegate could *accurately* mark that it outputs 
allAnnotatorFeatures, anyway, due to the possibility that some other 
component could add features to the type in question, completely unknown 
to this delegate - and of course, this delegate would not be setting 
those other features.


This would lead to another question - should we deprecate 
allAnnotatorFeatures because of this?


-Marshall

Michael Baessler wrote:

Marshall Schor wrote:
Without actually testing this (so this may be a wrong conclusion) - 
it seems to me that the code in CapabilityLanguageFlowController that 
sets up the result specs for components, by language, in the 
mFlowTable, ignores the typesOrFeatures that the result spec adds 
when compile() is called.


If you recall, the compile method for results specifications augments 
the set of types/features by doing 2 things:  if the type has 
allAnnotatorFeatures=true, it adds all the features of the type; and 
if the type has subtypes, it adds those too, propagating the 
allAnnotatorFeatures processing down.


A consequence would be that the mFlowTable would miss these cases:

  An aggregate wants type A output, and has a delegate with output 
capability A-subtype.


  An aggregate wants Feature F output, and has a delegate with output 
capability type-A with allAnnotatorFeatures marked, having that feature.


Can anyone confirm this?  (perhaps adding a test case :-) )?

Michael - do you know what the design intent was for this - if things 
are as I've conjectured above, is this something that needs to be 
fixed, or is it working as intended?
Yes, that is correct. The mFlowTable only contains those output types 
that are specified in the aggregate AE as output types. The guideline 
for the capabilityLanguageFlow was to
specify all output results (with all interim results) in the aggregate 
that must be produced.


If we now change the mFlowTable content to match the resultSpec, we 
also change the capabilityLanguageFlow. If we do that, how can I 
prevent a subtype from being produced when a supertype must be 
produced? So I prefer to stay with the current design - specify all 
you need.


What do you think?

-- Michael







Re: RESPECTED SIR , A DOUBT IN UIMA.

2008-01-24 Thread Marshall Schor
Suggestion:  isolate the problem from the Cas Visual Debugger tool, by 
running a simple Java application that runs the annotator. 

Instructions for how to do that are in the tutorials and user guides 
document on the Apache UIMA web site. 

You can use the Eclipse debugger to single-step through things and see 
where things are going wrong.


-Marshall

chandra sekhar wrote:

Sir, I am specifying the data already in the data folder , even though , I
am not getting the annotation results on right side window. The windows
remaining empty.


please give me a suggestion.


-- sekhar.

  




Re: capabilityLangugaeFlow - computeResultSpec

2008-01-24 Thread Marshall Schor
The thing that adds allAnnotatorFeatures and subtypes is compiling the 
result spec. The builder of the mFlowTable doesn't compile the 
resultspec before using it - so it doesn't have these consequences.


-Marshall

Adam Lally wrote:

On Jan 24, 2008 7:54 AM, Marshall Schor [EMAIL PROTECTED] wrote:
  

If you recall, the compile method for results specifications augments
the set of types/features by doing 2 things:  if the type has
allAnnotatorFeatures=true, it adds all the features of the type; and if
the type has subtypes, it adds those too, propagating the
allAnnotatorFeatures processing down.

A consequence would be that the mFlowTable would miss these cases:

   An aggregate wants type A output, and has a delegate with output
capability A-subtype.




Without looking at the code, I didn't understand why this is a
consequence of the behavior you described above.  I thought you said
"and if the type has subtypes, it adds those too"?  Anyway, I
definitely think that this should work.  By the definition of subtype,
A-subtype *IS A* A.  So if an aggregate wants type A produced, then
A-subtype should be produced.
  

   An aggregate wants Feature F output, and has a delegate with output
capability type-A with allAnnotatorFeatures marked, having that feature.




We should be supporting this as well.  Again I didn't follow why the
behavior you described above doesn't do this.

-Adam


  




Re: capabilityLangugaeFlow - computeResultSpec

2008-01-25 Thread Marshall Schor
The code which checks if a type or feature is in a result spec, for a 
particular language, always includes generalizing the language specifier 
by dropping the part beyond the first "-".  For example, "en-us" and 
"en-uk" are simplified to "en".  Because of this, I'm thinking of 
shrinking the result specification (for performance / space reasons) by 
normalizing any language specs it uses by dropping the country 
extensions, if present.
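The proposed normalization is a one-line string operation. A minimal sketch (the helper name baseLanguage is hypothetical); note that the default token "x-unspecified" itself contains a "-", so it would presumably need to be kept intact:

```java
// Sketch of the normalization described above: drop anything after the
// first "-", so "en-us" and "en-uk" both become "en".  Illustrative
// helper name; the x-unspecified special case is an assumption.
class LangNormalize {
    static final String X_UNSPECIFIED = "x-unspecified";

    static String baseLanguage(String lang) {
        if (X_UNSPECIFIED.equals(lang)) {
            return lang;  // keep the default-language token whole
        }
        int dash = lang.indexOf('-');
        return dash < 0 ? lang : lang.substring(0, dash);
    }
}
```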


Any objections?

-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-25 Thread Marshall Schor
The implementation for checking if a feature is in the result spec does 
the following:


If the result-spec is not compiled, it says the feature is present if 
it is specifically put in, or if its type has the allAnnotatorFeatures 
flag set.


If the result-spec is compiled, it says the feature is present if it 
is specifically put in, or if its type has the allAnnotatorFeatures 
flag set and the feature exists in the type system.


For performance / space reasons, I'd like to drop the 2nd case; this 
would have the consequence of changing the result spec to return true 
for features not in the type system where the type had the 
allAnnotatorFeatures flag set.  This case shouldn't come up in practice 
because I can't think of a good reason an annotator would ask whether a 
feature not in its type system was present. 
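The two behaviors can be contrasted in a small model. This is an illustrative sketch (ContainsFeatureSketch and its parameters are hypothetical, not the real UIMA API); the compiled variant is the one proposed for removal:

```java
import java.util.Map;
import java.util.Set;

// Toy model of the two containsFeature behaviors described above.
class ContainsFeatureSketch {
    // uncompiled: feature present if explicitly added, or if its type
    // has allAnnotatorFeatures -- feature existence is NOT checked
    static boolean uncompiledContains(String type, String feature,
                                      Set<String> explicit,
                                      Set<String> allFeatureTypes) {
        return explicit.contains(type + ":" + feature)
            || allFeatureTypes.contains(type);
    }

    // compiled (the 2nd case): additionally requires the feature to
    // actually exist in the type system
    static boolean compiledContains(String type, String feature,
                                    Set<String> explicit,
                                    Set<String> allFeatureTypes,
                                    Map<String, Set<String>> typeSystem) {
        return explicit.contains(type + ":" + feature)
            || (allFeatureTypes.contains(type)
                && typeSystem.getOrDefault(type, Set.of()).contains(feature));
    }
}
```

Dropping the 2nd case means a nonexistent feature of an allAnnotatorFeatures type would now report present - the behavior change the message is asking about.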


Any objections?

-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-25 Thread Marshall Schor

LeHouillier, Frank D. wrote:

We have an annotator that wraps a black box information extraction
component that can return objects of a variety of types.  We check the
result specification to see if the object is something we want to output
based the actual string of the name of the type.  If you take away the
compiled version of the ResultSpecification then we will have to also
check whether the type that we get back from the type system is null or
not.  

Hi Frank -

This change would *not* take away the compiled version of the Result 
Spec.  It would only change 1 behavior - that of returning true if a 
*feature* (not a type, as in your example above) was associated with a 
type where the capability was marked allAnnotatorFeatures, even if the 
Feature didn't exist.


Suppose you had a type T1, and a type T2 whose super-type was T1, and 
features T1:f1 and T2:f2, with an output capability = T1 with 
allAnnotatorFeatures = true; and finally T3 (not inheriting from T1) 
with feature T3:f3, and the output capability including T3 with 
allAnnotatorFeatures = false.



Here's the current behavior:

Before compile:  The following would all return true except as marked:
  containsType(T1)
  containsType(T2)   returns false, T2 not in output capability, and 
before compile, T2 isn't recognized as a subtype of T1

  containsType(T2:f2)   returns false, not in output, etc.
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf)  yes... that's what it does - 
it ignores the actual feature name because allAnnotatorFeatures is true


After compile the following return true except as marked:
  containsType(T1)
  containsType(T2)   T2 not in output capability, but is recognized 
as a subtype of T1

  containsType(T2:f2)   T1's *allAnnotatorFeatures* is inherited
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf)  false: the actual features 
are looked up
 
After the change I'm proposing, everything would be same except that

  containsFeature(T1:asdfasdfasdfasdf) would return true.

I don't think this would affect the way you are using result specs, but 
please let me know if I've misunderstood something.  We don't want to 
impact users with this change.


Thanks for your comments :-)

-Marshall


-Original Message-
From: Marshall Schor [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 25, 2008 5:06 AM

To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

The implementation for checking if a feature is in the result spec does 
the following:


If the result-spec is not compiled, it says the feature is present if 
it is specifically put in, or if its type has the allAnnotatorFeatures 
flag set.

If the result-spec is compiled, it says the feature is present if it 
is specifically put in, or if its type has the allAnnotatorFeatures 
flag set and the feature exists in the type system.

For performance / space reasons, I'd like to drop the 2nd case; this 
would have the consequence of changing the result spec to return true 
for features not in the type system where the type had the 
allAnnotatorFeatures flag set.  This case shouldn't come up in practice 
because I can't think of a good reason an annotator would ask whether a 
feature not in its type system was present. 


Any objections?

-Marshall


  




Re: capabilityLangugaeFlow - computeResultSpec

2008-01-25 Thread Marshall Schor

Michael Baessler wrote:

Michael Baessler wrote:

Adam Lally wrote:
On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] 
wrote:
 
I tried to figure out how the ResultSpecification handling in 
uima-core works with all side effects, to check how it can be done 
to detect when a ResultSpec has changed. Unfortunately I was not able 
to; there are too many open questions where I don't know 
exactly whether it is right in every case ... :-(

Adam can you please look at this issue?




I can try to take a look, but I don't have a lot of time.  Do you have
a test case for this, where you expect I would see a significant
performance improvement if I fix this?
  
Sorry, I have no performance test case. I checked my assumption using 
the debugger.


I used the following main() with a loop over the process call to 
check if the result spec is recomputed each time.
The descriptor is the same as used in the capabilityLanguageFlow test 
case of the uimaj-core project.

Maybe a sysout helps to detect if the unnecessary calls are done or not.

Maybe when iterating more than 10 times will give you performance 
numbers before and after. Maybe adding additional capabilities
that must be analyzed will increase the time used to compute the 
result spec. I will look at this tomorrow.


 public static void main(String[] args) {

 AnalysisEngine ae = null;
 try {

String desc = "SequencerCapabilityLanguageAggregateES.xml";

XMLInputSource in = new 
XMLInputSource(JUnitExtension.getFile(desc));

ResourceSpecifier specifier = UIMAFramework.getXMLParser()
  .parseResourceSpecifier(in);
ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
CAS cas = ae.newCAS();
String text = "Hello world!";
cas.setDocumentText(text);
cas.setDocumentLanguage("en");
for (int i = 0; i < 10; i++) {
   ae.process(cas);
}
 } catch (Exception ex) {
ex.printStackTrace();
 }
  }

-- Michael
When setting the loop counter to 1000 I have 6000ms without 
recomputing the result spec and
27000ms when recomputing the result spec. I think this should be 
sufficient for testing.
I think my change is ready for code review.  I kept all the 
idiosyncratic behavior of the old code, so users should not notice any 
difference.  All the tests run, and the test case above runs in the 
6000ms range. 


There are 3 areas changed:
1) ResultSpecification_impl is restructured for speed and smaller memory 
footprint
2) The compiling of this is deferred till the latest possible point; 
operations that can be done with the uncompiled form are done that way.
3) The code in the CapabilityLanguageFlow where it returns a next step 
now caches the result spec by component key, and only sends it down if 
it is different from what this controller sent the last time it invoked 
this component in the flow. 
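The per-component caching in change 3 can be sketched as a small map from component key to the last spec sent. Illustrative names only (ResultSpecCache is not the real class), and the sketch compares with equals, whereas the real code may rely on the cached specs being constant objects:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch of the caching idea: the flow controller remembers the last
// result spec sent to each component key and re-sends only on change.
class ResultSpecCache {
    private final Map<String, Object> lastSent = new HashMap<>();

    // returns true if the spec must be (re)sent to the component
    boolean shouldSend(String componentKey, Object resultSpec) {
        Object previous = lastSent.get(componentKey);
        if (Objects.equals(previous, resultSpec)) {
            return false;  // unchanged since the last invocation
        }
        lastSent.put(componentKey, resultSpec);
        return true;
    }
}
```

This only pays off if the specs in the flow table really are constant once computed - the very point the message asks Michael to confirm.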

This test depends on the precomputed result specs kept in the mTable 
variable being constant - which I believe they are (once they are 
computed) - but Michael -can you confirm this? 

With this change, the code in the framework to intersect the result 
spec with a component's output capabilities, by language, is not redone 
on every call, but only when the language changes.  That code (to do the 
intersection) is running faster, in any case, due to the restructuring.


Because this is a big change it would be good to do a code review of 
some kind - any thoughts on how to do this?


-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-26 Thread Marshall Schor
Can I replace the class CapabilityContainer with the much more efficient 
(now) ResultSpecification class?


It seems to me they do almost the same thing, and the 
ResultSpecification may be handling the corner cases better.


Is there some subtle difference I'm missing?  It would be nice to 
eliminate a class -

smaller code base = less maintenance effort in the future :-)

-Marshall


Re: capabilityLangugaeFlow - computeResultSpec

2008-01-28 Thread Marshall Schor
I may have missed something - I don't see what would need to be added to 
the ResultSpecification class.  The method hasOutputTypeOrFeature(...) 
is always called with doFuzzySearch == true, which is how the 
containsType and containsFeature methods operate (always) in the 
ResultSpecification class.


Is there some other difference I'm missing?

-Marshall

Michael Baessler wrote:

Marshall Schor wrote:
Can I replace the class CapabilityContainer with the much more 
efficient (now) ResultSpecification class?


It seems to me they do almost the same thing, and the 
ResultSpecification may be handling the corner cases better.


Is there some subtle difference I'm missing?  It would be nice to 
eliminate a class -

smaller code base = less maintenance effort in the future :-)

-Marshall
Yes, if it is possible to add the missing functionality to the 
ResultSpecification class, that's fine with me.
For example, the important method - 
hasOutputTypeOrFeature(outputCapability, documentLanguage, doFuzzySearch) 
- is currently not available on the ResultSpecification class.

-- Michael






Re: Website update, new files?

2008-01-28 Thread Marshall Schor
ip-clearances is where we have our ip clearance forms for UIMA.  There's 
one (in progress) for uima-as, not done yet.


It's ok.

-Marshall

Michael Baessler wrote:

Thilo Goetz wrote:

I updated our website with information on the LREC
workshop.  When I did svn up on people, some new
files were added that apparently had been checked
in, but not extracted on people.a.o.  Is it ok to
leave it like that?  I assume that things that are
checked in are ok to post to the website.  Here's
what svn said:

U    news.html
U    decisions.html
U    bulk-contribution-checklist.html
U    roles.html
U    external-resources.html
U    project-guidelines.html
U    sandbox.html
U    codeConventions.html
U    doc-uima-why.html
U    index-draft.html
U    faq.html
U    doc-uima-examples.html
U    management.html
U    contribution-policy.html
U    mail-lists.html
U    distribution.html
A    release.html
U    license.html
U    dependencies.html
U    apache-board-status.html
U    code-scan-tools.html
U    downloads.html
A    lrec08.html
U    team-list.html
A    ip-clearances
A    ip-clearances/uima-ee.html
U    gldv07.html
U    get-involved.html
U    communication.html
U    svn.html
U    javadoc.html
U    index.html
U    documentation.html
U    uima-specification.html
Updated to revision 615921.

So what about ip-clearances and release?

--Thilo
release.html isn't ready for publishing. I haven't checked in that 
file, so I'm not sure why it occurs in that list.


-- Michael






Clarifying language subsumption in Result Specifications

2008-01-28 Thread Marshall Schor
Language specifications are in a hierarchy.  For example, from most 
inclusive to finer subsets, we have:


x-unspecified
  en
en-us

A result spec's most common use is in a negative sense - annotators can 
check a result spec, and if it doesn't contain the type or feature, they 
can skip producing that type or feature.


For simplicity, let's consider we have only one type or feature, called TF.

If the annotator thinks it produces TF for language en-us only, and 
wants to check if it should skip producing this, it calls 
containsType/Feature(TF, en-us).  This is defined in the current impl 
to return true if the result spec has languages x-unspecified, en, or 
en-us.
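
The subsumption rule just described can be sketched like this - an 
illustrative stand-alone method, not the actual framework code:

```java
// Sketch of language subsumption in a result spec: a language listed in
// the result spec matches the annotator's query language when the listed
// language is x-unspecified, identical to the query, or a base language
// of it ("en" subsumes "en-us").
public class LanguageSubsumption {

    static boolean subsumes(String rsLang, String queryLang) {
        if (rsLang.equalsIgnoreCase("x-unspecified")) {
            return true; // x-unspecified in the result spec matches any query
        }
        String rs = rsLang.toLowerCase();
        String q = queryLang.toLowerCase();
        return q.equals(rs) || q.startsWith(rs + "-");
    }

    public static void main(String[] args) {
        // containsType(TF, "en-us") is true if the result spec holds any of:
        System.out.println(subsumes("x-unspecified", "en-us")); // true
        System.out.println(subsumes("en", "en-us"));            // true
        System.out.println(subsumes("en-us", "en-us"));         // true
        // ...but an "en-us"-only result spec does not match a query for "en":
        System.out.println(subsumes("en-us", "en"));            // false
    }
}
```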


Let's consider the opposite case.  Suppose we have an annotator that can 
produce TF for en.  Suppose the result-spec has an entry for TF only 
for the language en-us.  Should that annotator produce results?  If it 
calls containsType/Feature(TF, en), it will get a false (current 
implementation).


After some thinking about this and some discussion (because I don't 
think I got it right, just by myself :-) ),

it seems that this is correct.  Consider the following case:
 The language of the document is en, and the containing (top-most) 
aggregate explicitly specified that it wanted output only for en-us.  
In that case, the annotator should not produce any results, because the 
language of this doc is not en-us, and the assembler put together 
things that they said should only output en-us results.


This same logic seems to apply to x-unspecified:

Suppose we have an annotator that can produce TF for x-unspecified.  
Suppose the result-spec has an entry for TF only for the language en.  
Should that annotator produce results?  If it calls 
containsType/Feature(TF, x-unspecified), it should get a false 
(broken in the current implementation!, but was true I think in the 
previous one).


Assume the language of the document is x-unspecified, and the 
containing (top-most) aggregate specified explicitly it wanted
output only for en.  In that case, the annotator should not produce any 
results, because the language
of this doc is not en, and the assembler put together things that they 
said should only output en results.


Do others agree with this?

-Marshall


Re: capabilityLanguageFlow - computeResultSpec

2008-01-28 Thread Marshall Schor
I went back and checked the Javadocs for the ResultSpecification, prior 
to my reworking of it.  I think I treated the x-unspecified slightly 
wrong, and if I had done it right, then the anomaly noted in the 
previous note (below) would not be there.


The previous Javadocs all say that the setters for a typeOrFeature 
without a language argument are equivalent to passing in the 
x-unspecified language.  The method containsType/Feature(foo, 
x-unspecified) should be made to return true only if the result 
specification for foo contained x-unspecified.  It might not if, for 
instance, the setting for foo was only for languages en and de. 


A consequence of making it work this way is the following:

  containsType(foo, x-unspecified) will return false if foo is in 
  the result spec only for particular languages.

  containsType(foo) - with no language argument - would also return 
  false if foo is in the result spec only for particular languages.

I plan to correct the treatment of x-unspecified, along these lines, to 
work as described above.

Please post any concerns/objections :-)

-Marshall

Marshall Schor wrote:
While experimenting with this approach, I found some tests wouldn't 
run.  (By the way, the test cases are great - they have been a great 
help :-) ).


Here's a case I want to be sure I understand:

Let's suppose that the aggregate says it produces type Foo with 
language x-unspecified.


Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language en, the 2nd one produces Foo with 
language x-unspecified.  A flow given language x-unspecified 
should run the 2nd annotator, skipping the first one.  (This is how it 
works now).


===

Here's another similar case, using the other language subsumption 
between en-us and en.


Let's suppose that the aggregate says it produces type Foo with 
language en.


Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language en-us, the 2nd one produces Foo with 
language en.  A flow given language en should run the 2nd 
annotator, skipping the first one. (This is how it works now, I think).


With this explanation, I see there is a modification needed to the 
result spec's containsType/Feature method with a language argument, for 
this use.  Currently, the ResultSpecification matching works like this:

  Language arg    Result Spec     Matches
  en              en-us           no
  en-us           en              yes
  x-unspecified   *any*           yes  <-- behavior needs to be different
  en              x-unspecified   yes

Is this correct?

-Marshall

Marshall Schor wrote:
Can I replace the class CapabilityContainer with the much more 
efficient (now) ResultSpecification class?


It seems to me they do almost the same thing, and the 
ResultSpecification may be handling the corner cases better.


Is there some subtle difference I'm missing?  It would be nice to 
eliminate a class -

smaller code base = less maintenance effort in the future :-)

-Marshall










Re: capabilityLanguageFlow - computeResultSpec

2008-01-28 Thread Marshall Schor
While experimenting with this approach, I found some tests wouldn't 
run.  (By the way, the test cases are great - they have been a great 
help :-) ).


Here's a case I want to be sure I understand:

Let's suppose that the aggregate says it produces type Foo with language 
x-unspecified.


Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language en, the 2nd one produces Foo with language 
x-unspecified.  A flow given language x-unspecified should run the 
2nd annotator, skipping the first one.  (This is how it works now).


===

Here's another similar case, using the other language subsumption 
between en-us and en.


Let's suppose that the aggregate says it produces type Foo with language 
en.


Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language en-us, the 2nd one produces Foo with 
language en.  A flow given language en should run the 2nd annotator, 
skipping the first one. (This is how it works now, I think).


With this explanation, I see there is a modification needed to the 
result spec's containsType/Feature method with a language argument, for 
this use. 
Currently, the ResultSpecification matching works like this:

  Language arg    Result Spec     Matches
  en              en-us           no
  en-us           en              yes
  x-unspecified   *any*           yes  <-- behavior needs to be different
  en              x-unspecified   yes

Is this correct?

-Marshall

Marshall Schor wrote:
Can I replace the class CapabilityContainer with the much more 
efficient (now) ResultSpecification class?


It seems to me they do almost the same thing, and the 
ResultSpecification may be handling the corner cases better.


Is there some subtle difference I'm missing?  It would be nice to 
eliminate a class -

smaller code base = less maintenance effort in the future :-)

-Marshall






Re: Clarifying language subsumption in Result Specifications

2008-01-28 Thread Marshall Schor
I tried implementing this change, and 2 test cases fail.  They look like 
they are failing exactly in the case where the result specification has 
a TypeOrFeature with specified languages other than x-unspecified, and 
the containsTypeOrFeature method is being called using the form which 
doesn't pass in an explicit language, so it is treated as if 
x-unspecified was passed in. 

As discussed below, this should give false, but the test cases expect 
true.


Should I change the test cases?  The failing ones are:

ResultSpecification_implTest:  It defines a result spec containing the 
type FakeType for languages en, de, en-US, en-GB, but not 
x-unspecified.  So the call rs.containsType(FakeType) returns false 
(because the set of languages for FakeType is missing x-unspecified), 
but the test says it should return true.


The other test is the PearRuntimeTest.
This test loads two Pears, runs them and then looks at the CAS result.
The descriptor for one of the tests, the TutorialDateTime descriptor, 
says it outputs 3 types, *but for language en* (only, and not for 
x-unspecified in particular).


The result spec built for the aggregate is empty (the test case has 
nothing specified here). 

When it is passed down to the delegates, the setResultSpecification for 
the Pear descriptor in PearAnalysisEngineWrapper is called.  This is not 
implemented, so it inherits from its super, which is 
AnalysisEngineImplBase - and this impl does nothing (expecting to be 
overridden).  I'll write this up as a Jira issue.  

But even if this were fixed, because the outer Aggregate had nothing 
specified in its capability, the inner primitive analysis engine is set 
up initially with a default result spec, which is its own output 
capabilities.  This spec says it should produce results just for en, 
and in particular it should *not* produce output for x-unspecified.  
This annotator is written to respect the result spec, so it doesn't 
produce anything.


Anyone object to my changing the test cases?

-Marshall

Marshall Schor wrote:
Language specifications are in a hierarchy.  For example, from most 
inclusive to finer subsets, we have:


x-unspecified
  en
en-us

A result spec's most common use is in a negative sense - Annotators 
can check a result spec and if it doesn't contain the type or feature, 
it can skip producing that type or feature.


For simplicity, let's consider we have only one type or feature, 
called TF.


If the annotator thinks it produces TF for language en-us only, and 
wants to check if it should skip producing this, it calls 
containsType/Feature(TF, en-us).  This is defined in the current 
impl to return true if the result spec has languages x-unspecified, 
en, or en-us.


Let's consider the opposite case.  Suppose we have an annotator that 
can produce TF for en.  Suppose the result-spec has an entry for TF 
only for the language en-us.  Should that annotator produce 
results?  If it calls containsType/Feature(TF, en), it will get a 
false (current implementation).


After some thinking about this and some discussion (because I don't 
think I got it right, just by myself :-) ),

it seems that this is correct.  Consider the following case:
 The language of the document is en, and the containing (top-most) 
aggregate specified explicitly it wanted
 output only for en-us.  In that case, the annotator should not 
produce any results, because the language
 of this doc is not en-us, and the assembler put together things that 
they said should only output en-us results.


This same logic seems to apply to x-unspecified:

Suppose we have an annotator that can produce TF for x-unspecified.  
Suppose the result-spec has an entry for TF only for the language 
en.  Should that annotator produce results?  If it calls 
containsType/Feature(TF, x-unspecified), it should get a false 
(broken in the current implementation!, but was true I think in the 
previous one).


Assume the language of the document is x-unspecified, and the 
containing (top-most) aggregate specified explicitly it wanted
output only for en.  In that case, the annotator should not produce 
any results, because the language
of this doc is not en, and the assembler put together things that 
they said should only output en results.


Do others agree with this?

-Marshall






Re: Clarifying language subsumption in Result Specifications

2008-01-29 Thread Marshall Schor

Michael Baessler wrote:

Marshall Schor wrote:
I tried implementing this change, and 2 test cases fail.  They look 
like they are failing exactly in the case where the result 
specification has a TypeOrFeature with specified languages other than 
x-unspecified, and the containsTypeOrFeature method is being called 
using the form which doesn't pass in an explicit language, so it is 
treated as if x-unspecified was passed in.
As discussed below, this should give false, but the test cases 
expect true.


Should I change the test cases?  The failing ones are:

ResultSpecification_implTest:  It defines a result spec containing 
the type FakeType for languages en, de, en-US, en-GB, but 
not x-unspecified.  So the call rs.containsType(FakeType) returns 
false (because the set of languages for FakeType is missing 
x-unspecified), but the test says it should return true.



Which test method are you talking about? I would like to look at it.
The call is on line 332 of class ResultSpecification_implTest.  This 
changed behavior arises from the proposed change to how the containsType 
method works.  The changed logic is:  if the language x-unspecified is 
given (or if no language is given, as in this case), return true only if 
the result specification for this type or feature includes the language 
x-unspecified.  In this test, the result specification for the type 
FakeType is set from the component's capabilities specification, which 
said this component outputs FakeType for languages en, de, en-US, 
en-GB, but not x-unspecified.  So with the proposed change to how 
containsType works, it returns false.  But the test case expects true.



The other test is the PearRuntimeTest.
This test loads two Pears, runs them and then looks at the CAS result.
The descriptor for one of the tests, the TutorialDateTime descriptor, 
says it outputs 3 types, *but for language en* (only, and not for 
x-unspecified in particular).


The result spec built for the aggregate is empty (the test case has 
nothing specified here).
When it is passed down to the delegates, the setResultSpecification 
for the Pear descriptor in PearAnalysisEngineWrapper is called.  This 
is not implemented, so it inherits from its super, which is 
AnalysisEngineImplBase - and this impl does nothing (expecting to be 
overridden).  I'll write this up as a Jira issue. But even if this 
were fixed, because the outer Aggregate had nothing specified in 
its capability, the inner primitive analysis engine is set up 
initially with a default result spec, which is its own output 
capabilities.  This spec says it should produce results just for 
en, and in particular it should *not* produce output for 
x-unspecified.  This annotator is written to respect the result spec, 
so it doesn't produce anything.


The PearRuntimeTest does not use the capabilityLanguageFlow, so we have 
a different behavior there!
This test is just testing the component's behavior with respect to 
using the result specification; I don't think it has anything to do with 
the capabilityLanguageFlow?


-Marshall


-- Michael






Re: Clarifying language subsumption in Result Specifications

2008-01-29 Thread Marshall Schor

Michael Baessler wrote:

Marshall Schor wrote:

Michael Baessler wrote:

Marshall Schor wrote:
Language specifications are in a hierarchy.  For example, from most 
inclusive to finer subsets, we have:


x-unspecified
  en
en-us

A result spec's most common use is in a negative sense - Annotators 
can check a result spec and if it doesn't contain the type or 
feature, it can skip producing that type or feature.


For simplicity, let's consider we have only one type or feature, 
called TF.


If the annotator thinks it produces TF for language en-us only, and 
wants to check if it should skip producing this, it calls 
containsType/Feature(TF, en-us).  This is defined in the current 
impl to return true if the result spec has languages 
x-unspecified, en, or en-us.


Let's consider the opposite case.  Suppose we have an annotator 
that can produce TF for en.  Suppose the result-spec has an entry 
for TF only for the language en-us.  Should that annotator 
produce results?  If it calls containsType/Feature(TF, en), it 
will get a false (current implementation).


After some thinking about this and some discussion (because I don't 
think I got it right, just by myself :-) ),

it seems that this is correct.  Consider the following case:
 The language of the document is en, and the containing 
(top-most) aggregate specified explicitly it wanted
 output only for en-us.  In that case, the annotator should not 
produce any results, because the language
 of this doc is not en-us, and the assembler put together things 
that they said should only output en-us results.


This same logic seems to apply to x-unspecified:

Suppose we have an annotator that can produce TF for 
x-unspecified.  Suppose the result-spec has an entry for TF only 
for the language en.  Should that annotator produce results?  If 
it calls containsType/Feature(TF, x-unspecified), it should get a 
false (broken in the current implementation!, but was true I 
think in the previous one).
I'm not sure you are right here. I think if an annotator can produce 
TF for x-unspecified that means that it can produce TF for all 
languages. So if an en document comes in the annotator should 
produce a result.
hmmm, this seems to contradict your statement below, saying "That 
case is correct."


In the example below, the result-spec passed in to the annotator has 
only en, not x-unspecified.  This is the case proposed in my 
paragraph.  Below you say it is right for the annotator to *not* 
produce results, while above you say it should produce results.  This 
is inconsistent, unless I've mangled something...   Can you clarify?


-Marshall


Assume the language of the document is x-unspecified, and the 
containing (top-most) aggregate specified explicitly it wanted
output only for en.  In that case, the annotator should not produce 
any results, because the language
of this doc is not en, and the assembler put together things that 
they said should only output en results.



That case is correct.

-- Michael




Maybe the confusion comes from the different treatment of 
x-unspecified. If x-unspecified is specified in the output spec of 
an annotator it means that it can produce results for all languages. 
True - and that works.  But that wasn't the case I was trying to 
describe - I was trying to describe the opposite case:  The case  where 
the *output spec* of an annotator is *missing* the x-unspecified. 

To restate the case:  The output spec has en (only), and the 
annotator, when running, queries the result spec with x-unspecified.  
This proposal says that in that case, containsType should return false.  
Do you agree this should be the result in this case?  It seems you do 
above when you say "That case is correct", but disagree in the paragraph 
where you say "I'm not sure you are right here." 

Perhaps I have not clearly described the two cases, but I think they are 
the same case (and therefore need to have the same answer ;-) ) 


-Marshall



-- Michael







Result Specification fixes and Capability Language Flow speed up work now done

2008-01-29 Thread Marshall Schor
Except for UIMA-727.  Michael - please run any performance tests you 
have.   I hope the performance is now significantly improved :-)


-Marshall


Re: Clarifying language subsumption in Result Specifications

2008-01-29 Thread Marshall Schor

Michael Baessler wrote:

Marshall Schor wrote:

Michael Baessler wrote:

Marshall Schor wrote:
I tried implementing this change, and 2 test cases fail.  They look 
like they are failing exactly in the case where the result 
specification has a TypeOrFeature with specified languages other than 
x-unspecified, and the containsTypeOrFeature method is being 
called using the form which doesn't pass in an explicit language, so 
it is treated as if x-unspecified was passed in.
As discussed below, this should give false, but the test cases 
expect true.


Should I change the test cases?  The failing ones are:

ResultSpecification_implTest:  It defines a result spec containing 
the type FakeType for languages en, de, en-US, en-GB, but 
not x-unspecified.  So the call rs.containsType(FakeType) 
returns false (because the set of languages for FakeType is missing 
x-unspecified), but the test says it should return true.



Which test method you are talking about? I would like to look at.
The call is on line 332 of class ResultSpecification_implTest.  This 
changed behavior arises from the proposed change to how the 
containsType method works.  The changed logic is:  if the language 
x-unspecified is given (or if no language is given, as in this case), 
return true only if the result specification for this type or feature 
includes the language x-unspecified.  In this test, the result 
specification for the type FakeType is set from the component's 
capabilities specification, which said this component outputs FakeType 
for languages en, de, en-US, en-GB, but not x-unspecified.  So 
with the proposed change to how containsType works, it returns 
false.  But the test case expects true.
I don't know that test, but it is fine with me to change the behavior 
since it seems to be wrong!



The other test is the PearRuntimeTest.
This test loads two Pears, runs them and then looks at the CAS result.
The descriptor for one of the tests, the TutorialDateTime 
descriptor, says it outputs 3 types, *but for language en* (only, 
and not for x-unspecified in particular).


The result spec built for the aggregate is empty (the test case has 
nothing specified here).
When it is passed down to the delegates, the setResultSpecification 
for the Pear descriptor in PearAnalysisEngineWrapper is called.  
This is not implemented, so it inherits from its super, which is 
AnalysisEngineImplBase - and this impl does nothing (expecting to 
be overridden).  I'll write this up as a Jira issue. But even if 
this were fixed, because the outer Aggregate had nothing 
specified in its capability, the inner primitive analysis engine is 
set up initially with a default result spec, which is its own 
output capabilities.  This spec says it should produce results just 
for en, and in particular it should *not* produce output for 
x-unspecified.  This annotator is written to respect the result 
spec, so it doesn't produce anything.


The PearRuntimeTest does not use the capabilityLanguageFlow, so we 
have a different behavior there!
This test is just testing the component's behavior with respect to 
using the result specification; I don't think it has anything to do 
with the capabilityLanguageFlow?
So you mean that the computation of the default result spec does not 
work correctly, since it is not implemented correctly? If that is 
true, please go ahead and fix it. I was not aware of that. Thanks for 
catching it!

This has been entered as UIMA-727.  Not fixed yet (or assigned).
-Marshall


-- Michael







Possible design change for Capability Language Flow to consider needed inputs?

2008-01-30 Thread Marshall Schor
Suppose I have a capability language flow for an aggregate having 2 
delegates where the aggregate's capability spec says it outputs type 
Toutput.  

Let's say delegate #2 has a capability spec saying it outputs Toutput, 
but needs Tinput as an input, and the aggregate's capability spec 
*doesn't* include Tinput as an input.

Let's say delegate #1 has a capability spec saying it outputs Tinput, 
the input needed by delegate #2.

The current logic in CapabilityLanguageFlowController.computeSequence 
would build a flow having only delegate #2, because it doesn't currently 
consider the need for some flow elements to produce input types needed 
by later delegates.


I'm not sure if this is worth fixing, or if it should just be 
documented as a limitation.  A proper fix might take some work - as it 
should consider sequencing to ensure needed inputs are produced before 
they're needed.  Any opinions?
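
One possible shape for such a fix, sketched here with made-up names 
rather than the real CapabilityLanguageFlowController internals: walk 
the candidate flow in order and report any delegate whose required 
input type is not produced by an earlier delegate.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of an input-dependency check for a candidate flow.
public class FlowInputCheck {

    static List<String> missingInputs(List<Set<String>> outputs,
                                      List<Set<String>> inputs) {
        Set<String> available = new HashSet<>();
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            for (String needed : inputs.get(i)) {
                if (!available.contains(needed)) {
                    missing.add("delegate#" + (i + 1) + " needs " + needed);
                }
            }
            available.addAll(outputs.get(i)); // outputs become available downstream
        }
        return missing;
    }

    public static void main(String[] args) {
        // A flow containing only delegate #2 (as computeSequence would build
        // it today): it needs Tinput, which nothing earlier produces.
        List<Set<String>> outs = List.of(Set.of("Toutput"));
        List<Set<String>> ins = List.of(Set.of("Tinput"));
        System.out.println(missingInputs(outs, ins)); // [delegate#1 needs Tinput]

        // A flow with delegate #1 (produces Tinput) first is complete.
        outs = List.of(Set.of("Tinput"), Set.of("Toutput"));
        ins = List.of(Set.<String>of(), Set.of("Tinput"));
        System.out.println(missingInputs(outs, ins).isEmpty()); // true
    }
}
```

A non-empty result here could be what gets logged or flagged as an 
error, addressing the partial-flow question below as well.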


Also, currently, it is possible that 
CapabilityLanguageFlowController.computeSequence can fail to find a flow 
that produces all of the types listed in the Output Spec for its 
aggregate.  In this case, it produces a partial flow - one in which some 
(perhaps 0) annotators will run, and not produce all the outputs 
needed.  Currently this is not flagged as an error, or logged.  Should 
it be?


-Marshall


UIMA java objects which implement MetaDataObject, question about equals and hashCode

2008-01-30 Thread Marshall Schor

Many UIMA framework objects implement the MetaDataObject interface.

This interface has an equals method, which does an attribute-by-attribute 
equals check (recursively).


This interface, however, doesn't declare the hashCode method. 

So, if any object were to insert one of these objects into a hash table, 
two equal objects could get different hash codes.


For instance, TypeOrFeature instances implement MetaDataObject.  They 
might be stored in a hash table or hash set (this was done in the 
previous impl of ResultSpecification_impl). 

Wouldn't this (at least in principle, theoretically) cause a problem? 

Is the general, safe fix to add a hashCode method to the MetaDataObject 
interface and impl?
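
A minimal, self-contained illustration of the contract in question (the 
class name and field are invented for the example; this is not UIMA 
code): overriding hashCode consistently with equals is what lets 
hash-based collections deduplicate equal objects.

```java
import java.util.HashSet;
import java.util.Set;

// A key class with a matched equals/hashCode pair. Without the hashCode
// override, two equal ToFKey objects would typically get different
// (identity) hash codes, and a HashSet could hold "duplicates".
class ToFKey {
    final String name;
    ToFKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object other) {
        return other instanceof ToFKey && ((ToFKey) other).name.equals(name);
    }

    @Override
    public int hashCode() {
        return name.hashCode(); // consistent with equals
    }
}

public class HashCodeContractDemo {
    public static void main(String[] args) {
        Set<ToFKey> set = new HashSet<>();
        set.add(new ToFKey("uima.tcas.Annotation"));
        set.add(new ToFKey("uima.tcas.Annotation")); // equal to the first
        System.out.println(set.size()); // 1 - deduplicated, as expected
        System.out.println(set.contains(new ToFKey("uima.tcas.Annotation"))); // true
    }
}
```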


-Marshall


Re: [jira] Commented: (UIMA-735) ResultSpecification_impl missing equals and hashCode for inner class - causing intermittent test case failure

2008-01-31 Thread Marshall Schor

Thilo Goetz (JIRA) wrote:
[ https://issues.apache.org/jira/browse/UIMA-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564316#action_12564316 ] 


Thilo Goetz commented on UIMA-735:
--

Any particular reason you didn't close this issue?
  
Because the condition which led me to suspect this is quite 
intermittent, I wanted to run tests for a few days to be sure that was 
the cause.


-Marshall


  

ResultSpecification_impl missing equals and hashCode for inner class - causing 
intermittent test case failure
-

Key: UIMA-735
URL: https://issues.apache.org/jira/browse/UIMA-735
Project: UIMA
 Issue Type: Bug
   Affects Versions: 2.2.1
   Reporter: Marshall Schor
   Assignee: Marshall Schor
   Priority: Minor
Fix For: 2.3


The ResultSpec impl has an inner class, ToF_Languages.  When comparing 2 result 
specifications for equality in test cases, these are compared.  But they are 
missing equals (and hashCode) methods.  So the test case fails to say 
they're equal unless they're identical.  But cloning happens a lot in the way 
result specs are used, and in this test they may be equal (I think) but not ==.
Solution: Add proper equals and hashCode to this inner class.



  




Re: [jira] Commented: (UIMA-735) ResultSpecification_impl missing equals and hashCode for inner class - causing intermittent test case failure

2008-01-31 Thread Marshall Schor

Thilo Goetz wrote:

Marshall Schor wrote:

Thilo Goetz (JIRA) wrote:
[ 
https://issues.apache.org/jira/browse/UIMA-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564316#action_12564316 
]

Thilo Goetz commented on UIMA-735:
--

Any particular reason you didn't close this issue?
  
Because the condition which led me to suspect this is quite 
intermittent, I wanted to run tests for a few days to be sure that 
was the cause.


-Marshall


It would be good then if you could put that in a comment
in Jira.  It's confusing to see the status change to
resolved without any indication what is needed to close
the issue.

OK - you make a good point :-).  -Marshall


--Thilo







Re: UIMA java objects which implement MetaDataObject, question about equals and hashCode

2008-01-31 Thread Marshall Schor

Adam Lally wrote:

MetaDataObject_impl already implements hashCode.  The MetaDataObject
interface, though, explicitly declares equals() but not hashCode().
This doesn't actually have any effect on the behavior (declaring these
on the interface doesn't actually force anyone to override the method
if they implement the interface).  But it does seem inconsistent from
a documentation perspective - either we should declare neither method
or both.  Really I could go either way.  The upside of declaring them
is to document that equals and hashCode should be overridden by any
implementation of MetaDataObject.  The downside is that people might
think this is actually enforced by Java, when it is not.
  
Are you saying that if the interface your class implements declares 
hashCode, but your implementation doesn't implement it, and neither do 
any of your superclasses, this won't be caught as a compile error 
by Java?  Or just that you don't have to implement it directly in your 
class? (This I understand.)  If it is just the latter, then it seems to 
me quite valuable to include this in the interface, in case someone says 
they implement it but doesn't have your implementation of 
MetaDataObject_impl in their superclass path (unlikely, I know...).

  -Adam

  
Interesting observations...  Eclipse pointed out to me that there was an 
issue of some kind here, when I asked it to implement the equals and 
hashCode methods for the new inner class in ResultSpecification.  It 
said that TypeOrFeature had an issue with hashCode. 
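
Adam's point can be seen in a minimal stand-alone example (names 
invented for illustration): since java.lang.Object already supplies 
equals and hashCode, a class implementing an interface that declares 
them compiles without overriding either.

```java
// The interface declaration documents intent but enforces nothing:
// both methods are always "already implemented" via java.lang.Object.
interface DeclaresEqualsAndHashCode {
    boolean equals(Object other);
    int hashCode();
}

class NoOverrides implements DeclaresEqualsAndHashCode {
    // No compile error: equals/hashCode are inherited from Object.
}

public class InterfaceDeclDemo {
    public static void main(String[] args) {
        NoOverrides a = new NoOverrides();
        NoOverrides b = new NoOverrides();
        System.out.println(a.equals(a)); // true  - Object's identity equals
        System.out.println(a.equals(b)); // false - distinct instances
    }
}
```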



On Jan 30, 2008 10:57 PM, Marshall Schor [EMAIL PROTECTED] wrote:
  

Many UIMA framework objects implement the MetaDataObject interface.

This interface has an equals method, which does a attribute by attribute
equals check (recursively).

This interface, however, doesn't implement the hashCode method.

So, if any object were to insert one of these objects into a hash table,
two equal objects could get different hash codes.

For instance, TypeOrFeature instances implements the MetaDataObject.  It
might be stored in a hash table or hash set (this was done in the
previous impl of ResultSpecification _impl).

Wouldn't this (at least in principle, theoretically) cause a problem?

Is the general, safe, fix to add a hashCode method to the MetaDataObject
interface and impl?

-Marshall





  




renaming uima-ee- to uima-as this Sunday?

2008-01-31 Thread Marshall Schor
I plan to rename things in SVN from uima-ee to uima-as this Sunday, as 
was discussed in another mail thread.  This may break the uima-as builds 
for a while as we work out the loose ends.  If this timing is bad, 
please voice your opinion.


-Marshall


Re: renaming uima-ee to uima-as this Sunday?

2008-01-31 Thread Marshall Schor
It's been pointed out to me that some people may be planning to submit 
patches to the uima-ee code, and that we should allow time for this to 
happen, and for the patches to be committed.  This should cut down on 
make-work due to the naming change.


Based on that, I'm moving the proposed renaming to Wednesday, Feb 6.

-Marshall

Marshall Schor wrote:
I plan to rename things in SVN from uima-ee to uima-as this Sunday, as 
was discussed in another mail thread.  This may break the uima-as 
builds for a while as we work out the loose ends.  If this timing is 
bad, please voice your opinion.


-Marshall






Re: capabilityLanguageFlow - computeResultSpec

2008-02-01 Thread Marshall Schor

LeHouillier, Frank D. wrote:

While this change wouldn't affect us in any way that I can see now,
it would still be possible to use the Features in the Result Spec in a
similar way.  


Suppose you have an information extraction component that extracts
entities with attributes, and you want to control which attributes are
actually added to the CAS with the Result Spec.  You might have
type Person, with a range of features such as Address, Phone number,
Age, etc., some of which you want to output in a given configuration and
others not.  Suppose the information extraction component also extracts
attributes which are so useless that you don't include them as features
in the type system at all, such as an internal id number.  Currently,
with a compiled Result Spec you could have the annotator look up the
feature on the basis of the name of the feature, and then you could
reliably instantiate the feature without further ado.  After your
change, the feature would have to be checked to see if it actually
exists.  
  
We added code in the actual change that now checks to see if the feature 
actually exists (for a compiled Result Spec).  I thought it was better 
to preserve the status quo here, rather than remove this check (for 
performance reasons).  It didn't seem like it would have any measurable 
performance impact - it's one hash table lookup, basically.


Cheers. -Marshall

Again, this doesn't seem like it is that big a deal to me but I thought
I might just point out that it might have a use case.  In practice, it
seems to me that most annotators figure out the features available
either during compilation by using the JCas or during the initialization
of the Annotator.  


-Original Message-
From: Marshall Schor [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 25, 2008 3:57 PM

To: uima-dev@incubator.apache.org
Subject: Re: capabilityLanguageFlow - computeResultSpec

LeHouillier, Frank D. wrote:
  

We have an annotator that wraps a black box information extraction
component that can return objects of a variety of types.  We check the
result specification to see if the object is something we want to
output, based on the actual string of the name of the type.  If you take
away the compiled version of the ResultSpecification then we will have
to also check whether the type that we get back from the type system is
null or not.  


Hi Frank -

This change would *not* take away the compiled version of the Result 
Spec.  It would only change 1 behavior - that of returning true if a 
*feature* (not a type, as in your example above) was associated with a 
type where the capability was marked allAnnotatorFeatures, even if the 
feature didn't exist.

Suppose you had a type T1, and a type T2 whose super-type was T1, with 
features T1:f1 and T2:f2, and an output capability = T1 with 
allAnnotatorFeatures = true; and finally a type T3 (not inheriting from 
T1) with feature T3:f3, and the output capability including T3 with 
allAnnotatorFeatures = false.



Here's the current behavior:

Before compile, the following would all return true except as marked:
   containsType(T1)
   containsType(T2)   returns false; T2 is not in the output capability, 
and before compile, T2 isn't recognized as a subtype of T1
   containsFeature(T2:f2)   returns false, not in output, etc.
   containsFeature(T1:f1)
   containsFeature(T1:asdfasdfasdfasdf)  yes... that's what it does - 
it ignores the actual feature name because allAnnotatorFeatures is true

After compile, the following return true except as marked:
   containsType(T1)
   containsType(T2)   T2 is not in the output capability, but is 
recognized as a subtype of T1
   containsFeature(T2:f2)   T1's *allAnnotatorFeatures* is inherited
   containsFeature(T1:f1)
   containsFeature(T1:asdfasdfasdfasdf)  false: the actual features 
are looked up
  
After the change I'm proposing, everything would be same except that

   containsFeature(T1:asdfasdfasdfasdf) would return true.

I don't think this would affect the way you are using result specs, but 
please let me know if I've misunderstood something.  We don't want to 
impact users with this change.


Thanks for your comments :-)

-Marshall
  

-Original Message-
From: Marshall Schor [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 25, 2008 5:06 AM

To: uima-dev@incubator.apache.org
Subject: Re: capabilityLanguageFlow - computeResultSpec

The implementation for checking if a feature is in the result spec
does the following:

If the result-spec is not compiled, it says the feature is present if
it is specifically put in, or if its type has the allAnnotatorFeatures
flag set.

If the result-spec is compiled, it says the feature is present if it
is specifically put in, or if its type has the allAnnotatorFeatures
flag set and the feature exists in the type system.

For performance / space reasons, I'd like to drop the 2nd case; this 
would have the consequence of changing the result
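
The two-case logic described in that thread can be sketched in plain
Java.  This is a toy stand-in, not the actual ResultSpecification_impl
code; the class and field names here are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the two containsFeature() behaviors discussed above.
class ResultSpecSketch {
    // features explicitly listed, as "Type:feature" strings
    final Set<String> explicitFeatures = new HashSet<>();
    // types whose output capability was marked allAnnotatorFeatures
    final Set<String> allFeatureTypes = new HashSet<>();
    // known "Type:feature" pairs, standing in for the type system
    final Set<String> typeSystemFeatures = new HashSet<>();
    boolean compiled = false;

    boolean containsFeature(String fullName) {
        if (explicitFeatures.contains(fullName)) return true;
        String type = fullName.substring(0, fullName.indexOf(':'));
        if (!allFeatureTypes.contains(type)) return false;
        // uncompiled: allAnnotatorFeatures alone is enough;
        // compiled: the feature must also exist in the type system
        return !compiled || typeSystemFeatures.contains(fullName);
    }
}

public class ResultSpecDemo {
    public static void main(String[] args) {
        ResultSpecSketch rs = new ResultSpecSketch();
        rs.allFeatureTypes.add("T1");
        rs.typeSystemFeatures.add("T1:f1");

        System.out.println(rs.containsFeature("T1:asdf")); // true: not compiled
        rs.compiled = true;
        System.out.println(rs.containsFeature("T1:asdf")); // false: unknown feature
        System.out.println(rs.containsFeature("T1:f1"));   // true
    }
}
```

Dropping the "2nd case" would amount to removing the
typeSystemFeatures check, so the compiled and uncompiled answers for
an unknown feature would agree (both true).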

Re: A DOUBT IN UIMA

2008-02-01 Thread Marshall Schor

Hello -

Can you post two things, to the uima-user list (this list is the 
uima-dev list, and this thread is off topic):


First, the entire stack trace when you get the error.

Second, the location of the ProductNumberAnnotator in your system.

Thanks; with that information, we may be able to help you more.
 -Marshall

chandra sekhar wrote:

Respected sir, I am implementing Nicholas Chase's paper on UIMA (Product
Number Annotation).  Sir, I am getting an error which is the same as the
error described in this link on IBM's site:

http://www-128.ibm.com/developerworks/forums/thread.jspa?threadID=138977tstart=0

The error I am getting is:
com.ibm.uima.resource.ResourceInitializationException: Annotator class 
com.ibm.uima.tutorial.ProductNumberAnnotator was not found.


I used UIMA_SDK_2_0_2_setupwin32 to install my UIMA.

Mr. Lally specified a solution for UIMA version 2.0.

Sir, can you specify how to find a plugin for my class
com.ibm.uima.tutorial.ProductNumberAnnotator?  I don't know how to find a
plugin for a class.  Please help me sir.

regards,
sekhar

  




Javadoc building - we have 2

2008-02-01 Thread Marshall Schor
There are 2 configurations in the POM for building Javadocs: one in the 
parent uimaj POM and one in the uimaj-distr POM.


The thinking behind this was that the one in the uimaj-distr POM would 
run as part of the assembly process and build Javadocs for the binary 
distribution, consisting of the external APIs.  The one in the parent 
would run when the mvn site plugin is invoked; it includes more internal 
packages in the set of Javadocs being produced.


The idea is that we could post the internal ones for developers to 
use/access on our web site.  I don't think we're doing this, now, 
however.  We do post the ones generated for the release, I think, 
instead.  Can anyone confirm this?


What should we do going forward?  I see somewhat limited value to doing 
another set of developer javadocs, given that the developers have the 
source to work with.


-Marshall
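
For reference, the kind of configuration difference involved might look
roughly like this in the distr POM.  This is a sketch only, not our
actual settings; the excluded package patterns below are illustrative
assumptions (excludePackageNames is a real maven-javadoc-plugin
parameter, colon-separated, with wildcard support).

```xml
<!-- Hypothetical sketch: restrict the binary-distribution Javadocs to
     the external APIs by excluding internal packages.  The package
     patterns here are illustrative, not our real configuration. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <configuration>
    <excludePackageNames>*.impl:*.internal</excludePackageNames>
  </configuration>
</plugin>
```

The parent-POM variant would simply omit (or shorten) the exclusion
list so the site build documents the internal packages too.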


Some finds regarding maven eclipse svn

2008-02-02 Thread Marshall Schor
Maven was built to expect a hierarchical (not flat) project / sub 
project structure.  There are many fixes to maven that focus on making 
it work for flat (e.g. Eclipse-like) project structures.  But some 
things appear not to work properly.  See e.g. 
http://jira.codehaus.org/browse/MRELEASE-261 which is Open and not 
being worked on.


A main issue is whether or not Eclipse itself supports a nested, 
hierarchical project structure.  Apparently, as of Europa (3.3) version, 
it doesn't quite, but one email post I found says:



 Europa supports having multiple projects in workspace that overlap : my
 root
 project contains all sub-projects as folder.

 Nico.



SVN apparently has two different Eclipse plugin providers:   Subclipse 
(which I've been using), and now there's a new, Eclipse-official one (in 
incubation), called Subversive.   The new one apparently also supports 
some kind of hierarchical SVN operations, while Subclipse doesn't.


From the above MRELEASE-261:

Subclipse plugin for Eclipse can not handle nested projects in Eclipse 
at all, and from the dialogue on their list, do not intend to. As 
Subversive provides much better support for nested Subversion structures 
in Eclipse, and has since become the 'official' (or so I'm informed) 
Eclipse foundation Subversion plugin, we have moved to using Subversive 
and find that the Eclipse multi-project import-export plugin works 
pretty well. Note the impact analysis for the work of changing the 
release plugin to be more 'directory aware' was pretty good, 3 days 
would have it cracked I would expect (inc. ITs etc)


-Marshall


more maven conventions

2008-02-02 Thread Marshall Schor
The eclipse:eclipse plugin will take the maven artifactId and use it 
for the Eclipse Plugin ID


The artifact ids we use are things like uimaj-ep-debug.
The Eclipse plugin ids are different, e.g. org.apache.uima.debug.

Any objection to my changing the artifact IDs in the POMs for our 
Eclipse plugins to match the Eclipse plugin ids?


Right now this is not a show-stopper, because we've disabled maven from 
altering the manifest.  But in the future, if we converge toward a more 
conventional maven build, we may want to change this.


-Marshall


building Eclipse plugins with Maven - some discoveries

2008-02-04 Thread Marshall Schor
I did an experiment where I configured the maven POM for one of our 
Eclipse plugins to let maven's eclipse:eclipse update the PDE Manifest.  
However, I found a bug in how it treated -SNAPSHOT: it turned the maven 
version 2.3.0.incubating-SNAPSHOT into the Manifest entry
2.3.0.incubating.SNAPSHOT (changed the '-' to a '.'). 

When I posted a patch to the maven-eclipse-plugin to fix this, I 
commented that the thing I patched was Deprecated.  I got a quick reply 
saying it was, indeed deprecated - we should be using some OSGi tooling 
from Apache Felix. 


The comment on the Maven list says:

you should look into the Apache Felix bundle plugin. It has a
bundle:manifest goal that will generate the OSGi manifest, that's why
the eclipse plugin class is deprecated

Check Adding OSGi metadata to existing projects without changing the
packaging type
http://felix.apache.org/site/maven-bundle-plugin-bnd.html

It looks like the tooling for generating OSGi bundles has advanced quite 
far, and may solve many of the difficulties we've had in building these 
kinds of things.  Among the things it apparently can be configured to 
handle is our library plugin, containing jars from other places.  I'm 
going to try and see if we can make this all work for our projects.  If 
anyone else has insights, please post :-).


-Marshall
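
Based on the Felix page linked above, wiring the manifest goal into an
existing (non-bundle-packaged) project would look roughly like this.  A
sketch only: the instruction values are illustrative assumptions, not a
tested configuration for our build.

```xml
<!-- Sketch, per the Felix "Adding OSGi metadata to existing projects
     without changing the packaging type" instructions.  The
     Bundle-SymbolicName and Export-Package values are hypothetical. -->
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <extensions>true</extensions>
  <executions>
    <execution>
      <id>bundle-manifest</id>
      <phase>process-classes</phase>
      <goals>
        <goal>manifest</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <instructions>
      <Bundle-SymbolicName>org.apache.uima.runtime</Bundle-SymbolicName>
      <Export-Package>org.apache.uima.*</Export-Package>
    </instructions>
  </configuration>
</plugin>
```

The jar plugin then has to be told to use the generated manifest file,
as described on that same Felix page.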



It may not be possible to use the UIMA SOAP interfaces via the Eclipse run-time-plugin?

2008-02-04 Thread Marshall Schor
UIMA's Soap implementation depends on having the axis classes (from 
TomCat?) in its classpath.  For normal UIMA deployments, this is 
accomplished by adding the needed Jar to the classpath.


For Eclipse and RCP plugin environments, the user's plugin is depending 
on the uima-ep-runtime plugin, which has our SOAP implementation. Is it 
possible at run time to add to the classpath of the uima-ep-runtime 
plugin the axis Jar? 

If not, then I don't think our current Eclipse runtime plugin bundle 
supports users who want to use the Soap APIs.


-Marshall


Re: It may not be possible to use the UIMA SOAP interfaces via the Eclipse run-time-plugin?

2008-02-06 Thread Marshall Schor
If this is indeed true, we probably should remove the uima-adapter-soap 
Jar from the runtime-plugin build, since it couldn't run anyway.  If 
someone wanted to use SOAP within an OSGi bundle, they could always 
build their own runtime bundle, including our uima-adapter-soap jar plus 
the jars it depends on from Axis.


What do others think?

-Marshall

Marshall Schor wrote:
UIMA's Soap implementation depends on having the axis classes (from 
TomCat?) in its classpath.  For normal UIMA deployments, this is 
accomplished by adding the needed Jar to the classpath.


For Eclipse and RCP plugin environments, the user's plugin is 
depending on the uima-ep-runtime plugin, which has our SOAP 
implementation. Is it possible at run time to add to the classpath of 
the uima-ep-runtime plugin the axis Jar?
If not, then I don't think our current Eclipse runtime plugin bundle 
supports users who want to use the Soap APIs.


-Marshall






making build/version info available and using in error messages

2008-02-08 Thread Marshall Schor
A patch I just committed in uima-as accesses version/build info from 
some private spot (in uima-as), probably for use in error messages.


It seems to me that this capability would be generally of use in UIMA, 
and it would be good to have some standard way of including it in our 
error/log messages (or do we already ?).


Is there a convenient source for this information?  It seems it would be 
in the various manifests, etc.   Is there a standard way to provide this?


-Marshall
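
One conventional source for this is the Implementation-Version entry in
the jar's META-INF/MANIFEST.MF, which Maven's archiver can be configured
to write (addDefaultImplementationEntries).  A minimal sketch of reading
it, hedged: our jars may or may not carry the entry today, and the class
name here is hypothetical.

```java
// Sketch: read the Implementation-Version recorded in the manifest of
// the jar a class was loaded from.  Returns null when the class is not
// loaded from a jar with that manifest entry (e.g. a plain classes
// directory), so callers must handle null in error messages.
public class VersionInfo {
    public static String versionOf(Class<?> clazz) {
        Package p = clazz.getPackage();
        return (p == null) ? null : p.getImplementationVersion();
    }

    public static void main(String[] args) {
        // Prints null when run from a plain directory; prints the
        // recorded version when run from a manifest-bearing jar.
        System.out.println(versionOf(VersionInfo.class));
    }
}
```

A small utility like this, called once and cached, could then be
prefixed onto log/error messages framework-wide.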


svn and jira not coupling?

2008-02-08 Thread Marshall Schor
It used to be that you could see in the Jira the svn commits that were 
done.  Today when I looked in Jira, there were no SVN commits listed, 
even on old issues.  Is it just my setup, or do others see this?


-Marshall

P.S. - it may have had something to do with the FishEye tab, which I 
clicked just to see what it was - there was this error message:
Error communicating with FishEye: java.io.IOException: repository not in 
correct state: Repository is stopped

