Where to put large generated things for our website to access
Here are some examples of large files (or large collections of files):
- the API javadocs
- the API javadocs as a zip file
- the 4 books in HTML format

It seems good to put them in people.a.o/www/incubator.a.o/uima/downloads. It seems bad to put them in SVN (because there's no need for versioning these - they're generated, and they are big, taking up SVN space).

Our current strategy is hybrid:
1) for the current release only: the API javadocs and the javadocs .zip file are put in people.a.o/www/incubator.a.o/uima/downloads, and are *not* kept in SVN.
2) for the current release only: the 4 books in HTML format are put in SVN and copied to people.a.o/www/incubator.a.o/uima/downloads with the svn update command.

I see, in fact, that for release 2.2.0 we managed to put the books in HTML format into SVN twice - once under 2.2.0-incubating/docs/html, and once under 2.2.0-incubating/html - and of course, on the website, they show up twice also... not good.

Is there any automated process for getting the files installed on people.a.o/www/incubator.a.o/uima/downloads? (Has anyone done any scripts for this?) Does everyone agree that it's best to keep these out of SVN, and to put them in the web server spot on people.a.o/www/incubator.a.o/uima/downloads?

===

The mirrored distribution spot contains, in addition to /binaries and /source, a /docs directory with the following:
- release/apiDocs.zip, plus the 3 signing files [asc, md5, sha1]
- release/api/ -- unzipped set of javadoc HTML files, no signing files
- release/html/ -- set of 4 books as HTML files, no signing files
- release/pdf/ -- set of 4 books as PDFs, no signing files

I think everything that's put onto the mirroring system is supposed to be signed, because Apache doesn't control what goes on at the mirrors (e.g., they could be hacked). Currently, our download page is silent about the existence of these. I think we should delete these on the mirroring distribution system.
Assuming we followed the top part of this note, we would have everything (except the pdf form of the 4 books) on the UIMA website, directly (not going thru a mirror). Other opinions? -Marshall
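Since no script exists yet for the copying step, here is a rough sketch of what one might look like. All paths, the release name, and the DRY_RUN guard are assumptions (not our actual layout); the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical upload helper for the generated docs.
# Prints the commands rather than running them, since the
# source and destination paths are assumptions.
DRY_RUN=1
RELEASE="2.2.0-incubating"
DEST="people.apache.org:/www/incubator.apache.org/uima/downloads/releaseDocs/$RELEASE"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run scp -r target/site/apidocs "$DEST/apidocs"   # unzipped javadocs
run scp target/apidocs.zip "$DEST/"              # zipped javadocs
run scp -r target/docs/html "$DEST/html"         # the 4 books, html form
```

With DRY_RUN=0 and real paths, the same helper would do the actual copy.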
Re: Where to put large generated things for our website to access
One more consideration: Some people use a URL to javadocs as part of their javadoc build process. For that, we might want to consider if we want to support people doing that - and if so, we probably need to keep the javadocs for each version, forever. The dist system supports this, via archiving + redirects. -Marshall
Re: Where to put large generated things for our website to access
Here's an argument for keeping the big things we point to from our website, like the javadocs and the 4 books in HTML form, on the a.o/dist/incubator/uima site: it is automatically archived, and when something is deleted from the mirror, a redirect is put in to the archive spot. This would be ideal for keeping older versions available at permanent static URLs.

So - upon further reflection - I think I'm changing my mind on this, and am now in favor of keeping these on the mirroring system. We can avoid making people who are on our web site and want to view the documentation check signatures, by not sending them through the mirroring system but just pointing them to the main a.o/dist location. I'm not sure if this would be OK in terms of protocol for load balancing, but I'll set the doc page up this way for now, and we can change it if we need to. -Marshall

Michael Baessler wrote: Fine with me to delete the HTML documentation (manual and javadoc) on the mirror. I thought we could use it and link to it from our website. As far as I know, there is no script to upload the documentation. I did it manually. -- Michael
Re: Where to put large generated things for our website to access
One further thought: a lot of projects put the RELEASE NOTES for particular releases at the top level of dist/, where the file name includes the release. For example:
ANT: RELEASE-NOTES-1.7.0.html
HTTPD: CHANGES_2.2.6
Since these will get archived, and redirects can be done for them, their URLs can be permanent. To be consistent with this practice, I would like to put our release notes for display from our web site in a.o/dist/... They don't need to be signed, because only archives need that; also, other projects don't sign these kinds of things. Any objections? -Marshall
Re: Where to put large generated things for our website to access
I updated our website download page and documentation page. I made the download page work with mirrors, and changed the format for accessing previous archived files to follow the common practice on other sites, referring to the archive.apache.org site. I made our documentation page refer to apache.org/dist/incubator/uima for the doc files - and didn't put any of these into our SVN for our website.

I also followed common practice and put our release notes into apache.org/dist/i/u at the top level (I changed the name to add the suffix of the release version). This allows (via the archive system) for these things to be always available. I added the Eclipse update site to a.o/d/i/u.

The only thing not done as of yet is setting up a .htaccess file in this directory, and adding HEADER.html and README.html files to make the directory listing more customized. I'm not going to tackle this right now - if anyone else wants to take a crack at it, OK with me. -Marshall

Michael Baessler wrote: Fine with me. -- Michael
[Fwd: Re: [schor] incubator/uima/eclipseUpdateSite/]
Original Message
Subject: Re: [schor] incubator/uima/eclipseUpdateSite/
Date: Sat, 29 Dec 2007 07:35:24 +0100 (CET)
From: Henk P. Penning [EMAIL PROTECTED]
To: Marshall Schor [EMAIL PROTECTED]

On Sat, 29 Dec 2007, Marshall Schor wrote:
  This has now been done - the signature and hash sums (MD5 and SHA1) are uploaded to incubator/uima/eclipseUpdateSite for the files flagged in the report. -Marshall

ok; thanks; the checker picked it up already; all's fine.

regards, Henk Penning

Henk P. Penning, Computer Systems Group, Dept of Computer Science, Utrecht University
Padualaan 14, 3584CH Utrecht, the Netherlands
http://people.cs.uu.nl/henkp/
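The per-file fix Henk's checker wanted (one hash file plus one detached signature per flagged file) can be sketched as below. The directory layout and jar name are made up, and the gpg invocation is only echoed since real signing needs a private key.

```shell
#!/bin/sh
# Generate .md5 / .sha1 files for each jar on an update site, in the
# standard single-line "<hash>  <file>" format; gpg signing is echoed.
set -e
site=$(mktemp -d)/eclipseUpdateSite
mkdir -p "$site/plugins"
echo "dummy jar contents" > "$site/plugins/org.example.plugin_1.0.0.jar"

for jar in "$site"/plugins/*.jar; do
  dir=$(dirname "$jar"); f=$(basename "$jar")
  ( cd "$dir"
    md5sum  "$f" > "$f.md5"
    sha1sum "$f" > "$f.sha1"
    # real signing would be: gpg --armor --detach-sign "$f"
    echo "would sign: $f" )
done
ls "$site/plugins"
```

Running the checksum tools from inside the jar's directory keeps bare file names (not full paths) in the .md5/.sha1 files, which is what downstream verifiers expect.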
Re: permissions owners on w.a.o/dist/incubator/uima
Thanks, Robert. I think we've finished with the tasks needed for migrating to the mirror system, perhaps with the exception of setting up .htaccess on w.a.o/dist/incubator/uima, and adding HEADER.html and README.html to that directory. However, I see other projects do not necessarily have that. Are any other lines (e.g., redirects of some sort) needed in the .htaccess file at this time? We would appreciate any checking you could do of our work to migrate to the mirror system. Thanks for all your help. -Marshall

Robert Burrell Donkin wrote:
On Dec 25, 2007 3:23 AM, Marshall Schor wrote:
  When putting files into this spot, I think the permissions should include group writable - so others in the project can update things, and world read-only for obvious reasons.
+1

  Is there a group for uima? I think there may not be, but I don't remember how to check that on linux.
or FreeBSD ;-) i use: grep incubator /etc/group - but those with more BSD-fu usually have more elegant solutions than mine...

  If there is not, then we should make the group be incubator if possible.
+1. the basic infrastructure rule is one group per TLP. so, whilst UIMA is in the incubator, the incubator group should be used. if UIMA graduates to a TLP then a new uima group will be created and that group should be used for releases. if UIMA graduates as a subproject of Project Cool (say) then group cool will be used for releases. - robert
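The permission scheme discussed above (group writable, world read-only) could be applied roughly as follows. This is a sketch on a scratch directory; the chgrp is only echoed because the incubator group exists on people.apache.org, not on an arbitrary machine.

```shell
#!/bin/sh
# Sketch: group-writable, world-readable permissions on a dist directory.
set -e
dist=$(mktemp -d)/uima
mkdir -p "$dist/docs"
touch "$dist/docs/README.html"

# on people.apache.org this would be run for real:
echo "would run: chgrp -R incubator $dist"

# u/g read-write, world read; capital X sets execute only on directories
chmod -R u+rwX,g+rwX,o+rX "$dist"
stat -c '%a %n' "$dist/docs" "$dist/docs/README.html"
```

Because chmod adds the listed bits unconditionally, the result (775 on directories, 664 on plain files) is independent of the umask the release manager happens to have.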
Re: Where to put large generated things for our website to access
Robert Burrell Donkin wrote: (apologies for not jumping in promptly)
On Dec 24, 2007 3:48 AM, Marshall Schor wrote:
  I updated our website download page and documentation page. I made the download page work with mirrors, and changed the format for accessing previous archived files to follow the common practice on other sites, referring to the archive.apache.org site. I made our documentation page refer to apache.org/dist/incubator/uima for the doc files - and didn't put any of these into our SVN for our website.
after feeling a little uncertain about this, i asked the infrastructure team, who gave some good arguments for storing docs in dist:
1. rsync is good for large files but struggles with lots of small files
2. mirrored documentation is not supported, so pushing all that content to the mirrors is wasteful
3. released documentation should have an unchanging URL. when a release is archived, the documentation URL would need to change (a redirect would help people but not all robots).
having release documentation permanently stored and archived is a good idea, but it's strongly recommended that subversion is used. the zip'd archive is fine where it is, but it would be better for the contents of the folders to be committed to subversion and then checked out to an appropriate place on the website. - robert

I felt uncertain about all of this, too. It seems to me that the right way to do this would be to have something like w.a.o/dist-not-mirrored/ ... etc., where the same archive mechanism could be used as is used for /dist/, but which doesn't do mirroring. Has this come up before in discussions - a way to have things that are not to be mirrored, but which would reasonably be archived?

You might say that the docs don't need to be archived (because they can always be extracted from an archived release zip/tar), but I find having at least some older versions of the docs quite useful in helping users running on a specific level - I can say things like "see xxx on page yyy" and know it matches their documentation. It seems inefficient to store large generated things in SVN, such as the javadocs (these are large numbers of small files) - but I would be happy to learn if I'm worrying about this unnecessarily. I can see an argument against something like w.a.o/dist-not-mirrored/ - avoiding creating even more infrastructure stuff. Other opinions / options? -Marshall
Re: Where to put large generated things for our website to access
Michael Baessler wrote:
  Robert Burrell Donkin wrote: subversion really is the way to go for release documentation
OK, so as far as I understand, we go with the documentation the same way as with the previous Apache UIMA releases. We check in the documentation to SVN and provide a download similar to release 2.2.0-incubating: http://incubator.apache.org/uima/downloads/releaseDocs/2.2.0-incubating/docs/html/index.html The JavaDocs will also go to SVN in both versions, HTML and zip. I can do the necessary changes, if all agree on that. -- Michael

+1. Also - remove the docs from /dist/incubator/uima -Marshall
Re: Where to put large generated things for our website to access
Michael Baessler wrote: Should we really remove the documentation from there? I think other projects also have the documentation there. So I think we should provide it too, maybe as one package to download? -- Michael

This one is a judgement call - I can see arguments on both sides. We've heard that the rsync mechanism handles small numbers of large files better than large numbers of small files - so putting only the 1 archive file there to download seems a better fit, if we do this. Putting them in /dist/ means they will be mirrored, and archived. So we will have dual archiving (one in SVN, and one in the archive spot). The mirroring would be useful *if* we expected a large load on the apache servers for downloading these. I think this will not be the case. Most of the time I use these to send people links to specific sections of the docs; for that, it would be annoying if, when they clicked the link, they were asked to pick a mirror. Considering all of this, I'm slightly in favor of keeping the docs just in SVN, and not on the /dist/ mirroring system.

Some more things to think about: since we want our docs pages to refer not only to the current release but also to previous releases, it would be good to figure out a fairly automatic system for this. (That was a virtue of the /dist/ + archive system - we could point the previous-releases doc links to a directory containing all the releases, and wouldn't need to update this link for subsequent releases.)

The other thing to do is to figure out how to keep the archived things on our web-site. It was suggested that we should do this like we handle the web-site checkout. Probably the straightforward thing to do is to have a special directory where all the docs we want to refer to live, have the archive link point there, and have special links that refer to the current version. As I recall, the web-site itself is replicated to other servers (since after you update it, you have to wait a while for it to appear). I have to confess that this seems quite wasteful of disk resources (double+ copies of things like javadocs - one in SVN, one on people.apache.org in our web-site place, and maybe (several?) additional copies on web-servers used for incubator.apache.org/uima). But Robert Donkin suggested this was the best way. -Marshall
Re: [jira] Commented: (UIMA-677) improve MD5 and SHA1 checksum generation
Hi Thilo - I forgot about that email trail :-) The Eclipse update site is created using an ant build script. Is there a way to make the poms work for these? That would be nicer than more build scripts. -Marshall

Thilo Goetz wrote: Marshall, how does this relate to this mail trail: http://www.mail-archive.com/uima-dev%40incubator.apache.org/msg05057.html --Thilo

Marshall Schor (JIRA) wrote:
[ https://issues.apache.org/jira/browse/UIMA-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12555343#action_12555343 ]
Marshall Schor commented on UIMA-677: I found these utilities on Linux (Suse 10) and Windows (via Cygwin): sha1sum and md5sum. For signing the Eclipse update site (all jars need to be signed, since they're archives) I wrote a small shell script. I also wrote one to automatically check the signatures. If you put gpg into the path, the scripts should work. I'll check them into SVN. I would suggest they be combined with the other signing script, and the other signing script altered to use the sha1sum/md5sum utilities.

improve MD5 and SHA1 checksum generation
Key: UIMA-677
URL: https://issues.apache.org/jira/browse/UIMA-677
Project: UIMA
Issue Type: Bug
Components: Build, Packaging and Test
Reporter: Michael Baessler

Comes up on the incubator mailing list: There are some problems with the MD5 and SHA1 files. For example, uimaj-2.2.1-incubating-bin.tar.bz2.md5:
uimaj-2.2.1-incubating-bin.tar.bz2: 53 20 6A FB 75 1F 07 9D BB 12 82 58 D0 7D CA 4B
The hash is spread over two lines and into hex pairs. The normal format is either:
53206afb751f079dbb128258d07dca4b
or
53206afb751f079dbb128258d07dca4b *uimaj-2.2.1-incubating-bin.tar.bz2
The SHA1 checksums have the same problem.
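For reference, the single-line lowercase-hex format the complaint asks for is exactly what the coreutils tools emit out of the box; a quick sanity check (the temp file and its contents are arbitrary):

```shell
#!/bin/sh
# Demonstrate the expected one-line checksum format from md5sum/sha1sum.
tmp=$(mktemp)
printf 'hello\n' > "$tmp"

md5sum "$tmp"    # "<32 hex chars>  <path>"
sha1sum "$tmp"   # "<40 hex chars>  <path>"

hash=$(md5sum "$tmp" | awk '{print $1}')
echo "$hash"     # b1946ac92492d2347c6235b4d2611184
```

So a signing script that simply redirects `md5sum file > file.md5` already produces the format other projects distribute, with no post-processing into hex pairs.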
Re: [jira] Commented: (UIMA-677) improve MD5 and SHA1 checksum generation
Marshall Schor wrote: Hi Thilo - I forgot about that email trail :-) The Eclipse update site is created using an ant build script. Is there a way to make the poms work for these? That would be nicer than more build scripts.

Looking at the pom xml more carefully, I'm guessing it could be modified to create sha1 and md5 for the eclipse update site. I'll give it a try... -Marshall
Re: [jira] Commented: (UIMA-677) improve MD5 and SHA1 checksum generation
Marshall Schor wrote: Looking at the pom xml more carefully, I'm guessing it could be modified to create sha1 and md5 for the eclipse update site. I'll give it a try...

Adding these lines in the checksum task to Thilo's pom version for uimaj-distr worked:

  <fileset dir="../uimaj-eclipse-update-site/target/features">
    <include name="*.jar"/>
  </fileset>
  <fileset dir="../uimaj-eclipse-update-site/target/plugins">
    <include name="*.jar"/>
  </fileset>

I'll check in these changes to the uimaj-distr pom. We still need to add signing of the eclipse update site jars. I'll take a look at that. -Marshall
Re: [jira] Commented: (UIMA-681) change UIMA version from 2.2.1-incubating to 2.3.0-incubating-SNAPSHOT
I posted a question on the maven users list asking for best practices for addressing the updating-the-parent-link when the version changes, in case there's something obvious we could be doing :-). -Marshall

Michael Baessler (JIRA) wrote:
[ https://issues.apache.org/jira/browse/UIMA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1208#action_1208 ]
Michael Baessler commented on UIMA-681: Great, seems to work. I guess I will do a mix between both suggestions. But we still have to change, for each child POM, the version number of the parent :-(

change UIMA version from 2.2.1-incubating to 2.3.0-incubating-SNAPSHOT
Key: UIMA-681
URL: https://issues.apache.org/jira/browse/UIMA-681
Project: UIMA
Issue Type: Task
Components: Build, Packaging and Test
Affects Versions: 2.2.1
Reporter: Michael Baessler
Assignee: Michael Baessler
Fix For: 2.3
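The tedious part Michael describes - editing the parent version in every child POM - is mechanical enough to script in the meantime. A minimal sed-based illustration on a made-up POM fragment (in a real tree the POMs would be located with find, and the maven release plugin automates this properly):

```shell
#!/bin/sh
# Illustrate bulk-updating the version string in a child POM with sed.
set -e
work=$(mktemp -d)
cat > "$work/pom.xml" <<'EOF'
<project>
  <parent>
    <artifactId>uimaj</artifactId>
    <version>2.2.1-incubating</version>
  </parent>
  <version>2.2.1-incubating</version>
</project>
EOF

# in a real tree: find . -name pom.xml -exec sed -i '...' {} +
sed -i 's/2\.2\.1-incubating/2.3.0-incubating-SNAPSHOT/g' "$work/pom.xml"
grep -c '2.3.0-incubating-SNAPSHOT' "$work/pom.xml"   # prints 2
```

A blind textual replace like this assumes the version string appears nowhere else in the POM (e.g., not in a dependency on some other artifact), which is why the release plugin's POM-aware rewrite is the safer long-term answer.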
Re: [jira] Closed: (UIMA-679) update UIMA website with release 2.2.1-incubating
Michael - can you announce UIMA 2.2.1 release on the various announcement places? -Marshall Michael Baessler (JIRA) wrote: [ https://issues.apache.org/jira/browse/UIMA-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Baessler closed UIMA-679. - Resolution: Fixed I think all updates are done update UIMA website with release 2.2.1-incubating -- Key: UIMA-679 URL: https://issues.apache.org/jira/browse/UIMA-679 Project: UIMA Issue Type: New Feature Components: Transport Adapters - SOAP, Vinci Reporter: Michael Baessler Assignee: Michael Baessler
Re: Ready to announce the release ?
Michael Baessler wrote: +1 I thought the Eclipse update site was documented in our manual on setting up Eclipse: http://incubator.apache.org/uima/downloads/releaseDocs/2.2.1-incubating/docs/html/overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup.install_uima_eclipse_plugins We probably should have a side-bar link to it on our web-site, though. -Marshall If all necessary updates are in place, I think we can announce the uimaj-2.2.1-incubating release. - The release artifacts are uploaded and work with the mirrors - The website is updated with the latest documentation - The release artifacts are uploaded to the Maven Incubator repository - The eclipse update site is in place (but currently not documented!?) -- Michael
Re: [jira] Commented: (UIMA-681) change UIMA version from 2.2.1-incubating to 2.3.0-incubating-SNAPSHOT
Marshall Schor wrote: I posted a question on the Maven users list asking for best practices for addressing the updating-the-parent-link problem when the version changes, in case there's something obvious we could be doing :-). The answer was: you should try using the release feature of Maven: http://maven.apache.org/plugins/maven-release-plugin I looked at this, and it seems the release:prepare step does the following: 1) starts with the SVN state, 2) updates the POMs for the release version number, 3) runs the tests, 4) commits the POMs, 5) makes a TAG with those values, 6) updates the POMs for the next -SNAPSHOT, 7) commits those. So - this might be worth trying... next release? -Marshall Michael Baessler (JIRA) wrote: [ https://issues.apache.org/jira/browse/UIMA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1208#action_1208 ] Michael Baessler commented on UIMA-681: --- Great, seems to work. I guess I will do a mix of both suggestions. But we still have to change the parent's version number in each child POM :-( change UIMA version from 2.2.1-incubating to 2.3.0-incubating-SNAPSHOT -- Key: UIMA-681 URL: https://issues.apache.org/jira/browse/UIMA-681 Project: UIMA Issue Type: Task Components: Build, Packaging and Test Affects Versions: 2.2.1 Reporter: Michael Baessler Assignee: Michael Baessler Fix For: 2.3
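The seven steps above are driven by the maven-release-plugin's prepare goal. A minimal, hypothetical POM configuration is sketched below (the tagBase URL is illustrative, not the project's actual SVN layout):

```xml
<!-- Hypothetical sketch only: pins the release plugin and tells it where
     in SVN to create release tags (URL is illustrative). -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-release-plugin</artifactId>
      <configuration>
        <tagBase>https://svn.apache.org/repos/asf/incubator/uima/uimaj/tags</tagBase>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Running `mvn release:prepare -DdryRun=true` rewrites the POMs locally without committing or tagging, which is a low-risk way to see what steps 2-7 would do before trying it on a real release.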
Re: [Fwd: REMINDER: Board Reports Due THIS Week]
Hi Everyone - I entered a start at a board report. It needs some filling out - Joern - can you add a line or two about progress in the CAS editor? Michael and Thilo - you've been doing quite a bit of work in the sandbox projects - perhaps say something about progress here? Any other additions appreciated! -Marshall Thilo Goetz wrote: We're due to report this month. Original Message Subject: REMINDER: Board Reports Due THIS Week Date: Sun, 6 Jan 2008 23:44:29 -0500 From: Noel J. Bergman [EMAIL PROTECTED] Reply-To: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Yes, yes, it seems early, but that's what happens when the 1st is a Tuesday. :-) All Board reports are due this week. --- Noel - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[Fwd: ASF grant for UIMA]
The Software Grant for UIMA-EE has been officially received by the secretary of Apache. I'll proceed to 1) put in a Jira issue with a zip file for it, and 2) set up in the sandbox under sandbox trunk uima-ee project 1 project 2 etc. the files for this. As part of this, I plan to reconfigure the parts to follow the maven conventions. -Marshall Original Message Subject:ASF grant for UIMA Date: Mon, 7 Jan 2008 10:37:10 -0500 From: Jonathan Jagielski [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: [EMAIL PROTECTED] Hello, I'm writing this email to inform you that the grant for UIMA from IBM was received. I'm sorry that you seem to have been kept out of the loop, as I replied to other emails saying that it had been received although it wasn't entered into the registry. This is probably my fault as I didn't make sure that the email I sent to others was received by you. I'm very sorry that this has taken so long, and the grant should be recorded later today. Sincerely, Jonathan Jagielski
Re: [jira] Created: (UIMA-678) Update notice file
My understanding, from discussions and from reading http://people.apache.org/~rubys/3party.html, is that the principles for distributing things involving Eclipse are: 1) You can distribute binaries (but not sources) of things that are licensed under the EPL (Eclipse Public License, http://opensource.org/licenses/eclipse-1.0.php), as long as the notice file identifies them and provides a link to their source. (This requirement comes from the EPL license.) 2) You cannot include Eclipse sources in your distribution - because that would require using the EPL as the license, not the Apache license. The 3party page has this, though: "For small amounts of source that is directly consumed by the ASF product at runtime in source form, and for which that source is unlikely to be changed anyway (say, by virtue of being specified by a standard), this action is sufficient. An example of this is the web-facesconfig_1_0.dtd (http://java.sun.com/dtd/web-facesconfig_1_0.dtd), whose inclusion is mandated by the JSR 127: JavaServer Faces specification (http://jcp.org/en/jsr/detail?id=127)." 3) If you have source code which is a derivative work of Eclipse source, which can happen if you take an Eclipse source file, modify it, and incorporate the modified/customized file into your source, then that's a gray area I'm not too clear about. Ignoring the version differences, what specific source code files are you incorporating? -Marshall Jörn Kottmann wrote: what Eclipse SW does the CAS Editor include? This depends on the Eclipse version which is used to create the build. The current Eclipse version is 3.3.1.1. The guys from the Apache Directory Studio (http://directory.apache.org/studio/) also do not include the version in the notice file. Jörn
Re: [Fwd: ASF grant for UIMA]
Hi Robert - I have some confusion about the IP form. The page http://incubator.apache.org/ip-clearance/index.html seems to be written with an implicit assumption that a Top level project with a real, project level PMC is doing the receiving - so there are phrases like: The receiving PMC is responsible for doing the work. The Incubator is simply the repository of the needed information. Once a PMC directly checks-in a filled-out short form, the Incubator PMC will need to approve the paper work after which point the receiving PMC is free to import the code. Other places say that this IP Clearance work needs to be done by an ASF Officer or Member: for instance, on page http://incubator.apache.org/ip-clearance/ip-clearance-template.html it says: IP Clearance processing must be executed either by an Officer or a Member of the ASF. So, my basic question is: does this process apply to incubator projects which, while incubating, receive additional code via a software grant, and if so, is the receiving PMC the Incubator PMC or the podling-learning-mode-unofficial-PMC (of which I think I am a member)? And, if we are to use the IP Clearance form, how do we have the processing executed by an Officer or Member of the ASF? Thanks for your help and guidance, as usual :-) -Marshall Robert Burrell Donkin wrote: On Jan 7, 2008 10:56 PM, Marshall Schor [EMAIL PROTECTED] wrote: The Software Grant for UIMA-EE has been officially received by the secretary of Apache. I'll proceed to 1) put in a Jira issue with a zip file for it, and 2) set up in the sandbox under sandbox trunk uima-ee project 1 project 2 etc. the files for this. As part of this, I plan to reconfigure the parts to follow the maven conventions. remember to fill in the incubator IP clearance form :-) - robert
Re: [Fwd: REMINDER: Board Reports Due THIS Week]
I vote for not duplicating things - can we find one link that would always be relevant? -Marshall Michael Baessler wrote: It seems that we do not update our website with the Board Reports... http://incubator.apache.org/uima/apache-board-status.html Either we update it frequently or we remove the page. Another possibility would be to link to the wiki with the Board Reports. -- Michael Thilo Goetz wrote: We're due to report this month. Original Message Subject: REMINDER: Board Reports Due THIS Week Date: Sun, 6 Jan 2008 23:44:29 -0500 From: Noel J. Bergman [EMAIL PROTECTED] Reply-To: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Yes, yes, it seems early, but that's what happens when the 1st is a Tuesday. :-) All Board reports are due this week. --- Noel - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [Fwd: REMINDER: Board Reports Due THIS Week]
How about just linking to the top page in the wiki for all the board reports? The user would need one more click to pick the year/month, and then a scroll to, or search for, UIMA. -Marshall Michael Baessler wrote: I don't think that we will find an official link that directly links to the UIMA report. When looking at the source of the Board Report wiki page you can construct a link like http://wiki.apache.org/incubator/January2008#head-7d9a372767f91873c3e2c7152c445cc2adbb291e that directly links to the January 2008 report of UIMA. But I don't think this is a good idea... :-) -- Michael Marshall Schor wrote: I vote for not duplicating things - can we find one link that would always be relevant? -Marshall Michael Baessler wrote: It seems that we do not update our website with the Board Reports... http://incubator.apache.org/uima/apache-board-status.html Either we update it frequently or we remove the page. Another possibility would be to link to the wiki with the Board Reports. -- Michael Thilo Goetz wrote: We're due to report this month. Original Message Subject: REMINDER: Board Reports Due THIS Week Date: Sun, 6 Jan 2008 23:44:29 -0500 From: Noel J. Bergman [EMAIL PROTECTED] Reply-To: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Yes, yes, it seems early, but that's what happens when the 1st is a Tuesday. :-) All Board reports are due this week. --- Noel - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [jira] Created: (UIMA-699) Fill out IP Clearance Form for UIMA-EE
Marshall Schor (JIRA) wrote: I've filled out the IP Clearance form as far as I can. Ken - can you fill in the rest and respond with any issues/concerns on the mailing list here? The ip form is here: http://svn.apache.org/viewvc/incubator/uima/site/trunk/uima-website/xdocs/ip-clearances/uima-ee.xml?view=markup -Marshall Fill out IP Clearance Form for UIMA-EE -- Key: UIMA-699 URL: https://issues.apache.org/jira/browse/UIMA-699 Project: UIMA Issue Type: Task Components: Async Scaleout Reporter: Marshall Schor Assignee: Marshall Schor Priority: Minor Fill out the IP clearance form (http://incubator.apache.org/ip-clearance/ip-clearance-template.html ) and have an officer / member execute it.
startup issue with maven for uimaj-ee
For those of you who may try to build uimaj-ee in the sandbox, there is a one-time Maven startup problem. We currently use a POM structure which has a common parent (in this case, uimaj-ee's POM). The common parent factors out some common settings, like release numbers and formats. Therefore, the child POMs require the common parent in order to be processed. The common parent also specifies the child POMs in its modules element. When you do a mvn install on the parent, it builds the children. So - the very first time you try this, the child POMs are read *before* uimaj-ee's POM has been installed to your local repo (currently as a snapshot). The effect is that the mvn install of uimaj-ee fails because the child POMs can't be processed - their parent is missing from the repository. The work-around for now is to (one time only) comment out the module elements in the modules section of uimaj-ee, then mvn install it (to your local repo). Then you can uncomment the module elements and build normally. I'm not sure how to fix this in a better way. One idea would be to put in the settings/configuration needed to upload SNAPSHOTs to http://people.apache.org/repo/m2-snapshot-repository/ on p.a.o (see http://www.apache.org/dev/repository-faq.html - it says, in part: The /incubating/ repositories are for releases from projects within the Apache Incubator - incubating snapshots still go to the /snapshot/ repositories). Is this something we should do? My worry is that this would be an excessive load on p.a.o for every build anyone does. Perhaps the better idea would be to just manually upload it once? Anyone know the maven magic to do this (if so, please post)? Of course, we would then need to configure the POMs or the local Maven user settings to know about p.a.o's snapshot repo. -- Another idea would be to change our POM hierarchy to split these two functions.
This seems like extra work/complexity, though. Is there a maven parameter to temporarily ignore the module part when doing mvn install? -- Other ideas? -Marshall
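To make the chicken-and-egg concrete, the layout the messages describe is roughly the following (coordinates and module names are illustrative, not the project's actual ones):

```xml
<!-- Parent POM (sketch): declares shared settings AND lists the children
     as modules, so "mvn install" on it tries to build the children. -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.uima</groupId>
  <artifactId>uimaj-ee</artifactId>
  <version>0.7.0-incubating-SNAPSHOT</version>
  <packaging>pom</packaging>
  <modules>
    <module>project-1</module>
    <module>project-2</module>
  </modules>
</project>

<!-- Child POM (sketch): refers to the parent by coordinates, so Maven must
     be able to resolve the parent from a repository before it can even
     finish reading the child -- hence the first-build failure when the
     parent snapshot has never been installed. -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.uima</groupId>
    <artifactId>uimaj-ee</artifactId>
    <version>0.7.0-incubating-SNAPSHOT</version>
  </parent>
  <artifactId>project-1</artifactId>
</project>
```

Splitting the hierarchy, as suggested, would mean the "shared settings" parent no longer carries the modules list, so it could be installed once without triggering the child builds.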
Re: Incubator Eclipse Update Site How To
Robert Burrell Donkin wrote: the incubator needs to document how to build a mirrored eclipse update site. my eclipse-foo is not up to the task so i wondered if there are any volunteers here in uimaland who'd be willing to help out, either by answering some questions or (even better) creating documentation patches. any volunteers? - robert
Re: Incubator Eclipse Update Site How To
Well, this time I'll enter some text before pushing send :-) I guess my mailing-foo or key-pushing-foo suffered a (hopefully) temporary breakdown... I'll volunteer to do this. Can you point me to where to stick the info, and any other protocol-ish things I should be sure to pay attention to? -Marshall Robert Burrell Donkin wrote: the incubator needs to document how to build a mirrored eclipse update site. my eclipse-foo is not up to the task so i wondered if there are any volunteers here in uimaland who'd be willing to help out, either by answering some questions or (even better) creating documentation patches. any volunteers? - robert
Re: startup issue with maven for uimaj-ee
Thilo Goetz wrote: Marshall Schor wrote: For those of you who may try and build uimaj-ee in the sandbox, there is a 1-time maven startup problem. ... How do we do this in the core? Aren't we using the same mechanisms there? Good question. I did this experiment: delete uimaj-ee from local maven repo, try building - get error. delete uimaj from local maven repo, try building - works! So, I guess I'll do some differential analysis to see what's going on... My guess is that I factored something that shouldn't be factored into the parent. -Marshall --Thilo
Re: startup issue with maven for uimaj-ee
Here's what differential analysis found: The working uimaj POM had . . . <version>2.3.0-incubating-SNAPSHOT</version> <properties> <uimaj-version>2.3.0</uimaj-version> <uimaj-release-version>${uimaj-version}-incubating-SNAPSHOT</uimaj-release-version> . . . I noticed that 2.3.0-incubating-SNAPSHOT was available as a property, so in the improved (but non-working :-) ) uimaj-ee POM it read: . . . <properties> . . . <uimaj-ee-version>0.7.0</uimaj-ee-version> <uimaj-ee-release-version>${uimaj-ee-version}-incubating-SNAPSHOT</uimaj-ee-release-version> . . . <version>${uimaj-ee-release-version}</version> Fix was to not use a property substitution in the version element, and instead copy the value of the uimaj-ee-release-version tag. This is probably a Maven defect - I'll see (on the maven list). -Marshall Marshall Schor wrote: Thilo Goetz wrote: Marshall Schor wrote: For those of you who may try and build uimaj-ee in the sandbox, there is a 1-time maven startup problem. ... How do we do this in the core? Aren't we using the same mechanisms there? Good question. I did this experiment: delete uimaj-ee from local maven repo, try building - get error. delete uimaj from local maven repo, try building - works! So, I guess I'll do some differential analysis to see what's going on... My guess is that I factored something that shouldn't be factored into the parent. -Marshall --Thilo
Re: startup issue with maven for uimaj-ee
Marshall Schor wrote: For those of you who may try and build uimaj-ee in the sandbox, there is a 1-time maven startup problem. We currently use a POM structure which has a common parent (in this case, it is uimaj-ee's POM). The common parent factors out some common settings, like release numbers and formats. Therefore, the child POMs require the common parent in order to be processed. The common parent also specifies in its modules element the child POMs. When you do a mvn install on the parent - it builds the children. So - the very first time you try this, the child POMs are read *before* the uimaj-ee's POM has been installed to your local repo (currently as a snapshot). The effect is that the mvn install of uimaj-ee fails because the child POMs can't be processed because their parent is missing (in the repository). The work-around for now is to (1 time only) comment out the module elements in the modules section of uimaj-ee, then mvn install it (to your local repo). Then you can uncomment out the module elements. and build normally. An easier work-around: do mvn -N install. Maven command line arguments are documented nowhere (that I can find), but if you type mvn -? it tells you about this. -Marshall
Re: startup issue with maven for uimaj-ee
How about this: When it's time to generate a test build candidate, we do the basic release prepare process: change the 2.3.0-incubating-SNAPSHOT to 2.3.0-incubating; save this as a tag in SVN using the candidate release name: 2.3.0-rc1-incubating; increment the base SVN to 2.4.0-incubating-SNAPSHOT. We then run tests, etc. If we find a problem, we fix in the base, and do another release prepare for the next candidate: change the 2.4.0-incubating-SNAPSHOT to 2.3.0-incubating; save this as a tag in SVN: 2.3.0-rc2-incubating; increment the base SVN to 2.4.0-incubating-SNAPSHOT. At some point we find we're satisfied; our last release candidate tag is then released; SVN is already set up for the next level. The only drawback I see with this is that it would conflate fixing release candidates with working on the next version. We could fix that by incrementing the base to a version number that specifically included the release candidate info, such as 2.3.0-rc[n]-incubating-SNAPSHOT. Then, at the end, we'd need one more release:prepare step to update the POMs to 2.3.0-incubating, tag to 2.3.0-incubating-release (or something like that), and then increment the POMs to 2.4.0-incubating-SNAPSHOT. Would this be a reasonable process? -Marshall Adam Lally wrote: On Jan 11, 2008 8:56 AM, Marshall Schor [EMAIL PROTECTED] wrote: It was also suggested that we use the maven release plugin to update the version stuff. I think we should investigate that for our next release. The thing that's always bugged me about the release plugin is that I don't think it supports our usual mode of operation where we build a release candidate, then people go off and do lots of manual testing on it, it gets approved by the IPMC, etc., and then we want to release exactly that release candidate. AIUI, the release plugin builds the release from SVN, tags it, and increments the versions for the next release, all at the same time. So it doesn't seem to fit the above process.
If we rebuild the release in this way, then we wouldn't be releasing _exactly_ the same thing that had been tested and approved. (I suppose we could diff it, but even then I think timestamps end up in generated artifacts so it isn't exactly the same.) Maybe there's some way to run only the version-number-update part of the release plugin, and not the other stuff. -Adam
Re: capabilityLanguageFlow - computeResultSpec
Michael - I'm confused about how this test is set up. The test descriptor this code uses loads an aggregate, and then runs a process method which ends up calling some dummy process method called SequencerTestAnnotator. This process method dumps (to a file) the result spec. Is that the case you're running? How do you turn on and off the (re)computation of the result spec? -Marshall Michael Baessler wrote: Michael Baessler wrote: Adam Lally wrote: On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] wrote: I tried to figure out how the ResultSpecification handling in uima-core works, with all its side effects, to check how it can be done to detect when a ResultSpec has changed. Unfortunately I was not able to; there are too many open questions where I don't know exactly if it is right in every case... :-( Adam, can you please look at this issue? I can try to take a look, but I don't have a lot of time. Do you have a test case for this, where you expect I would see a significant performance improvement if I fix this? Sorry, I have no performance test case. I checked my assumption using the debugger. I used the following main() with a loop over the process call to check if the result spec is recomputed each time. The descriptor is the same as used in the capabilityLanguageFlow test case of the uimaj-core project. Maybe a sysout helps to detect if the unnecessary calls are done or not. Maybe iterating more than 10 times will give you performance numbers before and after. Maybe adding additional capabilities that must be analyzed will increase the time used to compute the result spec. I will look at this tomorrow.
public static void main(String[] args) {
  AnalysisEngine ae = null;
  try {
    String desc = "SequencerCapabilityLanguageAggregateES.xml";
    XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
    ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(in);
    ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
    CAS cas = ae.newCAS();
    String text = "Hello world!";
    cas.setDocumentText(text);
    cas.setDocumentLanguage("en");
    for (int i = 0; i < 10; i++) {
      ae.process(cas);
    }
  } catch (Exception ex) {
    ex.printStackTrace();
  }
}
-- Michael When setting the loop counter to 1000 I get 6000ms without recomputing the result spec and 27000ms when recomputing the result spec. I think this should be sufficient for testing. -- Michael
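The speedup being measured comes from skipping recomputation when nothing relevant has changed between process calls. The caching idea can be sketched generically as follows (the class and method names are hypothetical illustrations, not the UIMA API):

```java
import java.util.Objects;

// Hypothetical sketch of the optimization under discussion: recompute an
// expensive derived value only when the input it depends on changes.
public class CachedSpec {
    private String lastLanguage;   // the input the cache is keyed on
    private String cachedSpec;     // the expensive derived value
    private int computeCalls;      // instrumentation for this example only

    public String specFor(String language) {
        // Recompute only when the language actually changed; otherwise
        // return the previously computed value.
        if (!Objects.equals(language, lastLanguage)) {
            lastLanguage = language;
            cachedSpec = expensiveCompute(language);
        }
        return cachedSpec;
    }

    private String expensiveCompute(String language) {
        computeCalls++; // in the real system this is the 27000ms-vs-6000ms cost
        return "resultSpec[" + language + "]";
    }

    public int calls() {
        return computeCalls;
    }
}
```

With this shape, a loop of 1000 process calls on an unchanged document triggers one computation instead of 1000, which matches the kind of difference the timings above show.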
Re: DOUBT FROM AN INDIAN STUDENT
Hi - Can you please post more information so we might be better able to help? For instance, what version of UIMA did you install, where did you get it from, what steps did you take when installing it, did you set up the environment variables as described in the README? -Marshall chandra sekhar wrote: Respected sir, I am V. Chandra Sekhar, from India, doing an MS in Information Technology at DA-IICT (one of the finest tech schools in India). I am doing my internship in UIMA. I installed the UIMA SDK, but I am not able to run the documentAnalyzer.bat file. I need to get the Document Analyzer window, but I didn't get it. Please help me in this regard. regards v.chandra sekhar PG - Student DA-IICT India
Re: DOUBT FROM AN INDIAN STUDENT
The path variable seems to show several possible problems. There appear to be several installs of UIMA, possibly at different levels, from different sources, on your machine. The PATH variable points to the following: C:\UIMA\bin; C:\Program Files\IBM\uima\bin; C:\uima\uima1\bin; C:\Program Files\Java\jdk1.5.0\bin; C:\TODAY\apache-uima\uimacpp\bin; C:\TODAY\apache-uima\uimacpp\examples\tutorial\src Can you fix this so that the PATH variable excludes the other UIMA installs and instead points just to the one you installed? Another thing that may be a problem: as of version 2.2.1, Apache UIMA requires Java 5 or later to run. I see in your path that you have Java 1.4. Can you try fixing this too, and see if that helps? -Marshall Chandra Sekhar wrote: Respected sir, I stored the UIMA SDK in (the TODAY folder) C:\TODAY\apache-uima\bin. I set the environment variable UIMA_HOME to C:\TODAY\apache-uima. I set this variable in the system variables location. This is the error I am getting when I double-click on the documentAnalyzer.bat file:
C:\TODAY\apache-uima\bin>setlocal
C:\TODAY\apache-uima\bin>call C:\TODAY\apache-uima\bin\setUimaClassPath
C:\TODAY\apache-uima\bin>set UIMA_CLASSPATH=;C:\TODAY\apache-uima\examples\resources;C:\TODAY\apache-uima\lib\uima-core.jar;C:\TODAY\apache-uima\lib\uima-document-annotation.jar;C:\TODAY\apache-uima\lib\uima-cpe.jar;C:\TODAY\apache-uima\lib\uima-tools.jar;C:\TODAY\apache-uima\lib\uima-examples.jar;C:\TODAY\apache-uima\lib\uima-adapter-soap.jar;C:\TODAY\apache-uima\lib\uima-adapter-vinci.jar;\webapps\axis\WEB-INF\lib\activation.jar;\webapps\axis\WEB-INF\lib\axis.jar;\webapps\axis\WEB-INF\lib\commons-discovery.jar;\webapps\axis\WEB-INF\lib\commons-discovery-0.2.jar;\webapps\axis\WEB-INF\lib\commons-logging.jar;\webapps\axis\WEB-INF\lib\commons-logging-1.0.4.jar;\webapps\axis\WEB-INF\lib\jaxrpc.jar;\webapps\axis\WEB-INF\lib\mail.jar;\webapps\axis\WEB-INF\lib\saaj.jar;C:\TODAY\apache-uima\lib\jVinci.jar;;
C:\TODAY\apache-uima\bin>set PATH=C:\UIMA\bin;C:\Program Files\IBM\uima\bin;C:\uima\uima1\bin;C:\Program Files\Java\jdk1.5.0\bin;C:\TODAY\apache-uima\uimacpp\bin;C:\TODAY\apache-uima\uimacpp\examples\tutorial\src
C:\TODAY\apache-uima\bin>if "C:\j2sdk1.4.2_03" == "" (set UIMA_JAVA_CALL=java) else (set UIMA_JAVA_CALL=C:\j2sdk1.4.2_03\bin\java)
C:\TODAY\apache-uima\bin>C:\j2sdk1.4.2_03\bin\java -cp ;C:\TODAY\apache-uima\examples\resources;C:\TODAY\apache-uima\lib\uima-core.jar;C:\TODAY\apache-uima\lib\uima-document-annotation.jar;C:\TODAY\apache-uima\lib\uima-cpe.jar;C:\TODAY\apache-uima\lib\uima-tools.jar;C:\TODAY\apache-uima\lib\uima-examples.jar;C:\TODAY\apache-uima\lib\uima-adapter-soap.jar;C:\TODAY\apache-uima\lib\uima-adapter-vinci.jar;\webapps\axis\WEB-INF\lib\activation.jar;\webapps\axis\WEB-INF\lib\axis.jar;\webapps\axis\WEB-INF\lib\commons-discovery.jar;\webapps\axis\WEB-INF\lib\commons-discovery-0.2.jar;\webapps\axis\WEB-INF\lib\commons-logging.jar;\webapps\axis\WEB-INF\lib\commons-logging-1.0.4.jar;\webapps\axis\WEB-INF\lib\jaxrpc.jar;\webapps\axis\WEB-INF\lib\mail.jar;\webapps\axis\WEB-INF\lib\saaj.jar;C:\TODAY\apache-uima\lib\jVinci.jar;; -Duima.home=C:\TODAY\apache-uima -Duima.datapath= -DVNS_HOST=localhost -DVNS_PORT=9000 -Djava.util.logging.config.file=C:\TODAY\apache-uima\config\Logger.properties -Xms128M -Xmx800M org.apache.uima.tools.docanalyzer.DocumentAnalyzer
The system cannot find the path specified.
C:\TODAY\apache-uima\bin>PAUSE
Press any key to continue . . .
sir, please give me a solution for this. regards sekhar.
Re: DOUBT FROM AN INDIAN STUDENT
I see another problem - this is probably the direct problem. On your machine, you have an environment variable called JAVA_HOME, and it is set to C:\j2sdk1.4.2_03 However, it appears you no longer have Java installed there. To fix, please install Java 5 (or 6 - these are the levels required for UIMA 2.2.1) and set the environment variable JAVA_HOME to where you installed it. -Marshall chandra sekhar wrote: Respected sir, I stored the UIMA SDK in (the TODAY folder) C:\TODAY\apache-uima\bin. I set the environment variable UIMA_HOME to C:\TODAY\apache-uima. I set this variable in the system variables location. This is the error I am getting when I double-click on the documentAnalyzer.bat file:
C:\TODAY\apache-uima\bin>setlocal
C:\TODAY\apache-uima\bin>call C:\TODAY\apache-uima\bin\setUimaClassPath
C:\TODAY\apache-uima\bin>set UIMA_CLASSPATH=;C:\TODAY\apache-uima\examples\resources;C:\TODAY\apache-uima\lib\uima-core.jar;C:\TODAY\apache-uima\lib\uima-document-annotation.jar;C:\TODAY\apache-uima\lib\uima-cpe.jar;C:\TODAY\apache-uima\lib\uima-tools.jar;C:\TODAY\apache-uima\lib\uima-examples.jar;C:\TODAY\apache-uima\lib\uima-adapter-soap.jar;C:\TODAY\apache-uima\lib\uima-adapter-vinci.jar;\webapps\axis\WEB-INF\lib\activation.jar;\webapps\axis\WEB-INF\lib\axis.jar;\webapps\axis\WEB-INF\lib\commons-discovery.jar;\webapps\axis\WEB-INF\lib\commons-discovery-0.2.jar;\webapps\axis\WEB-INF\lib\commons-logging.jar;\webapps\axis\WEB-INF\lib\commons-logging-1.0.4.jar;\webapps\axis\WEB-INF\lib\jaxrpc.jar;\webapps\axis\WEB-INF\lib\mail.jar;\webapps\axis\WEB-INF\lib\saaj.jar;C:\TODAY\apache-uima\lib\jVinci.jar;;
C:\TODAY\apache-uima\bin>set PATH=C:\UIMA\bin;C:\Program Files\IBM\uima\bin;C:\uima\uima1\bin;C:\Program Files\Java\jdk1.5.0\bin;C:\TODAY\apache-uima\uimacpp\bin;C:\TODAY\apache-uima\uimacpp\examples\tutorial\src
C:\TODAY\apache-uima\bin>if "C:\j2sdk1.4.2_03" == "" (set UIMA_JAVA_CALL=java) else (set UIMA_JAVA_CALL=C:\j2sdk1.4.2_03\bin\java)
C:\TODAY\apache-uima\bin>C:\j2sdk1.4.2_03\bin\java -cp ;C:\TODAY\apache-uima\examples\resources;C:\TODAY\apache-uima\lib\uima-core.jar;C:\TODAY\apache-uima\lib\uima-document-annotation.jar;C:\TODAY\apache-uima\lib\uima-cpe.jar;C:\TODAY\apache-uima\lib\uima-tools.jar;C:\TODAY\apache-uima\lib\uima-examples.jar;C:\TODAY\apache-uima\lib\uima-adapter-soap.jar;C:\TODAY\apache-uima\lib\uima-adapter-vinci.jar;\webapps\axis\WEB-INF\lib\activation.jar;\webapps\axis\WEB-INF\lib\axis.jar;\webapps\axis\WEB-INF\lib\commons-discovery.jar;\webapps\axis\WEB-INF\lib\commons-discovery-0.2.jar;\webapps\axis\WEB-INF\lib\commons-logging.jar;\webapps\axis\WEB-INF\lib\commons-logging-1.0.4.jar;\webapps\axis\WEB-INF\lib\jaxrpc.jar;\webapps\axis\WEB-INF\lib\mail.jar;\webapps\axis\WEB-INF\lib\saaj.jar;C:\TODAY\apache-uima\lib\jVinci.jar;; -Duima.home=C:\TODAY\apache-uima -Duima.datapath= -DVNS_HOST=localhost -DVNS_PORT=9000 -Djava.util.logging.config.file=C:\TODAY\apache-uima\config\Logger.properties -Xms128M -Xmx800M org.apache.uima.tools.docanalyzer.DocumentAnalyzer
The system cannot find the path specified.
C:\TODAY\apache-uima\bin>PAUSE
Press any key to continue . . .
sir, please give me a solution for this. regards sekhar.
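For debugging this kind of setup problem, a tiny program that prints what the running JVM actually is can help confirm that JAVA_HOME and PATH agree. This is a generic diagnostic sketch, not part of UIMA:

```java
// Diagnostic: print the JVM's own idea of where it lives and its version,
// to compare against what the JAVA_HOME environment variable points to.
public class JavaCheck {
    public static void main(String[] args) {
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("JAVA_HOME    = " + System.getenv("JAVA_HOME"));
    }
}
```

If `java.home` and `JAVA_HOME` point at different installs (or `JAVA_HOME` points at a directory that no longer exists, as here), the batch scripts that build their java path from JAVA_HOME will fail with "The system cannot find the path specified."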
Re: DOUBT FROM AN INDIAN STUDENT
Hi - Please post what you have the JAVA_HOME environment variable set to. It appears to be set to: C:\Program Files\\bin\ This doesn't look correct. -Marshall chandra sekhar wrote: Respected sir, I set JAVA_HOME to C:\Program Files only. The error in the Document Analyzer is solved, but there is an error in the adjustExamplePaths.bat file. The error message is like this:
C:\Program Files\apache-uima\bin>setlocal
C:\Program Files\apache-uima\bin>if "C:\Program Files\" == "" (set UIMA_JAVA_CALL=java) else (set UIMA_JAVA_CALL=C:\Program Files\\bin\java)
C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Files\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringInFiles C:\Program Files\apache-uima/examples\data .xml C:/Program Files/apache-uima C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.
C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Files\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringInFiles C:\Program Files\apache-uima/examples .classpath C:/Program Files/apache-uima C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.
C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Files\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringInFiles C:\Program Files\apache-uima/examples .launch C:/Program Files/apache-uima C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.
C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Files\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringInFiles C:\Program Files\apache-uima/examples .wsdd C:/Program Files/apache-uima C:\Program Files\apache-uima -ignorecase
The system cannot find the path specified.
C:\Program Files\apache-uima\bin>PAUSE
Press any key to continue . . .
please help me in this regard sir.
Re: DOUBT FROM AN INDIAN STUDENT
chandra sekhar wrote: Respected Sir, now I set the JAVA_HOME variable to C:\Program Files This still appears to be incorrect, I think. Is there a local student or teacher at your university who can help you set up your JAVA_HOME variable to point to where Java 5 is installed? I'm guessing that the Java installer might have installed Java at some place like: C:\Program Files\Java\jdk1.5.0_something in which case your JAVA_HOME variable should be something like: C:\Program Files\Java\jdk1.5.0_something (Of course 1.5.0_something is just an example; your actual install would have some number in place of the something.) -Marshall the error in the document analyzer is solved, but there is a path-not-found error in the adjustExamplePaths.bat file. The error message is like this. C:\Program Files\apache-uima\bin>setlocal C:\Program Files\apache-uima\bin>if C:\Program Files\ == (set UIMA_JAVA_CALL=java ) else (set UIMA_JAVA_CALL=C:\Program Files\\bin\java ) C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Files\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringInFiles C:\Program Files\apache-uima/examples\data .xml C:/Program Files/apache-uima C:\Program Files\apache-uima -ignorecase The system cannot find the path specified. C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Files\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringInFiles C:\Program Files\apache-uima/examples .classpath C:/Program Files/apache-uima C:\Program Files\apache-uima -ignorecase The system cannot find the path specified. C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Files\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringInFiles C:\Program Files\apache-uima/examples .launch C:/Program Files/apache-uima C:\Program Files\apache-uima -ignorecase The system cannot find the path specified. 
C:\Program Files\apache-uima\bin>C:\Program Files\\bin\java -cp C:\Program Files\apache-uima/lib/uima-core.jar org.apache.uima.internal.util.ReplaceStringInFiles C:\Program Files\apache-uima/examples .wsdd C:/Program Files/apache-uima C:\Program Files\apache-uima -ignorecase The system cannot find the path specified. C:\Program Files\apache-uima\bin>PAUSE Press any key to continue . . .
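To make the failure mechanical rather than mysterious: the UIMA .bat scripts effectively build the launcher path as %JAVA_HOME%\bin\java, so a JAVA_HOME of C:\Program Files\ (no Java install underneath, plus a trailing backslash) produces exactly the non-existent path shown in the logs. A minimal sketch in plain Java (not UIMA code; the helper name is ours):

```java
// Sketch: shows how a bad JAVA_HOME produces the broken launcher path
// "C:\Program Files\\bin\java" seen in the error log above.
public class JavaHomeCheck {
    static String launcherPath(String javaHome) {
        // what the .bat scripts effectively do: append \bin\java
        return javaHome + "\\bin\\java";
    }

    public static void main(String[] args) {
        // The value apparently in effect (note the trailing backslash):
        System.out.println(launcherPath("C:\\Program Files\\"));
        // prints: C:\Program Files\\bin\java

        // A plausible correct value ("1.5.0_something" is just a stand-in):
        System.out.println(launcherPath("C:\\Program Files\\Java\\jdk1.5.0_something"));
        // prints: C:\Program Files\Java\jdk1.5.0_something\bin\java
    }
}
```

Pointing JAVA_HOME at the actual JDK directory makes the appended \bin\java resolve to a real file.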
[DISCUSS] Naming for sandbox project for Asynchronous Scaleout
There is a new sandbox project, currently called uima-ee. Should we change its name? A suggested alternative is uima-as. Some arguments pro / con changing the name: Pro: 1. uima-as goes with UIMA Asynchronous Scaleout, and the name therefore more clearly matches the functionality. This is good from the perspective of being clear and transparent to new users/developers. 2. uima-ee has no official meaning; it came from a practice of labeling some products with these kinds of features as "enterprise edition", such as J2EE. This is kind of a marketing buzzword, without any specific semantics, and could be used to include other kinds of enterprise-scale capabilities beyond asynchronous scaleout (so it is too broad for the current thing, at least). Con: 1. uima-ee is already in use; we'd have to do extra (but probably one-time) work to change it. 2. uima-ee is broader - so we could include additional enterprise-scale capability, over time, in the new project, not related specifically to Asynchronous Scaleout. 3. Written without the dash, uima-as becomes "uima as" and is confusing (because "as" is a common English word, whereas "uima ee" has no such issue). 4. It's always more work to change a name than you think. There are probably other arguments pro / con, please post if significant :-) Please register your opinions on doing this name change. When you do, please also indicate the strength of your view and reasons for it :-) Except for the work, I'm slightly in favor of changing to uima-as. -Marshall
Re: DOUBT FROM AN INDIAN STUDENT
chandra sekhar wrote: Respected Sir, I didn't find any error messages while running both batch files. But I didn't get a window when I run the documentAnalyzer.bat file. I don't have a good idea what to suggest specifically. Please see if you can get a professor, or another student at your university, to take a look at your computer and setup and see if they can tell what's going wrong. -Marshall
Workaround for maven eclipse:eclipse failure
If you run the eclipse:eclipse goal in the root POM project (uimaj), it runs, but doesn't do the right things. It doesn't reliably set up linked resources in the .project file, and doesn't reliably set up the .classpath file with the proper entries for those linked resources. The observed result is that you get compilation failures in Eclipse - saying it can't find things that are in the linked jars. The fix is to run eclipse:eclipse in the individual projects, not at the root project. Also, beware of running the eclipse:clean goal on the root project - it erases the .project file because it is cleaning, and then won't put it back (it sees this is a POM project, not a JAR project, and won't put .project files in the POM project). If anyone has any insight on how to get the eclipse:eclipse goal to work from the root, that would be nice to hear. -Marshall
Re: RESPECTED SIR , A DOUBT IN UIMA.
There are maybe two or three problems. First - please check if you have a firewall that is blocking internet access for particular programs (many firewalls have per-program configuration, and block outbound internet access). One way to check this is to turn off the firewall for a test, and see if the connections go thru. Second, the update site for eclipse emf is documented on http://www.eclipse.org/modeling/emf/updates/ (Note - this is not the update site, but it tells you what the update site is.) According to this, the update site is *http://download.eclipse.org/modeling/emf/updates/site.xml* Third - if you downloaded and installed the most recent version of Eclipse, depending on which packaging you downloaded, it may already have EMF. See the compare packages page on the download page for Eclipse: http://www.eclipse.org/downloads/moreinfo/compare.php -Marshall chandra sekhar wrote: Respected Sir, I am sekhar from India. Sir, when I am working with the eclipse modeling framework, when I followed these steps: Help - Software Updates - Find and install etc., I am getting an error message like: network connection problems encountered during search. When I click the detail button of these windows, these are the details. Network connection problems encountered during search. Unable to access http://wiki.eclipse.org/EMF/Installation;. Unable to access site: http://wiki.eclipse.org/EMF/Installation; [Server returned HTTP response code: 403 Forbidden for URL: http://wiki.eclipse.org/EMF/Installation.] Server returned HTTP response code: 403 Forbidden for URL: http://wiki.eclipse.org/EMF/Installation. Unable to access site: http://wiki.eclipse.org/EMF/Installation; [Server returned HTTP response code: 403 Forbidden for URL: http://wiki.eclipse.org/EMF/Installation.] Server returned HTTP response code: 403 Forbidden for URL: http://wiki.eclipse.org/EMF/Installation. Can you suggest which site to use? Please provide a solution for this.
Re: RESPECTED SIR , A DOUBT IN UIMA.
Did you install the uima plugins? If so, please do menu Window - Show View - Other - PDE - Plug-ins and verify that the plugin org.apache.uima.desceditor is shown, without any error markers. If there are error markers, please do menu Window - Show View - Other - PDE - Plug-in Dependencies, and see if you can find the org.apache.uima.desceditor and see if it is missing some required dependency. -Marshall chandra sekhar wrote: Respected Sir, While working with UIMA, I am running a productNumbertype example; I have imported the project uima_examples. When I ran it using the document analyzer it is showing a window. I have given the input and output directory paths, clicked on RUN, and it is running, and as a result a new window is appearing after the run. When I tried to create a new system descriptor file by right-clicking descriptor - New - Other, I did not find the UIMA expander in that wizard. I have executed successfully all previous steps. I am getting Eclipse Modeling Framework, Example EMF model creation wizard, but not UIMA and Simple expander in that wizard. Regards, sekhar
jar naming with/without versions
The basic maven build creates Jars in the target with alternate names: project uimaj-core = drop the j from uimaj, and don't suffix the version - uima-core.jar. An exception to this is jVinci - jVinci becomes jVinci.jar (no version). Another exception: when the uimaj-ep-runtime plugin is built, it has Jars in it with the names: project uimaj-core = uimaj-core-{version}.jar. These names have to match entries in the manifest. I'm working on improvements to our maven POMs to further automate the builds. I've been able to build the uimaj-ep-runtime jar so it contains the other jars, but it puts in the jars as named by the other projects, so it has, e.g., uima-core.jar (no j, and no version). This would make our uimaj-ep-runtime internal jars follow our other jar naming conventions, and would reduce the need to have the uimaj-ep-runtime manifest updated to change version numbers. Is this OK to do, or is there a reason we keep the uimaj-ep-runtime inner jars naming conventions different? -Marshall
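The naming convention described above (drop the j from the uimaj- prefix, no version suffix, with jVinci as the stated exception) can be sketched as a tiny mapping; this is illustrative only, not UIMA's actual build tooling:

```java
// Illustrative model of the jar-naming convention described above.
// Not actual UIMA build code.
public class JarNames {
    static String jarName(String project) {
        if (project.equals("jVinci")) {
            return "jVinci.jar"; // stated exception: name kept as-is, no version
        }
        if (project.startsWith("uimaj-")) {
            // drop the "j" from "uimaj", no version suffix
            return "uima-" + project.substring("uimaj-".length()) + ".jar";
        }
        return project + ".jar";
    }

    public static void main(String[] args) {
        System.out.println(jarName("uimaj-core")); // uima-core.jar
        System.out.println(jarName("jVinci"));     // jVinci.jar
    }
}
```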
Re: RESPECTED SIR , A DOUBT IN UIMA.
Hi - You need to add classpath entries for your project that refer to the UIMA jars in the lib directory of UIMA_HOME. A simple way to do this is to open your project's properties (select the project, then do menu: Project - Properties) and select Java Build Path. Then select the Libraries tab, and click on Add Variable. If you haven't already done this, add a variable UIMA_HOME and set it to where you installed UIMA. (To do this, if you need to, click Configure Variables...). Select the UIMA_HOME variable, and click Extend... Then expand the lib folder, and select uima-core.jar. -Marshall chandra sekhar wrote: Respected Sir, import org.apache.uima.jcas.JCas; import org.apache.uima.jcas.JCasRegistry; import org.apache.uima.cas.impl.CASImpl; import org.apache.uima.cas.impl.FSGenerator; import org.apache.uima.cas.FeatureStructure; import org.apache.uima.cas.impl.TypeImpl; import org.apache.uima.cas.Type; import org.apache.uima.cas.impl.FeatureImpl; import org.apache.uima.cas.Feature; import org.apache.uima.jcas.tcas.Annotation_Type; I'm getting the error: org.apache cannot be resolved. Regards, sekhar.
Re: jar naming with/without versions
Thanks Jörn. I think that with some experimentation we can get the PDE nature to work properly - at least, that's my goal for now :-) I'll do some more tests to see if I can come up with an approach which allows both maven and Eclipse to work. Thanks. -Marshall Jörn Kottmann wrote: On Jan 21, 2008, at 10:58 AM, Michael Baessler wrote: Is this OK to do, or is there a reason we keep the uimaj-ep-runtime inner jars naming conventions different? The motivation was to add a PDE nature to the eclipse plugin projects with the maven eclipse plugin. The only way I found to make it work was to add the version in the manifest. The other side is that it never worked very well to add a PDE nature from the maven plugin; there were still some problems left and it does not work correctly, e.g. classpath problems in eclipse, Cas Editor export does not work, etc. I think it would be ok to remove the version number, remove the pde stuff from the POM file, and search for another way to address the pde project nature issue. What do you think? Jörn
Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification
I'm doing a redesign for the result spec area to improve performance. The basic idea is to put a hasBeenChanged flag into the result spec object; when it is false, users can avoid recomputing things. Why not use equals()? Because a single result spec object is shared among multiple users, and when updated, the object is updated in place (so there is no other object to compare it to). Looking at the ResultSpec object - it has a hashMap that stores the Types and Features (TypeOrFeature objects) as the keys; the values are hashSets holding the languages for which these types and features are in the result spec. (There is a special hash set having just the entry of the default language = UNSPECIFIED_LANGUAGE = x-unspecified.) I'm going to try to make the default-language hash set a constant, and create just one instance of it - this should improve performance, especially when languages are not being used. There are 2 kinds of methods to add types/features to a result spec: ones with language(s) and ones without. The ones without reset any language spec associated with the type or feature(s) to the UNSPECIFIED_LANGUAGE. The ones with a language sometimes replace the language associated with the type/feature, and other times add the language (assuming the type/feature is already an entry in the hashMap of types and features). 
methods which replace any existing languages: setResultTypesAndFeatures(array of TypeOrFeature) - repl with x-unspecified language; setResultTypesAndFeatures(array of TypeOrFeature, languages) - repl with languages; addResultTypeOrFeature(1 TypeOrFeature) - repl with x-unspecified language; addResultTypeOrFeature(1 TypeOrFeature, languages) - repl with languages; addResultType(String, boolean) - repl with x-unspecified language; addResultFeature(1 feature, languages) - repl with languages. methods which add to existing languages: addResultType(1 type, boolean, languages) - adds languages; addResultFeature(1 feature) - adds x-unspecified. The set... methods essentially clear the result spec and set it with completely new information, so it is reasonable that they replace any existing language information. The addResult methods, when used to add a type or feature which is already present, are inconsistent - with one method adding, and the others replacing. This behavior is documented in the JavaDocs for the class. The JavaDocs have the behavior for adding a Feature by name reversed with the behavior for adding a Type by name: in one case, including the language is treated as a replace, in the other as an add. This is likely a bug in the Javadocs. The code for addResultFeature is reversed from the Javadocs: the code adds languages if they are specified, but replaces (with x-unspecified) if languages are not specified in the method call. Does anyone know what the correct behavior of these methods is supposed to be? -Marshall
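The add-vs-replace language semantics under discussion can be modeled with a small sketch. This is a simplified stand-in, not the real ResultSpecification_Impl; the class, method, and "MyType" names are ours:

```java
import java.util.*;

// Simplified model of the structure described above: a map from
// type-or-feature names to the set of languages for which they are in
// the result spec, with "x-unspecified" as the default language.
public class ResultSpecModel {
    static final String UNSPECIFIED = "x-unspecified";
    final Map<String, Set<String>> tofToLangs = new HashMap<>();

    // "replace" flavor: any existing languages for the entry are discarded
    void putReplacing(String tof, String... langs) {
        Set<String> s = new HashSet<>(Arrays.asList(
            langs.length == 0 ? new String[] { UNSPECIFIED } : langs));
        tofToLangs.put(tof, s);
    }

    // "add" flavor: languages are merged into any existing set
    void putAdding(String tof, String... langs) {
        tofToLangs.computeIfAbsent(tof, k -> new HashSet<>()).addAll(
            Arrays.asList(langs.length == 0 ? new String[] { UNSPECIFIED } : langs));
    }

    public static void main(String[] args) {
        ResultSpecModel m = new ResultSpecModel();
        m.putReplacing("MyType", "en");
        m.putAdding("MyType", "de");     // merges: {en, de}
        m.putReplacing("MyType");        // replaces: {x-unspecified}
        System.out.println(m.tofToLangs.get("MyType"));
    }
}
```

The inconsistency in the message is exactly about which of the real addResult... methods follow the putReplacing pattern and which follow putAdding.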
Re: RESPECTED SIR , A DOUBT IN UIMA.
Perhaps this is just a couple of spelling errors? 1. Descriptor has an e (not an i) as the 2nd letter. 2. In the UIMA example descriptors, there is no ProductNumberDescriptor file. Did you put one in there that you are trying to access? -Marshall chandra sekhar wrote: Respected sir, I am implementing the pdf which I attached with this mail. I executed correctly up to SPECIFY THE ANALYSIS ENGINE DESCRIPTOR. While executing SPECIFY THE ANALYSIS ENGINE DESCRIPTOR, I am getting an error: An import cannot be resolved. No .xml file with file name *ProductNumberDiscriptor.xml* was found in class path or data path. *Note: I gave the name as ProductNumberAEDiscriptor.xml* my data path is set to: C:\Program Files\IBM\uima\docs\examples\descriptors my class path: C:\ProgramFiles\IBM\uima\bin;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program Files\IBM\uima\docs\examples\descriptors\vinciService. regards sekhar.
Re: RESPECTED SIR , A DOUBT IN UIMA.
I do not know what your pdf file is. Although you say in an earlier mail you attached it, attachments do not come thru on this mailing list. Can you describe the PDF file? I looked through our UIMA PDFs and don't find this name. If this name is coming from some other download you have done from some other provider, please check to see that you have followed their instructions for installation. See if you can locate that file. Once you locate it, please add that path to your class path or data path. -Marshall chandra sekhar wrote: Respected sir, I am just copying the names given by them in the pdf file. My descriptors folder doesn't contain any file by the name ProductNumberDescriptor. I am trying to access ProductNumberAEDescriptor from my descriptor folder; even then I am getting an error. regards sekhar.
Re: capabilityLangugaeFlow - computeResultSpec
The class CapabilityLanguageFlowObject has 2 defined constructors, but one is never used/referenced: CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) Can this be removed? -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
If this is removed, or if it is never called, then there is a section of the logic in CapabilityLanguageFlowObject which is never used, because mNodeList == null: if (mNodeList != null) { // 80 or so lines of code elided } Can this logic be removed? -Marshall Marshall Schor wrote: The class CapabilityLanguageFlowObject has 2 defined constructors, but one is never used/referenced: CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) Can this be removed? -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
In looking thru the code for ResultSpecification_Impl, there seems to be an inconsistency - unless I (quite possibly :-) ) missed something. The calls to the containsType(...) method operate in one of 2 ways, depending on whether or not the result specification has been compiled by calling the compile method. If the result spec has not been compiled, then containsType(...) returns true iff the type specified is equal(...) to a type in the Result Specification. If it has been compiled, then containsType returns true iff the type specified is equal to a type *or any of its subtypes* in the Result Specification. This is because compiling a resultSpecification adds the subtypes. Can others confirm this? In actual use within annotators, it may be that the result spec is always compiled before use (I haven't yet traced that down). Should the code and Javadocs be updated to have containsType return true for subtypes of types in the result spec, always? -Marshall
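The compiled vs. uncompiled containsType behavior described above can be sketched with plain strings standing in for types. This is an illustrative model only, not the actual ResultSpecification_Impl code; the Annotation/Token type names are just examples:

```java
import java.util.*;

// Sketch: before compile(), containsType is an exact-match lookup;
// compile() transitively adds subtypes, so afterwards subtypes of
// listed types are also "contained".
public class ContainsTypeSketch {
    final Set<String> types = new HashSet<>();
    final Map<String, List<String>> subtypes; // type -> direct subtypes

    ContainsTypeSketch(Map<String, List<String>> subtypes) {
        this.subtypes = subtypes;
    }

    void addResultType(String t) { types.add(t); }

    // transitively add all subtypes of the listed types
    void compile() {
        Deque<String> todo = new ArrayDeque<>(types);
        while (!todo.isEmpty()) {
            for (String sub : subtypes.getOrDefault(todo.pop(), List.of())) {
                if (types.add(sub)) todo.push(sub);
            }
        }
    }

    boolean containsType(String t) { return types.contains(t); }

    public static void main(String[] args) {
        ContainsTypeSketch s =
            new ContainsTypeSketch(Map.of("Annotation", List.of("Token")));
        s.addResultType("Annotation");
        System.out.println(s.containsType("Token")); // false before compile
        s.compile();
        System.out.println(s.containsType("Token")); // true after compile
    }
}
```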
Re: capabilityLangugaeFlow - computeResultSpec
I'm thinking of simplifying the CapabilityContainer class. Right now it has code to process input as well as output capabilities, but the input ones appear never to be used. Can anyone confirm that? If confirmed, I would propose to remove the part related to input capabilities. There is a HashMap, outputToFCapability, whose keys are Strings corresponding to an output type-or-feature name, for any language, for any capability-set. The values do not seem to be used. I'd like to replace this with a hashSet. Any objections? -Marshall
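The proposed HashMap-to-HashSet change is the standard simplification when a map's values are never read: a set of the keys carries the same information. A generic sketch (not the actual CapabilityContainer code; the type name is hypothetical):

```java
import java.util.*;

// Illustrates replacing a map whose values are never used with a set
// of its keys - only membership matters.
public class MapToSet {
    public static void main(String[] args) {
        // Before: values stored but never read
        Map<String, Object> outputNames = new HashMap<>();
        outputNames.put("org.example.MyType", new Object());

        // After: a set answers the same membership question
        Set<String> outputNameSet = new HashSet<>(outputNames.keySet());
        System.out.println(outputNameSet.contains("org.example.MyType")); // true
    }
}
```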
Re: capabilityLangugaeFlow - computeResultSpec
Thanks. I'll see about comparing the older method with the current method, to verify this. -Marshall Michael Baessler wrote: In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) constructor was used when the result was set by an application using the process method with the resultSpec argument. In the current version it seems that only the version with the precomputed FlowTable is used. But I can't say if that is correct or not since I don't know the details about the ResultSpec restructuring (maybe only Adam knows). But you are right, if this constructor isn't necessary both, the code and the constructor, can be removed. Seems that the architecture has changed here. :-) -- Michael Marshall Schor wrote: If this is removed or if it is never called, then there is a section of the logic in CapabilityLanguageFlowObject which is never used, because mNodeList == null: if (mNodeList != null) { // 80 or lines of code elided } Can this logic be removed? -Marshall Marshall Schor wrote: The class CapabilityLanguageFlowObject has 2 defined constructors, but one is never used/referenced: CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) Can this be removed? -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
OK. This would confirm that the other constructor is no longer needed, since the test that passes a result-spec arg in the process method no longer calls that. Thanks. -Marshall Michael Baessler wrote: When looking at the tests for the capability language flow I see both tests one with the result spec argument in the process() method and one without. In older UIMA versions, when using the debugger I see that both constructors are used there. -- Michael Marshall Schor wrote: Thanks. I'll see about comparing the older method with the current method, to verify this. -Marshall Michael Baessler wrote: In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) constructor was used when the result was set by an application using the process method with the resultSpec argument. In the current version it seems that only the version with the precomputed FlowTable is used. But I can't say if that is correct or not since I don't know the details about the ResultSpec restructuring (maybe only Adam knows). But you are right, if this constructor isn't necessary both, the code and the constructor, can be removed. Seems that the architecture has changed here. :-) -- Michael Marshall Schor wrote: If this is removed or if it is never called, then there is a section of the logic in CapabilityLanguageFlowObject which is never used, because mNodeList == null: if (mNodeList != null) { // 80 or lines of code elided } Can this logic be removed? -Marshall Marshall Schor wrote: The class CapabilityLanguageFlowObject has 2 defined constructors, but one is never used/referenced: CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) Can this be removed? -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Easy to see- just trace the test case... -Marshall Michael Baessler wrote: But it would still be interesting why this is never needed and how it works now. -- Michael Marshall Schor wrote: OK. This would confirm that the other constructor is no longer needed, since the test that passes a result-spec arg in the process method no longer calls that. Thanks. -Marshall Michael Baessler wrote: When looking at the tests for the capability language flow I see both tests one with the result spec argument in the process() method and one without. In older UIMA versions, when using the debugger I see that both constructors are used there. -- Michael Marshall Schor wrote: Thanks. I'll see about comparing the older method with the current method, to verify this. -Marshall Michael Baessler wrote: In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) constructor was used when the result was set by an application using the process method with the resultSpec argument. In the current version it seems that only the version with the precomputed FlowTable is used. But I can't say if that is correct or not since I don't know the details about the ResultSpec restructuring (maybe only Adam knows). But you are right, if this constructor isn't necessary both, the code and the constructor, can be removed. Seems that the architecture has changed here. :-) -- Michael Marshall Schor wrote: If this is removed or if it is never called, then there is a section of the logic in CapabilityLanguageFlowObject which is never used, because mNodeList == null: if (mNodeList != null) { // 80 or lines of code elided } Can this logic be removed? -Marshall Marshall Schor wrote: The class CapabilityLanguageFlowObject has 2 defined constructors, but one is never used/referenced: CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) Can this be removed? -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
I did this trace. Here's how it works now, without calling this. The process(cas, result-spec) call goes to AggregateAnalysisEngine_Impl which calls setResultSpecification on the AEEngine_impl object, which 1) clones the result-spec object 2) adds capabilities to it from the *inputs* of all components of this aggregate 3) uses this one cloned object as the result spec passed down to each component. Before going further - Michael - a question: isn't this union-with-all-inputs-behavior something you didn't want for capability language flow? Maybe it doesn't matter in that the use of capability language flow is not done in the real application use cases by passing the result spec in the top level call to the process method of the analysis engine? -Marshall Marshall Schor wrote: Easy to see- just trace the test case... -Marshall Michael Baessler wrote: But it would still be interesting why this is never needed and how it works now. -- Michael Marshall Schor wrote: OK. This would confirm that the other constructor is no longer needed, since the test that passes a result-spec arg in the process method no longer calls that. Thanks. -Marshall Michael Baessler wrote: When looking at the tests for the capability language flow I see both tests one with the result spec argument in the process() method and one without. In older UIMA versions, when using the debugger I see that both constructors are used there. -- Michael Marshall Schor wrote: Thanks. I'll see about comparing the older method with the current method, to verify this. -Marshall Michael Baessler wrote: In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) constructor was used when the result was set by an application using the process method with the resultSpec argument. In the current version it seems that only the version with the precomputed FlowTable is used. 
But I can't say if that is correct or not since I don't know the details about the ResultSpec restructuring (maybe only Adam knows). But you are right, if this constructor isn't necessary both, the code and the constructor, can be removed. Seems that the architecture has changed here. :-) -- Michael Marshall Schor wrote: If this is removed or if it is never called, then there is a section of the logic in CapabilityLanguageFlowObject which is never used, because mNodeList == null: if (mNodeList != null) { // 80 or lines of code elided } Can this logic be removed? -Marshall Marshall Schor wrote: The class CapabilityLanguageFlowObject has 2 defined constructors, but one is never used/referenced: CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) Can this be removed? -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Here's the trace of how this works, when run from a top level process(cas) call: 1) the call goes to the AnalysisEngine_Impl process method, which calls processAndOutputNewCASes in the same object. This calls the ASB_impl process method, which creates a new AggregateCasIterator(aCAS). This constructor calls computeFlow on the ...asb.impl.FlowControllerContainer object. This calls the particular flow controller's computeFlow method. In this case, the flowController is the CapabilityLanguageFlowController. Since this is a new CAS coming in to the aggregate, the computeFlow method makes a new CapabilityLanguageFlowObject, passing in the pre-computed Flow Table. So that's how it uses this constructor, in the case where no specific result spec is passed. -Marshall Marshall Schor wrote: Easy to see- just trace the test case... -Marshall Michael Baessler wrote: But it would still be interesting why this is never needed and how it works now. -- Michael Marshall Schor wrote: OK. This would confirm that the other constructor is no longer needed, since the test that passes a result-spec arg in the process method no longer calls that. Thanks. -Marshall Michael Baessler wrote: When looking at the tests for the capability language flow I see both tests one with the result spec argument in the process() method and one without. In older UIMA versions, when using the debugger I see that both constructors are used there. -- Michael Marshall Schor wrote: Thanks. I'll see about comparing the older method with the current method, to verify this. -Marshall Michael Baessler wrote: In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) constructor was used when the result was set by an application using the process method with the resultSpec argument. In the current version it seems that only the version with the precomputed FlowTable is used. 
But I can't say if that is correct or not since I don't know the details about the ResultSpec restructuring (maybe only Adam knows). But you are right, if this constructor isn't necessary both, the code and the constructor, can be removed. Seems that the architecture has changed here. :-) -- Michael Marshall Schor wrote: If this is removed or if it is never called, then there is a section of the logic in CapabilityLanguageFlowObject which is never used, because mNodeList == null: if (mNodeList != null) { // 80 or lines of code elided } Can this logic be removed? -Marshall Marshall Schor wrote: The class CapabilityLanguageFlowObject has 2 defined constructors, but one is never used/referenced: CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec) Can this be removed? -Marshall
Re: UIMA Sandbox releases
Re: releasing the Cas Editor - with or without some pre-packaged annotators. I suspect that Joern would be willing to be the release manager for this :-). He may even be willing to bundle some of the more stable sandbox components with it, but certainly not uima-as (uima-ee), which is not ready. The pragmatic, least-work approach would be to pick those sandbox projects that would be ready now, and do one release packaging that included the Cas Editor. However, I don't think that's the clearest approach for our users. I think they might like to see bundles arranged by topics - and so might like a bundle of annotators, and might separately like the Cas Editor. So - my preference for now would be to keep the Cas Editor as a separately packaged thing coming from the project. If we get additional tools, over time, which we consider add-ons and not fundamentally needed as part of the core, then perhaps we can have a tools-bundle. To do this effectively using the Maven way - we might want to have each tool (in one project) produce one jar (maven way: each project = one jar), at a particular version level. These would be available in the maven jar repository, and maven tooling could be used to fetch them. Maven assemblies could then be used to package multiples of these into bigger packages of things. A basic idea here would be that the version of the assembly would be on a different schedule than the components. So someone downloading an assembled bundle would get parts, each of which had their own version number. This is similar to what you get with other big projects that include jars from other sources. The parts which are stable and not changing would not have their version numbers incremented in the assembled bundle. -Marshall Michael Baessler wrote: Marshall Schor wrote: Thilo Goetz wrote: Hi Marshall, as usual, my view is pretty much the exact opposite ;-) First of all, I don't see the sense in creating yet another category. 
To my mind, there's nothing wrong with having mature components in the sandbox. The only thing I would consider is to move some sandbox components that are really important to people into the core. I think people might feel that the sandbox isn't a place to get production-quality things, and I was hoping that some of these components were production-quality :-) I think Thilo raised a good point here. We still have an empty framework that does not provide any linguistic functionality out of the box. So maybe we should think about moving sandbox components that are ready to use and are important for most of the UIMA users to the core. We could then also provide some more out-of-the-box analytics by combining the components. For all the other Sandbox components that are ready to use but are not relevant for most of the UIMA users, we can consider doing a separate release for each component. I guess the release cycles are longer for those components, so that we do not have so many Sandbox component releases. Opinions? -- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Eddie - this is for you to check I think: There is code in UimacppEngine in method serializeResultSpecification which adds result spec types and features to 2 IntVector arrays (one for Types, one for Features). As currently designed, these miss getting the subtypes of types, and all the features for types marked with the all-features flag in the capabilities. Are these required here? Also, I notice that the result spec supports languages - but the serialization for this doesn't support languages. Is that intended? -Marshall
Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification
I'll fix the Javadocs to correspond to what the code does. This will have the result that addResultFeature(1-feature, languages) will *add* to the existing languages, while addResultFeature(1-feature) will *replace* all existing languages with x-unspecified. -Marshall Marshall Schor wrote: I'm doing a redesign for the result spec area to improve performance. The basic idea is to put a hasBeenChanged flag into the result spec object, and use its being false to enable users to avoid recomputing things. Why not use equals()? Because a single result spec object is shared among multiple users, and when updated, the object is updated in place (so there is no other object to compare it to). Looking at the ResultSpec object - it has a hashMap that stores the Types and Features (TypeOrFeature objects) as the keys; the values are hashSets holding the languages for which these types and features are in the result spec. (There is a special hash set having just the entry of the default language = UNSPECIFIED_LANGUAGE = x-unspecified.) I'm going to try to make the default language hash set a constant, and create just one instance of it - this should improve performance, especially when languages are not being used. There are 2 kinds of methods to add types/features to a result spec: ones with language(s) and ones without. The ones without reset any language spec associated with the type or feature(s) to the UNSPECIFIED_LANGUAGE. The ones with a language sometimes replace the language associated with the type/feature, and other times add the language (assuming the type/feature is already an entry in the hashMap of types and features).
Methods which are replacing any existing languages:

  setResultTypesAndFeatures(TypeOrFeature[])            - replace with x-unspecified language
  setResultTypesAndFeatures(TypeOrFeature[], languages) - replace with languages
  addResultTypeOrFeature(1 TypeOrFeature)               - replace with x-unspecified language
  addResultTypeOrFeature(1 TypeOrFeature, languages)    - replace with languages
  addResultType(String, boolean)                        - replace with x-unspecified language
  addResultFeature(1 feature, languages)                - replace with languages

Methods which are adding to existing languages:

  addResultType(1 type, boolean, languages) - adds languages
  addResultFeature(1 feature)               - adds x-unspecified

The set... method essentially clears the result spec and sets it with completely new information, so it is reasonable that it replaces any existing language information. The addResult methods, when used to add a type or feature which is already present, are inconsistent - one method adds, and the others replace. This behavior is documented in the JavaDocs for the class. The JavaDocs have the behavior for adding a Feature by name reversed with the behavior for adding a Type by name: in one case, including the language is treated as a replace; in the other, as an add. This seems likely to be a bug in the Javadocs. The code for addResultFeature is reversed from the Javadocs: the code adds languages if they are specified, but replaces (with x-unspecified) if languages are not specified in the method call. Does anyone know what the correct behavior of these methods is supposed to be? -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Some corner cases. Case 1: If using the method to alter an existing result spec by adding a single type with an associated set of languages, the passed in allAnnotatorFeatures boolean will now be unioned with any existing setting of this. Javadocs updated to reflect this. Case 2: If you have a capability for language 1 which says output type A (not all features), and have another capability for language 2 which says output type A (allAnnotatorFeatures), this will be represented in the result spec by having language 1 also be for all features. Case 3: when setting the result spec, passing null in as the value of the languages (for those set/add things that take language arrays) will be equivalent to passing in the one language x-unspecified. So, in particular, if a spec says produce type A for lang 1 and 2, and then you use the addResultType(for type A, null-passed-in-for-language-spec) this will add the language x-unspecified for type A. I will attempt to document these in the Javadocs. Please post a response if these corner cases need to be handled differently. -Marshall
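Corner case 3 can be sketched as follows. The class and helper names here are my own, for illustration only - this is not the actual UIMA API, just the null-handling rule described above:

```java
// Sketch of corner case 3: passing null as the languages array is
// treated as passing the single language x-unspecified.
// Names (NullLangs, normalizeLanguages) are illustrative only.
public class NullLangs {
    public static String[] normalizeLanguages(String[] languages) {
        if (languages == null) {
            return new String[] { "x-unspecified" };
        }
        return languages;
    }

    public static void main(String[] args) {
        System.out.println(normalizeLanguages(null)[0]);            // x-unspecified
        System.out.println(normalizeLanguages(new String[]{"en"})[0]); // en
    }
}
```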
Re: RESPECTED SIR , A DOUBT IN UIMA.
Hi - From another email you sent, I see you got by this error. What did you do to resolve this one? -Marshall chandra sekhar wrote: Respected sir, I am implementing the pdf which I attached with this mail. I executed correctly up to SPECIFY THE ANALYSIS ENGINE DESCRIPTOR. While executing SPECIFY THE ANALYSIS ENGINE DESCRIPTOR, I am getting an error: An import could not be resolved. No .xml file with file name *ProductNumberDiscriptor.xml* was found in class path or data path. *Note: I gave the name as ProductNumberAEDiscriptor.xml* My data path is set to: C:\Program Files\IBM\uima\docs\examples\descriptors My class path: C:\Program Files\IBM\uima\bin;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program Files\IBM\uima\docs\examples\descriptors\vinciService. regards sekhar.
Re: RESPECTED SIR , A DOUBT IN UIMA.
For others wanting to follow this, the references to the right side and left side refer to the CAS Visual Debugger tool. The PDF he refers to is the article from IBM DeveloperWorks which is a tutorial on creating UIMA applications, by Nicholas Chase, in case you want to search for it. What this appears to be is that the annotator is not receiving any input to annotate. Please check the previous step where you are asked to specify the data to be analyzed. What did you specify? Does that file actually exist? -Marshall Copy below is of email sent to me and Tong, 23 Jan 2008, 11:23 AM: Respected Sir, I am implementing the pdf file attached with this mail. I implemented everything well and without errors up to SPECIFY THE ANALYSIS DESCRIPTOR. When I run the debugger, my text file is not appearing on the right side. I am also attaching the ProductNumberAnnotator java file I am using. When I execute the command Run - Run ProductNumberAEDescriptor in the debugger window, I am getting these values on the left side of the debugger: Annotation Index [1] uima.tcas.Annotation[1] uima.tcas.DocumentAnnotation com.backstopmedia.uima.tutorial.ProductNumber[0] sofaIndex [0]. Sir, please give me a solution for this.
Re: [DISCUSS] Naming for sandbox project for Asynchronous Scaleout
OK, without further ado, we change the name to uima-as. At some point when I get a moment, I'll enter a Jira, assign it to me, and rename the uimaj-ee things in SVN to uimaj-as in the sandbox. After I do that, I'll notify everyone... by posting here again. (If some other committer wants to do this, that's fine, too...) -Marshall Jörn Kottmann wrote: +1 Jörn On Jan 21, 2008, at 10:49 AM, Michael Baessler wrote: +1 for changing the name to uima-as. I think a clear and transparent name is very important so that people get interested in and work with it. It is also better and easier to integrate into the core if we decide to move it from the Sandbox to the core any time in the future. -- Michael Marshall Schor wrote: There is a new sandbox project, currently called uima-ee. Should we change its name? A suggested alternative: uima-as. Some arguments pro / con changing the name. Pro: 1. uima-as goes with UIMA Asynchronous Scaleout, and the name, therefore, more clearly matches the functionality. This is good from the perspective of being clear and transparent to new users/developers. 2. uima-ee has no official meaning; it came from a practice of labeling some products with these kinds of features as enterprise edition, such as J2EE. This is kind of a marketing buzzword, without any specific semantics, and could be used to include other kinds of enterprise scale capabilities beyond asynchronous scaleout (so it is too broad for the current thing, at least). Con: 1. uima-ee is already in use; we'd have to do extra (but probably 1-time) work to change it. 2. uima-ee is broader - so we could include additional enterprise scale capability, over time, in the new project, not related specifically to Asynchronous Scaleout. 3. Written without the dash, uima-as becomes uima as and is confusing (because as is a common English word, whereas uima ee has no such issue). 4.
It's always more work to change a name than you think. There are probably other arguments pro / con; please post if significant :-) Please register your opinions on doing this name change. When you do, please also indicate the strength of your view and reasons for it :-) Except for the work, I'm slightly in favor of changing to uima-as. -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Without actually testing this (so this may be a wrong conclusion) - it seems to me that the code in CapabilityLanguageFlowController that sets up the result specs for components, by language, in the mFlowTable, ignores the typesOrFeatures that the result spec adds when compile() is called. If you recall, the compile method for results specifications augments the set of types/features by doing 2 things: if the type has allAnnotatorFeatures=true, it adds all the features of the type; and if the type has subtypes, it adds those too, propagating the allAnnotatorFeatures processing down. A consequence would be that the mFlowTable would miss these cases: An aggregate wants type A output, and has a delegate with output capability A-subtype. An aggregate wants Feature F output, and has a delegate with output capability type-A with allAnnotatorFeatures marked, having that feature. Can anyone confirm this? (perhaps adding a test case :-) )? Michael - do you know what the design intent was for this - if things are as I've conjectured above, is this something that needs to be fixed, or is it working as intended? -Marshall
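The compile() expansion described above - adding all features of a type marked allAnnotatorFeatures, and propagating down to subtypes - can be sketched on a toy type system. Everything here (the class name, the map-based type system, the string encoding of features) is illustrative, not UIMA's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hedged sketch (not UIMA's actual code) of what compiling a result
// spec does: close the requested types over subtypes, and expand the
// allAnnotatorFeatures flag into the concrete features of each type.
public class CompileSketch {
    // toy type system: type -> direct subtypes, type -> features
    static final Map<String, List<String>> SUBTYPES = Map.of(
        "A", List.of("A-sub"), "A-sub", List.of());
    static final Map<String, List<String>> FEATURES = Map.of(
        "A", List.of("A:f1"), "A-sub", List.of("A-sub:f2"));

    /** Expand requested types (with a per-type allAnnotatorFeatures flag). */
    public static Set<String> compile(Map<String, Boolean> requested) {
        Set<String> result = new TreeSet<>();
        Deque<Map.Entry<String, Boolean>> work = new ArrayDeque<>(requested.entrySet());
        while (!work.isEmpty()) {
            Map.Entry<String, Boolean> e = work.pop();
            String type = e.getKey();
            boolean allFeatures = e.getValue();
            result.add(type);
            if (allFeatures) {
                result.addAll(FEATURES.get(type));  // expand allAnnotatorFeatures
            }
            // propagate to subtypes, carrying the allFeatures flag down
            for (String sub : SUBTYPES.get(type)) {
                work.push(Map.entry(sub, allFeatures));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(compile(Map.of("A", true)));
    }
}
```

This is why a flow table built from the *uncompiled* spec would miss the A-subtype and allAnnotatorFeatures cases described above: the expansion only happens at compile time.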
Re: capabilityLangugaeFlow - computeResultSpec
What about allAnnotatorFeatures? Suppose the aggregate says it needs a particular Feature of a particular type. Suppose a delegate is marked as producing that type, and has allAnnotatorFeatures marked. This wouldn't work. You could say in this case that the output capability of the delegate *must not* rely on allAnnotatorFeatures, but instead *must* explicitly list those features it produces. In one sense, this could be a good idea, because no delegate could *accurately* mark that it outputs allAnnotatorFeatures anyway, due to the possibility that some other component could add features to the type in question, completely unknown to this delegate - and of course, this delegate would not be setting those other features. This would lead to another question - should we deprecate allAnnotatorFeatures because of this? -Marshall Michael Baessler wrote: Marshall Schor wrote: Without actually testing this (so this may be a wrong conclusion) - it seems to me that the code in CapabilityLanguageFlowController that sets up the result specs for components, by language, in the mFlowTable, ignores the typesOrFeatures that the result spec adds when compile() is called. If you recall, the compile method for result specifications augments the set of types/features by doing 2 things: if the type has allAnnotatorFeatures=true, it adds all the features of the type; and if the type has subtypes, it adds those too, propagating the allAnnotatorFeatures processing down. A consequence would be that the mFlowTable would miss these cases: An aggregate wants type A output, and has a delegate with output capability A-subtype. An aggregate wants Feature F output, and has a delegate with output capability type-A with allAnnotatorFeatures marked, having that feature. Can anyone confirm this? (perhaps adding a test case :-) )?
Michael - do you know what the design intent was for this - if things are as I've conjectured above, is this something that needs to be fixed, or is it working as intended? Yes, that is correct. The mFlowTable only contains those output types that are specified in the aggregate AE as output types. The guideline for the capabilityLanguageFlow was to specify all output results (with all interim results) in the aggregate that must be produced. If we now change the mFlowTable content to match the resultSpec, we also change the capabilityLanguageFlow. So if we do that, how can I prevent a subtype from being produced when only a supertype must be produced? So I prefer to stay with the current design - specify all you need. What do you think? -- Michael
Re: RESPECTED SIR , A DOUBT IN UIMA.
Suggestion: isolate the problem from the Cas Visual Debugger tool by running a simple Java application that runs the annotator. Instructions for how to do that are in the tutorials and user guides document on the Apache UIMA web site. You can use the Eclipse debugger to single-step through things and see where things are going wrong. -Marshall chandra sekhar wrote: Sir, I am specifying the data already in the data folder; even so, I am not getting the annotation results in the right-side window. The window remains empty. Please give me a suggestion. -- sekhar.
Re: capabilityLangugaeFlow - computeResultSpec
The thing that adds allAnnotatorFeatures and subtypes is compiling the result spec. The builder of the mFlowTable doesn't compile the resultspec before using it - so it doesn't have these consequences. -Marshall Adam Lally wrote: On Jan 24, 2008 7:54 AM, Marshall Schor [EMAIL PROTECTED] wrote: If you recall, the compile method for results specifications augments the set of types/features by doing 2 things: if the type has allAnnotatorFeatures=true, it adds all the features of the type; and if the type has subtypes, it adds those too, propagating the allAnnotatorFeatures processing down. A consequence would be that the mFlowTable would miss these cases: An aggregate wants type A output, and has a delegate with output capability A-subtype. Without looking at the code, I didn't understand why this is a consequence of the behavior you described above. I thought you said and if the type has subtypes, it adds those too? Anyway, I definitely think that this should work. By the definition of subtype, A-subtype *IS A* A. So if an aggregate wants type A produced, then A-subtype should be produced. An aggregate wants Feature F output, and has a delegate with output capability type-A with allAnnotatorFeatures marked, having that feature. We should be supporting this as well. Again I didn't follow why the behavior you described above doesn't do this. -Adam
Re: capabilityLangugaeFlow - computeResultSpec
The code which checks if a type or feature is in a result spec, for a particular language, always includes generalizing the language specifier by dropping the part beyond the first -. For example, en-us and en-uk are simplified to en. Because of this, I'm thinking of shrinking the result specification (for performance / space reasons) by normalizing any language specs it uses by dropping the country extensions, if present. Any objections? -Marshall
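The proposed normalization can be sketched like this; the class and method names are mine for illustration, not UIMA's. Note that x-unspecified must not be truncated, since its first dash is part of the "x-" extension marker, not a country separator:

```java
// Sketch of the proposed language normalization: drop the country
// extension, keeping only the base language. Names here (Lang,
// normalize) are illustrative, not the actual UIMA API.
public class Lang {
    public static String normalize(String language) {
        int dash = language.indexOf('-');
        // "x-unspecified" has its dash at index 1 ("x-"), so only strip
        // when the prefix before the dash is a real base language
        if (dash > 1) {
            return language.substring(0, dash);
        }
        return language;
    }

    public static void main(String[] args) {
        System.out.println(normalize("en-us"));         // en
        System.out.println(normalize("en"));            // en
        System.out.println(normalize("x-unspecified")); // x-unspecified
    }
}
```

Normalizing on entry means the stored language sets shrink (en-us and en-uk collapse to en) and the per-query generalization step disappears, which is the performance/space win described above.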
Re: capabilityLangugaeFlow - computeResultSpec
The implementation for checking if a feature is in the result spec does the following: If the result-spec is not compiled, it says the feature is present if it was specifically put in, or if its type has the allAnnotatorFeatures flag set. If the result-spec is compiled, it says the feature is present if it was specifically put in, or if its type has the allAnnotatorFeatures flag set and the feature exists in the type system. For performance / space reasons, I'd like to drop the 2nd case; this would have the consequence of changing the result spec to return true for features not in the type system where the type had the allAnnotatorFeatures flag set. This case shouldn't come up in practice because I can't think of a good reason an annotator would ask if a feature not in its type system was present. Any objections? -Marshall
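A sketch of the check after the proposed change (all names here are illustrative stand-ins, not the actual implementation): a feature is reported present if it was explicitly added, or if its type carries allAnnotatorFeatures - with no lookup against the type system:

```java
import java.util.Set;

// Illustrative sketch (not UIMA's actual code) of the proposed
// containsFeature behavior after dropping the type-system check.
public class ContainsFeatureSketch {
    // explicitly listed features, and types marked allAnnotatorFeatures
    static final Set<String> EXPLICIT_FEATURES = Set.of("T1:f1");
    static final Set<String> ALL_FEATURE_TYPES = Set.of("T1");

    public static boolean containsFeature(String type, String shortName) {
        return EXPLICIT_FEATURES.contains(type + ":" + shortName)
            || ALL_FEATURE_TYPES.contains(type);
    }

    public static void main(String[] args) {
        System.out.println(containsFeature("T1", "f1"));    // true
        // true even for a nonexistent feature: the consequence noted above
        System.out.println(containsFeature("T1", "bogus")); // true
        System.out.println(containsFeature("T2", "f2"));    // false
    }
}
```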
Re: capabilityLangugaeFlow - computeResultSpec
LeHouillier, Frank D. wrote: We have an annotator that wraps a black-box information extraction component that can return objects of a variety of types. We check the result specification to see if the object is something we want to output based on the actual string of the name of the type. If you take away the compiled version of the ResultSpecification then we will have to also check whether the type that we get back from the type system is null or not. Hi Frank - This change would *not* take away the compiled version of the Result Spec. It would only change 1 behavior - that of returning true if a *feature* (not a type, as in your example above) was associated with a type where the capability was marked allAnnotatorFeatures, even if the Feature didn't exist. Suppose you had a type T1, and a type T2 whose super-type was T1, and features T1:f1 and T2:f2, with an output capability = T1 with allAnnotatorFeatures = true; and finally T3 (not inheriting from T1) and feature T3:f3, with the output capability including T3 with allAnnotatorFeatures = false. Here's the current behavior:

Before compile, the following would all return true except as marked:
  containsType(T1)
  containsType(T2) - returns false: T2 not in output capability, and before compile, T2 isn't recognized as a subtype of T1
  containsType(T2:f2) - returns false: not in output, etc.
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) - yes... that's what it does - it ignores the actual feature name because allAnnotatorFeatures is true

After compile, the following return true except as marked:
  containsType(T1)
  containsType(T2) - T2 not in output capability, but is recognized as a subtype of T1
  containsType(T2:f2) - T1's *allAnnotatorFeatures* is inherited
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) - false: the actual features are looked up

After the change I'm proposing, everything would be the same except that containsFeature(T1:asdfasdfasdfasdf) would return true.
I don't think this would affect the way you are using result specs, but please let me know if I've misunderstood something. We don't want to impact users with this change. Thanks for your comments :-) -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Michael Baessler wrote: Michael Baessler wrote: Adam Lally wrote: On Jan 7, 2008 6:56 AM, Michael Baessler [EMAIL PROTECTED] wrote: I tried to figure out how the ResultSpecification handling in uima-core works, with all side effects, to check how it can be done to detect when a ResultSpec has changed. Unfortunately I was not able to; there are too many open questions where I don't know exactly if it is right in any case ... :-( Adam, can you please look at this issue? I can try to take a look, but I don't have a lot of time. Do you have a test case for this, where you expect I would see a significant performance improvement if I fix this? Sorry, I have no performance test case. I checked my assumption using the debugger. I used the following main() with a loop over the process call to check if the result spec is recomputed each time. The descriptor is the same as used in the capabilityLanguageFlow test case of the uimaj-core project. Maybe a sysout helps to detect if the unnecessary calls are done or not. Maybe iterating more than 10 times will give you performance numbers before and after. Maybe adding additional capabilities that must be analyzed will increase the time used to compute the result spec. I will look at this tomorrow.

  public static void main(String[] args) {
    AnalysisEngine ae = null;
    try {
      String desc = "SequencerCapabilityLanguageAggregateES.xml";
      XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
      ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(in);
      ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
      CAS cas = ae.newCAS();
      String text = "Hello world!";
      cas.setDocumentText(text);
      cas.setDocumentLanguage("en");
      for (int i = 0; i < 10; i++) {
        ae.process(cas);
      }
    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }

-- Michael When setting the loop counter to 1000, I get 6000ms without recomputing the result spec and 27000ms when recomputing the result spec.
I think this should be sufficient for testing. I think my change is ready for code review. I kept all the idiosyncratic behavior of the old code, so users should not notice any difference. All the tests run, and the test case above runs in the 6000ms range. There are 3 areas changed:

1) ResultSpecification_impl is restructured for speed and a smaller memory footprint.
2) The compiling of this is deferred until the latest possible point; operations that can be done with the uncompiled form are done that way.
3) The code in the CapabilityLanguageFlow where it returns a next step now caches the result spec by component key, and only sends it down if it is different from what this controller sent the last time it invoked this component in the flow.

This test depends on the precomputed result specs kept in the mTable variable being constant - which I believe they are (once they are computed) - but Michael - can you confirm this? With this change, the code in the framework to intersect the result spec with a component's output capabilities, by language, is not redone on every call, but only when the language changes. That code (to do the intersection) is running faster, in any case, due to the restructuring. Because this is a big change, it would be good to do a code review of some kind - any thoughts on how to do this? -Marshall
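The caching in point 3 can be sketched like this (a hypothetical stand-in, not the actual CapabilityLanguageFlow code). It relies on the assumption stated above - that the precomputed specs in the flow table are constant once computed - so an identity comparison suffices:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch (illustrative names, not UIMA's) of the caching idea:
// remember the last result spec sent to each delegate, and only resend
// when the spec for that key has actually changed.
public class ResultSpecCache {
    private final Map<String, Object> lastSent = new HashMap<>();

    /** Returns true if the spec should be (re)sent to this component. */
    public boolean shouldSend(String componentKey, Object resultSpec) {
        Object previous = lastSent.put(componentKey, resultSpec);
        // identity check: valid only if specs are constant once computed
        return previous != resultSpec;
    }

    public static void main(String[] args) {
        ResultSpecCache cache = new ResultSpecCache();
        Object spec = new Object();
        System.out.println(cache.shouldSend("annotatorA", spec)); // true
        System.out.println(cache.shouldSend("annotatorA", spec)); // false
    }
}
```

If the specs were mutated in place, an identity check would miss changes - which is exactly why the question to Michael about their constancy matters.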
Re: capabilityLangugaeFlow - computeResultSpec
Can I replace the class CapabilityContainer with the (now) much more efficient ResultSpecification class? It seems to me they do almost the same thing, and the ResultSpecification may be handling the corner cases better. Is there some subtle difference I'm missing? It would be nice to eliminate a class - smaller code base = less maintenance effort in the future :-) -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
I may have missed something - I don't see what would need to be added to the ResultSpecification class. The method hasOutputTypeOrFeature(...) is always called with doFuzzySearch == true, which is how the containsType or containsFeature methods operate (always) in the ResultSpecification class. Is there some other difference I'm missing? -Marshall Michael Baessler wrote: Marshall Schor wrote: Can I replace the class CapabilityContainer with the much more efficient (now) ResultSpecification class? It seems to me they do almost the same thing, and the ResultSpecification may be handling the corner cases better. Is there some subtle difference I'm missing? It would be nice to eliminate a class - smaller code base = less maintenance effort in the future :-) -Marshall Yes, if it is possible to add the missing functionality to the ResultSpecification class, fine with me. For example, the important method - hasOutputTypeOrFeature(outputCapability, documentLanguage, doFuzzySearch) - is currently not available at the ResultSpecification class. -- Michael
Re: Website update, new files?
ip-clearances is where we have our ip clearance forms for UIMA. There's one (in progress) for uima-as, not done yet. It's ok. -Marshall Michael Baessler wrote: Thilo Goetz wrote: I updated our website with information on the LREC workshop. When I did svn up on people, some new files were added that apparently had been checked in, but not extracted on people.a.o. Is it ok to leave it like that? I assume that things that are checked in are ok to post to the website. Here's what svn said:

  U    news.html
  U    decisions.html
  U    bulk-contribution-checklist.html
  U    roles.html
  U    external-resources.html
  U    project-guidelines.html
  U    sandbox.html
  U    codeConventions.html
  U    doc-uima-why.html
  U    index-draft.html
  U    faq.html
  U    doc-uima-examples.html
  U    management.html
  U    contribution-policy.html
  U    mail-lists.html
  U    distribution.html
  A    release.html
  U    license.html
  U    dependencies.html
  U    apache-board-status.html
  U    code-scan-tools.html
  U    downloads.html
  A    lrec08.html
  U    team-list.html
  A    ip-clearances
  A    ip-clearances/uima-ee.html
  U    gldv07.html
  U    get-involved.html
  U    communication.html
  U    svn.html
  U    javadoc.html
  U    index.html
  U    documentation.html
  U    uima-specification.html
  Updated to revision 615921.

So what about ip-clearances and release? --Thilo release.html isn't ready for publishing. I haven't checked in that file, so I'm not sure why it occurs in that list. -- Michael
Clarifying language subsumption in Result Specifications
Language specifications are in a hierarchy. For example, from most inclusive to finer subsets, we have: x-unspecified, en, en-us. A result spec's most common use is in a negative sense - annotators can check a result spec, and if it doesn't contain the type or feature, they can skip producing that type or feature. For simplicity, let's consider we have only one type or feature, called TF. If the annotator thinks it produces TF for language en-us only, and wants to check if it should skip producing this, it calls containsType/Feature(TF, en-us). This is defined in the current impl to return true if the result spec has languages x-unspecified, en, or en-us. Let's consider the opposite case. Suppose we have an annotator that can produce TF for en. Suppose the result-spec has an entry for TF only for the language en-us. Should that annotator produce results? If it calls containsType/Feature(TF, en), it will get a false (current implementation). After some thinking about this and some discussion (because I don't think I got it right, just by myself :-) ), it seems that this is correct. Consider the following case: The language of the document is en, and the containing (top-most) aggregate specified explicitly that it wanted output only for en-us. In that case, the annotator should not produce any results, because the language of this doc is not en-us, and the assembler put together things that they said should only output en-us results. This same logic seems to apply to x-unspecified: Suppose we have an annotator that can produce TF for x-unspecified. Suppose the result-spec has an entry for TF only for the language en. Should that annotator produce results? If it calls containsType/Feature(TF, x-unspecified), it should get a false (broken in the current implementation! - but it was true, I think, in the previous one). Assume the language of the document is x-unspecified, and the containing (top-most) aggregate specified explicitly that it wanted output only for en.
In that case, the annotator should not produce any results, because the language of this doc is not en, and the assembler put together things that they said should only output en results. Do others agree with this? -Marshall
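The rule argued for above can be sketched as a single predicate. The names are illustrative, not the UIMA API, and this implements the *corrected* behavior (a result-spec entry matches a query language only when the entry is at least as general as the query):

```java
// Illustrative sketch (not UIMA API) of the language subsumption rule:
// the result-spec language must subsume the query language.
public class LangSubsumption {
    static final String UNSPECIFIED = "x-unspecified";

    /** true if specLang (from the result spec) subsumes queryLang. */
    public static boolean subsumes(String specLang, String queryLang) {
        if (specLang.equals(UNSPECIFIED)) {
            return true;   // x-unspecified in the spec covers everything
        }
        if (queryLang.equals(UNSPECIFIED)) {
            return false;  // en in the spec does NOT cover x-unspecified
        }
        return queryLang.equals(specLang)
            || queryLang.startsWith(specLang + "-"); // en covers en-us
    }

    public static void main(String[] args) {
        System.out.println(subsumes("en", "en-us"));          // true
        System.out.println(subsumes("en-us", "en"));          // false
        System.out.println(subsumes(UNSPECIFIED, "en"));      // true
        System.out.println(subsumes("en", UNSPECIFIED));      // false
    }
}
```

The second if-branch is exactly the case described above: a spec restricted to en should not enable output for an x-unspecified query.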
Re: capabilityLangugaeFlow - computeResultSpec
I went back and checked the Javadocs for the ResultSpecification, prior to my reworking of it. I think I treated x-unspecified slightly wrong, and if I had done it right, then the anomaly noted in the previous note (below) would not be there. The previous Javadocs all say that the setters for a typeOrFeature without a language argument are equivalent to passing in the x-unspecified language. The method containsType/Feature(foo, x-unspecified) should be made to return true only if the ResultSpecification contained x-unspecified. It might not, if, for instance, the setting for Foo was only for languages en and de. A consequence of making it work this way is the following: containsType(foo, x-unspecified) will return false if foo is in the result spec only for particular languages, and containsType(foo) with no language argument would also return false if foo is in the result spec only for particular languages. I plan to correct the treatment of x-unspecified, along these lines, to work as described above. Please post any concerns/objections :-) -Marshall Marshall Schor wrote: While experimenting with this approach, I found some tests wouldn't run. (By the way, the test cases are great - they have been a great help :-) ). Here's a case I want to be sure I understand: Let's suppose that the aggregate says it produces type Foo with language x-unspecified. Let's suppose there are 2 annotators in the flow: the first one produces Foo with language en, the 2nd one produces Foo with language x-unspecified. A flow given language x-unspecified should run the 2nd annotator, skipping the first one. (This is how it works now.) === Here's another similar case, using the other language subsumption, between en-us and en. Let's suppose that the aggregate says it produces type Foo with language en. Let's suppose there are 2 annotators in the flow: the first one produces Foo with language en-us, the 2nd one produces Foo with language en.
A flow given language en should run the 2nd annotator, skipping the first one. (This is how it works now, I think.) With this explanation, I see there is a modification needed, for this use, to the result spec's containsType/Feature method with a language argument. Currently, the ResultSpecification matching works like this:

  Language arg    RsltSpec        Matches
  en              en-us           no
  en-us           en              yes
  x-unspecified   *any*           yes  (behavior needs to be different)
  en              x-unspecified   yes

Is this correct? -Marshall Marshall Schor wrote: Can I replace the class CapabilityContainer with the much more efficient (now) ResultSpecification class? It seems to me they do almost the same thing, and the ResultSpecification may be handling the corner cases better. Is there some subtle difference I'm missing? It would be nice to eliminate a class - smaller code base = less maintenance effort in the future :-) -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
While experimenting with this approach, I found some tests wouldn't run. (By the way, the test cases are great - they have been a great help :-) ). Here's a case I want to be sure I understand: Let's suppose that the aggregate says it produces type Foo with language x-unspecified. Let's suppose there are 2 annotators in the flow: the first one produces Foo with language en, the 2nd one produces Foo with language x-unspecified. A flow given language x-unspecified should run the 2nd annotator, skipping the first one. (This is how it works now.) === Here's another similar case, using the other language subsumption, between en-us and en. Let's suppose that the aggregate says it produces type Foo with language en. Let's suppose there are 2 annotators in the flow: the first one produces Foo with language en-us, the 2nd one produces Foo with language en. A flow given language en should run the 2nd annotator, skipping the first one. (This is how it works now, I think.) With this explanation, I see there is a modification needed, for this use, to the result spec's containsType/Feature method with a language argument. Currently, the ResultSpecification matching works like this:

  Language arg    RsltSpec        Matches
  en              en-us           no
  en-us           en              yes
  x-unspecified   *any*           yes  (behavior needs to be different)
  en              x-unspecified   yes

Is this correct? -Marshall Marshall Schor wrote: Can I replace the class CapabilityContainer with the much more efficient (now) ResultSpecification class? It seems to me they do almost the same thing, and the ResultSpecification may be handling the corner cases better. Is there some subtle difference I'm missing? It would be nice to eliminate a class - smaller code base = less maintenance effort in the future :-) -Marshall
Re: Clarifying language subsumption in Result Specifications
I tried implementing this change, and 2 test cases fail. They look like they are failing exactly in the case where the result specification has a TypeOrFeature with a specified language other than x-unspecified, and the containsTypeOrFeature method is being called using the form which doesn't pass in an explicit language, so is being treated as if x-unspecified was passed in. As discussed below, this should give false, but the test cases expect true. Should I change the test cases?

The failing ones are:

ResultSpecification_implTest: it defines a result spec containing the type FakeType for languages en, de, en-US, en-GB, but not x-unspecified. So the call rs.containsType(FakeType) returns false (because the set of languages for FakeType is missing x-unspecified), but the test says it should return true.

The other test is the PearRuntimeTest. This test loads two Pears, runs them and then looks at the CAS result. The descriptor for one of the tests, the TutorialDateTime descriptor, says it outputs 3 types, *but for language en* (only, and not for x-unspecified in particular). The result spec built for the aggregate is empty (the test case has nothing specified here). When it is passed down to the delegates, the setResultSpecification for the Pear descriptor in PearAnalysisEngineWrapper is called. This is not implemented, so it inherits from its super, which is AnalysisEngineImplBase - and this impl does nothing (expecting to be overridden). I'll write this up as a Jira issue. But even if this were fixed, because the outer aggregate had nothing specified in its capability, the inner primitive analysis engine is set up initially with a default result spec, which is its own output capabilities. This spec says it should produce results just for en, and in particular it should *not* produce output for x-unspecified. This annotator is written to respect the result spec, so it doesn't produce anything.

Anyone object to my changing the test cases?
-Marshall

Marshall Schor wrote: Language specifications are in a hierarchy. For example, from most inclusive to finer subsets, we have: x-unspecified, en, en-us. A result spec's most common use is in a negative sense - annotators can check a result spec and, if it doesn't contain the type or feature, skip producing that type or feature. For simplicity, let's consider we have only one type or feature, called TF. If the annotator thinks it produces TF for language en-us only, and wants to check if it should skip producing this, it calls containsType/Feature(TF, en-us). This is defined in the current impl to return true if the result spec has languages x-unspecified, en, or en-us.

Let's consider the opposite case. Suppose we have an annotator that can produce TF for en. Suppose the result-spec has an entry for TF only for the language en-us. Should that annotator produce results? If it calls containsType/Feature(TF, en), it will get a false (current implementation). After some thinking about this and some discussion (because I don't think I got it right, just by myself :-) ), it seems that this is correct. Consider the following case: the language of the document is en, and the containing (top-most) aggregate specified explicitly that it wanted output only for en-us. In that case, the annotator should not produce any results, because the language of this doc is not en-us, and the assembler put together things that they said should only output en-us results.

This same logic seems to apply to x-unspecified: suppose we have an annotator that can produce TF for x-unspecified. Suppose the result-spec has an entry for TF only for the language en. Should that annotator produce results? If it calls containsType/Feature(TF, x-unspecified), it should get a false (broken in the current implementation!, but it was true, I think, in the previous one).
Assume the language of the document is x-unspecified, and the containing (top-most) aggregate specified explicitly it wanted output only for en. In that case, the annotator should not produce any results, because the language of this doc is not en, and the assembler put together things that they said should only output en results. Do others agree with this? -Marshall
Re: Clarifying language subsumption in Result Specifications
Michael Baessler wrote: Marshall Schor wrote: I tried implementing this change, and 2 test cases fail. They look like they are failing exactly in the case where the result specification has a TypeOrFeature with a specified language other than x-unspecified, and the containsTypeOrFeature method is being called using the form which doesn't pass in an explicit language, so is being treated as if x-unspecified was passed in. As discussed below, this should give false, but the test cases expect true. Should I change the test cases? The failing ones are: ResultSpecification_implTest: it defines a result spec containing the type FakeType for languages en, de, en-US, en-GB, but not x-unspecified. So the call rs.containsType(FakeType) returns false (because the set of languages for FakeType is missing x-unspecified), but the test says it should return true.

Which test method are you talking about? I would like to look at it.

The call is on line 332 of class ResultSpecification_implTest. This changed behavior arises from the proposed change to how the containsType method works. The changed logic is: if the language x-unspecified is given (or if no language is given, as in this case), return true only if the result specification for this type or feature includes the language x-unspecified. In this test, the result specification for the type FakeType is set from the component's capabilities specification, which said this component outputs FakeType for languages en, de, en-US, en-GB, but not x-unspecified. So with the proposed change to how containsType works, it returns false. But the test case expects true.

The other test is the PearRuntimeTest. This test loads two Pears, runs them and then looks at the CAS result. The descriptor for one of the tests, the TutorialDateTime descriptor, says it outputs 3 types, *but for language en* (only, and not for x-unspecified in particular). The result spec built for the aggregate is empty (the test case has nothing specified here).
When it is passed down to the delegates, the setResultSpecification for the Pear descriptor in PearAnalysisEngineWrapper is called. This is not implemented, so it inherits from its super, which is AnalysisEngineImplBase - and this impl does nothing (expecting to be overridden). I'll write this up as a Jira issue. But even if this were fixed, because the outer aggregate had nothing specified in its capability, the inner primitive analysis engine is set up initially with a default result spec, which is its own output capabilities. This spec says it should produce results just for en, and in particular it should *not* produce output for x-unspecified. This annotator is written to respect the result spec, so it doesn't produce anything. The PearRuntimeTest does not use the capabilityLanguageFlow, so we have a different behavior there! This test is just testing the component's behavior with respect to using the result specification; I don't think it has anything to do with the capabilityLanguageFlow? -Marshall -- Michael
Re: Clarifying language subsumption in Result Specifications
Michael Baessler wrote: Marshall Schor wrote: Michael Baessler wrote: Marshall Schor wrote: Language specifications are in a hierarchy. For example, from most inclusive to finer subsets, we have: x-unspecified, en, en-us. A result spec's most common use is in a negative sense - annotators can check a result spec and, if it doesn't contain the type or feature, skip producing that type or feature. For simplicity, let's consider we have only one type or feature, called TF. If the annotator thinks it produces TF for language en-us only, and wants to check if it should skip producing this, it calls containsType/Feature(TF, en-us). This is defined in the current impl to return true if the result spec has languages x-unspecified, en, or en-us. Let's consider the opposite case. Suppose we have an annotator that can produce TF for en. Suppose the result-spec has an entry for TF only for the language en-us. Should that annotator produce results? If it calls containsType/Feature(TF, en), it will get a false (current implementation). After some thinking about this and some discussion (because I don't think I got it right, just by myself :-) ), it seems that this is correct. Consider the following case: the language of the document is en, and the containing (top-most) aggregate specified explicitly that it wanted output only for en-us. In that case, the annotator should not produce any results, because the language of this doc is not en-us, and the assembler put together things that they said should only output en-us results. This same logic seems to apply to x-unspecified: suppose we have an annotator that can produce TF for x-unspecified. Suppose the result-spec has an entry for TF only for the language en. Should that annotator produce results? If it calls containsType/Feature(TF, x-unspecified), it should get a false (broken in the current implementation!, but it was true, I think, in the previous one). I'm not sure you are right here.
I think if an annotator can produce TF for x-unspecified, that means that it can produce TF for all languages. So if an en document comes in, the annotator should produce a result.

Hmmm, this seems to contradict your statement below, saying "That case is correct." In the example below, the result-spec passed in to the annotator has only en, not x-unspecified. This is the case proposed in my paragraph. Below you say it is right for the annotator to *not* produce results, while above you say it should produce results. This is inconsistent, unless I've mangled something... Can you clarify? -Marshall

Assume the language of the document is x-unspecified, and the containing (top-most) aggregate specified explicitly that it wanted output only for en. In that case, the annotator should not produce any results, because the language of this doc is not en, and the assembler put together things that they said should only output en results. That case is correct. -- Michael

Maybe the confusion comes from the different treatment of x-unspecified. If x-unspecified is specified in the output spec of an annotator, it means that it can produce results for all languages. True - and that works. But that wasn't the case I was trying to describe - I was trying to describe the opposite case: the case where the *output spec* of an annotator is *missing* the x-unspecified. To restate the case: the output spec has en (only), and the annotator, when running, queries the result spec with x-unspecified. This proposal says in that case, containsType should return false. Do you agree this should be the result in this case? It seems you do above when you say "That case is correct", but disagree in the paragraph where you say "I'm not sure you are right here." Perhaps I have not clearly described the two cases, but I think they are the same case (and therefore need to have the same answer ;-) ) -Marshall -- Michael
Result Specification fixes and Capability Language Flow speed up work now done
Except for UIMA-727. Michael - please run any performance tests you have. I hope the performance is now significantly improved :-) -Marshall
Re: Clarifying language subsumption in Result Specifications
Michael Baessler wrote: Marshall Schor wrote: Michael Baessler wrote: Marshall Schor wrote: I tried implementing this change, and 2 test cases fail. They look like they are failing exactly in the case where the result specification has a TypeOrFeature with a specified language other than x-unspecified, and the containsTypeOrFeature method is being called using the form which doesn't pass in an explicit language, so is being treated as if x-unspecified was passed in. As discussed below, this should give false, but the test cases expect true. Should I change the test cases? The failing ones are: ResultSpecification_implTest: it defines a result spec containing the type FakeType for languages en, de, en-US, en-GB, but not x-unspecified. So the call rs.containsType(FakeType) returns false (because the set of languages for FakeType is missing x-unspecified), but the test says it should return true.

Which test method are you talking about? I would like to look at it.

The call is on line 332 of class ResultSpecification_implTest. This changed behavior arises from the proposed change to how the containsType method works. The changed logic is: if the language x-unspecified is given (or if no language is given, as in this case), return true only if the result specification for this type or feature includes the language x-unspecified. In this test, the result specification for the type FakeType is set from the component's capabilities specification, which said this component outputs FakeType for languages en, de, en-US, en-GB, but not x-unspecified. So with the proposed change to how containsType works, it returns false. But the test case expects true.

I don't know that test, but it is fine with me to change the behavior since it seems to be wrong!

The other test is the PearRuntimeTest. This test loads two Pears, runs them and then looks at the CAS result.
The descriptor for one of the tests, the TutorialDateTime descriptor, says it outputs 3 types, *but for language en* (only, and not for x-unspecified in particular). The result spec built for the aggregate is empty (the test case has nothing specified here). When it is passed down to the delegates, the setResultSpecification for the Pear descriptor in PearAnalysisEngineWrapper is called. This is not implemented, so it inherits from its super, which is AnalysisEngineImplBase - and this impl does nothing (expecting to be overridden). I'll write this up as a Jira issue. But even if this were fixed, because the outer aggregate had nothing specified in its capability, the inner primitive analysis engine is set up initially with a default result spec, which is its own output capabilities. This spec says it should produce results just for en, and in particular it should *not* produce output for x-unspecified. This annotator is written to respect the result spec, so it doesn't produce anything. The PearRuntimeTest does not use the capabilityLanguageFlow, so we have a different behavior there! This test is just testing the component's behavior with respect to using the result specification; I don't think it has anything to do with the capabilityLanguageFlow? So you mean that the computation of the default result spec does not work correctly, since it is not implemented correctly? If that is true, please go ahead and fix it. I was not aware of that. Thanks for catching it! This has been entered as Jira issue UIMA-727. Not fixed yet (or assigned). -Marshall -- Michael
Possible design change for Capability Language Flow to consider needed inputs?
Suppose I have a capability language flow for an aggregate having 2 delegates, where the aggregate's capability spec says it outputs type Toutput. Let's say delegate #2 has a capability spec saying it outputs Toutput but needs Tinput as an input, and the aggregate's capability spec *doesn't* include Tinput as an input. Let's say delegate #1 has a capability spec saying it outputs Tinput, the input needed by delegate #2. The current logic in CapabilityLanguageFlowController.computeSequence would build a flow having only delegate #2, because it doesn't currently consider the need for some flow elements to produce input types needed by later delegates. I'm not sure if this is worth fixing, or if it should just be documented as a limitation. A proper fix might take some work - as it should consider sequencing to ensure needed inputs are produced before they're needed. Any opinions? Also, currently, it is possible that CapabilityLanguageFlowController.computeSequence can fail to find a flow that produces all of the types listed in the output spec for its aggregate. In this case, it produces a partial flow - one in which some (perhaps 0) annotators will run, and not produce all the outputs needed. Currently this is not flagged as an error, or logged. Should it be? -Marshall
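One possible shape for an input-aware computeSequence is an availability-driven loop: keep scheduling any delegate whose required inputs have already been produced, and report failure when no progress can be made. This is a hypothetical sketch of that idea - DelegateSpec, FlowSequencer, and computeSequence here are invented names, not the CapabilityLanguageFlowController API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Invented model of a delegate's capability spec: what it needs and produces.
class DelegateSpec {
    final String name;
    final Set<String> inputs;
    final Set<String> outputs;
    DelegateSpec(String name, Set<String> inputs, Set<String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }
}

class FlowSequencer {
    // Repeatedly schedules any delegate whose inputs are already available;
    // returns null (instead of a silent partial flow) if some delegate's
    // inputs can never be satisfied.
    static List<String> computeSequence(List<DelegateSpec> delegates) {
        List<String> sequence = new ArrayList<>();
        Set<String> available = new HashSet<>();
        List<DelegateSpec> remaining = new ArrayList<>(delegates);
        boolean progress = true;
        while (!remaining.isEmpty() && progress) {
            progress = false;
            for (Iterator<DelegateSpec> it = remaining.iterator(); it.hasNext(); ) {
                DelegateSpec d = it.next();
                if (available.containsAll(d.inputs)) {
                    sequence.add(d.name);
                    available.addAll(d.outputs);
                    it.remove();
                    progress = true;
                }
            }
        }
        return remaining.isEmpty() ? sequence : null;
    }
}
```

In the example from the mail, delegate #1 (outputs Tinput) would be scheduled before delegate #2 (needs Tinput), and a delegate with an unsatisfiable input would cause a null return rather than a partial flow.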
UIMA java objects which implement MetaDataObject, question about equals and hashCode
Many UIMA framework objects implement the MetaDataObject interface. This interface has an equals method, which does an attribute-by-attribute equals check (recursively). This interface, however, doesn't declare the hashCode method. So, if any object were to insert one of these objects into a hash table, two equal objects could get different hash codes. For instance, TypeOrFeature instances implement MetaDataObject. They might be stored in a hash table or hash set (this was done in the previous impl of ResultSpecification_impl). Wouldn't this (at least in principle, theoretically) cause a problem? Is the general, safe fix to add a hashCode method to the MetaDataObject interface and impl? -Marshall
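The hazard described here is easy to demonstrate. In this sketch, TofSketch is an invented stand-in for a TypeOrFeature-like value class; its hashCode deliberately uses a counter to simulate Object's identity hashing deterministically, which is what a class with equals() but no hashCode() override effectively gets:

```java
// TofSketch is an invented stand-in, not the real TypeOrFeature class.
class TofSketch {
    private static int counter = 0;
    final String name;
    // Stands in for Object's identity hash, made deterministic so the
    // demonstration always fails the same way.
    private final int identityHash = counter++;

    TofSketch(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof TofSketch && ((TofSketch) o).name.equals(name);
    }

    // hashCode is NOT based on the same fields as equals(): this violates
    // the equals/hashCode contract, just as omitting hashCode entirely does.
    @Override
    public int hashCode() { return identityHash; }
}
```

Putting two equal-by-equals() instances into a HashSet yields a set of size 2, because the hash lookup never brings the two objects together for an equals() comparison.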
Re: [jira] Commented: (UIMA-735) ResultSpecification_impl missing equals and hashCode for inner class - causing intermittent test case failure
Thilo Goetz (JIRA) wrote: [ https://issues.apache.org/jira/browse/UIMA-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564316#action_12564316 ] Thilo Goetz commented on UIMA-735: -- Any particular reason you didn't close this issue? Because the condition which led me to suspect this is quite intermittent, I wanted to run tests for a few days to be sure that was the cause. -Marshall

ResultSpecification_impl missing equals and hashCode for inner class - causing intermittent test case failure

  Key: UIMA-735
  URL: https://issues.apache.org/jira/browse/UIMA-735
  Project: UIMA
  Issue Type: Bug
  Affects Versions: 2.2.1
  Reporter: Marshall Schor
  Assignee: Marshall Schor
  Priority: Minor
  Fix For: 2.3

The ResultSpec impl has an inner class, ToF_Languages. When comparing 2 result specifications for equality in test cases, these are compared. But they are missing equals (and hashCode) methods, so the test case fails to say they're equal unless they're identical. But cloning happens a lot in the way result specs are used, and in this test, they may be equal (I think) but not ==. Solution: add proper equals and hashCode to this inner class.
Re: [jira] Commented: (UIMA-735) ResultSpecification_impl missing equals and hashCode for inner class - causing intermittent test case failure
Thilo Goetz wrote: Marshall Schor wrote: Thilo Goetz (JIRA) wrote: [ https://issues.apache.org/jira/browse/UIMA-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564316#action_12564316 ] Thilo Goetz commented on UIMA-735: -- Any particular reason you didn't close this issue? Because the condition which led me to suspect this is quite intermittent, I wanted to run tests for a few days to be sure that was the cause. -Marshall It would be good then if you could put that in a comment in Jira. It's confusing to see the status change to resolved without any indication what is needed to close the issue. OK - you make a good point :-). -Marshall --Thilo
Re: UIMA java objects which implement MetaDataObject, question about equals and hashCode
Adam Lally wrote: MetaDataObject_impl already implements hashCode. The MetaDataObject interface, though, explicitly declares equals() but not hashCode(). This doesn't actually have any effect on the behavior (declaring these on the interface doesn't actually force anyone to override the method if they implement the interface). But it does seem inconsistent from a documentation perspective - either we should declare neither method or both. Really I could go either way. The upside of declaring them is to document that equals and hashCode should be overridden by any implementation of MetaDataObject. The downside is that people might think this is actually enforced by Java, when it is not. Are you saying that if the interface you say your class implements has hashCode, but your implementation doesn't implement it, and neither do any of your superclasses, that this won't be caught as a compile error by Java? Or just that you don't have to implement it directly in your class? (This I understand.) If it is just the latter, then it seems to me quite valuable to include this in the interface, in case someone says they implement it but don't have your implementation of MetaDataObject_impl in their superclass path (unlikely, I know...). -Adam Interesting observations... Eclipse pointed out to me that there was an issue of some kind here, when I asked it to implement the equals and hashCode methods for the new inner class in ResultSpecification. It said that the TypeOrFeature had an issue with hashCode. On Jan 30, 2008 10:57 PM, Marshall Schor [EMAIL PROTECTED] wrote: Many UIMA framework objects implement the MetaDataObject interface. This interface has an equals method, which does an attribute-by-attribute equals check (recursively). This interface, however, doesn't declare the hashCode method. So, if any object were to insert one of these objects into a hash table, two equal objects could get different hash codes.
For instance, TypeOrFeature instances implement MetaDataObject. They might be stored in a hash table or hash set (this was done in the previous impl of ResultSpecification_impl). Wouldn't this (at least in principle, theoretically) cause a problem? Is the general, safe fix to add a hashCode method to the MetaDataObject interface and impl? -Marshall
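Adam's point - that redeclaring equals() and hashCode() on an interface is documentation only, not enforcement - can be shown with a minimal illustration. MetaLike and NoOverrides are invented names, mimicking an interface like MetaDataObject:

```java
// Redeclares equals() and hashCode(), like MetaDataObject declaring equals().
interface MetaLike {
    boolean equals(Object other);
    int hashCode();
}

// Compiles with no overrides at all: the versions inherited from Object
// already satisfy the interface, so Java enforces nothing here.
class NoOverrides implements MetaLike {
    final int id;
    NoOverrides(int id) { this.id = id; }
}
```

Despite the interface declaration, instances of NoOverrides still get Object's identity-based equals() and hashCode(), so two objects with the same id compare unequal.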
renaming uima-ee- to uima-as this Sunday?
I plan to rename things in SVN from uima-ee to uima-as this Sunday, as was discussed in another mail thread. This may break the uima-as builds for a while as we work out the loose ends. If this timing is bad, please voice your opinion. -Marshall
Re: renaming uima-ee- to uima-as this Sunday?
It's been pointed out to me that some people may be planning to submit patches to the uima-ee code, and that we should allow time for this to happen, and for the patches to be committed. This should cut down on make-work due to the naming change. Based on that, I'm moving the proposed renaming to Wednesday, Feb 6. -Marshall Marshall Schor wrote: I plan to rename things in SVN from uima-ee to uima-as this Sunday, as was discussed in another mail thread. This may break the uima-as builds for a while as we work out the loose ends. If this timing is bad, please voice your opinion. -Marshall
Re: capabilityLanguageFlow - computeResultSpec
LeHouillier, Frank D. wrote: While making this change wouldn't affect us in any way as far as I can see now, it would still be possible to use the features in the result spec in a similar way. Suppose you have an information extraction component that extracts entities with attributes, and you want to control which attributes are actually being added to the CAS with the result spec. You might have type Person, with a range of features such as Address, Phone number, Age, etc., some of which you want to output in a given configuration and others not. Suppose the information extraction component also extracts attributes which are so useless that you don't include them as features in the type system at all, such as an internal id number. Currently, with a compiled result spec, you could have the annotator look up the feature on the basis of the name of the feature, and then you could reliably instantiate the feature without further ado. After your change, the feature would have to be checked to see if it actually exists. We added code in the actual change that now checks to see if the feature actually exists (for a compiled result spec). I thought it was better to preserve the status quo here, rather than remove this check (for performance reasons). It didn't seem like it would have any measurable performance impact - it's one hash table lookup, basically. Cheers. -Marshall Again, this doesn't seem like it is that big a deal to me, but I thought I might just point out that it might have a use case. In practice, it seems to me that most annotators figure out the features available either during compilation by using the JCas or during the initialization of the annotator. -Original Message- From: Marshall Schor [mailto:[EMAIL PROTECTED] Sent: Friday, January 25, 2008 3:57 PM To: uima-dev@incubator.apache.org Subject: Re: capabilityLanguageFlow - computeResultSpec LeHouillier, Frank D.
wrote: We have an annotator that wraps a black box information extraction component that can return objects of a variety of types. We check the result specification to see if the object is something we want to output based on the actual string of the name of the type. If you take away the compiled version of the ResultSpecification then we will have to also check whether the type that we get back from the type system is null or not. Hi Frank - This change would *not* take away the compiled version of the result spec. It would only change 1 behavior - that of returning true if a *feature* (not a type, as in your example above) was associated with a type where the capability was marked allAnnotatorFeatures, even if the feature didn't exist. Suppose you had a type T1, and a type T2 whose super-type was T1, with features T1:f1 and T2:f2, and an output capability = T1 with allAnnotatorFeatures = true; and finally a type T3 (not inheriting from T1) with feature T3:f3, with the output capability including T3 with allAnnotatorFeatures = false.

Here's the current behavior.

Before compile, the following all return true except as marked:
  containsType(T1)
  containsType(T2) - returns false: T2 is not in the output capability, and before compile, T2 isn't recognized as a subtype of T1
  containsType(T2:f2) - returns false: not in output, etc.
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) - yes... that's what it does - it ignores the actual feature name because allAnnotatorFeatures is true

After compile, the following return true except as marked:
  containsType(T1)
  containsType(T2) - T2 is not in the output capability, but is recognized as a subtype of T1
  containsType(T2:f2) - T1's *allAnnotatorFeatures* is inherited
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) - false: the actual features are looked up

After the change I'm proposing, everything would be the same except that containsFeature(T1:asdfasdfasdfasdf) would return true.
I don't think this would affect the way you are using result specs, but please let me know if I've misunderstood something. We don't want to impact users with this change. Thanks for your comments :-) -Marshall -Original Message- From: Marshall Schor [mailto:[EMAIL PROTECTED] Sent: Friday, January 25, 2008 5:06 AM To: uima-dev@incubator.apache.org Subject: Re: capabilityLanguageFlow - computeResultSpec The implementation for checking if a feature is in the result spec does the following: If the result-spec is not compiled, it says the feature is present if it is specifically put in, or if its type has the allAnnotatorFeatures flag set. If the result-spec is compiled, it says the feature is present if it is specifically put in, or if its type has the allAnnotatorFeatures flag set and the feature exists in the type system. For performance / space reasons, I'd like to drop the 2nd case; this would have the consequence of changing the result
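The compiled-vs-uncompiled containsFeature distinction being discussed can be sketched as follows. ResultSpecSketch is an invented name that only models the logic described in the mail, not the real ResultSpecification_impl:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the containsFeature logic for "Type:feature" names.
class ResultSpecSketch {
    final Set<String> allFeaturesTypes = new HashSet<>(); // types marked allAnnotatorFeatures
    final Set<String> explicitFeatures = new HashSet<>(); // explicit "Type:feature" entries
    Set<String> typeSystemFeatures = null;                // null until "compiled"

    boolean containsFeature(String typeAndFeature) {
        if (explicitFeatures.contains(typeAndFeature)) {
            return true;
        }
        String type = typeAndFeature.split(":")[0];
        if (!allFeaturesTypes.contains(type)) {
            return false;
        }
        // Uncompiled: any feature name on an allAnnotatorFeatures type passes.
        // Compiled: the feature must also exist in the type system.
        return typeSystemFeatures == null || typeSystemFeatures.contains(typeAndFeature);
    }
}
```

The change proposed in the thread would drop the final type-system lookup for the compiled case, making containsFeature(T1:asdfasdfasdfasdf) return true there too; the sketch shows the behavior before that change.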
Re: A DOUBT IN UIMA
Hello - Can you post two things to the uima-user list (this list is the uima-dev list, and this thread is off topic): first, the entire stack trace when you get the error; second, the location of the ProductNumberAnnotator in your system. Thanks; with that information, we may be able to help you more. -Marshall chandra sekhar wrote: Respected sir, I am implementing Nicholas Chase's paper on UIMA (Product Number Annotation). Sir, I am getting an error which is the same as the error described in this link at IBM: http://www-128.ibm.com/developerworks/forums/thread.jspa?threadID=138977&tstart=0 The error I am getting is: com.ibm.uima.resource.ResourceInitializationException: Annotator class com.ibm.uima.tutorial.ProductNumberAnnotator was not found. I used UIMA_SDK_2_0_2_setupwin32 to install my UIMA. Mr. Lally specified a solution for UIMA version 2-0. Sir, can you specify how to find a plugin for my class com.ibm.uima.tutorial.ProductNumberAnnotator? I don't know how to find a plugin for a class. Please help me sir. Regards, sekhar.
Javadoc building - we have 2
There are 2 configurations in the POM for building Javadocs: one in the parent uimaj POM and one in the uimaj-distr POM. The thinking behind this was that the one in the uimaj-distr POM would run as part of the assembly process and build Javadocs for the binary distribution, consisting of the external APIs. The one in the parent would run when running the mvn site plugin; it includes more internal packages in the set of Javadocs being produced. The idea is that we could post the internal ones for developers to use/access on our web site. I don't think we're doing this now, however. We do post the ones generated for the release, I think, instead. Can anyone confirm this? What should we do going forward? I see somewhat limited value to doing another set of developer Javadocs, given that the developers have the source to work with. -Marshall
Some finds regarding maven eclipse svn
Maven was built to expect a hierarchical (not flat) project / sub-project structure. There are many fixes to maven that focus on making it work for flat (e.g. Eclipse-like) project structures. But some things appear not to work properly. See e.g. http://jira.codehaus.org/browse/MRELEASE-261 which is Open and not being worked on. A main issue is whether or not Eclipse itself supports a nested, hierarchical project structure. Apparently, as of the Europa (3.3) version, it doesn't quite, but one email post I found says: "Europa supports having multiple projects in workspace that overlap: my root project contains all sub-projects as folder. Nico." SVN apparently has two different Eclipse plugin providers: Subclipse (which I've been using), and now there's a new, Eclipse-official one (in incubation), called Subversive. The new one apparently also supports some kind of hierarchical SVN operations, while Subclipse doesn't. From the above MRELEASE-261: "Subclipse plugin for Eclipse can not handle nested projects in Eclipse at all, and from the dialogue on their list, do not intend to. As Subversive provides much better support for nested Subversion structures in Eclipse, and has since become the 'official' (or so I'm informed) Eclipse foundation Subversion plugin, we have moved to using Subversive and find that the Eclipse multi-project import-export plugin works pretty well. Note the impact analysis for the work of changing the release plugin to be more 'directory aware' was pretty good, 3 days would have it cracked I would expect (inc. ITs etc)" -Marshall
more maven conventions
The eclipse:eclipse plugin will take the maven artifactId and use it for the Eclipse plugin ID. The artifact ids we use are things like uimaj-ep-debug. The Eclipse plugin id is different: org.apache.uima.debug. Any objection to my changing the artifact IDs in the POMs for our Eclipse plugins to match the Eclipse plugin ids? Right now this is not a show-stopper, because we've disabled maven from altering the manifest. But in the future, if we converge toward a more conventional maven build, we may want to change this. -Marshall
building Eclipse plugins with Maven - some discoveries
I did an experiment where I configured the maven POM for one of our Eclipse plugins to let maven's eclipse:eclipse update the PDE manifest. However, I found a bug in how it treated -SNAPSHOT - it turned 2.3.0.incubating-SNAPSHOT in the maven version into a manifest entry 2.3.0.incubating.SNAPSHOT (changed the '-' to a '.'). When I posted a patch to the maven-eclipse-plugin to fix this, I commented that the thing I patched was deprecated. I got a quick reply saying it was, indeed, deprecated - we should be using some OSGi tooling from Apache Felix. The comment on the Maven list says: "you should look into the Apache Felix bundle plugin. It has a bundle:manifest goal that will generate the OSGi manifest, that's why the eclipse plugin class is deprecated. Check Adding OSGi metadata to existing projects without changing the packaging type http://felix.apache.org/site/maven-bundle-plugin-bnd.html" It looks like the tooling for generating OSGi bundles has advanced quite far, and may solve many of the difficulties we've had in building these kinds of things. Among the things it apparently can be configured to handle is our library plugin, containing jars from other places. I'm going to try and see if we can make this all work for our projects. If anyone else has insights, please post :-). -Marshall
It may not be possible to use the UIMA SOAP interfaces via the Eclipse run-time-plugin?
UIMA's SOAP implementation depends on having the Axis classes (from Tomcat?) in its classpath. For normal UIMA deployments, this is accomplished by adding the needed jar to the classpath. For Eclipse and RCP plugin environments, the user's plugin depends on the uima-ep-runtime plugin, which has our SOAP implementation. Is it possible at run time to add the Axis jar to the classpath of the uima-ep-runtime plugin? If not, then I don't think our current Eclipse runtime plugin bundle supports users who want to use the SOAP APIs. -Marshall
Re: It may not be possible to use the UIMA SOAP interfaces via the Eclipse run-time-plugin?
If this is indeed true, we probably should remove the uima-adapter-soap jar from the runtime-plugin build, since it couldn't run anyway. If someone wanted to use SOAP within an OSGi bundle, they could always build their own runtime bundle, including our uima-adapter-soap jar plus the jars it depends on from Axis. What do others think? -Marshall

Marshall Schor wrote:
> UIMA's SOAP implementation depends on having the Axis classes (from Tomcat?) in its classpath. For normal UIMA deployments, this is accomplished by adding the needed jar to the classpath. For Eclipse and RCP plugin environments, the user's plugin depends on the uima-ep-runtime plugin, which has our SOAP implementation. Is it possible at run time to add the Axis jar to the classpath of the uima-ep-runtime plugin? If not, then I don't think our current Eclipse runtime plugin bundle supports users who want to use the SOAP APIs. -Marshall
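A user-built bundle along those lines might have a manifest roughly like the following sketch. The bundle name, the jar file names, and the Require-Bundle target are illustrative assumptions, not our actual artifacts:

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-SymbolicName: org.example.uima.soap.runtime
Bundle-Version: 1.0.0
Bundle-ClassPath: lib/uima-adapter-soap.jar,
 lib/axis.jar,
 lib/jaxrpc.jar
Require-Bundle: org.apache.uima.runtime
```

The Bundle-ClassPath header is the standard OSGi way to put jars nested inside the bundle onto that bundle's classpath, which is what the Axis jars would need.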
making build/version info available and using in error messages
A patch I just committed in uima-as accesses version/build info from some private spot (in uima-as), probably for use in error messages. It seems to me that this capability would be generally useful in UIMA, and it would be good to have some standard way of including it in our error/log messages (or do we already?). Is there a convenient source for this information? It seems it would be in the various manifests, etc. Is there a standard way to provide this? -Marshall
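One conventional source is the jar manifest's Implementation-Version attribute, readable through java.lang.Package. A minimal sketch (not how uima-as actually does it; the class and method names are hypothetical):

```java
// Hypothetical sketch: read version info from the jar manifest's
// Implementation-Version attribute for use in error/log messages.
// Falls back to "unknown" when the class was not loaded from a jar
// with a manifest (e.g., from an unpacked classes directory).
public class VersionInfo {
    public static String versionOf(Class<?> clazz) {
        Package p = clazz.getPackage();
        String v = (p == null) ? null : p.getImplementationVersion();
        return (v != null) ? v : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("UIMA version: " + versionOf(VersionInfo.class));
    }
}
```

The nice property is that the version string is maintained in exactly one place (the manifest, which Maven can populate at build time), so error messages never drift out of sync with the build.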
svn and jira not coupling?
It used to be that you could see in the Jira the SVN commits that were done. Today when I looked in Jira, there were no SVN commits listed, even on old issues. Is it just my setup, or do others see this? -Marshall

P.S. It may have had something to do with the FishEye tab, which I clicked just to see what it was; there was this error message: Error communicating with FishEye: java.io.IOException: repository not in correct state: Repository is stopped