Re: [VOTE] Release Apache OpenOffice 3.4.1 (incubating) RC2

2012-08-21 Thread Thilo Goetz
On 21/08/12 13:59, Branko Čibej wrote:
 On 21.08.2012 12:52, sebb wrote:
 I think the NOTICE problems are serious enough to warrant a respin.
 This is an unreasonable request. The IPMC voted on the 3.4.0 release.
 The notice file has not changed between 3.4.0 and 3.4.1. How then do you
 justify this new requirement?

Let me offer some advice from somebody who has been where you
are now.  Please keep in mind that the ASF is a large, volunteer
organization.  The back-and-forth you are seeing here is
normal and probably can't be avoided in a flat organization like
this.  This can be strange and/or frustrating to people who are
either paid to do their Apache work, or who come from smaller
organizations where it was easier to come to a decision.  Try
to keep a positive attitude, go with the flow, and become a part
of the wider Apache community (not just your project).  Help
improve things where you see they are lacking.  This community
aspect is very important at Apache.

As to the issue at hand, this is not a new requirement.  The
issue just wasn't spotted last time.  Yes, that's annoying, but
it can't be helped.  The NOTICE and the LICENSE files are the
most important files in your distribution, and you should make
every effort to get them right.  Sebb raises valid concerns that
need to be addressed.

Just trying to help here, so no flak my way please :-)

BTW, I think AOO is doing an amazing job.  I was not optimistic
when the project came to Apache, and I'm amazed you are where
you are now.  Keep up the good work.


 It is not fair to the podling if the IPMC invents new requirements and
 reverses its own decisions for no apparent reason. This NOTICE issue
 certainly shouldn't be ground for vetoing a release.
 By the way, the same holds for binaries being included in the releases.
 The 3.4.0 release, with binaries, was approved. If the podling did not
 change its release procedures and policies and artefacts in the
 meantime, it's not reasonable to hold up what amounts to a security
 release solely based on the IPMC having screwed up the previous release.
 It is fair to require changes for the next release. It's not fair to use
 different criteria for two successive, essentially identical releases.
 (N.B.: I use the term essentially identical in the sense that, whilst
 some of the sources have changed, the overall structure of the release
 artefacts has not.)
 -- Brane
 To unsubscribe, e-mail:
 For additional commands, e-mail:


Re: [VOTE] Release Apache OpenOffice 3.4.1 (incubating) RC2

2012-08-21 Thread Thilo Goetz
On 21/08/12 15:24, Rob Weir wrote:
 A suggested exercise at ApacheCon.  Get a group of 20 Members, break
 them into groups of 5.  Give each group an identical list of 3rd party
 dependencies and ask them to create a NOTICE file that expresses them.
  Give them 30 minutes.  Compare the results.
 I'd bet any amount that all four NOTICE files will differ in
 substantive ways, and that there would be disagreement, both within
 the groups, and across the groups, on which was correct.

Sure.  You can do the same exercise with 20 IBM lawyers with
similar results.  And still you need to get the approval of
IBM legal.



[OT] Hoover is not an adjective, was: Re: [VOTE] Release Apache opennlp-1.5.2-incubating-rc5

2011-11-16 Thread Thilo Goetz
On 15/11/11 03:22, Benson Margulies wrote:
 On Mon, Nov 14, 2011 at 9:20 PM, sebb wrote:
 On 15 November 2011 02:12, Benson Margulies wrote:
 That page is very misleading, and there was a long discussion of this
 topic elsewhere.

 Look at the example just above the requirement:

 The Apache Xerces XML parsing library is easily configurable and
 compliant with current standards

 Yes, in some sense, Xerces is an adjective there, but really what the
 senator is trying to say is to construct a particular noun phrase.

 AIUI, trademarks need to be used as if they are adjectives.

 For example:

 The Hoover vacuum cleaner is versatile
 The Hoover is a versatile vacuum cleaner
 I think I've just discovered that I'm too tired to type. I agree:
 that's an adjective. However, it's really very hard to make sense of
 the requirement without the example, either what you just sent or
 what's on the page.

See subject line.

Feel free to completely ignore this, but the linguist in me
couldn't let this go ;-).  You can verify this by trying to
stick another adjective between "Hoover" and the rest.  Just
like "car" in "car door" is not an adjective (compare "green
car door" vs. "car green door"), "Hoover" is not an adjective
in "Hoover vacuum cleaner" (and neither is "vacuum").

This might explain why people in a project like OpenNLP are
confused by that sentence on the trademarks page.



Re: [OT] Hoover is not an adjective, was: Re: [VOTE] Release Apache opennlp-1.5.2-incubating-rc5

2011-11-16 Thread Thilo Goetz
On 16/11/11 16:39, sebb wrote:
 On 16 November 2011 14:32, Thilo Goetz wrote:
 On 15/11/11 03:22, Benson Margulies wrote:
 On Mon, Nov 14, 2011 at 9:20 PM, sebb wrote:
 On 15 November 2011 02:12, Benson Margulies wrote:
 That page is very misleading, and there was a long discussion of this
 topic elsewhere.

 Look at the example just above the requirement:

 The Apache Xerces XML parsing library is easily configurable and
 compliant with current standards

 Yes, in some sense, Xerces is an adjective there, but really what the
 senator is trying to say is to construct a particular noun phrase.

 AIUI, trademarks need to be used as if they are adjectives.

 For example:

 The Hoover vacuum cleaner is versatile
 The Hoover is a versatile vacuum cleaner

 I think I've just discovered that I'm too tired to type. I agree:
 that's an adjective. However, it's really very hard to make sense of
 the requirement without the example, either what you just sent or
 what's on the page.

 See subject line.

 Feel free to completely ignore this, but the linguist in me
 couldn't let this go ;-).  You can verify this by trying to
 stick another adjective between "Hoover" and the rest.  Just
 like "car" in "car door" is not an adjective (compare "green
 car door" vs. "car green door"), "Hoover" is not an adjective
 in "Hoover vacuum cleaner" (and neither is "vacuum").
 Not sure possible word order is relevant/conclusive.

It's called a distributional test and it's at least
a pretty good indication.

 This might explain why people in a project like OpenNLP are
 confused by that sentence on the trademarks page.
 The point is that the mark must be *used as* an adjective.

A noun is a noun is a noun.

 I think this was previously called an adjectival noun, it now seems to
 be called noun adjunct:
 What do you suggest the trademarks page should say in order to make it 

I don't know, I admit I don't really understand what it is
supposed to mean.  The page seems to contradict itself just
a few lines below where it says:
"the Apache Foo project releases a software product called Apache Foo"

If I understand this right, Apache Foo must not be used to denote
a specific thing (in this case the product), and it must always be
made clear what object we're talking about.  I.e., the Apache Foo
product, released by the Apache Foo project and voted on by the
Apache Foo PMC.  Or something like that.



web space of graduated projects on people

2010-04-19 Thread Thilo Goetz
Hi all,

I just deleted the old webspace for UIMA on people, following
the instructions:

I noticed that quite a few graduated projects still have a
copy of their website on people, although they redirect
traffic to their new top-level site.  It may not be much,
but let's clean up after ourselves here in the incubator :-)

Here's a list of the projects, in no particular order
(didn't just want to delete other people's stuff).





Re: [DISCUSS] Changing podling release voting process

2009-08-21 Thread Thilo Goetz
J Aaron Farr wrote:
 On Fri 21 Aug 2009 14:58, ant elder wrote:
 What do people think about changing the podling release voting
 process so that there is just a single vote which is held on the
 podling's dev list instead of the dual voting we have now with a
 podling dev list vote followed by a general@ vote?
 My only concern is that we have some very great volunteers here on the
 general@ list who check release compliance and I doubt they want to deal
 with the traffic of every podling dev list.

Also, the release votes here are a great way for other podlings
to learn about the release process.  I learn a lot from the
discussions.  Particularly while the process is so fuzzy, it
helps if you can go to one central mailing list to find out
about past votes.  One might say it shouldn't be like this, but
I think this change should not come before release policies are
a bit more settled and repeatable in the incubator.

So while I can see your point, I would prefer to keep the voting
and the discussion at general@i.a.o.



Re: Diversity as an insurance policy (Was: [VOTE] Graduation of Apache Pivot)

2009-08-05 Thread Thilo Goetz
Jukka Zitting wrote:
 On Wed, Aug 5, 2009 at 3:39 AM, Ralph wrote:
 Using these projects as an example is perhaps not the best from a community
 perspective because Ceki has no intention of running them like Apache
 projects. But even if he did, by these standards the projects might never
 make it out of the incubator. Even if those of us who would like them had
 commit rights I can guarantee that 95% of the commits would still be Ceki's.
 I don't see it as a problem if the vast majority of commits comes from
 one person (or company) as long as the community operates normally
 *and* there are others who won't have to start learning how to build
 the codebase and do an svn commit if the key developer leaves.
 That's why I measure the three independent committers criterion by
 looking at the commit log instead of the asf-authorization file. And
 I'm not asking much, just a few code commits in the past few months is
 good enough for me.
 That's the criterion that I held Sling against, and that's also what's
 currently keeping UIMA from graduating (and apparently also for over a
 year before I signed up to help them). If the consensus is that this
 is a bit too hard a requirement, then I'll be happy to bring UIMA up
 for graduation in the next few months.
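
The commit-log check described above can be approximated with a short script. This is only a sketch under stated assumptions: it expects log data in the XML form produced by `svn log --xml`, and the function name `recent_committers` and the sample log are made up for illustration. It is not how anyone on this list actually measured the criterion.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime, timezone


def recent_committers(svn_log_xml: str, since: datetime) -> Counter:
    """Count commits per author on or after `since` in `svn log --xml` output."""
    counts = Counter()
    for entry in ET.fromstring(svn_log_xml).iter("logentry"):
        author = entry.findtext("author")
        date_text = entry.findtext("date")
        if not author or not date_text:
            continue  # skip entries without an author or a date
        # svn writes UTC timestamps such as 2009-07-20T10:00:00.000000Z
        when = datetime.strptime(date_text, "%Y-%m-%dT%H:%M:%S.%fZ")
        when = when.replace(tzinfo=timezone.utc)
        if when >= since:
            counts[author] += 1
    return counts


# Hypothetical sample log, not taken from any real repository.
SAMPLE_LOG = """<log>
<logentry revision="3"><author>alice</author><date>2009-07-20T10:00:00.000000Z</date><msg>fix</msg></logentry>
<logentry revision="2"><author>bob</author><date>2009-06-02T09:30:00.000000Z</date><msg>docs</msg></logentry>
<logentry revision="1"><author>alice</author><date>2008-01-15T08:00:00.000000Z</date><msg>import</msg></logentry>
</log>"""

cutoff = datetime(2009, 5, 1, tzinfo=timezone.utc)
active = recent_committers(SAMPLE_LOG, cutoff)
print(sorted(active))    # authors with at least one commit since the cutoff
print(len(active) >= 3)  # rough "three active committers" check
```

Note that the log only shows who committed recently. Whether those committers are independent of each other (for example, employed by different organizations) still has to be judged by hand.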

Indeed.  If this is the way the wind is blowing right now, we're
more than ready for graduation.


 Jukka Zitting

Re: [VOTE] apache-rat-project 0.6rc3

2009-04-02 Thread Thilo Goetz
Jochen Wiedmann wrote:
 On Thu, Apr 2, 2009 at 3:01 PM, sebb wrote:
 Ok, that brings up the question whether these three votes are sufficient.

 I can add +1, with the proviso that the (C) year must be corrected for
 the next release.

 I don't know if my +1 counts though.
 Thanks, the change for the year was already made in the trunk.
 But I still do not know the incubator rules. Which people are
 considered binding? How many binding votes are required?

A list of people on the Incubator PMC is here:

You need 3 +1 votes.
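
As a rough illustration of the counting: only votes from people on the Incubator PMC list are binding. The sketch below assumes the usual ASF release-vote rule (at least three binding +1s, and more binding +1s than binding -1s). The voter names, the `ipmc` set, and the function name are invented for the example.

```python
from __future__ import annotations


def release_vote_passes(votes: dict[str, int], ipmc: set[str]) -> bool:
    """Return True if the binding votes are enough to approve a release.

    `votes` maps a voter name to +1, 0 or -1; only voters in `ipmc` count.
    """
    binding = [value for name, value in votes.items() if name in ipmc]
    plus = sum(1 for value in binding if value > 0)
    minus = sum(1 for value in binding if value < 0)
    return plus >= 3 and plus > minus


# Hypothetical data: "dev" is a committer who is not on the IPMC.
ipmc = {"ann", "ben", "cho"}
votes = {"ann": +1, "ben": +1, "dev": +1}
print(release_vote_passes(votes, ipmc))  # dev's +1 is not binding

votes["cho"] = +1
print(release_vote_passes(votes, ipmc))  # now three binding +1s
```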



Re: Starting a new incubation

2009-03-17 Thread Thilo Goetz
Alexander Veit wrote:
 Dear all,
 I would like to start an incubator project at the Apache Software Foundation
 with a Java library (let's call it Jaffre) I've written. 
 Jaffre is a lightweight RPC library for the Java platform.
 It is designed to be simple, extensible, robust, and efficient with
 no required dependencies other than a JRE. Currently it supports
 transport over insecure or encrypted TCP channels. It supports sessions
 and can be customized so that calls are performed as a particular session
 dependent subject.
 Implementing a service is as simple as defining a service interface,
 implementing it in a pojo, and registering that pojo as a service
 endpoint. The service is exposed to clients through one or more
 connectors. Clients call the service through proxy classes.
 As far as I understood the Apache guides and this mailing list, the first
 requirement to start an incubation is to find a champion. If this is
 correct, I would herewith like to announce the search for a champion.
 If I'm wrong, I would appreciate any advice how to proceed.
 Alexander Veit

Hi Alexander,

there are two things I'd advise you to do.

One, put up a project proposal in the incubator wiki:
You can also look at older proposals there.  There is
no need to do this all at once, you can build your
proposal incrementally (adding a champion etc.).

Two, it sounds like your project is related to the
web services world.  So check out the projects under
the Apache WS umbrella, and
also Apache CXF.  Get on their mailing lists and talk
to them.  They may champion your proposal, or have some
other suggestions.



Re: UIMA [WAS Re: Suspending Projects]

2009-02-20 Thread Thilo Goetz
Niclas Hedhman wrote:
 On Fri, Feb 20, 2009 at 6:58 AM, Robert Burrell Donkin wrote:
 We should probably try to find the collective energy to review UIMA
 before the project's enthusiasm is sapped.
 That sounds like a healthy observation. My Q for the community is: Do
 you have a healthy and diverse set of users? If so, have the UIMA team
 looked at "What is stopping these users from becoming contributors?"

Yes, we do have a healthy and diverse user community.  We have
racked our brains over what we could do to attract more community
contribution.  We've created a sandbox to facilitate the inclusion
of experimental technology.  There's been some uptake, but not
enough.  Some of us are working on scale out via JMS and are
hoping to attract contributions in that area.  We've started
discussions and suggested things for people to work on.  I don't
know, maybe we're going about this the wrong way.

My pet hypothesis (or maybe I'm just looking for excuses) is
this: UIMA is heavily used in academia.  Now academics have no
problems with open source, on the contrary.  But they have an
overwhelming need to publish and build up a reputation.  So
they like to publish their source code on their own web site,
where it's clear it's their work, rather than contribute to
some community effort.  If you look around, you'll see all
manner of university efforts around UIMA, but very little of
that code finds its way back into the ASF repo.

Enough whining.  If you have any suggestions, we'll be happy
to hear them.


 I could imagine a whole range of reasons, and if that is 'fixed' the
 diversity comes with it...
 If there is not a user community, then I would be concerned to
 graduate the project with the large set of single-employer committers.


Re: UIMA [WAS Re: Suspending Projects]

2009-02-20 Thread Thilo Goetz
Niclas Hedhman wrote:
 On Fri, Feb 20, 2009 at 4:31 PM, Thilo Goetz wrote:
 <snip reason="user community reported healthy" />
 My pet hypothesis (or maybe I'm just looking for excuses) is
 this: UIMA is heavily used in academia.  Now academics have no
 problems with open source, on the contrary.  But they have an
 overwhelming need to publish and build up a reputation.
 Does this have to be mutually exclusive??
 Can you perhaps contact these individuals one-by-one and get their
 view on how they see it. Perhaps ask them to publish their work in
 parallel, and perhaps the 'OSS bug' will bite a couple of them??

I have talked to quite a few of them, and I'll keep hitting on them :-)

 I don't follow UIMA personally, so I am a bit short on other
 suggestions. Perhaps others have better ideas?


Re: Suspending Projects

2009-02-19 Thread Thilo Goetz
Jukka Zitting wrote:
 On Mon, Feb 16, 2009 at 7:56 PM, Noel J. Bergman wrote:
 I have put XAP's status to a vote.  What others do people feel should be
 considered?  I have several in mind, but want to hear from the Mentors and
 the Community as a whole.
 I'm not a mentor of Lokahi, but I hang around there. The project
 started incubation almost three years ago and has been idle for about
 a year now. I recently pinged them about status (see the thread at
 [1]). There's an idea (I'm not sure how substantial) of Tomcat
 adopting at least parts of the codebase, but apart from that it looks
 like the project would be ready for retirement.
 River is another long-term incubating project that seems to have lost
 its energy. However, there's still some remaining activity and as a
 mentor I'm putting some effort there, hoping to achieve a similar
 revival as we've seen in PDFBox. Niclas is another active mentor. So
 for now we'll keep trying.
 The UIMA project on the other hand seems to have been ripe for
 graduation for a long while already. They seem to be (unnecessarily?)
 stuck on the diversity issue. I believe they're too comfortable in the
 incubator, but perhaps the mentors have a better picture.

I'm not sure comfortable really fits the picture.  We would love
to get out of the incubator.  The effort to get 3 Incubator PMC
folks to review our releases every time is big enough to make it
desirable on that account alone.  I also think that the fact that
we're still stuck in the incubator is putting a bit of a damper on
our enthusiasm for the project.

As Jukka says, our main (or maybe only) challenge is diversity.  We
currently have 8 committers, only one of whom is not from IBM.  From
the discussions on this list, that's hardly diverse enough.  However,
we'll keep trying.  And if anybody feels like helping out, you're
always welcome.



Re: How to start a new workflow project on ASF

2009-02-04 Thread Thilo Goetz
sasaboy wrote:
 Dear all,
 we are hosting the Enhydra Shark workflow engine project on ObjectWeb
 and would like to move it to Apache.
 We would like to start an incubator project at ASF. How should we proceed?
 As we read, the candidate project should be approved by a sponsor to enter
 the incubator.
 A little bit about Enhydra Shark:
 It is the most popular open-source java workflow engine completely based on
 WfMC standards.
 It uses XPDL1.0 as its native workflow process definition language.
 It can be embedded into other Java applications (Swing, Console, ...) or can
 be used as a server through WebService, EJB, CORBA, ...
 Shark's modular, plug-in architecture makes Shark suitable for integration
 into different kinds of projects.
 It is a mature project started 5 years ago, has a significant community and
 more than 10 downloads on ObjectWeb. It is widely used all around the
 world in production, typically integrated into different kinds of
 applications, from document management and government applications up to
 applications for special purposes like bank services, HR, help desk, ... and
 many others.
 We would take out the core engine's functionality and WebService wrappers
 from existing Shark project and would continue development on Apache.
 The goal of the Shark project on Apache would be to support XPDL2.1.
 How do we proceed? Thanks for any input.

My personal, non-official advice:  take the time and go through the
archives of this mailing list for the past 6 months or so.  You'll
find the answer to your question, and tons of information that will
be very useful to you down the road (not least of which, you'll get
a feel for how things are done around here).




Re: please re-generate website after updating podling status pages

2008-12-04 Thread Thilo Goetz

David Crossley wrote:

Author: twgoetz
Date: Wed Dec  3 02:10:44 2008
New Revision: 722831

Update status: new committers Tong and Jerry


People seem to be forgetting to re-generate the web site
after updating source docs.

See instructions here:

Anyway, i followed up today for UIMA.

Thanks David.  I'll remember next time.  I guess
I thought there was some magic that would do it
for me ;-)

BTW, I updated the status page because I saw from
the clutch report that it was out of date.  Great




Re: AW: [VOTE] apache-empire-db-2.0.3-incubating and apache-empire-struts2-ext-1.0.3-incubating release

2008-09-08 Thread Thilo Goetz

Jörg Reiher wrote:
I've got the following questions regarding the KEYS file and the md5+sha hashes for a release:

There is actually a KEYS file located at
Is there something wrong with it or do we have to put it in any other place?

I used gpg to generate the MD5 and SHA hashes. The output of gpg differs from 
what would be produced by md5sum, because gpg formats the values with blanks 
between each HEX-String, md5sum doesn't.
Is this a problem? If so, the Apache guideline should take this into account 

It can be a problem, see
And yes, the release signing FAQ should be updated to reflect this.
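
To make the formatting difference concrete, here is a small sketch. It assumes gpg's `--print-md` output has the shape `name: D4 1D 8C ...` (hex groups separated by blanks), while md5sum prints one lowercase hex string followed by the file name. The helper names `normalize_gpg_digest` and `md5sum_digest` are made up for illustration.

```python
import hashlib
import re


def normalize_gpg_digest(line: str) -> str:
    """Turn 'name: D4 1D 8C ...' into a plain lowercase hex string."""
    _, _, tail = line.rpartition(":")  # drop the optional 'name:' prefix
    return re.sub(r"\s+", "", tail).lower()


def md5sum_digest(line: str) -> str:
    """Extract the hex digest from an md5sum-style 'digest  name' line."""
    return line.split()[0].lower()


# Both lines describe the MD5 digest of an empty file.
gpg_line = "empty.txt: D4 1D 8C D9 8F 00 B2 04  E9 80 09 98 EC F8 42 7E"
md5sum_line = "d41d8cd98f00b204e9800998ecf8427e  empty.txt"
expected = hashlib.md5(b"").hexdigest()

print(normalize_gpg_digest(gpg_line) == md5sum_digest(md5sum_line) == expected)
```

A tool that compares the two files byte for byte would report a mismatch, which is why the blank-separated format can be a problem; after normalization the digests agree.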



Re: Proposal - Question

2008-08-20 Thread Thilo Goetz
Hi César,

Apache already hosts a couple of text related projects where your
proposal might fit in.  Mahout is a project for machine learning
on Hadoop, and I think they already have text categorization.
Another text related project is UIMA, which could also use a text
categorizer.  Not sure if Lucene also has a text categorizer, but
I'm sure they could use one.

I'd encourage you to check out these projects and see if you want to
contribute to one of them.  You may find that a text
categorizer is somewhat small in scope to be an Apache project of
its own, what with the necessary community building etc.


Cesar D. Rodas wrote:
 Hello to all,
 My name is César Rodas, from Paraguay, I'm newbie in this mail list, so my
 question may be recursive and quite stupid with a simple answer, so I ask
 I have a project, which I haven't start  coding yet but I will start ASAP.
 Basically it will be a Text Categorizer (Apache TextCat is a good name,
 right?), that will be topics and language independent, that will learn by
 I was thinking to build it in C using APR, and I planning to build it very
 modular, and really easy to extend. You may be wondering why C instead of
 Java, and the answer is quite simple, I want the project run faster, and
 that it can embedded, and wrapped from other languages, PHP, Python, Perl,
 Java, etc. This is only my opinion.
 Further technicals details will be explained into my proposal.
 My question is, do I need to have something working to propose the project
 to the Apache Incubator?, or I can propose a project that I'm planning to
 Also, will be great if the folk can say what you think about this project?,
 Will it be useful?
 Kind Regards,
 P.D: As you can see,I can't write a perfect English, since I'm not a native
 English speaker.


Re: [VOTE] Apache CouchDB 0.8.1-incubating release

2008-08-14 Thread Thilo Goetz


here's a list of the IPMC members:

It's not always up to date, so there can be false
negatives, but you can safely assume that whoever
is listed there has a binding vote.  Both Gianugo
and Ant are on the list :-)


Noah Slater wrote:

On Wed, Aug 13, 2008 at 08:10:05PM +0200, Gianugo Rabellino wrote:




Re: [VOTE] Release Apache UIMA CasEditor-2.2.2-incubating

2008-07-15 Thread Thilo Goetz

Luciano Resende wrote:

The binary distributions, both windows and linux, have the in the plugins directory and the
about_files/license.html from the ICU jar mention the following :

ICU License - ICU 1.8.1 and later
Copyright (c) 1995-2007 International Business Machines Corporation and others
All rights reserved.

But I haven't found any attribution on License/Notice files.

Argh, you're right, we missed ICU.

Everything else ok, or did you stop looking when you found
this issue?


On Tue, Jul 15, 2008 at 10:46 AM, ant elder [EMAIL PROTECTED] wrote:



On Thu, Jul 3, 2008 at 10:52 AM, Jörn Kottmann [EMAIL PROTECTED] wrote:


the Cas Editor is a text annotation tool which supports manual and
annotation of CAS files. It is now ready for its first release.
Please review and vote for releasing the Apache Uima Cas Editor-2.2.2-02.

The release artifacts and the rat reports can be found at:

On the uima-dev list there were 5 +1 votes and no 0 or -1 votes to release

The Cas Editor originally came to Apache under a software grant.



Re: [VOTE] Approve hotfix 1 for Apache UIMA v2.2.2-incubating

2008-07-10 Thread Thilo Goetz


this is just a small hotfix.  Could some kind
IPMC souls please take a look?  Thanks!


Michael Baessler wrote:

The Apache UIMA committers ask the Apache Incubator PMC for permission to
publish a hotfix for the UIMA core release v2.2.2-incubating.

The hotfix release fp1 contains fixes for two UIMA memory issues of release 
2.2.2-incubating. The release just ships the jar that has been modified and a 
readme that contains
the hotfix information. LICENSE, NOTICE, and DISCLAIMER files are inside the 

The fixes made for this hotfix are based on a UIMA v2.2.2 branch with a 
separate tag in SVN.
Release tag:
Hotfix source:

We had a vote on uima-dev that resulted in 6 binding +1s
(all the committers) and no 0s or -1s.  The vote thread is here:[EMAIL PROTECTED]/msg07527.html

Please review the release candidate at:
The package is just provided as zip since all the content (one jar) is platform 

The KEYS file can be found in the SVN at:

Please vote:
[ ] +1  Accept to release Apache UIMA 2.2.2-fp1-incubating
[ ] -1  No, because


-- Michael


Re: [VOTE] Release Apache UIMA CasEditor-2.2.2-incubating

2008-07-03 Thread Thilo Goetz

Jörn Kottmann wrote:


the Cas Editor is a text annotation tool which supports manual and 

annotation of CAS files. It is now ready for its first release.
Please review and vote for releasing the Apache Uima Cas Editor-2.2.2-02.

The release artifacts and the rat reports can be found at:

On the uima-dev list there were 5 +1 votes and no 0 or -1 votes to 
release it.

The vote thread is here:

The Cas Editor originally came to Apache under a software grant.



Re: Champion wanted for Empire-db project

2008-05-20 Thread Thilo Goetz

I would start by asking on the general
Apache DB mailing list:


Rainer Döbele wrote:

Hello everyone,

We would like to donate our relational data persistence component called 
Empire-db to the apache software foundation. According to the Incubator 
documentation we need to find a Champion who can help us with the incubation 

Even though with iBatis, JDO and Torque there already are several database
related projects, we think our Empire-db could be a great extension to the ASF
project list as it works considerably differently in many ways. Most of all
Empire-db uses a Java Object Model rather than XML-Mappings or Annotations to 
describe the database schema. Therefore all Tables, Views, Columns and so on 
are Objects which can be referenced from the code to dynamically build SQL 
commands or to access Data model metadata - entirely without the need to 
provide string literals. This in turn significantly increases 
compile-time-safety and simplifies testing and maintenance. Additionally, users
benefit from the IDE's code completion when building column transformation 
expressions or constraints.

Empire-db is a mature project which has proved its capabilities in many
medium to large scale projects. However we have only recently made it Open 
Source under Apache 2.0 License.

There is a lot more information on the Empire-db website on

If anyone would like to act as a Champion or knows someone who could be a 
suitable Champion we would be happy to get his/her contact details and further 

Best regards,


Re: maven repository

2008-05-15 Thread Thilo Goetz

Craig L Russell wrote:

On May 15, 2008, at 4:34 AM, Robert Burrell Donkin wrote:

On Thu, May 15, 2008 at 8:04 AM, Brett Porter [EMAIL PROTECTED] 

2008/5/15 Robert Burrell Donkin [EMAIL PROTECTED]:

It would be possible to create an incubator only repository in a
subdirectory, say. Or we could
just simplify everything by allowing incubator projects to use the
standard repository. Opinions?

IIUC, there are two differences between an incubating project depositing 
its artifacts into a special incubating repository versus the standard:

1. The incubating repository is not mirrored to the world, so incubating 
artifacts don't pollute the maven-o-sphere.

2. The incubating repository needs to be added to the maven remote repo 
definition of each project that wants to depend on an incubating artifact.

I think both of these have minor positive effects. So I'm really +0.3 on 
using a special incubating repository.

That's where opinions differ on whether this is a positive
effect.  One might argue that incubator releases go through
a very thorough release screening process, and there's no
reason to make them so hard to get afterwards; certainly
not from the point of view of the incubating projects who're
trying to build community around their code.



Hama proposal, was: [VOTE] Accept Hama into the Incubator

2008-05-14 Thread Thilo Goetz

Edward J. Yoon wrote:

== Core Developers ==

The initial set of committers includes folks from the
[ Hadoop]  [
Hbase] communities. We have varying degrees of experience with
Apache-style open source development, ranging from none to ASF

Hi Edward,

can you elaborate on who of the initial committers are ASF
members?  Maybe you meant committers?

Also, a mentor needs to be an Incubator PMC member.  Not sure
any of your proposed mentors are.  If they're Apache members,
they can just ask to become IPMC members.



Re: Hama proposal, was: [VOTE] Accept Hama into the Incubator

2008-05-14 Thread Thilo Goetz

Edward J. Yoon wrote:

Also, a mentor needs to be an Incubator PMC member.  Not sure
any of your proposed mentors are.  If they're Apache members,
they can just ask to become IPMC members.

Oh... Thank you for your advice.
Then, Should we ask to become IPMC members before mentoring and delay the vote?

I think you'll find it difficult to obtain 3 binding +1s until
you have enough (at least 2) Mentors.  If you, Jeff and Ian
already are Apache Members this will be very easy and there's
no reason to delay the vote, IMHO.

And just to be perfectly clear, when I say Apache Member, I
mean in the sense of

Description of the Mentor role is here:



Re: [VOTE] Approve release Apache UIMA 2.2.2-incubating

2008-04-25 Thread Thilo Goetz

Martijn Dashorst wrote:

The time people have to spend chasing bugs because someone doesn't
have the right version on the classpath (or possibly multiple versions
without knowing it) can be better spent on solving actual bugs,
writing documentation or implementing new features.

Or by reviewing our release and voting on it :-)

Seriously, we're having interesting debates here about
version numbers in jar file names and svn eol styles.
And we are taking these things to heart.  However,
at the same time, we *would* like to release UIMA
2.2.2 now, and we're 1 (one) vote short.  Anyone?



Re: [VOTE] Approve release Apache UIMA 2.2.2-incubating

2008-04-23 Thread Thilo Goetz

Hi Niall,

Niall Pemberton wrote:

Firstly, I'm +1 on this release. I have a few minor
comments/suggestions which you may want to consider for the next
release (or not!)

1) There's a parent pom for apache which, if you make it the parent of the
uimaj pom, means you don't have to duplicate the license and
organization details in your poms:

sounds good, we'll look into it.

2) The source distro unzips to directory uimaj-2.2.2-incubating and
the binary distro to apache-uima - which is inconsistent.

I don't recall if there was a reason for this, but I agree,
seems odd.  We'll follow up on uima-dev.

3) IMO it's better if the jars include the version number - which they
do for the maven repo, but not the ones in the binary distro

I personally agree with you, but we had a long discussion about
this and the "no version numbers in jar names" faction carried
the day.  I believe the main reason is that it makes upgrading
to a new version easier, or switching between versions for
testing purposes.




Re: More timely release reviews? (was: Re: [VOTE] Approve release Apache UIMA 2.2.2-incubating

2008-04-22 Thread Thilo Goetz

Niclas Hedhman wrote:

On Monday 21 April 2008 17:46, Paul Fremantle wrote:

the mentors to be the
primary contacts for any podling.

+1. With 3 active mentors, you practically need no other PMC members to wake 
up (let me sleep ;o) ). If you don't have 3 active mentors, then that IS a 
problem of concern, and should be fixed asap.

Do the mentors know that they are mentors??

They do, and they have reviewed our previous releases.  They do seem
even busier than usual, though...

I haven't seen Ken around for a long time, and I guess Sam is really busy with 
his duties as Secretary. So perhaps, UIMA should ask the PMC for 3 new 

I think an additional mentor or two would be really
helpful.  First of all, we only have 2.  Secondly,
we've not been making the kind of progress we'd like
to make.  A fresh view would be very welcome.

That's my personal opinion, other UIMA devs please chime



Re: [VOTE] Approve release Apache UIMA 2.2.2-incubating

2008-04-17 Thread Thilo Goetz

Folks, just a quick reminder that this vote is still
open.  Please vote if you can find the time to look
at our release.  Thanks.


Michael Baessler wrote:

The Apache UIMA committers ask the Apache Incubator PMC for permission to
publish a new bug fix release of Apache UIMA version 2.2.2. This release
contains bug fixes for release version 2.2.1, which was published in
December 2007. For details about the fixes, please have a look at the release

We had a vote on uima-dev that resulted in 6 binding +1s
(all the committers) and no 0s or -1s.  The vote thread
is here:

Please review the release candidate here:

There are subdirectories like:
/bin - contains the binary distribution files
/src - contains the source distribution files
/rat - contains the RAT reports (using RAT 0.5.1) with some comments

The SVN tag for this release candidate is:

The KEYS file can be found in the SVN at:

Please vote:
[ ] +1  Accept to release Apache UIMA 2.2.2
[ ] -1  No, because


-- Michael


Re: guidance needed: looking into writing a proposal

2008-04-15 Thread Thilo Goetz

The proposal guide is here:

One way to find a champion is to start writing your
proposal on the incubator wiki, and notifying this
list.  As people understand what it is you want to
do, they may step up to the task.  Starting the
proposal on the wiki is not the same as officially
submitting it.


Angela Cymbalak wrote:
I am happy to write up the proposal.  The more I read and think about 
this project the more I really think that it would be a great one for 
Apache to undertake if the appropriate community can be built around it.

The only thing holding up my proposal writing is my concern for the 
process.  The process implies that one needs to have a Champion *before* 
writing the proposal and finding the Mentors.  I am aware of whom I can 
ask to be a Champion but I don't want to overstep my bounds.  There are 
two existing projects that I believe a photo gallery would complement.
Is a Champion required before submitting the proposal?

Also, did I miss finding an FAQ on the Proposal process?  I checked the 
Wiki and the Incubator Web site.  I don't want to post a lot of 
questions that may be answered easily elsewhere.


At 11:08 AM 4/14/2008, Robert Burrell Donkin wrote:
On Mon, Apr 14, 2008 at 3:32 PM, Andrus Adamchik
 Also even if some other proposals got canned, this doesn't mean this one
 won't get accepted. The best thing to do is to submit a proposal following
 one of the existing templates and let the people on this list comment on
 it, and see if you can find mentors here.


it's community, not code

(the acceptance process is also somewhat subjective which introduces
an element of chance)

- robert


Re: Report reviews

2008-04-15 Thread Thilo Goetz

Jukka Zitting wrote:


I reviewed the submitted project reports for April. I'm not sure if
people are actively following the page or the diffs, so here's a
summary of my review.


* CouchDB
* PDFBox (I'm a mentor)
* Tika (I'm a mentor)

OK with comments:

* CXF - Good luck for the TLP!
* Imperius - Issues before graduation?
* JSPWiki - Issues before graduation?
* Qpid - Did you understand the IPMC concerns about diversity? The
report seems to indicate otherwise. Also missing: Qpid is ...?
Incubating since?
* Sanselan - Issues before graduation?
* Shindig - Issues before graduation?
* UIMA - Too comfortable in the Incubator? Focus on growing the
community and on graduating!

Not at all comfortable, to the contrary ;-)  We're trying
as hard as we know how to grow our development community.
(The user community is doing fine, btw).  We realize that
the UIMA core may be a bit hard for people to get into, so
we've been trying to start things on the periphery, such as
analysis components and tooling, to get people to help.

We're open to suggestions on what we could do to attract
additional developers.



Re: guidance needed: looking into writing a proposal

2008-04-14 Thread Thilo Goetz

Angela Cymbalak wrote:


I am looking into the possibility of writing a proposal to have some 
code included as a podling within the Incubator project.  I have been 
reading the Incubator Web site and I believe that I have a pretty good 
idea of what I would need to do if I wanted to contribute my code.  The 
code that I have written is a photo gallery along the lines of Flickr.  
Instead of being written in PHP, it was written in Java.  Currently, I 
am the only person who has worked on the code.

I was looking for an area of the Incubator Web site that would list 
rejected proposals.  It occurs to me that a project like this may have 
been discussed previously and voted against.  I did not see a project 
like this among any of the existing Apache projects; although, I did see 
a few suggestions for projects such as Roller to include a gallery.

Did you look at the incubator wiki?  There's a list of proposals
there, most of which got accepted though:

I don't remember a project like yours being proposed in the



Re: April incubator report for JSPWiki

2008-04-13 Thread Thilo Goetz

Craig L Russell wrote:

2008-April JSPWiki Incubator status report

JSPWiki has been incubating since September 2007.

JSPWiki is a JSP-based wiki program.

During the past three months since our last report, the JSPWiki community has
grown nicely.  Currently over 120 people are monitoring the user mailing list
and over 50 on the dev list.  Apache JIRA has been integrated to our

and the influx of patches seems to be growing steadily.

The JSPWiki development has been moved to the Apache SVN, and almost all 

has been relicensed under the Apache 2.0 license.  A few files still remain
under LGPL, mostly pending CLAs which have not yet found their way to ASF.

During the past three months, the 2.6 LGPL branch has seen two bug fix
releases using our old release mechanism, and the development for the
2.8 release is currently ongoing in the SVN trunk.

Craig Russell
Architect, Sun Java Enterprise System
408 276-5638 mailto:[EMAIL PROTECTED]
P.S. A good JDO? O, Gasp!

Maybe JSPWiki could be added to the reporting schedule on the incubator
wiki, and the reports as well?  The reporting schedule is here:



Re: UIMA release - lots of missing SVN eol-style property settings

2008-04-11 Thread Thilo Goetz

sebb wrote:

On 11/04/2008, Thilo Goetz [EMAIL PROTECTED] wrote:

sebb wrote:

The SVN tag

has lots of missing SVN eol-style settings. See the file

This should probably be applied to trunk as well ...


 Hi Sebb,

 thanks for looking over our release.

 There are a lot of files in your list where not setting
 the eol-style property is intentional: all our test files.

Which extensions are these?
I can change my script to treat these differently.

.txt mostly, some .xml.  So I think one needs to handle this
on an individual file level.

 Setting eol-style:native would make our tests fail on one
 platform or another as they're usually compared to some
 expected output, which in turn depends on the exact byte
 content of the files.

 Unfortunately, there is no (valid) eol-style:none
 or such that allows us to make this intention explicit.

In which case, the tests may fail to work on OSes with a different
line ending, unless you set the mime-type to binary.

I don't understand that remark.

 For the java code we could set it to native.  We just never
 felt the need.  Since we need to be careful with our test
 files, we don't follow the automatic eol-style client setup
 as recommended.  AFAIK, all UIMA developers use Eclipse
 for their development, and Eclipse doesn't care about
 eol style (or not that I noticed anyway).

No it doesn't mind. But SVN does.
If you edit a Java file on Unix and commit to SVN, then someone who
edits it on Mac or Windows and commits to SVN will generate an SVN
diff which shows the whole file has been changed. Makes it very
difficult to see what has actually changed. Likewise for pom.xml etc.

True.  We try to avoid that ;-).  Although most of us work on windows,
we use unix style eol chars for all source code.

 I hope you'll agree that it's up to the project to set an
 eol-style policy.  Our policy is not to set the property
 unless it's required (e.g., for .sh or .bat files).

Indeed, but see also:

These conventions are generally used by Java projects, e.g. all of Commons.

Yes, and they don't work for us, as I pointed out earlier.

There are also settings in there that I find rather doubtful.  What
is the point of having eol-style for .bat files set to native?

So how do you create a distribution?  To my mind, it shouldn't matter
if you extracted the code on linux or windows.  The distribution should
come out the same and work on both platforms.



Re: [VOTE] Approve release Apache UIMA 2.2.2-incubating

2008-04-11 Thread Thilo Goetz

Daniel Kulp wrote:

On Friday 11 April 2008, Adam Lally wrote:

On Fri, Apr 11, 2008 at 9:10 AM, sebb [EMAIL PROTECTED] wrote:

On 11/04/2008, Michael Baessler [EMAIL PROTECTED] wrote:
  sebb wrote:
Problem building uimaj:
1) javax.activation:activation:jar:1.0.2

 Unfortunately only the POM and metadata are present in the M2 repo
for that version - the jar is not present. I raised a JIRA issue for

This is because Sun's license prevents this jar from being
redistributed from the central Maven repository:

In the build instructions on the UIMA website, we describe how to
obtain the jar yourself and add it to your local Maven repo:

Any particular reason you don't exclude this dependency in your poms and
then pull in a version that IS available?  Either the 1.1 version from

or the geronimo-specs version available at central:

(In general, I prefer the geronimo specs versions, but it doesn't really 

If that works, it'll be vastly preferable to the manual do-hicky
we have now.  We'll follow up on uima-dev.



Re: UIMA release - lots of missing SVN eol-style property settings

2008-04-11 Thread Thilo Goetz

Daniel Kulp wrote:

On Friday 11 April 2008, Thilo Goetz wrote:

Indeed, but see also:

These conventions are generally used by Java projects, e.g. all of

Yes, and they don't work for us, as I pointed out earlier.

There are also settings in there that I find rather doubtful.  What
is the point of having eol-style for .bat files set to native?

So a unix person can edit it without leaving lines that don't have the 
cr/lf (or have to see ^M marks all over the place).   I do all kinds of 
edits to bat files from my Linux box.  However, if they get committed 
with mixed styles, some versions of windows complain loudly when you 
try to run them.

Ok, I'll take your word for it ;-)

So how do you handle releases, as I asked in a different mail
in this thread?  If you now extract the code on unix, you have
.bat files with unix eol chars.  I don't think the windows shell
handles that.  Same for .sh files, just the other way around.
I'm sure people have a solution for that, but I don't see it.

[In case this is not clear: we create one distribution with both
.sh files and .bat files.  The distro should work correctly on
unix and windows.  Just so we're all on the same page.]



Re: UIMA release - lots of missing SVN eol-style property settings

2008-04-11 Thread Thilo Goetz

sebb wrote:

On 11/04/2008, Thilo Goetz [EMAIL PROTECTED] wrote:


 True.  We try to avoid that ;-).  Although most of us work on windows,
 we use unix style eol chars for all source code.

That probably annoys the Mac Users...

We have Mac users (and developers), and they haven't
complained yet.  Maybe they're used to trouble, I
don't know.  Unfortunately, I don't own a Mac :-(



Re: UIMA release - lots of missing SVN eol-style property settings

2008-04-11 Thread Thilo Goetz

sebb wrote:

On 11/04/2008, Thilo Goetz [EMAIL PROTECTED] wrote:

Daniel Kulp wrote:

On Friday 11 April 2008, Thilo Goetz wrote:

Indeed, but see also:

These conventions are generally used by Java projects, e.g. all of

Yes, and they don't work for us, as I pointed out earlier.

There are also settings in there that I find rather doubtful.  What
is the point of having eol-style for .bat files set to native?

So a unix person can edit it without leaving lines that don't have the
cr/lf (or have to see ^M marks all over the place).   I do all kinds of
edits to bat files from my Linux box.  However, if they get committed with
mixed styles, some versions of windows complain loudly when you try to run
them.

 Ok, I'll take your word for it ;-)

 So how do you handle releases, as I asked in a different mail
 in this thread?  If you now extract the code on unix, you have
 .bat files with unix eol chars.  I don't think the windows shell
 handles that.  Same for .sh files, just the other way around.
 I'm sure people have a solution for that, but I don't see it.

use eol-style: LF for .sh and CRLF for .bat/.cmd

Right, that's what we're doing.  Dan on the other hand is
recommending using eol-style:native, because he wants to
edit .bat files on unix.  And this is also the setting in
svn config that you pointed to above, btw.  We may have
entered a loop here, not quite sure yet.
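
To pin down the policy being described (fixed line endings for scripts, no property anywhere else), here is a minimal sketch in Python. It is only an illustration under stated assumptions, not anything UIMA actually runs: the helper names and the sample file names are made up. The one real thing in it is the svn command line, which uses the standard `svn propset svn:eol-style` form.

```python
import os

# Policy as described in this thread: LF for .sh, CRLF for .bat/.cmd,
# and no svn:eol-style property on anything else.
EOL_STYLE_BY_EXTENSION = {
    ".sh": "LF",
    ".bat": "CRLF",
    ".cmd": "CRLF",
}


def eol_style_for(path):
    """Return the svn:eol-style value for a path, or None to leave it unset."""
    extension = os.path.splitext(path)[1].lower()
    return EOL_STYLE_BY_EXTENSION.get(extension)


def propset_command(path):
    """Build the svn command that would set the property, or None if unset."""
    style = eol_style_for(path)
    if style is None:
        return None
    return ["svn", "propset", "svn:eol-style", style, path]


# Hypothetical file names, for illustration only.
for name in ["bin/runTests.sh", "bin/runTests.bat", "src/test/data/input.txt"]:
    print(name, "->", eol_style_for(name))
```

With fixed values like these, a .bat file checked out on unix still has CRLF endings and a .sh file checked out on windows still has LF endings, which is what a single cross-platform distribution needs.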



Re: UIMA release - lots of missing SVN eol-style property settings

2008-04-11 Thread Thilo Goetz

Daniel Kulp wrote:
Actually, there is a reverse issue.  It also makes it quite
difficult for people to help with UIMA if they are also contributors to
other projects that DO use the normal svn settings, Eclipse or not.

For example, let's pretend for a moment that I'm a Windows user that uses
Eclipse and contributes to several Apache projects.  My svn is setup
properly so any .java files I add are eol-style:native, etc...   I then 
am voted in as a committer to UIMA due to some awesome work I've done.  
Do I need to maintain completely different svn settings when working on 
UIMA so that I don't add files in windows format?


Dan, you've got me on the .java files, but you'll never convince me
that the default settings for .bat and .sh files are a good idea.  Nor
for .txt, for that matter.  This is all very well for some English readme
files, but if you work with text as data, and exotic code pages, you don't
want your data repository to go in and manipulate your data.  All hell
breaks loose, I've been there.

Anyway, as I said, if your hypothetical developer becomes real, we
can work this out I'm sure.



April board reports

2008-04-10 Thread Thilo Goetz

I added the April board reports page on the wiki
since I wanted to add the UIMA report.  According
to the usual schedule, the reports were due 4/9.
I suspect that this date is not correct because of
ApacheCon EU?  Somebody may want to go in and
correct the date.



Re: UIMA release - lots of missing SVN eol-style property settings

2008-04-10 Thread Thilo Goetz

sebb wrote:

The SVN tag

has lots of missing SVN eol-style settings. See the file

This should probably be applied to trunk as well ...


Hi Sebb,

thanks for looking over our release.

There are a lot of files in your list where not setting
the eol-style property is intentional: all our test files.
Setting eol-style:native would make our tests fail on one
platform or another as they're usually compared to some
expected output, which in turn depends on the exact byte
content of the files.

Unfortunately, there is no (valid) eol-style:none
or such that allows us to make this intention explicit.
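
To make the failure mode concrete, here is a minimal sketch in Python. It is not taken from the UIMA test suite: the sample text and the `to_native` helper are made up, and the helper only approximates what an eol-style:native checkout does. It shows why a byte-exact comparison against expected output can pass on one platform and fail on another.

```python
# Hypothetical illustration; not UIMA code.

def to_native(data: bytes, platform_eol: bytes) -> bytes:
    """Approximate an eol-style:native checkout.

    Line endings are first normalized to LF and then rewritten
    to the end-of-line sequence of the checkout platform.
    """
    normalized = data.replace(b"\r\n", b"\n")
    return normalized.replace(b"\n", platform_eol)


# Test input as committed, with unix line endings.
committed = b"first line\nsecond line\n"

# Expected output, recorded byte for byte from the committed file.
expected = committed

unix_checkout = to_native(committed, b"\n")
windows_checkout = to_native(committed, b"\r\n")

# The comparison holds on unix but not on windows,
# where every LF has become CRLF.
print(unix_checkout == expected)
print(windows_checkout == expected)
```

Leaving the property unset means every platform gets the bytes exactly as committed, so the comparison behaves the same everywhere.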

For the java code we could set it to native.  We just never
felt the need.  Since we need to be careful with our test
files, we don't follow the automatic eol-style client setup
as recommended.  AFAIK, all UIMA developers use Eclipse
for their development, and Eclipse doesn't care about
eol style (or not that I noticed anyway).

I hope you'll agree that it's up to the project to set an
eol-style policy.  Our policy is not to set the property
unless it's required (e.g., for .sh or .bat files).



Re: [DISCUSS] Community diversity (again)

2008-03-27 Thread Thilo Goetz

Matthieu Riou wrote:

On Wed, Mar 26, 2008 at 5:03 PM, Robert Greig [EMAIL PROTECTED]

I think this is an important topic for future incubator project groups
to have clarified.

The project is not highly dependent on any single contributor (there

are at

least 3 legally independent committers and there is no single company or
entity that is vital to the success of the project)

Perhaps the word legally needs to be removed since from the Qpid
discussion it would appear that several people do not think a strict
legal interpretation should apply?

However, given that the intent (as I understand it) is to avoid the
case where a project dies because one entity withdraws funding,
perhaps some definition along the lines of for people who are paid to
contribute to the project, no single entity remunerates more than 50%
of the committers?

The thing is, for both for Qpid and Tuscany, there *were* 3 independent
committers. And in the case of Qpid, I believe the no single entity
remunerates more than 50% committers would have worked as well.

The thing that I've learned over the past 18 months as a podling
committer is that there are no hard and fast rules.  The community
(in this case, the IPMC) decides on a case-by-case basis, based
on more than just the letter of the law.  That's a good thing, but
it is a difficult concept to grasp when you're new to the Apache

So when you're new to this, as I am, you look for the well-defined
rules and want to cling to them.  You see the 3 independent
committer passage and you aim for that, because that's something
you can understand right off the bat.  However, when I go back
to the graduation guide now and read the whole passage, it says:
Graduation tests whether (in the opinion of the IPMC) a podling
has learned enough and is responsible enough to sustain itself
as such a community.

It's just that to an outsider, it is totally unclear what that
means.  And it may be impossible to really convey the meaning
of it in a few sentences.  So why don't we just say so?

Make it absolutely clear that the diversity of the community
will be judged by the IPMC based on the overall conduct of the
project, mailing list, commit activity etc.  I know it's there
already, but it could be reinforced.  Perhaps at the end of the
"Creating an Open and Diverse community" (community should be
capitalized, btw) paragraph: "The IPMC will judge diversity of
the project based on many criteria.  These include mailing list
activity, commit activity and the affiliations of the committers.
There is no single sufficient criterion, it is the overall conduct
of the development community that counts."  Or something like that.

Another thing that might be helpful is to advise podlings to
engage in the incubator community.  The graduation guide sounds
a bit like [EMAIL PROTECTED] is a place where you just go for
help or questions.  I think that's actually misleading.  We should
strongly advise podling committers to subscribe to and follow
[EMAIL PROTECTED]  It's the best way (the only way?) to understand
what it takes to graduate.

So I would change "Please post any questions about graduation to
the general incubator list" to "Subscribe to and closely follow
the general incubator list.  It is *the* place to learn about
graduation votes and current policies and their interpretation.
It is also where you can post any questions about graduation."

If this goes in the right direction, I'll propose a patch.



Re: [DISCUSS] Community diversity (again)

2008-03-27 Thread Thilo Goetz

Noel J. Bergman wrote:

Endre Stølsvik wrote:

Thilo Goetz wrote:

Make it absolutely clear that the diversity of the community
will be judged by the IPMC based on the overall conduct of
the project, mailing list, commit activity etc.

I'm not sure that diversity and conduct are really mapping that way.  A 
non-diverse community can conduct itself properly and well, but still not be diverse.  And although 
conduct is important, we do place value on demonstrated diversity, as well.  I'm afraid that we can 
give you examples of where one or the other was present, but arguably not both, and we've had to 
deal with the consequences later.  So we all try to learn from prior experiences, and apply that in 
our current and future judgments.

My quote is a bit out of context like that.  What I meant to
convey was that it is not sufficient that there *be* diversity
(in the sense that 3 independent committers exist).  The business
of the project must be conducted by a diverse set of people.  At
least that's what I got out of the recent discussions.



Re: KEYS file in distribution

2008-03-18 Thread Thilo Goetz

Robert Burrell Donkin wrote:

On Mon, Mar 17, 2008 at 10:35 AM, Thilo Goetz [EMAIL PROTECTED] wrote:


UIMA currently ships the KEYS file as part of
its distributions.  I found one other Apache
project (Derby) that also does.  The others
that I checked, don't (random sample of what
I had on my hard drive).

I would assume that putting the KEYS file in
the distribution is at best not necessary, and
may be counterproductive, as it might lead
people to use it.  And of course you can't
verify your distribution that way.  So I'm
thinking of removing the KEYS file from the

Any compelling reasons to go one way or the

source distributions should be identical to the contents of version control
when the release is cut. if the KEYs file is present in the source that's
cut, it should be left. if you're worried, add a note to the top of the file
(it'll be ignored upon import). if the KEYs file is not present in version
control, it should not be added to the source distribution.

for binary distributions, it's best not to include the KEYs file

- robert

That makes sense.



KEYS file in distribution

2008-03-17 Thread Thilo Goetz


UIMA currently ships the KEYS file as part of
its distributions.  I found one other Apache
project (Derby) that also does.  The others
that I checked, don't (random sample of what
I had on my hard drive).

I would assume that putting the KEYS file in
the distribution is at best not necessary, and
may be counterproductive, as it might lead
people to use it.  And of course you can't
verify your distribution that way.  So I'm
thinking of removing the KEYS file from the

Any compelling reasons to go one way or the



Re: [VOTE] Accept PDFBox for incubation

2008-02-01 Thread Thilo Goetz

Jukka Zitting wrote:

Incubator PMC,

Please vote on accepting the PDFBox project for incubation. The full
PDFBox proposal is available at the end of this message and as a wiki
page at We ask the
Incubator PMC to sponsor the PDFBox podling, with myself, Jeremias
Maerki, and Niall Pemberton as the mentors.

The vote is open for the next 72 hours and only votes from the
Incubator PMC are binding.

[X] +1 Accept PDFBox as a new podling
[ ] -1 Do not accept the new podling (provide reason, please)

+1 (non-binding).  This is a great addition to the growing Apache
text/unstructured stack.



Re: [VOTE] Approve release Apache UIMA 2.2.1-incubating

2007-12-18 Thread Thilo Goetz
sebb wrote:
 On 18/12/2007, Michael Baessler [EMAIL PROTECTED] wrote:
 Leo Simons wrote:
 On Dec 18, 2007, at 12:15 PM, Bertrand Delacretaz wrote:
 On Dec 18, 2007 10:37 AM, ant elder [EMAIL PROTECTED] wrote:
 ...So it is a new requirement and I don't think we should be making
 up policy
 during a release vote
 +1, that's even part of the policy! It is just *so* annoying that this
 keeps happening.

 like this as it makes it very frustrating for
 podlings. How about letting this UIMA release out as-is for now
 while we
 work this out?...
 Yup. It always gets fuzzy when voting and discussion intermix, but I
 do count 3 +1 votes and more +1 votes than -1 votes. That typically
 means the vote passes, so UIMA can release.

 Right the vote runs for more than 72 hours and we have 3 +1 votes and
 one -1 vote.

 The other open issue was the checksum representation for MD5 and SHA1.
 The representation can be improved and we will do this for the next release.
 (I already opened a JIRA issue for that). It is just open what we have
 to do for the current release.
 I've created reformatted versions for the src and bin archives in:
 in case that helps.


 I've not fixed the Maven hashes.
 I don't know whether the Maven hash-checking can cope with the format
 or not - and does it ignore hash files with incorrect format or what?
 If it does not handle the format properly, this could cause download

The hashes for the maven artifacts are generated by maven, in the format
that maven wants them.  There should be no problem, there hasn't been in
the past.
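
For anyone who wants to check an artifact against its hash file by hand, here is a rough sketch in Python. It rests on an assumption that is mine and not confirmed in this thread: that the checksum file starts with the hex digest, optionally followed by a file name. Other layouts (for example a digest printed in space-separated groups) would need their own parsing. The function names are made up for illustration.

```python
import hashlib


def hex_digests(data: bytes) -> dict:
    """Compute the MD5 and SHA1 hex digests of some artifact bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def matches(checksum_file_text: str, data: bytes, algorithm: str) -> bool:
    """Check artifact bytes against the text of a checksum file.

    Assumption: the first whitespace-separated token is the hex digest.
    This covers a bare digest and a "digest  filename" line only.
    """
    tokens = checksum_file_text.split()
    if not tokens:
        return False
    return tokens[0].lower() == hex_digests(data)[algorithm]


# Placeholder bytes standing in for a real artifact.
artifact = b"placeholder artifact bytes"
digests = hex_digests(artifact)
print(matches(digests["md5"] + "\n", artifact, "md5"))
print(matches(digests["sha1"].upper() + "  artifact.jar\n", artifact, "sha1"))
```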



Re: freedom to do sane release management

2007-12-18 Thread Thilo Goetz
Luciano Resende wrote:
 I guess, from the Incubator release management guide, the requirement
 is that the release can be built from a tag, in a later point in
 All releases should be built from a tag. It is occasionally necessary
 to rebuild releases many years later. Tagging is cheap and easy when
 using subversion. So, every release and candidate should be tagged.

Which is an eminently sensible requirement, and I'm
not debating it.  I also agree that the build process
should be automated and must be repeatable.

The question is whether the process by which the release is
built from a tag may be more complicated than a simple
tarring up of the svn extract.  I think the notion of
build here is what we usually understand by a software
build, an automated, repeatable process.  So in the case
of UIMA, we copy a handful of files from their usual nether
svn regions to the top level directory (to comply with
release layout policy for the most part), and then tar
the whole.  So under a reasonable interpretation of that
paragraph, we do comply with it.



Re: [VOTE] Approve release Apache UIMA 2.2.1-incubating

2007-12-17 Thread Thilo Goetz
William A. Rowe, Jr. wrote:
 Marshall Schor wrote:
 We've put the LICENSE, NOTICES, and DISCLAIMERs into the top directory
 of the source (and binary) distribution(s), but didn't realize this also
 needs to be in the top level of the SVN tag, because we didn't know that
 was considered part of the distribution.

 Can you please confirm this is the case?  In which case, we'll of course
 Your distribution must correspond to subversion, otherwise it's very hard
 to track the artifacts in the tarball, where they came from, how they
 got there, and if they underwent the proper oversight prior to packaging.
 (Yes, we vote on the prepared tarball, but you can see how discrepancies
 do create questions.)

That's not how I interpret the policy document.  It says:

To apply the ALv2 to a new software distribution, include one copy of the
license text by copying the file:

into a file called LICENSE in the top directory of your distribution. If the
distribution is a jar or tar file, try to add the LICENSE file first in order
to place it at the top of the archive.

That's what we do.  Of course we'll make every effort to make
our distribution easy to review.  However, it does seem that
we're ok wrt current policy, and we'd view this as a suggestion
for next time.  Ok?



Re: [VOTE] Approve release Apache UIMA 2.2.1-incubating

2007-12-17 Thread Thilo Goetz
sebb wrote:
 On 15/12/2007, Jean T. Anderson [EMAIL PROTECTED] wrote:
 sebb wrote:
 [Eventually found the KEYS file in SVN, but it might be helpful to
 provide a pointer in the vote mails]

 yeah, I went to their website and followed the link from there.
 So did I.
 The NOTICE file in refers to the
 contributions from IBM:

 Software Grant License Agreement, informally known as the
 IBM UIMA License Agreement.

 however, that license is not in LICENSE, nor is it linked from there.

 this wording was approved in their first release -- iirc there were
 discussions about what specifically to put there [1] and I had a minor
 hand in that since they had borrowed wording from derby [2].
 I'm not objecting to the wording in the NOTICE file.
 However, since it refers to another license, I think that needs to be
 present in the LICENSE file:

We are not distributing code under several licenses.
The only license applicable to UIMA is the AL.  The
somewhat cryptic (to a non-lawyer) statement in the
NOTICE file was put there on request by the IPMC, we
didn't have it there originally.  To my way of thinking,
what it is supposed to say is that some of the code
originated with IBM, but has since been relicensed
under the Apache License.

This exact version of the NOTICE and LICENSE file has
been approved for our previous two releases, so I do
hope they are still ok.



Re: [VOTE] Approve release Apache UIMA 2.2.1-incubating

2007-12-17 Thread Thilo Goetz
sebb wrote:
 [Eventually found the KEYS file in SVN, but it might be helpful to
 provide a pointer in the vote mails]

Good point, will do next time.

 There are some problems with the MD5 and SHA1 files.
 For example, uimaj-2.2.1-incubating-bin.tar.bz2.md5:
 uimaj-2.2.1-incubating-bin.tar.bz2: 53 20 6A FB 75 1F 07 9D  BB 12 82 58 D0 7D
 CA 4B
 The hash is spread over two lines and into hex pairs. The normal
 format is either:
 53206afb751f079dbb128258d07dca4b *uimaj-2.2.1-incubating-bin.tar.bz2
 The SHA1 checksums have the same problem.
 The PGP signatures are OK, however the format of the existing MD5/SHA1
 files means that most (all?) checking programs will have difficulty
 verifying the checksums.

We generate the checksums with

gpg --print-md MD5 [fileName] > [fileName].md5

and

gpg --print-md SHA1 [fileName] > [fileName].sha

respectively (as described in the release signing FAQ; however,
I suggested that text ;-).  The advantage of using gpg is that
you just need one tool for the various signatures.  If there
are alternatives, we'll be happy to entertain them (we use maven
as our build env).

Can you elaborate on what checking programs are commonly used?
It was my understanding that the primary signing mechanisms were
the PGP signatures, and the checksums were just for quick sanity
checks (visual verification, as they are so short).  Thanks.
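The layout difference sebb describes can be bridged mechanically.  What
follows is only a sketch and not something anyone in this thread proposed:
a small Python helper (the name normalize_gpg_digest is made up for
illustration) that takes text in the spaced, wrapped form quoted above and
rewrites it into the one-line "hash *filename" form.  It assumes the input
always has the shape "name: HEX PAIRS".

```python
import re


def normalize_gpg_digest(gpg_output: str) -> str:
    """Rewrite spaced hex-pair digest text as '<hexdigest> *<filename>'.

    Assumes the input looks like 'name: 53 20 6A ...', possibly wrapped
    over several lines, as in the example quoted in this thread.
    """
    name, sep, digest = gpg_output.partition(":")
    if not sep:
        raise ValueError("expected '<filename>: <hex pairs>'")
    # Drop all whitespace (spaces, line breaks) and lowercase the hex digits.
    hexdigest = re.sub(r"\s+", "", digest).lower()
    if not re.fullmatch(r"[0-9a-f]+", hexdigest):
        raise ValueError("digest contains non-hex characters")
    return f"{hexdigest} *{name.strip()}"


wrapped = (
    "uimaj-2.2.1-incubating-bin.tar.bz2: "
    "53 20 6A FB 75 1F 07 9D  BB 12 82 58 D0 7D\n"
    " CA 4B"
)
print(normalize_gpg_digest(wrapped))
```

Run on the example from sebb's mail, this prints the same single line he
gives as the normal format.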



Re: [VOTE] Approve release Apache UIMA 2.2.1-incubating

2007-12-17 Thread Thilo Goetz
Bertrand Delacretaz wrote:
 On Dec 17, 2007 10:09 AM, Thilo Goetz [EMAIL PROTECTED] wrote:
 William A. Rowe, Jr. wrote:
 Marshall Schor wrote:
 We've put the LICENSE, NOTICES, and DISCLAIMERs into the top directory
 of the source (and binary) distribution(s), but didn't realize this also
 needs to be in the top level of the SVN tag
 Your distribution must correspond to subversion, otherwise it's very hard
 to track the artifacts in the tarball,...
 That's not how I interpret the policy document
 You might be right about the letter of the policy document, but all
 kinds of alarms go off in my brain when I see a distribution tarball
 that doesn't match the SVN tag that it's built from.
 Automated reproducible builds (including building distribution
 archives) are IMHO a must in our way of working.

Absolutely.  Our build is completely automated and reproducible,
including generating the distribution.  Here's (a slightly simplified
version of) our build script:

svn checkout $tag
cd $leveldir/uimaj
mvn`date` install
cd ..
cd uimaj-distr
mvn assembly:assembly

This does everything from svn extract to building the
release artifacts (except the signing, that requires
some manual intervention because of the release manager's
key phrase).  I don't think it gets much simpler
than that.

However, I don't think that addresses your concerns.
AIUI, you would like our repository to look exactly like
the source distribution.  Well, it almost does, except for
the files in the top level directory.  Those get copied
there when we build the distribution.  Why?  I guess
we thought it's cleaner that way, and complies with
maven's notion of organizing things.  For example,
the NOTICE file lives in uimaj-distr/src/main/readme.
It's still there in the source distribution, but it's
also at the top level.
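To make the "almost" concrete, here is a rough sketch of how one might
compare a tag against a source distribution.  It is not part of our build;
the function name compare_listings and the second path in the sample data
are invented for illustration, and it assumes you already have both file
listings as plain relative paths.

```python
from typing import Dict, Iterable, Set


def compare_listings(tag_files: Iterable[str],
                     dist_files: Iterable[str]) -> Dict[str, Set[str]]:
    """Report paths that exist on only one side of the comparison."""
    tag, dist = set(tag_files), set(dist_files)
    return {"only_in_tag": tag - dist, "only_in_dist": dist - tag}


# Sample data: the NOTICE location is the one described above, the
# pom.xml path is purely illustrative.
tag_listing = [
    "uimaj-distr/src/main/readme/NOTICE",
    "uimaj-core/pom.xml",
]
# The distribution keeps everything from the tag and adds a top-level copy.
dist_listing = tag_listing + ["NOTICE"]

report = compare_listings(tag_listing, dist_listing)
print(sorted(report["only_in_dist"]))
print(sorted(report["only_in_tag"]))
```

With a layout like ours, the only differences reported would be the files
copied to the top level when the distribution is assembled.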

Now if we absolutely must, we can change that.  I do
admit that I don't quite understand the reason, though,
and like it better the way it is ;-)



Re: [VOTE] Approve release Apache UIMA 2.2.1-incubating

2007-12-17 Thread Thilo Goetz
sebb wrote:
 Maven can generate the MD5 and SHA1 checksums itself; no need for a
 separate tool.
 I'm not familiar with Maven, so I don't know the commands off-hand,
 but I can probably find them.

Maybe it can, but I was unable to figure out how.

We need to create checksums for the artifacts
that fall out of the assembly step, and I don't
think maven supports creating those.

However, Ant can create checksums in the expected
format.  So we can call an Ant task from Maven
to do this.

For the purposes of this vote, would it be ok to
just modify the existing .md5 and .sha1 files?
I would prefer not having to create another release
candidate just for that purpose.
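For comparison, producing checksum files in that one-line format takes only
a few lines in most languages.  The following is a Python sketch rather than
the Ant task mentioned above; the helper names checksum_line and
write_checksum_files are hypothetical, and the .md5/.sha1 suffixes are
simply the ones used in this thread.

```python
import hashlib
from pathlib import Path
from typing import List


def checksum_line(path: Path, algorithm: str) -> str:
    """Return '<hexdigest> *<filename>' for the given file."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read in chunks so large release archives need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()} *{path.name}"


def write_checksum_files(path: Path) -> List[Path]:
    """Write <file>.md5 and <file>.sha1 next to the artifact."""
    written = []
    for algorithm, suffix in (("md5", ".md5"), ("sha1", ".sha1")):
        target = path.with_name(path.name + suffix)
        target.write_text(checksum_line(path, algorithm) + "\n")
        written.append(target)
    return written
```

Calling write_checksum_files(Path("some-artifact.tar.bz2")) would leave two
small text files beside the artifact, each holding a single line.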



Re: [VOTE] Approve release Apache UIMA 2.2.1-incubating

2007-12-17 Thread Thilo Goetz
Kevan Miller wrote:
 On Dec 17, 2007, at 4:09 AM, Thilo Goetz wrote:
 William A. Rowe, Jr. wrote:
 Marshall Schor wrote:
 We've put the LICENSE, NOTICES, and DISCLAIMERs into the top directory
 of the source (and binary) distribution(s), but didn't realize this also
 needs to be in the top level of the SVN tag, because we didn't know that
 was considered part of the distribution.

 Can you please confirm this is the case?  In which case, we'll of course

 Your distribution must correspond to subversion, otherwise it's very hard
 to track the artifacts in the tarball, where they came from, how they
 got there, and if they underwent the proper oversight prior to packaging.
 (Yes, we vote on the prepared tarball, but you can see how discrepancies
 do create questions.)

 That's not how I interpret the policy document.  It says:

 To apply the ALv2 to a new software distribution, include one copy of
 the license text by copying
 the file:

 into a file called LICENSE in the top directory of your distribution.
 If the distribution is a jar
 or tar file, try to add the LICENSE file first in order to place it at
 the top of the archive.

 That's what we do.  Of course we'll make every effort to make
 our distribution easy to review.  However, it does seem that
 we're ok wrt current policy, and view this as a suggestion
 for next time.  Ok?
 Your interpretation works if your subversion repository is not a
 distribution. IMO, it is and should contain appropriate

If that is the consensus opinion here, that's what we'll
do.  But please put yourself in our shoes.  We can only go
by the information that is available to us.  If this is a
rule, it would be great if it could be written down so the
next incubator project doesn't have to go through this.  I'd
be happy to help with the docs.

Out of curiosity, I started going through the Apache SVN repo
to see what Apache projects complied with this requirement.  I
stopped after C because I'd already found 4 that didn't
comply: Avalon, Cayenne, Cocoon and Commons.  Compliant
were Activemq, Ant, APR and Beehive.
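A survey like this could also be scripted against local checkouts.  The
sketch below is an assumption-laden illustration, not what I actually ran:
the function name missing_top_level_files is invented, and it assumes the
files to look for are LICENSE and NOTICE, with or without an extension such
as .txt.

```python
from pathlib import Path
from typing import List, Sequence

# Assumed set of required files; adjust to whatever policy asks for.
REQUIRED = ("LICENSE", "NOTICE")


def missing_top_level_files(root: Path,
                            required: Sequence[str] = REQUIRED) -> List[str]:
    """List required files that are absent from the top level of root."""
    present = {
        entry.name.upper().split(".")[0]  # LICENSE.txt counts as LICENSE
        for entry in root.iterdir()
        if entry.is_file()
    }
    return [name for name in required if name not in present]
```

An empty list means the checkout has both files at its top level; anything
else names what is missing.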



Re: [VOTE] approve stdcxx 4.2.0 release

2007-10-26 Thread Thilo Goetz
+1 (non-binding)

I ran RAT 0.5.1 on it, and it looks fine.

Note to Robert: RAT seems to get confused by file
names ending in *.lib.cpp (treats them as binary).
I'll submit a patch when RAT starts incubation... :-)


Martin Sebor wrote:
 Has anyone had a chance to review this release? Did we miss something
 that needs to be addressed in order to approve the request?
 Martin Sebor wrote:
 The stdcxx community has just successfully closed a vote to release
 stdcxx 4.2.0. In accordance with the Releases section of the Incubation
 Policy we request the permission of the Incubator PMC to publish the
 tarball containing the release on the stdcxx Download page.

 This vote will close in the usual 72 hours, on Friday, October 26 at
 8PM MDT. See for the countdown.


 Vote result (contains links to the tarball and README containing the
 required incubation artifacts):

 Stdcxx Download page:




Re: SSL Error

2007-10-18 Thread Thilo Goetz

sorry, wrong list.  Please try the ftpserver-dev list, I'm sure
they can help you.

I notice that the ftp server mailing list page lists [EMAIL PROTECTED]
without explaining what that mailing list is:
Maybe a little more text there would be appropriate?  Just a


Brandon, Paul (MED US) (EXT) wrote:
 I am hoping someone can help me with an error that I am getting. I have
 set up the Apache FTP server to do Explicit FTPS (i.e. Implicit=false in file). I have created a self signed certificate for
 testing and imported it on the client machine. When I connect, I get the
 SSL/TLS client handshake failed (Error = 0x80090308)
 SSL: The token supplied to the function is invalid
 Since this looks like a certificate error, I obtained a trial
 certificate from Verisign and installed it with the same results. I have
 also attempted an Implicit FTPS connection from the client with the same
 error (with the ftpd properties file set to Implicit=true) . I've looked
 on the Apache Incubator site, but I can not find any information on this
 error. Any help or advice would be appreciated. Thanks.


Re: how to close down / put into dormant mode an incubator project / podling

2007-10-10 Thread Thilo Goetz
Leo Simons wrote:
 On Oct 7, 2007, at 10:29 PM, Matthieu Riou wrote:
 IIRC what's been done for Agila, it's roughly:
 So...for the archives...
 How to put an incubator podling to sleep
   * decide the project is going dormant, usually on the dev list
     -- example:[EMAIL
   * if not decided on the dev list, announce this on the dev list
   * announce on the users list if there is one
   * announce on incubator-general
     -- example:[EMAIL
   * add a final report to the current-pending incubator board report
     -- example:

   * remove project from reporting schedule here:
     (there is also a file in SVN somewhere, but I forget where as I
     don't have access ;-)

   * add a note on the project wiki
     -- example:
   * add a note on the project website
     -- example:
   * create (a) jira(s) for the infrastructure team to close down resources
     * svn
     * wiki
     * mailing lists
     -- examples INFRA-1377, INFRA-1378, INFRA-1379, INFRA-1380
 Leo Simons


Incubator releases in central maven repo, was: [VOTE][policy] Release Distribution Directory

2007-10-09 Thread Thilo Goetz

this is great, thanks.  I don't want to muddy the
waters, but is it time to bring up the maven repo
question again as well?  If incubator releases *are*
official releases, is there a reason not to upload
maven artifacts to the central repository?


Robert Burrell Donkin wrote:
 infrastructure asks that all official releases are distributed from
 subdirectories of / this means that infrastructure
 team can guarantee the availability and security of releases.  podling
 releases are official releases by the Incubator project. the
 disclaimers we ask projects to include ensure that users understand
 their status.
 therefore (as a matter of policy) these releases should be made
 available from within but this is
 not made clear in Incubator policy. here is a policy patch that makes
 this clear:
 AIUI most current releases are stored elsewhere. i'm willing to perform
 the necessary rationalisation work.
 moving releases into dist does highlight issues about long term
 availability and security of releases but i think that these are best
 discussed in separate threads.
 i'd like to ask the IPMC to approve this policy clarification.
 here's my +1
 - robert
 [ ] +1 Insist that releases are distributed within
 [ ] +0
 [ ] -0
 [ ] -1 Do not apply this policy patch (reasons appreciated)


Re: Incubator Proposal: Pig

2007-09-24 Thread Thilo Goetz
Niclas Hedhman wrote:
 b) I can't say that I understand the technical merits of the proposal, and 
 just see the headline analyzing large data sets. And I would like to know 
 the relationship with UIMA's statement ... analyze large volumes of 
 unstructured information... and hear whether there are overlap, synergies 
 and/or collaboration in view.


I'm not 100% clear on where there could be synergies between
Pig and UIMA.  Map/reduce is a natural distribution
strategy for UIMA, so executing UIMA programs on top of Hadoop
seems natural.  Maybe Pig can help with that and make it easier
somehow.  However, that is not clear to me from the proposal
at this time.

At the same time, I don't really think there is any overlap.
Pig is concerned with computation in a distributed environment,
while UIMA is agnostic in that respect.  On the other hand,
UIMA offers a component model to develop analysis modules and
combine them into processing chains (with an emphasis on reuse).
I do not see from the proposal that Pig is in the business of
defining a component model.

So synergies probably yes, no overlap as far as I can see.



[VOTE] Approve release Apache UIMA 2.2.0-incubating

2007-08-16 Thread Thilo Goetz
The UIMA developers ask the Incubator PMC for permission to
publish a new release of UIMA, the second in the incubator.
This release contains many incremental changes and
improvements, please see the release notes for an exhaustive

We held a vote on uima-dev that resulted in 5 binding +1s
(all the committers) and no 0s or -1s.  The vote thread
is here:[EMAIL 

Please review the release candidate here:

There are 4 subdirectories.  /bin and /src contain the binary
and source distributions, respectively.  /RAT-reports contains
the RAT reports for the binary and src distributions, plus
a short README that explains those files that RAT isn't
sure about.  The RAT reports were generated with RAT 0.5.1.
The /maven directory finally contains a m2 staging repository
with the Maven artifacts we would like to release to the
incubator repo.  This is the first time we're releasing
Maven artifacts, but we got expert advice and hope that
everything is above board.

The SVN tag for this release candidate is:



Re: General Subscribe

2007-07-17 Thread Thilo Goetz

on how to subscribe.  Briefly, send a note to [EMAIL PROTECTED]


Annette Keenleyside wrote:
 Please subscribe [EMAIL PROTECTED] to the above mailing list.
 Annette M. Keenleyside
 Manager, WebSphere Application Server Development
 Lotus Notes: Annette Keenleyside/Toronto/[EMAIL PROTECTED]
 905.413.2792 ( t/l 969)   905.413.4920 (fax)


Re: monthly reports

2007-07-11 Thread Thilo Goetz
Martijn Dashorst wrote:
 On 7/11/07, Daniel Kulp [EMAIL PROTECTED] wrote:
 Seriously, this is a responsibility of the project to remember.   Once
 the project graduates, they'll have to still make periodic board reports
 so they really need to get used to remembering.
 Fortunately, once you have graduated, you get reminders for submitting
 the board reports. At least our first report was just sent in after 2
 reminders (first to warn of an upcoming report, and second to tell it
 needs to be sent in).
 These are fully automated, so I don't see why these can't be sent to
 the private@ lists for podlings. Apart the additional administration
 of course, but if we have just 1 place to put the reporting schedule
 in, then it would not be hard to auto-send the reminders :)

+1.  Although we (UIMA) haven't missed submitting a report yet, this is
to a large extent due to some kindly soul sending out a reminder on
[EMAIL PROTECTED]  If this could be automated, that would be great.

(I've also tried to get us to submit a report when we weren't due to
report, but that's another story ;-)



Re: Monthly Board Reports

2007-06-08 Thread Thilo Goetz

Martijn Dashorst wrote:

On 6/6/07, Thilo Goetz [EMAIL PROTECTED] wrote:
I was looking at the reporting schedule to see if UIMA needs to report
(we don't), and saw that recent graduates OpenJPA and Wicket were still
listed; so I removed them.

With just one tiny little technical issue: Wicket isn't officially
graduated yet. I'm preparing our report (it has been 3 months, so we
do have something to say :)


Oops, sorry.  I just knew that the incubator vote had passed, I didn't
realize that it wasn't official yet.  Well anyway, you sure won't have
to report next time around ;-)



Re: Monthly Board Reports

2007-06-06 Thread Thilo Goetz
Noel J. Bergman wrote:
 It's that time again.  :-)  Please start preparing and submitting your board
   --- Noel

I was looking at the reporting schedule to see if UIMA needs to report
(we don't), and saw that recent graduates OpenJPA and Wicket were still
listed; so I removed them.



Re: Gauging interest for incubating PDFBox...

2007-05-07 Thread Thilo Goetz
Jukka Zitting wrote:
 On 5/7/07, Jeremias Maerki [EMAIL PROTECTED] wrote:
 So, after hearing PDFBox mentioned many times last week, I thought it
 might be a good idea to ask around inside a wider area to see how the
 ASF is potentially interested in adopting PDFBox among its projects.
 At least Jackrabbit and Nutch currently use PDFBox and the library
 would also be an ideal companion for the Tika toolkit, so I'd be very
 happy to see PDFBox joining the ASF. :-)
 Jukka Zitting

UIMA would also be interested in using PDFBox, most likely via Tika.  So
+1 to bringing PDFBox to Apache.



Re: Missing reports

2007-04-15 Thread Thilo Goetz

Leo Simons wrote:

On Apr 15, 2007, at 7:23 AM, Thilo Goetz wrote:

Jukka Zitting wrote:

PS. The TripleSoup and UIMA reports are missing the incubating since

I have added this information for UIMA.

I added for TripleSoup and provided the specific date for UIMA.



Not that it matters much: the UIMA incubation status page lists 10/3 as
the date incubation started, which was when the vote was tallied.  On 10/5,
Noel ack'ed the vote.  Is that why you list the 5th?  Just curious :-)



Re: Missing reports

2007-04-14 Thread Thilo Goetz

Jukka Zitting wrote:

PS. The TripleSoup and UIMA reports are missing the incubating since

I have added this information for UIMA.



Jukka Zitting



Re: discussion of release of Apache Wicket 1.3.0-incubating-alpha

2007-04-04 Thread Thilo Goetz

William A. Rowe, Jr. wrote:

Martijn Dashorst wrote:

Also, the whole idea of the Incubator is to
withhold releases from the general public. 

Just to clarify - I don't think 'withhold' is a good description.
Release - but with no specific expectation of persistence at the
ASF is probably a better description.  E.g. "here's code" is fine,
"here's a community" would be premature.  Public releases of all
of our incubating projects IS goodness if it attracts more people
to the incubating communities, and increases their chances for a
successful graduation and project lifespan.

+1.  Doing releases to attract new contributors is essential to podlings
that need to build a diverse developer community to exit the incubator.
Wicket doesn't have that issue :-)



Re: [VOTE][Retry] Approve the release of Apache UIMA 2.1.0-incubating

2007-03-09 Thread Thilo Goetz


Adam Lally wrote:

After correcting some issues in our last release candidate (thanks to
Jean Anderson for finding these), the Apache UIMA community has again
voted to release version
2.1.0-incubating.  We would now like to ask the Incubator PMC to
approve this release.

The issues that were corrected since the last release candidate are:
- Updated some files that used the old Apache license header so they
now use the newer version.
- Our NOTICE file now includes acknowledgment that the code was
originally contributed by IBM
- Removed redundant LICENSE and NOTICE files from our source distribution
- Corrected some of the metadata in our Maven POM files (Although we
are not asking to release our Maven artifacts at this time, only our
source and binary distributions.)

Release artifacts: 

The release was signed by Michael Baessler.  His public key is available 

Release notes: 

The release has been tagged in SVN:

Here is a link to our vote thread on the uima-dev list:

We've run the RAT utility and have posted the reports, along with our
own comments, here: 

The original unedited RAT reports are also posted: 

We have done thorough testing of this release, which is documented here:

We ask that you please vote to approve this release:

[ ] +1 Approve the release as Apache UIMA 2.1.0-incubating
[ ] -1 Recommend against releasing at this time (identify issues you
consider showstoppers)

- The Apache UIMA team



Re: [PROPOSAL] Tika, a content analysis toolkit

2007-03-07 Thread Thilo Goetz

+1 (non-binding)

Sounds like a very interesting proposal, and as you mention, I could 
imagine some interaction/collaboration with UIMA.  I would follow such a 
project closely and possibly contribute.


Jukka Zitting wrote:


[Cross-posting to announce the Tika proposal, please use for followup discussion.]

This is a proposal to start a content analysis toolkit project in the
Apache Incubator. The live version of the proposal is available at

Comments and questions are welcome. There is also a vacant place for a
third mentor. Once people are satisfied with the proposal I will first
call a vote on the Lucene PMC to sponsor the proposal and then a vote
on the Incubator PMC to accept the project for incubation.

PS. Based on quick Google and USPTO searches there doesn't seem to be
anything that would cause trouble with the Tika name.


Jukka Zitting

Tika, a content analysis toolkit


Tika is a toolkit for detecting and extracting metadata and structured
text content from various documents using existing parser libraries.


The Tika content analysis toolkit will include features for detecting
the content types, character encodings, languages, and other 

of existing documents and for extracting structured text content from
the documents.

The toolkit is targeted especially for search engines and other content
indexing and analysis tools, but will be useful also for other applications
that need to extract meaningful information from documents that might
be presented as nothing else than binary streams.

Instead of implementing its own document parsers, Tika will use existing
parser libraries like Jakarta POI [1] and PDFBox [2].


The initial idea for the Tika project was voiced in April 2006 by
Jérôme Charron and Chris A. Mattmann on the Nutch mailing list. The Nutch
parser framework and other content analysis features were seen as
value-added components that would benefit also other projects. The idea
received positive feedback, but lacked the momentum.

The idea was revisited in August 2006 when Jukka Zitting from the
Jackrabbit project contacted Nutch for possible cooperation with similar
ideas. The original Tika idea gained extra momentum and a Google Code
project was set up as a staging area for prototype code before deciding
how to best handle the setup of a new project. After a few initial
commits the activity again declined.

In January 2007 the idea started gaining more momentum when Rida Benjelloun
offered to contribute the Lius project [3] to Apache Lucene and when Mark
Harwood also started looking for a generic toolkit like Tika.

This proposal is the result of the above efforts and related discussions
both in private and on various public forums. Some alternatives to
incubation, like Apache Labs [4] or Jakarta Commons [5], came up during
the discussions but we believe that taking the project to the Incubator
is the best way to start growing a viable community to sustain the Tika


There is ever more demand for tools that automatically analyze and index
documents in various formats. Search engines, content repositories, and
other tools often need to extract metadata and text content from documents
given as nothing or little else than a simple octet stream. While there
are a number of existing parser libraries for various document types,
each of them comes with a custom API and there are no generic tools for
automatically determining which parser to use for which documents.
Currently many projects end up creating their custom content analysis
and extraction tools.

The Tika project attempts to remove this duplication of efforts. We
believe that by pooling the efforts of multiple projects we will be able
to create a generic toolkit that exceeds the capabilities and quality of
the custom solutions of any single project. A generic toolkit project
will also provide common ground for the developers of parser libraries
and content applications to interact.

Initial Goals

The initial goals of the proposed project are:

   * Viable community around the Tika codebase

   * Active relationships and possible cooperation with related
     projects and communities

   * Generic parser API for extracting structured text content from
     various document formats

   * Flexible metadata detection and extraction API

   * Java implementations of the metadata standards mentioned below

Current Status


All the initial committers are familiar with the meritocracy principles
of Apache, and have already worked on the various source codebases. We will
follow the normal meritocracy rules also with other potential contributors.


There is not yet a clear Tika community. Instead we have a 

Re: log4net graduation, log4php retirement, Logging Services bylaws

2007-02-09 Thread Thilo Goetz

Yoav Shapira wrote:


On 2/8/07, Curt Arnold [EMAIL PROTECTED] wrote:


I've already started a thread on eliminating or simplifying the
Logging Services bylaws (
logging-generalm=117095930231054w=2).  I thought I saw a thread
here that suggested that project level bylaws or guidelines were
unnecessary, but could not find it in the archives.  Any other
references or guidance would be appreciated, but it would be most
helpful to join the discussion on [EMAIL PROTECTED]

I could swear I saw such a message here as well, essentially saying
Bylaws are for the Foundation itself, and projects should have
Guidelines as they see fit, which may mean very little Guidelines
outside the ASF's own general ones.  The Logging Services two-stage
release approval thing has got to end ;)

I remembered this, too, and found this after a little searching:



Re: [VOTE] Release Apache ServiceMix 3.1-incubating

2007-02-08 Thread Thilo Goetz


I'm currently involved in preparing the first UIMA incubator release, so 
I looked at your release with a view to what I could learn ;-)  To my 
(newbie) eyes it all looks very clean, license headers everywhere etc.

One thing I noted is that you're distributing the Java Activation 
framework.  It is my understanding that the Activation framework has a 
bit of a problematic license, which is why it is not available from our 
m2 repository, for example.  You also do not copy the license in your 
LICENSE file.  The license has some legalese that I do not pretend to 
understand.  If this is all above-board and you can point me to some 
previous discussion, all the better.


Guillaume Nodet wrote:

The ServiceMix community has voted to release
We now ask the Incubator PMC to approve this release.

Download page, release notes, etc ...

Direct links:

Guillaume Nodet



Re: [Vote] Incubating Project Policy

2007-02-02 Thread Thilo Goetz

Bertrand Delacretaz wrote:

On 2/2/07, William A. Rowe, Jr. [EMAIL PROTECTED] wrote:

...I'm proposing the following policy become explicit across the 

I'm +1 on the idea, with a suggestion below.

  ...committers may commit code they personally authored,
  or with proper attribution, commit patches posted ..

I'd add "by their original authors or license holders" here to make it
clear that posting to Jira by proxy is not allowed either.

+1, with Bertrand's addition.  The UIMA podling follows this policy, not 
because we have issues with third-party contributions but to keep our IP 
squeaky clean.

[We have the additional constraint that we don't commit anything (other 
than trivial changes) by people who do not have an ICLA on file.  It has 
been suggested that this sets the bar very high for potential 
contributors, but from our (limited) experience so far, this does not 
seem to be the case.]



Re: [VOTE] TripleSoup - a SPARQL endpoint for httpd

2007-02-02 Thread Thilo Goetz

+1 (non-binding)

Sounds like a very useful (and fun) effort to me.


Leo Simons wrote:

Hi all,

This is a vote on a previously posted proposal to start an RDF database 
server project at apache. The entire proposal text is included below. 
There is only one change from the one posted to the [proposal] thread -- 
we will start without a triplesoup-users@ mailing list.

Please place your votes!

+1 from me.

This vote will run until we have a huge amount of enthusiastic +1s, 
with a sizeable amount of them from the Incubator PMC, and at a minimum 
for 72 hours. If you're not a part of the incubator PMC, you're still 
welcome to express support (or lack of it) for the proposal by voting; I 
will sort out votes when counting. (formal voting rules on incubator 


- Leo

= summary =

TripleSoup is the simplest thing that you can do to turn your apache
web server into a SPARQL endpoint.

TripleSoup will be an RDF [2] store [3], tooling to work with that
database, and a REST [4] web interface to talk to that database using
SPARQL [5], implemented as an apache webserver module.

Sponsor:   Incubator PMC
Champion:  Leo Simons [EMAIL PROTECTED]
Mentors:   Dirk-Willem van Gulik [EMAIL PROTECTED],
   Stefano Mazzocchi [EMAIL PROTECTED],
Resources: SVN:
   Mailing lists:
Initial committers:
   Dave Beckett [EMAIL PROTECTED], redland author
   Dirk-Willem van Gulik [EMAIL PROTECTED],
   Stefano Mazzocchi [EMAIL PROTECTED],
   Andrea Marchesini [EMAIL PROTECTED], b store author
   Alberto Reggiori [EMAIL PROTECTED], rdfstore author
   David Reid [EMAIL PROTECTED],
Initial source: mod_sparql, commercial triple store,
existing open source triple store
Known risks:None
Technologies:   c

= Proposal details =

== Technology (basics) ==

What is RDF? It is just about any kind of data, represented as triples of
(subject, predicate, object), usually with a rich vocabulary describing the
semantics of the data (with the vocabulary typically also encoded as

This data has a representation as RDF/XML as well as using other formats such
as N3, and a query language SPARQL for searching through it. See [6] for an

So if it is just some data in some format, why does it need a special
server? Because RDF data is fundamentally not constrained to a file, and
there often is no resource identifier that readily identifies something as a
document which can be served up over HTTP.
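
To make the triple idea in the quoted proposal concrete, here is a minimal Python sketch of data held as (subject, predicate, object) triples, with a wildcard pattern match. It is an illustration only: the `match` helper, the example URI and the sample data are invented for this note, and it says nothing about how TripleSoup, mod_sparql or Redland actually store or query data.

```python
# Hypothetical illustration: RDF-style data as (subject, predicate, object) triples.
# The URI, the data and the match() helper are made up for this sketch.
TRIPLES = [
    ("http://example.org/book/6", "dc:title", "Harry Potter and the Half-Blood Prince"),
    ("http://example.org/book/6", "dc:creator", "J. K. Rowling"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Roughly what the pattern { ?book dc:title ?title } asks for.
titles = [(s, o) for (s, _, o) in match(TRIPLES, predicate="dc:title")]
```

A real SPARQL engine does far more (joins across several patterns, prefixes, result serialization); the sketch only shows why a single triple pattern is a natural unit of lookup.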

So why the REST interface? RDF is one of the building blocks proposed for the
semantic web, and that's why a system that works well with/over HTTP is
needed from the start.

== Technology (concrete) ==

This is just an example. Imagine that there is an application someapp on
the host which provides access to information about books,
and you want to get a list of those books (their URIs) and the names of the

$ telnet 80
SELECT /someapp HTTP/1.0
Accept: application/sparql-results+xml, rdf/xml, rdf/n3

PREFIX books:
SELECT ?book ?title
  { ?book dc:title ?title }

HTTP/1.0 200 Ok
Content-Type: application/sparql-results+xml
Content-Length: 1234

<?xml version="1.0"?>
<variable name="book"/>
<variable name="title"/>
  <results ordered="false" distinct="false">
  <binding name="book">
  <binding name="title">
<literal>Harry Potter and the Half-Blood Prince</literal>

Connection closed by

It turns out there's only one book in the database in this example.
(Sample data taken from David Reid has some code that
does something not unlike this already [7], implemented as a httpd module,
using the Redland library [11,12] as its backend store.

== What 

Re: Write-up on release signing/verification

2007-01-26 Thread Thilo Goetz
Thanks for the feedback.  I have created INFRA-1133 and attached a patch.
Just a conservative extension, nothing revolutionary.  Let me know 
what you think.


robert burrell donkin wrote:

On 1/25/07, Yoav Shapira [EMAIL PROTECTED] wrote:


On 1/25/07, Thilo Goetz [EMAIL PROTECTED] wrote:
 so what do you propose?  The signing releases page does have all the
 info, but it's not very newbie friendly.

I propose that instead of rewriting a new set of docs from scratch,
you (or whoever is interested) submit patches against the current and other related
documents, that make the page conform with your vision of what's best,
or newbie-friendly, or whatever criteria you wish to use.  Just like
any feature enhancement on any software product.  It doesn't matter to
me whether it's FAQ style or normative style or whatever, just that
this info is in one central place, not duplicated all over the place.
In other words, the DRY principle


- robert


Re: Write-up on release signing/verification

2007-01-25 Thread Thilo Goetz

Matthias Wessendorf wrote:

Hi Thilo,

I was also getting into signing, and since we (the Trinidad
podling) use Maven2, I found this useful as well


Thanks, I'll check that out.  The documentation is a bit on the short 
side.  Does it generate MD5 and SHA1 checksums as well?



Write-up on release signing/verification

2007-01-25 Thread Thilo Goetz


I have recently started to familiarize myself with release signing for 
the upcoming UIMA release.  I have documented my experiences on our web 
site, for developers here: (section Signing a 

and for users here:

I would really appreciate it if someone more knowledgeable than myself 
could give this a quick read and point out any glaring mistakes.  It's 
really short ;-)

While I found good information on release signing on various Apache 
pages, I did not find corresponding information for users on what to do 
with the signature files.  If anybody knows of such information, could 
you let me know so I can link to it from our pages.  If there isn't, 
maybe what I wrote (after clean-up ;-) could be used as basis for a more 
general FAQ.

Note that I don't have anything on cross-signing of keys and web of 
trust yet, I hope to add something on that at a later date.



Re: Write-up on release signing/verification

2007-01-25 Thread Thilo Goetz

Matthias Wessendorf wrote:

here it goes 

Hi Matthias,

you certainly have an abundance of signature files there.
maven-faces-plugin-incubator-m1-SNAPSHOT.jar.asc.asc.md5 seems a little 
excessive, surely?  Or what am I missing here...



Re: Write-up on release signing/verification

2007-01-25 Thread Thilo Goetz

Yoav Shapira wrote:

That's cool, and very considerate of you to take the time to document
your process.  Thank you.

However, I'm not sure that we need to duplicate what's already
documented and followed by most ASF projects: and its links.  Instead, we should
work to update, amend, and extend that set of documents as applicable.


Hi Yoav,

so what do you propose?  The signing releases page does have all the 
info, but it's not very newbie friendly.  The FAQ style is appropriate 
if you already know your stuff in principle, but want to look up 
something specific.  I was trying to give a bit more of a sequential 
presentation.  If there is a general place where this content should go, 
I'd be happy to help with that.

The other question I had was about the user side of things.  Is there a 
place where this has been described already?  I'd be more than happy to 
just link to existing content, or help create content that describes the 
user side of things in a general way.



Re: Add UIMA to reporting schedule [was: December 2006 Incubator Reports]

2006-12-08 Thread Thilo Goetz

Dan Diephouse wrote:

Just add yourself to the monthly schedule on the wiki and then you'll be
scheduled (pick any group you like) :-)

Also, new incubator projects must report every month for the first three
months. After that then the reporting schedule kicks in for you.

- Dan


Thanks.  I've added UIMA to the schedule, March group.  So we'll report 
in December and January as a new project, then pick up our regular schedule 
in March.



Add UIMA to reporting schedule [was: December 2006 Incubator Reports]

2006-12-06 Thread Thilo Goetz

Noel J. Bergman wrote:

shows the list of projects needing to report, plus any that have been newly

UIMA reported in November for the first time.  We are currently listed 
as "Needs to be added to the monthly schedule".  Who can do that?  Is 
there anything we can do to make this happen?

We're happy to report each month until further notice, but don't want to 
flood the board with unwanted reports either ;-)

I believe Wicket is in the same boat, they have also reported before and 
have not been added to the 3 month reporting schedule.



Re: Weirdness with the web site

2006-11-14 Thread Thilo Goetz

David Crossley wrote:

On xml-based projects there are problems due to many weird
filename extensions, so i use a different approach with
It scans for text files and reports which do not have svn:eol-style native.

That looks like a good approach to me, we have just decided to do 
something similar for UIMA.  The only difference is that we have use 
cases where text files need to be (binary) identical on all platforms, 
mostly input for our test cases.  So for those files we've decided to 
set svn:eol-style to LF.  In our case, we would run a script that just 
checks that all text files have svn:eol-style set to *something* (or 
maybe native or LF).  Maybe your Perl script could be extended with a 
switch that allows that?
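
The kind of check described above could look roughly like the following, sketched in Python rather than Perl. It assumes the property values have already been collected (for instance with `svn propget svn:eol-style`) into a dictionary; the function name and the `allowed` switch are hypothetical and are not part of the existing script.

```python
def files_with_bad_eol_style(props, allowed=None):
    """Return paths whose svn:eol-style is missing or not acceptable.

    props maps a file path to its svn:eol-style value, or None if unset.
    allowed is an optional set of acceptable values; when it is None,
    any value at all is accepted and only unset properties are reported.
    """
    bad = []
    for path, value in sorted(props.items()):
        if value is None:
            bad.append(path)
        elif allowed is not None and value not in allowed:
            bad.append(path)
    return bad


# Hypothetical example data, standing in for real working-copy properties.
props = {
    "README.txt": "native",
    "test/input.txt": "LF",
    "notes.txt": None,
    "win.bat": "CRLF",
}
```

With no `allowed` set the check only flags `notes.txt`; with `allowed={"native", "LF"}` it also flags `win.bat`, which is the stricter variant mentioned in parentheses above.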

AFAIK one also needs to fix the line-endings to be appropriate
for the local operating system. So it is probably a manual job
to report and fix.

This type of thing wastes a lot of time for open source.

We have tried to encourage ASF committers to configure their
svn client:

The SVN FAQ mentions the possibility of a (rather drastic) pre-commit hook.
Perhaps we should (/me ducks).

Or perhaps we should set up a regular (monthly?) job that scans
each project trunk and reports such issues.

Good idea, but see my comments above on not making this check require 



Re: New Name for UIMA Podling?

2006-10-18 Thread Thilo Goetz
If there's no reason for us to change the project name, then I for one 
would just like to keep the one we have.  We have built some name 
recognition around UIMA already, and I hope the Ukrainian Institute of 
Modern Art will forgive us for usurping the #1 spot on Google ;-)


Mads Toftum wrote:

On Mon, Oct 16, 2006 at 11:58:53PM +0200, Leo Simons wrote:
Note UIMA is a fine name for an apache project. We have projects like  

+1 - there seems to have started some sort of fascination with changing
names where there is no need. In general I'm not really a fan of naming
things so that it is impossible to guess what a project is (that's hard
enough as it is already).


Mads Toftum


Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework

2006-09-22 Thread Thilo Goetz

Marshall Schor wrote:
Hi everyone.  I'm restarting the UIMA Proposal thread based on the 
comments so far, with a revised proposal that more closely follows  The first paragraph 
was rewritten to more clearly state what the proposal was, in plainer 
language.  It is also slightly updated, reflecting the submission of 
UIMA to OASIS for standardization work.

I have updated the Wiki accordingly.  I wasn't too sure about the 
leveling of some of the sections, feel free to correct.



Re: [VOTE] accept UIMA as a podling

2006-09-19 Thread Thilo Goetz

Garrett Rooney wrote:

I'm sorry, but I have to vote -1 based on my new policy of rejecting
any potential podling that can't explain what it is that they do
within the first paragraph of the proposal.  I'm a fairly intelligent
person, but honestly I have no clue what "an architecture and software
framework for creating, discovering, composing and deploying a broad
range of multi-modal analysis capabilities" actually is, and I see
little potential for any project that's so bad at selling themselves
to actually grow a useful community.



you're right.  Others have noted that our opening paragraphs are not 
very clear.  We did however follow up with more explanation that 
satisfied others on the list.  Are you saying that these further 
explanations are still not clear, or that those explanations should go 
into the proposal itself (as opposed to a link from the Wiki)?

UIMA may not be the easiest thing in the world to explain, and I can 
accept that our proposal doesn't do a very good job.  However, I do 
believe that we address an important problem and can make an interesting 
contribution to Apache.  Making the first couple of paragraphs of the 
proposal more understandable should be a surmountable problem.



Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-27 Thread Thilo Goetz

Yonik Seeley wrote:

On 8/26/06, Thilo Goetz [EMAIL PROTECTED] wrote:

 From an application perspective, we have great hopes for a cooperation
with the Lucene project.

Great, I think this is something I'd like to get involved in!
I've been thinking about how Solr integration could work.

You then also need a search engine that
can index that extra information and make it available for search.

Without getting into too much detail here, some info could be
immediately usable by Lucene based apps (like entity extraction, where
you can add info via a new field in the document).  Parts-of-speech
type of stuff is currently more difficult of course.


I agree (with all of the above ;-).  Where it gets really interesting is 
with queries like "show me all documents with book references whose 
author's last name is Knuth" (highlighting the reference in the 
summary).  One might be able to create such a system based on a text 
search engine with special fields and some sophisticated query 
expansion, but it would be a lot easier if one had special support for 
embedded structures in the index -- like you need for XML indexing.

I'll be happy to continue this discussion over on solr-dev or wherever 
is appropriate.



Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-26 Thread Thilo Goetz

Ian Holsman wrote:

Hi Thilo
your explanation attracted me ;-)

is UIMA just the interface specification only ? (ie to produce a 
standard in the unstructured text-processing world so that other people 
can plug and play)

or does UIMA also provide tools for each component?

Hi Ian,

the main focus of UIMA is on the framework, to provide a basis for 
others to plug-n-play.  We do also provide tooling to create new 
components.  There are some sample components as well that we provide as 
part of the documentation.

However, we recognize that an empty framework without any content is not 
very attractive to new users.  That's one of the reasons why our 
proposal talks about creating something similar to the Lucene sandbox 
for UIMA: a place where actual components are developed.  (The main 
reason of course is to foster a community of UIMA users and create a 
place where Apache-licensed components can be developed and downloaded.)

I'm interested, and time permitting, could help as a mentor .. I'm not a 
java expert (compared to others on this list), or a text processing 
expert, but I know a bit about the processes around the incubator.

Great!  We welcome all the help we can get.  I've been lurking on the 
general incubator list for a couple of months now, and I have to admit 
that the whole process looks a bit daunting to a newbie like me :-)
I get the impression that mentors are absolutely essential to the 
incubator, and that you can't have too many.



Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-26 Thread Thilo Goetz

Hi David,

we have some sample components today.  For example, we have wrappers 
around some of the OpenNLP tools ( to 
make them available as UIMA components.

Also, as I mentioned in my answer to Ian, we would like to create 
something like the Lucene sandbox for the development of UIMA 
components.  Almost all text processing needs some basic functionality, 
such as segmentation and sentence detection, so it would be a good idea 
to have these available from and developed on Apache.

We have a whole boat load of sample applications that let you feed 
documents to your UIMA instance and then visualize the results in some 
way or other.  Those are more for demonstration and debugging purposes, 

From an application perspective, we have great hopes for a cooperation 
with the Lucene project.  Even today, so-called semantic search is a 
main application area of UIMA.  The basic idea of semantic search is 
that you can search for information that is not explicitly contained in 
the text, and UIMA is a good basis to create that extra information - 
but that's only half the story.  You then also need a search engine that 
can index that extra information and make it available for search.  An 
application package where you can simply plug in your UIMA entity 
detection (for example) and you have a full semantic search application 
would be very attractive, I believe.

That's more of a mid-term plan, though, as it would also require some 
changes to Lucene.

I've rambled a bit, but I hope somewhere in what I said is an answer to 
your question (the short answer being yes ;-).


David Welton wrote:

 What does it *do*?

I believe it is basically a big, pluggable, harness

Harness - will it be able to do something out of the box as a
demonstration of its capabilities?


Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA

2006-08-25 Thread Thilo Goetz

Leo Simons wrote:


What does it *do*? How does it *work*? I understand there's a runtime and
a framework and a standardization process and a component-based
interoperability goal, but what I don't understand is what they are *for*.

The unstructured content we're talking about is mainly plain text today. 
 There is also some work going on analyzing video streams, as well as 
multi-modal streams (e.g., video + closed captioning).  I'm not really 
competent to talk about those, so I'll stick to text.  A typical 
processing chain for text analysis starts out something like this:

language identification -> language specific segmentation -> 
sentence boundary detection -> entity detection (person/place names 
etc.) -> ...

So you start by identifying the language the text is in (Chinese, 
English etc.).  Then you do token segmentation based on that information 
(it's completely different for Chinese than for English).  Based on the 
tokens you discovered, you may want to do sentence boundary detection, 
so you know what entities occur in the same sentence.  Then, again based 
on the tokens you've found, you can do so-called named entity detection, 
such as place names, person names etc.  After that, you may have another 
module that can discover relations between the entities that you have 
found.  And so on.

UIMA in its core is a component architecture that allows you to create 
analysis applications like the one described above.  It provides 
facilities for creating meta-information on documents like in the 
example above.  That is, the original artifact (i.e., the text) is not 
modified and the derived information is kept separately.
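
The chain above, and the idea of keeping derived information separate from the text, can be sketched in a few lines of Python. This is a toy illustration only and not the UIMA API: the `Annotation` class, the annotator functions and their naive rules are all invented for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    kind: str   # e.g. "token" or "sentence"
    begin: int  # offset into the original text (inclusive)
    end: int    # offset into the original text (exclusive)


def tokenize(text, annotations):
    """Toy segmenter: every run of word characters becomes a token."""
    return annotations + [Annotation("token", m.start(), m.end())
                          for m in re.finditer(r"\w+", text)]


def detect_sentences(text, annotations):
    """Toy sentence detector: a sentence ends at '.', '!' or '?'."""
    return annotations + [Annotation("sentence", m.start(), m.end())
                          for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", text)]


def run_pipeline(text, annotators):
    """Run each annotator in order; the text itself is never modified."""
    annotations = []
    for annotate in annotators:
        annotations = annotate(text, annotations)
    return annotations


text = "UIMA analyzes text. It keeps results separate."
result = run_pipeline(text, [tokenize, detect_sentences])
sentences = [text[a.begin:a.end] for a in result if a.kind == "sentence"]
```

Each annotator only adds offsets that point back into the unchanged text, so swapping one segmenter for another means replacing a single function in the list passed to `run_pipeline`.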

UIMA is mostly a framework, not an application.  So it is not concerned 
with fetching documents, like the crawler of a search engine.  Nor does 
UIMA provide facilities to do very much with the information you have 
extracted from the text (or other artifact).  Rather, the use case is 
that you have an application that has a need for the processing of 
unstructured information.  This application will provide the input data, 
and it will know what to do with the results.  The value of UIMA derives 
from the component model: it is easy to reuse existing analysis 
components that other people have written, and it's easy to exchange, 
say, one language identifier for another.

One standard application scenario is to use UIMA to extract some named 
entities from text, feed the results into a relational database, and use 
the database's mining capabilities to do, e.g., association analysis. 
Another area of application is enhanced text search, where in addition 
to regular free-form text search, you can search for documents 
containing certain entities.  Trivial standard example: you're looking 
for John's phone number in your email, so you use semantic search to 
look for documents that contain John's name and a phone number.  You'll 
use a UIMA component that knows that a pattern like 123-456-7890 is a phone 
number and will create a phone number entity.
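
As a rough idea of what such a component does internally, here is a minimal Python sketch of a pattern-based detector. It is not UIMA code: the function name and the regular expression are assumptions made for this example, and the pattern only covers the 123-456-7890 form mentioned above.

```python
import re

# Assumed pattern: three digits, three digits, four digits, separated by hyphens.
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def find_phone_numbers(text):
    """Return (begin, end, matched_text) for each phone-number-like span."""
    return [(m.start(), m.end(), m.group()) for m in PHONE_RE.finditer(text)]


hits = find_phone_numbers("John said to call 123-456-7890 after lunch.")
```

The offsets are what a search engine would index alongside the text, so that a query can ask for documents containing both the name and a phone number entity.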

I hope this gives you a better idea what UIMA is about.

