On Thu, Dec 10, 2009 at 7:26 AM, Isabel Drost <[email protected]> wrote:
> As for the jars we depend on:
>
> I assume that not all of Mahout depends on all libraries. Say the
> clustering code certainly does not depend on HBase. Especially for those
> users who do not want to use maven for their project, it might be pretty
> interesting to know, which libraries are needed by the components they
> are specifically interested in.
>

The dependency reports from maven are pretty helpful to this end. Thanks
for setting these up. It is too bad that deep links into the repo can't
be generated as a part of this report as well.

I agree with not forcing users to use maven. I really like maven, but I
know plenty of people who don't, or who don't want to be bothered
learning it.

To move ahead with a binary release, it is necessary to determine the
minimum set of dependencies we need to redistribute with the release.
The number of dependencies Mahout has is pretty large, but many of them
are transitive. I suspect many of these are not needed, for example the
jetty and tomcat related jars pulled in by hadoop and some of the
duplicates (2 versions of commons-cli, etc).

See: http://people.apache.org/~isabel/mahout_site/mahout-core/dependencies.html
for the report, as a start.

Grant, do you have a sense of which jars we can redistribute and which
we can't? I did notice javax.mail was in there, but are there others?
For that matter, how is javax.mail used anyway? It is present in the
maven/pom.xml, but removing it doesn't seem to break the build.

It is also worth discussing the goals of a binary release: whether it
goes beyond providing pre-built jars and a limited set of dependencies,
and whether it allows a number of examples to be run or includes a
driver script similar to that included in hadoop or nutch (as proposed
in MAHOUT-185). Does anyone have thoughts regarding this?

Drew
