[Just wondering if one of the criteria for graduating to a top-level project should be "no dependency on the LimitedPrivate APIs of the parent project".]
Steve,

I agree with your suggestion for a downstream-project-build-and-test instance. All I can say is, "stay tuned".

- milind

--
Milind Bhandarkar
mbhandar...@linkedin.com
+1-650-776-3167

On 6/9/11 4:42 AM, "Steve Loughran" <ste...@apache.org> wrote:

>On 06/08/2011 06:41 PM, Suresh Srinivas wrote:
>> I do not see any issue with the change that Todd has made. We have done
>> similar changes in HDFS-1586 in the past.
>>
>> Making APIs public comes with a cost. That is what we are avoiding with
>> LimitedPrivate. The intention was to include the following projects that are
>> closely tied to Hadoop as projects eligible for LimitedPrivate.
>> {"HBase", "HDFS", "Hive", "MapReduce", "Pig"}. This list could grow in the
>> future.
>
>I'm going to talk about my experience on the Ant team.
>
>One of the lessons of that project is that in the open source world, you
>can't predict how your code gets used, or control it. If someone wants
>to take your app and use it as a library -they can. If someone wants to
>do something completely unexpected with that library -they can. And this
>is a good thing, because your code gets used. Yes, you get new bugreps,
>but every person using your code is someone not using somebody else's
>code. You win.
>
>The other lesson from that is the following: in open source, there is no
>such thing as private code.
>
>* If you mark something as package scoped, they just inject their
>  classes into your package (and who hasn't done that with their Hadoop
>  extensions?).
>* If you mark something as protected, they subclass and open up its
>  privacy.
>* If you mark something as private, they edit your source and create a
>  new JAR with the relaxed permission.
>
>For any of these actions, you end up fielding the bugreps, as the stack
>trace points to you. And it increases maintenance costs for everyone.
>
>
>Alternatively they cut and paste your code into their codebase, possibly
>-but not always- retaining the Apache credits.
>
>That
> * complicates copyright and lawsuits:
>   http://www.theserverside.com/news/thread.tss?thread_id=29958
>
> * increases maintenance costs for everyone, especially if there are
>   security issues with the original code.
>
>> When such projects break because of API change, we can co-ordinate as
>> community and fix the issues. This is not true when some application that we
>> do not know of breaks!
>
>The way Ant handled this was with Gump, the nightly clean build of all the
>OSS Java projects built with Ant:
>http://vmgump.apache.org/gump/public/
>
>For all the projects, they thought they were getting a free CI build
>run, but what it really was was a regression test of Ant and every
>single OSS project. If a change in Ant broke anyone's build: we noticed.
>If a change in Log4J broke a build, someone noticed. It became a
>rapid-response regression test for the entire OSS suite.
>
>Sadly, it doesn't work so well. I'd blame Maven, but the move to ivy
>dependencies doesn't help either, it complicates classpaths no end.
>
>Even so, the idea is great: build and test your downstream applications,
>and the things you depend on, so you find problems within 24 hours of
>the change being committed -regardless of which project committed the
>change.
>
>The way to do it now would be with Jenkins, not just building and
>testing Hadoop-{core, hdfs, mapreduce}, but
> -building and publishing every upstream dependency.
> -testing against the trunk versions built locally.
> -building and testing against the ivy-versioned artifacts that are
>  controlled by the version.properties
>
>Together this flags up when something works against the old artifacts,
>but doesn't work against the trunk versions: that's their regressions,
>caught early.
>
>Downstream
> -build and test the OSS projects that work with Hadoop.
>  That's the Apache ones: HBase, Mahout, Pig, Hive, Hama etc, and the
>  other ones, such as Cascading.
>
>That can be offered as a service to these projects, "we will build and
>test your code against our trunk", a service designed to benefit
>everyone. They find their bugs, we find regressions.
>
>This is a pretty complex project, especially when you think about the
>challenge of testing that your RPM generation code produces RPMs that
>install (I bring up clean CentOS VMs for such a purpose), but without it
>you don't get everything working together, which is the state things
>appear to be in today.
>
>Ignoring the RPM install & test problems, if people are interested in
>working on this, we should be able to do a lot of it on Jenkins. Who is
>willing to get involved?
>
>-Steve