Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can always choose to hardwire HBase (option D) later.
Markus

> Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall
> we reduce the various options described before to a single one?
>
> Julien
>
> On 15 September 2011 19:55, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > > Hi Guys,
> > >
> > > I thought I'd chime in on this thread. My comments below:
> > >
> > > > I understand and share your frustration, however you need to bear in
> > > > mind that things are done only if people volunteer and have time -
> > > > usually taken from their holiday, weekends, evenings. Chris (who is
> > > > the de facto release master for Nutch and Gora) has not had the time
> > > > and nobody else has volunteered to do it.
> > >
> > > Yep I haven't had the time to push a Gora 0.1.1-incubating release
> > > that will address the Maven issues. However it is on my roadmap for
> > > open source stuff to get done in the next month, so that's a good
> > > thing. But yes, that portion of my open source work is all volunteer
> > > time, so sometimes other things take priority.
> > >
> > > > > As it happens, yesterday was the 1 year anniversary of the last
> > > > > successful Hudson/Jenkins build... If that actually worked, we
> > > > > could point people towards it as a useful recipe for how to get a
> > > > > build working off trunk. I haven't been following Nutch too
> > > > > closely, but it always strikes me as really odd, that there's a
> > > > > nightly build and it doesn't bother anybody that it fails all the
> > > > > time (and that there isn't a nightly build for the stable
> > > > > branches).
> > > >
> > > > The real issue behind all this is what we should do with Nutch 2.0.
> > > > What follows is only my opinion and I would love to hear what others
> > > > have to say on this subject.
> > > >
> > > > Since we (actually mostly Dogacan) wrote 2.0 and delegated the
> > > > storage to Gora, the latter hasn't really taken off since incubation.
> > > > There have been some modest contributions to it but it does not seem
> > > > to be used much and there is virtually nothing happening on it in
> > > > terms of development. More worryingly, the people who initially
> > > > contributed to it are not very active on the project (such is life,
> > > > new jobs, different projects, etc...) anymore. As for Nutch 2.0, it
> > > > hasn't made any progress in the last 12 months : we still have the
> > > > same bugs, the tests do not work, the build has to be done manually
> > > > etc...
> > >
> > > Yep.
> > >
> > > > At the same time, there has been a new lease of life into Nutch as a
> > > > whole : there is definitely more activity on the mailing lists, new
> > > > users, new active committers etc... and quite a few bugfixes and
> > > > improvements - most of them backported from what had been done in
> > > > the trunk and people seem fairly happy with what we can do with 1.4
> > >
> > > Totally agreed. I'm actually not super surprised -- ever since 1.1, I
> > > kind of felt that maintaining a stable 1.X branch of Nutch (in
> > > parallel to the 2.0 efforts) was really going to pay off since there
> > > was renewed interest from users in leveraging (and furthermore
> > > accepting) the nuances of 1.X.
> > >
> > > > So the question is : what shall we do with 2.0? Here are a few
> > > > possibilities:
> > > >
> > > > a) put some effort into it, fix the bugs and make it so that it can
> > > >    be used instead of 1.x
> > > > b) shelve it and leave it for enthusiasts to play with + make 1.x the
> > > >    trunk again
> > > > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain
> > > >    two branches is quite a pain)
> > > > d) abandon the idea of a neutral storage layer with Gora and hardwire
> > > >    it to e.g. HBase
> > > >
> > > > Option (a) has not happened in the last 12 months and I am not very
> > > > hopeful about it.
> > > >
> > > > What do you guys think?
> > >
> > > I'd suggest an option e). Evolve and keep releasing 1.X over the next
> > > 6 months, and keep 2.0 in the trunk. After 6 months, see how close
> > > 1.X is to actually being 2.0 (e.g., did we release a 1.4, a 1.5, a
> > > 1.6?) If we get to ~1.6 over the next 6 months and there is still no
> > > active development on 2.0, I'd propose we do this at that point in
> > > time:
> > >
> > > 1. branch the current trunk as
> > >    https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> > > 2. grab latest stable branch (e.g.,
> > >    https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
> > >    *replace* the Nutch trunk with it, and bump the version # to 1.7-dev
> > > 3. active development on stable becomes active development in trunk
> > >    and nutchgora still exists in case anyone ever resurrects it.
> > >
> > > That way, we give another 6 months to see how it shakes out and
> > > potentially allow for 1 or 2 or 3 more stable releases before
> > > switching those over to trunk.
> > >
> > > Thoughts?
> >
> > Yes. I don't believe we should wait until January before discussing
> > this topic again. I, for example, cannot spend considerable extra time
> > on the issues I put in 1.4, also due to the fact that it's not entirely
> > stable.
> >
> > There are many things I can write about this topic right now but don't
> > feel it's necessary. The choice is difficult and perhaps painful but
> > when the voting round is opened by our project lead, I will vote for
> > promoting 1.x back to trunk.
> >
> > My apologies for my impatience and pessimism.
> >
> > > BTW, I have a couple contributions from my CS572: Search Engines
> > > class from a year ago that I'd love to port into the Nutch stable
> > > branch including Hubs/Authorities ranking and some other goodies.
> > > I'll try and work on those over the next few months, I'm just letting
> > > everyone know now so I don't forget again :-)
> > >
> > > Cheers,
> > > Chris
> > >
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Chris Mattmann, Ph.D.
> > > Senior Computer Scientist
> > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > > Office: 171-266B, Mailstop: 171-246
> > > Email: chris.a.mattm...@nasa.gov
> > > WWW: http://sunset.usc.edu/~mattmann/
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Adjunct Assistant Professor, Computer Science Department
> > > University of Southern California, Los Angeles, CA 90089 USA
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++