Hi All,

We just wanted to flush some of our discussions on Nutch 2.0 out to the public
mailing lists. A few key points:

1. Doğacan, Julien and Enis are proposing to base Nutch 2.0 on GORA, a
technology that can be viewed here [1].

The upshot of GORA is that it's an ORM layer that abstracts away backend
stores like Cassandra or SQL or any other database (though Gora works better
with NoSQL stores).

A couple of questions I've got:

 a) is GORA ASL licensed?
 b) what's the maintenance plan for GORA? Will it continue to live on
GitHub? Will you guys propose it to the Apache Incubator as an ASF
project?

2. Development on Nutchbase has occurred on GitHub [2] since Doğacan
originally checked it into Nutch SVN under a branch [3] at the ASF. I
expressed some concerns about this since it's hard to review huge sweeping
patches and since a lot of development has occurred off of the public Apache
mailing lists. Specifically I asked Doğacan et al. to enumerate a list of
the changes in the Git version of Nutchbase [2] versus the ASF version [3].
We then need to come up with a plan for how to merge the two and get the latest
into ASF SVN. Doğacan estimates the difference between the Nutchbase branch
at the ASF [3] and the one on GitHub [2] to be ~25 hrs of work. Doğacan
generated this list of major changes that have happened on GitHub and not at
Apache:

---snip
1) Porting nutchbase to GORA: This was discussed in issues NUTCH-808
and NUTCH-811.
2) Using ivy in nutch: NUTCH-821 and NUTCH-825
3) Removal of nutch's custom developed search code (and using SOLR instead).
IIRC, this was also discussed and accepted by nutch community. However, if
not, we can simply put this code back (since this was a trivial delete).
---snip

That brings everyone up to speed, I think. That said, I am +1 for moving
forward on #1 above, provided we address the 2 questions I listed (a+b). We
need to understand it from a Nutch perspective. As for #2, we can rectify it
by doing the following things:

     (a) svn copy NutchBase from GitHub to the nutchbase branch in
http://svn.apache.org/repos/asf/nutch/branches/nutchbase bringing the ASF
branch up to date.
     (b) Once the GORA license issues are figured out (the license must be
compatible with the ASF or we cannot use GORA), we update Nutch to depend
on the GORA jars via Ivy?
     (c) svn tag current Nutch trunk as 1.2-branch
     (d) svn merge nutchbase branch with nutch trunk
     (e) roll the version # in nutch trunk to 2.0-dev
     (f) all issues in JIRA should be updated to reflect 2.0-dev fixes where
it makes sense
     (g) a 2.1 version is added to mark anything that we don't want in 2.0
and we file post 2.0 issues there
     (h) Nutch 2.0 trunk is fixed, and brought up to speed and old code is
removed. All unit tests should pass regression where it makes sense.
     (i) Nutch documentation is brought up to date on wiki and checked into
SVN
     (j) We roll a 2.0 release

Does that sound good to everyone?

Cheers,
Chris

[1] http://github.com/enis/gora
[2] http://github.com/dogacan/nutchbase
[3] http://svn.apache.org/repos/asf/nutch/branches/nutchbase/

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

