Guys,

I've created a new branch for 1.4 on
https://svn.apache.org/repos/asf/nutch/branches/branch-1.4

Thanks

Jul

On 10 June 2011 12:11, Markus Jelsma <markus.jel...@openindex.io> wrote:

>
> > Guys,
> >
> > I added a new label 1.4 on the JIRA. Shall we create a new branch 1.4 on
> > SVN from the existing 1.3? I agree that it is a pain to have to maintain
> > 1.x AND trunk in parallel but my feeling is that 2.0 needs more work
> > before being completely reliable and in the meantime we might want to add
> > new features to the stable 1.x branch.
>
> Agreed.
>
> >
> > One possible feature would be to add a new endpoint for indexing-backends
> > and make the indexing pluggable. At the moment we are hardwired to SOLR -
> > which is OK - but as other resources like ElasticSearch are becoming more
> > popular it would be better to handle this as plugins. Not sure about the
> > name of the endpoint though: we already have indexing-plugins (which are
> > about generating fields sent to the backends) and moreover the backends are
> > not necessarily for indexing / searching but could be just an external
> > storage e.g. CouchDB. The term backend on its own would be confusing in 2.0
> > as this could be pertaining to the storage in GORA. 'indexing-backend' is
> > the best name that came to my mind so far - please suggest better ones.
>
> Yes, I'd like to see this `renamed` as well. It makes perfect sense to have a
> plugin to `index` to CouchDB as well as send the stuff to Solr and ES. I'm
> unsure how to name this. Indexing becomes a bit ambiguous since 1.3.
>
> >
> > For 1.4 (and 2.0) it would be good to improve the detection of duplicates
> > so that it detects them using mapreduce on the crawldb instead of pulling
> > the docs from SOLR.
>
> Yes, I remember a ticket for deduplicating locally (or was it mentioned in the
> 404 cleaner). Anyway, this is really desired as it can put a lot of strain on
> the Solr index, especially if it is also a query/slave node.
>
> I think we should come up with generic map/reduce jobs for indexing,
> deduplicating and cleaning and maybe add a Nutch extension point there so we
> can easily hook up indexing, cleaning and deduplicating for various ...
> end-points?
>
> >
> > Let's just add to the wishlist on JIRA with the tag 1.4. Is everybody happy
> > with having a new branch 1.4?
>
> I'm not everybody but +1 anyway ;)
>
> >
> > Jul
>

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com