Re: solr index question

Andrzej Bialecki Tue, 13 Oct 2009 14:04:42 -0700

david.stu...@progressivealliance.co.uk wrote:

  Hi,
I am beginning to use Nutch to crawl sites (great stuff, btw) and have combined it with Solr, pushing the Nutch index using the solrindex command. I have set it up as specified on the wiki, using the copyField from url to id in the schema. Whilst this works fine, it stuffs up my inputs from other sources in Solr (e.g. using the Solr data import handler), as they have both ids and urls.

My question is: why was the id field not pushed to Solr, and this weird copy field used instead, given that you already know the id is going to be the url? Are there any plans to change this, or was the design decision made for other reasons? Could we look at implementing a Nutch XML schema defining what the basic Nutch fields map to in your Solr push? I have hacked a fix into SolrWriter.java but was wondering if it could be worked through into a long-term supported option.

This comes from the fact that Nutch doesn't really know the schema that you are using in Solr, plus the fact that the functional equivalent of "uniqueKey" in Nutch has always been named "url", which is hardcoded in some places ... so, this is a deficiency in Nutch as well. Please note that the reverse is true as well - SolrSearchBean hardcodes Solr's uniqueKey to "id" instead of using a configurable name.

I agree that both these places should use configurable names. Can you provide a patch?
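
To make the idea of configurable names concrete, here is a minimal sketch of a field-name mapping that a writer could consult instead of hardcoding "url" or "id". The class FieldMapping and its methods are hypothetical and are not part of Nutch or Solr; how such a mapping would be loaded from configuration and wired into SolrWriter or SolrSearchBean is left open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not an existing Nutch or Solr class): holds a
// configurable mapping from Nutch field names to Solr field names.
class FieldMapping {
    private final Map<String, String> nutchToSolr = new LinkedHashMap<>();

    // Register one mapping, e.g. Nutch "url" -> Solr "id".
    FieldMapping map(String nutchField, String solrField) {
        nutchToSolr.put(nutchField, solrField);
        return this;
    }

    // Solr name for a Nutch field; unmapped fields keep their own name.
    String solrName(String nutchField) {
        return nutchToSolr.getOrDefault(nutchField, nutchField);
    }

    // Copy a document's fields into a new map, renaming keys on the way.
    Map<String, Object> rename(Map<String, Object> nutchDoc) {
        Map<String, Object> solrDoc = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : nutchDoc.entrySet()) {
            solrDoc.put(solrName(field.getKey()), field.getValue());
        }
        return solrDoc;
    }
}
```

With new FieldMapping().map("url", "id"), a document's "url" value would be written under "id" and every other field would pass through unchanged, so the copyField in the Solr schema would no longer be needed for this purpose.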


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
