Hi,

I am beginning to use Nutch to crawl sites (great stuff, btw) and have combined it with
Solr, pushing the Nutch index with the solrindex command. I have set it up as
described on the wiki, using a copyField from url to id in the schema. While this
works fine, it messes up my inputs from other sources in Solr (e.g. those loaded with the
Solr DataImportHandler), as they have both ids and urls.
My question is why the id field is not pushed to Solr directly, and why this odd copyField
is used instead, given that you already know the id is going to be the url. Are there any
plans to change this, or was it a design decision made for other reasons? Could we
look at implementing a Nutch XML schema that defines what the basic Nutch fields map to
in the Solr push? I have hacked a fix into SolrWriter.java, but was
wondering whether it could be worked into a long-term supported option.
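To make the idea concrete, here is a rough sketch of what a configurable field mapping could look like on the Nutch side. It is NOT the existing SolrWriter code or any Nutch/SolrJ API: the class name NutchSolrFieldMapper and its apply() method are hypothetical, and documents are modelled as plain Maps. It copies each mapped source field (e.g. url) into its target field (e.g. id), much as a copyField would, but before the document reaches Solr, so the Solr schema itself would not need a copyField.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not part of Nutch): copies Nutch fields into
 * Solr field names according to a configurable mapping, so the Solr
 * schema does not need a copyField such as url -> id.
 */
class NutchSolrFieldMapper {
    // Nutch source field name -> Solr destination field name (assumed config)
    private final Map<String, String> mapping;

    NutchSolrFieldMapper(Map<String, String> mapping) {
        this.mapping = new HashMap<>(mapping);
    }

    /**
     * Returns a new document containing all original fields plus a copy of
     * each mapped source field under its destination name. A mapped value
     * overwrites any existing value in the destination field. The input
     * document is not modified.
     */
    Map<String, Object> apply(Map<String, Object> nutchDoc) {
        Map<String, Object> solrDoc = new LinkedHashMap<>(nutchDoc);
        for (Map.Entry<String, String> m : mapping.entrySet()) {
            Object value = nutchDoc.get(m.getKey());
            if (value != null) {
                solrDoc.put(m.getValue(), value);
            }
        }
        return solrDoc;
    }
}
```

With a mapping of url to id, a crawled document would arrive in Solr with id set to its url and url left as-is, which is what the wiki's copyField currently achieves on the Solr side. The mapping itself could then be loaded from the XML file suggested above rather than hard-coded.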

Regards,


David
