Hi Andrzej,

I will have a go at putting something together; I just wanted to make sure I had
the background and wasn't just fixing my own problem.
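
In case it helps frame the patch, here is a minimal sketch of the kind of
configurable field mapping I have in mind. It is only an illustration under
my own assumptions: SolrFieldMapper, solrName and map are made-up names, not
existing Nutch or Solr classes, and a real patch would read the mapping from
configuration rather than build it in code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Solr: renames Nutch field names
// to whatever names the target Solr schema uses.
public class SolrFieldMapper {
    private final Map<String, String> nutchToSolr;

    public SolrFieldMapper(Map<String, String> nutchToSolr) {
        // Defensive copy so later changes to the caller's map have no effect.
        this.nutchToSolr = new LinkedHashMap<>(nutchToSolr);
    }

    // Returns the Solr name for a Nutch field; unmapped fields keep their name.
    public String solrName(String nutchField) {
        return nutchToSolr.getOrDefault(nutchField, nutchField);
    }

    // Copies a document's fields into a new map keyed by Solr field names.
    public Map<String, Object> map(Map<String, Object> nutchDoc) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : nutchDoc.entrySet()) {
            out.put(solrName(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed mapping for illustration: Nutch "url" becomes Solr "id".
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("url", "id");
        SolrFieldMapper mapper = new SolrFieldMapper(mapping);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/");
        doc.put("title", "Example");
        System.out.println(mapper.map(doc));
    }
}
```

With something like this, the uniqueKey name would come from the mapping
instead of relying on a copyField in the Solr schema.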

Regards,

Dave


On 13 October 2009 at 23:04 Andrzej Bialecki <a...@getopt.org> wrote:

> david.stu...@progressivealliance.co.uk wrote:
> >   Hi,
> > 
> > I am beginning to use Nutch to crawl sites (great stuff, btw) and have 
> > combined it with Solr, pushing the Nutch index using the solrindex 
> > command. I have set it up as specified on the wiki, using the copyField 
> > from url to id in the schema. Whilst this works fine, it stuffs up my 
> > inputs from other sources in Solr (e.g. the Solr data import handler), 
> > as they have both ids and urls.
> > My question is: why was the id field not pushed to Solr directly, 
> > instead of using this weird copy field, given that you already know the 
> > id is going to be the url? Are there any plans to change this, or was 
> > this a design decision made for other reasons? Could we look at 
> > implementing a Nutch XML schema defining what the basic Nutch fields map 
> > to in the Solr push? I have hacked a fix into SolrWriter.java, but was 
> > wondering if it could be worked through into a long-term supported 
> > option?
> 
> This comes from the fact that Nutch doesn't really know the schema that 
> you are using in Solr, plus the fact that the functional equivalent of 
> "uniqueKey" in Nutch has always been named "url", which is hardcoded in 
> some places ... so, this is a deficiency in Nutch as well. Please note 
> that the reverse is true as well - SolrSearchBean hardcodes Solr's 
> uniqueKey to "id" instead of using a configurable name.
> 
> I agree that both these places should use configurable names. Can you 
> provide a patch?
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
