Hi Andrzej,

I will have a go at putting something together. I just wanted to make sure I understood the background and wasn't just fixing my own problem.
Regards,
Dave

On 13 October 2009 at 23:04 Andrzej Bialecki <a...@getopt.org> wrote:
> david.stu...@progressivealliance.co.uk wrote:
> > Hi,
> >
> > I am beginning to use Nutch to crawl sites (great stuff, btw) and have
> > combined it with Solr, pushing the Nutch index using the solrindex
> > command. I have set it up as specified on the wiki, using a copyField
> > from url to id in the schema. While this works fine, it messes up my
> > inputs from other sources in Solr (e.g. those using the Solr
> > DataImportHandler), as they have both ids and urls.
> > My question is: why was the id field not pushed to Solr, and why is this
> > odd copyField used instead, when you already know the id is going to be
> > the url? Are there any plans to change this, or was it a design decision
> > made for other reasons? Could we look at implementing a Nutch XML schema
> > that defines what the basic Nutch fields map to in the Solr push? I have
> > hacked a fix into SolrWriter.java but was wondering whether it could be
> > worked into a long-term supported option.
>
> This comes from the fact that Nutch doesn't really know the schema that
> you are using in Solr, plus the fact that the functional equivalent of
> "uniqueKey" in Nutch has always been named "url", which is hardcoded in
> some places ... so this is a deficiency in Nutch as well. Please note
> that the reverse is true as well: SolrSearchBean hardcodes Solr's
> uniqueKey to "id" instead of using a configurable name.
>
> I agree that both these places should use configurable names. Can you
> provide a patch?
>
> --
> Best regards,
> Andrzej Bialecki
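As a rough illustration of what a configurable name could look like on the indexing side, here is a minimal Java sketch. It is not Nutch's SolrWriter and does not use any real Nutch or SolrJ API: the class `UniqueKeyMapper` and the property `solr.uniquekey.field` are hypothetical, and documents are modelled as plain string maps. The idea is that the value of Nutch's "url" field is written explicitly to whichever Solr uniqueKey field is configured (defaulting to "id"), so the schema no longer needs a copyField and the url field itself is kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch: writes Nutch's "url" value to a configurable Solr
 * uniqueKey field instead of relying on a copyField in the Solr schema.
 * Not part of Nutch; all names here are illustrative only.
 */
public class UniqueKeyMapper {
    /** Hypothetical configuration property naming Solr's uniqueKey field. */
    static final String KEY_FIELD_PROPERTY = "solr.uniquekey.field";

    private final String keyField;

    public UniqueKeyMapper(Properties conf) {
        // Fall back to "id", the name SolrSearchBean currently hardcodes.
        this.keyField = conf.getProperty(KEY_FIELD_PROPERTY, "id");
    }

    /** Copies all Nutch fields and adds the url under the configured key field. */
    public Map<String, String> toSolrDoc(Map<String, String> nutchDoc) {
        String url = nutchDoc.get("url");
        if (url == null) {
            throw new IllegalArgumentException("Nutch document has no \"url\" field");
        }
        Map<String, String> solrDoc = new LinkedHashMap<>(nutchDoc);
        solrDoc.put(keyField, url); // harmless overwrite if keyField is "url"
        return solrDoc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://www.example.com/");
        doc.put("title", "Example");

        // Default configuration: the key field is "id".
        UniqueKeyMapper mapper = new UniqueKeyMapper(new Properties());
        System.out.println(mapper.toSolrDoc(doc));
        // prints {url=http://www.example.com/, title=Example, id=http://www.example.com/}
    }
}
```

In principle the search side (SolrSearchBean) could read the same property instead of hardcoding "id", so both directions agree on one configured name; that half is not sketched here.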