[ 
https://issues.apache.org/jira/browse/NUTCH-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767918#action_12767918
 ] 

Andrzej Bialecki  commented on NUTCH-760:
-----------------------------------------

A few comments to the latest patch:

* the description of the property in nutch-default.xml could be more 
descriptive ;)

* <schema> element has name and version attributes - do we really need these? 
It's not a Solr schema.xml anyway, so we don't have to pretend that we follow 
the same format.

* SolrSchemaReader uses a static instance of NutchConfiguration - this is a big 
no-no. The whole point of using the property in nutch-default.xml is that you 
can set different values, and making this field static basically pins the 
configuration to the version set on the first instantiation of the class. 
Please do as other similar classes do - implement Configurable, or add 
Configuration to the constructor, and pass the current job configuration where 
appropriate.

* consequently, static references to SolrSchemaReader need to be un-staticized 
in other places.

* minor nits: code formatting should use indents of 2 literal spaces. There are 
also some accidental changes in NutchBean and SolrWriter.
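
To illustrate the static-configuration point above, here is a minimal, 
self-contained sketch (not code from the patch). It assumes java.util.Properties 
as a stand-in for Hadoop's Configuration so that it runs without Nutch on the 
classpath. The class names PinnedSchemaReader and PerJobSchemaReader and the 
property key "solrindex.mapping.file" are hypothetical, chosen only for this 
example.

```java
import java.util.Properties;

// Demo: two "jobs" that set the same property to different values.
class SchemaReaderSketch {
  public static void main(String[] args) {
    Properties jobA = new Properties();
    jobA.setProperty(PerJobSchemaReader.MAPPING_KEY, "mapping-a.xml");
    Properties jobB = new Properties();
    jobB.setProperty(PerJobSchemaReader.MAPPING_KEY, "mapping-b.xml");

    // Static field: the second job silently sees the first job's value.
    System.out.println(new PinnedSchemaReader(jobA).mappingFile());
    System.out.println(new PinnedSchemaReader(jobB).mappingFile());

    // Instance field: each job sees its own value.
    System.out.println(new PerJobSchemaReader(jobA).mappingFile());
    System.out.println(new PerJobSchemaReader(jobB).mappingFile());
  }
}

// Anti-pattern: the configuration lives in a static field, so whichever
// configuration arrives first is pinned for the lifetime of the JVM.
class PinnedSchemaReader {
  private static Properties conf; // shared by every instance

  PinnedSchemaReader(Properties jobConf) {
    if (conf == null) {
      conf = jobConf; // only the first caller wins
    }
  }

  String mappingFile() {
    return conf.getProperty(PerJobSchemaReader.MAPPING_KEY);
  }
}

// Suggested shape: the current job configuration is passed to the
// constructor and kept per instance.
class PerJobSchemaReader {
  // Hypothetical property key, used for illustration only.
  static final String MAPPING_KEY = "solrindex.mapping.file";

  private final Properties conf;

  PerJobSchemaReader(Properties jobConf) {
    this.conf = jobConf;
  }

  String mappingFile() {
    return conf.getProperty(MAPPING_KEY);
  }
}
```

In real Nutch code the same idea would presumably use 
org.apache.hadoop.conf.Configuration, either through the constructor as 
sketched here or by implementing Configurable with setConf/getConf.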

> Allow field mapping from nutch to solr index
> --------------------------------------------
>
>                 Key: NUTCH-760
>                 URL: https://issues.apache.org/jira/browse/NUTCH-760
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: David Stuart
>         Attachments: solrindex_schema.patch, solrindex_schema.patch, 
> solrindex_schema.patch
>
>
> I am using nutch to crawl sites and have combined it
> with solr, pushing the nutch index using the solrindex command. I have
> set it up as specified on the wiki, using the copyField url to id in the
> schema. Whilst this works fine, it stuffs up my inputs from other
> sources in solr (e.g. using the solr data import handler), as they have
> both ids and urls. I have a patch that implements a nutch xml schema
> defining what basic nutch fields map to in your solr push.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.