[ 
https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924145#action_12924145
 ] 

Matthias Agethle commented on NUTCH-923:
----------------------------------------

What about querying Solr for the configured fields (perhaps this can be done 
with the LukeRequestHandler, I'm not sure)?
When sending data to Solr, one could check whether each field exists in the 
Solr schema; if it does not, skip that field and log a warning.
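
A minimal sketch of that check, assuming the set of schema field names has 
already been obtained somehow (fetching it from Solr, for instance via the 
LukeRequestHandler, is left out here). SchemaFieldFilter is a hypothetical 
helper, not an existing Nutch or SolrJ class, and it ignores dynamic-field 
patterns:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper for illustration; not part of Nutch or SolrJ.
class SchemaFieldFilter {
    private static final Logger LOG =
        Logger.getLogger(SchemaFieldFilter.class.getName());

    // Field names known to the Solr schema. How this set is fetched is an
    // open question in this thread and is deliberately not shown.
    private final Set<String> schemaFields;

    SchemaFieldFilter(Set<String> schemaFields) {
        this.schemaFields = schemaFields;
    }

    // Returns a copy of the document that keeps only fields present in the
    // schema; every dropped field is reported with a warning.
    Map<String, Object> filter(Map<String, Object> doc) {
        Map<String, Object> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (schemaFields.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            } else {
                LOG.warning("Skipping field not in Solr schema: " + e.getKey());
            }
        }
        return kept;
    }
}
```

With a schema containing title_en and lang, a document carrying title_en and 
title_xx would be sent with title_en only, and title_xx would be logged.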

The other thing that comes to mind: what are valid field names in Solr? 
Obviously letters, digits and so on, but does Solr itself validate them?
Such a check could be used to verify that a dynamically generated field name 
is accepted by Solr (and thereby exclude control characters in field names, as 
Andrzej mentioned).
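
Until that question is answered, a conservative check on the Nutch side could 
look like the sketch below. The rule it applies (ASCII letters, digits and 
underscore only, no leading digit) is an assumption chosen to be safe, not a 
rule confirmed here to be enforced by Solr, and FieldNameCheck is a 
hypothetical name:

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Nutch or Solr.
class FieldNameCheck {
    // Assumed conservative rule: letters, digits and underscore only,
    // and the name must not start with a digit.
    private static final Pattern SAFE_NAME =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // True if the name matches the assumed rule; null, empty names and
    // names containing control characters are rejected.
    static boolean isSafe(String name) {
        return name != null && SAFE_NAME.matcher(name).matches();
    }
}
```

A generated name such as title_en passes, while a name that contains a tab or 
any other control character is rejected before it is sent to Solr.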

> Multilingual support for Solr-index-mapping
> -------------------------------------------
>
>                 Key: NUTCH-923
>                 URL: https://issues.apache.org/jira/browse/NUTCH-923
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.2
>            Reporter: Matthias Agethle
>            Assignee: Markus Jelsma
>            Priority: Minor
>
> It would be useful to extend the mapping possibilities when indexing to Solr.
> One useful feature would be to use the detected language of the HTML page 
> (for example via the language-identifier plugin) and send the content to 
> corresponding language-aware Solr fields.
> The mapping file could be as follows:
> <field dest="lang" source="lang"/>
> <field dest="title_${lang}" source="title" />
> so that the title field gets mapped to title_en for English pages and 
> title_fr for French pages.
> What do you think? Could this be useful also to others?
> Or are there already other solutions out there?
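
The substitution proposed in the description could be sketched as follows. 
This is only an illustration of expanding ${...} placeholders in a dest 
attribute; DestFieldExpander and its behaviour for missing values are 
assumptions, not existing Nutch code:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Nutch.
class DestFieldExpander {
    // Matches placeholders of the form ${name}.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} in the template with the document's value for
    // that name. Returns null if a referenced value is missing, so that the
    // caller can skip the field (an assumed policy).
    static String expand(String template, Map<String, String> docValues) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = docValues.get(m.group(1));
            if (value == null) {
                return null;
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For a page whose lang value is "en", the template title_${lang} expands to 
title_en; a template without placeholders is returned unchanged.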

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
