[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967204#action_12967204 ]
Yonik Seeley commented on SOLR-1979:
------------------------------------

bq. Yonik, I wasn't planning on relying on dynamic fields necessarily. It may make sense to have users either predeclare the variations.

Sure, but the problem was the ease with which a generated field of originalname_${langcode} could clash with existing fields (regardless of whether they are dynamic fields) due to there being many different language codes.
If we use regex naming as Jan suggests (or another configurable mechanism) then the issue comes down to what we configure by default or by example.

> Create LanguageIdentifierUpdateProcessor
> ----------------------------------------
>
>                 Key: SOLR-1979
>                 URL: https://issues.apache.org/jira/browse/SOLR-1979
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act
> upon it, such as indexing the content into language aware fields. Another
> use case is to be able to filter/facet on language on random unstructured
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The
> processor is configurable like this:
> {code:xml}
> <processor
> class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>   <str name="inputFields">name,subject</str>
>   <str name="outputField">language_s</str>
>   <str name="idField">id</str>
>   <str name="fallback">en</str>
> </processor>
> {code}
> It will then read the text from inputFields name and subject, perform
> language identification and output the ISO code for the detected language in
> the outputField. If no language was detected, fallback language is used.
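
The naming clash raised in the comment can be illustrated with a minimal Java sketch. It is not taken from the attached patch: the class LangFieldNames, the helpers mapField and clashes, the {field}/{lang} template syntax, and the sample field names are all hypothetical, chosen only to show how a plain originalname_${langcode} scheme can collide with an existing field while a configurable pattern can avoid it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the SOLR-1979 patch.
class LangFieldNames {

    // Builds a language-specific field name from a template such as "{field}_{lang}".
    static String mapField(String template, String field, String lang) {
        return template.replace("{field}", field).replace("{lang}", lang);
    }

    // Returns the generated names that collide with fields already present in the schema.
    static List<String> clashes(String template, String field,
                                Collection<String> langs, Set<String> existing) {
        List<String> out = new ArrayList<>();
        for (String lang : langs) {
            String name = mapField(template, field, lang);
            if (existing.contains(name)) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed schema: "name_is" already exists for an unrelated purpose.
        Set<String> existing = new HashSet<>(Arrays.asList("id", "name", "subject", "name_is"));
        // "is" is the ISO 639-1 code for Icelandic.
        List<String> langs = Arrays.asList("en", "de", "is");

        // Plain suffix scheme: the Icelandic variant collides with the existing field.
        System.out.println(clashes("{field}_{lang}", "name", langs, existing));
        // A more distinctive, configurable pattern avoids the collision in this schema.
        System.out.println(clashes("{field}_lang_{lang}", "name", langs, existing));
    }
}
```

With the assumed schema, the first call reports name_is as a clash and the second reports none, which is the sense in which the question shifts to what pattern is configured by default or by example.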