[
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967010#action_12967010
]
Grant Ingersoll commented on SOLR-1979:
---------------------------------------
bq. I would like to see RFC 3066 instead
Yeah, that makes sense. However, I believe Tika returns ISO 639 codes. (Tika
doesn't recognize Chinese at all yet.) One approach is to normalize the codes,
I suppose. Another is to fix Tika. I'd really like to see Tika support more
languages, too.
Longer term, I'd like to not do the fieldName_LangCode thing at all and instead
let the user supply a string that could have variable substitution if they
want, something like fieldName_${langCode}, or it could be
${langCode}_fieldName or it could just be another literal.
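To make the substitution idea concrete, here is a minimal sketch in Java. It is not part of the attached patch: the class name FieldNamePattern and the method resolve are hypothetical, and the only thing taken from the comment above is the ${langCode} placeholder. A pattern with no placeholder simply comes back unchanged, which covers the "just another literal" case.

```java
// Hypothetical helper, not from SOLR-1979.patch: resolves a user-supplied
// field name pattern by substituting the detected language code.
final class FieldNamePattern {
    // Placeholder token as written in the comment above (assumed syntax).
    private static final String LANG_TOKEN = "${langCode}";

    // Literal (non-regex) replacement of every occurrence of the token.
    static String resolve(String pattern, String langCode) {
        return pattern.replace(LANG_TOKEN, langCode);
    }

    public static void main(String[] args) {
        System.out.println(resolve("title_${langCode}", "en")); // title_en
        System.out.println(resolve("${langCode}_title", "en")); // en_title
        System.out.println(resolve("language_s", "en"));        // language_s
    }
}
```

String.replace is used rather than replaceAll so that the "$" and braces in the token are not interpreted as regex syntax.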
> Create LanguageIdentifierUpdateProcessor
> ----------------------------------------
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
> Issue Type: New Feature
> Components: update
> Reporter: Jan Høydahl
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: SOLR-1979.patch, SOLR-1979.patch
>
>
> We need the ability to detect the language of arbitrary text in order to act
> upon it, such as indexing the content into language-aware fields. Another
> use case is to be able to filter/facet on language for arbitrary unstructured
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The
> processor is configurable like this:
> {code:xml}
> <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>   <str name="inputFields">name,subject</str>
>   <str name="outputField">language_s</str>
>   <str name="idField">id</str>
>   <str name="fallback">en</str>
> </processor>
> {code}
> It will then read the text from the inputFields name and subject, perform
> language identification, and write the ISO code for the detected language to
> the outputField. If no language is detected, the fallback language is used.