[ https://issues.apache.org/jira/browse/TIKA-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342023#comment-14342023 ]
Tyler Palsulich commented on TIKA-465:
--------------------------------------

Is there still interest in implementing this? If not, I'll close this issue later this week.

> LanguageIdentifier API enhancements
> -----------------------------------
>
>                 Key: TIKA-465
>                 URL: https://issues.apache.org/jira/browse/TIKA-465
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>            Reporter: Chris A. Mattmann
>            Assignee: Ken Krugler
>            Priority: Minor
>
> As originally reported by Jerome Charron in NUTCH-86, Jerome identified a set
> of improvements for the LanguageIdentifier that we should consider in Tika:
> {quote}
> More information can be found in the following thread on the Nutch-Dev mailing
> list:
> http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg00569.html
> Summary:
> 1. LanguageIdentifier API changes. The similarity methods should return an
> ordered array of language-code/score pairs instead of a simple String
> containing the language-code.
> 2. Ensure consistency between LanguageIdentifier scoring and
> NGramProfile.getSimilarity().
> {quote}
> I just wanted to capture the issue here in Tika, since I'm about to close it
> out in Nutch since LanguageIdentification is something that can happen in
> Tika-ville...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)