LanguageIdentifier API enhancements
-----------------------------------

                 Key: TIKA-465
                 URL: https://issues.apache.org/jira/browse/TIKA-465
             Project: Tika
          Issue Type: Improvement
          Components: languageidentifier
            Reporter: Chris A. Mattmann
            Assignee: Chris A. Mattmann
            Priority: Minor


In NUTCH-86, Jerome Charron originally identified a set of improvements to 
the LanguageIdentifier that we should consider in Tika:

{quote}
More information can be found in the following thread on the Nutch-Dev 
mailing list:
http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg00569.html

Summary:

1. LanguageIdentifier API changes. The similarity methods should return an 
ordered array of language-code/score pairs instead of a simple String 
containing the language-code.

2. Ensure consistency between LanguageIdentifier scoring and 
NGramProfile.getSimilarity().
{quote}
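
For illustration only, the first item in the summary could look something 
like the sketch below. The names LanguageScoreSketch, LanguageScore and rank 
are hypothetical and are not part of the Nutch or Tika API; the sketch also 
assumes that a higher score means a closer match, which the issue does not 
specify.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LanguageScoreSketch {

    /** Hypothetical pair type: a language code and its similarity score. */
    public static final class LanguageScore {
        public final String language;
        public final double score;

        public LanguageScore(String language, double score) {
            this.language = language;
            this.score = score;
        }

        @Override
        public String toString() {
            return language + "=" + score;
        }
    }

    /**
     * Turns per-language similarity scores into an array ordered best match
     * first. Assumption: a higher score means a closer match.
     */
    public static LanguageScore[] rank(Map<String, Double> similarities) {
        List<LanguageScore> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> e : similarities.entrySet()) {
            ranked.add(new LanguageScore(e.getKey(), e.getValue()));
        }
        // Sort by score, highest first.
        ranked.sort(Comparator.comparingDouble((LanguageScore s) -> s.score).reversed());
        return ranked.toArray(new LanguageScore[0]);
    }

    public static void main(String[] args) {
        // Made-up scores, used only to exercise the ordering.
        Map<String, Double> similarities = new LinkedHashMap<>();
        similarities.put("en", 0.42);
        similarities.put("fr", 0.87);
        similarities.put("de", 0.15);
        for (LanguageScore s : rank(similarities)) {
            System.out.println(s);
        }
    }
}
```

With a return type like this, a caller that only wants the best guess reads 
the first element, while a caller that wants to apply a threshold or compare 
the top candidates still has the scores available.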

I just wanted to capture the issue here in Tika because I'm about to close it 
out in Nutch, given that language identification is something that can happen 
in Tika-ville...


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
