[jira] [Closed] (TIKA-465) LanguageIdentifier API enhancements

Ken Krugler (JIRA) Sun, 01 Mar 2015 15:01:17 -0800

     [ 
https://issues.apache.org/jira/browse/TIKA-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ken Krugler closed TIKA-465.
----------------------------
    Resolution: Won't Fix

The change to the API to return more information about the detected languages 
is still interesting, but I think it makes more sense to look at using a 
different detector (e.g. language-detector/detection) rather than improving the 
internal version that was ported from Nutch back in the day.

> LanguageIdentifier API enhancements
> -----------------------------------
>
>                 Key: TIKA-465
>                 URL: https://issues.apache.org/jira/browse/TIKA-465
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>            Reporter: Chris A. Mattmann
>            Assignee: Ken Krugler
>            Priority: Minor
>
> In NUTCH-86, Jerome Charron originally identified a set of improvements 
> for the LanguageIdentifier that we should consider in Tika:
> {quote}
> More information can be found in the following thread on the Nutch-Dev 
> mailing list:
> http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg00569.html
> Summary:
> 1. LanguageIdentifier API changes. The similarity methods should return an 
> ordered array of language-code/score pairs instead of a simple String 
> containing the language-code.
> 2. Ensure consistency between LanguageIdentifier scoring and 
> NGramProfile.getSimilarity().
> {quote}
> I just wanted to capture the issue here in Tika, since I'm about to close it 
> out in Nutch: language identification is something that can happen in 
> Tika-ville...
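
For illustration only, item 1 of the summary above (returning an ordered set of
language-code/score pairs instead of a single String) could be sketched roughly
as follows. The names LanguageRanking, LanguageScore and rank are hypothetical
and are not part of Tika or Nutch; the sketch also assumes that a higher score
means a closer match, which may not be how the existing scoring works.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the Tika or Nutch LanguageIdentifier API.
public class LanguageRanking {

    /** Hypothetical language-code/score pair. */
    public static final class LanguageScore {
        public final String language;
        public final double score;

        public LanguageScore(String language, double score) {
            this.language = language;
            this.score = score;
        }

        @Override
        public String toString() {
            return language + "=" + score;
        }
    }

    /**
     * Converts per-language similarity scores into a list ordered best-first.
     * Assumption: a higher score means the text is more similar to that language.
     */
    public static List<LanguageScore> rank(Map<String, Double> similarities) {
        List<LanguageScore> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> e : similarities.entrySet()) {
            ranked.add(new LanguageScore(e.getKey(), e.getValue()));
        }
        // Sort descending by score so the most likely language comes first.
        ranked.sort(Comparator.comparingDouble((LanguageScore ls) -> ls.score).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up scores, standing in for whatever the n-gram profiles would produce.
        Map<String, Double> similarities = new HashMap<>();
        similarities.put("en", 0.91);
        similarities.put("fr", 0.40);
        similarities.put("de", 0.62);

        List<LanguageScore> ranked = rank(similarities);
        System.out.println("Best guess: " + ranked.get(0).language);
        System.out.println("All candidates: " + ranked);
    }
}
```

With a result shaped like this, a caller can still take the first element as
"the" language, but can also look at the runner-up scores to judge how
confident the identification was.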



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)