[ https://issues.apache.org/jira/browse/NUTCH-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986827#action_12986827 ]
Ken Krugler commented on NUTCH-960:
-----------------------------------

There are a number of Tika issues filed that relate to this. See TIKA-369, TIKA-496, TIKA-568.

> Language ID - confidence factor
> -------------------------------
>
>                 Key: NUTCH-960
>                 URL: https://issues.apache.org/jira/browse/NUTCH-960
>             Project: Nutch
>          Issue Type: Wish
>    Affects Versions: 1.2
>            Reporter: M Alexander
>
> Hi
> In a Java implementation, what is the best way to calculate the confidence of
> the language ID result for a given text?
> For example:
> n-grams matching / total n-grams * 100.
> When a text is passed, the outcome would be "en" with 89% confidence. What is
> the best way to add this to the existing Nutch language ID code?
> Thanks

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
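
The ratio proposed in the quoted issue (matching n-grams / total n-grams * 100) can be illustrated with a minimal Java sketch. This is not the Nutch or Tika implementation: the class NgramConfidence, its methods, and the choice of distinct character n-grams compared against a plain set of profile n-grams are all assumptions made for illustration only.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Nutch or Tika.
class NgramConfidence {

    // Collect the distinct character n-grams of length n from the text.
    static Set<String> ngrams(String text, int n) {
        Set<String> grams = new HashSet<>();
        String normalized = text.toLowerCase(Locale.ROOT);
        for (int i = 0; i + n <= normalized.length(); i++) {
            grams.add(normalized.substring(i, i + n));
        }
        return grams;
    }

    // Percentage of the text's distinct n-grams that also occur in the
    // language profile: matched / total * 100. Returns 0 when the text is
    // too short to contain any n-gram.
    static double confidence(String text, Set<String> profile, int n) {
        Set<String> grams = ngrams(text, n);
        if (grams.isEmpty()) {
            return 0.0;
        }
        int matched = 0;
        for (String gram : grams) {
            if (profile.contains(gram)) {
                matched++;
            }
        }
        return 100.0 * matched / grams.size();
    }

    public static void main(String[] args) {
        // Assumed toy "en" profile built from a single sentence.
        Set<String> en = ngrams("the quick brown fox jumps over the lazy dog", 3);
        System.out.printf("en: %.0f%% confidence%n",
                confidence("the lazy fox", en, 3));
    }
}
```

A score like this only measures overlap with one profile. It does not say how much better the best language fits than the runner-up, which is one reason a raw percentage may be a weak confidence measure on short texts.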