[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791979#action_12791979 ]
Andrzej Bialecki commented on NUTCH-666:
-----------------------------------------
Do you think the difference came from the quality of the language models you
built (presumably the ones in the patch?) compared with those in the Nutch
plugin, or from a different classification algorithm? I'm trying to understand
the source of such a big difference, because AFAIK the algorithm in textcat is
essentially the same as the one we use.
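The question separates two factors in an n-gram language identifier: the per-language profiles (the "models") and the distance used to compare a document's profile against them. As an illustration only, here is a minimal Python sketch of the rank-order "out-of-place" distance from Cavnar and Trenkle, which TextCat implements. It is not Nutch's LanguageIdentifier code; the function names, the 1-to-5 character n-gram range, and the 300-entry profile cutoff are assumptions chosen for the example.

```python
from collections import Counter


def ngram_profile(text: str, n_max: int = 5, top: int = 300) -> list[str]:
    """Return up to `top` character n-grams (lengths 1..n_max), most frequent first.

    The cutoff of 300 is an illustrative assumption, not a value taken from
    Nutch or TextCat.
    """
    counts: Counter = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # '_' marks word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending frequency; ties broken lexicographically so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc: list[str], model: list[str]) -> int:
    """Sum of rank differences between the document and model profiles.

    An n-gram missing from the model is charged a fixed maximum penalty,
    here the model profile's length.
    """
    model_rank = {gram: rank for rank, gram in enumerate(model)}
    max_penalty = len(model)
    total = 0
    for rank, gram in enumerate(doc):
        if gram in model_rank:
            total += abs(rank - model_rank[gram])
        else:
            total += max_penalty
    return total


def identify(text: str, models: dict[str, list[str]]) -> str:
    """Pick the language whose profile has the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(models, key=lambda lang: out_of_place(doc, models[lang]))
```

Because `identify` takes the profiles as plain data, one could keep these functions fixed and swap in the profiles from the patch or from the Nutch plugin. That isolates model quality from the choice of algorithm, which is the distinction the question above is trying to make.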
> Analysis plugins for multiple language and new Language Identifier Tool
> -----------------------------------------------------------------------
>
> Key: NUTCH-666
> URL: https://issues.apache.org/jira/browse/NUTCH-666
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.1
> Environment: All
> Reporter: Dennis Kubes
> Assignee: Dennis Kubes
> Fix For: 1.1
>
> Attachments: NUTCH-666-1-20081126.patch, NUTCH-666-2-20091217-nf.patch
>
>
> Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch,
> Russian, and Thai. Also includes a new Language Identifier tool that uses
> the new indexing framework in NUTCH-646.
--
This message is automatically generated by JIRA.