[
https://issues.apache.org/jira/browse/OPENNLP-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nishant Shrivastava updated OPENNLP-1736:
-----------------------------------------
Description:
Currently, NGramLanguageModel uses stupid backoff to handle “zero
probability n-grams”. https://issues.apache.org/jira/browse/OPENNLP-986
It would be useful to refactor it so that the smoothing/discounting logic can
be passed in from outside.
This would allow us to add and use implementations of other
smoothing/discounting techniques (e.g. those below) in the future.
[https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing]
[https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation]
was:
Currently, NGramLanguageModel uses stupid backoff to deal with “zero
probability n-grams”. https://issues.apache.org/jira/browse/OPENNLP-986
It will be useful, if we can refactor it to pass a smoothing/discounting logic
from outside.
This will allow us to add implementations of other smoothing/discounting
techniques (e.g. below) in future.
https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing
https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation
> NGramLanguageModel - Allow choice of smoothing/discounting algorithm
> --------------------------------------------------------------------
>
> Key: OPENNLP-1736
> URL: https://issues.apache.org/jira/browse/OPENNLP-1736
> Project: OpenNLP
> Issue Type: Improvement
> Components: language model
> Affects Versions: 2.5.4
> Reporter: Nishant Shrivastava
> Priority: Minor
>
> Currently, NGramLanguageModel uses stupid backoff to handle “zero
> probability n-grams”. https://issues.apache.org/jira/browse/OPENNLP-986
> It would be useful to refactor it so that the smoothing/discounting logic
> can be passed in from outside.
> This would allow us to add and use implementations of other
> smoothing/discounting techniques (e.g. those below) in the future.
> [https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing]
> [https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation]
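
To make the request concrete, here is a rough sketch of what "passing the smoothing logic from outside" could look like. It is only an illustration under stated assumptions: `Smoothing`, `NGramCounts`, `StupidBackoff`, `AddOne` and `PluggableNGramModel` are hypothetical names invented for this example and are not part of the existing OpenNLP API, and the count store is deliberately simplistic. The backoff factor (0.4 in the usage note) is supplied by the caller rather than hard-coded.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical strategy interface; NOT part of the OpenNLP API.
interface Smoothing {
    // Scores a non-empty n-gram (last token given the preceding ones).
    double score(List<String> ngram, NGramCounts counts);
}

// Minimal n-gram count store, for illustration only.
final class NGramCounts {
    private final Map<List<String>, Integer> counts = new HashMap<>();
    private long totalUnigrams = 0;

    // Records every n-gram of order 1..maxOrder found in the token sequence.
    void add(List<String> tokens, int maxOrder) {
        for (int i = 0; i < tokens.size(); i++) {
            for (int n = 1; n <= maxOrder && i + n <= tokens.size(); n++) {
                counts.merge(List.copyOf(tokens.subList(i, i + n)), 1, Integer::sum);
            }
            totalUnigrams++;
        }
    }

    int count(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    // Count of the n-gram's context (all tokens but the last);
    // for a unigram this is the total number of tokens seen.
    long contextCount(List<String> ngram) {
        return ngram.size() <= 1
                ? totalUnigrams
                : count(ngram.subList(0, ngram.size() - 1));
    }
}

// Stupid backoff: use the relative frequency of the longest seen suffix,
// multiplied by alpha once for every order that had to be dropped.
final class StupidBackoff implements Smoothing {
    private final double alpha;

    StupidBackoff(double alpha) {
        this.alpha = alpha;
    }

    @Override
    public double score(List<String> ngram, NGramCounts counts) {
        double factor = 1.0;
        for (int start = 0; start < ngram.size(); start++) {
            List<String> current = ngram.subList(start, ngram.size());
            int c = counts.count(current);
            if (c > 0) {
                return factor * c / counts.contextCount(current);
            }
            factor *= alpha;
        }
        return 0.0; // even the last token alone was never seen
    }
}

// Add-one (Laplace) smoothing, included only to show a second strategy.
final class AddOne implements Smoothing {
    private final int vocabularySize;

    AddOne(int vocabularySize) {
        this.vocabularySize = vocabularySize;
    }

    @Override
    public double score(List<String> ngram, NGramCounts counts) {
        return (counts.count(ngram) + 1.0)
                / (counts.contextCount(ngram) + vocabularySize);
    }
}

// Hypothetical model that delegates scoring to the injected strategy.
final class PluggableNGramModel {
    private final NGramCounts counts = new NGramCounts();
    private final Smoothing smoothing;
    private final int order;

    PluggableNGramModel(int order, Smoothing smoothing) {
        this.order = order;
        this.smoothing = smoothing;
    }

    void train(List<String> tokens) {
        counts.add(tokens, order);
    }

    double score(List<String> ngram) {
        return smoothing.score(ngram, counts);
    }
}
```

With this shape, switching techniques is a constructor argument, e.g. `new PluggableNGramModel(2, new StupidBackoff(0.4))` versus `new PluggableNGramModel(2, new AddOne(vocabularySize))`. Kneser-Ney or Good-Turing would need richer statistics than this toy count store exposes, so a real interface would likely have to offer more than raw counts.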