rzo1 opened a new pull request, #635: URL: https://github.com/apache/opennlp/pull/635
This (draft) PR introduces a more aggressive caching strategy in the cached feature generator, which doesn't rely on `==`. However, the eval results are a bit odd for the Conll02 dataset:

Default (Spanish):

```
opennlp.tools.util.featuregen.CachedFeatureGenerator@2ddc9a9f: hits=100385 misses=52923 hit%0.6547929657943486
opennlp.tools.util.featuregen.CachedFeatureGenerator@76a4ebf2: hits=98677 misses=51533 hit%0.6569269689101924
```

1589-Caching (Spanish):

```
opennlp.tools.util.featuregen.CachedFeatureGenerator@2e385cce: hits=102229 misses=51079 hit%0.6668210399979126
opennlp.tools.util.featuregen.CachedFeatureGenerator@6e6f2380: hits=99197 misses=51013 hit%0.6603887890286931
```

Default (main) (Dutch):

```
opennlp.tools.util.featuregen.CachedFeatureGenerator@2ddc9a9f: hits=67301 misses=37687 hit%0.6410351659237247
opennlp.tools.util.featuregen.CachedFeatureGenerator@76a4ebf2: hits=123179 misses=68875 hit%0.6413769044123007
```

1589-Caching (Dutch):

```
opennlp.tools.util.featuregen.CachedFeatureGenerator@45d84a20: hits=68174 misses=36814 hit%0.6493504019506992
opennlp.tools.util.featuregen.CachedFeatureGenerator@52f27fbd: hits=124618 misses=67436 hit%0.6488695887614941
```

As you can see, the aggressive mechanism results in slightly better caching: the hit rates rise by roughly 0.3 to 1.2 percentage points. It has no impact on the eval results for Spanish or for any other eval test, **but** the Conll02 results for **Dutch** are odd (see the changes in the eval F1 scores). They are slightly better in some scenarios and worse in others. I am actually wondering why we don't see such changes in F-measure for Spanish.

Therefore, I am opening this PR so you can also investigate what is going on here. @mawiesne is also having a look, but we would appreciate some additional 👁️ 👁️
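For readers unfamiliar with the difference between the two strategies, here is a minimal sketch. It is not the actual `CachedFeatureGenerator` code: the class `CacheSketch`, its fields, and the placeholder feature are hypothetical, and it assumes that the alternative to `==` is a content comparison via `Arrays.equals`, which may differ from what this PR does. The hit rate is computed as `hits / (hits + misses)`, which is consistent with the values logged above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the OpenNLP implementation.
class CacheSketch {
  // Assumption: true = compare token arrays by content, false = by reference (==).
  private final boolean compareByContent;
  private String[] prevTokens;
  private final Map<Integer, List<String>> cache = new HashMap<>();
  long hits;
  long misses;

  CacheSketch(boolean compareByContent) {
    this.compareByContent = compareByContent;
  }

  private boolean sameSentence(String[] tokens) {
    return compareByContent
        ? Arrays.equals(tokens, prevTokens) // equal content is enough
        : tokens == prevTokens;             // must be the very same array instance
  }

  List<String> features(String[] tokens, int index) {
    if (!sameSentence(tokens)) {
      // A different sentence invalidates everything cached so far.
      cache.clear();
      prevTokens = tokens;
    }
    List<String> cached = cache.get(index);
    if (cached != null) {
      hits++;
      return cached;
    }
    misses++;
    List<String> computed = new ArrayList<>();
    computed.add("w=" + tokens[index]); // placeholder feature for illustration
    cache.put(index, computed);
    return computed;
  }

  double hitRate() {
    long total = hits + misses;
    return total == 0 ? 0.0 : (double) hits / total;
  }

  public static void main(String[] args) {
    String[] a = {"Hola", "mundo"};
    String[] b = a.clone(); // same content, different instance
    for (boolean byContent : new boolean[] {false, true}) {
      CacheSketch c = new CacheSketch(byContent);
      c.features(a, 0);
      c.features(b, 0);
      System.out.println("byContent=" + byContent
          + " hits=" + c.hits + " misses=" + c.misses + " hitRate=" + c.hitRate());
    }
  }
}
```

In this sketch the identity-based variant records two misses for the two equal-content arrays, while the content-based variant records one miss and one hit. A content comparison can therefore reuse cached features across distinct array instances, which is one place where the two strategies could start to behave differently.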
