rzo1 opened a new pull request, #635:
URL: https://github.com/apache/opennlp/pull/635

   This (draft) PR introduces a more aggressive caching strategy in the cached 
feature generator that doesn't rely on a `==` (reference) comparison.
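
To illustrate the difference between a reference check and a content check, here is a minimal sketch. It is not the code in this PR or in `CachedFeatureGenerator`: the class `SentenceCacheSketch`, its fields, and the `w=` feature are hypothetical stand-ins, and the PR's actual strategy may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the OpenNLP implementation: a per-sentence cache
// that decides "same sentence?" either by reference (==) or by content.
class SentenceCacheSketch {
  private String[] prevTokens; // sentence the cache entries belong to
  private final Map<Integer, List<String>> cache = new HashMap<>();
  private final boolean compareByContent;
  long hits;
  long misses;

  SentenceCacheSketch(boolean compareByContent) {
    this.compareByContent = compareByContent;
  }

  private boolean sameSentence(String[] tokens) {
    // Arrays.equals compares element by element and tolerates a null prevTokens.
    return compareByContent ? Arrays.equals(tokens, prevTokens) : tokens == prevTokens;
  }

  List<String> features(String[] tokens, int index) {
    if (!sameSentence(tokens)) {
      cache.clear();
      // Content mode keeps a copy so later in-place changes by the caller are noticed.
      prevTokens = compareByContent ? tokens.clone() : tokens;
    }
    List<String> cached = cache.get(index);
    if (cached != null) {
      hits++;
      return cached;
    }
    misses++;
    List<String> computed = new ArrayList<>();
    computed.add("w=" + tokens[index]); // stand-in for real feature generation
    cache.put(index, computed);
    return computed;
  }

  public static void main(String[] args) {
    String[] sentence = {"Hello", "world"};
    String[] copy = sentence.clone(); // equal content, different reference

    SentenceCacheSketch byReference = new SentenceCacheSketch(false);
    byReference.features(sentence, 0);
    byReference.features(copy, 0);

    SentenceCacheSketch byContent = new SentenceCacheSketch(true);
    byContent.features(sentence, 0);
    byContent.features(copy, 0);

    System.out.println("reference: hits=" + byReference.hits + " misses=" + byReference.misses);
    System.out.println("content:   hits=" + byContent.hits + " misses=" + byContent.misses);
  }
}
```

In this sketch the content check costs one pass over the token array per call, which is the price for also getting hits when a caller passes an equal but freshly allocated array.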
   
   However, the eval results are a bit odd for the Conll02 dataset:
   
   Default (main) (Spanish):
   ```
   opennlp.tools.util.featuregen.CachedFeatureGenerator@2ddc9a9f: hits=100385 
misses=52923 hit%0.6547929657943486
   opennlp.tools.util.featuregen.CachedFeatureGenerator@76a4ebf2: hits=98677 
misses=51533 hit%0.6569269689101924
   ```
   
   1589-Caching (Spanish):
   ```
   opennlp.tools.util.featuregen.CachedFeatureGenerator@2e385cce: hits=102229 
misses=51079 hit%0.6668210399979126
   opennlp.tools.util.featuregen.CachedFeatureGenerator@6e6f2380: hits=99197 
misses=51013 hit%0.6603887890286931
   ```
   
   Default (main) (Dutch):
   ```
   opennlp.tools.util.featuregen.CachedFeatureGenerator@2ddc9a9f: hits=67301 
misses=37687 hit%0.6410351659237247
   opennlp.tools.util.featuregen.CachedFeatureGenerator@76a4ebf2: hits=123179 
misses=68875 hit%0.6413769044123007
   ```
   
   1589-Caching (Dutch):
   ```
   opennlp.tools.util.featuregen.CachedFeatureGenerator@45d84a20: hits=68174 
misses=36814 hit%0.6493504019506992
   opennlp.tools.util.featuregen.CachedFeatureGenerator@52f27fbd: hits=124618 
misses=67436 hit%0.6488695887614941
   ```
   
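   As you can see, the aggressive mechanism results in slightly better 
caching: the hit rate rises by roughly 0.3 to 1.2 percentage points, while the 
total number of lookups per generator stays the same.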
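
For reference, the `hit%` value in the logs above is the fraction hits / (hits + misses). The sketch below recomputes it from the logged counters for the first Spanish generator; the class and method names are hypothetical and only serve this illustration.

```java
// Hypothetical helper, not part of OpenNLP: recomputes the logged hit rate
// and the change between two runs in percentage points.
class HitRateSketch {

  // Fraction of lookups served from the cache; 0.0 if there were no lookups.
  static double hitRate(long hits, long misses) {
    long total = hits + misses;
    return total == 0 ? 0.0 : (double) hits / total;
  }

  // Difference between two hit rates, expressed in percentage points.
  static double deltaPoints(double before, double after) {
    return (after - before) * 100.0;
  }

  public static void main(String[] args) {
    // Counters copied from the Spanish logs above (first generator).
    double main = hitRate(100385, 52923);
    double caching = hitRate(102229, 51079);
    System.out.println("main:    " + main);
    System.out.println("caching: " + caching);
    System.out.println("delta (pp): " + deltaPoints(main, caching));
  }
}
```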
   
   It has no impact on the Spanish results or on any other eval test, **but** 
the Conll02 results for **Dutch** are odd (see the changes in the eval F1 
scores).
   
   The F1 scores improve slightly in some scenarios but decrease in others.
   
   I am wondering why we don't see such changes in F-measure for Spanish. 
I am therefore opening this PR so that you can also investigate what is going 
on here.
   
   @mawiesne is also having a look here, but we would appreciate some 
additional 👁️ 👁️ 
   
   

