kinow commented on code in PR #523:
URL: https://github.com/apache/opennlp/pull/523#discussion_r1149281568


##########
opennlp-dl/src/main/java/opennlp/dl/doccat/DocumentCategorizerDL.java:
##########
@@ -223,41 +214,14 @@ private int getKey(String value) {
 
   }
 
-  /**
-   * Loads a vocabulary file from disk.
-   * @param vocab The vocabulary file.
-   * @return A map of vocabulary words to integer IDs.
-   * @throws IOException Thrown if the vocabulary file cannot be opened and read.
-   */
-  private Map<String, Integer> loadVocab(File vocab) throws IOException {
-
-    final Map<String, Integer> v = new HashMap<>();
-
-    BufferedReader br = new BufferedReader(new FileReader(vocab.getPath()));
-    String line = br.readLine();
-    int x = 0;
-
-    while (line != null) {
-
-      line = br.readLine();
-      x++;
-
-      v.put(line, x);
-
-    }
-
-    return v;
-
-  }
-

Review Comment:
   Nice simplification :+1: !
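
   For reference, the removed `loadVocab` had problems beyond its length. It called `readLine()` a second time before the first `put`, so the first vocabulary entry was never stored. It inserted a `null` key once the end of the file was reached, and it never closed the reader. The replacement code is not shown in this hunk, so the following is only a minimal sketch of a loader without those problems. It assumes each word's ID is its zero-based line index and that the file is UTF-8; `VocabLoader` is a hypothetical name, not a class from the PR.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the PR.
final class VocabLoader {

  /**
   * Reads one vocabulary word per line and maps it to its
   * zero-based line index (an assumption about the ID scheme).
   */
  static Map<String, Integer> loadVocab(Path vocab) throws IOException {
    final Map<String, Integer> v = new HashMap<>();

    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader br = Files.newBufferedReader(vocab, StandardCharsets.UTF_8)) {
      String line;
      int id = 0;
      // Test for end of file before storing, so no null key is inserted
      // and the first line is not skipped.
      while ((line = br.readLine()) != null) {
        v.put(line, id++);
      }
    }

    return v;
  }
}
```

   Reading and testing in the loop condition keeps the first line and stops cleanly at the end of the file.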



##########
opennlp-dl/README.md:
##########
@@ -4,44 +4,50 @@ This module provides OpenNLP interface implementations for ONNX models using the
 
 **Important**: This does not provide the ability to train models. Model training is done outside of OpenNLP. This code provides the ability to use ONNX models from OpenNLP.
 
-To build with example models, download the models to the `/src/test/resources` directory. (These are the exported models described below.)
+Models used in the tests are available in the opennlp evaluation test data.
 
-```
-
-export OPENNLP_DATA=/tmp/
-mkdir /tmp/dl-doccat /tmp/dl-namefinder
+## NameFinderDL
 
-# Document categorizer model
-wget https://www.dropbox.com/s/n9uzs8r4xm9rhxb/model.onnx?dl=0 -O $OPENNLP_DATA/dl-doccat/model.onnx
-wget https://www.dropbox.com/s/aw6yjc68jw0jts6/vocab.txt?dl=0 -O $OPENNLP_DATA/dl-doccat/vocab.txt
+* Export a Huggingface NER model to ONNX, e.g.:
 
-# Namefinder model
-wget https://www.dropbox.com/s/zgogq65gs9tyfm1/model.onnx?dl=0 -O $OPENNLP_DATA/dl-namefinder/model.onnx
-wget https://www.dropbox.com/s/3byt1jggly1dg98/vocab.txt?dl=0 -O $OPENNLP_DATA/dl-/namefinder/vocab.txt
+```
+python -m transformers.onnx --model=dslim/bert-base-NER --feature token-classification exported
 ```
 
-## TokenNameFinder
+## DocumentCategorizerDL
 
-* Export a Huggingface NER model to ONNX, e.g.:
+* Export a Huggingface classification (e.g. sentiment) model to ONNX, e.g.:
 
 ```
-python -m transformers.onnx --model=dslim/bert-base-NER --feature token-classification exported
+python -m transformers.onnx --model=nlptown/bert-base-multilingual-uncased-sentiment --feature sequence-classification exported
 ```
 
-* Copy the exported model to `src/test/resources/namefinder/model.onnx`.
-* Copy the model's [vocab.txt](https://huggingface.co/dslim/bert-base-NER/tree/main) to `src/test/resources/namefinder/vocab.txt`.
+## SentenceVectors
 
-Now you can run the tests in `NameFinderDLTest`.
+* Convert a sentence vectors model to ONNX, e.g.:

Review Comment:
   Maybe the GitHub UI is confusing me, but was the `* ` intentional here? I'm 
seeing an H2, then this list item, but then after that I see paragraphs with 
"Install dependencies:", "Convert the model"... or were those supposed to be 
list items too?


