[ 
https://issues.apache.org/jira/browse/OPENNLP-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775282#comment-17775282
 ] 

ASF GitHub Bot commented on OPENNLP-1384:
-----------------------------------------

rzo1 commented on code in PR #553:
URL: https://github.com/apache/opennlp/pull/553#discussion_r1359516231


##########
opennlp-dl/src/test/java/opennlp/dl/doccat/DocumentCategorizerDLEval.java:
##########
@@ -92,6 +92,46 @@ public void categorize() throws IOException, OrtException {
 
   }
 
+  @Test
+  public void categorizeWithAutomaticLabels() throws IOException, OrtException {
+
+    final File model = new File(getOpennlpDataDir(),

Review Comment:
   No. In this configuration, it relies on a data directory, which must be
supplied via a `-DOPENNLP_DATA_DIR=PATH` parameter or system property value.
   
   There are other options used in OpenNLP as well:
   
   - https://github.com/apache/opennlp/blob/main/opennlp-tools/src/main/java/opennlp/tools/util/DownloadUtil.java#L126
     is used to download models. Models are cached in a local directory on disk.
   - One could also rely on the Maven download plugin to download the models and cache them locally.
   - https://github.com/apache/opennlp/blob/6fde608cb0dd5866c6330f3e3dcd04f791c4ef96/opennlp-tools/src/test/java/opennlp/tools/EnabledWhenCDNAvailable.java#L38
     only downloads the models if you are online.
   
   I could also imagine that one could build a special annotation to download
the model files only in a CI/CD context, where it doesn't really matter if an
additional 600 MB are downloaded ;-)
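
   For illustration, a minimal sketch of the data directory lookup described at the top of this comment. It only assumes that the directory arrives as the `OPENNLP_DATA_DIR` system property. The class name, the helper `resolveModel` and the relative path are hypothetical and are not taken from the PR:

```java
import java.io.File;
import java.util.Optional;

public class DataDirLookup {

  // Resolves a model file below the directory passed via -DOPENNLP_DATA_DIR=PATH.
  // Returns an empty Optional if the property is unset or the file does not exist.
  static Optional<File> resolveModel(String relativePath) {
    final String dataDir = System.getProperty("OPENNLP_DATA_DIR");
    if (dataDir == null || dataDir.isEmpty()) {
      return Optional.empty();
    }
    final File model = new File(dataDir, relativePath);
    return model.isFile() ? Optional.of(model) : Optional.empty();
  }

  public static void main(String[] args) {
    // Hypothetical relative path, used only to show the call.
    final Optional<File> model = resolveModel("some-model/model.onnx");
    System.out.println(model.map(File::getAbsolutePath)
        .orElse("model not available, test would be skipped"));
  }
}
```

   A test could use such a check to skip itself instead of failing when the data directory has not been supplied.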





> Automatically generate document classifications map from model's config.json
> ----------------------------------------------------------------------------
>
>                 Key: OPENNLP-1384
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1384
>             Project: OpenNLP
>          Issue Type: Task
>          Components: Deep Learning
>    Affects Versions: 2.0.0
>            Reporter: Jeff Zemerick
>            Assignee: Jeff Zemerick
>            Priority: Major
>
> Automatically generate classifications map from model's config.json.
> Currently, the implementations utilizing ONNX Runtime require a Map that 
> stores the model-assigned value along with the human readable name for each 
> value. This map must be created manually:
> Map<Integer, String> classifications = new HashMap<>();
> classifications.put(0, "negative");
> classifications.put(1, "positive");
> How to create this map is determined by looking at the model's config.json 
> file. This task is to have OpenNLP read the config.json file and make the map 
> automatically instead of requiring the user to make it manually.
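
A rough sketch of what the task describes, not the implementation from the PR. It assumes the labels live in an `id2label` object inside config.json, as in Hugging Face-style model configs; the issue itself does not name the key. The class and method names are hypothetical, and the regex-based parsing only stands in for a proper JSON parser to keep the example dependency-free:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConfigLabels {

  // Matches the body of an assumed "id2label" object: "id2label": { ... }
  private static final Pattern ID2LABEL =
      Pattern.compile("\"id2label\"\\s*:\\s*\\{([^}]*)\\}");

  // Matches a single entry such as "0": "negative"
  private static final Pattern ENTRY =
      Pattern.compile("\"(\\d+)\"\\s*:\\s*\"([^\"]*)\"");

  // Builds the classifications map from the text of a config.json file.
  // Returns an empty map if no "id2label" object is found.
  static Map<Integer, String> readClassifications(String configJson) {
    final Map<Integer, String> classifications = new HashMap<>();
    final Matcher block = ID2LABEL.matcher(configJson);
    if (block.find()) {
      final Matcher entry = ENTRY.matcher(block.group(1));
      while (entry.find()) {
        classifications.put(Integer.parseInt(entry.group(1)), entry.group(2));
      }
    }
    return classifications;
  }

  public static void main(String[] args) {
    final String json =
        "{ \"id2label\": { \"0\": \"negative\", \"1\": \"positive\" } }";
    System.out.println(readClassifications(json));
  }
}
```

The regex breaks on label names that contain `}` or escaped quotes, so real code would more likely hand the file to a JSON library.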



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
