[ https://issues.apache.org/jira/browse/OPENNLP-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775282#comment-17775282 ]
ASF GitHub Bot commented on OPENNLP-1384:
-----------------------------------------

rzo1 commented on code in PR #553:
URL: https://github.com/apache/opennlp/pull/553#discussion_r1359516231


##########
opennlp-dl/src/test/java/opennlp/dl/doccat/DocumentCategorizerDLEval.java:
##########
@@ -92,6 +92,46 @@ public void categorize() throws IOException, OrtException {
 
   }
 
+  @Test
+  public void categorizeWithAutomaticLabels() throws IOException, OrtException {
+
+    final File model = new File(getOpennlpDataDir(),

Review Comment:
   No. In this configuration, the test relies on a data directory, which must be supplied via a `-DOPENNLP_DATA_DIR=PATH` parameter or system property value.
   
   There are other options used in OpenNLP as well:
   
   - https://github.com/apache/opennlp/blob/main/opennlp-tools/src/main/java/opennlp/tools/util/DownloadUtil.java#L126 is used to download models. Models are cached in a local directory on disk.
   - One could also rely on the Maven download plugin to download the models and cache them locally.
   - https://github.com/apache/opennlp/blob/6fde608cb0dd5866c6330f3e3dcd04f791c4ef96/opennlp-tools/src/test/java/opennlp/tools/EnabledWhenCDNAvailable.java#L38 only downloads files if you are online.
   
   I could also imagine building a special annotation that downloads the model files only in a CI/CD context, where it doesn't really matter if an additional 600 MB are downloaded ;-)


> Automatically generate document classifications map from model's config.json
> ----------------------------------------------------------------------------
>
>                 Key: OPENNLP-1384
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1384
>             Project: OpenNLP
>          Issue Type: Task
>          Components: Deep Learning
>    Affects Versions: 2.0.0
>            Reporter: Jeff Zemerick
>            Assignee: Jeff Zemerick
>            Priority: Major
>
> Automatically generate classifications map from model's config.json.
> Currently, the implementations utilizing ONNX Runtime require a Map that
> stores the model-assigned value along with the human readable name for each
> value. This map must be created manually:
> Map<Integer, String> classifications = new HashMap<>();
> classifications.put(0, "negative");
> classifications.put(1, "positive");
> How to create this map is determined by looking at the model's config.json
> file. This task is to have OpenNLP read the config.json file and make the map
> automatically instead of requiring the user to make it manually.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
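
For illustration, the map described in the issue could be derived roughly as follows. This is a minimal sketch, not the code from PR #553. It assumes the model's config.json follows the Hugging Face convention of a flat `id2label` object (for example `"id2label": {"0": "negative", "1": "positive"}`); the class name `Id2LabelSketch` and the method `readClassifications` are hypothetical. It uses regular expressions only to stay dependency-free, so it would break on labels containing `}` or escaped quotes; a real implementation would more likely use a JSON parser.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP.
class Id2LabelSketch {

  // Assumption: config.json holds a flat object under the key "id2label".
  private static final Pattern ID2LABEL =
      Pattern.compile("\"id2label\"\\s*:\\s*\\{([^}]*)\\}");

  // One entry inside that object: "<integer id>": "<label>"
  private static final Pattern ENTRY =
      Pattern.compile("\"(\\d+)\"\\s*:\\s*\"([^\"]*)\"");

  // Builds the id-to-label map from the raw text of a config.json file.
  // Returns an empty map if no "id2label" object is found.
  static Map<Integer, String> readClassifications(String configJson) {
    Map<Integer, String> classifications = new TreeMap<>();
    Matcher block = ID2LABEL.matcher(configJson);
    if (block.find()) {
      Matcher entry = ENTRY.matcher(block.group(1));
      while (entry.find()) {
        classifications.put(Integer.parseInt(entry.group(1)), entry.group(2));
      }
    }
    return classifications;
  }

  public static void main(String[] args) {
    String configJson =
        "{ \"model_type\": \"bert\", \"id2label\": { \"0\": \"negative\", \"1\": \"positive\" } }";
    // Prints {0=negative, 1=positive}
    System.out.println(readClassifications(configJson));
  }
}
```

The resulting map has the same shape as the manually built `Map<Integer, String> classifications` quoted above, so it could be passed wherever that map is currently required.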