[ https://issues.apache.org/jira/browse/OPENNLP-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704779#comment-17704779 ]
ASF GitHub Bot commented on OPENNLP-1442:
-----------------------------------------

jzonthemtn commented on code in PR #523:
URL: https://github.com/apache/opennlp/pull/523#discussion_r1147990411

##########
opennlp-dl/src/main/java/opennlp/dl/AbstractDL.java:
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package opennlp.dl;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import ai.onnxruntime.OrtEnvironment;
+import ai.onnxruntime.OrtSession;
+
+import opennlp.tools.tokenize.Tokenizer;
+
+/**
+ * Base class for OpenNLP deep-learning classes using ONNX Runtime.
+ */
+public abstract class AbstractDL {
+
+  public static final String INPUT_IDS = "input_ids";
+  public static final String ATTENTION_MASK = "attention_mask";
+  public static final String TOKEN_TYPE_IDS = "token_type_ids";
+
+  protected OrtEnvironment env;
+  protected OrtSession session;
+  protected Tokenizer tokenizer;
+  protected Map<String, Integer> vocab;
+
+  /**
+   * Loads a vocabulary file from disk.
+   * @param vocabFile The vocabulary file.
+   * @return A map of vocabulary words to integer IDs.
+   * @throws IOException Thrown if the vocabulary file cannot be opened and read.
+   */
+  public Map<String, Integer> loadVocab(File vocabFile) throws IOException {
+
+    final Map<String, Integer> v = new HashMap<>();
+
+    BufferedReader br = new BufferedReader(new FileReader(vocabFile.getPath()));
+    String line = br.readLine();

Review Comment:
   I don't think so but I will check.

> Use ONNX Runtime to support sentence-transformers
> -------------------------------------------------
>
>                 Key: OPENNLP-1442
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1442
>             Project: OpenNLP
>          Issue Type: Task
>          Components: Deep Learning
>            Reporter: Jeff Zemerick
>            Assignee: Jeff Zemerick
>            Priority: Major
>
> Use ONNX Runtime to support sentence-transformers. OpenNLP should be able to
> generate embeddings using an ONNX model.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)