Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
github-actions[bot] commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-2060090828 This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the d...@solr.apache.org mailing list. Thank you for your contribution! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
risdenk commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1941952643
@cpoerschke https://github.com/apache/solr/pull/1510 might be helpful here. I have a few WIP PRs for newer JDKs.
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
cpoerschke commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1941906981
> ... It would be interesting to think about if there was a way for Solr `main` branch to somehow depend on the Lucene `main` branch release that jumps the minimum Java versions all around, and would allow this PR to be merged.

Technically I guess Solr `main` could continue to depend on whatever Lucene version and just jumping up the minimum Java version for Solr `main` to 17 would be sufficient? With all the ups-and-downs of `main` and `branch_9x` having different minimums.
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
cpoerschke commented on code in PR #1999: URL: https://github.com/apache/solr/pull/1999#discussion_r1488136630
## versions.props: ##
@@ -49,6 +49,7 @@ org.apache.httpcomponents:httpmime=4.5.14
org.apache.kerby:*=1.0.1
org.apache.logging.log4j:*=2.21.0
org.apache.lucene:*=9.9.2
Review Comment:
Temporarily within this pull request (pre-merge) we could change this to a Lucene 10 prerelease based on the _"Update Lucene prerelease"_ steps in https://github.com/apache/solr/blob/main/help/dependencies.txt ...

... might be worth waiting though until Lucene 9.10 is out and https://issues.apache.org/jira/browse/SOLR-17157 has upgraded Solr to use it i.e. then `solr/main` will be closer to `lucene/main` than it is right now.
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
epugh commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1941576718
Lucene 9 requires an older version of Java than the minimum version that OpenNLP requires. That means that this PR is pending a release of Lucene 10, and the adoption of Lucene 10 by Solr. It would be interesting to think about if there was a way for Solr `main` branch to somehow depend on the Lucene `main` branch release that jumps the minimum Java versions all around, and would allow this PR to be merged.
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
cpoerschke commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1898950454
> Feels like what we should be doing is having Solr 10 target Java 17 since Lucene 10 will require it, and then this code goes on Solr 10, but not on Solr 9. This lets us have some more time to experiment without dealing with the headaches of supporting an official release in the 9.x line (backcompat and the rest)??

I concur. Also a nice motivation for targeting Java 17 i.e. specific example of functionality that it would unlock. And in the meantime "independent plugin" approaches remain a possibility in the community, perhaps even in the https://github.com/apache/solr-sandbox if someone wanted to pursue that (haven't checked how that is built, just kinda "name dropping" `solr-sandbox` here).
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
epugh commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1898837417
Feels like what we should be doing is having Solr 10 target Java 17 since Lucene 10 will require it, and then this code goes on Solr 10, but not on Solr 9. This lets us have some more time to experiment without dealing with the headaches of supporting an official release in the 9.x line (backcompat and the rest)??
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
rzo1 commented on code in PR #1999: URL: https://github.com/apache/solr/pull/1999#discussion_r1457694995
## solr/licenses/onnxruntime-LICENSE-MIT.txt: ##
@@ -0,0 +1,21 @@
+MIT License
Review Comment:
The next OpenNLP release will use onnxruntime 1.16.3 (or higher).
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
rzo1 commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1898807122
> So if the classes here were built as an independent [plugin](https://solr.apache.org/guide/solr/latest/configuration-guide/solr-plugins.html) (with minimum Java17) and then deployed (with the relevant dependencies) into a Solr setup running Java17 with the original Solr artefacts (built with Java11) -- I wonder if that would work?

I guess this should work.
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
cpoerschke commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1898791814
> Looking at these build failures and the error message generated, it appears that it may be caused by us using Java 11 and OpenNLP being compiled with Java 17?? ... Is this a deal breaker for this PR?

Hmm, interesting. So we have:
* OpenNLP as minimum Java17 as you mention -- https://github.com/apache/opennlp/blob/opennlp-2.3.1/pom.xml#L167
* Lucene as minimum Java11 -- https://github.com/apache/lucene/blob/releases/lucene/9.9.1/build.gradle#L75-L76
* Solr as minimum Java11 -- https://github.com/apache/solr/blob/releases/solr/9.4.1/build.gradle#L88

So if the classes here were built as an independent [plugin](https://solr.apache.org/guide/solr/latest/configuration-guide/solr-plugins.html) (with minimum Java17) and then deployed (with the relevant dependencies) into a Solr setup running Java17 with the original Solr artefacts (built with Java11) -- I wonder if that would work?

Also noting that https://github.com/apache/lucene/pull/579 bumped Lucene to Java17 on `main` branch i.e. presumably then a future Lucene10 will be minimum Java17 version.
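The three minimums listed in this message can be combined mechanically: a deployment needs a runtime at least as new as the highest minimum among its components. The following is only a sketch of that rule. The class `MinimumJavaCheck` and its methods are hypothetical names, not part of Solr, Lucene or OpenNLP, and the check covers bytecode level only, so it does not answer whether the plugin approach would actually work.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Solr, Lucene or OpenNLP. */
final class MinimumJavaCheck {

  /** The effective minimum is the highest minimum among all components. */
  static int effectiveMinimum(Map<String, Integer> minimums) {
    return Collections.max(minimums.values());
  }

  /** True if a runtime with the given feature version satisfies every component. */
  static boolean canRun(int runtimeFeature, Map<String, Integer> minimums) {
    return runtimeFeature >= effectiveMinimum(minimums);
  }

  public static void main(String[] args) {
    // Minimum Java versions as listed in the message above.
    Map<String, Integer> minimums = new LinkedHashMap<>();
    minimums.put("OpenNLP 2.3.1", 17);
    minimums.put("Lucene 9.9.1", 11);
    minimums.put("Solr 9.4.1", 11);

    int runtime = Runtime.version().feature(); // feature version of the running JVM
    System.out.println("effective minimum: Java " + effectiveMinimum(minimums));
    System.out.println("this JVM (Java " + runtime + ") ok: " + canRun(runtime, minimums));
  }
}
```

With these inputs the effective minimum is 17, which matches the scenario described: artefacts built for Java 11 and a plugin built for Java 17, all running on a Java 17 JVM.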
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
epugh commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1883968479
Looking at these build failures and the error message generated, it appears that it may be caused by us using Java 11 and OpenNLP being compiled with Java 17??

```
/home/runner/work/solr/solr/solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java:39: error: cannot access InferenceOptions
import opennlp.dl.InferenceOptions;
  ^
  bad class file: /home/runner/.gradle/caches/modules-2/files-2.1/org.apache.opennlp/opennlp-dl/2.3.1/8ff28619e6a377fe467b47274f39fd1fc9b2c303/opennlp-dl-2.3.1.jar(/opennlp/dl/InferenceOptions.class)
    class file has wrong version 61.0, should be 55.0
    Please remove or make sure it appears in the correct subdirectory of the classpath.
/home/runner/work/solr/solr/solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java:40: error: cannot access DocumentCategorizerDL
```

Is this a deal breaker for this PR?
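The "wrong version 61.0, should be 55.0" line in the log refers to the class-file version stored in the header of every `.class` file: major version 55 corresponds to Java 11 and 61 to Java 17. As a rough sketch of how that number can be read, the hypothetical helper below (not part of Solr or OpenNLP) parses the first eight header bytes and maps the major version to a Java release with the usual `major - 44` rule.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for illustration; not part of Solr or OpenNLP. */
final class ClassFileVersion {

  /** Returns the major version stored in bytes 6-7 of a class file. */
  static int majorVersion(byte[] classBytes) {
    if (classBytes.length < 8) {
      throw new IllegalArgumentException("too short to be a class file");
    }
    ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like the class-file format
    if (buf.getInt() != 0xCAFEBABE) {
      throw new IllegalArgumentException("missing 0xCAFEBABE magic number");
    }
    buf.getShort(); // skip the minor version
    return buf.getShort() & 0xFFFF; // major version as an unsigned 16-bit value
  }

  /** Maps a major version to its Java release number (e.g. 55 to 11, 61 to 17). */
  static int javaRelease(int major) {
    return major - 44;
  }

  public static void main(String[] args) {
    // Header bytes only: magic number, minor version 0, major version 61.
    byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 61};
    int major = majorVersion(header);
    System.out.println(major + " -> Java " + javaRelease(major)); // prints "61 -> Java 17"
  }
}
```

Running the same check against `InferenceOptions.class` extracted from `opennlp-dl-2.3.1.jar` should, going by the log above, report major version 61.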
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
epugh commented on code in PR #1999: URL: https://github.com/apache/solr/pull/1999#discussion_r1446697110
## solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java: ##
@@ -0,0 +1,566 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.update.processor;
+
+import static org.apache.solr.common.SolrException.ErrorCode.SERVER_ERROR;
+
+import ai.onnxruntime.OrtException;
+import java.io.File;
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+import opennlp.dl.InferenceOptions;
+import opennlp.dl.doccat.DocumentCategorizerDL;
+import opennlp.dl.doccat.scoring.AverageClassificationScoringStrategy;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.common.util.Pair;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.filestore.FileStoreAPI;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.FieldMutatingUpdateProcessor.FieldNameSelector;
+import org.apache.solr.update.processor.FieldMutatingUpdateProcessorFactory.SelectorParams;
+import org.apache.solr.util.plugin.SolrCoreAware;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class DocumentCategorizerUpdateProcessorFactory extends UpdateRequestProcessorFactory
+    implements SolrCoreAware {
+
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String SOURCE_PARAM = "source";
+  public static final String DEST_PARAM = "dest";
+  public static final String PATTERN_PARAM = "pattern";
+  public static final String REPLACEMENT_PARAM = "replacement";
+  public static final String MODEL_PARAM = "modelFile";
+  public static final String VOCAB_PARAM = "vocabFile";
+
+  private Path solrHome;
+
+  private SelectorParams srcInclusions = new SelectorParams();
+  private Collection srcExclusions = new ArrayList<>();
+
+  private FieldNameSelector srcSelector = null;
+
+  private String model = null;
+  private String vocab = null;
+  private String analyzerFieldType = null;
+
+  /**
+   * If pattern is null, this this is a literal field name. If pattern is non-null then this is a
+   * replacement string that may contain meta-characters (ie: capture group identifiers)
+   *
+   * @see #pattern
+   */
+  private String dest = null;
+  /**
+   * @see #dest
+   */
+  private Pattern pattern = null;
+
+  protected final FieldNameSelector getSourceSelector() {
+    if (null != srcSelector) return srcSelector;
+
+    throw new SolrException(
+        SERVER_ERROR, "selector was never initialized, inform(SolrCore) never called???");
+  }
+
+  @Override
+  public void init(NamedList args) {
+
+    // high level (loose) check for which type of config we have.
+    //
+    // individual init methods do more strict syntax checking
+    if (0 <= args.indexOf(SOURCE_PARAM, 0) && 0 <= args.indexOf(DEST_PARAM, 0)) {
+      initSourceSelectorSyntax(args);
+    } else if (0 <= args.indexOf(PATTERN_PARAM, 0) && 0 <= args.indexOf(REPLACEMENT_PARAM, 0)) {
+      initSimpleRegexReplacement(args);
+    } else {
+      throw new SolrException(
+          SERVER_ERROR,
+          "A combination of either '"
+              + SOURCE_PARAM
+              + "' + '"
+              + DEST_PARAM
+              + "', or '"
+              + REPLACEMENT_PARAM
+              + "' + '"
+              + PATTERN_PARAM
+              + "' init params are mandatory");
+    }
+
+    Object modelParam = args.remove(MODEL_PARAM);
+    if
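The visible part of `init` in the quoted diff does a loose check for which of two configuration styles is present: `source` plus `dest`, or `pattern` plus `replacement`. A minimal standalone sketch of that decision follows. It is an approximation for illustration only: it uses a plain `java.util.Map` instead of Solr's `NamedList` and `IllegalArgumentException` instead of `SolrException`, and the class and enum names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the loose config check; not the PR's actual class. */
final class InitSyntaxCheck {

  enum Syntax {
    SOURCE_DEST,
    PATTERN_REPLACEMENT
  }

  /** Decides which configuration style the init params use, or fails if neither is complete. */
  static Syntax detect(Map<String, ?> args) {
    if (args.containsKey("source") && args.containsKey("dest")) {
      return Syntax.SOURCE_DEST;
    }
    if (args.containsKey("pattern") && args.containsKey("replacement")) {
      return Syntax.PATTERN_REPLACEMENT;
    }
    throw new IllegalArgumentException(
        "A combination of either 'source' + 'dest', or 'replacement' + 'pattern'"
            + " init params are mandatory");
  }

  public static void main(String[] args) {
    Map<String, Object> config = new HashMap<>();
    config.put("source", "text");
    config.put("dest", "sentiment");
    System.out.println(detect(config)); // prints "SOURCE_DEST"
  }
}
```

As in the quoted code, the `source`/`dest` pair is checked first, so a config that happens to contain all four params is treated as the source-selector style.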
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
epugh commented on code in PR #1999: URL: https://github.com/apache/solr/pull/1999#discussion_r1446696389
## solr/licenses/onnxruntime-LICENSE-MIT.txt: ##
@@ -0,0 +1,21 @@
+MIT License
Review Comment:
I *think* we don't specify the version of onnx, so maybe we poke them to update?
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
cpoerschke commented on code in PR #1999: URL: https://github.com/apache/solr/pull/1999#discussion_r1446447648
## solr/licenses/onnxruntime-LICENSE-MIT.txt: ##
@@ -0,0 +1,21 @@
+MIT License
Review Comment:
Partly answering my own question: OpenNLP 2.3.1 uses onnxruntime 1.15.0 - https://github.com/apache/opennlp/blame/opennlp-2.3.1/pom.xml#L176
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
cpoerschke commented on code in PR #1999: URL: https://github.com/apache/solr/pull/1999#discussion_r1446443017
## solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java: ##
@@ -0,0 +1,566 @@
[...]
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
cpoerschke commented on code in PR #1999: URL: https://github.com/apache/solr/pull/1999#discussion_r1446436416
## solr/licenses/onnxruntime-LICENSE-MIT.txt: ##
@@ -0,0 +1,21 @@
+MIT License
Review Comment:
While looking up https://github.com/microsoft/onnxruntime/blob/v1.15.0/LICENSE here I noticed there are 1.15.1 and 1.16.x releases now too, so I'm wondering about 1.15.0 vs. the later ones.
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
epugh commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1883364525
@cpoerschke when I demoed this code at the last community meetup, @gerlowskija asked why not commit it, and I didn't have a super great reason. I'd love your thoughts on this PR since you played some with ONNX as well. Is there anything here you think needs changing before it gets merged? I'd love to get the ONNX stuff in and unblock your work...
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
epugh commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1833889740
> Looks like a great first step! Glad that OpenNLP 2.3.1 helped move it along.

I did a community demo yesterday, and it went well. Having 2.3.1 meant I could remove some ugly moving of JARs! Which made the demo more compelling.
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
jzonthemtn commented on PR #1999: URL: https://github.com/apache/solr/pull/1999#issuecomment-1832486680
Looks like a great first step! Glad that OpenNLP 2.3.1 helped move it along.
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
jzonthemtn commented on code in PR #1999: URL: https://github.com/apache/solr/pull/1999#discussion_r1409711871 ## solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java: ## @@ -0,0 +1,566 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.update.processor; + +import static org.apache.solr.common.SolrException.ErrorCode.SERVER_ERROR; + +import ai.onnxruntime.OrtException; +import java.io.File; +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.regex.Matcher; +import java.util.regex.Pattern; +import java.util.regex.PatternSyntaxException; +import opennlp.dl.InferenceOptions; +import opennlp.dl.doccat.DocumentCategorizerDL; +import opennlp.dl.doccat.scoring.AverageClassificationScoringStrategy; +import org.apache.solr.common.SolrException; +import org.apache.solr.common.SolrInputDocument; +import org.apache.solr.common.SolrInputField; +import org.apache.solr.common.util.NamedList; +import org.apache.solr.common.util.Pair; +import org.apache.solr.core.SolrCore; +import org.apache.solr.filestore.PackageStoreAPI; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.update.AddUpdateCommand; +import org.apache.solr.update.processor.FieldMutatingUpdateProcessor.FieldNameSelector; +import org.apache.solr.update.processor.FieldMutatingUpdateProcessorFactory.SelectorParams; +import org.apache.solr.util.plugin.SolrCoreAware; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class DocumentCategorizerUpdateProcessorFactory extends UpdateRequestProcessorFactory +implements SolrCoreAware { + + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + public static final String SOURCE_PARAM = "source"; + public static final String DEST_PARAM = "dest"; + public static final String PATTERN_PARAM = "pattern"; + public static final String 
REPLACEMENT_PARAM = "replacement"; + public static final String MODEL_PARAM = "modelFile"; + public static final String VOCAB_PARAM = "vocabFile"; + + private Path solrHome; + + private SelectorParams srcInclusions = new SelectorParams(); + private Collection srcExclusions = new ArrayList<>(); + + private FieldNameSelector srcSelector = null; + + private String model = null; + private String vocab = null; + private String analyzerFieldType = null; + + /** + * If pattern is null, this this is a literal field name. If pattern is non-null then this is a + * replacement string that may contain meta-characters (ie: capture group identifiers) + * + * @see #pattern + */ + private String dest = null; + /** + * @see #dest + */ + private Pattern pattern = null; + + protected final FieldNameSelector getSourceSelector() { +if (null != srcSelector) return srcSelector; + +throw new SolrException( +SERVER_ERROR, "selector was never initialized, inform(SolrCore) never called???"); + } + + @Override + public void init(NamedList args) { + +// high level (loose) check for which type of config we have. +// +// individual init methods do more strict syntax checking +if (0 <= args.indexOf(SOURCE_PARAM, 0) && 0 <= args.indexOf(DEST_PARAM, 0)) { + initSourceSelectorSyntax(args); +} else if (0 <= args.indexOf(PATTERN_PARAM, 0) && 0 <= args.indexOf(REPLACEMENT_PARAM, 0)) { + initSimpleRegexReplacement(args); +} else { + throw new SolrException( + SERVER_ERROR, + "A combination of either '" + + SOURCE_PARAM + + "' + '" + + DEST_PARAM + + "', or '" + + REPLACEMENT_PARAM + + "' + '" + + PATTERN_PARAM + + "' init params are mandatory"); +} + +Object modelParam = args.remove(MODEL_PARAM); +
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
epugh commented on code in PR #1999: URL: https://github.com/apache/solr/pull/1999#discussion_r1406194193
## solr/packaging/test/test_opennlp.bats: ##
@@ -0,0 +1,110 @@
+#!/usr/bin/env bats
+
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+load bats_helper
+
+setup_file() {
+  common_clean_setup
+
+}
+
+teardown_file() {
+  common_setup
+  solr stop -all
+}
+
+setup() {
+  common_setup
+}
+
+teardown() {
+  # save a snapshot of SOLR_HOME for failed tests
+  save_home_on_failure
+}
+
+@test "Check lifecycle of sentiment classification" {
+
+  # GPU versions is linux and windows only, not OSX. So swap jars.
+  rm -f ${SOLR_TIP}/modules/analysis-extras/lib/onnxruntime_gpu-1.14.0.jar
Review Comment:
Thank you!!!
Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]
rzo1 commented on code in PR #1999: URL: https://github.com/apache/solr/pull/1999#discussion_r1405818295
## solr/packaging/test/test_opennlp.bats: ##
@@ -0,0 +1,110 @@
[...]
+@test "Check lifecycle of sentiment classification" {
+
+  # GPU versions is linux and windows only, not OSX. So swap jars.
+  rm -f ${SOLR_TIP}/modules/analysis-extras/lib/onnxruntime_gpu-1.14.0.jar
Review Comment:
Release is done. New artifacts should be available soon ;-) (not depending on gpu anymore)
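The comment in the quoted test says the GPU build of the ONNX runtime is available for Linux and Windows only, which is why the test removed `onnxruntime_gpu-1.14.0.jar` before running. Purely as an illustration of that platform rule, the sketch below derives the same decision from the `os.name` system property. The class and method names are hypothetical, and this is not how the PR or OpenNLP selects a runtime.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Solr, OpenNLP or onnxruntime. */
final class OnnxGpuSupport {

  /** Per the quoted test comment, the GPU build targets Linux and Windows only. */
  static boolean gpuBuildAvailable(String osName) {
    String os = osName.toLowerCase(Locale.ROOT);
    return os.contains("linux") || os.contains("windows");
  }

  public static void main(String[] args) {
    String os = System.getProperty("os.name"); // e.g. "Linux", "Mac OS X", "Windows 11"
    System.out.println(os + ": GPU build available = " + gpuBuildAvailable(os));
  }
}
```

According to the message above, the newer OpenNLP artifacts no longer depend on the GPU build, so a check like this would only matter for the older setup the test was written against.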