Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-04-16 Thread via GitHub


github-actions[bot] commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-2060090828

   This PR had no visible activity in the past 60 days, labeling it as stale. 
Any new activity will remove the stale label. To attract more reviewers, please 
tag someone or notify the d...@solr.apache.org mailing list. Thank you for your 
contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-02-13 Thread via GitHub


risdenk commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1941952643

   @cpoerschke https://github.com/apache/solr/pull/1510 might be helpful here. 
I have a few WIP PRs for newer JDKs.





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-02-13 Thread via GitHub


cpoerschke commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1941906981

   > ... It would be interesting to think about if there was a way for Solr 
`main` branch to somehow depend on the Lucene `main` branch release that jumps 
the minimum Java versions all around, and would allow this PR to be merged.
   
   Technically, I guess Solr `main` could continue to depend on whatever Lucene 
version it uses, and just bumping the minimum Java version for Solr `main` to 17 
would be sufficient? That would come with all the ups and downs of `main` and 
`branch_9x` having different minimums.





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-02-13 Thread via GitHub


cpoerschke commented on code in PR #1999:
URL: https://github.com/apache/solr/pull/1999#discussion_r1488136630


##
versions.props:
##
@@ -49,6 +49,7 @@ org.apache.httpcomponents:httpmime=4.5.14
 org.apache.kerby:*=1.0.1
 org.apache.logging.log4j:*=2.21.0
 org.apache.lucene:*=9.9.2

Review Comment:
   Temporarily within this pull request (pre-merge) we could change this to a 
Lucene 10 prerelease based on the _"Update Lucene prerelease"_ steps in 
https://github.com/apache/solr/blob/main/help/dependencies.txt ...
   
   ... might be worth waiting though until Lucene 9.10 is out and 
https://issues.apache.org/jira/browse/SOLR-17157 has upgraded Solr to use it 
i.e. then `solr/main` will be closer to `lucene/main` than it is right now.






Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-02-13 Thread via GitHub


epugh commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1941576718

   Lucene 9 requires an older version of Java than the minimum version that 
OpenNLP requires. That means that this PR is pending a release of Lucene 
10, and the adoption of Lucene 10 by Solr. It would be interesting to think 
about if there was a way for Solr `main` branch to somehow depend on the Lucene 
`main` branch release that jumps the minimum Java versions all around, and 
would allow this PR to be merged.





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-18 Thread via GitHub


cpoerschke commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1898950454

   > Feels like what we should be doing is having Solr 10 target Java 17 since 
Lucene 10 will require it, and then this code goes on Solr 10, but not on Solr 
9. This lets us have some more time to experiment without dealing with the 
headaches of supporting an official release in the 9.x line (backcompat and the 
rest)??
   
   I concur. It is also a nice motivation for targeting Java 17, i.e. a specific 
example of functionality that it would unlock. And in the meantime "independent plugin" 
approaches remain a possibility in the community, perhaps even in the 
https://github.com/apache/solr-sandbox if someone wanted to pursue that 
(haven't checked how that is built, just kinda "name dropping" `solr-sandbox` 
here).





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-18 Thread via GitHub


epugh commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1898837417

   Feels like what we should be doing is having Solr 10 target Java 17 since 
Lucene 10 will require it, and then this code goes on Solr 10, but not on Solr 
9. This lets us have some more time to experiment without dealing with the 
headaches of supporting an official release in the 9.x line (backcompat and the 
rest)??





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-18 Thread via GitHub


rzo1 commented on code in PR #1999:
URL: https://github.com/apache/solr/pull/1999#discussion_r1457694995


##
solr/licenses/onnxruntime-LICENSE-MIT.txt:
##
@@ -0,0 +1,21 @@
+MIT License

Review Comment:
   The next OpenNLP release will have onnxruntime 1.16.3 (or higher).






Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-18 Thread via GitHub


rzo1 commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1898807122

   > So if the classes here were built as an independent 
[plugin](https://solr.apache.org/guide/solr/latest/configuration-guide/solr-plugins.html)
 (with minimum Java17) and then deployed (with the relevant dependencies) into 
a Solr setup running Java17 with the original Solr artefacts (built with 
Java11) -- I wonder if that would work?
   
   I guess this should work.





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-18 Thread via GitHub


cpoerschke commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1898791814

   > Looking at these build failures and the error message generated, it 
appears that it may be caused by us using Java 11 and OpenNLP being compiled 
with Java 17?? ... Is this a deal breaker for this PR?
   
   Hmm, interesting. So we have:
   * OpenNLP as minimum Java17 as you mention -- 
https://github.com/apache/opennlp/blob/opennlp-2.3.1/pom.xml#L167
   * Lucene as minimum Java11 -- 
https://github.com/apache/lucene/blob/releases/lucene/9.9.1/build.gradle#L75-L76
   * Solr as minimum Java11 -- 
https://github.com/apache/solr/blob/releases/solr/9.4.1/build.gradle#L88
   
   So if the classes here were built as an independent 
[plugin](https://solr.apache.org/guide/solr/latest/configuration-guide/solr-plugins.html)
 (with minimum Java17) and then deployed (with the relevant dependencies) into 
a Solr setup running Java17 with the original Solr artefacts (built with 
Java11) -- I wonder if that would work?
   
   Also noting that https://github.com/apache/lucene/pull/579 bumped Lucene to 
Java17 on `main` branch i.e. presumably then a future Lucene10 will be minimum 
Java17 version.
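
To make the "independent plugin" idea above a little more concrete, here is a minimal sketch of a startup guard that such a plugin could run before touching any OpenNLP classes. Everything in it is an assumption for illustration: the class name `JavaVersionGuard`, the constant `REQUIRED_FEATURE`, and the error message are hypothetical and are not part of this PR or of Solr. It relies only on `Runtime.version().feature()`, which is available from Java 10 onwards.

```java
public class JavaVersionGuard {

  // Assumed minimum, matching the Java 17 requirement of OpenNLP 2.3.1 noted above.
  public static final int REQUIRED_FEATURE = 17;

  /** Returns true if the given Java feature release is new enough. */
  public static boolean isSupported(int runtimeFeature) {
    return runtimeFeature >= REQUIRED_FEATURE;
  }

  /** Fails fast with a readable message instead of a class loading error later on. */
  public static void requireSupported(int runtimeFeature) {
    if (!isSupported(runtimeFeature)) {
      throw new IllegalStateException(
          "Java " + REQUIRED_FEATURE + "+ is required, but the JVM reports Java " + runtimeFeature);
    }
  }

  public static void main(String[] args) {
    // Runtime.version().feature() returns e.g. 11 or 17 for the running JVM.
    int feature = Runtime.version().feature();
    System.out.println("Running on Java " + feature + ", supported: " + isSupported(feature));
  }
}
```

A guard like this only helps if the class containing it is itself compiled for the older release; a class compiled for Java 17 cannot be loaded by a Java 11 JVM in the first place.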
   
   
   
   





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-09 Thread via GitHub


epugh commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1883968479

   Looking at these build failures and the error message generated, it appears 
that it may be caused by us using Java 11 and OpenNLP being compiled with Java 
17??
   
   ```
   /home/runner/work/solr/solr/solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java:39: error: cannot access InferenceOptions
   import opennlp.dl.InferenceOptions;
                    ^
     bad class file: /home/runner/.gradle/caches/modules-2/files-2.1/org.apache.opennlp/opennlp-dl/2.3.1/8ff28619e6a377fe467b47274f39fd1fc9b2c303/opennlp-dl-2.3.1.jar(/opennlp/dl/InferenceOptions.class)
       class file has wrong version 61.0, should be 55.0
       Please remove or make sure it appears in the correct subdirectory of the classpath.
   /home/runner/work/solr/solr/solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java:40: error: cannot access DocumentCategorizerDL
   ```
   Is this a deal breaker for this PR?   
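
For context on the numbers in that error: class file major version 55 corresponds to Java 11 and 61 to Java 17, which is why a Java 11 compiler rejects the OpenNLP jar. The following is a minimal sketch of how the major version can be read from the first eight bytes of a class file. The class name `ClassFileVersion` and its helper methods are hypothetical and written only for illustration; the header bytes in `main` are constructed by hand rather than read from the actual `opennlp-dl` jar.

```java
import java.nio.ByteBuffer;

public class ClassFileVersion {

  /** Reads the major version from the first 8 bytes of a class file. */
  public static int majorVersion(byte[] header) {
    ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, like class files
    if (buf.remaining() < 8 || buf.getInt() != 0xCAFEBABE) {
      throw new IllegalArgumentException("not a class file header");
    }
    buf.getShort(); // skip the minor version
    return buf.getShort() & 0xFFFF; // major version as an unsigned 16-bit value
  }

  /** Maps a major version to the Java feature release (holds for Java 5 and later). */
  public static int javaRelease(int major) {
    return major - 44;
  }

  public static void main(String[] args) {
    // Hand-built header: magic number, minor version 0, major version 61.
    byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 61};
    int major = majorVersion(header);
    System.out.println("major=" + major + " -> Java " + javaRelease(major));
  }
}
```

In practice the same eight bytes could be taken from any `.class` entry inside a jar to check which Java release a dependency was compiled for.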





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-09 Thread via GitHub


epugh commented on code in PR #1999:
URL: https://github.com/apache/solr/pull/1999#discussion_r1446697110


##
solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java:
##
@@ -0,0 +1,566 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.update.processor;
+
+import static org.apache.solr.common.SolrException.ErrorCode.SERVER_ERROR;
+
+import ai.onnxruntime.OrtException;
+import java.io.File;
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+import opennlp.dl.InferenceOptions;
+import opennlp.dl.doccat.DocumentCategorizerDL;
+import opennlp.dl.doccat.scoring.AverageClassificationScoringStrategy;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.common.util.Pair;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.filestore.FileStoreAPI;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.FieldMutatingUpdateProcessor.FieldNameSelector;
+import org.apache.solr.update.processor.FieldMutatingUpdateProcessorFactory.SelectorParams;
+import org.apache.solr.util.plugin.SolrCoreAware;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class DocumentCategorizerUpdateProcessorFactory extends UpdateRequestProcessorFactory
+    implements SolrCoreAware {
+
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String SOURCE_PARAM = "source";
+  public static final String DEST_PARAM = "dest";
+  public static final String PATTERN_PARAM = "pattern";
+  public static final String REPLACEMENT_PARAM = "replacement";
+  public static final String MODEL_PARAM = "modelFile";
+  public static final String VOCAB_PARAM = "vocabFile";
+
+  private Path solrHome;
+
+  private SelectorParams srcInclusions = new SelectorParams();
+  private Collection<SelectorParams> srcExclusions = new ArrayList<>();
+
+  private FieldNameSelector srcSelector = null;
+
+  private String model = null;
+  private String vocab = null;
+  private String analyzerFieldType = null;
+
+  /**
+   * If pattern is null, this is a literal field name. If pattern is non-null then this is a
+   * replacement string that may contain meta-characters (ie: capture group identifiers)
+   *
+   * @see #pattern
+   */
+  private String dest = null;
+  /**
+   * @see #dest
+   */
+  private Pattern pattern = null;
+
+  protected final FieldNameSelector getSourceSelector() {
+    if (null != srcSelector) return srcSelector;
+
+    throw new SolrException(
+        SERVER_ERROR, "selector was never initialized, inform(SolrCore) never called???");
+  }
+
+  @Override
+  public void init(NamedList<?> args) {
+
+    // high level (loose) check for which type of config we have.
+    //
+    // individual init methods do more strict syntax checking
+    if (0 <= args.indexOf(SOURCE_PARAM, 0) && 0 <= args.indexOf(DEST_PARAM, 0)) {
+      initSourceSelectorSyntax(args);
+    } else if (0 <= args.indexOf(PATTERN_PARAM, 0) && 0 <= args.indexOf(REPLACEMENT_PARAM, 0)) {
+      initSimpleRegexReplacement(args);
+    } else {
+      throw new SolrException(
+          SERVER_ERROR,
+          "A combination of either '"
+              + SOURCE_PARAM
+              + "' + '"
+              + DEST_PARAM
+              + "', or '"
+              + REPLACEMENT_PARAM
+              + "' + '"
+              + PATTERN_PARAM
+              + "' init params are mandatory");
+    }
+
+    Object modelParam = args.remove(MODEL_PARAM);
+if 

Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-09 Thread via GitHub


epugh commented on code in PR #1999:
URL: https://github.com/apache/solr/pull/1999#discussion_r1446696389


##
solr/licenses/onnxruntime-LICENSE-MIT.txt:
##
@@ -0,0 +1,21 @@
+MIT License

Review Comment:
   I *think* we don't specify the version of onnx, so maybe we poke them to 
update?






Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-09 Thread via GitHub


cpoerschke commented on code in PR #1999:
URL: https://github.com/apache/solr/pull/1999#discussion_r1446447648


##
solr/licenses/onnxruntime-LICENSE-MIT.txt:
##
@@ -0,0 +1,21 @@
+MIT License

Review Comment:
   part-answering own question: OpenNLP 2.3.1 uses onnxruntime 1.15.0 - 
https://github.com/apache/opennlp/blame/opennlp-2.3.1/pom.xml#L176






Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-09 Thread via GitHub


cpoerschke commented on code in PR #1999:
URL: https://github.com/apache/solr/pull/1999#discussion_r1446443017


##
solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java:
##
@@ -0,0 +1,566 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.update.processor;
+
+import static org.apache.solr.common.SolrException.ErrorCode.SERVER_ERROR;
+
+import ai.onnxruntime.OrtException;
+import java.io.File;
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+import opennlp.dl.InferenceOptions;
+import opennlp.dl.doccat.DocumentCategorizerDL;
+import opennlp.dl.doccat.scoring.AverageClassificationScoringStrategy;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.common.util.Pair;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.filestore.FileStoreAPI;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.FieldMutatingUpdateProcessor.FieldNameSelector;
+import org.apache.solr.update.processor.FieldMutatingUpdateProcessorFactory.SelectorParams;
+import org.apache.solr.util.plugin.SolrCoreAware;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class DocumentCategorizerUpdateProcessorFactory extends UpdateRequestProcessorFactory
+    implements SolrCoreAware {
+
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String SOURCE_PARAM = "source";
+  public static final String DEST_PARAM = "dest";
+  public static final String PATTERN_PARAM = "pattern";
+  public static final String REPLACEMENT_PARAM = "replacement";
+  public static final String MODEL_PARAM = "modelFile";
+  public static final String VOCAB_PARAM = "vocabFile";
+
+  private Path solrHome;
+
+  private SelectorParams srcInclusions = new SelectorParams();
+  private Collection<SelectorParams> srcExclusions = new ArrayList<>();
+
+  private FieldNameSelector srcSelector = null;
+
+  private String model = null;
+  private String vocab = null;
+  private String analyzerFieldType = null;
+
+  /**
+   * If pattern is null, this is a literal field name. If pattern is non-null then this is a
+   * replacement string that may contain meta-characters (ie: capture group identifiers)
+   *
+   * @see #pattern
+   */
+  private String dest = null;
+  /**
+   * @see #dest
+   */
+  private Pattern pattern = null;
+
+  protected final FieldNameSelector getSourceSelector() {
+    if (null != srcSelector) return srcSelector;
+
+    throw new SolrException(
+        SERVER_ERROR, "selector was never initialized, inform(SolrCore) never called???");
+  }
+
+  @Override
+  public void init(NamedList<?> args) {
+
+    // high level (loose) check for which type of config we have.
+    //
+    // individual init methods do more strict syntax checking
+    if (0 <= args.indexOf(SOURCE_PARAM, 0) && 0 <= args.indexOf(DEST_PARAM, 0)) {
+      initSourceSelectorSyntax(args);
+    } else if (0 <= args.indexOf(PATTERN_PARAM, 0) && 0 <= args.indexOf(REPLACEMENT_PARAM, 0)) {
+      initSimpleRegexReplacement(args);
+    } else {
+      throw new SolrException(
+          SERVER_ERROR,
+          "A combination of either '"
+              + SOURCE_PARAM
+              + "' + '"
+              + DEST_PARAM
+              + "', or '"
+              + REPLACEMENT_PARAM
+              + "' + '"
+              + PATTERN_PARAM
+              + "' init params are mandatory");
+    }
+
+    Object modelParam = args.remove(MODEL_PARAM);
+

Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-09 Thread via GitHub


cpoerschke commented on code in PR #1999:
URL: https://github.com/apache/solr/pull/1999#discussion_r1446436416


##
solr/licenses/onnxruntime-LICENSE-MIT.txt:
##
@@ -0,0 +1,21 @@
+MIT License

Review Comment:
   Looking up https://github.com/microsoft/onnxruntime/blob/v1.15.0/LICENSE 
here I noticed there's a 1.15.1 and 1.16.x now too, wondering about 1.15.0 vs. 
the later ones then.






Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2024-01-09 Thread via GitHub


epugh commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1883364525

   @cpoerschke when I demoed this code at the last community meetup, 
@gerlowskija asked why not commit it, and I didn't have a super great 
reason. I'd love your thoughts on this PR since you played some with ONNX as 
well. Is there anything here you think needs changing before it gets merged? 
I'd love to get the ONNX stuff in and unblock your work...





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2023-11-30 Thread via GitHub


epugh commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1833889740

   > Looks like a great first step! Glad that OpenNLP 2.3.1 helped move it 
along.
   
   I did a community demo yesterday, and it went well. Having 2.3.1 meant I 
could remove some ugly moving of JARs, which made the demo more compelling.





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2023-11-29 Thread via GitHub


jzonthemtn commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-1832486680

   Looks like a great first step! Glad that OpenNLP 2.3.1 helped move it along.





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2023-11-29 Thread via GitHub


jzonthemtn commented on code in PR #1999:
URL: https://github.com/apache/solr/pull/1999#discussion_r1409711871


##
solr/modules/analysis-extras/src/java/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.java:
##
@@ -0,0 +1,566 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.update.processor;
+
+import static org.apache.solr.common.SolrException.ErrorCode.SERVER_ERROR;
+
+import ai.onnxruntime.OrtException;
+import java.io.File;
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+import opennlp.dl.InferenceOptions;
+import opennlp.dl.doccat.DocumentCategorizerDL;
+import opennlp.dl.doccat.scoring.AverageClassificationScoringStrategy;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.common.util.Pair;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.filestore.PackageStoreAPI;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.FieldNameSelector;
+import 
org.apache.solr.update.processor.FieldMutatingUpdateProcessorFactory.SelectorParams;
+import org.apache.solr.util.plugin.SolrCoreAware;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class DocumentCategorizerUpdateProcessorFactory extends 
UpdateRequestProcessorFactory
+implements SolrCoreAware {
+
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String SOURCE_PARAM = "source";
+  public static final String DEST_PARAM = "dest";
+  public static final String PATTERN_PARAM = "pattern";
+  public static final String REPLACEMENT_PARAM = "replacement";
+  public static final String MODEL_PARAM = "modelFile";
+  public static final String VOCAB_PARAM = "vocabFile";
+
+  private Path solrHome;
+
+  private SelectorParams srcInclusions = new SelectorParams();
+  private Collection srcExclusions = new ArrayList<>();
+
+  private FieldNameSelector srcSelector = null;
+
+  private String model = null;
+  private String vocab = null;
+  private String analyzerFieldType = null;
+
+  /**
+   * If pattern is null, this is a literal field name. If pattern is non-null then this is a
+   * replacement string that may contain meta-characters (i.e. capture group identifiers)
+   *
+   * @see #pattern
+   */
+  private String dest = null;
+  /**
+   * @see #dest
+   */
+  private Pattern pattern = null;
+
+  protected final FieldNameSelector getSourceSelector() {
+if (null != srcSelector) return srcSelector;
+
+throw new SolrException(
+SERVER_ERROR, "selector was never initialized, inform(SolrCore) never called???");
+  }
+
+  @Override
+  public void init(NamedList args) {
+
+// high level (loose) check for which type of config we have.
+//
+// individual init methods do more strict syntax checking
+if (0 <= args.indexOf(SOURCE_PARAM, 0) && 0 <= args.indexOf(DEST_PARAM, 0)) {
+  initSourceSelectorSyntax(args);
+} else if (0 <= args.indexOf(PATTERN_PARAM, 0) && 0 <= args.indexOf(REPLACEMENT_PARAM, 0)) {
+  initSimpleRegexReplacement(args);
+} else {
+  throw new SolrException(
+  SERVER_ERROR,
+  "A combination of either '"
+  + SOURCE_PARAM
+  + "' + '"
+  + DEST_PARAM
+  + "', or '"
+  + REPLACEMENT_PARAM
+  + "' + '"
+  + PATTERN_PARAM
+  + "' init params are mandatory");
+}
+
+Object modelParam = args.remove(MODEL_PARAM);
+  
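
The loose config check in init() above can be sketched on its own as follows. This is only an illustration under stated assumptions: a plain Map stands in for Solr's NamedList, IllegalArgumentException stands in for SolrException, and the ConfigSyntaxCheck class, the Syntax enum and detectSyntax() are hypothetical names that do not appear in the PR.

```java
import java.util.Map;

// Hypothetical stand-in for the check in init(NamedList): a plain Map replaces
// Solr's NamedList and IllegalArgumentException replaces SolrException.
class ConfigSyntaxCheck {
  static final String SOURCE_PARAM = "source";
  static final String DEST_PARAM = "dest";
  static final String PATTERN_PARAM = "pattern";
  static final String REPLACEMENT_PARAM = "replacement";

  // The two config styles the quoted init() method distinguishes.
  enum Syntax {
    SOURCE_SELECTOR,
    SIMPLE_REGEX
  }

  // Loose check only: it looks at which keys are present, not at their values.
  static Syntax detectSyntax(Map<String, ?> args) {
    if (args.containsKey(SOURCE_PARAM) && args.containsKey(DEST_PARAM)) {
      return Syntax.SOURCE_SELECTOR;
    }
    if (args.containsKey(PATTERN_PARAM) && args.containsKey(REPLACEMENT_PARAM)) {
      return Syntax.SIMPLE_REGEX;
    }
    throw new IllegalArgumentException(
        "A combination of either '" + SOURCE_PARAM + "' + '" + DEST_PARAM
            + "', or '" + REPLACEMENT_PARAM + "' + '" + PATTERN_PARAM
            + "' init params are mandatory");
  }

  public static void main(String[] unused) {
    // Example field names are made up for this sketch.
    System.out.println(detectSyntax(Map.of("source", "text", "dest", "sentiment")));
    System.out.println(detectSyntax(Map.of("pattern", "(.*)_txt", "replacement", "$1_cat")));
  }
}
```

As in the quoted code, the source/dest pair is tested first, so a config that supplies all four params is treated as the source-selector style.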

Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2023-11-27 Thread via GitHub


epugh commented on code in PR #1999:
URL: https://github.com/apache/solr/pull/1999#discussion_r1406194193


##
solr/packaging/test/test_opennlp.bats:
##
@@ -0,0 +1,110 @@
+#!/usr/bin/env bats
+
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+load bats_helper
+
+setup_file() {
+  common_clean_setup
+  
+}
+
+teardown_file() {
+  common_setup
+  solr stop -all
+}
+
+setup() {
+  common_setup
+}
+
+teardown() {
+  # save a snapshot of SOLR_HOME for failed tests
+  save_home_on_failure
+}
+
+@test "Check lifecycle of sentiment classification" {
+  
+  # The GPU version is Linux and Windows only, not OSX.  So swap jars.
+  rm -f ${SOLR_TIP}/modules/analysis-extras/lib/onnxruntime_gpu-1.14.0.jar

Review Comment:
   Thank you!!!





Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2023-11-27 Thread via GitHub


rzo1 commented on code in PR #1999:
URL: https://github.com/apache/solr/pull/1999#discussion_r1405818295


##
solr/packaging/test/test_opennlp.bats:
##
@@ -0,0 +1,110 @@
+#!/usr/bin/env bats
+
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+load bats_helper
+
+setup_file() {
+  common_clean_setup
+  
+}
+
+teardown_file() {
+  common_setup
+  solr stop -all
+}
+
+setup() {
+  common_setup
+}
+
+teardown() {
+  # save a snapshot of SOLR_HOME for failed tests
+  save_home_on_failure
+}
+
+@test "Check lifecycle of sentiment classification" {
+  
+  # The GPU version is Linux and Windows only, not OSX.  So swap jars.
+  rm -f ${SOLR_TIP}/modules/analysis-extras/lib/onnxruntime_gpu-1.14.0.jar

Review Comment:
   Release is done. New artifacts should be available soon ;-) (not depending on gpu anymore)


