rzo1 commented on code in PR #3663:
URL: https://github.com/apache/solr/pull/3663#discussion_r2355396199
##########
solr/solr-ref-guide/modules/configuration-guide/pages/update-request-processors.adoc:
##########
@@ -430,6 +432,10 @@ The
{solr-javadocs}/modules/analysis-extras/index.html[`analysis-extras`] module
{solr-javadocs}/modules/analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html[OpenNLPExtractNamedEntitiesUpdateProcessorFactory]:::
Update document(s) to be indexed with named entities extracted using an
OpenNLP NER model.
Note that in order to use model files larger than 1MB on SolrCloud, you must
xref:deployment-guide:zookeeper-ensemble#increasing-the-file-size-limit[configure
both ZooKeeper server and clients].
+{solr-javadocs}/modules/analysis-extras/org/apache/solr/update/processor/DocumentCategorizerUpdateProcessorFactory.html[DocumentCategorizerUpdateProcessorFactory]:::
Classify text in fields using models. These models can be sourced from
Huggingface and run directly in Solr using OpenNLP via {onnx}[ONNX].
Review Comment:
These models must be in **ONNX** format and can be ...
##########
solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc:
##########
@@ -0,0 +1,468 @@
+= Exercise: Sentiment Analysis with OpenNLP
+:experimental:
+:tabs-sync-option:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[exercise-opennlp]]
+== Exercise: Using OpenNLP and ONNX Models for Sentiment Analysis in Solr
+
+This tutorial demonstrates how to enhance Solr with advanced Natural Language
Processing (NLP) capabilities through Apache OpenNLP and ONNX.
+You'll learn how to set up a sentiment analysis pipeline that automatically
classifies documents during indexing.
+
+We are going to use the
https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model
in the tutorial, however there are many others you can use.
+
+----
+is a bert-base-multilingual-uncased model finetuned for sentiment analysis on
product reviews in
+six languages: English, Dutch, German, French, Spanish, and Italian.
+It predicts the sentiment of the review as a number of stars (between 1 and 5).
+----
+
+=== Prerequisites
+
+Before starting this tutorial, you'll need:
+
+* Apache Solr (version 10 or later)
+* The `analysis-extras` module enabled
+* Packages enabled in Solr (to allow you to upload the model files to the
cluster)
+* At least 4GB of memory allocated to Solr
+
+=== Step 1: Start Solr with Required Modules
+
+To enable NLP processing in Solr, start Solr with the `analysis-extras` module
and package support:
+
+[,console]
+----
+$ export SOLR_SECURITY_MANAGER_ENABLED=false
+$ bin/solr start -m 4g -Dsolr.modules=analysis-extras -Denable.packages=true
Review Comment:
Ah, it is done via properties ;-) Maybe just add the properties to the
prerequisites, using inline code highlighting?
##########
solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc:
##########
@@ -0,0 +1,468 @@
+= Exercise: Sentiment Analysis with OpenNLP
+:experimental:
+:tabs-sync-option:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[exercise-opennlp]]
+== Exercise: Using OpenNLP and ONNX Models for Sentiment Analysis in Solr
+
+This tutorial demonstrates how to enhance Solr with advanced Natural Language
Processing (NLP) capabilities through Apache OpenNLP and ONNX.
+You'll learn how to set up a sentiment analysis pipeline that automatically
classifies documents during indexing.
+
+We are going to use the
https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model
in the tutorial, however there are many others you can use.
+
+----
+is a bert-base-multilingual-uncased model finetuned for sentiment analysis on
product reviews in
+six languages: English, Dutch, German, French, Spanish, and Italian.
+It predicts the sentiment of the review as a number of stars (between 1 and 5).
+----
+
+=== Prerequisites
+
+Before starting this tutorial, you'll need:
+
+* Apache Solr (version 10 or later)
+* The `analysis-extras` module enabled
+* Packages enabled in Solr (to allow you to upload the model files to the
cluster)
+* At least 4GB of memory allocated to Solr
+
+=== Step 1: Start Solr with Required Modules
+
+To enable NLP processing in Solr, start Solr with the `analysis-extras` module
and package support:
+
+[,console]
+----
+$ export SOLR_SECURITY_MANAGER_ENABLED=false
+$ bin/solr start -m 4g -Dsolr.modules=analysis-extras -Denable.packages=true
+----
+
+[NOTE]
+====
+We temporarily disable the security manager to allow loading of the ONNX
runtime. In production environments, you would configure appropriate security
policies instead.
Review Comment:
Would it make sense to link to docs about security policies? Something like:
To learn more about it, follow the link...
##########
solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc:
##########
@@ -0,0 +1,468 @@
+= Exercise: Sentiment Analysis with OpenNLP
+:experimental:
+:tabs-sync-option:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[exercise-opennlp]]
+== Exercise: Using OpenNLP and ONNX Models for Sentiment Analysis in Solr
+
+This tutorial demonstrates how to enhance Solr with advanced Natural Language
Processing (NLP) capabilities through Apache OpenNLP and ONNX.
+You'll learn how to set up a sentiment analysis pipeline that automatically
classifies documents during indexing.
+
+We are going to use the
https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model
in the tutorial, however there are many others you can use.
+
+----
+is a bert-base-multilingual-uncased model finetuned for sentiment analysis on
product reviews in
+six languages: English, Dutch, German, French, Spanish, and Italian.
+It predicts the sentiment of the review as a number of stars (between 1 and 5).
+----
+
+=== Prerequisites
+
+Before starting this tutorial, you'll need:
+
+* Apache Solr (version 10 or later)
+* The `analysis-extras` module enabled
+* Packages enabled in Solr (to allow you to upload the model files to the
cluster)
Review Comment:
Maybe describe how to do that or point to a doc?
##########
solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc:
##########
@@ -0,0 +1,468 @@
+= Exercise: Sentiment Analysis with OpenNLP
+:experimental:
+:tabs-sync-option:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[exercise-opennlp]]
+== Exercise: Using OpenNLP and ONNX Models for Sentiment Analysis in Solr
+
+This tutorial demonstrates how to enhance Solr with advanced Natural Language
Processing (NLP) capabilities through Apache OpenNLP and ONNX.
+You'll learn how to set up a sentiment analysis pipeline that automatically
classifies documents during indexing.
+
+We are going to use the
https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model
in the tutorial, however there are many others you can use.
+
+----
+is a bert-base-multilingual-uncased model finetuned for sentiment analysis on
product reviews in
+six languages: English, Dutch, German, French, Spanish, and Italian.
+It predicts the sentiment of the review as a number of stars (between 1 and 5).
+----
+
+=== Prerequisites
+
+Before starting this tutorial, you'll need:
+
+* Apache Solr (version 10 or later)
+* The `analysis-extras` module enabled
Review Comment:
Maybe describe how to do that or point to a doc?
##########
solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc:
##########
@@ -0,0 +1,468 @@
+= Exercise: Sentiment Analysis with OpenNLP
+:experimental:
+:tabs-sync-option:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[exercise-opennlp]]
+== Exercise: Using OpenNLP and ONNX Models for Sentiment Analysis in Solr
+
+This tutorial demonstrates how to enhance Solr with advanced Natural Language
Processing (NLP) capabilities through Apache OpenNLP and ONNX.
+You'll learn how to set up a sentiment analysis pipeline that automatically
classifies documents during indexing.
+
+We are going to use the
https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model
in the tutorial, however there are many others you can use.
Review Comment:
Can this be a link reference in adoc instead of the plain text link?
##########
solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc:
##########
@@ -0,0 +1,468 @@
+= Exercise: Sentiment Analysis with OpenNLP
+:experimental:
+:tabs-sync-option:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[exercise-opennlp]]
+== Exercise: Using OpenNLP and ONNX Models for Sentiment Analysis in Solr
+
+This tutorial demonstrates how to enhance Solr with advanced Natural Language
Processing (NLP) capabilities through Apache OpenNLP and ONNX.
+You'll learn how to set up a sentiment analysis pipeline that automatically
classifies documents during indexing.
+
+We are going to use the
https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model
in the tutorial, however there are many others you can use.
+
+----
+is a bert-base-multilingual-uncased model finetuned for sentiment analysis on
product reviews in
+six languages: English, Dutch, German, French, Spanish, and Italian.
+It predicts the sentiment of the review as a number of stars (between 1 and 5).
+----
+
+=== Prerequisites
+
+Before starting this tutorial, you'll need:
+
+* Apache Solr (version 10 or later)
+* The `analysis-extras` module enabled
+* Packages enabled in Solr (to allow you to upload the model files to the
cluster)
+* At least 4GB of memory allocated to Solr
+
+=== Step 1: Start Solr with Required Modules
+
+To enable NLP processing in Solr, start Solr with the `analysis-extras` module
and package support:
+
+[,console]
+----
+$ export SOLR_SECURITY_MANAGER_ENABLED=false
+$ bin/solr start -m 4g -Dsolr.modules=analysis-extras -Denable.packages=true
+----
+
+[NOTE]
+====
+We temporarily disable the security manager to allow loading of the ONNX
runtime. In production environments, you would configure appropriate security
policies instead.
+====
+
+=== Step 2: Download the Required Model Files
+
+For sentiment analysis, we need two essential files:
+
+1. An ONNX model file that contains the neural network
+2. A vocabulary file that maps tokens to IDs for the model
+
+Let's create a directory for our models and download them:
+
+[,console]
Review Comment:
That will only work on Linux; OSX and Windows do not ship wget. Maybe state
in the prerequisites that this tutorial was written on Linux and that readers
might need to adjust the commands?
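A possible sketch of a more portable download step, in case it helps. The
helper name `fetch` is hypothetical and not part of the PR. It assumes a
POSIX-style shell (on Windows that would mean something like Git Bash or WSL)
and simply falls back from curl to wget:

```shell
# Hypothetical helper (not part of the PR): download URL ($1) to file ($2)
# using whichever of curl or wget is installed.
fetch() {
  if command -v curl >/dev/null 2>&1; then
    # -f: fail on HTTP errors, -L: follow redirects
    curl -fL --output "$2" "$1"
  elif command -v wget >/dev/null 2>&1; then
    wget -O "$2" "$1"
  else
    echo "Neither curl nor wget found; please download $1 manually." >&2
    return 1
  fi
}
```

The tutorial could then call `fetch <url> <target-file>` wherever it currently
calls wget directly, so the same commands work on systems that only have curl.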
##########
solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc:
##########
@@ -0,0 +1,468 @@
+= Exercise: Sentiment Analysis with OpenNLP
+:experimental:
+:tabs-sync-option:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[exercise-opennlp]]
+== Exercise: Using OpenNLP and ONNX Models for Sentiment Analysis in Solr
+
+This tutorial demonstrates how to enhance Solr with advanced Natural Language
Processing (NLP) capabilities through Apache OpenNLP and ONNX.
+You'll learn how to set up a sentiment analysis pipeline that automatically
classifies documents during indexing.
+
+We are going to use the
https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model
in the tutorial, however there are many others you can use.
+
+----
+is a bert-base-multilingual-uncased model finetuned for sentiment analysis on
product reviews in
+six languages: English, Dutch, German, French, Spanish, and Italian.
+It predicts the sentiment of the review as a number of stars (between 1 and 5).
+----
+
+=== Prerequisites
+
+Before starting this tutorial, you'll need:
+
+* Apache Solr (version 10 or later)
+* The `analysis-extras` module enabled
+* Packages enabled in Solr (to allow you to upload the model files to the
cluster)
+* At least 4GB of memory allocated to Solr
+
+=== Step 1: Start Solr with Required Modules
+
+To enable NLP processing in Solr, start Solr with the `analysis-extras` module
and package support:
+
+[,console]
+----
+$ export SOLR_SECURITY_MANAGER_ENABLED=false
+$ bin/solr start -m 4g -Dsolr.modules=analysis-extras -Denable.packages=true
+----
+
+[NOTE]
+====
+We temporarily disable the security manager to allow loading of the ONNX
runtime. In production environments, you would configure appropriate security
policies instead.
+====
+
+=== Step 2: Download the Required Model Files
+
+For sentiment analysis, we need two essential files:
+
+1. An ONNX model file that contains the neural network
+2. A vocabulary file that maps tokens to IDs for the model
+
+Let's create a directory for our models and download them:
+
+[,console]
+----
+$ mkdir -p models/sentiment/
Review Comment:
It is not really clear in which directory the folders need to be created.
Maybe define a variable at the beginning, so people can re-use it throughout
the tutorial?
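As a rough illustration of the variable idea (the name `MODEL_DIR` is only a
suggestion and does not appear in the PR; the default path mirrors the
`models/sentiment/` directory from the tutorial, resolved against the current
working directory):

```shell
# Hypothetical variable name; override it to store the models elsewhere.
MODEL_DIR="${MODEL_DIR:-$PWD/models/sentiment}"

# Create the directory (and any missing parents) once, up front.
mkdir -p "$MODEL_DIR"

# Later steps can refer to "$MODEL_DIR" instead of a relative path.
echo "Model files go in: $MODEL_DIR"
```

With an absolute path in one variable, later steps no longer depend on which
directory the reader happens to be in when running them.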
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]