Copilot commented on code in PR #3663:
URL: https://github.com/apache/solr/pull/3663#discussion_r2352968749

##########
solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc:
##########
@@ -0,0 +1,470 @@
+= Exercise: Sentiment Analysis with OpenNLP
+:experimental:
+:tabs-sync-option:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[exercise-opennlp]]
+== Exercise: Using OpenNLP and ONNX Models for Sentiment Analysis in Solr
+
+This tutorial demonstrates how to enhance Solr with advanced Natural Language Processing (NLP) capabilities through Apache OpenNLP and ONNX.
+You'll learn how to set up a sentiment analysis pipeline that automatically classifies documents during indexing.
+
+We are going to use the https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment model in the tutorial, however there are many others you can use.
+
+----
+is a bert-base-multilingual-uncased model finetuned for sentiment analysis on product reviews in
+six languages: English, Dutch, German, French, Spanish, and Italian.
+It predicts the sentiment of the review as a number of stars (between 1 and 5).
+----
+
+=== Prerequisites
+
+Before starting this tutorial, you'll need:
+
+* Apache Solr (version 10 or later)
+* The `analysis-extras` module enabled
+* Packages enabled in Solr (to allow you to upload the model files to the cluster)
+* At least 4GB of memory allocated to Solr
+
+=== Step 1: Start Solr with Required Modules
+
+To enable NLP processing in Solr, start Solr with the `analysis-extras` module and package support:
+
+[,console]
+----
+$ export SOLR_SECURITY_MANAGER_ENABLED=false
+$ bin/solr start -m 4g -Dsolr.modules=analysis-extras -Denable.packages=true
+----
+
+[NOTE]
+====
+We temporarily disable the security manager to allow loading of the ONNX runtime. In production environments, you would configure appropriate security policies instead.
+====
+
+=== Step 2: Download the Required Model Files
+
+For sentiment analysis, we need two essential files:
+
+1. An ONNX model file that contains the neural network
+2. A vocabulary file that maps tokens to IDs for the model
+
+Let's create a directory for our models and download them:
+
+[,console]
+----
+$ mkdir -p models/sentiment/
+$ wget -O models/sentiment/model.onnx https://huggingface.co/onnx-community/bert-base-multilingual-uncased-sentiment-ONNX/resolve/main/onnx/model_quantized.onnx
+$ wget -O models/sentiment/vocab.txt https://huggingface.co/onnx-community/bert-base-multilingual-uncased-sentiment-ONNX/raw/main/vocab.txt
+----
+
+.About ONNX Models
+[sidebar]
+****
+ONNX (Open Neural Network Exchange) is an open format for representing machine learning models. It allows models trained in different frameworks (like PyTorch, TensorFlow, or Hugging Face) to be exported to a standard format that can be used by various runtime environments.
+
+Learn more about ONNX at https://onnx.ai[onnx.ai^, role="external", window="_blank"].
+
+The model we're using is a multilingual BERT model fine-tuned for sentiment classification and quantized for better performance.
+It produces classifications on a 5-point scale from "very bad" to "very good".
+
+Learn more about ONNX at https://onnx.ai[onnx.ai^, role="external", window="_blank"].

Review Comment:
Duplicate line: this exact ONNX reference link appears both on line 80 and line 84. Remove the duplicate on line 84.
```suggestion

```
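One way to catch a repeat like this before pushing is to list every non-blank line that occurs more than once in the page. The sketch below is a hypothetical helper (`find_dup_lines` is not part of Solr's build tooling), assuming a POSIX shell with `grep -E`, `sort`, and `uniq`. It skips blank lines and AsciiDoc block delimiters such as `----` and `====`, which legitimately repeat. Other repeated markup (for example `[,console]`, which appears twice in this hunk) will still be reported, so treat the output as candidates to review rather than errors.

```shell
# find_dup_lines FILE
# Hypothetical helper: prints one copy of each non-blank line that occurs
# more than once in FILE. Blank lines and AsciiDoc block delimiters
# (----, ====, ****) are filtered out first because they are expected to repeat.
find_dup_lines() {
  grep -vE '^[[:space:]]*$|^(-{4,}|={4,}|\*{4,})$' "$1" | sort | uniq -d
}

# Usage against the page in this PR (path taken from the diff header):
# find_dup_lines solr/solr-ref-guide/modules/getting-started/pages/tutorial-opennlp.adoc
```

Run from the repository root, the `Learn more about ONNX ...` line should appear in the output until the duplicate is removed.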
