[ https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193017#comment-15193017 ]
Alessandro Benedetti commented on SOLR-8542:
--------------------------------------------
Following up on a brief discussion with Diego about how to include the training step in Solr as well:
A simple integration could be:
1) select a supported training library for linear SVM and one for LambdaMART (the libraries already suggested in the README could be a starting point)
2) create an update request handler that accepts the training set (the format of the training set will be clearly described in the documentation, e.g. LETOR).
This update handler will take the training set file and the related parameters supported by the selected library and proceed with the training, using the default configuration parameters where possible so that user interaction stays as simple as possible.
The update handler will then extract the document features (revisiting the cache could be interesting here, to improve the recycling of feature extraction).
3) the update request handler will train the model by calling the selected library internally, using all the parameters provided. The generated model will be converted into the supported JSON format and stored in the model store.
This simple approach could be made as sophisticated as we want (we can add flexibility in the choice of library and make it easy to extend); a sketch of how such a handler could be fed follows below.
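To make items 2) and 3) a bit more concrete, here is a minimal sketch of how such a training handler could be fed. The handler path /update/ltr-train, its parameters and the file name are hypothetical (nothing of this exists in the patch yet); only the LETOR line format shown in the comments is standard:

# training_set.txt, LETOR format: <relevance> qid:<queryId> <featureId>:<value> ... # <docId>
#   3 qid:1 1:0.53 2:12.0 3:1.0 # MA147LL/A
#   0 qid:1 1:0.11 2:3.5 3:0.0 # F8V7067-APL-KIT

# hypothetical handler name and parameters, shown only to illustrate the idea
curl -XPOST 'http://localhost:8983/solr/techproducts/update/ltr-train?library=ranklib&algorithm=lambdamart&modelName=myModel' \
  --data-binary '@training_set.txt' -H 'Content-type:text/plain'

The handler would run the training with the selected library, convert the resulting model to the supported JSON format and store it in the model store, much like the manual upload in step 5 of the README below.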
A further step could be to add a layer of signal processing directly in Solr, to build the training set as well: a sort of REST API that takes as input the document id, query id and rating score, and automatically creates an entry of the training set, stored in some smart way (an illustrative call is sketched below).
Then we could trigger the model generation, or set up a schedule to refresh the model automatically.
We could even take into account only certain time periods, store training data in different places, clean the training set automatically from time to time, etc. :)
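Purely as an illustration of that REST API idea (the endpoint name and the JSON fields are invented here, nothing of this is in the patch), collecting one judgement could look like:

curl -XPOST 'http://localhost:8983/solr/techproducts/ltr/judgments' -H 'Content-type:application/json' \
  --data-binary '{"docId":"MA147LL/A","queryId":"q42","query":"ipod","rating":3}'

Each call would append one labelled example to the stored training set, which the scheduled job could then turn into a refreshed model.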
Now I am going off topic, but there is a lot that could be done around training to ease the integration :)
Happy to discuss it and to get new ideas to improve the plugin, which I think is going to be really, really valuable for the Solr community.
> Integrate Learning to Rank into Solr
> ------------------------------------
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
> Issue Type: New Feature
> Reporter: Joshua Pantony
> Assignee: Christine Poerschke
> Priority: Minor
> Attachments: README.md, README.md, SOLR-8542-branch_5x.patch,
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features
> directly inside Solr for use in training a machine learned model. You can
> then deploy that model to Solr and use it to rerank your top X search
> results. This concept was previously presented by the authors at Lucene/Solr
> Revolution 2015 (
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
> ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson,
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached
> documentation as a github MD file, but are happy to convert to a desired
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. In order to test the plugin with
> the techproducts example, please follow these steps:
> h4. 1. compile solr and the examples
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
>
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json'
> h4. 6. have fun!
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true