[ https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193017#comment-15193017 ]
Alessandro Benedetti commented on SOLR-8542:
--------------------------------------------

As I briefly discussed with Diego, here are some thoughts on how to include the training step in Solr as well. A simple integration could be:

1) Select a supported training library for linear SVM and one for LambdaMART (the libraries already suggested in the README could be a starting point).

2) Create an update request handler that accepts the training set (the format of the training set would be clearly described in the documentation, e.g. LETOR; a small sketch of that format is included at the end of this comment). The handler would take the training set file, plus the parameters supported by the selected library, and proceed with the training, using the library's default configuration where possible so that the user interaction stays as simple as possible. The handler would then extract the document features (revisiting the cache could be interesting here, to improve the re-use of extracted features).

3) The request handler would train the model by calling the selected library internally, using all the parameters provided. The generated model would be converted into the supported JSON format and stored in the model store.

This basic approach could be made as sophisticated as we want (for example, we could add flexibility in the choice of training library and make it easy to extend). A further step could be to add a layer of signal processing directly in Solr to build the training set as well: a sort of REST API that takes in input the document, the query id and the rating score, and automatically creates a training-set entry stored in some smart way. Then we could trigger the model generation on demand, or set up a schedule to refresh the model automatically. We could even take into account only certain time periods, store training data in different places, clean the training set automatically from time to time, etc. :)

Now I am going off topic, but there is a lot that can be done around training to ease the integration :) Happy to discuss and to get new ideas to improve the plugin, which I think is going to be really, really valuable for the Solr community.
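To make the LETOR reference in point 2) concrete, here is a minimal sketch of that training-set format: one judged (query, document) pair per line, consisting of a relevance label, the query id, and feature id:value pairs, optionally followed by a comment identifying the document. The feature ids and values below are invented purely for illustration:

3 qid:1 1:0.53 2:0.12 3:1 # docA, judged against query 1
0 qid:1 1:0.13 2:0.74 3:0 # docB, judged against query 1
2 qid:2 1:0.87 2:0.44 3:1 # docC, judged against query 2

The proposed update request handler would parse lines like these and pass them, together with any library-specific parameters, to the selected training library.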
> Integrate Learning to Rank into Solr
> ------------------------------------
>
>                 Key: SOLR-8542
>                 URL: https://issues.apache.org/jira/browse/SOLR-8542
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joshua Pantony
>            Assignee: Christine Poerschke
>            Priority: Minor
>         Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into Solr. Solr Learning to Rank (LTR) provides a way for you to extract features directly inside Solr for use in training a machine learned model. You can then deploy that model to Solr and use it to rerank your top X search results. This concept was previously presented by the authors at Lucene/Solr Revolution 2015 (http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached documentation as a github MD file, but are happy to convert to a desired format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. In order to test the plugin with the techproducts example, please follow these steps:
> h4. 1. compile Solr and the examples
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts
> h4. 3. stop it and install the plugin:
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json'
> h4. 6. have fun!
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true
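For readability, here is the same query from step 6 above written as a curl command with the rq local params decoded; this is only a sketch and relies on curl's -G/--data-urlencode to rebuild the URL encoding. The %27 quotes around the efi.query value are omitted here since the example value is a single term (they are typically only needed when the value contains spaces):

curl -G 'http://localhost:8983/solr/techproducts/query' \
     --data-urlencode 'q=test' \
     --data-urlencode 'rq={!ltr model=svm reRankDocs=25 efi.query=test}' \
     --data-urlencode 'fl=*,[features],price,score,name' \
     --data-urlencode 'fv=true' \
     --data-urlencode 'wt=json' \
     --data-urlencode 'indent=on'

The rq parameter invokes the ltr query parser to rerank the top 25 documents with the model named svm, passing efi.query as external feature information, while [features] in the fl list returns the extracted feature values alongside each document.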