[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

ASF GitHub Bot (JIRA) Wed, 09 Mar 2016 02:35:05 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186931#comment-15186931
 ]


ASF GitHub Bot commented on SOLR-8542:
--------------------------------------

Github user alessandrobenedetti commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/4#discussion_r55499494
  
    --- Diff: solr/contrib/ltr/README.txt ---
    @@ -0,0 +1,330 @@
    +Apache Solr Learning to Rank
    +========
    +
    +This is the main repository for [Learning to Rank integrated into Solr](http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp).
    +[Read up on learning to rank](https://en.wikipedia.org/wiki/Learning_to_rank).
    +
    +Apache Solr Learning to Rank (LTR) provides a way for you to extract features
    +directly inside Solr for use in training a machine-learned model. You can then
    +deploy that model to Solr and use it to rerank your top X search results.
    +
    +
    +# Changes to solrconfig.xml
    +```xml
    +<config>
    +  ...
    +
    +  <!-- Query parser used to rerank top docs with a provided model -->
    +  <queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" />
    +
    +  <!--  Transformer that will encode the document features in the response.
    +  For each document the transformer will add the features as an extra field
    +  in the response. The name of the field will be the name of the
    +  transformer enclosed in brackets (in this case [features]).
    +  In order to get the feature vector you will have to
    +  specify that you want the field (e.g., fl="*,[features]")  -->
    +  <transformer name="features" class="org.apache.solr.ltr.ranking.LTRFeatureLoggerTransformerFactory" />
    +
    +
    +  <!-- Component that hooks up managed resources for features and models -->
    +  <searchComponent name="ltrComponent" class="org.apache.solr.ltr.ranking.LTRComponent"/>
    +  <requestHandler name="/query" class="solr.SearchHandler">
    +    <lst name="defaults">
    +      <str name="echoParams">explicit</str>
    +      <str name="wt">json</str>
    +      <str name="indent">true</str>
    +      <str name="df">id</str>
    +    </lst>
    +    <arr name="last-components">
    +      <!-- Use the component in your requestHandler -->
    +      <str>ltrComponent</str>
    +    </arr>
    +  </requestHandler>
    +
    +  <query>
    +    ...
    +
    +    <!-- Cache for storing and fetching feature vectors -->
    +    <cache name="QUERY_DOC_FV"
    +      class="solr.search.LRUCache"
    +      size="4096"
    +      initialSize="2048"
    +      autowarmCount="4096"
    +      regenerator="solr.search.NoOpRegenerator" />
    +  </query>
    +
    +</config>
    +
    +```
    +
    +
    +# Build the plugin
    +In the solr/contrib/ltr directory run
    +`ant dist`
    +
    +# Install the plugin
    +In your Solr installation, navigate to your collection's lib directory.
    +In the Solr install example, it would be solr/collection1/lib.
    +If lib doesn't exist, create it, and then copy the plugin's jar there.
    +
    +`cp lucene-solr/solr/dist/solr-ltr-X.Y.Z-SNAPSHOT.jar mySolrInstallPath/solr/myCollection/lib`
    +
    +Restart your collection using the admin page and you are good to go.
    +You can find more detailed instructions on the [SolrPlugins wiki page](https://wiki.apache.org/solr/SolrPlugins).
    +
    +
    +# Defining Features
    +In the learning to rank plugin, you can define features in a feature space
    +using standard Solr queries. As an example:
    +
    +###### features.json
    +```json
    +[
    +{ "name": "isBook",
    +  "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
    +  "params":{ "fq": ["{!terms f=category}book"] }
    +},
    +{
    +  "name":  "documentRecency",
    +  "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
    +  "params": {
    +      "q": "{!func}recip( ms(NOW,publish_date), 3.16e-11, 1, 1)"
    +  }
    +},
    +{
    +  "name":"originalScore",
    +  "type":"org.apache.solr.ltr.feature.impl.OriginalScoreFeature",
    +  "params":{}
    +},
    +{
    +  "name" : "userTextTitleMatch",
    +  "type" : "org.apache.solr.ltr.feature.impl.SolrFeature",
    +  "params" : { "q" : "{!field f=title}${user_text}" }
    +}
    +]
    +```
    +
    +This defines four features. Any valid Solr query can be used to define
    +a feature.
    +
    +### Filter Query Features
    +The first feature, isBook, fires if the term 'book' matches the category field
    +of the document being examined. Since no q was specified for this feature,
    +it returns a score of 1 on a match and 0 otherwise.
    +
    +### Query Features
    +In the second feature (documentRecency), q was specified using a function query.
    +In this case the score for the feature on a given document is whatever the query
    +returns (1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated
    +2 years ago, etc.). If both an fq and a q are used, documents that don't match
    +the fq will receive a score of 0 for the documentRecency feature; all other
    +documents will receive the score specified by the query for this feature.
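The documentRecency values quoted above can be checked with a small Python sketch. It assumes Solr's documented definition recip(x, m, a, b) = a / (m*x + b) and a 365.25-day year; the helper names are hypothetical and not part of the plugin. It also encodes the fq/q rule from the paragraph above.

```python
# Constant from the documentRecency feature: roughly 1 / (milliseconds per year).
M = 3.16e-11
# Assumption: a 365.25-day year, expressed in milliseconds (~3.156e10).
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr's recip(x, m, a, b) function query: a / (m * x + b)."""
    return a / (m * x + b)


def document_recency(age_years: float) -> float:
    """Approximate recip(ms(NOW,publish_date), 3.16e-11, 1, 1) for a document age."""
    age_ms = age_years * MS_PER_YEAR
    return recip(age_ms, M, 1.0, 1.0)


def feature_value(matches_fq: bool, q_score: float) -> float:
    """Feature with both fq and q: 0 unless the fq matches, otherwise the q score."""
    return q_score if matches_fq else 0.0


if __name__ == "__main__":
    for years in (0, 1, 2):
        print(f"{years} year(s) old -> {document_recency(years):.3f}")
```

Run as a script, this prints roughly 1.000, 0.501 and 0.334, matching the 1, 1/2, 1/3 pattern described above.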
    +
    +### Original Score Feature
    +The third feature (originalScore) has no parameters, and uses the
    +OriginalScoreFeature class instead of the SolrFeature class. Its purpose is
    +simply to return the score of the original search request for the current
    +matching document.
    +
    +### External Features
    +Users can specify external information that can be passed in as
    +part of the query to the LTR ranking framework. In this case, the
    +fourth feature (userTextTitleMatch) will look for an external field
    +called 'user_text' passed in through the request, and will fire if there is
    +a term match for the document field 'title' from the value of the external
    +field 'user_text'. See the "Run a Rerank Query" section for how
    +to pass in external information.
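To make the ${user_text} mechanism concrete, here is a Python sketch of the substitution step. It assumes the plugin replaces each ${name} placeholder with the value of the matching efi.name request parameter, as the feature and request examples in this README suggest. Python's string.Template happens to use the same ${name} syntax. The function names are hypothetical, and the plugin's real behaviour (for example, when a value is missing) may differ.

```python
from string import Template


def efi_from_params(params: dict) -> dict:
    """Collect efi.<name>=<value> request parameters into {name: value}."""
    prefix = "efi."
    return {k[len(prefix):]: v for k, v in params.items() if k.startswith(prefix)}


def resolve_feature_query(template: str, efi: dict) -> str:
    """Fill ${name} placeholders in a feature's query with the efi values.

    Template.substitute raises KeyError if the feature references an efi
    value that was not passed in.
    """
    return Template(template).substitute(efi)


params = {"q": "test", "efi.user_text": "Casablanca", "efi.user_intent": "movie"}
efi = efi_from_params(params)
print(resolve_feature_query("{!field f=title}${user_text}", efi))
```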
    +
    +### Custom Features
    +Custom features can be created by extending
    +org.apache.solr.ltr.ranking.Feature; however, this is generally not recommended.
    +Most features can be created using the methods described above.
    +
    +# Defining Models
    +Currently the Learning to Rank plugin supports 2 main types of
    +ranking models: [Ranking SVM](http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf)
    +and [LambdaMART](http://research.microsoft.com/pubs/132652/MSR-TR-2010-82.pdf).
    +
    +### Ranking SVM
    +Currently only a linear Ranking SVM is supported. Use LambdaMART for
    +a non-linear model. To introduce a bias, define a constant feature whose
    +value is the bias you want and give that feature a weight of 1.0.
    +
    +###### model.json
    +```json
    +{
    +    "type":"org.apache.solr.ltr.ranking.RankSVMModel",
    +    "name":"myModelName",
    +    "features":[
    +        { "name": "userTextTitleMatch"},
    +        { "name": "originalScore"},
    +        { "name": "isBook"}
    +    ],
    +    "params":{
    +        "weights": {
    +            "userTextTitleMatch": 1.0,
    +            "originalScore": 0.5,
    +            "isBook": 0.1
    +        }
    +
    +    }
    +}
    +```
    +
    +This is an example of a toy Ranking SVM model. Type specifies the class used
    +to interpret the model (RankSVMModel in the case of Ranking SVM).
    +Name is the model identifier you will use when making requests to the LTR
    +framework. Features specifies the feature space that you want extracted
    +when using this model. All features that appear in the model params will
    +be used for scoring and must appear in the features list. You can add
    +extra features to the features list that will be computed but not used in the
    +model for scoring, which can be useful for logging.
    +Params are the Ranking SVM parameters.
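A linear Ranking SVM model scores a document as the weighted sum of its feature values. The Python sketch below applies the toy weights from model.json. The feature values and the extra "bias" feature are made-up illustrations, and treating a weighted feature missing from a document as 0.0 is an assumption, not documented plugin behaviour.

```python
# Weights copied from the toy model.json above.
WEIGHTS = {"userTextTitleMatch": 1.0, "originalScore": 0.5, "isBook": 0.1}


def linear_score(features: dict, weights: dict) -> float:
    """Weighted sum of feature values.

    Only features that have a weight contribute, so logging-only features in the
    feature list do not affect the score. A weighted feature missing from
    `features` is treated as 0.0 (an assumption made for this sketch).
    """
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


# Hypothetical feature vector; documentRecency is extracted but has no weight.
doc = {"userTextTitleMatch": 1.0, "originalScore": 4.0, "isBook": 1.0, "documentRecency": 0.5}
print(linear_score(doc, WEIGHTS))  # 1.0*1.0 + 0.5*4.0 + 0.1*1.0

# Bias as described earlier: a hypothetical constant feature "bias" whose value
# is the desired bias, given a weight of 1.0.
weights_with_bias = {**WEIGHTS, "bias": 1.0}
print(linear_score({**doc, "bias": -0.5}, weights_with_bias))
```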
    +
    +Good libraries for training SVMs are liblinear (https://www.csie.ntu.edu.tw/~cjlin/liblinear/)
    +and libSVM (https://www.csie.ntu.edu.tw/~cjlin/libsvm/). You will need to convert the
    +libSVM model format to the format specified above.
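The following Python sketch shows one way to convert a two-class model trained with liblinear. It assumes liblinear's plain-text model layout: header lines such as nr_feature and bias, then a line containing only w, then one weight per feature index. It also assumes that feature index i corresponds to the i-th entry in a feature_names list you maintain yourself, because liblinear files carry no feature names. The model name is a placeholder. Check the parsing against a model file from your own liblinear version before relying on it.

```python
import json


def liblinear_to_model(model_text: str, feature_names: list, name: str) -> dict:
    """Convert a two-class liblinear model file into the RankSVMModel JSON layout above."""
    lines = model_text.strip().splitlines()
    header = {}
    w_start = None
    for i, line in enumerate(lines):
        if line.strip() == "w":
            w_start = i + 1  # weights start on the next line
            break
        key, _, value = line.partition(" ")
        header[key] = value.strip()
    if w_start is None:
        raise ValueError("no 'w' line found; is this a liblinear model file?")

    weights = [float(line.split()[0]) for line in lines[w_start:] if line.strip()]
    n = int(header["nr_feature"])
    if len(feature_names) != n:
        raise ValueError(f"expected {n} feature names, got {len(feature_names)}")
    # If the model was trained with a bias term (bias >= 0), liblinear stores one
    # extra trailing weight. It is ignored here; it could instead be modelled as
    # the constant bias feature described in the Ranking SVM section.
    named = dict(zip(feature_names, weights[:n]))
    return {
        "type": "org.apache.solr.ltr.ranking.RankSVMModel",
        "name": name,
        "features": [{"name": f} for f in feature_names],
        "params": {"weights": named},
    }


# Hypothetical liblinear output for three features, trained without a bias term.
SAMPLE = """solver_type L2R_L2LOSS_SVC_DUAL
nr_class 2
label 1 -1
nr_feature 3
bias -1
w
0.9
0.45
0.1
"""

if __name__ == "__main__":
    names = ["userTextTitleMatch", "originalScore", "isBook"]
    print(json.dumps(liblinear_to_model(SAMPLE, names, "myModelName"), indent=4))
```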
    +
    +### LambdaMART
    +
    +###### model2.json
    +```json
    +{
    +    "type":"org.apache.solr.ltr.ranking.LambdaMARTModel",
    +    "name":"lambdamartmodel",
    +    "features":[
    +        { "name": "userTextTitleMatch"},
    +        { "name": "originalScore"}
    +    ],
    +    "params":{
    +        "trees": [
    +            {
    +                "weight" : 1,
    +                "tree": {
    +                    "feature": "userTextTitleMatch",
    +                    "threshold": 0.5,
    +                    "left" : {
    +                        "value" : -100
    +                    },
    +                    "right": {
    +                        "feature" : "originalScore",
    +                        "threshold": 10.0,
    +                        "left" : {
    +                            "value" : 50
    +                        },
    +                        "right" : {
    +                            "value" : 75
    +                        }
    +                    }
    +                }
    +            },
    +            {
    +                "weight" : 2,
    +                "tree": {
    +                    "value" : -10
    +                }
    +            }
    +        ]
    +    }
    +}
    +```
    +This is an example of a toy LambdaMART model. Type specifies the class used to
    +interpret the model (LambdaMARTModel in the case of LambdaMART). Name is the
    +model identifier you will use when making requests to the LTR framework.
    +Features specifies the feature space that you want extracted when using this
    +model. All features that appear in the model params will be used for scoring and
    +must appear in the features list. You can add extra features to the features
    +list that will be computed but not used in the model for scoring, which can
    +be useful for logging. Params are the LambdaMART-specific parameters. In this
    +case we have 2 trees, one with 3 leaf nodes and one with 1 leaf node.
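The toy LambdaMART model can be evaluated by hand. Each tree is walked from the root to a leaf, and the leaf values are summed after multiplying each one by its tree's weight. The Python sketch below does this for model2.json. Going left when a feature value is less than or equal to the threshold is an assumption about the split convention, because the README does not state it.

```python
# Toy ensemble copied from model2.json above.
TREES = [
    {"weight": 1, "tree": {
        "feature": "userTextTitleMatch", "threshold": 0.5,
        "left": {"value": -100},
        "right": {"feature": "originalScore", "threshold": 10.0,
                  "left": {"value": 50}, "right": {"value": 75}},
    }},
    {"weight": 2, "tree": {"value": -10}},
]


def eval_tree(node: dict, features: dict) -> float:
    """Walk one regression tree down to a leaf and return the leaf value."""
    while "value" not in node:
        # Assumed convention: go left if feature <= threshold, otherwise right.
        if features[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return float(node["value"])


def lambdamart_score(trees: list, features: dict) -> float:
    """Sum of weight * leaf value over all trees in the ensemble."""
    return sum(t["weight"] * eval_tree(t["tree"], features) for t in trees)


print(lambdamart_score(TREES, {"userTextTitleMatch": 1.0, "originalScore": 12.0}))  # 1*75 + 2*(-10) = 55.0
```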
    +
    +A good library for training LambdaMART is RankLib (http://sourceforge.net/p/lemur/wiki/RankLib/).
    +You will need to convert the RankLib model format to the format specified above.
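The following Python sketch shows one possible conversion. It assumes RankLib's saved LambdaMART layout: leading "##" comment lines, followed by an `<ensemble>` of `<tree weight="...">` elements. Each tree contains nested `<split>` elements with `<feature>`, `<threshold>`, and `<split pos="left">`/`<split pos="right">` children, and leaves hold an `<output>`. RankLib identifies features by 1-based index, so a feature_names list that you maintain maps index i to a name. Check the structure against a model file from your RankLib version before relying on it.

```python
import json
import xml.etree.ElementTree as ET


def convert_split(split: ET.Element, feature_names: list) -> dict:
    """Recursively turn a RankLib <split> element into the model2.json node layout."""
    output = split.find("output")
    if output is not None:  # leaf node
        return {"value": float(output.text)}
    index = int(split.find("feature").text.strip())  # RankLib indices are 1-based
    node = {
        "feature": feature_names[index - 1],
        "threshold": float(split.find("threshold").text),
    }
    for child in split.findall("split"):
        node[child.get("pos")] = convert_split(child, feature_names)  # "left"/"right"
    return node


def ranklib_to_model(model_text: str, feature_names: list, name: str) -> dict:
    """Convert a saved RankLib LambdaMART model into the LambdaMARTModel JSON layout."""
    # Drop RankLib's '##' header comments; the rest is XML.
    xml_text = "\n".join(
        line for line in model_text.splitlines() if not line.lstrip().startswith("#")
    )
    root = ET.fromstring(xml_text)
    trees = [
        {"weight": float(tree.get("weight")),
         "tree": convert_split(tree.find("split"), feature_names)}
        for tree in root.findall("tree")
    ]
    return {
        "type": "org.apache.solr.ltr.ranking.LambdaMARTModel",
        "name": name,
        "features": [{"name": f} for f in feature_names],
        "params": {"trees": trees},
    }


# Hypothetical one-tree RankLib model splitting on feature 2.
SAMPLE = """## LambdaMART
<ensemble>
  <tree id="1" weight="0.1">
    <split>
      <feature> 2 </feature>
      <threshold> 10.0 </threshold>
      <split pos="left">
        <output> -1.5 </output>
      </split>
      <split pos="right">
        <output> 2.0 </output>
      </split>
    </split>
  </tree>
</ensemble>
"""

if __name__ == "__main__":
    names = ["userTextTitleMatch", "originalScore"]
    print(json.dumps(ranklib_to_model(SAMPLE, names, "lambdamartmodel"), indent=4))
```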
    +
    +# Deploy Models and Features
    +To send features, run
    +
    +`curl -XPUT 'http://localhost:8983/solr/collection1/schema/fstore' --data-binary @/path/features.json -H 'Content-type:application/json'`
    +
    +To send models, run
    +
    +`curl -XPUT 'http://localhost:8983/solr/collection1/schema/mstore' --data-binary @/path/model.json -H 'Content-type:application/json'`
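If you would rather script the upload than call curl, you can build the same PUT requests with Python's standard library. This is a sketch that assumes the collection1 URLs shown above; the helper names are hypothetical. The function only builds the request; pass it to urllib.request.urlopen to send it.

```python
import json
import urllib.request

# Assumption: the local Solr URL and collection used in the curl commands above.
SOLR_COLLECTION_URL = "http://localhost:8983/solr/collection1"


def store_put_request(store: str, payload) -> urllib.request.Request:
    """Build a PUT of a JSON payload to /schema/<store> ("fstore" or "mstore")."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{SOLR_COLLECTION_URL}/schema/{store}",
        data=body,
        method="PUT",
        headers={"Content-type": "application/json"},
    )


def load_json(path: str):
    """Read a features.json or model.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `urllib.request.urlopen(store_put_request("fstore", load_json("/path/features.json")))` mirrors the first curl command.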
    +
    +
    +# View Models and Features
    +`curl -XGET 'http://localhost:8983/solr/collection1/schema/fstore'`
    +`curl -XGET 'http://localhost:8983/solr/collection1/schema/mstore'`
    +
    +
    +# Run a Rerank Query
    +Add the following to your original Solr query:
    +`rq={!ltr model=myModelName reRankDocs=25}`
    +
    +The model name is the name of the model you sent to Solr earlier.
    +reRankDocs is the number of top documents you want reranked, which can be
    +larger than the number you display.
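Conceptually, reranking rescores only the first reRankDocs results with the model and leaves the rest of the ranking in place. The Python sketch below illustrates that idea with made-up documents and model scores. It is an approximation of the behaviour, not the plugin's code, and keeping documents beyond reRankDocs in their original order after the reranked block is an assumption.

```python
from typing import Callable, List, Tuple


def rerank(results: List[Tuple[str, float]],
           model_score: Callable[[str], float],
           rerank_docs: int) -> List[str]:
    """Rerank the top `rerank_docs` results by model score.

    `results` are (doc_id, original_score) pairs already sorted by original score.
    Documents past the cutoff keep their original order after the reranked block.
    """
    head = results[:rerank_docs]
    tail = results[rerank_docs:]
    head_sorted = sorted(head, key=lambda r: model_score(r[0]), reverse=True)
    return [doc_id for doc_id, _ in head_sorted + tail]


results = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
model = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 10.0}  # hypothetical model scores
print(rerank(results, model.__getitem__, rerank_docs=3))
```

Here document d would have the highest model score. However, it falls outside the top 3, so it is never rescored. This is why reRankDocs is often set larger than the number of results displayed.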
    +
    +### Pass in external information for external features
    +Add the following to your original Solr query:
    +`rq={!ltr reRankDocs=3 model=externalmodel efi.field1='text1' efi.field2='text2'}`
    +
    +Here "field1" specifies the name of the custom field to be used by one
    +or more of your features, and text1 is the information to be passed in. As an
    +example that matches the earlier userTextTitleMatch feature, one could do:
    +
    +`rq={!ltr reRankDocs=3 model=externalmodel efi.user_text='Casablanca' efi.user_intent='movie'}`
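When the request is built programmatically, the rq value must be URL-encoded along with the rest of the query string. The following Python sketch uses urllib.parse.urlencode. The ltr_rq helper is hypothetical, and backslash-escaping single quotes inside efi values is an assumption about Solr's local-params quoting rather than something this README documents.

```python
from typing import Optional
from urllib.parse import urlencode


def ltr_rq(model: str, rerank_docs: int, efi: Optional[dict] = None) -> str:
    """Build an rq local-params string such as {!ltr model=m reRankDocs=3 efi.k='v'}."""
    parts = [f"{{!ltr model={model} reRankDocs={rerank_docs}"]
    for key, value in (efi or {}).items():
        # Assumption: a single quote inside a quoted value can be backslash-escaped.
        escaped = value.replace("'", "\\'")
        parts.append(f"efi.{key}='{escaped}'")
    return " ".join(parts) + "}"


rq = ltr_rq("externalmodel", 3, {"user_text": "Casablanca", "user_intent": "movie"})
print(rq)
print(urlencode({"q": "Casablanca", "rq": rq, "fl": "*,score"}))
```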
    +
    +# Extract features
    +To extract features you need to use the feature vector transformer and set the
    +fv parameter to true (this required parameter will be removed in the future).
    +For now you also need to use a dummy model with all the features you want to
    +extract inside the features parameter list of the model (this limitation will
    +also be changed in the future so you can extract features without a dummy model).
    +
    +`fv=true&fl=*,score,[features]&rq={!ltr model=dummyModel reRankDocs=25}`
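Extracted feature values are typically combined with relevance judgments to build a training file for a learner such as RankLib or an SVM ranking tool. The following minimal Python sketch writes the SVMlight/LETOR-style lines those tools read: `<relevance> qid:<query id> <index>:<value> ... # <comment>`. It assumes you have already parsed each document's [features] field into a dict and collected graded judgments yourself, since neither step is specified here. The order of the hypothetical feature_names list defines the 1-based indices, so it must match the list used when converting the trained model back.

```python
from typing import Dict, Iterable, List, Tuple

# (query id, relevance grade, doc id, feature values parsed from [features])
Judgment = Tuple[int, int, str, Dict[str, float]]


def to_letor_lines(judgments: Iterable[Judgment], feature_names: List[str]) -> List[str]:
    """Format judged, feature-logged documents as '<rel> qid:<q> 1:<v> 2:<v> ... # <doc>'."""
    lines = []
    # Ranking learners expect the lines for one query to be contiguous.
    for qid, rel, doc_id, feats in sorted(judgments, key=lambda j: j[0]):
        # Assumption: a feature missing from a document is written as 0.0.
        cols = " ".join(
            f"{i}:{feats.get(name, 0.0)}" for i, name in enumerate(feature_names, start=1)
        )
        lines.append(f"{rel} qid:{qid} {cols} # {doc_id}")
    return lines


FEATURES = ["userTextTitleMatch", "originalScore", "isBook"]
judged = [
    (1, 2, "doc1", {"userTextTitleMatch": 1.0, "originalScore": 0.5, "isBook": 1.0}),
    (1, 0, "doc2", {"userTextTitleMatch": 0.0, "originalScore": 0.25}),
]
print("\n".join(to_letor_lines(judged, FEATURES)))
```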
    +
    +## Test the plugin with solr/example/techproducts in 6 steps
    +
    +Solr provides some simple example indices. In order to test the plugin with
    +the techproducts example, please follow these steps:
    +
    +1. compile solr and the examples
    +
    +    cd solr
    +    ant dist
    +    ant example
    --- End diff --
    
    I think `ant example` is deprecated in the current master branch;
    we should point out that with recent releases,
    `ant server` is necessary!


> Integrate Learning to Rank into Solr
> ------------------------------------
>
>                 Key: SOLR-8542
>                 URL: https://issues.apache.org/jira/browse/SOLR-8542
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joshua Pantony
>            Assignee: Christine Poerschke
>            Priority: Minor
>         Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. In order to test the plugin
> with the techproducts example, please follow these steps:
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>    
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
>     
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
