[
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183195#comment-15183195
]
Christine Poerschke commented on SOLR-8542:
-------------------------------------------
bq. ... Question: The only reason we currently have the LTRComponent is so that
it can register the Model and Feature stores as managed resources because it
can be SolrCore aware. Is there a way we can do this without the use of a
component?
Not directly answering the managed resources part of the question, but having
noticed that in practice the features.json/model.json files need to be accompanied
by various solrconfig.xml changes, I wonder if configuring models as plugins
in solrconfig.xml might be something to explore?
----
*current (features|model).json and solrconfig.xml configuration:*
{code}
###### features.json
...
###### firstModel.json
...
###### secondModel.json
...
###### solrconfig.xml
...
<queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" />
...
<transformer name="features"
class="org.apache.solr.ltr.ranking.LTRFeatureLoggerTransformerFactory"/>
...
<searchComponent name="ltrComponent"
class="org.apache.solr.ltr.ranking.LTRComponent"/>
...
<requestHandler name="/query" class="solr.SearchHandler">
...
<arr name="last-components">
<str>ltrComponent</str>
</arr>
</requestHandler>
...
{code}
----
*potential alternative solrconfig.xml configuration:*
{code}
###### solrconfig.xml
...
<!-- no queryParser name="ltr" element since LTRQParserPlugin is in
QParserPlugin.standardPlugins -->
<!-- no transformer name="features" since LTRFeatureLoggerTransformerFactory is
in TransformerFactory.defaultFactories -->
<reRankModelFactory name="myFirstModelName" class="solr.SVMRerankModelFactory">
<!-- model features -->
<str name="features">originalScore,isBook</str>
<str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
<str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
<str name="isBook.fq">{!terms f=category}book</str>
<!-- model parameters -->
<float name="weights.originalScore">0.5</float>
<float name="weights.isBook">0.1</float>
</reRankModelFactory>
<reRankModelFactory class="solr.SVMRerankModelFactory">
<str name="name">mySecondModelName</str>
...
</reRankModelFactory>
...
{code}
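For illustration, here is a minimal Python sketch (not Solr code) of how a factory such as the hypothetical {{solr.SVMRerankModelFactory}} above might interpret the flat parameter naming: a comma-separated {{features}} list, {{<feature>.class}} and {{<feature>.<param>}} entries per feature, and {{weights.<feature>}} model parameters. It assumes the SVM model scores a document linearly, i.e. as the sum of weight times feature value. The function names are made up for this sketch.
{code:python}
from typing import Any, Dict, Tuple

FeatureConfigs = Dict[str, Dict[str, Any]]


def parse_model_params(params: Dict[str, Any]) -> Tuple[FeatureConfigs, Dict[str, float]]:
    """Split flat reRankModelFactory params into per-feature configs and weights."""
    names = [n.strip() for n in params["features"].split(",") if n.strip()]
    features: FeatureConfigs = {name: {} for name in names}
    weights: Dict[str, float] = {}
    for key, value in params.items():
        if key == "features":
            continue
        if key.startswith("weights."):
            # model parameter, e.g. "weights.isBook" -> weight of feature "isBook"
            weights[key[len("weights."):]] = float(value)
            continue
        # feature parameter, e.g. "isBook.fq" -> feature "isBook", param "fq"
        name, _, param = key.partition(".")
        if name not in features or not param:
            raise ValueError(f"unexpected parameter: {key}")
        features[name][param] = value
    unweighted = [n for n in names if n not in weights]
    if unweighted:
        raise ValueError(f"no weight for feature(s): {unweighted}")
    return features, weights


def linear_score(weights: Dict[str, float], values: Dict[str, float]) -> float:
    """Weighted sum of feature values; a feature with no value counts as 0."""
    return sum(w * values.get(name, 0.0) for name, w in weights.items())


# the myFirstModelName parameters from the solrconfig.xml snippet above
params = {
    "features": "originalScore,isBook",
    "originalScore.class": "org.apache.solr.ltr.feature.impl.OriginalScoreFeature",
    "isBook.class": "org.apache.solr.ltr.feature.impl.SolrFeature",
    "isBook.fq": "{!terms f=category}book",
    "weights.originalScore": 0.5,
    "weights.isBook": 0.1,
}
features, weights = parse_model_params(params)
print(features["isBook"])
print(linear_score(weights, {"originalScore": 2.0, "isBook": 1.0}))  # 0.5 * 2.0 + 0.1 * 1.0
{code}
Validating that every listed feature has a weight at load time would surface configuration typos when the core loads rather than at query time.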
----
_The most obvious implication_ of having a new solrconfig.xml element instead
of (features|model).json managed resources would be that the code lives in
{{solr/core}} rather than in {{solr/contrib/ltr}}.
* From an end-user perspective this would mean out-of-the-box 'Learning to Rank'
support, i.e. no need to build and deploy extra jar files and no need to
configure queryParser and transformer elements for LTRQParserPlugin and
LTRFeatureLoggerTransformerFactory. Note, though, that
{{<reRankModelFactory class="mycompany.MyCustomReRankModelFactory">}}
customisation would still be supported if something other than the
out-of-the-box models is required.
* One of the out-of-the-box factories could be a features-only factory similar
to the 'dummyModel' mentioned above, e.g.
{code}
<reRankModelFactory name="featuresOnly" class="solr.NoRerankingFactory">
<str name="features">originalScore,isBook</str>
<str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
<str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
<str name="isBook.fq">{!terms f=category}book</str>
</reRankModelFactory>
{code}
_A concern might be_ that the reRankModelFactory element(s) would bloat
solrconfig.xml and that, once embedded in solrconfig.xml, they would be
harder to edit than one or two json files.
* The bloat concern can be addressed via {{xi:include}} e.g.
{code}
###### solrconfig.xml
...
<xi:include href="solrconfig-reRankModelFactory-myFirstModelName.xml"
xmlns:xi="http://www.w3.org/2001/XInclude"/>
...
###### solrconfig-reRankModelFactory-myFirstModelName.xml
<reRankModelFactory name="myFirstModelName" class="solr.SVMRerankModelFactory">
<!-- model features -->
<str name="features">originalScore,isBook</str>
<str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
<str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
<str name="isBook.fq">{!terms f=category}book</str>
<!-- model parameters -->
<float name="weights.originalScore">0.5</float>
<float name="weights.isBook">0.1</float>
</reRankModelFactory>
{code}
* The xml vs. json editing concern is a fair point: if the feature engineering
process usually outputs json files, then perhaps a simple utility script could
help convert that json into a reRankModelFactory xml element for solrconfig.xml.
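Such a utility could be quite small. Below is a hypothetical Python sketch using only the standard library. It assumes a simplified json shape ({{name}}, {{class}}, a {{features}} list whose entries have {{name}}, {{class}} and optional {{params}}, plus a {{weights}} map) rather than the exact format of the current features.json/model.json files, and it emits the flat naming convention used in the reRankModelFactory examples above.
{code:python}
import json
import sys
import xml.etree.ElementTree as ET


def _typed_tag(value):
    """Pick the solrconfig.xml value tag for a json value (bool first: it is an int subclass)."""
    if isinstance(value, bool):
        return "bool", "true" if value else "false"
    if isinstance(value, int):
        return "int", str(value)
    if isinstance(value, float):
        return "float", str(value)
    return "str", str(value)


def _add(parent, name, value):
    """Append e.g. <str name="isBook.fq">...</str> to parent."""
    tag, text = _typed_tag(value)
    child = ET.SubElement(parent, tag, {"name": name})
    child.text = text


def model_json_to_xml(model: dict) -> str:
    """Convert the assumed json model description into a reRankModelFactory element."""
    root = ET.Element("reRankModelFactory", {"name": model["name"], "class": model["class"]})
    _add(root, "features", ",".join(f["name"] for f in model["features"]))
    for feature in model["features"]:
        _add(root, f"{feature['name']}.class", feature["class"])
        for param, value in feature.get("params", {}).items():
            _add(root, f"{feature['name']}.{param}", value)
    for name, weight in model.get("weights", {}).items():
        _add(root, f"weights.{name}", float(weight))
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python model_json_to_xml.py myFirstModelName.json > solrconfig-reRankModelFactory-myFirstModelName.xml
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(model_json_to_xml(json.load(fh)))
{code}
The generated file could then be pulled into solrconfig.xml via {{xi:include}} as described above, so that the json stays the file people edit.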
_A factory approach_ could naturally support arbitrary models including
chaining or nesting of models. (A factory approach is of course also possible
with json format.)
{code}
<reRankModelFactory name="myTwoPassModelName"
class="solr.MultiPassRerankModelFactory">
<str name="passPrefixes">simple,complex</str>
<!-- simple model factory -->
<str name="simple.class">solr.SVMRerankModelFactory</str>
<!-- simple model features -->
<str name="simple.features">originalScore,isBook</str>
<str name="simple.originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
<str name="simple.isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
<str name="simple.isBook.fq">{!terms f=category}book</str>
<!-- simple model parameters -->
<float name="simple.weights.originalScore">0.5</float>
<float name="simple.weights.isBook">0.1</float>
<!-- complex model factory -->
<str name="complex.class">mycompany.MyComplexRerankModelFactory</str>
<!-- complex model features -->
<str name="complex.features">x,y</str>
<str name="complex.x.class">...</str>
<str name="complex.x.aaa">...</str>
<int name="complex.x.bbb">...</int>
<str name="complex.y.class">...</str>
<int name="complex.y.zzz">...</int>
<!-- complex model parameters -->
<float name="complex.something.configurable">0.42</float>
...
</reRankModelFactory>
{code}
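To sketch how such nesting might work, the outer factory could strip each {{passPrefixes}} entry to obtain the parameter set for that pass's own factory, and each pass could then rerank the top documents of the previous pass. The Python below is only an illustration under those assumptions. The per-pass reranking semantics in particular are an assumption, not existing behaviour, and scorers are passed in as plain functions.
{code:python}
from typing import Any, Callable, Dict, List, Tuple

Doc = Dict[str, Any]
Scorer = Callable[[Doc], float]


def split_passes(params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """One parameter dict per pass, in passPrefixes order, with "<prefix>." stripped."""
    prefixes = [p.strip() for p in params["passPrefixes"].split(",") if p.strip()]
    passes = []
    for prefix in prefixes:
        head = prefix + "."
        passes.append({key[len(head):]: value
                       for key, value in params.items() if key.startswith(head)})
    return passes


def rerank(docs: List[Doc], score: Scorer, top_n: int) -> List[Doc]:
    """Re-sort the first top_n docs by score (highest first); leave the rest in place."""
    head = sorted(docs[:top_n], key=score, reverse=True)  # stable for equal scores
    return head + docs[top_n:]


def multi_pass(docs: List[Doc], passes: List[Tuple[Scorer, int]]) -> List[Doc]:
    """Apply the passes in order, each one reranking the previous pass's output."""
    for score, top_n in passes:
        docs = rerank(docs, score, top_n)
    return docs


params = {
    "passPrefixes": "simple,complex",
    "simple.class": "solr.SVMRerankModelFactory",
    "simple.features": "originalScore,isBook",
    "complex.class": "mycompany.MyComplexRerankModelFactory",
    "complex.features": "x,y",
}
simple, complex_ = split_passes(params)
print(simple)  # {'class': 'solr.SVMRerankModelFactory', 'features': 'originalScore,isBook'}
{code}
Because each stripped parameter set has the same shape as a single-model configuration, the outer factory could hand it unchanged to the factory named by its {{class}} entry.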
> Integrate Learning to Rank into Solr
> ------------------------------------
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
> Issue Type: New Feature
> Reporter: Joshua Pantony
> Assignee: Christine Poerschke
> Priority: Minor
> Attachments: README.md, README.md, SOLR-8542-branch_5x.patch,
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features
> directly inside Solr for use in training a machine learned model. You can
> then deploy that model to Solr and use it to rerank your top X search
> results. This concept was previously presented by the authors at Lucene/Solr
> Revolution 2015 (http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson,
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached
> documentation as a github MD file, but are happy to convert to a desired
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. To test the plugin with the
> techproducts example, please follow these steps:
> h4. 1. compile solr and the examples
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
>
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json'
> h4. 6. have fun!
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true
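The step 6 query URL packs the re-rank query into the {{rq}} parameter, so its spaces, braces and quotes have to be URL-encoded. Below is a small Python sketch using only the standard library that builds an equivalent URL. The parameter values are copied from the example above, and the helper name {{ltr_query_url}} is made up for illustration.
{code:python}
import urllib.parse


def ltr_query_url(base: str, q: str, model: str, rerank_docs: int, efi: dict) -> str:
    """Build a /query URL that reranks the top rerank_docs results with an LTR model."""
    # external feature info, e.g. efi.query='test' (values assumed to contain no quotes)
    efi_part = " ".join(f"efi.{name}='{value}'" for name, value in efi.items())
    rq = f"{{!ltr model={model} reRankDocs={rerank_docs} {efi_part}}}"
    params = {
        "indent": "on",
        "q": q,
        "wt": "json",
        "rq": rq,
        "fl": "*,[features],price,score,name",
        "fv": "true",
    }
    # quote (rather than the default quote_plus) encodes spaces as %20, as in the example
    return base + "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)


url = ltr_query_url("http://localhost:8983/solr/techproducts/query",
                    q="test", model="svm", rerank_docs=25, efi={"query": "test"})
print(url)
{code}
The printed URL percent-encodes more characters than the hand-written one (for example {{*}}, {{,}} and {{[}}), but Solr decodes both to the same parameters.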
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)