Re: extract multi-features for one solr feature extractor in solr learning to rank
Hi Michael,

Thanks for the very valuable feedback.

> You can pass in different params in the
> features.json config for each feature, even though they use the same
> feature class.

I used this idea to extract some of the features from this paper
(https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/letor3.pdf).
For example, features 1-15 in Table 2 are just <query, doc> term features in various forms:

{
  "store" : "MyFeatureStore",
  "name" : "term_count_1",
  "class" : "com.apache.solr.ltr.feature.TermCountFeature",
  "params" : {
    "field" : "a_text",
    "terms" : "${user_terms}",
    "method" : "1"
  }
},
{
  "store" : "MyFeatureStore",
  "name" : "term_count_2",
  "class" : "com.apache.solr.ltr.feature.TermCountFeature",
  "params" : {
    "field" : "a_text",
    "terms" : "${user_terms}",
    "method" : "2"
  }
},

where the method id corresponds to a feature in Table 2 (1-15).

Those features share the same class and differ only in minor ways. In a production deployment this overhead may not be an issue: after feature selection, probably only a small number of features will be useful.

Another use case: use a convolutional neural network or an LSTM to extract an embedded feature vector for both the query and the document, where the dimension of the embedded feature vectors would be 50-100. Then we feed those features into learning-to-rank models.

> Your performance point about 100 features vs 1 feature is true,
> and pull requests to improve the plugin's performance and usability would

I will run some performance benchmarks for a few use cases to see whether supporting multiple features from one feature class is worthwhile. If so, I will share the results and create a pull request.

Thanks
Jianxiong

On 4/18/17, Michael Nilsson <mnilsson2...@gmail.com> wrote:
> Hi Jianxiong,
>
> What you say is true. If you want 100 different feature values extracted,
> you need to specify 100 different features in the
> features.json config so that there is a direct mapping of features in and
> features out.
> However, you more than likely need
> to only implement 1 feature class that you will use for those 100 feature
> values. You can pass in different params in the
> features.json config for each feature, even though they use the same
> feature class. In some cases you might be able to
> just have 1 feature output 1 value that changes per document, if you can
> collapse those features together. This 2nd option
> may or may not work for you depending on your data, what you are trying to
> bucket, and what algorithm you are trying to
> use because not all algorithms can easily handle this case. To illustrate:
>
>
> *A) Multiple binary features using the same 1 class*
> {
>   "name" : "isProductCheap",
>   "class" : "org.apache.solr.ltr.feature.SolrFeature",
>   "params" : {
>     "fq": [ "price:[0 TO 100]" ]
>   }
> },{
>   "name" : "isProductExpensive",
>   "class" : "org.apache.solr.ltr.feature.SolrFeature",
>   "params" : {
>     "fq": [ "price:[101 TO 1000]" ]
>   }
> },{
>   "name" : "isProductCrazyExpensive",
>   "class" : "org.apache.solr.ltr.feature.SolrFeature",
>   "params" : {
>     "fq": [ "price:[1001 TO *]" ]
>   }
> }
>
>
> *B) 1 feature that outputs different values (some algorithms don't handle
> discrete features well)*
> {
>   "name" : "productPricePoint",
>   "class" : "org.apache.solr.ltr.feature.MyPricePointFeature",
>   "params" : {
>
>     // Either hard code price map in MyPricePointFeature.java, or
>     // pass it in through params for flexible customization,
>     // and return different values for cheap, expensive, and crazyExpensive
>
>   }
> }
>
> The 2 options above satisfy most use cases, which is what we were
> targeting.
> In my specific use case, I opted for option A,
> and wrote a simple script that generates the features.json so I wouldn't
> have to write 100 similar features by hand. You
> also mentioned that you want to extract features sparsely.
> You can change
> the configuration of the Feature Transformer
> <http://lucene.apache.org/solr/6_5_0/solr-ltr/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.html>
>
> to return features that actuall
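
The reply above mentions a simple script that generates features.json so that 100 similar features need not be written by hand. That script is not shown in the thread, so the following is only a minimal sketch of what such a generator might look like. The function name build_term_count_features is hypothetical, and the class name, store, field, and "method" parameter are copied from the term_count_N example earlier in this thread rather than from any shipped Solr feature.

```python
import json


def build_term_count_features(store, field, num_methods):
    """Return one feature definition per method id (1..num_methods).

    Every entry uses the same feature class and differs only in its
    name and its "method" param, as in the term_count_N example.
    """
    features = []
    for method_id in range(1, num_methods + 1):
        features.append({
            "store": store,
            "name": f"term_count_{method_id}",
            # Custom class name as written in the thread (an assumption,
            # not a class that ships with Solr).
            "class": "com.apache.solr.ltr.feature.TermCountFeature",
            "params": {
                "field": field,
                "terms": "${user_terms}",
                # The example in the thread passes the method id as a string.
                "method": str(method_id),
            },
        })
    return features


if __name__ == "__main__":
    # Print the generated config; redirect to features.json as needed.
    generated = build_term_count_features("MyFeatureStore", "a_text", 15)
    print(json.dumps(generated, indent=2))
```

Generating the file keeps the one-feature-in, one-value-out mapping described above while avoiding hand-written repetition.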
extract multi-features for one solr feature extractor in solr learning to rank
Hi,

I found that Solr learning-to-rank (LTR) supports only ONE feature per feature extractor. See the interface:
https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/Feature.java

Lines 281-282 (in FeatureScorer):

    @Override
    public abstract float score() throws IOException;

I have a use case: given a, I would like to extract multiple features (e.g. 100 features). In the current framework, I have to define 100 features in feature.json, which also adds cost for the scored-doc iterations.

I would like to have an interface like:

    public abstract Map score() throws IOException;

It would help support sparse feature vectors.

Can anybody provide some insight?

Thanks
Jianxiong
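
To make the sparse-vector idea concrete, here is a small illustrative sketch in Python (not the Solr LTR API). It assumes a map-valued score is keyed by integer feature index and that missing entries default to 0.0; the helper name to_dense and both of those conventions are assumptions for illustration only.

```python
from typing import Dict, List


def to_dense(sparse: Dict[int, float], size: int, default: float = 0.0) -> List[float]:
    """Expand a sparse {feature_index: value} map into a dense vector.

    Indices absent from the map take the default value.
    """
    dense = [default] * size
    for index, value in sparse.items():
        if not 0 <= index < size:
            raise IndexError(f"feature index {index} out of range for size {size}")
        dense[index] = value
    return dense


if __name__ == "__main__":
    # One extractor call yields only the non-zero entries of a 100-dim vector.
    sparse_scores = {3: 1.5, 42: 0.25}
    vector = to_dense(sparse_scores, 100)
    print(len(vector), vector[3], vector[42])
```

With a single float per feature, the same 100-dimensional vector would need 100 feature definitions, even when most values are zero for a given document.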
solr learning_to_rank (normalizer) unmatched argument type issue
Hi,

I created a toy learning-to-rank model in Solr in order to show the issue.

Feature.json -
[
  {
    "store" : "wikiFeatureStore",
    "name" : "doc_len",
    "class" : "org.apache.solr.ltr.feature.FieldLengthFeature",
    "params" : {"field":"a_text"}
  },
  {
    "store" : "wikiFeatureStore",
    "name" : "rankScore",
    "class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
    "params" : {}
  }
]

model.json ---
{
  "store" : "wikiFeatureStore",
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "name" : "wiki_qaModel",
  "features" : [
    {
      "name" : "doc_len",
      "norm" : {
        "class" : "org.apache.solr.ltr.norm.MinMaxNormalizer",
        "params" : {"min": "1.0", "max" : "113.8" }
      }
    },
    {
      "name" : "rankScore",
      "norm" : {
        "class" : "org.apache.solr.ltr.norm.MinMaxNormalizer",
        "params" : {"min": "0.0", "max" : "49.60385" }
      }
    }
  ],
  "params" : {
    "weights": {
      "doc_len": 0.322,
      "rankScore": 0.98
    }
  }
}

I could upload both the features and the model, and re-ranking based on the above model worked. The issue appeared after I stopped the Solr server and restarted it: when I ran the same query to extract the features, I got this error message:

"Caused by: org.apache.solr.common.SolrException: Failed to create new ManagedResource /schema/model-store of type org.apache.solr.ltr.store.rest.ManagedModelStore due to: java.lang.IllegalArgumentException: argument type mismatch
    at org.apache.solr.rest.RestManager.createManagedResource(RestManager.java:700)
    at org.apache.solr.rest.RestManager.addRegisteredResource(RestManager.java:666)
    at org.apache.solr.rest.RestManager.access$300(RestManager.java:59)
    at org.apache.solr.rest.RestManager$Registry.registerManagedResource(RestManager.java:231)
    at org.apache.solr.ltr.store.rest.ManagedModelStore.registerManagedModelStore(ManagedModelStore.java:51)
    at org.apache.solr.ltr.search.LTRQParserPlugin.inform(LTRQParserPlugin.java:124)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:719)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:931)
    ... 9 more
Caused by: java.lang.IllegalArgumentException: argument type mismatch
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.solr.util.SolrPluginUtils.invokeSetters(SolrPluginUtils.java:1077)
    at org.apache.solr.ltr.norm.Normalizer.getInstance(Normalizer.java:49)
"

I found that the issue was related to solr-6.4.2/server/solr/my_collection/conf/_schema_model-store.json:

"
{
  "initArgs":{},
  "initializedOn":"2017-03-31T20:51:59.494Z",
  "updatedSinceInit":"2017-03-31T20:54:54.841Z",
  "managedList":[{
    "name":"wiki_qaModel",
    "class":"org.apache.solr.ltr.model.LinearModel",
    "store":"wikiFeatureStore",
    "features":[
      {
        "name":"doc_len",
        "norm":{
          "class":"org.apache.solr.ltr.norm.MinMaxNormalizer",
          "params":{
            "min":1.0,
            "max":113.7862548828}}},
...
"

Here "min" and "max" are stored as doubles, although I uploaded them as strings. When I manually changed them back to strings, everything worked as expected:

"
        "norm":{
          "class":"org.apache.solr.ltr.norm.MinMaxNormalizer",
          "params":{
            "min": "1.0",
            "max": "113.7862548828"}}},
"

Any insights into the above strange behavior?

Thanks
Jianxiong
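
The manual edit described above can be scripted. The following is only a sketch of that workaround, not an official fix: it assumes the persisted file has the managedList / features / norm / params layout shown in the excerpt, and the helper name stringify_norm_params is hypothetical. It rewrites numeric normalizer params as strings and leaves everything else untouched.

```python
import json


def stringify_norm_params(model_store):
    """Convert numeric normalizer params to strings, in place.

    Assumes the layout from the excerpt:
    managedList[] -> features[] -> norm -> params.
    Returns the number of values that were converted.
    """
    changed = 0
    for model in model_store.get("managedList", []):
        for feature in model.get("features", []):
            params = feature.get("norm", {}).get("params", {})
            for key, value in list(params.items()):
                # bool is a subclass of int in Python, so exclude it explicitly.
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    params[key] = str(value)
                    changed += 1
    return changed


if __name__ == "__main__":
    # Sample data mirroring the excerpt of _schema_model-store.json.
    sample = json.loads("""
    {"managedList": [{
        "name": "wiki_qaModel",
        "features": [{
            "name": "doc_len",
            "norm": {
                "class": "org.apache.solr.ltr.norm.MinMaxNormalizer",
                "params": {"min": 1.0, "max": 113.7862548828}
            }
        }]
    }]}
    """)
    stringify_norm_params(sample)
    print(json.dumps(sample, indent=2))
```

To apply it to a real file, one would load _schema_model-store.json with json.load, call the helper, and write the result back with json.dump, ideally on a backup copy first.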