Re: Custom scoring algorithm

Alberto Gimeno Fri, 13 Nov 2009 09:10:46 -0800

Hi again.

I've made a proof of concept using the boost factor. I have done the
following: add a field for each feature and put the field boost factor
as the feature value.


        private static void addDocument(String id, Map<String, Integer>
features, IndexWriter writer) throws IOException {
                Document doc = new Document();

                doc.add(new Field("id", id,
                Field.Store.YES, Field.Index.NO));
                
                for (String key : features.keySet()) {
                        Field field = new Field(key, key,
                        Field.Store.YES, Field.Index.ANALYZED);
                        field.setBoost(features.get(key));
                        doc.add(field);
                }
                
                writer.addDocument(doc);
        }

But I don't know if this is the best way of doing this.

Thanks.


On Fri, Nov 13, 2009 at 3:42 PM, Alberto Gimeno <[email protected]> wrote:
> Hi.
>
> I am developing an application and I would like to add searching
> capabilities. I have a database with items. Each item has a number of
> "features" with a numeric value. Example: feature_x=100,
> feature_y=200. Items can have common or different "features". And they
> can have a variable number of "features" as well. In the whole
> application can be hundreds of different features. I need users to be
> able to make queries by features and I need the results to be sorted
> as the sum of those features. For example if a user looks for "x, y,
> z", the first result should be the item with the greatest
> (features_x+feature_y+feature_z).
>
> I think Lucene can be a solution. But I don't need some of its
> features such as Analyzers, Filters, Tokenizers... I think I should
> implement my own scoring algorithm. How can I do that in the easiest
> possible way? I have read something about payloads. Can they be useful
> for my needings?
>
> My first attempt was to generate dumb strings containing each feature
> name repeated as many times as the feature value. Example: "x, x, x,
> x, y, y, z, z". Of course I know this is a very poor solution. And I
> also have seen that it doesn't work as I expected because the default
> scoring algorithm is much more complex than just counting words.
>
> Thank you very much in advance.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: Custom scoring algorithm

Reply via email to