Re: Search Ranking

Meeraj Kunnumpurath Wed, 16 May 2012 13:49:18 -0700

I have tried the same using Lucene directly, with the following code:

import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.ScoreDoc;


public class LuceneTest {

    public static void main(String[] args) throws Exception {

        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        RAMDirectory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
                analyzer);
        IndexWriter indexWriter = new IndexWriter(index, config);

        Document doc1 = new Document();
        doc1.add(new Field("searchText",
                "ABC Takeaway [email protected] [email protected]",
                Field.Store.YES, Field.Index.ANALYZED));
        Document doc2 = new Document();
        doc2.add(new Field("searchText", "XYZ Takeaway [email protected]",
                Field.Store.YES, Field.Index.ANALYZED));

        indexWriter.addDocument(doc1);
        indexWriter.addDocument(doc2);
        indexWriter.close();

        Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer)
                .parse("Takeaway");

        int hitsPerPage = 10;
        IndexReader reader = IndexReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector collector =
                TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("searchText"));
        }

    }

}

The output is:

Found 2 hits.
1. XYZ Takeaway [email protected]
2. ABC Takeaway [email protected] [email protected]
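
One possible reading of this ordering, offered as a rough sketch rather than a confirmed explanation: with the default similarity in Lucene 3.x, a matching term contributes roughly tf * idf^2 * lengthNorm, so when both documents contain "Takeaway" once, the document whose field has fewer tokens can score higher. The class below is plain Java and does not call Lucene; the class name, the method names, and the token counts in main are assumptions made up for illustration, and it leaves out queryNorm, coord, boosts, and the lossy one-byte encoding Lucene applies to norms. The explain output Ivan mentioned would show the real numbers.

```java
// Sketch only: approximates the per-term factors of the Lucene 3.x default
// similarity. It ignores queryNorm, coord, boosts, and the lossy one-byte
// encoding of norms, so absolute values will differ from real Lucene scores.
final class ClassicScoreSketch {

    // tf: square root of how often the term occurs in the field.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf: rarer terms weigh more. docFreq is the number of documents
    // that contain the term, numDocs the number of documents in the index.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // lengthNorm: a field with fewer tokens gets a larger norm.
    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    // Unnormalized contribution of one query term to one document's score.
    // idf is squared because it enters through both the query weight and
    // the field weight.
    static double termScore(int freq, int docFreq, int numDocs, int numTokens) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(numTokens);
    }

    public static void main(String[] args) {
        int numDocs = 2;
        int docFreq = 2; // "Takeaway" occurs in both documents

        // Hypothetical token counts: the real values depend on how the
        // analyzer splits the email addresses.
        double shorterField = termScore(1, docFreq, numDocs, 4);
        double longerField = termScore(1, docFreq, numDocs, 6);

        System.out.println("shorter field: " + shorterField);
        System.out.println("longer field:  " + longerField);
    }
}
```

In this model a second occurrence of a term raises tf only from 1 to about 1.41, so a repeated email address can still be outweighed by a longer field.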

On Wed, May 16, 2012 at 9:21 PM, Meeraj Kunnumpurath <[email protected]> wrote:

> Thanks Ivan.
>
> I don't use Lucene directly; it is used behind the scenes by the Neo4J
> graph database for full-text indexing. According to their documentation,
> full-text indexes use a whitespace tokenizer in the analyzer. Yes, I do
> get Listing 2 first now. However, if I drop the term "Takeaway" from the
> search string and search only for "[email protected]", I get Listing 1 first.
>
> Regards
> Meeraj
>
>
> On Wed, May 16, 2012 at 8:49 PM, Ivan Brusic <[email protected]> wrote:
>
>> Use the explain function to understand why the query is producing the
>> results you see.
>>
>>
>> http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int)
>>
>> Does your current query return Listing 2 first? That might be because
>> of term frequencies. Which analyzers are you using?
>>
>> http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e63
>>
>> Cheers,
>>
>> Ivan
>>
>> On Wed, May 16, 2012 at 12:41 PM, Meeraj Kunnumpurath
>> <[email protected]> wrote:
>> > Hi,
>> >
>> > I am quite new to Lucene. I am trying to use it to index listings of local
>> > businesses. The index has only one field, which stores the attributes of a
>> > listing as well as the email addresses of users who have rated that business.
>> >
>> > For example,
>> >
>> > Listing 1: "XYZ Takeaway London [email protected] [email protected]
>> > [email protected]"
>> > Listing 2: "ABC Takeaway London [email protected] [email protected]"
>> >
>> > Now, when the user searches for "Takeaway [email protected]", how do I
>> > get Listing 1 to always come before Listing 2, because the term
>> > [email protected] appears twice in it, whereas Listing 2 has it only once?
>> >
>> > Regards
>> > Meeraj
