I'm still relatively new to Lucene.Net. I ran the code below against an 
optimized index built with /tags/Lucene.Net_2_4_0 on a Windows XP, .NET 3.5, 
Core 2 Duo (E6550, 2.33 GHz) machine with 4 GB of RAM. The output was:

  // Debug build, debugger not attached
  Found 500 documents in 12 seconds, 40 requests/second

  // Release build, debugger not attached
  Found 500 documents in 11 seconds, 48 requests/second

I have no idea whether that's fast or slow. My final index will have 50M+ 
documents. I also split the 500 search queries across 2, 4, 6, etc. background 
threads sharing a single IndexSearcher and got roughly the same overall 
throughput. Are there any obvious speedups I'm missing? I was under the 
impression that exact TermQuery matches on NOT_ANALYZED fields are the fastest 
kind of lookup; is that right?
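
The multithreaded run described above (the 500 lookups split across N threads 
that share one IndexSearcher) could be sketched with plain System.Threading so 
it still builds on .NET 3.5. ParallelLookupBenchmark and its lookup delegate 
are hypothetical names, not part of Lucene.Net; in the real benchmark the 
delegate would wrap the same TermQuery/Search/Doc calls as the timed loop 
below. The sketch assumes, as Lucene's documentation describes, that an 
IndexSearcher is safe to share between threads.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical benchmark harness (not from the original post): runs the
// lookups on threadCount threads that all call the same shared delegate.
public static class ParallelLookupBenchmark
{
    // Returns the number of lookups that reported a hit; elapsed receives
    // the wall-clock time for all threads to finish.
    public static int Run(string[] names, int threadCount,
                          Func<string, bool> lookup, out TimeSpan elapsed)
    {
        if (threadCount < 1)
            throw new ArgumentOutOfRangeException("threadCount");

        int found = 0;
        Thread[] threads = new Thread[threadCount];
        Stopwatch stopwatch = Stopwatch.StartNew();

        for (int t = 0; t < threadCount; t++)
        {
            int start = t; // copy the loop variable for the closure
            threads[t] = new Thread(() =>
            {
                // Interleaved split: thread t handles names t, t + N, t + 2N, ...
                int localFound = 0;
                for (int i = start; i < names.Length; i += threadCount)
                {
                    if (lookup(names[i]))
                        localFound++;
                }
                Interlocked.Add(ref found, localFound);
            });
            threads[t].Start();
        }

        foreach (Thread thread in threads)
            thread.Join();

        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;
        return found;
    }
}
```

The interleaved split keeps each thread's share of the work roughly equal, and 
found / elapsed.TotalSeconds gives the same requests/second figure as the 
single-threaded loop, so results for different thread counts compare directly.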

Is it wise to try to outsmart Lucene.Net and manually shard the lookups by 
creating an index for each state:

 Dictionary<string, IndexSearcher> stateToIndexSearcher;

or is Lucene.Net designed to restrict searches quickly using a Filter? I did 
split my index so that each state had its own index (the largest had 6M 
documents) and didn't notice any major speedup.
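
For comparison, the single-index alternative could be sketched as below, 
assuming the Lucene.Net 2.4 API (QueryWrapperFilter, CachingWrapperFilter and 
Searcher.Search(Query, Filter, int)). StateFilteredLookup and Find are 
hypothetical names. The state term passed in has to match the token actually 
stored in the index; since "state" is indexed as ANALYZED in the indexing code 
below, an analyzer such as StandardAnalyzer would have turned "CA" into "ca".

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper: one combined index, with the per-state restriction
// expressed as a cached Filter instead of a separate IndexSearcher per state.
public class StateFilteredLookup
{
    private readonly IndexSearcher searcher;

    // One filter per state term. CachingWrapperFilter caches the matching
    // document set per IndexReader, so later lookups in the same state reuse it.
    private readonly Dictionary<string, Filter> stateFilters =
        new Dictionary<string, Filter>();

    public StateFilteredLookup(IndexSearcher searcher)
    {
        this.searcher = searcher;
    }

    // stateTerm must equal the token actually stored in the "state" field
    // (e.g. lowercased if the field was indexed with StandardAnalyzer).
    public TopDocs Find(string exactName, string stateTerm, int n)
    {
        Filter filter;
        lock (stateFilters) // the Dictionary itself is not thread-safe
        {
            if (!stateFilters.TryGetValue(stateTerm, out filter))
            {
                filter = new CachingWrapperFilter(
                    new QueryWrapperFilter(new TermQuery(new Term("state", stateTerm))));
                stateFilters[stateTerm] = filter;
            }
        }

        // Same exact-match TermQuery as the benchmark, restricted by the filter.
        Query query = new TermQuery(new Term("exactname", exactName));
        return searcher.Search(query, filter, n);
    }
}
```

The Dictionary mirrors the stateToIndexSearcher idea, but all states share one 
index and one searcher; whether this is faster than separate per-state indexes 
on this data is something only a benchmark like the one below can show.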

using System;
using System.Diagnostics;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Indexing code (commented out here, shown for reference):
/*
d.Add(new Field("id", record.Id, Field.Store.YES, Field.Index.NOT_ANALYZED));
d.Add(new Field("name", record.Name, Field.Store.YES, Field.Index.ANALYZED));
d.Add(new Field("exactname", record.Name, Field.Store.YES, Field.Index.NOT_ANALYZED));
d.Add(new Field("address", record.Address, Field.Store.YES, Field.Index.ANALYZED));
d.Add(new Field("city", record.City, Field.Store.YES, Field.Index.ANALYZED));
d.Add(new Field("state", record.State, Field.Store.YES, Field.Index.ANALYZED));
d.Add(new Field("zip", record.Zip, Field.Store.YES, Field.Index.ANALYZED));
if (String.IsNullOrEmpty(record.Phone) == false)
{
    // records with phone numbers are slightly more desirable
    Field phoneField = new Field("phone", record.Phone, Field.Store.YES, Field.Index.ANALYZED);
    phoneField.SetBoost(phoneField.GetBoost() * 1.1f); // boost 10%
    d.Add(phoneField);
}
*/
IndexSearcher searcher = new IndexSearcher(
    IndexReader.Open(
        FSDirectory.GetDirectory("../../../index/"), 
        true)); // readOnly

// 26,812,346 documents in this index
int numDocs = searcher.Reader.NumDocs();

Random random = new Random();
string[] randomNames = new string[500];
for (int i = 0; i < randomNames.Length; i++)
{
    // the index is optimized (no deletions), so every id in [0, numDocs) is live
    int randomDocId = random.Next(numDocs);
    randomNames[i] = searcher.Doc(randomDocId).Get("exactname");
}

Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < randomNames.Length; i++)
{
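    // exact-match lookup on the NOT_ANALYZED "exactname" field
    Query query = new TermQuery(new Term("exactname", randomNames[i]));
    TopDocs hits = searcher.Search(query, 1);
    // load the stored fields of the top hit, as a real lookup would
    searcher.Doc(hits.scoreDocs[0].doc);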
}
stopwatch.Stop();

Console.WriteLine();
Console.WriteLine("Found {0} documents in {1:n0} seconds, {2:n0} requests/second",
    randomNames.Length, // 0
    stopwatch.Elapsed.TotalSeconds, // 1
    randomNames.Length / stopwatch.Elapsed.TotalSeconds); // 2
Console.WriteLine();
Console.ReadLine();

searcher.Close();
