Re: Sort regeneration in multithreaded server

2004-10-11 Thread Stephen Halsey
Hi Doug,

Thanks for your email, and sorry for not trying 1.4.2 before emailing in.  With the 
latest version it works great: I now have a fully working test program that 
regenerates the Searcher object in a background thread, and once that thread has 
finished, any new searches use the new Searcher object with the new Sort.  The searches 
stay fast even while regenerating (as I give the background thread low priority), and 
the updates are reflected in the search results at regular intervals.  There don't 
seem to be any memory leaks or anything either.  Thanks again for your help; next 
time I will definitely try the latest version first :-)
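
A minimal sketch of the regenerate-in-the-background-then-swap arrangement described above; it is not the actual test program. The names `SearcherHolder` and `refreshInBackground` are hypothetical, `T` stands in for whatever bundles the Searcher and Sort, and `AtomicReference`/`CompletableFuture` assume a newer JDK than the one in use when this thread was written.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder: T stands in for a Searcher + Sort bundle.
class SearcherHolder<T> {
    private final AtomicReference<T> current;

    SearcherHolder(T initial) {
        current = new AtomicReference<>(initial);
    }

    // Search threads call this; it never waits for a rebuild.
    T get() {
        return current.get();
    }

    // Builds a replacement on a low-priority thread, then swaps it in.
    // Searches started before the swap keep the object they already fetched.
    CompletableFuture<Void> refreshInBackground(Supplier<T> builder) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        Thread worker = new Thread(() -> {
            try {
                T fresh = builder.get();   // the slow part (about 5 seconds in the thread)
                current.set(fresh);        // atomic swap, visible to all threads
                done.complete(null);
            } catch (RuntimeException e) {
                done.completeExceptionally(e); // old object stays in place
            }
        }, "searcher-refresh");
        worker.setPriority(Thread.MIN_PRIORITY);
        worker.setDaemon(true);
        worker.start();
        return done;
    }

    public static void main(String[] args) {
        SearcherHolder<String> holder = new SearcherHolder<>("searcher-v1");
        holder.refreshInBackground(() -> "searcher-v2").join();
        System.out.println(holder.get()); // prints "searcher-v2"
    }
}
```

Closing the old Searcher once in-flight searches have finished with it is a separate concern and is not shown here.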

Steve 
  - Original Message - 
  From: Doug Cutting 
  To: Lucene Users List 
  Sent: Friday, October 08, 2004 7:21 PM
  Subject: Re: Sort regeneration in multithreaded server


  Stephen Halsey wrote:
  > I was wondering if anyone could help with a problem (or should that be
  > "challenge"?) I'm having using Sort in Lucene over a large number of records
  > in a multi-threaded server program on a continually updated index.
  > 
  > I am using lucene-1.4-rc3.

  A number of bugs with the sorting code have been fixed since that 
  release.  Can you please try with 1.4.2 and see if you still have the 
  problem?  Thanks.

  Doug

  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]



Re: Sort regeneration in multithreaded server

2004-10-08 Thread Doug Cutting
Stephen Halsey wrote:
> I was wondering if anyone could help with a problem (or should that be
> "challenge"?) I'm having using Sort in Lucene over a large number of records
> in a multi-threaded server program on a continually updated index.
>
> I am using lucene-1.4-rc3.

A number of bugs with the sorting code have been fixed since that 
release.  Can you please try with 1.4.2 and see if you still have the 
problem?  Thanks.

Doug


Sort regeneration in multithreaded server

2004-10-08 Thread Stephen Halsey
Hi,

I was wondering if anyone could help with a problem (or should that be
"challenge"?) I'm having using Sort in Lucene over a large number of records
in a multi-threaded server program on a continually updated index.

I am using lucene-1.4-rc3.

Question in more general terms:- Is it possible to write a multithreaded
search program which uses a Sort object that is updated at regular intervals
(e.g. every 5 minutes, taking 5 seconds to regenerate) while the searching
threads continue to do their sorted searching without any 5-second
interruption?
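
As a rough illustration of the regular-interval regeneration asked about here, the sketch below rebuilds a snapshot on a scheduler thread while readers keep using the previous one. The class name `PeriodicRebuilder` and its methods are hypothetical, `T` stands in for the Sort/Searcher pair, and `ScheduledExecutorService` assumes a JDK with `java.util.concurrent`; it says nothing about how Lucene itself caches sort data.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical periodic rebuilder: T stands in for a Sort + Searcher pair.
class PeriodicRebuilder<T> {
    private volatile T current;                 // what search threads read
    private final Supplier<T> builder;          // the slow regeneration step
    private final AtomicInteger rebuilds = new AtomicInteger();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-rebuild");
            t.setDaemon(true);
            return t;
        });

    PeriodicRebuilder(Supplier<T> builder) {
        this.builder = builder;
        this.current = builder.get();           // first build on the caller's thread
    }

    // Schedules a rebuild every `period`, measured from the end of the last one.
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                current = builder.get();        // readers see old value until this assignment
                rebuilds.incrementAndGet();
            } catch (RuntimeException e) {
                // Keep serving the previous snapshot if a rebuild fails.
            }
        }, period, period, unit);
    }

    T get() { return current; }

    int rebuildCount() { return rebuilds.get(); }

    void stop() { scheduler.shutdownNow(); }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger generation = new AtomicInteger();
        PeriodicRebuilder<Integer> r = new PeriodicRebuilder<>(generation::incrementAndGet);
        r.start(50, TimeUnit.MILLISECONDS);     // would be 5 minutes in the scenario above
        Thread.sleep(300);
        r.stop();
        System.out.println("generation now " + r.get());
    }
}
```

Search threads would call `get()` once per search and use that object for the whole request, so a rebuild in progress never stalls them.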

Question in quick specific format: Can I generate a new updated Sort object
in a separate Thread of my search server program while the original Sort
object continues to be used in the other Threads of the program and then
switch the searching Threads to the new Sort object?

More details: We are using Lucene to index about one million news articles
and the index is about 3 GB in size and needs to be continually updated with new
news records.  I have written a search server which performs sorted searches
on the index.  The "challenge" is that the Sort object does not update in
memory as the index is updated on disk and so has to be regenerated.  This
takes about 5 seconds and so cannot be done for every single search.  I
thought I would be able to regenerate the Sort and Searcher objects in a
separate Thread and then pass them to the searcher Threads for searching,
but have found that there seems to be some kind of memory locking that stops
this from being possible.

I have written a simple test program (attached, with output) that
demonstrates this problem by running a sorted search in one or two threads.
If you run it with one thread it runs fine, with the searches that
regenerate the Sort object taking about 5 seconds and the searches
themselves taking only 0.25 seconds.  But if you run it with two threads
then every search takes about 10 seconds, which implies that the Sort object
is being regenerated for every single search.  I am guessing that this is
because Lucene has been written in a Thread safe way and so to be safe the
Sort object is being regenerated every time?

If it turns out that what I am trying to do is not possible then I will
probably just restart the search server program every 5 minutes and load
balance the searches across a number of servers, but that seems a bit messy
compared to regenerating it in memory in a continually running program.
Thanks in advance, and don't worry - it's not urgent, and if I don't get the
answer I think it should be OK(ish) doing it the messy restarting-server
way.

ta


Steve


testDoTwoSeparateThreadsWithSorts.java:-

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.queryParser.QueryParser;

import java.net.*;
import java.io.*;
import java.util.*;
import java.lang.*;

//*//
//
// This program tests running two separate threads, each running searches and
// then refreshing the Sort object every so often.  This is needed in our
// search server since it runs continuously in multiple threads and never
// dies, and so as the lucene index is updated the Sort and Searcher objects
// in each thread have to be updated.
// I find with this program that when two threads are running the Sort object
// seems to be regenerated every time, which causes each search to take about
// 10 seconds.  With only one thread the regeneration of the Sort object takes
// about 5 seconds and then each search only takes 200 milliseconds or so.
//
// cd /home1/moreover/lucene/test_programs/;
// javac testDoTwoSeparateThreadsWithSorts.java;
// java -ms100m -mx200m testDoTwoSeparateThreadsWithSorts
//   /home1/moreover/lucene_indexes/testKeepSortInMemoryIndex/ news dontRunSecondThread
//
// cd /home1/moreover/lucene/test_programs/;
// javac testDoTwoSeparateThreadsWithSorts.java;
// java -ms100m -mx200m testDoTwoSeparateThreadsWithSorts
//   /home1/moreover/lucene_indexes/testKeepSortInMemoryIndex/ news doRunSecondThread
//
//**//


class testDoTwoSeparateThreadsWithSorts {

public static void main(String[] args) {

 try {
 // initialise variables
 String indexDirectory = args[0];
 String query = args[1];
 String runSecondThread = args[2];

 System.out.println(": Starting first thread to do s