Hello,

I was reading a posting from Doug Cutting (circa 2001) that stated the following:

"Multi-CPU and/or multi-disk systems can provide greater parallelism and hence query 
throughput. However Lucene's FSDirectory serializes reads to a given file (since it 
only has a single file descriptor per file) which limits i/o parallelism. Someone with 
a large disk array would be better served by a Directory implementation that uses Java 
1.4's new i/o classes. In particular, the FileChannel class supports reads that do not 
move the file pointer, so that multiple reads on the same file can be in progress at 
the same time."

I attempted to implement this suggestion, but without much success.

Basically, I copied the existing FSDirectory (from 1.3-rc1) and modified the 
FCInputStream inner class.  I changed it to get a FileChannel (channel) in the 
constructor and to clone properly.  The main change, though, was to "readInternal", 
which now looks like this:

        protected void readInternal(byte[] b, int offset, int len)
                throws IOException
        {
                channel.read(ByteBuffer.wrap(b, offset, len), getFilePointer());
        }

In other words: wrap the byte array, let the channel do the reading, and take the 
current file pointer from the superclass.
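
One thing worth noting about the method above: FileChannel.read(ByteBuffer, long) 
returns the number of bytes read and is not guaranteed to fill the buffer in one call. 
A more defensive variant would loop until the buffer is full. The following is only a 
sketch of that idea; the class name PositionalRead and the helper readFully are made up 
for illustration and are not part of Lucene or of my FileChannelDirectory.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class PositionalRead {
    // Hypothetical helper: fills b[offset .. offset+len) from the channel,
    // starting at the given absolute file position. It uses the absolute
    // read(ByteBuffer, long), which does not move the channel's own position.
    static void readFully(FileChannel channel, long position,
                          byte[] b, int offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(b, offset, len);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, position);
            if (n < 0) {
                // The requested range extends past the end of the file.
                throw new EOFException("read past EOF at position " + position);
            }
            position += n; // advance by however many bytes this call delivered
        }
    }
}
```

Inside readInternal this would be called as 
readFully(channel, getFilePointer(), b, offset, len).  It should not change the timing 
picture much, but it rules out short reads as a source of subtle problems.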

It works fine: the same queries return the same results, etc.  But the new 
Directory implementation is consistently a few ms slower than the old one in overall 
response time (over sustained trials with various amounts of concurrency).  The old 
FSDirectory usually wins out for both 'querying' (i.e. Searcher.search) and loading 
(i.e. Hits.doc(i)).

According to the FileChannel API, absolute reads should be able to occur concurrently, 
whereas the existing FSDirectory serializes access to the underlying files.  So I 
figured that FSDirectory would be faster with a single search thread, but that 
FileChannelDirectory would win with multiple threads.  Apparently not so (given my 
implementation :-).  I tested on both a regular IDE HD and a SCSI disk.  Both tests, 
however, were Win2k based.


Does anyone know why I might not be seeing a performance increase for multiple 
concurrent threads using my "FileChannelDirectory"?


Any ideas would be appreciated.


Thank you,
Tate
