Hi,

I am working on a web development project using PHP and MySQL. The team has implemented full-text search with MySQL, but is now researching Lucene to help with performance and scalability issues. We are looking for a developer who has experience with Lucene and can assist with integrating it into our environment. What follows is a brief overview of the problems we're working to address. If you have experience using Lucene with large amounts of data (we have roughly 16 million records) where search time is critical (it needs to be under 0.2 seconds), then please respond.

Thanks,
Gordon Riggs
[EMAIL PROTECTED]

1. Loading index into memory using Lucene's RAMDirectory

Why is the Java heap 2.9GB for a 1.4GB index? Why can we not load an index over 1.4GB in size? We receive 'java.lang.OutOfMemoryError' even with the -mx flag set as high as '10g'.

We're using a dedicated test machine with dual AMD Opteron processors and 12GB of memory. The OS is SuSE Linux Enterprise Server 9 (x86_64). The Java version is:

    Java(TM) 2 Runtime Environment, Standard Edition (build Blackdown-1.4.2)
    Java HotSpot(TM) 64-Bit Server VM (build Blackdown-1.4.2-fcs, mixed mode)

We also get similar results with:

    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
    Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
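
For item 1, one check that may help narrow things down is to print the heap limit each VM actually applies when started with a given -mx value, alongside the architecture it reports. The following is a minimal diagnostic sketch, not part of our current setup; the class name HeapCheck is made up for illustration, and it uses only standard java.lang calls.

```java
public class HeapCheck {
    /** Upper bound on the heap the JVM will try to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** One-line summary of heap limits and the reported platform. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return "max heap: " + (rt.maxMemory() / mb) + " MB, "
             + "total: " + (rt.totalMemory() / mb) + " MB, "
             + "free: " + (rt.freeMemory() / mb) + " MB, "
             + "os.arch: " + System.getProperty("os.arch") + ", "
             + "vm: " + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it under each of the two VMs with the same -mx setting would show whether the requested limit is actually being honored before the index is loaded.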

2. How to keep Lucene and Java in memory, to improve performance

The idea is to have a Lucene "daemon" that loads the index into memory once on startup. It then listens for connections and performs search requests for clients using that single index instance. Do you foresee any problems (other than the ones stated above) with this approach? Garbage collection and/or memory leaks? Performance issues? Concurrency issues with multiple searches coming in at once?

What's involved in writing the daemon? Assuming that we need the daemon, we need to find out how big a job it is to develop, what requirements need to be specified, etc.

3. How to interface our PHP web application with Java

Our web application is written in PHP, so we need a communication interface for performing search queries that is both PHP and Java friendly. What do you think would be a good solution? XML-RPC? What's involved in developing the solution?

4. How to tune Lucene

Are there ways to "tune" Lucene in order to improve performance? We already plan on moving the index into memory. What else can be done to improve the search times? Can the way the index is built affect performance?
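
To make the daemon idea in item 2 more concrete, here is a rough sketch of the shape we have in mind: one process that holds a single read-only index in memory and answers one query per connection from a small thread pool. Everything in it is an assumption for illustration. The class name SearchDaemon, the one-line text protocol, and the Map used as a stand-in for the Lucene index are all hypothetical, and the code assumes a modern JDK (Java 8 or later) rather than the 1.4.2 builds listed above. No Lucene calls appear in it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class SearchDaemon implements AutoCloseable {
    // Hypothetical stand-in for the in-memory index: term -> matching record ids.
    private final Map<String, List<Integer>> index;
    private final ServerSocket server;
    private final ExecutorService workers;

    public SearchDaemon(Map<String, List<Integer>> index, int port, int threads) throws IOException {
        // Copy once at startup; the shared instance is never modified afterwards.
        this.index = Collections.unmodifiableMap(new HashMap<>(index));
        this.server = new ServerSocket(port); // port 0 picks any free port
        this.workers = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "search-worker");
            t.setDaemon(true);
            return t;
        });
        Thread acceptor = new Thread(this::acceptLoop, "search-acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    public int port() {
        return server.getLocalPort();
    }

    // Accept connections and hand each one to the worker pool.
    private void acceptLoop() {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();
                workers.execute(() -> handle(client));
            } catch (IOException | RuntimeException e) {
                return; // socket closed or pool shut down: stop accepting
            }
        }
    }

    // Protocol (assumed): client sends one term per line, gets back comma-separated ids.
    private void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(c.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String term = in.readLine();
            if (term == null) return;
            List<Integer> hits =
                index.getOrDefault(term.trim().toLowerCase(), Collections.<Integer>emptyList());
            out.println(hits.stream().map(String::valueOf).collect(Collectors.joining(",")));
        } catch (IOException e) {
            // Client went away mid-request; nothing to clean up beyond the socket.
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        workers.shutdownNow();
    }

    // Small client helper, mainly to exercise the daemon from Java.
    static String query(String host, int port, String term) throws IOException {
        try (Socket s = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(term);
            return in.readLine();
        }
    }
}
```

In a real version the Map lookup would be replaced by a search against one shared Lucene searcher, which is where our concurrency and garbage collection questions come in. A plain line-based socket protocol like this is also one candidate for item 3, since PHP can open TCP sockets directly (for example with fsockopen), though XML-RPC or something similar may be a better fit.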