It feels to me like your major problem might be file I/O with all those files. There's no need to split the files up first and then index them. Just read through the log and index each row as its own document. The code fragment you posted should let you get the line back from the "line" field of each document, so when you execute searches you also avoid the file I/O. Once indexed, you shouldn't have to go back to disk to get the data at all....
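Something like this is what I have in mind. Just a sketch against the old IndexWriter API, not your actual code; the index path, buffer size, and analyzer are placeholders for whatever you're already using:

```java
import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class LogIndexer {
    public static void main(String[] args) throws Exception {
        // One writer for the whole run -- no intermediate split files.
        IndexWriter writer = new IndexWriter("/path/to/index",
                                             new StandardAnalyzer(), true);
        writer.setMaxBufferedDocs(1000); // the MaxBufferedDocs knob -- tune it

        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = reader.readLine()) != null) {
            Document doc = new Document();
            // Tokenized for searching AND stored, so a search result hands
            // you the line back without touching the log file again.
            doc.add(new Field("line", line,
                              Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
        reader.close();

        writer.optimize();
        writer.close();
    }
}
```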
This scares me...

*********************
....Then when I search for a term, I get the log-file names that contain the data. Then I buffer-read those files and find out rows containing the data....
*********************

According to the fragment you posted, you already have the data in the index and don't need to go to disk to get it; that extra disk trip could be a major issue..... Are we still talking about indexing speed, or have we segued to search speed? Jeremy's suggestion about MaxBufferedDocs is worth heeding.....

One final point of confusion.... Watch the Hits object when searching. Hits is built for getting the first few hits (100 or so). If you iterate over more documents than that, the Hits object will re-execute your query every 100 or so documents. You'll want to think about a HitCollector or TopDocs object for large result sets.

Again, the quick timing test is to cut off your responses at, say, 50 and see how long the search takes as opposed to collecting all responses. Ditto for the file I/O: if we're talking about search speed, try it without reading the files....

Keep us posted
Erick
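P.S. Here's roughly what I mean about skipping Hits for big result sets. Again just a sketch against the old Searcher API; the index path, field name, and the 50-hit cutoff are only for illustration:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LogSearcher {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query = new QueryParser("line", new StandardAnalyzer())
                              .parse(args[0]);

        // Option 1: TopDocs with a cutoff -- no query re-execution, and the
        // stored "line" field comes straight out of the index.
        TopDocs top = searcher.search(query, null, 50);
        for (ScoreDoc sd : top.scoreDocs) {
            System.out.println(searcher.doc(sd.doc).get("line"));
        }

        // Option 2: a HitCollector if you really need to visit every match.
        final int[] count = {0};
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                count[0]++; // just counting here; fetch docs lazily if needed
            }
        });
        System.out.println(count[0] + " total matches");

        searcher.close();
    }
}
```

Timing the 50-hit version against the collect-everything version should tell you pretty quickly whether it's the search itself or the result collection (and any file reading) that's eating your time.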