Hello, Did take a look at nutch or hadoop or solr? They partially seem to address the things you describe...About the LSI I am not sure what has been done in those projects
Regards Ard > > Hi, Please help me. > Its been a month since i am trying lucene. > My requirements are huge, i have to index and search in TB of data. > I have question regarding three topics: > > 1. Problem in Indexing > As i need to index TB of data, so by googling and visiting > different forum > i deployed following fashion for indexing: > > 1. First i created an array of RAMDirectory then i added the > documents on > it. > After crossing certain threshold i dumped it into my drive > as tempIndex1 > > 2. I repeated the same process until all documents are > indexed in my drive > as tempindex1. tempindex2... > > 3. Then finally i loaded the temp directories and merged in > as a main full > indexed directories. > > 4. I have used threading too for this purpose. > > 5. This some what removed the optimize() overhead of > IndexWriter, as i added > directories together only at the end. > > Am i doing this the right way or not, is there any other > solution to boost > the indexing process. > > 2. Problem in searching > > As lucene doesnt support LSI and SVD so as to achieve > conceptual search, i > first search the lucene index for the user inputted text then > retrieved the > document and then expanded the query using LSI and SVD and > then re-searched > the index. > Now with few words in query doesnt seem to have performance > problem but when > i expand the query i.e. when the query contains ten words > ORed together then > it takes tremendously unacceptable amount of time to get Hits. Is this > obvious or am i missing something here too.. > What are the ways to achieve boost in query performance when the query > contains many terms and especially they are ORed, as for > ANDed query it > requires less time to produce Hits. > I have used single Indexsearcher and my index is optimized as well.. > > > 3. Another Problem > > As i require to dump my database table too inside lucene > along with fulltext > info. > What effect will it have on indexing and searching.? > Also i might need to change the name of Field of Document > indexed in lucene, > will it be possible.? > I know its not possible to change the value of the field but > will it be > possible to change the name of the field or we have to > control externally..? > > Please shade me some light in these things: > Your help is highly anticipated > > -- > View this message in context: http://www.nabble.com/Inrease-the-performance-of-Indexing-in-Lucene-tf4108165.html#a11682360 Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
