Sounds like you're looking for a full-text inverted index. Lucene is a good open-source implementation of that. I believe it has an option for storing the original full text as well as the indexes.

--Matt
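For anyone unfamiliar with the term, here is a minimal sketch of what an inverted index does, written in plain Python. It is not Lucene's API and not how Lucene is implemented; the names `InvertedIndex`, `tokenize`, `add`, and `search` are made up for illustration, and the tokenizer is a deliberately naive assumption (lowercase alphanumeric runs, so XML tag names get indexed too). The idea is simply a map from each term to the set of documents containing it, with the original text optionally stored alongside.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (naive assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical in-memory index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.documents = {}               # doc id -> original full text (stored copy)

    def add(self, doc_id, text):
        """Store the original text and record every term it contains."""
        self.documents[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so the intersection does not mutate the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("a.xml", "<note>Hadoop stores large files</note>")
index.add("b.txt", "Lucene indexes text files")

both = index.search("files")          # both documents mention "files"
only_b = index.search("lucene files") # only b.txt contains both terms
```

A lookup touches only the posting sets for the query terms rather than scanning every file, which is why the index pays off once it has been built.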
On May 31, 2011, at 10:50 AM, cs230 wrote:

Hello All,

I am planning to start a project where I have to store a large volume of XML and text files. On top of that, I have to implement an efficient algorithm for searching over thousands or millions of files, and also build indexes to make later searches faster. I looked into Oracle Database, but it delivers very poor results. Can I use Hadoop for this? Which Hadoop project would be the best fit? Is there anything from Google I can use?

Thanks a lot in advance.

--
View this message in context: http://old.nabble.com/trying-to-select-technology-tp31743063p31743063.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.