Hi,

First of all, you should work out exactly what you want to do. Without that, it is hard to estimate any hardware requirements. I assume you want to use Hadoop for some kind of offline computation whose results are later used for web-based search? In your place I would start by reading about how such indexing works. Once you know what you need and which algorithms are involved, you can estimate the hardware and software requirements.
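
To give a rough idea of what "indexing" means here, below is a minimal sketch of an inverted index in Python. It is only an illustration under simple assumptions: the documents are already available as plain text (extracting text from PDF and DOC files is a separate step not shown), everything fits in memory, and the names `tokenize`, `build_index`, `search` and the sample file names are made up for this example. It is not how Hadoop or any particular search product does it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data standing in for text extracted from real files.
docs = {
    "a.pdf": "Hadoop cluster sizing notes",
    "b.doc": "Quarterly sales report",
    "c.pdf": "Sizing a search cluster",
}
index = build_index(docs)
hits = search(index, "cluster sizing")
```

The size of such an index and the time needed to build it depend on the number of distinct terms and documents, which is why the hardware estimate only becomes possible once the indexing approach is known.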

If you just want to index and search the documents, you could look at the CDS Invenio project (http://cdsware.cern.ch/invenio/index.html). It already provides this functionality.

Regards,
Piotr

2009/5/14 PORTO aLET <portoa...@gmail.com>

> Hi,
>
> My company has about 50GB of pdfs and docs, and we would like to be able to
> do some text search over a web interface.
> Is there any good tutorial that specifies hardware requirements and software
> specs to do this?
>
> Regards
>