Hi

First of all, you should work out exactly what you want to do. Without
that, it is hard to estimate any hardware requirements.
I assume you want to use Hadoop for some kind of offline computation whose
results are later used for web-based search?
If I were you, I would start by reading about how such indexing works.
Then, once you know what you need and which algorithms are involved, you
can estimate the hardware and software requirements.
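
To give a rough idea of what "indexing" means here, below is a minimal
sketch of an inverted index in Python. It is only an illustration, not what
Hadoop or any particular search engine does internally: the function names
and the sample documents are made up, and it assumes the text has already
been extracted from the PDFs and docs.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data; real input would be the extracted document text.
docs = {
    "a.pdf": "Hadoop cluster hardware requirements",
    "b.doc": "Text search over a web interface",
    "c.pdf": "Hardware for text search",
}
index = build_index(docs)
matches = search(index, "text search")
```

The size of such an index and the cost of building it depend mostly on the
amount of extracted text rather than on the raw file size, which is one
reason to look at the algorithm before estimating hardware.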

If you just want to index and search the documents, you could look at the
CDS Invenio project ( http://cdsware.cern.ch/invenio/index.html ). It
already provides this functionality.

regards

Piotr


2009/5/14 PORTO aLET <portoa...@gmail.com>

> Hi,
>
> My company has about 50GB of pdfs and docs, and we would like to be able to
> do some text search over a web interface.
> Is there any good tutorial that specifies hardware requirements and
> software
> specs to do this?
>
> Regards
>
