RE: [cloudsuite] question about nutch

Volos Stavros Thu, 06 Sep 2012 03:18:48 -0700

Dear Lei,

I would like to ask you to subscribe to the mailing list. That way, your
emails will be posted immediately, and you will also receive any email sent
to the list that might be useful to you.


Regarding your question, the frontend and indexing processes should run on
different nodes. There is no need to have Hadoop running while you are
running the benchmark. HDFS is used during the crawling phase, so that the
crawl can be sped up by using multiple nodes. After crawling and preparing
the index, you should copy the index to a local disk (follow the
instructions) and then, if your index is too big to fit into the memory of
one node, distribute it across multiple indexing nodes.
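
As a rough illustration of that last step, the Python sketch below estimates
how many indexing nodes are needed for a given index size and then spreads
the index parts over them. It is not part of CloudSuite or Nutch: the helper
names nodes_needed and assign_parts, the part names, and all sizes are
hypothetical, and it assumes the index is already split into parts that can
be placed independently.

```python
import math

GIB = 1024 ** 3  # bytes in one GiB


def nodes_needed(index_bytes, node_memory_bytes):
    """Lower bound on indexing nodes, assuming the index splits evenly."""
    if node_memory_bytes <= 0:
        raise ValueError("node_memory_bytes must be positive")
    return max(1, math.ceil(index_bytes / node_memory_bytes))


def assign_parts(part_sizes, node_count):
    """Greedy placement: largest part first, onto the lightest node so far."""
    nodes = [{"parts": [], "bytes": 0} for _ in range(node_count)]
    by_size = sorted(part_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        target = min(nodes, key=lambda n: n["bytes"])
        target["parts"].append(name)
        target["bytes"] += size
    return nodes


if __name__ == "__main__":
    # Hypothetical index parts and a hypothetical 8 GiB memory budget per node.
    parts = {
        "part-00000": 6 * GIB,
        "part-00001": 5 * GIB,
        "part-00002": 4 * GIB,
        "part-00003": 3 * GIB,
    }
    count = nodes_needed(sum(parts.values()), 8 * GIB)
    for i, node in enumerate(assign_parts(parts, count)):
        print(i, node["parts"], node["bytes"] // GIB, "GiB")
```

The node count is only a lower bound. A greedy placement can still put more
on one node than fits in its memory, so check the per-node totals before
copying anything.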

Hope this helps.

Regards,
-Stavros.
________________________________________
From: Wang Lei [[email protected]]
Sent: Wednesday, September 05, 2012 3:40 PM
To: [email protected]
Subject: [cloudsuite] question about nutch

Dear all,

I am testing the performance of our processor with CloudSuite.
When I used CloudSuite's Nutch testbench, I measured the performance of the
"distributed search" process.
Is that right?
Should the performance of the DataNode, NameNode, and JobTracker be measured
too?
(I run the HDFS NameNode together with the Nutch frontend and indexing node.)
Thanks

--
Lei Wang
Email: [email protected]; [email protected]
