Dear Lei,

Please register for the mailing list. Once you are registered, your emails will be posted immediately, and you will also receive every email sent to the list, some of which may be useful to you.

Regarding your question: the frontend and indexing processes should run on different nodes. Hadoop does not need to be running while you run the benchmark. HDFS is used during the crawling phase, so that the crawl can be sped up by using multiple nodes. After crawling and preparing the index, copy the index to a local disk (follow the instructions) and then, if the index is too big to fit into the memory of one node, distribute it across multiple indexing nodes.

Hope this helps.

Regards,
-Stavros.

________________________________________
From: Wang Lei [[email protected]]
Sent: Wednesday, September 05, 2012 3:40 PM
To: [email protected]
Subject: [cloudsuite] question about nutch

Dear all,

I am testing the performance of our processor with CloudSuite. When I used CloudSuite's Nutch testbench, I measured the performance of the "distributed search" process. Is that right? Should the performance of the DataNode, NameNode, and JobTracker be measured too? (I run the HDFS NameNode together with the Nutch frontend and indexing node.)

Thanks
--
Lei Wang
Email: [email protected]; [email protected]
