Hi Jeongseob,

Exactly. You need to perform the crawling phase multiple times so that you get 
a larger index.

You don't really need to crawl the same public sites we crawled, nor use 
the same terms_en.out. In any case, Wikipedia was one of them.

You need to have enough clients to saturate your CPU while maintaining 
quality-of-service.
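
One way to read this advice is as a load sweep: raise the number of client threads step by step, record throughput and tail latency at each step, and keep the highest-throughput setting whose latency still meets the target. Below is a minimal Python sketch, assuming you have already collected the measurements yourself. The `Sample` fields, the 90th-percentile metric, the 0.5 s target, and all the numbers are illustrative assumptions, not values from the paper or from the benchmark's client.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sample:
    """One measured load level (all field names are hypothetical)."""
    clients: int            # number of concurrent client threads
    throughput_qps: float   # completed queries per second
    p90_latency_s: float    # 90th-percentile response time in seconds


def best_operating_point(samples: Iterable[Sample],
                         qos_latency_s: float) -> Optional[Sample]:
    """Return the highest-throughput sample that still meets the latency target.

    Returns None when no sample satisfies the target.
    """
    ok = [s for s in samples if s.p90_latency_s <= qos_latency_s]
    if not ok:
        return None
    return max(ok, key=lambda s: s.throughput_qps)


if __name__ == "__main__":
    # Made-up measurements, only to show the shape of the data.
    sweep = [
        Sample(clients=50, throughput_qps=400.0, p90_latency_s=0.12),
        Sample(clients=100, throughput_qps=780.0, p90_latency_s=0.20),
        Sample(clients=200, throughput_qps=1300.0, p90_latency_s=0.45),
        Sample(clients=400, throughput_qps=1350.0, p90_latency_s=1.80),
    ]
    print(best_operating_point(sweep, qos_latency_s=0.5))
```

If the servers' CPU is still mostly idle at the selected point, the limit is probably somewhere else (the client machine, the index size, or a serialized component) rather than the number of clients.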

Hope this helps.

-Stavros.

On Jun 8, 2014, at 8:27 PM, J Ahn wrote:

I am wondering how to increase the size of the crawled index and segments. It 
seems that we need to crawl a large data set again.
Is this right?

In addition, I would like to reproduce the experimental results that appeared 
in the paper "Clearing the Clouds". The paper used an index size of 2GB and a 
data segment size of 23GB of content crawled from the public web. Could you 
tell me which public sites you crawled?

Next, I have a question about configuring clients. How many clients were used 
in the experiments, and which terms_en.out was used?

- Jeongseob


2013-06-09 16:16 GMT+09:00 Hailong Yang <[email protected]>:
Hi Zacharias,

Have you tried increasing the size of your crawled index and segments? For 
example, the "Clearing the Clouds" paper says they used a 2GB index and 23GB 
of segments.

Best

Hailong


On Fri, May 31, 2013 at 10:24 PM, zhadji01 <[email protected]> wrote:
Hi,

I have a web-search benchmark setup with 4 machines: 1 client, 1 front-end, 1 
search server, and 1 segment server for fetching the summaries.

All machines are two-socket Xeon E5620 @ 2.4GHz with 32GB RAM, and they are 
connected with 1Gb Ethernet. My crawled data is a 400MB index and 4GB of 
segments.

My problem is that the servers' CPU utilization is very low. The max throughput 
I managed to get using the Faban client or ApacheBench was ~400-450 
queries/sec, with user CPU utilizations of ~5% on the front-end, ~10% on the 
search server, and ~35-39% on the segment server.

I'm sure that the network is not the bottleneck, because I'm not even close to 
filling the bandwidth.
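
A rough way to check that claim is to multiply the query rate by an estimated response size and compare the result with the link capacity. Here is a small Python sketch; the 20 KB per response is an assumed figure for illustration only, so the real size should be measured on the actual setup.

```python
def link_utilization(queries_per_s: float,
                     bytes_per_response: float,
                     link_bits_per_s: float = 1e9) -> float:
    """Fraction of link capacity used by response traffic.

    Ignores protocol overhead and request traffic, so it is a lower bound.
    """
    return queries_per_s * bytes_per_response * 8 / link_bits_per_s


if __name__ == "__main__":
    # 450 queries/s is the throughput reported above; 20 KB is an assumption.
    print(f"{link_utilization(450, 20_000):.1%}")  # prints "7.2%"
```

Under that assumption the responses use about 7% of a 1Gb link, which would be consistent with the network not being the limit.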

Can you give any suggestions on how to better utilize the servers, or any 
thoughts on what the problem might be?

Thanks,
Zacharias Hadjilambrou


