I am just wondering how to increase the size of the crawled index and segments. It seems that we need to crawl a larger data set again. Is this right?
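If re-crawling is indeed the way, and assuming the benchmark's crawler is Nutch (I am guessing here, since the setup is not spelled out in this thread), I suppose it would mean running the crawl again with a larger seed list and a deeper crawl, roughly along these lines (the seed directory, output directory, depth, and topN values below are just placeholders, not the settings from the paper):

    bin/nutch crawl urls -dir crawl -depth 4 -topN 100000   # larger depth/topN should yield a larger index and segments

Is that the intended procedure, or is there a way to grow an existing index without starting the crawl over?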
In addition, I would like to reproduce the experimental results that appeared in the paper, *Clearing the Clouds*. The paper used a 2GB index and 23GB of data segments crawled from the public web. Could you explain which public sites you crawled? Next, I have a question about configuring clients. How many clients were used in the experiments, and which terms_en.out file was used?

- Jeongseob

2013-06-09 16:16 GMT+09:00 Hailong Yang <[email protected]>:

> Hi Zacharias,
>
> Have you tried increasing the size of your crawled index and segments?
> For example, the Clearing the Clouds paper says they used a 2GB index and
> 23GB of segments.
>
> Best
>
> Hailong
>
>
> On Fri, May 31, 2013 at 10:24 PM, zhadji01 <[email protected]> wrote:
>
>> Hi,
>>
>> I have a web-search benchmark setup with 4 machines: 1 client, 1
>> front-end, 1 search server, and 1 segment server for fetching the summaries.
>>
>> All machines are two-socket Xeon E5620 @ 2.4GHz with 32GB RAM, and they
>> are connected with 1Gb Ethernet. My crawled data is a 400MB index and
>> 4GB of segments.
>>
>> My problem is that the servers' CPU utilization is very low. The maximum
>> throughput I managed to get using the Faban client or Apache Bench was
>> ~400-450 queries/sec, with user CPU utilizations of: front-end ~5%,
>> search server ~10%, segment server ~35-39%.
>>
>> I'm sure the network is not the bottleneck, because I'm not even close
>> to filling the bandwidth.
>>
>> Can you give any suggestions on how to utilize the servers well, or any
>> thoughts on what the problem might be?
>>
>> Thanks,
>> Zacharias Hadjilambrou
>>
>
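P.S. On the low utilization Zacharias reported above: one quick sanity check I would try is driving the front end with many more concurrent connections, to rule out the load generator itself as the bottleneck. For example, with Apache Bench (the front-end host, port, and query string below are placeholders for whatever your deployment actually exposes):

    ab -n 100000 -c 256 -k "http://frontend:8080/search?query=test"

If throughput and server CPU utilization keep rising as -c (the concurrency level) is increased, then the limit was on the client side rather than on the search or segment servers.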
