Dear Stavros,

Thank you for your reply. I understand the data structures required during the search. The 6GB is only the size of the actual index (the indexes directory). The whole dataset, including the segments, amounts to 30GB.
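Regarding keeping everything memory-resident, one thing I plan to try is simply reading every file under the indexes and segments directories once before starting the queries, so that the OS page cache already holds them. Below is only a rough sketch of that idea; the crawl/ paths are placeholders for my own layout, and this is not meant to be the procedure you used for the paper.

```python
#!/usr/bin/env python
"""Warm the OS page cache by reading every index/segment file once.

The crawl/ paths below are placeholders for my layout; adjust them to
wherever the Nutch index and segments actually live.
"""
import os

INDEX_DIRS = ["crawl/indexes", "crawl/segments"]  # placeholder paths
CHUNK = 4 * 1024 * 1024  # read in 4MB chunks to keep heap usage small


def warm(path):
    """Sequentially read one file so its pages end up in the page cache."""
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass


total = 0
for root_dir in INDEX_DIRS:
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            warm(full)
            total += os.path.getsize(full)

print("Touched %.1f GB of index/segment data" % (total / 1e9))
```

Please let me know whether something like this matches what you did, or whether there is a better way to keep the 30GB resident on a 16GB machine.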
Best,
Hailong

On Fri, Oct 19, 2012 at 9:03 AM, Volos Stavros <[email protected]> wrote:

> Dear Hailong,
>
> There are two components that are used when performing a query against the
> index serving node:
> (a) the actual index (under indexes)
> (b) segments (under segments)
>
> What exactly is 6GB? Are you including the segments as well?
>
> Regards,
> -Stavros.
>
> ________________________________________
> From: Hailong Yang [[email protected]]
> Sent: Wednesday, October 17, 2012 4:51 AM
> To: [email protected]
> Cc: Lingjia Tang; Jason Mars
> Subject: How to fit the index into the memory for the web search benchmark
>
> Hi CloudSuite,
>
> I am experimenting with the web search benchmark, and I am wondering how to
> fit the index into memory in order to avoid unnecessary disk accesses. I
> have a 6GB index crawled from Wikipedia, and the machine has 16GB of RAM.
> During the workload execution, I noticed periodic 2% increases in I/O
> utilization, and the memory used by the Nutch server was always less than
> 500MB. So I guess the whole index is not brought into memory by default
> before serving the search queries, right? Could you tell me how to do that
> exactly as you did in the Clearing the Clouds paper? Thanks!
>
> Best,
>
> Hailong
