Dear Stavros,

Thank you very much for your reply. I will go through that log right away.
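For the record, here is roughly how I intend to check a backend node for getSummary requests. The log path below is just the default Nutch location (logs/hadoop.log); it may well differ on our machines:

    # Count getSummary requests recorded in an index-serving (backend) node's log.
    # logs/hadoop.log is the default Nutch log file; adjust the path if needed.
    grep -c -i "getSummary" logs/hadoop.log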
Best

Hailong

On Mon, Oct 22, 2012 at 4:07 AM, Volos Stavros <[email protected]> wrote:

> Dear Hailong,
>
> The frontend will ask for the summaries of the top documents. A backend
> node will receive a getSummary request for every top document it owns. You
> can go through the logs of the backend node and verify that the node does
> receive getSummary requests.
>
> Regards,
> -Stavros.
> ________________________________________
> From: Hailong Yang [[email protected]]
> Sent: Monday, October 22, 2012 10:38 AM
> To: Volos Stavros
> Cc: [email protected]; Lingjia Tang; Jason Mars
> Subject: Re: How to fit the index into the memory for the web search benchmark
>
> Dear Stavros,
>
> I am confused about why we need to bring the segments into memory. I
> examined the log file from the frontend server, which recorded the queries
> sent to and the responses received from the nutch server. The log showed
> that the nutch server only replied with how many hits were found in the
> crawled dataset, without being asked for the details of the page contents.
> That means that, when orchestrating the search, the NutchBean object never
> needs to call the getSummary method that accesses the segments to retrieve
> the page contents. In other words, we don't need to care about whether the
> segments can fit into memory for this specific web search workload in
> CloudSuite, right? Please correct me if I am wrong.
>
> Best
>
> Hailong
>
>
> On Sun, Oct 21, 2012 at 9:04 AM, Volos Stavros <[email protected]> wrote:
> Dear Hailong,
>
> The reason you get I/O activity is that the segments don't fit into
> memory.
>
> I would recommend reducing the size of your index so that indexes+segments
> occupy roughly 16GB.
>
> This is relatively easy to do in case you used multiple reducer tasks
> (during the crawling phase) to create multiple partitions.
>
> (See Notes at http://parsa.epfl.ch/cloudsuite/search.html: the
> mapred.reduce.tasks property determines how many index and segment
> partitions will be created.)
>
> Regards,
> -Stavros.
> ________________________________________
> From: Hailong Yang [[email protected]]
> Sent: Friday, October 19, 2012 8:03 PM
> To: Volos Stavros
> Cc: [email protected]; Lingjia Tang; Jason Mars
> Subject: Re: How to fit the index into the memory for the web search benchmark
>
> Dear Stavros,
>
> Thank you for your reply. I understand the data structures required during
> the search. The 6GB is only the size of the actual index (the indexes
> directory). The whole dataset, including the segments, accounts for 30GB.
>
> Best
>
> Hailong
>
> On Fri, Oct 19, 2012 at 9:03 AM, Volos Stavros <[email protected]> wrote:
> Dear Hailong,
>
> There are two components that are used when performing a query against the
> index-serving node:
> (a) the actual index (under indexes)
> (b) the segments (under segments)
>
> What exactly is 6GB? Are you including the segments as well?
>
> Regards,
> -Stavros.
>
>
> ________________________________________
> From: Hailong Yang [[email protected]]
> Sent: Wednesday, October 17, 2012 4:51 AM
> To: [email protected]
> Cc: Lingjia Tang; Jason Mars
> Subject: How to fit the index into the memory for the web search benchmark
>
> Hi CloudSuite,
>
> I am experimenting with the web search benchmark. However, I am wondering
> how to fit the index into memory in order to avoid unnecessary disk
> access. I have a 6GB index crawled from Wikipedia and the RAM is 16GB.
> During the workload execution, I noticed periodic 2% increases in I/O
> utilization, and the memory used by the nutch server was always less than
> 500MB. So I guess the whole index is not brought into memory by default
> before serving the search queries, right? Could you tell me how to do that
> exactly as you did in the "Clearing the Clouds" paper? Thanks!
>
>
> Best
>
> Hailong
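P.S. For my own notes: if I re-crawl with multiple reducers to get smaller partitions, the relevant setting should be the mapred.reduce.tasks property mentioned in the CloudSuite notes. A minimal sketch only; the value 4 is illustrative, and whether it goes in conf/nutch-site.xml or the Hadoop job configuration depends on how the crawl is launched:

    <!-- One index and segment partition is created per reducer,
         so this would yield 4 partitions (the value is illustrative). -->
    <property>
      <name>mapred.reduce.tasks</name>
      <value>4</value>
    </property>

That way an index-serving node can be given only as many partitions as fit in its 16GB of RAM.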

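P.P.S. One possible way to keep indexes+segments memory-resident, once they are small enough to fit in RAM, is simply to pre-warm the OS page cache by reading the data once before starting the clients. This is only a sketch under my own assumptions (the crawl/ layout and the cache-warming idea are mine, not necessarily what was done for the paper):

    # Read every file under the crawl output once so it lands in the page
    # cache; "crawl/" is an assumed layout, adjust to the actual paths.
    find crawl/indexes crawl/segments -type f -exec cat {} + > /dev/null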