Thanks for your help, Byron... But I think I didn't make myself clear (or didn't understand your answer).
I'll check the latest svn, but the segment merge (SegmentMergeTool.java) that I have here works fine for the content; the problem is that I can't find the index dir inside the new merged segment dir. So if I merge the indexes (IndexMerger.java) of my current segments (some or all of them) before merging the segments, the merged index points to the old segments.

Sorry to bother you with this question, but I am re-indexing all my fetched URLs every time I merge the segments, and if I don't merge the segments, I run into "Too many open files" :-(

Where can I find more documentation about it?

Thanks again,
Leonardo Barbosa.

On Apr 7, 2005 4:48 PM, Byron Miller <[EMAIL PROTECTED]> wrote:
> Your merged index will only reference the segments you choose to merge.
>
> For me, I'll have 200 segments of about 1 million urls apiece. I
> generally index each one individually and merge 10 and put that on a query
> server and work my way down.
>
> The nice thing is with svn current the merge of segments works fine and
> update of scoring is easier to do.
>
> Takes some handy work, but is doable :)
>
> -----Original Message-----
> From: Leonardo Barbosa <[EMAIL PROTECTED]>
> To: [email protected]
> Date: Thu, 7 Apr 2005 11:43:38 -0300
> Subject: Merge question
>
> > Hello,
> >
> > I configured nutch to crawl and index my intranet periodically, and
> > now I'm trying to find the ideal merge process. I've looked in the
> > list archive and found a discussion about it (please see below), but I
> > still have one question: solution #2 seems to be the standard, as I've
> > noticed, but my problem is that when I have lots of segment dirs, I start
> > to get "Too many open files" exceptions.
> > So I need to merge them, and by doing that, do I need to index it
> > again? Because it is an expensive process to index all the content,
> > and I have it already indexed in the segment dirs.
> > Can't I use the merged index created by the "./nutch merge" facility?
> > The problem that I've found is that the merged index that I created
> > (solution 2) is pointing to the old segments. Can't I "update" the
> > index to point to the new fresh merged segment?
> > Shouldn't "./nutch mergesegs" create a merged index? I'm kind of
> > confused with this.. :-)
> >
> > Best regards,
> > Leonardo Barbosa.
> >
> > From nutch-user-return-53-apmail-incubator-nutch-user-archive=www.apache.org@incubator.apache.org Thu Mar 10 18:58:58 2005
> >
> > > Should I :
> > >
> > > 1) merge all the segments and then index them, or
> > > 2) Should I index each segment individually and then merge the indexes,
> > > keeping the segments separate. Or
> > > 3) Should I index each segment separately, and keep both segments and
> > > indexes separate, and search across multiple indexes (but I have heard
> > > there are issues with the ranking)
> >
> > Option #3 is not really that great. You get better performance with a
> > merged index. Option #1 would be more work with having to merge the
> > segments, and I'm not sure that there is a real advantage to doing that
> > over option #2. Option #2 is what most people do.
> >
> > Luke
> >

--
------------------------------------------------------------------------------------------
Encumbered forever by desire and ambition
There's a hunger still unsatisfied
Our weary eyes still stray to the horizon
Though down this road we've been so many times
Pink Floyd (David Gilmour/Polly Samson) - High Hopes
------------------------------------------------------------------------------------------
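
P.S. To make sure I follow the batching Byron describes (index each segment individually, then merge the indexes 10 at a time), here is a minimal sketch of just the grouping step. It does not call Nutch at all; the function name batch_segments and the segment names are hypothetical, and the batch size of 10 is simply the number Byron mentions.

```python
from typing import List, Sequence


def batch_segments(segments: Sequence[str], batch_size: int = 10) -> List[List[str]]:
    """Split segment names into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(segments[start:start + batch_size])
        for start in range(0, len(segments), batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical segment names, used only for illustration.
    segments = [f"segment-{n:03d}" for n in range(25)]
    for number, batch in enumerate(batch_segments(segments), start=1):
        # Each batch would get one merged index that references only
        # the segments in that batch.
        print(f"batch {number}: {len(batch)} segments, {batch[0]} .. {batch[-1]}")
```

Each batch would then be handed to the index merge as one group, so a merged index only ever references the segments in its own batch. Whether that keeps the number of open files low enough on my machine is something I still need to check.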
