For example, I want to crawl 20,000 pages every day for 10 days and then merge
the data for search. So far, I can't get it to work.
Any advice? Could someone let me know whether I can do this on 0.8 at all?

Thank you.


From: "Gal Nitzan" <[EMAIL PROTECTED]>
Reply-To: <[EMAIL PROTECTED]>
To: <[email protected]>
Subject: RE: help please! - issues with merging indexes w/ DFS on 0.8
Date: Mon, 3 Apr 2006 19:50:48 +0200

Hi,

I'm not sure what you are doing, so I will just describe what I'm doing and
maybe you will find the answer :)

Let's make some assumptions:
1. Your main Nutch DFS directory is /user/nutchuser
1.1 You should have /user/nutchuser/crawldb
1.2 You should have /user/nutchuser/segments with some fetched segments in it
2. Now you need to create the linkdb
2.1 bin/nutch invertlinks linkdb -dir segments
3. Now you need to index your segments
3.1 bin/nutch index indexes crawldb linkdb segments/segment1 segments/segment2
    (list all your segments here, one after another)
4. Now remove duplicates
4.1 bin/nutch dedup indexes
5. NOW merge
5.1 bin/nutch merge index indexes
    (this command creates a new folder named /user/nutchuser/index, which
    contains your new merged index)
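
The steps above could be wrapped in a small script along these lines. This is only a sketch, not a tested recipe: the commands are the ones listed above, while the NUTCH and DRY_RUN variables and the run and merge_pipeline helpers are hypothetical names made up for illustration. It assumes it is started from the Nutch install directory and that the relative paths resolve under /user/nutchuser in DFS. Because the segments live in DFS, they are passed in explicitly rather than expanded with a local shell glob.

```shell
#!/bin/sh
# Sketch only: NUTCH, DRY_RUN, run and merge_pipeline are hypothetical names.
NUTCH="${NUTCH:-bin/nutch}"     # assumed location of the nutch launcher
DRY_RUN="${DRY_RUN:-0}"         # set to 1 to print the commands without running them

# Echo a command, then run it unless DRY_RUN=1; stop at the first failure.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@" || exit 1
    fi
}

# Arguments: the fetched segments to index, e.g. segments/segment1 segments/segment2
merge_pipeline() {
    run "$NUTCH" invertlinks linkdb -dir segments     # step 2: build the linkdb
    run "$NUTCH" index indexes crawldb linkdb "$@"    # step 3: index the segments
    run "$NUTCH" dedup indexes                        # step 4: remove duplicates
    run "$NUTCH" merge index indexes                  # step 5: merge into "index"
}
```

For example, after sourcing the script, running `DRY_RUN=1` and then `merge_pipeline segments/segment1 segments/segment2` should only print the four commands, so you can check them before running anything against DFS.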

Hope it helps.

Gal.


-----Original Message-----
From: Olive g [mailto:[EMAIL PROTECTED]
Sent: Monday, April 03, 2006 4:53 PM
To: [email protected]
Subject: help please! - issues with merging indexes w/ DFS on 0.8

Hi gurus,

I ran into an issue similar to
http://www.mail-archive.com/nutch-user%40lucene.apache.org/msg04073.html.
I just could not get index merging to work. I've browsed the mailing list
archives and tried many things,
and nothing has worked so far. Does 0.8 support merging indexes?

I really appreciate any help.

Thank you!

Olive








