Yes, that's correct. You'll end up with duplicates.

A script should be able to remove those easily if you have the default
segment names and no merge has taken place (remove segments older than
seven days).
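
A minimal sketch of such a cleanup script, in Python. It assumes the default
segment names are timestamps of the form yyyyMMddHHmmss (e.g. 20050204155100)
and that all segments sit in one directory; both are assumptions, as are the
function names. Anything that does not look like a timestamp name (a renamed
or merged segment, say) is left alone.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

# Assumed default segment naming: yyyyMMddHHmmss, e.g. 20050204155100.
SEGMENT_NAME_FORMAT = "%Y%m%d%H%M%S"


def old_segments(segments_dir, now, max_age_days=7):
    """Return segment directories whose timestamp name is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for entry in sorted(Path(segments_dir).iterdir()):
        # Skip plain files and anything that is not a 14-digit timestamp name.
        if not entry.is_dir() or len(entry.name) != 14 or not entry.name.isdigit():
            continue
        try:
            created = datetime.strptime(entry.name, SEGMENT_NAME_FORMAT)
        except ValueError:
            # 14 digits but not a valid date: not a default segment name.
            continue
        if created < cutoff:
            stale.append(entry)
    return stale


def remove_old_segments(segments_dir, now=None, max_age_days=7, dry_run=True):
    """Delete stale segments; with dry_run=True only report what would go."""
    now = now or datetime.now()
    stale = old_segments(segments_dir, now, max_age_days)
    for segment in stale:
        if not dry_run:
            shutil.rmtree(segment)
    return stale
```

Running remove_old_segments("segments") first with the default dry_run=True
lists the candidates without touching anything, which seems prudent before
deleting fetched data.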

Another way would be to keep copies of the individual segments for N days
and merge the last N days' worth every night (time-consuming).

The best way, in my opinion, is to have a SegmentIntersect tool: take
Segment A and the fetch list from Segment B, and produce Segment C, which
is A - B.
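
To make the A - B idea concrete, here is a rough Python sketch that works on
plain lists of URLs. No such tool exists in this thread; segment_difference
is a hypothetical name, and reading the URLs out of real segments is left
out entirely.

```python
def segment_difference(segment_a_urls, fetchlist_b_urls):
    """Return the URLs of A that are not in B's fetch list.

    Keeps A's order and drops URLs repeated within A.
    """
    exclude = set(fetchlist_b_urls)
    seen = set()
    result = []
    for url in segment_a_urls:
        if url in exclude or url in seen:
            continue
        seen.add(url)
        result.append(url)
    return result


if __name__ == "__main__":
    a = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
    b = ["http://example.com/b"]
    # Prints the two URLs of A that B's fetch list does not contain.
    print(segment_difference(a, b))
```

Holding B's fetch list in a set keeps each lookup cheap, at the cost of
keeping all of B's URLs in memory at once.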

 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Stefan Groschupf
Sent: Friday, February 04, 2005 3:51 PM
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-dev] Segment generation

I tried that, but I guess it will generate a fetch list that may contain
URLs that were already fetched in older segments. Isn't that so?



On 04.02.2005 at 21:45, Doug Cutting wrote:

> Try '-adddays 7'.  The next fetch date is incremented by seven days
> when a page is generated for fetch.
>
> Doug
>
> Stefan Groschupf wrote:
>> Hi there,
>> I'm wondering if something changed in the fetch list generation tool
>> that I missed.
>> I had generated a segment and, for some reason, deleted it before
>> fetching it and updating the db.
>> After that I tried to generate a new segment; however, the segment's
>> fetch list has 0 entries. :-o
>> What can I do to regenerate the fetch list without rebuilding the
>> database?
>> Thanks.
>> Stefan
>> ---------------------------------------------------------------
>> company:        http://www.media-style.com
>> forum:        http://www.text-mining.org
>> blog:            http://www.find23.net
>> -------------------------------------------------------
>> This SF.Net email is sponsored by: IntelliVIEW -- Interactive 
>> Reporting
>> Tool for open source databases. Create drag-&-drop reports. Save time
>> by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
>> Download a FREE copy at http://www.intelliview.com/go/osdn_nl
>> _______________________________________________
>> Nutch-developers mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/nutch-developers
