Hi Andrzej,
 
Thanks for your great help. The main reason I am trying this is to remove duplicates, as the
DeleteDuplicate tool doesn't work well for me and I end up with many sites with the same contents. I will work on what you suggested.
 
Thanks again,
Kashif
 

Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Kashif Khadim wrote:
> Hi,
> I am using SegmentMergeTool and it is taking very long. I want to know how
> much time to expect for this tool to finish. After reading a segment with
> 200000 entries it just sits there for two days. Is this normal?

Definitely not normal... Is the process swapping? You can get a list of
threads and their state by sending the JVM a SIGQUIT (Ctrl-\ on Unix,
Ctrl-Break on Windows). If you get any info, it means the process is not
hanging, just taking its time... ;-)

Probably you need to kill the process anyway. Can you throw in some
LOG.info() here and there and see where it's hanging?
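As a rough illustration of that kind of instrumentation, here is a minimal sketch using java.util.logging. The class name StageTimer, the method timeStage and the stage names are hypothetical and not part of Nutch; the idea is only to log when each step starts and how long it took, so the last "starting" line without a matching "finished" line shows where the process is stuck.

```java
import java.util.logging.Logger;

// Hypothetical helper, not part of Nutch: wraps a processing step with log lines.
public class StageTimer {
    private static final Logger LOG = Logger.getLogger(StageTimer.class.getName());

    /** Runs one stage, logs its start and duration, and returns the elapsed milliseconds. */
    public static long timeStage(String name, Runnable stage) {
        LOG.info("starting " + name);
        long start = System.nanoTime();
        stage.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        LOG.info("finished " + name + " in " + elapsedMs + " ms");
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Placeholder stage bodies; in practice these would be the calls being diagnosed.
        timeStage("read segments", () -> { });
        timeStage("merge", () -> { });
    }
}
```

Inside a long loop, a similar LOG.info() every few thousand records would help tell slow progress apart from a real hang.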

Oh, one more thing: SegmentMergeTool does a lot of random seeking in the
last stage of processing. However, seeking on segments with partially
truncated MapFile "index" files takes a LOT of time... If you suspect
some of your "index" files are truncated (e.g. because of a crashed
fetcher), it's better just to remove them from the offending directory,
and run SegmentReader -fix on them. The speed improvement in seeking
will be like 20-50 times. Just make sure all your segments have correct
"index" files (e.g. by running SegmentReader -list) and re-run the
SegmentMergeTool.
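A quick pre-check for the most obvious cases could look like the sketch below. It assumes each MapFile is a directory holding a "data" file and an "index" file, and it only flags directories where "index" is missing or empty; a partially truncated index will not be caught this way, so it does not replace checking the segments with SegmentReader. The class name IndexCheck and method suspectDirs are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch.
public class IndexCheck {
    /** Returns directories under root that contain "data" but a missing or empty "index". */
    public static List<Path> suspectDirs(Path root) throws IOException {
        List<Path> dirs;
        try (Stream<Path> paths = Files.walk(root)) {
            dirs = paths.filter(Files::isDirectory).collect(Collectors.toList());
        }
        List<Path> suspects = new ArrayList<>();
        for (Path dir : dirs) {
            Path data = dir.resolve("data");
            Path index = dir.resolve("index");
            boolean indexUsable = Files.isRegularFile(index) && Files.size(index) > 0;
            if (Files.isRegularFile(data) && !indexUsable) {
                suspects.add(dir);
            }
        }
        return suspects;
    }

    public static void main(String[] args) throws IOException {
        // Pass the segments directory as the first argument; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path dir : suspectDirs(root)) {
            System.out.println("missing or empty index: " + dir);
        }
    }
}
```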

--
Best regards,
Andrzej Bialecki
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com


