Hi there,
I have a couple of questions that I need help with. Please help.
I'm fairly new to the nutch-dev mailing list, and I'm not quite sure how to get
involved with the Nutch development group or what the appropriate way to do so
is. Please let me know whom I should contact with issues and questions about
Nutch.
I've been using Nutch and customizing it so that the returned search results
can be navigated with paging on the web. I'm doing this for my company, and my
supervisor has agreed to contribute the paging code to the Nutch community.
Please advise me on how to proceed with this.
Finally, a technical question. I'm using Nutch v0.7 on our company's Unix
system, and it is set up to crawl our intranet sites daily for updates. I've
tried using merge, dedup, updatedb, etc., and I noticed that this was slower and
less efficient than doing a fresh crawl. For example, if I have two separate
crawls of two different domains, such as hotmail and yahoo, how would the time
needed to crawl these two domains separately and then merge the results compare
to the time needed for a single full crawl of both domains? My guess is that
Nutch would take the same amount of time either way. If so, is there a reason
to use merge at all? Please let me know what you think. I'm still trying to
understand how Nutch behaves, and I don't mean to criticize anyone who has
worked on the merge feature.
Thanks.
Alex