Hi there,

I have a couple of questions that I need help with. Please help.

I'm fairly new to the nutch-dev mailing list. I'm not quite sure how, or what
the appropriate way is, to get involved with the Nutch development group.
Please let me know whom I should contact regarding issues and questions
about Nutch.

I've been using Nutch and customizing it so that the returned search results
can be browsed page by page (paging) on the web. I'm doing this for my company,
and my supervisor has agreed to contribute the paging code to the Nutch
community. Please help guide me on how to proceed with this.

Finally, a technical question. I've been using Nutch v0.7 on our company's Unix
system, where it is set up to crawl our intranet sites for updates daily. I've
tried using Merge, dedup, updatedb, etc., and I noticed that this approach took
longer and was less efficient than doing a fresh crawl. For example, if I have
two separate crawls of two different domains, such as hotmail and yahoo, what
would the time complexity be for Nutch to crawl these two domains and then
merge the results, compared with doing a single full crawl of both domains? My
guess is that it would take Nutch about the same amount of time either way. If
so, is there a reason to use Merge at all? Please let me know what you think.
I'm still trying to understand how Nutch behaves, and I don't mean to criticize
anyone who has worked on the Merge feature for Nutch.

Thanks.

Alex