Re: Fwd: Search Engine Framework decision
Thanks saurish.

My office intranet is a SharePoint website. When I crawl it with Nutch, I get an "Unauthorized access (404)" error. The website uses an NTLM realm. I saw in a Nutch JIRA issue that SharePoint can be accessed using Nutch.

Nutch has the following properties in nutch-default.xml:

  http.proxy.host (should it be the intranet site path?)
  http.proxy.port
  http.proxy.username (should this contain the domain too?)
  http.proxy.password
  http.proxy.realm (should it be the domain of my desktop machine, the one I log in with? Using the same domain/username I can access the intranet from a browser.)

Nutch also has an httpclient-auth.xml file for supplying authentication credentials.

What do I provide for the above properties in nutch-site.xml? And what should the values in httpclient-auth.xml be?

Regards,
Rashmi

--
Rashmi
Be the change that you want to see in this world!
www.minnal.zor.org
disha.resolve.at
www.artofliving.org
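For readers following this thread, here is a minimal sketch of how NTLM credentials are usually supplied to Nutch 1.x. It is not a verified fix for the error reported above. It assumes the protocol-httpclient plugin is used instead of protocol-http, and that the http.proxy.* properties stay untouched, since they describe an outbound proxy server rather than the site being crawled. The host name crawler-host, the domain MYDOMAIN, the user name and password, and port 80 are all placeholders; the rest of the plugin.includes value is modeled on a typical 1.x default and should be copied from your own nutch-default.xml.

```xml
<!-- nutch-site.xml (sketch; all values are placeholders or assumptions) -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Assumption: only protocol-http is swapped for protocol-httpclient;
         keep the remainder of your existing value. -->
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>http.auth.file</name>
    <value>httpclient-auth.xml</value>
  </property>
  <property>
    <!-- Name of the machine running the crawler; used by protocol-httpclient for NTLM. -->
    <name>http.agent.host</name>
    <value>crawler-host</value>
  </property>
</configuration>
```

```xml
<!-- conf/httpclient-auth.xml (sketch; credentials and domain are placeholders) -->
<auth-configuration>
  <credentials username="YOUR_USERNAME" password="YOUR_PASSWORD">
    <!-- For NTLM, the realm attribute carries the Windows domain name. -->
    <authscope host="myintranet" port="80" realm="MYDOMAIN" scheme="NTLM"/>
  </credentials>
</auth-configuration>
```

Under these assumptions the user name goes in without the domain prefix, and the domain is given once in the realm attribute of the authscope element.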
Re: Fwd: Search Engine Framework decision
Hi,

It looks like ManifoldCF supports SharePoint as well as Windows shares. Yes, you can crawl folders with Nutch (at least, I have done so on a Windows PC with a local file folder). Nutch 1.7 and Solr 4.5.1 have worked for me.

Regards,

--
View this message in context: http://lucene.472066.n3.nabble.com/Fwd-Search-Engine-Framework-decision-tp4113584p4113677.html
Sent from the Solr - User mailing list archive at Nabble.com.
Fwd: Search Engine Framework decision
Hi,

I want to create a POC to search our intranet along with the documents uploaded to it. Documents (PDF, Excel, Word, text files, images, videos) also exist on SharePoint. SharePoint has authentication at the module (folder) level. My intranet website is http://myintranet/ http://sparsh/, and the SharePoint URL is different. Documents also exist in file folders.

I have the following queries:

A) Which crawler framework should I use along with Solr for this POC: Nutch or Apache ManifoldCF?

B) Is it possible to crawl SharePoint documents using Nutch? If yes, would a configuration-level change alone make this possible, or do I have to write code to parse the documents and send them to Solr?

C) Which versions of Solr, Nutch, and MCF should be used, given that the Nutch version depends on the Solr version? Would Nutch 1.7 work properly with Solr 4.6.0?

--
Rashmi
Be the change that you want to see in this world!
www.minnal.zor.org
disha.resolve.at
www.artofliving.org
Re: Fwd: Search Engine Framework decision
Rashmi,

As far as I know, Nutch is a web crawler. I don't think it can crawl documents from Microsoft SharePoint. ManifoldCF is a better fit in your case.

Regarding versioning: if you don't have previous setups, use the latest version of each.

Ahmet