Re: Fwd: Search Engine Framework decision

2014-01-28 Thread rashmi maheshwari
Thanks, saurish.


My office intranet is a SharePoint website. When I crawl it using Nutch, I
get an Unauthorized (401) error. The website uses an NTLM realm.

I read in a Nutch JIRA issue that SharePoint can be accessed using Nutch.
Nutch has the following properties in nutch-default.xml:

http.proxy.host (should this be the intranet site path?)
http.proxy.port
http.proxy.username (should this contain the domain too?)
http.proxy.password
http.proxy.realm (should this be the domain I use to log in to my desktop
machine? Using the same domain/username, I can access the intranet from a
browser.)


Also, Nutch has an httpclient-auth.xml file for supplying authentication
credentials.

What should I provide for these properties in nutch-site.xml?


And what values should go in the httpclient-auth.xml file?



Regards,
Rashmi






-- 
Rashmi
Be the change that you want to see in this world!
www.minnal.zor.org
disha.resolve.at
www.artofliving.org


Re: Fwd: Search Engine Framework decision

2014-01-27 Thread saurish
Hi,

It looks like ManifoldCF supports SharePoint as well as Windows shares.

Yes, you can crawl folders with Nutch (at least, I have done so on a
Windows PC with a local file folder).

Nutch 1.7 and Solr 4.5.1 have worked for me.

Regards,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fwd-Search-Engine-Framework-decision-tp4113584p4113677.html
Sent from the Solr - User mailing list archive at Nabble.com.


Fwd: Search Engine Framework decision

2014-01-26 Thread rashmi maheshwari
Hi,

I want to create a POC to search our intranet along with the documents
uploaded to it. Documents (PDF, Excel, Word, text files, images, videos)
also exist on SharePoint. SharePoint has authenticated access at the
module (folder) level.

My intranet website is http://myintranet/ http://sparsh/ and the
SharePoint URL is different. Documents also exist in file folders.

I have the following questions:
A) Which crawler framework should I use along with Solr for this POC, Nutch
or Apache ManifoldCF?

B) Is it possible to crawl SharePoint documents using Nutch? If yes, would
a configuration-level change alone make this possible, or do I have to
write code to parse the documents and send them to Solr?

C) Which versions of Solr, Nutch, and MCF should be used, given that the
Nutch version depends on the Solr version? Would Nutch 1.7 work properly
with Solr 4.6.0?
-- 
Rashmi
Be the change that you want to see in this world!






Re: Fwd: Search Engine Framework decision

2014-01-26 Thread Ahmet Arslan


Rashmi,

As far as I know, Nutch is a web crawler. I don't think it can crawl
documents from Microsoft SharePoint. ManifoldCF is a better fit in your case.

Regarding versioning, if you don't have any previous setups, use the latest
version of each.

Ahmet

