Hi,

We are using Nutch 1.10 and Solr 5. We have around 10 different web sites that 
are crawled regularly. We are changing the protocol of a few of these websites 
from http to https, so we will have a mixed bag of http and https sites.
I checked the Nutch user mailing list archive and understand that we need to 
switch from protocol-http to protocol-httpclient.
1: What is the best way to handle this?
2: What are the issues with using protocol-httpclient? There were previous 
references on the list to problems with it.
3: What steps are needed to update the Solr index? I think that I will 
need to delete the old http URLs from the Solr index, then re-crawl and index 
the URLs that are being switched to https.
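
To make point 3 concrete, here is a rough Python sketch of the delete step I 
have in mind. It is only a sketch under assumptions: I am assuming the Solr 
"id" field holds the full page URL (as in the schema we use with Nutch), that 
a delete-by-query with a prefix wildcard is acceptable, and that the host name 
and update URL shown are placeholders, not our real ones. The helper names 
are made up for illustration.

```python
import json
import urllib.request

# Characters that are special in Solr's standard query syntax.
SOLR_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_solr(text):
    """Backslash-escape query-syntax characters so the URL is matched literally."""
    return "".join("\\" + ch if ch in SOLR_SPECIAL else ch for ch in text)


def build_delete_payload(host):
    """Build a JSON delete-by-query body matching every http:// URL of one host.

    Assumption: the 'id' field contains the page URL.
    """
    prefix = escape_solr("http://" + host + "/")
    return json.dumps({"delete": {"query": "id:" + prefix + "*"}})


def delete_old_http_urls(solr_update_url, host):
    """POST the delete to the Solr update handler and commit.

    solr_update_url is a placeholder such as
    'http://localhost:8983/solr/<core>/update'.
    """
    req = urllib.request.Request(
        solr_update_url + "?commit=true",
        data=build_delete_payload(host).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

The idea would be to call delete_old_http_urls once per migrated host, then 
re-crawl and re-index those hosts with their https seed URLs. I have not run 
this against our index yet, so please correct me if this is the wrong approach.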

I will be grateful for any guidance or suggestions.

Thanks,
Madhvi
