Thank you very much!
On 8/5/16, 2:13 PM, "Markus Jelsma" <markus.jel...@openindex.io> wrote:

>I am not sure in which version it was added; you'd have to check CHANGES.txt. But
>upgrading is usually a good idea and very simple.
>Markus
>
>-----Original message-----
>> From: Arora, Madhvi <mar...@automationdirect.com>
>> Sent: Friday 5th August 2016 19:53
>> To: user@nutch.apache.org
>> Subject: Re: Protocol change to https
>>
>> Markus, so to crawl https and http URLs successfully, we just need to switch
>> to a newer version of Nutch, i.e. higher than Nutch 1.10?
>>
>> On 8/5/16, 12:47 PM, "Markus Jelsma" <markus.jel...@openindex.io> wrote:
>>
>> >Hello - see inline.
>> >Markus
>> >
>> >-----Original message-----
>> >> From: Arora, Madhvi <mar...@automationdirect.com>
>> >> Sent: Friday 5th August 2016 18:03
>> >> To: user@nutch.apache.org
>> >> Subject: Protocol change to https
>> >>
>> >> Hi,
>> >>
>> >> We are using Nutch 1.10 and Solr 5. We have around 10 different web sites
>> >> that are crawled regularly. We are changing the protocol of a few websites
>> >> from http to https, so we will have a mixed bag of http and https protocols.
>> >> I checked the Nutch user mailing list archive and gathered that we need to
>> >> change protocol-http to protocol-httpclient.
>> >> 1: I wanted to find out the best way to handle this.
>> >
>> >You can still use protocol-http; in a recent version we added TLS
>> >support to it.
>> >
>> >> 2: What are the issues with using protocol-httpclient? There were
>> >> previous references to issues with its use.
>> >
>> >It does not allow unencoded URLs, but in recent Nutch versions we improved
>> >the basic normalizer to fix them for you.
>> >
>> >> 3: What steps need to be taken to update the Solr index? I think that I
>> >> will need to delete the old http URLs from the Solr index, then re-crawl
>> >> and index the URLs that are being switched to https.
>> >
>> >Yes, just delete, recrawl and reindex everything. And consider upgrading
>> >to 1.12.
>> >
>> >> I will be grateful for any guidance or suggestions.
>> >>
>> >> Thanks,
>> >> Madhvi
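For step 3 in the thread (deleting the old http:// documents from Solr before recrawling), here is a minimal Python sketch that sends one delete-by-query request per host being moved to https. It rests on several assumptions, none of which come from the thread. It assumes the Nutch 1.x Solr schema, where the document `id` is the page URL. The Solr URL and the core name `nutch` are hypothetical placeholders, and so is the host `www.example.com`. It uses Solr's `{!prefix}` query parser so that the `:` and `/` characters in the URL need no escaping. Because the prefix ends in `http://<host>/`, documents already stored under `https://` for that host are not matched.

```python
import json
import urllib.request

# Hypothetical Solr update endpoint; replace host, port and the "nutch"
# core name with your own. commit=true makes the deletes visible at once.
SOLR_UPDATE_URL = "http://localhost:8983/solr/nutch/update?commit=true"


def build_delete_body(host):
    """Return a Solr JSON delete-by-query body that removes every document
    whose id starts with http://<host>/.

    Assumes the Nutch 1.x Solr schema, where the document id is the page URL.
    The {!prefix} query parser takes the value literally, so the URL needs
    no Lucene escaping.
    """
    query = "{!prefix f=id}http://" + host + "/"
    return json.dumps({"delete": {"query": query}})


def delete_http_docs(host, update_url=SOLR_UPDATE_URL):
    """POST the delete command for one host to Solr; return the HTTP status."""
    request = urllib.request.Request(
        update_url,
        data=build_delete_body(host).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical host that is switching from http to https; only the
    # request body is printed here, nothing is sent to Solr.
    print(build_delete_body("www.example.com"))
```

After the deletes, the switched sites still have to be recrawled and reindexed with their https:// seed URLs, as Markus describes. Deleting per host, rather than every http:// document, leaves the sites that stay on http untouched.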