Thank you very much!



On 8/5/16, 2:13 PM, "Markus Jelsma" <markus.jel...@openindex.io> wrote:

>I am not sure which version it was added in; you'd have to check CHANGES.txt,
>but upgrading is usually a good idea and very simple.
>Markus
>
> 
> 
>-----Original message-----
>> From:Arora, Madhvi <mar...@automationdirect.com>
>> Sent: Friday 5th August 2016 19:53
>> To: user@nutch.apache.org
>> Subject: Re: Protocol change to https
>> 
>> Markus, so to crawl https and http URLs successfully we just need to switch
>> to a newer version of Nutch, i.e. higher than Nutch 1.10?
>> 
>> 
>> 
>> On 8/5/16, 12:47 PM, "Markus Jelsma" <markus.jel...@openindex.io> wrote:
>> 
>> >Hello - see inline.
>> >Markus 
>> > 
>> >-----Original message-----
>> >> From:Arora, Madhvi <mar...@automationdirect.com>
>> >> Sent: Friday 5th August 2016 18:03
>> >> To: user@nutch.apache.org
>> >> Subject: Protocol change to https
>> >> 
>> >> Hi,
>> >> 
>> >> We are using Nutch 1.10 and Solr 5. We have around 10 different web sites 
>> >> that are crawled regularly. We are changing the protocol of a few websites
>> >> from http to https, so we will have a mixed bag of http and https protocols.
>> >> I checked the Nutch user mail archive and gather that we need to change
>> >> protocol-http to protocol-httpclient.
>> >> 1: I wanted to find out the best way to handle this.
>> >
>> >You can still use protocol-http; in a recent version we added TLS
>> >support to it.
>> >
>> >> 2: What are the issues with using protocol-httpclient? There were
>> >> previous references to issues with its use.
>> >
>> >It does not allow unencoded URLs, but in recent Nutch versions we improved
>> >the basic normalizer to fix that for you.
>> >
>> >> 3: What steps need to be taken to update the Solr index? I think that I
>> >> will need to delete the old http URLs from the Solr index, then re-crawl
>> >> and index the URLs that need to be switched to https.
>> >
>> >Yes, just delete, recrawl and reindex everything. And consider upgrading
>> >to 1.12.
>> >
>> >> 
>> >> I will be grateful for any guidance or suggestions.
>> >> 
>> >> Thanks,
>> >> Madhvi
>> >> 
>> >> 
>> 
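
For the "delete the old http URLs" step discussed in the thread, a minimal sketch of building a Solr delete-by-query body might look like the following. It is not from the thread: it assumes the index keeps the page URL in a string field named `id` (verify against your own schema.xml), and the helper names and `www.example.com` are made up for illustration.

```python
import json

# Characters treated as special by Solr's standard query parser.
_SOLR_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_solr_term(text: str) -> str:
    """Backslash-escape query-parser special characters in a literal term."""
    return "".join("\\" + ch if ch in _SOLR_SPECIAL else ch for ch in text)


def build_delete_payload(host: str, field: str = "id") -> str:
    """Return a JSON body that deletes documents whose `field` value starts
    with http://<host>/ (hypothetical helper; the field name is an assumption).
    """
    prefix = escape_solr_term(f"http://{host}/")
    # The trailing * is left unescaped so it acts as a prefix wildcard.
    return json.dumps({"delete": {"query": f"{field}:{prefix}*"}})


if __name__ == "__main__":
    # The body would be POSTed as application/json to the core's /update
    # handler, followed by a commit. The core name depends on your setup.
    print(build_delete_payload("www.example.com"))
```

Building the body separately from sending it makes it easy to inspect the query before anything is deleted; running the same query as a plain search first shows which documents would be removed.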
