- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Luca Pellegrini
Subject: HOWTO crawl https:// URLs through a proxy

When DataPark finds an https URL on a web page, it doesn't crawl it, and the
log shows that a network error occurred. Looking at my proxy log files, I
notice that those URLs haven't been logged. So I think that when fetching an
https URL, DataPark tries to connect directly instead of going through the
proxy.
On the server performing the crawl, I set both the http_proxy and https_proxy
environment variables:
> echo $https_proxy
> http://my.proxy.com:myport
> echo $http_proxy
> http://my.proxy.com:myport

But it doesn't work.
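
To rule out the variables simply not being exported, a small Python sketch
like the one below can be run from the same shell that starts the indexer. It
is not part of DataPark and says nothing about how DataPark reads its proxy
settings; it only shows what a child process can see. The function name
visible_proxies is made up for illustration. Note that on macOS and Windows
getproxies() falls back to system settings when the variables are unset.

```python
from urllib.request import getproxies


def visible_proxies():
    """Return the http/https proxy URLs this process can see (None if unset)."""
    # getproxies() picks up http_proxy, https_proxy, etc. from the environment.
    proxies = getproxies()
    return {scheme: proxies.get(scheme) for scheme in ("http", "https")}


if __name__ == "__main__":
    for scheme, url in visible_proxies().items():
        print(f"{scheme}: {url or 'not set'}")
```

If both lines show the proxy URL, the variables are exported correctly and the
problem lies elsewhere.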
Should I add some command to the indexer.conf file in order to crawl https URLs?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=
