- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Luca Pellegrini
Subject: HOWTO crawl https:// URLs through a proxy
When DataPark finds an https URL on a web page, it doesn't crawl it, and the log shows that a network error occurred. Looking at my proxy log files, I notice that those URLs haven't been logged. So I think that when fetching an https URL, DataPark tries to connect directly, without going through the proxy.

On the server performing the crawl, I set both the http_proxy and https_proxy environment variables:

> echo $https_proxy
> http://my.proxy.com:myport
> echo $http_proxy
> http://my.proxy.com:myport

But it doesn't work. Should I add some command to the indexer.conf file in order to crawl https URLs?

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=
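
Regarding the http_proxy / https_proxy variables in the post above: these variables only affect programs that choose to read them, and nothing in this post confirms whether DataPark's indexer does. As a rough sanity check, the sketch below (plain Python, not DataPark code) prints which proxy a program that honours these variables would pick for each scheme when run from the same shell or account as the crawl. The helper name `proxy_for` and the example.com URLs are hypothetical, used only for illustration.

```python
import urllib.parse
import urllib.request


def proxy_for(url):
    """Return the proxy the current environment selects for `url`, or None."""
    # The scheme ("http" or "https") decides which *_proxy variable applies.
    scheme = urllib.parse.urlsplit(url).scheme
    # getproxies() collects the *_proxy settings visible to this process.
    proxies = urllib.request.getproxies()
    return proxies.get(scheme)


if __name__ == "__main__":
    # Hypothetical test URLs; replace with pages from the crawl.
    for url in ("http://example.com/", "https://example.com/"):
        print(url, "->", proxy_for(url))
```

If the https line prints None here, the variable is not visible to processes started from that environment (for example, it was set in an interactive shell but not exported, or the crawl runs under a different account). If it prints the proxy URL and the proxy log still shows nothing, the crawler is probably not consulting the environment at all.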
