Dear R-help,

I have just been informed that I must not rewrite 'https' as 'http', as some web pages may then fail to download (I think I just got lucky with the ones I've tried so far). Could some kind individual therefore let me know where I can post questions about the RCurl package (Omegahat Repository)? I honestly can't find its mailing list. I also ask forgiveness for my earlier post :-)

Many thanks,
Tony

2008/10/1 Tony Breyal <[EMAIL PROTECTED]>

> Dear R-Help,
>
> From reading the help file, it is my understanding that the download.file()
> function does not support HTTPS connections. So, understandably,
> the following produces an error:
>
> ### R Code
> > url <- "https://stat.ethz.ch/pipermail/r-help/2008-October/thread.html"
> > destfile <- "//PFO-SBS001/Redirected/tonyb/Desktop/R_web_test/tmp.txt"
> > download.file(url, destfile)
> Error in download.file(url, destfile) : unsupported URL scheme
>
> My question is: what if I remove the 's' from the 'https' URL? The
> download.file() function then seems to work fine (please see below). Did I
> just get lucky with the URL I used, or can I in general simply rewrite
> 'https' as 'http'? My long-term goal is to download hundreds of web pages
> and then somehow remove all of the HTML tags so that only the web page text
> remains. No private information is being sent or received for this task (no
> passwords etc. are used).
>
> ### R Code
> > url <- "http://stat.ethz.ch/pipermail/r-help/2008-October/thread.html"
> > destfile <- "//PFO-SBS001/Redirected/tonyb/Desktop/R_web_test/tmp.txt"
> > download.file(url, destfile)
> trying URL 'http://stat.ethz.ch/pipermail/r-help/2008-October/thread.html'
> Content type 'text/html; charset=ISO-8859-1' length 13767 bytes (13 Kb)
> opened URL
> downloaded 13 Kb
>
> A quick forum search shows that a package called RCurl (Omegahat
> Repository) does support HTTPS connections, but I got an error when using
> it and have no idea where the Omegahat mailing list is, which is why I'd
> like to know about removing the 's' in 'https'. If it turns out there is a
> good reason not to remove the 's', then I will repost on. God, I hope this
> post makes sense lol.
>
> Many thanks for your valuable time,
> Tony Breyal
>
> Ps. This is my first posting, so please be kind! :-)
> PPs. Sorry this post was so long.
> PPPs. For anyone interested, this is what happens when using RCurl:
>
> ### R Code
> > library(RCurl)
> > txt = getURL("https://stat.ethz.ch/pipermail/r-help/2008-October/thread.html")
> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
>   SSL certificate problem, verify that the CA cert is OK. Details:
>   error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
>
> OS: Windows Vista Ultimate
> R version: 2.7.2 (2008-08-25)
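
For readers who hit the same "certificate verify failed" error, here is a minimal sketch of one possible workaround, followed by a crude way to strip HTML tags as described in the long-term goal above. It is not a confirmed fix for this setup. It assumes the installed RCurl ships a CA certificate bundle at CurlSSL/cacert.pem and passes it to libcurl through the cainfo option. The helper names fetch_https() and strip_html_tags() are hypothetical and were introduced only for this illustration.

```r
# Sketch only: fetch an HTTPS page with RCurl, then strip HTML tags.
# fetch_https() and strip_html_tags() are hypothetical helper names.

fetch_https <- function(url) {
  if (!requireNamespace("RCurl", quietly = TRUE)) {
    stop("the RCurl package is required for fetch_https()")
  }
  # Assumption: RCurl ships a CA bundle at CurlSSL/cacert.pem.
  # system.file() returns "" when the file is not found.
  ca_bundle <- system.file("CurlSSL", "cacert.pem", package = "RCurl")
  if (nzchar(ca_bundle)) {
    # Point libcurl at the bundle so it can verify the server certificate.
    RCurl::getURL(url, cainfo = ca_bundle)
  } else {
    # No bundle found: fall back to libcurl's default certificate store.
    RCurl::getURL(url)
  }
}

strip_html_tags <- function(html) {
  # Crude regex approach: replace anything that looks like a tag with a space.
  text <- gsub("<[^>]+>", " ", html)
  # Collapse runs of whitespace and trim both ends.
  text <- gsub("[[:space:]]+", " ", text)
  trimws(text)
}
```

With these definitions, something like txt <- strip_html_tags(fetch_https(url)) would return the page text as a single string. The regex does not remove the contents of script or style elements and does not decode entities such as &amp;, so a proper HTML parser is likely a better choice for hundreds of pages.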