This may be because the connection to the site from R is taking a long time. I faced the same problem with the site "Social-Mention".

I tried a very primitive approach: I put an 'if' condition in the loop.

    if (length(output) == 0) {
      output <- getURL(site)   # nothing came back, so request the page again
    } else {
      # continue with the code
    }

It might help you.

Best,
Heramb

On Fri, Sep 21, 2012 at 8:45 PM, CPV <ceal...@gmail.com> wrote:

> Thanks for your suggestion,
> The issue was resolved by Duncan's recommendation.
>
> Now I am trying to obtain data from different pages of the same site
> through a loop; however, getURLContent keeps timing out. The odd part
> is that I can access the link through a browser with no issues at all!!
> Any ideas why it keeps timing out? Also, how can I keep the loop running
> after this error?
>
> Thanks again for your help!
>
>
> On Wed, Sep 19, 2012 at 11:36 PM, Heramb Gadgil
> <heramb.gad...@gmail.com> wrote:
>
>> Try this,
>>
>> library(RCurl)
>> library(XML)
>>
>> site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>>
>> URL <- getURL(site)
>>
>> Text <- htmlParse(URL, asText = TRUE)
>>
>> This will give you all the web data in an HTML-text format.
>>
>> You can use the "getNodeSet" function to extract whatever links or text
>> you want from that page.
>>
>> I hope this helps.
>>
>> Best,
>> Heramb
>>
>>
>> On Wed, Sep 19, 2012 at 10:26 PM, CPV <ceal...@gmail.com> wrote:
>>
>>> Thanks again,
>>>
>>> I ran the script with postForm(site, disclaimer_action="I Agree") and
>>> it does not seem to do anything;
>>> the webpage is still the disclaimer page, so I am getting the error
>>> below:
>>> Error in function (classes, fdef, mtable) :
>>> unable to find an inherited method for function "readHTMLTable", for
>>> signature "NULL"
>>>
>>> I also downloaded the latest version of RHTMLForms
>>> (omegahat-RHTMLForms-251743f.zip)
>>> and it does not seem to install correctly..
>>> I used the code
>>>
>>> install.packages("C:/Users/cess/Downloads/omegahat-RHTMLForms-251743f.zip",
>>> type="win.binary", repos=NULL)
>>>
>>> Any suggestion of what could be causing these problems?
>>>
>>>
>>> On Wed, Sep 19, 2012 at 9:49 AM, Duncan Temple Lang
>>> <dtemplel...@ucdavis.edu> wrote:
>>>
>>> > You don't need to use the getHTMLFormDescription() and
>>> > createFunction().
>>> > Instead, you can use the postForm() call. However,
>>> > getHTMLFormDescription(), etc. is more general. But you need the very
>>> > latest version of the package to deal with degenerate forms that have
>>> > no inputs (other than button clicks).
>>> >
>>> > You can get the latest version of the RHTMLForms package
>>> > from github
>>> >
>>> > git clone g...@github.com:omegahat/RHTMLForms.git
>>> >
>>> > and that has the fixes for handling the degenerate forms with
>>> > no arguments.
>>> >
>>> > D.
>>> >
>>> > On 9/19/12 7:51 AM, CPV wrote:
>>> > > Thank you for your help Duncan,
>>> > >
>>> > > I have been trying what you suggested however I am getting an error
>>> > > when trying to create the function fun<- createFunction(forms[[1]])
>>> > > it says Error in isHidden | hasDefault :
>>> > > operations are possible only for numeric, logical or complex types
>>> > >
>>> > > On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang
>>> > > <dtemplel...@ucdavis.edu> wrote:
>>> > >
>>> > >> Hi,
>>> > >>
>>> > >> The key is that you want to use the same curl handle
>>> > >> for both the postForm() and for getting the data document.
>>> > >>
>>> > >> site = u =
>>> > >> "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>>> > >>
>>> > >> library(RCurl)
>>> > >> curl = getCurlHandle(cookiefile = "", verbose = TRUE)
>>> > >>
>>> > >> postForm(site, disclaimer_action="I Agree", curl = curl)
>>> > >>
>>> > >> Now we have the cookie in the curl handle so we can use that same
>>> > >> curl handle to request the data document:
>>> > >>
>>> > >> txt = getURLContent(u, curl = curl)
>>> > >>
>>> > >> Now we can use readHTMLTable() on the local document content:
>>> > >>
>>> > >> library(XML)
>>> > >> tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)
>>> > >>
>>> > >> Rather than knowing how to post the form, I like to read
>>> > >> the form programmatically and generate an R function to do the
>>> > >> submission for me. The RHTMLForms package can do this.
>>> > >>
>>> > >> library(RHTMLForms)
>>> > >> forms = getHTMLFormDescription(u, FALSE)
>>> > >> fun = createFunction(forms[[1]])
>>> > >>
>>> > >> Then we can use
>>> > >>
>>> > >> fun(.curl = curl)
>>> > >>
>>> > >> instead of
>>> > >>
>>> > >> postForm(site, disclaimer_action="I Agree", curl = curl)
>>> > >>
>>> > >> This helps to abstract the details of the form.
>>> > >>
>>> > >> D.
>>> > >>
>>> > >> On 9/18/12 5:57 PM, CPV wrote:
>>> > >>> Hi, I am starting to code in R and one of the things that I want to
>>> > >>> do is to scrape some data from the web.
>>> > >>> The problem that I am having is that I cannot get past the
>>> > >>> disclaimer page (which produces a session cookie). I have been able
>>> > >>> to collect some ideas and combine them in the code below but I don't
>>> > >>> get past the disclaimer page.
>>> > >>> I am trying to agree to the disclaimer with postForm and write the
>>> > >>> cookie to a file, but I cannot do it successfully....
>>> > >>> The webpage cookies are written to the file but the value is
>>> > >>> FALSE... So any ideas of what I should do or what I am doing wrong?
>>> > >>> Thank you for your help,
>>> > >>>
>>> > >>> library(RCurl)
>>> > >>> library(XML)
>>> > >>>
>>> > >>> site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>>> > >>>
>>> > >>> postForm(site, disclaimer_action="I Agree")
>>> > >>>
>>> > >>> cf <- "cookies.txt"
>>> > >>>
>>> > >>> no_cookie <- function() {
>>> > >>> curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
>>> > >>> getURL(site, curl=curlHandle)
>>> > >>>
>>> > >>> rm(curlHandle)
>>> > >>> gc()
>>> > >>> }
>>> > >>>
>>> > >>> if ( file.exists(cf) == TRUE ) {
>>> > >>> file.create(cf)
>>> > >>> no_cookie()
>>> > >>> }
>>> > >>> allTables <- readHTMLTable(site)
>>> > >>> allTables
>>> > >>>
>>> > >>> [[alternative HTML version deleted]]
>>> > >>>
>>> > >>> ______________________________________________
>>> > >>> R-help@r-project.org mailing list
>>> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> > >>> PLEASE do read the posting guide
>>> > >>> http://www.R-project.org/posting-guide.html
>>> > >>> and provide commented, minimal, self-contained, reproducible code.
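On the follow-up question in this thread (how to keep the loop running when getURLContent times out), here is a minimal sketch in R of one possible approach: wrap each request in tryCatch() and retry a few times, in the spirit of the 'if' check on an empty result described above. The helper names default_fetch, fetch_with_retry and fetch_all, the retry count, the wait time, and the timeout values are all illustrative assumptions, not something from the thread. The default fetcher assumes the RCurl package is installed.

```r
# Default fetcher (assumption): RCurl::getURLContent with explicit timeouts,
# in seconds. The values 30 and 120 are illustrative only.
default_fetch <- function(u) {
  RCurl::getURLContent(u, .opts = list(connecttimeout = 30, timeout = 120))
}

# Fetch one URL. An error (a timeout, for example) is caught and reported,
# then the request is retried up to `retries` times. Returns NULL if every
# attempt fails, so the caller's loop is never interrupted.
fetch_with_retry <- function(url, fetch = default_fetch, retries = 3, wait = 5) {
  for (attempt in seq_len(retries)) {
    result <- tryCatch(
      fetch(url),
      error = function(e) {
        message("Attempt ", attempt, " failed for ", url, ": ",
                conditionMessage(e))
        NULL
      }
    )
    # Same idea as checking length(output) == 0: accept only non-empty results.
    if (!is.null(result) && length(result) > 0) return(result)
    if (attempt < retries) Sys.sleep(wait)  # pause before the next attempt
  }
  NULL
}

# Fetch several pages. A page that keeps failing is stored as NULL and the
# remaining pages are still requested.
fetch_all <- function(urls, ...) {
  pages <- lapply(urls, fetch_with_retry, ...)
  names(pages) <- urls
  pages
}

# Usage sketch (not run here; `urls` is a hypothetical character vector of
# page addresses, and `curl` is the cookie-carrying handle from the thread):
#   library(RCurl)
#   curl  <- getCurlHandle(cookiefile = "", connecttimeout = 30, timeout = 120)
#   postForm(site, disclaimer_action = "I Agree", curl = curl)
#   pages <- fetch_all(urls, fetch = function(u) getURLContent(u, curl = curl))
#   failed <- names(pages)[vapply(pages, is.null, logical(1))]
```

Passing the fetcher in as an argument keeps the retry logic separate from RCurl, so the same loop works with the shared curl handle, and pausing between attempts also avoids sending requests to the server in rapid succession.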