Thanks for your suggestion. The issue was resolved by Duncan's recommendation.

Now I am trying to obtain data from several pages of the same site in a loop. However, getURLContent keeps timing out. The odd part is that I can open the same links in a browser with no issues at all. Any ideas why it keeps timing out? Also, how can I keep the loop running after this error?

Thanks again for your help!

On Wed, Sep 19, 2012 at 11:36 PM, Heramb Gadgil <heramb.gad...@gmail.com> wrote:

> Try this,
>
>   library(RCurl)
>   library(XML)
>
>   site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>
>   URL <- getURL(site)
>
>   Text <- htmlParse(URL, asText = TRUE)
>
> This will give you all the web data in an HTML-text format.
>
> You can use the "getNodeSet" function to extract whatever links or text
> you want from that page.
>
> I hope this helps.
>
> Best,
> Heramb
>
> On Wed, Sep 19, 2012 at 10:26 PM, CPV <ceal...@gmail.com> wrote:
>
>> Thanks again,
>>
>> I ran the script with postForm(site, disclaimer_action="I Agree") and
>> it does not seem to do anything. The webpage is still the disclaimer
>> page, so I am getting the error below:
>>
>>   Error in function (classes, fdef, mtable) :
>>     unable to find an inherited method for function "readHTMLTable", for
>>     signature "NULL"
>>
>> I also downloaded the latest version of RHTMLForms
>> (omegahat-RHTMLForms-251743f.zip)
>> and it does not seem to install correctly. I used the code
>>
>>   install.packages("C:/Users/cess/Downloads/omegahat-RHTMLForms-251743f.zip",
>>                    type="win.binary", repos=NULL)
>>
>> Any suggestions as to what could be causing these problems?
>>
>> On Wed, Sep 19, 2012 at 9:49 AM, Duncan Temple Lang <dtemplel...@ucdavis.edu> wrote:
>>
>> > You don't need to use getHTMLFormDescription() and createFunction().
>> > Instead, you can use the postForm() call. However, getHTMLFormDescription(),
>> > etc. is more general.
>> > But you need the very latest version of the package
>> > to deal with degenerate forms that have no inputs (other than button
>> > clicks).
>> >
>> > You can get the latest version of the RHTMLForms package
>> > from github
>> >
>> >   git clone g...@github.com:omegahat/RHTMLForms.git
>> >
>> > and that has the fixes for handling the degenerate forms with
>> > no arguments.
>> >
>> > D.
>> >
>> > On 9/19/12 7:51 AM, CPV wrote:
>> > > Thank you for your help Duncan,
>> > >
>> > > I have been trying what you suggested, however I am getting an error
>> > > when trying to create the function fun <- createFunction(forms[[1]]).
>> > > It says:
>> > >
>> > >   Error in isHidden | hasDefault :
>> > >     operations are possible only for numeric, logical or complex types
>> > >
>> > > On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang <dtemplel...@ucdavis.edu> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> The key is that you want to use the same curl handle
>> > >> for both the postForm() and for getting the data document.
>> > >>
>> > >>   site = u = "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>> > >>
>> > >>   library(RCurl)
>> > >>   curl = getCurlHandle(cookiefile = "", verbose = TRUE)
>> > >>
>> > >>   postForm(site, disclaimer_action="I Agree", curl = curl)
>> > >>
>> > >> Now we have the cookie in the curl handle so we can use that same curl
>> > >> handle to request the data document:
>> > >>
>> > >>   txt = getURLContent(u, curl = curl)
>> > >>
>> > >> Now we can use readHTMLTable() on the local document content:
>> > >>
>> > >>   library(XML)
>> > >>   tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)
>> > >>
>> > >> Rather than knowing how to post the form, I like to read
>> > >> the form programmatically and generate an R function to do the submission
>> > >> for me. The RHTMLForms package can do this.
>> > >>
>> > >>   library(RHTMLForms)
>> > >>   forms = getHTMLFormDescription(u, FALSE)
>> > >>   fun = createFunction(forms[[1]])
>> > >>
>> > >> Then we can use
>> > >>
>> > >>   fun(.curl = curl)
>> > >>
>> > >> instead of
>> > >>
>> > >>   postForm(site, disclaimer_action="I Agree", curl = curl)
>> > >>
>> > >> This helps to abstract the details of the form.
>> > >>
>> > >> D.
>> > >>
>> > >> On 9/18/12 5:57 PM, CPV wrote:
>> > >>> Hi, I am starting to code in R, and one of the things that I want to do
>> > >>> is to scrape some data from the web.
>> > >>> The problem I am having is that I cannot get past the disclaimer
>> > >>> page (which produces a session cookie). I have been able to collect some
>> > >>> ideas and combine them in the code below, but I don't get past the
>> > >>> disclaimer page.
>> > >>> I am trying to agree to the disclaimer with postForm and write the
>> > >>> cookie to a file, but I cannot do it successfully.
>> > >>> The webpage cookies are written to the file but the value is FALSE. So,
>> > >>> any ideas about what I should do or what I am doing wrong?
>> > >>> Thank you for your help,
>> > >>>
>> > >>>   library(RCurl)
>> > >>>   library(XML)
>> > >>>
>> > >>>   site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>> > >>>
>> > >>>   postForm(site, disclaimer_action="I Agree")
>> > >>>
>> > >>>   cf <- "cookies.txt"
>> > >>>
>> > >>>   no_cookie <- function() {
>> > >>>     curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
>> > >>>     getURL(site, curl=curlHandle)
>> > >>>
>> > >>>     rm(curlHandle)
>> > >>>     gc()
>> > >>>   }
>> > >>>
>> > >>>   if ( file.exists(cf) == TRUE ) {
>> > >>>     file.create(cf)
>> > >>>     no_cookie()
>> > >>>   }
>> > >>>   allTables <- readHTMLTable(site)
>> > >>>   allTables
>> > >>>
>> > >>> [[alternative HTML version deleted]]
>> > >>>
>> > >>> ______________________________________________
>> > >>> R-help@r-project.org mailing list
>> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > >>> and provide commented, minimal, self-contained, reproducible code.
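
On the question at the top of this thread about keeping the loop running after a timeout: one possible approach, sketched here under assumptions, is to wrap each request in tryCatch() so that an error yields NULL instead of stopping the loop, and to retry a few times with a pause in between. The helper name fetch_or_null, its tries and wait arguments, the urls vector, and the timeout values in the commented usage are all hypothetical and not part of RCurl. Nothing in this thread establishes why the requests time out, so the retry and pause are a guess at a mitigation, not a diagnosis.

```r
# Hypothetical helper: fetch one URL and return NULL (rather than stopping)
# if every attempt fails. `fetch` defaults to RCurl::getURLContent but can be
# swapped for any function taking (url, curl = ...).
fetch_or_null <- function(url, curl, fetch = RCurl::getURLContent,
                          tries = 3, wait = 5) {
  for (i in seq_len(tries)) {
    result <- tryCatch(
      fetch(url, curl = curl),
      error = function(e) {
        # Report the failure and fall through to the next attempt.
        message("Attempt ", i, " of ", tries, " failed for ", url, ": ",
                conditionMessage(e))
        NULL
      }
    )
    if (!is.null(result)) return(result)
    if (i < tries) Sys.sleep(wait)  # pause before retrying
  }
  NULL
}

# Possible usage (assumed values; `urls` is a character vector of page URLs):
#   curl <- RCurl::getCurlHandle(cookiefile = "", connecttimeout = 30,
#                                timeout = 120)
#   RCurl::postForm(site, disclaimer_action = "I Agree", curl = curl)
#   pages <- lapply(urls, fetch_or_null, curl = curl)
#   failed <- urls[vapply(pages, is.null, logical(1))]
```

Pages that still fail come back as NULL, so the loop finishes and the failed URLs can be listed and retried later. Reusing one curl handle for the whole loop keeps the disclaimer cookie, as Duncan described.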