Try this:

library(RCurl)
library(XML)

site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
URL <- getURL(site)
Text <- htmlParse(URL, asText = TRUE)

This will give you all of the web data as a parsed HTML document. You can use
the getNodeSet() function to extract whatever links or text you want from
that page.

I hope this helps.

Best,
Heramb

On Wed, Sep 19, 2012 at 10:26 PM, CPV <ceal...@gmail.com> wrote:

> Thanks again,
>
> I ran the script with postForm(site, disclaimer_action="I Agree") and
> it does not seem to do anything;
> the webpage is still the disclaimer page, so I am getting the error below:
>
> Error in function (classes, fdef, mtable) :
>   unable to find an inherited method for function "readHTMLTable", for
>   signature "NULL"
>
> I also downloaded the latest version of RHTMLForms
> (omegahat-RHTMLForms-251743f.zip)
> and it does not seem to install correctly. I used the code
>
> install.packages("C:/Users/cess/Downloads/omegahat-RHTMLForms-251743f.zip",
>                  type="win.binary", repos=NULL)
>
> Any suggestions as to what could be causing these problems?
>
> On Wed, Sep 19, 2012 at 9:49 AM, Duncan Temple Lang <dtemplel...@ucdavis.edu> wrote:
>
> > You don't need to use getHTMLFormDescription() and createFunction().
> > Instead, you can use the postForm() call. However, getHTMLFormDescription(),
> > etc. is more general. But you need the very latest version of the package
> > to deal with degenerate forms that have no inputs (other than button
> > clicks).
> >
> > You can get the latest version of the RHTMLForms package
> > from github
> >
> >   git clone g...@github.com:omegahat/RHTMLForms.git
> >
> > and that has the fixes for handling the degenerate forms with
> > no arguments.
> >
> > D.
> >
> > On 9/19/12 7:51 AM, CPV wrote:
> > > Thank you for your help Duncan,
> > >
> > > I have been trying what you suggested; however, I am getting an error when
> > > trying to create the function fun <- createFunction(forms[[1]])
> > > It says: Error in isHidden | hasDefault :
> > >   operations are possible only for numeric, logical or complex types
> > >
> > > On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang <dtemplel...@ucdavis.edu> wrote:
> > >
> > >> Hi,
> > >>
> > >> The key is that you want to use the same curl handle
> > >> for both the postForm() and for getting the data document.
> > >>
> > >> site = u = "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
> > >>
> > >> library(RCurl)
> > >> curl = getCurlHandle(cookiefile = "", verbose = TRUE)
> > >>
> > >> postForm(site, disclaimer_action="I Agree", curl = curl)
> > >>
> > >> Now we have the cookie in the curl handle so we can use that same curl
> > >> handle to request the data document:
> > >>
> > >> txt = getURLContent(u, curl = curl)
> > >>
> > >> Now we can use readHTMLTable() on the local document content:
> > >>
> > >> library(XML)
> > >> tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)
> > >>
> > >> Rather than knowing how to post the form, I like to read
> > >> the form programmatically and generate an R function to do the submission
> > >> for me. The RHTMLForms package can do this.
> > >>
> > >> library(RHTMLForms)
> > >> forms = getHTMLFormDescription(u, FALSE)
> > >> fun = createFunction(forms[[1]])
> > >>
> > >> Then we can use
> > >>
> > >> fun(.curl = curl)
> > >>
> > >> instead of
> > >>
> > >> postForm(site, disclaimer_action="I Agree", curl = curl)
> > >>
> > >> This helps to abstract the details of the form.
> > >>
> > >> D.
> > >>
> > >> On 9/18/12 5:57 PM, CPV wrote:
> > >>> Hi, I am starting to code in R, and one of the things that I want to do is to
> > >>> scrape some data from the web.
> > >>> The problem that I am having is that I cannot get past the disclaimer
> > >>> page (which produces a session cookie). I have been able to collect some
> > >>> ideas and combine them in the code below, but I don't get past the
> > >>> disclaimer page.
> > >>> I am trying to agree to the disclaimer with postForm and write the cookie
> > >>> to a file, but I cannot do it successfully.
> > >>> The webpage cookies are written to the file but the value is FALSE. So
> > >>> any ideas of what I should do or what I am doing wrong?
> > >>> Thank you for your help,
> > >>>
> > >>> library(RCurl)
> > >>> library(XML)
> > >>>
> > >>> site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
> > >>>
> > >>> postForm(site, disclaimer_action="I Agree")
> > >>>
> > >>> cf <- "cookies.txt"
> > >>>
> > >>> no_cookie <- function() {
> > >>>   curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
> > >>>   getURL(site, curl=curlHandle)
> > >>>
> > >>>   rm(curlHandle)
> > >>>   gc()
> > >>> }
> > >>>
> > >>> if ( file.exists(cf) == TRUE ) {
> > >>>   file.create(cf)
> > >>>   no_cookie()
> > >>> }
> > >>> allTables <- readHTMLTable(site)
> > >>> allTables
> > >>>
> > >>> [[alternative HTML version deleted]]
> > >>>
> > >>> ______________________________________________
> > >>> R-help@r-project.org mailing list
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
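
For reference, the approach discussed in this thread (one curl handle shared by the form post and the data request) could be wrapped in a small helper. This is only a sketch: the name fetch_disclaimer_table is hypothetical, and it assumes that the RCurl and XML packages are installed, that the page is still reachable, and that the disclaimer form still expects the disclaimer_action field. None of that has been checked here.

```r
# Hypothetical helper (not from the thread): accept the disclaimer and read
# one HTML table, using a single curl handle for both requests.
fetch_disclaimer_table <- function(url, which = 1) {
  # Wrapped e-mail text often leaves stray whitespace around the URL.
  url <- trimws(url)

  # cookiefile = "" switches on libcurl's in-memory cookie engine, so any
  # session cookie set by the server stays attached to this handle.
  curl <- RCurl::getCurlHandle(cookiefile = "")

  # Post the "I Agree" form with that handle (assumed field name and value,
  # taken from the postForm() calls quoted in the thread).
  RCurl::postForm(url, disclaimer_action = "I Agree", curl = curl)

  # Request the data page with the same handle so the cookie is sent back.
  txt <- RCurl::getURLContent(url, curl = curl)

  # Parse the downloaded text rather than re-fetching the URL without cookies.
  XML::readHTMLTable(txt, asText = TRUE, which = which,
                     stringsAsFactors = FALSE)
}

# Usage (needs network access):
# tt <- fetch_disclaimer_table(site)
```

If the server expects a URL-encoded rather than multipart form submission, passing style = "POST" to postForm() may be worth trying; whether this site needs it is not known from the thread.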