Thank you for your help, Duncan. I have been trying what you suggested; however, I get an error when trying to create the function:

  fun <- createFunction(forms[[1]])

It says:

  Error in isHidden | hasDefault :
    operations are possible only for numeric, logical or complex types

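Until the createFunction() error is sorted out, one possible workaround is to fall back to the explicit postForm() call from Duncan's message whenever the generated function cannot be built. The following is only a minimal sketch under that assumption: `accept_disclaimer` and `fetch_table` are hypothetical helper names (not part of RCurl, XML or RHTMLForms), and the form field name `disclaimer_action` is taken from the thread rather than verified against the page.

```r
# Sketch only: accept the disclaimer and read the table on ONE curl handle.
# accept_disclaimer() and fetch_table() are hypothetical helper names.

accept_disclaimer <- function(url, curl) {
  # First try the RHTMLForms route suggested in the thread.
  ok <- tryCatch({
    forms <- RHTMLForms::getHTMLFormDescription(url, FALSE)
    fun <- RHTMLForms::createFunction(forms[[1]])
    fun(.curl = curl)
    TRUE
  }, error = function(e) {
    message("createFunction() route failed: ", conditionMessage(e))
    FALSE
  })

  # Fall back to posting the form field by hand, on the same handle,
  # so the session cookie ends up where the later request can use it.
  if (!ok) {
    RCurl::postForm(url, disclaimer_action = "I Agree", curl = curl)
  }
  invisible(ok)
}

fetch_table <- function(url) {
  # An empty cookiefile switches on libcurl's in-memory cookie handling.
  curl <- RCurl::getCurlHandle(cookiefile = "")
  accept_disclaimer(url, curl)
  txt <- RCurl::getURLContent(url, curl = curl)
  XML::readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)
}
```

With the `site` URL from the messages below, this would be called as `tt <- fetch_table(site)`. The point of the design is that the handle is created once and passed to every request, so the cookie set by the disclaimer form is sent along when the data page is fetched.
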
On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang <dtemplel...@ucdavis.edu> wrote:

> Hi,
>
> The key is that you want to use the same curl handle
> for both the postForm() and for getting the data document.
>
>   site = u = "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>
>   library(RCurl)
>   curl = getCurlHandle(cookiefile = "", verbose = TRUE)
>
>   postForm(site, disclaimer_action = "I Agree", curl = curl)
>
> Now we have the cookie in the curl handle, so we can use that same curl handle
> to request the data document:
>
>   txt = getURLContent(u, curl = curl)
>
> Now we can use readHTMLTable() on the local document content:
>
>   library(XML)
>   tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)
>
> Rather than knowing how to post the form, I like to read
> the form programmatically and generate an R function to do the submission
> for me. The RHTMLForms package can do this.
>
>   library(RHTMLForms)
>   forms = getHTMLFormDescription(u, FALSE)
>   fun = createFunction(forms[[1]])
>
> Then we can use
>
>   fun(.curl = curl)
>
> instead of
>
>   postForm(site, disclaimer_action = "I Agree", curl = curl)
>
> This helps to abstract the details of the form.
>
> D.
>
> On 9/18/12 5:57 PM, CPV wrote:
> > Hi, I am starting to code in R, and one of the things that I want to do is
> > scrape some data from the web.
> > The problem that I am having is that I cannot get past the disclaimer
> > page (which produces a session cookie). I have been able to collect some
> > ideas and combine them in the code below, but I don't get past the
> > disclaimer page.
> > I am trying to agree to the disclaimer with postForm() and write the cookie
> > to a file, but I cannot do it successfully.
> > The webpage cookies are written to the file, but the value is FALSE. So
> > any ideas of what I should do or what I am doing wrong?
> > Thank you for your help,
> >
> >   library(RCurl)
> >   library(XML)
> >
> >   site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
> >
> >   postForm(site, disclaimer_action="I Agree")
> >
> >   cf <- "cookies.txt"
> >
> >   no_cookie <- function() {
> >     curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
> >     getURL(site, curl=curlHandle)
> >
> >     rm(curlHandle)
> >     gc()
> >   }
> >
> >   if ( file.exists(cf) == TRUE ) {
> >     file.create(cf)
> >     no_cookie()
> >   }
> >   allTables <- readHTMLTable(site)
> >   allTables
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.