Thank you for your help, Duncan.

I have been trying what you suggested; however, I am getting an error when
trying to create the function with fun <- createFunction(forms[[1]]).
It says:

Error in isHidden | hasDefault :
  operations are possible only for numeric, logical or complex types
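
For reference, the direct postForm() route described in the quoted reply does not depend on createFunction(). A minimal sketch, assuming the form field is still named disclaimer_action and that one curl handle is passed to every request; fetch_after_disclaimer() is a hypothetical helper name, not part of RCurl or XML:

```r
# Sketch only: accept the disclaimer and read the table using ONE curl handle.
# fetch_after_disclaimer() is a hypothetical helper, not an RCurl/XML function.
fetch_after_disclaimer <- function(url, agree = "I Agree") {
  # An empty cookiefile enables libcurl's in-memory cookie handling.
  curl <- RCurl::getCurlHandle(cookiefile = "")

  # Submit the disclaimer form on this handle so the session cookie is kept.
  RCurl::postForm(url, disclaimer_action = agree, curl = curl)

  # Request the data page with the same handle (and therefore the cookie).
  txt <- RCurl::getURLContent(url, curl = curl)

  # Parse the first HTML table out of the downloaded text.
  XML::readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)
}

# Usage (needs network access and the RCurl and XML packages):
# tt <- fetch_after_disclaimer(site)
```

The package-qualified calls (RCurl::, XML::) are only there so the function can be defined before the packages are attached; library(RCurl) and library(XML) would work equally well.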

On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang <dtemplel...@ucdavis.edu> wrote:

> Hi ?
>
> The key is that you want to use the same curl handle
> for both the postForm() and for getting the data document.
>
> site = u = "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>
> library(RCurl)
> curl = getCurlHandle(cookiefile = "", verbose = TRUE)
>
> postForm(site, disclaimer_action = "I Agree", curl = curl)
>
> Now we have the cookie in the curl handle, so we can use that same handle
> to request the data document:
>
> txt = getURLContent(u, curl = curl)
>
> Now we can use readHTMLTable() on the local document content:
>
> library(XML)
> tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)
>
>
>
> Rather than knowing how to post the form, I like to read
> the form programmatically and generate an R function to do the submission
> for me. The RHTMLForms package can do this.
>
> library(RHTMLForms)
> forms = getHTMLFormDescription(u, FALSE)
> fun = createFunction(forms[[1]])
>
> Then we can use
>
>  fun(.curl = curl)
>
> instead of
>
>   postForm(site, disclaimer_action = "I Agree", curl = curl)
>
> This helps to abstract the details of the form.
>
>   D.
>
> On 9/18/12 5:57 PM, CPV wrote:
> > Hi, I am starting to code in R, and one of the things that I want to do is
> > scrape some data from the web.
> > The problem that I am having is that I cannot get past the disclaimer
> > page (which produces a session cookie). I have been able to collect some
> > ideas and combine them in the code below, but I don't get past the
> > disclaimer page.
> > I am trying to agree to the disclaimer with postForm() and write the cookie
> > to a file, but I cannot do it successfully.
> > The webpage cookies are written to the file, but the value is FALSE. So
> > any ideas about what I should do or what I am doing wrong?
> > Thank you for your help,
> >
> > library(RCurl)
> > library(XML)
> >
> > site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
> >
> > postForm(site, disclaimer_action="I Agree")
> >
> > cf <- "cookies.txt"
> >
> > no_cookie <- function() {
> >         curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
> >         getURL(site, curl=curlHandle)
> >
> >         rm(curlHandle)
> >         gc()
> > }
> >
> > if ( file.exists(cf) == TRUE ) {
> >         file.create(cf)
> >         no_cookie()
> > }
> > allTables <- readHTMLTable(site)
> > allTables
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
