Thanks again,

I ran the script with postForm(site, disclaimer_action="I Agree"), but it
does not seem to do anything. The page returned is still the disclaimer
page, so I get the error below:
Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "readHTMLTable", for
signature "NULL"

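For what it's worth, the signature "NULL" in this error suggests that
readHTMLTable() found no <table> element to read, which is what one would
expect if the page fetched is still the disclaimer. A minimal sketch of a
guard follows, assuming the RCurl and XML packages are installed. The helper
names first_table and fetch_after_disclaimer are hypothetical, and the form
field disclaimer_action is taken from the calls quoted below. It posts the
form and fetches the data on one shared curl handle, then checks for a table
before parsing.

```r
# Parse HTML text and return its first table as a data frame.
# Stops with a readable message when the page holds no <table>,
# for example when it is still the disclaimer page.
first_table <- function(html) {
  doc <- XML::htmlParse(html, asText = TRUE)
  tables <- XML::getNodeSet(doc, "//table")
  if (length(tables) == 0) {
    stop("no <table> in the page; the disclaimer may not have been accepted")
  }
  XML::readHTMLTable(tables[[1]], stringsAsFactors = FALSE)
}

# Accept the disclaimer and fetch the data page on ONE curl handle,
# so the session cookie set by the POST is sent along with the GET.
# (Hypothetical helper; it needs network access when called.)
fetch_after_disclaimer <- function(u) {
  curl <- RCurl::getCurlHandle(cookiefile = "", followlocation = TRUE)
  RCurl::postForm(u, disclaimer_action = "I Agree", curl = curl)
  txt <- RCurl::getURLContent(u, curl = curl)
  first_table(as.character(txt))
}
```

Usage would be tt <- fetch_after_disclaimer(site). Whether the site accepts
this exact form field is an assumption carried over from the thread.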

I also downloaded the latest version of RHTMLForms
(omegahat-RHTMLForms-251743f.zip), but it does not seem to install
correctly. I used the code:
install.packages("C:/Users/cess/Downloads/omegahat-RHTMLForms-251743f.zip",
type="win.binary", repos=NULL)

Any suggestions as to what could be causing these problems?

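On the installation problem: a zip downloaded from GitHub is normally a
snapshot of the package source rather than a Windows binary build, so
type="win.binary" may be the cause. A minimal sketch under that assumption:
unzip the archive and install from the extracted source directory. The helper
names find_package_dir and install_source_zip are hypothetical, and if the
package turned out to contain compiled code, Rtools would also be needed.

```r
# Locate the package root (the directory holding DESCRIPTION) under `root`.
find_package_dir <- function(root) {
  desc <- list.files(root, pattern = "^DESCRIPTION$",
                     recursive = TRUE, full.names = TRUE)
  if (length(desc) == 0) stop("no DESCRIPTION file found under ", root)
  # Prefer the shortest path in case the archive holds nested copies
  dirname(desc[order(nchar(desc))][1])
}

# Unzip a source snapshot and install it as a source package.
# (Hypothetical helper; unzip() creates the target directory if needed.)
install_source_zip <- function(zip_path) {
  src_dir <- tempfile("pkgsrc")
  utils::unzip(zip_path, exdir = src_dir)
  utils::install.packages(find_package_dir(src_dir),
                          repos = NULL, type = "source")
}
```

Usage would be
install_source_zip("C:/Users/cess/Downloads/omegahat-RHTMLForms-251743f.zip").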

On Wed, Sep 19, 2012 at 9:49 AM, Duncan Temple Lang <dtemplel...@ucdavis.edu> wrote:

>  You don't need to use getHTMLFormDescription() and createFunction().
> Instead, you can use the postForm() call. However,
> getHTMLFormDescription(), etc. is more general. But you need the very
> latest version of the package to deal with degenerate forms that have
> no inputs (other than button clicks).
>
>  You can get the latest version of the RHTMLForms package
>  from github
>
>       git clone g...@github.com:omegahat/RHTMLForms.git
>
>  and that has the fixes for handling the degenerate forms with
>  no arguments.
>
>    D.
>
> On 9/19/12 7:51 AM, CPV wrote:
> > Thank you for your help Duncan,
> >
> > I have been trying what you suggested; however, I am getting an error
> > when trying to create the function with fun <- createFunction(forms[[1]]).
> > It says: Error in isHidden | hasDefault :
> > operations are possible only for numeric, logical or complex types
> >
> > On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang <
> > dtemplel...@ucdavis.edu> wrote:
> >
> >> Hi ?
> >>
> >> The key is that you want to use the same curl handle
> >> for both the postForm() and for getting the data document.
> >>
> >> site = u = "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
> >>
> >> library(RCurl)
> >> curl = getCurlHandle(cookiefile = "", verbose = TRUE)
> >>
> >> postForm(site, disclaimer_action="I Agree", curl = curl)
> >>
> >> Now we have the cookie in the curl handle, so we can use that same
> >> curl handle to request the data document:
> >>
> >> txt = getURLContent(u, curl = curl)
> >>
> >> Now we can use readHTMLTable() on the local document content:
> >>
> >> library(XML)
> >> tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)
> >>
> >>
> >>
> >> Rather than knowing how to post the form, I like to read
> >> the form programmatically and generate an R function to do the
> >> submission for me. The RHTMLForms package can do this.
> >>
> >> library(RHTMLForms)
> >> forms = getHTMLFormDescription(u, FALSE)
> >> fun = createFunction(forms[[1]])
> >>
> >> Then we can use
> >>
> >>  fun(.curl = curl)
> >>
> >> instead of
> >>
> >>   postForm(site, disclaimer_action="I Agree", curl = curl)
> >>
> >> This helps to abstract the details of the form.
> >>
> >>   D.
> >>
> >> On 9/18/12 5:57 PM, CPV wrote:
> >>> Hi, I am starting to code in R, and one of the things that I want to
> >>> do is scrape some data from the web.
> >>> The problem that I am having is that I cannot get past the disclaimer
> >>> page (which produces a session cookie). I have been able to collect
> >>> some ideas and combine them in the code below, but I don't get past
> >>> the disclaimer page.
> >>> I am trying to agree to the disclaimer with postForm and write the
> >>> cookie to a file, but I cannot do it successfully.
> >>> The webpage cookies are written to the file, but the value is FALSE.
> >>> So, any ideas about what I should do or what I am doing wrong?
> >>> Thank you for your help,
> >>>
> >>> library(RCurl)
> >>> library(XML)
> >>>
> >>> site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
> >>>
> >>> postForm(site, disclaimer_action="I Agree")
> >>>
> >>> cf <- "cookies.txt"
> >>>
> >>> no_cookie <- function() {
> >>>         curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
> >>>         getURL(site, curl=curlHandle)
> >>>
> >>>         rm(curlHandle)
> >>>         gc()
> >>> }
> >>>
> >>> if ( file.exists(cf) == TRUE ) {
> >>>         file.create(cf)
> >>>         no_cookie()
> >>> }
> >>> allTables <- readHTMLTable(site)
> >>> allTables
> >>>
> >>>       [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
