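Try this:

library(RCurl)
library(XML)

# Keep the URL on a single line; line breaks inside the string break the request
site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"

# Fetch the raw HTML of the page as a character string
URL <- getURL(site)

# Parse that string into an HTML document tree
Text <- htmlParse(URL, asText = TRUE)

This will give you all the web data as a parsed HTML document.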
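
If the server still returns the disclaimer page, the fetch above can be combined with the idea from earlier in this thread: accept the disclaimer and request the data with one and the same curl handle. Here is a minimal sketch, assuming the form field name disclaimer_action from the quoted messages is still what the site expects. The helper name fetch_after_disclaimer is hypothetical and is not part of RCurl or XML.

```r
library(RCurl)
library(XML)

# Hypothetical helper: accept the disclaimer, then fetch and parse the page
# with the same curl handle so the session cookie is sent on the second request.
fetch_after_disclaimer <- function(site) {
  # cookiefile = "" switches on libcurl's in-memory cookie handling
  curl <- getCurlHandle(cookiefile = "")
  # Submit the disclaimer form; the field name is taken from this thread
  postForm(site, disclaimer_action = "I Agree", curl = curl)
  # Reuse the handle so the cookie set above accompanies this request
  txt <- getURL(site, curl = curl)
  htmlParse(txt, asText = TRUE)
}
```

It would be called as Text <- fetch_after_disclaimer(site), with site defined as above.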

You can use the getNodeSet() function to extract whatever links or text
you want from that page.
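
As a small illustration of getNodeSet(), the sketch below runs XPath queries against a made-up HTML snippet that stands in for the downloaded page. The snippet, its URLs, and the variable names are assumptions for the example only.

```r
library(XML)

# Made-up HTML standing in for the downloaded page
html <- '<html><body>
  <a href="http://example.org/a">First</a>
  <a href="http://example.org/b">Second</a>
  <table><tr><td>2012-09-15</td><td>1.23</td></tr></table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# getNodeSet() returns a list of the nodes matching an XPath expression
link_nodes <- getNodeSet(doc, "//a")

# Pull the href attribute and the visible text out of each link node
hrefs <- sapply(link_nodes, xmlGetAttr, "href")
link_text <- sapply(link_nodes, xmlValue)

# Text content of every table cell
cells <- sapply(getNodeSet(doc, "//td"), xmlValue)
```

The same XPath expressions can be applied to the Text object created above.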


I hope this helps.

Best,
Heramb


On Wed, Sep 19, 2012 at 10:26 PM, CPV <ceal...@gmail.com> wrote:

> Thanks again,
>
> I ran the script with postForm(site, disclaimer_action="I Agree") and
> it does not seem to do anything.
> The webpage is still the disclaimer page, so I am getting the error below:
> Error in function (classes, fdef, mtable)  :
>   unable to find an inherited method for function "readHTMLTable", for signature "NULL"
>
>
> I also downloaded the latest version of RHTMLForms
> (omegahat-RHTMLForms-251743f.zip),
> but it does not seem to install correctly. I used the code
> install.packages("C:/Users/cess/Downloads/omegahat-RHTMLForms-251743f.zip",
>                  type="win.binary", repos=NULL)
>
> Any suggestions as to what could be causing these problems?
>
>
> On Wed, Sep 19, 2012 at 9:49 AM, Duncan Temple Lang <dtemplel...@ucdavis.edu> wrote:
>
> >  You don't need to use getHTMLFormDescription() and createFunction().
> > Instead, you can use the postForm() call.  However, getHTMLFormDescription(),
> > etc. is more general. But you need the very latest version of the package
> > to deal with degenerate forms that have no inputs (other than button
> > clicks).
> >
> >  You can get the latest version of the RHTMLForms package
> >  from github
> >
> >       git clone g...@github.com:omegahat/RHTMLForms.git
> >
> >  and that has the fixes for handling the degenerate forms with
> >  no arguments.
> >
> >    D.
> >
> > On 9/19/12 7:51 AM, CPV wrote:
> > > Thank you for your help Duncan,
> > >
> > > I have been trying what you suggested; however, I am getting an error when
> > > trying to create the function fun <- createFunction(forms[[1]]).
> > > It says: Error in isHidden | hasDefault :
> > > operations are possible only for numeric, logical or complex types
> > >
> > > On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang <
> > > dtemplel...@ucdavis.edu> wrote:
> > >
> > >> Hi ?
> > >>
> > >> The key is that you want to use the same curl handle
> > >> for both the postForm() and for getting the data document.
> > >>
> > >> site = u = "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
> > >>
> > >> library(RCurl)
> > >> curl = getCurlHandle(cookiefile = "", verbose = TRUE)
> > >>
> > >> postForm(site, disclaimer_action="I Agree", curl = curl)
> > >>
> > >> Now we have the cookie in the curl handle, so we can use that same curl
> > >> handle to request the data document:
> > >>
> > >> txt = getURLContent(u, curl = curl)
> > >>
> > >> Now we can use readHTMLTable() on the local document content:
> > >>
> > >> library(XML)
> > >> tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)
> > >>
> > >>
> > >>
> > >> Rather than knowing how to post the form, I like to read
> > >> the form programmatically and generate an R function to do the
> > >> submission for me. The RHTMLForms package can do this.
> > >>
> > >> library(RHTMLForms)
> > >> forms = getHTMLFormDescription(u, FALSE)
> > >> fun = createFunction(forms[[1]])
> > >>
> > >> Then we can use
> > >>
> > >>  fun(.curl = curl)
> > >>
> > >> instead of
> > >>
> > >>   postForm(site, disclaimer_action="I Agree", curl = curl)
> > >>
> > >> This helps to abstract the details of the form.
> > >>
> > >>   D.
> > >>
> > >> On 9/18/12 5:57 PM, CPV wrote:
> > >>> Hi, I am starting to code in R and one of the things that I want to do
> > >>> is to scrape some data from the web.
> > >>> The problem that I am having is that I cannot get past the disclaimer
> > >>> page (which produces a session cookie). I have been able to collect some
> > >>> ideas and combine them in the code below, but I don't get past the
> > >>> disclaimer page.
> > >>> I am trying to agree to the disclaimer with postForm() and write the
> > >>> cookie to a file, but I cannot do it successfully.
> > >>> The webpage cookies are written to the file but the value is FALSE. So
> > >>> any ideas of what I should do or what I am doing wrong?
> > >>> Thank you for your help,
> > >>>
> > >>> library(RCurl)
> > >>> library(XML)
> > >>>
> > >>> site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
> > >>>
> > >>> postForm(site, disclaimer_action="I Agree")
> > >>>
> > >>> cf <- "cookies.txt"
> > >>>
> > >>> no_cookie <- function() {
> > >>>         curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
> > >>>         getURL(site, curl=curlHandle)
> > >>>
> > >>>         rm(curlHandle)
> > >>>         gc()
> > >>> }
> > >>>
> > >>> if ( file.exists(cf) == TRUE ) {
> > >>>         file.create(cf)
> > >>>         no_cookie()
> > >>> }
> > >>> allTables <- readHTMLTable(site)
> > >>> allTables
> > >>>
> > >>>       [[alternative HTML version deleted]]
> > >>>
> > >>> ______________________________________________
> > >>> R-help@r-project.org mailing list
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> > >
> >
> >
>
>

