Thanks for your suggestion,
The issue was resolved by Duncan's recommendation.

Now I am trying to obtain data from several pages on the same site in a
loop, but getURLContent keeps timing out. The odd part is that I can open
the same links in a browser with no issues at all. Any ideas why it keeps
timing out? Also, how can I keep the loop running after this error?
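
On the second question, one common way to keep a loop going after a failed
request is to wrap each call in tryCatch(). The following is only a minimal
sketch under assumptions: fetch_pages() and its arguments are hypothetical
names (not part of RCurl), and the function that does the fetching is passed
in, so the same loop can be used with getURLContent() and a shared curl handle.

```r
# fetch_pages() is a hypothetical helper, not an RCurl function.
# urls:  character vector of page addresses
# fetch: function taking one URL and returning its content
# pause: seconds to wait between requests
fetch_pages <- function(urls, fetch, pause = 1) {
  results <- vector("list", length(urls))
  names(results) <- urls
  for (i in seq_along(urls)) {
    results[[i]] <- tryCatch(
      fetch(urls[i]),
      error = function(e) {
        # Report the failure and carry on with the next URL.
        message("Skipping ", urls[i], ": ", conditionMessage(e))
        NA_character_
      }
    )
    Sys.sleep(pause)
  }
  results
}

# Possible use with RCurl (not run here; needs network access).
# `site` and `urls` are assumed to exist; the timeout values are guesses.
# library(RCurl)
# curl <- getCurlHandle(cookiefile = "", connecttimeout = 10, timeout = 30)
# postForm(site, disclaimer_action = "I Agree", curl = curl)
# pages <- fetch_pages(urls, function(u) getURLContent(u, curl = curl))
```

Failed pages come back as NA, so they can be picked out and retried later.
This does not explain why the requests time out; it only stops one failure
from ending the whole loop.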

Thanks again for your help!

On Wed, Sep 19, 2012 at 11:36 PM, Heramb Gadgil <heramb.gad...@gmail.com> wrote:

> Try this,
>
>
> library(RCurl)
> library(XML)
>
> site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>
> URL<-getURL(site)
>
> Text=htmlParse(URL,asText=TRUE)
>
> This will give you the whole page as a parsed HTML document.
>
> You can use the "getNodeSet" function to extract whatever links or text
> you want from that page.
>
>
> I hope this helps.
>
> Best,
> Heramb
>
>
>
> On Wed, Sep 19, 2012 at 10:26 PM, CPV <ceal...@gmail.com> wrote:
>
>> Thanks again,
>>
>> I ran the script with postForm(site, disclaimer_action="I Agree"), but it
>> does not seem to do anything. The page returned is still the disclaimer
>> page, so I am getting the error below:
>> Error in function (classes, fdef, mtable)  :
>>   unable to find an inherited method for function "readHTMLTable", for signature "NULL"
>>
>>
>> I also downloaded the latest version of RHTMLForms
>> (omegahat-RHTMLForms-251743f.zip),
>> but it does not seem to install correctly. I used the code
>>
>> install.packages("C:/Users/cess/Downloads/omegahat-RHTMLForms-251743f.zip",
>> type="win.binary", repos=NULL)
>>
>> Any suggestion of what could be causing these problems?
>>
>>
>> On Wed, Sep 19, 2012 at 9:49 AM, Duncan Temple Lang <dtemplel...@ucdavis.edu> wrote:
>>
>> >  You don't need to use getHTMLFormDescription() and createFunction().
>> > Instead, you can use the postForm() call.  However,
>> > getHTMLFormDescription(), etc. is more general. But you need the very
>> > latest version of the package to deal with degenerate forms that have
>> > no inputs (other than button clicks).
>> >
>> >  You can get the latest version of the RHTMLForms package
>> >  from github
>> >
>> >       git clone g...@github.com:omegahat/RHTMLForms.git
>> >
>> >  and that has the fixes for handling the degenerate forms with
>> >  no arguments.
>> >
>> >    D.
>> >
>> > On 9/19/12 7:51 AM, CPV wrote:
>> > > Thank you for your help Duncan,
>> > >
>> > > I have been trying what you suggested; however, I am getting an error
>> > > when trying to create the function with fun <- createFunction(forms[[1]]).
>> > > It says: Error in isHidden | hasDefault :
>> > > operations are possible only for numeric, logical or complex types
>> > >
>> > > On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang <dtemplel...@ucdavis.edu> wrote:
>> > >
>> > >> Hi ?
>> > >>
>> > >> The key is that you want to use the same curl handle
>> > >> for both the postForm() and for getting the data document.
>> > >>
>> > >> site = u = "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>> > >>
>> > >> library(RCurl)
>> > >> curl = getCurlHandle(cookiefile = "", verbose = TRUE)
>> > >>
>> > >> postForm(site, disclaimer_action="I Agree", curl = curl)
>> > >>
>> > >> Now we have the cookie in the curl handle so we can use that same curl
>> > >> handle to request the data document:
>> > >>
>> > >> txt = getURLContent(u, curl = curl)
>> > >>
>> > >> Now we can use readHTMLTable() on the local document content:
>> > >>
>> > >> library(XML)
>> > >> tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)
>> > >>
>> > >>
>> > >>
>> > >> Rather than knowing how to post the form, I like to read
>> > >> the form programmatically and generate an R function to do the
>> > >> submission for me. The RHTMLForms package can do this.
>> > >>
>> > >> library(RHTMLForms)
>> > >> forms = getHTMLFormDescription(u, FALSE)
>> > >> fun = createFunction(forms[[1]])
>> > >>
>> > >> Then we can use
>> > >>
>> > >>  fun(.curl = curl)
>> > >>
>> > >> instead of
>> > >>
>> > >>   postForm(site, disclaimer_action="I Agree", curl = curl)
>> > >>
>> > >> This helps to abstract the details of the form.
>> > >>
>> > >>   D.
>> > >>
>> > >> On 9/18/12 5:57 PM, CPV wrote:
>> > >>> Hi, I am starting to code in R, and one of the things I want to do is
>> > >>> scrape some data from the web.
>> > >>> The problem I am having is that I cannot get past the disclaimer
>> > >>> page (which produces a session cookie). I have been able to collect
>> > >>> some ideas and combine them in the code below, but I still don't get
>> > >>> past the disclaimer page.
>> > >>> I am trying to agree to the disclaimer with postForm and write the
>> > >>> cookie to a file, but I cannot do it successfully.
>> > >>> The webpage cookies are written to the file, but the value is FALSE.
>> > >>> Any ideas about what I should do or what I am doing wrong?
>> > >>> Thank you for your help,
>> > >>>
>> > >>> library(RCurl)
>> > >>> library(XML)
>> > >>>
>> > >>> site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>> > >>>
>> > >>> postForm(site, disclaimer_action="I Agree")
>> > >>>
>> > >>> cf <- "cookies.txt"
>> > >>>
>> > >>> no_cookie <- function() {
>> > >>>         curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
>> > >>>         getURL(site, curl=curlHandle)
>> > >>>
>> > >>>         rm(curlHandle)
>> > >>>         gc()
>> > >>> }
>> > >>>
>> > >>> if ( file.exists(cf) == TRUE ) {
>> > >>>         file.create(cf)
>> > >>>         no_cookie()
>> > >>> }
>> > >>> allTables <- readHTMLTable(site)
>> > >>> allTables
>> > >>>
>> > >>>       [[alternative HTML version deleted]]
>> > >>>
>> > >>> ______________________________________________
>> > >>> R-help@r-project.org mailing list
>> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >>> PLEASE do read the posting guide
>> > >> http://www.R-project.org/posting-guide.html
>> > >>> and provide commented, minimal, self-contained, reproducible code.
>> > >>>
