This may be because the connection to the site via R is taking a long time.
I faced the same problem with the site "Social-Mention".

I tried a very primitive approach: I put an 'if' condition in the loop, so
that an empty result triggers another request.

if (length(output) == 0) {
  output <- getURL(site)  # empty result: request the page again
}
# otherwise continue with the code

It might help you.
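
To keep the loop running after a timeout, the check above can be combined with tryCatch(). The following is only a minimal sketch: fetch_with_retry, flaky_fetch, max_tries and wait are hypothetical names made up for illustration, and flaky_fetch merely simulates a download that times out so that the example runs without a network connection.

```r
# Hypothetical helper (not part of RCurl): call fetch(url) up to
# max_tries times and return NULL instead of stopping the loop
# if every attempt fails or returns nothing.
fetch_with_retry <- function(url, fetch, max_tries = 3, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    # An error (e.g. a timeout) is turned into NULL rather than aborting
    result <- tryCatch(fetch(url), error = function(e) NULL)
    if (length(result) > 0) {
      return(result)
    }
    Sys.sleep(wait)  # pause before the next attempt
  }
  NULL
}

# Stand-in for a flaky download: fails on the first two calls.
calls <- 0
flaky_fetch <- function(url) {
  calls <<- calls + 1
  if (calls < 3) stop("Timeout was reached")
  paste("content of", url)
}

# The loop continues even though the first two requests fail.
pages <- c("page1", "page2")
results <- lapply(pages, fetch_with_retry, fetch = flaky_fetch)
```

With RCurl, the fetch argument could be something like function(u) getURLContent(u, curl = curl), reusing the curl handle that holds the cookie. Pages that still fail after max_tries come back as NULL and can be skipped or retried later.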

Best,
Heramb

On Fri, Sep 21, 2012 at 8:45 PM, CPV <ceal...@gmail.com> wrote:

> Thanks for your suggestion,
> The issue was resolved by Duncan's recommendation.
>
> Now I am trying to obtain data from different pages of the same site
> through a loop; however, getURLContent keeps timing out. The odd part
> is that I can access the link through a browser with no issues at all!
> Any ideas why it keeps timing out? Also, how can I keep the loop running
> after this error?
>
> Thanks again for your help!
>
>
> On Wed, Sep 19, 2012 at 11:36 PM, Heramb Gadgil
> <heramb.gad...@gmail.com> wrote:
>
>> Try this,
>>
>>
>> library(RCurl)
>> library(XML)
>>
>> site<-"http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>>
>> URL<-getURL(site)
>>
>> Text=htmlParse(URL,asText=T)
>>
>> This will give you all the web data in an HTML-Text format.
>>
>> You can use the "getNodeSet" function to extract whatever links or text
>> you want from that page.
>>
>>
>> I hope this helps.
>>
>> Best,
>> Heramb
>>
>>
>>
>> On Wed, Sep 19, 2012 at 10:26 PM, CPV <ceal...@gmail.com> wrote:
>>
>>> Thanks again,
>>>
>>> I ran the script with postForm(site, disclaimer_action="I Agree") and
>>> it does not seem to do anything:
>>> the webpage is still the disclaimer page, so I am getting the error
>>> below:
>>> Error in function (classes, fdef, mtable)  :
>>>   unable to find an inherited method for function "readHTMLTable", for
>>> signature "NULL"
>>>
>>>
>>> I also downloaded the latest version of RHTMLForms
>>> (omegahat-RHTMLForms-251743f.zip),
>>> but it does not seem to install correctly. I used the code
>>>
>>> install.packages("C:/Users/cess/Downloads/omegahat-RHTMLForms-251743f.zip",
>>> type="win.binary", repos=NULL)
>>>
>>> Any suggestion of what could be causing these problems?
>>>
>>>
>>> On Wed, Sep 19, 2012 at 9:49 AM, Duncan Temple Lang
>>> <dtemplel...@ucdavis.edu> wrote:
>>>
>>> >  You don't need to use getHTMLFormDescription() and createFunction().
>>> > Instead, you can use the postForm() call.  However,
>>> > getHTMLFormDescription(), etc. is more general. But you need the very
>>> > latest version of the package to deal with degenerate forms that have
>>> > no inputs (other than button clicks).
>>> >
>>> >  You can get the latest version of the RHTMLForms package
>>> >  from github
>>> >
>>> >       git clone g...@github.com:omegahat/RHTMLForms.git
>>> >
>>> >  and that has the fixes for handling the degenerate forms with
>>> >  no arguments.
>>> >
>>> >    D.
>>> >
>>> > On 9/19/12 7:51 AM, CPV wrote:
>>> > > Thank you for your help Duncan,
>>> > >
>>> > > I have been trying what you suggested; however, I am getting an error
>>> > > when trying to create the function fun<- createFunction(forms[[1]]).
>>> > > It says: Error in isHidden | hasDefault :
>>> > > operations are possible only for numeric, logical or complex types
>>> > >
>>> > > On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang <
>>> > > dtemplel...@ucdavis.edu> wrote:
>>> > >
>>> > >> Hi ?
>>> > >>
>>> > >> The key is that you want to use the same curl handle
>>> > >> for both the postForm() and for getting the data document.
>>> > >>
>>> > >> site = u =
>>> > >> "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>>> > >>
>>> > >> library(RCurl)
>>> > >> curl = getCurlHandle(cookiefile = "", verbose = TRUE)
>>> > >>
>>> > >> postForm(site, disclaimer_action="I Agree", curl = curl)
>>> > >>
>>> > >> Now we have the cookie in the curl handle so we can use that same
>>> > >> curl handle to request the data document:
>>> > >>
>>> > >> txt = getURLContent(u, curl = curl)
>>> > >>
>>> > >> Now we can use readHTMLTable() on the local document content:
>>> > >>
>>> > >> library(XML)
>>> > >> tt = readHTMLTable(txt, asText = TRUE, which = 1,
>>> > >>                    stringsAsFactors = FALSE)
>>> > >>
>>> > >>
>>> > >>
>>> > >> Rather than knowing how to post the form, I like to read
>>> > >> the form programmatically and generate an R function to do the
>>> > >> submission for me. The RHTMLForms package can do this.
>>> > >>
>>> > >> library(RHTMLForms)
>>> > >> forms = getHTMLFormDescription(u, FALSE)
>>> > >> fun = createFunction(forms[[1]])
>>> > >>
>>> > >> Then we can use
>>> > >>
>>> > >>  fun(.curl = curl)
>>> > >>
>>> > >> instead of
>>> > >>
>>> > >>   postForm(site, disclaimer_action="I Agree", curl = curl)
>>> > >>
>>> > >> This helps to abstract the details of the form.
>>> > >>
>>> > >>   D.
>>> > >>
>>> > >> On 9/18/12 5:57 PM, CPV wrote:
>>> > >>> Hi, I am starting to code in R, and one of the things that I want
>>> > >>> to do is to scrape some data from the web.
>>> > >>> The problem that I am having is that I cannot get past the
>>> > >>> disclaimer page (which produces a session cookie). I have been able
>>> > >>> to collect some ideas and combine them in the code below, but I
>>> > >>> don't get past the disclaimer page.
>>> > >>> I am trying to agree to the disclaimer with postForm and write the
>>> > >>> cookie to a file, but I cannot do it successfully.
>>> > >>> The webpage cookies are written to the file but the value is
>>> > >>> FALSE. So any ideas of what I should do or what I am doing wrong?
>>> > >>> Thank you for your help,
>>> > >>>
>>> > >>> library(RCurl)
>>> > >>> library(XML)
>>> > >>>
>>> > >>> site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
>>> > >>>
>>> > >>> postForm(site, disclaimer_action="I Agree")
>>> > >>>
>>> > >>> cf <- "cookies.txt"
>>> > >>>
>>> > >>> no_cookie <- function() {
>>> > >>>         curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
>>> > >>>         getURL(site, curl=curlHandle)
>>> > >>>
>>> > >>>         rm(curlHandle)
>>> > >>>         gc()
>>> > >>> }
>>> > >>>
>>> > >>> if ( file.exists(cf) == TRUE ) {
>>> > >>>         file.create(cf)
>>> > >>>         no_cookie()
>>> > >>> }
>>> > >>> allTables <- readHTMLTable(site)
>>> > >>> allTables
>>> > >>>
>>> > >>>       [[alternative HTML version deleted]]
>>> > >>>
>>> > >>> ______________________________________________
>>> > >>> R-help@r-project.org mailing list
>>> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> > >>> PLEASE do read the posting guide
>>> > >> http://www.R-project.org/posting-guide.html
>>> > >>> and provide commented, minimal, self-contained, reproducible code.
>>> > >>>
>>> > >>>