Re: How to recover text from a web page

2010-09-22 Thread Jim Ault
Try using the url of the iframe. That will get only the iframe's HTML,
but it could be all you need.

Most sites will work easily, but some send a security variable along
with the iframe url that signals 'intended use'.

The include should simply add the text of another file to the current
file, effectively inserting that text at the location of the include.
The result should be that you get the HTML without any other steps.
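
A rough, untested LiveCode sketch of that approach (the url, the field
name, and the naive src parsing are my own illustration, not code from
this thread; it assumes a single iframe with an absolute, double-quoted
src attribute):

on grabIframeSource
   -- fetch the outer page (hypothetical url)
   put url "http://www.example.com/page.html" into tPage
   put offset("<iframe", tPage) into tStart
   if tStart = 0 then exit grabIframeSource
   -- keep just the first iframe tag
   put char tStart to -1 of tPage into tPage
   put char 1 to offset(">", tPage) of tPage into tTag
   -- the src url is everything between src=" and the next quote
   put offset("src=" & quote, tTag) into tSrcPos
   if tSrcPos = 0 then exit grabIframeSource
   put char (tSrcPos + 5) to -1 of tTag into tTag
   set the itemDelimiter to quote
   put item 1 of tTag into tIframeURL
   -- fetch the iframe's own html (assumes an absolute url)
   put url tIframeURL into field "Result"
end grabIframeSource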



On Sep 22, 2010, at 6:45 PM, David C. wrote:

> > perhaps there's an iFrame or include in the html that references
> > another page --
>
> There is definitely an iFrame involved and they have it pointing to
> some sort of PHP "servlet" hosted from an entirely different domain.
> Doesn't look to be much you can do with that combination.



Jim Ault
Las Vegas





Re: How to recover text from a web page

2010-09-22 Thread David C.
> perhaps there's an iFrame or include in the html that references another
> page --
>

There is definitely an iFrame involved and they have it pointing to
some sort of PHP "servlet" hosted from an entirely different domain.
Doesn't look to be much you can do with that combination.

Best regards,
David C.


Re: How to recover text from a web page

2010-09-22 Thread stephen barncard
perhaps there's an iFrame or include in the html that references another
page --

On 22 September 2010 12:14, Sumner, Walt  wrote:

> Thanks for the lead on screen scrapes, but the problem is there is nothing
> to scrape. The "put URL(...)" and "revBrowserGet(tBrowserId,"htmltext")"
> return the html, but not all of the text that is displayed on the page.
>


Stephen Barncard
San Francisco Ca. USA

more about sqb  


Re: How to recover text from a web page

2010-09-22 Thread Sumner, Walt
Thanks for the lead on screen scrapes, but the problem is there is nothing
to scrape. The "put URL(...)" and "revBrowserGet(tBrowserId,"htmltext")"
return the html, but not all of the text that is displayed on the page.

In fact, if I use Word's merge documents tool to compare the html from pages
2, 9, and 256 of the petition, there is NO DIFFERENCE in the files. The
petition signatures and comments are embedded in a petition widget, I think,
which I suppose is some javascript applet. Whatever it is, the html
definitely does not contain the petition text that I want to evaluate.

Nevertheless, it is trivial to select and copy all of the text on the page
manually. Once it is copied, it is easy to automatically paste it, scrape it
(that code works fine), and store the data using LiveCode, but I do not see a
way to select and copy text from this widget using LiveCode.
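
One possibility worth testing (my own sketch, not something confirmed
in this thread): since revBrowserExecuteScript is available, you could
try asking the browser itself for the rendered text via JavaScript
instead of copying it by hand. How the value comes back differs between
engines and LiveCode versions, so treat this as a starting point only:

-- untested: ask the embedded browser for the visible page text
-- "Comments" is a hypothetical field name
put revBrowserExecuteScript(tBrowserId, "document.body.innerText;") into tText
put tText into field "Comments"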

> On Tue, 21 Sep 2010 22:23:17, stephen barncard wrote:
> Why bother with revBrowser at all? Just do this in the message box:
> 
> put URL "http://website.com/page.html"
> 
> This will put the website html into the message box output. Obviously
> you could do this with fields.
> 
> Check out Jerry's videos on Screen Scraping:
> 
> http://revmentor.com/business-logic-screen-scraping-1
> http://revmentor.com/business-logic-screen-scraping-0
> 
> 
> On 21 September 2010 22:16, Sumner, Walt  wrote:
> 
>> I am trying to recover text from this web page and all of its siblings:
>> 
>> 
>> http://www.thepetitionsite.com/1/keep-life-saving-electronic-cigarettes-available/#sigs/691732733/user/1
>> 
>> The interesting part of the page is the comments, which do not appear in
>> the HTML, but which can be copied manually. I can open this page in a
>> browser in LiveCode. With manual mouse motions, I can double click a block
>> of text, choose "Select All" from the "Edit" menu, choose "Copy" from the
>> "Edit" menu, and then paste into a field where the comments all appear and
>> are easy to disassemble.
>> 
>> Unfortunately, the revbrowser set command and get function do not do
>> anything comparable AFAICT. The "Select All" choice is not implemented in
>> the DoMenu command. I think that printing a pdf is also out. So, any
>> thoughts on how to automate this part of a petition review? For instance,
>> maybe there is a simple way to save the text to a file with the
>> revBrowserExecuteScript function (using JavaScript for Safari)?
>> 
>> BTW, the browser is fully capable of crashing LiveCode on at least some OSX
>> machines. Please don't lose any work for me.
>> 
>> Thanks,
>> 
>> Walt
>> 

Walton Sumner
 




Re: How to recover text from a web page

2010-09-21 Thread Shadow Slash
Hi Walt,

Umm, if I understand correctly, you simply want to get the source code of that
particular page? If that's what you want to do, you can just use the
revBrowserGet() function and retrieve the browser's "htmltext" property.
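
For example (a minimal illustration; it assumes tBrowserId holds the
instance id returned by revBrowserOpen):

put revBrowserGet(tBrowserId, "htmltext") into tPageHtml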

Best regards,
Shedo Surashu
www.ShadowSlash.tk

Connect with me on LinkedIn. (http://ph.linkedin.com/in/shadowslash)


--- On Wed, 22/9/10, Sumner, Walt  wrote:

> From: Sumner, Walt 
> Subject: How to recover text from a web page
> To: "use-revolution@lists.runrev.com" 
> Date: Wednesday, 22 September, 2010, 5:16 AM
> I am trying to recover text from this
> web page and all of its siblings:
> 
> http://www.thepetitionsite.com/1/keep-life-saving-electronic-cigarettes-available/#sigs/691732733/user/1
> 
> The interesting part of the page is the comments, which do
> not appear in the HTML, but which can be copied manually. I
> can open this page in a browser in LiveCode. With manual
> mouse motions, I can double click a block of text, choose
> "Select All" from the "Edit" menu, choose "Copy" from the
> "Edit" menu, and then paste into a field where the comments
> all appear and are easy to disassemble. 
> 
> Unfortunately, the revbrowser set command and get function
> do not do anything comparable AFAICT. The "Select All"
> choice is not implemented in the DoMenu command. I think
> that printing a pdf is also out. So, any thoughts on how to
> automate this part of a petition review? For instance, maybe
> there is a simple way to save the text to a file with the
> revBrowserExecuteScript function (using JavaScript for
> Safari)?
> 
> BTW, the browser is fully capable of crashing LiveCode on
> at least some OSX machines. Please don't lose any work for
> me.
> 
> Thanks,
> 
> Walt
> 





Re: How to recover text from a web page

2010-09-21 Thread stephen barncard
Why bother with revBrowser at all? Just do this in the message box:

put URL "http://website.com/page.html"

This will put the website html into the message box output. Obviously
you could do this with fields.
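
For instance (my own one-liner, not from the thread; "pageSource" is a
hypothetical field name):

put URL "http://website.com/page.html" into field "pageSource"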

Check out Jerry's videos on Screen Scraping:

http://revmentor.com/business-logic-screen-scraping-1
http://revmentor.com/business-logic-screen-scraping-0


On 21 September 2010 22:16, Sumner, Walt  wrote:

> I am trying to recover text from this web page and all of its siblings:
>
>
> http://www.thepetitionsite.com/1/keep-life-saving-electronic-cigarettes-available/#sigs/691732733/user/1
>
> The interesting part of the page is the comments, which do not appear in
> the HTML, but which can be copied manually. I can open this page in a
> browser in LiveCode. With manual mouse motions, I can double click a block
> of text, choose "Select All" from the "Edit" menu, choose "Copy" from the
> "Edit" menu, and then paste into a field where the comments all appear and
> are easy to disassemble.
>
> Unfortunately, the revbrowser set command and get function do not do
> anything comparable AFAICT. The "Select All" choice is not implemented in
> the DoMenu command. I think that printing a pdf is also out. So, any
> thoughts on how to automate this part of a petition review? For instance,
> maybe there is a simple way to save the text to a file with the
> revBrowserExecuteScript function (using JavaScript for Safari)?
>
> BTW, the browser is fully capable of crashing LiveCode on at least some OSX
> machines. Please don't lose any work for me.
>
> Thanks,
>
> Walt
>



-- 



Stephen Barncard
San Francisco Ca. USA

more about sqb  