[phpug] Re: regex and scraping html page

phpug Wed, 15 Jul 2009 22:05:53 -0700

On 16/07/2009, at 4:35 PM, Bob M Brown wrote:
> I recently blogged about my experiences with this:
>     http://www.guru.net.nz/blog/2009/06/screen-scraping-with-jquery.html
>
> Also the PHP Simple HTML DOM Parser looked promising:
>     http://simplehtmldom.sourceforge.net/


I'm planning to use QueryPath next time I need to do screen-scraping in PHP.
http://querypath.org/

It claims to handle both HTML and XHTML, and offers jQuery-style selectors
and method chaining, e.g.:
http://api.querypath.org/docs/__examplesource/exsource_s_mbutcher_Code_QueryPath_examples_musicbrainz.php_1c3311dbd9b75031b51315d335999686.html
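
For comparison, here is a rough sketch of selector-style scraping without
regex. It does not use QueryPath's API (I have not shown that here); it only
uses the DOM extension bundled with PHP 5. The function name scrape_links(),
the class name "menu" and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: collect the text and href of every link found inside
// elements carrying the given class. Uses only PHP's bundled DOM extension.
function scrape_links($html, $class)
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so keep libxml's warnings internal
    // instead of letting them spill into the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Roughly the XPath equivalent of the CSS selector ".menu a[href]".
    // Assumes $class is a plain class name without quote characters.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]//a[@href]',
        $class
    );

    $links = array();
    foreach ($xpath->query($query) as $a) {
        $links[] = array(
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        );
    }
    return $links;
}

// Deliberately sloppy markup: the <p> is never closed.
$html = '<div class="nav menu"><a href="/a">First</a> <a href="/b"> Second </a></div>'
      . '<p><a href="/c">Outside</a>';

$links = scrape_links($html, 'menu');
print_r($links);
```

The XPath string is the part a jQuery-style library would hide behind a
selector such as ".menu a", which is the main readability gain over both
regex and hand-written XPath.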

Regards
Jonathan

>
> ctx2002 wrote:
>> Jochen posted a question about using regex to extract information
>> from an HTML page.
>>
>> As everyone can see, the regex is not easy to read or understand.
>>
>> I was thinking: why not use XSLT to process the HTML file? PHP 5 has good
>> support for XSLT processing.
>>
>> The only extra step needed is to run the page through HTML Tidy to make
>> it well-formed XML.
>>
>> For me, an XSL file is easier to understand than a regular expression.
>>
>> Are there other ways or tools to extract information from HTML without
>> using regex?

--~--~---------~--~----~------------~-------~--~----~
NZ PHP Users Group: http://groups.google.com/group/nzphpug
-~----------~----~----~----~------~----~------~--~---