On 16/07/2009, at 4:35 PM, Bob M Brown wrote:

> I recently blogged about my experiences with this:
> http://www.guru.net.nz/blog/2009/06/screen-scraping-with-jquery.html
>
> Also the Simple DOM HTML Parser looked promising:
> http://simplehtmldom.sourceforge.net/

I'm planning to use QueryPath next time I need to do screen-scraping in PHP.

http://querypath.org/

It claims to handle both HTML and XHTML, and offers jQuery-style selectors and chaining, e.g.

http://api.querypath.org/docs/__examplesource/exsource_s_mbutcher_Code_QueryPath_examples_musicbrainz.php_1c3311dbd9b75031b51315d335999686.html

Regards
Jonathan

> ctx2002 wrote:
>> Jochen posted a question about using regex to extract information
>> from an HTML page.
>>
>> As everyone can see, the regex is not easy to read and understand.
>>
>> I was thinking, why not use XSLT to process the HTML file? PHP 5 has good
>> support for the XSLT processor.
>>
>> The only extra step we need is to run the HTML Tidy program to make the
>> HTML page well-formed XML.
>>
>> For me, an XSL file is easier to understand than a regex.
>>
>> Are there other ways/tools to extract information from HTML without
>> using regex?

NZ PHP Users Group: http://groups.google.com/group/nzphpug
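
On the question of regex-free extraction raised in the thread: one more option that needs no third-party library is PHP's bundled DOM extension. The sketch below is not QueryPath, Simple HTML DOM, or the XSLT approach; it swaps in DOMDocument plus DOMXPath, which can usually parse non-well-formed HTML directly, so the Tidy step may not be needed. The sample markup, the `albums` id, and the `extractLinks()` helper are all made up for illustration.

```php
<?php
// Hypothetical sample markup with unclosed <li> tags; a real page would be fetched first.
$html = '<html><body><ul id="albums">'
      . '<li><a href="/a/1">First Album</a>'
      . '<li><a href="/a/2">Second Album</a>'
      . '</ul></body></html>';

/**
 * Illustrative helper (not part of any library): returns link text keyed by
 * href for every <a> inside the <ul> that has the given id.
 */
function extractLinks(string $html, string $listId): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // $listId is assumed to be a trusted, quote-free value in this sketch.
    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query("//ul[@id='" . $listId . "']//a") as $a) {
        $links[$a->getAttribute('href')] = trim($a->textContent);
    }
    return $links;
}

print_r(extractLinks($html, 'albums'));
```

The XPath expression plays the role the regex would otherwise play, and it tends to stay readable as the page structure gets more complicated.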
