[greasemonkey-users] Re: Parse HTML like the Gods ;)

esquifit Mon, 05 Jan 2009 14:04:22 -0800

On Mon, Jan 5, 2009 at 8:56 PM, Johan Sundström <[email protected]> wrote:
>
> On Wed, Dec 31, 2008 at 11:33 AM, esquifit <[email protected]> wrote:
>> Yahoo! has launched a service (YQL) that makes it possible to get a
>> well-formed XML document from any publicly accessible web page, even if the
>> page uses HTML instead of XHTML.  Furthermore, one can fetch only a portion
>> of the XML by specifying an XPath selector.
>
> Interesting hack, though their HTML parser doesn't seem to do a very
> good job. It fails for even some w3c valid HTML like
> http://www.lysator.liu.se/~jhs/test-ml.html when I test it at their
> http://developer.yahoo.com/yql/console/ console (query: select * from
> html where url="http://johan.dev.mashlogic.com/test/index.html";).



Right, or almost right.  They are aware of the problem and are
working on improving the service; see [1].
Surprisingly, the described procedure seems to work only with HTML,
not with XHTML as in your example.  In that case you can use a
different approach: if you insist on using YQL, you can query the xml
'table' instead of html:

select * from xml where url='http://johan.dev.mashlogic.com/test/index.html'
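
To run such a query from a script rather than from the console, the
statement has to be URL-encoded into a request. A minimal sketch,
assuming the public REST endpoint at query.yahooapis.com (the endpoint,
the format parameter and the buildYqlUrl helper are assumptions for
illustration, not something stated in this thread):

```javascript
// Assumed public YQL REST endpoint; not taken from this thread.
var YQL_ENDPOINT = 'http://query.yahooapis.com/v1/public/yql';

// Hypothetical helper: builds the request URL for
// "select * from <table> where url='<pageUrl>'".
// `table` would be 'html' or 'xml', as discussed above.
function buildYqlUrl(table, pageUrl) {
  var query = "select * from " + table + " where url='" + pageUrl + "'";
  // encodeURIComponent escapes spaces, '=', ':' and '/' in the statement
  return YQL_ENDPOINT + '?q=' + encodeURIComponent(query) + '&format=xml';
}

var requestUrl = buildYqlUrl(
  'xml',
  'http://johan.dev.mashlogic.com/test/index.html'
);
```

The resulting requestUrl could then be fetched with GM_xmlhttpRequest
like any other URL. Page URLs containing a single quote would need
extra escaping, which this sketch does not attempt.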

Besides this, if you are only interested in parsing an XHTML document
from within Greasemonkey, you can parse the responseText property
directly with DOMParser.
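
A rough sketch of that approach, assuming the page really is
well-formed XHTML. The parseXhtml helper and its optional Parser
argument are made up for illustration; DOMParser, GM_xmlhttpRequest
and GM_log are the usual browser and Greasemonkey APIs:

```javascript
// Parse an XHTML string into a DOM document.
// `Parser` is optional and defaults to the browser's DOMParser; it is
// only there so the function can be exercised outside a browser.
function parseXhtml(text, Parser) {
  var parser = new (Parser || DOMParser)();
  var doc = parser.parseFromString(text, 'application/xhtml+xml');
  // Firefox signals a parse failure with a <parsererror> root element
  // instead of throwing, so check for it explicitly.
  if (doc.documentElement.nodeName === 'parsererror') {
    throw new Error('Document is not well-formed XML');
  }
  return doc;
}

// Usage inside a user script; skipped where the GM API is unavailable.
if (typeof GM_xmlhttpRequest !== 'undefined') {
  GM_xmlhttpRequest({
    method: 'GET',
    url: 'http://johan.dev.mashlogic.com/test/index.html',
    onload: function (response) {
      var doc = parseXhtml(response.responseText);
      var links = doc.getElementsByTagName('a');
      GM_log(links.length + ' links found');
    }
  });
}
```

This only works when the response is well-formed; for tag-soup HTML
the parser returns an error document, which is where a service like
YQL comes in.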

[1] http://developer.yahoo.net/forum/index.php?showtopic=508
