Hi,

2013/11/26 Luciano Montanaro <mikel...@gmail.com>:
> On Nov 26, 2013 2:07 AM, "Robin Burchell" <robin.burch...@jolla.com> wrote:
> [...]
> My application too depends on it to scrape data from a web page. I need the
> QWebElement interface, otherwise I will need to parse the html on my own.
> [...]
> Well, access to the DOM model...

Depending on how JavaScript-laden the page you are trying to scrape
is, something like BeautifulSoup or Mechanize might do the job (both
are written in Python; the latter might sound familiar to Perl
programmers, as it is modelled on WWW::Mechanize). Both are also more
lightweight than a WebView, since simple scraping does not need to
download images, execute JS or lay out the page:

 http://www.crummy.com/software/BeautifulSoup/
 http://wwwsearch.sourceforge.net/mechanize/
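
To give an idea of how little is needed for static HTML, here is a
minimal sketch. It deliberately uses Python's standard-library
html.parser instead of BeautifulSoup, so that it runs without any extra
packages. The page markup, the "temp" class name and the names
CellCollector and scrape_temperatures are all made up for illustration;
a real page will need its own selectors.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td class="temp"> cell.

    The tag and class name are assumptions about a hypothetical page.
    """

    def __init__(self):
        super().__init__()
        self._in_cell = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "td" and ("class", "temp") in attrs:
            self._in_cell = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell
        if self._in_cell:
            self.values[-1] += data


def scrape_temperatures(html):
    """Return the stripped text of all matching cells, in page order."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


# Hypothetical sample page, standing in for the downloaded HTML
SAMPLE_PAGE = """
<table>
  <tr><td class="city">Helsinki</td><td class="temp">-3</td></tr>
  <tr><td class="city">Tampere</td><td class="temp">-5</td></tr>
</table>
"""

print(scrape_temperatures(SAMPLE_PAGE))
```

With BeautifulSoup the same lookup would probably shrink to a single
find_all() call, at the cost of the extra dependency.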

Of course, this drags in a new dependency that also isn’t supported at
the moment (Python), but as mentioned in the announcement[1], "we are
actively working on getting Python support into shape", and once that
is supported (PyOtherSide QML Plugin), it might be easier to integrate
and more efficient than pushing the whole webpage through a WebView
and walking the DOM there.

And if your page is JavaScript-laden, and you can’t parse the static
HTML using BeautifulSoup or Mechanize, chances are the data loaded by
the JavaScript is also available as JSON somewhere (just look into the
webpage code / watch the traffic) - and that’ll definitely be easier
to parse, too :)
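
As a rough illustration of why the JSON route tends to be easier, here
is a small sketch using only Python's standard json module. The
payload layout, the field names "stations", "name" and "temp_c", and
the function name extract_temperatures are invented for this example;
the real structure depends entirely on what the site serves.

```python
import json

# Hypothetical response body, standing in for what the page's
# JavaScript would fetch from the server
SAMPLE_RESPONSE = """
{"stations": [
    {"name": "Helsinki", "temp_c": -3},
    {"name": "Tampere", "temp_c": -5}
]}
"""


def extract_temperatures(payload):
    """Map each station name to its temperature (assumed field names)."""
    data = json.loads(payload)
    return {station["name"]: station["temp_c"] for station in data["stations"]}


print(extract_temperatures(SAMPLE_RESPONSE))
```

No HTML parsing, no layout, and the values already arrive with the
right types.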

HTH :)
Thomas

[1] https://lists.sailfishos.org/pipermail/devel/2013-November/001319.html
_______________________________________________
SailfishOS.org Devel mailing list
