Re: [SailfishDevel] QtWebKit module - shouldn't it be whitelisted?
As you say, Python is not the solution in the short term... Plus, I don't really want to use it. I suppose it may make some tasks easier, if you have the right library... But it adds another interpreted language to the mix, and I would rather avoid it.

Anyway, I am not using a WebView at all: I create a WebPage, then use setHtml on the file I downloaded, then walk the DOM for the nodes I need. I tried the XML parser first, since the site advertises XHTML... but the XML is really broken. If I have to rewrite the data miner, I will simply go over the HTML and match the right regexps.

It looks like the tidy library could do what I need; it is a dependency as well, but it should be a lighter one than Python (and QtWebKit, but that was very convenient...).

Also, I have one data source for now, but I expect to have at least one more in the future, possibly more if I find data sources in other countries, so I will need different data extractors.

In any case... The library will still be there, right? This will only prevent my application from being allowed in the Harbour store? I could live with that; my application could live in a third-party repository.

Luciano

On Tue, Nov 26, 2013 at 7:05 AM, Thomas Perl th.p...@gmail.com wrote:

Hi,

2013/11/26 Luciano Montanaro mikel...@gmail.com:

On Nov 26, 2013 2:07 AM, Robin Burchell robin.burch...@jolla.com wrote:

[...] My application too depends on it to scrape data from a web page. I need the QWebElement interface, otherwise I will need to parse the html on my own. [...] Well, access to the DOM model...
Depending on how JavaScript-laden the page you are trying to scrape is, something like BeautifulSoup or Mechanize (both written in Python; the latter might sound familiar to Perl programmers, as it is modeled on WWW::Mechanize) might do the job, and in a more lightweight way (no need to download images, execute JS, or lay out the page for simple scraping):

http://www.crummy.com/software/BeautifulSoup/
http://wwwsearch.sourceforge.net/mechanize/

Of course, this drags in a new dependency that also isn’t supported at the moment (Python), but as mentioned in the announcement [1], we are “actively working on getting Python support into shape”, and once that is supported (PyOtherSide QML Plugin), it might be easier to integrate and more efficient than moving the whole webpage through a WebView and going through that with the DOM.

And if your page is JavaScript-laden, and you can’t parse the static HTML using BeautifulSoup or Mechanize, chances are the data parsed by JavaScript is also available as JSON somewhere (just look into the webpage code / watch the traffic) - and that’ll definitely be easier to parse, too :)

HTH :)
Thomas

[1] https://lists.sailfishos.org/pipermail/devel/2013-November/001319.html

___
SailfishOS.org Devel mailing list

--
Luciano Montanaro

Anyone who is capable of getting themselves made President should on no account be allowed to do the job. -- Douglas Adams
Re: [SailfishDevel] QtWebKit module - shouldn't it be whitelisted?
SilicaWebView does more than just wrap QtWebKit's WebView. In particular, SilicaWebView also somehow takes care of pulley menu integration and the rest of the SilicaFlickable-like stuff, which causes issues if you want to have an address bar above the WebView and therefore need to wrap everything in an outer SilicaFlickable. Oh well, maybe I am just messing something up with the touch handling.

A somewhat more serious issue is that, with the QtWebKit import not permitted, we cannot use the WebView constants anymore, e.g. WebView.LoadSucceededStatus. For now I have to hardcode them. In general, should these constants be copied to SilicaWebView?

Cheers,
Artem.

On Tue, Nov 26, 2013 at 3:07 AM, Robin Burchell robin.burch...@jolla.com wrote:

Hi,

The reason for not whitelisting QtWebKit is a bit different here: that we don’t want to promise an API that we cannot promise to continue to support. While QtWebKit may continue to limp along for a few years yet, it has been removed from upstream webkit, and has no real active maintainers that I am aware of. The unfortunate reality is that we are not in a position where we can take on the sole maintenance of a web engine (which is a rather large and complex piece of software).

We do offer SilicaWebView (in Silica) as a component that does not expose any engine/implementation details (meaning that we can change the implementation to use QtWebEngine, or Gecko, or whatever suits us / works best for the purpose). It should be good enough for simple cases. If you’re lacking something from it, please ask away :)

BR,
Robin

On 26 Nov 2013, at 02:02, Artem Marchenko artem.marche...@gmail.com wrote:

Hi all

One of the rejection messages I've got in harbour is the following:

- In ./usr/share/wikipedia/pages/MainWikipediaPage.qml the 'QtWebKit 3.0' is not allowed -

Is WebKit really not allowed? Just double checking as I thought that its API/ABI is to be very stable at the times when it's going to retire - http://blog.qt.digia.com/blog/2013/09/12/introducing-the-qt-webengine/ (thanks to John Brooks for quickly locating the link). Shouldn't QtWebKit import be whitelisted?

Best regards,
Artem.

--
Artem Marchenko
http://agilesoftwaredevelopment.com
http://twitter.com/AgileArtem
Re: [SailfishDevel] QtWebKit module - shouldn't it be whitelisted?
On Nov 26, 2013 2:07 AM, Robin Burchell robin.burch...@jolla.com wrote:

Hi,

The reason for not whitelisting QtWebKit is a bit different here: that we don’t want to promise an API that we cannot promise to continue to support. While QtWebKit may continue to limp along for a few years yet, it has been removed from upstream webkit, and has no real active maintainers that I am aware of. The unfortunate reality is that we are not in a position where we can take on the sole maintenance of a web engine (which is a rather large and complex piece of software).

That is very unfortunate, if true, but as I understand the matter, QtWebKit will not go away anytime soon... My application, too, depends on it to scrape data from a web page. I need the QWebElement interface; otherwise I will need to parse the HTML on my own.

We do offer SilicaWebView (in Silica) as a component that does not expose any engine/implementation details (meaning that we can change the implementation to use QtWebEngine, or Gecko, or whatever suits us / works best for the purpose). It should be good enough for simple cases. If you’re lacking something from it, please ask away :)

Well, access to the DOM model... I don't really see the current trend as an improvement... QtWebKit2 was a functional regression already, and QtWebEngine, while still an unknown, seems to be even more restricted.

BR,
Robin

On 26 Nov 2013, at 02:02, Artem Marchenko artem.marche...@gmail.com wrote:

Hi all

One of the rejection messages I've got in harbour is the following:

- In ./usr/share/wikipedia/pages/MainWikipediaPage.qml the 'QtWebKit 3.0' is not allowed -

Is WebKit really not allowed? Just double checking as I thought that its API/ABI is to be very stable at the times when it's going to retire - http://blog.qt.digia.com/blog/2013/09/12/introducing-the-qt-webengine/ (thanks to John Brooks for quickly locating the link). Shouldn't QtWebKit import be whitelisted?

Best regards,
Artem.

--
Artem Marchenko
http://agilesoftwaredevelopment.com
http://twitter.com/AgileArtem
Re: [SailfishDevel] QtWebKit module - shouldn't it be whitelisted?
Hi,

2013/11/26 Luciano Montanaro mikel...@gmail.com:

On Nov 26, 2013 2:07 AM, Robin Burchell robin.burch...@jolla.com wrote:

[...] My application too depends on it to scrape data from a web page. I need the QWebElement interface, otherwise I will need to parse the html on my own. [...] Well, access to the DOM model...

Depending on how JavaScript-laden the page you are trying to scrape is, something like BeautifulSoup or Mechanize (both written in Python; the latter might sound familiar to Perl programmers, as it is modeled on WWW::Mechanize) might do the job, and in a more lightweight way (no need to download images, execute JS, or lay out the page for simple scraping):

http://www.crummy.com/software/BeautifulSoup/
http://wwwsearch.sourceforge.net/mechanize/

Of course, this drags in a new dependency that also isn’t supported at the moment (Python), but as mentioned in the announcement [1], we are “actively working on getting Python support into shape”, and once that is supported (PyOtherSide QML Plugin), it might be easier to integrate and more efficient than moving the whole webpage through a WebView and going through that with the DOM.

And if your page is JavaScript-laden, and you can’t parse the static HTML using BeautifulSoup or Mechanize, chances are the data parsed by JavaScript is also available as JSON somewhere (just look into the webpage code / watch the traffic) - and that’ll definitely be easier to parse, too :)

HTH :)
Thomas

[1] https://lists.sailfishos.org/pipermail/devel/2013-November/001319.html