Re: [SailfishDevel] QtWebKit module - shouldn't it be whitelisted?

2013-11-26 Thread Luciano Montanaro
As you say, Python is not the solution in the short term...
Plus, I don't really want to use it. I suppose it may make some tasks
easier, if you have the right library... But it adds another
interpreted language to the mix, and I would rather avoid it.

Anyway, I am not using a WebView at all: I create a WebPage, then call
setHtml with the file I downloaded, then walk the DOM for the nodes I
need.

I tried the XML parser first, since the site advertises XHTML...
but the XML is really broken.

If I have to rewrite the data miner, I will simply go over the HTML
and match the right regexp.
It looks like the tidy library could do what I need; it is a
dependency as well, but it should be a lighter one than Python (and
QtWebKit, but that was very convenient...).
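
To illustrate the regexp route, here is a minimal sketch. It is written in
Python only to keep it short and runnable, not as a suggestion to add Python
as a dependency. The markup, the class name "temp" and the function
match_values are invented for the example; a real page needs its own pattern.

```python
import re

# Hypothetical cell format: <td class="temp"> 12.5 </td>
# The pattern is an assumption for illustration, not taken from a real site.
ROW = re.compile(
    r'<td[^>]*class="temp"[^>]*>\s*([-+]?\d+(?:\.\d+)?)\s*',
    re.IGNORECASE,
)


def match_values(html):
    """Return every numeric value found in a matching cell, as floats."""
    return [float(value) for value in ROW.findall(html)]
```

This kind of matching is brittle: it breaks as soon as the site changes its
markup, which is why a separate extractor per data source seems sensible.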

Also, I have one data source for now, but I expect to have at least
one more in the future, possibly more if I find data sources in other
countries, so I will need different data extractors.

In any case... The library will still be there, right? This will only
prevent my application from being accepted in the Harbour store? I could
live with that; my application could live in a third-party repository.

Luciano

On Tue, Nov 26, 2013 at 7:05 AM, Thomas Perl th.p...@gmail.com wrote:
 Hi,

 2013/11/26 Luciano Montanaro mikel...@gmail.com:
 On Nov 26, 2013 2:07 AM, Robin Burchell robin.burch...@jolla.com wrote:
 [...]
 My application too depends on it to scrape data from a web page. I need the
 QWebElement interface, otherwise I will need to parse the html on my own.
 [...]
 Well, access to the DOM model...

 Depending on how JavaScript-laden the page you are trying to scrape
 is, something like BeautifulSoup or Mechanize (both written in Python;
 the latter one might sound familiar to Perl programmers, it’s designed
 after WWW::Mechanize) might do the job, and in a more lightweight way
 (no need to download images or execute JS / layout the page for simple
 scraping):

  http://www.crummy.com/software/BeautifulSoup/
  http://wwwsearch.sourceforge.net/mechanize/

 Of course, this drags in a new dependency that also isn’t supported at
 the moment (Python), but as mentioned in the announcement[1], we are
 “actively working on getting Python support into shape”, and once that
 is supported (PyOtherSide QML Plugin), it might be easier to
 integrate and more efficient than moving the whole webpage through a
 WebView and going through that with the DOM.

 And if your page is JavaScript-laden, and you can’t parse the static
 HTML using BeautifulSoup or Mechanize, chances are the data parsed by
 JavaScript is also available as JSON somewhere (just look into the
 webpage code / watch the traffic) - and that’ll definitely be easier
 to parse, too :)

 HTH :)
 Thomas

 [1] https://lists.sailfishos.org/pipermail/devel/2013-November/001319.html
 ___
 SailfishOS.org Devel mailing list



-- 
Luciano Montanaro

Anyone who is capable of getting themselves made President should on
no account be allowed to do the job. -- Douglas Adams

Re: [SailfishDevel] QtWebKit module - shouldn't it be whitelisted?

2013-11-26 Thread Artem Marchenko
SilicaWebView does more than just wrap QtWebKit's WebView. In particular,
SilicaWebView also somehow takes care of pulley menu integration and the
rest of the SilicaFlickable-like stuff, which... causes issues if you want to
have an address bar above the WebView and therefore need to wrap everything
in an outer SilicaFlickable.
Oh well, maybe I am just messing something up with touch handling.

A somewhat more serious issue is that, with the QtWebKit import not
permitted, we cannot use the WebView constants anymore, e.g.
WebView.LoadSucceededStatus.

For now I have to hardcode them.
In general, should these constants be copied to SilicaWebView?

Cheers,
Artem.




On Tue, Nov 26, 2013 at 3:07 AM, Robin Burchell robin.burch...@jolla.com wrote:

  Hi,

  The reason for not whitelisting QtWebKit is a bit different here: that
 we don’t want to promise an API that we cannot promise to continue to
 support. While QtWebKit may continue to limp along for a few years yet, it
 has been removed from upstream webkit, and has no real active maintainers
 that I am aware of. The unfortunate reality is that we are not in a
 position where we can take on the sole maintenance of a web engine (which
 is a rather large and complex piece of software).

  We do offer SilicaWebView (in Silica) as a component that does not
 expose any engine/implementation details (meaning that we can change the
 implementation to use QtWebEngine, or Gecko, or whatever suits us / works
 best for the purpose). It should be good enough for simple cases. If you’re
 lacking something from it, please ask away :)

  BR,
 Robin

  On 26 Nov 2013, at 02:02, Artem Marchenko artem.marche...@gmail.com
 wrote:

  Hi all

  One of the rejection messages I've got in harbour is the following:
 -
 In ./usr/share/wikipedia/pages/MainWikipediaPage.qml the 'QtWebKit 3.0'
 is not allowed
  -

  Is WebKit really not allowed? Just double checking as I thought that
 its API/ABI is to be very stable at the time when it's going to retire -
 http://blog.qt.digia.com/blog/2013/09/12/introducing-the-qt-webengine/ (thanks
 to John Brooks for quickly locating the link).

  Shouldn't QtWebKit import be whitelisted?

  Best regards,
 Artem.

  --
 Artem Marchenko
 http://agilesoftwaredevelopment.com
 http://twitter.com/AgileArtem







-- 
Artem Marchenko
http://agilesoftwaredevelopment.com
http://twitter.com/AgileArtem

Re: [SailfishDevel] QtWebKit module - shouldn't it be whitelisted?

2013-11-25 Thread Luciano Montanaro
On Nov 26, 2013 2:07 AM, Robin Burchell robin.burch...@jolla.com wrote:

 Hi,

 The reason for not whitelisting QtWebKit is a bit different here: that we
don’t want to promise an API that we cannot promise to continue to support.
While QtWebKit may continue to limp along for a few years yet, it has been
removed from upstream webkit, and has no real active maintainers that I am
aware of. The unfortunate reality is that we are not in a position where we
can take on the sole maintenance of a web engine (which is a rather large
and complex piece of software).


That is very unfortunate, if true, but as I understand the matter,
QtWebKit will not go away anytime soon...

My application, too, depends on it to scrape data from a web page. I need the
QWebElement interface, otherwise I will need to parse the HTML on my own.

 We do offer SilicaWebView (in Silica) as a component that does not expose
any engine/implementation details (meaning that we can change the
implementation to use QtWebEngine, or Gecko, or whatever suits us / works
best for the purpose). It should be good enough for simple cases. If you’re
lacking something from it, please ask away :)


Well, access to the DOM model...
I don't really see the current trend as an improvement... QtWebKit2 was
already a functional regression, and QtWebEngine, while still an unknown,
seems to be even more restricted.

 BR,
 Robin

 On 26 Nov 2013, at 02:02, Artem Marchenko artem.marche...@gmail.com
wrote:

 Hi all

 One of the rejection messages I've got in harbour is the following:
 -
 In ./usr/share/wikipedia/pages/MainWikipediaPage.qml the 'QtWebKit 3.0'
 is not allowed
 -

 Is WebKit really not allowed? Just double checking as I thought that
its API/ABI is to be very stable at the time when it's going to retire -
http://blog.qt.digia.com/blog/2013/09/12/introducing-the-qt-webengine/ (thanks
to John Brooks for quickly locating the link).

 Shouldn't QtWebKit import be whitelisted?

 Best regards,
 Artem.

 --
 Artem Marchenko
 http://agilesoftwaredevelopment.com
 http://twitter.com/AgileArtem




Re: [SailfishDevel] QtWebKit module - shouldn't it be whitelisted?

2013-11-25 Thread Thomas Perl
Hi,

2013/11/26 Luciano Montanaro mikel...@gmail.com:
 On Nov 26, 2013 2:07 AM, Robin Burchell robin.burch...@jolla.com wrote:
 [...]
 My application too depends on it to scrape data from a web page. I need the
 QWebElement interface, otherwise I will need to parse the html on my own.
 [...]
 Well, access to the DOM model...

Depending on how JavaScript-laden the page you are trying to scrape
is, something like BeautifulSoup or Mechanize (both written in Python;
the latter one might sound familiar to Perl programmers, it’s designed
after WWW::Mechanize) might do the job, and in a more lightweight way
(no need to download images or execute JS / layout the page for simple
scraping):

 http://www.crummy.com/software/BeautifulSoup/
 http://wwwsearch.sourceforge.net/mechanize/

Of course, this drags in a new dependency that also isn’t supported at
the moment (Python), but as mentioned in the announcement[1], we are
“actively working on getting Python support into shape”, and once that
is supported (PyOtherSide QML Plugin), it might be easier to
integrate and more efficient than moving the whole webpage through a
WebView and going through that with the DOM.
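
As a rough idea of what scraping static HTML without a rendering engine
could look like, here is a minimal sketch. To stay self-contained it uses
Python's standard-library html.parser rather than BeautifulSoup or
Mechanize. The sample markup, the class name "temp" and the names
CellExtractor and extract_cells are made up for illustration.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Collect the text of every <td class="temp"> cell (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        # Any new <td> ends the previous cell, even if it was never closed.
        if tag == "td":
            self._in_cell = dict(attrs).get("class") == "temp"
            if self._in_cell:
                self.values.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.values[-1] += data


def extract_cells(html):
    """Return the stripped text of all matching cells, in document order."""
    parser = CellExtractor()
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


# Deliberately sloppy sample: unclosed <td>, unclosed <tr>, missing </table>
SAMPLE = (
    '<table><tr><td class="temp">12.5<td>ignored</td>'
    '<tr><td class="temp"> 13.0 </td>'
)
```

No images are fetched, no JavaScript runs and nothing is laid out; the
parser only walks the tags, and it tolerates the unclosed elements in the
sample.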

And if your page is JavaScript-laden, and you can’t parse the static
HTML using BeautifulSoup or Mechanize, chances are the data parsed by
JavaScript is also available as JSON somewhere (just look into the
webpage code / watch the traffic) - and that’ll definitely be easier
to parse, too :)
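
If such a JSON source exists, the parsing side shrinks to a few lines. A
minimal sketch, assuming an invented payload: the "stations", "name" and
"temp" fields and the readings function are hypothetical, not taken from
any real site.

```python
import json

# Hypothetical payload, as it might show up when watching the page's traffic
PAYLOAD = '{"stations": [{"name": "A", "temp": 12.5}, {"name": "B", "temp": null}]}'


def readings(payload):
    """Return {station name: temperature}, skipping entries without a value."""
    data = json.loads(payload)
    return {
        station["name"]: station["temp"]
        for station in data.get("stations", [])
        if station.get("temp") is not None
    }
```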

HTH :)
Thomas

[1] https://lists.sailfishos.org/pipermail/devel/2013-November/001319.html